Slightly reduce 750 Ti speed but greatly improve 9xx speed. Keep compatibility for SM 3/3.5 in a second file.

Note: with this code and CUDA 7.5, the speed gain is reversed... May be "reverted" soon.