Original Commit :
Removed sharedmem and reduced calculations with precalcing (ECHO hash).
750ti + 20KHASH(x11)
tpruvot notes:
Real change is more of 10 KH/s on stock clocks (but real)
launch bounds disabled, no perf increase with 64 registers
echo : 40.056ms -> 39.241ms
cube : 14.490ms -> 13.511ms
cube hash change look like useless (__device__ code in generally inlined)
but the reality proves that cuda documentation is wrong...
tpruvot: fixed dos lines ending in echo,
and used my style for cuda function attributes
Project was updated for VS2013 and CUDA SDK 6.5
add also a --cputest function to dump cpu hash results
TODO: x15 is not fully functional, but first loop seems ok
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>