Original commit:
Removed shared memory and reduced calculations by precalculating (ECHO hash).
750 Ti: +20 kH/s (x11)
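The original commit does not spell out which ECHO values were precalculated, so the following is only a generic sketch of the idea in plain C++: work that depends only on constants is folded once, outside the hot path. The names expensive_round, kConstWords, hash_naive, precalc_constants and hash_precalc are hypothetical and do not come from the ECHO kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a costly per-word transform (not the ECHO round).
inline uint32_t expensive_round(uint32_t w, uint32_t k) {
    for (int i = 0; i < 8; ++i) {
        w = (w ^ k) * 0x9E3779B1u;    // unsigned multiply wraps modulo 2^32
        w = (w << 5) | (w >> 27);     // rotate left by 5
    }
    return w;
}

// Hypothetical constant state words that never change between hashes.
static const uint32_t kConstWords[4] = {0x01234567u, 0x89ABCDEFu,
                                        0xDEADBEEFu, 0x0BADF00Du};

// Naive version: the constant words are transformed again for every input.
inline uint32_t hash_naive(uint32_t input) {
    uint32_t acc = expensive_round(input, 0u);
    for (size_t i = 0; i < 4; ++i)
        acc ^= expensive_round(kConstWords[i], static_cast<uint32_t>(i + 1));
    return acc;
}

// Precalc step: fold the constant contributions once, ahead of time.
inline uint32_t precalc_constants() {
    uint32_t folded = 0u;
    for (size_t i = 0; i < 4; ++i)
        folded ^= expensive_round(kConstWords[i], static_cast<uint32_t>(i + 1));
    return folded;
}

// Hot path: only the input-dependent part is computed per hash.
inline uint32_t hash_precalc(uint32_t input, uint32_t folded) {
    return expensive_round(input, 0u) ^ folded;
}

// Self-check: both variants must agree for every input.
inline void precalc_selfcheck() {
    const uint32_t folded = precalc_constants();
    for (uint32_t x = 0; x < 1000u; ++x)
        assert(hash_naive(x) == hash_precalc(x, folded));
}
```

Because XOR is associative, both variants return the same value; the precalc variant simply does one round per hash instead of five.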
tpruvot notes:
Real gain is closer to 10 kH/s on stock clocks (but it is real)
launch bounds disabled, no perf increase with 64 registers
echo : 40.056ms -> 39.241ms
cube : 14.490ms -> 13.511ms
the cube hash change looks useless (__device__ code is generally inlined),
but the measurements prove that the CUDA documentation is wrong...
tpruvot: fixed DOS line endings in echo,
and used my style for cuda function attributes
Based on klaus' commits; slightly increases the speed of most algos
PS: the main increase is due to the register count tuning in the Makefile,
and for skein512 on Linux, it is the ROTL64,
but almost no change on X11: 2648 MH/s vs 2630 before
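For readers unfamiliar with the ROTL64 mentioned above: it is a 64-bit rotate-left, used heavily in skein512. The sketch below is a portable C++ version for illustration only; it is not the device code from this commit, and the lowercase name rotl64 is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Rotate x left by n bits. The count is masked to 0..63 so that
// n == 0 and n >= 64 never shift by the full word width (undefined in C++).
inline uint64_t rotl64(uint64_t x, unsigned n) {
    return (x << (n & 63u)) | (x >> ((64u - (n & 63u)) & 63u));
}

// Self-check on a few known values.
inline void rotl64_selfcheck() {
    assert(rotl64(1ULL, 1) == 2ULL);
    assert(rotl64(0x8000000000000000ULL, 1) == 1ULL);   // top bit wraps around
    assert(rotl64(0x0123456789ABCDEFULL, 0) == 0x0123456789ABCDEFULL);
    assert(rotl64(0x0123456789ABCDEFULL, 8) == 0x23456789ABCDEF01ULL);
}
```

Compilers usually recognise this pattern and emit a single rotate instruction where the target has one; whether that happens can depend on the toolchain and platform.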
Cleaned up and adapted to my changes (cputest added)
Remove Makefile.in, which should be in .gitignore
(Please regenerate it with ./config.sh to compile on Linux)
Project was updated for VS2013 and CUDA SDK 6.5
Also add a --cputest option to dump CPU hash results
TODO: x15 is not fully functional, but first loop seems ok
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>