heavy: reduce the default intensity by 256 threads; set -i 20 for all
cuda: move the static thread init bools out of the functions (initialized once)
api: fix the nvml header so it builds without NVML
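The commit above moves the per-thread init flags to file scope. A minimal C sketch of that pattern, assuming a hypothetical MAX_GPUS limit and a hypothetical scanhash_example() function (not the project's code): the flag array is zero-initialized once at program start, and the one-time setup runs only on the first call for each thread id.

```c
#include <stdbool.h>

#define MAX_GPUS 16 /* hypothetical limit for this sketch */

/* File-scope flags: zero-initialized once at program start,
 * one per miner thread, instead of a static declared inside the function. */
static bool init_done[MAX_GPUS] = { false };
static int init_calls = 0; /* counts how often the one-time setup ran */

/* Hypothetical scan function: the GPU setup path runs only
 * on the first call for a given thread id. */
int scanhash_example(int thr_id)
{
    if (!init_done[thr_id]) {
        /* one-time work (device selection, buffer allocation) would go here */
        init_calls++;
        init_done[thr_id] = true;
    }
    return 0;
}
```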
Sample output with -i 18.5:
Adding 131072 threads to intensity 18, 393216 cuda threads
And with -i 19.5:
Adding 262144 threads to intensity 19, 786432 cuda threads
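The two samples are consistent with a throughput of 2^n threads for the integer part n of the intensity, plus the fractional part of 2^n rounded down to a multiple of 256 threads. A sketch of that arithmetic, assuming this rounding rule and n >= 8 (intensity_to_throughput is a hypothetical name; the miner's actual code may differ):

```c
#include <stdint.h>

/* Convert a (possibly fractional) positive -i value into a thread count.
 * Integer part n gives 2^n threads; the fractional part adds
 * frac * 2^n threads, counted in whole blocks of 256 (2^n / 256 = 2^(n-8)). */
uint32_t intensity_to_throughput(double intensity, uint32_t *adds_out)
{
    int n = (int) intensity;              /* truncation == floor for positive values */
    double frac = intensity - n;          /* e.g. 0.5 for -i 18.5 */
    uint32_t blocks = (uint32_t) (frac * (double) (1u << (n - 8)));
    uint32_t adds = blocks * 256u;        /* extra threads, multiple of 256 */
    if (adds_out)
        *adds_out = adds;
    return (1u << n) + adds;
}
```

With -i 18.5 it returns 393216 and reports 131072 added threads, matching the first sample.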
Small echo rewrite: +10 KH/s on the 650 (compute 3.0)
tpruvot: add Linux Makefile, forcing 80 registers (otherwise -30 KH/s)
Note: the hashrate seems more stable with this change
Maybe it was my fault, but benchmark mode was
always restarting from nonce 0.
Also fix blake when -d 1 is used (a single thread, but on the second GPU)
stats: do not use the thread id as key, prefer the GPU id...
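A minimal sketch of the benchmark fix, assuming the start nonce is kept per thread and advanced by the throughput of each loop instead of being reset to 0 (bench_next_nonce, bench_start_nonce and MAX_GPUS are hypothetical names, not the project's):

```c
#include <stdint.h>

#define MAX_GPUS 16 /* hypothetical limit for this sketch */

/* Per-thread position in the nonce space, kept across benchmark loops. */
static uint32_t bench_next_nonce[MAX_GPUS];

/* Returns the first nonce for the next benchmark loop of thread thr_id
 * and advances the stored position by the nonces that loop will cover.
 * Unsigned arithmetic wraps modulo 2^32. */
uint32_t bench_start_nonce(int thr_id, uint32_t throughput)
{
    uint32_t start = bench_next_nonce[thr_id];
    bench_next_nonce[thr_id] = start + throughput;
    return start;
}
```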
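A sketch of keying stats by GPU id rather than thread id, assuming a device_map[] array that maps each miner thread to its CUDA device (stats_remember, gpu_hashrate and MAX_GPUS are hypothetical). With -d 1, thread 0 runs on GPU 1, so a table keyed by thread id would file its results under the wrong GPU:

```c
#define MAX_GPUS 16 /* hypothetical limit for this sketch */

/* device_map[thr_id] holds the CUDA device used by miner thread thr_id. */
static int device_map[MAX_GPUS] = { 0 };

/* Hypothetical per-GPU hashrate table, indexed by GPU id. */
static double gpu_hashrate[MAX_GPUS];

void stats_remember(int thr_id, double khs)
{
    int gpu_id = device_map[thr_id]; /* key by GPU, not by thread */
    gpu_hashrate[gpu_id] = khs;
}
```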
The previous echo commit only increased Linux performance and reduced
Windows performance compared to 1.4.9; this one seems to give at least
1.4.9 performance on Windows, and the same on Linux...
The Shavite optimisation seems OK on both (it now uses 64 registers)
launch_bounds now forces the register count, so remove the Linux-specific
Makefile rules...
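As a rough illustration of why launch bounds constrain registers: __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor) asks the compiler to leave room for that many resident blocks per SM. A back-of-the-envelope C helper, assuming a 64K-register file per SM (as on compute 3.x and 5.x) and ignoring allocation granularity and the per-thread hardware cap; the 256-thread, 4-block figures in the usage note are illustrative, not taken from the shavite kernel:

```c
/* Rough upper bound on registers per thread that still lets min_blocks
 * blocks of max_threads threads be resident on one SM at the same time.
 * Assumes 65536 32-bit registers per SM; ignores allocation granularity
 * and the per-thread register limit of the hardware. */
unsigned max_regs_per_thread(unsigned max_threads, unsigned min_blocks)
{
    const unsigned regs_per_sm = 65536u;
    return regs_per_sm / (max_threads * min_blocks);
}
```

For example, max_regs_per_thread(256, 4) gives 64, the same figure as the register count mentioned above.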
Manual cherry-pick with fixed line endings and some adaptations
Original commit:
Removed shared memory and reduced calculations by precomputing (ECHO hash).
750 Ti: +20 KH/s (x11)
tpruvot notes:
The real gain is closer to 10 KH/s at stock clocks (but it is real)
launch bounds disabled: no performance increase with 64 registers
echo: 40.056 ms -> 39.241 ms
cube: 14.490 ms -> 13.511 ms
The cube hash change looks useless (__device__ code is generally inlined),
but reality proves that the CUDA documentation is wrong...
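To put the echo and cube timings in relative terms, a small helper (percent_change is a hypothetical name) computing the percent change between two kernel times:

```c
/* Relative change between two kernel timings, in percent.
 * A negative result means the new version is faster. */
double percent_change(double before_ms, double after_ms)
{
    return (after_ms - before_ms) / before_ms * 100.0;
}
```

This gives about -2.0% for echo (40.056 ms -> 39.241 ms) and about -6.8% for cube (14.490 ms -> 13.511 ms).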
tpruvot: fixed DOS line endings in echo,
and used my style for CUDA function attributes
Based on klaus's commits; slightly increases the speed of most algos
PS: the main increase is due to the register count tuning in the Makefile,
and, for skein512 on Linux, to ROTL64,
but there is almost no change on X11: 2648 KH/s vs 2630 before
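For reference, the semantics of a 64-bit rotate left, sketched in portable C (rotl64 is a hypothetical name; the project's ROTL64 on the GPU may be implemented differently, so this only shows what the operation computes):

```c
#include <stdint.h>

/* 64-bit rotate left. Masking both shift counts with 63 keeps the
 * shifts defined for every n, including n == 0 (returns x unchanged). */
static inline uint64_t rotl64(uint64_t x, unsigned n)
{
    return (x << (n & 63)) | (x >> ((64 - n) & 63));
}
```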
Cleaned up and adapted to my changes (cputest added)
Remove Makefile.in, which should be in .gitignore
(please regenerate it with ./config.sh to compile on Linux)
Project was updated for VS2013 and CUDA SDK 6.5
Also add a --cputest option to dump CPU hash results
TODO: x15 is not fully functional yet, but the first loop seems OK
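A minimal sketch of formatting a CPU hash result as hex for comparison, as a --cputest-style dump might do (hash_to_hex is a hypothetical helper, not the project's function):

```c
#include <stdint.h>
#include <stdio.h>

/* Format len bytes of hash as lowercase hex into out,
 * which must hold at least 2 * len + 1 bytes. */
void hash_to_hex(const uint8_t *hash, size_t len, char *out)
{
    for (size_t i = 0; i < len; i++)
        sprintf(out + 2 * i, "%02x", hash[i]); /* two hex digits per byte */
    out[2 * len] = '\0';
}
```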
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>