heavy: reduce default intensity by 256 threads to allow -i 20
cuda: put static thread init bools outside the code (made once)
api: fix nvml header to build without
The core problem was the CUDA hefty threads-per-block value being set too high,
but it took me several hours to find that...
btw... +25% in heavy 12500 with 256 threads per block... vs 128 & 512
if max reg count is set to 80...
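For reference, a rough host-side sketch of how the block count changes with
threads per block. This is not the actual launch code: grid_blocks() and
threads_for_intensity() are hypothetical helpers, and the mapping "intensity N
means 2^N threads per launch" is an assumption made for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: number of blocks needed to cover `threads` work items
// with `tpb` threads per block, rounded up.
constexpr uint32_t grid_blocks(uint32_t threads, uint32_t tpb) {
    return (threads + tpb - 1) / tpb;
}

// Assumption (not confirmed by this log): intensity N maps to 2^N threads
// per kernel launch.
constexpr uint32_t threads_for_intensity(unsigned intensity) {
    return 1u << intensity;
}
```

Under that assumption, -i 20 with 256 threads per block gives 4096 blocks,
against 8192 blocks at 128 and 2048 blocks at 512.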
Displayed data is the average of the last 50 scans in the last 5 minutes
Also move cuda common functions into a new file (cuda.cu)
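A minimal sketch of that averaging rule, assuming each scan is stored with a
timestamp and a hashrate. ScanSample and ScanStats are hypothetical names used
only for illustration; the real code may store and filter samples differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical record of one completed scan.
struct ScanSample {
    int64_t timestamp_s;  // completion time, in seconds
    double  hashrate;     // rate measured for this scan
};

class ScanStats {
public:
    static constexpr std::size_t kMaxScans = 50;     // keep the last 50 scans
    static constexpr int64_t     kWindowS  = 5 * 60; // only the last 5 minutes

    // Record a scan, dropping the oldest one once more than 50 are stored.
    void add(int64_t now_s, double hashrate) {
        samples_.push_back({now_s, hashrate});
        if (samples_.size() > kMaxScans)
            samples_.pop_front();
    }

    // Average of the stored scans that are at most 5 minutes old;
    // returns 0.0 when no scan qualifies.
    double average(int64_t now_s) const {
        double sum = 0.0;
        std::size_t n = 0;
        for (const ScanSample &s : samples_) {
            if (now_s - s.timestamp_s <= kWindowS) {
                sum += s.hashrate;
                ++n;
            }
        }
        return n ? sum / static_cast<double>(n) : 0.0;
    }

private:
    std::deque<ScanSample> samples_;
};
```

Both limits apply together: a scan is counted only if it is among the 50 most
recent and no older than 5 minutes.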
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
Project was updated for VS2013 and CUDA SDK 6.5
Also add a --cputest function to dump CPU hash results
TODO: x15 is not fully functional, but first loop seems ok
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>