heavy: reduce the default intensity by 256 threads to allow -i 20
cuda: move the static per-thread init bools outside the functions (initialized once)
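A minimal sketch of the pattern, assuming a fixed MAX_GPUS limit and hypothetical gpu_setup() and scanhash_demo() helpers. The per-thread flags live at file scope, so they are zero-initialized once at program start, and each miner thread runs its one-time setup only on its first scan.

```c
#include <stdbool.h>

#define MAX_GPUS 16

/* File-scope flags, one per miner thread: zero-initialized once at
 * program start instead of a static bool declared inside the function. */
static bool thr_init[MAX_GPUS];

static unsigned init_calls; /* counts how often the one-time setup ran */

/* Hypothetical one-time setup (device buffers, constants, ...). */
static void gpu_setup(int thr_id)
{
    (void)thr_id;
    init_calls++;
}

/* Hypothetical scan function, called repeatedly by each miner thread. */
static int scanhash_demo(int thr_id)
{
    if (thr_id < 0 || thr_id >= MAX_GPUS)
        return -1;
    if (!thr_init[thr_id]) {
        gpu_setup(thr_id);
        thr_init[thr_id] = true;
    }
    return 0; /* the hashing work would go here */
}
```

Calling scanhash_demo(0) twice and scanhash_demo(1) once runs the setup only twice, once per thread.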
api: fix nvml header to build without
The DLL exists for x64 targets but does not seem to be loadable
The nvml.cpp code was written to support both NVAPI and NVML on Windows,
because each API has unique "features", like fan RPM vs fan percent.
Sample with -i 18.5
Adding 131072 threads to intensity 18, 393216 cuda threads
And with -i 19.5
Adding 262144 threads to intensity 19, 786432 cuda threads
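Both samples are consistent with the fractional part adding frac * 2^n threads on top of the 2^n threads of integer intensity n. The sketch below reproduces those numbers under that assumption (only .5 values appear in the log); intensity_to_threads() is a hypothetical name, not the miner's actual function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts a -i value such as 18.5 into a CUDA
 * thread count. Assumption: the fractional part adds frac * 2^n threads
 * to 2^n (n = integer part). Valid for 0 <= intensity < 31. */
static uint32_t intensity_to_threads(double intensity, uint32_t *extra)
{
    unsigned n = (unsigned)intensity;   /* integer part, e.g. 18 */
    uint32_t base = 1U << n;            /* 2^n threads */
    uint32_t add = (uint32_t)((intensity - (double)n) * (double)base);

    if (extra != NULL)
        *extra = add;                   /* the "Adding ... threads" value */
    return base + add;
}
```

intensity_to_threads(18.5, &extra) returns 393216 with extra = 131072, and 19.5 returns 786432 with extra = 262144, matching the samples above.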
This was maybe my fault, but benchmark mode was
always restarting from nonce 0.
Also fix blake when -d 1 is used (one thread, but on the second GPU)
stats: do not use the thread id as key, prefer the GPU id...
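A sketch of keying the stats by GPU id, assuming a hypothetical thread-to-device table filled from the -d option. With -d 1, thread 0 mines on device 1, so keying by thread id would wrongly record its results under GPU 0.

```c
#include <stdint.h>

#define MAX_GPUS 16

/* Hypothetical per-GPU stats, indexed by CUDA device id, not thread id. */
struct gpu_stats {
    uint64_t hashes;
    uint32_t scans;
};

static struct gpu_stats stats[MAX_GPUS];

/* Hypothetical thread -> device map, filled from the -d option.
 * With "-d 1", thread 0 runs on device 1. */
static int thr_to_gpu[MAX_GPUS];

static void stats_record(int thr_id, uint64_t hashes)
{
    int gpu;

    if (thr_id < 0 || thr_id >= MAX_GPUS)
        return;
    gpu = thr_to_gpu[thr_id];
    if (gpu < 0 || gpu >= MAX_GPUS)
        return;
    stats[gpu].hashes += hashes; /* keyed by GPU id */
    stats[gpu].scans++;
}
```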
nvml.dll does not exist for 32-bit binaries! Use NVAPI to get the info instead.
It seems to have more/different features than NVML, like pstate etc.
This is nvapi r343 : https://developer.nvidia.com/nvapi
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
Linux and Visual Studio behaved differently,
which made it hard to link functions correctly.
This removes some #ifdef / extern "C" requirements.
Note about x86 releases: the x86 nvml.dll is not installed on Windows x64!
Based on mwhite73 <marvin.white@gmail.com> implementation
Linked to the api system
Also fix the Makefile to support standard C++ files.
This avoids using nvcc for files without device code.
Possible values:
5000 or :5000 to use port 5000 (local only)
0.0.0.0:5000 to allow connections from the network
127.0.0.1:4068 to only allow local connections (default)
Use -b 0 to disable the API system.
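A minimal sketch of how the -b value could be split into host and port, following the rules listed above: a bare port or :port binds to 127.0.0.1, host:port binds to that host, no value keeps 127.0.0.1:4068, and port 0 means the API is disabled. The parse_api_bind() helper and struct api_bind are hypothetical names, not the miner's own code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of parsing -b; port == 0 means "API disabled". */
struct api_bind {
    char host[64];
    int port;
};

static void parse_api_bind(const char *arg, struct api_bind *b)
{
    const char *colon;

    /* default: local connections only, port 4068 */
    strcpy(b->host, "127.0.0.1");
    b->port = 4068;
    if (arg == NULL || *arg == '\0')
        return;

    colon = strrchr(arg, ':');
    if (colon == NULL) {
        b->port = atoi(arg);            /* "5000" -> local only, "0" -> disabled */
    } else {
        size_t len = (size_t)(colon - arg);
        if (len > 0 && len < sizeof(b->host)) {
            memcpy(b->host, arg, len);  /* "0.0.0.0:5000" -> host "0.0.0.0" */
            b->host[len] = '\0';
        }                               /* ":5000" keeps 127.0.0.1 */
        b->port = atoi(colon + 1);
    }
}
```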
Note: Heavy and Mjollnir are broken on Linux (only)...
To check in the next version; I spent 4 hours trying to fix it without
success. The djm34 variant seems OK but also produces a lot of rejects.
Displayed data is the average of the last 50 scans within the last 5 minutes
Also move the common CUDA functions to a new file (cuda.cu)
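A sketch of that averaging, assuming a hypothetical ring buffer of per-scan hashrates with timestamps: only the last 50 scans are kept, and samples older than 300 seconds are skipped when the average is computed. The names are illustrative, not the miner's actual stats code.

```c
#include <time.h>

#define STATS_MAX_SCANS 50
#define STATS_MAX_AGE   300 /* seconds: the last 5 minutes */

struct scan_sample {
    time_t when;
    double hashrate;
};

/* Hypothetical ring buffer holding at most the last 50 scans. */
struct scan_history {
    struct scan_sample s[STATS_MAX_SCANS];
    int count; /* number of valid samples */
    int next;  /* ring index of the next write */
};

static void history_init(struct scan_history *h)
{
    h->count = 0;
    h->next = 0;
}

static void history_add(struct scan_history *h, time_t when, double hashrate)
{
    h->s[h->next].when = when;
    h->s[h->next].hashrate = hashrate;
    h->next = (h->next + 1) % STATS_MAX_SCANS; /* overwrite the oldest */
    if (h->count < STATS_MAX_SCANS)
        h->count++;
}

/* Average of the kept scans that are at most 5 minutes old (0 if none). */
static double history_average(const struct scan_history *h, time_t now)
{
    double sum = 0.0;
    int used = 0;

    for (int i = 0; i < h->count; i++) {
        if (now - h->s[i].when <= STATS_MAX_AGE) {
            sum += h->s[i].hashrate;
            used++;
        }
    }
    return used ? sum / used : 0.0;
}
```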
Like cgminer, the number of threads equals 1 << n.
If 0, we keep the default value defined by the algo (19 for Xn algos):
19 = 524288 threads per GPU call.
The GTX 970 and 980 handle more threads than the 750 Ti.
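A minimal sketch of the rule, with a hypothetical throughput_for() helper: an intensity of 0 falls back to the algo default, and the thread count per GPU call is 1 << n.

```c
#include <stdint.h>

/* Hypothetical helper: user_intensity is the -i value (0 = not set),
 * algo_default is the per-algo default (19 for the Xn algos). */
static uint32_t throughput_for(int user_intensity, int algo_default)
{
    int n = user_intensity ? user_intensity : algo_default;

    if (n < 0 || n > 31)
        n = algo_default; /* out of range: keep the algo default */
    return 1U << n;       /* like cgminer: threads = 1 << n */
}
```

With no -i value, throughput_for(0, 19) gives 524288 threads; -i 20 doubles that to 1048576.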
Unlike other hash algos, blake256 computes the hash
in blocks of 64 bytes.
We can do the first part (the first 64 bytes of the 80-byte header) on the CPU;
only the last 4 int32 are computed on the GPU (including the tested nonce).
The previous method also used this kind of cache, with a CRC.
Blake Hash Speed: +5%
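The midstate idea can be sketched as follows. The compression function here is a toy stand-in and NOT blake256, and all names are hypothetical; only the structure matters. The first 64-byte block of the 80-byte header is compressed once and cached (checked here with memcmp instead of a CRC), and each nonce only needs the second block, which holds the last 4 words.

```c
#include <stdint.h>
#include <string.h>

/* Toy compression function standing in for blake256 (NOT blake256):
 * mixes one 16-word block into an 8-word state. */
static void toy_compress(uint32_t h[8], const uint32_t m[16])
{
    for (int r = 0; r < 4; r++) {
        for (int i = 0; i < 16; i++) {
            uint32_t x = h[i & 7] + m[i] + (uint32_t)(r * 16 + i);
            x = (x << 7) | (x >> 25);
            h[(i + 1) & 7] ^= x;
        }
    }
}

static const uint32_t toy_iv[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* Reference: hash the whole 80-byte header (20 words) in two blocks. */
static void hash_full(const uint32_t header[20], uint32_t out[8])
{
    uint32_t block[16] = { 0 };

    memcpy(out, toy_iv, sizeof toy_iv);
    toy_compress(out, header);                        /* block 1: words 0..15 */
    memcpy(block, header + 16, 4 * sizeof(uint32_t)); /* block 2: words 16..19 */
    block[15] = 640;                                  /* toy padding: 80 bytes in bits */
    toy_compress(out, block);
}

/* Cache of the state after block 1 ("CPU" side). */
struct midstate_cache {
    uint32_t first[16];
    uint32_t mid[8];
    int valid;
    unsigned recomputes;
};

static const uint32_t *midstate_get(struct midstate_cache *c, const uint32_t header[20])
{
    if (!c->valid || memcmp(c->first, header, sizeof c->first) != 0) {
        memcpy(c->first, header, sizeof c->first);
        memcpy(c->mid, toy_iv, sizeof toy_iv);
        toy_compress(c->mid, header);
        c->valid = 1;
        c->recomputes++;
    }
    return c->mid;
}

/* Per-nonce work ("GPU" side): only the last 4 words are processed. */
static void hash_from_midstate(const uint32_t mid[8], const uint32_t tail[3],
                               uint32_t nonce, uint32_t out[8])
{
    uint32_t block[16] = { 0 };

    memcpy(out, mid, 8 * sizeof(uint32_t));
    block[0] = tail[0];
    block[1] = tail[1];
    block[2] = tail[2];
    block[3] = nonce; /* word 19: the tested nonce */
    block[15] = 640;
    toy_compress(out, block);
}
```

Because the per-nonce work starts from the cached state, hash_from_midstate() gives the same digest as hash_full() for every nonce, while the first block is recomputed only when the first 64 bytes of the header change.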