Sample with -i 18.5
Adding 131072 threads to intensity 18, 393216 cuda threads
And with -i 19.5
Adding 262144 threads to intensity 19, 786432 cuda threads
Was maybe my fault, but the benchmark mode was
always recomputing from nonce 0.
Also fix blake if -d 1 is used (one thread but second gpu)
stats: do not use thread id as key, prefer gpu id...
nvml.dll doesnt exists for 32bit binaries! use nvapi to get infos
seems to have more/different features than NVML... like pstate etc..
This is nvapi r343 : https://developer.nvidia.com/nvapi
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
There was a different behavior on linux and visual studio
That was making it hard to link functions correctly
That remove some ifdef / extern "C" requirements
note about x86 releases, x86 nvml.dll is not installed on Windows x64!
Based on mwhite73 <marvin.white@gmail.com> implementation
Linked to the api system
Also fix Makefile to support standard c++ files
This prevent nvcc use without device code
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
possible values :
5000 or :5000 to use port 5000 (local only)
0.0.0.0:5000 to allow connections from the network
127.0.0.1:4068 to only allow local connections (default)
Use -b 0 to disable the API system.
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
Note: Heavy and Mjollnir are broken on linux (only)...
To check in the next version... 4 hours i try to fix that without
success. djm34 variant seems ok but also make a lot of rejects.
Displayed data is the average of the last 50 scans in the 5 last minutes
Also move cuda common functions in a new file (cuda.cu)
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
Like cgminer, the value equals to 1 << n
if 0, we keep the default value defined in algo (19 for Xn algos)
19 = 524288 threads per gpu call
GTX 970 and 980 handle a higher number of threads compared to the 750 Ti
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
Unlike other hash algos, blake256 compute the hash
with blocks of 64 bytes.
We can do the first part on the cpu, only the 4 last int32
are computed on gpu (including the tested nonce)
Previous method was also using this kind of cache with a crc.
Blake Hash Speed: +5%