import and keep my code for older archs, like skein 64
reduce the gap between our versions...
+150kH x11 GTX 960 / +30kH 750Ti
+900kH quark GTX 960 / +230kH 750Ti
Switching to a pool with a different algo will require a barrier
to free ressources, like what was made in the global benchmark.
add also the algo in pool structure...
Quark and S3 are now a bit faster (+1 %)
x11 get +0.6 % (+20kH/s on a 750ti, +30kH on a 960)
80 bytes implementation to do/test ... (skein/skein2)
but keep my previous version for older devices...
ccminer -n 2>NUL
GPU #0: SM 5.2 GeForce GTX 970
GPU #1: SM 5.0 Gigabyte GTX 750 Ti
GPU #2: SM 5.2 ASUS GTX 970
note: nvml destroy is made in proper_exit function
reduce "false" warnings, and ignore unrelated/small ones <= 1 MB
On windows the gpu memory can be allocated by other processes
+ some cleanup in algos... (free/gpulog)
Mostly to do compatibilty tests, SM 2.1 support is very limited
SM 3.0 code should run on SM 3.5 (only a few cards use this arch)
As i can't test SM 3.5, its best to let users do their own tests...
The full benchmark can now be launched with "ccminer --benchmark"
add a new helper function which log a warning with last cuda error
(not shown with the quiet option) : CUDA_LOG_ERROR();
it can be used where miner.h is included (.c/.cpp/.cu)
fix x14 (in ccminer.cpp), a break was missing in switch..case
when using multiple cpu threads per gpu, use the T prefix, ex:
[2015-10-11 09:52:49] GPU #0: app clocks set to P0 (3600/1228)
vs
[2015-10-11 09:52:51] GPU T0: MSI GTX 960, 5953.35 kH/s
Only thr_id is required, the function take care of the dev id
remains my original lyra2 implementation to fix... (cuda_lyra2.cu)
I guess some kind of memory overflow force the driver to allocate
memory... but was unable to free it without device reset.
Note: lyra2, lyra2v2 and script seems to have problems
to coexist with other algos... to run after some of them...
moved lyra2 first and skip scrypt/jane for the moment...
Only stored in memory for now.. to display a table after the bench
ccminer -a auto --benchmark
Results may be exported later to a json file...
make a difference between whirlpool and whirlcoin algos (stratum)
Look like the old SHA merkleroot method doesnt work on recent coins
Doesn't affect solo mining, only pools using stratum+tcp:// protocol
Reduce a bit the 750Ti speed but improve a lot the 9xx speed.
Keep compat for SM 3/3.5 in a second file..
Note: With this code and Cuda 7.5, the speed won is the reverse...
May be "reverted" soon
which display submitted block and net difficulty and is able
to detect shares above net diff (solved blocs)
Note: only made on lyra2v2 and zr5 algos
TODO: compute the found diff on all algos...
require changes in all scan hash "kernel" function parameters
to be continued...
0: cudaDeviceScheduleAuto
1: cudaDeviceScheduleSpin
2: cudaDeviceScheduleYield
4: cudaDeviceScheduleBlockingSync
Also set the best one (4) for luffa algo by default...