This will allow later more gpu candidates.
Note: This is an unfinished work, we keep the previous behavior for now
To finish this, all algos solutions should be migrated and submitted nonces attributes stored.
Its required to handle the different share diff per nonce and fix the possible solved count error (if 1/2 nonces is solved).
fix some algo weird hashrates (like blake)
and reset device between algos, for better accuracy
but this reset doesnt seems enough to bench all algos correctly...
to test on linux, could be a driver issue...
heavy: fix first alloc and indent with tabs...
import and keep my code for older archs, like skein 64
reduce the gap between our versions...
+150kH x11 GTX 960 / +30kH 750Ti
+900kH quark GTX 960 / +230kH 750Ti
Quark and S3 are now a bit faster (+1 %)
x11 get +0.6 % (+20kH/s on a 750ti, +30kH on a 960)
80 bytes implementation to do/test ... (skein/skein2)
but keep my previous version for older devices...
reduce "false" warnings, and ignore unrelated/small ones <= 1 MB
On windows the gpu memory can be allocated by other processes
+ some cleanup in algos... (free/gpulog)
Added to most algos, checkhash function scans a big range
and can find multiple nonces at once if the difficulty is low.
Stop ignoring them, submit second one if found...
Clean the draft code for rc=2 implemented for blake and pentablake
btw... fix the reduced displayed hashrate when a nonce is found...
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
Tested on x11 which find sometimes 3 nonces in one call,
actually they are ignored because only the biggest was kept...
This commit doesnt fix that, but will allow to enhance shares rate later...
heavy: reduce by 256 threads default intensity to all -i 20
cuda: put static thread init bools outside the code (made once)
api: fix nvml header to build without
Sample with -i 18.5
Adding 131072 threads to intensity 18, 393216 cuda threads
And with -i 19.5
Adding 262144 threads to intensity 19, 786432 cuda threads
Small echo rewrite. +10KHASH on the 650(compute 3.0)
tpruvot: add Linux Makefile - Force to 80 registers (else -30KH/s)
Note : the hashrate seems more constant with this change
Was maybe my fault, but the benchmark mode was
always recomputing from nonce 0.
Also fix blake if -d 1 is used (one thread but second gpu)
stats: do not use thread id as key, prefer gpu id...