fix some algo weird hashrates (like blake)
and reset device between algos, for better accuracy
but this reset doesnt seems enough to bench all algos correctly...
to test on linux, could be a driver issue...
heavy: fix first alloc and indent with tabs...
reduce "false" warnings, and ignore unrelated/small ones <= 1 MB
On windows the gpu memory can be allocated by other processes
+ some cleanup in algos... (free/gpulog)
Mostly to do compatibilty tests, SM 2.1 support is very limited
SM 3.0 code should run on SM 3.5 (only a few cards use this arch)
As i can't test SM 3.5, its best to let users do their own tests...
seems to resolve solo mining lock on share.
export also computed solo work diff in api (not perfect)
In high rate algos, throughput should be unsigned...
This fixes keccak, blake and doom problems
And change terminal color of debug lines, to be selectable in putty,
color code is not supported in windows but selection is ok there.
Seems to be djm34 work, i recognize the code style ;)
Code was cleaned/indented and adapted to my fork...
Only usable on the test pool until 16 december 2014!
Added to most algos, checkhash function scans a big range
and can find multiple nonces at once if the difficulty is low.
Stop ignoring them, submit second one if found...
Clean the draft code for rc=2 implemented for blake and pentablake
btw... fix the reduced displayed hashrate when a nonce is found...
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
heavy: reduce by 256 threads default intensity to all -i 20
cuda: put static thread init bools outside the code (made once)
api: fix nvml header to build without
The core problem was the cuda hefty Thread per block set to high
but took me several hours to find that...
btw... +25% in heavy 12500 with 256 threads per block... vs 128 & 512
if max reg count is set to 80...
Was maybe my fault, but the benchmark mode was
always recomputing from nonce 0.
Also fix blake if -d 1 is used (one thread but second gpu)
stats: do not use thread id as key, prefer gpu id...
Unlike other hash algos, blake256 compute the hash
with blocks of 64 bytes.
We can do the first part on the cpu, only the 4 last int32
are computed on gpu (including the tested nonce)
Previous method was also using this kind of cache with a crc.
Blake Hash Speed: +5%
use clz (leading zeros) asm func for a fast gpu compare of ptarget[6]:[7]
add also missing windows ctz/clz host functions
New NEOS speed: 227MH to 270MH (Gigabyte 750Ti Black Edition)