Seems to be djm34 work, i recognize the code style ;)
Code was cleaned/indented and adapted to my fork...
Only usable on the test pool until 16 december 2014!
But use a define in AES to use or not device initial memcpy
I already tried to use everywhere direct device constants
and its not faster for big arrays (difference is small)
also change launch bounds to reduce spills (72 regs)
to check on windows too, could improve the perf... or not
Added to most algos, checkhash function scans a big range
and can find multiple nonces at once if the difficulty is low.
Stop ignoring them, submit second one if found...
Clean the draft code for rc=2 implemented for blake and pentablake
btw... fix the reduced displayed hashrate when a nonce is found...
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
Tested on x11 which find sometimes 3 nonces in one call,
actually they are ignored because only the biggest was kept...
This commit doesnt fix that, but will allow to enhance shares rate later...
Allow to directly get api data in HTML5
Tested on Chrome... IE>=10 required, not tested
IE11 seems buggy on connection close... todo
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
heavy: reduce by 256 threads default intensity to all -i 20
cuda: put static thread init bools outside the code (made once)
api: fix nvml header to build without
The core problem was the cuda hefty Thread per block set to high
but took me several hours to find that...
btw... +25% in heavy 12500 with 256 threads per block... vs 128 & 512
if max reg count is set to 80...
The DLL exists for x64 targets but seems not loadable
The nvml.cpp code was wrote to support both NVAPI and NVML on windows
because both apis have unique "features". like Fan RPM vs Fan Percent
Sample with -i 18.5
Adding 131072 threads to intensity 18, 393216 cuda threads
And with -i 19.5
Adding 262144 threads to intensity 19, 786432 cuda threads
Small echo rewrite. +10KHASH on the 650(compute 3.0)
tpruvot: add Linux Makefile - Force to 80 registers (else -30KH/s)
Note : the hashrate seems more constant with this change