made for linux and require libpci-dev (optional)
if libpci is not installed, card's vendor names are not handled...
Note: only a few vendor names were added, common GeForce vendors.
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
Should give same or better than SP and klaus versions
Keep old code for older devices and skein2 compat
Linux perf: 750Ti 78MH/s and GTX 970 260MH/s
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
Seems to be djm34 work, i recognize the code style ;)
Code was cleaned/indented and adapted to my fork...
Only usable on the test pool until 16 december 2014!
heavy: reduce by 256 threads default intensity to all -i 20
cuda: put static thread init bools outside the code (made once)
api: fix nvml header to build without
The core problem was the cuda hefty Thread per block set to high
but took me several hours to find that...
btw... +25% in heavy 12500 with 256 threads per block... vs 128 & 512
if max reg count is set to 80...
Small echo rewrite. +10KHASH on the 650(compute 3.0)
tpruvot: add Linux Makefile - Force to 80 registers (else -30KH/s)
Note : the hashrate seems more constant with this change
Previous echo commit was only increasing linux performance, and reducing
windows perf compared to the 1.4.9, this one seems to give at least
the 1.4.9 on windows, and the same on linux...
Shavite optimisation seems ok on both (use now 64 registers)
the launch_bounds will force the number of registers, so remove specific
Makefile rules on linux...
manual "cherry pick" with fixed line endings and some adaptations
There was a different behavior on linux and visual studio
That was making it hard to link functions correctly
That remove some ifdef / extern "C" requirements
note about x86 releases, x86 nvml.dll is not installed on Windows x64!
Based on mwhite73 <marvin.white@gmail.com> implementation
Linked to the api system
Also fix Makefile to support standard c++ files
This prevent nvcc use without device code
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
Displayed data is the average of the last 50 scans in the 5 last minutes
Also move cuda common functions in a new file (cuda.cu)
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
based on klaus commits, will increase a bit speed of most algos
PS: main increase is due to the register count tuning in Makefile
and for skein512 on linux, its the ROTL64
but almost no changes on X11 : 2648MH/s vs 2630 before