Tanguy Pruvot
73f6720121
whirlpool: restore old source code for SM 3.0
...
The SM 3.0 implementation needs a manual define in whirlpool.cu...
The alexis variant is 2x slower on SM 3.0 (GT 740)
8 years ago
Tanguy Pruvot
feb99d020f
skein: merge the double implementations in one
...
based on alexis skein kernels, tested ok on SM 2.1 and 3.0
code is a bit hard to read but... well... users don't care :p
8 years ago
Tanguy Pruvot
f8aa16f8d2
skein: cleanup, and precompute h8
8 years ago
Tanguy Pruvot
47f309ffb4
ifdef some unused kernels on SM5+
...
no need to build both (mine and sm variants)
and put global hashrate to 0 while waiting...
9 years ago
Tanguy Pruvot
ef817df79a
import sp skein512 unrolled 64-bytes kernel (+0.6% x11)
...
Quark and S3 are now a bit faster (+1%)
x11 gets +0.6% (+20kH/s on a 750ti, +30kH/s on a 960)
The 80-byte implementation is still to do/test... (skein/skein2)
but keep my previous version for older devices...
9 years ago
Tanguy Pruvot
ed4927fcd0
quark/x11: set signed int hashPosition vars to off_t
...
groestl (and keccak?) seem faster with 64-bit vars (off_t or int64_t)...
10 years ago
Tanguy Pruvot
0224d4705e
skein: fix wrong hashes seen on x11 with cuda 7
...
Looks like a stream sync problem, not related to the cuda 7 headers or cudart.
The added threadfence() doesn't change performance, and could also
be related to the random cpu validation errors... so keep it for all.
Note: the 80-byte variant used in skein2 doesn't seem affected.
10 years ago
Tanguy Pruvot
123fe287b6
x11: temporary workaround for cuda 7.0
10 years ago
Tanguy Pruvot
051ba521be
skein2: minimal host changes
10 years ago
Tanguy Pruvot
2f541065fb
cuda_helper: correctly rename hiword/loword functions
10 years ago
Tanguy Pruvot
275a028935
skein: compute midstate first
...
"Real" optimization based on KlausT precalc
10 years ago
Tanguy Pruvot
163430daae
Skein/Skein2 SM 3.0 devices support
...
+ code cleanup
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
10 years ago
Tanguy Pruvot
37395eefe4
skein: restore previous x11 speed
10 years ago
Tanguy Pruvot
38e6672d70
Allow test of SM 2.1/3.0 binaries on newer cards
...
Implementation based on KlausT's work... a bit different
This code must be placed in a common .cu file:
cuda.cpp is not compiled with nvcc and doesn't allow cuda code...
10 years ago
Tanguy Pruvot
f86784ee56
Add skein algo (Skeincoin, Myriad, Unat...)
...
SKEIN512 + SHA256
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
10 years ago
Tanguy Pruvot
e6112e878d
cleanup: use unsigned throughput parameters
...
Yes, it's a big commit, I was waiting for 1.6 to do that...
Sorry for your possible merge issues ;)
10 years ago
Tanguy Pruvot
ec5a48f420
x11: small simd512 gpu_expand improvement
10 years ago
Tanguy Pruvot
1e24e4899c
skein: uint2 optimisation with SM 3.0 compat (+15KH)
...
Thanks to sp and djm34 for this fast uint64 storage alternative
10 years ago
Tanguy Pruvot
b128312efb
cuda: store device SM in a global var
...
sample usage made for blake and fugue (higher intensity for SM5.2)
add these to cuda_helper and clean unused code
10 years ago
Tanguy Pruvot
db8681c1db
update readme and fix SM 3.0 build
10 years ago
Tanguy Pruvot
d8a23fa970
Tune quark part of Xn funcs
...
based on klaus commits, will slightly increase the speed of most algos
PS: the main increase is due to the register count tuning in the Makefile
and for skein512 on linux, it's the ROTL64
but almost no change on X11: 2648MH/s vs 2630 before
10 years ago
Tanguy Pruvot
a586cee493
quark: dos2unix files to reduce problems later
11 years ago
Tanguy Pruvot
1fbcbbacc4
Add whirlcoin and optimize x11 luffa (maxrregcount)
11 years ago
Tanguy Pruvot
d9ea5f72ce
Remove duplicated defines present in cuda_helper.h
...
also add cudaDeviceReset() on Ctrl+C for nvprof
11 years ago
Christian Buchner
3b21069504
bump to revision V1.1 with Killer Groestl
11 years ago
Christian Buchner
e049f32fee
bump to revision v0.9 (VC++ project files not updated yet)
11 years ago
Christian Buchner
433d653723
bump to revision 0.7
11 years ago