Tanguy Pruvot
b9da6c67f5
improve jh512 with vectors (nist5,quark,sib,x11+,zr5)
the main improvement is fewer asm calls to read global mem
but a few more regs are used (68 min vs 64 on SM 5.2)
so reduce the forced launch bounds to allow 80 or 128 regs per thread
Note: cuda 6.5 does not seem able to store with v4.u32... (7.5 is fine)
st.global.v4.u32 [%rd2], {%r3783, %r3824, %r3823, %r3822};
st.global.v2.u32 [%rd2+16], {%r3821, %r3820};
st.global.u32 [%rd2+24], %r3819;
st.global.u32 [%rd2+28], %r3818;
st.global.u32 [%rd2+44], %r3814;
st.global.u32 [%rd2+40], %r3815;
...
todo: check alexis variant.. but wanted to keep this code in git first...
8 years ago
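A rough way to see why 68 registers hurts when the launch bounds force 64, and where 80 or 128 register caps could come from. This is a sketch under stated assumptions (64K 32-bit registers per SM as on SM 5.2, registers allocated per warp in units of 256, a 255-register ceiling). The helper names and the (256, 2) / (256, 3) bounds used below are hypothetical, not the values from the jh512 source, and real nvcc allocation also depends on shared memory and other limits.

```c
#include <assert.h>

/* Assumed SM 5.2-like limits, for illustration only. */
#define REGS_PER_SM      65536 /* 32-bit registers per SM */
#define WARP_SIZE        32
#define WARP_ALLOC_UNIT  256   /* registers are allocated per warp in these units */
#define MAX_REGS_THREAD  255

/* Register cap implied by __launch_bounds__(threads, min_blocks):
 * min_blocks blocks of `threads` threads must fit in the register file. */
int reg_cap_for_bounds(int threads, int min_blocks)
{
    assert(threads > 0 && min_blocks > 0);
    int cap = REGS_PER_SM / (threads * min_blocks);
    int step = WARP_ALLOC_UNIT / WARP_SIZE; /* 8 registers per thread */
    cap = (cap / step) * step;              /* round down to the allocation step */
    return cap > MAX_REGS_THREAD ? MAX_REGS_THREAD : cap;
}

/* Resident blocks per SM limited only by registers
 * (ignores shared memory, max warps and max blocks per SM). */
int blocks_per_sm_by_regs(int regs_per_thread, int threads)
{
    assert(regs_per_thread > 0 && threads % WARP_SIZE == 0);
    int regs_per_warp = regs_per_thread * WARP_SIZE;
    /* round up to the per-warp allocation unit */
    regs_per_warp = (regs_per_warp + WARP_ALLOC_UNIT - 1) / WARP_ALLOC_UNIT * WARP_ALLOC_UNIT;
    int warps = REGS_PER_SM / regs_per_warp;
    return warps / (threads / WARP_SIZE);
}
```

Under these assumptions, 256-thread blocks fit 4 per SM at 64 registers but only 3 at 68, while bounds of (256, 2) or (256, 3) would leave room for 128 or 80 registers per thread.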
Tanguy Pruvot
0ff75791e5
migrate 2nd nonce storage of most algos
This allows keeping pdata[19] as a cursor between scans, and later, sorting them..
remaining: heavy, scrypt, sia...
8 years ago
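A minimal C sketch of the idea, assuming a hypothetical work struct (not ccminer's struct work): solutions go to a separate nonces[] array, so pdata[19] stays free to act as the scan cursor and the next call resumes where the previous one stopped. demo_is_solution() is a placeholder for the real hash/target check.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NONCES 2 /* hypothetical: solutions kept per scan */

/* Hypothetical stand-in for a work unit; not ccminer's struct work. */
struct work_sketch {
    uint32_t pdata[20];          /* header words; pdata[19] = nonce cursor */
    uint32_t nonces[MAX_NONCES]; /* solutions found by the last scan */
    int valid_nonces;
};

/* Placeholder for "hash(nonce) meets the target". */
int demo_is_solution(uint32_t nonce)
{
    return nonce % 1000 == 7;
}

/* Scan [pdata[19], max_nonce), keep up to MAX_NONCES solutions in their own
 * array, and leave pdata[19] pointing at the next untested nonce. */
int scan_range(struct work_sketch *w, uint32_t max_nonce,
               int (*is_solution)(uint32_t))
{
    assert(w != 0 && is_solution != 0);
    uint32_t n = w->pdata[19];
    w->valid_nonces = 0;
    while (n < max_nonce && w->valid_nonces < MAX_NONCES) {
        if (is_solution(n))
            w->nonces[w->valid_nonces++] = n;
        n++;
    }
    w->pdata[19] = n; /* cursor: the next scan resumes here */
    return w->valid_nonces;
}
```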
Tanguy Pruvot
5a77d36635
groestl: explain code and improve perf on SM 2.x
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
8 years ago
Tanguy Pruvot
feb99d020f
skein: merge the double implementations in one
based on alexis skein kernels, tested ok on SM 2.1 and 3.0
code is a bit hard to read but... well... users don't care :p
8 years ago
Tanguy Pruvot
9eead77027
diff: show by default, rework shares diff storage
This will later allow more GPU candidates.
Note: this is unfinished work; we keep the previous behavior for now.
To finish it, all algos' solutions should be migrated and the attributes of submitted nonces stored.
It's required to handle a different share diff per nonce and to fix the possible solved count error (if only 1 of 2 nonces is solved).
8 years ago
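A sketch of why per-nonce attributes matter, using a hypothetical nonce_result struct: when a scan returns two nonces, each carries its own share difficulty, and only the nonces whose difficulty reaches the network difficulty should be counted as solved blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-nonce attributes; the real storage may differ. */
struct nonce_result {
    uint32_t nonce;
    double share_diff; /* difficulty actually reached by this nonce's hash */
};

/* Count the submitted nonces that also solve the block, so a 2-nonce
 * result where only one meets the network difficulty counts once. */
int count_solved(const struct nonce_result *res, int count, double net_diff)
{
    assert(count >= 0);
    int solved = 0;
    for (int i = 0; i < count; i++)
        if (res[i].share_diff >= net_diff)
            solved++;
    return solved;
}
```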
Tanguy Pruvot
009b013d25
nist5: rename and move source file
build tip: autoreconf && make -j
8 years ago
Tanguy Pruvot
34e97bf3e6
Show intensity on init for all algos
8 years ago
Tanguy Pruvot
f8aa16f8d2
skein: cleanup, and precompute h8
8 years ago
Tanguy Pruvot
de738ccc2b
x11: secure groestl against possible cuda errors
big cleanup...
9 years ago
Tanguy Pruvot
a4196b341d
neoscrypt: apply last VTC improvements
rewrote almost properly ;)
9 years ago
Tanguy Pruvot
c0e9370ba2
quark: real hashrate was wrong, add a few kHs
9 years ago
Tanguy Pruvot
a237601747
1.7.1 release
set schedule flags to reduce linux cpu usage without MyStreamSynchronize()
9 years ago
Tanguy Pruvot
d7c2168f2b
quark: static shared memory allocation for SM3+
from KlausT, committed on 4 Jan; adds a few kH/s
9 years ago
Tanguy Pruvot
64e14b7d82
quark: final cleanup for the 1.7
9 years ago
Tanguy Pruvot
2247605d23
quark: add support for SM 2 devices
todo: use nonce vectors for the second branch
GPU #0 : Gigabyte GTX 460, 261.26 kH/s
accepted: 2/2 (diff 0.046), 254.36 kH/s yay!!!
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
9 years ago
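A host-side C sketch of what nonce vectors for a branch could look like: nonces are compacted into two lists according to one bit of their current hash, so each branch kernel only processes its own nonces. The bit tested (0x8 of the first hash word) and the function and struct names are assumptions for illustration; the real quark code does this compaction on the GPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct split {
    size_t n_set;   /* nonces whose tested hash bit is 1 */
    size_t n_clear; /* nonces whose tested hash bit is 0 */
};

/* Partition nonces into two compact vectors, keeping their original order.
 * hash0[i] is the first word of the current hash of nonces[i]. */
struct split compact_by_bit(const uint32_t *nonces, const uint32_t *hash0,
                            size_t n, uint32_t *branch_set,
                            uint32_t *branch_clear)
{
    assert(nonces && hash0 && branch_set && branch_clear);
    struct split s = {0, 0};
    for (size_t i = 0; i < n; i++) {
        if (hash0[i] & 0x8) /* assumed branch bit */
            branch_set[s.n_set++] = nonces[i];
        else
            branch_clear[s.n_clear++] = nonces[i];
    }
    return s;
}
```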
Tanguy Pruvot
e50556b637
various changes, cleanup for the release
small fixes to better handle multiple threads per gpu
explicitly report that quark is not compatible with SM 2.1 (compact shuffle)
9 years ago
Tanguy Pruvot
47f309ffb4
ifdef some unused kernels on SM5+
no need to build both (mine and sm variants)
and set the global hashrate to 0 while waiting...
9 years ago
Tanguy Pruvot
8d4d4d65ce
cuda: header for common kernel functions (quark/x11)
Was thinking about doing that for months ;) let's go
9 years ago
Tanguy Pruvot
26c7316a08
vstudio: clean and fix blake ifdef for x64
the allocated var was not used... sigh
9 years ago
Tanguy Pruvot
2d83f74a7e
vstudio: special ifdef for the constant (bmw)
9 years ago
Tanguy Pruvot
d43dc9a021
use blake512 sp kernels on SM 5+ (80+64)
import and keep my code for older archs, like skein 64
reduce the gap between our versions...
+150kH x11 GTX 960 / +30kH 750Ti
+900kH quark GTX 960 / +230kH 750Ti
9 years ago
Tanguy Pruvot
957d919a6a
bmw512: save a few KBs, ifdef 80-bytes kernel
was only used by animecoin
Also ifdef SM 3.0 compat. code to be ignored on recent archs
9 years ago
Tanguy Pruvot
3b7ef923c7
lyra2(v1): use a common uint2x4 include
lyrav2 still needs more definitions (uint16)
9 years ago
Tanguy Pruvot
82a7e62b30
skein: cleanup, strip uint2x4.h + update vstudio
9 years ago
Tanguy Pruvot
ef817df79a
import sp skein512 unrolled 64-bytes kernel (+0.6% x11)
Quark and S3 are now a bit faster (+1 %)
x11 gets +0.6 % (+20kH/s on a 750ti, +30kH on a 960)
80 bytes implementation to do/test ... (skein/skein2)
but keep my previous version for older devices...
9 years ago
Tanguy Pruvot
5bf1f98200
various fixes for SM 2.1 and the benchmark
X11+ algos and quark are not compatible for the moment,
but these ones are:
Benchmark results for Gigabyte GTX 460 (SM 2.1 / 1 GB):
blakecoin : 159090.5 kH/s, 1 MB, 1048576 thr.
blake : 70208.9 kH/s, 1 MB, 1048576 thr.
bmw : 122802.6 kH/s, 65 MB, 2097152 thr.
deep : 3533.6 kH/s, 33 MB, 524288 thr.
fugue256 : 43177.9 kH/s, 17 MB, 524288 thr.
heavy : 4118.2 kH/s, 147 MB, 524032 thr.
keccak : 18673.1 kH/s, 129 MB, 2097152 thr.
luffa : 28816.0 kH/s, 257 MB, 4194304 thr.
lyra2 : 213.7 kH/s, 570 MB, 65536 thr.
mjollnir : 3895.6 kH/s, 147 MB, 524032 thr.
nist5 : 1101.4 kH/s, 67 MB, 1048576 thr.
penta : 501.6 kH/s, 21 MB, 327680 thr.
skein : 5432.4 kH/s, 65 MB, 1048576 thr.
skein2 : 6788.9 kH/s, 33 MB, 524288 thr.
whirlpool : 688.5 kH/s, 33 MB, 524288 thr.
zr5 : 122.5 kH/s, 86 MB, 262144 thr.
9 years ago
Tanguy Pruvot
d195f2e8a2
intensity: do not reduce throughput before init
Otherwise the allocated memory could be smaller than what is required later
btw, use the new "cuda" function to apply intensity/throughput
9 years ago
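A sketch of the init ordering this commit protects. It assumes an integer intensity I means 2^I threads (consistent with the thr. counts in the benchmark above, but not taken from the source) and uses a hypothetical gpu_state struct with malloc() standing in for device allocation: buffers are sized once at init for the full throughput, and any later reduction only shrinks the launch, never past what was allocated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: integer intensity I maps to 2^I threads per launch. */
uint32_t throughput_from_intensity(int intensity)
{
    assert(intensity > 0 && intensity < 32);
    return 1u << intensity;
}

/* Hypothetical per-GPU state: buffers sized once, at init. */
struct gpu_state {
    uint32_t *hashes;
    uint32_t allocated; /* number of threads the buffer can hold */
};

/* Allocate for the full (unreduced) throughput; returns 1 on success. */
int gpu_init(struct gpu_state *g, uint32_t max_throughput, size_t words_per_hash)
{
    g->hashes = malloc((size_t)max_throughput * words_per_hash * sizeof(uint32_t));
    g->allocated = g->hashes ? max_throughput : 0;
    return g->hashes != NULL;
}

/* Clamp a requested launch size to what was allocated at init. */
uint32_t launch_size(const struct gpu_state *g, uint32_t requested)
{
    return requested < g->allocated ? requested : g->allocated;
}
```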
Tanguy Pruvot
4e1e03b891
benchmark: store all algos results + cuda fixes
Note: lyra2, lyra2v2 and scrypt seem to have problems
coexisting with other algos... when run after some of them...
moved lyra2 first and skip scrypt/jane for the moment...
Only stored in memory for now.. to display a table after the bench
ccminer -a auto --benchmark
Results may be exported later to a json file...
9 years ago
Tanguy Pruvot
922c2a5cd7
algos: free allocated mem for algo switch
All can be freed properly now, except scrypt (reset) and lyra2 (leak)
9 years ago
Tanguy Pruvot
ee93927fac
diff: use the new function in all algos
9 years ago
Tanguy Pruvot
e1c4b3042c
algos: add functions to free allocated resources
Will be used later for algo switching
not really tested yet...
9 years ago
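A sketch of an init/free pattern that makes algo switching possible, with malloc() standing in for device allocations and hypothetical algo_init()/algo_free() names and MAX_GPUS value: a per-thread flag records what was allocated, and the free function releases it and clears the flag so a later init starts clean (calling it twice, or before any init, is harmless).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_GPUS 16 /* hypothetical limit */

static bool init_done[MAX_GPUS];
static unsigned char *d_buf[MAX_GPUS]; /* stands in for device allocations */

/* Allocate once per thread; returns 1 when resources are ready. */
int algo_init(int thr_id, size_t bytes)
{
    assert(thr_id >= 0 && thr_id < MAX_GPUS);
    if (init_done[thr_id])
        return 1;
    d_buf[thr_id] = malloc(bytes);
    init_done[thr_id] = d_buf[thr_id] != NULL;
    return init_done[thr_id];
}

/* Release everything algo_init() allocated so another algo can use the memory. */
void algo_free(int thr_id)
{
    assert(thr_id >= 0 && thr_id < MAX_GPUS);
    if (!init_done[thr_id])
        return;
    free(d_buf[thr_id]);
    d_buf[thr_id] = NULL;
    init_done[thr_id] = false;
}

bool algo_is_init(int thr_id)
{
    assert(thr_id >= 0 && thr_id < MAX_GPUS);
    return init_done[thr_id];
}
```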
Tanguy Pruvot
5308898d1c
start v1.7, apply new prototypes to all algos
9 years ago
Tanguy Pruvot
e3548f46f3
drop animecoin support
no more really minable... just minable in french
9 years ago
Tanguy Pruvot
4709668995
jh512: rewrite and optimize with asm swap
5% improvement from the vshl asm swap functions (mixed shl+add instructions),
also add an xchg(x, y) func and an XCHG(x, y) define in cuda_helper for later use...
other jh changes are mainly for the beauty of the code...
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
10 years ago
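A plain C sketch of the two swap helpers named in this commit. The real cuda_helper versions are device code and their exact signatures are not shown here, so the pointer-based xchg() below is an assumption. The macro uses an XOR swap, which needs no temporary but zeroes the value if both arguments are the same variable.

```c
#include <assert.h>
#include <stdint.h>

/* Function form (assumed signature): swap two 32-bit words via pointers. */
static inline void xchg(uint32_t *x, uint32_t *y)
{
    assert(x != y); /* not required here, but mirrors the macro's restriction */
    uint32_t t = *x;
    *x = *y;
    *y = t;
}

/* Macro form for integer lvalues of the same type.
 * Never pass the same variable twice: x ^= x would zero it. */
#define XCHG(x, y) do { (x) ^= (y); (y) ^= (x); (x) ^= (y); } while (0)
```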
Tanguy Pruvot
a55b148ecc
windows: fix missing off_t include
10 years ago
Tanguy Pruvot
ed4927fcd0
quark/x11: set signed int hashPosition vars to off_t
groestl (and keccak?) seem faster with 64-bit vars (off_t or int64_t)...
10 years ago
Tanguy Pruvot
ebe95aac2f
bmw512: cleanup after cuda 7 bug fix
10 years ago
Tanguy Pruvot
0224d4705e
skein: fix wrong hashes seen on x11 with cuda 7
Looks like a stream sync problem, not related to cuda 7 headers or cudart.
The added threadfence() doesn't change performance, and could also
be related to the random cpu validation errors... so keep it for all.
Note: the 80-bytes variant used in skein2 doesn't seem affected.
10 years ago
Tanguy Pruvot
123fe287b6
x11: temporary workaround for cuda 7.0
10 years ago
Tanguy Pruvot
d9b0312897
x64: fix some size_t warnings
10 years ago
Tanguy Pruvot
051ba521be
skein2: minimal host changes
10 years ago
Tanguy Pruvot
2f541065fb
cuda_helper: rename correctly hiword/loword functions
10 years ago
Tanguy Pruvot
2113be6eec
blake80: some changes and launch bounds, no perf changes
10 years ago
Tanguy Pruvot
3d3f2e2cb5
warnings: use the right device id (device_map[thr_id])
10 years ago
Tanguy Pruvot
275a028935
skein: compute midstate first
"Real" optimization based on KlausT precalc
10 years ago
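A toy illustration of computing a midstate first: with an 80-byte header hashed as a 64-byte block plus a 16-byte tail, the first block never contains the nonce (word 19), so it can be compressed once per work unit and only the tail re-hashed per nonce. toy_compress() is a stand-in mixing function, not Skein, and the zero padding of the tail is simplified; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_IV 0xCBF29CE484222325ull

/* Toy compression of one 64-byte block (16 words) into a 64-bit state.
 * NOT Skein: only used to show the midstate idea. */
static uint64_t toy_compress(uint64_t state, const uint32_t block[16])
{
    for (int i = 0; i < 16; i++) {
        state ^= block[i];
        state *= 0x100000001B3ull; /* FNV-1a 64-bit prime */
        state ^= state >> 29;
    }
    return state;
}

/* Full hash of an 80-byte header: block 1 = words 0..15,
 * block 2 = words 16..19 (nonce in word 19) padded with zeros. */
uint64_t toy_hash80(const uint32_t header[20])
{
    uint32_t tail[16] = {0};
    memcpy(tail, header + 16, 4 * sizeof(uint32_t));
    return toy_compress(toy_compress(TOY_IV, header), tail);
}

/* Midstate: block 1 has no nonce, so compress it once. */
uint64_t toy_midstate(const uint32_t header[20])
{
    return toy_compress(TOY_IV, header);
}

/* Per-nonce work then only processes block 2. */
uint64_t toy_hash80_from_midstate(uint64_t mid, const uint32_t header[20],
                                  uint32_t nonce)
{
    uint32_t tail[16] = {0};
    memcpy(tail, header + 16, 3 * sizeof(uint32_t));
    tail[3] = nonce; /* header word 19 */
    return toy_compress(mid, tail);
}
```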
Tanguy Pruvot
e7ae27137e
x11/qubit: remove some extra MyStreamSynchronize
only one per loop is required to prevent 100% cpu usage
10 years ago
Tanguy Pruvot
163430daae
Skein/Skein2 SM 3.0 devices support
+ code cleanup
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
10 years ago
Tanguy Pruvot
d58d53f2b2
update README, small changes, prepare release 1.6.1
still need a SM 3.0 fix for skein...
10 years ago
Tanguy Pruvot
48515ad707
groestl: rename included cuda files
10 years ago
Tanguy Pruvot
37395eefe4
skein: restore previous x11 speed
10 years ago