Tanguy Pruvot
2d83f74a7e
vstudio: special ifdef for the constant (bmw)
2015-10-24 15:13:35 +02:00
Tanguy Pruvot
d43dc9a021
use blake512 sp kernels on SM 5+ (80+64)
import and keep my code for older archs, like skein 64
reduce the gap between our versions...
+150kH x11 GTX 960 / +30kH 750Ti
+900kH quark GTX 960 / +230kH 750Ti
2015-10-24 13:43:22 +02:00
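Commit d43dc9a021 keeps two blake512 code paths and picks one by architecture. The sketch below shows that kind of host-side dispatch. It assumes the SM version is encoded as major * 100 + minor * 10, so SM 5.0 is 500. The kernel labels are hypothetical and are not ccminer symbols:

```c
#include <stddef.h>

/* Hypothetical kernel labels; the real ccminer function names differ. */
static const char *blake512_kernel_for(int sm, size_t input_len)
{
    /* SM 5.0+ uses the imported sp kernels, older archs keep the previous code. */
    int use_sp = (sm >= 500);

    if (input_len == 80)   /* 80-byte block header input */
        return use_sp ? "blake512_sp_80" : "blake512_legacy_80";
    /* 64-byte input: a previous 512-bit hash in the chain */
    return use_sp ? "blake512_sp_64" : "blake512_legacy_64";
}
```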
Tanguy Pruvot
957d919a6a
bmw512: save a few KBs, ifdef 80-byte kernel
was only used by animecoin
Also ifdef the SM 3.0 compat code so it is ignored on recent archs
2015-10-24 07:30:57 +02:00
Tanguy Pruvot
3b7ef923c7
lyra2(v1): use a common uint2x4 include
lyra2v2 still needs more definitions (uint16)
2015-10-23 15:25:24 +02:00
Tanguy Pruvot
82a7e62b30
skein: cleanup, strip uint2x4.h + update vstudio
2015-10-23 13:32:18 +02:00
Tanguy Pruvot
ef817df79a
import sp skein512 unrolled 64-byte kernel (+0.6% x11)
Quark and S3 are now a bit faster (+1%)
x11 gets +0.6% (+20kH/s on a 750Ti, +30kH/s on a 960)
80-byte implementation still to do/test... (skein/skein2)
but keep my previous version for older devices...
2015-10-23 09:43:20 +02:00
Tanguy Pruvot
5bf1f98200
various fixes for SM 2.1 and the benchmark
X11+ algos and quark are not compatible for the moment,
but these are:
Benchmark results for Gigabyte GTX 460 (SM 2.1 / 1 GB):
blakecoin : 159090.5 kH/s, 1 MB, 1048576 thr.
blake : 70208.9 kH/s, 1 MB, 1048576 thr.
bmw : 122802.6 kH/s, 65 MB, 2097152 thr.
deep : 3533.6 kH/s, 33 MB, 524288 thr.
fugue256 : 43177.9 kH/s, 17 MB, 524288 thr.
heavy : 4118.2 kH/s, 147 MB, 524032 thr.
keccak : 18673.1 kH/s, 129 MB, 2097152 thr.
luffa : 28816.0 kH/s, 257 MB, 4194304 thr.
lyra2 : 213.7 kH/s, 570 MB, 65536 thr.
mjollnir : 3895.6 kH/s, 147 MB, 524032 thr.
nist5 : 1101.4 kH/s, 67 MB, 1048576 thr.
penta : 501.6 kH/s, 21 MB, 327680 thr.
skein : 5432.4 kH/s, 65 MB, 1048576 thr.
skein2 : 6788.9 kH/s, 33 MB, 524288 thr.
whirlpool : 688.5 kH/s, 33 MB, 524288 thr.
zr5 : 122.5 kH/s, 86 MB, 262144 thr.
2015-10-14 02:59:54 +00:00
Tanguy Pruvot
d195f2e8a2
intensity: do not reduce throughput before init
Otherwise the allocated memory could be smaller than what is required later.
Also use the new "cuda" function to apply intensity/throughput.
2015-10-11 05:01:41 +02:00
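Commit d195f2e8a2 applies the intensity only after allocation. The sketch below shows that ordering. Device buffers are sized from the full, unreduced throughput, and only the launch size is reduced afterwards. It assumes throughput = 2^intensity for whole intensities, so intensity 20 gives 1048576 threads. `gpu_state` and `launch_throughput()` are hypothetical names:

```c
#include <stdint.h>

/* Hypothetical per-GPU state; names are illustrative, not ccminer's. */
typedef struct {
    int initialized;
    uint32_t alloc_throughput;  /* number of hashes the buffers were sized for */
} gpu_state;

static uint32_t throughput_from_intensity(int intensity)
{
    return 1U << intensity;     /* assumes a whole intensity in 0..31 */
}

static uint32_t launch_throughput(gpu_state *g, int intensity, uint32_t nonces_left)
{
    uint32_t tp = throughput_from_intensity(intensity);
    if (!g->initialized) {
        /* allocate device buffers for the full throughput here, before any reduction */
        g->alloc_throughput = tp;
        g->initialized = 1;
    }
    /* reduce only this launch; it can never exceed alloc_throughput */
    if (nonces_left < tp)
        tp = nonces_left;
    return tp;
}
```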
Tanguy Pruvot
4e1e03b891
benchmark: store all algos results + cuda fixes
Note: lyra2, lyra2v2 and scrypt seem to have problems coexisting
with other algos (when run after some of them), so lyra2 was moved first
and scrypt/jane are skipped for the moment...
Results are only stored in memory for now, to display a table after the bench:
ccminer -a auto --benchmark
Results may be exported later to a JSON file...
2015-10-09 02:07:08 +02:00
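Commit 4e1e03b891 keeps benchmark results in memory so it can print a table after the run. The sketch below stores one record per algo and formats rows in the `name : rate kH/s, mem MB, threads thr.` layout of the GTX 460 listing in 5bf1f98200. The capacity and all names are hypothetical:

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_BENCH_ALGOS 64   /* hypothetical capacity */

/* One in-memory result per benchmarked algo. */
typedef struct {
    const char *algo;
    double khashes;     /* kH/s */
    unsigned mem_mb;    /* device memory used, in MB */
    uint32_t threads;   /* throughput used for the run */
} bench_result;

static bench_result bench_results[MAX_BENCH_ALGOS];
static int bench_count = 0;

/* Store one result; returns its index, or -1 when the table is full. */
static int bench_store(const char *algo, double khashes, unsigned mem_mb, uint32_t threads)
{
    if (bench_count >= MAX_BENCH_ALGOS)
        return -1;
    bench_results[bench_count].algo = algo;
    bench_results[bench_count].khashes = khashes;
    bench_results[bench_count].mem_mb = mem_mb;
    bench_results[bench_count].threads = threads;
    return bench_count++;
}

/* Format one row as "algo : rate kH/s, mem MB, threads thr." */
static int bench_format_row(const bench_result *r, char *buf, size_t len)
{
    return snprintf(buf, len, "%s : %.1f kH/s, %u MB, %u thr.",
                    r->algo, r->khashes, r->mem_mb, (unsigned)r->threads);
}

/* Print the whole table once all algos have run. */
static void bench_print_table(void)
{
    char row[128];
    for (int i = 0; i < bench_count; i++) {
        bench_format_row(&bench_results[i], row, sizeof row);
        printf("%s\n", row);
    }
}
```

With a fixed array, storing a result never allocates memory. The table is printed only after the whole bench, as the commit describes.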
Tanguy Pruvot
922c2a5cd7
algos: free allocated mem for algo switch
All can be freed properly now, except scrypt (reset) and lyra2 (leak)
2015-10-08 21:35:30 +02:00
Tanguy Pruvot
ee93927fac
diff: use the new function in all algos
2015-10-07 20:10:15 +02:00
Tanguy Pruvot
e1c4b3042c
algos: add functions to free allocated resources
Will be used later for algo switching
not really tested yet...
2015-09-25 07:51:57 +02:00
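Commits e1c4b3042c and 922c2a5cd7 add per-algo free functions for algo switching. The sketch below follows the common pattern: a per-thread init flag is set on first use and cleared on free, so the next scan re-initializes. Host malloc() stands in for the device allocation, and all names are hypothetical:

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_GPUS 16

/* Hypothetical per-algo state, one slot per GPU thread. */
static int algo_init_done[MAX_GPUS];
static uint32_t *d_hash[MAX_GPUS];   /* stand-in for a cudaMalloc'ed buffer */

/* Allocate once per thread: a 64-byte (16 x uint32) hash per GPU thread. */
static int algo_init(int thr_id, uint32_t throughput)
{
    if (thr_id < 0 || thr_id >= MAX_GPUS)
        return -1;
    if (!algo_init_done[thr_id]) {
        d_hash[thr_id] = malloc((size_t)throughput * 16 * sizeof(uint32_t));
        if (d_hash[thr_id] == NULL)
            return -1;
        algo_init_done[thr_id] = 1;
    }
    return 0;
}

/* Release everything so another algo can be initialized on this GPU. */
static void algo_free(int thr_id)
{
    if (thr_id < 0 || thr_id >= MAX_GPUS || !algo_init_done[thr_id])
        return;
    free(d_hash[thr_id]);
    d_hash[thr_id] = NULL;
    algo_init_done[thr_id] = 0;
}
```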
Tanguy Pruvot
5308898d1c
start v1.7, apply new prototypes to all algos
2015-09-23 15:42:17 +02:00
Tanguy Pruvot
e3548f46f3
drop animecoin support
not really minable anymore... just "minable" in French (i.e. pathetic)
2015-08-22 12:35:22 +02:00
Tanguy Pruvot
4709668995
jh512: rewrite and optimize with asm swap
5% improvement from the vshl asm swap functions (mixed shl+add instructions).
Also add an xchg(x, y) function and an XCHG(x, y) define in cuda_helper for later use...
Other jh changes are mainly cosmetic...
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-06-16 08:20:48 +02:00
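The jh512 swap functions exchange adjacent bit groups inside a word. Below is a portable C sketch of the 1/2/4/8/16/32-bit group swaps used in JH's 64-bit reference code, plus a plain exchange helper. It is not the PTX vshl version. The two masked halves never share a bit, so `|` and `+` give the same result. That is presumably what lets a fused shift-and-add instruction do the merge. Function names are illustrative:

```c
#include <stdint.h>

/* Swap adjacent groups of 2^k bits (k = 0..5) inside a 64-bit word. */
static inline uint64_t swap1(uint64_t x)
{ return ((x & 0x5555555555555555ULL) << 1) | ((x & 0xAAAAAAAAAAAAAAAAULL) >> 1); }
static inline uint64_t swap2(uint64_t x)
{ return ((x & 0x3333333333333333ULL) << 2) | ((x & 0xCCCCCCCCCCCCCCCCULL) >> 2); }
static inline uint64_t swap4(uint64_t x)
{ return ((x & 0x0F0F0F0F0F0F0F0FULL) << 4) | ((x & 0xF0F0F0F0F0F0F0F0ULL) >> 4); }
static inline uint64_t swap8(uint64_t x)
{ return ((x & 0x00FF00FF00FF00FFULL) << 8) | ((x & 0xFF00FF00FF00FF00ULL) >> 8); }
static inline uint64_t swap16(uint64_t x)
{ return ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x & 0xFFFF0000FFFF0000ULL) >> 16); }
static inline uint64_t swap32(uint64_t x)
{ return (x << 32) | (x >> 32); }

/* Plain exchange of two words, in the spirit of the xchg() helper. */
static inline void xchg64(uint64_t *x, uint64_t *y)
{ uint64_t t = *x; *x = *y; *y = t; }
```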
Tanguy Pruvot
a55b148ecc
windows: fix missing off_t include
2015-06-08 16:58:12 +02:00
Tanguy Pruvot
ed4927fcd0
quark/x11: set signed int hashPosition vars to off_t
groestl (and keccak?) seem faster with 64-bit vars (off_t or int64_t)...
2015-06-05 22:03:05 +02:00
Tanguy Pruvot
ebe95aac2f
bmw512: cleanup after cuda 7 bug fix
2015-05-29 14:32:23 +02:00
Tanguy Pruvot
0224d4705e
skein: fix wrong hashes seen on x11 with cuda 7
Looks like a stream sync problem, not related to the CUDA 7 headers or cudart.
The added threadfence() doesn't change performance, and could also
be related to the random CPU validation errors... so keep it for all.
Note: the 80-byte variant used in skein2 doesn't seem affected.
2015-05-29 12:16:54 +02:00
Tanguy Pruvot
123fe287b6
x11: temporary workaround for cuda 7.0
2015-05-28 21:19:24 +02:00
Tanguy Pruvot
d9b0312897
x64: fix some size_t warnings
2015-05-17 04:56:42 +02:00
Tanguy Pruvot
051ba521be
skein2: minimal host changes
2015-05-14 19:38:03 +02:00
Tanguy Pruvot
2f541065fb
cuda_helper: correctly rename hiword/loword functions
2015-05-12 17:13:58 +02:00
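The sketch below shows which half each name refers to: hiword is the upper 32 bits of a 64-bit value and loword the lower 32 bits. It is plain host C with hypothetical names. The cuda_helper versions are device functions whose bodies may differ:

```c
#include <stdint.h>

/* Hypothetical helpers illustrating the naming convention. */
static inline uint32_t loword(uint64_t x) { return (uint32_t)x; }          /* bits 0..31  */
static inline uint32_t hiword(uint64_t x) { return (uint32_t)(x >> 32); }  /* bits 32..63 */

/* Rebuild the 64-bit value from its two halves. */
static inline uint64_t make_u64(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```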
Tanguy Pruvot
2113be6eec
blake80: some changes and launch bounds, no perf changes
2015-04-24 14:12:21 +02:00
Tanguy Pruvot
3d3f2e2cb5
warnings: use the right device id (device_map[thr_id])
2015-04-23 09:41:56 +02:00
Tanguy Pruvot
275a028935
skein: compute midstate first
"Real" optimization based on KlausT precalc
2015-04-16 02:11:37 +02:00
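The midstate idea works because the first 64 bytes of an 80-byte header do not contain the nonce. Their chaining value can be computed once per work unit, and each nonce then only pays for the final block. The sketch uses a toy FNV-style compression function, not Skein's UBI/Threefish, purely to show that the precomputed path matches the full hash. It assumes the nonce sits at byte offset 76. All names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

#define TOY_IV 0xCBF29CE484222325ULL

/* Toy chained compression over a 64-byte block (FNV-1a style mixing). NOT Skein. */
static uint64_t toy_compress(uint64_t state, const uint8_t block[64])
{
    for (int i = 0; i < 64; i++)
        state = (state ^ block[i]) * 0x100000001B3ULL;
    return state;
}

/* Full hash of an 80-byte header: first block, then the 16-byte tail zero-padded. */
static uint64_t toy_hash80(const uint8_t header[80])
{
    uint8_t tail[64] = {0};
    uint64_t s = toy_compress(TOY_IV, header);
    memcpy(tail, header + 64, 16);
    return toy_compress(s, tail);
}

/* Midstate: the nonce-free first block, compressed once per work unit. */
static uint64_t toy_midstate(const uint8_t header[80])
{
    return toy_compress(TOY_IV, header);
}

/* Per nonce: only the final block is compressed, starting from the midstate. */
static uint64_t toy_hash80_from_midstate(uint64_t midstate, const uint8_t header[80], uint32_t nonce)
{
    uint8_t tail[64] = {0};
    memcpy(tail, header + 64, 16);
    memcpy(tail + 12, &nonce, 4);   /* header byte 76 = tail byte 12 */
    return toy_compress(midstate, tail);
}
```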
Tanguy Pruvot
e7ae27137e
x11/qubit: remove some extra MyStreamSynchronize
only one per loop is required to prevent 100% cpu usage
2015-04-15 05:30:22 +02:00
Tanguy Pruvot
163430daae
Skein/Skein2 SM 3.0 devices support
+ code cleanup
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-04-15 01:27:48 +02:00
Tanguy Pruvot
d58d53f2b2
update README, small changes, prepare release 1.6.1
still needs an SM 3.0 fix for skein...
2015-04-14 23:28:00 +02:00
Tanguy Pruvot
48515ad707
groestl: rename included cuda files
2015-04-06 23:46:34 +02:00
Tanguy Pruvot
37395eefe4
skein: restore previous x11 speed
2015-03-28 13:32:08 +01:00
Tanguy Pruvot
4f43abb402
bmw512: indent and restore SM 3.0 compat
could also be the source of the problem seen with CUDA 7
restored the code from before the sp/klaus changes for SM 3.0 devices...
2015-03-28 12:01:50 +01:00
Tanguy Pruvot
38e6672d70
Allow test of SM 2.1/3.0 binaries on newer cards
Implementation based on KlausT's work, a bit different.
This code must be placed in a common .cu file:
cuda.cpp is not compiled with nvcc and doesn't allow CUDA code...
2015-03-28 12:00:53 +01:00
Tanguy Pruvot
f86784ee56
Add skein algo (Skeincoin, Myriad, Unat...)
SKEIN512 + SHA256
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-03-27 15:24:27 +01:00
Tanguy Pruvot
a37e909db9
Add zr5 algo (for SM 3.5+)
uint4 copy + keccak cleanup, groestl: small uint4 opt
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-03-27 15:16:25 +01:00
Tanguy Pruvot
9734186a37
jh512: import and improve klaus and sp changes
did not import the extra final function, since the code should stay compatible
with the common cuda_check_hash()
2015-03-20 05:36:40 +01:00
KlausT
ae8e863591
remove uint32_t cast
2015-03-12 01:01:47 +01:00
Tanguy Pruvot
e6112e878d
cleanup: use unsigned throughput parameters
Yes, it's a big commit; I was waiting for 1.6 to do that...
Sorry for your possible merge issues ;)
2015-02-28 14:05:09 +01:00
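The sketch below illustrates why unsigned throughput parameters help with 32-bit nonces. A signed 32-bit int cannot hold 2^31, and `1 << 31` is undefined for it. Unsigned arithmetic also lets the last launch be clamped so that first_nonce + throughput never wraps past max_nonce. The helpers are hypothetical and do not reproduce the commit's code:

```c
#include <stdint.h>

/* Clamp the launch size so that first_nonce + result never passes max_nonce. */
static uint32_t clamp_throughput(uint32_t first_nonce, uint32_t max_nonce, uint32_t throughput)
{
    uint32_t left = (max_nonce > first_nonce) ? (max_nonce - first_nonce) : 0u;
    return (left < throughput) ? left : throughput;
}

/* Number of kernel launches needed to scan [first_nonce, max_nonce). */
static uint32_t count_launches(uint32_t first_nonce, uint32_t max_nonce, uint32_t throughput)
{
    uint32_t launches = 0;
    if (throughput == 0)
        return 0;
    while (first_nonce < max_nonce) {
        /* each step advances by at least 1, so the loop terminates without wrapping */
        first_nonce += clamp_throughput(first_nonce, max_nonce, throughput);
        launches++;
    }
    return launches;
}
```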
Tanguy Pruvot
09c3ac6b4b
linux: fix missing dirname include
2015-02-11 18:36:57 +01:00
Tanguy Pruvot
2d5e8aaced
anime: fix uint2 error (bmw)
2015-02-08 18:32:42 +01:00
KlausT
a452c330dd
quark: remove unused variables
2015-02-02 10:41:14 +01:00
Tanguy Pruvot
26b51a557b
Allow different intensity per device
and clean up the old variables, no longer required
2015-01-24 11:17:29 +01:00
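Per-device intensity implies parsing a comma-separated list with one value per GPU. The sketch below handles whole intensities in 0..31, so that 1U << v fits 32 bits. It assumes that devices beyond the list reuse the last value given, which may not match ccminer's actual rule. `parse_intensities()` is a hypothetical name:

```c
#include <stdlib.h>

#define MAX_GPUS 16

/* Parse "19,20,18"-style input into one intensity per device.
   Assumption: devices beyond the list reuse the last value given.
   Returns the number of values parsed, or -1 on a malformed list. */
static int parse_intensities(const char *arg, int out[MAX_GPUS], int ngpus)
{
    int n = 0, last = 0;
    const char *p = arg;

    while (*p != '\0' && n < MAX_GPUS) {
        char *end;
        long v = strtol(p, &end, 10);
        if (end == p || v < 0 || v > 31)   /* not a number, or 1U << v would overflow */
            return -1;
        out[n++] = last = (int)v;
        if (*end == ',')
            p = end + 1;
        else if (*end == '\0')
            p = end;
        else
            return -1;
    }
    if (n == 0)
        return -1;
    for (int i = n; i < ngpus && i < MAX_GPUS; i++)
        out[i] = last;
    return n;
}
```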
Tanguy Pruvot
768b5ccb76
import bmw512 uint2 changes from sp
+ some cleanup... 15 kH/s gained (750Ti)
2015-01-24 08:02:41 +01:00
Tanguy Pruvot
9f2dd3ee60
Remove some useless conversions
does not impact performance either...
2015-01-24 08:00:22 +01:00
Tanguy Pruvot
2a5233f56e
api: report throughput when default
2015-01-22 06:28:59 +01:00
Tanguy Pruvot
cafd4477d7
Handle a maximum of 16 gpus (vs 8 before)
Some cards have 2 gpus on board...
2015-01-22 04:55:27 +01:00
Tanguy Pruvot
b521acb480
groestl: use sp bitslice enhancement, prepare SM 2.x variant
todo: simd512 SM 2.x variant (shfl op), and groestl/myriad functions
2015-01-19 00:42:14 +01:00
Tanguy Pruvot
ec5a48f420
x11: small simd512 gpu_expand improvement
2014-12-19 09:16:55 +01:00
Tanguy Pruvot
1e24e4899c
skein: uint2 optimisation with SM 3.0 compat (+15KH)
Thanks to sp and djm34 for this fast uint64 storage alternative
2014-12-16 13:52:54 +01:00
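The uint2 trick keeps a 64-bit lane as two 32-bit halves, like CUDA's uint2 with .x as the low word. 64-bit rotations then become pairs of 32-bit operations. On newer GPUs each half can map to a funnel-shift instruction. SM 3.0 lacks that instruction, which is presumably why a compat path is kept. The portable C sketch below spells the rotation out with plain shifts. Type and function names are hypothetical:

```c
#include <stdint.h>

/* A 64-bit value as two 32-bit halves (x = low word, y = high word). */
typedef struct { uint32_t x, y; } u2;

static u2 u2_from64(uint64_t v) { u2 r = { (uint32_t)v, (uint32_t)(v >> 32) }; return r; }
static uint64_t u2_to64(u2 v) { return ((uint64_t)v.y << 32) | v.x; }

/* Rotate left by n (0..63) using only 32-bit shifts. */
static u2 u2_rotl(u2 v, unsigned n)
{
    u2 r;
    n &= 63;
    if (n >= 32) {            /* rotating by 32 just swaps the halves */
        uint32_t t = v.x; v.x = v.y; v.y = t;
        n -= 32;
    }
    if (n == 0)               /* avoid an undefined shift by 32 below */
        return v;
    r.x = (v.x << n) | (v.y >> (32 - n));
    r.y = (v.y << n) | (v.x >> (32 - n));
    return r;
}
```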
Tanguy Pruvot
2585e10814
keccak uint2 optimisation for SM>3.0 (x11 +40KH/s)
based on djm34's keccak 256-bit changes, keeping SM 3.0 compat;
affects most other algos too (quark, nist5, x13...)
2014-12-15 11:34:03 +01:00