84 Commits

Author SHA1 Message Date
Tanguy Pruvot
3b7ef923c7 lyra2(v1): use a common uint2x4 include
lyrav2 still need more definitions (uint16)
2015-10-23 15:25:24 +02:00
Tanguy Pruvot
82a7e62b30 skein: cleanup, strip uint2x4.h + update vstudio 2015-10-23 13:32:18 +02:00
Tanguy Pruvot
ef817df79a import sp skein512 unrolled 64-bytes kernel (+0,6% x11)
Quark and S3 are now a bit faster (+1 %)
x11 get +0.6 % (+20kH/s on a 750ti, +30kH on a 960)

80 bytes implementation to do/test ... (skein/skein2)

but keep my previous version for older devices...
2015-10-23 09:43:20 +02:00
Tanguy Pruvot
5bf1f98200 various fixes for SM 2.1 and the benchmark
X11+ algos and quark are not compatible for the moment

but these ones are :

Benchmark results for Gigabyte GTX 460 (SM 2.1 / 1 GB):

   blakecoin :     159090.5 kH/s,     1 MB,  1048576 thr.
       blake :      70208.9 kH/s,     1 MB,  1048576 thr.
         bmw :     122802.6 kH/s,    65 MB,  2097152 thr.
        deep :       3533.6 kH/s,    33 MB,   524288 thr.
    fugue256 :      43177.9 kH/s,    17 MB,   524288 thr.
       heavy :       4118.2 kH/s,   147 MB,   524032 thr.
      keccak :      18673.1 kH/s,   129 MB,  2097152 thr.
       luffa :      28816.0 kH/s,   257 MB,  4194304 thr.
       lyra2 :        213.7 kH/s,   570 MB,    65536 thr.
    mjollnir :       3895.6 kH/s,   147 MB,   524032 thr.
       nist5 :       1101.4 kH/s,    67 MB,  1048576 thr.
       penta :        501.6 kH/s,    21 MB,   327680 thr.
       skein :       5432.4 kH/s,    65 MB,  1048576 thr.
      skein2 :       6788.9 kH/s,    33 MB,   524288 thr.
   whirlpool :        688.5 kH/s,    33 MB,   524288 thr.
         zr5 :        122.5 kH/s,    86 MB,   262144 thr.
2015-10-14 02:59:54 +00:00
Tanguy Pruvot
d195f2e8a2 intensity: do not reduce throughput before init
Else the memory allocated could be less than required later

btw, use the new "cuda" function to apply intensity/throughput
2015-10-11 05:01:41 +02:00
Tanguy Pruvot
4e1e03b891 benchmark: store all algos results + cuda fixes
Note: lyra2, lyra2v2 and script seems to have problems
to coexist with other algos... to run after some of them...

moved lyra2 first and skip scrypt/jane for the moment...

Only stored in memory for now.. to display a table after the bench

ccminer -a auto --benchmark

Results may be exported later to a json file...
2015-10-09 02:07:08 +02:00
Tanguy Pruvot
922c2a5cd7 algos: free allocated mem for algo switch
All can be freed propertly now, except script (reset) and lyra2 (leak)
2015-10-08 21:35:30 +02:00
Tanguy Pruvot
ee93927fac diff: use the new function in all algos 2015-10-07 20:10:15 +02:00
Tanguy Pruvot
e1c4b3042c algos: add functions to free allocated resources
Will be used later for algo switching

not really tested yet...
2015-09-25 07:51:57 +02:00
Tanguy Pruvot
5308898d1c start v1.7, apply new prototypes to all algos 2015-09-23 15:42:17 +02:00
Tanguy Pruvot
e3548f46f3 drop animecoin support
no more really minable... just minable in french
2015-08-22 12:35:22 +02:00
Tanguy Pruvot
4709668995 jh512: rewrite and optimize with asm swap
5% improvement by the vshl asm swap functions, mixed shl+add inst.,

Add also xchg(x, y) func and XCHG(x, y) define in cuda_helper for later use...

other jh changes are mainly for the beauty of the code...

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-06-16 08:20:48 +02:00
Tanguy Pruvot
a55b148ecc windows: fix missing off_t include 2015-06-08 16:58:12 +02:00
Tanguy Pruvot
ed4927fcd0 quark/x11: set signed int hashPosition vars to off_t
groestl (and keccak?) seems faster with 64bit vars (off_t or int64_t)...
2015-06-05 22:03:05 +02:00
Tanguy Pruvot
ebe95aac2f bmw512: cleanup after cuda 7 bug fix 2015-05-29 14:32:23 +02:00
Tanguy Pruvot
0224d4705e skein: fix wrong hashes seen on x11 with cuda 7
Look like a stream synch problem, not related to cuda 7 headers or cudart

The threadfence() added doesnt changes performances, and could also
be related to the random cpu validation errors... so keep it for all.

Note: the 80-bytes variant used in skein2 doesn't seems affected.
2015-05-29 12:16:54 +02:00
Tanguy Pruvot
123fe287b6 x11: temporary workaround for cuda 7.0 2015-05-28 21:19:24 +02:00
Tanguy Pruvot
d9b0312897 x64: fix some size_t warnings 2015-05-17 04:56:42 +02:00
Tanguy Pruvot
051ba521be skein2: minimal host changes 2015-05-14 19:38:03 +02:00
Tanguy Pruvot
2f541065fb cuda_helper: rename correctly hiword/loword functions 2015-05-12 17:13:58 +02:00
Tanguy Pruvot
2113be6eec blake80: some changes and launch bounds, no perf changes 2015-04-24 14:12:21 +02:00
Tanguy Pruvot
3d3f2e2cb5 warnings: use the right device id (device_map[thr_id]) 2015-04-23 09:41:56 +02:00
Tanguy Pruvot
275a028935 skein: compute midstate first
"Real" optimization based on KlausT precalc
2015-04-16 02:11:37 +02:00
Tanguy Pruvot
e7ae27137e x11/qubit: remove some extra MyStreamSynchronize
only one per loop is required to prevent 100% cpu usage
2015-04-15 05:30:22 +02:00
Tanguy Pruvot
163430daae Skein/Skein2 SM 3.0 devices support
+ code cleanup

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-04-15 01:27:48 +02:00
Tanguy Pruvot
d58d53f2b2 update README, small changes, prepare release 1.6.1
still need a SM 3.0 fix for skein...
2015-04-14 23:28:00 +02:00
Tanguy Pruvot
48515ad707 groestl: rename included cuda files 2015-04-06 23:46:34 +02:00
Tanguy Pruvot
37395eefe4 skein: restore previous x11 speed 2015-03-28 13:32:08 +01:00
Tanguy Pruvot
4f43abb402 bmw512: indent and restore SM 3.0 compat
could be also the source of the problem seen with CUDA 7

restored the code before sp/klaus changes for SM 3.0 devices...
2015-03-28 12:01:50 +01:00
Tanguy Pruvot
38e6672d70 Allow test of SM 2.1/3.0 binaries on newer cards
Implementation based on klausT work.. a bit different

This code must be placed in a common .cu file,
cuda.cpp is not compiled with nvcc and doesnt allow cuda code...
2015-03-28 12:00:53 +01:00
Tanguy Pruvot
f86784ee56 Add skein algo (Skeincoin, Myriad, Unat...)
SKEIN512 + SHA256

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-03-27 15:24:27 +01:00
Tanguy Pruvot
a37e909db9 Add zr5 algo (for SM 3.5+)
uint4 copy + keccak cleanup, groestl: small uint4 opt

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-03-27 15:16:25 +01:00
Tanguy Pruvot
9734186a37 jh512: import and improve klaus and sp changes
did not import the extra final function, which should stay compatible
with the common cuda_check_hash()
2015-03-20 05:36:40 +01:00
KlausT
ae8e863591 remove uint32_t cast 2015-03-12 01:01:47 +01:00
Tanguy Pruvot
e6112e878d cleanup: use unsigned throughput parameters
Yes, its a big commit, was waiting 1.6 to do that...
Sorry for your possible merge issues ;)
2015-02-28 14:05:09 +01:00
Tanguy Pruvot
09c3ac6b4b linux: fix missing dirname include 2015-02-11 18:36:57 +01:00
Tanguy Pruvot
2d5e8aaced anime: fix uint2 error (bmw) 2015-02-08 18:32:42 +01:00
KlausT
a452c330dd quark: remove unused variables 2015-02-02 10:41:14 +01:00
Tanguy Pruvot
26b51a557b Allow different intensity per device
and clean the old variables, no more required
2015-01-24 11:17:29 +01:00
Tanguy Pruvot
768b5ccb76 import bmw512 uint2 changes from sp
+ some cleanup... 15KH/s won (750Ti)
2015-01-24 08:02:41 +01:00
Tanguy Pruvot
9f2dd3ee60 Remove some useless conversions
do not impact perfs neither...
2015-01-24 08:00:22 +01:00
Tanguy Pruvot
2a5233f56e api: report throughput when default 2015-01-22 06:28:59 +01:00
Tanguy Pruvot
cafd4477d7 Handle a maximum of 16 gpus (vs 8 before)
Some cards have 2 gpus on board...
2015-01-22 04:55:27 +01:00
Tanguy Pruvot
b521acb480 groestl: use sp bitslice enhancement, prepare SM 2.x variant
todo: simd512 SM 2.x variant (shfl op), and groestl/myriad functions
2015-01-19 00:42:14 +01:00
Tanguy Pruvot
ec5a48f420 x11: small simd512 gpu_expand improvement 2014-12-19 09:16:55 +01:00
Tanguy Pruvot
1e24e4899c skein: uint2 optimisation with SM 3.0 compat (+15KH)
Thanks to sp and djm34 for this fast uint64 storage alternative
2014-12-16 13:52:54 +01:00
Tanguy Pruvot
2585e10814 keccak uint2 optimisation for SM>3.0 (x11 +40KH/s)
based on djm34 keccak 256-bit changes, and keep SM3.0 compat

affect most other algos too (quark, nist5, x13...)
2014-12-15 11:34:03 +01:00
Tanguy Pruvot
c3bdb623e8 Check and submit multiple nonces in one loop
Added to most algos, checkhash function scans a big range
and can find multiple nonces at once if the difficulty is low.

Stop ignoring them, submit second one if found...

Clean the draft code for rc=2 implemented for blake and pentablake

btw... fix the reduced displayed hashrate when a nonce is found...

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2014-12-05 15:53:40 +00:00
Tanguy Pruvot
118a6be361 checkhash: simplify the common function
use klaus trivial function, the old code has always been a bit weird..

split cuda_check_cpu_hash_64 in two functions, keep old for branched stuff
2014-12-01 00:20:40 +01:00
Tanguy Pruvot
c218c3f514 quark/anime: +100KH, bmw tpb was not correct
This small change also enhance a bit x11..17 algos
2014-11-28 22:18:48 +01:00