95 Commits

Author SHA1 Message Date
Tanguy Pruvot
a237601747 1.7.1 release
set schedule flags to reduce linux cpu usage without MyStreamSynchronize()
2016-01-26 20:43:16 +01:00
Tanguy Pruvot
d7c2168f2b quark: static shared memory allocation for SM3+
from KlausT committed on 4 Jan, add a few kH/s
2015-11-06 15:16:43 +01:00
Tanguy Pruvot
64e14b7d82 quark: final cleanup for the 1.7 2015-11-06 14:55:43 +01:00
Tanguy Pruvot
2247605d23 quark: add support for SM 2 devices
todo: use nonce vectors for the second branch

GPU #0: Gigabyte GTX 460, 261.26 kH/s
accepted: 2/2 (diff 0.046), 254.36 kH/s yay!!!

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-11-06 04:10:06 +01:00
Tanguy Pruvot
e50556b637 various changes, cleanup for the release
small fixes to handle better the multi thread per gpu

explicitly report than quark is not compatible with SM 2.1 (compact shuffle)
2015-11-04 14:59:59 +01:00
Tanguy Pruvot
47f309ffb4 ifdef some unused kernels on SM5+
no need to build both (mine and sm variants)

and put global hashrate to 0 while waiting...
2015-10-28 07:25:52 +01:00
Tanguy Pruvot
8d4d4d65ce cuda: header for common kernel functions (quark/x11)
Was thinking about doing that since months ;) lets go
2015-10-25 06:54:17 +01:00
Tanguy Pruvot
26c7316a08 vstudio: clean and fix blake ifdef for x64
the allocated var was not used... sigh
2015-10-24 18:21:45 +02:00
Tanguy Pruvot
2d83f74a7e vstudio: special ifdef for the constant (bmw) 2015-10-24 15:13:35 +02:00
Tanguy Pruvot
d43dc9a021 use blake512 sp kernels on SM 5+ (80+64)
import and keep my code for older archs, like skein 64

reduce the gap between our versions...

+150kH x11   GTX 960 / +30kH  750Ti
+900kH quark GTX 960 / +230kH 750Ti
2015-10-24 13:43:22 +02:00
Tanguy Pruvot
957d919a6a bmw512: save a few KBs, ifdef 80-bytes kernel
was only used by animecoin

Also ifdef SM 3.0 compat. code to be ignored on recent archs
2015-10-24 07:30:57 +02:00
Tanguy Pruvot
3b7ef923c7 lyra2(v1): use a common uint2x4 include
lyrav2 still need more definitions (uint16)
2015-10-23 15:25:24 +02:00
Tanguy Pruvot
82a7e62b30 skein: cleanup, strip uint2x4.h + update vstudio 2015-10-23 13:32:18 +02:00
Tanguy Pruvot
ef817df79a import sp skein512 unrolled 64-bytes kernel (+0,6% x11)
Quark and S3 are now a bit faster (+1 %)
x11 get +0.6 % (+20kH/s on a 750ti, +30kH on a 960)

80 bytes implementation to do/test ... (skein/skein2)

but keep my previous version for older devices...
2015-10-23 09:43:20 +02:00
Tanguy Pruvot
5bf1f98200 various fixes for SM 2.1 and the benchmark
X11+ algos and quark are not compatible for the moment

but these ones are :

Benchmark results for Gigabyte GTX 460 (SM 2.1 / 1 GB):

   blakecoin :     159090.5 kH/s,     1 MB,  1048576 thr.
       blake :      70208.9 kH/s,     1 MB,  1048576 thr.
         bmw :     122802.6 kH/s,    65 MB,  2097152 thr.
        deep :       3533.6 kH/s,    33 MB,   524288 thr.
    fugue256 :      43177.9 kH/s,    17 MB,   524288 thr.
       heavy :       4118.2 kH/s,   147 MB,   524032 thr.
      keccak :      18673.1 kH/s,   129 MB,  2097152 thr.
       luffa :      28816.0 kH/s,   257 MB,  4194304 thr.
       lyra2 :        213.7 kH/s,   570 MB,    65536 thr.
    mjollnir :       3895.6 kH/s,   147 MB,   524032 thr.
       nist5 :       1101.4 kH/s,    67 MB,  1048576 thr.
       penta :        501.6 kH/s,    21 MB,   327680 thr.
       skein :       5432.4 kH/s,    65 MB,  1048576 thr.
      skein2 :       6788.9 kH/s,    33 MB,   524288 thr.
   whirlpool :        688.5 kH/s,    33 MB,   524288 thr.
         zr5 :        122.5 kH/s,    86 MB,   262144 thr.
2015-10-14 02:59:54 +00:00
Tanguy Pruvot
d195f2e8a2 intensity: do not reduce throughput before init
Else the memory allocated could be less than required later

btw, use the new "cuda" function to apply intensity/throughput
2015-10-11 05:01:41 +02:00
Tanguy Pruvot
4e1e03b891 benchmark: store all algos results + cuda fixes
Note: lyra2, lyra2v2 and script seems to have problems
to coexist with other algos... to run after some of them...

moved lyra2 first and skip scrypt/jane for the moment...

Only stored in memory for now.. to display a table after the bench

ccminer -a auto --benchmark

Results may be exported later to a json file...
2015-10-09 02:07:08 +02:00
Tanguy Pruvot
922c2a5cd7 algos: free allocated mem for algo switch
All can be freed propertly now, except script (reset) and lyra2 (leak)
2015-10-08 21:35:30 +02:00
Tanguy Pruvot
ee93927fac diff: use the new function in all algos 2015-10-07 20:10:15 +02:00
Tanguy Pruvot
e1c4b3042c algos: add functions to free allocated resources
Will be used later for algo switching

not really tested yet...
2015-09-25 07:51:57 +02:00
Tanguy Pruvot
5308898d1c start v1.7, apply new prototypes to all algos 2015-09-23 15:42:17 +02:00
Tanguy Pruvot
e3548f46f3 drop animecoin support
no more really minable... just minable in french
2015-08-22 12:35:22 +02:00
Tanguy Pruvot
4709668995 jh512: rewrite and optimize with asm swap
5% improvement by the vshl asm swap functions, mixed shl+add inst.,

Add also xchg(x, y) func and XCHG(x, y) define in cuda_helper for later use...

other jh changes are mainly for the beauty of the code...

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-06-16 08:20:48 +02:00
Tanguy Pruvot
a55b148ecc windows: fix missing off_t include 2015-06-08 16:58:12 +02:00
Tanguy Pruvot
ed4927fcd0 quark/x11: set signed int hashPosition vars to off_t
groestl (and keccak?) seems faster with 64bit vars (off_t or int64_t)...
2015-06-05 22:03:05 +02:00
Tanguy Pruvot
ebe95aac2f bmw512: cleanup after cuda 7 bug fix 2015-05-29 14:32:23 +02:00
Tanguy Pruvot
0224d4705e skein: fix wrong hashes seen on x11 with cuda 7
Look like a stream synch problem, not related to cuda 7 headers or cudart

The threadfence() added doesnt changes performances, and could also
be related to the random cpu validation errors... so keep it for all.

Note: the 80-bytes variant used in skein2 doesn't seems affected.
2015-05-29 12:16:54 +02:00
Tanguy Pruvot
123fe287b6 x11: temporary workaround for cuda 7.0 2015-05-28 21:19:24 +02:00
Tanguy Pruvot
d9b0312897 x64: fix some size_t warnings 2015-05-17 04:56:42 +02:00
Tanguy Pruvot
051ba521be skein2: minimal host changes 2015-05-14 19:38:03 +02:00
Tanguy Pruvot
2f541065fb cuda_helper: rename correctly hiword/loword functions 2015-05-12 17:13:58 +02:00
Tanguy Pruvot
2113be6eec blake80: some changes and launch bounds, no perf changes 2015-04-24 14:12:21 +02:00
Tanguy Pruvot
3d3f2e2cb5 warnings: use the right device id (device_map[thr_id]) 2015-04-23 09:41:56 +02:00
Tanguy Pruvot
275a028935 skein: compute midstate first
"Real" optimization based on KlausT precalc
2015-04-16 02:11:37 +02:00
Tanguy Pruvot
e7ae27137e x11/qubit: remove some extra MyStreamSynchronize
only one per loop is required to prevent 100% cpu usage
2015-04-15 05:30:22 +02:00
Tanguy Pruvot
163430daae Skein/Skein2 SM 3.0 devices support
+ code cleanup

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-04-15 01:27:48 +02:00
Tanguy Pruvot
d58d53f2b2 update README, small changes, prepare release 1.6.1
still need a SM 3.0 fix for skein...
2015-04-14 23:28:00 +02:00
Tanguy Pruvot
48515ad707 groestl: rename included cuda files 2015-04-06 23:46:34 +02:00
Tanguy Pruvot
37395eefe4 skein: restore previous x11 speed 2015-03-28 13:32:08 +01:00
Tanguy Pruvot
4f43abb402 bmw512: indent and restore SM 3.0 compat
could be also the source of the problem seen with CUDA 7

restored the code before sp/klaus changes for SM 3.0 devices...
2015-03-28 12:01:50 +01:00
Tanguy Pruvot
38e6672d70 Allow test of SM 2.1/3.0 binaries on newer cards
Implementation based on klausT work.. a bit different

This code must be placed in a common .cu file,
cuda.cpp is not compiled with nvcc and doesnt allow cuda code...
2015-03-28 12:00:53 +01:00
Tanguy Pruvot
f86784ee56 Add skein algo (Skeincoin, Myriad, Unat...)
SKEIN512 + SHA256

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-03-27 15:24:27 +01:00
Tanguy Pruvot
a37e909db9 Add zr5 algo (for SM 3.5+)
uint4 copy + keccak cleanup, groestl: small uint4 opt

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-03-27 15:16:25 +01:00
Tanguy Pruvot
9734186a37 jh512: import and improve klaus and sp changes
did not import the extra final function, which should stay compatible
with the common cuda_check_hash()
2015-03-20 05:36:40 +01:00
KlausT
ae8e863591 remove uint32_t cast 2015-03-12 01:01:47 +01:00
Tanguy Pruvot
e6112e878d cleanup: use unsigned throughput parameters
Yes, its a big commit, was waiting 1.6 to do that...
Sorry for your possible merge issues ;)
2015-02-28 14:05:09 +01:00
Tanguy Pruvot
09c3ac6b4b linux: fix missing dirname include 2015-02-11 18:36:57 +01:00
Tanguy Pruvot
2d5e8aaced anime: fix uint2 error (bmw) 2015-02-08 18:32:42 +01:00
KlausT
a452c330dd quark: remove unused variables 2015-02-02 10:41:14 +01:00
Tanguy Pruvot
26b51a557b Allow different intensity per device
and clean the old variables, no more required
2015-01-24 11:17:29 +01:00