Commit Graph

622 Commits

Author SHA1 Message Date
Tanguy Pruvot
0d9d3520ac simd: add support for SM 2.1 devices
Add support for x11..x17, s3, fresh and qubit

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-11-01 12:37:52 +01:00
Tanguy Pruvot
03b2bddc16 lyra2v2: fix SM 3.5 support
May work also on SM 3.0 (to check)
2015-10-29 13:10:41 +00:00
Tanguy Pruvot
47f309ffb4 ifdef some unused kernels on SM5+
no need to build both (mine and sm variants)

and put global hashrate to 0 while waiting...
2015-10-28 07:25:52 +01:00
Tanguy Pruvot
2673b3aeff stratum: hide timeout warnings while waiting
this timeout is not important, we reconnect after
2015-10-26 09:17:00 +01:00
Tanguy Pruvot
c4d6310143 heavy: fix define typo, else it works with cuda 7.5 2015-10-26 08:41:50 +01:00
Tanguy Pruvot
31bd1697b1 heavy: workaround to build on ubuntu 15.10
gcc 5.2.1 with cuda 6.5.19 gives a weird C++ error
2015-10-25 11:13:52 +01:00
Tanguy Pruvot
8d4d4d65ce cuda: header for common kernel functions (quark/x11)
Had been thinking about doing this for months ;) let's go
2015-10-25 06:54:17 +01:00
Tanguy Pruvot
26c7316a08 vstudio: clean and fix blake ifdef for x64
the allocated var was not used... sigh
2015-10-24 18:21:45 +02:00
Tanguy Pruvot
2d83f74a7e vstudio: special ifdef for the constant (bmw) 2015-10-24 15:13:35 +02:00
Tanguy Pruvot
098310abc6 pentablake: use common blake kernels (quark)
reduce the binary size and improve the speed...
2015-10-24 14:18:16 +02:00
Tanguy Pruvot
d43dc9a021 use blake512 sp kernels on SM 5+ (80+64)
import and keep my code for older archs, like skein 64

reduce the gap between our versions...

+150kH x11   GTX 960 / +30kH  750Ti
+900kH quark GTX 960 / +230kH 750Ti
2015-10-24 13:43:22 +02:00
Tanguy Pruvot
e12d666d36 pool switch: add thr_id param to handle a future barrier
Switching to a pool with a different algo will require a barrier
to free resources, like what was done in the global benchmark.

also add the algo to the pool structure...
2015-10-24 09:58:25 +02:00
Tanguy Pruvot
957d919a6a bmw512: save a few KBs, ifdef 80-bytes kernel
was only used by animecoin

Also ifdef SM 3.0 compat. code to be ignored on recent archs
2015-10-24 07:30:57 +02:00
Tanguy Pruvot
3b7ef923c7 lyra2(v1): use a common uint2x4 include
lyra2v2 still needs more definitions (uint16)
2015-10-23 15:25:24 +02:00
Tanguy Pruvot
82a7e62b30 skein: cleanup, strip uint2x4.h + update vstudio 2015-10-23 13:32:18 +02:00
Tanguy Pruvot
ef817df79a import sp skein512 unrolled 64-bytes kernel (+0.6% x11)
Quark and S3 are now a bit faster (+1 %)
x11 gets +0.6 % (+20kH/s on a 750ti, +30kH on a 960)

80 bytes implementation to do/test ... (skein/skein2)

but keep my previous version for older devices...
2015-10-23 09:43:20 +02:00
Tanguy Pruvot
dec6dbed77 api: add best share diff and last share time
best share diff requires --show-diff

shown in the "pool" command
2015-10-22 15:11:16 +02:00
Tanguy Pruvot
e90ade048a ndevs: get vendor names on windows too
ccminer -n 2>NUL

GPU #0: SM 5.2 GeForce GTX 970
GPU #1: SM 5.0 Gigabyte GTX 750 Ti
GPU #2: SM 5.2 ASUS GTX 970

note: nvml destroy is done in the proper_exit function
2015-10-22 13:36:46 +02:00
Tanguy Pruvot
59a6cd133b nvapi: x86 can also get sub vendor ids 2015-10-22 12:29:03 +02:00
Tanguy Pruvot
355b835ae0 benchmark: enhance the mem leak detection
reduce "false" warnings, and ignore unrelated/small ones <= 1 MB

On windows the gpu memory can be allocated by other processes

+ some cleanup in algos... (free/gpulog)
2015-10-16 22:04:30 +02:00
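
The 1 MB cutoff described in the commit above can be sketched in C as follows. This is only an illustration under stated assumptions: the helper name `leak_is_reportable` and the macro `LEAK_IGNORE_BYTES` are hypothetical and not taken from ccminer, and the two free-memory readings are assumed to come from something like `cudaMemGetInfo` sampled before and after an algo run.

```c
#include <stdbool.h>
#include <stddef.h>

/* Threshold from the commit message: differences <= 1 MB are ignored. */
#define LEAK_IGNORE_BYTES ((size_t)1024 * 1024)

/* Hypothetical helper (not the ccminer function): decide whether the drop
 * in free GPU memory between two measurements deserves a warning. */
static bool leak_is_reportable(size_t free_before, size_t free_after)
{
    /* Free memory grew or stayed equal, e.g. another process released
     * memory: nothing to report. */
    if (free_after >= free_before)
        return false;

    /* Report only drops strictly larger than the ignore threshold. */
    return (free_before - free_after) > LEAK_IGNORE_BYTES;
}
```

Comparing against a threshold instead of zero is what keeps small allocations made by the driver or by other Windows processes from showing up as "false" warnings.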
Tanguy Pruvot
4868c412b0 windows: add support for SM 2.1, drop SM 3.5 (x86)
Mostly to do compatibility tests, SM 2.1 support is very limited

SM 3.0 code should run on SM 3.5 (only a few cards use this arch)

As I can't test SM 3.5, it's best to let users do their own tests...
2015-10-15 23:02:35 +02:00
Tanguy Pruvot
a7d54cd7ef blake: no need to fail on init, no big alloc 2015-10-15 20:10:58 +02:00
Tanguy Pruvot
c3d10db873 algos: move cmdline algo/alias parser in a func 2015-10-15 08:49:40 +02:00
Tanguy Pruvot
e5d1cf8416 lyra2v2: typo in type, it's a struct of 4x uint2 :p 2015-10-15 06:48:42 +02:00
Tanguy Pruvot
6a9280a045 lyra2v2: set a better TPB for intensity 20 (sm52)
use sp forced unroll in skein and do some cleanup...
2015-10-15 02:01:34 +02:00
Tanguy Pruvot
5a08c21355 diff: store solved blocks count, update the api
Also show the real target diff on pools for the algos with a factor (lyra)

requires the --show-diff parameter, may be used as default in the final 1.7
2015-10-14 20:21:14 +02:00
Tanguy Pruvot
32f212469b lyra2/v2: fixes for vstudio 2015-10-14 03:31:18 +02:00
Tanguy Pruvot
5bf1f98200 various fixes for SM 2.1 and the benchmark
X11+ algos and quark are not compatible for the moment

but these ones are :

Benchmark results for Gigabyte GTX 460 (SM 2.1 / 1 GB):

   blakecoin :     159090.5 kH/s,     1 MB,  1048576 thr.
       blake :      70208.9 kH/s,     1 MB,  1048576 thr.
         bmw :     122802.6 kH/s,    65 MB,  2097152 thr.
        deep :       3533.6 kH/s,    33 MB,   524288 thr.
    fugue256 :      43177.9 kH/s,    17 MB,   524288 thr.
       heavy :       4118.2 kH/s,   147 MB,   524032 thr.
      keccak :      18673.1 kH/s,   129 MB,  2097152 thr.
       luffa :      28816.0 kH/s,   257 MB,  4194304 thr.
       lyra2 :        213.7 kH/s,   570 MB,    65536 thr.
    mjollnir :       3895.6 kH/s,   147 MB,   524032 thr.
       nist5 :       1101.4 kH/s,    67 MB,  1048576 thr.
       penta :        501.6 kH/s,    21 MB,   327680 thr.
       skein :       5432.4 kH/s,    65 MB,  1048576 thr.
      skein2 :       6788.9 kH/s,    33 MB,   524288 thr.
   whirlpool :        688.5 kH/s,    33 MB,   524288 thr.
         zr5 :        122.5 kH/s,    86 MB,   262144 thr.
2015-10-14 02:59:54 +00:00
Tanguy Pruvot
8fd2739a65 lyra2: support for SM 2.1 cards (GTX 460)
also fix the build (scrypt) for this arch.

else, 318.26 kH/s on a GTX 460...
2015-10-14 01:12:41 +00:00
Tanguy Pruvot
fc84c719e9 lyra2: improve cuda implementation (part 1, SM5+)
based on the new djm34 method, 2x faster than first version

cleaned and tuned for the GTX 750/960 (linux / cuda 6.5)
2015-10-13 00:57:29 +02:00
Tanguy Pruvot
9dfa757dc7 warn on cuda errors + various small changes
The full benchmark can now be launched with "ccminer --benchmark"

add a new helper function which logs a warning with the last cuda error
(not shown with the quiet option) : CUDA_LOG_ERROR();
it can be used where miner.h is included (.c/.cpp/.cu)

fix x14 (in ccminer.cpp), a break was missing in switch..case
2015-10-12 08:46:13 +02:00
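
The x14 fix mentioned in the commit above concerns C's switch fall-through. The sketch below shows the pattern after such a fix; the enum and the function `chained_hashes` are hypothetical and are not the dispatch code from ccminer.cpp.

```c
/* Hypothetical algorithm ids, for illustration only. */
enum algo_id { ALGO_X13, ALGO_X14, ALGO_X15 };

/* Returns how many hash functions the selected algorithm chains. */
static int chained_hashes(enum algo_id algo)
{
    int n = 0;

    switch (algo) {
    case ALGO_X13:
        n = 13;
        break;
    case ALGO_X14:
        n = 14;
        break; /* without this break, control would fall into ALGO_X15
                * and the x14 request would be handled as x15 */
    case ALGO_X15:
        n = 15;
        break;
    }
    return n;
}
```

A missing `break` compiles without error in C, which is why this kind of bug tends to surface only at run time.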
Tanguy Pruvot
8fbfe2cfda add gpulog() function helper, simple and multi-threads
when using multiple cpu threads per gpu, use the T prefix, ex:

[2015-10-11 09:52:49] GPU #0: app clocks set to P0 (3600/1228)
 vs
[2015-10-11 09:52:51] GPU T0: MSI GTX 960, 5953.35 kH/s

Only thr_id is required, the function takes care of the dev id
2015-10-11 10:46:05 +02:00
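
The prefix rule from the commit above ("GPU #n" normally, "GPU Tn" when several CPU threads share a GPU) could look roughly like this in C. It is a sketch under assumptions: `gpu_log_prefix`, its parameters, and the thread-to-device table passed in by the caller are hypothetical names, not the actual gpulog() implementation.

```c
#include <stdio.h>

/* Hypothetical helper: writes the log prefix for mining thread thr_id.
 * device_map[thr_id] gives the GPU index the thread is bound to; the
 * caller must pass a thr_id that is valid for that table. */
static int gpu_log_prefix(char *buf, size_t len, int thr_id,
                          const int *device_map, int threads_per_gpu)
{
    /* Several CPU threads per GPU: identify the thread, not the device. */
    if (threads_per_gpu > 1)
        return snprintf(buf, len, "GPU T%d:", thr_id);

    /* One thread per GPU: resolve the device id from the thread id. */
    return snprintf(buf, len, "GPU #%d:", device_map[thr_id]);
}
```

Taking only the thread id from the caller, as the commit describes, keeps call sites short because the device lookup happens in one place.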
Tanguy Pruvot
58c0bb5c02 intensity: fix typo and drop old function 2015-10-11 08:39:07 +02:00
Tanguy Pruvot
d195f2e8a2 intensity: do not reduce throughput before init
Else the memory allocated could be less than required later

btw, use the new "cuda" function to apply intensity/throughput
2015-10-11 05:01:41 +02:00
Tanguy Pruvot
c6dcc5e5cf benchmark: show mem and default throughput in results
and prepare a new function to get the default intensity

also, take care of multiple threads per gpu...
2015-10-11 04:38:28 +02:00
Tanguy Pruvot
8db5a0bc9e blake: change dynamic round system
blakecoin was conflicting with lyra2, set the rounds more properly
2015-10-11 03:46:30 +02:00
Tanguy Pruvot
c7cfe0e2ca Fix windows linkage, C/C++ mismatch 2015-10-11 00:55:22 +02:00
Tanguy Pruvot
ab5cc7162e refactor: create bench.cpp and algos.h
Also enhance multi-thread benchmark synchro. with pthread barriers
2015-10-11 00:10:27 +02:00
Tanguy Pruvot
c2214091ae benchmark: free last memory leaks on algo switch
my original lyra2 implementation (cuda_lyra2.cu) remains to be fixed...

I guess some kind of memory overflow forces the driver to allocate
memory... but I was unable to free it without a device reset.
2015-10-10 02:15:32 +02:00
Tanguy Pruvot
4e1e03b891 benchmark: store all algos results + cuda fixes
Note: lyra2, lyra2v2 and scrypt seem to have problems
coexisting with other algos... to run after some of them...

moved lyra2 first and skip scrypt/jane for the moment...

Only stored in memory for now.. to display a table after the bench

ccminer -a auto --benchmark

Results may be exported later to a json file...
2015-10-09 02:07:08 +02:00
Tanguy Pruvot
934555994d benchmark: allow -a auto to bench all algos at once 2015-10-08 21:41:20 +02:00
Tanguy Pruvot
922c2a5cd7 algos: free allocated mem for algo switch
All can be freed properly now, except scrypt (reset) and lyra2 (leak)
2015-10-08 21:35:30 +02:00
Tanguy Pruvot
ee93927fac diff: use the new function in all algos 2015-10-07 20:10:15 +02:00
Tanguy Pruvot
42789f1a0d whirlpool: allow stratum compat with new coins
make a difference between whirlpool and whirlcoin algos (stratum)

Looks like the old SHA merkleroot method doesn't work on recent coins

Doesn't affect solo mining, only pools using stratum+tcp:// protocol
2015-10-07 02:26:17 +00:00
Tanguy Pruvot
5f12943de5 whirlpool: add algo free function + vstudio 2015-10-06 23:53:03 +02:00
Tanguy Pruvot
b641bfdf8b diff: rename functions like cpuminer-multi
more proper, intuitive...
2015-10-06 23:37:13 +02:00
Tanguy Pruvot
3f589cc4db restore the whirlpool algo 2015-10-06 23:37:07 +02:00
Tanguy Pruvot
87edf84bf3 lyra2v2: increase default intensity
to be able to say, like sp, that it's faster :p
2015-10-04 21:54:51 +02:00
Tanguy Pruvot
b3adebdf2a lyra2v2: improve speed on SM 5.2 (Cuda 6.5) with sp unrolls
Reduce a bit the 750Ti speed but improve a lot the 9xx speed.

Keep compat for SM 3/3.5 in a second file..

Note: With this code and Cuda 7.5, the speed gain is reversed...
      May be "reverted" soon
2015-10-04 20:22:45 +02:00
Tanguy Pruvot
2ebcd1fbd5 neoscrypt: handle both getwork data sizes FTC/ORB
only affects solo mining, this patch should handle more weird cases

also set getblocktemplate param type to an empty object (ORB)
2015-09-29 13:00:14 +02:00