826 Commits

Author SHA1 Message Date
Tanguy Pruvot
d3e2088398 basic pool algo switch (without free barrier)
not really clean, but it should work for 2 "small" algos.

just put the "algo" param in each pool's config
2015-11-02 17:52:24 +01:00
Tanguy Pruvot
113e22de2e blake: prevent empty scan ranges with multiple gpus
in some cases, an empty scan range was possible in the benchmark.
2015-11-01 22:14:17 +01:00
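The multi-GPU fix above is about never handing a device an empty nonce range. Below is a minimal sketch of such a split. It assumes each GPU scans one contiguous half-open range [first, last) and that the remainder is spread over the first devices. NonceRange and split_scan_range are hypothetical names, not ccminer's code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical half-open nonce range [first, last).
struct NonceRange {
    uint32_t first;
    uint32_t last;
};

// Split [start, end) into at most n_gpus contiguous ranges whose sizes
// differ by at most one. If the span is smaller than the device count,
// fewer ranges are returned instead of emitting empty ones.
std::vector<NonceRange> split_scan_range(uint32_t start, uint32_t end, uint32_t n_gpus) {
    assert(n_gpus > 0 && start <= end);
    std::vector<NonceRange> ranges;
    uint64_t span = (uint64_t)end - start;
    uint64_t chunk = span / n_gpus;
    uint64_t extra = span % n_gpus; // first `extra` devices get one more nonce
    uint64_t pos = start;
    for (uint32_t i = 0; i < n_gpus; i++) {
        uint64_t size = chunk + (i < extra ? 1 : 0);
        if (size == 0)
            break; // sizes never increase, so all remaining ranges would be empty
        ranges.push_back({(uint32_t)pos, (uint32_t)(pos + size)});
        pos += size;
    }
    return ranges;
}
```

For example, 10 nonces on 3 GPUs give ranges of 4, 3 and 3 nonces, while 2 nonces on 4 GPUs give only two ranges of one nonce each.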
Tanguy Pruvot
61ff92b5b4 never interrupt global benchmark with found nonces
fix weird hashrates for some algos (like blake)
and reset the device between algos, for better accuracy

but this reset doesn't seem to be enough to bench all algos correctly...

to test on Linux, could be a driver issue...

heavy: fix first alloc and indent with tabs...
2015-11-01 21:12:50 +01:00
Tanguy Pruvot
2308f555c3 simd: cleanup and ignore linux host warning 2015-11-01 13:35:36 +01:00
Tanguy Pruvot
0d9d3520ac simd: add support for SM 2.1 devices
Add support for x11..x17, s3, fresh and qubit

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-11-01 12:37:52 +01:00
Tanguy Pruvot
03b2bddc16 lyra2v2: fix SM 3.5 support
May work also on SM 3.0 (to check)
2015-10-29 13:10:41 +00:00
Tanguy Pruvot
47f309ffb4 ifdef some unused kernels on SM5+
no need to build both (mine and sm variants)

and set the global hashrate to 0 while waiting...
2015-10-28 07:25:52 +01:00
Tanguy Pruvot
2673b3aeff stratum: hide timeout warnings while waiting
this timeout is not important, we reconnect after
2015-10-26 09:17:00 +01:00
Tanguy Pruvot
c4d6310143 heavy: fix define typo, else it works with cuda 7.5 2015-10-26 08:41:50 +01:00
Tanguy Pruvot
31bd1697b1 heavy: workaround to build on ubuntu 15.10
gcc 5.2.1 with cuda 6.5.19 gives a weird C++ error
2015-10-25 11:13:52 +01:00
Tanguy Pruvot
8d4d4d65ce cuda: header for common kernel functions (quark/x11)
Was thinking about doing that for months ;) let's go
2015-10-25 06:54:17 +01:00
Tanguy Pruvot
26c7316a08 vstudio: clean and fix blake ifdef for x64
the allocated var was not used... sigh
2015-10-24 18:21:45 +02:00
Tanguy Pruvot
2d83f74a7e vstudio: special ifdef for the constant (bmw) 2015-10-24 15:13:35 +02:00
Tanguy Pruvot
098310abc6 pentablake: use common blake kernels (quark)
reduce the binary size and improve the speed...
2015-10-24 14:18:16 +02:00
Tanguy Pruvot
d43dc9a021 use blake512 sp kernels on SM 5+ (80+64)
import and keep my code for older archs, like skein 64

reduce the gap between our versions...

+150kH x11   GTX 960 / +30kH  750Ti
+900kH quark GTX 960 / +230kH 750Ti
2015-10-24 13:43:22 +02:00
Tanguy Pruvot
e12d666d36 pool switch: add thr_id param to handle a future barrier
Switching to a pool with a different algo will require a barrier
to free resources, like what was made in the global benchmark.

also add the algo to the pool structure...
2015-10-24 09:58:25 +02:00
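The pool algo switch described in these commits can be sketched as follows. The sketch assumes a simplified pool entry that only carries its URL and its per-pool "algo" setting. PoolInfo, switch_needs_barrier and next_pool are hypothetical names. The only point illustrated is that a barrier (to free GPU resources) is needed when the next pool uses a different algo.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified pool entry; the real pool struct has many more fields.
struct PoolInfo {
    std::string url;
    std::string algo; // per-pool "algo" setting from the config
};

// Switching only needs a barrier when the algo changes, because the GPU
// buffers allocated for the old algo must be freed first.
bool switch_needs_barrier(const PoolInfo &cur, const PoolInfo &next) {
    return cur.algo != next.algo;
}

// Return the next pool index in round-robin order and report whether the
// miner threads must be synchronized before mining on it.
std::size_t next_pool(const std::vector<PoolInfo> &pools, std::size_t cur, bool *needs_barrier) {
    assert(!pools.empty() && cur < pools.size());
    std::size_t nxt = (cur + 1) % pools.size();
    *needs_barrier = switch_needs_barrier(pools[cur], pools[nxt]);
    return nxt;
}
```

Round-robin is only an assumption for the example; the same check applies to whatever order the failover logic uses.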
Tanguy Pruvot
957d919a6a bmw512: save a few KBs, ifdef 80-bytes kernel
was only used by animecoin

Also ifdef SM 3.0 compat. code to be ignored on recent archs
2015-10-24 07:30:57 +02:00
Tanguy Pruvot
3b7ef923c7 lyra2(v1): use a common uint2x4 include
lyrav2 still needs more definitions (uint16)
2015-10-23 15:25:24 +02:00
Tanguy Pruvot
82a7e62b30 skein: cleanup, strip uint2x4.h + update vstudio 2015-10-23 13:32:18 +02:00
Tanguy Pruvot
ef817df79a import sp skein512 unrolled 64-bytes kernel (+0,6% x11)
Quark and S3 are now a bit faster (+1 %)
x11 gets +0.6 % (+20kH/s on a 750ti, +30kH on a 960)

80 bytes implementation to do/test ... (skein/skein2)

but keep my previous version for older devices...
2015-10-23 09:43:20 +02:00
Tanguy Pruvot
dec6dbed77 api: add best share diff and last share time
best share diff requires --show-diff

shown in the "pool" command
2015-10-22 15:11:16 +02:00
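Below is a minimal sketch of the bookkeeping behind "best share diff" and "last share time". It assumes both values are updated once per accepted share. ShareStats and record_share are hypothetical names, not the actual API code.

```cpp
#include <cassert>
#include <ctime>

// Hypothetical per-pool share statistics, as could back the "pool" API output.
struct ShareStats {
    double best_share_diff = 0.0; // highest difficulty among accepted shares
    time_t last_share_time = 0;   // time of the last accepted share (0 = none yet)
    unsigned accepted = 0;
};

// Record an accepted share. share_diff is only meaningful when the miner
// actually computes share difficulties (here, with --show-diff).
void record_share(ShareStats &s, double share_diff, time_t now) {
    s.accepted++;
    s.last_share_time = now;
    if (share_diff > s.best_share_diff)
        s.best_share_diff = share_diff;
}
```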
Tanguy Pruvot
e90ade048a ndevs: get vendor names on windows too
ccminer -n 2>NUL

GPU #0: SM 5.2 GeForce GTX 970
GPU #1: SM 5.0 Gigabyte GTX 750 Ti
GPU #2: SM 5.2 ASUS GTX 970

note: nvml destroy is made in proper_exit function
2015-10-22 13:36:46 +02:00
Tanguy Pruvot
59a6cd133b nvapi: x86 can also get sub vendor ids 2015-10-22 12:29:03 +02:00
Tanguy Pruvot
355b835ae0 benchmark: enhance the mem leak detection
reduce "false" warnings, and ignore unrelated/small ones <= 1 MB

On Windows, GPU memory can be allocated by other processes

+ some cleanup in algos... (free/gpulog)
2015-10-16 22:04:30 +02:00
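The leak check can be pictured as a comparison of free device memory before and after an algo's init/free cycle. This sketch assumes that, in a real build, the two values would come from cudaMemGetInfo(). It uses the 1 MB tolerance from the commit message. detect_leak is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Tolerance from the commit message: ignore differences <= 1 MB.
constexpr std::size_t LEAK_THRESHOLD = std::size_t(1) << 20;

// Compare the free device memory measured before an algo was initialized
// with the value measured after its free function ran. Return the leaked
// byte count, or 0 when the difference is small or when more memory is
// free than before (another process may have released some).
std::size_t detect_leak(std::size_t free_before, std::size_t free_after) {
    if (free_after >= free_before)
        return 0;
    std::size_t diff = free_before - free_after;
    return diff > LEAK_THRESHOLD ? diff : 0;
}
```

Because other processes can allocate GPU memory at the same time, a result above the threshold is a hint to investigate rather than proof of a leak.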
Tanguy Pruvot
4868c412b0 windows: add support for SM 2.1, drop SM 3.5 (x86)
Mostly to do compatibility tests, SM 2.1 support is very limited

SM 3.0 code should run on SM 3.5 (only a few cards use this arch)

As I can't test SM 3.5, it's best to let users do their own tests...
2015-10-15 23:02:35 +02:00
Tanguy Pruvot
a7d54cd7ef blake: no need to fail on init, no big alloc 2015-10-15 20:10:58 +02:00
Tanguy Pruvot
c3d10db873 algos: move cmdline algo/alias parser in a func 2015-10-15 08:49:40 +02:00
Tanguy Pruvot
e5d1cf8416 lyra2v2: typo in type, it's a struct of 4x uint2 :p 2015-10-15 06:48:42 +02:00
Tanguy Pruvot
6a9280a045 lyra2v2: set a better TPB for intensity 20 (sm52)
use sp forced unroll in skein and do some cleanup...
2015-10-15 02:01:34 +02:00
Tanguy Pruvot
5a08c21355 diff: store solved blocks count, update the API
Also show the real target diff on pools for the algos with a factor (lyra)

requires the --show-diff parameter, which may become the default in the final 1.7
2015-10-14 20:21:14 +02:00
Tanguy Pruvot
32f212469b lyra2/v2: fixes for vstudio 2015-10-14 03:31:18 +02:00
Tanguy Pruvot
5bf1f98200 various fixes for SM 2.1 and the benchmark
X11+ algos and quark are not compatible for the moment

but these ones are:

Benchmark results for Gigabyte GTX 460 (SM 2.1 / 1 GB):

   blakecoin :     159090.5 kH/s,     1 MB,  1048576 thr.
       blake :      70208.9 kH/s,     1 MB,  1048576 thr.
         bmw :     122802.6 kH/s,    65 MB,  2097152 thr.
        deep :       3533.6 kH/s,    33 MB,   524288 thr.
    fugue256 :      43177.9 kH/s,    17 MB,   524288 thr.
       heavy :       4118.2 kH/s,   147 MB,   524032 thr.
      keccak :      18673.1 kH/s,   129 MB,  2097152 thr.
       luffa :      28816.0 kH/s,   257 MB,  4194304 thr.
       lyra2 :        213.7 kH/s,   570 MB,    65536 thr.
    mjollnir :       3895.6 kH/s,   147 MB,   524032 thr.
       nist5 :       1101.4 kH/s,    67 MB,  1048576 thr.
       penta :        501.6 kH/s,    21 MB,   327680 thr.
       skein :       5432.4 kH/s,    65 MB,  1048576 thr.
      skein2 :       6788.9 kH/s,    33 MB,   524288 thr.
   whirlpool :        688.5 kH/s,    33 MB,   524288 thr.
         zr5 :        122.5 kH/s,    86 MB,   262144 thr.
2015-10-14 02:59:54 +00:00
Tanguy Pruvot
8fd2739a65 lyra2: support for SM 2.1 cards (GTX 460)
also fix the build (scrypt) for this arch.

else, 318,26 kH/s on a GTX 460...
2015-10-14 01:12:41 +00:00
Tanguy Pruvot
fc84c719e9 lyra2: improve cuda implementation (part 1, SM5+)
based on the new djm34 method, 2x faster than first version

cleaned and tuned for the GTX 750/960 (linux / cuda 6.5)
2015-10-13 00:57:29 +02:00
Tanguy Pruvot
9dfa757dc7 warn on cuda errors + various small changes
The full benchmark can now be launched with "ccminer --benchmark"

add a new helper function which logs a warning with the last cuda error
(not shown with the quiet option) : CUDA_LOG_ERROR();
it can be used where miner.h is included (.c/.cpp/.cu)

fix x14 (in ccminer.cpp), a break was missing in switch..case
2015-10-12 08:46:13 +02:00
Tanguy Pruvot
8fbfe2cfda add gpulog() function helper, simple and multi-threads
when using multiple CPU threads per GPU, use the T prefix, e.g.:

[2015-10-11 09:52:49] GPU #0: app clocks set to P0 (3600/1228)
 vs
[2015-10-11 09:52:51] GPU T0: MSI GTX 960, 5953.35 kH/s

Only thr_id is required, the function takes care of the dev id
2015-10-11 10:46:05 +02:00
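Below is a sketch of how such a log prefix could be chosen from thr_id alone. It assumes the thread's GPU slot is thr_id modulo the number of GPUs. It also assumes that dev_map, a hypothetical stand-in for the miner's device map, translates a slot to a CUDA device id. gpulog_prefix is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// dev_map[slot] is the CUDA device used by GPU slot `slot` (hypothetical).
std::string gpulog_prefix(int thr_id, int n_threads, const std::vector<int> &dev_map) {
    int n_gpus = (int)dev_map.size();
    assert(n_gpus > 0 && thr_id >= 0);
    char buf[32];
    if (n_threads > n_gpus) {
        // several CPU threads share a GPU: identify the thread ("GPU T0: ")
        std::snprintf(buf, sizeof(buf), "GPU T%d: ", thr_id);
    } else {
        // one thread per GPU: show the real device id ("GPU #0: ")
        std::snprintf(buf, sizeof(buf), "GPU #%d: ", dev_map[thr_id % n_gpus]);
    }
    return std::string(buf);
}
```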
Tanguy Pruvot
58c0bb5c02 intensity: fix typo and drop old function 2015-10-11 08:39:07 +02:00
Tanguy Pruvot
d195f2e8a2 intensity: do not reduce throughput before init
Else the memory allocated could be less than required later

btw, use the new "cuda" function to apply intensity/throughput
2015-10-11 05:01:41 +02:00
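A small sketch can illustrate the ordering issue. It assumes throughput = 2^intensity, with the fractional part of the intensity adding a proportional share. That is an approximation, and the miner's exact rounding may differ. It also assumes a hypothetical per-thread state that sizes its buffers from the first, unreduced throughput. ThreadState, intensity_to_throughput and scan_throughput are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed mapping: 2^floor(I), plus the fractional part of I times that
// power of two (e.g. 19.5 -> 2^19 * 1.5).
uint32_t intensity_to_throughput(double intensity) {
    assert(intensity >= 8.0 && intensity < 31.0);
    uint32_t base = 1u << (uint32_t)intensity;
    double frac = intensity - std::floor(intensity);
    return base + (uint32_t)(base * frac);
}

struct ThreadState {
    bool init = false;
    uint32_t alloc_throughput = 0; // size the GPU buffers were allocated for
};

// Buffers are sized with the full throughput at init. Only afterwards is
// the per-call throughput reduced to the nonces left in the range.
// Reducing first would size the buffers too small for later calls.
uint32_t scan_throughput(ThreadState &st, uint32_t throughput, uint32_t nonces_left) {
    if (!st.init) {
        st.alloc_throughput = throughput; // where cudaMalloc would be sized
        st.init = true;
    }
    uint32_t t = throughput < nonces_left ? throughput : nonces_left;
    assert(t <= st.alloc_throughput);
    return t;
}
```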
Tanguy Pruvot
c6dcc5e5cf benchmark: show mem and default throughput in results
and prepare a new function to get the default intensity

also, take care of multiple threads per gpu...
2015-10-11 04:38:28 +02:00
Tanguy Pruvot
8db5a0bc9e blake: change dynamic round system
blakecoin was conflicting with lyra2, set the rounds more properly
2015-10-11 03:46:30 +02:00
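Below is a sketch of per-algo round selection. It assumes BLAKE-256 uses its standard 14 rounds and Blakecoin uses the reduced 8-round variant. Looking the count up per algo, rather than through one shared setting, is one way to avoid a conflict like the one described above. blake256_rounds is a hypothetical helper, and the 14-round value for lyra2's blake step is an assumption.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical per-algo lookup of the BLAKE-256 round count.
int blake256_rounds(const char *algo) {
    assert(algo != nullptr);
    if (std::strcmp(algo, "blakecoin") == 0)
        return 8;  // reduced-round variant
    return 14;     // standard BLAKE-256 ("blake", assumed also for lyra2)
}
```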
Tanguy Pruvot
c7cfe0e2ca Fix windows linkage, C/C++ mismatch 2015-10-11 00:55:22 +02:00
Tanguy Pruvot
ab5cc7162e refactor: create bench.cpp and algos.h
Also enhance multi-thread benchmark synchro. with pthread barriers
2015-10-11 00:10:27 +02:00
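The synchronization idea can be sketched with a small reusable barrier. This is a portable C++ stand-in for pthread_barrier_wait(), not ccminer's code. Every miner thread waits until all threads have arrived, and exactly one of them is told it may do the shared work, such as switching the benchmarked algo. Barrier and run_rounds are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class Barrier {
public:
    explicit Barrier(unsigned count) : count_(count), waiting_(0), generation_(0) {
        assert(count > 0);
    }

    // Block until `count` threads have called wait(). Returns true for exactly
    // one thread per cycle, like PTHREAD_BARRIER_SERIAL_THREAD.
    bool wait() {
        std::unique_lock<std::mutex> lock(mtx_);
        unsigned gen = generation_;
        if (++waiting_ == count_) {
            waiting_ = 0;
            generation_++; // a new generation makes the barrier reusable
            cv_.notify_all();
            return true;
        }
        cv_.wait(lock, [&] { return gen != generation_; });
        return false;
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    unsigned count_, waiting_, generation_;
};

// Usage sketch: each thread meets the others once per round; return how
// many times a thread got the serial role (expected: once per round).
unsigned run_rounds(unsigned n_threads, unsigned rounds) {
    Barrier barrier(n_threads);
    std::atomic<unsigned> serial{0};
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < n_threads; t++)
        threads.emplace_back([&] {
            for (unsigned r = 0; r < rounds; r++)
                if (barrier.wait())
                    serial++;
        });
    for (auto &th : threads)
        th.join();
    return serial.load();
}
```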
Tanguy Pruvot
c2214091ae benchmark: free last memory leaks on algo switch
my original lyra2 implementation (cuda_lyra2.cu) still needs to be fixed...

I guess some kind of memory overflow forces the driver to allocate
memory... but I was unable to free it without a device reset.
2015-10-10 02:15:32 +02:00
Tanguy Pruvot
4e1e03b891 benchmark: store all algos results + cuda fixes
Note: lyra2, lyra2v2 and scrypt seem to have problems
coexisting with other algos... when run after some of them...

moved lyra2 first and skip scrypt/jane for the moment...

Only stored in memory for now, to display a table after the bench

ccminer -a auto --benchmark

Results may be exported later to a json file...
2015-10-09 02:07:08 +02:00
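Below is a sketch of storing the results in memory and printing them as a table afterwards. BenchResult, format_row and format_table are hypothetical names. The printf format was chosen to reproduce the column layout of the benchmark table shown in the "various fixes for SM 2.1" commit message (algo, kH/s, MB, threads).

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical in-memory record for one benchmarked algo.
struct BenchResult {
    std::string algo;
    double khashes;      // hashrate in kH/s
    unsigned mem_mb;     // device memory used, in MB
    unsigned throughput; // threads per kernel launch
};

// One row, e.g. "   blakecoin :     159090.5 kH/s,     1 MB,  1048576 thr."
std::string format_row(const BenchResult &r) {
    char line[96];
    std::snprintf(line, sizeof(line), "%12s : %12.1f kH/s, %5u MB, %8u thr.",
                  r.algo.c_str(), r.khashes, r.mem_mb, r.throughput);
    return std::string(line);
}

// The whole table, one line per stored result, printed after the bench.
std::string format_table(const std::vector<BenchResult> &results) {
    std::string out;
    for (const auto &r : results)
        out += format_row(r) + "\n";
    return out;
}
```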
Tanguy Pruvot
934555994d benchmark: allow -a auto to bench all algos at once 2015-10-08 21:41:20 +02:00
Tanguy Pruvot
922c2a5cd7 algos: free allocated mem for algo switch
All can be freed properly now, except scrypt (reset) and lyra2 (leak)
2015-10-08 21:35:30 +02:00
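Below is a sketch of the per-thread init/free pattern that makes an algo switch possible. It assumes each algo keeps an init flag and its buffers per thread. std::malloc stands in for cudaMalloc so the example runs on the CPU. MAX_GPUS, scan_init and free_algo are hypothetical names, and the 64 bytes per hash is an assumption.

```cpp
#include <cassert>
#include <cstdlib>

constexpr int MAX_GPUS = 16; // hypothetical limit for the sketch

// Per-thread state of one algo: an init flag plus the buffer allocated on
// the first scan.
static bool init_done[MAX_GPUS];
static void *d_hash[MAX_GPUS];

void scan_init(int thr_id, std::size_t throughput) {
    assert(thr_id >= 0 && thr_id < MAX_GPUS);
    if (init_done[thr_id])
        return;
    d_hash[thr_id] = std::malloc(throughput * 64); // 64 bytes per hash (assumption)
    init_done[thr_id] = d_hash[thr_id] != nullptr;
}

// Called on algo switch: release what scan_init allocated and clear the
// flag, so a later switch back to this algo re-initializes cleanly.
void free_algo(int thr_id) {
    if (!init_done[thr_id])
        return;
    std::free(d_hash[thr_id]);
    d_hash[thr_id] = nullptr;
    init_done[thr_id] = false;
}
```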
Tanguy Pruvot
ee93927fac diff: use the new function in all algos 2015-10-07 20:10:15 +02:00
Tanguy Pruvot
42789f1a0d whirlpool: allow stratum compat with new coins
make a difference between whirlpool and whirlcoin algos (stratum)

Looks like the old SHA merkleroot method doesn't work on recent coins

Doesn't affect solo mining, only pools using stratum+tcp:// protocol
2015-10-07 02:26:17 +00:00
Tanguy Pruvot
5f12943de5 whirlpool: add algo free function + vstudio 2015-10-06 23:53:03 +02:00
Tanguy Pruvot
b641bfdf8b diff: rename functions like cpuminer-multi
more proper, intuitive...
2015-10-06 23:37:13 +02:00