Tanguy Pruvot
d7c2168f2b
quark: static shared memory allocation for SM3+
...
from KlausT, committed on 4 Jan; adds a few kH/s
2015-11-06 15:16:43 +01:00
Tanguy Pruvot
64e14b7d82
quark: final cleanup for the 1.7
2015-11-06 14:55:43 +01:00
Tanguy Pruvot
2247605d23
quark: add support for SM 2 devices
...
todo: use nonce vectors for the second branch
GPU #0 : Gigabyte GTX 460, 261.26 kH/s
accepted: 2/2 (diff 0.046), 254.36 kH/s yay!!!
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-11-06 04:10:06 +01:00
Tanguy Pruvot
21115b7fc6
scrypt: link texture-cache parameter
2015-11-05 17:06:35 +01:00
Tanguy Pruvot
e50556b637
various changes, cleanup for the release
...
small fixes to better handle multiple threads per gpu
explicitly report that quark is not compatible with SM 2.1 (compact shuffle)
2015-11-04 14:59:59 +01:00
Tanguy Pruvot
1e3db41a8d
multialgo: clear hashrate stats on switch
2015-11-02 19:05:43 +01:00
Tanguy Pruvot
e9b88b45e4
prepare the 1.7 release
2015-11-02 17:52:24 +01:00
Tanguy Pruvot
d3e2088398
basic pool algo switch (without free barrier)
...
not really proper but should work for 2 "small" algos.
just put the "algo" param in each pool's config
2015-11-02 17:52:24 +01:00
Tanguy Pruvot
113e22de2e
blake: prevent empty scan ranges with multiple gpus
...
in some cases, an empty scan range was possible in benchmark..
2015-11-01 22:14:17 +01:00
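The commit message for 113e22de2e does not show how the empty scan range is prevented. As a minimal sketch of one plausible guard, assuming the range is derived from an estimated hash count (which can be zero before any hashrate has been measured), a hypothetical helper could enforce a minimum span. The name `pick_max_nonce` and the constant `kMinScan` are illustrative and not taken from the ccminer sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from ccminer): choose the last nonce to scan.
// `estimated` is the number of hashes we expect to compute in this pass;
// it may be 0 when no hashrate has been measured yet.
uint32_t pick_max_nonce(uint32_t start_nonce, uint32_t end_nonce, uint64_t estimated)
{
    assert(start_nonce <= end_nonce);

    // Assumed minimum span, so that a zero estimate never yields an empty range.
    const uint64_t kMinScan = 0x100;
    const uint64_t count = (estimated < kMinScan) ? kMinScan : estimated;

    // Compute in 64 bits to avoid wrapping past the 32-bit nonce space.
    const uint64_t max64 = static_cast<uint64_t>(start_nonce) + count;
    return (max64 > end_nonce) ? end_nonce : static_cast<uint32_t>(max64);
}
```

With this guard, a thread starting at nonce 1000 with an estimate of 0 still scans up to nonce 1256, and a range near the top of the nonce space is clamped to `end_nonce` instead of wrapping.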
Tanguy Pruvot
61ff92b5b4
never interrupt global benchmark with found nonces
...
fix weird hashrates on some algos (like blake)
and reset device between algos, for better accuracy
but this reset doesn't seem enough to bench all algos correctly...
to test on linux, could be a driver issue...
heavy: fix first alloc and indent with tabs...
2015-11-01 21:12:50 +01:00
Tanguy Pruvot
2308f555c3
simd: cleanup and ignore linux host warning
2015-11-01 13:35:36 +01:00
Tanguy Pruvot
0d9d3520ac
simd: add support for SM 2.1 devices
...
Add support for x11..x17, s3, fresh and qubit
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-11-01 12:37:52 +01:00
Tanguy Pruvot
03b2bddc16
lyra2v2: fix SM 3.5 support
...
May also work on SM 3.0 (to check)
2015-10-29 13:10:41 +00:00
Tanguy Pruvot
47f309ffb4
ifdef some unused kernels on SM5+
...
no need to build both (mine and sm variants)
and put global hashrate to 0 while waiting...
2015-10-28 07:25:52 +01:00
Tanguy Pruvot
2673b3aeff
stratum: hide timeout warnings while waiting
...
this timeout is not important, we reconnect after
2015-10-26 09:17:00 +01:00
Tanguy Pruvot
c4d6310143
heavy: fix define typo, else it works with cuda 7.5
2015-10-26 08:41:50 +01:00
Tanguy Pruvot
31bd1697b1
heavy: workaround to build on ubuntu 15.10
...
gcc 5.2.1 with cuda 6.5.19 gives a weird C++ error
2015-10-25 11:13:52 +01:00
Tanguy Pruvot
8d4d4d65ce
cuda: header for common kernel functions (quark/x11)
...
Had been thinking about doing that for months ;) let's go
2015-10-25 06:54:17 +01:00
Tanguy Pruvot
26c7316a08
vstudio: clean and fix blake ifdef for x64
...
the allocated var was not used... sigh
2015-10-24 18:21:45 +02:00
Tanguy Pruvot
2d83f74a7e
vstudio: special ifdef for the constant (bmw)
2015-10-24 15:13:35 +02:00
Tanguy Pruvot
098310abc6
pentablake: use common blake kernels (quark)
...
reduce the binary size and improve the speed...
2015-10-24 14:18:16 +02:00
Tanguy Pruvot
d43dc9a021
use blake512 sp kernels on SM 5+ (80+64)
...
import and keep my code for older archs, like skein 64
reduce the gap between our versions...
+150kH x11 GTX 960 / +30kH 750Ti
+900kH quark GTX 960 / +230kH 750Ti
2015-10-24 13:43:22 +02:00
Tanguy Pruvot
e12d666d36
pool switch: add thr_id param to handle a future barrier
...
Switching to a pool with a different algo will require a barrier
to free resources, like what was done in the global benchmark.
also add the algo to the pool structure...
2015-10-24 09:58:25 +02:00
Tanguy Pruvot
957d919a6a
bmw512: save a few KBs, ifdef 80-bytes kernel
...
was only used by animecoin
Also ifdef SM 3.0 compat. code to be ignored on recent archs
2015-10-24 07:30:57 +02:00
Tanguy Pruvot
3b7ef923c7
lyra2(v1): use a common uint2x4 include
...
lyra2v2 still needs more definitions (uint16)
2015-10-23 15:25:24 +02:00
Tanguy Pruvot
82a7e62b30
skein: cleanup, strip uint2x4.h + update vstudio
2015-10-23 13:32:18 +02:00
Tanguy Pruvot
ef817df79a
import sp skein512 unrolled 64-bytes kernel (+0.6% x11)
...
Quark and S3 are now a bit faster (+1 %)
x11 gets +0.6 % (+20kH/s on a 750ti, +30kH on a 960)
80 bytes implementation to do/test ... (skein/skein2)
but keep my previous version for older devices...
2015-10-23 09:43:20 +02:00
Tanguy Pruvot
dec6dbed77
api: add best share diff and last share time
...
best share diff requires --show-diff
shown in the "pool" command
2015-10-22 15:11:16 +02:00
Tanguy Pruvot
e90ade048a
ndevs: get vendor names on windows too
...
ccminer -n 2>NUL
GPU #0 : SM 5.2 GeForce GTX 970
GPU #1 : SM 5.0 Gigabyte GTX 750 Ti
GPU #2 : SM 5.2 ASUS GTX 970
note: nvml destroy is done in the proper_exit function
2015-10-22 13:36:46 +02:00
Tanguy Pruvot
59a6cd133b
nvapi: x86 can also get sub vendor ids
2015-10-22 12:29:03 +02:00
Tanguy Pruvot
355b835ae0
benchmark: enhance the mem leak detection
...
reduce "false" warnings, and ignore unrelated/small ones <= 1 MB
On windows the gpu memory can be allocated by other processes
+ some cleanup in algos... (free/gpulog)
2015-10-16 22:04:30 +02:00
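The 1 MB tolerance described in 355b835ae0 can be illustrated with a small check. This is only a sketch under assumptions: the function name `is_reportable_leak` is hypothetical, and the two inputs are assumed to be free-memory readings in bytes taken before and after an algo run (for instance from `cudaMemGetInfo`), not the actual variables used by the benchmark code.

```cpp
#include <cstddef>

// Hypothetical check (not from ccminer): decide whether a drop in free GPU
// memory between two readings is worth a warning.
bool is_reportable_leak(std::size_t free_before, std::size_t free_after)
{
    // Threshold taken from the commit message: ignore differences <= 1 MB.
    const std::size_t kIgnoreBytes = 1024u * 1024u;

    // More free memory than before (e.g. another process released some):
    // not a leak from our side.
    if (free_after >= free_before)
        return false;

    return (free_before - free_after) > kIgnoreBytes;
}
```

Treating an increase in free memory as "no leak" matches the remark that, on Windows, GPU memory can be allocated or released by other processes while the benchmark runs.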
Tanguy Pruvot
4868c412b0
windows: add support for SM 2.1, drop SM 3.5 (x86)
...
Mostly to do compatibility tests, SM 2.1 support is very limited
SM 3.0 code should run on SM 3.5 (only a few cards use this arch)
As I can't test SM 3.5, it's best to let users do their own tests...
2015-10-15 23:02:35 +02:00
Tanguy Pruvot
a7d54cd7ef
blake: no need to fail on init, no big alloc
2015-10-15 20:10:58 +02:00
Tanguy Pruvot
c3d10db873
algos: move cmdline algo/alias parser in a func
2015-10-15 08:49:40 +02:00
Tanguy Pruvot
e5d1cf8416
lyra2v2: typo in type, it's a struct of 4x uint2 :p
2015-10-15 06:48:42 +02:00
Tanguy Pruvot
6a9280a045
lyra2v2: set a better TPB for intensity 20 (sm52)
...
use sp forced unroll in skein and do some cleanup...
2015-10-15 02:01:34 +02:00
Tanguy Pruvot
5a08c21355
diff: store solved blocks count, update the api
...
Also show the real target diff on pools for the algos with a factor (lyra)
requires the --show-diff parameter, may be used as default in the final 1.7
2015-10-14 20:21:14 +02:00
Tanguy Pruvot
32f212469b
lyra2/v2: fixes for vstudio
2015-10-14 03:31:18 +02:00
Tanguy Pruvot
5bf1f98200
various fixes for SM 2.1 and the benchmark
...
X11+ algos and quark are not compatible for the moment
but these ones are :
Benchmark results for Gigabyte GTX 460 (SM 2.1 / 1 GB):
blakecoin : 159090.5 kH/s, 1 MB, 1048576 thr.
blake : 70208.9 kH/s, 1 MB, 1048576 thr.
bmw : 122802.6 kH/s, 65 MB, 2097152 thr.
deep : 3533.6 kH/s, 33 MB, 524288 thr.
fugue256 : 43177.9 kH/s, 17 MB, 524288 thr.
heavy : 4118.2 kH/s, 147 MB, 524032 thr.
keccak : 18673.1 kH/s, 129 MB, 2097152 thr.
luffa : 28816.0 kH/s, 257 MB, 4194304 thr.
lyra2 : 213.7 kH/s, 570 MB, 65536 thr.
mjollnir : 3895.6 kH/s, 147 MB, 524032 thr.
nist5 : 1101.4 kH/s, 67 MB, 1048576 thr.
penta : 501.6 kH/s, 21 MB, 327680 thr.
skein : 5432.4 kH/s, 65 MB, 1048576 thr.
skein2 : 6788.9 kH/s, 33 MB, 524288 thr.
whirlpool : 688.5 kH/s, 33 MB, 524288 thr.
zr5 : 122.5 kH/s, 86 MB, 262144 thr.
2015-10-14 02:59:54 +00:00
Tanguy Pruvot
8fd2739a65
lyra2: support for SM 2.1 cards (GTX 460)
...
also fix the build (scrypt) for this arch.
else, 318.26 kH/s on a GTX 460...
2015-10-14 01:12:41 +00:00
Tanguy Pruvot
fc84c719e9
lyra2: improve cuda implementation (part 1, SM5+)
...
based on the new djm34 method, 2x faster than first version
cleaned and tuned for the GTX 750/960 (linux / cuda 6.5)
2015-10-13 00:57:29 +02:00
Tanguy Pruvot
9dfa757dc7
warn on cuda errors + various small changes
...
The full benchmark can now be launched with "ccminer --benchmark"
add a new helper function which logs a warning with the last cuda error
(not shown with the quiet option) : CUDA_LOG_ERROR();
it can be used where miner.h is included (.c/.cpp/.cu)
fix x14 (in ccminer.cpp), a break was missing in switch..case
2015-10-12 08:46:13 +02:00
Tanguy Pruvot
8fbfe2cfda
add gpulog() function helper, simple and multi-threads
...
when using multiple cpu threads per gpu, use the T prefix, ex:
[2015-10-11 09:52:49] GPU #0 : app clocks set to P0 (3600/1228)
vs
[2015-10-11 09:52:51] GPU T0: MSI GTX 960, 5953.35 kH/s
Only thr_id is required, the function takes care of the dev id
2015-10-11 10:46:05 +02:00
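The prefix rule described in 8fbfe2cfda can be sketched as follows. This is not the gpulog() implementation: `gpu_log_prefix`, the `device_map` vector (thread id to device id) and the `threads_per_gpu` parameter are assumptions made for illustration, and only the two prefix shapes come from the log lines quoted in the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch (not the real gpulog): build the log prefix for a
// mining thread. `device_map[thr_id]` is assumed to give the GPU device id
// handled by that thread, so callers only need to pass thr_id.
std::string gpu_log_prefix(int thr_id, const std::vector<int>& device_map, int threads_per_gpu)
{
    assert(thr_id >= 0 && static_cast<std::size_t>(thr_id) < device_map.size());
    assert(threads_per_gpu >= 1);

    if (threads_per_gpu > 1) {
        // Several CPU threads share one GPU: identify the thread ("T" prefix).
        return "GPU T" + std::to_string(thr_id) + ":";
    }
    // One thread per GPU: identify the device.
    return "GPU #" + std::to_string(device_map[thr_id]) + " :";
}
```

For example, with a map of `{0, 2}` and one thread per GPU, thread 1 logs as `GPU #2 :`, while with two threads on the same device it logs as `GPU T1:`.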
Tanguy Pruvot
58c0bb5c02
intensity: fix typo and drop old function
2015-10-11 08:39:07 +02:00
Tanguy Pruvot
d195f2e8a2
intensity: do not reduce throughput before init
...
Else the memory allocated could be less than required later
btw, use the new "cuda" function to apply intensity/throughput
2015-10-11 05:01:41 +02:00
Tanguy Pruvot
c6dcc5e5cf
benchmark: show mem and default throughput in results
...
and prepare a new function to get the default intensity
also, take care of multiple threads per gpu...
2015-10-11 04:38:28 +02:00
Tanguy Pruvot
8db5a0bc9e
blake: change dynamic round system
...
blakecoin was conflicting with lyra2, set the rounds more properly
2015-10-11 03:46:30 +02:00
Tanguy Pruvot
c7cfe0e2ca
Fix windows linkage, C/C++ mismatch
2015-10-11 00:55:22 +02:00
Tanguy Pruvot
ab5cc7162e
refactor: create bench.cpp and algos.h
...
Also improve multi-thread benchmark synchronization with pthread barriers
2015-10-11 00:10:27 +02:00
Tanguy Pruvot
c2214091ae
benchmark: free last memory leaks on algo switch
...
my original lyra2 implementation remains to be fixed... (cuda_lyra2.cu)
I guess some kind of memory overflow forces the driver to allocate
memory... but I was unable to free it without a device reset.
2015-10-10 02:15:32 +02:00