Tanguy Pruvot
42bcb91ca0
x11: update sp luffa/cube to get closer x11 speeds..
...
i had to clean it... lot of unused defines...
2015-06-17 02:31:15 +02:00
Tanguy Pruvot
2113be6eec
blake80: some changes and launch bounds, no perf changes
2015-04-24 14:12:21 +02:00
Tanguy Pruvot
3d3f2e2cb5
warnings: use the right device id (device_map[thr_id])
2015-04-23 09:41:56 +02:00
Tanguy Pruvot
d58d53f2b2
update README, small changes, prepare release 1.6.1
...
still need a SM 3.0 fix for skein...
2015-04-14 23:28:00 +02:00
Tanguy Pruvot
4f43abb402
bmw512: indent and restore SM 3.0 compat
...
could be also the source of the problem seen with CUDA 7
restored the code before sp/klaus changes for SM 3.0 devices...
2015-03-28 12:01:50 +01:00
KlausT
ae8e863591
remove uint32_t cast
2015-03-12 01:01:47 +01:00
Tanguy Pruvot
35cc5908ee
windows: return to normal priority, fix json decref
...
the jansson error seems only seen in windows debug mode
2015-03-10 19:14:15 +01:00
Tanguy Pruvot
ebd23bcc66
whirlpoolx: real fix for multi gpus
...
Main problem was the arrays allocations which should be made per cpu
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-03-08 22:56:04 +01:00
Tanguy Pruvot
9c4158aadb
debug: x11 algo traces for cuda 7 problem
2015-03-02 16:29:46 +01:00
Tanguy Pruvot
e6112e878d
cleanup: use unsigned throughput parameters
...
Yes, it's a big commit, was waiting for 1.6 to do that...
Sorry for your possible merge issues ;)
2015-02-28 14:05:09 +01:00
Tanguy Pruvot
26b51a557b
Allow different intensity per device
...
and clean the old variables, no more required
2015-01-24 11:17:29 +01:00
Tanguy Pruvot
2a5233f56e
api: report throughput when default
2015-01-22 06:28:59 +01:00
Tanguy Pruvot
cafd4477d7
Handle a maximum of 16 gpus (vs 8 before)
...
Some cards have 2 gpus on board...
2015-01-22 04:55:27 +01:00
Tanguy Pruvot
c3bdb623e8
Check and submit multiple nonces in one loop
...
Added to most algos, checkhash function scans a big range
and can find multiple nonces at once if the difficulty is low.
Stop ignoring them, submit second one if found...
Clean the draft code for rc=2 implemented for blake and pentablake
btw... fix the reduced displayed hashrate when a nonce is found...
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2014-12-05 15:53:40 +00:00
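The idea in the commit above (keep every nonce found in a scanned range instead of only one) can be sketched roughly as follows. This is a minimal illustration, not the ccminer code: `toy_hash`, `scan_range` and the 32-bit target comparison are all hypothetical stand-ins for the real checkhash path.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a real hash function (assumption, not from ccminer). */
static uint32_t toy_hash(uint32_t nonce)
{
    uint32_t x = nonce * 2654435761u;
    x ^= x >> 16;
    return x;
}

/*
 * Scan `count` nonces starting at `first` and store up to `max_found`
 * nonces whose hash is <= target. Returns how many were stored, so the
 * caller can submit each of them instead of dropping all but one.
 */
static size_t scan_range(uint32_t first, uint32_t count, uint32_t target,
                         uint32_t *found, size_t max_found)
{
    size_t n = 0;
    for (uint32_t i = 0; i < count && n < max_found; i++) {
        uint32_t nonce = first + i;
        if (toy_hash(nonce) <= target)
            found[n++] = nonce;
    }
    return n;
}
```

With a low difficulty (a high target) a single call can return more than one nonce; the caller would then loop over `found[0..n-1]` and submit each share.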
Tanguy Pruvot
f387898ead
Prepare multiple nonces support in one loop (if found)
...
Tested on x11, which sometimes finds 3 nonces in one call;
currently they are ignored because only the biggest was kept...
This commit doesn't fix that, but will allow enhancing the share rate later...
2014-12-05 10:16:06 +01:00
Tanguy Pruvot
118a6be361
checkhash: simplify the common function
...
use klaus trivial function, the old code has always been a bit weird..
split cuda_check_cpu_hash_64 in two functions, keep old for branched stuff
2014-12-01 00:20:40 +01:00
Tanguy Pruvot
8ad180cc70
various small changes
...
heavy: reduce by 256 threads default intensity to all -i 20
cuda: put static thread init bools outside the code (made once)
api: fix nvml header to build without
2014-11-28 20:57:35 +01:00
Tanguy Pruvot
9b1ff1280e
Allow intermediate intensity (decimals)
...
Sample with -i 18.5
Adding 131072 threads to intensity 18, 393216 cuda threads
And with -i 19.5
Adding 262144 threads to intensity 19, 786432 cuda threads
2014-11-25 19:57:56 +01:00
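The sample figures above are consistent with taking 2^N threads for the integer part N of the intensity and adding the fractional part of that same amount. A minimal sketch of that arithmetic, assuming this simple rule (the function name `intensity_to_threads` is hypothetical, and the real code may additionally round the extra threads to a block-size multiple):

```c
#include <stdint.h>

/*
 * Convert a decimal intensity into a thread count (assumed rule):
 *   base  = 2^floor(intensity)
 *   extra = fractional part * base
 * Only meaningful for 0 <= intensity < 31 in this sketch.
 */
static uint32_t intensity_to_threads(double intensity)
{
    uint32_t exponent = (uint32_t)intensity;       /* integer part, e.g. 18 */
    uint32_t base = 1u << exponent;                /* e.g. 262144 */
    double fraction = intensity - (double)exponent; /* e.g. 0.5 */
    uint32_t extra = (uint32_t)(fraction * base);  /* e.g. 131072 */
    return base + extra;
}
```

Under this rule `-i 18.5` gives 262144 + 131072 = 393216 threads and `-i 19.5` gives 524288 + 262144 = 786432, matching the two samples in the commit message.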
Tanguy Pruvot
c88750332c
simd512: restore SM3/3.5 perfs
...
Simple change which affects all algos based on SIMD512
fresh, qubit, s3, x11 to x17...
2014-11-23 19:07:06 +01:00
sp-hash
f0d91ab8a6
Luffa and simd merged to one kernel.
...
Small echo rewrite. +10KHASH on the 650(compute 3.0)
tpruvot: add Linux Makefile - Force to 80 registers (else -30KH/s)
Note : the hashrate seems more constant with this change
2014-11-23 07:04:07 +01:00
Tanguy Pruvot
73f22b237a
Prepare trap of hardware/mem failures
2014-11-20 18:44:25 +01:00
Tanguy Pruvot
bdfce54c3b
x11: restore default intensity to 19 on windows
2014-11-17 14:48:55 +01:00
Tanguy Pruvot
fe4ad36b73
intensity: sign warnings fixes min(i,u)
2014-11-17 14:48:55 +01:00
Tanguy Pruvot
c859041993
quark/blake512 opt. pointed by sp without asm
...
indeed, the pragma unroll doesn't always make things faster
asm part... to check later
2014-11-17 00:01:32 +01:00
Tanguy Pruvot
438308b3a2
Rework benchmark mode and min/max range
...
Was maybe my fault, but the benchmark mode was
always recomputing from nonce 0.
Also fix blake if -d 1 is used (one thread but second gpu)
stats: do not use thread id as key, prefer gpu id...
2014-11-16 23:28:18 +01:00
Tanguy Pruvot
11dbbcc12d
checkhash: some work on a faster variant (wip)
...
This should not be used for all algos... not enabled yet
todo: multiple nonces or blake32 style checkup
2014-11-16 17:37:02 +01:00
Tanguy Pruvot
14a41959f8
x11: switch to intensity 20 for SM>=5.2 750+970
2014-11-16 17:34:50 +01:00
Tanguy Pruvot
b128312efb
cuda: store device SM in a global var
...
sample usage made for blake and fugue (higher intensity for SM5.2)
add these to cuda_helper and clean unused code
2014-11-11 19:11:16 +01:00
Tanguy Pruvot
11c5ec810d
Handle intensity param in all algos
...
and add a check related to start/max nonce params
2014-11-09 22:27:32 +01:00
Tanguy Pruvot
93f4409dde
simd: then reindent the code
...
no changes, only error checks (cuda safe call)
2014-10-25 23:03:20 +02:00
Tanguy Pruvot
d8a23fa970
Tune quark part of Xn funcs
...
based on klaus commits, will increase a bit speed of most algos
PS: main increase is due to the register count tuning in Makefile
and for skein512 on linux, it's the ROTL64
but almost no changes on X11 : 2648MH/s vs 2630 before
2014-10-20 03:15:17 +02:00
Tanguy Pruvot
7cc5222394
Move common check_cpu functions to root
2014-09-10 00:27:01 +02:00
Tanguy Pruvot
95ac1d0f19
x11: adapt some blake 256 opts to 512 one
...
blake512: for the moment 6.2ms vs 7.12 before (+10%)
2014-09-09 17:55:07 +02:00
Tanguy Pruvot
b4e690b486
sources: switch to UTF-8
2014-08-21 08:27:48 +02:00
Tanguy Pruvot
d9ea5f72ce
Remove duplicated defines present in cuda_helper.h
...
also add cudaDeviceReset() on Ctrl+C for nvprof
2014-08-19 03:29:11 +02:00
Tanguy Pruvot
a9a3ad8afc
cuda: check for errors on cuda mem alloc
2014-08-17 22:41:05 +02:00
Tanguy Pruvot
06763c20b1
Implement x14 (cuda + cpu functions)
...
Project was updated for VS2013 and CUDA SDK 6.5
add also a --cputest function to dump cpu hash results
TODO: x15 is not fully functional, but first loop seems ok
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2014-08-12 14:47:03 +02:00
Christian Buchner
3b21069504
bump to revision V1.1 with Killer Groestl
2014-06-14 01:43:28 +02:00
Christian Buchner
be5ba30131
massive speed upgrade for the SIMD hash. AMD, be afraid.
2014-05-14 11:04:09 +02:00
Christian Buchner
44d38e3a9a
Simplification of the SIMD hash code (remove unnecessary lookup tables), increase X11 throughput value somewhat
2014-05-11 02:24:26 +02:00
Christian Buchner
af07302b4b
v1.0 - Yo, I heard y'all like X11
2014-05-10 00:29:59 +02:00