43 Commits

Author SHA1 Message Date
Tanguy Pruvot
a237601747 1.7.1 release
set schedule flags to reduce linux cpu usage without MyStreamSynchronize()
2016-01-26 20:43:16 +01:00
Tanguy Pruvot
e50556b637 various changes, cleanup for the release
small fixes to better handle multiple threads per gpu

explicitly report that quark is not compatible with SM 2.1 (compact shuffle)
2015-11-04 14:59:59 +01:00
Tanguy Pruvot
61ff92b5b4 never interrupt global benchmark with found nonces
fix some algo weird hashrates (like blake)
and reset device between algos, for better accuracy

but this reset doesn't seem to be enough to bench all algos correctly...

to test on linux, could be a driver issue...

heavy: fix first alloc and indent with tabs...
2015-11-01 21:12:50 +01:00
Tanguy Pruvot
8d4d4d65ce cuda: header for common kernel functions (quark/x11)
Had been thinking about doing that for months ;) let's go
2015-10-25 06:54:17 +01:00
Tanguy Pruvot
d43dc9a021 use blake512 sp kernels on SM 5+ (80+64)
import and keep my code for older archs, like skein 64

reduce the gap between our versions...

+150kH x11   GTX 960 / +30kH  750Ti
+900kH quark GTX 960 / +230kH 750Ti
2015-10-24 13:43:22 +02:00
Tanguy Pruvot
355b835ae0 benchmark: enhance the mem leak detection
reduce "false" warnings, and ignore unrelated/small ones <= 1 MB

On windows the gpu memory can be allocated by other processes

+ some cleanup in algos... (free/gpulog)
2015-10-16 22:04:30 +02:00
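The leak check described in 355b835ae0 can be illustrated with a small sketch. This is not the code from the commit: `is_reportable_leak` and `kLeakToleranceBytes` are hypothetical names, and the sketch assumes the benchmark compares the free GPU memory measured before and after an algo run (for example with values like those reported by `cudaMemGetInfo`). Differences of 1 MB or less are ignored, as the commit message states.

```cpp
#include <cstddef>

// Hypothetical tolerance, taken from the "<= 1 MB" wording of the commit message.
constexpr std::size_t kLeakToleranceBytes = 1024 * 1024;

// Hypothetical helper (not from ccminer): returns true only when the free
// memory dropped by more than the tolerance between two measurements.
bool is_reportable_leak(std::size_t free_before, std::size_t free_after,
                        std::size_t tolerance = kLeakToleranceBytes)
{
    // More free memory than before cannot be a leak of ours; on Windows
    // other processes may allocate or release GPU memory at any time.
    if (free_after >= free_before)
        return false;
    return (free_before - free_after) > tolerance;
}
```

With this sketch, a drop of exactly 1 MB is not reported, while a drop of 2 MB is.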
Tanguy Pruvot
9dfa757dc7 warn on cuda errors + various small changes
The full benchmark can now be launched with "ccminer --benchmark"

add a new helper function which logs a warning with the last cuda error
(not shown with the quiet option) : CUDA_LOG_ERROR();
it can be used where miner.h is included (.c/.cpp/.cu)

fix x14 (in ccminer.cpp), a break was missing in switch..case
2015-10-12 08:46:13 +02:00
Tanguy Pruvot
d195f2e8a2 intensity: do not reduce throughput before init
Else the memory allocated could be less than required later

btw, use the new "cuda" function to apply intensity/throughput
2015-10-11 05:01:41 +02:00
Tanguy Pruvot
c2214091ae benchmark: free last memory leaks on algo switch
my original lyra2 implementation remains to be fixed... (cuda_lyra2.cu)

I guess some kind of memory overflow forces the driver to allocate
memory... but was unable to free it without device reset.
2015-10-10 02:15:32 +02:00
Tanguy Pruvot
922c2a5cd7 algos: free allocated mem for algo switch
All can be freed properly now, except scrypt (reset) and lyra2 (leak)
2015-10-08 21:35:30 +02:00
Tanguy Pruvot
ee93927fac diff: use the new function in all algos 2015-10-07 20:10:15 +02:00
Tanguy Pruvot
e1c4b3042c algos: add functions to free allocated resources
Will be used later for algo switching

not really tested yet...
2015-09-25 07:51:57 +02:00
Tanguy Pruvot
5308898d1c start v1.7, apply new prototypes to all algos 2015-09-23 15:42:17 +02:00
Tanguy Pruvot
42bcb91ca0 x11: update sp luffa/cube to get closer x11 speeds..
I had to clean it... lots of unused defines...
2015-06-17 02:31:15 +02:00
Tanguy Pruvot
2113be6eec blake80: some changes and launch bounds, no perf changes 2015-04-24 14:12:21 +02:00
Tanguy Pruvot
3d3f2e2cb5 warnings: use the right device id (device_map[thr_id]) 2015-04-23 09:41:56 +02:00
KlausT
ae8e863591 remove uint32_t cast 2015-03-12 01:01:47 +01:00
Tanguy Pruvot
e6112e878d cleanup: use unsigned throughput parameters
Yes, it's a big commit, was waiting for 1.6 to do that...
Sorry for your possible merge issues ;)
2015-02-28 14:05:09 +01:00
Tanguy Pruvot
26b51a557b Allow different intensity per device
and clean up the old variables, which are no longer required
2015-01-24 11:17:29 +01:00
Tanguy Pruvot
45206e49c1 hamsi: TPB of 128 gives better results (+10kh) 2015-01-24 07:17:12 +01:00
Tanguy Pruvot
2a5233f56e api: report throughput when default 2015-01-22 06:28:59 +01:00
Tanguy Pruvot
cafd4477d7 Handle a maximum of 16 gpus (vs 8 before)
Some cards have 2 gpus on board...
2015-01-22 04:55:27 +01:00
Tanguy Pruvot
c3bdb623e8 Check and submit multiple nonces in one loop
Added to most algos, checkhash function scans a big range
and can find multiple nonces at once if the difficulty is low.

Stop ignoring them, submit second one if found...

Clean the draft code for rc=2 implemented for blake and pentablake

btw... fix the reduced displayed hashrate when a nonce is found...

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2014-12-05 15:53:40 +00:00
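The idea in c3bdb623e8, scanning a range and keeping a second nonce instead of stopping at the first, can be sketched as follows. This is an illustration under stated assumptions, not the ccminer implementation: `scan_range`, the 64-bit `hash` callback, and the `max_found` limit are all hypothetical, and the real check runs on the GPU against a full-width target.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical sketch: walk `count` nonces starting at `first` and collect
// every nonce whose hash is at or below `target`, up to `max_found` results.
std::vector<uint32_t> scan_range(uint32_t first, uint32_t count, uint64_t target,
                                 const std::function<uint64_t(uint32_t)>& hash,
                                 std::size_t max_found = 2)
{
    std::vector<uint32_t> found;
    for (uint32_t i = 0; i < count && found.size() < max_found; ++i) {
        const uint32_t nonce = first + i;
        if (hash(nonce) <= target)
            found.push_back(nonce); // keep going: a second nonce may follow
    }
    return found;
}
```

The caller would then submit each returned nonce in turn. When the difficulty is low (a high `target`), several nonces in one scanned range are more likely, which is the case the commit message describes.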
Tanguy Pruvot
118a6be361 checkhash: simplify the common function
use klaus trivial function, the old code has always been a bit weird..

split cuda_check_cpu_hash_64 in two functions, keep old for branched stuff
2014-12-01 00:20:40 +01:00
Tanguy Pruvot
8ad180cc70 various small changes
heavy: reduce by 256 threads default intensity to all -i 20
cuda: put static thread init bools outside the code (made once)
api: fix nvml header to build without
2014-11-28 20:57:35 +01:00
Tanguy Pruvot
6ae28162db various extern cleanup + api history uids and gpu SM
uids could be useful to create graphs from history data

Note: please do a clean build after this commit (changes in miner.h)
2014-11-26 11:55:42 +01:00
Tanguy Pruvot
9b1ff1280e Allow intermediate intensity (decimals)
Sample with -i 18.5
  Adding 131072 threads to intensity 18, 393216 cuda threads

And with -i 19.5
  Adding 262144 threads to intensity 19, 786432 cuda threads
2014-11-25 19:57:56 +01:00
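The two samples in 9b1ff1280e are consistent with a simple rule: the integer part of the intensity gives a power-of-two thread count, and the decimal part adds that fraction of the same count on top. The sketch below reproduces that arithmetic. It is an assumption derived from the sample output only; `threads_for_intensity` and `IntensityThreads` are hypothetical names, and the real code may round the added threads differently.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical result type: the base thread count, the threads added for the
// decimal part, and their sum.
struct IntensityThreads {
    uint32_t base;
    uint32_t added;
    uint32_t total;
};

// Hypothetical sketch of the arithmetic behind "-i 18.5".
// Assumes 0 <= intensity < 31 so that the shift and the sum fit in 32 bits.
IntensityThreads threads_for_intensity(double intensity)
{
    const uint32_t whole = static_cast<uint32_t>(std::floor(intensity));
    const double frac = intensity - whole;          // decimal part, e.g. 0.5
    const uint32_t base = 1u << whole;              // 2^18 = 262144 for -i 18
    const uint32_t added = static_cast<uint32_t>(std::floor(frac * base));
    return {base, added, base + added};
}
```

For `-i 18.5` this gives 262144 + 131072 = 393216 threads, and for `-i 19.5` it gives 524288 + 262144 = 786432 threads, matching the figures quoted in the commit message.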
Tanguy Pruvot
71f9003901 x13: use tsiv hamsi implementation (+70KH) 2014-11-24 23:01:41 +01:00
Tanguy Pruvot
c88750332c simd512: restore SM3/3.5 perfs
Simple change which affects all algos based on SIMD512

fresh, qubit, s3, x11 to x17...
2014-11-23 19:07:06 +01:00
sp-hash
f0d91ab8a6 Luffa and simd merged to one kernel.
Small echo rewrite. +10KHASH on the 650(compute 3.0)

tpruvot: add Linux Makefile - Force to 80 registers (else -30KH/s)

Note : the hashrate seems more constant with this change
2014-11-23 07:04:07 +01:00
Tanguy Pruvot
73f22b237a Prepare trap of hardware/mem failures 2014-11-20 18:44:25 +01:00
Tanguy Pruvot
fe4ad36b73 intensity: sign warnings fixes min(i,u) 2014-11-17 14:48:55 +01:00
Tanguy Pruvot
c859041993 quark/blake512 opt. pointed by sp without asm
indeed, the pragma unroll doesn't always make things faster

asm part... to check later
2014-11-17 00:01:32 +01:00
Tanguy Pruvot
438308b3a2 Rework benchmark mode and min/max range
Was maybe my fault, but the benchmark mode was
always recomputing from nonce 0.

Also fix blake if -d 1 is used (one thread but second gpu)

stats: do not use thread id as key, prefer gpu id...
2014-11-16 23:28:18 +01:00
Tanguy Pruvot
b128312efb cuda: store device SM in a global var
sample usage made for blake and fugue (higher intensity for SM5.2)

add these to cuda_helper and clean unused code
2014-11-11 19:11:16 +01:00
Tanguy Pruvot
11c5ec810d Handle intensity param in all algos
and add a check related to start/max nonce params
2014-11-09 22:27:32 +01:00
Tanguy Pruvot
7cc5222394 Move common check_cpu functions to root 2014-09-10 00:27:01 +02:00
Tanguy Pruvot
b4e690b486 sources: switch to UTF-8 2014-08-21 08:27:48 +02:00
Tanguy Pruvot
d9ea5f72ce Remove duplicated defines present in cuda_helper.h
also add cudaDeviceReset() on Ctrl+C for nvprof
2014-08-19 03:29:11 +02:00
Tanguy Pruvot
a9a3ad8afc cuda: check for errors on cuda mem alloc 2014-08-17 22:41:05 +02:00
Tanguy Pruvot
cf7351d138 x10 funcs cleanup, we don't need host constant tables 2014-08-15 03:40:13 +02:00
Tanguy Pruvot
06763c20b1 Implement x14 (cuda + cpu functions)
Project was updated for VS2013 and CUDA SDK 6.5

also add a --cputest function to dump cpu hash results

TODO: x15 is not fully functional, but first loop seems ok

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2014-08-12 14:47:03 +02:00
Christian Buchner
d99b91ea65 adding third party X13 and Diamond Groestl code contributions. 2014-06-15 14:31:20 +02:00