91 Commits

Author SHA1 Message Date
Tanguy Pruvot
683dc0e149 VeltorCoin Streebog based algo (veltor)
also known as "Thor's Riddle"... yes sure ;)

Credits to ocminer who found and "implemented" it.

Note: tested "ok" on x64 and CUDA 6.5 x86, not on 7.5 and 8.0 x86

PS: Don't have the time for a more proper CUDA implementation of Streebog
2016-08-18 18:47:37 +02:00
Tanguy Pruvot
de738ccc2b x11: secure groestl against possible cuda errors
big cleanup...
2016-08-06 12:56:02 +02:00
Tanguy Pruvot
0a0fd33cac attempt to reduce shared mem errors 2016-08-06 12:56:02 +02:00
Tanguy Pruvot
85c212eaad implement x11evo algo
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2016-05-31 20:05:15 +02:00
Tanguy Pruvot
a237601747 1.7.1 release
set schedule flags to reduce linux cpu usage without MyStreamSynchronize()
2016-01-26 20:43:16 +01:00
Tanguy Pruvot
76a22479b1 whirlpool midstate and debug/trace defines
+ new cuda_debug.cuh include to trace gpu data

Happy new year!

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2016-01-01 10:40:26 +01:00
Tanguy Pruvot
8ceb5cfd65 sib: add missing algo free entry + opt 64 2016-01-01 07:58:59 +01:00
Tanguy Pruvot
e75b26feb4 sib coin algo (X11 + Streebog)
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-12-31 19:11:48 +01:00
Tanguy Pruvot
61ff92b5b4 never interrupt global benchmark with found nonces
fix some algo weird hashrates (like blake)
and reset device between algos, for better accuracy

but this reset doesn't seem to be enough to bench all algos correctly...

to test on linux, could be a driver issue...

heavy: fix first alloc and indent with tabs...
2015-11-01 21:12:50 +01:00
Tanguy Pruvot
2308f555c3 simd: cleanup and ignore linux host warning 2015-11-01 13:35:36 +01:00
Tanguy Pruvot
0d9d3520ac simd: add support for SM 2.1 devices
Add support for x11..x17, s3, fresh and qubit

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-11-01 12:37:52 +01:00
Tanguy Pruvot
8d4d4d65ce cuda: header for common kernel functions (quark/x11)
Had been thinking about doing that for months ;) let's go
2015-10-25 06:54:17 +01:00
Tanguy Pruvot
d43dc9a021 use blake512 sp kernels on SM 5+ (80+64)
import and keep my code for older archs, like skein 64

reduce the gap between our versions...

+150kH x11   GTX 960 / +30kH  750Ti
+900kH quark GTX 960 / +230kH 750Ti
2015-10-24 13:43:22 +02:00
Tanguy Pruvot
ef817df79a import sp skein512 unrolled 64-bytes kernel (+0.6% x11)
Quark and S3 are now a bit faster (+1 %)
x11 gets +0.6 % (+20kH/s on a 750Ti, +30kH/s on a 960)

80 bytes implementation to do/test ... (skein/skein2)

but keep my previous version for older devices...
2015-10-23 09:43:20 +02:00
Tanguy Pruvot
355b835ae0 benchmark: enhance the mem leak detection
reduce "false" warnings, and ignore unrelated/small ones <= 1 MB

On windows the gpu memory can be allocated by other processes

+ some cleanup in algos... (free/gpulog)
2015-10-16 22:04:30 +02:00
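The leak check described in the commit above can be pictured as a comparison of free GPU memory before and after an algo run, ignoring differences of 1 MB or less. The following is only a minimal sketch under that assumption: the function name `is_reportable_leak` and its parameters are hypothetical and not taken from the repository, and the two values are assumed to come from something like the `free` output of `cudaMemGetInfo()`.

```cpp
#include <cstddef>

// Hypothetical helper, not the repository's code.
// free_before / free_after: free device memory in bytes, sampled before
// and after running an algo (assumed to come from cudaMemGetInfo()).
bool is_reportable_leak(std::size_t free_before, std::size_t free_after)
{
    // Differences of 1 MB or less are treated as noise, since other
    // processes may allocate GPU memory as well (the Windows case above).
    const std::size_t threshold = 1024u * 1024u;

    // Free memory grew or stayed the same: nothing was leaked.
    if (free_after >= free_before)
        return false;

    return (free_before - free_after) > threshold;
}
```

The strict comparison means a difference of exactly 1 MB is still ignored, which matches the "<= 1 MB" wording of the commit message.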
Tanguy Pruvot
d195f2e8a2 intensity: do not reduce throughput before init
Else the memory allocated could be less than required later

btw, use the new "cuda" function to apply intensity/throughput
2015-10-11 05:01:41 +02:00
Tanguy Pruvot
922c2a5cd7 algos: free allocated mem for algo switch
All can be freed properly now, except script (reset) and lyra2 (leak)
2015-10-08 21:35:30 +02:00
Tanguy Pruvot
ee93927fac diff: use the new function in all algos 2015-10-07 20:10:15 +02:00
Tanguy Pruvot
e1c4b3042c algos: add functions to free allocated resources
Will be used later for algo switching

not really tested yet...
2015-09-25 07:51:57 +02:00
Tanguy Pruvot
5308898d1c start v1.7, apply new prototypes to all algos 2015-09-23 15:42:17 +02:00
Tanguy Pruvot
c5df142124 Add c11 algo (x11 variant)
Used by Chaincoin and Flaxscript
2015-06-29 11:46:16 +02:00
Tanguy Pruvot
7981e83db7 nvml: separated vendor id to string function
for the day nvidia will fix their nvmlDeviceGetPciInfo api..
2015-06-23 10:01:31 +02:00
Tanguy Pruvot
e21c75793a Revert "x11: improve aes (shavite/echo)"
causes a lot of cpu validation errors on windows,
to be double-checked in the next version...

This reverts commit 1187a6e7e3211f0216111554a55b685687003b11.
2015-06-23 09:27:40 +02:00
Tanguy Pruvot
1187a6e7e3 x11: improve aes (shavite/echo)
shavite is faster, echo doesn't really change due to the reg. overload

These changes allow custom launch bounds without other code changes and improve
portability across different devices.

also set a minimum throughput to 1024 for these algos (shared mem req. size)
2015-06-19 05:23:06 +02:00
Tanguy Pruvot
9f5744d4c0 luffa/cube: fine tuning of maxregcount for the 750Ti
This allows 69 regs to be used (tested on linux). 69 or 72 make
the compiler use 64 regs, which is not enough on the 750Ti
for optimal performance...
2015-06-17 03:58:31 +02:00
Tanguy Pruvot
634bea21f5 luffa/cube: unroll 1 really required on the 9xx 2015-06-17 03:39:48 +02:00
Tanguy Pruvot
42bcb91ca0 x11: update sp luffa/cube to get closer x11 speeds..
I had to clean it... lots of unused defines...
2015-06-17 02:31:15 +02:00
Tanguy Pruvot
2113be6eec blake80: some changes and launch bounds, no perf changes 2015-04-24 14:12:21 +02:00
Tanguy Pruvot
3d3f2e2cb5 warnings: use the right device id (device_map[thr_id]) 2015-04-23 09:41:56 +02:00
Tanguy Pruvot
e7ae27137e x11/qubit: remove some extra MyStreamSynchronize
only one per loop is required to prevent 100% cpu usage
2015-04-15 05:30:22 +02:00
Tanguy Pruvot
d58d53f2b2 update README, small changes, prepare release 1.6.1
still need a SM 3.0 fix for skein...
2015-04-14 23:28:00 +02:00
Tanguy Pruvot
4f43abb402 bmw512: indent and restore SM 3.0 compat
could be also the source of the problem seen with CUDA 7

restored the code before sp/klaus changes for SM 3.0 devices...
2015-03-28 12:01:50 +01:00
KlausT
ae8e863591 remove uint32_t cast 2015-03-12 01:01:47 +01:00
Tanguy Pruvot
35cc5908ee windows: return to normal priority, fix json decref
the jansson error seems only seen in windows debug mode
2015-03-10 19:14:15 +01:00
Tanguy Pruvot
ebd23bcc66 whirlpoolx: real fix for multi gpus
The main problem was the array allocations, which should be made per cpu

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-03-08 22:56:04 +01:00
Tanguy Pruvot
9c4158aadb debug: x11 algo traces for cuda 7 problem 2015-03-02 16:29:46 +01:00
Tanguy Pruvot
e6112e878d cleanup: use unsigned throughput parameters
Yes, it's a big commit, was waiting for 1.6 to do that...
Sorry for your possible merge issues ;)
2015-02-28 14:05:09 +01:00
Tanguy Pruvot
26b51a557b Allow different intensity per device
and clean up the old variables, which are no longer required
2015-01-24 11:17:29 +01:00
Tanguy Pruvot
2a5233f56e api: report throughput when default 2015-01-22 06:28:59 +01:00
Tanguy Pruvot
cafd4477d7 Handle a maximum of 16 gpus (vs 8 before)
Some cards have 2 gpus on board...
2015-01-22 04:55:27 +01:00
Tanguy Pruvot
b521acb480 groestl: use sp bitslice enhancement, prepare SM 2.x variant
todo: simd512 SM 2.x variant (shfl op), and groestl/myriad functions
2015-01-19 00:42:14 +01:00
Tanguy Pruvot
90efbdcece simd cleanup 2014-12-19 09:16:55 +01:00
Tanguy Pruvot
ec5a48f420 x11: small simd512 gpu_expand improvement 2014-12-19 09:16:55 +01:00
Tanguy Pruvot
6c7fce187b x11: use KlausT optimisation (+20 KHs)
But use a define in AES to choose whether to use the initial device memcpy

I already tried to use direct device constants everywhere
and it's not faster for big arrays (difference is small)

also change launch bounds to reduce spills (72 regs)

to check on windows too, could improve the perf... or not
2014-12-06 04:14:36 +01:00
Tanguy Pruvot
c3bdb623e8 Check and submit multiple nonces in one loop
Added to most algos, checkhash function scans a big range
and can find multiple nonces at once if the difficulty is low.

Stop ignoring them, submit second one if found...

Clean the draft code for rc=2 implemented for blake and pentablake

btw... fix the reduced displayed hashrate when a nonce is found...

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2014-12-05 15:53:40 +00:00
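The idea in the commit above (one scan over a nonce range can yield more than one valid nonce when the difficulty is low) can be sketched on the CPU side as follows. This is an illustration only: `toy_hash` is a made-up stand-in and not a mining hash, and `scan_range` with its parameters is a hypothetical name, not the repository's checkhash function.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a real hash function (assumption, for illustration only).
static uint32_t toy_hash(uint32_t nonce)
{
    uint32_t x = nonce * 2654435761u;
    x ^= x >> 16;
    return x;
}

// Hypothetical helper: scan `count` nonces starting at `first` and collect
// every nonce whose hash is at or below `target`, up to `max_found` results.
// Keeping all hits, instead of a single one, is what lets the caller submit
// a second share found in the same loop.
std::vector<uint32_t> scan_range(uint32_t first, uint32_t count,
                                 uint32_t target, std::size_t max_found = 2)
{
    std::vector<uint32_t> found;
    for (uint32_t i = 0; i < count && found.size() < max_found; ++i) {
        uint32_t nonce = first + i;
        if (toy_hash(nonce) <= target)
            found.push_back(nonce);
    }
    return found;
}
```

With an easy target such as `0xFFFFFFFF`, every nonce qualifies and the scan stops after `max_found` hits; the default of 2 mirrors the "submit second one if found" wording.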
Tanguy Pruvot
f387898ead Prepare multiple nonces support in one loop (if found)
Tested on x11, which sometimes finds 3 nonces in one call;
currently they are ignored because only the biggest is kept...

This commit doesn't fix that, but will allow enhancing the share rate later...
2014-12-05 10:16:06 +01:00
Tanguy Pruvot
118a6be361 checkhash: simplify the common function
use klaus trivial function, the old code has always been a bit weird..

split cuda_check_cpu_hash_64 in two functions, keep old for branched stuff
2014-12-01 00:20:40 +01:00
Tanguy Pruvot
8ad180cc70 various small changes
heavy: reduce by 256 threads default intensity to all -i 20
cuda: put static thread init bools outside the code (made once)
api: fix nvml header to build without
2014-11-28 20:57:35 +01:00
Tanguy Pruvot
6ae28162db various extern cleanup + api history uids and gpu SM
uids could be useful to create graphs from history data

Note: please do a clean build after this commit (changes in miner.h)
2014-11-26 11:55:42 +01:00
Tanguy Pruvot
9b1ff1280e Allow intermediate intensity (decimals)
Sample with -i 18.5
  Adding 131072 threads to intensity 18, 393216 cuda threads

And with -i 19.5
  Adding 262144 threads to intensity 19, 786432 cuda threads
2014-11-25 19:57:56 +01:00
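The two samples in the commit above are consistent with a simple rule: the integer part of the intensity gives a power-of-two thread count, and the decimal part adds that fraction of it on top. The sketch below reproduces the numbers under that assumption; the function name `intensity_to_threads` and its parameters are hypothetical and not taken from the repository.

```cpp
#include <cstdint>

// Hypothetical helper, derived only from the two samples in the commit text.
// Returns the total thread count for a fractional intensity and optionally
// reports how many threads the decimal part added.
uint32_t intensity_to_threads(double intensity, uint32_t *added_out = nullptr)
{
    // Clamp so that the shift below stays within 32 bits.
    if (intensity < 0.0)  intensity = 0.0;
    if (intensity > 31.0) intensity = 31.0;

    uint32_t base = static_cast<uint32_t>(intensity);   // integer part, e.g. 18
    double   frac = intensity - base;                   // decimal part, e.g. 0.5

    uint32_t threads = 1u << base;                      // 2^18 = 262144
    uint32_t added   = static_cast<uint32_t>(frac * threads); // 0.5 * 262144 = 131072

    if (added_out)
        *added_out = added;
    return threads + added;                             // 393216 for -i 18.5
}
```

For `-i 18.5` this gives 262144 + 131072 = 393216 threads, and for `-i 19.5` it gives 524288 + 262144 = 786432, matching both samples in the commit message.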