Tanguy Pruvot
276562b623
vstudio: remove remains, move sha in tree
+ small code fixes
2017-03-12 18:51:52 +01:00
Tanguy Pruvot
61231bc66c
fix various memory leaks on algo switch
2017-03-11 11:19:20 +01:00
Tanguy Pruvot
07ebcb544d
timetravel algo
+ new kernels jh512-80 groestl-80 and cubehash-80
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2017-03-07 20:16:15 +01:00
Tanguy Pruvot
c66e8622b3
api: report per thread cpu hash checks (ACC/REJ)
+ update all algos for that...
2017-02-07 06:26:02 +01:00
Tanguy Pruvot
0ff75791e5
migrate 2nd nonce storage of most algos
This allows keeping pdata[19] as the cursor between scans and, later, sorting them..
Remaining: heavy, scrypt, sia...
2017-01-29 05:46:45 +01:00
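A minimal CPU-only sketch of the idea in this commit: pdata[19] stays the scan cursor, while found nonces are stored in a separate array. The `struct work` fields, `MAX_NONCES`, `scan_range()` and `fake_hash_ok()` are hypothetical stand-ins for the real work structure and the GPU kernel plus CPU check.

```c
#include <stdint.h>
#include <stdbool.h>

#define MAX_NONCES 2 /* hypothetical: up to 2 candidate nonces per scan */

/* Hypothetical work unit: pdata[19] is the nonce cursor; found nonces are
 * stored separately so a result never overwrites the cursor. */
struct work {
	uint32_t pdata[20];
	uint32_t nonces[MAX_NONCES];
	int valid_nonces;
};

/* Hypothetical stand-in for the GPU kernel + CPU check:
 * accepts nonces that are multiples of 1000. */
static bool fake_hash_ok(uint32_t nonce)
{
	return nonce % 1000 == 0;
}

/* Scan `throughput` nonces starting at pdata[19], keep up to MAX_NONCES
 * hits, then advance the cursor past the scanned range. */
static int scan_range(struct work *w, uint32_t throughput)
{
	const uint32_t first = w->pdata[19];
	w->valid_nonces = 0;
	for (uint32_t i = 0; i < throughput; i++) {
		uint32_t nonce = first + i;
		if (fake_hash_ok(nonce) && w->valid_nonces < MAX_NONCES)
			w->nonces[w->valid_nonces++] = nonce;
	}
	w->pdata[19] = first + throughput; /* cursor for the next scan */
	return w->valid_nonces;
}
```

Because the cursor and the results live in different fields, the next scan always resumes right after the previous range, whatever was found.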
Tanguy Pruvot
36aedbb48e
veltor update, 10x faster :p
From Alexis' work; sib hash rate is also at 200%..
2016-11-03 18:54:29 +01:00
Tanguy Pruvot
9eead77027
diff: show by default, rework shares diff storage
This will later allow more gpu candidates.
Note: this is unfinished work; the previous behavior is kept for now.
To finish this, all algo solutions must be migrated and the attributes of submitted nonces stored.
It's required to handle a different share diff per nonce and to fix the possible solved count error (if only 1 of 2 nonces is solved).
2016-09-27 09:03:24 +02:00
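To illustrate the solved count problem: if each submitted nonce keeps its own share diff, a scan returning 2 nonces where only 1 reaches the network diff counts as 1 solved, not 2. A minimal sketch; `struct scan_result` and `count_solved()` are hypothetical names.

```c
#define MAX_NONCES 2

/* Hypothetical per-scan result: each nonce keeps its own share diff. */
struct scan_result {
	unsigned int nonces[MAX_NONCES];
	double sharediff[MAX_NONCES];
	int valid_nonces;
};

/* Count only the nonces whose own share diff reaches the network diff,
 * instead of marking the whole scan as solved. */
static int count_solved(const struct scan_result *r, double net_diff)
{
	int solved = 0;
	for (int i = 0; i < r->valid_nonces && i < MAX_NONCES; i++)
		if (r->sharediff[i] >= net_diff)
			solved++;
	return solved;
}
```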
Tanguy Pruvot
34e97bf3e6
Show intensity on init for all algos
2016-09-27 00:33:06 +02:00
Tanguy Pruvot
683dc0e149
VeltorCoin Streebog based algo (veltor)
also known as "Thor's Riddle"... yes sure ;)
Credits to ocminer who found and "implemented" it.
Note: tested "ok" on x64 and CUDA 6.5 x86, not on 7.5 and 8.0 x86
PS: I don't have the time for a proper CUDA implementation of Streebog
2016-08-18 18:47:37 +02:00
Tanguy Pruvot
de738ccc2b
x11: secure groestl against possible cuda errors
big cleanup...
2016-08-06 12:56:02 +02:00
Tanguy Pruvot
0a0fd33cac
attempt to reduce shared mem errors
2016-08-06 12:56:02 +02:00
Tanguy Pruvot
85c212eaad
implement x11evo algo
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2016-05-31 20:05:15 +02:00
Tanguy Pruvot
a237601747
1.7.1 release
set schedule flags to reduce linux cpu usage without MyStreamSynchronize()
2016-01-26 20:43:16 +01:00
Tanguy Pruvot
76a22479b1
whirlpool midstate and debug/trace defines
+ new cuda_debug.cuh include to trace gpu data
Happy new year!
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2016-01-01 10:40:26 +01:00
Tanguy Pruvot
8ceb5cfd65
sib: add missing algo free entry + opt 64
2016-01-01 07:58:59 +01:00
Tanguy Pruvot
e75b26feb4
sib coin algo (X11 + Streebog)
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-12-31 19:11:48 +01:00
Tanguy Pruvot
61ff92b5b4
never interrupt global benchmark with found nonces
fix some weird algo hashrates (like blake)
and reset the device between algos, for better accuracy
but this reset doesn't seem to be enough to bench all algos correctly...
to test on linux, could be a driver issue...
heavy: fix first alloc and indent with tabs...
2015-11-01 21:12:50 +01:00
Tanguy Pruvot
2308f555c3
simd: cleanup and ignore linux host warning
2015-11-01 13:35:36 +01:00
Tanguy Pruvot
0d9d3520ac
simd: add support for SM 2.1 devices
Add support for x11..x17, s3, fresh and qubit
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-11-01 12:37:52 +01:00
Tanguy Pruvot
8d4d4d65ce
cuda: header for common kernel functions (quark/x11)
Was thinking about doing that for months ;) let's go
2015-10-25 06:54:17 +01:00
Tanguy Pruvot
d43dc9a021
use blake512 sp kernels on SM 5+ (80+64)
import and keep my code for older archs, like skein 64
reduce the gap between our versions...
+150kH x11 GTX 960 / +30kH 750Ti
+900kH quark GTX 960 / +230kH 750Ti
2015-10-24 13:43:22 +02:00
Tanguy Pruvot
ef817df79a
import sp skein512 unrolled 64-bytes kernel (+0.6% x11)
Quark and S3 are now a bit faster (+1 %)
x11 gets +0.6 % (+20kH/s on a 750ti, +30kH on a 960)
80 bytes implementation to do/test ... (skein/skein2)
but keep my previous version for older devices...
2015-10-23 09:43:20 +02:00
Tanguy Pruvot
355b835ae0
benchmark: enhance the mem leak detection
reduce "false" warnings, and ignore unrelated/small ones <= 1 MB
On windows, gpu memory can be allocated by other processes
+ some cleanup in algos... (free/gpulog)
2015-10-16 22:04:30 +02:00
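A sketch of the 1 MB rule, assuming the check compares free gpu memory sampled before and after a benchmark run (for example with `cudaMemGetInfo()`). `leak_suspected()` and `LEAK_THRESHOLD` are hypothetical names.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical threshold: drops of 1 MB or less are ignored. */
#define LEAK_THRESHOLD ((size_t)1 << 20)

/* True if free gpu memory dropped by more than the threshold between two
 * samples. An increase (e.g. another process freed memory) is not reported. */
static bool leak_suspected(size_t free_before, size_t free_after)
{
	if (free_after >= free_before)
		return false;
	return (free_before - free_after) > LEAK_THRESHOLD;
}
```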
Tanguy Pruvot
d195f2e8a2
intensity: do not reduce throughput before init
Otherwise the allocated memory could be less than required later
btw, use the new "cuda" function to apply intensity/throughput
2015-10-11 05:01:41 +02:00
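A sketch of why the order matters, assuming an integer intensity N means 2^N threads (decimal intensities are ignored here) and that buffers are sized from the throughput in effect at init. `struct gpu_state`, `init_buffers()` and `set_throughput()` are hypothetical, and `malloc()` stands in for the device allocation.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-gpu state */
struct gpu_state {
	bool init_done;
	uint32_t alloc_throughput; /* threads the buffers were sized for */
	uint32_t throughput;       /* threads per kernel launch */
	uint32_t *d_hash;          /* stand-in for a device buffer */
};

/* Assumption: integer intensity N -> 2^N threads */
static uint32_t intensity_to_throughput(int intensity)
{
	return (uint32_t)1 << intensity;
}

/* Size the buffers from the full, unreduced throughput. */
static bool init_buffers(struct gpu_state *g, int intensity)
{
	g->throughput = intensity_to_throughput(intensity);
	g->alloc_throughput = g->throughput;
	/* 16 x 32-bit words = 64 bytes per hash */
	g->d_hash = malloc((size_t)g->alloc_throughput * 16 * sizeof(uint32_t));
	g->init_done = (g->d_hash != NULL);
	return g->init_done;
}

/* Apply a reduced throughput only after init, and never above the size
 * the buffers were allocated for. */
static void set_throughput(struct gpu_state *g, uint32_t wanted)
{
	if (!g->init_done)
		return; /* before init: keep the full value used for allocation */
	g->throughput = wanted < g->alloc_throughput ? wanted : g->alloc_throughput;
}
```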
Tanguy Pruvot
922c2a5cd7
algos: free allocated mem for algo switch
All can be freed properly now, except scrypt (reset) and lyra2 (leak)
2015-10-08 21:35:30 +02:00
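The per-algo free functions can follow an init-flag pattern like this sketch, with one buffer and one flag per miner thread so each gpu owns its allocations. `malloc()`/`free()` stand in for device allocations; `scanhash_sketch()`, `free_sketch()`, `d_hash` and `init` are hypothetical names, and `MAX_GPUS` is 16 to match the "Handle a maximum of 16 gpus" commit.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_GPUS 16

/* One buffer and one init flag per miner thread (hypothetical names). */
static uint32_t *d_hash[MAX_GPUS];
static bool init[MAX_GPUS];

/* Allocate lazily on the first scan of this algo for this thread. */
static int scanhash_sketch(int thr_id, uint32_t throughput)
{
	if (!init[thr_id]) {
		d_hash[thr_id] = malloc((size_t)throughput * 64); /* 64 bytes per hash */
		if (!d_hash[thr_id])
			return -1;
		init[thr_id] = true;
	}
	/* ... launch kernels using d_hash[thr_id] ... */
	return 0;
}

/* Called on algo switch: release everything and clear the flag so the
 * next scan allocates again. Safe to call when nothing was allocated. */
static void free_sketch(int thr_id)
{
	if (!init[thr_id])
		return;
	free(d_hash[thr_id]);
	d_hash[thr_id] = NULL;
	init[thr_id] = false;
}
```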
Tanguy Pruvot
ee93927fac
diff: use the new function in all algos
2015-10-07 20:10:15 +02:00
Tanguy Pruvot
e1c4b3042c
algos: add functions to free allocated resources
Will be used later for algo switching
not really tested yet...
2015-09-25 07:51:57 +02:00
Tanguy Pruvot
5308898d1c
start v1.7, apply new prototypes to all algos
2015-09-23 15:42:17 +02:00
Tanguy Pruvot
c5df142124
Add c11 algo (x11 variant)
Used by Chaincoin and Flaxscript
2015-06-29 11:46:16 +02:00
Tanguy Pruvot
7981e83db7
nvml: separated vendor id to string function
for the day nvidia fixes their nvmlDeviceGetPciInfo api..
2015-06-23 10:01:31 +02:00
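A minimal sketch of a vendor id to string lookup. The function name and table are illustrative and list only a few well-known PCI vendor IDs. How the 16-bit id is obtained (from NVML PCI info or elsewhere) is left out.

```c
#include <stdint.h>
#include <stddef.h>

/* A few well-known PCI vendor IDs (illustrative, not exhaustive). */
static const struct {
	uint16_t vid;
	const char *name;
} vendor_map[] = {
	{ 0x1043, "ASUS" },
	{ 0x1458, "Gigabyte" },
	{ 0x1462, "MSI" },
	{ 0x10DE, "NVIDIA" },
	{ 0x3842, "EVGA" },
};

/* Hypothetical helper: map a 16-bit PCI vendor id to a short name,
 * or NULL if the id is not in the table. */
static const char *vendor_id_to_str(uint16_t vid)
{
	for (size_t i = 0; i < sizeof(vendor_map) / sizeof(vendor_map[0]); i++)
		if (vendor_map[i].vid == vid)
			return vendor_map[i].name;
	return NULL;
}
```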
Tanguy Pruvot
e21c75793a
Revert "x11: improve aes (shavite/echo)"
causes a lot of cpu validation errors on windows,
to be double checked in the next version...
This reverts commit 1187a6e7e3211f0216111554a55b685687003b11.
2015-06-23 09:27:40 +02:00
Tanguy Pruvot
1187a6e7e3
x11: improve aes (shavite/echo)
shavite is faster, echo doesn't really change due to the reg. overload
These changes allow custom launch bounds without other code changes and improve
the portability across different devices.
also set a minimum throughput of 1024 for these algos (shared mem req. size)
2015-06-19 05:23:06 +02:00
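The minimum throughput rule amounts to a clamp like this one. `clamp_throughput()` and `MIN_THROUGHPUT` are hypothetical names; the 1024 value comes from the commit message.

```c
#include <stdint.h>

/* Floor from the commit message (shared mem req. size) */
#define MIN_THROUGHPUT 1024u

/* Hypothetical helper: never launch fewer threads than the minimum. */
static uint32_t clamp_throughput(uint32_t throughput)
{
	return throughput < MIN_THROUGHPUT ? MIN_THROUGHPUT : throughput;
}
```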
Tanguy Pruvot
9f5744d4c0
luffa/cube: fine tuning of maxregcount for the 750Ti
This allows getting 69 regs used (tested on linux); 69 or 72 makes
the compiler use 64 regs, which is not enough on the 750Ti
for optimal performance...
2015-06-17 03:58:31 +02:00
Tanguy Pruvot
634bea21f5
luffa/cube: unroll 1 really required on the 9xx
2015-06-17 03:39:48 +02:00
Tanguy Pruvot
42bcb91ca0
x11: update sp luffa/cube to get closer x11 speeds..
I had to clean it... lots of unused defines...
2015-06-17 02:31:15 +02:00
Tanguy Pruvot
2113be6eec
blake80: some changes and launch bounds, no perf changes
2015-04-24 14:12:21 +02:00
Tanguy Pruvot
3d3f2e2cb5
warnings: use the right device id (device_map[thr_id])
2015-04-23 09:41:56 +02:00
Tanguy Pruvot
e7ae27137e
x11/qubit: remove some extra MyStreamSynchronize
only one per loop is required to prevent 100% cpu usage
2015-04-15 05:30:22 +02:00
Tanguy Pruvot
d58d53f2b2
update README, small changes, prepare release 1.6.1
still need an SM 3.0 fix for skein...
2015-04-14 23:28:00 +02:00
Tanguy Pruvot
4f43abb402
bmw512: indent and restore SM 3.0 compat
could also be the source of the problem seen with CUDA 7
restored the code before sp/klaus changes for SM 3.0 devices...
2015-03-28 12:01:50 +01:00
KlausT
ae8e863591
remove uint32_t cast
2015-03-12 01:01:47 +01:00
Tanguy Pruvot
35cc5908ee
windows: return to normal priority, fix json decref
the jansson error seems to occur only in windows debug mode
2015-03-10 19:14:15 +01:00
Tanguy Pruvot
ebd23bcc66
whirlpoolx: real fix for multi gpus
The main problem was the array allocations, which should be made per cpu
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-03-08 22:56:04 +01:00
Tanguy Pruvot
9c4158aadb
debug: x11 algo traces for cuda 7 problem
2015-03-02 16:29:46 +01:00
Tanguy Pruvot
e6112e878d
cleanup: use unsigned throughput parameters
Yes, it's a big commit, was waiting for 1.6 to do that...
Sorry for your possible merge issues ;)
2015-02-28 14:05:09 +01:00
Tanguy Pruvot
26b51a557b
Allow different intensity per device
and clean up the old variables, no longer required
2015-01-24 11:17:29 +01:00
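A sketch of per-device intensity parsing. It assumes a comma-separated list such as `19,20`, integer intensities from 8 to 31 mapped to 2^N threads (decimals are not handled), and that devices without an explicit value reuse the last one; that last rule is an assumption, not confirmed by the log. `parse_intensities()` is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_GPUS 16

/* Hypothetical parser: "N[,N...]" -> one throughput (2^N) per device.
 * Returns the number of values parsed, or -1 on a bad value. */
static int parse_intensities(const char *arg, uint32_t out[MAX_GPUS])
{
	int count = 0;
	const char *p = arg;
	while (*p && count < MAX_GPUS) {
		char *end;
		long v = strtol(p, &end, 10);
		if (end == p || v < 8 || v > 31)
			return -1;
		out[count++] = (uint32_t)1 << v;
		if (*end == ',')
			p = end + 1;
		else if (*end == '\0')
			p = end;
		else
			return -1;
	}
	/* assumption: devices without an explicit value reuse the last one */
	for (int i = count; i < MAX_GPUS && count > 0; i++)
		out[i] = out[count - 1];
	return count;
}
```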
Tanguy Pruvot
2a5233f56e
api: report throughput when default
2015-01-22 06:28:59 +01:00
Tanguy Pruvot
cafd4477d7
Handle a maximum of 16 gpus (vs 8 before)
Some cards have 2 gpus on board...
2015-01-22 04:55:27 +01:00
Tanguy Pruvot
b521acb480
groestl: use sp bitslice enhancement, prepare SM 2.x variant
todo: simd512 SM 2.x variant (shfl op), and groestl/myriad functions
2015-01-19 00:42:14 +01:00
Tanguy Pruvot
90efbdcece
simd cleanup
2014-12-19 09:16:55 +01:00