Tanguy Pruvot
73dd6aac5c
keccak: avoid calling cuda_default_throughput twice
and drop useless gpu hash alloc...
2018-01-04 15:58:01 +01:00
Tanguy Pruvot
b70409ab5b
lyra2RE: link the merged blake/keccak kernel into algos
old keccak256_gpu_hash_32 kernel commented out to reduce binary size
compatibility not yet tested on old cards
2017-12-14 18:21:40 +01:00
Myrinia
18d29914ec
Improve Lyra2RE2 Performance
Improved Lyra2RE2 performance by 1%
2017-12-14 18:21:04 +01:00
Tanguy Pruvot
6c0e656030
keccak: fix issue with intensity
2017-12-09 16:54:48 +01:00
Tanguy Pruvot
015d129aa6
keccak second nonce, and higher intensity
2017-12-04 21:52:11 +01:00
Tanguy Pruvot
cf886b5907
import and adapt alexis' optimised keccak256 for SM 5+
and increase default intensity for these recent cards
2017-12-04 16:38:20 +01:00
Tanguy Pruvot
e388c11c02
blake2s fix and more missing CUDA arch checks (for the benchmarks)
2017-03-08 13:13:52 +01:00
Tanguy Pruvot
2cdf2ddd43
Add missing real cuda arch checks
2017-03-08 09:19:10 +01:00
Tanguy Pruvot
c66e8622b3
api: report per thread cpu hash checks (ACC/REJ)
+ update all algos for that...
2017-02-07 06:26:02 +01:00
Tanguy Pruvot
6440a9bf41
windows: some default intensity adjustments
2017-01-30 02:31:44 +01:00
Tanguy Pruvot
b47d9acaf5
readme + small warnings detected by vstudio
2017-01-29 22:23:05 +01:00
Tanguy Pruvot
0ff75791e5
migrate 2nd nonce storage of most algos
This allows keeping pdata[19] as the cursor between scans, and later sorting them.
Still to migrate: heavy, scrypt, sia...
2017-01-29 05:46:45 +01:00
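The storage change described in this commit can be pictured with a small C sketch. It is only an illustration under assumptions: the struct `work_sketch`, its `nonces[2]`/`valid_nonces` fields, the `scan_range()` function and the `fake_check()` predicate are hypothetical stand-ins, not ccminer's actual structures or hash checks. What it shows is the idea from the message: found nonces go to their own array, so `pdata[19]` keeps working as the scan cursor.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical work unit: pdata[19] is the nonce cursor; found
 * solutions go to a separate array so the cursor is never overwritten. */
struct work_sketch {
    uint32_t pdata[20];
    uint32_t nonces[2];     /* up to two candidate nonces per scan */
    int      valid_nonces;
};

/* Stand-in for a real hash/target test (assumption for illustration). */
static bool fake_check(uint32_t nonce)
{
    return (nonce % 1000u) == 7u;
}

/* Scan `count` nonces starting at the cursor, record at most two hits,
 * then advance the cursor past the scanned range.
 * Returns the number of nonces recorded (0..2). */
static int scan_range(struct work_sketch *w, uint32_t count)
{
    uint32_t start = w->pdata[19];
    w->valid_nonces = 0;
    for (uint32_t i = 0; i < count; i++) {
        uint32_t n = start + i;
        if (fake_check(n) && w->valid_nonces < 2)
            w->nonces[w->valid_nonces++] = n;
    }
    w->pdata[19] = start + count;   /* cursor kept for the next scan */
    return w->valid_nonces;
}
```

Because the cursor and the results live in different fields, a later step could sort or re-check `nonces[]` without losing the scan position.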
Tanguy Pruvot
50534789bc
Release 1.8.4
2016-12-21 20:35:09 +01:00
Tanguy Pruvot
44bd244fc4
blake2s improved
based on alexis' work, with the new work->nonces
2016-12-21 19:44:20 +01:00
Tanguy Pruvot
a43205a84f
decred: multiple nonces code cleanup
The double loop is not useful; prefer the __thread attribute
to improve code readability (remove the 2D host arrays).
squashed: return to host 2D arrays to allow freeing them
2016-09-27 22:50:52 +02:00
Tanguy Pruvot
9eead77027
diff: show by default, rework shares diff storage
This will later allow more GPU candidates.
Note: this is unfinished work; we keep the previous behavior for now.
To finish this, all algo solutions must be migrated and the attributes of submitted nonces stored.
This is required to handle a different share diff per nonce and to fix the possible solved count error (when only 1 of 2 nonces is solved).
2016-09-27 09:03:24 +02:00
Tanguy Pruvot
2f57ee9157
bench: skip the disabled whirlpoolx
+ veltor free
+ some missed/extra log things...
2016-09-27 01:41:49 +02:00
Tanguy Pruvot
34e97bf3e6
Show intensity on init for all algos
2016-09-27 00:33:06 +02:00
Tanguy Pruvot
2ee8bc9791
nvapi: do not print that on normal -D
2016-06-24 10:14:58 +02:00
Tanguy Pruvot
eae4ede111
decred: return to previous implementation + second nonce
seems better on windows and a bit easier to read...
2016-06-23 03:54:33 +02:00
Tanguy Pruvot
c643b3b900
decred: an even faster implementation by Alexis
optimized for the 9xx and more recent, same results on the 750 Ti
+ restore second nonce support not present in nicehash published version
Better on linux at least...
2016-06-23 00:36:28 +02:00
Tanguy Pruvot
7e490693e0
decred: nicehash/alexis improvement
2016-06-22 22:32:23 +02:00
Tanguy Pruvot
0deb9a2aca
win32: implement a nvapi.dll wrapper like nvml
Allows getting/setting missing info, such as the power limit, on x86
squashed for a better min/max and device mapping
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2016-06-21 05:16:12 +02:00
Tanguy Pruvot
710d9292af
fix duplicates on skein2 and blake2s (nonce endian)
2016-05-18 02:53:53 +02:00
Tanguy Pruvot
c0fca5c932
decred: magic improvement in one line
+ ifdef the 4WAY commented code...
2016-04-04 17:49:54 +02:00
pallas1
ebf885d482
~10% speedup
2016-04-02 22:21:31 +02:00
alexis78
be1f64446a
vanilla: sync with MrM4D, remove SSE2 midstate computation
was not useful and hard to read...
2016-03-23 11:39:34 +01:00
Tanguy Pruvot
5a69056ee5
blake2s cleanup
2016-03-13 19:36:01 +01:00
Tanguy Pruvot
7ffe65c262
blake2s algo
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2016-03-13 16:50:32 +01:00
Tanguy Pruvot
d58490911a
decred: remove some useless double flip
2016-02-28 18:19:32 +01:00
Tanguy Pruvot
a823cca7f9
decred: allow custom extranonce sizes
the extranonce is already placed after header in job.coinbase
2016-02-19 15:52:17 +01:00
Tanguy Pruvot
096f136c36
enhance vanilla second nonce check
2016-02-19 11:31:00 +01:00
Tanguy Pruvot
4944e1a098
mrM4D vnl, with some changes
2016-02-19 11:31:00 +01:00
Tanguy Pruvot
7c9ec8629f
decred: handle a second nonce
2016-02-18 22:47:03 +01:00
Tanguy Pruvot
6e95407dcf
decred algo for longpoll/getwork
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2016-02-11 07:10:46 +01:00
Tanguy Pruvot
da64c50059
blake: some more tuning and cleanup
2016-01-31 17:07:11 +01:00
Tanguy Pruvot
7c1137f335
blake: small change for the second nonce
2016-01-28 03:05:25 +01:00
Tanguy Pruvot
934f0e5054
blake: reduce intensity (and fix older devices)
2016-01-27 20:04:19 +01:00
Tanguy Pruvot
4a7e239d7c
blake: merge sp improvements, start 1.7.2 dev..
to be tested on old arch too...
2016-01-27 18:30:06 +01:00
Tanguy Pruvot
a237601747
1.7.1 release
set schedule flags to reduce linux cpu usage without MyStreamSynchronize()
2016-01-26 20:43:16 +01:00
Tanguy Pruvot
e50556b637
various changes, cleanup for the release
small fixes to better handle multiple threads per GPU
explicitly report that quark is not compatible with SM 2.1 (compact shuffle)
2015-11-04 14:59:59 +01:00
Tanguy Pruvot
113e22de2e
blake: prevent empty scan ranges with multiple gpus
in some cases, an empty scan range was possible in benchmark..
2015-11-01 22:14:17 +01:00
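The commit does not say how the empty range arose. As a hypothetical illustration of the general technique only, the helper below splits the 32-bit nonce space between GPUs using 64-bit arithmetic, so no slice can end before it starts. `nonce_slice()` and its parameters are assumptions, not ccminer code.

```c
#include <stdint.h>

/* Hypothetical helper: compute the inclusive nonce slice
 * [*first, *last] scanned by GPU `idx` out of `ngpus` (ngpus >= 1).
 * 64-bit arithmetic avoids the 2^32 wrap-around that could make a
 * 32-bit end value smaller than its start, i.e. an empty range. */
static void nonce_slice(uint32_t idx, uint32_t ngpus,
                        uint32_t *first, uint32_t *last)
{
    const uint64_t space = 1ULL << 32;      /* total number of nonces */
    const uint64_t size  = space / ngpus;   /* slice length, always >= 1 */
    uint64_t begin = size * idx;
    /* the last GPU also takes any remainder of the division */
    uint64_t end = (idx + 1 == ngpus) ? space : begin + size;
    *first = (uint32_t) begin;
    *last  = (uint32_t) (end - 1);          /* end - 1 always fits in 32 bits */
}
```

With 3 GPUs, for example, the third slice runs from 2863311530 up to 0xFFFFFFFF and absorbs the single leftover nonce.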
Tanguy Pruvot
61ff92b5b4
never interrupt global benchmark with found nonces
fix some algo weird hashrates (like blake)
and reset device between algos, for better accuracy
but this reset doesn't seem to be enough to bench all algos correctly...
to test on linux, could be a driver issue...
heavy: fix first alloc and indent with tabs...
2015-11-01 21:12:50 +01:00
Tanguy Pruvot
355b835ae0
benchmark: enhance the mem leak detection
reduce "false" warnings, and ignore unrelated/small ones <= 1 MB
On windows the gpu memory can be allocated by other processes
+ some cleanup in algos... (free/gpulog)
2015-10-16 22:04:30 +02:00
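The filtering rule in this commit (ignore differences of 1 MB or less) can be sketched as a pure function. The name `should_warn_leak()` is hypothetical. In real CUDA code the two free-memory values could be obtained with `cudaMemGetInfo()`, but whether ccminer computes them exactly this way is an assumption.

```c
#include <stddef.h>
#include <stdbool.h>

#define LEAK_IGNORE_BYTES ((size_t) 1024 * 1024)  /* 1 MB threshold from the commit */

/* Hypothetical check: compare free device memory before and after an
 * algo run. An increase in free memory, or a loss of 1 MB or less, is
 * ignored, since other processes (e.g. on Windows) can allocate or
 * release GPU memory in the meantime. */
static bool should_warn_leak(size_t free_before, size_t free_after)
{
    if (free_after >= free_before)
        return false;                   /* nothing was lost */
    return (free_before - free_after) > LEAK_IGNORE_BYTES;
}
```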
Tanguy Pruvot
4868c412b0
windows: add support for SM 2.1, drop SM 3.5 (x86)
Mostly to do compatibility tests; SM 2.1 support is very limited
SM 3.0 code should run on SM 3.5 (only a few cards use this arch)
As I can't test SM 3.5, it's best to let users do their own tests...
2015-10-15 23:02:35 +02:00
Tanguy Pruvot
a7d54cd7ef
blake: no need to fail on init, no big alloc
2015-10-15 20:10:58 +02:00
Tanguy Pruvot
6a9280a045
lyra2v2: set a better TPB for intensity 20 (sm52)
use sp forced unroll in skein and do some cleanup...
2015-10-15 02:01:34 +02:00
Tanguy Pruvot
5bf1f98200
various fixes for SM 2.1 and the benchmark
...
X11+ algos and quark are not compatible for the moment
but these ones are:
Benchmark results for Gigabyte GTX 460 (SM 2.1 / 1 GB):
blakecoin : 159090.5 kH/s, 1 MB, 1048576 thr.
blake : 70208.9 kH/s, 1 MB, 1048576 thr.
bmw : 122802.6 kH/s, 65 MB, 2097152 thr.
deep : 3533.6 kH/s, 33 MB, 524288 thr.
fugue256 : 43177.9 kH/s, 17 MB, 524288 thr.
heavy : 4118.2 kH/s, 147 MB, 524032 thr.
keccak : 18673.1 kH/s, 129 MB, 2097152 thr.
luffa : 28816.0 kH/s, 257 MB, 4194304 thr.
lyra2 : 213.7 kH/s, 570 MB, 65536 thr.
mjollnir : 3895.6 kH/s, 147 MB, 524032 thr.
nist5 : 1101.4 kH/s, 67 MB, 1048576 thr.
penta : 501.6 kH/s, 21 MB, 327680 thr.
skein : 5432.4 kH/s, 65 MB, 1048576 thr.
skein2 : 6788.9 kH/s, 33 MB, 524288 thr.
whirlpool : 688.5 kH/s, 33 MB, 524288 thr.
zr5 : 122.5 kH/s, 86 MB, 262144 thr.
2015-10-14 02:59:54 +00:00
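Most values in the thr. column above are powers of two (1048576 = 2^20, 2097152 = 2^21, 4194304 = 2^22), which fits the usual ccminer convention that an integer intensity N launches 2^N threads. A minimal C sketch of that mapping, with hypothetical function names and no support for fractional intensities:

```c
#include <stdint.h>

/* Hypothetical helper: map an integer intensity to a thread count
 * (throughput) as 2^intensity. Fractional intensities and per-algo
 * defaults are not handled in this sketch. */
static uint32_t intensity_to_throughput(unsigned intensity)
{
    if (intensity > 31)
        intensity = 31;                 /* keep the shift inside 32 bits */
    return 1u << intensity;
}

/* Inverse for exact powers of two; returns -1 when the thread count
 * is not 2^n. */
static int throughput_to_intensity(uint32_t thr)
{
    if (thr == 0 || (thr & (thr - 1)) != 0)
        return -1;
    int n = 0;
    while (thr >>= 1)
        n++;
    return n;
}
```

Counts such as 327680 and 524032 in the table are not powers of two, so they presumably come from fractional intensities or algo-specific tuning; the sketch does not try to reproduce those.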
Tanguy Pruvot
fc84c719e9
lyra2: improve cuda implementation (part 1, SM5+)
...
based on the new djm34 method, 2x faster than the first version
cleaned and tuned for the GTX 750/960 (linux / cuda 6.5)
2015-10-13 00:57:29 +02:00
Tanguy Pruvot
d195f2e8a2
intensity: do not reduce throughput before init
...
Otherwise the allocated memory could be less than what is required later
btw, use the new "cuda" function to apply intensity/throughput
2015-10-11 05:01:41 +02:00
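The ordering issue described in this commit can be avoided by sizing buffers once for the full throughput at init, and only clamping the launched work afterwards. A host-only C sketch under assumptions: `gpu_state`, `state_init()` and `effective_throughput()` are hypothetical names, and `calloc()` stands in for a device allocation such as `cudaMalloc()`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread state: buffers are sized once, at init time,
 * for the full (unreduced) throughput. */
struct gpu_state {
    uint32_t *hash_buf;     /* stand-in for a device buffer */
    uint32_t  alloc_thr;    /* throughput the buffer was sized for */
};

/* Allocate room for max_thr threads; returns 0 on success, -1 on failure. */
static int state_init(struct gpu_state *st, uint32_t max_thr, size_t words_per_thr)
{
    st->hash_buf = calloc((size_t) max_thr * words_per_thr, sizeof(uint32_t));
    st->alloc_thr = st->hash_buf ? max_thr : 0;
    return st->hash_buf ? 0 : -1;
}

/* Later reductions (e.g. near the end of a nonce range) only shrink the
 * launched work; the result never exceeds what init allocated. */
static uint32_t effective_throughput(const struct gpu_state *st, uint32_t wanted)
{
    return wanted < st->alloc_thr ? wanted : st->alloc_thr;
}
```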