Tanguy Pruvot
cf886b5907
import and adapt alexis optimised keccak256 for SM 5+
and increase default intensity for these recent cards
7 years ago
Tanguy Pruvot
e388c11c02
blake2s fix and more missing cuda arch (for the benchmarks)
8 years ago
Tanguy Pruvot
2cdf2ddd43
Add missing real cuda arch checks
8 years ago
Tanguy Pruvot
c66e8622b3
api: report per thread cpu hash checks (ACC/REJ)
+ update all algos for that...
8 years ago
Tanguy Pruvot
6440a9bf41
windows: some default intensity adjustments
8 years ago
Tanguy Pruvot
b47d9acaf5
readme + fix small warnings detected by vstudio
8 years ago
Tanguy Pruvot
0ff75791e5
migrate 2nd nonce storage of most algos
This allows keeping pdata[19] as a cursor between scans, and later, sorting them..
Remaining: heavy, scrypt, sia...
8 years ago
Tanguy Pruvot
50534789bc
Release 1.8.4
8 years ago
Tanguy Pruvot
44bd244fc4
blake2s improved
based on alexis work, with the new work->nonces
8 years ago
Tanguy Pruvot
a43205a84f
decred: multiple nonces code cleanup
The double loop is not useful; prefer the __thread attribute
to improve the code readability (remove the 2D host arrays).
squashed: return to the 2D host array to allow freeing it
8 years ago
Tanguy Pruvot
9eead77027
diff: show by default, rework shares diff storage
This will later allow more gpu candidates.
Note: this is unfinished work; we keep the previous behavior for now.
To finish this, all algo solutions should be migrated and the submitted nonce attributes stored.
It's required to handle the different share diff per nonce and to fix the possible solved count error (if only 1 of 2 nonces is solved).
8 years ago
Tanguy Pruvot
2f57ee9157
bench: skip the disabled whirlpoolx
+ veltor free
+ some missed/extra log things...
8 years ago
Tanguy Pruvot
34e97bf3e6
Show intensity on init for all algos
8 years ago
Tanguy Pruvot
2ee8bc9791
nvapi: do not print that on normal -D
9 years ago
Tanguy Pruvot
eae4ede111
decred: return to previous implementation + second nonce
seems better on windows and a bit easier to read...
9 years ago
Tanguy Pruvot
c643b3b900
decred: and even faster implementation by Alexis
optimized for the 9xx and more recent, same results on the 750 Ti
+ restore second nonce support not present in the nicehash published version
Better on linux at least...
9 years ago
Tanguy Pruvot
7e490693e0
decred: nicehash/alexis improvement
9 years ago
Tanguy Pruvot
0deb9a2aca
win32: implement a nvapi.dll wrapper like nvml
Allows getting/setting missing info like the power limit on x86
squashed for a better min/max and device mapping
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
9 years ago
Tanguy Pruvot
710d9292af
fix duplicates on skein2 and blake2s (nonce endian)
9 years ago
Tanguy Pruvot
c0fca5c932
decred: magic improvement in one line
+ ifdef the 4WAY commented code...
9 years ago
pallas1
ebf885d482
~10% speedup
9 years ago
alexis78
be1f64446a
vanilla: sync with MrM4D, remove SSE2 midstate computation
was not useful and hard to read...
9 years ago
Tanguy Pruvot
5a69056ee5
blake2s cleanup
9 years ago
Tanguy Pruvot
7ffe65c262
blake2s algo
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
9 years ago
Tanguy Pruvot
d58490911a
decred: remove some useless double flip
9 years ago
Tanguy Pruvot
a823cca7f9
decred: allow custom extranonce sizes
the extranonce is already placed after the header in job.coinbase
9 years ago
Tanguy Pruvot
096f136c36
enhance vanilla second nonce check
9 years ago
Tanguy Pruvot
4944e1a098
mrM4D vnl, with some changes
9 years ago
Tanguy Pruvot
7c9ec8629f
decred: handle a second nonce
9 years ago
Tanguy Pruvot
6e95407dcf
decred algo for longpoll/getwork
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
9 years ago
Tanguy Pruvot
da64c50059
blake: some more tuning and cleanup
9 years ago
Tanguy Pruvot
7c1137f335
blake: small change for the second nonce
9 years ago
Tanguy Pruvot
934f0e5054
blake: reduce intensity (and fix older devices)
9 years ago
Tanguy Pruvot
4a7e239d7c
blake: merge sp improvements, start 1.7.2 dev..
to be tested on old arch too...
9 years ago
Tanguy Pruvot
a237601747
1.7.1 release
set schedule flags to reduce linux cpu usage without MyStreamSynchronize()
9 years ago
Tanguy Pruvot
e50556b637
various changes, cleanup for the release
small fixes to better handle multiple threads per gpu
explicitly report that quark is not compatible with SM 2.1 (compact shuffle)
9 years ago
Tanguy Pruvot
113e22de2e
blake: prevent empty scan ranges with multiple gpus
in some cases, an empty scan range was possible in the benchmark..
9 years ago
Tanguy Pruvot
61ff92b5b4
never interrupt global benchmark with found nonces
fix weird hashrates for some algos (like blake)
and reset the device between algos, for better accuracy
but this reset doesn't seem enough to bench all algos correctly...
to test on linux, could be a driver issue...
heavy: fix first alloc and indent with tabs...
9 years ago
Tanguy Pruvot
355b835ae0
benchmark: enhance the mem leak detection
reduce "false" warnings, and ignore unrelated/small ones <= 1 MB
On windows the gpu memory can be allocated by other processes
+ some cleanup in algos... (free/gpulog)
9 years ago
Tanguy Pruvot
4868c412b0
windows: add support for SM 2.1, drop SM 3.5 (x86)
Mostly to do compatibility tests, SM 2.1 support is very limited
SM 3.0 code should run on SM 3.5 (only a few cards use this arch)
As I can't test SM 3.5, it's best to let users do their own tests...
9 years ago
Tanguy Pruvot
a7d54cd7ef
blake: no need to fail on init, no big alloc
9 years ago
Tanguy Pruvot
6a9280a045
lyra2v2: set a better TPB for intensity 20 (sm52)
use sp forced unroll in skein and do some cleanup...
9 years ago
Tanguy Pruvot
5bf1f98200
various fixes for SM 2.1 and the benchmark
X11+ algos and quark are not compatible for the moment
but these ones are:
Benchmark results for Gigabyte GTX 460 (SM 2.1 / 1 GB):
blakecoin : 159090.5 kH/s, 1 MB, 1048576 thr.
blake : 70208.9 kH/s, 1 MB, 1048576 thr.
bmw : 122802.6 kH/s, 65 MB, 2097152 thr.
deep : 3533.6 kH/s, 33 MB, 524288 thr.
fugue256 : 43177.9 kH/s, 17 MB, 524288 thr.
heavy : 4118.2 kH/s, 147 MB, 524032 thr.
keccak : 18673.1 kH/s, 129 MB, 2097152 thr.
luffa : 28816.0 kH/s, 257 MB, 4194304 thr.
lyra2 : 213.7 kH/s, 570 MB, 65536 thr.
mjollnir : 3895.6 kH/s, 147 MB, 524032 thr.
nist5 : 1101.4 kH/s, 67 MB, 1048576 thr.
penta : 501.6 kH/s, 21 MB, 327680 thr.
skein : 5432.4 kH/s, 65 MB, 1048576 thr.
skein2 : 6788.9 kH/s, 33 MB, 524288 thr.
whirlpool : 688.5 kH/s, 33 MB, 524288 thr.
zr5 : 122.5 kH/s, 86 MB, 262144 thr.
9 years ago
Tanguy Pruvot
fc84c719e9
lyra2: improve cuda implementation (part 1, SM5+)
based on the new djm34 method, 2x faster than first version
cleaned and tuned for the GTX 750/960 (linux / cuda 6.5)
9 years ago
Tanguy Pruvot
d195f2e8a2
intensity: do not reduce throughput before init
Otherwise the allocated memory could be less than required later
btw, use the new "cuda" function to apply intensity/throughput
9 years ago
Tanguy Pruvot
c6dcc5e5cf
benchmark: show mem and default throughput in results
and prepare a new function to get the default intensity
also, take care of multiple threads per gpu...
9 years ago
Tanguy Pruvot
8db5a0bc9e
blake: change dynamic round system
blakecoin was conflicting with lyra2, set the rounds more properly
9 years ago
Tanguy Pruvot
c2214091ae
benchmark: free last memory leaks on algo switch
my original lyra2 implementation remains to be fixed... (cuda_lyra2.cu)
I guess some kind of memory overflow forces the driver to allocate
memory... but I was unable to free it without a device reset.
9 years ago
Tanguy Pruvot
4e1e03b891
benchmark: store all algos results + cuda fixes
Note: lyra2, lyra2v2 and scrypt seem to have problems
coexisting with other algos... when run after some of them...
moved lyra2 first and skipped scrypt/jane for the moment...
Only stored in memory for now.. to display a table after the bench
ccminer -a auto --benchmark
Results may be exported later to a json file...
9 years ago
Tanguy Pruvot
922c2a5cd7
algos: free allocated mem for algo switch
All can be freed properly now, except scrypt (reset) and lyra2 (leak)
9 years ago