Tanguy Pruvot
355b835ae0
benchmark: enhance the mem leak detection
...
reduce "false" warnings, and ignore unrelated/small ones <= 1 MB
On Windows, GPU memory can be allocated by other processes
+ some cleanup in algos... (free/gpulog)
9 years ago
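The leak check described in the commit above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the commit: the function name `reportable_leak` and the constant `kLeakThresholdBytes` are hypothetical, and the free-memory values are assumed to be sampled before and after an algo run (for example with `cudaMemGetInfo`).

```cpp
#include <cstdint>

// Hypothetical threshold: drops in free memory of 1 MB or less are ignored,
// since other processes (notably on Windows) may allocate GPU memory too.
constexpr std::uint64_t kLeakThresholdBytes = 1024ull * 1024ull;

// Returns the number of bytes to report as leaked, or 0 when the drop in
// free memory is small enough to be treated as noise.
std::uint64_t reportable_leak(std::uint64_t free_before, std::uint64_t free_after)
{
    if (free_after >= free_before)
        return 0; // memory was released or is unchanged
    const std::uint64_t lost = free_before - free_after;
    return (lost <= kLeakThresholdBytes) ? 0 : lost;
}
```

With this sketch, a 1 MB drop produces no warning, while a 65 MB drop is reported in full.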
Tanguy Pruvot
4868c412b0
windows: add support for SM 2.1, drop SM 3.5 (x86)
...
Mostly to do compatibility tests; SM 2.1 support is very limited
SM 3.0 code should run on SM 3.5 (only a few cards use this arch)
As I can't test SM 3.5, it's best to let users do their own tests...
9 years ago
Tanguy Pruvot
a7d54cd7ef
blake: no need to fail on init, no big alloc
9 years ago
Tanguy Pruvot
6a9280a045
lyra2v2: set a better TPB for intensity 20 (sm52)
...
use sp forced unroll in skein and do some cleanup...
9 years ago
Tanguy Pruvot
5bf1f98200
various fixes for SM 2.1 and the benchmark
...
X11+ algos and quark are not compatible for the moment
but these ones are :
Benchmark results for Gigabyte GTX 460 (SM 2.1 / 1 GB):
blakecoin : 159090.5 kH/s, 1 MB, 1048576 thr.
blake : 70208.9 kH/s, 1 MB, 1048576 thr.
bmw : 122802.6 kH/s, 65 MB, 2097152 thr.
deep : 3533.6 kH/s, 33 MB, 524288 thr.
fugue256 : 43177.9 kH/s, 17 MB, 524288 thr.
heavy : 4118.2 kH/s, 147 MB, 524032 thr.
keccak : 18673.1 kH/s, 129 MB, 2097152 thr.
luffa : 28816.0 kH/s, 257 MB, 4194304 thr.
lyra2 : 213.7 kH/s, 570 MB, 65536 thr.
mjollnir : 3895.6 kH/s, 147 MB, 524032 thr.
nist5 : 1101.4 kH/s, 67 MB, 1048576 thr.
penta : 501.6 kH/s, 21 MB, 327680 thr.
skein : 5432.4 kH/s, 65 MB, 1048576 thr.
skein2 : 6788.9 kH/s, 33 MB, 524288 thr.
whirlpool : 688.5 kH/s, 33 MB, 524288 thr.
zr5 : 122.5 kH/s, 86 MB, 262144 thr.
9 years ago
Tanguy Pruvot
fc84c719e9
lyra2: improve cuda implementation (part 1, SM5+)
...
based on the new djm34 method, 2x faster than first version
cleaned and tuned for the GTX 750/960 (linux / cuda 6.5)
9 years ago
Tanguy Pruvot
d195f2e8a2
intensity: do not reduce throughput before init
...
Else the memory allocated could be less than required later
btw, use the new "cuda" function to apply intensity/throughput
9 years ago
Tanguy Pruvot
c6dcc5e5cf
benchmark: show mem and default throughput in results
...
and prepare a new function to get the default intensity
also, take care of multiple threads per gpu...
9 years ago
Tanguy Pruvot
8db5a0bc9e
blake: change dynamic round system
...
blakecoin was conflicting with lyra2, set the rounds more properly
9 years ago
Tanguy Pruvot
c2214091ae
benchmark: free last memory leaks on algo switch
...
what remains to fix is my original lyra2 implementation... (cuda_lyra2.cu)
I guess some kind of memory overflow forces the driver to allocate
memory... but I was unable to free it without a device reset.
9 years ago
Tanguy Pruvot
4e1e03b891
benchmark: store all algos results + cuda fixes
...
Note: lyra2, lyra2v2 and scrypt seem to have problems
coexisting with other algos... when run after some of them...
moved lyra2 first and skip scrypt/jane for the moment...
Only stored in memory for now.. to display a table after the bench
ccminer -a auto --benchmark
Results may be exported later to a json file...
9 years ago
Tanguy Pruvot
922c2a5cd7
algos: free allocated mem for algo switch
...
All can be freed properly now, except scrypt (reset) and lyra2 (leak)
9 years ago
Tanguy Pruvot
ee93927fac
diff: use the new function in all algos
9 years ago
Tanguy Pruvot
5f12943de5
whirlpool: add algo free function + vstudio
9 years ago
Tanguy Pruvot
b641bfdf8b
diff: rename functions like cpuminer-multi
...
more proper, intuitive...
9 years ago
Tanguy Pruvot
e1c4b3042c
algos: add functions to free allocated resources
...
Will be used later for algo switching
not really tested yet...
9 years ago
Tanguy Pruvot
5308898d1c
start v1.7, apply new prototypes to all algos
9 years ago
Tanguy Pruvot
8f98bde4fb
lyra2v2: improve cubehash with uint2
9 years ago
Tanguy Pruvot
21f5435420
lyra2: improve skein256 component
9 years ago
Tanguy Pruvot
01f3183c31
bmw algo for MDT, with midstate
...
which could be extracted from json too
replace a satcoin by another one ;)
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
9 years ago
Tanguy Pruvot
b256ca47a0
bmw256: reduce target array size
9 years ago
Tanguy Pruvot
53cd591956
lyra2v2, bmw256 and cubehash256 cleanup + diff fix
9 years ago
Tanguy Pruvot
d4e191610e
Import and adapt lyra2v2
...
not tested on windows and with SM <= 5
9 years ago
Tanguy Pruvot
b02f79b58b
lyra2: recover the kH/s lost in last commit
10 years ago
Tanguy Pruvot
2b43d57d42
lyra2: simplify skein code (no perf changes)
10 years ago
Tanguy Pruvot
e95712a2ea
lyra2: reduce blake message len.
10 years ago
Tanguy Pruvot
2f541065fb
cuda_helper: rename correctly hiword/loword functions
10 years ago
Tanguy Pruvot
03c3b7d341
Various algos cleanup + lyra2 sec nonce fix
10 years ago
Tanguy Pruvot
34fd408440
lyra2: get a second nonce per gpu scan
10 years ago
Tanguy Pruvot
3d3f2e2cb5
warnings: use the right device id (device_map[thr_id])
10 years ago
Tanguy Pruvot
38e6672d70
Allow test of SM 2.1/3.0 binaries on newer cards
...
Implementation based on klausT work.. a bit different
This code must be placed in a common .cu file,
cuda.cpp is not compiled with nvcc and doesn't allow cuda code...
10 years ago
KlausT
ae8e863591
remove uint32_t cast
10 years ago
Tanguy Pruvot
77c737ff72
various small changes and update readme
10 years ago
Tanguy Pruvot
e6112e878d
cleanup: use unsigned throughput parameters
...
Yes, it's a big commit, was waiting for 1.6 to do that...
Sorry for your possible merge issues ;)
10 years ago
Tanguy Pruvot
26b51a557b
Allow different intensity per device
...
and clean the old variables, no longer required
10 years ago
Tanguy Pruvot
9f2dd3ee60
Remove some useless conversions
...
does not impact performance either...
10 years ago
Tanguy Pruvot
2a5233f56e
api: report throughput when default
10 years ago
Tanguy Pruvot
cafd4477d7
Handle a maximum of 16 gpus (vs 8 before)
...
Some cards have 2 gpus on board...
10 years ago
Tanguy Pruvot
a66d78e692
reduce lyra2 blake and pentablake cpu load
10 years ago
Tanguy Pruvot
63e3387dbb
lyra2: add sm30 device compat (skein256)
10 years ago
Tanguy Pruvot
fa7d744a6c
lyra2: make_uint2 and set pool difficulty
10 years ago
Tanguy Pruvot
49a73971c4
Enhance stale work detection + throughput fixes
...
seems to resolve solo mining lock on share.
export also computed solo work diff in api (not perfect)
In high rate algos, throughput should be unsigned...
This fixes keccak, blake and doom problems
And change terminal color of debug lines, to be selectable in putty,
color code is not supported in windows but selection is ok there.
10 years ago
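The remark above that throughput should be unsigned in high rate algos can be illustrated with a small sketch. It assumes, as a simplification, that throughput is derived as a power of two from an intensity value; the helper names `throughput_for_intensity` and `overflows_int32` are hypothetical and not taken from the source.

```cpp
#include <cstdint>

// Hypothetical helper: threads per scan for a given intensity, assuming
// throughput = 2^intensity. Intensity is clamped to 31 so the shift stays
// within 32 bits.
std::uint32_t throughput_for_intensity(unsigned intensity)
{
    if (intensity > 31)
        intensity = 31;
    return std::uint32_t{1} << intensity;
}

// True when the value cannot be represented in a signed 32-bit int,
// which is the case a signed throughput parameter would mishandle.
bool overflows_int32(std::uint32_t throughput)
{
    return throughput > static_cast<std::uint32_t>(INT32_MAX);
}
```

Under these assumptions, intensity 31 yields 2147483648 threads, which fits in `uint32_t` but not in a signed 32-bit `int`.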
Tanguy Pruvot
ef8a73d6aa
keccak: not compatible with second nonces (was broken)
...
Use djm34 new uint2 method to get a +40% boost (115 to 153MH/s)
10 years ago
Tanguy Pruvot
c5b349e079
Add Lyra2 algo, based on Vertcoin published code
...
Seems to be djm34 work, I recognize the code style ;)
Code was cleaned/indented and adapted to my fork...
Only usable on the test pool until 16 december 2014!
10 years ago