Tanguy Pruvot
225f25a6b9
uint2: remove the slower asm in operators funcs
8 years ago
Tanguy Pruvot
9f2ed5135b
lbry maxwell and pascal update (up to 10% on pascal)
Based on alexis78 work and sponsored by LBRY.IO team (thanks)
Release 1.8.2, use cuda 8 for x86
8 years ago
Tanguy Pruvot
dad0110557
x17 cleanup
haval256 is now 2x faster, but sha512 perf depends a lot on cuda version...
9 years ago
Tanguy Pruvot
82a7e62b30
skein: cleanup, strip uint2x4.h + update vstudio
9 years ago
Tanguy Pruvot
ef817df79a
import sp skein512 unrolled 64-byte kernel (+0.6% x11)
Quark and S3 are now a bit faster (+1%)
x11 gets +0.6% (+20 kH/s on a 750ti, +30 kH/s on a 960)
80-byte implementation still to do/test... (skein/skein2)
but keep my previous version for older devices...
9 years ago
Tanguy Pruvot
ab5cc7162e
refactor: create bench.cpp and algos.h
Also enhance multi-thread benchmark synchronization with pthread barriers
9 years ago
Tanguy Pruvot
e1c4b3042c
algos: add functions to free allocated resources
Will be used later for algo switching
not really tested yet...
9 years ago
Tanguy Pruvot
d4e191610e
Import and adapt lyra2v2
not tested on windows and with SM <= 5
9 years ago
Tanguy Pruvot
15293d063f
remove pluck algo
Supcoin seems... dead, and the algo was not supported on all devices
9 years ago
Tanguy Pruvot
4709668995
jh512: rewrite and optimize with asm swap
5% improvement from the vshl asm swap functions (mixed shl+add instructions),
also add an xchg(x, y) function and an XCHG(x, y) define in cuda_helper for later use...
the other jh changes are mainly cosmetic...
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
10 years ago
Tanguy Pruvot
52df82917a
cuda: fix uint2 subtract operator
10 years ago
Tanguy Pruvot
7bf256c81c
cuda_helper: define UINT32_MAX if not defined
it seems not to be defined on Slackware...
10 years ago
Tanguy Pruvot
2f541065fb
cuda_helper: correctly rename hiword/loword functions
10 years ago
Tanguy Pruvot
b35a6742fe
cuda_helper: properly ifdef for vstudio c++ compat
10 years ago
Tanguy Pruvot
7c7f40a634
neoscrypt: attempt to recode shift256R for SM 3.0
10 years ago
Tanguy Pruvot
1ad34dc13d
reset: take care of multi-threaded gpus (-d 0,0)
to be tested... could cause problems when a reset happens in a chain like x11...
10 years ago
Tanguy Pruvot
38e6672d70
Allow test of SM 2.1/3.0 binaries on newer cards
Implementation based on klausT's work, a bit different.
This code must be placed in a common .cu file:
cuda.cpp is not compiled with nvcc and doesn't allow CUDA code...
10 years ago
Tanguy Pruvot
7939dce0aa
pluck: adaptation from djm repo
the CPU validation check remains to be done...
throughput for this algo is divided by 128 to keep the same kind of intensity values (default 18.0)
10 years ago
Tanguy Pruvot
3ed1c552bd
cuda: always disable asm for host code
10 years ago
Tanguy Pruvot
e6112e878d
cleanup: use unsigned throughput parameters
Yes, it's a big commit, was waiting for 1.6 to do that...
Sorry for your possible merge issues ;)
10 years ago
Tanguy Pruvot
768b5ccb76
import bmw512 uint2 changes from sp
+ some cleanup... 15 KH/s gained (750Ti)
10 years ago
Tanguy Pruvot
9f2dd3ee60
Remove some useless conversions
does not impact performance either...
10 years ago
Tanguy Pruvot
cafd4477d7
Handle a maximum of 16 gpus (vs 8 before)
Some cards have 2 gpus on board...
10 years ago
Tanguy Pruvot
b3188669e2
lyra2: cleanup
quickly tested with a SM 3.0 binary...
10 years ago
Tanguy Pruvot
da2e2528a7
uint2: fix SM 3.0 ROR and ROL
Not sure it's the fastest way, but it works for offsets 0-63 and 64.
Also note that the SM 3.5+ asm doesn't support ROR with offset 64
10 years ago
Tanguy Pruvot
c5b349e079
Add Lyra2 algo, based on Vertcoin published code
Seems to be djm34's work, I recognize the code style ;)
Code was cleaned/indented and adapted to my fork...
Only usable on the test pool until 16 december 2014!
10 years ago
Tanguy Pruvot
c3bdb623e8
Check and submit multiple nonces in one loop
Added to most algos: the checkhash function scans a big range
and can find multiple nonces at once if the difficulty is low.
Stop ignoring them, submit the second one if found...
Clean up the draft rc=2 code implemented for blake and pentablake
btw... fix the reduced hashrate displayed when a nonce is found...
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
10 years ago
Tanguy Pruvot
f387898ead
Prepare multiple nonces support in one loop (if found)
Tested on x11, which sometimes finds 3 nonces in one call;
currently they are ignored because only the biggest is kept...
This commit doesn't fix that, but will allow enhancing the share rate later...
10 years ago
Tanguy Pruvot
118a6be361
checkhash: simplify the common function
use klaus's trivial function, the old code has always been a bit weird..
split cuda_check_cpu_hash_64 into two functions, keep the old one for branched stuff
10 years ago
Tanguy Pruvot
6ae28162db
various extern cleanup + api history uids and gpu SM
uids could be useful to create graphs from history data
Note: please do a clean build after this commit (changes in miner.h)
10 years ago
sp-hash
26b9fe3586
faster x15, +23KH or 4ms on whirlpool (30ms vs 34ms)
tpruvot: I didn't pick the asm replace_hiword, it's slower on linux
10 years ago
Tanguy Pruvot
73f22b237a
Prepare trap of hardware/mem failures
10 years ago
Tanguy Pruvot
11dbbcc12d
checkhash: some work on a faster variant (wip)
This should not be used for all algos... not enabled yet
todo: multiple nonces or blake32 style checkup
10 years ago
Tanguy Pruvot
b128312efb
cuda: store device SM in a global var
sample usage made for blake and fugue (higher intensity for SM5.2)
add these to cuda_helper and clean unused code
10 years ago
Tanguy Pruvot
987edf63f3
vstudio: fix launch_bounds intellisense warnings in ide
10 years ago
Tanguy Pruvot
149143d5cd
Fix lvalue warning in SWAPDWORDS + groestl change
10 years ago
Tanguy Pruvot
a747e4ca0f
blake512: use a new SWAPDWORDS asm func (0.05ms)
small improvement, do it on pentablake and heavy variants too
based on sp commit (but SWAP32 is already used for 32bit ints)
10 years ago
Tanguy Pruvot
5bc969fa57
Some work on data alignment
linux: add -march=native (we build it ourselves) and some other flags
+ remove unused vars (seen with -Wall)
10 years ago
Tanguy Pruvot
2de9b1375b
prepare next version
10 years ago
Tanguy Pruvot
d8a23fa970
Tune quark part of Xn funcs
based on klaus commits, should slightly increase the speed of most algos
PS: the main increase is due to the register count tuning in the Makefile,
and for skein512 on linux, it's the ROTL64,
but almost no change on X11: 2648MH/s vs 2630 before
10 years ago
Tanguy Pruvot
ba33492592
blake: return to ptarget 6:7 compare
clz can be erroneous, e.g. 0xE0 vs 0xF0 (same leading-zero count, different values)
10 years ago
Tanguy Pruvot
91eea0d76b
blake: remove int cudaMemcpyToSymbol for MSVC
use clz (leading zeros) asm func for a fast gpu compare of ptarget[6]:[7]
also add missing windows ctz/clz host functions
New NEOS speed: 227MH to 270MH (Gigabyte 750Ti Black Edition)
10 years ago
Tanguy Pruvot
c3eb66683a
Import djm34 qubit, deep and doom algos
Indent, and put commonly used function prototypes in cuda_helper.h
and add them to the --cputest function
Also change the color option to --nocolor, -C is no longer needed
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
(who is tired of removing these German copy/pasted comments)
10 years ago
Tanguy Pruvot
13bb9d267e
Remove debug rpc, already exists with -P
10 years ago
Tanguy Pruvot
64e8cd3f98
add x17 algo, cleaned djm34 commit
todo: visual studio...
10 years ago
Tanguy Pruvot
3f6ebc10cc
whirlpool: x64 asm is very slow (30ms win32 vs 90)
10 years ago
Tanguy Pruvot
912ef1215d
small reg tunes, rename whirlcoin to whirl
10 years ago
Tanguy Pruvot
1fbcbbacc4
Add whirlcoin and optimize x11 luffa (maxrregcount)
10 years ago
Tanguy Pruvot
4bc23048b5
x15: use djm34 code with asm xor64 + my rot64
some optimizations could be done later, after whirlcoin integration
10 years ago
Tanguy Pruvot
d9ea5f72ce
Remove duplicated defines present in cuda_helper.h
also add cudaDeviceReset() on Ctrl+C for nvprof
10 years ago