Tanguy Pruvot
4709668995
jh512: rewrite and optimize with asm swap
...
5% improvement from the vshl asm swap functions (mixed shl+add instructions).
Also add an xchg(x, y) function and an XCHG(x, y) define in cuda_helper for later use.
The other jh changes are mainly cosmetic.
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
10 years ago
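The xchg(x, y) / XCHG(x, y) pair mentioned in the commit above can be pictured with a minimal host-side sketch. This is only an illustration of a swap helper and its macro twin: the real cuda_helper versions are device code and may be written differently (the commit mentions asm), so the bodies below are assumptions.

```cpp
#include <cstdint>

// Hypothetical host-side stand-in for the xchg(x, y) helper:
// swaps two values of the same type through a temporary.
template <typename T>
inline void xchg(T &x, T &y) {
    T tmp = x;
    x = y;
    y = tmp;
}

// Hypothetical macro form. Each argument is evaluated more than once,
// so it should only be used with plain variables (no side effects).
#define XCHG(x, y) do { auto xchg_tmp_ = (x); (x) = (y); (y) = xchg_tmp_; } while (0)
```

The function form is type-checked, while the macro form works inside code that cannot take references (for example register-like locals in unrolled loops).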
Tanguy Pruvot
a55b148ecc
windows: fix missing off_t include
10 years ago
Tanguy Pruvot
ed4927fcd0
quark/x11: set signed int hashPosition vars to off_t
...
groestl (and keccak?) seem faster with 64-bit vars (off_t or int64_t)...
10 years ago
Tanguy Pruvot
ebe95aac2f
bmw512: cleanup after cuda 7 bug fix
10 years ago
Tanguy Pruvot
0224d4705e
skein: fix wrong hashes seen on x11 with cuda 7
...
Looks like a stream sync problem, not related to the cuda 7 headers or cudart.
The added threadfence() doesn't change performance, and the problem could also
be related to the random cpu validation errors... so keep it for all.
Note: the 80-byte variant used in skein2 doesn't seem affected.
10 years ago
Tanguy Pruvot
123fe287b6
x11: temporary workaround for cuda 7.0
10 years ago
Tanguy Pruvot
d9b0312897
x64: fix some size_t warnings
10 years ago
Tanguy Pruvot
051ba521be
skein2: minimal host changes
10 years ago
Tanguy Pruvot
2f541065fb
cuda_helper: rename correctly hiword/loword functions
10 years ago
Tanguy Pruvot
2113be6eec
blake80: some changes and launch bounds, no perf changes
10 years ago
Tanguy Pruvot
3d3f2e2cb5
warnings: use the right device id (device_map[thr_id])
10 years ago
Tanguy Pruvot
275a028935
skein: compute midstate first
...
"Real" optimization based on KlausT precalc
10 years ago
Tanguy Pruvot
e7ae27137e
x11/qubit: remove some extra MyStreamSynchronize
...
only one per loop is required to prevent 100% cpu usage
10 years ago
Tanguy Pruvot
163430daae
Skein/Skein2 SM 3.0 devices support
...
+ code cleanup
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
10 years ago
Tanguy Pruvot
d58d53f2b2
update README, small changes, prepare release 1.6.1
...
still need a SM 3.0 fix for skein...
10 years ago
Tanguy Pruvot
48515ad707
groestl: rename included cuda files
10 years ago
Tanguy Pruvot
37395eefe4
skein: restore previous x11 speed
10 years ago
Tanguy Pruvot
4f43abb402
bmw512: indent and restore SM 3.0 compat
...
could also be the source of the problem seen with CUDA 7
restored the code from before the sp/klaus changes for SM 3.0 devices...
10 years ago
Tanguy Pruvot
38e6672d70
Allow test of SM 2.1/3.0 binaries on newer cards
...
Implementation based on klausT's work, but a bit different.
This code must be placed in a common .cu file:
cuda.cpp is not compiled with nvcc and doesn't allow cuda code...
10 years ago
Tanguy Pruvot
f86784ee56
Add skein algo (Skeincoin, Myriad, Unat...)
...
SKEIN512 + SHA256
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
10 years ago
Tanguy Pruvot
a37e909db9
Add zr5 algo (for SM 3.5+)
...
uint4 copy + keccak cleanup, groestl: small uint4 opt
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
10 years ago
Tanguy Pruvot
9734186a37
jh512: import and improve klaus and sp changes
...
did not import the extra final function, which should stay compatible
with the common cuda_check_hash()
10 years ago
KlausT
ae8e863591
remove uint32_t cast
10 years ago
Tanguy Pruvot
e6112e878d
cleanup: use unsigned throughput parameters
...
Yes, it's a big commit, I was waiting for 1.6 to do that...
Sorry for your possible merge issues ;)
10 years ago
Tanguy Pruvot
09c3ac6b4b
linux: fix missing dirname include
10 years ago
Tanguy Pruvot
2d5e8aaced
anime: fix uint2 error (bmw)
10 years ago
KlausT
a452c330dd
quark: remove unused variables
10 years ago
Tanguy Pruvot
26b51a557b
Allow different intensity per device
...
and clean up the old variables, which are no longer required
10 years ago
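A rough sketch of what a per-device intensity lookup could look like. The table name, the "0 means default" convention and the power-of-two relation between intensity and throughput are assumptions made for illustration, not taken from the commit; only the limit of 16 GPUs comes from this log.

```cpp
#include <cstdint>

const int MAX_GPUS = 16; // limit mentioned elsewhere in this log

// Hypothetical per-device table; 0 means "use the algorithm default".
static uint32_t gpu_intensity[MAX_GPUS] = {0};

// Number of nonces scanned per kernel launch for one device,
// assuming throughput = 2^intensity.
inline uint32_t device_throughput(int dev_id, uint32_t default_intensity) {
    uint32_t i = gpu_intensity[dev_id] ? gpu_intensity[dev_id] : default_intensity;
    if (i > 31) i = 31; // keep the shift inside 32 bits
    return 1u << i;
}
```

With such a table, a card with less memory can run at a lower intensity while the others keep the algorithm default.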
Tanguy Pruvot
768b5ccb76
import bmw512 uint2 changes from sp
...
+ some cleanup... 15KH/s gained (750Ti)
10 years ago
Tanguy Pruvot
9f2dd3ee60
Remove some useless conversions
...
does not impact performance either...
10 years ago
Tanguy Pruvot
2a5233f56e
api: report throughput when default
10 years ago
Tanguy Pruvot
cafd4477d7
Handle a maximum of 16 gpus (vs 8 before)
...
Some cards have 2 gpus on board...
10 years ago
Tanguy Pruvot
b521acb480
groestl: use sp bitslice enhancement, prepare SM 2.x variant
...
todo: simd512 SM 2.x variant (shfl op), and groestl/myriad functions
10 years ago
Tanguy Pruvot
ec5a48f420
x11: small simd512 gpu_expand improvement
10 years ago
Tanguy Pruvot
1e24e4899c
skein: uint2 optimisation with SM 3.0 compat (+15KH)
...
Thanks to sp and djm34 for this fast uint64 storage alternative
10 years ago
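The "uint64 stored as uint2" idea from the commit above can be illustrated on the host with a plain struct of two 32-bit words. The struct, the helper names and the choice of x as the low word are assumptions for this sketch; it only shows why a 64-bit rotate can be done with 32-bit operations.

```cpp
#include <cstdint>

// Stand-in for a uint2-like pair; here x is the low word and y the high word.
struct u32x2 { uint32_t x, y; };

inline u32x2 split64(uint64_t v) {
    return { (uint32_t)v, (uint32_t)(v >> 32) };
}

inline uint64_t join64(u32x2 v) {
    return ((uint64_t)v.y << 32) | v.x;
}

// Rotate a 64-bit value left by n bits (0 < n < 32) using only 32-bit shifts:
// bits shifted out of one half are carried into the other half.
inline u32x2 rol2(u32x2 a, unsigned n) {
    u32x2 r;
    r.x = (a.x << n) | (a.y >> (32 - n));
    r.y = (a.y << n) | (a.x >> (32 - n));
    return r;
}
```

On hardware with 32-bit registers, working on the two halves directly avoids the compiler's generic 64-bit shift sequences, which is presumably where the reported gain comes from.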
Tanguy Pruvot
2585e10814
keccak uint2 optimisation for SM>3.0 (x11 +40KH/s)
...
based on djm34's keccak 256-bit changes, keeping SM3.0 compat
also affects most other algos (quark, nist5, x13...)
10 years ago
Tanguy Pruvot
c3bdb623e8
Check and submit multiple nonces in one loop
...
Added to most algos: the checkhash function scans a big range
and can find multiple nonces at once if the difficulty is low.
Stop ignoring them and submit the second one if found...
Clean up the draft rc=2 code implemented for blake and pentablake.
Also fix the reduced hashrate displayed when a nonce is found...
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
10 years ago
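The idea of keeping a second nonce from the same scanned range can be sketched as follows. Everything here is hypothetical (the record type, the function names, and the toy 32-bit target comparison, where a real miner compares full 256-bit hashes); it only shows a scan loop that stops after two hits instead of one.

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical result record: up to two winning nonces from one scanned range.
struct ScanResult {
    int count;          // 0, 1 or 2
    uint32_t nonce[2];
};

// Toy stand-in for the hash/target comparison.
inline bool meets_target(uint32_t hash_high, uint32_t target_high) {
    return hash_high <= target_high;
}

// Scan n candidate hashes starting at start_nonce and keep the first two hits.
inline ScanResult scan_range(const uint32_t *hash_high, size_t n,
                             uint32_t start_nonce, uint32_t target_high) {
    ScanResult r = {0, {0, 0}};
    for (size_t i = 0; i < n && r.count < 2; ++i) {
        if (meets_target(hash_high[i], target_high))
            r.nonce[r.count++] = start_nonce + (uint32_t)i;
    }
    return r;
}
```

A caller would submit nonce[0] and, when count is 2, nonce[1] as well, which matches the "rc=2" wording used in the commit.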
Tanguy Pruvot
118a6be361
checkhash: simplify the common function
...
use klaus's trivial function, the old code has always been a bit weird..
split cuda_check_cpu_hash_64 into two functions, keep the old one for branched stuff
10 years ago
Tanguy Pruvot
c218c3f514
quark/anime: +100KH, bmw tpb was not correct
...
This small change also slightly improves the x11..17 algos
10 years ago
Tanguy Pruvot
8ad180cc70
various small changes
...
heavy: reduce by 256 threads default intensity to all -i 20
cuda: put static thread init bools outside the code (made once)
api: fix nvml header to build without
10 years ago
Tanguy Pruvot
6ae28162db
various extern cleanup + api history uids and gpu SM
...
uids could be useful to create graphs from history data
Note: please do a clean build after this commit (changes in miner.h)
10 years ago
Tanguy Pruvot
73f22b237a
Prepare trap of hardware/mem failures
10 years ago
Tanguy Pruvot
fe4ad36b73
intensity: sign warnings fixes min(i,u)
10 years ago
Tanguy Pruvot
c859041993
quark/blake512 opt. pointed by sp without asm
...
indeed, the pragma unroll doesn't always make things faster
asm part... to check later
10 years ago
Tanguy Pruvot
b128312efb
cuda: store device SM in a global var
...
sample usage made for blake and fugue (higher intensity for SM5.2)
add these to cuda_helper and clean unused code
10 years ago
Tanguy Pruvot
7a4e1bb327
Reduce keccak, deep & anime intensity + handle groestl -i param
...
default intensity was the max supported by the card, and perf is
not really better. I prefer to leave it one below for cards with less
memory (1GB)
10 years ago
Tanguy Pruvot
7acf987aba
Add intensity to last algos and fix quark speed
10 years ago
Tanguy Pruvot
11c5ec810d
Handle intensity param in all algos
...
and add a check related to start/max nonce params
10 years ago
Tanguy Pruvot
a747e4ca0f
blake512: use a new SWAPDWORDS asm func (0.05ms)
...
small improvement, also applied to the pentablake and heavy variants
based on sp's commit (but SWAP32 is already used for 32-bit ints)
10 years ago
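For readers unfamiliar with the operation, swapping the two 32-bit halves of a 64-bit word is the same as rotating it by 32 bits. The function below is a portable host-side stand-in with a made-up name; the commit's SWAPDWORDS is an asm device function whose exact body is not shown in this log.

```cpp
#include <cstdint>

// Portable equivalent of swapping the high and low 32-bit words of a 64-bit value.
inline uint64_t swap_dwords(uint64_t v) {
    return (v << 32) | (v >> 32);
}
```

Applying it twice returns the original value, which makes it easy to sanity-check an asm replacement against this reference.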
Tanguy Pruvot
e7beac6b1c
x11: tiny sp_ opt on jh512 (0.05ms)
...
modified a bit.. (and removed the mixed DOS line endings ^M)
also, remove the max reg count, now determined with __launch_bounds__
10 years ago