Tanguy Pruvot
922c2a5cd7
algos: free allocated mem for algo switch
...
All can be freed properly now, except script (reset) and lyra2 (leak)
2015-10-08 21:35:30 +02:00
Tanguy Pruvot
ee93927fac
diff: use the new function in all algos
2015-10-07 20:10:15 +02:00
Tanguy Pruvot
e1c4b3042c
algos: add functions to free allocated resources
...
Will be used later for algo switching
not really tested yet...
2015-09-25 07:51:57 +02:00
Tanguy Pruvot
5308898d1c
start v1.7, apply new prototypes to all algos
2015-09-23 15:42:17 +02:00
Tanguy Pruvot
e3548f46f3
drop animecoin support
...
no more really minable... just minable in french
2015-08-22 12:35:22 +02:00
Tanguy Pruvot
4709668995
jh512: rewrite and optimize with asm swap
...
5% improvement by the vshl asm swap functions, mixed shl+add inst.,
Also add xchg(x, y) func and XCHG(x, y) define in cuda_helper for later use...
other jh changes are mainly for the beauty of the code...
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-06-16 08:20:48 +02:00
Tanguy Pruvot
a55b148ecc
windows: fix missing off_t include
2015-06-08 16:58:12 +02:00
Tanguy Pruvot
ed4927fcd0
quark/x11: set signed int hashPosition vars to off_t
...
groestl (and keccak?) seem faster with 64bit vars (off_t or int64_t)...
2015-06-05 22:03:05 +02:00
Tanguy Pruvot
ebe95aac2f
bmw512: cleanup after cuda 7 bug fix
2015-05-29 14:32:23 +02:00
Tanguy Pruvot
0224d4705e
skein: fix wrong hashes seen on x11 with cuda 7
...
Looks like a stream sync problem, not related to cuda 7 headers or cudart
The threadfence() added doesn't change performance, and could also
be related to the random cpu validation errors... so keep it for all.
Note: the 80-byte variant used in skein2 doesn't seem affected.
2015-05-29 12:16:54 +02:00
Tanguy Pruvot
123fe287b6
x11: temporary workaround for cuda 7.0
2015-05-28 21:19:24 +02:00
Tanguy Pruvot
d9b0312897
x64: fix some size_t warnings
2015-05-17 04:56:42 +02:00
Tanguy Pruvot
051ba521be
skein2: minimal host changes
2015-05-14 19:38:03 +02:00
Tanguy Pruvot
2f541065fb
cuda_helper: rename correctly hiword/loword functions
2015-05-12 17:13:58 +02:00
Tanguy Pruvot
2113be6eec
blake80: some changes and launch bounds, no perf changes
2015-04-24 14:12:21 +02:00
Tanguy Pruvot
3d3f2e2cb5
warnings: use the right device id (device_map[thr_id])
2015-04-23 09:41:56 +02:00
Tanguy Pruvot
275a028935
skein: compute midstate first
...
"Real" optimization based on KlausT precalc
2015-04-16 02:11:37 +02:00
Tanguy Pruvot
e7ae27137e
x11/qubit: remove some extra MyStreamSynchronize
...
only one per loop is required to prevent 100% cpu usage
2015-04-15 05:30:22 +02:00
Tanguy Pruvot
163430daae
Skein/Skein2 SM 3.0 devices support
...
+ code cleanup
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-04-15 01:27:48 +02:00
Tanguy Pruvot
d58d53f2b2
update README, small changes, prepare release 1.6.1
...
still need a SM 3.0 fix for skein...
2015-04-14 23:28:00 +02:00
Tanguy Pruvot
48515ad707
groestl: rename included cuda files
2015-04-06 23:46:34 +02:00
Tanguy Pruvot
37395eefe4
skein: restore previous x11 speed
2015-03-28 13:32:08 +01:00
Tanguy Pruvot
4f43abb402
bmw512: indent and restore SM 3.0 compat
...
could be also the source of the problem seen with CUDA 7
restored the code before sp/klaus changes for SM 3.0 devices...
2015-03-28 12:01:50 +01:00
Tanguy Pruvot
38e6672d70
Allow test of SM 2.1/3.0 binaries on newer cards
...
Implementation based on klausT work.. a bit different
This code must be placed in a common .cu file,
cuda.cpp is not compiled with nvcc and doesn't allow cuda code...
2015-03-28 12:00:53 +01:00
Tanguy Pruvot
f86784ee56
Add skein algo (Skeincoin, Myriad, Unat...)
...
SKEIN512 + SHA256
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-03-27 15:24:27 +01:00
Tanguy Pruvot
a37e909db9
Add zr5 algo (for SM 3.5+)
...
uint4 copy + keccak cleanup, groestl: small uint4 opt
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-03-27 15:16:25 +01:00
Tanguy Pruvot
9734186a37
jh512: import and improve klaus and sp changes
...
did not import the extra final function, which should stay compatible
with the common cuda_check_hash()
2015-03-20 05:36:40 +01:00
KlausT
ae8e863591
remove uint32_t cast
2015-03-12 01:01:47 +01:00
Tanguy Pruvot
e6112e878d
cleanup: use unsigned throughput parameters
...
Yes, it's a big commit, was waiting for 1.6 to do that...
Sorry for your possible merge issues ;)
2015-02-28 14:05:09 +01:00
Tanguy Pruvot
09c3ac6b4b
linux: fix missing dirname include
2015-02-11 18:36:57 +01:00
Tanguy Pruvot
2d5e8aaced
anime: fix uint2 error (bmw)
2015-02-08 18:32:42 +01:00
KlausT
a452c330dd
quark: remove unused variables
2015-02-02 10:41:14 +01:00
Tanguy Pruvot
26b51a557b
Allow different intensity per device
...
and clean the old variables, no longer required
2015-01-24 11:17:29 +01:00
Tanguy Pruvot
768b5ccb76
import bmw512 uint2 changes from sp
...
+ some cleanup... 15KH/s won (750Ti)
2015-01-24 08:02:41 +01:00
Tanguy Pruvot
9f2dd3ee60
Remove some useless conversions
...
does not impact perfs either...
2015-01-24 08:00:22 +01:00
Tanguy Pruvot
2a5233f56e
api: report throughput when default
2015-01-22 06:28:59 +01:00
Tanguy Pruvot
cafd4477d7
Handle a maximum of 16 gpus (vs 8 before)
...
Some cards have 2 gpus on board...
2015-01-22 04:55:27 +01:00
Tanguy Pruvot
b521acb480
groestl: use sp bitslice enhancement, prepare SM 2.x variant
...
todo: simd512 SM 2.x variant (shfl op), and groestl/myriad functions
2015-01-19 00:42:14 +01:00
Tanguy Pruvot
ec5a48f420
x11: small simd512 gpu_expand improvement
2014-12-19 09:16:55 +01:00
Tanguy Pruvot
1e24e4899c
skein: uint2 optimisation with SM 3.0 compat (+15KH)
...
Thanks to sp and djm34 for this fast uint64 storage alternative
2014-12-16 13:52:54 +01:00
Tanguy Pruvot
2585e10814
keccak uint2 optimisation for SM>3.0 (x11 +40KH/s)
...
based on djm34 keccak 256-bit changes, and keep SM3.0 compat
affects most other algos too (quark, nist5, x13...)
2014-12-15 11:34:03 +01:00
Tanguy Pruvot
c3bdb623e8
Check and submit multiple nonces in one loop
...
Added to most algos, checkhash function scans a big range
and can find multiple nonces at once if the difficulty is low.
Stop ignoring them, submit second one if found...
Clean the draft code for rc=2 implemented for blake and pentablake
btw... fix the reduced displayed hashrate when a nonce is found...
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2014-12-05 15:53:40 +00:00
Tanguy Pruvot
118a6be361
checkhash: simplify the common function
...
use klaus trivial function, the old code has always been a bit weird..
split cuda_check_cpu_hash_64 in two functions, keep old for branched stuff
2014-12-01 00:20:40 +01:00
Tanguy Pruvot
c218c3f514
quark/anime: +100KH, bmw tpb was not correct
...
This small change also slightly improves the x11..17 algos
2014-11-28 22:18:48 +01:00
Tanguy Pruvot
8ad180cc70
various small changes
...
heavy: reduce by 256 threads default intensity to all -i 20
cuda: put static thread init bools outside the code (made once)
api: fix nvml header to build without
2014-11-28 20:57:35 +01:00
Tanguy Pruvot
6ae28162db
various extern cleanup + api history uids and gpu SM
...
uids could be useful to create graphs from history data
Note: please do a clean build after this commit (changes in miner.h)
2014-11-26 11:55:42 +01:00
Tanguy Pruvot
73f22b237a
Prepare trap of hardware/mem failures
2014-11-20 18:44:25 +01:00
Tanguy Pruvot
fe4ad36b73
intensity: sign warnings fixes min(i,u)
2014-11-17 14:48:55 +01:00
Tanguy Pruvot
c859041993
quark/blake512 opt. pointed by sp without asm
...
indeed, the pragma unroll doesn't always make things faster
asm part... to check later
2014-11-17 00:01:32 +01:00
Tanguy Pruvot
b128312efb
cuda: store device SM in a global var
...
sample usage made for blake and fugue (higher intensity for SM5.2)
add these to cuda_helper and clean unused code
2014-11-11 19:11:16 +01:00