mirror of https://github.com/GOSTSec/ccminer synced 2025-01-15 01:00:19 +00:00

50 Commits

Author SHA1 Message Date
Tanguy Pruvot
dad0110557 x17 cleanup
haval256 is now 2x faster, but sha512 perf depends a lot on cuda version...
2016-05-09 16:34:18 +02:00
Tanguy Pruvot
82a7e62b30 skein: cleanup, strip uint2x4.h + update vstudio 2015-10-23 13:32:18 +02:00
Tanguy Pruvot
ef817df79a import sp skein512 unrolled 64-bytes kernel (+0.6% x11)
Quark and S3 are now a bit faster (+1 %)
x11 get +0.6 % (+20kH/s on a 750ti, +30kH on a 960)

80 bytes implementation to do/test ... (skein/skein2)

but keep my previous version for older devices...
2015-10-23 09:43:20 +02:00
Tanguy Pruvot
ab5cc7162e refactor: create bench.cpp and algos.h
Also enhance multi-thread benchmark synchro. with pthread barriers
2015-10-11 00:10:27 +02:00
Tanguy Pruvot
e1c4b3042c algos: add functions to free allocated resources
Will be used later for algo switching

not really tested yet...
2015-09-25 07:51:57 +02:00
Tanguy Pruvot
d4e191610e Import and adapt lyra2v2
not tested on windows and with SM <= 5
2015-08-18 09:27:11 +02:00
Tanguy Pruvot
15293d063f remove pluck algo
Supcoin seems.... dead and the algo was not supported on all devices
2015-06-28 20:48:23 +02:00
Tanguy Pruvot
4709668995 jh512: rewrite and optimize with asm swap
5% improvement by the vshl asm swap functions, mixed shl+add inst.,

Add also xchg(x, y) func and XCHG(x, y) define in cuda_helper for later use...

other jh changes are mainly for the beauty of the code...

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-06-16 08:20:48 +02:00
Tanguy Pruvot
52df82917a cuda: fix uint2 subtract operator 2015-05-29 14:32:13 +02:00
Tanguy Pruvot
7bf256c81c cuda_helper: define UINT32_MAX if not defined
seems not defined on slackware...
2015-05-12 18:05:09 +02:00
Tanguy Pruvot
2f541065fb cuda_helper: rename correctly hiword/loword functions 2015-05-12 17:13:58 +02:00
Tanguy Pruvot
b35a6742fe cuda_helper: properly ifdef for vstudio c++ compat 2015-05-12 05:33:57 +02:00
Tanguy Pruvot
7c7f40a634 neoscrypt: attempt to recode shift256R for SM 3.0 2015-05-08 23:42:24 +02:00
Tanguy Pruvot
1ad34dc13d reset: take care of multi-threaded gpus (-d 0,0)
to be tested... could create problems when reset in a chain like x11...
2015-04-21 09:12:43 +02:00
Tanguy Pruvot
38e6672d70 Allow test of SM 2.1/3.0 binaries on newer cards
Implementation based on klausT work.. a bit different

This code must be placed in a common .cu file,
cuda.cpp is not compiled with nvcc and doesn't allow cuda code...
2015-03-28 12:00:53 +01:00
Tanguy Pruvot
7939dce0aa pluck: adaptation from djm repo
remains the cpu validation check to do...

throughput for this algo is divided by 128 to keep same kind of intensity values (default 18.0)
2015-03-08 15:16:11 +01:00
Tanguy Pruvot
3ed1c552bd cuda: always disable asm for host code 2015-03-05 18:15:52 +01:00
Tanguy Pruvot
e6112e878d cleanup: use unsigned throughput parameters
Yes, it's a big commit, was waiting for 1.6 to do that...
Sorry for your possible merge issues ;)
2015-02-28 14:05:09 +01:00
Tanguy Pruvot
768b5ccb76 import bmw512 uint2 changes from sp
+ some cleanup... 15KH/s won (750Ti)
2015-01-24 08:02:41 +01:00
Tanguy Pruvot
9f2dd3ee60 Remove some useless conversions
does not impact performance either...
2015-01-24 08:00:22 +01:00
Tanguy Pruvot
cafd4477d7 Handle a maximum of 16 gpus (vs 8 before)
Some cards have 2 gpus on board...
2015-01-22 04:55:27 +01:00
Tanguy Pruvot
b3188669e2 lyra2: cleanup
quickly tested with a SM 3.0 binary...
2014-12-20 13:10:33 +01:00
Tanguy Pruvot
da2e2528a7 uint2: fix SM 3.0 ROR and ROL
Not sure it's the fastest way, but it works for offsets 0-63 + 64

Also note that asm SM 3.5+ doesn't support ROR with offset 64
2014-12-19 21:45:40 +01:00
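
The rotation described in the commit above can be illustrated with a host-side sketch. This is not the code from the commit: `U2` is a hypothetical stand-in for CUDA's `uint2` (x = low word, y = high word), and `ror2`/`rol2` are illustrative names. It only shows one way a 64-bit rotate can be built from two 32-bit halves so that offsets 0-63 and 64 all behave correctly.

```cpp
#include <cstdint>

// Hypothetical stand-in for CUDA's built-in uint2 (x = low word, y = high word).
struct U2 { uint32_t x, y; };

static inline U2 to_u2(uint64_t v) { return { (uint32_t)v, (uint32_t)(v >> 32) }; }
static inline uint64_t to_u64(U2 v) { return ((uint64_t)v.y << 32) | v.x; }

// Rotate a 64-bit value (held as two 32-bit words) right by n, for n in 0..64.
static inline U2 ror2(U2 a, unsigned n) {
    n &= 63;                       // an offset of 64 behaves like 0
    if (n >= 32) {                 // swapping the halves is a rotation by 32
        uint32_t t = a.x; a.x = a.y; a.y = t;
        n -= 32;
    }
    if (n == 0) return a;          // avoids an undefined 32-bit shift by 32
    U2 r;
    r.x = (a.x >> n) | (a.y << (32 - n));
    r.y = (a.y >> n) | (a.x << (32 - n));
    return r;
}

// Rotate left by n, expressed as a right rotation by (64 - n) mod 64.
static inline U2 rol2(U2 a, unsigned n) {
    return ror2(a, (64 - (n & 63)) & 63);
}
```

The explicit `n == 0` branch matters because shifting a 32-bit word by 32 is undefined in C++, which is the kind of edge case the commit message hints at for offset 64.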
Tanguy Pruvot
c5b349e079 Add Lyra2 algo, based on Vertcoin published code
Seems to be djm34's work, I recognize the code style ;)

Code was cleaned/indented and adapted to my fork...

Only usable on the test pool until 16 December 2014!
2014-12-06 11:28:26 +01:00
Tanguy Pruvot
c3bdb623e8 Check and submit multiple nonces in one loop
Added to most algos, checkhash function scans a big range
and can find multiple nonces at once if the difficulty is low.

Stop ignoring them, submit second one if found...

Clean the draft code for rc=2 implemented for blake and pentablake

btw... fix the reduced displayed hashrate when a nonce is found...

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2014-12-05 15:53:40 +00:00
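
As a rough illustration of the idea in the commit above (collecting more than one valid nonce from a single scan instead of keeping only one), here is a minimal host-side sketch. Everything in it is an assumption made for illustration: `toy_hash` is a placeholder and not a real mining hash, and `scan_range` is a hypothetical name, not the project's checkhash function.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder "hash" used only to make the sketch deterministic (assumption).
static inline uint32_t toy_hash(uint32_t nonce) { return nonce % 1000u; }

// Scan `count` nonces starting at `first` and return every nonce whose hash
// is <= target, stopping once `max_found` nonces have been collected.
static std::vector<uint32_t> scan_range(uint32_t first, uint32_t count,
                                        uint32_t target,
                                        uint32_t (*hash)(uint32_t),
                                        std::size_t max_found) {
    std::vector<uint32_t> found;
    for (uint32_t i = 0; i < count && found.size() < max_found; ++i) {
        uint32_t nonce = first + i;
        if (hash(nonce) <= target)
            found.push_back(nonce);   // keep it instead of overwriting an earlier hit
    }
    return found;
}
```

With a low difficulty (a high target) several nonces in one range can qualify, and the caller can then submit each entry of the returned list.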
Tanguy Pruvot
f387898ead Prepare multiple nonces support in one loop (if found)
Tested on x11, which sometimes finds 3 nonces in one call;
currently they are ignored because only the biggest was kept...

This commit doesn't fix that, but will allow enhancing the share rate later...
2014-12-05 10:16:06 +01:00
Tanguy Pruvot
118a6be361 checkhash: simplify the common function
use klaus trivial function, the old code has always been a bit weird..

split cuda_check_cpu_hash_64 in two functions, keep old for branched stuff
2014-12-01 00:20:40 +01:00
Tanguy Pruvot
6ae28162db various extern cleanup + api history uids and gpu SM
uids could be useful to create graphs from history data

Note: please do a clean build after this commit (changes in miner.h)
2014-11-26 11:55:42 +01:00
sp-hash
26b9fe3586 faster x15, +23KH or 4ms on whirlpool (30ms vs 34ms)
tpruvot: I didn't pick the asm replace_hiword, slower on linux
2014-11-20 19:19:27 +01:00
Tanguy Pruvot
73f22b237a Prepare trap of hardware/mem failures 2014-11-20 18:44:25 +01:00
Tanguy Pruvot
11dbbcc12d checkhash: some work on a faster variant (wip)
This should not be used for all algos... not enabled yet

todo: multiple nonces or blake32 style checkup
2014-11-16 17:37:02 +01:00
Tanguy Pruvot
b128312efb cuda: store device SM in a global var
sample usage made for blake and fugue (higher intensity for SM5.2)

add these to cuda_helper and clean unused code
2014-11-11 19:11:16 +01:00
Tanguy Pruvot
987edf63f3 vstudio: fix launch_bounds intellisense warnings in ide 2014-11-09 20:51:24 +01:00
Tanguy Pruvot
149143d5cd Fix left value warning in SWAPDWORDS + groestl change 2014-11-09 13:23:31 +01:00
Tanguy Pruvot
a747e4ca0f blake512: use a new SWAPDWORDS asm func (0.05ms)
small improvement, do it on pentablake and heavy variants too

based on sp commit (but SWAP32 is already used for 32bit ints)
2014-11-09 01:26:55 +01:00
Tanguy Pruvot
5bc969fa57 Some work on data alignment
linux: add -march=native (we build it ourselves) and some other flags

+ remove unused vars (seen with -Wall)
2014-11-03 16:40:13 +01:00
Tanguy Pruvot
2de9b1375b prepare next version 2014-10-20 19:00:44 +02:00
Tanguy Pruvot
d8a23fa970 Tune quark part of Xn funcs
based on klaus commits, will increase a bit speed of most algos

PS: main increase is due to the register count tuning in Makefile

and for skein512 on linux, it's the ROTL64

but almost no changes on X11 : 2648MH/s vs 2630 before
2014-10-20 03:15:17 +02:00
Tanguy Pruvot
ba33492592 blake: return to ptarget 6:7 compare
clz can be erroneous, ex 0xE0 vs 0xF0
2014-09-19 05:01:16 +02:00
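
The 0xE0 vs 0xF0 example from the commit above can be checked with a small host-side sketch. The function names are hypothetical and the comparison is simplified to a single 32-bit word; it only shows why a leading-zero count cannot replace an exact compare.

```cpp
#include <cstdint>

// Portable count of leading zero bits in a 32-bit word (returns 32 for 0).
static inline int clz32(uint32_t v) {
    int n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && (v & bit) == 0; bit >>= 1)
        ++n;
    return n;
}

// Shortcut: accept when the hash word has at least as many leading zeros
// as the target word (illustrative, not the project's code).
static inline bool passes_clz(uint32_t hash_word, uint32_t target_word) {
    return clz32(hash_word) >= clz32(target_word);
}

// Exact check: accept only when the hash word does not exceed the target word.
static inline bool passes_exact(uint32_t hash_word, uint32_t target_word) {
    return hash_word <= target_word;
}
```

0xE0 and 0xF0 both have 24 leading zeros as 32-bit words, so with a target of 0xE0 the shortcut accepts a hash word of 0xF0 even though it is above the target, while the exact compare rejects it.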
Tanguy Pruvot
91eea0d76b blake: remove int cudaMemcpyToSymbol for MSVC
use clz (leading zeros) asm func for a fast gpu compare of ptarget[6]:[7]

add also missing windows ctz/clz host functions

New NEOS speed: 227MH to 270MH (Gigabyte 750Ti Black Edition)
2014-09-13 17:31:01 +02:00
Tanguy Pruvot
c3eb66683a Import djm34 qubit, deep and doom algos
Indent, and put commonly used functions proto. in cuda_helper.h

And add them to --cputest function

Also change the color option to --nocolor, -C is no longer needed

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
(Who is tired of removing these German copy/pasted comments)
2014-09-10 00:26:55 +02:00
Tanguy Pruvot
13bb9d267e Remove debug rpc, already exists with -P 2014-09-09 21:59:03 +02:00
Tanguy Pruvot
64e8cd3f98 add x17 algo, cleaned djm34 commit
todo: visual studio...
2014-08-23 22:44:17 +02:00
Tanguy Pruvot
3f6ebc10cc whirlpool: x64 asm is very slow (30ms win32 vs 90) 2014-08-22 04:09:16 +02:00
Tanguy Pruvot
912ef1215d small reg tunes, rename whirlcoin to whirl 2014-08-21 02:57:10 +02:00
Tanguy Pruvot
1fbcbbacc4 Add whirlcoin and optimize x11 luffa (maxrregcount) 2014-08-20 07:49:22 +02:00
Tanguy Pruvot
4bc23048b5 x15: use djm34 code with asm xor64 + my rot64
some optimizations could be done later, after whirlcoin integration
2014-08-20 05:54:47 +02:00
Tanguy Pruvot
d9ea5f72ce Remove duplicated defines present in cuda_helper.h
also add cudaDeviceReset() on Ctrl+C for nvprof
2014-08-19 03:29:11 +02:00
Tanguy Pruvot
a9a3ad8afc cuda: check for errors on cuda mem alloc 2014-08-17 22:41:05 +02:00
Christian Buchner
f22ae4ebde forgot this file in previous commit 2014-05-03 21:09:43 +02:00