Tanguy Pruvot
4709668995
jh512: rewrite and optimize with asm swap
...
5% improvement from the vshl asm swap functions (mixed shl+add instructions).
Also add an xchg(x, y) function and an XCHG(x, y) define in cuda_helper for later use...
The other jh changes are mainly cosmetic...
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2015-06-16 08:20:48 +02:00
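A minimal host-side sketch of what an exchange helper pair like this could look like. The bodies below are assumptions for illustration; the actual xchg()/XCHG() in cuda_helper may be written differently (for example with device qualifiers or inline asm).

```cpp
#include <cstdint>

// Hypothetical macro form: swaps two lvalues of the same type.
#define XCHG(x, y) do { auto tmp_ = (x); (x) = (y); (y) = tmp_; } while (0)

// Hypothetical function form; device code would add
// __device__ __forceinline__ qualifiers.
inline void xchg(uint32_t &x, uint32_t &y)
{
    uint32_t tmp = x;
    x = y;
    y = tmp;
}
```

Both forms do the same job; the macro works on any assignable type, while the function is type-checked.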
Tanguy Pruvot
52df82917a
cuda: fix uint2 subtract operator
2015-05-29 14:32:13 +02:00
Tanguy Pruvot
7bf256c81c
cuda_helper: define UINT32_MAX if not defined
...
It does not seem to be defined on Slackware...
2015-05-12 18:05:09 +02:00
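A guard of this kind is usually a few preprocessor lines. The sketch below assumes the fallback is the literal maximum of a 32-bit unsigned integer; the value actually used in cuda_helper is not shown in this log.

```cpp
#include <cstdint>

// Fallback for toolchains whose headers do not provide the macro.
// 0xFFFFFFFF is the largest value a 32-bit unsigned integer can hold.
#ifndef UINT32_MAX
#define UINT32_MAX 0xFFFFFFFFU
#endif
```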
Tanguy Pruvot
2f541065fb
cuda_helper: correctly rename the hiword/loword functions
2015-05-12 17:13:58 +02:00
Tanguy Pruvot
b35a6742fe
cuda_helper: properly ifdef for vstudio c++ compat
2015-05-12 05:33:57 +02:00
Tanguy Pruvot
7c7f40a634
neoscrypt: attempt to recode shift256R for SM 3.0
2015-05-08 23:42:24 +02:00
Tanguy Pruvot
1ad34dc13d
reset: take care of multi-threaded gpus (-d 0,0)
...
to be tested... could create problems when reset in a chain like x11...
2015-04-21 09:12:43 +02:00
Tanguy Pruvot
38e6672d70
Allow test of SM 2.1/3.0 binaries on newer cards
...
Implementation based on klausT's work, with some differences.
This code must be placed in a common .cu file:
cuda.cpp is not compiled with nvcc and doesn't allow CUDA code...
2015-03-28 12:00:53 +01:00
Tanguy Pruvot
7939dce0aa
pluck: adaptation from djm repo
...
The CPU validation check remains to be done...
Throughput for this algo is divided by 128 to keep the same kind of intensity values (default 18.0)
2015-03-08 15:16:11 +01:00
Tanguy Pruvot
3ed1c552bd
cuda: always disable asm for host code
2015-03-05 18:15:52 +01:00
Tanguy Pruvot
e6112e878d
cleanup: use unsigned throughput parameters
...
Yes, it's a big commit; I was waiting for 1.6 to do that...
Sorry for your possible merge issues ;)
2015-02-28 14:05:09 +01:00
Tanguy Pruvot
768b5ccb76
import bmw512 uint2 changes from sp
...
+ some cleanup... 15KH/s gained (750Ti)
2015-01-24 08:02:41 +01:00
Tanguy Pruvot
9f2dd3ee60
Remove some useless conversions
...
does not impact performance either...
2015-01-24 08:00:22 +01:00
Tanguy Pruvot
cafd4477d7
Handle a maximum of 16 gpus (vs 8 before)
...
Some cards have 2 gpus on board...
2015-01-22 04:55:27 +01:00
Tanguy Pruvot
b3188669e2
lyra2: cleanup
...
quickly tested with a SM 3.0 binary...
2014-12-20 13:10:33 +01:00
Tanguy Pruvot
da2e2528a7
uint2: fix SM 3.0 ROR and ROL
...
Not sure it's the fastest way, but it works for offsets 0-63 and for 64.
Also note that the SM 3.5+ asm doesn't support ROR with offset 64
2014-12-19 21:45:40 +01:00
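To illustrate the offset handling described above, here is a host-side sketch of a 64-bit rotate built only from 32-bit operations on two words. The struct U2 stands in for CUDA's uint2, and the names ror2/rol2 and the low/high word order are assumptions of this sketch, not the code from the commit.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for CUDA's uint2: x is the low word, y the high word (assumed).
struct U2 { uint32_t x; uint32_t y; };

inline U2 to_u2(uint64_t v) { return U2{ (uint32_t)v, (uint32_t)(v >> 32) }; }
inline uint64_t to_u64(U2 v) { return ((uint64_t)v.y << 32) | v.x; }

// Rotate right by 0..64 bits using only 32-bit shifts.
inline U2 ror2(U2 v, unsigned offset)
{
    assert(offset <= 64);
    offset &= 63;                 // 64 is a full turn, same as 0
    if (offset >= 32) {           // rotating by 32 swaps the two words
        uint32_t t = v.x; v.x = v.y; v.y = t;
        offset -= 32;
    }
    if (offset == 0) return v;    // avoids an undefined 32-bit shift by 32
    U2 r;
    r.x = (v.x >> offset) | (v.y << (32 - offset));
    r.y = (v.y >> offset) | (v.x << (32 - offset));
    return r;
}

// Rotate left expressed through the rotate right above.
inline U2 rol2(U2 v, unsigned offset)
{
    assert(offset <= 64);
    return ror2(v, (64 - offset) & 63);
}
```

The two special cases are the point: offset 32 is a pure word swap, and offsets 0 and 64 must return the input unchanged, because a 32-bit shift by 32 is undefined in C++.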
Tanguy Pruvot
c5b349e079
Add Lyra2 algo, based on Vertcoin published code
...
Seems to be djm34's work, I recognize the code style ;)
Code was cleaned/indented and adapted to my fork...
Only usable on the test pool until 16 December 2014!
2014-12-06 11:28:26 +01:00
Tanguy Pruvot
c3bdb623e8
Check and submit multiple nonces in one loop
...
Added to most algos: the checkhash function scans a big range
and can find multiple nonces at once if the difficulty is low.
Stop ignoring them and submit the second one if found...
Clean up the draft rc=2 code implemented for blake and pentablake.
Also fix the reduced displayed hashrate when a nonce is found...
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2014-12-05 15:53:40 +00:00
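The idea of keeping more than one nonce per scan can be sketched on the host as follows. toy_hash() and scan_nonces() are hypothetical names with a deliberately trivial hash; the real checkhash code runs on the GPU and compares full hashes against the target.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a real hash; hypothetical, for illustration only.
inline uint32_t toy_hash(uint32_t nonce) { return (nonce * 7u) % 16u; }

// Scan [start, start + count) and keep up to max_results nonces whose
// hash is at or below the target, instead of keeping only one.
inline std::vector<uint32_t> scan_nonces(uint32_t start, uint32_t count,
                                         uint32_t target, std::size_t max_results)
{
    std::vector<uint32_t> found;
    for (uint32_t i = 0; i < count && found.size() < max_results; i++) {
        uint32_t nonce = start + i;
        if (toy_hash(nonce) <= target)
            found.push_back(nonce);
    }
    return found;
}
```

With max_results set to 2, a caller would submit found[0] and, when present, found[1], which matches the "submit the second one if found" behaviour described in the message.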
Tanguy Pruvot
f387898ead
Prepare multiple nonces support in one loop (if found)
...
Tested on x11, which sometimes finds 3 nonces in one call;
currently they are ignored because only the biggest is kept...
This commit doesn't fix that, but will allow improving the share rate later...
2014-12-05 10:16:06 +01:00
Tanguy Pruvot
118a6be361
checkhash: simplify the common function
...
Use klaus' trivial function; the old code has always been a bit weird.
Split cuda_check_cpu_hash_64 into two functions, keeping the old one for branched stuff
2014-12-01 00:20:40 +01:00
Tanguy Pruvot
6ae28162db
various extern cleanup + api history uids and gpu SM
...
uids could be useful to create graphs from history data
Note: please do a clean build after this commit (changes in miner.h)
2014-11-26 11:55:42 +01:00
sp-hash
26b9fe3586
faster x15, +23KH or 4ms on whirlpool (30ms vs 34ms)
...
tpruvot: I didn't pick the asm replace_hiword, it is slower on Linux
2014-11-20 19:19:27 +01:00
Tanguy Pruvot
73f22b237a
Prepare trap of hardware/mem failures
2014-11-20 18:44:25 +01:00
Tanguy Pruvot
11dbbcc12d
checkhash: some work on a faster variant (wip)
...
This should not be used for all algos... not enabled yet
todo: multiple nonces or blake32-style check
2014-11-16 17:37:02 +01:00
Tanguy Pruvot
b128312efb
cuda: store device SM in a global var
...
sample usage made for blake and fugue (higher intensity for SM5.2)
add these to cuda_helper and clean unused code
2014-11-11 19:11:16 +01:00
Tanguy Pruvot
987edf63f3
vstudio: fix launch_bounds intellisense warnings in ide
2014-11-09 20:51:24 +01:00
Tanguy Pruvot
149143d5cd
Fix left value warning in SWAPDWORDS + groestl change
2014-11-09 13:23:31 +01:00
Tanguy Pruvot
a747e4ca0f
blake512: use a new SWAPDWORDS asm func (0.05ms)
...
small improvement, do it on pentablake and heavy variants too
based on sp commit (but SWAP32 is already used for 32bit ints)
2014-11-09 01:26:55 +01:00
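For reference, swapping the two 32-bit halves of a 64-bit value can be written portably as below. This is a plain C++ sketch with a hypothetical lower-case name; the SWAPDWORDS in the commit is an asm function and is not reproduced here.

```cpp
#include <cstdint>

// Exchange the high and low 32-bit words of a 64-bit value.
// Equivalent to a 64-bit rotate by 32 bits.
inline uint64_t swapdwords(uint64_t value)
{
    return (value >> 32) | (value << 32);
}
```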
Tanguy Pruvot
5bc969fa57
Some work on data alignment
...
linux: add -march=native (we build it ourselves) and some other flags
+ remove unused vars (seen with -Wall)
2014-11-03 16:40:13 +01:00
Tanguy Pruvot
2de9b1375b
prepare next version
2014-10-20 19:00:44 +02:00
Tanguy Pruvot
d8a23fa970
Tune quark part of Xn funcs
...
Based on klaus' commits; will slightly increase the speed of most algos.
PS: the main increase is due to the register count tuning in the Makefile
and, for skein512 on Linux, to the ROTL64,
but almost no change on X11: 2648MH/s vs 2630 before
2014-10-20 03:15:17 +02:00
Tanguy Pruvot
ba33492592
blake: return to ptarget 6:7 compare
...
clz can give a wrong result, e.g. 0xE0 vs 0xF0 (same leading-zero count, different values)
2014-09-19 05:01:16 +02:00
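The 0xE0 vs 0xF0 case can be reproduced with a small host-side sketch. clz32(), passes_by_clz() and passes_exact() are hypothetical helpers written for this illustration; how the blake kernel actually used clz on ptarget is not shown in this log.

```cpp
#include <cstdint>

// Portable count of leading zero bits; returns 32 for zero.
inline int clz32(uint32_t v)
{
    int n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && (v & mask) == 0; mask >>= 1)
        n++;
    return n;
}

// Approximate test: only compares the number of leading zeros.
inline bool passes_by_clz(uint32_t hash, uint32_t target)
{
    return clz32(hash) >= clz32(target);
}

// Exact test: plain integer comparison.
inline bool passes_exact(uint32_t hash, uint32_t target)
{
    return hash <= target;
}
```

Both 0xE0 and 0xF0 have 24 leading zeros in a 32-bit word, so the clz-based test accepts a hash of 0xF0 against a target of 0xE0 even though 0xF0 is larger. A direct comparison of the target words avoids that.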
Tanguy Pruvot
91eea0d76b
blake: remove int cudaMemcpyToSymbol for MSVC
...
use clz (leading zeros) asm func for a fast gpu compare of ptarget[6]:[7]
also add the missing Windows ctz/clz host functions
New NEOS speed: 227MH to 270MH (Gigabyte 750Ti Black Edition)
2014-09-13 17:31:01 +02:00
Tanguy Pruvot
c3eb66683a
Import djm34 qubit, deep and doom algos
...
Indent, and put commonly used function prototypes in cuda_helper.h
And add them to the --cputest function
Also change the color option to --nocolor, -C is no longer needed
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
(Who is tired of removing these German copy/pasted comments)
2014-09-10 00:26:55 +02:00
Tanguy Pruvot
13bb9d267e
Remove debug rpc, already exists with -P
2014-09-09 21:59:03 +02:00
Tanguy Pruvot
64e8cd3f98
add x17 algo, cleaned djm34 commit
...
todo: visual studio...
2014-08-23 22:44:17 +02:00
Tanguy Pruvot
3f6ebc10cc
whirlpool: x64 asm is very slow (30ms win32 vs 90)
2014-08-22 04:09:16 +02:00
Tanguy Pruvot
912ef1215d
small reg tunes, rename whirlcoin to whirl
2014-08-21 02:57:10 +02:00
Tanguy Pruvot
1fbcbbacc4
Add whirlcoin and optimize x11 luffa (maxrregcount)
2014-08-20 07:49:22 +02:00
Tanguy Pruvot
4bc23048b5
x15: use djm34 code with asm xor64 + my rot64
...
some optimizations could be done later, after whirlcoin integration
2014-08-20 05:54:47 +02:00
Tanguy Pruvot
d9ea5f72ce
Remove duplicated defines present in cuda_helper.h
...
also add cudaDeviceReset() on Ctrl+C for nvprof
2014-08-19 03:29:11 +02:00
Tanguy Pruvot
a9a3ad8afc
cuda: check for errors on cuda mem alloc
2014-08-17 22:41:05 +02:00
Christian Buchner
f22ae4ebde
forgot this file in previous commit
2014-05-03 21:09:43 +02:00