Tanguy Pruvot
11dbbcc12d
checkhash: some work on a faster variant (wip)
...
This should not be used for all algos... not enabled yet
todo: multiple nonces or blake32 style checkup
10 years ago
Tanguy Pruvot
b128312efb
cuda: store device SM in a global var
...
sample usage made for blake and fugue (higher intensity for SM5.2)
add these to cuda_helper and clean unused code
10 years ago
Tanguy Pruvot
987edf63f3
vstudio: fix launch_bounds intellisense warnings in ide
10 years ago
Tanguy Pruvot
149143d5cd
Fix left value warning in SWAPDWORDS + groestl change
10 years ago
Tanguy Pruvot
a747e4ca0f
blake512: use a new SWAPDWORDS asm func (0.05ms)
...
small improvement, do it on pentablake and heavy variants too
based on sp commit (but SWAP32 is already used for 32bit ints)
10 years ago
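Note on SWAPDWORDS: the commit above mentions an asm helper, but its body is not shown in this log. As an illustration only, assuming the helper swaps the high and low 32-bit halves of a 64-bit word (which is what the name suggests), a portable C equivalent might look like the sketch below. The names swap_dwords and swap_dwords_selfcheck are hypothetical and not taken from the repository.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable stand-in for an asm SWAPDWORDS helper:
 * exchange the upper and lower 32-bit halves of a 64-bit word.
 * This is the same as rotating the word by 32 bits. */
static uint64_t swap_dwords(uint64_t x)
{
    return (x >> 32) | (x << 32);
}

/* Small self-check: the halves trade places, and swapping twice
 * gives back the original value. */
void swap_dwords_selfcheck(void)
{
    assert(swap_dwords(0x0123456789ABCDEFULL) == 0x89ABCDEF01234567ULL);
    assert(swap_dwords(swap_dwords(0x0123456789ABCDEFULL)) == 0x0123456789ABCDEFULL);
}
```

On a GPU the same operation can be written as a register pair move, which is presumably where the asm version saves time; that detail is an assumption, not something this log confirms.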
Tanguy Pruvot
5bc969fa57
Some work on data alignment
...
linux: add -march=native (we build it ourselves) and some other flags
+ remove unused vars (seen with -Wall)
10 years ago
Tanguy Pruvot
2de9b1375b
prepare next version
10 years ago
Tanguy Pruvot
d8a23fa970
Tune quark part of Xn funcs
...
based on klaus commits, will slightly increase the speed of most algos
PS: main increase is due to the register count tuning in Makefile
and for skein512 on linux, it's the ROTL64
but almost no changes on X11 : 2648MH/s vs 2630 before
10 years ago
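Note on ROTL64: the log does not show how ROTL64 is defined. Assuming it is the usual 64-bit rotate-left, a minimal portable C sketch is given below. The names rotl64 and rotl64_selfcheck are hypothetical, and the masking of the shift count is a choice made here to avoid undefined behaviour, not something taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable 64-bit rotate-left.
 * The count is reduced modulo 64, and the right-shift amount is
 * masked as well, so n == 0 never produces a shift by 64. */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63u;
    return (x << n) | (x >> ((64u - n) & 63u));
}

/* Small self-check: the top bit wraps around to the bottom,
 * and a rotation by 0 leaves the value unchanged. */
void rotl64_selfcheck(void)
{
    assert(rotl64(0x8000000000000000ULL, 1) == 1ULL);
    assert(rotl64(0x0123456789ABCDEFULL, 0) == 0x0123456789ABCDEFULL);
}
```

Most compilers recognise this pattern and emit a single rotate instruction on CPUs that have one; how it maps to GPU instructions depends on the target and is not covered by this sketch.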
Tanguy Pruvot
ba33492592
blake: return to ptarget 6:7 compare
...
clz can be erroneous, e.g. 0xE0 vs 0xF0
10 years ago
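Note on the clz compare: the 0xE0 vs 0xF0 case above can be reproduced on the host. The sketch below assumes the shortcut accepted a hash word whenever it had at least as many leading zeros as the target word. That reading is an assumption, since the kernel code is not shown in this log. All function names here (clz32, passes_clz, passes_exact, clz_compare_selfcheck) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-leading-zeros for a 32-bit value; returns 32 for 0. */
static int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Assumed shortcut: compare only the number of leading zeros. */
static int passes_clz(uint32_t hash_word, uint32_t target_word)
{
    return clz32(hash_word) >= clz32(target_word);
}

/* Exact test: the hash word must not exceed the target word. */
static int passes_exact(uint32_t hash_word, uint32_t target_word)
{
    return hash_word <= target_word;
}

/* 0xE0 and 0xF0 both have 24 leading zeros, so the shortcut lets
 * 0xF0 through against a target of 0xE0 although 0xF0 > 0xE0. */
void clz_compare_selfcheck(void)
{
    assert(clz32(0xE0u) == clz32(0xF0u));
    assert(passes_clz(0xF0u, 0xE0u) == 1);
    assert(passes_exact(0xF0u, 0xE0u) == 0);
}
```

A leading-zero count is still usable as a cheap first filter, provided any candidate it lets through is rechecked with the exact compare.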
Tanguy Pruvot
91eea0d76b
blake: remove int cudaMemcpyToSymbol for MSVC
...
use clz (leading zeros) asm func for a fast gpu compare of ptarget[6]:[7]
also add the missing windows ctz/clz host functions
New NEOS speed: 227MH to 270MH (Gigabyte 750Ti Black Edition)
10 years ago
Tanguy Pruvot
c3eb66683a
Import djm34 qubit, deep and doom algos
...
Indent, and put commonly used function prototypes in cuda_helper.h
And add them to the --cputest function
Also change the color option to --nocolor, -C is no longer needed
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
(Who is tired of removing these German copy/pasted comments)
10 years ago
Tanguy Pruvot
13bb9d267e
Remove debug rpc, already exists with -P
10 years ago
Tanguy Pruvot
64e8cd3f98
add x17 algo, cleaned djm34 commit
...
todo: visual studio...
10 years ago
Tanguy Pruvot
3f6ebc10cc
whirlpool: x64 asm is very slow (30ms win32 vs 90)
10 years ago
Tanguy Pruvot
912ef1215d
small reg tunes, rename whirlcoin to whirl
10 years ago
Tanguy Pruvot
1fbcbbacc4
Add whirlcoin and optimize x11 luffa (maxrregcount)
10 years ago
Tanguy Pruvot
4bc23048b5
x15: use djm34 code with asm xor64 + my rot64
...
some optimizations could be done later, after whirlcoin integration
10 years ago
Tanguy Pruvot
d9ea5f72ce
Remove duplicated defines present in cuda_helper.h
...
also add cudaDeviceReset() on Ctrl+C for nvprof
10 years ago
Tanguy Pruvot
a9a3ad8afc
cuda: check for errors on cuda mem alloc
10 years ago
Christian Buchner
f22ae4ebde
forgot this file in previous commit
11 years ago