83 Commits

Author SHA1 Message Date
Tanguy Pruvot
c859041993 quark/blake512 optimization pointed out by sp, without asm
Indeed, #pragma unroll doesn't always make things faster

asm part: to check later
2014-11-17 00:01:32 +01:00
Tanguy Pruvot
438308b3a2 Rework benchmark mode and min/max range
Maybe it was my fault, but benchmark mode
always restarted from nonce 0.

Also fix blake when -d 1 is used (one thread, but on the second GPU)

stats: do not use the thread id as key, prefer the GPU id...
2014-11-16 23:28:18 +01:00
Tanguy Pruvot
11dbbcc12d checkhash: some work on a faster variant (wip)
This should not be used for all algos... not enabled yet

TODO: multiple nonces or a blake32-style check
2014-11-16 17:37:02 +01:00
Tanguy Pruvot
14a41959f8 x11: switch to intensity 20 for SM>=5.2 750+970 2014-11-16 17:34:50 +01:00
Tanguy Pruvot
fdd5d29071 x11: shavite and echo from sp (now ok on win32)
The previous echo commit only increased Linux performance and reduced
Windows performance compared to 1.4.9. This one seems to give at least
1.4.9 performance on Windows, and the same on Linux...

The shavite optimisation seems OK on both (it now uses 64 registers)

launch_bounds forces the register count, so the Linux-specific
Makefile rules are removed...

manual "cherry pick" with fixed line endings and some adaptations
2014-11-16 17:34:50 +01:00
sp-hash
e18a54e8fc sp echo optimisation + cleanup
Original commit:
Removed sharedmem and reduced calculations by precalculating (ECHO hash).
750 Ti: +20 KHash/s (X11)

tpruvot notes:
The real gain is closer to 10 KH/s at stock clocks (but it is real)
Launch bounds disabled: no perf increase with 64 registers
2014-11-16 03:08:46 +01:00
Tanguy Pruvot
b128312efb cuda: store device SM in a global var
Sample usage added for blake and fugue (higher intensity for SM 5.2)

Add these to cuda_helper and clean up unused code
2014-11-11 19:11:16 +01:00
Tanguy Pruvot
11c5ec810d Handle intensity param in all algos
and add a check on the start/max nonce params
2014-11-09 22:27:32 +01:00
sp-hash
5be6811dcf x11: echo and cubehash optimization
echo : 40.056ms -> 39.241ms
cube : 14.490ms -> 13.511ms

The cubehash change looks useless (__device__ code is generally inlined),
but in practice it shows that the CUDA documentation is wrong...

tpruvot: fixed DOS line endings in echo,
and used my style for CUDA function attributes
2014-11-06 15:17:26 +01:00
Tanguy Pruvot
b191d713a0 s3: reduce the intensity a bit on Windows 2014-10-26 11:18:59 +01:00
Tanguy Pruvot
6169bf683b Add S3 Algo (1Coin)
Simple addition of the algo using existing X11 code
2014-10-26 09:10:58 +01:00
Tanguy Pruvot
93f4409dde simd: then reindent the code
No functional changes, only error checks (CUDA safe call)
2014-10-25 23:03:20 +02:00
Tanguy Pruvot
b465fe6825 optimize x11 simd512 (+100KH/s)
change picked from tsiv repo
2014-10-25 22:15:43 +02:00
Tanguy Pruvot
1b241df5c0 cubehash and luffa funnel shift (from klaus)
No gain... but I like this define, it's more readable in luffa ;)
2014-10-20 19:06:27 +02:00
Tanguy Pruvot
d8a23fa970 Tune quark part of Xn funcs
Based on klaus's commits, this slightly increases the speed of most algos

PS: the main increase is due to the register count tuning in the Makefile

and for skein512 on Linux, it's the ROTL64

but almost no change on X11: 2648MH/s vs 2630 before
2014-10-20 03:15:17 +02:00
Tanguy Pruvot
7cc5222394 Move common check_cpu functions to root 2014-09-10 00:27:01 +02:00
Tanguy Pruvot
95ac1d0f19 x11: adapt some blake 256 opts to 512 one
blake512: for the moment 6.2ms vs 7.12 before (+10%)
2014-09-09 17:55:07 +02:00
Tanguy Pruvot
b4e690b486 sources: switch to UTF-8 2014-08-21 08:27:48 +02:00
Tanguy Pruvot
912ef1215d small reg tunes, rename whirlcoin to whirl 2014-08-21 02:57:10 +02:00
Tanguy Pruvot
1fbcbbacc4 Add whirlcoin and optimize x11 luffa (maxrregcount) 2014-08-20 07:49:22 +02:00
Tanguy Pruvot
194fda87c1 x11: restore simd host2dev memcpytosymbol to reduce used cmem
Remove define attempts for SM 2.1 devices; Fermi is not compatible
2014-08-19 18:32:14 +02:00
Tanguy Pruvot
bc2eb75758 Add fresh algo (based on djm34 code)
Cleaned up and adapted to my changes (cputest added)

Remove Makefile.in, which should be in .gitignore

(Please refresh it with ./config.sh to compile on Linux)
2014-08-19 18:31:26 +02:00
Tanguy Pruvot
d9ea5f72ce Remove duplicated defines present in cuda_helper.h
Also add cudaDeviceReset() on Ctrl+C for nvprof
2014-08-19 03:29:11 +02:00
Tanguy Pruvot
a9a3ad8afc cuda: check for errors on cuda mem alloc 2014-08-17 22:41:05 +02:00
Tanguy Pruvot
6984a001d6 Win32 build fix after linux work (configure) 2014-08-15 03:59:49 +02:00
Tanguy Pruvot
cf7351d138 x10 funcs cleanup, we don't need host constant tables 2014-08-15 03:40:13 +02:00
Tanguy Pruvot
9d3d09103b Try to restore compat with 2.1 devices (GTX 460) 2014-08-12 18:07:50 +02:00
Tanguy Pruvot
06763c20b1 Implement x14 (cuda + cpu functions)
Project was updated for VS2013 and CUDA SDK 6.5

Also add a --cputest option to dump CPU hash results

TODO: x15 is not fully functional, but the first loop seems OK

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2014-08-12 14:47:03 +02:00
Christian Buchner
d99b91ea65 adding third-party X13 and Diamond Groestl code contributions. 2014-06-15 14:31:20 +02:00
Christian Buchner
3b21069504 bump to revision V1.1 with Killer Groestl 2014-06-14 01:43:28 +02:00
Christian Buchner
be5ba30131 massive speed upgrade for the SIMD hash. AMD, be afraid. 2014-05-14 11:04:09 +02:00
Christian Buchner
44d38e3a9a Simplification of the SIMD hash code (remove unnecessary lookup tables), increase X11 throughput value somewhat 2014-05-11 02:24:26 +02:00
Christian Buchner
af07302b4b v1.0 - Yo, I heard y'all like X11 2014-05-10 00:29:59 +02:00