Commit Graph

184 Commits

Author SHA1 Message Date
Tanguy Pruvot
7a4e1bb327 Reduce keccak, deep & anime intensity + handle groestl -i param
The default intensity was the max supported by the card, and performance is
not really better. I prefer to leave it one step lower for cards with less
memory (1GB)
2014-11-10 18:08:23 +01:00
Tanguy Pruvot
98451267d8 vstudio: std::min fix 2014-11-10 17:06:39 +01:00
Tanguy Pruvot
7acf987aba Add intensity to last algos and fix quark speed 2014-11-10 16:56:03 +01:00
Tanguy Pruvot
a35b150b7f fix for jackpot hash
max nonce was too low (bad cpuminer copy/paste, I guess)

hash speed was also wrong... (it was divided by 2)
2014-11-10 14:22:10 +01:00
Tanguy Pruvot
2ab1e3700f update readme 2014-11-09 22:31:12 +01:00
Tanguy Pruvot
11c5ec810d Handle intensity param in all algos
and add a check related to start/max nonce params
2014-11-09 22:27:32 +01:00
Tanguy Pruvot
9f62014690 Add intensity parameter (-i 0:31)
Like cgminer, an intensity of n means 1 << n threads
if 0, we keep the default value defined in algo (19 for Xn algos)

19 = 524288 threads per gpu call

GTX 970 and 980 handle a higher number of threads compared to the 750 Ti

Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2014-11-09 22:21:11 +01:00
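
The mapping described in this commit can be sketched in plain C. This is only an illustration of the arithmetic in the message: the helper name `intensity_to_threads` and the `DEFAULT_INTENSITY` macro are hypothetical, and treating out-of-range values like 0 is an assumption, not something the commit states.

```c
#include <stdint.h>

/* Default for the Xn algos, as stated in the commit message. */
#define DEFAULT_INTENSITY 19

/* Hypothetical helper: map an -i value to the thread count per GPU call.
 * 0 keeps the algo default; out-of-range values are treated the same way
 * here (an assumption made for this sketch). */
static uint32_t intensity_to_threads(int intensity)
{
    if (intensity <= 0 || intensity > 31)
        intensity = DEFAULT_INTENSITY;
    return (uint32_t)1 << intensity;
}
```

With this sketch, `intensity_to_threads(19)` gives 524288, matching the figure in the message, and each extra step doubles the thread count.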
Tanguy Pruvot
987edf63f3 vstudio: fix launch_bounds intellisense warnings in ide 2014-11-09 20:51:24 +01:00
Tanguy Pruvot
8046284843 align: missed one aligned free of struct work (solo) 2014-11-09 20:15:45 +01:00
Tanguy Pruvot
149143d5cd Fix left value warning in SWAPDWORDS + groestl change 2014-11-09 13:23:31 +01:00
Tanguy Pruvot
a747e4ca0f blake512: use a new SWAPDWORDS asm func (0.05ms)
small improvement, do it on pentablake and heavy variants too

based on sp commit (but SWAP32 is already used for 32bit ints)
2014-11-09 01:26:55 +01:00
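
For readers unfamiliar with the operation, here is a portable C sketch of what a dword swap does, assuming SWAPDWORDS exchanges the high and low 32-bit halves of a 64-bit value as its name suggests. The commit itself uses a CUDA asm function; `swap_dwords` below is a hypothetical stand-in, not the committed code.

```c
#include <stdint.h>

/* Swap the two 32-bit halves of a 64-bit value.
 * Equivalent to a 64-bit rotation by 32 bits. */
static inline uint64_t swap_dwords(uint64_t x)
{
    return (x >> 32) | (x << 32);
}
```

Applying the function twice returns the original value, which makes it easy to sanity-check against an asm version.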
Tanguy Pruvot
2d98d127f8 groestl: enhance sp andmask optimisation
profile of quark_groestl512_gpu_hash_64_quad()
 before: 35.692ms
 sp : 35.151ms
 new: 35.061ms
2014-11-09 00:20:39 +01:00
Tanguy Pruvot
e7beac6b1c x11: tiny sp_ opt on jh512 (0.05ms)
modified a bit... (and removed the mixed DOS line endings, ^M)

also, remove the max reg count, now determined with __launch_bounds__
2014-11-09 00:20:39 +01:00
Tanguy Pruvot
4c3964539f Fix vc debug builds, missing symbols 2014-11-06 17:42:01 +01:00
sp-hash
5be6811dcf x11: echo and cubehash optimization
echo : 40.056ms -> 39.241ms
cube : 14.490ms -> 13.511ms

The cubehash change looks useless (__device__ code is generally inlined),
but the measurements show that the CUDA documentation is wrong...

tpruvot: fixed DOS line endings in echo,
and used my style for cuda function attributes
2014-11-06 15:17:26 +01:00
Tanguy Pruvot
12fafd5687 Try to reconnect on pool duplicates
reduce log announces and define uchar in miner.h
2014-11-04 15:14:24 +01:00
Tanguy Pruvot
5e8ff5226b update curl prebuilt libs to a light 7.38.0
curl built from tpruvot/curl-for-windows project with the HTTP_ONLY define

This project doesn't require SSH, LDAP and all the other internet protocols ;)

Removes 200KB from the final binaries
2014-11-04 14:47:28 +01:00
Tanguy Pruvot
187e293f71 blake: some fine tuning + cleanup 2014-11-03 20:55:03 +01:00
Tanguy Pruvot
5bc969fa57 Some work on data alignment
linux: add -march=native (we build it ourselves) and some other flags

+ remove unused vars (seen with -Wall)
2014-11-03 16:40:13 +01:00
Tanguy Pruvot
93bb428bdf blake: rewrite the cache system
Unlike other hash algos, blake256 computes the hash
in blocks of 64 bytes.

We can do the first part on the cpu, only the 4 last int32
are computed on gpu (including the tested nonce)

Previous method was also using this kind of cache with a crc.

Blake Hash Speed: +5%
2014-11-03 16:33:59 +01:00
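
The caching idea in this commit can be illustrated with a toy example. Everything below is an assumption made for the sketch: the header is taken to be 80 bytes (64 cached + 4 int32), `toy_compress` is a stand-in mixing function that is NOT blake256, and real padding is replaced by zero fill. It only shows that hashing the first block once and resuming from that state gives the same result as hashing the whole header for every nonce.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t h[8]; } toy_state;

static void toy_init(toy_state *s)
{
    for (int i = 0; i < 8; i++)
        s->h[i] = 0x811C9DC5u + (uint32_t)i;
}

/* Toy stand-in for a 64-byte block compression function (NOT blake256). */
static void toy_compress(toy_state *s, const uint8_t block[64])
{
    for (int i = 0; i < 64; i++) {
        uint32_t *w = &s->h[i & 7];
        *w = (*w ^ block[i]) * 16777619u;   /* FNV-style mixing */
        *w = (*w << 5) | (*w >> 27);        /* rotate left by 5 */
    }
}

/* Reference path: process both blocks for every nonce. */
static void toy_hash80(const uint8_t header[80], toy_state *out)
{
    uint8_t last[64] = {0};
    toy_init(out);
    toy_compress(out, header);              /* first 64 bytes */
    memcpy(last, header + 64, 16);          /* last 4 int32, zero padded */
    toy_compress(out, last);
}

/* Cached path, step 1: done once per job (on the CPU in the commit). */
static void toy_midstate(const uint8_t header[80], toy_state *mid)
{
    toy_init(mid);
    toy_compress(mid, header);
}

/* Cached path, step 2: done per nonce (on the GPU in the commit). */
static void toy_hash_tail(const toy_state *mid, const uint8_t tail[16],
                          toy_state *out)
{
    uint8_t last[64] = {0};
    *out = *mid;
    memcpy(last, tail, 16);
    toy_compress(out, last);
}
```

The per-nonce cost drops to a single block, which is where a speed gain of this kind would come from.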
Tanguy Pruvot
b191d713a0 s3: reduce a bit the intensity on windows 2014-10-26 11:18:59 +01:00
Tanguy Pruvot
f7849d36a1 Update README for 1.4.6 2014-10-26 09:43:32 +01:00
Tanguy Pruvot
6169bf683b Add S3 Algo (1Coin)
Simple addition of the algo using existing X11 code
2014-10-26 09:10:58 +01:00
Tanguy Pruvot
93f4409dde simd: then reindent the code
no changes, only error checks (cuda safe call)
2014-10-25 23:03:20 +02:00
Tanguy Pruvot
b465fe6825 optimize x11 simd512 (+100KH/s)
change picked from tsiv repo
2014-10-25 22:15:43 +02:00
Tanguy Pruvot
1b241df5c0 cubehash and luffa funnel shift (from klaus)
No gain... but I like this define, more readable in luffa ;)
2014-10-20 19:06:27 +02:00
Tanguy Pruvot
2de9b1375b prepare next version 2014-10-20 19:00:44 +02:00
Tanguy Pruvot
7bdebdb5ff README fixes 2014-10-20 06:34:57 +02:00
Tanguy Pruvot
db8681c1db update readme and fix SM 3.0 build 2014-10-20 06:27:02 +02:00
Tanguy Pruvot
f737f7f0cb Fix usage and big strings on windows (colors rel.)
vsnprintf doesn't return the required length on Windows on failure, so use _vscprintf
2014-10-20 05:39:48 +02:00
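
A minimal sketch of the portability issue mentioned here: C99 `vsnprintf(NULL, 0, ...)` returns the length the output needs, while older MSVC runtimes return an error on truncation, so `_vscprintf` is used to get the length there. The helper names `formatted_length` and `format_alloc` are hypothetical and are not taken from the commit.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Number of characters (without the trailing NUL) the formatted
 * string needs, or a negative value on error. */
static int formatted_length(const char *fmt, va_list ap)
{
    va_list copy;
    int len;
    va_copy(copy, ap);                     /* keep the caller's list usable */
#ifdef _MSC_VER
    len = _vscprintf(fmt, copy);           /* MSVC CRT length query */
#else
    len = vsnprintf(NULL, 0, fmt, copy);   /* C99 length query */
#endif
    va_end(copy);
    return len;
}

/* Format into a heap buffer that is exactly large enough.
 * Returns NULL on failure; the caller frees the result. */
static char *format_alloc(const char *fmt, ...)
{
    va_list ap;
    char *buf = NULL;
    int len;

    va_start(ap, fmt);
    len = formatted_length(fmt, ap);
    if (len >= 0)
        buf = malloc((size_t)len + 1);
    if (buf)
        vsnprintf(buf, (size_t)len + 1, fmt, ap);
    va_end(ap);
    return buf;
}
```

Sizing the buffer from the measured length avoids truncating long strings such as a usage text.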
Tanguy Pruvot
1ee1462011 msvc: fix the LTCG warning 2014-10-20 05:39:44 +02:00
Tanguy Pruvot
d8a23fa970 Tune quark part of Xn funcs
based on klaus commits, will slightly increase the speed of most algos

PS: main increase is due to the register count tuning in Makefile

and for skein512 on linux, it's the ROTL64

but almost no change on X11: 2648kH/s vs 2630 before
2014-10-20 03:15:17 +02:00
Tanguy Pruvot
0720797f1b Add proper keccak-256 (maxcoin)
Cleaned from djm34 repo, tuned for the 750 Ti
2014-10-17 06:46:20 +02:00
Tanguy Pruvot
cdc29336f7 stats: compute work difficulty from target 2014-09-30 10:03:12 +02:00
Tanguy Pruvot
9f3c6b0520 Include windows curl and openssl prebuilt libs
Curl 7.35 without SSH2
OpenSSL 1.0.1e
ZLib 1.2.8

built with https://github.com/peters/curl-for-windows
2014-09-30 06:25:38 +02:00
Tanguy Pruvot
4f326576d2 implement X-Mining-Hashrate header
remove midstate extension, seems only used in sha256/scrypt

and prepare noncerange, need a pool which supports that to finish...
2014-09-29 08:24:12 +02:00
Tanguy Pruvot
799b230af2 enhance solo mining, update http headers
and prepare next version...
2014-09-28 15:34:44 +02:00
Tanguy Pruvot
c0b5513316 Try some obscure cuda flags (kbomba)
http://devblogs.nvidia.com/parallelforall/separate-compilation-linking-cuda-device-code/
2014-09-27 13:58:29 +02:00
Tanguy Pruvot
a6fcc8fdb6 use cudart_static.lib, keep SM 5.0 by default
SM 5.2 works also on the 750 Ti but if we specify both at compile time,
hash speed will be reduced (the 750Ti will use 5.2 which is not optimal)
2014-09-27 12:53:19 +02:00
Tanguy Pruvot
5579b91cfb build for both GM104 and GM204
For the GTX 750 and new 970/980

also fix -a luffa parameter for 1.4.4 release
2014-09-27 09:46:52 +02:00
Tanguy Pruvot
ba33492592 blake: return to ptarget 6:7 compare
clz can give wrong results, e.g. 0xE0 vs 0xF0 (same number of leading zeros)
2014-09-19 05:01:16 +02:00
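
The 0xE0 vs 0xF0 case can be reproduced with a small C sketch. It assumes the earlier shortcut compared leading-zero counts of the hash and target words; the functions `clz32`, `passes_clz` and `passes_exact` are hypothetical illustrations, not the code from either commit.

```c
#include <stdint.h>

/* Portable count of leading zero bits in a 32-bit word (32 for x == 0). */
static int clz32(uint32_t x)
{
    int n = 0;
    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Approximate test: accept when the hash word has at least as many
 * leading zeros as the target word. */
static int passes_clz(uint32_t hash_hi, uint32_t target_hi)
{
    return clz32(hash_hi) >= clz32(target_hi);
}

/* Exact test on the same word. */
static int passes_exact(uint32_t hash_hi, uint32_t target_hi)
{
    return hash_hi <= target_hi;
}
```

Both 0xE0 and 0xF0 have 24 leading zeros in a 32-bit word, so a hash word of 0xF0 passes the clz test against a target of 0xE0 even though it is above the target.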
Tanguy Pruvot
91eea0d76b blake: remove int cudaMemcpyToSymbol for MSVC
use clz (leading zeros) asm func for a fast gpu compare of ptarget[6]:[7]

also add the missing Windows ctz/clz host functions

New NEOS speed: 227MH/s to 270MH/s (Gigabyte 750Ti Black Edition)
2014-09-13 17:31:01 +02:00
Tanguy Pruvot
9efe0b965d blake: only use high part of target on gpu
Add another few MH/s boost :)
2014-09-13 00:15:34 +02:00
Tanguy Pruvot
cc296a0618 stratum: check if job was read 2014-09-13 00:15:25 +02:00
Tanguy Pruvot
8925a7551f blake: final cleanup (225MH/s) 2014-09-11 20:16:16 +02:00
Tanguy Pruvot
347d4e4928 blake: +8MH/s on linux, weird optimisation
Like doom/luffa, using an int pos makes the proc faster
2014-09-11 02:33:34 +02:00
Tanguy Pruvot
23f0cee61f Add cuda error checks on qubit algos
And rename doom to luffa, like djm34
2014-09-11 02:20:52 +02:00
Tanguy Pruvot
1aec4555cc Tune reg. count for qubit (luffa) algos 2014-09-11 00:50:27 +02:00
Tanguy Pruvot
31f77b6524 Put block height extraction in a function 2014-09-10 16:50:17 +02:00
Tanguy Pruvot
edf756deb5 update readme 2014-09-10 10:49:41 +02:00