Tanguy Pruvot
7a4e1bb327
Reduce keccak, deep & anime intensity + handle groestl -i param
...
The default intensity was the maximum supported by the card, and
performance is not really better. I prefer to leave it one step lower
for cards with less memory (1GB)
2014-11-10 18:08:23 +01:00
Tanguy Pruvot
98451267d8
vstudio: std::min fix
2014-11-10 17:06:39 +01:00
Tanguy Pruvot
7acf987aba
Add intensity to last algos and fix quark speed
2014-11-10 16:56:03 +01:00
Tanguy Pruvot
a35b150b7f
fix for jackpot hash
...
max nonce was too low (bad cpu miner copy/paste, I guess)
hash speed was also wrong (it was divided by 2)
2014-11-10 14:22:10 +01:00
Tanguy Pruvot
2ab1e3700f
update readme
2014-11-09 22:31:12 +01:00
Tanguy Pruvot
11c5ec810d
Handle intensity param in all algos
...
and add a check related to start/max nonce params
2014-11-09 22:27:32 +01:00
Tanguy Pruvot
9f62014690
Add intensity parameter (-i 0:31)
...
Like cgminer, the thread count equals 1 << n
If 0, we keep the default value defined in the algo (19 for Xn algos)
19 = 524288 threads per gpu call
GTX 970 and 980 handle a higher number of threads compared to the 750 Ti
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
2014-11-09 22:21:11 +01:00
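The mapping described in this commit can be sketched as follows. This is a minimal illustration, not the miner's actual code: the helper name `threads_for_intensity` and its `algo_default` parameter are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: maps the -i value to a thread count per GPU call.
// An intensity of 0 means "keep the default defined by the algorithm".
static uint32_t threads_for_intensity(unsigned intensity, unsigned algo_default)
{
    // 1 << 32 would overflow a 32-bit count, so both values must stay <= 31.
    assert(intensity <= 31 && algo_default >= 1 && algo_default <= 31);
    unsigned n = (intensity == 0) ? algo_default : intensity;
    return UINT32_C(1) << n;   // n = 19 gives 524288 threads
}
```

With the default of 19 quoted above, each step up doubles the work per call, so -i 20 would ask for 1048576 threads.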
Tanguy Pruvot
987edf63f3
vstudio: fix launch_bounds intellisense warnings in ide
2014-11-09 20:51:24 +01:00
Tanguy Pruvot
8046284843
align: missed one aligned free of struct work (solo)
2014-11-09 20:15:45 +01:00
Tanguy Pruvot
149143d5cd
Fix left value warning in SWAPDWORDS + groestl change
2014-11-09 13:23:31 +01:00
Tanguy Pruvot
a747e4ca0f
blake512: use a new SWAPDWORDS asm func (0.05ms)
...
small improvement, do it on pentablake and heavy variants too
based on sp commit (but SWAP32 is already used for 32bit ints)
2014-11-09 01:26:55 +01:00
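For reference, a dword swap exchanges the high and low 32-bit halves of a 64-bit value. The commit does this with an asm function on the GPU; the portable host-side equivalent below is only an assumption about what the operation computes, and the names `swap_dwords` and `check_swap_roundtrip` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Portable sketch of a 64-bit dword swap: the high 32 bits become the
// low 32 bits and vice versa (a rotation by 32 bits).
static inline uint64_t swap_dwords(uint64_t x)
{
    return (x << 32) | (x >> 32);
}

// Swapping twice must give back the original value.
static inline void check_swap_roundtrip(uint64_t x)
{
    assert(swap_dwords(swap_dwords(x)) == x);
}
```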
Tanguy Pruvot
2d98d127f8
groestl: enhance sp andmask optimisation
...
profile of quark_groestl512_gpu_hash_64_quad()
before: 35.692ms
sp : 35.151ms
new: 35.061ms
2014-11-09 00:20:39 +01:00
Tanguy Pruvot
e7beac6b1c
x11: tiny sp_ opt on jh512 (0.05ms)
...
modified a bit.. (and removed the mixed dos end of lines ^M)
also, remove the max reg count, now determined with __launch_bounds__
2014-11-09 00:20:39 +01:00
Tanguy Pruvot
4c3964539f
Fix vc debug builds, missing symbols
2014-11-06 17:42:01 +01:00
sp-hash
5be6811dcf
x11: echo and cubehash optimization
...
echo : 40.056ms -> 39.241ms
cube : 14.490ms -> 13.511ms
the cubehash change looks useless (__device__ code is generally inlined)
but the measurements show that the cuda documentation is wrong here...
tpruvot: fixed dos line endings in echo,
and used my style for cuda function attributes
2014-11-06 15:17:26 +01:00
Tanguy Pruvot
12fafd5687
Try to reconnect on pool duplicates
...
reduce log messages and define uchar in miner.h
2014-11-04 15:14:24 +01:00
Tanguy Pruvot
5e8ff5226b
update curl prebuilt libs to a light 7.38.0
...
curl built from tpruvot/curl-for-windows project with the HTTP_ONLY define
This project doesn't require SSH, LDAP and all the other protocols ;)
Removes 200KB from the final binaries
2014-11-04 14:47:28 +01:00
Tanguy Pruvot
187e293f71
blake: some fine tuning + cleanup
2014-11-03 20:55:03 +01:00
Tanguy Pruvot
5bc969fa57
Some work on data alignment
...
linux: add -march=native (we build it ourselves) and some other flags
+ remove unused vars (seen with -Wall)
2014-11-03 16:40:13 +01:00
Tanguy Pruvot
93bb428bdf
blake: rewrite the cache system
...
Unlike other hash algos, blake256 computes the hash
in blocks of 64 bytes.
We can do the first part on the cpu; only the last 4 int32
(including the tested nonce) are computed on the gpu.
The previous method also used this kind of cache, with a crc.
Blake Hash Speed: +5%
2014-11-03 16:33:59 +01:00
Tanguy Pruvot
b191d713a0
s3: reduce a bit the intensity on windows
2014-10-26 11:18:59 +01:00
Tanguy Pruvot
f7849d36a1
Update README for 1.4.6
2014-10-26 09:43:32 +01:00
Tanguy Pruvot
6169bf683b
Add S3 Algo (1Coin)
...
Simple addition of the algo using existing X11 code
2014-10-26 09:10:58 +01:00
Tanguy Pruvot
93f4409dde
simd: then reindent the code
...
no changes, only error checks (cuda safe call)
2014-10-25 23:03:20 +02:00
Tanguy Pruvot
b465fe6825
optimize x11 simd512 (+100KH/s)
...
change picked from tsiv repo
2014-10-25 22:15:43 +02:00
Tanguy Pruvot
1b241df5c0
cubehash and luffa funnel shift (from klaus)
...
No gain... but I like this define, more readable in luffa ;)
2014-10-20 19:06:27 +02:00
Tanguy Pruvot
2de9b1375b
prepare next version
2014-10-20 19:00:44 +02:00
Tanguy Pruvot
7bdebdb5ff
README fixes
2014-10-20 06:34:57 +02:00
Tanguy Pruvot
db8681c1db
update readme and fix SM 3.0 build
2014-10-20 06:27:02 +02:00
Tanguy Pruvot
f737f7f0cb
Fix usage and big strings on windows (colors rel.)
...
vsnprintf doesn't return the length on windows on failure, so use _vscprintf
2014-10-20 05:39:48 +02:00
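A portable way to get the required string length could look like the sketch below. The helper name `formatted_length` is hypothetical and this is not the code from the commit; it only relies on `_vscprintf` from the MSVC CRT and on the C99 behaviour of `vsnprintf` with a null buffer.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

// Hypothetical helper: number of characters needed to format the arguments,
// excluding the terminating NUL, or a negative value on error.
static int formatted_length(const char *fmt, ...)
{
    assert(fmt != nullptr);
    va_list ap;
    va_start(ap, fmt);
#ifdef _MSC_VER
    int len = _vscprintf(fmt, ap);             // MSVC CRT: counts, writes nothing
#else
    int len = vsnprintf(nullptr, 0, fmt, ap);  // C99: returns the required length
#endif
    va_end(ap);
    return len;
}
```

The caller can then allocate `len + 1` bytes and format for real, instead of guessing a buffer size for long usage strings.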
Tanguy Pruvot
1ee1462011
msvc: fix the LTCG warning
2014-10-20 05:39:44 +02:00
Tanguy Pruvot
d8a23fa970
Tune quark part of Xn funcs
...
based on klaus commits, will slightly increase the speed of most algos
PS: main increase is due to the register count tuning in Makefile
and for skein512 on linux, it's the ROTL64
but almost no change on X11: 2648 kH/s vs 2630 before
2014-10-20 03:15:17 +02:00
Tanguy Pruvot
0720797f1b
Add proper keccak-256 (maxcoin)
...
Cleaned from djm34 repo, tuned for the 750 Ti
2014-10-17 06:46:20 +02:00
Tanguy Pruvot
cdc29336f7
stats: compute work difficulty from target
2014-09-30 10:03:12 +02:00
Tanguy Pruvot
9f3c6b0520
Include windows curl and openssl prebuilt libs
...
Curl 7.35 without SSH2
OpenSSL 1.0.1e
ZLib 1.2.8
built with https://github.com/peters/curl-for-windows
2014-09-30 06:25:38 +02:00
Tanguy Pruvot
4f326576d2
implement X-Mining-Hashrate header
...
remove midstate extension, seems only used in sha256/scrypt
and prepare noncerange, need a pool which supports that to finish...
2014-09-29 08:24:12 +02:00
Tanguy Pruvot
799b230af2
enhance solo mining, update http headers
...
and prepare next version...
2014-09-28 15:34:44 +02:00
Tanguy Pruvot
c0b5513316
Try some obscure cuda flags (kbomba)
...
http://devblogs.nvidia.com/parallelforall/separate-compilation-linking-cuda-device-code/
2014-09-27 13:58:29 +02:00
Tanguy Pruvot
a6fcc8fdb6
use cudart_static.lib, keep SM 5.0 by default
...
SM 5.2 works also on the 750 Ti but if we specify both at compile time,
hash speed will be reduced (the 750Ti will use 5.2 which is not optimal)
2014-09-27 12:53:19 +02:00
Tanguy Pruvot
5579b91cfb
build for both GM107 and GM204
...
For the GTX 750 and new 970/980
also fix -a luffa parameter for 1.4.4 release
2014-09-27 09:46:52 +02:00
Tanguy Pruvot
ba33492592
blake: return to ptarget 6:7 compare
...
clz can give wrong results, e.g. 0xE0 vs 0xF0 (same leading zero count)
2014-09-19 05:01:16 +02:00
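The 0xE0 vs 0xF0 case can be reproduced on the host with the sketch below. It is one plausible reading of the problem, not the kernel code: `clz32` is a portable stand-in for the GPU clz instruction, and the other function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Portable leading-zero count for 32-bit values (stand-in for the GPU clz).
static int clz32(uint32_t x)
{
    int n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1)
        ++n;
    return n;   // 32 when x == 0
}

// Approximate test: only compares the number of leading zero bits.
static bool passes_clz_test(uint32_t hash_hi, uint32_t target_hi)
{
    return clz32(hash_hi) >= clz32(target_hi);
}

// Exact test: plain integer comparison of the high words.
static bool passes_exact_test(uint32_t hash_hi, uint32_t target_hi)
{
    return hash_hi <= target_hi;
}

// 0xF0 and 0xE0 have the same leading-zero count, so the clz test accepts
// a hash word of 0xF0 against a target word of 0xE0 although 0xF0 > 0xE0.
static void demo_false_positive()
{
    assert(passes_clz_test(0xF0u, 0xE0u));
    assert(!passes_exact_test(0xF0u, 0xE0u));
}
```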
Tanguy Pruvot
91eea0d76b
blake: remove int cudaMemcpyToSymbol for MSVC
...
use the clz (count leading zeros) asm func for a fast gpu compare of ptarget[6]:[7]
also add the missing windows ctz/clz host functions
New NEOS speed: 227MH to 270MH (Gigabyte 750Ti Black Edition)
2014-09-13 17:31:01 +02:00
Tanguy Pruvot
9efe0b965d
blake: only use high part of target on gpu
...
Add another few MH/s boost :)
2014-09-13 00:15:34 +02:00
Tanguy Pruvot
cc296a0618
stratum: check if job was read
2014-09-13 00:15:25 +02:00
Tanguy Pruvot
8925a7551f
blake: final cleanup (225MH/s)
2014-09-11 20:16:16 +02:00
Tanguy Pruvot
347d4e4928
blake: +8MH/s on linux, weird optimisation
...
Like doom/luffa, using an int pos makes the proc faster
2014-09-11 02:33:34 +02:00
Tanguy Pruvot
23f0cee61f
Add cuda error checks on qubit algos
...
And rename doom to luffa, like djm34
2014-09-11 02:20:52 +02:00
Tanguy Pruvot
1aec4555cc
Tune reg. count for qubit (luffa) algos
2014-09-11 00:50:27 +02:00
Tanguy Pruvot
31f77b6524
Put block height extraction in a function
2014-09-10 16:50:17 +02:00
Tanguy Pruvot
edf756deb5
update readme
2014-09-10 10:49:41 +02:00