ccminer-gostd-lite

Author	SHA1	Message	Date
Tanguy Pruvot	de738ccc2b	x11: secure groestl against possible cuda errors big cleanup...	2016-08-06 12:56:02 +02:00
Tanguy Pruvot	0a0fd33cac	attempt to reduce shared mem errors	2016-08-06 12:56:02 +02:00
Tanguy Pruvot	85c212eaad	implement x11evo algo Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>	2016-05-31 20:05:15 +02:00
Tanguy Pruvot	a237601747	1.7.1 release set schedule flags to reduce linux cpu usage without MyStreamSynchronize()	2016-01-26 20:43:16 +01:00
Tanguy Pruvot	76a22479b1	whirlpool midstate and debug/trace defines + new cuda_debug.cuh include to trace gpu data Happy new year! Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>	2016-01-01 10:40:26 +01:00
Tanguy Pruvot	8ceb5cfd65	sib: add missing algo free entry + opt 64	2016-01-01 07:58:59 +01:00
Tanguy Pruvot	e75b26feb4	sib coin algo (X11 + Streebog) Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>	2015-12-31 19:11:48 +01:00
Tanguy Pruvot	61ff92b5b4	never interrupt global benchmark with found nonces fix some algo weird hashrates (like blake) and reset device between algos, for better accuracy but this reset doesn't seem enough to bench all algos correctly... to test on linux, could be a driver issue... heavy: fix first alloc and indent with tabs...	2015-11-01 21:12:50 +01:00
Tanguy Pruvot	2308f555c3	simd: cleanup and ignore linux host warning	2015-11-01 13:35:36 +01:00
Tanguy Pruvot	0d9d3520ac	simd: add support for SM 2.1 devices Add support for x11..x17, s3, fresh and qubit Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>	2015-11-01 12:37:52 +01:00
Tanguy Pruvot	8d4d4d65ce	cuda: header for common kernel functions (quark/x11) Was thinking about doing that since months ;) lets go	2015-10-25 06:54:17 +01:00
Tanguy Pruvot	d43dc9a021	use blake512 sp kernels on SM 5+ (80+64) import and keep my code for older archs, like skein 64 reduce the gap between our versions... +150kH x11 GTX 960 / +30kH 750Ti +900kH quark GTX 960 / +230kH 750Ti	2015-10-24 13:43:22 +02:00
Tanguy Pruvot	ef817df79a	import sp skein512 unrolled 64-bytes kernel (+0,6% x11) Quark and S3 are now a bit faster (+1 %) x11 get +0.6 % (+20kH/s on a 750ti, +30kH on a 960) 80 bytes implementation to do/test ... (skein/skein2) but keep my previous version for older devices...	2015-10-23 09:43:20 +02:00
Tanguy Pruvot	355b835ae0	benchmark: enhance the mem leak detection reduce "false" warnings, and ignore unrelated/small ones <= 1 MB On windows the gpu memory can be allocated by other processes + some cleanup in algos... (free/gpulog)	2015-10-16 22:04:30 +02:00
Tanguy Pruvot	d195f2e8a2	intensity: do not reduce throughput before init Else the memory allocated could be less than required later btw, use the new "cuda" function to apply intensity/throughput	2015-10-11 05:01:41 +02:00
Tanguy Pruvot	922c2a5cd7	algos: free allocated mem for algo switch All can be freed properly now, except script (reset) and lyra2 (leak)	2015-10-08 21:35:30 +02:00
Tanguy Pruvot	ee93927fac	diff: use the new function in all algos	2015-10-07 20:10:15 +02:00
Tanguy Pruvot	e1c4b3042c	algos: add functions to free allocated resources Will be used later for algo switching not really tested yet...	2015-09-25 07:51:57 +02:00
Tanguy Pruvot	5308898d1c	start v1.7, apply new prototypes to all algos	2015-09-23 15:42:17 +02:00
Tanguy Pruvot	c5df142124	Add c11 algo (x11 variant) Used by Chaincoin and Flaxscript	2015-06-29 11:46:16 +02:00
Tanguy Pruvot	7981e83db7	nvml: separated vendor id to string function for the day nvidia will fix their nvmlDeviceGetPciInfo api..	2015-06-23 10:01:31 +02:00
Tanguy Pruvot	e21c75793a	Revert "x11: improve aes (shavite/echo)" make a lot of cpu validation errors on windows, to be double checked in the next version... This reverts commit 1187a6e7e3211f0216111554a55b685687003b11.	2015-06-23 09:27:40 +02:00
Tanguy Pruvot	1187a6e7e3	x11: improve aes (shavite/echo) shavite is faster, echo doesn't really change due to the reg. overload This change allows custom launch bounds without other code changes and improves the portability across different devices. also set a minimum throughput to 1024 for these algos (shared mem req. size)	2015-06-19 05:23:06 +02:00
Tanguy Pruvot	9f5744d4c0	luffa/cube: fine tuning of maxregcount for the 750Ti This allow to get 69 regs used (tested on linux) 69 or 72 make the compiler to use 64 regs which is not enough on the 750Ti for optimal performance...	2015-06-17 03:58:31 +02:00
Tanguy Pruvot	634bea21f5	luffa/cube: unroll 1 really required on the 9xx	2015-06-17 03:39:48 +02:00
Tanguy Pruvot	42bcb91ca0	x11: update sp luffa/cube to get closer x11 speeds.. i had to clean it... lot of unused defines...	2015-06-17 02:31:15 +02:00
Tanguy Pruvot	2113be6eec	blake80: some changes and launch bounds, no perf changes	2015-04-24 14:12:21 +02:00
Tanguy Pruvot	3d3f2e2cb5	warnings: use the right device id (device_map[thr_id])	2015-04-23 09:41:56 +02:00
Tanguy Pruvot	e7ae27137e	x11/qubit: remove some extra MyStreamSynchronize only one per loop is required to prevent 100% cpu usage	2015-04-15 05:30:22 +02:00
Tanguy Pruvot	d58d53f2b2	update README, small changes, prepare release 1.6.1 still need a SM 3.0 fix for skein...	2015-04-14 23:28:00 +02:00
Tanguy Pruvot	4f43abb402	bmw512: indent and restore SM 3.0 compat could be also the source of the problem seen with CUDA 7 restored the code before sp/klaus changes for SM 3.0 devices...	2015-03-28 12:01:50 +01:00
KlausT	ae8e863591	remove uint32_t cast	2015-03-12 01:01:47 +01:00
Tanguy Pruvot	35cc5908ee	windows: return to normal priority, fix json decref the jansson error seems only seen in windows debug mode	2015-03-10 19:14:15 +01:00
Tanguy Pruvot	ebd23bcc66	whirlpoolx: real fix for multi gpus Main problem was the arrays allocations which should be made per cpu Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>	2015-03-08 22:56:04 +01:00
Tanguy Pruvot	9c4158aadb	debug: x11 algo traces for cuda 7 problem	2015-03-02 16:29:46 +01:00
Tanguy Pruvot	e6112e878d	cleanup: use unsigned throughput parameters Yes, its a big commit, was waiting 1.6 to do that... Sorry for your possible merge issues ;)	2015-02-28 14:05:09 +01:00
Tanguy Pruvot	26b51a557b	Allow different intensity per device and clean the old variables, no more required	2015-01-24 11:17:29 +01:00
Tanguy Pruvot	2a5233f56e	api: report throughput when default	2015-01-22 06:28:59 +01:00
Tanguy Pruvot	cafd4477d7	Handle a maximum of 16 gpus (vs 8 before) Some cards have 2 gpus on board...	2015-01-22 04:55:27 +01:00
Tanguy Pruvot	b521acb480	groestl: use sp bitslice enhancement, prepare SM 2.x variant todo: simd512 SM 2.x variant (shfl op), and groestl/myriad functions	2015-01-19 00:42:14 +01:00
Tanguy Pruvot	90efbdcece	simd cleanup	2014-12-19 09:16:55 +01:00
Tanguy Pruvot	ec5a48f420	x11: small simd512 gpu_expand improvement	2014-12-19 09:16:55 +01:00
Tanguy Pruvot	6c7fce187b	x11: use KlausT optimisation (+20 KHs) But use a define in AES to use or not device initial memcpy I already tried to use everywhere direct device constants and its not faster for big arrays (difference is small) also change launch bounds to reduce spills (72 regs) to check on windows too, could improve the perf... or not	2014-12-06 04:14:36 +01:00
Tanguy Pruvot	c3bdb623e8	Check and submit multiple nonces in one loop Added to most algos, checkhash function scans a big range and can find multiple nonces at once if the difficulty is low. Stop ignoring them, submit second one if found... Clean the draft code for rc=2 implemented for blake and pentablake btw... fix the reduced displayed hashrate when a nonce is found... Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>	2014-12-05 15:53:40 +00:00
Tanguy Pruvot	f387898ead	Prepare multiple nonces support in one loop (if found) Tested on x11 which sometimes finds 3 nonces in one call, actually they are ignored because only the biggest was kept... This commit doesn't fix that, but will allow to enhance shares rate later...	2014-12-05 10:16:06 +01:00
Tanguy Pruvot	118a6be361	checkhash: simplify the common function use klaus trivial function, the old code has always been a bit weird.. split cuda_check_cpu_hash_64 in two functions, keep old for branched stuff	2014-12-01 00:20:40 +01:00
Tanguy Pruvot	8ad180cc70	various small changes heavy: reduce by 256 threads default intensity to all -i 20 cuda: put static thread init bools outside the code (made once) api: fix nvml header to build without	2014-11-28 20:57:35 +01:00
Tanguy Pruvot	6ae28162db	various extern cleanup + api history uids and gpu SM uids could be useful to create graphs from history data Note: please do a clean build after this commit (changes in miner.h)	2014-11-26 11:55:42 +01:00
Tanguy Pruvot	9b1ff1280e	Allow intermediate intensity (decimals) Sample with -i 18.5 Adding 131072 threads to intensity 18, 393216 cuda threads And with -i 19.5 Adding 262144 threads to intensity 19, 786432 cuda threads	2014-11-25 19:57:56 +01:00
Tanguy Pruvot	d0316220dd	simd512: restore full maxwell power (typo)	2014-11-23 21:19:35 +01:00
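
The sample figures in commit 9b1ff1280e (-i 18.5 giving 393216 cuda threads, -i 19.5 giving 786432) are consistent with taking 2^18 or 2^19 threads as a base and adding the fractional part of the intensity times that base. A minimal sketch in C, assuming that rule holds for other values too; the function name intensity_to_threads is hypothetical and this is not ccminer's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from ccminer): maps a decimal intensity to a
 * thread count as 2^whole + fraction * 2^whole, which reproduces the two
 * samples quoted in the commit message. */
uint32_t intensity_to_threads(double intensity)
{
    /* Keep the shift below in range for a 32-bit value. */
    assert(intensity >= 0.0 && intensity < 32.0);

    uint32_t whole = (uint32_t)intensity;          /* integer part, e.g. 18 */
    uint32_t base  = (uint32_t)1 << whole;         /* 2^18 = 262144 */
    double   frac  = intensity - (double)whole;    /* e.g. 0.5 */
    uint32_t extra = (uint32_t)(frac * (double)base); /* 131072 for 18.5 */

    return base + extra;
}
```

With this rule, an integer intensity such as 18 maps to exactly 2^18 threads, and each extra 0.5 adds half of that base.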


90 Commits