ccminer

mirror of https://github.com/GOSTSec/ccminer synced 2025-01-24 13:34:19 +00:00

Author	SHA1	Message	Date
Tanguy Pruvot	26c7316a08	vstudio: clean and fix blake ifdef for x64 the allocated var was not used... sigh	2015-10-24 18:21:45 +02:00
Tanguy Pruvot	2d83f74a7e	vstudio: special ifdef for the constant (bmw)	2015-10-24 15:13:35 +02:00
Tanguy Pruvot	d43dc9a021	use blake512 sp kernels on SM 5+ (80+64) import and keep my code for older archs, like skein 64 reduce the gap between our versions... +150kH x11 GTX 960 / +30kH 750Ti +900kH quark GTX 960 / +230kH 750Ti	2015-10-24 13:43:22 +02:00
Tanguy Pruvot	957d919a6a	bmw512: save a few KBs, ifdef 80-bytes kernel was only used by animecoin Also ifdef SM 3.0 compat. code to be ignored on recent archs	2015-10-24 07:30:57 +02:00
Tanguy Pruvot	3b7ef923c7	lyra2(v1): use a common uint2x4 include lyrav2 still need more definitions (uint16)	2015-10-23 15:25:24 +02:00
Tanguy Pruvot	82a7e62b30	skein: cleanup, strip uint2x4.h + update vstudio	2015-10-23 13:32:18 +02:00
Tanguy Pruvot	ef817df79a	import sp skein512 unrolled 64-bytes kernel (+0,6% x11) Quark and S3 are now a bit faster (+1 %) x11 get +0.6 % (+20kH/s on a 750ti, +30kH on a 960) 80 bytes implementation to do/test ... (skein/skein2) but keep my previous version for older devices...	2015-10-23 09:43:20 +02:00
Tanguy Pruvot	5bf1f98200	various fixes for SM 2.1 and the benchmark X11+ algos and quark are not compatible for the moment but these ones are : Benchmark results for Gigabyte GTX 460 (SM 2.1 / 1 GB): blakecoin : 159090.5 kH/s, 1 MB, 1048576 thr. blake : 70208.9 kH/s, 1 MB, 1048576 thr. bmw : 122802.6 kH/s, 65 MB, 2097152 thr. deep : 3533.6 kH/s, 33 MB, 524288 thr. fugue256 : 43177.9 kH/s, 17 MB, 524288 thr. heavy : 4118.2 kH/s, 147 MB, 524032 thr. keccak : 18673.1 kH/s, 129 MB, 2097152 thr. luffa : 28816.0 kH/s, 257 MB, 4194304 thr. lyra2 : 213.7 kH/s, 570 MB, 65536 thr. mjollnir : 3895.6 kH/s, 147 MB, 524032 thr. nist5 : 1101.4 kH/s, 67 MB, 1048576 thr. penta : 501.6 kH/s, 21 MB, 327680 thr. skein : 5432.4 kH/s, 65 MB, 1048576 thr. skein2 : 6788.9 kH/s, 33 MB, 524288 thr. whirlpool : 688.5 kH/s, 33 MB, 524288 thr. zr5 : 122.5 kH/s, 86 MB, 262144 thr.	2015-10-14 02:59:54 +00:00
Tanguy Pruvot	d195f2e8a2	intensity: do not reduce throughput before init Else the memory allocated could be less than required later btw, use the new "cuda" function to apply intensity/throughput	2015-10-11 05:01:41 +02:00
Tanguy Pruvot	4e1e03b891	benchmark: store all algos results + cuda fixes Note: lyra2, lyra2v2 and scrypt seem to have problems coexisting with other algos... to run after some of them... moved lyra2 first and skip scrypt/jane for the moment... Only stored in memory for now.. to display a table after the bench ccminer -a auto --benchmark Results may be exported later to a json file...	2015-10-09 02:07:08 +02:00
Tanguy Pruvot	922c2a5cd7	algos: free allocated mem for algo switch All can be freed properly now, except scrypt (reset) and lyra2 (leak)	2015-10-08 21:35:30 +02:00
Tanguy Pruvot	ee93927fac	diff: use the new function in all algos	2015-10-07 20:10:15 +02:00
Tanguy Pruvot	e1c4b3042c	algos: add functions to free allocated resources Will be used later for algo switching not really tested yet...	2015-09-25 07:51:57 +02:00
Tanguy Pruvot	5308898d1c	start v1.7, apply new prototypes to all algos	2015-09-23 15:42:17 +02:00
Tanguy Pruvot	e3548f46f3	drop animecoin support no more really minable... just minable in french	2015-08-22 12:35:22 +02:00
Tanguy Pruvot	4709668995	jh512: rewrite and optimize with asm swap 5% improvement by the vshl asm swap functions, mixed shl+add inst., Add also xchg(x, y) func and XCHG(x, y) define in cuda_helper for later use... other jh changes are mainly for the beauty of the code... Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>	2015-06-16 08:20:48 +02:00
Tanguy Pruvot	a55b148ecc	windows: fix missing off_t include	2015-06-08 16:58:12 +02:00
Tanguy Pruvot	ed4927fcd0	quark/x11: set signed int hashPosition vars to off_t groestl (and keccak?) seems faster with 64bit vars (off_t or int64_t)...	2015-06-05 22:03:05 +02:00
Tanguy Pruvot	ebe95aac2f	bmw512: cleanup after cuda 7 bug fix	2015-05-29 14:32:23 +02:00
Tanguy Pruvot	0224d4705e	skein: fix wrong hashes seen on x11 with cuda 7 Looks like a stream sync problem, not related to cuda 7 headers or cudart The threadfence() added doesn't change performance, and could also be related to the random cpu validation errors... so keep it for all. Note: the 80-bytes variant used in skein2 doesn't seem affected.	2015-05-29 12:16:54 +02:00
Tanguy Pruvot	123fe287b6	x11: temporary workaround for cuda 7.0	2015-05-28 21:19:24 +02:00
Tanguy Pruvot	d9b0312897	x64: fix some size_t warnings	2015-05-17 04:56:42 +02:00
Tanguy Pruvot	051ba521be	skein2: minimal host changes	2015-05-14 19:38:03 +02:00
Tanguy Pruvot	2f541065fb	cuda_helper: rename correctly hiword/loword functions	2015-05-12 17:13:58 +02:00
Tanguy Pruvot	2113be6eec	blake80: some changes and launch bounds, no perf changes	2015-04-24 14:12:21 +02:00
Tanguy Pruvot	3d3f2e2cb5	warnings: use the right device id (device_map[thr_id])	2015-04-23 09:41:56 +02:00
Tanguy Pruvot	275a028935	skein: compute midstate first "Real" optimization based on KlausT precalc	2015-04-16 02:11:37 +02:00
Tanguy Pruvot	e7ae27137e	x11/qubit: remove some extra MyStreamSynchronize only one per loop is required to prevent 100% cpu usage	2015-04-15 05:30:22 +02:00
Tanguy Pruvot	163430daae	Skein/Skein2 SM 3.0 devices support + code cleanup Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>	2015-04-15 01:27:48 +02:00
Tanguy Pruvot	d58d53f2b2	update README, small changes, prepare release 1.6.1 still need a SM 3.0 fix for skein...	2015-04-14 23:28:00 +02:00
Tanguy Pruvot	48515ad707	groestl: rename included cuda files	2015-04-06 23:46:34 +02:00
Tanguy Pruvot	37395eefe4	skein: restore previous x11 speed	2015-03-28 13:32:08 +01:00
Tanguy Pruvot	4f43abb402	bmw512: indent and restore SM 3.0 compat could be also the source of the problem seen with CUDA 7 restored the code before sp/klaus changes for SM 3.0 devices...	2015-03-28 12:01:50 +01:00
Tanguy Pruvot	38e6672d70	Allow test of SM 2.1/3.0 binaries on newer cards Implementation based on klausT work.. a bit different This code must be placed in a common .cu file, cuda.cpp is not compiled with nvcc and doesn't allow cuda code...	2015-03-28 12:00:53 +01:00
Tanguy Pruvot	f86784ee56	Add skein algo (Skeincoin, Myriad, Unat...) SKEIN512 + SHA256 Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>	2015-03-27 15:24:27 +01:00
Tanguy Pruvot	a37e909db9	Add zr5 algo (for SM 3.5+) uint4 copy + keccak cleanup, groestl: small uint4 opt Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>	2015-03-27 15:16:25 +01:00
Tanguy Pruvot	9734186a37	jh512: import and improve klaus and sp changes did not import the extra final function, which should stay compatible with the common cuda_check_hash()	2015-03-20 05:36:40 +01:00
KlausT	ae8e863591	remove uint32_t cast	2015-03-12 01:01:47 +01:00
Tanguy Pruvot	e6112e878d	cleanup: use unsigned throughput parameters Yes, it's a big commit, was waiting 1.6 to do that... Sorry for your possible merge issues ;)	2015-02-28 14:05:09 +01:00
Tanguy Pruvot	09c3ac6b4b	linux: fix missing dirname include	2015-02-11 18:36:57 +01:00
Tanguy Pruvot	2d5e8aaced	anime: fix uint2 error (bmw)	2015-02-08 18:32:42 +01:00
KlausT	a452c330dd	quark: remove unused variables	2015-02-02 10:41:14 +01:00
Tanguy Pruvot	26b51a557b	Allow different intensity per device and clean the old variables, no more required	2015-01-24 11:17:29 +01:00
Tanguy Pruvot	768b5ccb76	import bmw512 uint2 changes from sp + some cleanup... 15KH/s won (750Ti)	2015-01-24 08:02:41 +01:00
Tanguy Pruvot	9f2dd3ee60	Remove some useless conversions does not impact perf either...	2015-01-24 08:00:22 +01:00
Tanguy Pruvot	2a5233f56e	api: report throughput when default	2015-01-22 06:28:59 +01:00
Tanguy Pruvot	cafd4477d7	Handle a maximum of 16 gpus (vs 8 before) Some cards have 2 gpus on board...	2015-01-22 04:55:27 +01:00
Tanguy Pruvot	b521acb480	groestl: use sp bitslice enhancement, prepare SM 2.x variant todo: simd512 SM 2.x variant (shfl op), and groestl/myriad functions	2015-01-19 00:42:14 +01:00
Tanguy Pruvot	ec5a48f420	x11: small simd512 gpu_expand improvement	2014-12-19 09:16:55 +01:00
Tanguy Pruvot	1e24e4899c	skein: uint2 optimisation with SM 3.0 compat (+15KH) Thanks to sp and djm34 for this fast uint64 storage alternative	2014-12-16 13:52:54 +01:00

88 Commits