10 Commits

Author SHA1 Message Date
Tanguy Pruvot
e21c75793a Revert "x11: improve aes (shavite/echo)"
causes a lot of CPU validation errors on Windows,
to be double-checked in the next version...

This reverts commit 1187a6e7e3211f0216111554a55b685687003b11.
2015-06-23 09:27:40 +02:00
Tanguy Pruvot
1187a6e7e3 x11: improve aes (shavite/echo)
shavite is faster, echo doesn't really change due to the reg. overload

This change allows custom launch bounds without other code changes and improves
portability across different devices.

also set a minimum throughput of 1024 for these algos (shared mem req. size)
2015-06-19 05:23:06 +02:00
Tanguy Pruvot
6c7fce187b x11: use KlausT optimisation (+20 KHs)
But use a define in AES to enable or disable the initial device memcpy

I already tried using direct device constants everywhere
and it's not faster for big arrays (difference is small)

also change launch bounds to reduce spills (72 regs)

to check on windows too, could improve the perf... or not
2014-12-06 04:14:36 +01:00
Tanguy Pruvot
73f22b237a Prepare trap of hardware/mem failures
2014-11-20 18:44:25 +01:00
Tanguy Pruvot
fdd5d29071 x11: shavite and echo from sp (now ok on win32)
The previous echo commit only improved linux performance and reduced
windows perf compared to 1.4.9; this one seems to give at least
the 1.4.9 performance on windows, and the same on linux...

Shavite optimisation seems ok on both (now uses 64 registers)

the launch_bounds will force the number of registers, so remove specific
Makefile rules on linux...

manual "cherry pick" with fixed line endings and some adaptations
2014-11-16 17:34:50 +01:00
sp-hash
5be6811dcf x11: echo and cubehash optimization
echo : 40.056ms -> 39.241ms
cube : 14.490ms -> 13.511ms

the cubehash change looks useless (__device__ code is generally inlined)
but reality proves that the cuda documentation is wrong...

tpruvot: fixed dos line endings in echo,
and used my style for cuda function attributes
2014-11-06 15:17:26 +01:00
Tanguy Pruvot
912ef1215d small reg tunes, rename whirlcoin to whirl
2014-08-21 02:57:10 +02:00
Tanguy Pruvot
194fda87c1 x11: restore simd host2dev memcpytosymbol to reduce used cmem
Remove define attempts for SM 2.1 devices, fermi is not compatible
2014-08-19 18:32:14 +02:00
Tanguy Pruvot
cf7351d138 x10 funcs cleanup, we dont need host constant tables
2014-08-15 03:40:13 +02:00
Christian Buchner
af07302b4b v1.0 - Yo, I heard y'all like X11
2014-05-10 00:29:59 +02:00