alexkarnew: (for driver 13.4 and newer, and cgminer 3.3.1)
alexkarold: (for older drivers than 13.4, and cgminer 3.3.1)
https://litecointalk.org/index.php?topic=4082.0
> I was able to optimize the code of cgminer's scrypt.cl.
> It gives 0-3% increase, depending on the drivers and hardware.
> 1. Without optimization, when "CO" is used, every time
> z+x*zSIZE+y*xSIZE*zSIZE is calculated.
> I have created "CO" variable, and made so that x*SIZE is calculated only
> once. Now, when "CO" is used, every time z+y*xSIZE*zSIZE is calculated.
> In one case, variable y is incremented by 1 after 8 "CO" calculations.
> I have created "CO_tmp" variable, where contains result of xSIZE*zSIZE.
> And after 8 "CO" calculations I add "CO_tmp" to "CO".
> Now, when "CO" is used, every time only z is calculated. It is faster as
> z+x*zSIZE+y*xSIZE*zSIZE :)
> In other case when "CO" is used, every time z+y*xSIZE*zSIZE is
> calculated, but it faster than z+x*zSIZE+y*xSIZE*zSIZE too.
> 2. I have replaced multiplication by 2 with bit rotation - it is faster.
> For 7xxx cards you can try to set --thread-concurrency equal to (2^n + 1).
> It may give a little more mining speed.
> For example: 16385 (it is 2^14 + 1), 8193 (2^13 + 1), or 4097 (2^12 + 1).
> I have almost no information, how it works on other series.
> LMqRcHdwnZtTMH6c2kWoxSoKM5KySfaP5C
Changed encoding to UTF-8.
Will not build with sgminer (fix in next commit).
http://www.reddit.com/r/dogecoin/comments/1ui3bx/increase_such_hashrate_1_to_5_scrypt_tweaking/ceir5na
> It is pretty much stock, except that I have removed all the #pragma
> unrolls, and optimized the inner scrypt_core loop. #pragma unroll does
> not give any speedup here.
> The idea is to move the "if (j&1)" comparison to outside of the lookup
> loops. Then, if j&1 happens to be zero, the V[z] and X[z] loops can be
> combined to a single loop, which gives the speedup!
> This loop and the salsa function are the most important places in the
> entire source, it probably spends over 90% of time in here.. There's
> very little to be gained outside of these, I think.
> Donations: DQj4t2DFMQtXofhstouyZw1sYUKWUJn4wv
https://github.com/veox/sgminer/issues/4#issuecomment-32753290
> Most of these optimized kernels (including mine), have fixed
> lookup-gap=2. However, I have never seen anyone use any other value, for
> any GPU, so I think you could just remove the configurable value.
> Or with some #if LOOKUP_GAP==2 magic it is of course possible to make
> such source that allows any value.
> Some users have reported slightly slower hashrate with my kernel as
> well, but this could be some misconfiguration also.. If scrypt kernel
> becomes faster, you may need to lower the GPU engine clock to get full
> speed. Same as if you increase GPU clock too high, you will get a drop
> in hash rate.
> My source is free to use in sgminer. And if you diff to original you
> will see that the changes are not very big.
> Removing of #pragma unrolls helps in any GPU, in my opinion.. Current
> compilers know better when unrolling helps.