alexkarnew: (for driver 13.4 and newer, and cgminer 3.3.1)
alexkarold: (for drivers older than 13.4, and cgminer 3.3.1)
https://litecointalk.org/index.php?topic=4082.0
> I was able to optimize the code of cgminer's scrypt.cl.
> It gives a 0-3% speed increase, depending on the drivers and hardware.
> 1. Without the optimization, every use of "CO" recalculates
> z+x*zSIZE+y*xSIZE*zSIZE.
> I created a "CO" variable so that x*zSIZE is calculated only once. Now
> every use of "CO" calculates only z+y*xSIZE*zSIZE.
> In one case, the variable y is incremented by 1 after every 8 "CO"
> calculations. For that case I created a "CO_tmp" variable that holds the
> result of xSIZE*zSIZE, and after every 8 "CO" calculations I add "CO_tmp"
> to "CO". Now every use of "CO" only adds z, which is faster than
> calculating z+x*zSIZE+y*xSIZE*zSIZE :)
> In the other case, every use of "CO" still calculates z+y*xSIZE*zSIZE,
> but that is also faster than z+x*zSIZE+y*xSIZE*zSIZE.
> 2. I have replaced multiplication by 2 with bit rotation - it is faster.
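
The index arithmetic from point 1 can be sketched outside the kernel. The Python below is only an illustration, not the kernel code: the function names and the loop shape are assumptions, and z_size defaults to 8 only because the post mentions 8 "CO" calculations per increment of y. The last two helpers illustrate point 2, assuming the rotation acts on a 32-bit unsigned value. A rotate left by one bit matches multiplication by 2 only while the top bit is clear.

```python
MASK32 = 0xFFFFFFFF  # assumption: values are 32-bit unsigned, as OpenCL uint


def co_naive(x, y, z, x_size, z_size):
    """Full index, recomputed on every use: z + x*zSIZE + y*xSIZE*zSIZE."""
    return z + x * z_size + y * x_size * z_size


def indices_naive(x, y_count, x_size, z_size=8):
    """Visit z_size entries per y, recomputing the full index each time."""
    return [co_naive(x, y, z, x_size, z_size)
            for y in range(y_count)
            for z in range(z_size)]


def indices_hoisted(x, y_count, x_size, z_size=8):
    """Same walk with the loop-invariant products moved out of the loop."""
    co = x * z_size           # x*zSIZE, calculated only once
    co_tmp = x_size * z_size  # stride to add each time y goes up by 1
    out = []
    for _ in range(y_count):
        for z in range(z_size):
            out.append(co + z)  # only z is added per use
        co += co_tmp            # stands in for y += 1 after z_size uses
    return out


def rotl32(value, bits):
    """Rotate a 32-bit unsigned value left by `bits` (1..31)."""
    return ((value << bits) | (value >> (32 - bits))) & MASK32


def times_two32(value):
    """Multiply by 2 with 32-bit wraparound."""
    return (value * 2) & MASK32


if __name__ == "__main__":
    # Both walks produce the same index sequence.
    assert indices_hoisted(3, 4, 16) == indices_naive(3, 4, 16)
    # Rotation equals doubling while the top bit is clear ...
    assert rotl32(21, 1) == times_two32(21)
    # ... and differs once the top bit is set.
    assert rotl32(0x80000000, 1) != times_two32(0x80000000)
```

The hoisted version replaces two multiplications per use with one addition, which is the saving the post describes. Whether it shows up as a measurable gain depends on the driver and hardware, as the post itself notes.
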
> For 7xxx cards you can try setting --thread-concurrency to 2^n + 1.
> It may give slightly more mining speed.
> For example: 16385 (2^14 + 1), 8193 (2^13 + 1), or 4097 (2^12 + 1).
> I have almost no information on how it works on other series.
> LMqRcHdwnZtTMH6c2kWoxSoKM5KySfaP5C
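
The suggested --thread-concurrency values follow a simple pattern, so they can be generated rather than typed by hand. This is a small sketch: the helper names are hypothetical, the default exponents are just the three from the post, and only the flag name itself comes from the text above.

```python
def tc_candidates(exponents=(12, 13, 14)):
    """Return 2^n + 1 for each exponent, largest first."""
    return sorted(((1 << n) + 1 for n in exponents), reverse=True)


def tc_flag(value):
    """Format one candidate as a command-line fragment."""
    return f"--thread-concurrency {value}"


if __name__ == "__main__":
    for candidate in tc_candidates():
        print(tc_flag(candidate))
```

Each printed fragment can be tried in turn while watching the reported hash rate. The post only claims a possible small gain on 7xxx cards.
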