Moved enable_pool(), disable_pool() and other functions related to
pool state. These could probably be factored out altogether.
The default pool state is now "enabled". It was previously "disabled",
but main() used to contain an unconditional function call that enabled
all pools. That call was factored out by joe's earlier commits,
so the change is not visible in this one.
This setting sets the GPU intensity value directly, without any modifiers; it does not
get any more raw than this! See the xintensity description for examples of regular
intensity values. You can also set this value through the ncurses interface by pressing:
G -> A -> select device id -> enter value.
Minor xintensity code cleanup as well.
Conflicts:
driver-opencl.c
miner.h
sgminer.c
alexkarnew: (for driver 13.4 and newer, and cgminer 3.3.1)
alexkarold: (for older drivers than 13.4, and cgminer 3.3.1)
https://litecointalk.org/index.php?topic=4082.0
> I was able to optimize the code of cgminer's scrypt.cl.
> It gives 0-3% increase, depending on the drivers and hardware.
> 1. Without optimization, when "CO" is used, every time
> z+x*zSIZE+y*xSIZE*zSIZE is calculated.
> I have created "CO" variable, and made so that x*SIZE is calculated only
> once. Now, when "CO" is used, every time z+y*xSIZE*zSIZE is calculated.
> In one case, variable y is incremented by 1 after 8 "CO" calculations.
> I have created "CO_tmp" variable, where contains result of xSIZE*zSIZE.
> And after 8 "CO" calculations I add "CO_tmp" to "CO".
> Now, when "CO" is used, every time only z is calculated. It is faster as
> z+x*zSIZE+y*xSIZE*zSIZE :)
> In other case when "CO" is used, every time z+y*xSIZE*zSIZE is
> calculated, but it faster than z+x*zSIZE+y*xSIZE*zSIZE too.
> 2. I have replaced multiplication by 2 with bit rotation - it is faster.
> For 7xxx cards you can try to set --thread-concurrency equal to (2^n + 1).
> It may give a little more mining speed.
> For example: 16385 (it is 2^14 + 1), 8193 (2^13 + 1), or 4097 (2^12 + 1).
> I have almost no information, how it works on other series.
> LMqRcHdwnZtTMH6c2kWoxSoKM5KySfaP5C
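The index arithmetic described in the quote above can be sketched in plain C.
This is a simplified model, not the kernel code: co_naive(), co_walk(), rotl32()
and the loop bounds are hypothetical names chosen for illustration. Only the
formula z + x*zSIZE + y*xSIZE*zSIZE is taken from the quote.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full index, recomputed from scratch on every access. */
static size_t co_naive(size_t x, size_t y, size_t z, size_t xsize, size_t zsize)
{
    return z + x * zsize + y * xsize * zsize;
}

/*
 * Incremental version: x * zsize is computed once, and the offset is bumped
 * by co_tmp = xsize * zsize each time y advances, so the inner loop only
 * adds z. Every index is checked against co_naive(); the function returns
 * the number of entries visited.
 */
static size_t co_walk(size_t x, size_t xsize, size_t zsize, size_t ysteps)
{
    size_t co = x * zsize;                 /* hoisted out of both loops */
    const size_t co_tmp = xsize * zsize;   /* stride between consecutive y */
    size_t visited = 0;

    for (size_t y = 0; y < ysteps; ++y) {
        for (size_t z = 0; z < zsize; ++z) {
            assert(co + z == co_naive(x, y, z, xsize, zsize));
            ++visited;
        }
        co += co_tmp;                      /* y advanced by 1 */
    }
    return visited;
}

/* 32-bit rotate left; valid for r in 1..31. */
static uint32_t rotl32(uint32_t v, unsigned r)
{
    return (v << r) | (v >> (32u - r));
}
```

For example, co_walk(3, 16385, 8, 1024) visits 8192 entries without tripping
the assertion. Rotating left by one bit equals multiplying by 2 only while the
top bit is clear: rotl32(21, 1) is 42, but rotl32(0x80000000, 1) is 1.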
Changed encoding to UTF-8.
Will not build with sgminer (fix in next commit).
http://www.reddit.com/r/dogecoin/comments/1ui3bx/increase_such_hashrate_1_to_5_scrypt_tweaking/ceir5na
> It is pretty much stock, except that I have removed all the #pragma
> unrolls, and optimized the inner scrypt_core loop. #pragma unroll does
> not give any speedup here.
> The idea is to move the "if (j&1)" comparison to outside of the lookup
> loops. Then, if j&1 happens to be zero, the V[z] and X[z] loops can be
> combined to a single loop, which gives the speedup!
> This loop and the salsa function are the most important places in the
> entire source, it probably spends over 90% of time in here.. There's
> very little to be gained outside of these, I think.
> Donations: DQj4t2DFMQtXofhstouyZw1sYUKWUJn4wv
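The loop restructuring described in the quote above can be modelled in plain C.
This is a sketch under assumptions, not the kernel source: toy_mix() is a
stand-in for salsa, and ZSIZE, step_baseline(), step_hoisted() and
check_steps() are hypothetical names. It only shows that testing j&1 before
the lookup lets the even case run as a single combined loop with the same
result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ZSIZE 8

/* Stand-in for salsa: any deterministic in-place mix will do here. */
static void toy_mix(uint32_t b[ZSIZE])
{
    for (int z = 0; z < ZSIZE; ++z)
        b[z] = (b[z] ^ (b[(z + 1) % ZSIZE] << 7)) + 0x9e3779b9u;
}

/* Baseline shape: load V, conditionally mix it, then XOR into X. */
static void step_baseline(uint32_t X[ZSIZE], const uint32_t row[ZSIZE], uint32_t j)
{
    uint32_t V[ZSIZE];

    for (int z = 0; z < ZSIZE; ++z)
        V[z] = row[z];
    if (j & 1u)
        toy_mix(V);
    for (int z = 0; z < ZSIZE; ++z)
        X[z] ^= V[z];
}

/* Hoisted shape: the j&1 test comes first; the even case needs no V at all. */
static void step_hoisted(uint32_t X[ZSIZE], const uint32_t row[ZSIZE], uint32_t j)
{
    if (j & 1u) {
        uint32_t V[ZSIZE];

        for (int z = 0; z < ZSIZE; ++z)
            V[z] = row[z];
        toy_mix(V);
        for (int z = 0; z < ZSIZE; ++z)
            X[z] ^= V[z];
    } else {
        for (int z = 0; z < ZSIZE; ++z)   /* lookup and XOR in one loop */
            X[z] ^= row[z];
    }
}

/* Runs both shapes on the same input for j = 0..3; returns cases checked. */
static int check_steps(void)
{
    uint32_t row[ZSIZE];
    int checked = 0;

    for (int z = 0; z < ZSIZE; ++z)
        row[z] = 0x01010101u * (uint32_t)(z + 1);

    for (uint32_t j = 0; j < 4; ++j) {
        uint32_t a[ZSIZE], b[ZSIZE];

        for (int z = 0; z < ZSIZE; ++z)
            a[z] = b[z] = j + (uint32_t)z;
        step_baseline(a, row, j);
        step_hoisted(b, row, j);
        assert(memcmp(a, b, sizeof a) == 0);
        ++checked;
    }
    return checked;
}
```

Whether the hoisted shape is actually faster depends on the compiler and the
GPU; the sketch only demonstrates that the two shapes compute the same values.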
https://github.com/veox/sgminer/issues/4#issuecomment-32753290
> Most of these optimized kernels (including mine), have fixed
> lookup-gap=2. However, I have never seen anyone use any other value, for
> any GPU, so I think you could just remove the configurable value.
> Or with some #if LOOKUP_GAP==2 magic it is of course possible to make
> such source that allows any value.
> Some users have reported slightly slower hashrate with my kernel as
> well, but this could be some misconfiguration also.. If scrypt kernel
> becomes faster, you may need to lower the GPU engine clock to get full
> speed. Same as if you increase GPU clock too high, you will get a drop
> in hash rate.
> My source is free to use in sgminer. And if you diff to original you
> will see that the changes are not very big.
> Removing of #pragma unrolls helps in any GPU, in my opinion.. Current
> compilers know better when unrolling helps.
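The "#if LOOKUP_GAP==2 magic" mentioned in the quote above could look roughly
like the following plain C sketch. It assumes the usual lookup-gap scheme, in
which entry j / LOOKUP_GAP is stored and the remaining j % LOOKUP_GAP mixing
rounds are recomputed. extra_rounds() and check_rounds() are hypothetical
helpers, not functions from sgminer or any of the kernels.

```c
#include <assert.h>
#include <stdint.h>

#ifndef LOOKUP_GAP
#define LOOKUP_GAP 2   /* default when not set on the compiler command line */
#endif

/* Number of extra mixing rounds needed to rebuild entry j. */
static uint32_t extra_rounds(uint32_t j)
{
#if LOOKUP_GAP == 1
    (void)j;
    return 0;              /* every entry is stored, nothing to recompute */
#elif LOOKUP_GAP == 2
    return j & 1u;         /* specialised path: a single bit test */
#else
    return j % LOOKUP_GAP; /* generic path for any other gap */
#endif
}

/* Checks the selected path against the generic formula for j < n. */
static uint32_t check_rounds(uint32_t n)
{
    for (uint32_t j = 0; j < n; ++j)
        assert(extra_rounds(j) == j % LOOKUP_GAP);
    return n;
}
```

Building with a different value, for example -DLOOKUP_GAP=4, selects the
generic branch while the gap-2 build keeps its specialised path.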