Con Kolivas
0bde957912
Update all kernel version names.
13 years ago
Con Kolivas
8f08a775ad
Use any() in kernel output code and revert breakage of diakgcn kernel.
13 years ago
Con Kolivas
145f3c0b1d
Put the nonce for each vector offset in advance, avoiding one extra addition in the kernel.
13 years ago
Con Kolivas
5e31785e7b
Increase poclbm version number.
13 years ago
Con Kolivas
49c28b3929
Use PreVal4addT1 instead of PreVal4 in poclbm kernel.
13 years ago
Con Kolivas
5c4df1309a
Import PreVal4 and PreVal0 into poclbm kernel.
13 years ago
Con Kolivas
f5c296785f
Import more prepared constants into poclbm kernel.
...
Conflicts:
poclbm120213.cl
13 years ago
Con Kolivas
734dfecec5
Keep variables in one array but use Vals[] name for consistency with other kernel designs.
13 years ago
Con Kolivas
3f9e34a53c
Replace constants that are mandatorily added in poclbm kernel with one value.
13 years ago
Con Kolivas
b941146c29
Remove addition of final constant before testing for result in poclbm kernel.
13 years ago
Con Kolivas
81cb584586
Hand optimise variable addition order.
13 years ago
Con Kolivas
dc2d553d5b
Hand optimise first variable declaration order in poclbm kernel.
13 years ago
Con Kolivas
f39fac9e4d
Third pass reorder.
13 years ago
Con Kolivas
b754fb8f4e
2nd pass radical reorder.
13 years ago
ckolivas
e2b3c85d59
Radical reordering machine based first pass to change variables as late as possible, bringing their usage close together.
13 years ago
Con Kolivas
57dad38d04
Unroll all additions to enable further optimisations.
13 years ago
Con Kolivas
64acb9dae7
Increase version numbers of modified kernels.
13 years ago
Con Kolivas
210fe9d5b9
Constify nonce in poclbm.
13 years ago
Con Kolivas
60f8ccb313
Use local and group id on poclbm kernel as well.
13 years ago
Con Kolivas
8be9d13ff2
Further generic microoptimisations to poclbm kernel.
13 years ago
Con Kolivas
cad84c6f2c
Change poclbm version number.
13 years ago
Con Kolivas
4f1676f67f
One array is faster than 2 separate arrays so change to that in poclbm kernel.
13 years ago
Con Kolivas
f5903e609d
Microoptimisations to poclbm kernel which increase throughput slightly.
13 years ago
Con Kolivas
2fa142d1ce
One array is faster than 2 separate arrays so change to that in poclbm kernel.
13 years ago
Con Kolivas
1355859742
Microoptimisations to poclbm kernel which increase throughput slightly.
13 years ago
Con Kolivas
ebaa2be1df
Update poclbm kernel for better performance on GCN and new SDKs with bitalign support when not BFI INT patching.
...
Update phatk kernel to work properly for non BFI INT patched kernels, providing support for phatk to run on GCN and non-ATI cards.
13 years ago
Con Kolivas
3567b69e5e
Remove fragile source patching for bitalign, vectors et al. and simply pass it with the compiler options.
13 years ago
Con Kolivas
6d10ef2f6e
Bump version numbers of kernels to indicate slightly different versions.
13 years ago
Con Kolivas
bd79a61c43
Move poclbm to new branch optimisation as well.
13 years ago
Con Kolivas
cf54f9b850
Move to 256 sized buffers and don't risk overwrite by using only 127 mask.
13 years ago
Con Kolivas
0f782ba6bd
Update poclbm kernel to FF sized mask and only check that range.
13 years ago
Con Kolivas
95f878294f
The extra shift in the output actually appears detrimental in cgminer and there is a minuscule chance of missing the actual result if it ends up in the same spot as MAXBUFFERS.
14 years ago
Con Kolivas
a7707a26cb
Rename the poclbm file to ensure a new binary is built since.
14 years ago
Con Kolivas
b198badcf4
The poclbm kernel needs to be updated to work with the change to 4k sized output buffers.
14 years ago
Con Kolivas
13b43cfad1
Update copyright and authors.
14 years ago
Con Kolivas
2b6e841673
Use a buffer of up to 512 * 4 integers when retrieving work from the GPU.
...
This allows each local thread id to have one slot to put any positive results into, thus making overlapping results far less likely.
Thus races will be much rarer, allowing more threads.
It should also pick up blocks close to each other more reliably and hopefully decrease the number of rejects and opencl errors.
Do the search over the buffer entirely in a separate thread to allow the GPU to stay as busy as possible.
Detach threads from themselves to prevent unlucky event where dereferencing occurs by freeing the data that stores the thread info.
14 years ago
Con Kolivas
2dbb39444d
Base was being set wrongly meaning we were repeating searches and the rate was actually lower than displayed :(
...
Tweak Ma with new changes.
Change default vectors to 2 since it's faster than 4 even when 4 is reported as preferred.
14 years ago
Con Kolivas
623b9b9fd8
Patch bitalign separately from bfi_int.
...
Recover from failing to patch for bfi int.
14 years ago
Con Kolivas
8253f1414b
Use some line breaks in the kernel.
14 years ago
Con Kolivas
4257deafdb
Convert abcd... to an array.
14 years ago
Con Kolivas
75cf5ccda6
Replace Ws with an array.
14 years ago
ckolivas
19eea9067f
Implement code detecting max work size and optimal vector width.
...
Use this to patch the kernel to suit the ideal values for the card.
Then use these values when invoking the kernel.
14 years ago
Con Kolivas
f54d2cc0ed
Make poclbm use 4 vectors and decrease worksize to keep pipelines fullish.
...
Make it possible to have 0 CPU threads and update docs.
Fix counter with no cpu threads.
14 years ago
ckolivas
b4d2733cfc
Convert to poclbm kernel.
14 years ago