Con Kolivas
8be9d13ff2
Further generic microoptimisations to poclbm kernel.
13 years ago
Con Kolivas
cad84c6f2c
Change poclbm version number.
13 years ago
Con Kolivas
4f1676f67f
One array is faster than 2 separate arrays so change to that in poclbm kernel.
13 years ago
Con Kolivas
f5903e609d
Microoptimisations to poclbm kernel which increase throughput slightly.
13 years ago
Con Kolivas
2fa142d1ce
One array is faster than 2 separate arrays so change to that in poclbm kernel.
13 years ago
Con Kolivas
1355859742
Microoptimisations to poclbm kernel which increase throughput slightly.
13 years ago
Con Kolivas
ebaa2be1df
Update poclbm kernel for better performance on GCN and new SDKs with bitalign support when not BFI INT patching.
Update phatk kernel to work properly for non BFI INT patched kernels, providing support for phatk to run on GCN and non-ATI cards.
13 years ago
Con Kolivas
3567b69e5e
Remove fragile source patching for bitalign, vectors et al. and simply pass them with the compiler options.
13 years ago
Con Kolivas
6d10ef2f6e
Bump version numbers of kernels to indicate slightly different versions.
13 years ago
Con Kolivas
bd79a61c43
Move poclbm to new branch optimisation as well.
13 years ago
Con Kolivas
cf54f9b850
Move to 256 sized buffers and don't risk overwrite by using only 127 mask.
13 years ago
Con Kolivas
0f782ba6bd
Update poclbm kernel to FF sized mask and only check that range.
13 years ago
Con Kolivas
95f878294f
The extra shift in the output actually appears detrimental in cgminer and there is a minuscule chance of missing the actual result if it ends up in the same spot as MAXBUFFERS.
14 years ago
Con Kolivas
a7707a26cb
Rename the poclbm file to ensure a new binary is built since.
14 years ago
Con Kolivas
b198badcf4
The poclbm kernel needs to be updated to work with the change to 4k sized output buffers.
14 years ago
Con Kolivas
13b43cfad1
Update copyright and authors.
14 years ago
Con Kolivas
2b6e841673
Use a buffer of up to 512 * 4 integers when retrieving work from the GPU.
This allows each local thread id to have one slot to put any positive results into, thus making overlapping results far less likely.
Thus races will be much rarer, allowing more threads.
It should also pick up blocks close to each other more reliably and hopefully decrease the number of rejects and opencl errors.
Do the search over the buffer entirely in a separate thread to allow the GPU to stay as busy as possible.
Detach threads from themselves to prevent the unlucky event where dereferencing occurs by freeing the data that stores the thread info.
14 years ago
Con Kolivas
2dbb39444d
Base was being set wrongly meaning we were repeating searches and the rate was actually lower than displayed :(
Tweak Ma with new changes.
Change default vectors to 2 since it's faster than 4 even when 4 is reported as preferred.
14 years ago
Con Kolivas
623b9b9fd8
Patch bitalign separately from bfi_int.
Recover from failing to patch for bfi int.
14 years ago
Con Kolivas
8253f1414b
Use some line breaks in the kernel.
14 years ago
Con Kolivas
4257deafdb
Convert abcd... to an array.
14 years ago
Con Kolivas
75cf5ccda6
Replace Ws with an array.
14 years ago
ckolivas
19eea9067f
Implement code detecting max work size and optimal vector width.
Use this to patch the kernel to suit the ideal values for the card.
Then use these values when invoking the kernel.
14 years ago
Con Kolivas
f54d2cc0ed
Make poclbm use 4 vectors and decrease worksize to keep pipelines fullish.
Make it possible to have 0 CPU threads and update docs.
Fix counter with no cpu threads.
14 years ago
ckolivas
b4d2733cfc
Convert to poclbm kernel.
14 years ago