fc84c719e9
Based on the new djm34 method, 2x faster than the first version. Cleaned and tuned for the GTX 750/960 (Linux / CUDA 6.5).