Both merged and unmerged implementations are broken with CUDA 6.5. No perf changes...
Based on alexis78's work and sponsored by the LBRY.IO team (thanks). Release 1.8.2; use CUDA 8 for x86.