Split source into multiple files. Add experimental, unoptimized OpenCL implementation. Bump version to 0.13.