a43205a84f
Remove the unneeded double loop and prefer the __thread attribute to improve code readability (removing the 2D host arrays). Squashed: return to the host 2D array so that the memory can be freed.