Use the clz (count leading zeros) asm function for a fast GPU compare of ptarget[6]:[7]
Also add the missing Windows ctz/clz host functions
New NEOS speed: 227MH to 270MH (Gigabyte 750Ti Black Edition)
Indent the code, and move the prototypes of commonly used functions to cuda_helper.h
And add them to the --cputest function
Also change the color option to --nocolor; -C is no longer needed
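
For reference, a minimal host-side sketch of the idea behind the clz compare
and the ctz/clz host helpers. It is not the code from this commit: the names
host_clz32, host_ctz32, host_clz64 and hash_high_le_target are hypothetical,
and word 7 is assumed to be the most significant word of the hash and target.
The leading-zero counts settle the comparison whenever they differ, and the
full 64-bit compare is only needed when they are equal.

```c
#include <stdint.h>
#include <stdbool.h>
#if defined(_MSC_VER)
#include <intrin.h>   /* _BitScanReverse, _BitScanForward */
#endif

/* Hypothetical helper: leading zero bits of x, 32 when x == 0. */
static inline uint32_t host_clz32(uint32_t x)
{
    if (x == 0) return 32;
#if defined(_MSC_VER)
    unsigned long idx;
    _BitScanReverse(&idx, x);      /* index of the highest set bit */
    return 31u - (uint32_t)idx;
#else
    return (uint32_t)__builtin_clz(x);
#endif
}

/* Hypothetical helper: trailing zero bits of x, 32 when x == 0. */
static inline uint32_t host_ctz32(uint32_t x)
{
    if (x == 0) return 32;
#if defined(_MSC_VER)
    unsigned long idx;
    _BitScanForward(&idx, x);      /* index of the lowest set bit */
    return (uint32_t)idx;
#else
    return (uint32_t)__builtin_ctz(x);
#endif
}

/* Leading zero bits of a 64-bit value built from two 32-bit halves. */
static inline uint32_t host_clz64(uint64_t x)
{
    uint32_t hi = (uint32_t)(x >> 32);
    return hi ? host_clz32(hi) : 32u + host_clz32((uint32_t)x);
}

/* True when hash[6]:[7] <= target[6]:[7] (assumption: word 7 is the
 * most significant). More leading zeros means a smaller value, so the
 * counts decide the result unless they are equal. */
static bool hash_high_le_target(const uint32_t *hash, const uint32_t *target)
{
    uint64_t h = ((uint64_t)hash[7] << 32) | hash[6];
    uint64_t t = ((uint64_t)target[7] << 32) | target[6];
    uint32_t hz = host_clz64(h);
    uint32_t tz = host_clz64(t);
    if (hz != tz)
        return hz > tz;
    return h <= t;
}
```

On the GPU the same count is available through the clz instruction, which is
what makes the pre-check cheap; the sketch above only mirrors the logic on
the host.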
Signed-off-by: Tanguy Pruvot <tanguy.pruvot@gmail.com>
(Who is tired of removing these German copy/pasted comments)