Add likely() macro.
Optimise a few obvious code paths with likely/unlikely.
Change algo to sse2_amd64 by default.
Move priority change to worker threads only.
Detect the number of CPUs and default the number of threads to the number of CPUs.
On Linux, change the worker threads' scheduling policy to SCHED_IDLE first, falling back to SCHED_BATCH.
Don't error when failing to set priority.
Add CPU affinity and bind worker threads to CPUs when number of threads is a multiple of number of CPUs.
Update NEWS with changes.
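
For illustration, the branch hints, CPU detection, scheduling policy fallback and affinity binding listed above could look roughly like the sketch below. This is not the code from this change: it assumes Linux with GCC or Clang, and the helper names (detect_cpus, should_bind, pick_cpu, worker_setup) are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/resource.h>
#include <unistd.h>

/* Branch prediction hints (GCC/Clang builtin). */
#define likely(expr)   (__builtin_expect(!!(expr), 1))
#define unlikely(expr) (__builtin_expect(!!(expr), 0))

/* Hypothetical helper: number of online CPUs, or 1 if detection fails. */
int detect_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
}

/* Hypothetical helper: bind only when the thread count is a non-zero
 * multiple of the CPU count, so every CPU gets the same number of threads. */
int should_bind(int num_threads, int num_cpus)
{
    return num_cpus > 0 && num_threads > 0 && num_threads % num_cpus == 0;
}

/* Hypothetical helper: map a worker index onto a CPU index, round robin. */
int pick_cpu(int thr_id, int num_cpus)
{
    return thr_id % num_cpus;
}

/* Try SCHED_IDLE first, then SCHED_BATCH. Failure is deliberately ignored. */
void drop_sched_policy(void)
{
    struct sched_param param = { 0 };
#ifdef SCHED_IDLE
    if (likely(sched_setscheduler(0, SCHED_IDLE, &param) == 0))
        return;
#endif
#ifdef SCHED_BATCH
    (void)sched_setscheduler(0, SCHED_BATCH, &param);
#endif
}

/* Hypothetical per-worker setup, called at the top of each worker thread.
 * On Linux, a pid/who of 0 in these calls refers to the calling thread. */
void worker_setup(int thr_id, int num_threads, int num_cpus)
{
    /* Lower the priority of this worker only; not fatal if it fails. */
    (void)setpriority(PRIO_PROCESS, 0, 19);
    drop_sched_policy();

    if (should_bind(num_threads, num_cpus)) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(pick_cpu(thr_id, num_cpus), &set);
        /* Binding is best effort as well. */
        (void)sched_setaffinity(0, sizeof(set), &set);
    }
}
```

In this sketch every call that can fail is best effort, which matches the intent of not erroring when the priority cannot be set.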