- Non-GCC intrinsics alternative implementation removed. Let's worry
about the absence of the intrinsics once/if that becomes relevant.
- Spinlock release now performed using __sync_lock_release, as per
svoj's advice.
- while-looping on the variable used as lock removed, so it no longer
need to be volatile.
- Boolean function returns bool.
- Size of profiling counters increased.
- Risk for division-by-zero removed.
- Documentation moved from implementation to header.