1. Replace std::max with a ternary expression, std::max is much heavier than the ternary operator
2. Replace std::set with arrays, std::set is based on red-black trees, traversal will follow the chain domain, and cache hits are not good
3. Optimize the serialize function, improve the calculation speed of num_non_zero_registers by reducing branches, and the serialization of _registers after optimization is faster
4. The test found that the performance improvement is more obvious