TurboPFor: Fastest Integer Compression
- 100% C, without inline assembly
- Fastest **"Variable Byte"** implementation
- Novel **"Variable Simple"** faster than simple16 and more compact than simple64
- Scalar **"Binary Packing"** with bulk decoding as fast as SIMD FastPFor in realistic (no "pure cache") scenarios
- Binary Packing with **Direct/Random Access** without decompressing entire blocks
- Access any single binary packed entry with **zero decompression**
- Novel **"TurboPFor"** (Patched Frame-of-Reference) scheme with direct access or bulk decoding
- Several times faster than other libraries
- Usage as easy as memcpy
- Instant access to compressed *frequency* and *position* data in inverted index with zero decoding
Benchmark:
i7-2600k at 3.4GHz, gcc 4.9, Ubuntu 14.10.
- Single thread
- Realistic and practical benchmark with large integer arrays.
- No PURE cache benchmark
Synthetic data:
coming soon!
Data files:
- clueweb09.sorted from FastPFor (http://lemire.me/data/integercompression2014.html)
./icbench -n10000000000 clueweb09.sorted
Size (bytes) | Ratio (%) | Bits/Integer | Compression (MB/s) | Decompression (MB/s) | Function |
---|---|---|---|---|---|
514438405 | 8.16 | 2.61 | 357.22 | 1286.42 | TurboPFor |
514438405 | 8.16 | 2.61 | 358.09 | 309.70 | TurboPFor DA |
539841792 | 8.56 | 2.74 | 6.47 | 767.35 | OptP4 |
583184112 | 9.25 | 2.96 | 132.42 | 914.89 | Simple16 |
623548565 | 9.89 | 3.17 | 235.32 | 925.71 | SimpleV |
733365952 | 11.64 | 3.72 | 162.21 | 1312.15 | Simple64 |
862464289 | 13.68 | 4.38 | 1274.01 | 1980.55 | TurboPack |
862464289 | 13.68 | 4.38 | 1285.28 | 868.06 | TurboPack DA |
862465391 | 13.68 | 4.38 | 1402.12 | 2075.15 | SIMD-BitPack FPF |
6303089028 | 100.00 | 32.00 | 1257.50 | 1308.22 | copy |
Compile:
make
Usage
Synthetic data:
- test all functions<br > ./icbench -a1.0 -m0 -x8 -n100000000
  - zipfian distribution alpha = 1.0 (Ex. -a1.0=uniform -a1.5=skewed distribution)
  - number of integers = 100000000
  - integer range from 0 to 255 (integer size = 0 to 8 bits)
- individual function test (e.g. copy, TurboPack, TurboPack Direct Access)<br > ./icbench -a1.0 -m0 -x8 -ecopy/turbopack/turbopack,da -n100000000
Data files:
- Data file benchmark (file format as in FastPFor)<br > ./icbench -n10000000000 clueweb09.sorted
Reference:
- "SIMD-BitPack FPF" from FastPFor https://github.com/lemire/simdcomp
- OptP4 and Simple-16 from http://jinruhe.com/