TurboPFor: Fastest Integer Compression 
- 100% C (C++ compatible headers), without inline assembly
- Fastest **"Variable Byte"** implementation
- Novel **"Variable Simple"** faster than simple16 and more compact than simple64
- Scalar **"Bit Packing"** with bulk decoding as fast as SIMD FastPFor in realistic and practical (no "pure cache") scenarios
- Bit Packing with **Direct/Random Access** without decompressing entire blocks
- Access any single bit-packed entry with **zero decompression**
- Reducing **Cache Pollution**
- Novel **"TurboPFor"** (Patched Frame-of-Reference) scheme with direct access or bulk decoding and an outstanding compression ratio
- Several times faster than other libraries
- Usage in C/C++ as easy as memcpy
- Most functions optimized for speed, others for high compression ratio
- **New:** more functions included
- Instant access to compressed *frequency* and *position* data in an inverted index with zero decompression
- **New:** Inverted Index Demo + Benchmarks: intersection of lists of sorted integers
  - more than **1000 queries per second** on gov2 (25 million documents) on a **SINGLE** core
  - only the minimum necessary blocks are decompressed
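The direct-access property of Bit Packing can be illustrated with a minimal sketch: every value occupies exactly `b` bits, so entry `i` starts at bit `i*b` and can be read with at most two word loads, without unpacking its neighbours. The names `bitpack_sketch`/`bitget_sketch`, the little-endian 64-bit word layout and the `1 <= b <= 32` restriction are assumptions made for this illustration; it is not the API or storage format of bitpack.h/bitunpack.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Mask with the low b bits set (valid for 0 <= b <= 32). */
static uint64_t low_mask(unsigned b) {
    return (1ull << b) - 1;
}

/* Hypothetical packer: stores n values of b bits each (1 <= b <= 32)
   back-to-back in 64-bit words, low bits first.
   out must hold (n*b + 63)/64 words and be zero-initialised. */
void bitpack_sketch(const uint32_t *in, size_t n, unsigned b, uint64_t *out) {
    for (size_t i = 0; i < n; i++) {
        uint64_t v = in[i] & low_mask(b);
        size_t pos = i * b, w = pos / 64, off = pos % 64;
        out[w] |= v << off;                 /* low part in word w */
        if (off + b > 64)                   /* value straddles two words */
            out[w + 1] |= v >> (64 - off);  /* high part in word w+1 */
    }
}

/* Hypothetical direct access: decodes only entry i, reading at most
   two words and leaving the rest of the packed block untouched. */
uint32_t bitget_sketch(const uint64_t *in, size_t i, unsigned b) {
    size_t pos = i * b, w = pos / 64, off = pos % 64;
    uint64_t v = in[w] >> off;
    if (off + b > 64)
        v |= in[w + 1] << (64 - off);
    return (uint32_t)(v & low_mask(b));
}
```

For example, after packing {3, 7, 1, 0, 5, 6, 2, 4, 7, 1} with `b = 3` (30 bits, one word), `bitget_sketch(packed, 4, 3)` returns 5.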
Benchmark:
i7-2600K at 3.4 GHz, gcc 4.9, Ubuntu 14.10.
- Single thread
- Realistic and practical benchmark with large integer arrays.
- No PURE cache benchmark
Synthetic data:
coming soon!
Data files:
- clueweb09.sorted from FastPFor (http://lemire.me/data/integercompression2014.html)
./icbench -n10000000000 clueweb09.sorted
Size (bytes) | Ratio (%) | Bits/Integer | Compression (MB/s) | Decompression (MB/s) | Function |
---|---|---|---|---|---|
514438405 | 8.16 | 2.61 | 357.22 | 1286.42 | TurboPFor |
514438405 | 8.16 | 2.61 | 358.09 | 309.70 | TurboPFor DA |
539841792 | 8.56 | 2.74 | 6.47 | 767.35 | OptP4 |
583184112 | 9.25 | 2.96 | 132.42 | 914.89 | Simple16 |
623548565 | 9.89 | 3.17 | 235.32 | 925.71 | SimpleV |
733365952 | 11.64 | 3.72 | 162.21 | 1312.15 | Simple64 |
862464289 | 13.68 | 4.38 | 1274.01 | 1980.55 | TurboPack |
862464289 | 13.68 | 4.38 | 1285.28 | 868.06 | TurboPack DA |
862465391 | 13.68 | 4.38 | 1402.12 | 2075.15 | SIMD-BitPack FPF |
6303089028 | 100.00 | 32.00 | 1257.50 | 1308.22 | copy |
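The Ratio and Bits/Integer columns are consistent with measuring Size against the uncompressed *copy* row, which stores each 32-bit integer in 4 bytes (so the file holds 6303089028 / 4 = 1575772257 integers). Worked through for the TurboPFor row:

$$
\text{Ratio} = 100 \cdot \frac{\text{Size}}{\text{Size}_{\text{copy}}} = 100 \cdot \frac{514438405}{6303089028} \approx 8.16\,\%
$$

$$
\text{Bits/Integer} = \frac{8 \cdot \text{Size}}{\text{Size}_{\text{copy}} / 4} = 32 \cdot \frac{514438405}{6303089028} \approx 2.61
$$

The same relation reproduces the other rows, e.g. Simple16: 32 · 583184112 / 6303089028 ≈ 2.96.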
Compile:
make
Benchmark
Synthetic data:
- test all functions
*./icbench -a1.0 -m0 -x8 -n100000000*
- Zipfian distribution with alpha = 1.0 (e.g. -a1.0 = uniform, -a1.5 = skewed distribution)
- number of integers = 100000000
- integer range from 0 to 255 (integer size = 0 to 8 bits)
- test individual functions (e.g. copy, TurboPack, TurboPack Direct Access)
*./icbench -a1.0 -m0 -x8 -ecopy/turbopack/turbopackda -n100000000*
Data files:
- Data file benchmark (file format as in FastPFor)
./icbench -c1 gov2.sorted
Benchmarking intersections
- Download "gov2.sorted" (or clueweb09) and the query file "aol.txt" from http://lemire.me/data/integercompression2014.html
- Create the inverted index file "gov2.sorted.i" in the current directory:
  *./idxcr gov2.sorted .*
- Benchmark intersections by running the queries in "aol.txt" over the gov2 index:
  *./idxqry gov2.sorted.i aol.txt*
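In its simplest form, intersecting two lists of sorted integers is a linear merge of two ascending document-ID lists. The sketch below (hypothetical function `intersect_sorted`) shows only that basic step on already decompressed arrays; it is not the idxqry implementation, which per this README decompresses only the minimum necessary blocks.

```c
#include <stddef.h>

/* Hypothetical helper: intersects two strictly increasing arrays a[0..na)
   and b[0..nb) by advancing whichever side holds the smaller value.
   Common elements are written to out (capacity >= min(na, nb));
   returns how many were written. Runs in O(na + nb). */
size_t intersect_sorted(const unsigned *a, size_t na,
                        const unsigned *b, size_t nb, unsigned *out) {
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;
        } else if (a[i] > b[j]) {
            j++;
        } else {            /* same document ID in both lists */
            out[k++] = a[i];
            i++;
            j++;
        }
    }
    return k;
}
```

A multi-term query can chain this call, starting from the two shortest lists so that the intermediate result stays small.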
At least 8 GB of RAM is required (16 GB recommended for benchmarking the "clueweb09" files).
Function usage:
In general, compression/decompression functions have the form:

`char *endptr = compress(unsigned *in, int n, char *out)`
- `endptr`: set by compress to the next character in `out` after the compressed buffer
- `in`: input integer array
- `n`: number of elements
- `out`: pointer to output buffer

`char *endptr = decompress(char *in, int n, unsigned *out)`
- `endptr`: set by decompress to the next character in `in` after the compressed data that was read
- `in`: pointer to input buffer
- `n`: number of elements
- `out`: output integer array
Header files with documentation:
- vint.h: variable byte
- vsimple.h: variable simple
- vp4dc.h, vp4dd.h: TurboPFor
- bitpack.h, bitunpack.h: Bit Packing
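To illustrate the `endptr` calling convention, here is a minimal variable-byte codec with the same parameter layout. The names `vbyte_compress_sketch`/`vbyte_decompress_sketch` are hypothetical, and the byte format (7 data bits per byte, low bits first, high bit set when more bytes follow) is an assumption; the actual format in vint.h may differ. Returning the end pointer is what lets several compressed blocks be written back-to-back into one buffer.

```c
#include <stddef.h>

/* Hypothetical variable-byte encoder (assumed LEB128-style format).
   Returns endptr: the first byte of out after the compressed data.
   Worst case is 5 bytes per 32-bit integer. */
char *vbyte_compress_sketch(unsigned *in, int n, char *out) {
    unsigned char *op = (unsigned char *)out;
    for (int i = 0; i < n; i++) {
        unsigned v = in[i];
        while (v >= 0x80) {
            *op++ = (unsigned char)(v | 0x80);  /* low 7 bits + continuation flag */
            v >>= 7;
        }
        *op++ = (unsigned char)v;               /* last byte: flag clear */
    }
    return (char *)op;
}

/* Hypothetical decoder for the format above; expects well-formed input.
   Returns endptr: the first byte of in after the data that was read. */
char *vbyte_decompress_sketch(char *in, int n, unsigned *out) {
    unsigned char *ip = (unsigned char *)in;
    for (int i = 0; i < n; i++) {
        unsigned v = 0;
        int shift = 0;
        unsigned char c;
        do {
            c = *ip++;
            v |= (unsigned)(c & 0x7F) << shift;  /* append 7 data bits */
            shift += 7;
        } while (c & 0x80);                      /* continue while flag set */
        out[i] = v;
    }
    return (char *)ip;
}
```

With this format, 300 is stored as the two bytes 0xAC 0x02, and the output buffer needs at most 5 bytes per integer.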
Reference:
- "SIMD-BitPack FPF" from FastPFor https://github.com/lemire/simdcomp
- Sorted integer datasets from http://lemire.me/data/integercompression2014.html
- OptP4 and Simple-16 from http://jinruhe.com/