TurboPFor: Fastest Integer Compression 
- 100% C (C++ compatible headers), without inline assembly
- Fastest **"Variable Byte"** implementation
- Novel **"Variable Simple"** faster than simple16 and more compact than simple8-b
- Scalar **"Bit Packing"** with bulk decoding as fast as SIMD FastPFor in realistic and practical (No "pure cache") scenarios
  - Bit Packing with **Direct/Random Access** without decompressing entire blocks
  - Access any single bit packed entry with **zero decompression**
  - **New:** **Direct Update** of individual bit packed entries
  - Reducing **Cache Pollution**
- Novel **"TurboPFor"** (Patched Frame-of-Reference) scheme with direct access or bulk decoding. Outstanding compression
- Several times faster than other libraries
  - Usage in C/C++ as easy as memcpy
  - Most functions optimized for speed and others for high compression ratio
  - **New:** Include more functions
- Instant access to compressed *frequency* and *position* data in inverted index with zero decompression
  - **New:** Inverted Index Demo + Benchmarks: Intersection of lists of sorted integers.
  - More than **1000 queries per second** on gov2 (25 million documents) on a **SINGLE** core.
  - Decompress only the minimum necessary blocks (e.g. 10-15%).
Benchmark:
CPU: Sandy Bridge i7-2600k at 4.2GHz, gcc 4.9, Ubuntu 14.10, single thread.
- Realistic and practical benchmark with large integer arrays.
- No PURE cache benchmark
Synthetic data:
- Generate and test skewed distribution.
./icbench -a1.5 -m0 -M8 -n100000000
Size (bytes) | Ratio in % | Bits/Integer | Compression MB/s | Decompression MB/s | Function |
---|---|---|---|---|---|
63392801 | 15.85 | 5.07 | 316.96 | 893.67 | TurboPFor |
63392801 | 15.85 | 5.07 | 315.59 | 227.15 | TurboPForDA |
65359916 | 16.34 | 5.23 | 7.09 | 638.96 | OptPFD |
72364024 | 18.09 | 5.79 | 85.31 | 762.00 | Simple16 |
78514276 | 19.63 | 6.28 | 229.21 | 748.32 | SimpleV |
95915096 | 23.98 | 7.67 | 221.46 | 1049.70 | Simple-8b |
99910930 | 24.98 | 7.99 | 1553.92 | 1904.21 | SIMDPackFPF |
99910930 | 24.98 | 7.99 | 953.29 | 1872.02 | TurboPack |
99910930 | 24.98 | 7.99 | 953.13 | 869.84 | TurboPackDA |
102074663 | 25.52 | 8.17 | 1131.47 | 1184.68 | TurboVbyte |
102074663 | 25.52 | 8.17 | 1110.75 | 897.86 | VbyteFPF |
112500000 | 28.12 | 9.00 | 305.85 | 1899.15 | VarintG8IU |
400000000 | 100.00 | 32.00 | 1470.87 | 1477.93 | Copy |
####----------------------------------------------------
Data files
- gov2.sorted (from http://lemire.me/data/integercompression2014.html), Blocksize=128 (+ SimpleV 64k). Benchmark repeated several times.
./icbench -c1 gov2.sorted
Size (bytes) | Ratio in % | Bits/Integer | Compression MB/s | Decompression MB/s | Function |
---|---|---|---|---|---|
3214763689 | 13.44 | 4.30 | 279.93 | 665.41 | SimpleV 64k |
3337758854 | 13.95 | 4.47 | 5.06 | 513.00 | OptPFD |
3357673495 | 14.04 | 4.49 | 270.57 | 813.83 | TurboPFor |
3501671314 | 14.64 | 4.68 | 258.56 | 720.76 | SimpleV |
3820190182 | 15.97 | 5.11 | 118.81 | 650.21 | Simple16 |
4521326518 | 18.90 | 6.05 | 209.17 | 824.26 | Simple-8b |
4953768342 | 20.71 | 6.63 | 647.75 | 1501.24 | TurboPack |
5203353057 | 21.75 | 6.96 | 1560.34 | 1806.60 | SIMDPackFPFD1 |
6699519000 | 28.01 | 8.96 | 502.86 | 624.12 | TurboVbyte |
6699519000 | 28.01 | 8.96 | 472.01 | 495.12 | VbyteFPF |
7622896878 | 31.87 | 10.20 | 208.73 | 1197.74 | VarintG8IU |
23918861764 | 100.00 | 32.00 | 1391.82 | 1420.03 | Copy |
####-------------------------------------------------------
Compressed Inverted Index Intersections with GOV2
GOV2: 426GB, 25 million documents, average doc. size=18k.
- AOL: 1100 queries per second
  18000 queries in 16.31s [1103.9 q/s] [0.906 ms/q]
  Ratio = 14.37% Decoded/Total Integers.
- TREC Million Query Track (1MQT): 950 queries per second
  20000 queries in 21.03s, [951.0 q/s] [1.052 ms/q]
  Ratio = 11.59% Decoded/Total Integers.
Compile:
make
Testing
Synthetic data:
- test all functions
*./icbench -a1.0 -m0 -M8 -n100000000*
- zipfian distribution alpha = 1.0 (Ex. -a1.0=uniform -a1.5=skewed distribution)
- number of integers = 100000000
- integer range from 0 to 255 (integer size = 0 to 8 bits)
- individual function test (e.g. Copy, TurboPack, TurboPack Direct Access)
*./icbench -a1.5 -m0 -M8 -ecopy/turbopack/turbopackda -n100000000*
Data files:
- Data file benchmark (file format as in FastPFor)
./icbench -c1 gov2.sorted
Benchmarking intersections
- Download gov2 (or clueweb09) + query file (e.g. "1mq.txt") from "http://lemire.me/data/integercompression2014.html"
- Create index file
./idxcr gov2.sorted .
creates the inverted index file "gov2.sorted.i" in the current directory
- Benchmarking intersections
./idxqry gov2.sorted.i 1mq.txt
runs the queries in file "1mq.txt" over the index of the gov2 file

8GB RAM required (16GB recommended for benchmarking "clueweb09" files).
Function usage:
In general compression/decompression functions are of the form:
char *endptr = compress( unsigned *in, int n, [int b,] char *out)
endptr : set by compress to the next character in "out" after the compressed buffer
in : input integer array
n : number of elements
out : pointer to output buffer
b : number of bits. Only for bit packing functions
char *endptr = decompress( char *in, int n, [int b,] unsigned *out)
endptr : set by decompress to the next character in "in" after the compressed data that was read
in : pointer to input buffer
n : number of elements
out : output integer array
b : number of bits. Only for bit unpacking functions
Header files to use, with documentation:

Header file | Functions |
---|---|
vint.h | Variable byte |
vsimple.h | Variable simple |
vp4dc.h, vp4dd.h | TurboPFor |
bitpack.h, bitunpack.h | Bit Packing |
Reference:
- "SIMD-BitPack FPF" from FastPFor https://github.com/lemire/simdcomp
- Sorted integer datasets from http://lemire.me/data/integercompression2014.html
- OptPFD (OptP4) and Simple-16 from http://jinruhe.com/
#------------------------------------------------