
TurboPFor: Fastest Integer Compression

  • 100% C, without inline assembly

- Fastest **"Variable Byte"** implementation

- Novel **"Variable Simple"** faster than simple16 and more compact than simple64

- Scalar **"Binary Packing"** with bulk decoding as fast as SIMD FastPFor in realistic (no "pure cache") scenarios

- Binary Packing with **Direct/Random Access** without decompressing entire blocks

- Access any single binary packed entry with **zero decompression**

- Novel **"TurboPFor"** (Patched Frame-of-Reference) scheme with direct access or bulk decoding

- Several times faster than other libraries

- Usage as easy as memcpy

- Instant access to compressed *frequency* and *position* data in inverted index with zero decoding
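
To illustrate what direct access to binary packed data means, here is a minimal sketch in plain C. It is not TurboPFor's code or API: the names `bp_pack` and `bp_get` and the 64-bit word layout are assumptions made for this example. Each value is stored with a fixed bit width, so the position of entry `i` can be computed directly and a single entry can be read without decoding its neighbours.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of TurboPFor. */

/* Mask with the low `bits` bits set; valid for bits in 1..32. */
static uint64_t bp_mask(unsigned bits) {
    return ((uint64_t)1 << bits) - 1;
}

/* Pack n values of `bits` bits each (1..32) into 64-bit words.
   `out` must hold at least (n * bits + 63) / 64 words. */
static void bp_pack(const uint32_t *in, size_t n, unsigned bits, uint64_t *out) {
    size_t words = (n * bits + 63) / 64;
    for (size_t w = 0; w < words; w++)
        out[w] = 0;
    for (size_t i = 0; i < n; i++) {
        size_t bitpos = i * bits;            /* absolute bit offset of entry i */
        size_t w = bitpos / 64;              /* word holding the first bit */
        unsigned off = (unsigned)(bitpos % 64);
        uint64_t v = in[i] & bp_mask(bits);
        out[w] |= v << off;
        if (off + bits > 64)                 /* entry straddles two words */
            out[w + 1] |= v >> (64 - off);
    }
}

/* Read entry i directly from the packed buffer, decoding nothing else. */
static uint32_t bp_get(const uint64_t *packed, size_t i, unsigned bits) {
    size_t bitpos = i * bits;
    size_t w = bitpos / 64;
    unsigned off = (unsigned)(bitpos % 64);
    uint64_t v = packed[w] >> off;
    if (off + bits > 64)
        v |= packed[w + 1] << (64 - off);
    return (uint32_t)(v & bp_mask(bits));
}
```

With this layout, `bp_get(packed, i, bits)` costs a couple of shifts and at most two word loads regardless of `i`. A real implementation would typically unroll the pack and unpack loops per bit width for bulk decoding.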

Benchmark:

i7-2600k at 3.4GHz, gcc 4.9, ubuntu 14.10.

  • Single thread
  • Realistic and practical benchmark with large integer arrays.
  • No PURE cache benchmark

Synthetic data:

coming soon!

data files

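| Size | Ratio in % | Bits/Integer | C Time MB/s | D Time MB/s | Function |
|---:|---:|---:|---:|---:|:---|
| 514438405 | 8.16 | 2.61 | 357.22 | 1286.42 | TurboPFor |
| 514438405 | 8.16 | 2.61 | 358.09 | 309.70 | TurboPFor DA |
| 539841792 | 8.56 | 2.74 | 6.47 | 767.35 | OptP4 |
| 583184112 | 9.25 | 2.96 | 132.42 | 914.89 | Simple16 |
| 623548565 | 9.89 | 3.17 | 235.32 | 925.71 | SimpleV |
| 733365952 | 11.64 | 3.72 | 162.21 | 1312.15 | Simple64 |
| 862464289 | 13.68 | 4.38 | 1274.01 | 1980.55 | TurboPack |
| 862464289 | 13.68 | 4.38 | 1285.28 | 868.06 | TurboPack DA |
| 862465391 | 13.68 | 4.38 | 1402.12 | 2075.15 | SIMD-BitPack FPF |
| 6303089028 | 100.00 | 32.00 | 1257.50 | 1308.22 | copy |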
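
The ratio and bits-per-integer columns can be derived from the size column. The sketch below shows that arithmetic; it assumes the sizes are in bytes and that the uncompressed input consists of 32-bit integers, which matches the `copy` row (100.00 %, 32.00 bits). The function names are hypothetical and are not taken from icbench.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of icbench. */

/* Compressed size as a percentage of the original size. */
static double ratio_percent(uint64_t compressed_bytes, uint64_t original_bytes) {
    return 100.0 * (double)compressed_bytes / (double)original_bytes;
}

/* Average number of bits used per integer, assuming 32-bit source integers. */
static double bits_per_integer(uint64_t compressed_bytes, uint64_t original_bytes) {
    return 32.0 * (double)compressed_bytes / (double)original_bytes;
}
```

For the TurboPFor row, `ratio_percent(514438405, 6303089028)` is roughly 8.16 and `bits_per_integer(514438405, 6303089028)` is roughly 2.61, in line with the table.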

Compile:

make

Usage

Synthetic data:
  1. Test all functions<br /> ./icbench -a1.0 -m0 -x8 -n100000000
     - zipfian distribution alpha = 1.0 (e.g. -a1.0=uniform, -a1.5=skewed distribution)
     - number of integers = 100000000
     - integer range from 0 to 255 (integer size = 0 to 8 bits)
  2. Individual function test (e.g. copy, TurboPack, TurboPack Direct access)<br /> ./icbench -a1.0 -m0 -x8 -ecopy/turbopack/turbopack,da -n100000000

Data files:
  • Data file benchmark (file format as in FastPFOR)<br /> ./icbench -n10000000000 clueweb09.sorted

Reference:
