TurboPFor: Fastest Integer Compression [](https://travis-ci.org/powturbo/TurboPFor)
======================================
+ **TurboPFor: The new synonym for integer compression**
- 100% C (C++ compatible headers), w/o inline assembly
- No other "integer compression" library compresses or decompresses faster with better compression
- Direct Access is several times faster than in other libraries
- Usage in C/C++ as easy as memcpy
- :sparkles: Integrated (SIMD) differential/Zigzag encoding/decoding for sorted/unsorted integer lists
- :sparkles: **Full** range 16/32/64 bits integer lists and Floating point
- :+1: Java Critical Native Interface. Access TurboPFor incl. SIMD from Java as fast as calling from C
- Compresses better and faster than special binary compressors like blosc
<p>

+ **Variable byte**
- :sparkles: Scalar **"Variable Byte"** faster and more efficient than **ANY** other implementation, incl. SIMD MaskedVByte
<p>

+ **Simple family**
- :sparkles: **Novel** **"Variable Simple"** (incl. **RLE**) faster and more efficient than simple16, simple8-b or any other "simple family" implementation
<p>

+ **Bit Packing**
- :sparkles: Fastest and most efficient **"SIMD Bit Packing"**
- Scalar **"Bit Packing"** decoding as fast as SIMD packing in realistic (non "pure cache") scenarios
- Bit Packing with **Direct/Random Access** without decompressing entire blocks
- Access any single bit-packed entry with **zero decompression**
- :sparkles: **Direct Update** of individual bit-packed entries
- Reduced **Cache Pollution**
<p>

+ **Elias fano**
- :sparkles: Fastest **"Elias Fano"** implementation w/ or w/o SIMD
<p>

+ **For/PFor/PForDelta**
- **Novel** **"TurboPFor"** (Patched Frame-of-Reference, PFor/PForDelta) scheme with **direct access** or bulk decoding. Outstanding compression and speed. More efficient than **ANY** other fast "integer compression" scheme.
- :new: **TurboPFor is now 30% faster**
- Compresses 70 times faster and decompresses up to 3 times faster than OptPFD
<p>

+ **Transform**
- :sparkles: Scalar & SIMD Transform: Delta, Zigzag, Transpose/Shuffle, Floating point<->Integer
<p>

+ **Inverted Index ...do less, go fast!**
- Direct Access to compressed *frequency* and *position* data in an inverted index with zero decompression
- :sparkles: **Novel** **"Intersection w/ skip intervals"**: decompress only the minimum necessary blocks (~10-15%)
- **Novel** implicit skips with zero extra overhead
- **Novel** efficient **Bidirectional** inverted index architecture (forward/backward traversal) incl. "integer compression"
- More than **2000 queries per second** on the GOV2 dataset (25 million documents) on a **SINGLE** core
- :sparkles: Parallel query processing on multicores w/ more than **7000 queries/sec** on a quad-core PC<br>
  **...forget** ~~Map Reduce, Hadoop, multi-node clusters,~~ ...

### Benchmark:
CPU: Sandy Bridge i7-2600k at 4.2GHz, gcc 5.1, Ubuntu 15.04, single thread.
- Realistic and practical "integer compression" benchmark with large integer arrays
- No "pure cache" benchmark

##### - Synthetic data:
- Generate and test skewed distribution (100.000.000 integers, Block size=128)

Note: Unlike general-purpose compression, "integer compression" generally uses a small fixed block size (ex. 128 integers). With large blocks, processing queries (inverted index, search engines, databases, graphs, in-memory computing, ...) would require decoding entire blocks.

./icbench -a1.5 -m0 -M255 -n100m
| | | | N/A | N/A |**EliasFano**|

MI/s: 1.000.000 integers/second. 1000 MI/s = 4 GB/s<br>
**#BOLD** = pareto frontier. FPF=FastPFor<br>
TurboPForDA, TurboForDA: Direct Access is normally used when accessing few individual values.

CPU: Skylake i7-6700 at only 3.7GHz

|Size| Ratio % |Bits/Integer |C Time MI/s |D Time MI/s |Function |
|--------:|-----:|----:|-------:|-------:|---------|
|400000000| 100.00| 32.00| 2240.24|2237.05|Copy|

------------------------------------------------------------------------
##### - Data files:
- CPU: Sandy Bridge i7-2600k at 4.2GHz
- gov2.sorted from [DocId data set](#DocId data set), Block size=128 (lz4+blosc+VSimple w/ 64Ki)
q/s: queries/second, ms/q: milliseconds/query

- Most search engines use pruning strategies, caching of popular queries, etc. to reduce the time spent on intersections and query processing.
- As an indication, Google processes [40.000 queries per second](http://www.internetlivestats.com/google-search-statistics/), using [900.000 multicore servers](https://www.cloudyn.com/blog/10-facts-didnt-know-server-farms/) to search [8 billion web pages](http://searchenginewatch.com/sew/study/2063479/coincidentally-googles-index-size-jumps) (320X the size of GOV2).
- Recent "integer compression" GOV2 experiments (best paper at ECIR 2014, [On Inverted Index Compression for Search Engine Efficiency](http://www.dcs.gla.ac.uk/~craigm/publications/catena14compression.pdf)) on an 8-core Xeon PC report 1.2 seconds per query (for 1.000 Top-k docids).

### Compile:
  *make*

### Testing:
##### - Synthetic data:
+ test all "integer compression" functions<br />

./icbench -a1.0 -m0 -M255 -n100m
>*run queries in file "1mq.txt" over the index of all gov2 partitions "gov2.sorted.s00.i - gov2.sorted.s07.i".*

### Function usage:
See the benchmark program "icbench" for "integer compression" usage examples.
In general, encoding/decoding functions are of the form:

header files to use with documentation:<br />

| header file|Integer Compression functions|
|------------|-----------------------------|
|vint.h|variable byte|
|vsimple.h|variable simple|
|vp4dc.h, vp4dd.h|TurboPFor|
- Windows: MinGW-w64 (no parallel query processing)

###### Multithreading:
- All TurboPFor integer compression functions are thread-safe

### References:

+ <a name="MaskedVByte"></a>[MaskedVByte](http://maskedvbyte.org/). See also: [Vectorized VByte Decoding](http://engineering.indeed.com/blog/2015/03/vectorized-vbyte-decoding-high-performance-vector-instructions/)
+ <a name="Simple-8b"></a>[Index Compression Using 64-Bit Words](http://people.eng.unimelb.edu.au/ammoffat/abstracts/am10spe.html): Simple-8b (speed-optimized version tested)
+ <a name="libfor"></a>[libfor](https://github.com/cruppstahl/for)
+ <a name="QMX"></a>[Compression, SIMD, and Postings Lists](http://www.cs.otago.ac.nz/homepages/andrew/papers/): QMX integer compression from the "simple family"
+ <a name="lz4"></a>[lz4](https://github.com/Cyan4973/lz4). Included w/ block size 64K as an indication. Tested after preprocessing w/ delta+transpose
+ <a name="blosc"></a>[blosc](https://github.com/Blosc/c-blosc). blosc is like transpose/shuffle+lz77. Tested blosc+lz4 and blosclz incl. vectorized shuffle.<br>
+ <a name="DocId data set"></a>[Document identifier data set](http://lemire.me/data/integercompression2014.html)
+ **Integer compression publications:**
  - [SIMD Compression and the Intersection of Sorted Integers](http://arxiv.org/abs/1401.6399)
  - [Partitioned Elias-Fano Indexes](http://www.di.unipi.it/~ottavian/files/elias_fano_sigir14.pdf)
  - [On Inverted Index Compression for Search Engine Efficiency](http://www.dcs.gla.ac.uk/~craigm/publications/catena14compression.pdf)
  - [Google's Group Varint Encoding](http://static.googleusercontent.com/media/research.google.com/de//people/jeff/WSDM09-keynote.pdf)

Last update: 27 MAR 2015