This commit is contained in:
x
2019-10-16 19:47:31 +02:00
parent 6df1da0f9b
commit 93093e3bfe

@@ -1,15 +1,15 @@
TurboPFor: Fastest Integer Compression [![Build Status](https://travis-ci.org/powturbo/TurboPFor.svg?branch=master)](https://travis-ci.org/powturbo/TurboPFor) TurboPFor: Fastest Integer Compression [![Build Status](https://travis-ci.org/powturbo/TurboPFor.svg?branch=master)](https://travis-ci.org/powturbo/TurboPFor)
====================================== ======================================
* **TurboPFor: The new synonym for "integer compression"** * **TurboPFor: The new synonym for "integer compression"**
* :new: (2019.7) all TurboPFor functions now available under 64 bits ARMv8 including NEON SIMD. * :new: (2019.11) **ALL** TurboPFor functions now available under **64 bits ARMv8** including **NEON** SIMD.
* 100% C (C++ headers), as simple as memcpy * 100% C (C++ headers), as simple as memcpy
* :+1: **Java** Critical Natives/JNI. Access TurboPFor **incl. SIMD/AVX2!** from Java as fast as calling from C * :+1: **Java** Critical Natives/JNI. Access TurboPFor **incl. SIMD/AVX2!** from Java as fast as calling from C
* :sparkles: **FULL** range 8/16/32/64 bits scalar + 16/32/64 bits SIMD functions * :sparkles: **FULL** range 8/16/32/64 bits scalar + 16/32/64 bits SIMD functions
* No other "Integer Compression" compress/decompress faster * No other "Integer Compression" compress/decompress faster
* :sparkles: Direct Access, **integrated** (SIMD/AVX2) FOR/delta/Delta of Delta/Zigzag for sorted/unsorted arrays * :sparkles: Direct Access, **integrated** (SIMD/AVX2) FOR/delta/Delta of Delta/Zigzag for sorted/unsorted arrays
* :new: **16 bits** + **64 bits** SIMD integrated functions * **16 bits** + **64 bits** SIMD integrated functions
* **For/PFor/PForDelta** * **For/PFor/PForDelta**
* **Novel TurboPFor** (PFor/PForDelta) scheme w./ **direct access** + **SIMD/AVX2**. :new:**+RLE** * **Novel TurboPFor** (PFor/PForDelta) scheme w./ **direct access** + **SIMD/AVX2**. **+RLE**
* Outstanding compression/speed. More efficient than **ANY** other fast "integer compression" scheme. * Outstanding compression/speed. More efficient than **ANY** other fast "integer compression" scheme.
* Compress 70 times faster and decompress up to 4 times faster than OptPFD * Compress 70 times faster and decompress up to 4 times faster than OptPFD
* **Bit Packing** * **Bit Packing**
@@ -17,20 +17,23 @@ TurboPFor: Fastest Integer Compression [![Build Status](https://travis-ci.org/po
* Scalar **"Bit Packing"** decoding nearly as fast as SIMD-Packing in realistic (No "pure cache") scenarios * Scalar **"Bit Packing"** decoding nearly as fast as SIMD-Packing in realistic (No "pure cache") scenarios
* **Direct/Random Access** : Access any single bit packed entry with **zero decompression** * **Direct/Random Access** : Access any single bit packed entry with **zero decompression**
* **Variable byte** * **Variable byte**
* Scalar **"Variable Byte"** faster than **ANY** other (incl. SIMD) implementation * Scalar **"Variable Byte"** faster and more efficient than **ANY** other implementation
* :new: (2019.11) SIMD **TurboByte** fastest group varint (16+32 bits) incl. integrated delta,zigzag,...
* :new: (2019.11) **TurboByte+TurboPackV** novel hybrid scheme combining the fastest SIMD codecs.
* **Simple family** * **Simple family**
* **Novel** **"Variable Simple"** (incl. **RLE**) faster and more efficient than simple16, simple-8b * **Novel** **"Variable Simple"** (incl. **RLE**) faster and more efficient than simple16, simple-8b
* **Elias fano** * **Elias fano**
* Fastest **"Elias Fano"** implementation w/ or w/o SIMD/AVX2 * Fastest **"Elias Fano"** implementation w/ or w/o SIMD/AVX2
+ **Transform** + **Transform**
* Scalar & SIMD Transform: Delta, Zigzag, Zigzag of delta, XOR, Transpose/Shuffle, * Scalar & SIMD Transform: Delta, Zigzag, Zigzag of delta, XOR, Transpose/Shuffle,
* :new: **lossy** floating point compression with *TurboPFor* or [TurboTranspose](https://github.com/powturbo/TurboTranspose)+lz77 * **lossy** floating point compression with *TurboPFor* or [TurboTranspose](https://github.com/powturbo/TurboTranspose)+lz77
* **Floating Point Compression** * **Floating Point Compression**
* Delta/Zigzag + improved gorilla style + (Differential) Finite Context Method FCM/DFCM floating point compression * Delta/Zigzag + improved gorilla style + (Differential) Finite Context Method FCM/DFCM floating point compression
* Using **TurboPFor**, unsurpassed compression and more than 5 GB/s throughput * Using **TurboPFor**, unsurpassed compression and more than 5 GB/s throughput
* :new: Error bound **lossy** floating point compression * Point wise relative error bound **lossy** floating point compression
* :new: **Time Series Compression** * :new: (2019.10) **TurboFloat** novel efficient floating point compression using TurboPFor
* **Fastest Gorilla** 16/32/64 bits style compression (:new: **zigzag of delta** + **RLE**). * **Time Series Compression**
* **Fastest Gorilla** 16/32/64 bits style compression (**zigzag of delta** + **RLE**).
* can compress time series to only 0.01%. Speed > 10 GB/s compression and > 13 GB/s decompression. * can compress time series to only 0.01%. Speed > 10 GB/s compression and > 13 GB/s decompression.
* **Inverted Index ...do less, go fast!** * **Inverted Index ...do less, go fast!**
* Direct Access to compressed *frequency* and *position* data w/ zero decompression * Direct Access to compressed *frequency* and *position* data w/ zero decompression
@@ -47,7 +50,7 @@ TurboPFor: Fastest Integer Compression [![Build Status](https://travis-ci.org/po
- :new: Download [IcApp](https://sites.google.com/site/powturbo/downloads) a new benchmark for TurboPFor<br> - :new: Download [IcApp](https://sites.google.com/site/powturbo/downloads) a new benchmark for TurboPFor<br>
for testing almost all integer and floating point file types. for testing almost all integer and floating point file types.
- Practical (No **PURE** cache) "integer compression" benchmark w/ **large** arrays. - Practical (No **PURE** cache) "integer compression" benchmark w/ **large** arrays.
- CPU: Skylake i7-6700 3.4GHz gcc 7.2 **single** thread - CPU: Skylake i7-6700 3.4GHz gcc 8.3 **single** thread
##### - Synthetic data: ##### - Synthetic data:
- Generate and test (zipfian) skewed distribution (100.000.000 integers, Block size=128/256)<br> - Generate and test (zipfian) skewed distribution (100.000.000 integers, Block size=128/256)<br>
@@ -55,30 +58,32 @@ TurboPFor: Fastest Integer Compression [![Build Status](https://travis-ci.org/po
Large blocks involved, while processing queries (inverted index, search engines, databases, graphs, in memory computing,...) need to be entirely decoded. Large blocks involved, while processing queries (inverted index, search engines, databases, graphs, in memory computing,...) need to be entirely decoded.
./icbench -a1.5 -m0 -M255 -n100M ZIPF ./icbench -a1.5 -m0 -M255 -n100M ZIPF
|C Size|ratio%|Bits/Integer|C MB/s|D MB/s|Name| |C Size|ratio%|Bits/Integer|C MB/s|D MB/s|Name 2019.11|
|--------:|-----:|--------:|----------:|----------:|--------------| |--------:|-----:|--------:|----------:|----------:|--------------|
|62,939,886| 15.7| 5.04|**1588**|**9400**|**TurboPFor256**| |62,939,886| 15.7| 5.04|**2369**|**10950**|**TurboPFor256**|
|63,392,759| 15.8| 5.07|1320|6432|**TurboPFor**| |63,392,759| 15.8| 5.07|1359|7803|**TurboPFor128**|
|63,392,801| 15.8| 5.07|1328|924|**TurboPForDA**| |63,392,801| 15.8| 5.07|1328|924|**TurboPForDA**|
|65,060,504| 16.3| 5.20|60|2748|[FP_SIMDOptPFor](#FastPFor)| |65,060,504| 16.3| 5.20|60|2748|[FP_SIMDOptPFor](#FastPFor)|
|65,359,916|16.3| 5.23| 32|2436|PC_OptPFD| |65,359,916|16.3| 5.23| 32|2436|PC_OptPFD|
|73,477,088|18.4| 5.88|408|2484|PC_Simple16| |73,477,088|18.4| 5.88|408|2484|PC_Simple16|
|73,481,096| 18.4| 5.88|624|8748|[FP_SimdFastPFor](#FastPFor) 64Ki *| |73,481,096| 18.4| 5.88|624|8748|[FP_SimdFastPFor](#FastPFor) 64Ki *|
|76,345,136| 19.1| 6.11|980|2612|**VSimple**| |76,345,136| 19.1| 6.11|1072|2878|**VSimple**|
|91,947,533| 23.0| 7.36|284|11737|[QMX](#QMX) 64k *| |91,947,533| 23.0| 7.36|284|11737|[QMX](#QMX) 64k *|
|93,285,864| 23.3| 7.46|1568|10232|[FP_GroupSimple](#FastPFor) 64Ki *| |93,285,864| 23.3| 7.46|1568|10232|[FP_GroupSimple](#FastPFor) 64Ki *|
|95,915,096|24.0| 7.67| 848|3832|Simple-8b| |95,915,096|24.0| 7.67| 848|3832|Simple-8b|
|99,910,930| 25.0| 7.99|**13976**|**11872**|**TurboPackV**| |99,910,930| 25.0| 7.99|**17298**|**12408**|**TurboByte+TurboPack**|
|99,910,930| 25.0| 7.99|9468|9404|**TurboPack**| |99,910,930| 25.0| 7.99|**17357**|**12363**|**TurboPackV** sse|
|99,910,930| 25.0| 7.99|11694|10138|**TurboPack** scalar|
|99,910,930| 25.0| 7.99|8420|8876|**TurboFor**| |99,910,930| 25.0| 7.99|8420|8876|**TurboFor**|
|100,332,929| 25.1| 8.03|**14320**|**12124**|**TurboPack256V**| |100,332,929| 25.1| 8.03|17077|11170|**TurboPack256V** avx2|
|101,015,650| 25.3| 8.08|9520|9484|**TurboVByte**| |101,015,650| 25.3| 8.08|11191|10333|**TurboVByte**|
|102,074,663| 25.5| 8.17|5712|7916|[MaskedVByte](#MaskedVByte)| |102,074,663| 25.5| 8.17|6689|9524|[MaskedVByte](#MaskedVByte)|
|102,074,663| 25.5| 8.17|2260|4208|[PC_Vbyte](#PolyCom)| |102,074,663| 25.5| 8.17|2260|4208|[PC_Vbyte](#PolyCom)|
|102,083,036| 25.5| 8.17|5200|4268|[FP_VByte](#FastPFor)| |102,083,036| 25.5| 8.17|5200|4268|[FP_VByte](#FastPFor)|
|112,500,000| 28.1| 9.00|1528|**12140**|[VarintG8IU](#VarintG8IU)| |112,500,000| 28.1| 9.00|1528|12140|[VarintG8IU](#VarintG8IU)|
|125,000,000| 31.2|10.00|4788|11288|[StreamVbyte](#StreamVByte)| |125,000,000| 31.2|10.00|13039|12366|**TurboByte**|
|125,000,000| 31.2|10.00|11197|11984|[StreamVbyte 2019](#StreamVByte)|
|400,000,000| 100.00| 32.00| 8960|8948|Copy| |400,000,000| 100.00| 32.00| 8960|8948|Copy|
| | | | N/A | N/A |EliasFano| | | | | N/A | N/A |EliasFano|
@@ -97,28 +102,29 @@ TurboPFor: Fastest Integer Compression [![Build Status](https://travis-ci.org/po
![Speed/Ratio](ext/gov2.png "Speed/Ratio: Decompression") ![Speed/Ratio](ext/gov2.png "Speed/Ratio: Decompression")
|Size |Ratio %|Bits/Integer|C Time MB/s|D Time MB/s|Function | |Size |Ratio %|Bits/Integer|C Time MB/s|D Time MB/s|Function 2019.11|
|-----------:|------:|-----:|-------:|-------:|---------------------| |-----------:|------:|-----:|-------:|-------:|---------------------|
| 3,321,663,893| 13.9| 4.44|**1320**|**6088**|**TurboPFor**| | 3,321,663,893| 13.9| 4.44|**1320**|**6088**|**TurboPFor**|
| 3,339,730,557| 14.0| 4.47| 32| 2144|PC.OptPFD| | 3,339,730,557| 14.0| 4.47| 32| 2144|PC.OptPFD|
| 3,350,717,959| 14.0| 4.48|**1536**|**7128**|**TurboPFor256**| | 3,350,717,959| 14.0| 4.48|**1536**|**7128**|**TurboPFor256**|
| 3,501,671,314| 14.6| 4.68| 56| 2840|**VSimple**| | 3,501,671,314| 14.6| 4.68| 56| 2840|**VSimple**|
| 3,768,146,467| 15.8| 5.04|**3228**| 3652|**EliasFanoV**| | 3,768,146,467| 15.8| 5.04|**3228**| 3652|**EliasFanoV**|
| 3,822,161,885| 16.0| 5.11| 572| 2444|PC_Simple16| | 3,822,161,885| 16.0| 5.11| 572| 2444|PC_Simple16|
| 4,411,714,936| 18.4| 5.90|**9304**|**10444**|**TurboByte+TurboPack**|
| 4,521,326,518| 18.9| 6.05| 836| 3296|Simple-8b| | 4,521,326,518| 18.9| 6.05| 836| 3296|Simple-8b|
| 4,649,671,427| 19.4| 6.22|3084|3848|**TurboVbyte**| | 4,649,671,427| 19.4| 6.22|3084| 3848|**TurboVbyte**|
| 4,955,740,045| 20.7| 6.63|**7064**|**10268**|**TurboPackV**| | 4,955,740,045| 20.7| 6.63|7064|10268|**TurboPackV**|
| 4,955,740,045| 20.7| 6.63|5724|8020|**TurboPack**| | 4,955,740,045| 20.7| 6.63|5724| 8020|**TurboPack**|
| 5,205,324,760|21.8| 6.96|6952|9488|SC_SIMDPack128| | 5,205,324,760| 21.8| 6.96|6952| 9488|SC_SIMDPack128|
| 5,393,769,503| 22.5| 7.21|**9912**|**11588**|**TurboPackV256**| | 5,393,769,503| 22.5| 7.21|**14466**|**11902**|**TurboPackV256**|
| 6,221,886,390| 26.0| 8.32|6668|6952|**TurboFor**| | 6,221,886,390| 26.0| 8.32|6668| 6952|**TurboFor**|
| 6,221,886,390| 26.0| 8.32|6644| 2260|**TurboForDA**| | 6,221,886,390| 26.0| 8.32|6644| 2260|**TurboForDA**|
| 6,699,519,000| 28.0| 8.96| 1888| 1980|FP_Vbyte| | 6,699,519,000| 28.0| 8.96|1888| 1980|FP_Vbyte|
| 6,700,989,563| 28.0| 8.96| 2740| 3384|MaskedVByte| | 6,700,989,563| 28.0| 8.96|2740| 3384|MaskedVByte|
| 7,622,896,878| 31.9|10.20| 836|4792|VarintG8IU| | 7,622,896,878| 31.9|10.20| 836| 4792|VarintG8IU|
| 8,060,125,035| 33.7|11.50| 3536|8684|Streamvbyte| | 8,060,125,035| 33.7|11.50|8456| 9476|Streamvbyte 2019|
| 8,594,342,216| 35.9|11.50|5228|6376|libfor| | 8,594,342,216| 35.9|11.50|5228| 6376|libfor|
|23,918,861,764|100.0|32.00|5824|5924|Copy| |23,918,861,764|100.0|32.00|5824| 5924|Copy|
Block size: 64Ki = 256k bytes. Ki=1024 Integers Block size: 64Ki = 256k bytes. Ki=1024 Integers
@@ -174,7 +180,7 @@ Block size: 64Ki = 256k bytes. Ki=1024 Integers
./icapp -Ftf file " text file (1 entry per line) ./icapp -Ftf file " text file (1 entry per line)
./icapp -Ftf file -v5 " + display the first entries read ./icapp -Ftf file -v5 " + display the first entries read
./icapp -Ftf file.csv -K3 " but 3rd column in a csv file (ex. number,Text,456.5 -> 456.5) ./icapp -Ftf file.csv -K3 " but 3rd column in a csv file (ex. number,Text,456.5 -> 456.5)
./icapp -Ftf file -g.001 " lossy compression with allowed error 0.001 ./icapp -Ftf file -g.001 " lossy compression with allowed pointwise relative error 0.001
- see also [TurboTranspose](https://github.com/powturbo/TurboTranspose) - see also [TurboTranspose](https://github.com/powturbo/TurboTranspose)