x
2019-07-15 10:32:55 +02:00
parent b70d6b0b22
commit 95bc9fc7f0

README.md

@ -1,6 +1,7 @@
TurboPFor: Fastest Integer Compression [![Build Status](https://travis-ci.org/powturbo/TurboPFor.svg?branch=master)](https://travis-ci.org/powturbo/TurboPFor) TurboPFor: Fastest Integer Compression [![Build Status](https://travis-ci.org/powturbo/TurboPFor.svg?branch=master)](https://travis-ci.org/powturbo/TurboPFor)
====================================== ======================================
* **TurboPFor: The new synonym for "integer compression"** * **TurboPFor: The new synonym for "integer compression"**
* :new: (2019.7) all TurboPFor functions now available under 64 bits ARMv8 including NEON SIMD.
* 100% C (C++ headers), as simple as memcpy * 100% C (C++ headers), as simple as memcpy
* :+1: **Java** Critical Natives/JNI. Access TurboPFor **incl. SIMD/AVX2!** from Java as fast as calling from C * :+1: **Java** Critical Natives/JNI. Access TurboPFor **incl. SIMD/AVX2!** from Java as fast as calling from C
* :sparkles: **FULL** range 8/16/32/64 bits scalar + 16/32/64 bits SIMD functions * :sparkles: **FULL** range 8/16/32/64 bits scalar + 16/32/64 bits SIMD functions
@ -267,7 +268,7 @@ using [900.000 multicore servers](https://www.cloudyn.com/blog/10-facts-didnt-kn
+ Unit test: test function from bit size 0 to 32 + Unit test: test function from bit size 0 to 32
./icbench -m0 -M32 -eturbpfor ./icbench -m0 -M32 -eturbpfor -fu
./icbench -m0 -M8 -eturbopack -fs -n1M ./icbench -m0 -M8 -eturbopack -fs -n1M
##### - Data files: ##### - Data files:
@ -275,11 +276,11 @@ using [900.000 multicore servers](https://www.cloudyn.com/blog/10-facts-didnt-kn
./icbench file ./icbench file
./icapp file ./icapp file
./icapp -Fs file "16 bits binary file ./icapp -Fs file "16 bits raw binary file
./icapp -Fu file "32 bits binary file ./icapp -Fu file "32 bits raw binary file
./icapp -Fl file "64 bits binary file ./icapp -Fl file "64 bits raw binary file
./icapp -Ff file "32 bits floating point binary file ./icapp -Ff file "32 bits raw floating point binary file
./icapp -Fd file "64 bits floating point binary file ./icapp -Fd file "64 bits raw floating point binary file
- Text file: 1 entry per line. [Test data: ts.txt(sorted) and lat.txt(unsorted)](https://github.com/zhenjl/encoding/tree/master/benchmark/data) - Text file: 1 entry per line. [Test data: ts.txt(sorted) and lat.txt(unsorted)](https://github.com/zhenjl/encoding/tree/master/benchmark/data)
@ -385,21 +386,22 @@ In general encoding/decoding functions are of the form:
compressed_size : number of bytes read from compressed input buffer in<br /> compressed_size : number of bytes read from compressed input buffer in<br />
### Function syntax: ### Function syntax:
- {vb | p4 | bit | vs}[d | d1 | f | fm | z ]{enc/dec | pack/unpack}[| 128V | 256V][8 | 16 | 32 | 64]:<br /> - {vb | p4 | bit | vs}[n][d | d1 | f | fm | z ]{enc/dec | pack/unpack}[| 128V | 256V][8 | 16 | 32 | 64]:<br />
vb: variable byte<br /> vb: variable byte<br />
p4: turbopfor<br /> p4: turbopfor<br />
vs: variable simple<br /> vs: variable simple<br />
bit: bit packing<br /> bit: bit packing<br />
n : high level array functions for large arrays.
d: delta encoding for increasing integer lists (sorted w/ duplicate)<br /> '' : encoding for unsorted integer lists<br />
d1: delta encoding for strictly increasing integer lists (sorted unique)<br /> 'd' : delta encoding for increasing integer lists (sorted w/ duplicate)<br />
f : FOR encoding for sorted integer lists<br /> 'd1': delta encoding for strictly increasing integer lists (sorted unique)<br />
fm: FOR encoding for unsorted integer lists<br /> 'f' : FOR encoding for sorted integer lists<br />
z: ZigZag encoding for unsorted integer lists<br /> 'z' : ZigZag encoding for unsorted integer lists<br />
enc/pack: encode<br /> 'enc' or 'pack' : encode or bitpack<br />
dec/unpack:decode<br /> 'dec' or 'unpack': decode or bitunpack<br />
XX : integer size (8/16/32/64)<br /> 'NN' : integer size (8/16/32/64)<br />
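
The 'd', 'd1' and 'z' variants listed above can be illustrated with a minimal scalar sketch. This is not TurboPFor's implementation: the function names, the argument order and the `start` parameter (the value assumed to precede the first element) are hypothetical and chosen only for illustration. The point is that each transform turns a list into small non-negative numbers that a later bit packing or PFor stage can store in few bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 'd': delta for increasing lists (duplicates allowed).
   `start` is the value assumed to precede in[0] (hypothetical parameter). */
void delta_enc32(const uint32_t *in, size_t n, uint32_t start, uint32_t *out) {
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] - start;
        start = in[i];
    }
}

void delta_dec32(const uint32_t *in, size_t n, uint32_t start, uint32_t *out) {
    for (size_t i = 0; i < n; i++) {
        start += in[i];
        out[i] = start;
    }
}

/* 'd1': delta for strictly increasing lists. Every gap is at least 1,
   so gap - 1 is stored instead. */
void delta1_enc32(const uint32_t *in, size_t n, uint32_t start, uint32_t *out) {
    for (size_t i = 0; i < n; i++) {
        assert(in[i] > start); /* input must be strictly increasing */
        out[i] = in[i] - start - 1;
        start = in[i];
    }
}

void delta1_dec32(const uint32_t *in, size_t n, uint32_t start, uint32_t *out) {
    for (size_t i = 0; i < n; i++) {
        start += in[i] + 1;
        out[i] = start;
    }
}

/* ZigZag maps a signed difference (held in two's complement) to an
   unsigned value: 0, -1, 1, -2, ... become 0, 1, 2, 3, ... */
uint32_t zigzag32(uint32_t d)   { return (d << 1) ^ (0u - (d >> 31)); }
uint32_t unzigzag32(uint32_t z) { return (z >> 1) ^ (0u - (z & 1)); }

/* 'z': zigzag of the differences, usable for unsorted lists. */
void zigzag_enc32(const uint32_t *in, size_t n, uint32_t start, uint32_t *out) {
    for (size_t i = 0; i < n; i++) {
        out[i] = zigzag32(in[i] - start);
        start = in[i];
    }
}

void zigzag_dec32(const uint32_t *in, size_t n, uint32_t start, uint32_t *out) {
    for (size_t i = 0; i < n; i++) {
        start += unzigzag32(in[i]);
        out[i] = start;
    }
}
```

For example, with `start = 0` the sorted list 3, 3, 7, 10 becomes 3, 0, 4, 3 under 'd', and the unsorted list 10, 7, 12 becomes 20, 5, 10 under 'z'.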
header files to use with documentation:<br /> header files to use with documentation:<br />
@ -411,12 +413,15 @@ header files to use with documentation:<br />
|bitpack.h|Bit Packing, For, +Direct Access| bitpack256v32/bitunpack256v32 bitforenc64/bitfordec64| |bitpack.h|Bit Packing, For, +Direct Access| bitpack256v32/bitunpack256v32 bitforenc64/bitfordec64|
|eliasfano.h|Elias Fano| efanoenc256v32/efanodec256v32 | |eliasfano.h|Elias Fano| efanoenc256v32/efanodec256v32 |
Note: Some low level functions (like p4enc32) are limited to 128/256 (SSE/AVX2) integers per call.
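
One way to work within such a per-call limit is to split a large array into blocks and call the block encoder once per block, which is roughly what a high level array function has to do. The sketch below assumes a block size of 128. The `block_enc_fn` type, `encode_large` and the stand-in encoder `raw_block_enc` are hypothetical names and are not part of TurboPFor.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 128 /* assumed per-call limit (the 128-integer case) */

/* Hypothetical block encoder type: encodes n <= BLOCK_SIZE integers into
   out and returns a pointer just past the last byte written. */
typedef unsigned char *(*block_enc_fn)(const uint32_t *in, size_t n,
                                       unsigned char *out);

/* Encode an array of any length by feeding it to `enc` block by block.
   Returns the total number of bytes written to out. */
size_t encode_large(const uint32_t *in, size_t n, unsigned char *out,
                    block_enc_fn enc) {
    unsigned char *op = out;
    while (n > 0) {
        size_t chunk = n < BLOCK_SIZE ? n : BLOCK_SIZE;
        op = enc(in, chunk, op);
        in += chunk;
        n -= chunk;
    }
    return (size_t)(op - out);
}

/* Stand-in block encoder for demonstration only (no compression):
   one length byte, then each integer as 4 little-endian bytes. */
unsigned char *raw_block_enc(const uint32_t *in, size_t n, unsigned char *out) {
    *out++ = (unsigned char)n; /* n <= 128 fits in one byte */
    for (size_t i = 0; i < n; i++) {
        out[0] = (unsigned char)(in[i]);
        out[1] = (unsigned char)(in[i] >> 8);
        out[2] = (unsigned char)(in[i] >> 16);
        out[3] = (unsigned char)(in[i] >> 24);
        out += 4;
    }
    return out;
}
```

With 300 input integers, `encode_large` calls the block encoder three times, with 128, 128 and 44 integers. The output buffer must be large enough for the worst case of the chosen block encoder.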
### Environment: ### Environment:
###### OS/Compiler (64 bits): ###### OS/Compiler (64 bits):
- Linux: GNU GCC (>=4.6) - Linux: GNU GCC (>=4.6)
- clang (>=3.2) - clang (>=3.2)
- Windows: MinGW-w64 (no parallel query processing demo app) - Windows: MinGW-w64 (no parallel query processing demo app)
- Visual c++ (VS2008-VS2017) - Visual c++ (VS2008-VS2017)
- Linux aarch64 for 64 bits ARM CPU : gcc
###### Multithreading: ###### Multithreading:
- All TurboPFor integer compression functions are thread safe - All TurboPFor integer compression functions are thread safe
@ -451,5 +456,112 @@ header files to use with documentation:<br />
* [Small Polygon Compression](https://arxiv.org/abs/1509.05505) + [Poster](http://abhinavjauhri.me/publications/dcc_poster_2016.pdf) + [code](https://github.com/ajauhri/bignum_compression) * [Small Polygon Compression](https://arxiv.org/abs/1509.05505) + [Poster](http://abhinavjauhri.me/publications/dcc_poster_2016.pdf) + [code](https://github.com/ajauhri/bignum_compression)
* [Parallel Graph Analysis (Lecture 18)](http://www.cs.rpi.edu/~slotag/classes/FA16/) + [code](http://www.cs.rpi.edu/~slotag/classes/FA16/handson/lec18-comp2.cpp) * [Parallel Graph Analysis (Lecture 18)](http://www.cs.rpi.edu/~slotag/classes/FA16/) + [code](http://www.cs.rpi.edu/~slotag/classes/FA16/handson/lec18-comp2.cpp)
Last update: 09 Nov 2018 Last update: 15 Jul 2019
## APPENDIX: icbench Integer Compression Benchmark
##### TurboPFor + external libraries
<pre>
TurboPFor https://github.com/powturbo/TurboPFor
FastPFor (FP) https://github.com/lemire/FastPFor
lz4 https://github.com/Cyan4973/lz4
LittleIntPacker (LI) https://github.com/lemire/LittleIntPacker
MaskedVbyte http://maskedvbyte.org
Polycom (PC) https://github.com/encode84/bcm
simdcomp (SC) https://github.com/lemire/simdcomp
Simple-8b optimized https://github.com/powturbo/TurboPFor
Streamvbyte https://github.com/lemire/streamvbyte
VarintG8IU https://github.com/lemire/FastPFor
</pre>
##### Functions integrated into 'icbench' for benchmarking
<pre>
Codec group:
TURBOPFOR TurboPFor library TurboPFor256V/TurboPack256V/TurboPFor256N/TurboPFor/TurboPackV/TurboVByte/TurboPack/TurboForDA/EliasFano/VSimple/TurboPForN/TurboPackN/TurboPForDI
DEFAULT Default TurboPFor/TurboPackV/TurboVByte/TurboPack/TurboFor/TurboPForN/TurboPackN/TurboPForDI/TurboPFor256V/TurboPack256V/TurboPFor256N
BENCH Benchmark TurboPFor/TurboPackV/TurboVByte/TurboPack/QMX/FP.SimdFastPfor/FP.SimdOptPFor/MaskedVbyte/StreamVbyte
EFFICIENT Efficient TurboPFor/vsimple/turbovbyte
TRANSFORM transpose/shuffle,delta,zigzag tpbyte4s/tpbyte,4/tpnibble,4/ZigZag_32/Delta_32/BitShuffle,4
BITPACK Bit Packing TurboPack256V/TurboPackV/TurboPackH/TurboPack/SC.SimdPack128/SC.SimdPack256
VBYTE Variable byte TurboVByte/FP.VByte/PC.Vbyte/VarintG8IU/MaskedVbyte/StreamVbyte
SIMPLE Simple Family simple8b/simple16/vsimple/qmx
LZ4 lz4+bitshuffle/transpose 4,8 lz4_bitshufle/lz4_tp4/lz4_tp8
LI Little Integer LI_Pack/LI_TurboPack/LI_SuperPack/LI_HorPack
Function Description level
-------- ----------- -----
TurboPFor PFor (SSE2)
TurboPForN PFor (SSE2) large blocks
TurboPFor256 PFor (AVX2)
TurboPFor256N PFor (AVX2) large blocks
TurboPForDA PFor direct access
TurboPForDI PFord min
TurboPForZZ PFor zigzag of delta
TurboFor FOR
TurboForV FOR (SIMD)
TurboFor256V FOR (AVX2)
TurboForDA FOR direct access
TurboPackDA Bit packing direct access
TurboPack Bit packing (scalar)
TurboPackN Bit packing (scalar) large blocks
TurboPackV Bit packing (SSE2 Vertical)
TurboPackH Bit packing (SSE2 Horizontal)
TurboPackVN Bit packing (SSE2 large block)
TurboPack256V Bit packing (AVX2 Vertical)
TurboPack256N Bit packing (AVX2 large block)
TurboVByte Variable byte (scalar)
VSimple Variable simple (scalar)
EliasFano Elias Fano (scalar)
EliasFanoV Elias Fano (SSE2)
EliasFano256V Elias Fano (AVX2)
memcpy memcpy
copy Integer copy
tpbyte4s Byte Transpose (scalar)
tpbyte Byte transpose (simd) 2,4,8
tpnibble Nibble transpose (simd) 2,4,8
ZigZag32 ZigZag encoding (sse2)
Delta32 Delta encoding (sse2)
DDelta32 Delta of delta encoding (sse2)
Xor32 Xor encoding (sse2)
FP_PREV64 Floating point PFOR
FP_FCM64 Floating point PFOR (FCM)
FP_DFCM64 Floating point PFOR (DFCM)
TurboPFor64 PFOR 64
TurboPFor64V PFOR 64
Simple8b 64 bits Simple family (unstable)
PC_Simple16 Simple 16. limited to 28 bits
PC_OptPFD OptPFD. limited to 28 bits
PC_Vbyte Variable byte
PC_Rice Rice coding (unstable)
VarintG8IU Variable byte SIMD
MaskedVbyte Variable byte SIMD
StreamVbyte Variable byte SIMD
FP_FastPFor PFor scalar (inefficient for small blocks)
FP_SimdFastPFor PFor SIMD (inefficient for small blocks)
FP_OptPFor OptPFor scalar
FP_SIMDOptPFor OptPFor SIMD
FP_VByte Variable byte
FP_Simple8bRLE Simple-8b + rle
FP_GROUPSIMPLE Group Simple
SC_SIMDPack128 Bit packing (SSE4.1)
SC_SIMDPack256 Bit packing (SSE4.1)
SC_For For (SSE4.1)
SC_ForDA For direct access (SSE4.1)
LibFor_For For
LibFor_ForDA For direct access
LI_Pack Bit packing (scalar)
LI_TurboPack Bit packing (scalar)
LI_SuperPack Bit packing (scalar)
LI_HorPack Bit packing (sse4.1 horizontal)
LI_BMIPack256 Bit packing (avx2)
lz4 lz4
lz4_bit Bitshuffle + [delta]+lz4 2,4,8
lz4_nibble TurboPFor's [delta]+nibble transpose + lz4 2,4,8
lz4_bitxor Bitshuffle + [xor]+lz4 2,4,8
lz4_nibblexor TurboPFor's [xor]+nibble transpose + lz4 2,4,8
lz4_byte TurboPFor's [delta]+byte transpose + lz4 2,4,8
BitShuffle Bit shuffle (simd) 2,4,8
</pre>
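
Several entries in the table above are bit packing variants. As a rough illustration of the scalar case, the following sketch packs `n` integers using a fixed number of bits each and unpacks them again. It is a generic LSB-first layout written for clarity, not TurboPFor's actual bit layout or API; `bitpack32_sketch` and `bitunpack32_sketch` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack n values using `bits` bits each (0..32), LSB first.
   Returns the number of bytes written: ceil(n * bits / 8). */
size_t bitpack32_sketch(const uint32_t *in, size_t n, unsigned bits,
                        unsigned char *out) {
    assert(bits <= 32);
    uint64_t mask = (1ull << bits) - 1;
    uint64_t acc = 0;  /* bit accumulator */
    unsigned fill = 0; /* number of valid bits in acc, always < 8 here */
    unsigned char *op = out;
    for (size_t i = 0; i < n; i++) {
        acc |= ((uint64_t)in[i] & mask) << fill;
        fill += bits;
        while (fill >= 8) { /* flush complete bytes */
            *op++ = (unsigned char)acc;
            acc >>= 8;
            fill -= 8;
        }
    }
    if (fill > 0) *op++ = (unsigned char)acc; /* flush the partial last byte */
    return (size_t)(op - out);
}

/* Inverse of bitpack32_sketch: read n values of `bits` bits each. */
void bitunpack32_sketch(const unsigned char *in, size_t n, unsigned bits,
                        uint32_t *out) {
    assert(bits <= 32);
    uint64_t mask = (1ull << bits) - 1;
    uint64_t acc = 0;
    unsigned fill = 0;
    for (size_t i = 0; i < n; i++) {
        while (fill < bits) { /* refill one byte at a time */
            acc |= (uint64_t)*in++ << fill;
            fill += 8;
        }
        out[i] = (uint32_t)(acc & mask);
        acc >>= bits;
        fill -= bits;
    }
}
```

Packing eight 3-bit values this way takes 3 bytes instead of the 32 bytes of the original `uint32_t` array. The SIMD variants in the table are listed with vertical or horizontal layouts, so their output would not be expected to match this sketch byte for byte.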