TurboPFor: Fastest Integer Compression [Build Status](https://travis-ci.org/powturbo/TurboPFor)
======================================
* **TurboPFor: The new synonym for "integer compression"**
* :new: (2019.7) All TurboPFor functions are now available for 64-bit ARMv8, including NEON SIMD.
* 100% C (C++ headers), as simple as memcpy
* :+1: **Java** Critical Natives/JNI. Access TurboPFor **incl. SIMD/AVX2!** from Java as fast as calling from C
* :sparkles: **FULL** range of 8/16/32/64-bit scalar and 16/32/64-bit SIMD functions
+ Unit test: test the functions for every bit size from 0 to 32

        ./icbench -m0 -M32 -eturbpfor -fu
        ./icbench -m0 -M8 -eturbopack -fs -n1M
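
The unit test sweeps every bit width from 0 to 32. The C sketch below illustrates what such a sweep checks, using a deliberately naive scalar bit packer instead of TurboPFor's optimized code: for each width `b` it packs 128 pseudo-random values masked to `b` bits, unpacks them and compares. `pack32`, `unpack32` and `roundtrip_all_widths` are hypothetical names, not TurboPFor functions, and the LSB-first bit order is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive scalar bit packing: writes n values of b bits each (0 <= b <= 32),
   least significant bit first, and returns the number of bytes written. */
static size_t pack32(const uint32_t *in, size_t n, unsigned b, unsigned char *out) {
    size_t nbytes = (n * b + 7) / 8;
    size_t bitpos = 0;
    memset(out, 0, nbytes);
    for (size_t i = 0; i < n; i++)
        for (unsigned k = 0; k < b; k++, bitpos++)
            if ((in[i] >> k) & 1u)
                out[bitpos / 8] |= (unsigned char)(1u << (bitpos % 8));
    return nbytes;
}

/* Inverse of pack32; returns the number of bytes read. */
static size_t unpack32(const unsigned char *in, size_t n, unsigned b, uint32_t *out) {
    size_t bitpos = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = 0;
        for (unsigned k = 0; k < b; k++, bitpos++)
            if ((in[bitpos / 8] >> (bitpos % 8)) & 1u)
                v |= (uint32_t)1u << k;
        out[i] = v;
    }
    return (n * b + 7) / 8;
}

/* Round-trips 128 pseudo-random values for every bit width 0..32.
   Returns the first failing width, or -1 if all widths pass. */
static int roundtrip_all_widths(void) {
    enum { N = 128 };
    uint32_t in[N], out[N];
    unsigned char buf[N * 4];
    uint32_t seed = 12345u;
    for (unsigned b = 0; b <= 32; b++) {
        uint32_t mask = (b == 32) ? 0xFFFFFFFFu : (((uint32_t)1u << b) - 1u);
        for (size_t i = 0; i < N; i++) {
            seed = seed * 1664525u + 1013904223u; /* simple LCG */
            in[i] = seed & mask;                  /* value fits in b bits */
        }
        size_t written = pack32(in, N, b, buf);
        size_t read = unpack32(buf, N, b, out);
        if (written != read || memcmp(in, out, sizeof in) != 0)
            return (int)b;
    }
    return -1;
}
```

Width 0 is included on purpose: it packs to zero bytes and must still decode to all zeros, which is an easy edge case to get wrong.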
##### - Data files:
    ./icapp file
    ./icapp -Fs file         # 16-bit raw binary file
    ./icapp -Fu file         # 32-bit raw binary file
    ./icapp -Fl file         # 64-bit raw binary file
    ./icapp -Ff file         # 32-bit raw floating-point binary file
    ./icapp -Fd file         # 64-bit raw floating-point binary file

- Text file: one entry per line. [Test data: ts.txt (sorted) and lat.txt (unsorted)](https://github.com/zhenjl/encoding/tree/master/benchmark/data)
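
To benchmark your own data with the `-F` options, the values must first be stored in a raw binary file. Assuming "raw" means a headerless array of native-endian integers (an assumption not confirmed here; check icapp's source before relying on it), a minimal C sketch that converts a text file with one unsigned entry per line into a file that could be passed to `./icapp -Fu` might look like this. `text_to_raw_u32` is a hypothetical helper, not part of TurboPFor.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads one unsigned decimal integer per line from txt and writes each value
   to raw as a native-endian uint32_t, with no header. Lines without a leading
   number (e.g. empty lines) are skipped; lines are assumed shorter than 64 chars.
   Returns the number of values written, or -1 on an I/O error or a value that
   does not fit in 32 bits. */
static long text_to_raw_u32(FILE *txt, FILE *raw) {
    char line[64];
    long count = 0;
    while (fgets(line, sizeof line, txt) != NULL) {
        char *end;
        errno = 0;
        unsigned long v = strtoul(line, &end, 10);
        if (end == line)
            continue;                          /* no digits on this line */
        if (errno == ERANGE || v > UINT32_MAX)
            return -1;                         /* out of 32-bit range */
        uint32_t u = (uint32_t)v;
        if (fwrite(&u, sizeof u, 1, raw) != 1)
            return -1;
        count++;
    }
    return ferror(txt) ? -1 : count;
}
```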

compressed_size : number of bytes read from compressed input buffer in<br />
### Function syntax:
- {vb | p4 | bit | vs}[n][d | d1 | f | fm | z ]{enc/dec | pack/unpack}[| 128V | 256V][8 | 16 | 32 | 64]:<br />
     vb: variable byte<br />
     p4: TurboPFor<br />
     vs: variable simple<br />
     bit: bit packing<br />
     n : high level array functions for large arrays<br />

     '' : encoding for unsorted integer lists<br />
     'd' : delta encoding for increasing integer lists (sorted with duplicates)<br />
     'd1': delta encoding for strictly increasing integer lists (sorted unique)<br />
     'f' : FOR encoding for sorted integer lists<br />
     'fm': FOR encoding for unsorted integer lists<br />
     'z' : ZigZag encoding for unsorted integer lists<br />

     'enc' or 'pack' : encode or bitpack<br />
     'dec' or 'unpack': decode or bitunpack<br />
     'NN' : integer size (8/16/32/64)<br />
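
The mode letters select a transform applied before the integers are coded. The scalar C sketch below shows what 'd', 'd1', 'f' and 'z' are commonly understood to compute; it is an illustration, not TurboPFor's implementation. Assumptions: the caller passes the previous value `start` (0 for a fresh list), 'f' uses the first element of a sorted list as its base, and 'z' applies zigzag to the differences between neighbours (TurboPFor's exact variant may differ). All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 'd': non-decreasing list (duplicates allowed), so every delta is >= 0. */
static void delta_enc32(const uint32_t *in, size_t n, uint32_t start, uint32_t *out) {
    uint32_t prev = start;
    for (size_t i = 0; i < n; i++) { out[i] = in[i] - prev; prev = in[i]; }
}
static void delta_dec32(const uint32_t *in, size_t n, uint32_t start, uint32_t *out) {
    uint32_t prev = start;
    for (size_t i = 0; i < n; i++) { prev += in[i]; out[i] = prev; }
}

/* 'd1': strictly increasing list, so every delta is >= 1 and delta - 1 is stored
   (expects in[0] > start; unsigned wrap-around still round-trips otherwise). */
static void delta1_enc32(const uint32_t *in, size_t n, uint32_t start, uint32_t *out) {
    uint32_t prev = start;
    for (size_t i = 0; i < n; i++) { out[i] = in[i] - prev - 1; prev = in[i]; }
}
static void delta1_dec32(const uint32_t *in, size_t n, uint32_t start, uint32_t *out) {
    uint32_t prev = start;
    for (size_t i = 0; i < n; i++) { prev += in[i] + 1; out[i] = prev; }
}

/* 'f': frame of reference. In a sorted list in[0] is the minimum, so the
   offsets in[i] - in[0] are small; the base is returned to the caller. */
static uint32_t for_enc32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t base = n ? in[0] : 0;
    for (size_t i = 0; i < n; i++) out[i] = in[i] - base;
    return base;
}
static void for_dec32(const uint32_t *in, size_t n, uint32_t base, uint32_t *out) {
    for (size_t i = 0; i < n; i++) out[i] = in[i] + base;
}

/* 'z': unsorted list. Neighbour differences may be negative, so each one is
   zigzag-mapped (0, -1, 1, -2, ... -> 0, 1, 2, 3, ...) to a small unsigned value. */
static void zzdelta_enc32(const uint32_t *in, size_t n, uint32_t start, uint32_t *out) {
    uint32_t prev = start;
    for (size_t i = 0; i < n; i++) {
        uint32_t d = in[i] - prev;             /* two's complement difference */
        out[i] = (d << 1) ^ (0u - (d >> 31));  /* zigzag */
        prev = in[i];
    }
}
static void zzdelta_dec32(const uint32_t *in, size_t n, uint32_t start, uint32_t *out) {
    uint32_t prev = start;
    for (size_t i = 0; i < n; i++) {
        prev += (in[i] >> 1) ^ (0u - (in[i] & 1u)); /* undo zigzag, add delta */
        out[i] = prev;
    }
}
```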
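
For `vb`, the textbook variable-byte scheme stores 7 payload bits per byte and uses the high bit as a continuation flag. TurboPFor's own variable byte format may use a different byte layout, so treat this C sketch only as an illustration of the idea. The decoder mirrors the convention described above and returns the number of bytes read from the compressed input `in`; having the encoder return the number of bytes written is an assumption. `vbenc32_sketch` and `vbdec32_sketch` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Textbook variable byte: 7 data bits per byte, high bit set while more
   bytes follow. Returns the number of bytes written to out (at most 5 per value). */
static size_t vbenc32_sketch(const uint32_t *in, size_t n, unsigned char *out) {
    unsigned char *op = out;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = in[i];
        while (v >= 0x80) {
            *op++ = (unsigned char)(v | 0x80); /* low 7 bits + continuation flag */
            v >>= 7;
        }
        *op++ = (unsigned char)v;              /* last byte: flag clear */
    }
    return (size_t)(op - out);
}

/* Decodes n values; returns the number of bytes read from in. */
static size_t vbdec32_sketch(const unsigned char *in, size_t n, uint32_t *out) {
    const unsigned char *ip = in;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = 0;
        unsigned shift = 0;
        unsigned char c;
        do {
            c = *ip++;
            v |= (uint32_t)(c & 0x7F) << shift;
            shift += 7;
        } while (c & 0x80);
        out[i] = v;
    }
    return (size_t)(ip - in);
}
```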
header files to use with documentation:<br />
|bitpack.h|Bit Packing, For, +Direct Access| bitpack256v32/bitunpack256v32 bitforenc64/bitfordec64|
|eliasfano.h|Elias Fano| efanoenc256v32/efanodec256v32 |
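
eliasfano.h provides Elias-Fano coding for sorted lists. The C sketch below is the textbook scalar scheme, shown only to explain the layout: each value is split into its low `l` bits, stored verbatim, and a high part, stored in unary as a set bit at position `high + i`. For `n` values below `u` this costs roughly 2 + log2(u/n) bits per value. TurboPFor's functions in the table above use their own (SIMD) block format; `ef_t` and the `ef_*` functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical textbook Elias-Fano container for a non-decreasing list. */
typedef struct {
    size_t n;          /* number of values */
    unsigned l;        /* low bits stored verbatim per value */
    size_t high_bits;  /* length of the unary high-part bit vector */
    uint8_t *low;      /* n * l bits, LSB-first */
    uint8_t *high;     /* bit (v >> l) + i is set for the i-th value v */
} ef_t;

static void ef_setbit(uint8_t *a, size_t pos) { a[pos >> 3] |= (uint8_t)(1u << (pos & 7)); }
static int ef_getbit(const uint8_t *a, size_t pos) { return (a[pos >> 3] >> (pos & 7)) & 1; }

/* Encodes in[0..n) (must be non-decreasing). Returns 0, or -1 if allocation fails. */
static int ef_encode(const uint32_t *in, size_t n, ef_t *ef) {
    uint32_t max = n ? in[n - 1] : 0;
    unsigned l = 0;
    /* l = floor(log2(u / n)) with u = max + 1, capped at 31 */
    while (n > 0 && l < 31 && ((uint64_t)n << (l + 1)) <= (uint64_t)max + 1)
        l++;
    ef->n = n;
    ef->l = l;
    ef->high_bits = n + (size_t)(max >> l) + 1;
    ef->low = calloc((n * l + 7) / 8 + 1, 1);
    ef->high = calloc((ef->high_bits + 7) / 8, 1);
    if (ef->low == NULL || ef->high == NULL) {
        free(ef->low);
        free(ef->high);
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        for (unsigned k = 0; k < l; k++)
            if ((in[i] >> k) & 1u)
                ef_setbit(ef->low, i * l + k);
        ef_setbit(ef->high, (size_t)(in[i] >> l) + i); /* unary-coded high part */
    }
    return 0;
}

/* Sequential decode: the i-th set bit, at position pos, has high part pos - i. */
static void ef_decode(const ef_t *ef, uint32_t *out) {
    size_t i = 0;
    for (size_t pos = 0; pos < ef->high_bits && i < ef->n; pos++) {
        if (!ef_getbit(ef->high, pos))
            continue;
        uint32_t lowv = 0;
        for (unsigned k = 0; k < ef->l; k++)
            if (ef_getbit(ef->low, i * ef->l + k))
                lowv |= (uint32_t)1u << k;
        out[i] = ((uint32_t)(pos - i) << ef->l) | lowv;
        i++;
    }
}

static void ef_free(ef_t *ef) {
    free(ef->low);
    free(ef->high);
}
```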
Note: Some low level functions (like p4enc32) are limited to 128/256 (SSE/AVX2) integers per call.
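
Because such a low-level call handles at most one block, arrays of arbitrary length have to be cut into blocks; the 'n' functions described above are the high-level variants for large arrays. The sketch below shows one generic way to drive a fixed-size block encoder over a large array. `BLOCK`, `block_enc_fn`, `encode_blocks` and the toy `copy_block_enc` are hypothetical stand-ins; a real block encoder such as p4enc32 has its own signature, which is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 128 /* assumed per-call limit of the low-level encoder (256 for AVX2) */

/* Hypothetical block encoder: encodes n <= BLOCK values, returns bytes written. */
typedef size_t (*block_enc_fn)(const uint32_t *in, size_t n, unsigned char *out);

/* Toy block "codec" used only to exercise the driver: copies the raw bytes. */
static size_t copy_block_enc(const uint32_t *in, size_t n, unsigned char *out) {
    memcpy(out, in, n * sizeof *in);
    return n * sizeof *in;
}

/* Splits in[0..n) into BLOCK-sized chunks (the last one may be shorter),
   encodes each chunk with enc and appends it to out.
   Returns the total number of bytes written. */
static size_t encode_blocks(const uint32_t *in, size_t n, unsigned char *out, block_enc_fn enc) {
    unsigned char *op = out;
    for (size_t i = 0; i < n; i += BLOCK) {
        size_t len = (n - i < BLOCK) ? n - i : BLOCK;
        op += enc(in + i, len, op);
    }
    return (size_t)(op - out);
}
```

A matching decoder would walk the blocks in the same order. Since each block's length follows from `n` and `BLOCK`, the caller has to keep `n` next to the compressed bytes, and the block decoder must report how many bytes it consumed so the next block can be located.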
### Environment:
###### OS/Compiler (64 bits):
- Linux: GNU GCC (>=4.6)
- clang (>=3.2)
- Windows: MinGW-w64 (no parallel query processing demo app)
- Visual C++ (VS2008-VS2017)
- Linux aarch64 (64-bit ARM CPUs): GNU GCC

###### Multithreading:
- All TurboPFor integer compression functions are thread-safe
* [Small Polygon Compression](https://arxiv.org/abs/1509.05505) + [Poster](http://abhinavjauhri.me/publications/dcc_poster_2016.pdf) + [code](https://github.com/ajauhri/bignum_compression)
* [Parallel Graph Analysis (Lecture 18)](http://www.cs.rpi.edu/~slotag/classes/FA16/) + [code](http://www.cs.rpi.edu/~slotag/classes/FA16/handson/lec18-comp2.cpp)

Last update: 15 Jul 2019

## APPENDIX: icbench Integer Compression Benchmark

##### TurboPFor + external libraries
<pre>
TurboPFor              https://github.com/powturbo/TurboPFor
FastPFor (FP)          https://github.com/lemire/FastPFor
lz4                    https://github.com/Cyan4973/lz4
LittleIntPacker (LI)   https://github.com/lemire/LittleIntPacker
MaskedVbyte            http://maskedvbyte.org
Polycom (PC)           https://github.com/encode84/bcm
simdcomp (SC)          https://github.com/lemire/simdcomp
Simple-8b optimized    https://github.com/powturbo/TurboPFor
Streamvbyte            https://github.com/lemire/streamvbyte
VarintG8IU             https://github.com/lemire/FastPFor
</pre>

##### Functions integrated into 'icbench' for benchmarking
<pre>
Codec group:
TURBOPFOR   TurboPFor library               TurboPFor256V/TurboPack256V/TurboPFor256N/TurboPFor/TurboPackV/TurboVByte/TurboPack/TurboForDA/EliasFano/VSimple/TurboPForN/TurboPackN/TurboPForDI
DEFAULT     Default                         TurboPFor/TurboPackV/TurboVByte/TurboPack/TurboFor/TurboPForN/TurboPackN/TurboPForDI/TurboPFor256V/TurboPack256V/TurboPFor256N
BENCH       Benchmark                       TurboPFor/TurboPackV/TurboVByte/TurboPack/QMX/FP.SimdFastPfor/FP.SimdOptPFor/MaskedVbyte/StreamVbyte
EFFICIENT   Efficient                       TurboPFor/vsimple/turbovbyte
TRANSFORM   transpose/shuffle,delta,zigzag  tpbyte4s/tpbyte,4/tpnibble,4/ZigZag_32/Delta_32/BitShuffle,4
BITPACK     Bit Packing                     TurboPack256V/TurboPackV/TurboPackH/TurboPack/SC.SimdPack128/SC.SimdPack256
VBYTE       Variable byte                   TurboVByte/FP.VByte/PC.Vbyte/VarintG8IU/MaskedVbyte/StreamVbyte
SIMPLE      Simple Family                   simple8b/simple16/vsimple/qmx
LZ4         lz4+bitshuffle/transpose 4,8    lz4_bitshufle/lz4_tp4/lz4_tp8
LI          Little Integer                  LI_Pack/LI_TurboPack/LI_SuperPack/LI_HorPack

Function         Description                                 level
--------         -----------                                 -----
TurboPFor        PFor (SSE2)
TurboPForN       PFor (SSE2) large blocks
TurboPFor256     PFor (AVX2)
TurboPFor256N    PFor (AVX2) large blocks
TurboPForDA      PFor direct access
TurboPForDI      PFord min
TurboPForZZ      PFor zigzag of delta
TurboFor         FOR
TurboForV        FOR (SIMD)
TurboFor256V     FOR (AVX2)
TurboForDA       FOR direct access
TurboPackDA      Bit packing direct access
TurboPack        Bit packing (scalar)
TurboPackN       Bit packing (scalar) large blocks
TurboPackV       Bit packing (SSE2 Vertical)
TurboPackH       Bit packing (SSE2 Horizontal)
TurboPackVN      Bit packing (SSE2 large block)
TurboPack256V    Bit packing (AVX2 Vertical)
TurboPack256N    Bit packing (AVX2 large block)
TurboVByte       Variable byte (scalar)
VSimple          Variable simple (scalar)
EliasFano        Elias Fano (scalar)
EliasFanoV       Elias Fano (SSE2)
EliasFano256V    Elias Fano (AVX2)
memcpy           memcpy
copy             Integer copy
tpbyte4s         Byte transpose (scalar)
tpbyte           Byte transpose (SIMD)                       2,4,8
tpnibble         Nibble transpose (SIMD)                     2,4,8
ZigZag32         ZigZag encoding (SSE2)
Delta32          Delta encoding (SSE2)
DDelta32         Delta of delta encoding (SSE2)
Xor32            Xor encoding (SSE2)
FP_PREV64        Floating point PFOR
FP_FCM64         Floating point PFOR (FCM)
FP_DFCM64        Floating point PFOR (DFCM)
TurboPFor64      PFOR 64
TurboPFor64V     PFOR 64
Simple8b         64 bits Simple family (unstable)
PC_Simple16      Simple16, limited to 28 bits
PC_OptPFD        OptPFD, limited to 28 bits
PC_Vbyte         Variable byte
PC_Rice          Rice coding (unstable)
VarintG8IU       Variable byte SIMD
MaskedVbyte      Variable byte SIMD
StreamVbyte      Variable byte SIMD
FP_FastPFor      PFor scalar (inefficient for small blocks)
FP_SimdFastPFor  PFor SIMD (inefficient for small blocks)
FP_OptPFor       OptPFor scalar
FP_SIMDOptPFor   OptPFor SIMD
FP_VByte         Variable byte
FP_Simple8bRLE   Simple-8b + RLE
FP_GROUPSIMPLE   Group Simple
SC_SIMDPack128   Bit packing (SSE4.1)
SC_SIMDPack256   Bit packing (SSE4.1)
SC_For           For (SSE4.1)
SC_ForDA         For direct access (SSE4.1)
LibFor_For       For
LibFor_ForDA     For direct access
LI_Pack          Bit packing (scalar)
LI_TurboPack     Bit packing (scalar)
LI_SuperPack     Bit packing (scalar)
LI_HorPack       Bit packing (SSE4.1 horizontal)
LI_BMIPack256    Bit packing (AVX2)
lz4              lz4
lz4_bit          Bitshuffle + [delta]+lz4                    2,4,8
lz4_nibble       TurboPFor's [delta]+nibble transpose + lz4  2,4,8
lz4_bitxor       Bitshuffle + [xor]+lz4                      2,4,8
lz4_nibblexor    TurboPFor's [xor]+nibble transpose + lz4    2,4,8
lz4_byte         TurboPFor's [delta]+byte transpose + lz4    2,4,8
BitShuffle       Bit shuffle (SIMD)                          2,4,8
</pre>
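
Several entries above (tpbyte4s, tpbyte, lz4_byte) rely on byte transposition. Below is a scalar C sketch for 4-byte elements, assuming the conventional definition in which byte `k` of element `i` moves to position `k*n + i`, so all first bytes come first, then all second bytes, and so on. TurboPFor's SIMD versions may order data differently; `tpbyte4_enc` and `tpbyte4_dec` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte transpose for 4-byte elements: byte k of element i goes to
   out[k * n + i]. n is the number of elements; in and out hold 4*n bytes. */
static void tpbyte4_enc(const unsigned char *in, size_t n, unsigned char *out) {
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < 4; k++)
            out[k * n + i] = in[i * 4 + k];
}

/* Inverse: gathers the 4 byte planes back into contiguous elements. */
static void tpbyte4_dec(const unsigned char *in, size_t n, unsigned char *out) {
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < 4; k++)
            out[i * 4 + k] = in[k * n + i];
}
```

For example, on a little-endian machine an array of 32-bit values below 2^16 yields two all-zero byte planes at the end, which a byte-oriented compressor such as lz4 handles very well.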
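
DDelta32 and Xor32 are listed above as delta-of-delta and XOR encodings. Under their textbook definitions (difference of consecutive differences, and XOR with the previous value), scalar C versions could look like the sketch below; this is not TurboPFor's SSE2 code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Delta of delta: stores (v[i] - v[i-1]) - (v[i-1] - v[i-2]), with the values
   before the start taken as 0. Regularly spaced values (e.g. timestamps)
   turn into runs of zeros. Arithmetic wraps modulo 2^32 and decodes exactly. */
static void ddelta_enc32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0, prevd = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t d = in[i] - prev;
        out[i] = d - prevd;
        prevd = d;
        prev = in[i];
    }
}
static void ddelta_dec32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0, prevd = 0;
    for (size_t i = 0; i < n; i++) {
        prevd += in[i]; /* rebuild the first difference */
        prev += prevd;  /* rebuild the value */
        out[i] = prev;
    }
}

/* Xor with the previous value: repeated or slowly changing bit patterns
   (common in floating point series) leave many zero bits. */
static void xor_enc32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) { out[i] = in[i] ^ prev; prev = in[i]; }
}
static void xor_dec32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) { prev ^= in[i]; out[i] = prev; }
}
```

A small negative second difference wraps to a large unsigned value here; combining the output with a zigzag mapping, as in the 'z' mode, keeps such values small before bit packing.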