diff --git a/README.md b/README.md
index fa987f2..34073b6 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,7 @@
 TurboPFor: Fastest Integer Compression [![Build Status](https://travis-ci.org/powturbo/TurboPFor.svg?branch=master)](https://travis-ci.org/powturbo/TurboPFor)
 ======================================
 * **TurboPFor: The new synonym for "integer compression"**
+ * :new: (2019.7) all TurboPFor functions now available under 64 bits ARMv8 including NEON SIMD.
  * 100% C (C++ headers), as simple as memcpy
  * :+1: **Java** Critical Natives/JNI. Access TurboPFor **incl. SIMD/AVX2!** from Java as fast as calling from C
  * :sparkles: **FULL** range 8/16/32/64 bits scalar + 16/32/64 bits SIMD functions
@@ -267,7 +268,7 @@ using [900.000 multicore servers](https://www.cloudyn.com/blog/10-facts-didnt-kn
  + Unit test: test function from bit size 0 to 32
- ./icbench -m0 -M32 -eturbpfor
+ ./icbench -m0 -M32 -eturbpfor -fu
  ./icbench -m0 -M8 -eturbopack -fs -n1M
 ##### - Data files:
@@ -275,11 +276,11 @@ using [900.000 multicore servers](https://www.cloudyn.com/blog/10-facts-didnt-kn
  ./icbench file
  ./icapp file
- ./icapp -Fs file "16 bits binary file
- ./icapp -Fu file "32 bits binary file
- ./icapp -Fl file "64 bits binary file
- ./icapp -Ff file "32 bits floating point binary file
- ./icapp -Fd file "64 bits floating point binary file
+ ./icapp -Fs file "16 bits raw binary file
+ ./icapp -Fu file "32 bits raw binary file
+ ./icapp -Fl file "64 bits raw binary file
+ ./icapp -Ff file "32 bits raw floating point binary file
+ ./icapp -Fd file "64 bits raw floating point binary file
  - Text file: 1 entry per line. [Test data: ts.txt(sorted) and lat.txt(unsorted)](https://github.com/zhenjl/encoding/tree/master/benchmark/data))
@@ -385,21 +386,22 @@ In general encoding/decoding functions are of the form:
  compressed_size : number of bytes read from compressed input buffer in
 ### Function syntax:
- - {vb | p4 | bit | vs}[d | d1 | f | fm | z ]{enc/dec | pack/unpack}[| 128V | 256V][8 | 16 | 32 | 64]:
+ - {vb | p4 | bit | vs}[n][d | d1 | f | fm | z ]{enc/dec | pack/unpack}[| 128V | 256V][8 | 16 | 32 | 64]:
vb: variable byte
p4: turbopfor
vs: variable simple
bit: bit packing
-
- d: delta encoding for increasing integer lists (sorted w/ duplicate)
- d1: delta encoding for strictly increasing integer lists (sorted unique)
- f : FOR encoding for sorted integer lists
- fm: FOR encoding for unsorted integer lists
- z: ZigZag encoding for unsorted integer lists
+ n : high level array functions for large arrays.
+
+ '' : encoding for unsorted integer lists
+ 'd' : delta encoding for increasing integer lists (sorted w/ duplicate)
+ 'd1': delta encoding for strictly increasing integer lists (sorted unique)
+ 'f' : FOR encoding for sorted integer lists
+ 'z' : ZigZag encoding for unsorted integer lists
- enc/pack: encode
- dec/unpack:decode
- XX : integer size (8/16/32/64)
+ 'enc' or 'pack' : encode or bitpack
+ 'dec' or 'unpack': decode or bitunpack
+ 'NN' : integer size (8/16/32/64)
header files to use with documentation:
@@ -411,12 +413,15 @@ header files to use with documentation:
 |bitpack.h|Bit Packing, For, +Direct Access| bitpack256v32/bitunpack256v32 bitforenc64/bitfordec64|
 |eliasfano.h|Elias Fano| efanoenc256v32/efanoc256v32 |
+Note: Some low level functions (like p4enc32) are limited to 128/256 (SSE/AVX2) integers per call.
+
 ### Environment:
 ###### OS/Compiler (64 bits):
 - Linux: GNU GCC (>=4.6)
 - clang (>=3.2)
 - Windows: MinGW-w64 (no parallel query processing demo app)
 - Visual c++ (VS2008-VS2017)
+- Linux aarch64 for 64 bits ARM CPU : gcc
 ###### Multithreading:
 - All TurboPFor integer compression functions are thread safe
@@ -451,5 +456,112 @@ header files to use with documentation:
 * [Small Polygon Compression](https://arxiv.org/abs/1509.05505) + [Poster](http://abhinavjauhri.me/publications/dcc_poster_2016.pdf) + [code](https://github.com/ajauhri/bignum_compression)
 * [Parallel Graph Analysis (Lecture 18)](http://www.cs.rpi.edu/~slotag/classes/FA16/) + [code](http://www.cs.rpi.edu/~slotag/classes/FA16/handson/lec18-comp2.cpp)
-Last update: 09 Nov 2018
+Last update: 15 Jul 2019
+
+## APPENDIX: icbench Integer Compression Benchmark
+
+##### TurboPFor + external libraries
+
+TurboPFor               	https://github.com/powturbo/TurboPFor
+FastPFor (FP)              	https://github.com/lemire/FastPFor
+lz4                     	https://github.com/Cyan4973/lz4
+LittleIntPacker (LI)       	https://github.com/lemire/LittleIntPacker
+MaskedVbyte             	http://maskedvbyte.org
+Polycom (PC)               	https://github.com/encode84/bcm
+simdcomp (SC)              	https://github.com/lemire/simdcomp
+Simple-8b optimized     	https://github.com/powturbo/TurboPFor
+Streamvbyte             	https://github.com/lemire/streamvbyte
+VarintG8IU              	https://github.com/lemire/FastPFor
+
+
+##### Functions integrated into 'icbench' for benchmarking
+
+Codec group:
+TURBOPFOR        TurboPFor library TurboPFor256V/TurboPack256V/TurboPFor256N/TurboPFor/TurboPackV/TurboVByte/TurboPack/TurboForDA/EliasFano/VSimple/TurboPForN/TurboPackN/TurboPForDI
+DEFAULT          Default TurboPFor/TurboPackV/TurboVByte/TurboPack/TurboFor/TurboPForN/TurboPackN/TurboPForDI/TurboPFor256V/TurboPack256V/TurboPFor256N
+BENCH            Benchmark TurboPFor/TurboPackV/TurboVByte/TurboPack/QMX/FP.SimdFastPfor/FP.SimdOptPFor/MaskedVbyte/StreamVbyte
+EFFICIENT        Efficient TurboPFor/vsimple/turbovbyte
+TRANSFORM        transpose/shuffle,delta,zigzag tpbyte4s/tpbyte,4/tpnibble,4/ZigZag_32/Delta_32/BitShuffle,4
+BITPACK          Bit Packing TurboPack256V/TurboPackV/TurboPackH/TurboPack/SC.SimdPack128/SC.SimdPack256
+VBYTE            Variable byte TurboVByte/FP.VByte/PC.Vbyte/VarintG8IU/MaskedVbyte/StreamVbyte
+SIMPLE           Simple Family simple8b/simple16/vsimple/qmx
+LZ4              lz4+bitshuffle/transpose 4,8 lz4_bitshufle/lz4_tp4/lz4_tp8
+LI               Little Integer LI_Pack/LI_TurboPack/LI_SuperPack/LI_HorPack
+
+
+Function         Description                                      level
+
+--------         -----------                                      -----
+TurboPFor        PFor (SSE2)
+TurboPForN       PFor (SSE2) large blocks
+TurboPFor256     PFor (AVX2)
+TurboPFor256N    PFor (AVX2) large blocks
+TurboPForDA      PFor direct access
+TurboPForDI      PFord min                                        
+TurboPForZZ      PFor zigzag of delta                             
+TurboFor         FOR                                              
+TurboForV        FOR (SIMD)                                       
+TurboFor256V     FOR (AVX2)                                       
+TurboForDA       FOR direct access                                
+TurboPackDA      Bit packing direct access                        
+TurboPack        Bit packing (scalar)                             
+TurboPackN       Bit packing (scalar) large blocks                
+TurboPackV       Bit packing (SSE2 Vertical)                      
+TurboPackH       Bit packing (SSE2 Horizontal)                    
+TurboPackVN      Bit packing (SSE2 large block)                   
+TurboPack256V    Bit packing (AVX2 Vertical)                      
+TurboPack256N    Bit packing (AVX2 large block)                   
+TurboVByte       Variable byte (scalar)                           
+VSimple          Variable simple (scalar)                         
+EliasFano        Elias fano (scalar)                              
+EliasFanoV       Elias fano (SSE2)                                
+EliasFano256V    Elias fano (AVX2)                                
+memcpy           memcpy                                           
+copy             Integer copy                                     
+tpbyte4s         Byte Transpose (scalar)                          
+tpbyte           Byte transpose (simd)                            2,4,8
+tpnibble         Nibble transpose (simd)                          2,4,8
+ZigZag32         ZigZag encoding (sse2)                           
+Delta32          Delta encoding (sse2)                            
+DDelta32         Delta of delta encoding (sse2)                   
+Xor32            Xor encoding (sse2)                              
+FP_PREV64        Floating point PFOR                              
+FP_FCM64         Floating point PFOR (FCM)                        
+FP_DFCM64        Floating point PFOR (DFCM)                       
+TurboPFor64      PFOR 64                                          
+TurboPFor64V     PFOR 64                                          
+Simple8b         64 bits Simple family (unstable)                 
+PC_Simple16      Simple 16. limited to 28 bits                    
+PC_OptPFD        OptPFD. limited to 28 bits                       
+PC_Vbyte         Variable byte                                    
+PC_Rice          Rice coding (unstable)                           
+VarintG8IU       Variable byte SIMD                               
+MaskedVbyte      Variable byte SIMD                               
+StreamVbyte      Variable byte SIMD                               
+FP_FastPFor      PFor scalar (inefficient for small blocks)       
+FP_SimdFastPFor  PFor SIMD (inefficient for small blocks)         
+FP_OptPFor       OptPFor scalar                                   
+FP_SIMDOptPFor   OptPFor SIMD                                     
+FP_VByte         Variable byte                                    
+FP_Simple8bRLE   Simple-8b + rle                                  
+FP_GROUPSIMPLE   Group Simple                                     
+SC_SIMDPack128   Bit packing (SSE4.1)                             
+SC_SIMDPack256   Bit packing (SSE4.1)                             
+SC_For           For (SSE4.1)                                     
+SC_ForDA         For direct access (SSE4.1)                       
+LibFor_For       For                                              
+LibFor_ForDA     For direct access                                
+LI_Pack          Bit packing (scalar)                             
+LI_TurboPack     Bit packing (scalar)                             
+LI_SuperPack     Bit packing (scalar)                             
+LI_HorPack       Bit packing (sse4.1 horizontal)                  
+LI_BMIPack256    Bit packing (avx2)                               
+lz4              lz4                                              
+lz4_bit          Bitshuffle + [delta]+lz4                         2,4,8
+lz4_nibble       TurboPFor's [delta]+nibble transpose + lz4       2,4,8
+lz4_bitxor       Bitshuffle + [xor]+lz4                           2,4,8
+lz4_nibblexor    TurboPFor's [xor]+nibble transpose + lz4         2,4,8
+lz4_byte         TurboPFor's [delta]+byte transpose + lz4         2,4,8
+BitShuffle       Bit shuffle (simd)                               2,4,8
+
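
The 'd', 'd1' and 'z' prefixes in the "Function syntax" hunk name integer transforms that run before packing. The sketch below is a plain scalar illustration of those transforms, not TurboPFor's code: all function names, the `step` parameter and the start value of 0 are assumptions made for this example. It treats 'z' as the textbook ZigZag mapping of a single signed value; the library may combine it with a delta step.

```c
#include <stddef.h>
#include <stdint.h>

/* Delta transform (hypothetical helper, not a TurboPFor function).
   step = 0 models the 'd' variant (sorted, duplicates allowed);
   step = 1 models 'd1' (strictly increasing), which stores gap - 1.
   Arithmetic is modulo 2^32, so decoding is always exact. */
static void delta_enc32(const uint32_t *in, size_t n, uint32_t *out, uint32_t step) {
    uint32_t prev = 0;                      /* assumed start value */
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] - prev - step;
        prev = in[i];
    }
}

/* Inverse of delta_enc32 for the same step. */
static void delta_dec32(const uint32_t *in, size_t n, uint32_t *out, uint32_t step) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev += in[i] + step;
        out[i] = prev;
    }
}

/* ZigZag: map signed values to unsigned so that small magnitudes get
   small codes: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... */
static uint32_t zigzag_enc32(int32_t x) {
    uint32_t u = (uint32_t)x;
    return (u << 1) ^ (0u - (u >> 31));
}

static int32_t zigzag_dec32(uint32_t u) {
    return (int32_t)((u >> 1) ^ (0u - (u & 1)));
}

/* Round-trip self-check; returns 1 on success. */
static int transforms_selfcheck(void) {
    const uint32_t sorted[4] = {3, 4, 7, 10};
    uint32_t gaps[4], back[4];
    delta_enc32(sorted, 4, gaps, 1);        /* gaps = {2, 0, 2, 2} */
    delta_dec32(gaps, 4, back, 1);
    for (size_t i = 0; i < 4; i++)
        if (back[i] != sorted[i]) return 0;
    return zigzag_dec32(zigzag_enc32(-12345)) == -12345;
}
```

After either transform the values are small non-negative integers, which is what lets the bit packing or PFor stage store them in fewer bits.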