diff --git a/README.md b/README.md
index fa987f2..34073b6 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,7 @@
 TurboPFor: Fastest Integer Compression [![Build Status](https://travis-ci.org/powturbo/TurboPFor.svg?branch=master)](https://travis-ci.org/powturbo/TurboPFor)
 ======================================
 * **TurboPFor: The new synonym for "integer compression"**
+  * :new: (2019.7) all TurboPFor functions are now available on 64-bit ARMv8, including NEON SIMD.
   * 100% C (C++ headers), as simple as memcpy
   * :+1: **Java** Critical Natives/JNI. Access TurboPFor **incl. SIMD/AVX2!** from Java as fast as calling from C
   * :sparkles: **FULL** range 8/16/32/64 bits scalar + 16/32/64 bits SIMD functions
@@ -267,7 +268,7 @@ using [900.000 multicore servers](https://www.cloudyn.com/blog/10-facts-didnt-kn
 		
   + Unit test: test function from bit size 0 to 32
   
-        ./icbench -m0 -M32 -eturbpfor 
+        ./icbench -m0 -M32 -eturbopfor -fu
         ./icbench -m0 -M8 -eturbopack -fs -n1M 
 
 ##### - Data files:
@@ -275,11 +276,11 @@ using [900.000 multicore servers](https://www.cloudyn.com/blog/10-facts-didnt-kn
 
         ./icbench file
         ./icapp file           
-        ./icapp -Fs file         "16 bits binary file
-        ./icapp -Fu file         "32 bits binary file
-        ./icapp -Fl file         "64 bits binary file
-        ./icapp -Ff file         "32 bits floating point binary file
-        ./icapp -Fd file         "64 bits floating point binary file
+        ./icapp -Fs file         "raw binary file of 16-bit integers
+        ./icapp -Fu file         "raw binary file of 32-bit integers
+        ./icapp -Fl file         "raw binary file of 64-bit integers
+        ./icapp -Ff file         "raw binary file of 32-bit floating-point values
+        ./icapp -Fd file         "raw binary file of 64-bit floating-point values
 
   - Text file: 1 entry per line. [Test data: ts.txt(sorted) and lat.txt(unsorted)](https://github.com/zhenjl/encoding/tree/master/benchmark/data))
 
@@ -385,21 +386,22 @@ In general encoding/decoding functions are of the form:
   compressed_size : number of bytes read from compressed input buffer in<br />
 
 ### Function syntax:
- - {vb | p4 | bit | vs}[d | d1 | f | fm | z ]{enc/dec | pack/unpack}[| 128V | 256V][8 | 16 | 32 | 64]:<br />
+ - {vb | p4 | bit | vs}[n][d | d1 | f | fm | z]{enc/dec | pack/unpack}[| 128V | 256V][8 | 16 | 32 | 64]:<br />
    vb:  variable byte<br />
    p4:  turbopfor<br />
    vs:  variable simple<br />
    bit: bit packing<br />
-		
-   d:  delta encoding for increasing integer lists (sorted w/ duplicate)<br />
-   d1: delta encoding for strictly increasing integer lists (sorted unique)<br />
-   f : FOR encoding for sorted integer lists<br />
-   fm: FOR encoding for unsorted integer lists<br />
-   z:  ZigZag encoding for unsorted integer lists<br />
+   'n' : high-level array functions for large arrays<br />
+
+   ''  : (none) encoding for unsorted integer lists<br />
+   'd' : delta encoding for increasing integer lists (sorted w/ duplicate)<br />
+   'd1': delta encoding for strictly increasing integer lists (sorted unique)<br />
+   'f' : FOR encoding for sorted integer lists ('fm': FOR encoding for unsorted integer lists)<br />
+   'z' : ZigZag encoding for unsorted integer lists<br />
    
-   enc/pack:  encode<br />
-   dec/unpack:decode<br />
-   XX : integer size (8/16/32/64)<br />
+   'enc' or 'pack'  : encode or bitpack<br />
+   'dec' or 'unpack': decode or bitunpack<br />
+   'NN'             : integer size (8/16/32/64)<br />
    
 header files to use with documentation:<br />
 
@@ -411,12 +413,15 @@ header files to use with documentation:<br />
 |bitpack.h|Bit Packing, For, +Direct Access| bitpack256v32/bitunpack256v32 bitforenc64/bitfordec64|
 |eliasfano.h|Elias Fano| efanoenc256v32/efanoc256v32 |
 
+Note: some low-level functions (e.g. p4enc32) are limited to 128 (SSE) or 256 (AVX2) integers per call; use the 'n' functions (e.g. p4nenc32) for larger arrays.
+
 ### Environment:
 ###### OS/Compiler (64 bits):
 - Linux: GNU GCC (>=4.6)
 - clang (>=3.2)
 - Windows: MinGW-w64 (no parallel query processing demo app)
 - Visual c++ (VS2008-VS2017)
+- Linux aarch64 (64-bit ARM CPUs): GNU GCC
 
 ###### Multithreading:
 - All TurboPFor integer compression functions are thread safe
@@ -451,5 +456,112 @@ header files to use with documentation:<br />
   * [Small Polygon Compression](https://arxiv.org/abs/1509.05505) + [Poster](http://abhinavjauhri.me/publications/dcc_poster_2016.pdf) + [code](https://github.com/ajauhri/bignum_compression)
   * [Parallel Graph Analysis (Lecture 18)](http://www.cs.rpi.edu/~slotag/classes/FA16/) + [code](http://www.cs.rpi.edu/~slotag/classes/FA16/handson/lec18-comp2.cpp)
 
-Last update:  09 Nov 2018
+Last update:  15 Jul 2019
+
+## APPENDIX: icbench Integer Compression Benchmark
+
+##### TurboPFor + external libraries
+<pre>
+TurboPFor               	https://github.com/powturbo/TurboPFor
+FastPFor (FP)              	https://github.com/lemire/FastPFor
+lz4                     	https://github.com/Cyan4973/lz4
+LittleIntPacker (LI)       	https://github.com/lemire/LittleIntPacker
+MaskedVbyte             	http://maskedvbyte.org
+Polycom (PC)               	https://github.com/encode84/bcm
+simdcomp (SC)              	https://github.com/lemire/simdcomp
+Simple-8b optimized     	https://github.com/powturbo/TurboPFor
+Streamvbyte             	https://github.com/lemire/streamvbyte
+VarintG8IU              	https://github.com/lemire/FastPFor
+</pre>
+
+##### Functions integrated into 'icbench' for benchmarking
+<pre>
+Codec group:
+TURBOPFOR        TurboPFor library TurboPFor256V/TurboPack256V/TurboPFor256N/TurboPFor/TurboPackV/TurboVByte/TurboPack/TurboForDA/EliasFano/VSimple/TurboPForN/TurboPackN/TurboPForDI
+DEFAULT          Default TurboPFor/TurboPackV/TurboVByte/TurboPack/TurboFor/TurboPForN/TurboPackN/TurboPForDI/TurboPFor256V/TurboPack256V/TurboPFor256N
+BENCH            Benchmark TurboPFor/TurboPackV/TurboVByte/TurboPack/QMX/FP.SimdFastPfor/FP.SimdOptPFor/MaskedVbyte/StreamVbyte
+EFFICIENT        Efficient TurboPFor/vsimple/turbovbyte
+TRANSFORM        transpose/shuffle,delta,zigzag tpbyte4s/tpbyte,4/tpnibble,4/ZigZag_32/Delta_32/BitShuffle,4
+BITPACK          Bit Packing TurboPack256V/TurboPackV/TurboPackH/TurboPack/SC.SimdPack128/SC.SimdPack256
+VBYTE            Variable byte TurboVByte/FP.VByte/PC.Vbyte/VarintG8IU/MaskedVbyte/StreamVbyte
+SIMPLE           Simple family simple8b/simple16/vsimple/qmx
+LZ4              lz4+bitshuffle/transpose 4,8 lz4_bitshufle/lz4_tp4/lz4_tp8
+LI               Little Integer LI_Pack/LI_TurboPack/LI_SuperPack/LI_HorPack
+
+Functions:
+
+Function         Description                                      level
+--------         -----------                                      -----
+TurboPFor        PFor (SSE2)
+TurboPForN       PFor (SSE2) large blocks
+TurboPFor256V    PFor (AVX2)
+TurboPFor256N    PFor (AVX2) large blocks
+TurboPForDA      PFor direct access
+TurboPForDI      PFord min                                        
+TurboPForZZ      PFor zigzag of delta                             
+TurboFor         FOR                                              
+TurboForV        FOR (SIMD)                                       
+TurboFor256V     FOR (AVX2)                                       
+TurboForDA       FOR direct access                                
+TurboPackDA      Bit packing direct access                        
+TurboPack        Bit packing (scalar)                             
+TurboPackN       Bit packing (scalar) large blocks                
+TurboPackV       Bit packing (SSE2 vertical)
+TurboPackH       Bit packing (SSE2 horizontal)
+TurboPackVN      Bit packing (SSE2) large blocks
+TurboPack256V    Bit packing (AVX2 vertical)
+TurboPack256N    Bit packing (AVX2) large blocks
+TurboVByte       Variable byte (scalar)                           
+VSimple          Variable simple (scalar)                         
+EliasFano        Elias-Fano (scalar)
+EliasFanoV       Elias-Fano (SSE2)
+EliasFano256V    Elias-Fano (AVX2)
+memcpy           memcpy                                           
+copy             Integer copy                                     
+tpbyte4s         Byte transpose (scalar)
+tpbyte           Byte transpose (SIMD)                            2,4,8
+tpnibble         Nibble transpose (SIMD)                          2,4,8
+ZigZag32         ZigZag encoding (SSE2)
+Delta32          Delta encoding (SSE2)
+DDelta32         Delta of delta encoding (SSE2)
+Xor32            Xor encoding (SSE2)
+FP_PREV64        Floating-point PFor
+FP_FCM64         Floating-point PFor (FCM)
+FP_DFCM64        Floating-point PFor (DFCM)
+TurboPFor64      PFor 64-bit
+TurboPFor64V     PFor 64-bit (SIMD)
+Simple8b         64-bit Simple family (unstable)
+PC_Simple16      Simple-16, limited to 28 bits
+PC_OptPFD        OptPFD, limited to 28 bits
+PC_Vbyte         Variable byte                                    
+PC_Rice          Rice coding (unstable)
+VarintG8IU       Variable byte SIMD                               
+MaskedVbyte      Variable byte SIMD                               
+StreamVbyte      Variable byte SIMD                               
+FP_FastPFor      PFor scalar (inefficient for small blocks)       
+FP_SimdFastPFor  PFor SIMD (inefficient for small blocks)         
+FP_OptPFor       OptPFor scalar                                   
+FP_SIMDOptPFor   OptPFor SIMD                                     
+FP_VByte         Variable byte                                    
+FP_Simple8bRLE   Simple-8b + RLE
+FP_GROUPSIMPLE   Group Simple                                     
+SC_SIMDPack128   Bit packing (SSE4.1)                             
+SC_SIMDPack256   Bit packing (AVX2)
+SC_For           FOR (SSE4.1)
+SC_ForDA         FOR direct access (SSE4.1)
+LibFor_For       FOR
+LibFor_ForDA     FOR direct access
+LI_Pack          Bit packing (scalar)                             
+LI_TurboPack     Bit packing (scalar)                             
+LI_SuperPack     Bit packing (scalar)                             
+LI_HorPack       Bit packing (SSE4.1 horizontal)
+LI_BMIPack256    Bit packing (AVX2)
+lz4              lz4                                              
+lz4_bit          Bitshuffle + [delta]+lz4                         2,4,8
+lz4_nibble       TurboPFor's [delta]+nibble transpose + lz4       2,4,8
+lz4_bitxor       Bitshuffle + [xor]+lz4                           2,4,8
+lz4_nibblexor    TurboPFor's [xor]+nibble transpose + lz4         2,4,8
+lz4_byte         TurboPFor's [delta]+byte transpose + lz4         2,4,8
+BitShuffle       Bit shuffle (SIMD)                               2,4,8
+</pre>