From 5e7ac8c70ae243800da9f384359b1bda2c45be68 Mon Sep 17 00:00:00 2001
From: powturbo
Date: Tue, 23 Jun 2015 19:49:35 +0200
Subject: [PATCH] =?UTF-8?q?:new:=20Java+64=20bits=20lists=20for=20BitPacki?=
 =?UTF-8?q?ng,=20VSimple,=20VByte,=20Elias=20Fano,=E2=80=A6?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 README.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index eddd003..a4be1af 100644
--- a/README.md
+++ b/README.md
@@ -71,7 +71,7 @@ MI/s: 1.000.000 integers/second. 1000 MI/s = 4 GB/s
 TurboPForDA,TurboForDA: Direct Access is normally used when accessing individual values.

 ##### - Data files:
- - gov2.sorted from [DocId data set](#DocId data set) Block size=128 (lz4+blosc+VSimple w/ 64k)
+ - gov2.sorted from [DocId data set](#DocId data set) Block size=128 (lz4+blosc+VSimple w/ 64K)

 ./icbench -c1 gov2.sorted

@@ -90,17 +90,18 @@ TurboPForDA,TurboForDA: Direct Access is normally used when accessing individual
 | 4.953.768.342| 20.71| 6.63|**1766.05**|**1943.87**|**TurboPackV**|
 | 4.953.768.342| 20.71| 6.63|1419.35|1512.86|**TurboPack**|
 | 5.203.353.057| 21.75| 6.96|1560.34|1806.60|SIMDPackD1 FPF|
-| 6.074.995.117| 25.40| 8.13| 494.70| 729.97|[blosc](#blosc) lz4 64K|
+| 6.074.995.117| 25.40| 8.13| 494.70| 729.97|[blosc_lz4](#blosc) 64K|
 | 6.221.886.390| 26.01| 8.32|1666.76|1737.72|**TurboFor**|
 | 6.221.886.390| 26.01| 8.32|1660.52| 565.25|**TurboForDA**|
 | 6.699.519.000| 28.01| 8.96| 472.01| 495.12|Vbyte FPF|
 | 6.700.989.563| 28.02| 8.96| 728.72| 991.57|MaskedVByte|
 | 7.622.896.878| 31.87|10.20| 208.73|1197.74|VarintG8IU|
 | 8.594.342.216| 35.93|11.50|1307.22|1593.07|libfor|
+| 8.773.150.644| 36.68|11.74| 637.83|1301.05|blosc_lz 64K|
 |23.918.861.764|100.00|32.00|1456.17|1480.78|Copy|

-"lz4+DT 64K" after preprocessing Delta+Transpose similar to delta + [blosc](#blosc)
-"blosc lz4" tested w/ lz4 compressor+vectorized shuffle
+"lz4+DT 64K" = Delta+Transpose from TurboPFor + lz4
+"blosc_lz4" tested w/ lz4 compressor+vectorized shuffle

 ##### - Compressed Inverted Index Intersections with GOV2
 GOV2: 426GB, 25 Millions documents, average doc. size=18k.
@@ -263,7 +264,8 @@ header files to use with documentation:
 + [Index Compression Using 64-Bit Words](http://people.eng.unimelb.edu.au/ammoffat/abstracts/am10spe.html): Simple-8b (speed optimized version tested)
 + [libfor](https://github.com/cruppstahl/for)
 + [lz4](https://github.com/Cyan4973/lz4). included w. block size 64K as indication. Tested after preprocessing w. delta+transpose
- + [blosc](https://github.com/Blosc/c-blosc). blosc is like transpose/shuffle+lz77. Tested blosc+lz4 incl. vectorizeed shuffle
+ + [blosc](https://github.com/Blosc/c-blosc). blosc is like transpose/shuffle+lz77. Tested blosc+lz4 and blosclz incl. vectorized shuffle.
+ see also [benchmarks from the author of blosc](https://github.com/powturbo/TurboPFor/issues/2) single+multithreading
 + [Document identifier data set](http://lemire.me/data/integercompression2014.html)
 + **Publications:**
 - [SIMD Compression and the Intersection of Sorted Integers](http://arxiv.org/abs/1401.6399)
@@ -271,4 +273,4 @@ header files to use with documentation:
 - [On Inverted Index Compression for Search Engine Efficiency](http://www.dcs.gla.ac.uk/~craigm/publications/catena14compression.pdf)
 - [Google's Group Varint Encoding](http://static.googleusercontent.com/media/research.google.com/de//people/jeff/WSDM09-keynote.pdf)

-Last update: 22 JUN 2015
+Last update: 23 JUN 2015