Files
doris/docs/en/developer/developer-guide/bitmap-hll-file-format.md
wangyongfeng 2d39cffa5c [doc](website)Add Doris new official website code and documents (#9977)
In order to cooperate with Doris's successful graduation from Apache, the Doris official website also needs a new look
and more powerful feature, so we decided to redesign the Doris official website.
The code and documents of the new official website are included in this PR.

Since the new website is completely rewritten, the content and structure of the project are different from the previous one. 
In particular, the directory structure of documents has changed, and the number of documents is large, so the number of 
files in this PR is very large.

In the old website,all English documents are in the en/ directory, and Chinese documents in the zh-CN/ directory,
but in the new website,the documents are split into multiple directories according to the nav.
The document's directory structure changes as follows:
```
docs (old website)
|   |—— .vuepress (library)
|   |—— en
|   |   |—— admin-manual 
│   │   |—— advanced
|   |   |—— article
|   |   |—— benchmark
|   |   |—— case-user
|   |   |—— community
|   |   |—— data-operate
|   |   |—— data-table
|   |   |—— design
|   |   |—— developer-guide
|   |   |—— downloads
|   |   |—— ecosystem
|   |   |—— faq
|   |   |—— get-starting
|   |   |—— install
|   |   |—— sql-manual
|   |   |—— summary
|   |   |___ README.md
|   |—— zh-CN
...

docs (new website)
|   |—— .vuepress (library)
|   |—— en
|   |   |—— community (unchanged, community nav)
│   │   |—— developer (new directory, developer nav)
│   │   |   |—— design (moved from en/design)
│   │   |   |__ developer-guide (moved from en/developer-guide)
|   |   |—— docs (new directory, all children directories moved from en/, document nav)
│   │   |   |—— admin-manual 
│   │   |   |—— advanced
│   │   |   |—— benchmark
│   │   |   |—— data-operate
│   │   |   |—— data-table
│   │   |   |—— ecosystem
│   │   |   |—— faq
│   │   |   |—— get-starting
│   │   |   |—— install
│   │   |   |—— sql-manual
│   │   |   |—— summary
|   |   |—— downloads (unchanged, downloads nav)
|   |   |—— userCase (moved from en/case-user, user nav)
|   |   |___ README.md
|   |—— zh-CN
...
```
2022-06-08 17:45:12 +08:00

4.2 KiB

title, language
title language
Bitmap/HLL data format en

Bitmap format

Format description

The bitmap in Doris uses roaring bitmap storage, and the be side uses CRoaring. The serialization format of Roaring is compatible in languages ​​such as C++/java/go, while the serialization result of the format of C++ Roaring64Map is the same as that of Roaring64NavigableMap in Java. Not compatible. There are 5 types of Doris bimap, each of which is represented by one byte

The bitmap serialization format in Doris is explained as follows:

 | flag     | data .....|
 <--1Byte--><--n bytes-->

The Flag value description is as follows:

Value Description
0 EMPTY, empty bitmap, the following data part is empty, the whole serialization result is only one byte
1 SINGLE32, there is only one 32-bit unsigned integer value in the bitmap, and the next 4 bytes represent the 32-bit unsigned integer value
2 BITMAP32, 32-bit bitmap corresponds to the type org.roaringbitmap.RoaringBitmap in java. The type is roaring::Roaring in C++, and the following data is the structure after the sequence of roaring::Roaring. You can use org in java. .roaringbitmap.RoaringBitmap or roaring::Roaring in c++ directly deserialize
3 SINGLE64, there is only one 64-bit unsigned integer value in the bitmap, and the next 8 bytes represent the 64-bit unsigned integer value
4 BITMAP64, 64-bit bitmap corresponds to org.roaringbitmap.RoaringBitmap in java; Roaring64Map in doris in c++. The data structure is the same as the result in the roaring library, but the serialization and deserialization methods It is different, there will be 1-8 bytes of variable-length encoding uint64 in the bitmap representation of the size. The following data is a series of multiple high-order representations of 4 bytes and 32-bit roaring bitmap serialized data repeated

C++ serialization and deserialization examples are in the BitmapValue::write() method in be/src/util/bitmap_value.h and the Java examples are in the serialize() deserialize() method in fe/fe-common/src/main/java/org/apache/doris/common/io/BitmapValue.java.

HLL format description

Serialized data in HLL format is implemented in Doris itself. Similar to the Bitmap type, the HLL format is composed of a 1-byte flag followed by multiple bytes of data. The meaning of the flag is as follows

Value Description
0 HLL_DATA_EMPTY, empty HLL, the following data part is empty, the entire serialization result is only one byte
1 HLL_DATA_EXPLICIT, the next byte is explicit The number of data blocks, followed by multiple data blocks, each data block is composed of 8 bytes in length and data,
2 HLL_DATA_SPARSE, only non-zero values are stored, the next 4 bytes indicate the number of registers, and there are multiple register structures in the back. Each register is composed of the index of the first 2 bytes and the value of 1 byte
3 HLL_DATA_FULL, which means that all 16 * 1024 registers have values, followed by 16 * 1024 bytes of value data

C++ serialization and deserialization examples are in the serialize() deserialize() method of be/src/olap/hll.h, and the Java examples are in the serialize() deserialize() method in fe/fe-common/src/main/java/org/apache/doris/common/io/hll.java.