13 Commits

Author SHA1 Message Date
20ef8a6e21 [feature-wip](remote storage)(step1) use a struct instead of string for parameter path, add basic remote method (#7098)
First, we need a parameter that describes whether the data is local or remote.
Then, we need to support some basic functions for operating on remote storage.
2021-12-22 22:58:23 +08:00
bd88309346 [Refactor] fix warning in gcc8+, fix warning from brpc, s2 (#5763)
Fix warning from brpc, S2
Fix -Warray-bounds
2021-05-12 10:38:46 +08:00
a56e7e2192 [Refactor] make uint24_t,OLAPIndexFixedHeader as a POD type (#5559) 2021-03-27 18:59:23 +08:00
6fedf5881b [CodeFormat] Clang-format cpp sources (#4965)
Clang-format all c++ source files.
2020-11-28 18:36:49 +08:00
75e0ba32a1 Fixes some be typo (#4714) 2020-10-13 09:37:15 +08:00
60d9d31ec1 [Optimize] Optimize coding bit operation in BE (#4366)
Optimize bit operation in variable length coding. Remove unnecessary bit operation.
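
For readers unfamiliar with variable-length coding, the following is a minimal sketch of a base-128 varint encoder and decoder. It is not the code changed by this commit; the function names `encode_varint32_sketch` and `decode_varint32_sketch` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the Doris function: writes `v` as a base-128
// varint into `dst` and returns a pointer one past the last byte written.
// The caller must provide room for at least 5 bytes.
inline uint8_t* encode_varint32_sketch(uint8_t* dst, uint32_t v) {
    assert(dst != nullptr);
    while (v >= 0x80) {
        // The cast keeps the low 8 bits: 7 payload bits plus the continuation flag.
        *dst++ = static_cast<uint8_t>(v | 0x80);
        v >>= 7;
    }
    *dst++ = static_cast<uint8_t>(v); // last byte, continuation flag clear
    return dst;
}

// Hypothetical helper: decodes one varint from [p, limit). Returns a pointer
// past the consumed bytes, or nullptr if the input is truncated or too long.
inline const uint8_t* decode_varint32_sketch(const uint8_t* p, const uint8_t* limit,
                                             uint32_t* out) {
    uint32_t result = 0;
    for (uint32_t shift = 0; shift <= 28 && p < limit; shift += 7) {
        uint32_t byte = *p++;
        if (byte & 0x80) {
            result |= (byte & 0x7F) << shift; // more bytes follow
        } else {
            result |= byte << shift;          // final byte
            *out = result;
            return p;
        }
    }
    return nullptr;
}
```

Small values take one byte and a full 32-bit value takes five, which is why the encoding is attractive for lengths and counts.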
2020-08-20 09:29:53 +08:00
99ad56d1bf Support bitmap index for more type (#2630)
For #2589

1. date(uint24_t)/datetime(int64_t)/largeint(int128_t) use frame of reference code as dict.
2. decimal(decimal12_t) also uses frame of reference code as dict.
3. float/double use bitshuffle code as dict.
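
As a rough illustration of the frame-of-reference idea mentioned above, the sketch below stores the minimum value once and bit-packs each value's offset from it. The struct and function names are hypothetical, `int64_t` is used only as an example type, and the real Doris page layout (frame size, headers) is not described in this commit message and is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for one encoded block.
struct ForBlockSketch {
    int64_t base = 0;             // the frame of reference (minimum value)
    uint8_t bit_width = 0;        // bits used per stored offset
    size_t count = 0;             // number of encoded values
    std::vector<uint8_t> packed;  // offsets, bit-packed least significant bit first
};

inline ForBlockSketch for_encode_sketch(const std::vector<int64_t>& values) {
    ForBlockSketch block;
    block.count = values.size();
    if (values.empty()) return block;
    auto mm = std::minmax_element(values.begin(), values.end());
    block.base = *mm.first;
    // Unsigned arithmetic avoids signed overflow for wide ranges.
    uint64_t range = static_cast<uint64_t>(*mm.second) - static_cast<uint64_t>(*mm.first);
    while (block.bit_width < 64 && (range >> block.bit_width) != 0) ++block.bit_width;
    block.packed.assign((block.count * block.bit_width + 7) / 8, 0);
    size_t bit_pos = 0;
    for (int64_t v : values) {
        uint64_t offset = static_cast<uint64_t>(v) - static_cast<uint64_t>(block.base);
        for (uint8_t b = 0; b < block.bit_width; ++b, ++bit_pos) {
            if ((offset >> b) & 1u) {
                block.packed[bit_pos / 8] |= static_cast<uint8_t>(1u << (bit_pos % 8));
            }
        }
    }
    return block;
}

// Reads back the i-th value by unpacking its offset and adding the base.
inline int64_t for_get_sketch(const ForBlockSketch& block, size_t i) {
    assert(i < block.count);
    uint64_t offset = 0;
    size_t bit_pos = i * block.bit_width;
    for (uint8_t b = 0; b < block.bit_width; ++b, ++bit_pos) {
        uint64_t bit = (block.packed[bit_pos / 8] >> (bit_pos % 8)) & 1u;
        offset |= bit << b;
    }
    return static_cast<int64_t>(static_cast<uint64_t>(block.base) + offset);
}
```

Values that cluster closely, such as dates within one month, need only a few bits each because only the distance from the minimum is stored.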
2020-01-31 21:09:29 +08:00
3b24287251 Support 64 bits integers for BITMAP type (#2772)
Fixes #2771 

Main changes in this CL
* RoaringBitmap is renamed to BitmapValue and moved into bitmap_value.h
* Roaring64Map is leveraged to support unsigned BIGINT for BITMAP type
* two new formats (SINGLE64 and BITMAP64) are introduced for BITMAP type

So far we have three storage formats for BITMAP type

```
EMPTY := TypeCode(0x00)
SINGLE32 := TypeCode(0x01), UInt32LittleEndian
BITMAP32 := TypeCode(0x02), RoaringBitmap(defined by https://github.com/RoaringBitmap/RoaringFormatSpec/)
```

In order to support BIGINT elements and keep backward compatibility, we introduce two new formats

```
SINGLE64 := TypeCode(0x03), UInt64LittleEndian
BITMAP64 := TypeCode(0x04), CustomRoaringBitmap64
```

Please note that SINGLE64/BITMAP64 don't replace SINGLE32/BITMAP32. Doris automatically chooses the smaller type (in terms of space) during serialization. For example, BITMAP32 is preferred over BITMAP64 when the maximum element is <= UINT32_MAX. This also makes BE rollback possible as long as the user hasn't written an element larger than UINT32_MAX into a bitmap column.
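
The selection rule can be sketched roughly as follows. This is an illustration, not the code in `bitmap_value.h`: the enum and function names are hypothetical, a `std::set<uint64_t>` stands in for the Roaring structures, and the single-element rule is an assumption inferred from the SINGLE32/SINGLE64 format names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <set>

// Type codes as listed in the format description above.
enum class BitmapTypeCodeSketch : uint8_t {
    EMPTY = 0x00,
    SINGLE32 = 0x01,
    BITMAP32 = 0x02,
    SINGLE64 = 0x03,
    BITMAP64 = 0x04,
};

// Hypothetical helper: picks the smallest format able to hold `elements`.
inline BitmapTypeCodeSketch choose_type_code_sketch(const std::set<uint64_t>& elements) {
    if (elements.empty()) return BitmapTypeCodeSketch::EMPTY;
    // std::set is ordered, so the last element is the maximum.
    const uint64_t max_element = *elements.rbegin();
    const bool fits32 = max_element <= std::numeric_limits<uint32_t>::max();
    if (elements.size() == 1) {
        // Assumption: a one-element bitmap uses a SINGLE* format.
        return fits32 ? BitmapTypeCodeSketch::SINGLE32 : BitmapTypeCodeSketch::SINGLE64;
    }
    return fits32 ? BitmapTypeCodeSketch::BITMAP32 : BitmapTypeCodeSketch::BITMAP64;
}
```

Because the 32-bit formats win whenever every element fits, data that never exceeds UINT32_MAX is written with type codes 0x00 to 0x02 only, which is what keeps it readable by an older BE.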

Another important design decision is that we fork and maintain our own version of Roaring64Map instead of using the one in "roaring/roaring64map.hh". The reasons are

1. RoaringBitmap doesn't define a standard binary format for 64-bit bitmaps. As a result, different implementations of Roaring64Map use different formats. For example, the [C++ version](https://github.com/RoaringBitmap/CRoaring/blob/v0.2.60/cpp/roaring64map.hh#L545) differs from the [Java version](35104c564e/src/main/java/org/roaringbitmap/longlong/Roaring64NavigableMap.java (L1097)). Even for CRoaring, the format may change in future releases. However, Doris requires the serialized format to be stable across versions. Forking is a safe way to achieve this.
2. We may want to make some code changes to Roaring64Map according to our needs. For example, in order to use the BITMAP32 format when the maximum element can be represented in 32 bits, we may want to access the private members of Roaring64Map. Another example is that we want to further customize and optimize the format for the BITMAP64 case, such as using vint64 instead of uint64 for the map size.
2020-01-17 14:13:38 +08:00
d72fbdf425 Support bitmap index build (#2050)
This PR implements the build part of bitmap index support. It follows most of the design described in #1684, but with the following differences and enhancements

1. Bitmap indexes are now written in the segment file for simplicity. A separate index file would be helpful when we're going to support `alter table add bitmap index` in the future though.
2. We switch to a generalized index page format for all data types rather than specializing for each one. Code simplicity and reusability are preferred here over optimal compression rate.
3. We introduce a new abstraction called `IndexedColumn` to unify the processing of the dictionary section and bitmap section of bitmap index. IndexedColumn is a column with an optional ordinal index and an optional value index. Ordinal index enables us to seek to a particular rowid within the column. Value index requires IndexedColumn to store ordered values and enables us to seek to a particular value. Therefore, the dictionary section can be represented by an IndexedColumn with value index and the bitmap section can be represented by an IndexedColumn with ordinal index.
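
To make the two kinds of seek concrete, here is a small in-memory sketch. It is not the `IndexedColumn` implementation from this PR: the class, its methods, and the page layout are hypothetical, and values are plain strings held in fixed-size pages.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical data page: a run of values starting at a known ordinal (rowid).
struct PageSketch {
    uint64_t first_ordinal;
    std::vector<std::string> values;
};

class IndexedColumnSketch {
public:
    // `values` must be sorted if seek_at_or_after() is going to be used.
    IndexedColumnSketch(const std::vector<std::string>& values, size_t page_size) {
        assert(page_size > 0);
        for (size_t i = 0; i < values.size(); i += page_size) {
            size_t end = std::min(values.size(), i + page_size);
            PageSketch page;
            page.first_ordinal = static_cast<uint64_t>(i);
            page.values.assign(values.begin() + static_cast<std::ptrdiff_t>(i),
                               values.begin() + static_cast<std::ptrdiff_t>(end));
            pages_.push_back(page);
        }
    }

    // Ordinal index: find the page covering `ordinal`, then index into it.
    bool seek_to_ordinal(uint64_t ordinal, std::string* out) const {
        auto it = std::upper_bound(
                pages_.begin(), pages_.end(), ordinal,
                [](uint64_t ord, const PageSketch& p) { return ord < p.first_ordinal; });
        if (it == pages_.begin()) return false;
        --it;
        uint64_t offset = ordinal - it->first_ordinal;
        if (offset >= it->values.size()) return false;
        *out = it->values[offset];
        return true;
    }

    // Value index: find the ordinal of the first value >= `target`.
    bool seek_at_or_after(const std::string& target, uint64_t* ordinal) const {
        // First page whose first value is >= target; the match may start one page earlier.
        auto it = std::lower_bound(
                pages_.begin(), pages_.end(), target,
                [](const PageSketch& p, const std::string& t) { return p.values.front() < t; });
        if (it != pages_.begin()) --it;
        for (; it != pages_.end(); ++it) {
            auto v = std::lower_bound(it->values.begin(), it->values.end(), target);
            if (v != it->values.end()) {
                *ordinal = it->first_ordinal + static_cast<uint64_t>(v - it->values.begin());
                return true;
            }
        }
        return false;
    }

private:
    std::vector<PageSketch> pages_;
};
```

In these terms, the dictionary section would use only `seek_at_or_after` over sorted dictionary values, while the bitmap section would use only `seek_to_ordinal` to fetch the bitmap for a given dictionary position.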
2019-11-20 13:51:21 +08:00
2159293d23 Fix code's license (#1715) 2019-08-28 18:08:26 +08:00
e30844a321 Add column reader writer for segment V2 (#1346) 2019-06-25 16:59:26 +08:00
d1b1fce92f Change LICENSE file (#1265) 2019-06-09 15:55:46 +08:00
3e1c70d1b7 Add coding function (#1264) 2019-06-08 21:02:31 +08:00