Commit Graph

11 Commits

Author SHA1 Message Date
331fa50501 [feature](cold-data) move cold data to object storage without losing any feature(BE) (#10280)
This PR supports rowset level data upload on the BE side, so that there can be both cold data and hot data in a tablet,
and there is no necessary to prohibit loading new data to cooled tablets.

Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without
perceiving the underlying filesystem.

The abstracted `RemoteFileSystem` can try local caching strategies with different granularity,
instead of caching segment files as before.

To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory.
In the future, `FileReader`s and `FileWriter`s should be unified.
2022-07-08 12:18:39 +08:00
983cdc7b0d [feature-wip](array-type) Support loading data in vectorized format (#10065) 2022-06-15 14:40:28 +08:00
415b6b8086 [feature-wip](array-type) Support array type which doesn't contain null (#9809) 2022-06-12 23:35:28 +08:00
2a11a4ab99 [feature-wip][array-type] Support more sub types. (#9466)
Please refer to #9465
2022-05-26 08:41:34 +08:00
bd126f0679 [improvement] Refactor type info for further optimizations. (#8786)
## Design:

For now, there are two categories of types in Doris, one is for scalar types (such as int, char and etc.) and the other is for composite types (array and etc.). For the sake of performance, we can cache type info of scalar types globally (unique objects) due to the limited number of scalar types. When we consider the composite types, normally, the type info is generated in runtime (we can also use some cache strategy to speed up). The memory thereby should be reclaimed when we create type info for composite types.

There are a lots of interfaces to get the type info of a specific type. I reorganized those as the following describes.
1. `const TypeInfo* get_scalar_type_info(FieldType field_type)`
    The function is used to get the type info of scalar types. Due to the cache, the caller uses the result **WITHOUT** considering the problems about memory reclaim.
2. `const TypeInfo* get_collection_type_info(FieldType sub_type)`
    The function is used to get the type info of array types with just **ONE** depth. Due to the cache, the caller uses the result **WITHOUT** considering the problems about memory reclaim.
3. `TypeInfoPtr get_type_info(segment_v2::ColumnMetaPB* column_meta_pb)`
4. `TypeInfoPtr get_type_info(const TabletColumn* col)`
    These functions are used to get the type info of **BOTH** scalar types and composite types. The caller should be responsible to manage the resources returned.

#### About the new type `TypeInfoPtr`
`TypeInfoPtr` is an alias type to `unique_ptr` with a custom deleter.
1. For scalar types, the deleter does nothing.
2. For composite types, the deleter reclaim the memory.

By analyzing the callers of `get_type_info`, these classes should hold TypeInfoPtr:
1. `Field`
2. `ColumnReader`
3. `DefaultValueColumnIterator`

Other classes are either constructed by the foregoing classes or hold those, so they can just use the raw pointer of `TypeInfo` directly for the sake of performance.
1. `ScalarColumnWriter` - holds `Field`
    1. `ZoneMapIndexWriter` - created by `ScalarColumnWriter`, use `type_info` from the field in `ScalarColumnWriter`
        1. `IndexedColumnWriter` - created by `ZoneMapIndexWriter`, only uses scalar types.
    2. `BitmapIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter`
        1. `IndexedColumnWriter` - created by `BitmapIndexWriter`, uses `type_info` in `BitmapIndexWriter` and  `BitmapIndexWriter` doesn't support `ArrayType`.
    3. `BloomFilterIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter`
        1.  `IndexedColumnWriter` - created by `BloomFilterIndexWriter`, only uses scalar types.
2. `IndexedColumnReader` initializes `type_info` by the field type in meta (only scalar types).
3. `ColumnVectorBatch`
    1. `ZoneMapIndexReader` creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in  `IndexedColumnReader`
    2. `BitmapIndexReader` supports scalar types only and it creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `BitmapIndexReader`
    3. `BloomFilterIndexWriter` supports scalar types only and it creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `BloomFilterIndexWriter`
2022-04-20 14:47:29 +08:00
5a44eeaf62 [refactor] Unify all unit tests into one binary file (#8958)
1. solved the previous delayed unit test file size is too large (1.7G+) and the unit test link time is too long problem problems
2. Unify all unit tests into one file to significantly reduce unit test execution time to less than 3 mins
3. temporarily disable stream_load_test.cpp, metrics_action_test.cpp, load_channel_mgr_test.cpp because it will re-implement part of the code and affect other tests
2022-04-12 15:30:40 +08:00
4076c5466b [refactor][improvement](type_info) use template and single instance to refactor get type info logic (#8680)
1. use const pointer instead of shared_ptr
2. Restrict array types to support only primitive types and nest up to 9 levels.
2022-04-03 10:10:36 +08:00
e63afc1a3c [feature-wip](remote storage)(step2) add storage_backend_mgr on BE side (#8663)
1. add storage backend mgr
2. remove env_remote
2022-03-31 11:13:14 +08:00
b522de884c [feature-wip](array-type) Fix compilation error. (#8556) (#8591) 2022-03-22 15:52:34 +08:00
2580da4f72 [feature-wip](array-type) Support insertion for vectorized engine. (#8494) (#8590)
Please refer to #8493
2022-03-22 15:48:13 +08:00
b638c07533 [feature-wip](array-type) Support nested array insertion. (#8305) (#8586)
Please refer to #8304 .
2022-03-22 15:28:26 +08:00