Commit Graph

440 Commits

Author SHA1 Message Date
d9fe5f7b67 [enhancement](memory) Remove MemPool and replace it with Arena (#17820)
Arena can replace MemPool in most scenarios. Except for memory reuse, MemPool supports reuse of previous memory chunks after clear, but Arena does not.

Some comparisons between MemPool and Arena:

 1. Expansion
     Arena is less than 128M index 2 alloc chunk; more than 128M memory, allocate 128M * n > `size`, n is equal to the minimum value that satisfies the expression;
     MemPool less than 512K index 2 alloc chunk, greater than 512K memory, separately apply for a `size` length chunk
     
     After Arena applied for a chunk larger than 128M last time, the minimum chunk applied for after that is 128M. Does this seem to be a waste of memory? MemPool is also similar. After the chunk of 512K was applied for last time, the minimum chunk of subsequent applications is 512K.

 2. Alignment
     MemPool defaults to 16 alignment, because memtable and other places that use int128 require 16 alignment;
     Arena has no default alignment;

 3. Memory reuse
     Arena only supports `rollback`, which reuses the memory of the current chunk, usually the memory requested last time.
     MemPool supports clear(), all chunks can be reused; or call ReturnPartialAllocation() to roll back the last requested memory; if the last chunk has no memory, search for the most free chunk for allocation

 4. Realloc
     Arena supports realloc contiguous memory; it also supports realloc contiguous memory from any position at the time of the last allocation. The difference between `alloc_continue` and `realloc` is:
         1. Alloc_continue does not need to specify the old size, but the default old size = head->pos - range_start
         2. alloc_continue supports expansion from range_start when additional_bytes is between head and pos, which is equivalent to reusing a part of memory, while realloc completely allocates a new memory
     MemPool does not support realloc, but supports transferring or absorbing chunks between two MemPools

 5. check mem limit
     MemPool checks the mem limit, and Arena checks at the Allocator layer.

 6. Support for ASAN
     Arena does something extra

 7. Error handling
     MemPool supports returning the error message of application failure directly through `Status`, and Arena throws Exception.
Tests that Arena can consider

 1. After the last applied chunk is larger than 128M, the minimum applied chunk is 128M, which seems to waste memory;

 2. Support clear, memory multiplexing;

 3. Increase the large list, alloc the memory larger than 128M, and the size is equal to `size`, so as to avoid the current chunk not being fully used, which is wasteful.

 4. In some cases, it may be possible to allocate backwards to find chunks t
2023-03-29 20:56:49 +08:00
05db6e9b55 [refactor](file-system)(step-2) remove env, file_utils and filesystem_utils (#18009)
Follow #17586.
This PR mainly changes:

Remove env/
Remove FileUtils/FilesystemUtils
Some methods are moved to LocalFileSystem
Remove olap/file_cache
Add s3 client cache for s3 file system
In my test, the time of open s3 file can be reduced significantly
Fix cold/hot separation bug for s3 fs.
This is the last PR of #17764.
After this, all IO operation should be in io/fs.

Except for tests in #17586, I also tested some case related to fs io:

clone
concurrency query on local/s3/hdfs
load error log create and clean
disk metrics
2023-03-29 09:00:52 +08:00
5191b4f473 [fix](ut)support run be-ut on release mode (#18119)
Fixed improper usage. So now be ut could be run on release mode.
btw, split be build type environment variable to be/be-ut.
2023-03-27 23:00:03 +08:00
2929a96224 [Refactor](inverted index cache) Use asc set instead of priority queue at the lru cache (#18033)
use asc set instead of priority queue at the LRU cache, to keep the lifecycle of the LRUHandle consistent in the sorted set and the LRU free list
2023-03-27 10:27:37 +08:00
6cbf393665 [enhance](meta action) remove useless pb field and refactor writer cooldown meta code (#17652) 2023-03-22 11:13:13 +08:00
cb79e42e5c [refactor](file-system)(step-1) refactor file sysmte on BE and remove storage_backend (#17586)
See #17764 for details
I have tested:
- Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp
- Outfile to local/s3/hdfs/broker.
- Load from local/s3/hdfs/broker.
- Query file on local/s3/hdfs/broker file system, with table value function and catalog.
- Backup/Restore with local/s3/hdfs/broker file system

Not test:
- cold & host data separation case.
2023-03-21 21:08:38 +08:00
bd8e3e6405 [refactor](date) unify DateTimeValue and VecDateTimeValue (#17670) 2023-03-20 16:27:08 +08:00
dd53bc1c8d [unify type system](remove unused type desc) remove some code (#17921)
There are many type definitions in BE. Should unify the type system and simplify the development.



---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-19 14:05:02 +08:00
caed2155f5 [test](fix) use vertorized interface in test (#17649) 2023-03-16 15:23:07 +08:00
273d2100ac [enhance](cooldown) turn write cooldown meta async (#16813) 2023-03-08 14:06:21 +08:00
e94170d81f [feature](cooldown)add ut for cooldown on be (#17246)
* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be

* add ut for cooldown on be
2023-03-07 10:42:53 +08:00
Pxl
d8f0ca7108 [Chore](schema change) remove some unused code in schema change (#17459)
remove some unused code in schema change.
remove some row-based config and code.
2023-03-07 09:18:34 +08:00
400b4bf7a7 [enhance](report) add local and remote size in tablet meta header action (#17406) 2023-03-06 10:43:57 +08:00
62ec74f4e7 segcompaction featuring verticalcompaction (#16731)
This patchset applies the following changes:

using vertical compaction machanism to do segcompaction
basic (WIP) refraction to separate segcompaction logic from BetaRowsetWriter
add segcompaction specific ut and regression tests
2023-03-01 10:55:40 +08:00
33acaa067b [refactor](mempool) remove mempool parameter from key decoder methods (#17137)
decode method is only used for big int and other decode method is only used in unit test.

I remove the useless method and we can remove mempool parameter from decode method.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-27 11:16:14 +08:00
e5f884a6fc [enhancement](cache) make segment cache prune more effectively (#17011)
BloomFilter in MoW table may consume lots of memory, and it's life cycle is same as segment. This patch try to improve the efficiency of recycling segment cache, to release the memory in time.
2023-02-23 18:24:18 +08:00
edead494cb [Enhancement](storage) add a new hidden column __DORIS_VERSION_COL__ for unique key table (#16509) 2023-02-23 15:47:17 +08:00
8eeb435963 [improvement](meta) Enhance Doris's fault tolerance to disk error (#16472)
Sense io error.
Retry query when io error.
Greylist: When finds one disk is completely broken, or the diff of tablet number in BE and FE meta is too large,reduce the query priority of the BE.
2023-02-23 08:40:45 +08:00
b194a7cf83 [improvement](memory) Support GC segment cache, when memory insufficient (#16987)
fix segment cache memory tracker statistics
support GC
2023-02-22 18:31:20 +08:00
0b624d282d [enhancement](ut) add merge-on-write ut code back (#16939) 2023-02-22 16:29:15 +08:00
30dafd6a44 [improve](inverted index) Add element count limit for inverted index searcher cache (#16758)
The element in InvertedIndexSearcherCache is inverted index searcher, which is a file descriptor of inverted index file, so InvertedIndexSearcherCache is actually cache file descriptor of inverted index file.

If open file descriptor limit of the Linux system is set too small and config inverted_index_searcher_cache_limit is too big, during high pressure load maybe cause "Too many open files".

So, when insert inverted index searcher into InvertedIndexSearcherCache, need also check whether reach file_descriptor_number limit for inverted index file.
2023-02-17 11:53:07 +08:00
Pxl
f50edff59d [Chore](build) enable fallthrough check annd fix some fallthrough bug (#16748)
* enable fallthrough check annd fix some fallthrough bug

* fix

* fix
2023-02-15 15:58:43 +08:00
aba843bb2b [Improvement](inverted index) inverted index query match bitmap cache (#16578)
Add cache for inverted index query match bitmap to accelerate common query keyword, especially for keyword matching many rows. 

Tests result:
- large result: matching 99% out of 247 million rows shows 8x speed up.
- small result: matching 0.1% out of 247 million rows shows 2x speed up.
2023-02-11 13:38:58 +08:00
37d1519316 [WIP](dynamic-table) support dynamic schema table (#16335)
Issue Number: close #16351

Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.
2023-02-11 13:37:50 +08:00
1f631c388d [enhance](cooldown)accelerate cooldown task produce efficiency (#16089) 2023-02-10 16:58:27 +08:00
2bee26b05a [fix](merge-on-write) fix that the query result has duplicate keys (#16336)
* [fix](merge-on-write) fix that the query result has duplicate keys

* add ut
2023-02-06 17:09:53 +08:00
bd8ef4edeb [fix](cooldown) Fix core in remove_all_remote_rowsets (#16374) 2023-02-04 22:31:38 +08:00
Pxl
5e4bb98900 [Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290)
enable -Wpedantic and update lowest gcc version to 11.1
2023-02-03 11:28:48 +08:00
6ee0dbfb23 [fix](cooldown) Fix bugs in cooldown single replica files (#16299) 2023-02-02 19:31:26 +08:00
69f34cd1c3 [fix](load) sequence column do not compare correctly in memtable (#16211) 2023-02-02 11:00:23 +08:00
00a598a839 [feature](cooldown) Decouple storage policy and resource (#15873) 2023-01-31 14:13:47 +08:00
90b12143a3 [refactor](remove unused code) remove runtime tuple structure and useless utils class (#16237) 2023-01-30 16:45:14 +08:00
4b6a4b3cf7 [refactor](remove unused code) Remove unused mempool declare or function params (#16222)
* Remove unused mempool declare or function params

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-30 13:03:18 +08:00
5eaa995704 [refactor](some mempool) not memset 0 in default value iterator (#16194)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-29 22:50:39 +08:00
f97c51d786 [enhancement](compaction) Optimize calculating level size of input rowset in SizeBasedCumulativeCompactionPolicy (#15633)
* Optimize calculating level size of input rowset in SizeBasedCumulativeCompactionPolicy
2023-01-29 17:18:50 +08:00
4e64ff6329 [enhancement](load) avoid schema copy to reduce cpu usage (#16034) 2023-01-28 11:13:57 +08:00
615a5e7b51 [refactor](remove non vec code) remove non vec functions and AggregateInfo (#16138)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-25 12:53:05 +08:00
6e8eedc521 [refactor](remove unused code) remove storage buffer and orc reader (#16137)
remove olap storage byte buffer
remove orc reader
remove time operator
remove read_write_util
remove aggregate funcs
remove compress.h and cpp
remove bhp_lib

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-24 22:29:32 +08:00
199d7d3be8 [Refactor]Merged string_value into string_ref (#15925) 2023-01-22 16:39:23 +08:00
116e17428b [Enhancement](point query optimize) improve performace of point query on primary keys (#15491)
1. support row format using codec of jsonb
2. short path optimize for point query
3. support prepared statement for point query
4. support mysql binary format
2023-01-20 13:33:01 +08:00
0b5e71d3b4 [refactor](refactor field) remove unused method (#16068) 2023-01-19 10:16:09 +08:00
42b5d17fa1 [refactor](remove non vec) remove column block and column view (#16022)
* [refactor](remove non vec) remove column block and column view and column vectorized batch

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-18 12:40:53 +08:00
e579530c99 [Feature-WIP](inverted index) support use inverted index searcher cache (#16003)
use inverted index searcher cache to improve query performance

dependency pr: #14211 #15807 #15823
2023-01-18 09:30:55 +08:00
Pxl
b727033906 [Chore](build) enable -Wextra and remove some -Wno (#15760)
enable -Wextra and remove some -Wno
2023-01-15 10:40:35 +08:00
1489e3cfbf [Fix](file system) Make the constructor of XxxFileSystem a private method (#15889)
Since Filesystem inherited std::enable_shared_from_this , it is dangerous to create native point of FileSystem.
To avoid this behavior, making the constructor of XxxFileSystem a private method and using the static method create(...) to get a new FileSystem object.
2023-01-13 15:32:16 +08:00
f17d69e450 [feature](file cache)Import file cache for remote file reader (#15622)
The main purpose of this pr is to import `fileCache` for lakehouse reading remote files.
Use the local disk as the cache for reading remote file, so the next time this file is read,
the data can be obtained directly from the local disk.
In addition, this pr includes a few other minor changes

Import File Cache:
1. The imported `fileCache` is called `block_file_cache`, which uses lru replacement policy.
2. Implement a new FileRereader `CachedRemoteFilereader`, so that the logic of `file cache` is hidden under `CachedRemoteFilereader`.

Other changes:
1. Add a new interface `fs()` for `FileReader`.
2. `IOContext` adds some statistical information to count the situation of `FileCache`

Co-authored-by: Lightman <31928846+Lchangliang@users.noreply.github.com>
2023-01-10 12:23:56 +08:00
ab186a60ce [enhancement](compaction) Optimize judging delete rowset and picking candidate rowsets for compaction #15631
Tablet::version_for_delete_predicate should travel all rowset metas in tablet meta which complex is O(N), however we can directly judge whether this rowset is a delete rowset by RowsetMeta::has_delete_predicate which complex is O(1).
As we won't call Tablet::version_for_delete_predicate when pick input rowsets for compaction, we can reduce the critical area of Tablet::_meta_lock.
2023-01-10 08:32:15 +08:00
14eaf41029 [refactor](remove rowblockv2) remove rowblock v2 structure (#15540)
* [refactor](remove rowblockv2) remove rowblock v2 structure

* fix bugs

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-03 09:21:57 +08:00
40c53931e5 [fix](vec) VMergeIterator add key same label for agg table (#14722) 2023-01-02 22:54:21 +08:00
365c3eec16 [enhancement](compaction) vertical compaction support unique-key mow (#15353) 2023-01-02 22:53:04 +08:00