Commit Graph

48 Commits

Author SHA1 Message Date
Pxl
a15a0b9193 [Chore](build) use file(GLOB_RECURSE xxx CONFIGURE_DEPENDS) to replace set cpp (#20461)
use file(GLOB_RECURSE xxx CONFIGURE_DEPENDS) to replace set cpp
2023-06-08 19:36:21 +08:00
a68fc551f0 [bug](cooldown) Fix async_write_cooldown_meta and snapshot cooldowned version not continuous bug (#20437) 2023-06-08 15:35:35 +08:00
16a394da0e [chore](build) Use include-what-you-use to optimize includes (PART III) (#18958)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-24 14:51:51 +08:00
d9fe5f7b67 [enhancement](memory) Remove MemPool and replace it with Arena (#17820)
Arena can replace MemPool in most scenarios. Except for memory reuse, MemPool supports reuse of previous memory chunks after clear, but Arena does not.

Some comparisons between MemPool and Arena:

 1. Expansion
     Arena is less than 128M index 2 alloc chunk; more than 128M memory, allocate 128M * n > `size`, n is equal to the minimum value that satisfies the expression;
     MemPool less than 512K index 2 alloc chunk, greater than 512K memory, separately apply for a `size` length chunk
     
     After Arena applied for a chunk larger than 128M last time, the minimum chunk applied for after that is 128M. Does this seem to be a waste of memory? MemPool is also similar. After the chunk of 512K was applied for last time, the minimum chunk of subsequent applications is 512K.

 2. Alignment
     MemPool defaults to 16 alignment, because memtable and other places that use int128 require 16 alignment;
     Arena has no default alignment;

 3. Memory reuse
     Arena only supports `rollback`, which reuses the memory of the current chunk, usually the memory requested last time.
     MemPool supports clear(), all chunks can be reused; or call ReturnPartialAllocation() to roll back the last requested memory; if the last chunk has no memory, search for the most free chunk for allocation

 4. Realloc
     Arena supports realloc contiguous memory; it also supports realloc contiguous memory from any position at the time of the last allocation. The difference between `alloc_continue` and `realloc` is:
         1. Alloc_continue does not need to specify the old size, but the default old size = head->pos - range_start
         2. alloc_continue supports expansion from range_start when additional_bytes is between head and pos, which is equivalent to reusing a part of memory, while realloc completely allocates a new memory
     MemPool does not support realloc, but supports transferring or absorbing chunks between two MemPools

 5. check mem limit
     MemPool checks the mem limit, and Arena checks at the Allocator layer.

 6. Support for ASAN
     Arena does something extra

 7. Error handling
     MemPool supports returning the error message of application failure directly through `Status`, and Arena throws Exception.
Tests that Arena can consider

 1. After the last applied chunk is larger than 128M, the minimum applied chunk is 128M, which seems to waste memory;

 2. Support clear, memory multiplexing;

 3. Increase the large list, alloc the memory larger than 128M, and the size is equal to `size`, so as to avoid the current chunk not being fully used, which is wasteful.

 4. In some cases, it may be possible to allocate backwards to find chunks t
2023-03-29 20:56:49 +08:00
05db6e9b55 [refactor](file-system)(step-2) remove env, file_utils and filesystem_utils (#18009)
Follow #17586.
This PR mainly changes:

Remove env/
Remove FileUtils/FilesystemUtils
Some methods are moved to LocalFileSystem
Remove olap/file_cache
Add s3 client cache for s3 file system
In my test, the time of open s3 file can be reduced significantly
Fix cold/hot separation bug for s3 fs.
This is the last PR of #17764.
After this, all IO operation should be in io/fs.

Except for tests in #17586, I also tested some case related to fs io:

clone
concurrency query on local/s3/hdfs
load error log create and clean
disk metrics
2023-03-29 09:00:52 +08:00
cb79e42e5c [refactor](file-system)(step-1) refactor file sysmte on BE and remove storage_backend (#17586)
See #17764 for details
I have tested:
- Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp
- Outfile to local/s3/hdfs/broker.
- Load from local/s3/hdfs/broker.
- Query file on local/s3/hdfs/broker file system, with table value function and catalog.
- Backup/Restore with local/s3/hdfs/broker file system

Not test:
- cold & host data separation case.
2023-03-21 21:08:38 +08:00
00a598a839 [feature](cooldown) Decouple storage policy and resource (#15873) 2023-01-31 14:13:47 +08:00
1489e3cfbf [Fix](file system) Make the constructor of XxxFileSystem a private method (#15889)
Since Filesystem inherited std::enable_shared_from_this , it is dangerous to create native point of FileSystem.
To avoid this behavior, making the constructor of XxxFileSystem a private method and using the static method create(...) to get a new FileSystem object.
2023-01-13 15:32:16 +08:00
e640f49b6d [refactor](non-vec) remove non vectorized predicate and row_block (#15348)
remove non vectorized predicate and row_block
2022-12-25 21:45:00 +08:00
83a99a0f8b [refactor](non-vec) Remove non vec code from be (#15278)
* [refactor](removecode) remove some non-vectorization
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-12-22 23:28:30 +08:00
f3aea7f0f0 [Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744) 2022-12-11 23:33:18 +08:00
94a6ffb906 [feature](compaction) support vertical_compaction & ordered_data_compaction (#14524) 2022-12-01 22:15:41 +08:00
eab8876abc [Feature](remote) Using heavy schema change if the table is not enable light weight schema change (#13487) 2022-10-28 15:48:22 +08:00
ea57bf6370 [refactor](delete predicate) Unify delete to segmentiterator (#11650)
* remove seek columns and unify delete columns in rowset reader


Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-08-11 15:12:43 +08:00
2068bf2dea [Refactor](predicate) Use primitive type as template argument for predicate (#11647) 2022-08-11 12:06:44 +08:00
321107cb40 [refactor](schema change) Using tablet schema shared ptr instead of raw ptr (#11475)
* Using tabletschema shared ptr instead of raw ptrs


Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-08-05 11:04:38 +08:00
b35daf0a04 [improvement](light-schema-change) Support tablet schema cache (#11131) 2022-08-01 12:18:00 +08:00
d4fb27125a [feature-wip](unique-key-merge-on-write) row id conversion for compaction (#11149) 2022-07-27 16:32:13 +08:00
0b6d2ae290 [fix] Move s3 fs connect outside the lock critical area (#11026)
* fix potential bug of S3FileSystem

* move s3 fs connect outside the lock critical area
2022-07-23 16:06:29 +08:00
4960043f5e [enhancement] Refactor to improve the usability of MemTracker (step2) (#10823) 2022-07-21 17:11:28 +08:00
a2ed4b5c78 [improvement] improvement for light weight schema change (#10860)
* improvement for dynamic schema
not use schema as lru cache key any more.
load segment just use the rowset's original schema not the current read schema.
generate column reader and column iterator using the original schema, using the read schema if it is a new column.
using column unique id as key instead of column ordinals.
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-07-18 17:53:31 +08:00
486cf0ebd4 [Feature] Lightweight schema change of add/drop column (#10136)
* [Schema Change] support fast add/drop column  (#49)

* [feature](schema-change) support fast schema change. coauthor: yixiutt

* [schema change] Using columns desc from fe to read data. coauthor: Lchangliang

* [feature](schema change) schema change optimize for add/drop columns.

1.add uniqueId field for class column.
2.schema change for add/drop columns directly update schema meta

Co-authored-by: yixiutt <yixiu@selectdb.com>
Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com>

[Feature](schema change) fix write and add regression test (#69)

Co-authored-by: yixiutt <yixiu@selectdb.com>

[schema change] be ssupport that delete use newest schema

add delete regression test

fix regression case (#107)

tmp

[feature](schema change) light schema change exclude rollup and agg/uniq/dup key type.

[feature](schema change) fe olapTable maxUniqueId write in disk.

[feature](schema change) add rpc iface for sc add column.

[feature](schema change) add columnsDesc to TPushReq for ligtht sc.

resolve the deadlock when schema change (#124)

fix columns from fe don't has bitmap_index flag (#134)

add update/delete case

construct MATERIALIZED schema from origin schema when insert

fix not vectorized compaction coredump

use segment cache

choose newest schema by schema version when compaction (#182)

[bugfix](schema change) fix ligth schema change problem.

[feature](schema change) light schema change add alter job. (#1)

fix be ut

[bug] (schema change) unique drop key column should not light schema
change

[feature](schema change) add schema change regression-test.

fix regression test

[bugfix](schema change) fix multi alter clauses for light schema change. (#2)

[bugfix](schema change) fix multi clauses calculate column unique id (#3)

modify PushTask process (#217)

[Bugfix](schema change) fix jobId replay cause bdbje exception.

[bug](schema change) fix max col unique id repeatitive. (#232)

[optimize](schema change) modify pendingMaxColUniqueId generate rule.

fix compaction error
* fix be ut

* fix snapshot load core

fix unique_id error (#278)

[refact](fe) remove redundant code for light schema change. (#4)

[refact](fe) remove redundant code for light schema change. (#4)

format fe core

format be core

fix be ut

modify fe meta version

fix rebase error

flush schema into rowset_meta in old table

[refactor](schema change) refact fe light schema change. (#5)

delete the change of schemahash and support get max version schema

* modify for review

* fix be ut

* fix schema change test
2022-07-12 19:41:06 +08:00
331fa50501 [feature](cold-data) move cold data to object storage without losing any feature(BE) (#10280)
This PR supports rowset level data upload on the BE side, so that there can be both cold data and hot data in a tablet,
and there is no necessary to prohibit loading new data to cooled tablets.

Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without
perceiving the underlying filesystem.

The abstracted `RemoteFileSystem` can try local caching strategies with different granularity,
instead of caching segment files as before.

To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory.
In the future, `FileReader`s and `FileWriter`s should be unified.
2022-07-08 12:18:39 +08:00
e5e0dc421d [refactor] Change ALL OLAPStatus to Status (#8855)
Currently, there are 2 status code in BE, one is common/Status.h,
and the other is olap/olap_define.h called OLAPStatus.
OLAPStatus is just an enum type, it is very simple and could not save many informations,
I will unify these code to common/Status.
2022-04-14 11:43:49 +08:00
5a44eeaf62 [refactor] Unify all unit tests into one binary file (#8958)
1. solved the previous delayed unit test file size is too large (1.7G+) and the unit test link time is too long problem problems
2. Unify all unit tests into one file to significantly reduce unit test execution time to less than 3 mins
3. temporarily disable stream_load_test.cpp, metrics_action_test.cpp, load_channel_mgr_test.cpp because it will re-implement part of the code and affect other tests
2022-04-12 15:30:40 +08:00
Pxl
a8af8d2981 [fix](vectorized) fix core dump on get_json_string and add some ut (#8496) 2022-03-17 10:08:31 +08:00
e17aef9467 [refactor] refactor the implement of MemTracker, and related usage (#8322)
Modify the implementation of MemTracker:
1. Simplify a lot of useless logic;
2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing;
3. Add cosume/release cache, trigger a cosume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes;
4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection
5. Added Virtual MemTracker, cosume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently;
6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later;
7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env;
8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.;

Modify where MemTracker is used:
1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code;
2. Added trackers for global objects such as ChunkAllocator and StorageEngine;
3. Added more fine-grained trackers such as ExprContext;
4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode;
5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;
2022-03-11 22:04:23 +08:00
Pxl
cd73a6b84b [chore] fix clang compile error (#7883) 2022-01-26 12:53:35 +08:00
20ef8a6e21 [feature-wip](remote storage)(step1) use a struct instead of string for parameter path, add basic remote method (#7098)
For the first, we need to make a parameter to discribe the data is local or remote.
At then, we need to support some basic function to support the operation for remote storage.
2021-12-22 22:58:23 +08:00
ed7a873a44 [Memory Usage] Implement segment lru cache to save memory of BE (#6829) 2021-10-25 10:07:15 +08:00
6c098e45fc [Optimize][Cache]Implementation of Separated Page Cache (#5008)
#4995
**Implementation of Separated Page Cache**
- Add config "index_page_cache_ratio" to set the ratio of capacity of index page cache
- Change the member of StoragePageCache to maintain two type of cache
- Change the interface of StoragePageCache for selecting type of cache
- Change the usage of page cache in read_and_decompress_page in page_io.cpp
  - add page type as argument
  - check if current page type is available in StoragePageCache (cover the situation of ratio == 0 or 1)
- Add type as argument in superior call of read_and_decompress_page
- Change Unit Test
2021-01-04 12:19:24 +08:00
6fedf5881b [CodeFormat] Clang-format cpp sources (#4965)
Clang-format all c++ source files.
2020-11-28 18:36:49 +08:00
10e1e29711 Remove header file common/names.h (#4945) 2020-11-26 17:00:48 +08:00
45fa67aa71 [Refactor] Remove objects which are only used for unit test (#4751)
We create some objects which are only used for unit tests, it's not necessary,
and it may cause create duplicate instances for some classes.
This patch remove unnecessary instance of class BlockManager and StoragePageCache.
2020-10-18 21:37:12 +08:00
10f822eb43 [MemTracker] make all MemTrackers shared (#4135)
We make all MemTrackers shared, in order to show MemTracker real-time consumptions on the web.
As follows:
1. nearly all MemTracker raw ptr -> shared_ptr
2. Use CreateTracker() to create new MemTracker(in order to add itself to its parent)
3. RowBatch & MemPool still use raw ptrs of MemTracker, it's easy to ensure RowBatch & MemPool destructor exec 
     before MemTracker's destructor. So we don't change these code.
4. MemTracker can use RuntimeProfile's counter to calc consumption. So RuntimeProfile's counter need to be shared 
    too. We add a shared counter pool to store the shared counter, don't change other counters of RuntimeProfile.
Note that, this PR doesn't change the MemTracker tree structure. So there still have some orphan trackers, e.g. RowBlockV2's MemTracker. If you find some shared MemTrackers are little memory consumption & too time-consuming, you could make them be the orphan, then it's fine to use the raw ptr.
2020-07-31 21:57:21 +08:00
e4dc2ec440 [StorageEngine] Make StorageEngine::open return more detailed info (#3761)
StorageEngine::open just return a very vague status info when failed,
we have to check logs to find out the root reason, and it's not
convenient to check logs if we run unit tests in CI dockers.
It would be better to return more detailed failure info to point out
the root reason, for example, it may return error status with message
"file descriptors limit is too small".
2020-06-07 10:21:33 +08:00
b58b1b3953 [metrics] Make DorisMetrics to be a real singleton (#3417) 2020-05-04 09:20:53 +08:00
72f3082358 [Metrics] Add some metrics for container size in BE (#3246)
We can observe the workload of BE, and also it's a way to check
whether there is any problem in BE, like some container increase
too large and lead to OOM.

This patch add the following metrics:
```
Name                                   Description
rowset_count_generated_and_in_use      The total count of rowset id generated and in use since BE last start
unused_rowsets_count                   The total count of unused rowset waiting to be GC
broker_count                           The total count of brokers in management
data_stream_receiver_count             The total count of data stream receivers in management
fragment_endpoint_count                The total count of fragment endpoints of data stream in management, should always equal to data_stream_receiver_count
active_scan_context_count              The total count of active scan contexts
plan_fragment_count                    The total count of plan fragments in executing
load_channel_count                     The total count of load channels in management
result_buffer_block_count              The total count of result buffer blocks for queries, each block has a limited queue size (default 1024)
result_block_queue_count               The total count of queues for fragments, each queue has a limited size (default 20, by config::max_memory_sink_batch_count)
routine_load_task_count                The total count of routine load tasks in executing
small_file_cache_count                 The total count of cached small files' digest info
stream_load_pipe_count                 The total count of stream load pipes, each pipe has a limited buffer size (default 1M)
tablet_writer_count                    The total count of tablet writers
brpc_endpoint_stub_count               The total count of brpc endpoints
```
2020-04-25 16:13:39 +08:00
4a7a88ede1 [LSAN] Fix some memory leak detected by LSAN (#3326) 2020-04-22 22:59:44 +08:00
47a3d5000b [UnitTest] Fix unit test bug in BetaRowset and PageCacheTest (#3157)
1. BlockManager has been added into StorageEngine.
   So StorageEngine should be initialized when starting BetaRowset unit test.

2. Cache should not use the same buf to store value, otherwise the address
   will be freed twice and crash.
2020-03-20 20:37:50 +08:00
fbee3c7722 Remove VersionHash used to comparison in BE (#2358) 2019-12-04 20:09:03 +08:00
d0316d158d Refactor and reorganize the file utils (#2089) 2019-11-11 20:25:41 +08:00
4f7cc7e033 add predicate filter(#1652) (#1775) 2019-10-17 19:20:00 +08:00
3bca253fb3 Fix beta rowset read slow (#1994)
[Bug][BetaRowset] fix beta rowset read slowly with limit

beta rowset do not update raw_rows_read in statistics and will read all
data in tablet when query with limit, which lead to long query time.
2019-10-17 19:19:46 +08:00
2fcb79e3ef Fix wrong group by result bug (#1987) 2019-10-16 07:19:53 +08:00
54fd3652e6 Fix bug in BetaRowsetReader which results in empty result (#1754) 2019-09-06 15:07:23 +08:00
a63989cc61 Use RowsetFactory to create and init RowsetWriter (#1740) 2019-09-04 17:02:43 +08:00
f76dad289e Basic implementation for BetaRowsetReader (#1718) 2019-09-03 13:52:16 +08:00