doris

Author	SHA1	Message	Date
Pxl	a15a0b9193	[Chore](build) use file(GLOB_RECURSE xxx CONFIGURE_DEPENDS) to replace set cpp (#20461 )	2023-06-08 19:36:21 +08:00
Pxl	c9b4eaea76	[Chore](storage) change FieldType to enum class #18500	2023-04-10 08:53:44 +08:00
Mingyu Chen	05db6e9b55	[refactor](file-system)(step-2) remove env, file_utils and filesystem_utils (#18009 ) Follow-up to #17586. This PR mainly: (1) removes env/; (2) removes FileUtils/FilesystemUtils, moving some methods to LocalFileSystem; (3) removes olap/file_cache; (4) adds an S3 client cache for the S3 file system, which in my tests significantly reduces the time to open an S3 file; (5) fixes a cold/hot separation bug for the S3 fs. This is the last PR of #17764. After this, all IO operations should be in io/fs. In addition to the tests in #17586, I also tested some cases related to fs IO: clone; concurrent queries on local/s3/hdfs; load error log creation and cleanup; disk metrics.	2023-03-29 09:00:52 +08:00
yiguolei	90b12143a3	[refactor](remove unused code) remove runtime tuple structure and useless utils class (#16237 )	2023-01-30 16:45:14 +08:00
plat1ko	f3aea7f0f0	[Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744 )	2022-12-11 23:33:18 +08:00
plat1ko	db07e51cd3	[refactor](status) Refactor status handling in agent task (#11940 ) Refactor TaggableLogger. Refactor status handling in agent task: unify the log format in TaskWorkerPool; pass Status to the top caller, and replace some OLAPInternalError with a Status carrying a more detailed error message; return early on the opposite condition to reduce indentation.	2022-08-29 12:06:01 +08:00
plat1ko	331fa50501	[feature](cold-data) move cold data to object storage without losing any feature(BE) (#10280 ) This PR supports rowset-level data upload on the BE side, so that a tablet can hold both cold data and hot data, and there is no need to prohibit loading new data into cooled tablets. Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without being aware of the underlying filesystem. The abstracted `RemoteFileSystem` can try local caching strategies of different granularity, instead of caching whole segment files as before. To avoid conflicts with the code in be/src/io, we temporarily put the file-system-related code in the be/src/io/fs directory. In the future, `FileReader`s and `FileWriter`s should be unified.	2022-07-08 12:18:39 +08:00
yiguolei	e5e0dc421d	[refactor] Change ALL OLAPStatus to Status (#8855 ) Currently, there are two status codes in BE: one is in common/Status.h, and the other, called OLAPStatus, is in olap/olap_define.h. OLAPStatus is just an enum type; it is very simple and cannot carry much information, so I will unify this code into common/Status.	2022-04-14 11:43:49 +08:00
Zhengguo Yang	5a44eeaf62	[refactor] Unify all unit tests into one binary file (#8958 ) 1. Solves the previous problems that the unit test file size is too large (1.7G+) and the unit test link time is too long 2. Unifies all unit tests into one file to significantly reduce unit test execution time to less than 3 mins 3. Temporarily disables stream_load_test.cpp, metrics_action_test.cpp and load_channel_mgr_test.cpp because they re-implement part of the code and affect other tests	2022-04-12 15:30:40 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
Yingchun Lai	e4dc2ec440	[StorageEngine] Make StorageEngine::open return more detailed info (#3761 ) StorageEngine::open just returns a very vague status on failure, so we have to check the logs to find the root cause, which is inconvenient when running unit tests in CI Docker containers. It would be better to return more detailed failure info that points out the root cause; for example, it may return an error status with the message "file descriptors limit is too small".	2020-06-07 10:21:33 +08:00
LingBin	324f1b8f51	Unify the type of path_hash to `size_t` (#2324 ) The type of path hash should be `size_t` (i.e. `uint32_t`), but the current code mixes `int64_t`, `int32_t` and `size_t`	2019-11-28 18:48:52 +08:00
Seaven	d0316d158d	Refactor and reorganize the file utils (#2089 )	2019-11-11 20:25:41 +08:00
Mingyu Chen	c643cbd30c	Optimize the load performance for large file (#1798 ) The current load process is: Tablet Sink -> Tablet Channel Mgr -> Tablets Channel -> Delta Writer -> MemTable -> Flush to disk. In the path Tablets Channel -> DeltaWriter -> MemTable -> Flush to disk, the following operations are performed: (1) tuples are inserted into different memtables according to tablet ID; (2) when a memtable's size reaches the threshold, it is written to disk. For a single load task, these operations are effectively executed in a single thread. In fact, memtable insertion and memtable flush can be executed concurrently. Performing them in separate threads prevents memtable insertion from being delayed by slow disk writes. In the new implementation, I added a MemTableFlushExecutor class with a set of flush queues and corresponding worker threads. By default, each data directory uses two worker threads for flush, which can be changed via the BE parameter flush_thread_num_per_store. DeltaWriter pushes the full memtable to MemTableFlushExecutor for flushing and creates a new memtable to receive new data. This design improves the performance of loading large files. In single-host testing, the time to load a 1GB text file is reduced from 48 seconds to 29 seconds.	2019-09-25 13:49:32 +08:00
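
The flush hand-off described in commit c643cbd30c (#1798) can be illustrated with a minimal sketch. This is not the actual MemTableFlushExecutor code: the `FlushQueue` class, its `submit` method and the `demo_flush` helper are hypothetical names invented for this example, and a "memtable flush" is modeled as an arbitrary callable. It only shows the general idea of a writer thread pushing flush tasks onto a queue that a small set of worker threads (two per data directory by default, according to the commit message) drains in the background.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: one flush queue with its own worker threads,
// standing in for the per-data-directory queues described in the commit.
class FlushQueue {
public:
    explicit FlushQueue(std::size_t num_workers) {
        for (std::size_t i = 0; i < num_workers; ++i) {
            _workers.emplace_back([this] { _run(); });
        }
    }

    // Drains all pending tasks, then joins the workers.
    ~FlushQueue() {
        {
            std::lock_guard<std::mutex> lock(_mu);
            _stopped = true;
        }
        _cv.notify_all();
        for (auto& worker : _workers) worker.join();
    }

    FlushQueue(const FlushQueue&) = delete;
    FlushQueue& operator=(const FlushQueue&) = delete;

    // Called by the writer: hand off a full "memtable" and return at once,
    // so the writer can keep inserting into a fresh one.
    void submit(std::function<void()> flush_task) {
        assert(flush_task && "flush task must be callable");
        {
            std::lock_guard<std::mutex> lock(_mu);
            _tasks.push(std::move(flush_task));
        }
        _cv.notify_one();
    }

private:
    void _run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(_mu);
                _cv.wait(lock, [this] { return _stopped || !_tasks.empty(); });
                if (_tasks.empty()) return;  // stopped and nothing left to flush
                task = std::move(_tasks.front());
                _tasks.pop();
            }
            task();  // the slow disk write happens outside the lock
        }
    }

    std::mutex _mu;
    std::condition_variable _cv;
    std::queue<std::function<void()>> _tasks;
    bool _stopped = false;
    std::vector<std::thread> _workers;
};

// Simulates a writer handing off `num_memtables` full memtables and
// returns how many flush tasks actually ran.
std::size_t demo_flush(std::size_t num_memtables, std::size_t workers_per_dir) {
    std::atomic<std::size_t> flushed{0};
    {
        FlushQueue queue(workers_per_dir);
        for (std::size_t i = 0; i < num_memtables; ++i) {
            queue.submit([&flushed] { flushed.fetch_add(1); });
        }
    }  // destructor drains the queue and joins the workers
    return flushed.load();
}
```

Calling `demo_flush(8, 2)` submits eight tasks to a queue served by two workers and returns once all of them have run. The design choice being illustrated is that `submit` only takes the lock long enough to enqueue, so the submitting thread is never blocked on the flush itself.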

14 Commits