doris

Author	SHA1	Message	Date
zhangdong	ba3a0922eb	[fix](ipv6)Support IPV6 (#22219 ) fe：Remove restrictions from IPv4 be: thrift server Specify binding address be: Restore changed code of “be/src/olap/task/engine_clone_task.cpp”	2023-07-26 08:40:32 +08:00
Pxl	ca71048f7f	[Chore](status) avoid empty error msg on status (#21454 ) avoid empty error msg on status	2023-07-11 13:48:16 +08:00
Xin Liao	691a988c97	[enhancement](merge-on-write) add async publish task when version is discontinuous for merge on write table when clone (#21025 ) version discontinuity may occur when clone. To deal with this case, add async publish task when version is discontinuous.	2023-06-22 21:50:14 +08:00
Chenyang Sun	accaff1026	[Feature](compaction) wip: single replica compaction (#19237 ) Currently, compaction is executed separately for each backend, and the reconstruction of the index during compaction leads to high CPU usage. To address this, we are introducing single replica compaction, where a specific primary replica is selected to perform compaction, and the remaining replicas fetch the compaction results from the primary replica. The Backend (BE) requests replica information for all peers corresponding to a tablet from the Frontend (FE). This information includes the host where the replica is located and the replica_id. By calculating hash(replica_id), the replica with the smallest hash value is responsible for executing compaction, while the remaining replicas are responsible for fetching the compaction results from this replica. The compaction task producer thread, before submitting a compaction task, checks whether the local replica should fetch from its peer. If it should, the task is then submitted to the single replica compaction thread pool. When performing single replica compaction, the process begins by requesting rowset versions from the target replica. These rowset_versions are then compared with the local rowset versions. The first version that can be fetched is selected.	2023-05-30 21:12:48 +08:00
yujun	42239d635a	[fix](tablet_manager_lock) fix create tablet timeout #20067 (#20069 )	2023-05-28 23:05:13 +08:00
YueW	ae352997b4	[Enhancement](alter inverted index) Improve alter inverted index performance with light weight add or drop inverted index (#19063 )	2023-05-28 11:23:07 +08:00
Jack Drogon	93933308e6	[Feature-WIP](CCR): Add ccr doris interface (WIP) (#17881 )	2023-05-26 23:40:49 +08:00
plat1ko	cdfbfd1f6b	[fix](replica) Fix inconsistent replica id between FE and BE (#18688 )	2023-05-06 11:06:29 +08:00
Zhengguo Yang	52b1bd2c81	[clone](download) fix be clone action download tablet content length overflow (#18851 )	2023-04-28 11:35:17 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
Mingyu Chen	05db6e9b55	[refactor](file-system)(step-2) remove env, file_utils and filesystem_utils (#18009 ) Follow #17586. This PR mainly changes: Remove env/ Remove FileUtils/FilesystemUtils Some methods are moved to LocalFileSystem Remove olap/file_cache Add s3 client cache for s3 file system In my test, the time of open s3 file can be reduced significantly Fix cold/hot separation bug for s3 fs. This is the last PR of #17764. After this, all IO operation should be in io/fs. Except for tests in #17586, I also tested some case related to fs io: clone concurrency query on local/s3/hdfs load error log create and clean disk metrics	2023-03-29 09:00:52 +08:00
AlexYue	f03598f214	[enhance](cooldown) no snapshot or migration action for cooldown tablet (#17658 )	2023-03-27 13:35:32 +08:00
Xin Liao	0801883604	[fix](merge-on-write) fix that delete bitmap is not calculated correctly when clone tablet (#17334 )	2023-03-05 22:04:28 +08:00
Xin Liao	5190a496ac	[fix](rebalance) fix that the clone operation is not performed due to incorrect condition judgment (#17381 )	2023-03-05 21:58:33 +08:00
plat1ko	26a46d8c3f	[fix](cooldown) Handle full clone with cooldowned rowsets (#17069 )	2023-02-28 11:04:01 +08:00
plat1ko	66ceab540a	[fix](replica) Fix inconsistent replica id between BE and FE in corner case of tablet rebalance (#16889 )	2023-02-22 16:21:11 +08:00
zhengshengjun	d013d529c8	[Feature](ipv6)Support IPV6 (#14063 ) Support IPV6 in Apache Doris, the main changes are: 1. enable binding to IPV6 address if network priority in config file contains an IPV6 CIDR string 2. BRPC and HTTP support binding to IPV6 address 3. BRPC and HTTP support visiting IPV6 Services	2023-02-14 21:43:10 +08:00
plat1ko	7482b6bad2	[fix](cooldown) Add cold_compaction_lock to serialize any operations which may delete the input rowsets of cold data compaction (#16742 ) Add cold_compaction_lock to serialize tablet clone, cold data compaction and follow cooldowned data	2023-02-14 21:38:33 +08:00
plat1ko	5014ad03e7	[feature](cooldown) Auto delete unused remote files (#16588 )	2023-02-13 23:59:39 +08:00
AlexYue	8317c4a752	[Bug](cooldown) set new replica id when early exit in doing clone when no missed versions (#16644 ) * set new replica id * reduce lock * reset when replica id is different	2023-02-13 14:39:03 +08:00
AlexYue	6a8fc35b78	[Bug](Cooldown) fix load balance causing no cooldown replica (#16641 )	2023-02-12 16:47:38 +08:00
plat1ko	f3aea7f0f0	[Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744 )	2022-12-11 23:33:18 +08:00
AlexYue	2389a90cd0	[enhancement](snapshot) add missed version log when make_snapshot in engine clone task (#14284 )	2022-11-24 14:51:28 +08:00
AlexYue	15eb07b829	[BugFix](file cache) don't clean clone dir when doing _gc_unused_file_caches (#14194 ) * use another file_size overload for noexcept * don't gc clone dir * use better status	2022-11-14 11:35:08 +08:00
Xinyi Zou	0b945fe361	[enhancement](memtracker) Refactor mem tracker hierarchy (#13585 ) mem tracker can be logically divided into 4 layers: 1)process 2)type 3)query/load/compation task etc. 4)exec node etc. type includes enum Type { GLOBAL = 0, // Life cycle is the same as the process, e.g. Cache and default Orphan QUERY = 1, // Count the memory consumption of all Query tasks. LOAD = 2, // Count the memory consumption of all Load tasks. COMPACTION = 3, // Count the memory consumption of all Base and Cumulative tasks. SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks. CLONE = 5, // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots. BATCHLOAD = 6, // Count the memory consumption of all EngineBatchLoadTask. CONSISTENCY = 7 // Count the memory consumption of all EngineChecksumTask. } Object pointers are no longer saved between each layer, and the values of process and each type are periodically aggregated. other fix: In [fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528, I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. But in the actual test, the accuracy of this part of the memory cannot be guaranteed, so put it back to the orphan mem tracker again.	2022-11-08 09:52:33 +08:00
Xin Liao	554ba40b13	[feature-wip](unique-key-merge-on-write) update delete bitmap when increamental clone (#12364 )	2022-09-09 17:03:27 +08:00
plat1ko	db07e51cd3	[refactor](status) Refactor status handling in agent task (#11940 ) Refactor TaggableLogger Refactor status handling in agent task: Unify log format in TaskWorkerPool Pass Status to the top caller, and replace some OLAPInternalError with more detailed error message Status Premature return with the opposite condition to reduce indention	2022-08-29 12:06:01 +08:00
Lightman	3e13b7d2c2	[Bugfix](light-shema-change) fix _finish_clone dead lock (#11823 ) In engine_clone_task.cpp, it use tablet->tablet_schema() to create rowset, but in the method, it need a lock that already locked in engine_clone_task.cpp:514. It use cloned_tablet_meta->tablet_schema() originally, but modified in #11131. It need to revert to use cloned_tablet_meta->tablet_schema().	2022-08-17 09:10:08 +08:00
Lightman	b35daf0a04	[improvement](light-schema-change) Support tablet schema cache (#11131 )	2022-08-01 12:18:00 +08:00
Xinyi Zou	73d8f5901d	fix mem tracker limiter (#11376 )	2022-08-01 09:44:04 +08:00
plat1ko	a6537a90cd	[Enhancement] Garbage collection of unused data on remote storage backend (#10731 ) * [Feature](cold_on_s3) support unused remote rowset gc * return aborted when skip drop tablet * perform unused remote rowset gc	2022-07-29 14:38:39 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
plat1ko	331fa50501	[feature](cold-data) move cold data to object storage without losing any feature(BE) (#10280 ) This PR supports rowset level data upload on the BE side, so that there can be both cold data and hot data in a tablet, and there is no necessary to prohibit loading new data to cooled tablets. Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without perceiving the underlying filesystem. The abstracted `RemoteFileSystem` can try local caching strategies with different granularity, instead of caching segment files as before. To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory. In the future, `FileReader`s and `FileWriter`s should be unified.	2022-07-08 12:18:39 +08:00
plat1ko	f4e2f78a1a	[fix] Fix the bug that data balance causes tablet loss (#9971 ) 1. Provide a FE conf to test the reliability in single replica case when tablet scheduling are frequent. 2. According to #6063, almost apply this fix on current code.	2022-06-15 09:52:56 +08:00
Xinyi Zou	85362a907e	[fix](mem tracker) Fix some memory leaks, inaccurate statistics, core dump, deadlock bugs (#10072 ) 1. Fix the memory leak. When the load task is canceled, the `IndexChannel` and `NodeChannel` mem trackers cannot be destructed in time. 2. Fix Load task being frequently canceled by oom and inaccurate `LoadChannel` mem tracker limit, and rewrite the variable name of `mem limit` in `LoadChannel`. 3. Fix core dump, when logout task mem tracker, phmap erase fails, resulting in repeated logout of the same tracker. 4. Fix the deadlock, when add_child_tracker mem limit exceeds, calling log_usage causes `_child_trackers_lock` deadlock. 5. Fix frequent log printing when thread mem tracker limit exceeds, which will affect readability and performance. 6. Optimize some details of mem tracker display.	2022-06-14 21:38:37 +08:00
plat1ko	4cd579b155	[refactor] Check status precise_code instead of construct OLAPInternalError (#9514 ) * check status precise_code instead of construct OLAPInternalError * move is_io_error to Status	2022-05-12 15:39:29 +08:00
Xinyi Zou	b34ed43ec9	[feature-wip] (memory tracker) (step6, End) Fix some details (#9301 ) 1. Fix LoadTask, ChunkAllocator, TabletMeta, Brpc, the accuracy of memory track. 2. Modified some MemTracker names, deleted some unnecessary trackers, and improved readability. 3. More powerful MemTracker debugging capabilities. 4. Avoid creating TabletColumn temporary objects and improve BE startup time by 8%. 5. Fix some other details.	2022-05-10 18:17:09 +08:00
caiconghui	580ce38a3f	[fix](schema_hash) Fix bug that introduced by removing schema_hash (#9449 )	2022-05-08 21:03:10 +08:00
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305 ) - sh build-support/clang-format.sh to clang-format all c++ code	2022-04-29 16:14:22 +08:00
yiguolei	e5e0dc421d	[refactor] Change ALL OLAPStatus to Status (#8855 ) Currently, there are 2 status code in BE, one is common/Status.h, and the other is olap/olap_define.h called OLAPStatus. OLAPStatus is just an enum type, it is very simple and could not save many informations, I will unify these code to common/Status.	2022-04-14 11:43:49 +08:00
yiguolei	c872793a23	remove rowset converter since it is useless (#8974 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-04-13 10:40:12 +08:00
Zhengguo Yang	290366787c	[refactor] refactor code, replace some file with stl libs (#8759 ) 1. replace ConditionVariables with std::condition_variable 2. repalace Mutex with std::mutex 3. repalce MonoTime with std::chrono	2022-04-13 09:55:29 +08:00
Xinyi Zou	519305cb22	[feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage (#8669 ) Based on #8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.	2022-04-08 09:02:26 +08:00
caiconghui	c69dd54116	[refactor](mutex) Use std::mutex to replace Mutex and refactor some lock logic (#8452 )	2022-03-24 14:50:02 +08:00
Xinyi Zou	eeae516e37	[Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476 ) Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G Implement a new way of memory statistics based on TCMalloc New/Delete Hook, MemTracker and TLS, and it is expected that all memory new/delete/malloc/free of the BE process can be counted.	2022-03-20 23:06:54 +08:00
caiconghui	c86d469baf	[Refactor](storage_engine) Use std::shared_mutex to replace RWMutex (#8387 )	2022-03-11 18:14:24 +08:00
yiguolei	0ff7de4157	[refactor] remove agent status (#8273 ) There are 3 error code types in BE: OLAPStatus AgentStatus Status. It is very confused and sometimes conflict during write code. I will try to unify them to Status.	2022-03-09 13:04:50 +08:00
Mingyu Chen	26289c28b0	[fix](load)(compaction) Fix NodeChannel coredump bug and modify some compaction logic (#8072 ) 1. Fix the problem of BE crash caused by destruct sequence. (close #8058) 2. Add a new BE config `compaction_task_num_per_fast_disk` This config specify the max concurrent compaction task num on fast disk(typically .SSD). So that for high speed disk, we can execute more compaction task at same time, to compact the data as soon as possible 3. Avoid frequent selection of unqualified tablet to perform compaction. 4. Modify some log level to reduce the log size of BE. 5. Modify some clone logic to handle error correctly.	2022-02-17 10:52:08 +08:00
yiguolei	7d7e3a39f5	[refactor] Remove snapshot converter and unused Protobuf Definitions (#8026 ) 1. remove snapshot converter 2. remove unused protobuf definitions 3. move some macro as const variables	2022-02-12 16:06:04 +08:00
pengxiangyu	20ef8a6e21	[feature-wip](remote storage)(step1) use a struct instead of string for parameter path, add basic remote method (#7098 ) For the first, we need to make a parameter to discribe the data is local or remote. At then, we need to support some basic function to support the operation for remote storage.	2021-12-22 22:58:23 +08:00

1 2

79 Commits