add some logic to optimize compaction:
1. Separate base & cumulative compaction, in case a long-running base
compaction blocks cumulative compaction.
2. Fix the level size in cumulative compaction so that files below 64M get a
correct level size. When choosing rowsets for compaction, the policy now
ignores big rowsets, which cuts CPU usage by about 25% under high-frequency
concurrent load (a sketch of the level idea follows this list).
3. Remove the skip-window restriction so a rowset can be compacted right
after it is generated, since we no longer delete rowsets after compaction.
This greatly reduces the compaction score under concurrent load.
4. Remove the version consistency check in can_do_compaction; we always
choose a consecutive range of rowsets to compact, so that logic is useless.
With the logic above, the compaction score and CPU cost improve substantially
under concurrent load.
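A minimal sketch of the level-size idea from change 2, with assumed level boundaries and hypothetical names (the real policy lives in the cumulative compaction code):

```cpp
#include <cstdint>

// Hypothetical sketch: bucket a rowset's on-disk size into a level so that
// rowsets below 64MB land in meaningful power-of-two levels instead of one
// catch-all bucket. Level boundaries here are assumptions for illustration.
static int64_t level_size(int64_t size_bytes) {
    for (int64_t level = 64LL << 20; level >= 64LL << 10; level >>= 1) {
        if (size_bytes >= level) return level;
    }
    return 0;  // very small rowsets all share the lowest level
}

// When picking candidates, skip "big" rowsets already at the top level,
// so repeated scans over large data stop burning CPU.
static bool is_candidate(int64_t size_bytes) {
    return level_size(size_bytes) < (64LL << 20);
}
```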
Co-authored-by: yixiutt <yixiu@selectdb.com>
* compact quickly for small data import #9791
1. Merge small versions of rowsets as soon as possible to increase the import frequency of small-version data.
2. A "small version" is a rowset whose row count is less than config::small_compaction_rowset_rows (default 1000); the selection rule is sketched below.
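A rough sketch of that selection rule, with a hypothetical Rowset struct standing in for the real one:

```cpp
#include <cstdint>
#include <vector>

struct Rowset {
    int64_t num_rows;
};

// Hypothetical sketch: pick a consecutive run of "small" rowsets (fewer rows
// than the configured threshold, default 1000) so they can be merged as soon
// as possible instead of waiting for the normal cumulative policy.
std::vector<Rowset> pick_small_versions(const std::vector<Rowset>& rowsets,
                                        int64_t small_rows_threshold = 1000) {
    std::vector<Rowset> picked;
    for (const auto& rs : rowsets) {
        if (rs.num_rows >= small_rows_threshold) break;  // keep the run consecutive
        picked.push_back(rs);
    }
    return picked;
}
```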
1. Provide an FE conf to test reliability in the single-replica case when tablet scheduling is frequent.
2. According to #6063, apply almost all of that fix to the current code.
In concurrent stream load, FE runs publish version tasks concurrently,
which can cause the publish tasks to be handled out of order on BE.
For example:
FE publishes tasks with versions 1 2 3 4
BE may handle the tasks in the order 1 2 4 3
In the case above, when reporting tablet info, BE finds that version 4 is
published but version 3 is not yet visible, so it reports a version miss to
FE. FE then sets the replica's lastFailedVersion, which finally makes
transaction commits fail for lack of a quorum of healthy replicas.
Add a time condition: only report a version miss if the version has been
missing for 60 seconds (sketched below).
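A minimal sketch of that time condition, with hypothetical names (the real check lives in the tablet report path):

```cpp
#include <chrono>
#include <cstdint>
#include <map>

// Hypothetical sketch: remember when a version was first seen missing, and
// only report it to FE after 60 seconds, so a temporarily out-of-order
// publish does not trigger a false version miss.
class VersionMissTracker {
public:
    bool should_report(int64_t missing_version) {
        using Clock = std::chrono::steady_clock;
        auto now = Clock::now();
        auto it = first_seen_.find(missing_version);
        if (it == first_seen_.end()) {
            first_seen_[missing_version] = now;
            return false;  // just noticed; give publish time to catch up
        }
        return now - it->second >= std::chrono::seconds(60);
    }

private:
    std::map<int64_t, std::chrono::steady_clock::time_point> first_seen_;
};
```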
As described in the issue, running compaction and schema change at the same time may lead to version intersection.
Overview of changes:
1. Do not do compaction before the schema change is actually executed.
2. Set the tablet as bad when it has a version intersection.
3. Do not do schema change when appropriate versions to delete cannot be found in the new tablet.
4. Do not replace rowsets after compaction if the tablet's rowsets have changed in the meantime (sketched after this list).
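A hedged sketch of the idea behind change 4, with hypothetical names; in the real code this check would run under the tablet's meta lock:

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: before installing the compaction output, re-check
// that every input rowset is still part of the tablet; if a schema change
// swapped rowsets out in the meantime, abort instead of committing.
bool inputs_still_present(const std::set<std::string>& current_rowset_ids,
                          const std::vector<std::string>& input_rowset_ids) {
    return std::all_of(input_rowset_ids.begin(), input_rowset_ids.end(),
                       [&](const std::string& id) {
                           return current_rowset_ids.count(id) > 0;
                       });
}
```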
1. Fix LRU cache MemTracker consumption going negative.
2. Fix compaction cache MemTracker not tracking anything.
3. Add a USE_MEM_TRACKER compile option (sketched after this list).
4. Make sure the malloc/free hooks are never stopped.
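A sketch of how such a compile option might look; the macro and variable names here are illustrative, not the actual BE ones:

```cpp
#include <atomic>
#include <cstddef>

// Illustrative stand-in for a tracker: when the option is off, the
// consume/release macros compile down to nothing, so builds without
// tracking pay no runtime cost.
std::atomic<std::ptrdiff_t> g_tracked_bytes{0};

#ifdef USE_MEM_TRACKER
#define MEM_TRACKER_CONSUME(bytes) g_tracked_bytes.fetch_add(bytes)
#define MEM_TRACKER_RELEASE(bytes) g_tracked_bytes.fetch_sub(bytes)
#else
#define MEM_TRACKER_CONSUME(bytes) ((void)0)
#define MEM_TRACKER_RELEASE(bytes) ((void)0)
#endif
```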
1. Add TStorageMigrationReqV2 and EngineStorageMigrationTask to support the migration action
2. Change TabletManager::create_tablet() for remote storage
3. Change TabletManager::try_delete_unused_tablet_path() for remote storage
Currently, there are two status codes in BE: one is common/Status.h,
and the other, in olap/olap_define.h, is called OLAPStatus.
OLAPStatus is just an enum type; it is very simple and cannot carry much information,
so I will unify these into common/Status.
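A minimal sketch of the difference, with a stripped-down Status class (the real common/Status is richer, carrying an error code plus context):

```cpp
#include <string>
#include <utility>

// Illustrative sketch: an enum like OLAPStatus can only say *which* error
// occurred, while a Status object can also carry a human-readable message.
class Status {
public:
    static Status OK() { return Status(); }
    static Status InternalError(std::string msg) { return Status(std::move(msg)); }
    bool ok() const { return msg_.empty(); }
    const std::string& msg() const { return msg_; }

private:
    Status() = default;
    explicit Status(std::string msg) : msg_(std::move(msg)) {}
    std::string msg_;
};

// Call sites can log st.msg(), which a bare enum value cannot provide.
Status load_rowset(bool corrupted) {
    if (corrupted) return Status::InternalError("rowset footer checksum mismatch");
    return Status::OK();
}
```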
1. We forgot to check disk capacity when writing data (sketched after this list).
2. TODO: the user-specified disk capacity is not used now. We need to find a way to use it.
3. Avoid printing too many compaction logs when there is no suitable version for compaction.
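A minimal sketch of such a capacity check using standard std::filesystem; the function name and threshold are assumptions:

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

// Hypothetical sketch: refuse a write when it would push the disk past a
// usage threshold. The 95% default is an assumption for illustration.
bool has_capacity_for(const std::filesystem::path& data_dir,
                      std::uintmax_t incoming_bytes,
                      double max_used_ratio = 0.95) {
    std::error_code ec;
    auto info = std::filesystem::space(data_dir, ec);
    if (ec) return false;  // treat unknown capacity as full, to be safe
    std::uintmax_t used_after = info.capacity - info.available + incoming_bytes;
    return static_cast<double>(used_after) <
           max_used_ratio * static_cast<double>(info.capacity);
}
```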
Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G
Implement a new way of memory statistics based on the TCMalloc New/Delete
Hooks, MemTracker, and TLS; the expectation is that all memory
new/delete/malloc/free in the BE process can be counted (sketched below).
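A minimal sketch of the hook mechanism, assuming gperftools' MallocHook/MallocExtension APIs are available; a thread-local byte counter stands in for the per-thread MemTracker:

```cpp
#include <atomic>
#include <cstddef>

#include <gperftools/malloc_extension.h>
#include <gperftools/malloc_hook.h>

// Sketch only: count every allocation/deallocation into a thread-local
// counter via TCMalloc's New/Delete hooks. The real code would forward
// these numbers to the thread's MemTracker.
static thread_local std::ptrdiff_t tls_bytes = 0;

static void NewHook(const void* /*ptr*/, size_t size) {
    tls_bytes += static_cast<std::ptrdiff_t>(size);
}

static void DeleteHook(const void* ptr) {
    // The delete hook does not receive a size, so ask TCMalloc for it.
    tls_bytes -= static_cast<std::ptrdiff_t>(
            MallocExtension::instance()->GetAllocatedSize(ptr));
}

void install_memory_hooks() {
    MallocHook::AddNewHook(&NewHook);
    MallocHook::AddDeleteHook(&DeleteHook);
}
```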
1.
This bug was introduced by #8209.
Error in fe.warn.log:
```
java.lang.IllegalStateException: 560278
at com.google.common.base.Preconditions.checkState(Preconditions.java:508) ~[spark-dpp-0.15-SNAPSHOT.jar:0.15-SNAPSHOT]
at org.apache.doris.catalog.TabletInvertedIndex.getReplica(TabletInvertedIndex.java:462) ~[palo-fe.jar:0.15-SNAPSHOT]
at org.apache.doris.catalog.Catalog.replayBackendReplicasInfo(Catalog.java:6941) ~[palo-fe.jar:0.15-SNAPSHOT]
at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:626) [palo-fe.jar:0.15-SNAPSHOT]
at org.apache.doris.catalog.Catalog.replayJournal(Catalog.java:2446) [palo-fe.jar:0.15-SNAPSHOT]
at org.apache.doris.master.Checkpoint.doCheckpoint(Checkpoint.java:116) [palo-fe.jar:0.15-SNAPSHOT]
at org.apache.doris.master.Checkpoint.runAfterCatalogReady(Checkpoint.java:74) [palo-fe.jar:0.15-SNAPSHOT]
at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:0.15-SNAPSHOT]
at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:0.15-SNAPSHOT]
```
Since the reporting of a tablet and the deletion of a tablet are two independent events
that are not mutually exclusive, it may happen that the tablet is deleted first and reported later.
2.
Change the tablet report info. Now, the version of a tablet reported from BE
is the largest continuous version.
E.g., for versions [1,2,3,5,7], the reported version of this tablet will be 3.
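A small sketch of that computation, assuming the tablet's versions arrive as a sorted list:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of "largest continuous version": given the sorted versions a tablet
// owns, return the last version of the gap-free prefix. For [1,2,3,5,7]
// this returns 3.
int64_t largest_continuous_version(const std::vector<int64_t>& versions) {
    if (versions.empty()) return 0;
    int64_t last = versions[0];
    for (std::size_t i = 1; i < versions.size(); ++i) {
        if (versions[i] != last + 1) break;  // gap found
        last = versions[i];
    }
    return last;
}
```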
In the original tablet report, version-missing information was conveyed by
combining two pieces of information:
1. the maximum consecutive version number
2. the `version_miss` field
The logic of this approach is confusing and inconsistent with the logic used
to check for missing versions when querying.
After the change, we directly use the version-checking logic used in queries,
and set `version_miss` to true if a missing version is found.
On the FE processing side: originally, only the **bad replica** information
was synchronized among FEs, but not the **version missing** information, so
non-master FEs were not aware of missing versions.
In the new design, we deprecate the original log persistence class `BackendTabletsInfo` and use the new
`BackendReplicasInfo` to record replica report information, writing both **bad** and **version missing**
information to metadata so that other FEs can synchronize it.
1. Fix a BE crash caused by destruction order. (close #8058)
2. Add a new BE config `compaction_task_num_per_fast_disk`.
This config specifies the max number of concurrent compaction tasks on a fast disk (typically SSD),
so that on high-speed disks we can run more compaction tasks at the same time
and compact the data as soon as possible (see the sketch after this list).
3. Avoid frequently selecting unqualified tablets for compaction.
4. Lower some log levels to reduce the BE log size.
5. Fix some clone logic to handle errors correctly.
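A hedged sketch of how the per-disk limit from change 2 might be applied; the default values and struct name here are illustrative:

```cpp
#include <cstdint>

// Illustrative sketch: pick the max concurrent compaction tasks from the
// disk type, mirroring the config named in the list above.
struct CompactionConfig {
    int32_t compaction_task_num_per_disk = 2;       // slow disks (HDD), assumed default
    int32_t compaction_task_num_per_fast_disk = 4;  // fast disks (SSD), assumed default
};

int32_t max_compaction_tasks(const CompactionConfig& cfg, bool is_ssd) {
    return is_ssd ? cfg.compaction_task_num_per_fast_disk
                  : cfg.compaction_task_num_per_disk;
}
```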
First, we need a parameter to describe whether the data is local or remote.
Then, we need to support some basic functions for operating on remote storage.
1. Fix a bug of "UNKNOWN Operation Type 91"
2. Support using the resource_tag property of a user to limit BE usage
3. Add new FE config `disable_tablet_scheduler` to disable the tablet scheduler.
4. Add documents for resource tag.
5. Modify the default value of FE config `default_db_data_quota_bytes` to 1PB.
6. Add a new BE config `disable_compaction_trace_log` to disable the trace log of compaction time cost.
7. Modify the default value of BE config `remote_storage_read_buffer_mb` to 16MB.
8. Fix `show backends` result errors.
9. Add new BE config `external_table_connect_timeout_sec` to set the timeout when connecting to ODBC and MySQL tables.
10. Modify the issue template to enable blank issues, for release notes or other specific usages.
11. Fix a bug in the alpha_row_set split_range() function.
1. Fix a potential BE coredump when sending batches during data load. (Fix [Bug] BE crash when loading data #6656)
2. Fix a potential BE coredump when doing schema change. (Fix [Bug] BE crash when doing alter task #6657)
3. Optimize the metric base_compaction_request_failed.
4. Add an Order column in the show tablet result. (Fix [Feature] Add order column in SHOW TABLET stmt result #6658)
5. Fix a bug that a tablet repair slot was not released. (Fix [Bug] Tablet scheduler stop working #6659)
6. Fix a bug that the REPLICA_MISSING error could not be handled. (Fix [Bug] REPLICA_MISSING error can not be handled. #6660)
7. Modify column names of SHOW PROC "/cluster_balance/cluster_load_stat"
8. Optimize the result of SHOW PROC "/statistic" to show COLOCATE_MISMATCH tablets (Fix [Feature] the health status of colocate table's tablet is not shown in show proc statistic #6663)
9. Fix a bug that show load where state='pending' could not be executed. (Fix [Bug] show load where state='pending' can not be executed. #6664)
The problem I want to solve is described in #6355.
This CL mainly changes:
1. Support compacting tablets under alter operations.
On the BE side, the compaction logic will also select tablets whose state is "TABLET_NOTREADY" for cumulative compaction.
2. Remove the "alter_task" field in the tablet's meta on the BE side.
The "alter_task" field has not been used for a long time.
3. Support doing delete operations while a table is undergoing an alter operation.
Previously, when a table was being altered, executing a delete would return the error: Table's state is not NORMAL.
Now, a delete can be executed successfully as long as the condition column is not under schema change,
and the delete condition will be applied to all materialized indexes.
The version information of the tablet is stored in memory
in an adjacency-graph data structure.
As new versions are written and old versions are deleted,
the data structure accumulates empty vertices with no associated edges (orphan vertices).
These orphan vertices should be removed somehow (sketched below).
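A toy sketch of the sweep, assuming each vertex keeps an adjacency list of every edge touching it (the real version graph is more involved):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Sketch of the orphan-vertex problem: versions form a graph keyed by
// version number. Once edits remove all edges touching a vertex, the
// vertex lingers with an empty adjacency list; sweep it away periodically.
using VersionGraph = std::unordered_map<int64_t, std::vector<int64_t>>;

void remove_orphan_vertices(VersionGraph& graph) {
    for (auto it = graph.begin(); it != graph.end();) {
        if (it->second.empty()) {
            it = graph.erase(it);  // orphan: no edges left
        } else {
            ++it;
        }
    }
}
```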
1. Give some MemTrackers a reasonable parent MemTracker instead of the root tracker.
2. Make each MemTracker easy to trace.
3. Add a show level to MemTracker to control how many trackers are shown on the web page.
1. Add /api/compaction/run_status to show the running compaction tasks.
2. Support doing base and cumulative compaction for one tablet at the same time.
3. Modify some log levels.
4. Add a feedback document.
1. If cumulative compaction compacts only one rowset, the old rowset will not be put into `stale_rowset_meta_map`.
2. Show the rowset id in `/api/compaction/show`.
Co-authored-by: xxiao2018 <benghua3_1@sina.com>
In version 0.13, we introduced a more efficient compaction logic.
This logic maintains multiple version paths of the tablet,
which avoids -230 errors and also supports incremental clone.
But the previous incremental clone uses the incremental rowset metas recorded in `incr_rs_meta`.
At present, the incremental rowset metas recorded in `incr_rs_meta` duplicate the records
in `stale_rs_meta`, and the current clone logic does not adapt to the
new multi-version paths, so in many cases incremental clone is not triggered.
This CL mainly modified:
1. Removed the `incr_rs_meta` metadata.
2. Modified the clone logic: an incremental clone now tries to read the rowsets in `stale_rs_meta`.
3. Deleted a lot of code that was previously used for version compatibility.
At present, the use of VLOG in the code is quite confusing.
There is the VLOG_XX format inherited from Impala, and also the raw VLOG(number) format.
The VLOG(number) format has no unified specification, so this PR standardizes the use of VLOG (sketched below).
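A hedged sketch of the standardization on top of glog; the macro names and level numbers here are assumptions for illustration:

```cpp
#include <glog/logging.h>

// Illustrative sketch: give every verbosity number a named macro, so the
// code never calls VLOG(number) directly and levels stay consistent.
#define VLOG_CRITICAL VLOG(1)
#define VLOG_NOTICE   VLOG(3)
#define VLOG_DEBUG    VLOG(7)
#define VLOG_TRACE    VLOG(10)

void example() {
    VLOG_NOTICE << "picked 3 rowsets for cumulative compaction";
    VLOG_TRACE << "per-row detail that only appears at -v=10";
}
```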
Regardless of whether the tablet is submitted for compaction or not,
we need to call 'reset_compaction' to clean up the base_compaction and cumulative_compaction objects
in the tablet, because these two objects hold a shared_ptr to the tablet itself.
If they are not cleaned up, the reference count of the tablet will always be greater than 1,
so the tablet can never be collected by the trash sweep. (TabletManager::start_trash_sweep)
This bug was introduced by #4891. A sketch of the cycle follows.
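A minimal, self-contained sketch of that cycle, with names simplified from the description above:

```cpp
#include <memory>

// The tablet owns a compaction object, and the compaction object holds a
// shared_ptr back to the tablet. Until reset_compaction() drops that
// back-pointer, the tablet's use_count stays above 1, so a trash sweep
// that waits for use_count == 1 can never free it.
struct Tablet;

struct Compaction {
    std::shared_ptr<Tablet> tablet;  // back-pointer that pins the tablet
};

struct Tablet {
    std::unique_ptr<Compaction> cumulative_compaction;
    void reset_compaction() { cumulative_compaction.reset(); }  // break the cycle
};

int main() {
    auto tablet = std::make_shared<Tablet>();
    tablet->cumulative_compaction = std::make_unique<Compaction>();
    tablet->cumulative_compaction->tablet = tablet;  // use_count == 2
    tablet->reset_compaction();                      // back to 1; sweep can free it
}
```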
* [Refactor] Refactor DeleteHandler and Cond module (#4925)
This patch mainly does the following refactors:
- Use int64_t instead of int32_t for 'version' in DeleteHandler
- Move some comments from .cpp to .h file, add some new comments in .h files, and also remove some meaningless comments
- Use switch...case... instead of multiple if..else.. for DeleteConditionHandler::is_condition_value_valid
- Use range loop to simplify code
- Reduce some compare operations in Cond::del_eval
- Improve some branch predictions in Reader
- Fix and improve some unit tests