doris

Author	SHA1	Message	Date
Xinyi Zou	63d57b83f3	[fix](memory) Fix request jemalloc metrics wait lock je_malloc_mutex_lock_slow #16381 MetricRegistry::trigger_all_hooks holds the metrics lock and is stuck in get_je_metrics, to_prometheus is waiting for MetricRegistry::trigger_all_hooks to release the lock, so get_je_metrics is no longer called in MetricRegistry::trigger_all_hooks.	2023-02-04 22:49:22 +08:00
Xinyi Zou	17885acd09	[improvement](metrics) Metrics add all rowset nums and segment nums (#16208 )	2023-01-30 09:55:32 +08:00
Xinyi Zou	e9afd3210c	[improvement](memory) Optimize the log of process memory insufficient and support regular GC cache (#16084 ) 1. When the process memory is insufficient, print the process memory statistics in a more timely and detailed manner. 2. Support regular GC cache, currently only page cache and chunk allocator are included, because many people reported that the memory does not drop after the query ends. 3. Reduce system available memory warning water mark to reduce memory waste 4. Optimize soft mem limit logging	2023-01-29 10:02:04 +08:00
yiguolei	e49766483e	[refactor](remove unused code) remove many xxxVal structure (#16143 ) remove many xxxVal structure remove BetaRowsetWriter::_add_row remove anyval_util.cpp remove non-vectorized geo functions remove non-vectorized like predicate Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-28 14:17:43 +08:00
caiconghui	0148b39de0	[fix](metric) fix be down when enable_system_metrics is false (#16140 ) if we set enable_system_metrics to false, we will see be down with following message "enable metric calculator failed, maybe you set enable_system_metrics to false ", so fix it Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2023-01-28 00:10:39 +08:00
yiguolei	adb758dcac	[refactor](remove non vec code) remove json functions string functions match functions and some code (#16141 ) remove json functions code remove string functions code remove math functions code move MatchPredicate to olap since it is only used in storage predicate process remove some code in tuple, Tuple structure should be removed in the future. remove many code in collection value structure, they are useless	2023-01-26 16:21:12 +08:00
yiguolei	615a5e7b51	[refactor](remove non vec code) remove non vec functions and AggregateInfo (#16138 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-25 12:53:05 +08:00
yiguolei	6e8eedc521	[refactor](remove unused code) remove storage buffer and orc reader (#16137 ) remove olap storage byte buffer remove orc reader remove time operator remove read_write_util remove aggregate funcs remove compress.h and cpp remove bhp_lib Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-24 22:29:32 +08:00
yiguolei	79ad74637d	[refactor](remove expr) remove non vectorized Expr and ExprContext related codes (#16136 )	2023-01-24 10:45:35 +08:00
Xin Huang	05f0f63718	[fix](daemon) should use GetMonoTimeMicros() (#16070 )	2023-01-19 10:44:06 +08:00
yiguolei	16862d9b43	[refactor](remove unused code) remove buffer pool and disk io mgr (#15853 ) * [refactor](remove buffer pool and disk io mgr) remove unused code Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-13 09:42:58 +08:00
Gabriel	0fbdf8e3e1	[Refactor](table function) Decouple vectorized table functions from non-vectorized ones (#15772 )	2023-01-12 15:08:21 +08:00
TengJianPing	8f31a36429	[feature] support spill to disk for sort node (#15624 )	2023-01-11 08:40:58 +08:00
YueW	edecc2e706	[feature-wip](inverted index) API for inverted index reader and syntax for fulltext match (#14211 ) * [feature-wip](inverted index)inverted index api: reader * [feature-wip](inverted index) Fulltext query syntax with MATCH/MATCH_ALL/MATCH_ALL * [feature-wip](inverted index) Adapt to index meta * [enhance] add more metrics * [enhance] add fulltext match query check for column type and index parser * [feature-wip](inverted index) Support apply inverted index in compound predicate which except leaf node of and node	2022-12-30 21:48:14 +08:00
Xinyi Zou	c16cc5c602	[fix](memtracker) Fix load channel memory tracker are not refreshed in time (#15048 )	2022-12-16 10:43:03 +08:00
plat1ko	f3aea7f0f0	[Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744 )	2022-12-11 23:33:18 +08:00
Xinyi Zou	cdbbf1e4ee	[enhancement](memory) Add Memory GC when the available memory of the BE process is lacking (#14712 ) When the system MemAvailable is less than the warning water mark, or the memory used by the BE process exceeds the mem soft limit, run minor gc and try to release cache. When the MemAvailable of the system is less than the low water mark, or the memory used by the BE process exceeds the mem limit, run full gc, try to release the cache, and start canceling from the query with the largest memory usage until the memory of mem_limit * 20% is released.	2022-12-07 15:28:52 +08:00
Yongqiang YANG	07472f7318	[fix](tcmalloc_gc) optimize policy of tcmalloc gc (#14776 ) Release memory when memory pressure is above pressure limit and keep at least 2% memory as tcmalloc cache.	2022-12-05 21:16:35 +08:00
Yongqiang YANG	5b29489c7f	(tcmalloc) gc does not work in some cases (#14732 ) gc does not work in some cases	2022-12-02 09:18:23 +08:00
Yongqiang YANG	486a77fec0	[fix](tcmalloc) use low_watermark instead of hard_mem_limit (#14660 ) * [fix](tcmalloc) use low_watermark instead of hard_mem_limit hard_mem_limit is removed. * format	2022-11-30 11:29:57 +08:00
Xinyi Zou	e1f0fa069c	[enhancement](memory) Refactored process memory statistics periodically refresh, and fix catch bad_alloc (#14580 )	2022-11-29 10:15:25 +08:00
Yongqiang YANG	0702277196	[improvement](tcmalloc) add moderate mode and avoid oom with a lot of cache (#14374 ) ReleaseToSystem aggressively when there are little free memory.	2022-11-28 20:17:51 +08:00
Yongqiang YANG	8b3afd431e	[improvement](memory) simplify memory config related to tcmalloc (#13781 ) There are several configs related to tcmalloc, users do not know how to config them. Actually users just want two modes, performance or compact, in performance mode, users want doris run query and load quickly while in compact mode, users want doris run with less memory usage. If we want to config tcmalloc individually, we can use env variables which are supported by tcmalloc.	2022-11-01 21:45:19 +08:00
Xinyi Zou	9454bcca12	[fix](memory) Fix USE_JEMALLOC=true UBSAN compilation error #13398	2022-10-17 08:52:14 +08:00
Yongqiang YANG	9dc35ab534	[fix](streamload) set coord for streamLoad (#12744 ) When a stream load is canceled, status is reported to coord.	2022-09-23 20:23:19 +08:00
Xinyi Zou	42b6532131	remove gc and fix print (#12682 )	2022-09-17 00:16:15 +08:00
plat1ko	db07e51cd3	[refactor](status) Refactor status handling in agent task (#11940 ) Refactor TaggableLogger Refactor status handling in agent task: Unify log format in TaskWorkerPool Pass Status to the top caller, and replace some OLAPInternalError with more detailed error message Status Premature return with the opposite condition to reduce indention	2022-08-29 12:06:01 +08:00
Mingyu Chen	abbf75d302	[doc][refactor](metrics) Reorganize FE and BE metrics and add document (#11307 )	2022-08-02 11:34:06 +08:00
lihangyu	b04a791895	[Enhancement] support compile with jemalloc (#10542 ) A test feature to use jemalloc as default malloc.	2022-07-11 12:15:35 +08:00
Xinyi Zou	deeb3028ad	[Enhancement] [Memory] [Vectorized] Stress test and optimize memory allocation (#9581 ) * vec stress test, Allocator introduce chunkallocator * fix comment	2022-06-29 02:57:51 +08:00
Pxl	fd0bd395ac	[Enhancement] Remove some unused include (#10035 )	2022-06-17 10:47:25 +08:00
Zhengguo Yang	290366787c	[refactor] refactor code, replace some file with stl libs (#8759 ) 1. replace ConditionVariables with std::condition_variable 2. replace Mutex with std::mutex 3. replace MonoTime with std::chrono	2022-04-13 09:55:29 +08:00
spaces-x	bea9a7ba4f	[feature] Support pre-aggregation for quantile type (#8234 ) Add a new column-type to speed up the approximation of quantiles. 1. The new column-type is named `quantile_state` with fixed aggregation function `quantile_union`, which stores the intermediate results of pre-aggregated approximation calculations for quantiles. 2. support pre-aggregation of new column-type and quantile_state related functions.	2022-03-24 09:11:34 +08:00
yiguolei	989e03ddf9	[improvement] Improve sig handler (#8545 ) * Refactor glog's default signal handler Co-authored-by: Zhengguo Yang <780531911@qq.com>	2022-03-22 10:40:31 +08:00
Xinyi Zou	e17aef9467	[refactor] refactor the implement of MemTracker, and related usage (#8322 ) Modify the implementation of MemTracker: 1. Simplify a lot of useless logic; 2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing; 3. Add consume/release cache, trigger a consume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes; 4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection 5. Added Virtual MemTracker, consume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently; 6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later; 7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env; 8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.; Modify where MemTracker is used: 1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code; 2. Added trackers for global objects such as ChunkAllocator and StorageEngine; 3. Added more fine-grained trackers such as ExprContext; 4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode; 5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;	2022-03-11 22:04:23 +08:00
Zhengguo Yang	d9c2c2cac6	Revert "[refactor] remove unused new_in_predicate code (#8263 )" (#8372 ) This reverts commit 757e35744d4f6319e936fca84b4be13cf043a578.	2022-03-07 15:55:38 +08:00
Zhengguo Yang	757e35744d	[refactor] remove unused new_in_predicate code (#8263 ) remove unused code of new_in_predicate.h/cpp	2022-03-01 11:11:42 +08:00
Zhengguo Yang	f3817829bb	[fix] fix malloc and free mismatch issue (#7702 ) The memory allocate by `malloc` should be freed by `free`	2022-01-14 09:32:33 +08:00
Mingyu Chen	0499b2211b	[feat](lateral-view) Support execution of lateral view stmt (#7255 ) 1. Add table function node 2. Add 3 table functions: explode_split, explode_bitmap and explode_json_array	2021-12-16 10:46:15 +08:00
Zhengguo Yang	ed3ff470ce	[ARRAY] Support array type load and select not include access by index (#5980 ) This is part of the array type support and has not been fully completed. The following functions are implemented 1. fe array type support and implementation of array function, support array syntax analysis and planning 2. Support import array type data through insert into 3. Support select array type data 4. Only the array type is supported on the value column of the duplicate table this pr merge some code from #4655 #4650 #4644 #4643 #4623 #2979	2021-07-13 14:02:39 +08:00
Zhengguo Yang	739c0268ff	[refactor] Remove decimal v1 related code from code base (#6079 ) remove ALL DECIMAL V1 type code ， this is a part of #6073	2021-07-07 10:26:32 +08:00
Zhengguo Yang	ba38973209	use virtual hosted-style request to access object store (#5894 ) * use virtual hosted-style access request object store	2021-05-27 15:52:07 +08:00
Zhengguo Yang	86af8c76a3	[DOC] Add docs of load and export using S3 protocol (#5551 ) Add docs of load and export using S3 protocol	2021-03-27 18:58:29 +08:00
Zhengguo Yang	6ede4c6ec1	[Feature] Support backup,restore,load,export directly connect to s3 (#5399 ) * [doris-1008] support backup and restore directly to cloud storage via aws s3 protocol * [Internal][S3DirectAccess] Support backup,restore,load,export directly connect to s3 1. Support load and export data from/to s3 directly. 2. Add a config to auto convert broker access to s3 access when available Change-Id: Iac96d4b3670776708bc96a119ff491db8cb4cde7 (cherry picked from commit 2f03832ca52221cc7436069b96c45c48c4bc7201) * [Internal][S3DirectAccess] File path glob compatible with broker Change-Id: Ie55e07a547aa22c6fa8d432ca926216c10384e68 (cherry picked from commit d4fb25544c0dc06d23e1ada571ec3f8edd4ba56f) * [internal] [doris-1008] fix log4j class not found Change-Id: I468176aca0d821383c74ee658d461aba9e7d5be3 (cherry picked from commit 029adaa9d6ded8503acbd6644c1519456f3db232) * add poms Co-authored-by: yangzhengguo01 <yangzhengguo01@baidu.com>	2021-02-22 16:07:56 +08:00
wangbo	f3aded9370	[Bug] System metric init failed cause be start failed (#5262 ) System metric init failed cause be start failed	2021-02-01 00:10:57 +08:00
Zhengguo Yang	e536823f92	[Thirdparty] Fix build thirdparty may be failed (#5187 ) 1. fix build thirdparty may be failed in some os, because of default lib path is `lib` or `lib64` or `arrow` build failed by `brotli` and `zstd` 2. fix cannot extract `.tar.bz2` file	2021-01-04 15:21:18 +08:00
Youngwb	650536d53e	[Feature] Add Topn udaf (#4803 ) For #4674 This is a udaf for approximate topn using Space-Saving algorithm. At present, we can only calculate the frequent items and their frequencies in a certain column, based on which we can implement similar topN functions supported by Kylin in the future. I have also added a test to calculate the accuracy of this algorithm. The following is a rough running result. The total amount of data is 1 million lines and follows the Zipfian distribution, where Element Cardinality represents the data cardinality, 20X, 50X.. The value representing space_expand_rate is 20,50, which is used to set the counter number in the space-saving algorithm ``` zf exponent = 0.5 Element cardinality 20X 50X 100X 1000 100% 100% 100% 10000 100% 100% 100% 100000 100% 100% 100% 500000 94% 98% 99% zf exponent = 0.6，1 Element cardinality 20X 50X 100X 1000 100% 100% 100% 10000 100% 100% 100% 100000 100% 100% 100% 500000 100% 100% 100% ```	2020-12-16 21:58:34 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
Zhengguo Yang	09f97f8a05	[Refactor] Fixes some be typo part 2 (#4747 )	2020-10-20 09:28:57 +08:00
Yingchun Lai	b780df697a	[refactor] Optimize threads usage mode in BE (#4440 ) BE can not graceful exit because some threads are running in endless loop. This patch do the following optimization: - Use the well encapsulated Thread and ThreadPool instead of std::thread and std::vector<std::thread> - Use CountDownLatch in thread's loop condition to avoid endless loop - Introduce a new class Daemon for daemon works, like tcmalloc_gc, memory_maintenance and calculate_metrics - Decouple statistics type TaskWorkerPool and StorageEngine notification by submit tasks to TaskWorkerPool's queue - Reorder objects' stop and deconstruct in main(), i.e. stop network services at first, then internal services - Use libevent in pthreads mode, by calling evthread_use_pthreads(), then EvHttpServer can exit gracefully in multi-threads - Call brpc::Server's Stop() and ClearServices() explicitly	2020-09-06 20:19:14 +08:00

74 Commits