Commit Graph

491 Commits

Author SHA1 Message Date
ecb4372479 [Fix](pipelinex) Fix MaxScannerThreadNum calculation error in file scan operator when turn on pipelinex. (#33037)
The MaxScannerThreadNum in the file scan operator is calculated incorrectly when pipelinex is turned on; it consumes a lot of memory and causes performance degradation. This PR fixes it.
2024-04-07 22:11:27 +08:00
6600e92b12 [scan](status) Finish execution if scanner failed (#32966) 2024-03-29 10:51:15 +08:00
352617a34d [fix](scanner) cached blocks may be empty when VFileScanner return NOT_FOUND (#32745)
Cached blocks may be empty when VFileScanner returns NOT_FOUND. This feature was introduced by https://github.com/apache/doris/pull/15226. Move this logic inside `VFileScanner`.
2024-03-27 10:01:05 +08:00
Pxl
f579eceb34 [Improvementation](profile) add some profile on vcollect_iterator (#32794)
Add some profile counters on vcollect_iterator.
2024-03-26 20:33:16 +08:00
e99b33c274 [opt](file-meta-cache) reduce file meta cache size and disable cache for some cases (#32340)
The file meta cache on BE is used to cache the metadata of external tables' files, such as parquet footers.
This cache is limited by entry count, not by memory consumption.
So if a cached object is big (e.g. a large parquet footer), the total memory consumption of this cache
can become large and cause OOM.

This PR mainly changes:

1. Add a new method `exceed_prune_limit()` for `CachePolicy`
    For `ObjLRUCache`, it always returns true, so that both minor and full GC on BE will prune the cache each time.

2. Reduce the default capacity of the file meta cache from 20000 to 1000

    Also change the default capacity of the hdfs file handle cache from 20000 to 1000.

3. Change the judgment of whether to enable the file meta cache when querying

    If the number of files to be read is larger than 1/3 of the file meta cache's capacity, the file meta cache
    will be disabled for this query, because the cache is useless when there are too many files.
    (A sketch of points 1 and 3 follows this entry.)
2024-03-21 14:07:22 +08:00
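A minimal C++ sketch of the two ideas in the commit above: a cache-policy hook that lets GC always prune object caches, and the query-time check that disables the file meta cache when too many files would be read. Only `exceed_prune_limit()`, `ObjLRUCache`, and the 1/3 ratio come from the commit description; the other names are illustrative.

```cpp
#include <cstdint>

// Simplified stand-in for Doris's CachePolicy.
class CachePolicy {
public:
    virtual ~CachePolicy() = default;
    // Whether the cache should be pruned by a GC pass.
    virtual bool exceed_prune_limit() const = 0;
};

class ObjLRUCache : public CachePolicy {
public:
    // Always true, so both minor and full GC on BE prune this cache each time.
    bool exceed_prune_limit() const override { return true; }
};

// Query-time check (hypothetical helper): if the number of files to read
// exceeds 1/3 of the file meta cache's capacity, skip the cache for this query.
bool should_use_file_meta_cache(int64_t num_files, int64_t cache_capacity) {
    return num_files <= cache_capacity / 3;
}
```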
2e564036ef [fix](profile) avoid update profile in deconstructor (#32131)
Previously, the counters in `profile` might be updated when closing the file reader,
and the file reader might be closed while the object was being destructed.
But at that point the `profile` object may already have been deleted, causing a null-pointer dereference and a BE crash.

This PR tries to fix the issue as follows (a sketch of the `ProfileCollector` interface follows this entry):

1. Remove the "profile counter update" logic from all `close()` methods.

2. Add a new interface `ProfileCollector`

    It has 2 methods:

    - `collect_profile_at_runtime()`

        Can be called at runtime, e.g. in every `get_next_block()` method,
        so that the profile counters can be updated while the query is running.

    - `collect_profile_before_close()`

        Should be called before the object calls `close()`, and it is called only once.

3. Derive from `ProfileCollector`

    All classes that may update profile counters in their `close()` method, such as `GenericReader`,
    should extend `ProfileCollector` and implement `collect_profile_before_close()`.

    `collect_profile_before_close()` is called in `scanner->mark_to_need_to_close()`.
2024-03-21 14:07:22 +08:00
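A rough C++ sketch of the `ProfileCollector` interface described above. The two collect methods come from the commit description; the `_collected` flag and `collect_profile_impl()` are assumptions used to show how the "called only once" contract could be enforced.

```cpp
#include <atomic>

class RuntimeProfile;  // profile object holding the counters

// Sketch, not the actual Doris declaration.
class ProfileCollector {
public:
    virtual ~ProfileCollector() = default;

    // Safe to call repeatedly at runtime, e.g. in every get_next_block(),
    // so counters are pushed to the profile while it is still alive.
    virtual void collect_profile_at_runtime() = 0;

    // Must be called before close(); runs the collection at most once.
    void collect_profile_before_close() {
        if (!_collected.exchange(true)) {
            collect_profile_impl();
        }
    }

protected:
    // Subclasses (e.g. readers like GenericReader) move the counter updates
    // that previously lived in close() into this hook.
    virtual void collect_profile_impl() = 0;

private:
    std::atomic<bool> _collected{false};
};
```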
258dcfca97 [Refactor](executor)Add information_schema.workload_groups (#32195) (#32314) 2024-03-15 20:46:54 +08:00
c8f3643890 [exec](runtimefilter) support null aware in runtime filter (#32152)
null aware in runtime filter
2024-03-15 18:05:13 +08:00
62023d705d [refactor](rename) rename task group to workload group in be (#32204)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-03-15 18:04:02 +08:00
1db57c0667 [Optimization][Scanner] Skip _init_variant_columns when there are no variant columns, and ensure inherit_tablet_index is called only once (#32174) 2024-03-15 18:01:19 +08:00
61928f7df5 [pipelineX](scanner) Use the actual instances num when ignore data distribution (#32081) 2024-03-12 14:20:39 +08:00
b0b7161ad0 [feature](rf) add filter info profile when rf run as expr (#31822) 2024-03-12 14:17:48 +08:00
c5390d00bb [Improvement]Add schema table backend_active_tasks (#31945) 2024-03-09 19:55:48 +08:00
0da010603e [Improve](TabletSchemaCache) reduce duplicated memory consumption for column name and column path (#31141)
Both column name and column path can reference the related fields in TabletColumn. Also use shared_ptr for TabletColumn in TabletSchema so the objects can be reused later (see the sketch after this entry).
2024-03-09 19:44:42 +08:00
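A small C++ sketch of the sharing idea above: TabletSchema keeps `shared_ptr<TabletColumn>` entries so the name and path strings are stored once and referenced rather than copied. The field and method names here are illustrative, not the actual Doris definitions.

```cpp
#include <memory>
#include <string>
#include <vector>

// Illustrative stand-ins for Doris's TabletColumn / TabletSchema.
struct TabletColumn {
    std::string name;         // column name, owned once here
    std::string column_path;  // e.g. path of a variant sub-column
};

struct TabletSchema {
    // Columns are held by shared_ptr so identical TabletColumn objects can be
    // reused across cached schemas instead of duplicating name/path strings.
    std::vector<std::shared_ptr<TabletColumn>> columns;

    // Callers reference the string stored in TabletColumn, not a copy.
    const std::string& column_name(size_t i) const { return columns[i]->name; }
};
```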
9bf22a872a [Bug](fix) fix or and "<=>" cause coredump in query (#31884) 2024-03-07 16:53:19 +08:00
28f0b7eb32 [Improvement](profile)Add tvf active_be_tasks() #31815 2024-03-07 16:12:23 +08:00
Pxl
25d1934289 [Feature](topn) support multiple topn filter on backend (#31665)
support multiple topn filter on backend
2024-03-06 13:05:22 +08:00
8a44c180bf [opt](scan) read scan ranges in the order of partitions (#31630) 2024-03-02 01:09:10 +08:00
6e62017ed5 [fix](scanner) allocated_bytes should be called after success (#31428)
allocated_bytes should be called after success
2024-02-27 10:12:36 +08:00
c34639245e [Improvement](executor)add remote scan thread pool (#31376)
* add remote scan thread pool

* +1
2024-02-27 10:12:33 +08:00
35333d7a77 [opt](scanner) scan enough blocks in each scan task (#31277) 2024-02-27 10:12:18 +08:00
52c45e38af [Refactor](RF) refactor the profile of rf and pipeline-x support local ignore (#31287)
* [Refactor](RF) refactor the profile of rf and pipeline-x support local ignore

* fix local merge filter
2024-02-23 19:05:06 +08:00
52b9af06fb [pipelineX](refactor) Delete subclasses inherited from Dependency (#31216) 2024-02-22 13:01:48 +08:00
97c9d75af3 [Feature](executor)Add scan_thread_num property for workload group (#31106) 2024-02-20 16:24:05 +08:00
366a6792bf [refactor](scanner) refactoring and optimizing scanner scheduling (#30746) 2024-02-16 10:12:24 +08:00
4b42156fc0 [chore](clang-tidy): add bugprone linters (#29521)
This PR introduces 4 bugprone linter rules to .clang-tidy; these linters found some bugs in #28965. This PR also adds some comments to mute false-positive reports.
2024-02-05 21:58:08 +08:00
6289f7e605 [Fix](multi-catalog) Fix truncate_char_or_varchar_column crash. (#30731) 2024-02-03 20:26:04 +08:00
4f8730d092 [improvement](jdbc catalog) Optimize connection pool parameter settings (#30588)
This PR makes the following changes to the connection pool of the JDBC Catalog:
1. Set the maximum connection survival time; the default is 30 minutes.

-   One-half of the maximum survival time is the recycle threshold,
-   and one-tenth is the check interval for recycling connections.

2. Keepalive only takes effect on the connection pool on BE, and is activated based on one-fifth of the maximum survival time (these derived intervals are sketched after this entry).
3. The maximum number of connections is changed from 100 to 10.
4. Add a connection cache recycling thread on BE, and add a parameter to control the recycling time; the default is 28800 seconds (8 hours).
5. Add the CatalogID to the key of the connection pool cache for better isolation; refreshing the catalog is required for this to take effect.
6. Upgrade the druid connection pool to version 1.2.20.
7. Add JdbcResource's default parameter settings when upgrading the FE version, to avoid errors due to unset parameters.
2024-02-03 20:26:03 +08:00
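A tiny C++ illustration of the interval arithmetic described in items 1 and 2 above, assuming the default 30-minute maximum survival time; the variable names are hypothetical and only show how the derived values relate.

```cpp
#include <chrono>
#include <iostream>

int main() {
    using namespace std::chrono;
    const minutes max_lifetime{30};                    // default maximum survival time
    const auto recycle_threshold = max_lifetime / 2;   // one-half: connection becomes recyclable
    const auto check_interval    = max_lifetime / 10;  // one-tenth: recycling check interval
    const auto keepalive_period  = max_lifetime / 5;   // one-fifth: keepalive activation (BE only)

    std::cout << "recycle after " << recycle_threshold.count() << " min, "
              << "check every " << check_interval.count() << " min, "
              << "keepalive every " << keepalive_period.count() << " min\n";
}
```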
045225a096 [pipelineX](profile) Fix Tablet counter on pipelineX engine (#30613) 2024-01-31 23:53:39 +08:00
7d037c12bf [bugfix](paimon)fix paimon testcases (#30514)
1. Set the default timezone.
2. Do not push down the `char` type, since it is not supported.
2024-01-31 23:53:39 +08:00
378d9e7336 [Colo][Scan] delete the colo scan code (#30584) 2024-01-31 23:53:39 +08:00
129463f557 [Try_Fix](scan) try fix the scanner schedule logic to prevent excessive memory usage and timeout (#30515) 2024-01-30 15:31:22 +08:00
bedad15f03 [enhancement](scanner) add a lower bound for bytes in scanner queue (#29624) 2024-01-27 09:13:21 +08:00
9e0c518aaf [Feature](executor)Workload Group support Non-Pipeline Execution (#30164) 2024-01-23 10:11:25 +08:00
d3bf23d70d [chore](removelogs) remove debug query timeout logs (#30006)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-16 18:48:18 +08:00
779ed25972 [fix](scan) crashing caused by unlocked reading of tablet (#30005) 2024-01-16 18:46:19 +08:00
e35b26f4fc [feature](auditlog)Add runtime cpu time/peak memory metric (#29925) 2024-01-16 18:39:00 +08:00
b7b8e59392 [opt](scanner) use buffered queue to avoid acquiring locks frequently (#29938) 2024-01-16 18:37:44 +08:00
c8845c9e07 [opt](scanner) Improve the efficiency of TOPN opt (#29937) 2024-01-16 18:37:44 +08:00
5e697990a8 [bugfix](timeout) serving_blocks_num may cause timeout, try to fix it (#29912)
Although serving_blocks_num is an atomic variable, its ++ and -- are not protected by the transfer lock,
and I am not sure about the memory order of ++ and --.
I think it may be the root cause of the query timeout, so I remove the check and test it in the github pipeline (see the sketch after this entry).
2024-01-16 18:34:19 +08:00
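A hedged C++ reduction of the pattern the commit describes: an atomic block counter incremented and decremented outside the lock that guards block transfer, so a limit check based on it can observe a value that is not ordered with the rest of the protected state. All names other than serving_blocks_num are hypothetical.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical simplification of the scanner-context pattern.
struct ScannerCtx {
    std::atomic<int> serving_blocks_num{0};
    std::mutex transfer_lock;  // guards the block queue, but not the counter

    // ++ / -- happen outside transfer_lock, so they are not ordered with
    // respect to decisions made while holding the lock.
    void on_block_produced() { serving_blocks_num++; }
    void on_block_consumed() { serving_blocks_num--; }

    bool should_throttle(int limit) {
        std::lock_guard<std::mutex> l(transfer_lock);
        // The counter itself is atomic, but its value here may already be
        // stale; the commit removes a check like this one.
        return serving_blocks_num.load() >= limit;
    }
};
```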
e4e57e9b05 [chore](removelogs) remove debug query timeout logs 2024-01-12 14:37:20 +08:00
ad2c13e009 [Optimize](kill-query)Support the scanners exits as soon as possible when kill query #29803 2024-01-12 13:58:19 +08:00
0d691c638b [Feature](profile)Support report runtime workload statistics #29591 2024-01-12 11:59:27 +08:00
ca75c9b8ab add more logs to debug timeout
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-12 11:48:39 +08:00
abb7640d37 [debug](timeout) add more log in scanner ctx to find timeout problem #29704
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-12 11:44:21 +08:00
8fc9c18c85 [improvement](jdbc catalog) Put the jdbc connection pool parameters into catalog properties (#29195) 2024-01-12 11:40:28 +08:00
9ef4e49307 [bugfix](scannerdeadloop) there is a dead loop in scanner ctx (#29794)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-11 16:47:54 +08:00
c497f749ce [debug](timeout) debug select timeout (#29627)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-07 19:54:02 +08:00
0b731800a0 [enhancement](group_commit) refector wal manager code (#29560) 2024-01-07 18:54:41 +08:00
f28dbc702c [bugfix](scanner done) should not set process status to query context (#29512)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-04 15:18:10 +08:00