Commit Graph

491 Commits

Author SHA1 Message Date
ecb4372479 [Fix](pipelinex) Fix MaxScannerThreadNum calculation error in file scan operator when turn on pipelinex. (#33037)
The MaxScannerThreadNum in the file scan operator is calculated incorrectly when pipelinex is turned on; it consumes a lot of memory and causes performance degradation. This PR fixes it.
2024-04-07 22:11:27 +08:00
6600e92b12 [scan](status) Finish execution if scanner failed (#32966) 2024-03-29 10:51:15 +08:00
352617a34d [fix](scanner) cached blocks may be empty when VFileScanner return NOT_FOUND (#32745)
Cached blocks may be empty when VFileScanner returns NOT_FOUND. This feature was introduced by https://github.com/apache/doris/pull/15226. Move this logic inside `VFileScanner`.
2024-03-27 10:01:05 +08:00
Pxl
f579eceb34 [Improvementation](profile) add some profile on vcollect_iterator (#32794)
Add some profile counters on vcollect_iterator.
2024-03-26 20:33:16 +08:00
e99b33c274 [opt](file-meta-cache) reduce file meta cache size and disable cache for some cases (#32340)
The file meta cache on BE is used to cache the metadata of external tables' files, such as parquet footers.
This cache is limited by entry count, not by memory consumption.
So if a cached object is big (e.g. a large parquet footer), the total memory consumption of this cache
can become large and cause OOM.

This PR mainly changes:

1. Add a new method `exceed_prune_limit()` for `CachePolicy`
    For `ObjLRUCache`, it always returns true, so that both minor and full GC on BE will prune the cache each time.

2. Reduce the default capacity of the file meta cache from 20000 to 1000

    Also change the default capacity of the hdfs file handle cache from 20000 to 1000.

3. Change the judgment of whether to enable the file meta cache when querying

    If the number of files to be read is larger than 1/3 of the file meta cache's capacity, the file meta cache
    will be disabled for this query, because the cache is useless when there are too many files.
    (A sketch of points 1 and 3 follows this entry.)
2024-03-21 14:07:22 +08:00
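A minimal C++ sketch of the two ideas in the commit above: a cache-policy hook that lets GC always prune object caches, and the query-time check that disables the file meta cache when too many files would be read. Only `exceed_prune_limit()`, `ObjLRUCache`, and the 1/3 ratio come from the commit description; the other names are illustrative.

```cpp
#include <cstdint>

// Simplified stand-in for Doris's CachePolicy.
class CachePolicy {
public:
    virtual ~CachePolicy() = default;
    // Whether the cache should be pruned by a GC pass.
    virtual bool exceed_prune_limit() const = 0;
};

class ObjLRUCache : public CachePolicy {
public:
    // Always true, so both minor and full GC on BE prune this cache each time.
    bool exceed_prune_limit() const override { return true; }
};

// Query-time check (hypothetical helper): if the number of files to read
// exceeds 1/3 of the file meta cache's capacity, skip the cache for this query.
bool should_use_file_meta_cache(int64_t num_files, int64_t cache_capacity) {
    return num_files <= cache_capacity / 3;
}
```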
2e564036ef [fix](profile) avoid update profile in deconstructor (#32131)
Previously, the counters in `profile` might be updated when closing the file reader,
and the file reader might be closed while the object was being destructed.
But at that point the `profile` object may already have been deleted, causing a null-pointer dereference and a BE crash.

This PR tries to fix the issue as follows (a sketch of the `ProfileCollector` interface follows this entry):

1. Remove the "profile counter update" logic from all `close()` methods.

2. Add a new interface `ProfileCollector`

    It has 2 methods:

    - `collect_profile_at_runtime()`

        Can be called at runtime, e.g. in every `get_next_block()` method,
        so that the profile counters can be updated while the query is running.

    - `collect_profile_before_close()`

        Should be called before the object calls `close()`, and it is called only once.

3. Derive from `ProfileCollector`

    All classes that may update profile counters in their `close()` method, such as `GenericReader`,
    should extend `ProfileCollector` and implement `collect_profile_before_close()`.

    `collect_profile_before_close()` is called in `scanner->mark_to_need_to_close()`.
2024-03-21 14:07:22 +08:00
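A rough C++ sketch of the `ProfileCollector` interface described above. The two collect methods come from the commit description; the `_collected` flag and `collect_profile_impl()` are assumptions used to show how the "called only once" contract could be enforced.

```cpp
#include <atomic>

class RuntimeProfile;  // profile object holding the counters

// Sketch, not the actual Doris declaration.
class ProfileCollector {
public:
    virtual ~ProfileCollector() = default;

    // Safe to call repeatedly at runtime, e.g. in every get_next_block(),
    // so counters are pushed to the profile while it is still alive.
    virtual void collect_profile_at_runtime() = 0;

    // Must be called before close(); runs the collection at most once.
    void collect_profile_before_close() {
        if (!_collected.exchange(true)) {
            collect_profile_impl();
        }
    }

protected:
    // Subclasses (e.g. readers like GenericReader) move the counter updates
    // that previously lived in close() into this hook.
    virtual void collect_profile_impl() = 0;

private:
    std::atomic<bool> _collected{false};
};
```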
258dcfca97 [Refactor](executor)Add information_schema.workload_groups (#32195) (#32314) 2024-03-15 20:46:54 +08:00
c8f3643890 [exec](runtimefilter) support null aware in runtime filter (#32152)
null aware in runtime filter
2024-03-15 18:05:13 +08:00
62023d705d [refactor](rename) rename task group to workload group in be (#32204)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-03-15 18:04:02 +08:00
1db57c0667 [Optimization][Scanner] Skip _init_variant_columns when there are no variant columns, and ensure inherit_tablet_index is called only once (#32174) 2024-03-15 18:01:19 +08:00
61928f7df5 [pipelineX](scanner) Use the actual instances num when ignore data distribution (#32081) 2024-03-12 14:20:39 +08:00
b0b7161ad0 [feature](rf) add filter info profile when rf run as expr (#31822) 2024-03-12 14:17:48 +08:00
c5390d00bb [Improvement]Add schema table backend_active_tasks (#31945) 2024-03-09 19:55:48 +08:00
0da010603e [Improve](TabletSchemaCache) reduce duplicated memory consumption for column name and column path (#31141)
Both column name and column path can reference the related fields in TabletColumn. Also use shared_ptr for TabletColumn in TabletSchema so the objects can be reused later (see the sketch after this entry).
2024-03-09 19:44:42 +08:00
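A small C++ sketch of the sharing idea above: TabletSchema keeps `shared_ptr<TabletColumn>` entries so the name and path strings are stored once and referenced rather than copied. The field and method names here are illustrative, not the actual Doris definitions.

```cpp
#include <memory>
#include <string>
#include <vector>

// Illustrative stand-ins for Doris's TabletColumn / TabletSchema.
struct TabletColumn {
    std::string name;         // column name, owned once here
    std::string column_path;  // e.g. path of a variant sub-column
};

struct TabletSchema {
    // Columns are held by shared_ptr so identical TabletColumn objects can be
    // reused across cached schemas instead of duplicating name/path strings.
    std::vector<std::shared_ptr<TabletColumn>> columns;

    // Callers reference the string stored in TabletColumn, not a copy.
    const std::string& column_name(size_t i) const { return columns[i]->name; }
};
```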
9bf22a872a [Bug](fix) fix or and "<=>" cause coredump in query (#31884) 2024-03-07 16:53:19 +08:00
28f0b7eb32 [Improvement](profile)Add tvf active_be_tasks() #31815 2024-03-07 16:12:23 +08:00
Pxl
25d1934289 [Feature](topn) support multiple topn filter on backend (#31665)
support multiple topn filter on backend
2024-03-06 13:05:22 +08:00
8a44c180bf [opt](scan) read scan ranges in the order of partitions (#31630) 2024-03-02 01:09:10 +08:00
6e62017ed5 [fix](scanner) allocated_bytes should be called after success (#31428)
allocated_bytes should be called after success
2024-02-27 10:12:36 +08:00
c34639245e [Improvement](executor)add remote scan thread pool (#31376)
* add remote scan thread pool

* +1
2024-02-27 10:12:33 +08:00
35333d7a77 [opt](scanner) scan enough blocks in each scan task (#31277) 2024-02-27 10:12:18 +08:00
52c45e38af [Refactor](RF) refactor the profile of rf and pipeline-x support local ignore (#31287)
* [Refactor](RF) refactor the profile of rf and pipeline-x support local ignore

* fix local merge filter
2024-02-23 19:05:06 +08:00
52b9af06fb [pipelineX](refactor) Delete subclasses inherited from Dependency (#31216) 2024-02-22 13:01:48 +08:00
97c9d75af3 [Feature](executor)Add scan_thread_num property for workload group (#31106) 2024-02-20 16:24:05 +08:00
366a6792bf [refactor](scanner) refactoring and optimizing scanner scheduling (#30746) 2024-02-16 10:12:24 +08:00
4b42156fc0 [chore](clang-tidy): add bugprone linters (#29521)
This PR introduces 4 bugprone linter rules to .clang-tidy; these linters found some bugs in #28965. This PR also adds some comments to mute false-positive reports.
2024-02-05 21:58:08 +08:00
6289f7e605 [Fix](multi-catalog) Fix truncate_char_or_varchar_column crash. (#30731) 2024-02-03 20:26:04 +08:00
4f8730d092 [improvement](jdbc catalog) Optimize connection pool parameter settings (#30588)
This PR makes the following changes to the connection pool of the JDBC Catalog:
1. Set the maximum connection survival time; the default is 30 minutes.

-   One-half of the maximum survival time is the recycle threshold,
-   and one-tenth is the check interval for recycling connections.

2. Keepalive only takes effect on the connection pool on BE, and is activated based on one-fifth of the maximum survival time (these derived intervals are sketched after this entry).
3. The maximum number of connections is changed from 100 to 10.
4. Add a connection cache recycling thread on BE, and add a parameter to control the recycling time; the default is 28800 seconds (8 hours).
5. Add the CatalogID to the key of the connection pool cache for better isolation; refreshing the catalog is required for this to take effect.
6. Upgrade the druid connection pool to version 1.2.20.
7. Add JdbcResource's default parameter settings when upgrading the FE version, to avoid errors due to unset parameters.
2024-02-03 20:26:03 +08:00
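A tiny C++ illustration of the interval arithmetic described in items 1 and 2 above, assuming the default 30-minute maximum survival time; the variable names are hypothetical and only show how the derived values relate.

```cpp
#include <chrono>
#include <iostream>

int main() {
    using namespace std::chrono;
    const minutes max_lifetime{30};                    // default maximum survival time
    const auto recycle_threshold = max_lifetime / 2;   // one-half: connection becomes recyclable
    const auto check_interval    = max_lifetime / 10;  // one-tenth: recycling check interval
    const auto keepalive_period  = max_lifetime / 5;   // one-fifth: keepalive activation (BE only)

    std::cout << "recycle after " << recycle_threshold.count() << " min, "
              << "check every " << check_interval.count() << " min, "
              << "keepalive every " << keepalive_period.count() << " min\n";
}
```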
045225a096 [pipelineX](profile) Fix Tablet counter on pipelineX engine (#30613) 2024-01-31 23:53:39 +08:00
7d037c12bf [bugfix](paimon)fix paimon testcases (#30514)
1. Set the default timezone.
2. Do not push down the `char` type, since it is not supported.
2024-01-31 23:53:39 +08:00
378d9e7336 [Colo][Scan] delete the colo scan code (#30584) 2024-01-31 23:53:39 +08:00
129463f557 [Try_Fix](scan) try fix the scanner schedule logic to prevent excessive memory usage and timeout (#30515) 2024-01-30 15:31:22 +08:00
bedad15f03 [enhancement](scanner) add a lower bound for bytes in scanner queue (#29624) 2024-01-27 09:13:21 +08:00
9e0c518aaf [Feature](executor)Workload Group support Non-Pipeline Execution (#30164) 2024-01-23 10:11:25 +08:00
d3bf23d70d [chore](removelogs) remove debug query timeout logs (#30006)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-16 18:48:18 +08:00
779ed25972 [fix](scan) crashing caused by unlocked reading of tablet (#30005) 2024-01-16 18:46:19 +08:00
e35b26f4fc [feature](auditlog)Add runtime cpu time/peak memory metric (#29925) 2024-01-16 18:39:00 +08:00
b7b8e59392 [opt](scanner) use buffered queue to avoid acquiring locks frequently (#29938) 2024-01-16 18:37:44 +08:00
c8845c9e07 [opt](scanner) Improve the efficiency of TOPN opt (#29937) 2024-01-16 18:37:44 +08:00
5e697990a8 [bugfix](timeout) serving_blocks_num may cause timeout, try to fix it (#29912)
Although serving_blocks_num is an atomic variable, its ++ and -- are not protected by the transfer lock,
and I am not sure about the memory order of ++ and --.
I think it may be the root cause of the query timeout, so I remove the check and test it in the github pipeline (see the sketch after this entry).
2024-01-16 18:34:19 +08:00
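A hedged C++ reduction of the pattern the commit describes: an atomic block counter incremented and decremented outside the lock that guards block transfer, so a limit check based on it can observe a value that is not ordered with the rest of the protected state. All names other than serving_blocks_num are hypothetical.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical simplification of the scanner-context pattern.
struct ScannerCtx {
    std::atomic<int> serving_blocks_num{0};
    std::mutex transfer_lock;  // guards the block queue, but not the counter

    // ++ / -- happen outside transfer_lock, so they are not ordered with
    // respect to decisions made while holding the lock.
    void on_block_produced() { serving_blocks_num++; }
    void on_block_consumed() { serving_blocks_num--; }

    bool should_throttle(int limit) {
        std::lock_guard<std::mutex> l(transfer_lock);
        // The counter itself is atomic, but its value here may already be
        // stale; the commit removes a check like this one.
        return serving_blocks_num.load() >= limit;
    }
};
```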
e4e57e9b05 [chore](removelogs) remove debug query timeout logs 2024-01-12 14:37:20 +08:00
ad2c13e009 [Optimize](kill-query)Support the scanners exits as soon as possible when kill query #29803 2024-01-12 13:58:19 +08:00
0d691c638b [Feature](profile)Support report runtime workload statistics #29591 2024-01-12 11:59:27 +08:00
ca75c9b8ab add more logs to debug timeout
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-12 11:48:39 +08:00
abb7640d37 [debug](timeout) add more log in scanner ctx to find timeout problem #29704
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-12 11:44:21 +08:00
8fc9c18c85 [improvement](jdbc catalog) Put the jdbc connection pool parameters into catalog properties (#29195) 2024-01-12 11:40:28 +08:00
9ef4e49307 [bugfix](scannerdeadloop) there is a dead loop in scanner ctx (#29794)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-11 16:47:54 +08:00
c497f749ce [debug](timeout) debug select timeout (#29627)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-07 19:54:02 +08:00
0b731800a0 [enhancement](group_commit) refector wal manager code (#29560) 2024-01-07 18:54:41 +08:00
f28dbc702c [bugfix](scanner done) should not set process status to query context (#29512)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-01-04 15:18:10 +08:00