Commit Graph

515 Commits

Author SHA1 Message Date
eb49cd839b [refactor](datalake) return the error status instead of static_cast<void> (#34873)
Followup #34797
`static_cast<void>` has ignored the wrong status, some of them should make the query finished with error status, so replace `static_cast<void>`  with `RETURN_IF_ERROR`.

The following three scenarios need to be handled separately and cannot be simply replaced:
1. The outer function returns void;
2. Call status function inner constructors or destructors;
3. Call status function with best effort, and should ignore the wrong status.
2024-05-23 19:06:21 +08:00
adc364a6fd [feature](Paimon) support deletion vector for Paimon naive reader (#34743) (#35241)
bp #34743
Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-05-23 00:01:30 +08:00
903ff32021 [opt](fe) exit FE when transfer to (non)master failed (#34809) (#35158)
bp #34809
2024-05-21 22:31:47 +08:00
98f8eb5c43 [opt](split) get file splits in batch mode (#34032) (#35107)
bp  #34032
2024-05-21 22:27:07 +08:00
1f0c45204b [fix](iceberg) read the primary key columns if hasing equality delete (#34884)
backport: #34835
2024-05-15 11:37:25 +08:00
cc00666be6 [opt](inverted index) add inlist condition handling to compound (#34134)
1. Previously, the compound did not support the inlist condition, which could impact performance if an inverted index was created.
2024-05-10 14:35:47 +08:00
e085f75a43 [opt](file-scanner) print current path when encountering error (#34365) (#34523)
bp #34365
2024-05-08 14:49:03 +08:00
1bfe0f0393 [feature](iceberg)support read iceberg complex type,iceberg.orc format and position delete. (#33935) (#34256)
master #33935
2024-04-29 14:40:12 +08:00
9f0a5690a6 [profile](scan) add projection time in scaner #34120 2024-04-26 07:43:40 +08:00
47b54d4bd5 Fix remote scan pool (#33976) 2024-04-25 15:04:43 +08:00
4d7ac82305 [profile](scanner) Fix wrong metrics (#33965) 2024-04-22 22:33:24 +08:00
59de97be5e [improvement](mow) Add profile for delete_bitmap get_agg function (#33576) 2024-04-17 23:42:13 +08:00
2cd4012541 [opt](scan) read scan ranges in the order of partitions (#33515) (#33657)
backport: #33515
2024-04-17 23:42:12 +08:00
8ee8de7857 [Fix](executor)reset remote scan thread num #33579 2024-04-17 23:42:11 +08:00
249a9c9875 [Feature](Variant) support aggregation model for Variant type (#33493)
refactor use `insert_from` to replace `replace_column_data` for variable lengths columns
2024-04-17 23:42:00 +08:00
6bcf24b1f6 [bug](not in) if not in (null) could eos early (#33482)
* [bug](not in) if not in (null) could eos early
2024-04-17 23:41:59 +08:00
Pxl
5f30463bb3 [Chore](descriptors) remove unused codes for descriptors (#33408)
remove unused codes for descriptors
2024-04-12 15:09:25 +08:00
f7d52b5b1c [feature](expr) add type check when expr prepare (#33330) 2024-04-11 09:31:50 +08:00
Pxl
8fd6d4c41b [Chore](build) add -Wconversion and remove some unused code (#33127)
add -Wconversion and remove some unused code
2024-04-10 15:26:08 +08:00
8e19cdd745 [featrue](expr) support common subexpression elimination be part (#32673) 2024-04-10 11:56:21 +08:00
cf7595d423 [opt](memory) Optimize mem tracker accuracy (#32039) (#33140) 2024-04-10 11:42:19 +08:00
3c4ccb3981 Revert "[opt](scan) read scan ranges in the order of partitions (#31630)"
This reverts commit 5d99dffe6f1a3fcb107ce56181aeff96ef222def.
2024-04-09 12:37:31 +08:00
0234976ab7 [refactor](meta scan) Remove RPC from execute threads (#33378) 2024-04-08 20:28:02 +08:00
a8232c67f9 [pipelineX](runtime filter) Fix task timeout caused by runtime filter (#33332) (#33369) 2024-04-08 16:30:32 +08:00
ecb4372479 [Fix](pipelinex) Fix MaxScannerThreadNum calculation error in file scan operator when turn on pipelinex. (#33037)
MaxScannerThreadNum in file scan operator when turn on pipelinex is incorrect, it will cost many memory and causing performance degradation. This PR fix it.
2024-04-07 22:11:27 +08:00
6600e92b12 [scan](status) Finish execution if scanner failed (#32966) 2024-03-29 10:51:15 +08:00
352617a34d [fix](scanner) cached blocks may be empty when VFileScanner return NOT_FOUND (#32745)
Cached blocks may be empty when VFileScanner return NOT_FOUND. This feature is introduced by https://github.com/apache/doris/pull/15226. Move this function inner `VFileScanner`.
2024-03-27 10:01:05 +08:00
Pxl
f579eceb34 [Improvementation](profile) add some profile on vcollect_iterator (#32794)
add some profile on vcollect_iterator
2024-03-26 20:33:16 +08:00
e99b33c274 [opt](file-meta-cache) reduce file meta cache size and disable cache for some cases (#32340)
File meta cache on BE is used to cache the meta for external table's file such as parquet footer.
This cache is counted by number, not memory consumption.
So if the cache object is big(eg, a large parquet footer), the total memory consumption of this cache
will be large and causing OOM.

This PR mainly changes:

1. Add a new method `exceed_prune_limit()` for `CachePolicy`
    For `ObjLRUCache`, it always return true so that the minor of full gc on BE will prune the cache each time.

2. Reduce the default capability of file meta cache, from 20000 to 1000

    Also change the default capability of hdfs file handle cache, from 20000 to 1000

4. Change judgement of whether enable file meta cache when querying

    If the number of file need to be read is larger than the 1/3 of the file meta cache's capability, file meta cache
    will be disabled for this query. Because cache is useless if there are too many files.
2024-03-21 14:07:22 +08:00
2e564036ef [fix](profile) avoid update profile in deconstructor (#32131)
In previous, the counter in `profile` may be updated when close the file reader.
And the file reader may be closed when the object being deconstruted.
But at that time, the `profile` object may already be deleted, causing NPE and BE will crash.

This PR try to fix this issue:

1. Remove the "profile counter update" logic from all `close()` method.

2. Add a new interface `ProfileCollector`

	It has 2 methods:
	
	- `collect_profile_at_runtime()`

		It can be called at runtime, eg, in every `get_next_block()` method.
		So that the counter in profile can be updated at runtime.
		
	- `collect_profile_before_close()`

		Should be called before the object call `close()`. And it will only be called once.
		
3. Derived from `ProfileCollector`

	All classes which may update the profile counter in `close()` method should extends
	the `ProfileCollector`. Such as `GenericReader`, etc. And implement `collect_profile_before_close()`
	
	And `collect_profile_before_close()` will be called in `scanner->mark_to_need_to_close()`.
2024-03-21 14:07:22 +08:00
258dcfca97 [Refactor](executor)Add information_schema.workload_groups (#32195) (#32314) 2024-03-15 20:46:54 +08:00
c8f3643890 [exec](runtimefilter) support null aware in runtime filter (#32152)
null aware in runtime filter
2024-03-15 18:05:13 +08:00
62023d705d [refactor](rename) rename task group to workload group in be (#32204)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-03-15 18:04:02 +08:00
1db57c0667 [Optimization][Scanner] Skip _init_variant_columns when there are no variant columns, and ensure inherit_tablet_index is called only once (#32174) 2024-03-15 18:01:19 +08:00
61928f7df5 [pipelineX](scanner) Use the actual instances num when ignore data distribution (#32081) 2024-03-12 14:20:39 +08:00
b0b7161ad0 [feature](rf) add filter info profile when rf run as expr (#31822) 2024-03-12 14:17:48 +08:00
c5390d00bb [Improvement]Add schema table backend_active_tasks (#31945) 2024-03-09 19:55:48 +08:00
0da010603e [Improve](TabletSchemaCache) reduce duplicated memory consumption for column name and column path (#31141)
Both could be reference to related field in TabletColumn.And use shared_ptr for TabletColumn in TabletSchema for later memory reuse
2024-03-09 19:44:42 +08:00
9bf22a872a [Bug](fix) fix or and "<=>" cause coredump in query (#31884) 2024-03-07 16:53:19 +08:00
28f0b7eb32 [Improvement](profile)Add tvf active_be_tasks() #31815 2024-03-07 16:12:23 +08:00
Pxl
25d1934289 [Feature](topn) support multiple topn filter on backend (#31665)
support multiple topn filter on backend
2024-03-06 13:05:22 +08:00
8a44c180bf [opt](scan) read scan ranges in the order of partitions (#31630) 2024-03-02 01:09:10 +08:00
6e62017ed5 [fix](scanner) allocated_bytes should be called after success (#31428)
allocated_bytes should be called after success
2024-02-27 10:12:36 +08:00
c34639245e [Improvement](executor)add remote scan thread pool (#31376)
* add remote scan thread pool

* +1
2024-02-27 10:12:33 +08:00
35333d7a77 [opt](scanner) scan enough blocks in each scan task (#31277) 2024-02-27 10:12:18 +08:00
52c45e38af [Refactor](RF) refactor the profile of rf and pipeline-x support local ignore (#31287)
* [Refactor](RF) refactor the profile of rf and pipeline-x support local ignore

* fix local merge filter
2024-02-23 19:05:06 +08:00
52b9af06fb [pipelineX](refactor) Delete subclasses inherited from Dependency (#31216) 2024-02-22 13:01:48 +08:00
97c9d75af3 [Feature](executor)Add scan_thread_num property for workload group (#31106) 2024-02-20 16:24:05 +08:00
366a6792bf [refactor](scanner) refactoring and optimizing scanner scheduling (#30746) 2024-02-16 10:12:24 +08:00
4b42156fc0 [chore](clang-tidy): add bugprone linters (#29521)
This PR introduces 4 bugprone linter rules to .clang-tidy, these linters found some bugs in #28965. This PR also add some comments to mute false positive reports.
2024-02-05 21:58:08 +08:00