382 Commits

Author SHA1 Message Date
a5565f68b2 [Refactor](opentelemetry) Remove opentelemetry (#26605) 2023-11-09 18:05:34 +08:00
2bb3ef1981 [refactor](scan) delete bloom_filter_predicate (#26499) 2023-11-07 19:37:31 +08:00
fa7a38b587 [fix](runtime filter) append late arrival runtime filters in vfilescanner (#25996)
`VFileScanner` tries to append late-arrival runtime filters in each loop of `ScannerScheduler::_scanner_scan`. However, `VFileScanner::_get_next_reader` generates `_push_down_conjuncts` only in the first loop, so late-arrival runtime filters are ignored.
2023-11-07 09:50:35 +08:00
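The bug above can be illustrated with a small, self-contained model. This is a sketch only: `FileScannerSketch`, its members, and the string-typed predicates are hypothetical stand-ins, not the real `VFileScanner` API. Setting `rebuild_every_time` to `false` reproduces the described behaviour (conjuncts built only on the first loop), while `true` mirrors the intent of the fix.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the late-arrival runtime filter bug; names are
// illustrative, not Doris classes.
struct FileScannerSketch {
    std::vector<std::string> conjuncts;          // static predicates from the plan
    std::vector<std::string> runtime_filters;    // runtime filters that have arrived so far
    std::vector<std::string> pushdown_conjuncts; // what the next reader receives
    bool rebuild_every_time = true;              // false reproduces the bug
    bool built_once = false;

    // Called on every scheduling loop: record filters that arrived late.
    void append_late_arrival_filters(const std::vector<std::string>& arrived) {
        for (const auto& f : arrived) runtime_filters.push_back(f);
    }

    // Called when the scanner opens its next file reader.
    const std::vector<std::string>& get_next_reader() {
        if (rebuild_every_time || !built_once) {
            // Combine static conjuncts with every runtime filter seen so far.
            pushdown_conjuncts = conjuncts;
            pushdown_conjuncts.insert(pushdown_conjuncts.end(),
                                      runtime_filters.begin(), runtime_filters.end());
            built_once = true;
        }
        return pushdown_conjuncts;
    }
};
```

With the buggy setting, a filter that arrives after the first reader is opened never reaches later readers; with the fixed setting, the next reader sees it.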
a5ef90dacc [enhancement](recover) support skipping missing version in select by session variable (#25654) 2023-11-02 20:01:51 +08:00
a4e415ab09 [feature](hive)Support hive tables after alter type. (#25138)
1. Restructure the decoding logic for reading Parquet. The Parquet reader first reads the data according to the Parquet physical type, and then performs a type conversion.

2. Support Hive ALTER TABLE.
2023-11-02 00:24:21 +08:00
ec85e22506 [enhance](scanner) pass the tablet in NewOlapScanner's ctor (#26167) 2023-11-01 17:50:14 +08:00
f2874b9452 [bug](shared scan) Fix use-after-free when enable pipeline shared scanning (#26199)
When shared scan is enabled, all scanners are created by one (main) instance. When the main instance reaches EOS and quits, all of its state is released, but other instances may still fetch blocks from those scanners. Scanners must therefore not depend on any state of the main instance after it quits.
2023-11-01 15:51:20 +08:00
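One common way to meet the requirement that scanners not depend on the main instance's state after it quits is shared ownership. The sketch below assumes that approach; the commit message does not say whether `std::shared_ptr` is used, and `ScanSharedState`, `ScannerSketch`, and `MainInstanceSketch` are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical state that scanners need while producing blocks.
struct ScanSharedState {
    int batch_size = 4096;
};

struct ScannerSketch {
    // Shared ownership keeps the state alive as long as any scanner exists,
    // even after the instance that created it has been destroyed.
    std::shared_ptr<const ScanSharedState> state;
    int next_block_rows() const { return state->batch_size; }
};

struct MainInstanceSketch {
    std::shared_ptr<ScanSharedState> state = std::make_shared<ScanSharedState>();

    // All scanners are created by this one instance and share its state.
    std::vector<std::shared_ptr<ScannerSketch>> create_scanners(int n) {
        std::vector<std::shared_ptr<ScannerSketch>> out;
        for (int i = 0; i < n; ++i) {
            out.push_back(std::make_shared<ScannerSketch>(ScannerSketch{state}));
        }
        return out;
    }
};
```

Because each scanner holds its own reference, destroying the main instance only drops one reference; the state is freed after the last scanner goes away, so other instances can keep pulling blocks safely.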
8c454a3287 [bug](scanner) Fix scanner core dump (#26156) 2023-10-31 22:23:32 +08:00
e20cab64f4 [improvement](scan) avoid too many scanners for file scan node (#25727)
Previously, when using a file scan node (e.g. querying a Hive table), the maximum number of scanners for each scan node
was `doris_scanner_thread_pool_thread_num` (default 48).
If the query parallelism is N, the total number of scanners could reach 48 * N, which is too many.

In this PR, I change the logic so that the maximum number of scanners for each scan node
is `doris_scanner_thread_pool_thread_num / query parallelism`, so the total number of scanners
is at most `doris_scanner_thread_pool_thread_num`.

Reducing the number of scanners can significantly reduce the memory usage of a query.
2023-10-29 17:41:31 +08:00
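The arithmetic behind the new cap can be written out as a sketch. The function names are hypothetical; only the division `doris_scanner_thread_pool_thread_num / query parallelism` comes from the commit message. The `std::max(1, ...)` clamp is an added assumption so that each scan node keeps at least one scanner when parallelism exceeds the pool size; in that corner case the total can exceed the pool size.

```cpp
#include <algorithm>
#include <cassert>

// New rule: each scan node gets an equal share of the scanner thread pool.
// The clamp to at least 1 is an assumption, not stated in the commit.
int max_scanners_per_node(int scanner_thread_pool_thread_num, int query_parallelism) {
    assert(scanner_thread_pool_thread_num > 0 && query_parallelism > 0);
    return std::max(1, scanner_thread_pool_thread_num / query_parallelism);
}

// Old rule: every scan node instance may use the whole pool size.
int total_scanners_before(int thread_num, int parallelism) {
    return thread_num * parallelism;
}

// Total under the new rule: per-node share times the number of instances.
int total_scanners_after(int thread_num, int parallelism) {
    return max_scanners_per_node(thread_num, parallelism) * parallelism;
}
```

With the default pool of 48 threads and a parallelism of 8, the old rule allows 48 * 8 = 384 scanners, while the new rule allows 6 per node and 48 in total.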
46d40b1952 [refactor](executor)Remove empty group logic #26005 2023-10-27 14:24:41 +08:00
c3527672a5 [refactor & pipelineX][pick fix] Pick fix of predicate pushdown to pipelineX (#25953)
Co-authored-by: JackDrogon <jack.xsuperman@gmail.com>
2023-10-26 18:04:43 +08:00
1ba8a9bae4 [feature-wip](executor)Fe send topic info to be (#25798) 2023-10-26 15:52:48 +08:00
6e1a4dbda2 [Fix](predicate pushdown) Common expression not acting on any slot should not be pushed down (#25901) 2023-10-26 11:20:12 +08:00
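The rule in the title amounts to a tree walk: an expression is eligible for push-down only if some node in it references a slot (column). The `Expr` type below is a hypothetical, minimal stand-in for the real expression classes, sketched under that assumption.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Minimal hypothetical expression node: a slot reference, a literal,
// or a function applied to children.
struct Expr {
    enum class Kind { Slot, Literal, Function } kind;
    std::string name;
    std::vector<std::shared_ptr<Expr>> children;
};

// True if the expression, or any sub-expression, reads a slot.
bool references_any_slot(const Expr& e) {
    if (e.kind == Expr::Kind::Slot) return true;
    for (const auto& c : e.children) {
        if (references_any_slot(*c)) return true;
    }
    return false;
}

// A common expression that acts on no slot is not pushed down to the scan.
bool can_push_down_common_expr(const Expr& e) { return references_any_slot(e); }
```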
6dd60c6ebb [Enhance](BE) Add -Wshadow-field compile option to avoid unexpected shadowing behavior (#25698)
* Fix `Tablet::_meta_lock` shadows member inherited from `BaseTablet`

* Add -Wshadow-field compile option to avoid unexpected shadowing behavior
2023-10-26 10:00:28 +08:00
693982fd1a [feature](decimal) support decimal256 (#25386) 2023-10-25 15:47:51 +08:00
6b2eed779c [feature](AuditLog) add scanRows scanBytes in auditlog (#25435) 2023-10-25 10:00:35 +08:00
0e0f8090f7 [refactor](text_convert)Use serde to replace text_convert. (#25543)
Remove text_convert and use serde to replace it.
2023-10-24 09:52:43 +08:00
b5ee4a9dbb [enhancement](profilev2) add some fields for profile v2 (#25611)
Add 3 counters for ExecNode:

ExecTime - Total execution time (excluding the execution time of children).
OutputBytes - The total number of bytes output to parent.
BlockCount - The total count of blocks output to parent.
2023-10-23 15:55:40 +08:00
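Assuming `ExecTime` is derived by subtracting the children's inclusive time from the node's inclusive time (the commit only defines the counter, not how it is computed), the definition reduces to the following sketch. `NodeTimes` and `exec_time_excluding_children` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-node timing record.
struct NodeTimes {
    int64_t inclusive_ns;              // time spent in this node and its subtree
    std::vector<int64_t> children_ns;  // inclusive time of each direct child
};

// Self time = inclusive time minus the inclusive time of the direct children.
int64_t exec_time_excluding_children(const NodeTimes& t) {
    int64_t child_total = 0;
    for (int64_t c : t.children_ns) child_total += c;
    assert(child_total <= t.inclusive_ns);
    return t.inclusive_ns - child_total;
}
```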
b964ab76b3 [refactor](shuffle) Simplify hash partitioning strategy (#25596) 2023-10-19 19:28:22 +08:00
54780c62e0 [improvement](executor)Using cgroup to implement cpu hard limit (#25489)
* Using cgroup to implement cpu hard limit

* code style
2023-10-19 18:56:26 +08:00
3d1206d325 [date](fix) modify push-down predicate for datev1 type (#25571)
For a comparison predicate, if either argument is of date type, both arguments must be cast to datetime before the predicate is pushed down to storage. This PR disables predicate push-down for this case.
2023-10-19 14:18:27 +08:00
9c9fc84f39 [feature](merge-cloud) Abstract BaseTablet for CloudTablet (#24929) 2023-10-18 20:29:04 +08:00
f75ee49cb4 [chore](fmt) Remove stringstream by fmt (#25474)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-10-16 21:31:54 +08:00
be27d4d921 [fix](broker-load) fix use_count() issue when doing broker load in debug mode (#25288)
When executing broker load in ASAN mode, BE may crash with error:
```
F20231010 18:18:17.044978 185490 block.cpp:694] Check failed: d.column->use_count() == 1 (3 vs. 1)
*** Check failure stack trace: ***
    @     0x55e9d94c4e46  google::LogMessage::SendToLog()
    @     0x55e9d94c1410  google::LogMessage::Flush()
    @     0x55e9d94c5689  google::LogMessageFatal::~LogMessageFatal()
    @     0x55e9c509f80d  doris::vectorized::Block::clear_column_data()
    @     0x55e9b6c170b3  doris::PlanFragmentExecutor::get_vectorized_internal()
    @     0x55e9b6c147e6  doris::PlanFragmentExecutor::open_vectorized_internal()
    @     0x55e9b6c12d9a  doris::PlanFragmentExecutor::open()
    @     0x55e9b6c18426  doris::PlanFragmentExecutor::execute()
    @     0x55e9b6945cca  doris::FragmentMgr::_exec_actual()
    @     0x55e9b696456c  doris::FragmentMgr::exec_plan_fragment()::$_0::operator()()
```

It may happen when there is a column mapping like:
```
(k1,v2,v3,v4,v5,v6,v7,v8)
set (k2=v4,k3=v4,k4=v4)
```

in the load statement.

This case is covered by Baidu test cases.
2023-10-12 17:04:29 +08:00
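The `(3 vs. 1)` in the failed check matches the mapping above: three target columns all derived from `v4`. The sketch below models columns as `std::shared_ptr<std::vector<int>>` (an assumption; Doris uses its own copy-on-write column pointers) to show how sharing one source column yields a `use_count()` of 3, and how giving each target its own copy keeps it at 1. The commit message does not say which fix was applied; `map_by_sharing` and `map_by_copying` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using ColumnPtr = std::shared_ptr<std::vector<int>>;

// Model of `set (k2=v4, k3=v4, k4=v4)` where every target aliases v4's column.
std::vector<ColumnPtr> map_by_sharing(const ColumnPtr& v4, int targets) {
    return std::vector<ColumnPtr>(static_cast<std::size_t>(targets), v4);
}

// Alternative that keeps use_count() == 1: each target gets its own copy.
std::vector<ColumnPtr> map_by_copying(const ColumnPtr& v4, int targets) {
    std::vector<ColumnPtr> out;
    for (int i = 0; i < targets; ++i) {
        out.push_back(std::make_shared<std::vector<int>>(*v4));
    }
    return out;
}
```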
bdb64eab73 [feature](meta) queries as table valued function (#25052)
1. Add queries view as table function.
2. Proxy result to other FEs and return merged results back to BE.

Co-authored-by: yiguolei <676222867@qq.com>
2023-10-12 16:26:14 +08:00
7434f80300 [pipelineX](refactor) Refactor pending finish dependency (#25181) 2023-10-10 11:56:02 +08:00
53b46b7e6c [FIX](filter) update for filter_by_select logic (#25007)
This PR updates the filter_by_select logic and changes the restrictions on delete:

* Only scalar types are supported in the WHERE condition of a DELETE statement.
* Only nullable columns and predicate columns support the filter_by_select logic, because non-scalar types cannot be pushed down to the storage layer and packed into a predicate column for filtering.
2023-10-09 21:27:40 +08:00
5c020be4d2 [Bug](join) corner case cause the mark join + null aware left join core dump in regression test in pipeline query engine (#25087) 2023-10-08 22:50:12 +08:00
7e9ffad933 [fix](ES catalog)Doris cannot parse ES date field without time zone (#24864)
1. Add support for Doris to parse ES date fields without time zone info, e.g. `2023-04-17T23:01:18.151`. Such a time is treated as UTC, since ES assumes that time fields without a time zone are in UTC.
2. Change the local time zone conversion to use the session variable time zone instead of the system local time zone.
2023-10-08 19:28:08 +08:00
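The UTC interpretation described in item 1 can be sketched as follows. Assumptions: the input has the exact shape `YYYY-MM-DDTHH:MM:SS` with an optional three-digit millisecond part, and the session time zone is given as a fixed UTC offset in seconds (real zones with daylight saving time need a time zone database). `days_from_civil` is Howard Hinnant's well-known proleptic Gregorian conversion; `parse_es_date_as_utc` and `to_session_local_ms` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's algorithm).
int64_t days_from_civil(int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;
    const int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + static_cast<int64_t>(doe) - 719468;
}

// Parse a zone-less timestamp and interpret it as UTC (epoch milliseconds).
// Assumes exactly three fractional digits when a fraction is present.
bool parse_es_date_as_utc(const std::string& s, int64_t* epoch_ms) {
    int y, mo, d, h, mi, sec, ms = 0;
    int n = std::sscanf(s.c_str(), "%4d-%2d-%2dT%2d:%2d:%2d.%3d",
                        &y, &mo, &d, &h, &mi, &sec, &ms);
    if (n != 6 && n != 7) return false;
    int64_t days = days_from_civil(y, static_cast<unsigned>(mo), static_cast<unsigned>(d));
    *epoch_ms = ((days * 24 + h) * 60 + mi) * 60 * 1000 + sec * 1000LL + ms;
    return true;
}

// Shift a UTC instant into a session time zone given as a fixed offset.
int64_t to_session_local_ms(int64_t utc_ms, int offset_seconds) {
    return utc_ms + static_cast<int64_t>(offset_seconds) * 1000;
}
```

For `2023-04-17T23:01:18.151` this yields 1681772478151 ms since the epoch; a session time zone of +08:00 then shifts it by 8 * 3600 * 1000 ms.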
c3d9f42a3e [fix](scanner) fix load cannot end when set exec_mem_limit (#25090) 2023-10-08 17:07:30 +08:00
7b2ff38401 query cpu hard limit based on doris scheduler (#24844) 2023-10-07 12:03:07 +08:00
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
430634367a [pipelineX](node)support file scan operator (#24924) 2023-09-27 22:10:43 +08:00
947b116318 [pipelineX](fix) Fix BE crash due to ES scan operator (#24983) 2023-09-27 20:45:38 +08:00
082bcd820b [feature](insert) Support wal for group commit insert (#23053) 2023-09-26 14:46:24 +08:00
513e37bdbf [pipelineX](node)support jdbc scan operator (#24851) 2023-09-26 10:02:51 +08:00
8191cd1dad [Bug](ScanNode) Fix potential incorrect query result caused by concurrent NewOlapScanNode initialization and Compaction (#24638)
* Optimize fetch delete predicates

* Fix incorrect query result when compaction eliminates delete predicates between `NewOlapScanNode::_init_scanners` and `NewOlapScanner::init`

* Fix be ut
2023-09-25 22:24:35 +08:00
3b4d8b4ac8 [pipelineX](feature) Support schema scan operator (#24850) 2023-09-25 14:42:25 +08:00
9412775686 remove useless variable in scanctx (#24849)
remove useless variable in scanctx
2023-09-25 14:36:18 +08:00
39e6512a21 [bug](scanner) Fix memory out of bound in scanner scheduler (#24840) 2023-09-25 09:58:26 +08:00
9579634eac [Debug](pipeline) add log of pipeline scan bug (#24804) 2023-09-25 08:38:31 +08:00
27eed937b3 [pipelineX](es scan) Support ES scan operator (#24824)
Support ES scan operator
2023-09-24 00:32:38 +08:00
8a85a75b8b [chore](scanner) check columns' nullable with schema (#24724)
Add a validation to prevent potential schema inconsistency issues.
2023-09-22 11:34:53 +08:00
c9b2f4cb92 [workload](pipeline) Add cgroup cpu controller (#24052) 2023-09-21 21:49:33 +08:00
1405b7ca82 [improve](scan) support lower the thread priority of scan thread (#24526)
This configuration item is used to lower the priority of scanner threads,
typically to ensure CPU scheduling for write operations.
2023-09-20 17:00:24 +08:00
c0df8fca20 [pipelineX](fix) Fix potential concurrent problem (#24651) 2023-09-20 13:00:58 +08:00
71dcb58db9 [improvement](scanner_schedule) reduce memory consumption of scanner (#24199)
* [improvement](scanner_schedule) reduce memory consumption of scanner

1. Limit scanners by memory consumption rather than by number of blocks.
2. Let the scheduler run the correct number of scanners instead of at least 1.
2023-09-19 21:36:23 +08:00
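Item 1 can be illustrated with a queue whose admission check is a byte budget rather than a block count. This is a hypothetical sketch (`MemoryBoundedBlockQueue` is not a Doris class); block sizes stand in for real blocks, and a scanner is allowed to run only while queued bytes are below the budget.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical queue that gates scanner scheduling on queued bytes.
class MemoryBoundedBlockQueue {
public:
    explicit MemoryBoundedBlockQueue(std::size_t max_bytes) : max_bytes_(max_bytes) {}

    // A scanner may produce another block only while under the byte budget.
    bool can_schedule_scanner() const { return queued_bytes_ < max_bytes_; }

    // Enqueue a produced block of the given size.
    void push(std::size_t block_bytes) {
        blocks_.push_back(block_bytes);
        queued_bytes_ += block_bytes;
    }

    // Consume the oldest block and release its bytes from the budget.
    std::size_t pop() {
        assert(!blocks_.empty());
        std::size_t b = blocks_.front();
        blocks_.pop_front();
        queued_bytes_ -= b;
        return b;
    }

    std::size_t queued_bytes() const { return queued_bytes_; }
    std::size_t queued_blocks() const { return blocks_.size(); }

private:
    std::size_t max_bytes_;
    std::size_t queued_bytes_ = 0;
    std::deque<std::size_t> blocks_;
};
```

With a budget of 100 bytes, one 60-byte block still lets another scanner run, but a second one pushes the total to 120 and stops scheduling until a block is consumed, regardless of how few blocks are queued.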
6a33e4639a [schedule](pipeline) Remove wait schedule time in pipeline query engine and change current queue to std::mutex (#24525)
This reverts commit 591aeaa98d1178e2e277278c7afeafef9bdb88d6.
2023-09-18 23:57:56 +08:00
d24f3efd4a [pipelineX](profile) Phase 1: refactor pipelineX detailed profile (#24322) 2023-09-15 16:14:05 +08:00
35c5d71549 [Improvement](join) some improvement of hash join (#23972)
some improvement of hash join
2023-09-14 17:55:35 +08:00