404 Commits

Author SHA1 Message Date
54b5d04ff9 [improve](csv_reader) handle csv reader error (#27892) 2023-12-02 10:05:02 +08:00
1706699e7e [fix](multi-catalog)support the max compute partition prune (#27154)
1. MaxCompute partition pruning:
previously, MC partitions could only be filtered by '=', which selects just one partition.
To support filtering multiple partitions and range operators ('>', '<', '>=', ...), partition pruning is now supported.

2. add max compute row count cache and partitionValues cache

3. add max compute regression case
2023-12-01 22:28:26 +08:00
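The partition pruning described in #27154 can be illustrated with a minimal sketch that filters partition values with range operators as well as `=`. The function name `prune_partitions`, the operator table, and the sample partition values are hypothetical and are not taken from the Doris source.

```python
import operator

# Hypothetical operator table; the commit names '=', '>', '<' and '>='.
# '<=' is added here only as an assumed member of the elided "..".
OPS = {
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def prune_partitions(partitions, predicates):
    """Keep only partition values that satisfy every (op, literal) predicate."""
    return [
        p for p in partitions
        if all(OPS[op](p, literal) for op, literal in predicates)
    ]


partitions = ["20231129", "20231130", "20231201", "20231202"]

# An '=' predicate selects a single partition.
single = prune_partitions(partitions, [("=", "20231201")])

# Range predicates can select several partitions at once.
ranged = prune_partitions(partitions, [(">=", "20231130"), ("<", "20231202")])
```

The sketch compares partition values as strings, which only orders correctly for fixed-width values such as the dates used here.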
68525fc112 [feature](profile) add RuntimeFilterInfo in merge profile #27869 2023-12-01 21:42:25 +08:00
7398c3daf1 [Feature-Variant](Variant Type) support variant type query and index (#27676) 2023-11-29 10:37:28 +08:00
f565f60bc3 [refactor](standard)BE:Initialize pointer variables in the class to nullptr by default (#27587) 2023-11-28 13:02:30 +08:00
d10a708fa2 [improve](jdbc catalog) add profile for jdbc scan (#27447) 2023-11-27 10:33:39 +08:00
dfe3a2dd01 [feature](mtmv)(3)Implementing multi table materialized views (#26146)
Introduction to Main Classes:
- MTMVService: MTMV services for other modules to call
- MTMVHookService: All operations that affect the MTMV
  - MTMVJobManager: All operations that affect the MTMV job
  - MTMVCacheManager: All operations that affect the MTMV Cache
- MTMVTask & MTMVJob: Inherit from job framework
2023-11-24 12:34:38 +08:00
2ea33518b0 [Opt](load) use batching to optimize auto partition (#26915)
use batching to optimize auto partition
2023-11-23 19:12:28 +08:00
b457856bd2 [chore](be) remove bthread scanner related codes (#27417) 2023-11-23 15:18:49 +08:00
5442e8d1fc [pipelineX](dependency) split different dependencies (#27366) 2023-11-22 12:50:39 +08:00
1ebb54afdc [fix](null equal) fix coredump of pushing eq_for_null (#27341) 2023-11-21 18:36:33 +08:00
459f75073f [pipelineX](dependency) remove OrDependency (#27242) 2023-11-20 13:05:34 +08:00
b1eef30b49 [pipelineX](dependency) Wake up task by dependencies (#26879)
---------

Co-authored-by: Mryange <2319153948@qq.com>
2023-11-18 03:20:24 +08:00
5d548935e0 [improvement](insert) support schema change and decommission for group commit (#26359) 2023-11-17 21:41:38 +08:00
52995c528e [fix](iceberg) iceberg use customer method to encode special characters of field name (#27108)
Fix two bugs:
1. Missing-column detection was case sensitive. Change the column name to lower case in FE for hive/iceberg/hudi.
2. Iceberg uses a custom method to encode special characters in column names. Decode the column name to match the right column in the parquet reader.
2023-11-17 18:38:55 +08:00
0491437a86 [Opt](scanner-scheduler) Optimize BlockingQueue, BlockingPriorityQueue and change remote scan thread pool. (#26784)
## Proposed changes
- Optimize `BlockingQueue`, `BlockingPriorityQueue` by swapping `notify` and `unlock` to reduce lock contention. Ref: https://www.boost.org/doc/libs/1_54_0/boost/thread/sync_bounded_queue.hpp
- Change remote scan thread pool to `PriorityQueue`.

### Test result
Before:
```
mysql> select  sum(lo_partkey)  from  lineorder;
+-----------------+
| sum(lo_partkey) |
+-----------------+
| 300021444265405 |
+-----------------+
1 row in set (1.11 sec)
```

After:
```
mysql> select  sum(lo_partkey)  from  lineorder;
+-----------------+
| sum(lo_partkey) |
+-----------------+
| 300021444265405 |
+-----------------+
1 row in set (0.80 sec)
```
2023-11-15 18:24:36 +08:00
5ad49dceaa [fix](scanner_schedule) scanner hangs due to negative num_running_scanners (#26816)
* [fix] scanner hangs due to negative num_running_scanners

Before the patch, num_running_scanners was increased after submitting,
so it could be decreased before it was increased. get_block_from_queue could
then see a negative value, and an expected submit did not happen.

Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>
2023-11-13 23:03:49 +08:00
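The ordering problem described in #26816 can be reproduced deterministically in a toy model. This is a sketch under stated assumptions: the class and method names are hypothetical, the "submitted" task runs inline to model the worst case where it finishes before the submitter continues, and increasing the counter before submitting is only presumed to be what the patch does.

```python
class ScannerCounter:
    """Toy model of a running-scanner counter (all names are hypothetical)."""

    def __init__(self):
        self.num_running = 0
        self.observed = []  # counter values seen when a scanner finishes

    def _on_scanner_done(self):
        self.num_running -= 1
        self.observed.append(self.num_running)

    def _submit(self):
        # Worst case: the task runs to completion immediately,
        # before control returns to the submitter.
        self._on_scanner_done()

    def submit_then_increase(self):
        # Ordering before the patch: increase after submitting.
        self._submit()
        self.num_running += 1

    def increase_then_submit(self):
        # Ordering that avoids the negative value: increase first.
        self.num_running += 1
        self._submit()


before = ScannerCounter()
before.submit_then_increase()

after = ScannerCounter()
after.increase_then_submit()
```

In the first ordering the finishing scanner observes `-1`; in the second it observes `0`, so a reader of the counter never sees a negative value.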
504ec324bb Revert "[refactor](scan) delete bloom_filter_predicate (#26499)" (#26851)
This reverts commit 2bb3ef198144954583aea106591959ee09932cba.
2023-11-13 16:27:23 +08:00
2f32a721ee [refactor](jni) unified jni framework for jdbc catalog (#26317)
This commit overhauls the JDBC connector logic within our project, transitioning from the previous mechanism of fetching data through JNI calls for individual ResultSet items to a more efficient and unified approach using the VectorTable data structure.
2023-11-13 14:28:15 +08:00
d9e0a9fa2e [enhancement](230) print max version and spec version when -230 happens (#26643)
More information is provided.
2023-11-13 09:57:22 +08:00
66054a5c78 [opt](scanner) increase the connection num of s3 client (#26795) 2023-11-12 00:29:11 -06:00
196fadc044 [enhancement](metrics) enhance visibility of flush thread pool (#26544) 2023-11-11 19:53:24 +08:00
a5565f68b2 [Refactor](opentelemetry) Remove opentelemetry (#26605) 2023-11-09 18:05:34 +08:00
2bb3ef1981 [refactor](scan) delete bloom_filter_predicate (#26499) 2023-11-07 19:37:31 +08:00
fa7a38b587 [fix](runtime filter) append late arrival runtime filters in vfilecanner (#25996)
`VFileScanner` will try to append late arrival runtime filters in each loop of `ScannerScheduler::_scanner_scan`.  However, `VFileScanner::_get_next_reader` only generates the `_push_down_conjuncts` in the first loop, so the late arrival runtime filters are ignored.
2023-11-07 09:50:35 +08:00
a5ef90dacc [enhancement](recover) support skipping missing version in select by session variable (#25654) 2023-11-02 20:01:51 +08:00
a4e415ab09 [feature](hive)Support hive tables after alter type. (#25138)
1. Refactor the decode logic for reading parquet. The parquet reader first reads the data according to the parquet physical type, and then performs a type conversion.

2. Support hive alter table.
2023-11-02 00:24:21 +08:00
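The two-step read described in #25138 (decode by physical type, then convert) might look like the following sketch. It assumes a column stored as plain little-endian INT32 values whose table type was later altered. The function names and the converter table are hypothetical and do not reflect the actual Doris reader.

```python
import struct


def read_physical_int32(raw: bytes):
    """Step 1: decode values according to the physical type (little-endian INT32)."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}i", raw))


# Step 2: hypothetical converters keyed by the table's current column type.
CONVERTERS = {
    "int": int,
    "bigint": int,
    "double": float,
    "string": str,
}


def read_column(raw: bytes, table_type: str):
    """Decode by physical type first, then convert to the table's column type."""
    values = read_physical_int32(raw)
    return [CONVERTERS[table_type](v) for v in values]


raw = struct.pack("<3i", 1, -2, 300)
as_string = read_column(raw, "string")
as_double = read_column(raw, "double")
```

Keeping the physical decode separate from the conversion means a changed column type only needs a new converter, not a new decoder.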
ec85e22506 [enhance](scanner) pass the tablet in NewOlapScanner's ctor (#26167) 2023-11-01 17:50:14 +08:00
f2874b9452 [bug](shared scan) Fix use-after-free when enable pipeline shared scanning (#26199)
When shared scan is enabled, all scanners are created by one instance. When the main instance reaches eos and quits, all of its state is released, but other instances may still get blocks from those scanners. So we must ensure that the scanners do not depend on any state of the main instance after it quits.
2023-11-01 15:51:20 +08:00
8c454a3287 [bug](scanner) Fix scanner core dump (#26156) 2023-10-31 22:23:32 +08:00
e20cab64f4 [improvement](scan) avoid too many scanners for file scan node (#25727)
Previously, when using a file scan node (e.g., querying a hive table), the max number of scanners for each scan node
was `doris_scanner_thread_pool_thread_num` (default 48).
If the query parallelism is N, the total number of scanners would be 48 * N, which is too many.

This PR changes the logic so that the max number of scanners for each scan node
is `doris_scanner_thread_pool_thread_num / query parallelism`, so the total number of scanners
is at most `doris_scanner_thread_pool_thread_num`.

Reducing the number of scanners can significantly reduce the memory usage of a query.
2023-10-29 17:41:31 +08:00
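The arithmetic in #25727 can be checked with a short sketch. The function name is hypothetical, the parallelism of 8 is an example value, and clamping the result to at least 1 is an assumption that the commit does not state.

```python
def max_scanners_per_scan_node(thread_pool_threads: int, parallelism: int) -> int:
    """Per-node scanner cap after the change: pool size divided by parallelism."""
    # Clamping to at least 1 is an assumption, not stated in the commit.
    return max(1, thread_pool_threads // parallelism)


threads = 48      # default doris_scanner_thread_pool_thread_num per the commit
parallelism = 8   # hypothetical query parallelism

# Before: every scan node could use the whole pool size.
total_before = threads * parallelism

# After: the per-node cap shrinks with parallelism, so the total stays bounded.
total_after = max_scanners_per_scan_node(threads, parallelism) * parallelism
```

With these values the total drops from 384 scanners to 48, which matches the bound of `doris_scanner_thread_pool_thread_num` given in the commit message.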
46d40b1952 [refactor](executor)Remove empty group logic #26005 2023-10-27 14:24:41 +08:00
c3527672a5 [refactor & pipelineX][pick fix] Pick fix of predicate pushdown to pipelineX (#25953)
Co-authored-by: JackDrogon <jack.xsuperman@gmail.com>
2023-10-26 18:04:43 +08:00
1ba8a9bae4 [feature-wip](executor)Fe send topic info to be (#25798) 2023-10-26 15:52:48 +08:00
6e1a4dbda2 [Fix](predicate pushdown) Common expression not acting on any slot should not be pushed down (#25901) 2023-10-26 11:20:12 +08:00
6dd60c6ebb [Enhance](BE) Add -Wshadow-field compile option to avoid unexpected shadowing behavior (#25698)
* Fix `Tablet::_meta_lock` shadows member inherited from `BaseTablet`

* Add -Wshadow-field compile option to avoid unexpected shadowing behavior
2023-10-26 10:00:28 +08:00
693982fd1a [feature](decimal) support decimal256 (#25386) 2023-10-25 15:47:51 +08:00
6b2eed779c [feature](AuditLog) add scanRows scanBytes in auditlog (#25435) 2023-10-25 10:00:35 +08:00
0e0f8090f7 [refactor](text_convert)Use serde to replace text_convert. (#25543)
Remove text_convert and use serde to replace it.
2023-10-24 09:52:43 +08:00
b5ee4a9dbb [enhancement](profilev2) add some fields for profile v2 (#25611)
Add 3 counters for ExecNode:

ExecTime - Total execution time(excluding the execution time of children).
OutputBytes - The total number of bytes output to parent.
BlockCount - The total count of blocks output to parent.
2023-10-23 15:55:40 +08:00
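The `ExecTime` counter from #25611 excludes the execution time of children. One way to picture that is the sketch below, which derives a node's own time by subtracting its children's totals. Whether Doris computes the counter by subtraction or measures it directly is not stated in the commit, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NodeProfile:
    name: str
    total_time_ms: int  # time including children (hypothetical input)
    children: list = field(default_factory=list)

    def exec_time_ms(self) -> int:
        """Time spent in this node alone: total minus the children's totals."""
        return self.total_time_ms - sum(c.total_time_ms for c in self.children)


scan = NodeProfile("scan", 70)
agg = NodeProfile("agg", 100, [scan])
```

Here `agg.exec_time_ms()` is 30 and `scan.exec_time_ms()` is 70, so the per-node values add up to the total of the root node.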
b964ab76b3 [refactor](shuffle) Simplify hash partitioning strategy (#25596) 2023-10-19 19:28:22 +08:00
54780c62e0 [improvement](executor)Using cgroup to implement cpu hard limit (#25489)
* Using cgroup to implement cpu hard limit

* code style
2023-10-19 18:56:26 +08:00
3d1206d325 [date](fix) modify push-down predicate for datev1 type (#25571)
For a comparison predicate, both arguments must be cast to datetime before being pushed down to storage if either one is of date type. This PR disables predicate push-down for this case.
2023-10-19 14:18:27 +08:00
9c9fc84f39 [feature](merge-cloud) Abstract BaseTablet for CloudTablet (#24929) 2023-10-18 20:29:04 +08:00
f75ee49cb4 [chore](fmt) Remove stringstream by fmt (#25474)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-10-16 21:31:54 +08:00
be27d4d921 [fix](broker-load) fix use_count() issue when doing broker load in debug mode (#25288)
When executing broker load in ASAN mode, BE may crash with error:
```
F20231010 18:18:17.044978 185490 block.cpp:694] Check failed: d.column->use_count() == 1 (3 vs. 1)
*** Check failure stack trace: ***
    @     0x55e9d94c4e46  google::LogMessage::SendToLog()
    @     0x55e9d94c1410  google::LogMessage::Flush()
    @     0x55e9d94c5689  google::LogMessageFatal::~LogMessageFatal()
    @     0x55e9c509f80d  doris::vectorized::Block::clear_column_data()
    @     0x55e9b6c170b3  doris::PlanFragmentExecutor::get_vectorized_internal()
    @     0x55e9b6c147e6  doris::PlanFragmentExecutor::open_vectorized_internal()
    @     0x55e9b6c12d9a  doris::PlanFragmentExecutor::open()
    @     0x55e9b6c18426  doris::PlanFragmentExecutor::execute()
    @     0x55e9b6945cca  doris::FragmentMgr::_exec_actual()
    @     0x55e9b696456c  doris::FragmentMgr::exec_plan_fragment()::$_0::operator()()
```

It may happen when there is a column mapping like:
```
(k1,v2,v3,v4,v5,v6,v7,v8)
set (k2=v4,k3=v4,k4=v4)
```

in load stmt.

Case is covered by Baidu test cases
2023-10-12 17:04:29 +08:00
bdb64eab73 [feature](meta) queries as table valued function (#25052) (#25052)
1. Add queries view as table function.
2. Proxy result to other FEs and return merged results back to BE.

Co-authored-by: yiguolei <676222867@qq.com>
2023-10-12 16:26:14 +08:00
7434f80300 [pipelineX](refactor) Refactor pending finish dependency (#25181) 2023-10-10 11:56:02 +08:00
53b46b7e6c [FIX](filter) update for filter_by_select logic (#25007)
This PR updates the filter_by_select logic and changes the delete limit:

- Only scalar types are supported in the WHERE condition of a delete statement.
- Only nullable columns and predicate columns support the filter_by_select logic, because non-scalar types cannot be pushed down to the storage layer and packed into a predicate column, yet the filter logic still has to be applied.
2023-10-09 21:27:40 +08:00
5c020be4d2 [Bug](join) corner case cause the mark join + null aware left join core dump in regression test in pipeline query engine (#25087) 2023-10-08 22:50:12 +08:00