14642 Commits

Author SHA1 Message Date
e718952e89 [fix](nereids)only enable colocate scan for one phase global partition topn in some condition (#26473) 2023-11-08 20:46:40 +08:00
0c1458f21f [fix](planner)isnull predicate can't be safely constant folded in inlineview (#25377)
Disable the is null predicate constant fold rule for inline views.
Consider this SQL:
select c.*
from (
select a.*, b.x
from test_insert a left join
(select 'some_const_str' x from test_insert) b on true
) c
where c.x is null;

When "c.x is null" is pushed down into c, the constant folding rule produces an empty result, because x is 'some_const_str' and "x is null" is evaluated to false. This is wrong.
2023-11-08 20:46:29 +08:00
d749d99fe2 [fix](nereids)don't normalize column name for base index (#26476) 2023-11-08 20:45:58 +08:00
d0960bac56 [Fix](partial update) Fix partial update info loss when the delete bitmaps of the committed transactions are calculated by the compaction (#26556)
a fix for #25147
2023-11-08 19:56:31 +08:00
223be6947c [opt](Nereids) let DataType toSql same with legacy planner (#26576) 2023-11-08 05:34:32 -06:00
ec87401581 Fix workload group regression test failed (#26579) 2023-11-08 19:23:49 +08:00
3bce6d3828 [Opt](orc-reader) Optimize orc string dict filter in not_single_conjunct case. (#26386)
Optimize the orc/parquet string dict filter in the not_single_conjunct case: filter the block by dict code first, then by the not_single_conjunct. Because dict codes are ints, they filter faster than strings.

For example:
```
select count(l_receiptdate) from lineitem_date_as_string where l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate  and l_receiptdate >= '1994-01-01' and l_receiptdate < '1995-01-01';
```
`l_receiptdate` and `l_shipmode` use string dict filtering, and `l_commitdate < l_receiptdate` is a not_single_conjunct that contains a dict filter field. The block is therefore filtered by dict code first and by the not_single_conjunct afterwards.
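The filtering order described above can be sketched in Python. This is only an illustration under assumptions, not the reader's actual code: the function names, the in-memory column layout, and the use of plain lists are all hypothetical.

```python
from typing import Callable, List, Set


def matching_codes(dictionary: List[str], pred: Callable[[str], bool]) -> Set[int]:
    """Evaluate a single-column string predicate once per dictionary entry."""
    return {code for code, value in enumerate(dictionary) if pred(value)}


def filter_block(
    shipmode_dict: List[str],
    shipmode_codes: List[int],
    receipt_dict: List[str],
    receipt_codes: List[int],
    commitdate: List[str],
) -> List[int]:
    """Return the row indexes that pass all predicates (hypothetical layout)."""
    # Step 1: turn the string predicates into sets of accepted int codes.
    ship_ok = matching_codes(shipmode_dict, lambda v: v in ("MAIL", "SHIP"))
    receipt_ok = matching_codes(
        receipt_dict, lambda v: "1994-01-01" <= v < "1995-01-01"
    )
    # Step 2: filter rows by int code only; no string comparison per row.
    survivors = [
        i
        for i in range(len(shipmode_codes))
        if shipmode_codes[i] in ship_ok and receipt_codes[i] in receipt_ok
    ]
    # Step 3: evaluate the multi-column conjunct only on the surviving rows,
    # decoding the dict-encoded l_receiptdate value where it is needed.
    return [i for i in survivors if commitdate[i] < receipt_dict[receipt_codes[i]]]
```

The string predicates run once per dictionary entry rather than once per row, so the per-row work before the multi-column conjunct is limited to integer set lookups.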

### Test Result:
Before:
mysql> select count(l_receiptdate) from lineitem_date_as_string where l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate  and l_receiptdate >= '1994-01-01' and l_receiptdate < '1995-01-01';
+----------------------+
| count(l_receiptdate) |
+----------------------+
|             49314694 |
+----------------------+
1 row in set (6.87 sec)

After:
mysql> select count(l_receiptdate) from lineitem_date_as_string where l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate  and l_receiptdate >= '1994-01-01' and l_receiptdate < '1995-01-01';
+----------------------+
| count(l_receiptdate) |
+----------------------+
|             49314694 |
+----------------------+
1 row in set (4.85 sec)
2023-11-08 18:03:18 +08:00
45c2fa62a4 [pipeline](exec) disable shared scan in default and disable shared scan in limit with where scan (#25952) 2023-11-08 17:51:12 +08:00
a6d2013802 [opt](nereids) use 2 phase agg above union all (#26245)
Forbid one-phase agg for the pattern agg-unionAll.
One-phase agg plan: agg-union-hashDistribute-children
Two-phase agg plan: agg(global)-hashDistribute-agg(local)-union-randomDistribute
The key point is that randomDistribute costs much less than hashDistribute, so the two-phase agg wins.
2023-11-08 17:15:53 +08:00
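To illustrate the two plan shapes in #26245, here is a rough Python sketch of a count-per-key aggregation above a union. It only counts how many records would have to be redistributed by key in each plan. It does not model the optimizer's cost formula, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def one_phase_shuffled_rows(children: List[List[str]]) -> int:
    """One-phase plan: every input row is hash-distributed before the agg."""
    return sum(len(child) for child in children)


def local_agg(children: List[List[str]]) -> List[Counter]:
    """Two-phase plan, phase 1: aggregate each input locally."""
    return [Counter(child) for child in children]


def two_phase_shuffled_rows(children: List[List[str]]) -> int:
    """Only one partial result per key and per input is hash-distributed."""
    return sum(len(partial) for partial in local_agg(children))


def global_agg(partials: List[Counter]) -> Dict[str, int]:
    """Two-phase plan, phase 2: merge the partial counts per key."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return dict(total)
```

With inputs `[["a", "b", "a"], ["b", "b", "c"]]`, the one-phase shape redistributes 6 rows by key, while the two-phase shape redistributes 4 partial results and still produces the same final counts.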
96d2e3394a [opt](meta) Improve the performance of getting expr name (#26341)
CaseFormat.UPPER_CAMEL.to(CaseFormat.LOWER_UNDERSCORE, name)
This call is time-consuming when made many times, so it is now made lazily, only when necessary.
2023-11-08 03:14:15 -06:00
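The lazy-call idea in #26341 can be sketched in Python. This is a rough analogue, not the Doris code: `Expr`, `expr_name`, and the regex-based conversion are hypothetical stand-ins for the Java class and the Guava `CaseFormat` call.

```python
import re
from functools import cached_property

# Counts how often the costly conversion actually runs (for illustration).
conversion_calls = {"count": 0}


def upper_camel_to_lower_underscore(name: str) -> str:
    """Simplified stand-in for the UPPER_CAMEL -> LOWER_UNDERSCORE conversion."""
    conversion_calls["count"] += 1
    # Insert "_" before every uppercase letter except the first, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


class Expr:
    """Hypothetical expression node that derives its name from a class name."""

    def __init__(self, class_name: str) -> None:
        # Nothing is converted at construction time.
        self._class_name = class_name

    @cached_property
    def expr_name(self) -> str:
        # Computed on first access only, then cached on the instance.
        return upper_camel_to_lower_underscore(self._class_name)
```

Expressions whose name is never read pay no conversion cost, and those whose name is read repeatedly pay it once.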
58bf79f79e [fix](move-memtable) pass load stream num to backends (#26198) 2023-11-08 16:16:33 +08:00
6637f9c15f Add enable_cgroup_cpu_soft_limit (#26510) 2023-11-08 15:52:13 +08:00
f018b00646 [ci](perf) add new pipeline of tpch-sf100 (#26334)
* [ci](perf) add new pipeline of tpch-sf100
Co-authored-by: stephen <hello-stephen@qq.com>
2023-11-08 15:32:02 +08:00
a3666aa87e [feature](decimal) support decimal256 when creating table (#26308) 2023-11-08 15:21:01 +08:00
xy
be7d49cb9f [Fix](doc) Fixed some errors in the documentation (#26410)
Co-authored-by: xingying01 <xingying01@corp.netease.com>
2023-11-08 15:19:34 +08:00
f80495da83 [fix](Nereids) ban right outer, right anti, full outer with bucket shuffle (#26529)
If a left bucket has no data, no instance is generated for that left bucket.
These joins must preserve all right-side data, but because the left instance
does not exist, the right data is discarded since no destination is set.

We ban these joins temporarily until the Coordinator can generate all instances
for the left side.
2023-11-08 01:16:50 -06:00
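The data loss described in #26529 can be illustrated with a small Python simulation. It is a sketch under assumptions: the bucketing rule (`key % num_buckets`) and the function name are hypothetical, and real instance scheduling is far more involved.

```python
from typing import Dict, List, Tuple


def route_right_rows(
    left_buckets: Dict[int, List[int]],
    right_keys: List[int],
    num_buckets: int,
) -> Tuple[List[int], List[int]]:
    """Route right-side rows to the instances created for left buckets."""
    # Instances exist only for left buckets that actually hold data.
    instances = {bucket for bucket, rows in left_buckets.items() if rows}
    delivered: List[int] = []
    dropped: List[int] = []
    for key in right_keys:
        bucket = key % num_buckets  # assumed bucketing rule
        if bucket in instances:
            delivered.append(key)
        else:
            # No destination is set, so the row is silently lost.
            dropped.append(key)
    return delivered, dropped
```

With `left_buckets = {0: [0, 2], 1: []}` and right keys `[1, 2, 3]`, keys 1 and 3 have no destination. A right outer, right anti, or full outer join would still need those rows in its output, which is why these join types are banned with bucket shuffle for now.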
5d4557938a [regression-test](fix) fix export_struct bug (#26561) 2023-11-08 14:57:07 +08:00
fc304c0e7c (metric) add histogramJsonMetric and nodeInfo (#26172)
Add histogramJsonMetric and nodeInfo to the interface "http://fe_host:http_port/metrics?type=json".
2023-11-08 14:46:18 +08:00
44b51bf0b9 [Feature](Variant) support variant load (#26572) 2023-11-08 00:37:57 -06:00
0f3e97f9c5 [regression-test][framework] support cases that can only run in non-concurrent-mode. (#26487) 2023-11-08 12:46:36 +08:00
9502cc758d [fix](regression) fix group commit regression test (#26557) 2023-11-08 11:57:07 +08:00
f8f3bc6a67 Revert "[Chore](ci)Temporarily cancel the mandatory restrictions of ShellCheck (#26553)" (#26565)
This reverts commit b7c81bc73625b26df746fc2213980c16b9d8f1a0.
2023-11-08 11:52:08 +08:00
a2419a8eb4 [enhancement](sink) refactor code of auto partition and where clause and enable them on sinkv2 (#26432)
For better performance and elasticity, the memtable was moved from loadchannel to
the sink and VTabletSinkV2 was introduced, so both VTabletWriter and
VTabletSinkV2 now distribute rows to tablets. Where clauses on MVs are
executed in VTabletWriter, and VTabletSinkV2 needs them too, so the common code
is moved to row distribution.

Layering the code by the rows' data flow makes it much easier to
understand and maintain.

ScanNode -> Sink/Writer (RowDistribution -> IndexChannel / DeltaWriter)
2023-11-08 11:51:40 +08:00
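The layering described in #26432 can be sketched as one shared row-distribution step used by both sink variants. This Python sketch is purely illustrative: the class and method names, the row shape, and the modulo-based tablet routing are assumptions, not the C++ implementation.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, int]


class RowDistribution:
    """Shared step: apply the where clause, then pick a tablet per row."""

    def __init__(
        self, num_tablets: int, where: Optional[Callable[[Row], bool]] = None
    ) -> None:
        self.num_tablets = num_tablets
        self.where = where

    def distribute(self, rows: List[Row]) -> Dict[int, List[Row]]:
        by_tablet: Dict[int, List[Row]] = {}
        for row in rows:
            if self.where is not None and not self.where(row):
                continue  # row filtered out by the where clause
            tablet = row["k"] % self.num_tablets  # assumed routing rule
            by_tablet.setdefault(tablet, []).append(row)
        return by_tablet


class WriterSketch:
    """Stands in for the VTabletWriter path (RowDistribution -> IndexChannel)."""

    def __init__(self, distribution: RowDistribution) -> None:
        self.distribution = distribution

    def write(self, rows: List[Row]) -> Dict[int, List[Row]]:
        return self.distribution.distribute(rows)


class SinkV2Sketch:
    """Stands in for the VTabletSinkV2 path (RowDistribution -> DeltaWriter)."""

    def __init__(self, distribution: RowDistribution) -> None:
        self.distribution = distribution

    def write(self, rows: List[Row]) -> Dict[int, List[Row]]:
        return self.distribution.distribute(rows)
```

Because both paths delegate to the same object, the where clause and routing logic live in one place instead of being duplicated per sink.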
7bad2e1d9f [opt](nereids) infer result column name in ctas and query stmt (#26055)
Infer the column name when the select item is an expression without an explicit alias, for CTAS and query statements in Nereids.
The name inference strategy is the same as in #24990.
2023-11-07 21:28:48 -06:00
f4cbbe6429 [chore](workflow) Fix security issues with pull_request_target (#26525)
In the Code Checks workflow, we use the pull_request_target event, which has write permission, so that the actions can comment on our PRs. We must be careful with the write permission and must never run any user code. The previous PR #24761 tried its best to achieve this goal.
However, one scenario was not considered (see #26494): #26494 attacks the workflow through a git submodule. This PR fixes this scenario by explicitly checking out the external action in the workflow.
2023-11-08 11:23:13 +08:00
47ba4aaf30 [Enhancement](load) add timer and partitions number limit (#26549)
add timer and partitions number limit
2023-11-08 11:22:40 +08:00
c93c8f6105 [opt](nereids) make AGG_SCALAR_SUBQUERY_TO_WINDOW_FUNCTION rewrite rule #25969 2023-11-08 11:04:08 +08:00
290070074a [refactor](stats) refactor collection logic and opt some config (#26163)
1. not collect partition stats anymore
2. merge insert of stats
3. delete period collector since it is useless
4. remove enable_auto_sample
5. move some config related to stats to global session variable

Before this PR, analyzing a table issued (column count * 2) inserts.

After this PR, the insert count for analyzing a table is reduced to column count / insert_merge_item_count.

According to my test, analyzing tpch lineitem takes 1 insert SQL statement.
2023-11-08 11:03:44 +08:00
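The insert-count arithmetic from #26163 can be worked through in Python. Two assumptions are made here that the commit message does not state: the division is rounded up, and the merge count of 200 used in the example is only an illustrative value. TPC-H lineitem has 16 columns.

```python
import math


def inserts_before(column_count: int) -> int:
    """Before the PR: two inserts per column."""
    return column_count * 2


def inserts_after(column_count: int, insert_merge_item_count: int) -> int:
    """After the PR: merged inserts (rounding up is an assumption)."""
    return math.ceil(column_count / insert_merge_item_count)


LINEITEM_COLUMNS = 16   # TPC-H lineitem column count
ASSUMED_MERGE_COUNT = 200  # illustrative value, not taken from the commit

print(inserts_before(LINEITEM_COLUMNS))
print(inserts_after(LINEITEM_COLUMNS, ASSUMED_MERGE_COUNT))
```

Under these assumptions the count drops from 32 inserts to 1, which matches the single insert reported for lineitem in the commit message.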
1544110c1b [feature-wip](arrow-flight)(step4) Support other DML and DDL statements, besides Select (#25919)
Design Documentation Linked to #25514
2023-11-08 10:50:42 +08:00
806461721c [opt](Nereids) remove Nondeterministic trait from date related functions (#26444) 2023-11-07 20:43:37 -06:00
b7c81bc736 [Chore](ci)Temporarily cancel the mandatory restrictions of ShellCheck (#26553)
To let #26525 pass.
2023-11-08 10:42:22 +08:00
daea751a98 [Improvement](auditlog) add column catalog for audit log and audit log table (#26403) 2023-11-08 10:25:15 +08:00
Pxl
3cdbb6e637 [Bug](materialized-view) fix some bugs on create mv with percentile_approx (#26528)
1. percentile_approx has the wrong symbol
2. fnCall.getParams() returns obsolete children
2023-11-08 10:09:37 +08:00
519b48648e [fix](move-memtable) handle status when possible (#26526) 2023-11-08 10:09:06 +08:00
607a5d25f1 [feature](streamload) support HTTP request with chunked transfer (#26520) 2023-11-08 10:07:05 +08:00
a354f87d2e [refactor](pipeline) simplify runtime state ctor (#26461) 2023-11-08 09:57:09 +08:00
70bc8600a9 [fix](regression) fix regression framework bug: if real test result is negative, it will miss check test result (#25734) 2023-11-08 09:05:58 +08:00
a6756b4660 [pipelineX](bug) Fix broadcast buffer reference count (#26545) 2023-11-08 00:14:48 +08:00
4995ca8fba [fix](move-memtable) ensure segment is flushed before add segment (#26522) 2023-11-07 22:42:16 +08:00
32b36d3c9c [refactor](move-memtable) rename proto OpenStreamSink to OpenLoadStream (#26527) 2023-11-07 22:41:20 +08:00
3faf3b4118 [chore] Print FE version even if it has been started (#26427)
In the previous implementation, `bin/start_fe.sh --version` will
complain that "Frontend running as process xxx. Stop it first."

To show the version:
1. `bin/start_fe.sh --version` will print version info to fe.out
2. `bin/start_fe.sh --console --version` will print version info to stdout
2023-11-07 22:33:02 +08:00
5d80e7dc2f [Improvement](pipelineX) Improve local exchange on pipelineX engine (#26464) 2023-11-07 22:11:44 +08:00
ceccc451fa [enhancement](Nereids): add LOG info to show the phase of NereidsPlanner. (#26538)
Add LOG info to show the phase of NereidsPlanner; this info can be used for debugging.
2023-11-07 21:46:54 +08:00
2be6c9ff7d [enhancement](Nereids): when the DPhyper failed, roll back to cascades without join reorder (#26390)
when the DPhyper failed, roll back to cascades without join reorder
2023-11-07 20:05:40 +08:00
5e9a23e643 [fix](prepare statement) Not supported such prepared statement if prepare a forward master sql (#26512) 2023-11-07 19:41:44 +08:00
2bb3ef1981 [refactor](scan) delete bloom_filter_predicate (#26499) 2023-11-07 19:37:31 +08:00
d6eb3324a1 [cleanup](load) remove unused code in sink v2 header (#26521) 2023-11-07 19:35:12 +08:00
ad1f635070 [Feature](auditloader) Plugin auditloader use auth token to avoid using cleartext passwords in config (#26278)
Doris FE checks whether a stream load HTTP request carries an auth token after the password check fails;
the audit-log loader plugin uses the auth token if use_auth_token is set to true in the plugin config.

Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>
2023-11-07 19:14:57 +08:00
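The check order described in #26278 (password first, auth token as a fallback) can be sketched in Python. The function name and parameters are hypothetical. How the FE actually obtains and compares the token is not stated in the commit message.

```python
import hmac
from typing import Optional


def authenticate(
    password_ok: bool, request_token: Optional[str], cluster_token: str
) -> bool:
    """Accept a request if the password check passes or the token matches."""
    if password_ok:
        return True
    if request_token is None:
        # Password failed and no token was supplied.
        return False
    # Constant-time comparison, to avoid leaking token contents via timing.
    return hmac.compare_digest(
        request_token.encode("utf-8"), cluster_token.encode("utf-8")
    )
```

A plugin configured with use_auth_token set to true would then send only the token, so no cleartext password needs to be stored in its config.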
38a14c3325 [docs](fix) add bitmap_remove in sidebars.json (#26523) 2023-11-07 19:01:27 +08:00
2feed57f47 [Fix](fs_benchmark_tools) Fix run_fs_benchmark.sh classpath issue. (#26183)
Fix run_fs_benchmark.sh classpath issue.
2023-11-07 18:43:30 +08:00