Commit Graph

9013 Commits

Author SHA1 Message Date
b8d8cf1ac9 [regression](test) script for teamcity to check if pr need run build (#16937)
* [regression](test) script for teamcity to check if pr need run build

* Update check-pr-if-need-run-build.sh

fix

* Update check-pr-if-need-run-build.sh

fix

---------

Co-authored-by: stephen <hello_stephen@qq.com>
2023-03-01 15:59:31 +08:00
48ef61780d [refactor](struct-type) refactor and clean unused code for struct type (#17257)
remove unused code for struct type
2023-03-01 15:49:31 +08:00
0732eb54bc [feature](struct-type) support csv format stream load for struct type (#17143)
Refactor from_string method in data_type_struct.cpp to support csv format stream load for struct type.
2023-03-01 15:48:48 +08:00
Pxl
62440f3140 [Bug](Materialized-View) forbid mv rewrite on create view and remove duplicate method getIsM… (#17194)
1. Forbid mv rewrite on create view to avoid select failures
2. Remove duplicate method getIsMaterialized
2023-03-01 13:46:56 +08:00
ff8902370c [improvement](doc) Supplementary Bulk Deletion Notes (#17113)
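* Supplement notes on batch deletion

* Following the introduction earlier in the batch delete document, users may enable the `show_hidden_columns` session variable to check whether a table supports batch delete.
* After running the DELETE/MERGE load jobs from the examples, executing `select count(*) from xxx` in the same session may return a result that differs from what is expected.
* It may not be immediately obvious that the deleted rows are also returned because of the session variable enabled earlier.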

* supplement batch deletion notes for English doc
2023-03-01 13:35:20 +08:00
b8ebcdff78 [Bug](bloomfilter) Fix wrong result using bloomfilter with date type (#17225) 2023-03-01 12:29:20 +08:00
979cf42d7a [Bug](decimalv3) Use correct decimal scale for function round (#17232)
Co-authored-by: maochongxin <maochongxin@gmail.com>
2023-03-01 12:28:41 +08:00
cbdf1af2d5 [feature](Nereids): pushdown Alias through Join. (#17150) 2023-03-01 11:33:37 +08:00
62ec74f4e7 segcompaction featuring verticalcompaction (#16731)
This patchset applies the following changes:

use the vertical compaction mechanism to do segcompaction
basic (WIP) refactoring to separate segcompaction logic from BetaRowsetWriter
add segcompaction-specific ut and regression tests
2023-03-01 10:55:40 +08:00
48afd77e37 [enhancement](k8s) Support fqdn mode for fe in k8s environment (#16315) 2023-03-01 10:54:39 +08:00
774f66c6bc [Enhancement](test) enhance regression test of java udf (#17251)
When the java udf regression test is run many times and a run fails for some reason,
the next run hits an error like: function already exist

Issue Number: close #xxx
2023-03-01 09:34:40 +08:00
e687f3badd Revert "[feature-wip](BE http)Support BE http service using brpc (#16123)" (#17219)
This reverts commit 049ecccc578802496e5421db19e21e7eb256699d.
Merge back after streamload is handled.
2023-03-01 09:18:25 +08:00
2f471de675 [fix](FileCache) load file cache before start up daemon threads (#17199)
Daemon threads in doris_main.cpp upload tablet metrics periodically, which uses StorageEngine::instance(). However, the file cache is loaded in the main thread; when loading takes a long time, StorageEngine::instance() can still be a null pointer when the daemon threads call it.
2023-03-01 08:35:57 +08:00
e22a9ecc3b [enhancement](execute model) using thread pool to execute report or join task instead of starting too many threads (#17212)
Doris starts a report thread and a join thread during fragment execution. Creating and destroying threads very frequently causes many problems: jemalloc may not behave well and may crash.

jemalloc/jemalloc#1405

It is better to use a thread pool for these tasks.
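
As a rough illustration of the idea (not the Doris implementation, which lives in the C++ backend), the sketch below reuses a fixed set of worker threads for many short tasks instead of creating one thread per task. The `report_status` function and the pool size are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def report_status(fragment_id: int) -> str:
    # Hypothetical stand-in for a per-fragment report task.
    return f"fragment {fragment_id} reported"


def run_reports(fragment_ids, max_workers: int = 4):
    # The pool creates at most max_workers threads and reuses them for
    # every submitted task, so 100 tasks do not mean 100 thread creations.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the order of the inputs.
        return list(pool.map(report_status, fragment_ids))


results = run_reports(range(100))
```

The number of threads stays bounded no matter how many fragments are executed, which is the property the commit message is after.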
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-01 08:35:27 +08:00
68e9a66aa0 [Enhancement](schema scanner) add SchemaScanner profile (#17230)
Add some profile information to the schema scanner to facilitate performance optimization.

Example:

SchemaScanner:
      -  FillBlockTime:  9s131ms
      -  GetDbTime:  12.816ms
      -  GetDescribeTime:  1s645ms
      -  GetTableTime:  25.433ms
2023-03-01 08:34:27 +08:00
cfc2d45795 [typo](docs) fix typo (#17208) 2023-03-01 07:41:21 +08:00
eeca16d7a0 [fix](doc)adjust Flink connector document structure and add SchemaChange example (#17231) 2023-03-01 07:40:56 +08:00
475368c62d [typo](docs) Add some details about AES encryption. (#17243)
* [typo](docs) Add some details about AES encryption.

* Update aes.md

* Update aes.md

* Update aes.md

* Update aes.md
2023-03-01 07:40:11 +08:00
7369261f33 [typo](docs)update hight-concurrent-point-query.md (#17248)
Co-authored-by: liuxiaodong <liuxiaodong1@corp.netease.com>
2023-03-01 07:37:27 +08:00
5d096f9fcb [community] update collaborators (#17263) 2023-03-01 07:30:06 +08:00
91bf497a88 [fix](Nereids): provide BUCKETED property only when child's property is enforced for agg (#17229) 2023-03-01 01:11:42 +08:00
cf7e97dd27 [chore](thirdparty) Fix the linkage errors for librdkafka (#17181)
Fix the linkage errors for librdkafka
2023-02-28 21:37:27 +08:00
7f6209ede4 [fix](routine load) fix be core dump while use routine load (#17222) 2023-02-28 21:01:38 +08:00
e3d7f7c8d8 [feature](Nereids) add test framework for cost model (#17071)
Add a test framework for the cost model, following the paper "Testing the Accuracy of Query Optimizers".
2023-02-28 20:59:07 +08:00
1b58f7f2ea [fix](Nereids) json object and json array should always be not nullable (#17205) 2023-02-28 20:26:21 +08:00
9bcc3ae283 [Fix](DOE)Fix be core dump when parse es epoch_millis date format (#17100) 2023-02-28 20:09:35 +08:00
94cea0ea6d [fix](Nereids) Disable preagg when there is DELETE_SIGN filter (#17157)
1. disable preAgg when there is delete sign when binding relation
2. keep the preAgg status in SelectMaterializeIndex rule
2023-02-28 19:59:05 +08:00
1ced23018e [improvement](test) modify test_clean_label test to support running multiple times (#17223)
Use a uuid in the load label to avoid the "Label already used" issue on the second run.
Only for master; already fixed in branch-1.2-lts.
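
A minimal sketch of the uuid-in-label idea, assuming a label is just a string built from a prefix; the helper name `make_load_label` is hypothetical and is not taken from the regression suite.

```python
import uuid


def make_load_label(prefix: str) -> str:
    # uuid4().hex is 32 hexadecimal characters with no dashes, so every
    # run of the test gets a label that was not used by an earlier run.
    return f"{prefix}_{uuid.uuid4().hex}"


first = make_load_label("test_clean_label")
second = make_load_label("test_clean_label")
```

Two calls with the same prefix produce different labels, so a second run of the test does not collide with the first.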
2023-02-28 19:24:55 +08:00
459874be50 Revert "[Bug](log) add some log to find out bug (#16518)" (#17178)
This reverts commit d1c6b8114053e8c754c979d8d3fbf5c880d361d2.
2023-02-28 19:23:12 +08:00
838107b8e8 [enhancement](Nereids) support inverted index scan (#17197) 2023-02-28 19:07:49 +08:00
e34e72dd51 [feature](Nereids) show cost and execution time for each plan (#17123)
1. Show cost in the optimized plan
2. Show plan time, schedule time, and so on in the profile
2023-02-28 18:59:57 +08:00
34813bae13 [improvement](meta) make database,table,column names to support unicode (replace PR #13467 with this) (#14531)
Make database, table, column, and other names support unicode by changing the LABEL_REGEX, COMMON_NAME_REGEX, COMMON_TABLE_NAME_REGEX, and COLUMN_NAME_REGEX regular expressions in class FeNameFormat.
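
The actual patterns in FeNameFormat are not shown in this log. The sketch below only illustrates what a unicode-aware name check can look like; the pattern, the 64-character limit, and the function name are all assumptions made for the example.

```python
import re

# Hypothetical pattern, NOT the FeNameFormat regex: one unicode letter
# followed by up to 63 unicode word characters (letters, digits, underscore).
UNICODE_NAME = re.compile(r"[^\W\d_]\w{0,63}")


def is_valid_name(name: str) -> bool:
    # In Python 3, \w matches unicode word characters for str patterns,
    # so CJK names pass the same check as ASCII names.
    return UNICODE_NAME.fullmatch(name) is not None
```

Under this assumed pattern, `is_valid_name("订单表")` and `is_valid_name("orders_2023")` are accepted, while a name starting with a digit or containing a space is rejected.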

P.S. @SharpRay has transferred PR #13467 to me, and I'm responsible for the task now. There will be some modifications during the review period, so I created a new PR and the original #13467 can be closed. Thanks.
2023-02-28 18:50:36 +08:00
dd4bd3f360 [enhancement](Nereids) consider string literal coercion when search function signature (#17175) 2023-02-28 17:59:52 +08:00
727853017c [regression-test](Nereids) add agg function, tvf, generator, window function test cases (#16824)
add agg_function, tvf, generator, window_function test for nereids and add more feature to gen.py
2023-02-28 17:51:39 +08:00
1dd2a41e38 [vectorized](bug) fix window function can't handle first row of beyond (#17084)
Issue Number: close #16845
2023-02-28 17:30:23 +08:00
79e49dad93 [fix](brpc) solve bthread hang problem (#17206) 2023-02-28 17:10:05 +08:00
f8e20ceca2 [Improvement](jsonb) add support for JSONB type for arrow (#16869)
Add support for the JSONB type for arrow, which is used by the doris spark/flink connector.
2023-02-28 17:04:13 +08:00
9db56201a6 [refactor](Nereids) Refactor rewrite framework to speed up plan (#17126)
This PR refactors the rewrite framework from memo to plan tree and speeds up the analyze/rewrite stage.

Changes:
- Abandon the memo in the analysis/rewrite stage, so that we can skip some high-cost actions in the memo, such as creating new GroupExpressions, deduplicating GroupExpressions, and updating children to GroupPlan.
- Change most rules to static rules, so that we can skip initializing lots of rules in Analyzer/Rewriter for every query. Some rules need context, such as visitor rules, and creating the rule at runtime makes them easy to use, so a `custom` rule helps us create them.
- Remove the `logger` field in Job. Jobs are generated in large quantities at runtime and do not need a logger, so this saves the considerable time spent initializing loggers.
- Skip rules where possible, e.g. `SelectMaterializedIndexWithoutAggregate`: skip selecting an mv if the table has no rollup.
- Add caches for frequent operations, such as Job.getDisableRules and Plan.getUnboundExpression.
- New bottom-up rewrite rule: it can keep traversing the new plans returned by rules. This feature depends on `Plan.mutableState`, so it is necessary to add this variable field to the plan. If the plan were fully immutable, we would have to use withXxx to renew the plan and set the state for it, which costs more runtime overhead and development work. Another reason is that we need multiple mutable states, e.g. whether a rule has been applied, and whether this plan is managed by the rewrite framework. The upside of mutable state is efficiency, but I suggest not using mutable state directly in rules where possible; if we need it, please wrap the mutable state in the framework so that it is updated and released correctly. A good example is `AppliedAwareRuleCondition`, which can update and get the state of whether a rule has been applied to this plan before.
- Merge some rules, invoking multiple rules in one traversal.
- Refactor `EliminateUnnecessaryProject` with CustomRewritor, fixing the problem of eliminating a Project that decides the query output order; the cases are limit(project) and sort(project).
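
To make the bottom-up rewrite with framework-owned mutable state more concrete, here is a small Python sketch. It is not the Nereids code (which is Java): the `Plan` class, the `applied_rules` set, and the `merge_adjacent_projects` rule are hypothetical stand-ins for the plan tree, `Plan.mutableState`, and a real rewrite rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Plan:
    op: str
    children: List["Plan"] = field(default_factory=list)
    # Mutable bookkeeping owned by the framework, never touched by rules
    # (a hypothetical analogue of Plan.mutableState).
    applied_rules: Set[str] = field(default_factory=set, compare=False)


# A rule returns a new plan when it matches, or None when it does not.
Rule = Callable[[Plan], Optional[Plan]]


def merge_adjacent_projects(plan: Plan) -> Optional[Plan]:
    # Hypothetical rule: project(project(x)) becomes project(x).
    if plan.op == "project" and len(plan.children) == 1:
        child = plan.children[0]
        if child.op == "project":
            return Plan("project", child.children)
    return None


def rewrite_bottom_up(plan: Plan, rules: Dict[str, Rule]) -> Plan:
    # Children first, so every rule sees already-rewritten subtrees.
    plan.children = [rewrite_bottom_up(c, rules) for c in plan.children]
    changed = True
    while changed:
        changed = False
        for name, rule in rules.items():
            if name in plan.applied_rules:
                continue  # already known not to match this node
            result = rule(plan)
            if result is None:
                plan.applied_rules.add(name)
            else:
                # Keep traversing the new plan returned by the rule; it
                # starts with an empty applied set, so rules are retried.
                plan = result
                changed = True
                break
    return plan


tree = Plan("project", [Plan("project", [Plan("project", [Plan("scan")])])])
rewritten = rewrite_bottom_up(tree, {"merge_projects": merge_adjacent_projects})
```

The rule itself only returns plans; recording which rules were tried on which node is left to `rewrite_bottom_up`, which mirrors the suggestion to wrap mutable state inside the framework.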

TODO: add trace for new rewrite framework

benchmark:

legacy optimizer:
```
+-----------+---------------+---------------+---------------+
|  SQL ID   |      avg      |      min      |      max      |
+-----------+---------------+---------------+---------------+
|  SQL 1    |       1.39 ms |          0 ms |          9 ms |
|  SQL 2    |       1.38 ms |          0 ms |         10 ms |
|  SQL 3    |       2.05 ms |          1 ms |         18 ms |
|  SQL 4    |       0.89 ms |          0 ms |          9 ms |
|  SQL 5    |       1.74 ms |          1 ms |         11 ms |
|  SQL 6    |       2.00 ms |          1 ms |         13 ms |
|  SQL 7    |       1.83 ms |          1 ms |         15 ms |
|  SQL 8    |       0.92 ms |          0 ms |          7 ms |
|  SQL 9    |       2.60 ms |          1 ms |         19 ms |
|  SQL 10   |       3.54 ms |          2 ms |         28 ms |
|  SQL 11   |       3.04 ms |          1 ms |         18 ms |
|  SQL 12   |       3.26 ms |          2 ms |         16 ms |
|  SQL 13   |       1.10 ms |          0 ms |         10 ms |
|  SQL 14   |       2.90 ms |          1 ms |         13 ms |
|  SQL 15   |       1.18 ms |          0 ms |          9 ms |
|  SQL 16   |       1.05 ms |          0 ms |         13 ms |
|  SQL 17   |       1.03 ms |          0 ms |          7 ms |
|  SQL 18   |       0.94 ms |          0 ms |          7 ms |
|  SQL 19   |       1.47 ms |          0 ms |         13 ms |
|  SQL 20   |       0.47 ms |          0 ms |          4 ms |
|  SQL 21   |       0.54 ms |          0 ms |          5 ms |
|  SQL 22   |       3.34 ms |          1 ms |         19 ms |
|  SQL 23   |       7.97 ms |          4 ms |         44 ms |
|  SQL 24   |      11.11 ms |          7 ms |         28 ms |
|  SQL 25   |       0.98 ms |          0 ms |          8 ms |
|  SQL 26   |       0.83 ms |          0 ms |          7 ms |
|  SQL 27   |       0.93 ms |          0 ms |         16 ms |
|  SQL 28   |       2.19 ms |          1 ms |         18 ms |
|  SQL 29   |       3.23 ms |          1 ms |         20 ms |
|  SQL 30   |      59.99 ms |         51 ms |         81 ms |
|  SQL 31   |       2.65 ms |          1 ms |         18 ms |
|  SQL 32   |       2.47 ms |          1 ms |         17 ms |
|  SQL 33   |       2.30 ms |          1 ms |         16 ms |
|  SQL 34   |       0.66 ms |          0 ms |          8 ms |
|  SQL 35   |       0.63 ms |          0 ms |          6 ms |
|  SQL 36   |       2.25 ms |          1 ms |         15 ms |
|  SQL 37   |       5.97 ms |          3 ms |         20 ms |
|  SQL 38   |       5.73 ms |          3 ms |         21 ms |
|  SQL 39   |       6.32 ms |          4 ms |         23 ms |
|  SQL 40   |       8.61 ms |          5 ms |         35 ms |
|  SQL 41   |       6.29 ms |          4 ms |         28 ms |
|  SQL 42   |       6.04 ms |          4 ms |         15 ms |
|  SQL 43   |       5.81 ms |          3 ms |         16 ms |
+-----------+---------------+---------------+---------------+
| TOTAL AVG |       4.22 ms |       2.47 ms |      17.05 ms |
| TOTAL SUM |     181.62 ms |        106 ms |        733 ms |
+-----------+---------------+---------------+---------------+
```

nereids with memo rewrite framework(old):
```
+-----------+---------------+---------------+---------------+
|  SQL ID   |      avg      |      min      |      max      |
+-----------+---------------+---------------+---------------+
|  SQL 1    |       3.61 ms |          1 ms |         20 ms |
|  SQL 2    |       3.47 ms |          2 ms |         16 ms |
|  SQL 3    |       3.27 ms |          1 ms |         18 ms |
|  SQL 4    |       2.23 ms |          1 ms |         12 ms |
|  SQL 5    |       3.60 ms |          1 ms |         20 ms |
|  SQL 6    |       2.73 ms |          1 ms |         17 ms |
|  SQL 7    |       3.04 ms |          1 ms |         23 ms |
|  SQL 8    |       3.53 ms |          2 ms |         20 ms |
|  SQL 9    |       3.74 ms |          2 ms |         22 ms |
|  SQL 10   |       3.66 ms |          2 ms |         18 ms |
|  SQL 11   |       3.93 ms |          2 ms |         15 ms |
|  SQL 12   |       4.85 ms |          2 ms |         27 ms |
|  SQL 13   |       4.41 ms |          2 ms |         28 ms |
|  SQL 14   |       5.16 ms |          2 ms |         41 ms |
|  SQL 15   |       4.33 ms |          2 ms |         33 ms |
|  SQL 16   |       4.94 ms |          2 ms |         51 ms |
|  SQL 17   |       3.27 ms |          1 ms |         25 ms |
|  SQL 18   |       2.78 ms |          1 ms |         22 ms |
|  SQL 19   |       3.51 ms |          1 ms |         42 ms |
|  SQL 20   |       1.84 ms |          1 ms |         13 ms |
|  SQL 21   |       3.47 ms |          1 ms |         66 ms |
|  SQL 22   |       5.21 ms |          2 ms |         29 ms |
|  SQL 23   |       5.55 ms |          3 ms |         25 ms |
|  SQL 24   |       4.21 ms |          2 ms |         28 ms |
|  SQL 25   |       3.47 ms |          1 ms |         23 ms |
|  SQL 26   |       3.03 ms |          2 ms |         21 ms |
|  SQL 27   |       3.07 ms |          1 ms |         17 ms |
|  SQL 28   |       4.51 ms |          3 ms |         22 ms |
|  SQL 29   |       4.97 ms |          3 ms |         21 ms |
|  SQL 30   |      11.95 ms |          8 ms |         33 ms |
|  SQL 31   |       3.92 ms |          2 ms |         23 ms |
|  SQL 32   |       3.74 ms |          2 ms |         15 ms |
|  SQL 33   |       3.62 ms |          2 ms |         22 ms |
|  SQL 34   |       4.60 ms |          1 ms |         55 ms |
|  SQL 35   |       3.47 ms |          2 ms |         25 ms |
|  SQL 36   |       3.34 ms |          2 ms |         18 ms |
|  SQL 37   |       4.77 ms |          2 ms |         23 ms |
|  SQL 38   |       4.44 ms |          2 ms |         39 ms |
|  SQL 39   |       4.52 ms |          2 ms |         23 ms |
|  SQL 40   |       5.50 ms |          3 ms |         30 ms |
|  SQL 41   |       5.01 ms |          2 ms |         24 ms |
|  SQL 42   |       4.32 ms |          2 ms |         24 ms |
|  SQL 43   |       4.29 ms |          2 ms |         42 ms |
+-----------+---------------+---------------+---------------+
| TOTAL AVG |       4.11 ms |       1.91 ms |      26.30 ms |
| TOTAL SUM |     176.88 ms |         82 ms |       1131 ms |
+-----------+---------------+---------------+---------------+
```

nereids with plan tree rewrite framework(new):
```
+-----------+---------------+---------------+---------------+
|  SQL ID   |      avg      |      min      |      max      |
+-----------+---------------+---------------+---------------+
|  SQL 1    |       3.21 ms |          1 ms |         18 ms |
|  SQL 2    |       3.99 ms |          1 ms |         76 ms |
|  SQL 3    |       2.93 ms |          1 ms |         21 ms |
|  SQL 4    |       2.13 ms |          1 ms |         21 ms |
|  SQL 5    |       2.43 ms |          1 ms |         30 ms |
|  SQL 6    |       2.08 ms |          1 ms |         11 ms |
|  SQL 7    |       2.03 ms |          1 ms |         11 ms |
|  SQL 8    |       2.27 ms |          1 ms |         22 ms |
|  SQL 9    |       2.42 ms |          1 ms |         16 ms |
|  SQL 10   |       2.65 ms |          1 ms |         14 ms |
|  SQL 11   |       2.78 ms |          1 ms |         14 ms |
|  SQL 12   |       3.09 ms |          1 ms |         19 ms |
|  SQL 13   |       2.33 ms |          1 ms |         13 ms |
|  SQL 14   |       2.66 ms |          1 ms |         16 ms |
|  SQL 15   |       2.34 ms |          1 ms |         15 ms |
|  SQL 16   |       2.04 ms |          1 ms |         30 ms |
|  SQL 17   |       2.09 ms |          1 ms |         17 ms |
|  SQL 18   |       1.87 ms |          1 ms |         15 ms |
|  SQL 19   |       2.21 ms |          1 ms |         50 ms |
|  SQL 20   |       1.32 ms |          0 ms |         12 ms |
|  SQL 21   |       1.63 ms |          1 ms |         11 ms |
|  SQL 22   |       2.75 ms |          1 ms |         30 ms |
|  SQL 23   |       3.44 ms |          2 ms |         17 ms |
|  SQL 24   |       2.01 ms |          1 ms |         14 ms |
|  SQL 25   |       1.58 ms |          1 ms |         11 ms |
|  SQL 26   |       1.53 ms |          0 ms |         13 ms |
|  SQL 27   |       1.62 ms |          1 ms |         12 ms |
|  SQL 28   |       2.90 ms |          1 ms |         21 ms |
|  SQL 29   |       3.04 ms |          2 ms |         17 ms |
|  SQL 30   |      10.54 ms |          7 ms |         49 ms |
|  SQL 31   |       2.61 ms |          1 ms |         21 ms |
|  SQL 32   |       2.42 ms |          1 ms |         14 ms |
|  SQL 33   |       2.13 ms |          1 ms |         14 ms |
|  SQL 34   |       1.69 ms |          1 ms |         14 ms |
|  SQL 35   |       1.87 ms |          1 ms |         15 ms |
|  SQL 36   |       2.37 ms |          1 ms |         21 ms |
|  SQL 37   |       3.06 ms |          1 ms |         15 ms |
|  SQL 38   |       4.09 ms |          1 ms |         31 ms |
|  SQL 39   |       5.81 ms |          2 ms |         43 ms |
|  SQL 40   |       4.55 ms |          2 ms |         34 ms |
|  SQL 41   |       3.49 ms |          1 ms |         20 ms |
|  SQL 42   |       2.75 ms |          1 ms |         26 ms |
|  SQL 43   |       2.81 ms |          1 ms |         14 ms |
+-----------+---------------+---------------+---------------+
| TOTAL AVG |       2.78 ms |       1.19 ms |      21.35 ms |
| TOTAL SUM |     119.56 ms |         51 ms |        918 ms |
+-----------+---------------+---------------+---------------+
```
2023-02-28 16:02:09 +08:00
a1db5c6f52 [fix](vec) crash caused by not-implemented function in ColumnFixedLengthObject (#17215) 2023-02-28 15:27:06 +08:00
37551a0163 [enhancement](Nereids) implement to legacy literal for datetimev2 literal (#17177) 2023-02-28 14:51:38 +08:00
3e40467ce6 [Bug](vec) Fix chinese pinyin order by (#17152)
Bug: some Chinese characters are not sorted by pinyin under GBK encoding.

CREATE TABLE `test_convert` (
                 `a` varchar(100) NULL
             ) ENGINE=OLAP
               DUPLICATE KEY(`a`)
               DISTRIBUTED BY HASH(`a`) BUCKETS 3
               PROPERTIES (
               "replication_allocation" = "tag.location.default: 1"
               );
insert into test_convert values("b"), ("a"), ("c"), ("睿"), ("多"), ("丝");
Query OK, 6 rows affected (0.03 sec)
{'label':'insert_ca73a6acc2194d5b_888218a3949355a6', 'status':'VISIBLE', 'txnId':'18068'}
mysql [test]>select * from test_convert;
+------+
| a    |
+------+
| a    |
| c    |
| 丝   |
| b    |
| 多   |
| 睿   |
+------+
6 rows in set (0.01 sec)
mysql [test]>select * from test_convert order by convert(a using gbk);          
+------+
| a    |
+------+
| a    |
| b    |
| c    |
| 多   |
| 丝   |
| 睿   |
+------+
6 rows in set (0.01 sec)
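
The same ordering can be reproduced outside the database by comparing the GBK byte encoding of each string, which is roughly what `order by convert(a using gbk)` relies on. This is a sketch of the idea in Python, not of the Doris implementation.

```python
words = ["b", "a", "c", "睿", "多", "丝"]

# Sort by the raw GBK bytes of each string. ASCII letters encode to single
# bytes below 0x80, so they sort before the two-byte Chinese characters.
by_gbk = sorted(words, key=lambda s: s.encode("gbk"))
# by_gbk == ['a', 'b', 'c', '多', '丝', '睿'], matching the second query above
```

Note that 睿 sorts after 丝 even though "rui" precedes "si" alphabetically, so GBK byte order only approximates pinyin order.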
2023-02-28 14:29:56 +08:00
bf5037d6d5 [fix](OrcReader) typo in analyze null values (#17156)
Fix a typographical error in analyzing null values for OrcReader.
2023-02-28 14:29:13 +08:00
598038e674 [improvement](parquet-reader)support parquet data page v2 (#17054)
Support parquet data page v2.
Parquet data on AWS Glue now uses data page v2, which was not supported before.
2023-02-28 14:23:45 +08:00
89542b3e50 [test](test_multi_partition) add multi partition by datetime test (#17068)
Add a multi-partition-by-datetime test. This feature was created by @catpineapple.
2023-02-28 12:10:24 +08:00
4d8b310de0 [fix](struct-type) fix struct subtype support (#17081)
1. Make sure all sub types which STRUCT supported work correctly;
2. remove unused variable `_need_validate_data`;
3. lazy init min or max decimal to support nested DecimalV2 column validate;

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2023-02-28 11:37:07 +08:00
1771d1e5e7 [fix](value-range) fix the value range of non-nullable column contains null causes query short key index error. (#16943)
2023-02-28 11:15:32 +08:00
8141a1f0c5 [feature](cooldown)get tablet return cooldown conf (#17074)
* get tablet return cooldown conf
2023-02-28 11:14:11 +08:00
26a46d8c3f [fix](cooldown) Handle full clone with cooldowned rowsets (#17069) 2023-02-28 11:04:01 +08:00
17c8123371 [test](regression) add some regression cases on constant evaluation. (#16599) 2023-02-28 10:57:37 +08:00
da2e9f4179 [improvement](test)Add nereids p0 pipeline trigger not required (#17193) 2023-02-28 10:51:54 +08:00