4027 Commits

Author SHA1 Message Date
97bf07fe26 [enhancement](Nereids) add new distributed cost model (#17556)
Add a new distributed cost model in Nereids. The new cost model models the cost of the pipeline execution engine by dividing cost into start and run costs. They are:
* START COST: the cost from starting to emitting the first tuple
* RUN COST: the cost from emitting the first tuple to emitting all tuples

For a parent operator and its child operator, we assume the following timeline:
  ```
  child start ---> child run --------------------> finish
             |---> parent start ---> parent run -> finish
  ```

Therefore, in the parallel model, we can get:
  ```
  start_cost(parent) = start_cost(child) + start_cost(parent)
  run_cost(parent) = max(run_cost(child), start_cost(parent) + run_cost(parent))
  ```
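The two formulas above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Nereids implementation: the `Cost` class, the `combine` function, and the sample numbers are all hypothetical, and `parent_own` stands for the parent's cost before its child is folded in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    """Cost of one operator, split as in the commit message (hypothetical type)."""
    start: float  # cost until the first tuple is emitted
    run: float    # cost from the first tuple until all tuples are emitted


def combine(child: Cost, parent_own: Cost) -> Cost:
    """Fold a child's accumulated cost into the parent's own cost.

    The parent cannot start before the child emits its first tuple;
    after that, child and parent run in parallel.
    """
    start = child.start + parent_own.start
    run = max(child.run, parent_own.start + parent_own.run)
    return Cost(start=start, run=run)


scan = Cost(start=1.0, run=10.0)        # made-up numbers for a child operator
filter_own = Cost(start=0.5, run=4.0)   # made-up numbers for its parent
total = combine(scan, filter_own)
print(total)  # Cost(start=1.5, run=10.0)
```

In this example the child's run phase dominates, so the combined run cost stays at the child's value.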
2023-03-15 11:22:31 +08:00
85080ee3c3 [vectorized](function) support array_map function (#17581) 2023-03-15 10:51:29 +08:00
5ab758674e [fix](planner) nested loop join with left semi generate repeat result (#17767) 2023-03-15 09:56:44 +08:00
45fcdaabc7 [Bug](catalog) Fix fetching information_schema table timed out (#17692) (#17694)
Co-authored-by: hugoluo <hugoluo@tencent.com>
2023-03-15 09:56:24 +08:00
16a4dc0a85 [enhancement](profile) Disable profiling for the internal query (#17720) 2023-03-15 09:48:29 +08:00
9b047d2c94 Feat: Add byte size to TTypedesc in TExpr, which will be used to carry scalarType information. (#17757)
Co-authored-by: libinfeng <libinfeng@selectdb.com>
2023-03-15 08:24:32 +08:00
7872f3626a [feature](Nereids): Rewrite InPredicate to disjunction if the InPredicate has fewer than 3 elements (#17646)
* [feature](Nereids): Rewrite InPredicate to disjunction if the InPredicate has fewer than 3 elements

* fix SimplifyRange
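
A minimal sketch of the rewrite described in this commit, written over SQL-like strings rather than Nereids expression trees. The function name `rewrite_in_predicate` and the `threshold` parameter are hypothetical; only the "fewer than 3 elements" rule comes from the commit title.

```python
def rewrite_in_predicate(column: str, items: list[str], threshold: int = 3) -> str:
    """Return a SQL-like string for `column IN (items)`.

    Lists with fewer than `threshold` items become equality comparisons
    joined by OR; longer lists are left as an IN predicate.
    """
    if not items:
        raise ValueError("IN list must not be empty")
    if len(items) < threshold:
        return " OR ".join(f"{column} = {item}" for item in items)
    return f"{column} IN ({', '.join(items)})"


print(rewrite_in_predicate("a", ["1", "2"]))       # a = 1 OR a = 2
print(rewrite_in_predicate("a", ["1", "2", "3"]))  # a IN (1, 2, 3)
```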
2023-03-15 08:23:56 +08:00
02220560c5 [Improvement](multi catalog)Hive splitter. Get HDFS/S3 splits by using FileSystem api (#17706)
Use the FileSystem API to get splits for files in HDFS/S3 instead of calling InputFormat.getSplits.
The splits are based on blocks in HDFS/S3.
2023-03-15 00:25:00 +08:00
b28f31f98d [fix](meta) fix show create table result of hive table (#17677)
Make it usable in Hive.

Current issue: the types of partition columns are wrapped in ``, which is not legal in Hive. One problem case:

CREATE TABLE t3p_parquet(
id int,
name string)
PARTITIONED BY (
dt int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://path/to/t3p_parquet'
TBLPROPERTIES (
'transient_lastDdlTime'='1671700883')
2023-03-14 22:50:35 +08:00
e46077fbf4 print group id for physical plan node (#17742) 2023-03-14 22:35:08 +08:00
6348819c27 [fix](Nereids) remove bitmap_union_int(bigint) signature (#17356) 2023-03-14 20:42:47 +08:00
ff9e03e2bf [Feature](add bitmap udaf) add the bitmap intersection and difference set for mixed calculation of udaf (#15588)
* Add the bitmap intersection and difference set for mixed calculation of udaf

Co-authored-by: zhangbinbin05 <zhangbinbin05@baidu.com>
2023-03-14 20:40:37 +08:00
65f71d9e06 [enhance](nereids) broadcast cost calculate (#17711)
Update the broadcast join cost estimate according to the BE implementation.
There is an enhancement on BE: in a broadcast join, BE builds only one hash table, not instanceNum hash tables.
2023-03-14 19:45:03 +08:00
699159698e [enhancement](planner) support update from syntax (#17639)
Support the `UPDATE ... FROM` syntax.

Note: enable_concurrent_update is not supported yet.

```
UPDATE <target_table>
  SET <col_name> = <value> [ , <col_name> = <value> , ... ]
  [ FROM <additional_tables> ]
  [ WHERE <condition> ]
```

for example:
t1
```
+----+----+----+-----+------------+
| id | c1 | c2 | c3  | c4         |
+----+----+----+-----+------------+
| 3  | 3  | 3  | 3.0 | 2000-01-03 |
| 2  | 2  | 2  | 2.0 | 2000-01-02 |
| 1  | 1  | 1  | 1.0 | 2000-01-01 |
+----+----+----+-----+------------+
```

t2
```
+----+----+----+------+------------+
| id | c1 | c2 | c3   | c4         |
+----+----+----+------+------------+
| 4  | 4  | 4  |  4.0 | 2000-01-04 |
| 2  | 20 | 20 | 20.0 | 2000-01-20 |
| 5  | 5  | 5  |  5.0 | 2000-01-05 |
| 1  | 10 | 10 | 10.0 | 2000-01-10 |
| 3  | 30 | 30 | 30.0 | 2000-01-30 |
+----+----+----+------+------------+
```

t3
```
+----+
| id |
+----+
| 1  |
| 5  |
| 4  |
+----+
```

do update
```sql
 update t1 set t1.c1 = t2.c1, t1.c3 = t2.c3 * 100 from t2 inner join t3 on t2.id = t3.id where t1.id = t2.id;
```

the result
```
+----+----+----+--------+------------+
| id | c1 | c2 | c3     | c4         |
+----+----+----+--------+------------+
| 3  | 3  | 3  |    3.0 | 2000-01-03 |
| 2  | 2  | 2  |    2.0 | 2000-01-02 |
| 1  | 10 | 1  | 1000.0 | 2000-01-01 |
+----+----+----+--------+------------+
```
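
To make the semantics of the example easier to check, the same update can be replayed over in-memory dictionaries. This is only a sketch of what the statement computes, not how the planner executes it; the dictionaries mirror the tables above, and only the `c1` and `c3` columns of t2 are kept because they are the only ones the SET clause reads.

```python
# t1 keyed by id, all columns as shown in the table above
t1 = {
    3: {"c1": 3, "c2": 3, "c3": 3.0, "c4": "2000-01-03"},
    2: {"c1": 2, "c2": 2, "c3": 2.0, "c4": "2000-01-02"},
    1: {"c1": 1, "c2": 1, "c3": 1.0, "c4": "2000-01-01"},
}
# t2 keyed by id, reduced to the columns used by the SET clause
t2 = {
    4: {"c1": 4, "c3": 4.0},
    2: {"c1": 20, "c3": 20.0},
    5: {"c1": 5, "c3": 5.0},
    1: {"c1": 10, "c3": 10.0},
    3: {"c1": 30, "c3": 30.0},
}
t3 = {1, 5, 4}  # ids in t3

# FROM t2 INNER JOIN t3 ON t2.id = t3.id
joined = {rid: row for rid, row in t2.items() if rid in t3}

# WHERE t1.id = t2.id, then SET t1.c1 = t2.c1, t1.c3 = t2.c3 * 100
for rid, src in joined.items():
    if rid in t1:
        t1[rid]["c1"] = src["c1"]
        t1[rid]["c3"] = src["c3"] * 100

print(t1[1])  # {'c1': 10, 'c2': 1, 'c3': 1000.0, 'c4': '2000-01-01'}
```

Only id 1 appears in t1, t2, and t3 at once, which is why rows 2 and 3 of t1 are unchanged in the result.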
2023-03-14 19:26:30 +08:00
f1dde20315 [enhancement](nereids) Refactor statistics (#17637)
1. Support more expression types
2. Support derivation with histograms
3. Use StatisticRange to abstract the logic
4. Use Statistics rather than StatisDeriveResult
2023-03-14 13:10:55 +08:00
be3a7e69cd [refactor](Nereids): polish code SemiJoinLogicalJoinTranspose. (#17740) 2023-03-14 12:48:58 +08:00
3a97190661 [fix](Nereids) Compare plan with their output rather than string in UnrankTest (#17698)
After adding a unique ID, the UnrankTest fails because each plan has a different ID in its string form.
To avoid the effect of the unique ID, compare plans by their output rather than by their string.
2023-03-14 11:10:06 +08:00
5b39fa9843 [Feature](vec)(quantile_state): support quantile state in vectorized engine (#16562)
* [Feature](vectorized)(quantile_state): support vectorized quantile state functions
1. quantile columns currently support only non-nullable values
2. add some regression test cases
3. set default enable_quantile_state_type = true
---------

Co-authored-by: spaces-x <weixiang06@meituan.com>
2023-03-14 10:54:04 +08:00
f3c6ee5961 [Enhance](ComputeNode) Support scheduling ES scan nodes to compute nodes (#16533)
ES scan nodes can now be scheduled to compute nodes.
2023-03-14 00:13:24 +08:00
9b7596f1c6 [Feature](Dynamic schema table) step1 support schema change expression (#17494)
1. introduce a new type `VARIANT` to encapsulate dynamically generated columns, hiding the details of the types and names of newly generated columns
2. introduce a new expression `SchemaChangeExpr` to perform schema changes in an extensible way
2023-03-13 15:12:42 +08:00
c302fa2564 [Feature](array-function) Support array_pushfront function (#17584) 2023-03-13 14:26:02 +08:00
ac944e2ac1 [fix](cooldown)Fix bug for storage policy in dynamic partition (#17665)
* fix bug for partition storage policy
2023-03-13 14:13:55 +08:00
be5147c32e [enhancement](feservice) catch throwable and print log for frontend service (#17708)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-13 11:27:00 +08:00
782001c75b [fix](planner) project should be done inside subquery (#17630)
WITH t0 AS(
SELECT report.date1 AS date2 FROM(
SELECT DATE_FORMAT(date, '%Y%m%d') AS date1 FROM cir_1756_t1
) report GROUP BY report.date1
),
t3 AS(
SELECT date_format(date, '%Y%m%d') AS date3
FROM cir_1756_t2
)
SELECT row_number() OVER(ORDER BY date2)
FROM(
SELECT t0.date2 FROM t0 LEFT JOIN t3 ON t0.date2 = t3.date3
) tx;

DATE_FORMAT(date, '%Y%m%d') was calculated in the GROUP BY node, which is wrong. This expression should be calculated inside the subquery.
2023-03-13 11:10:27 +08:00
55c42da511 [Feature](array) Support array<decimalv3> data type (#16640) 2023-03-13 10:48:13 +08:00
39b5682d59 [Pipeline](shared_scan_opt) Support shared scan opt in pipeline exec engine 2023-03-13 10:33:57 +08:00
a0a2809324 [Enhancement](multi-catalog) support hms event deserialization for HDP/CDH Hive versions. (#17660)
Some HDP/CDH Hive versions use gzip to compress the message body of the hms NotificationEvent,
so com.qihoo.finance.hms.event.MetastoreEventFactory cannot deserialize it correctly.
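
A rough sketch of handling a possibly gzip-compressed message body. It assumes the body is JSON and that compressed bodies are raw gzip bytes recognizable by the gzip magic number; neither assumption is confirmed by the commit, and real Hive versions may add further encoding. The function name `decode_event_message` is hypothetical.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_event_message(body: bytes) -> dict:
    """Decompress the body if it looks gzip-compressed, then parse it as JSON."""
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))


plain = json.dumps({"db": "d", "table": "t"}).encode("utf-8")
compressed = gzip.compress(plain)
print(decode_event_message(plain) == decode_event_message(compressed))  # True
```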
2023-03-13 09:47:28 +08:00
b0d1166989 [fix](meta) fix concurrent modification exception and potential NPE (#17602) 2023-03-12 22:12:07 +08:00
46dcf69644 [fix](jdbc-catalog) avoid calculate driver's md5 when replaying edit log (#17693) 2023-03-12 22:11:45 +08:00
54e5c71e52 [fix](planner) Fix NPE when update stats by profile 2023-03-12 21:40:47 +08:00
a651926ba9 [fix](fqdn) Add UnknownHostException handling logic in FQDNManager to avoid an active IP being incorrectly assigned to a dead BE or dead FE (#17689)
1. If a BE is dead and its IP is not changed by FQDNManager, the old IP may after a while be reused by another new, alive pod. This may cause two BEs to share the same IP, which is unexpected.
2. When enable_fqdn is false, the user can still set a hostname for a BE when adding a backend.

Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-03-12 21:12:33 +08:00
0d05e4cce0 [Improvement](multi-catalog) The interface of external Splitter. WIP (#17390)
This PR introduces a splitter interface for external tables.
The splitter interface contains one method, getSplits, which is used by QueryScanProvider to get the external file splits.
For Hive/Iceberg/TVF, a split is a file block. For ES, it is a shard.
This PR also moves the getSplits logic in FileScanProviderIf to the new Splitter interface.
In the future, we may unify internal tables as well.
2023-03-12 20:11:08 +08:00
a452db35da [improvement](filecache)Change the hash field of the backend (#17499)
The IP of a backend may change,
so use the backend id as the hash field.
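
The motivation can be illustrated with a small hashing sketch. It uses rendezvous (highest-random-weight) hashing purely as a stand-in; the commit does not say which hashing scheme Doris uses, and the backend records, ids, IPs, and function names here are all made up.

```python
import hashlib


def _score(key: str, backend_field: str) -> int:
    """Deterministic score for a (key, backend) pair."""
    digest = hashlib.md5(f"{backend_field}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_backend(key: str, backends: list[dict], field: str) -> int:
    """Return the id of the backend with the highest score for `key`,
    hashing on the given backend field ("id" or "ip")."""
    best = max(backends, key=lambda be: _score(key, str(be[field])))
    return best["id"]


before = [{"id": 10001, "ip": "10.0.0.1"}, {"id": 10002, "ip": "10.0.0.2"}]
# Same backends after backend 10001 was restarted with a new IP
after = [{"id": 10001, "ip": "10.0.0.9"}, {"id": 10002, "ip": "10.0.0.2"}]

keys = [f"file_{i}" for i in range(100)]
stable = all(
    pick_backend(k, before, "id") == pick_backend(k, after, "id") for k in keys
)
print(stable)  # True
```

Hashing on the id keeps every key on the same backend across an IP change, whereas hashing on the IP gives no such guarantee.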
2023-03-12 20:04:25 +08:00
b93e553958 [enhance](Nereids): allow empty hash condition (#17699) 2023-03-12 18:51:22 +08:00
11fbe07221 [refactor](Nereids) Refactor all rewrite logical unit tests by match-pattern (#17691) 2023-03-12 18:49:12 +08:00
d774162a53 [minor](Nereids): rename rule (#17509) 2023-03-12 00:17:07 +08:00
9745ee60a7 [fix](priv) fix bug of grant priv on ctl.db.* not work (#17612)
Currently, granting xxx_priv on ctl.db.* to user_a does not work. When user_a switches to ctl,
they cannot see or use any database.
2023-03-11 22:27:26 +08:00
692d510edb [fix](schema_hash) remove useless schema_hash param in tablet and replica url (#17489)
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-03-11 21:34:47 +08:00
d7cb5cf3db [feature](nereids) add session var: dump_nereids_memo (#17666)
* dump_nereids_memo

* print groupexpr id
2023-03-11 13:40:15 +08:00
3231fab8c2 [feature](nereids) add unique id for groupExpression and plan node (#17628)
* add unique id for groupExpression and plan node

* fix ut
2023-03-11 13:23:41 +08:00
db9692a114 [feature](Nereids): convert CrossJoin to InnerJoin. (#17681) 2023-03-11 13:23:28 +08:00
3745e6c18a [fix](Nereids): order of project's logical properties is different with that of project expression (#17648) 2023-03-11 00:26:54 +08:00
051ab7a9c6 [refactor](Nereids): refactor Join-Dependent Predicate Duplication. (#17653) 2023-03-10 22:19:45 +08:00
566d133610 [enhancement](Nereids) Refactor EliminateLimitTest and EliminateFilterTest by match-pattern (#17631) 2023-03-10 21:24:36 +08:00
9cfa61b402 [Enhancement](HttpServer) Provide authentication interface for BE (#17073)
Add an authentication interface in FE for BE
2023-03-10 16:34:47 +08:00
9ae5ec4dc5 [fix](nereids) PushdownExpressionsInHashCondition contains duplicate column and WindowExpression miss column stats (#17624)
tpcds: q47 and q57
1. PushdownExpressionsInHashCondition: project contains duplicate column
2. WindowExpression stats calculation: missing column stats
2023-03-10 16:08:43 +08:00
739e043c8d [fix](publish) add retry publish when succeed replica num less than quorum and transaction not VISIBLE (#17453)
For some reason, the number of replicas on which a transaction's publish succeeded
may be less than the quorum, so the transaction's status cannot become VISIBLE.
When the last publish failed, the publish task for the affected replica of the tablet
on that backend must be retried until it succeeds, so that the transaction can become VISIBLE.
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
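
The retry condition described above can be sketched as follows. The majority rule `replica_num // 2 + 1` is an assumption about how the quorum is defined, and the function names and replica ids are hypothetical; the sketch only shows which replicas would need another publish attempt.

```python
def quorum_size(replica_num: int) -> int:
    """Majority quorum (assumed definition)."""
    return replica_num // 2 + 1


def replicas_to_retry(publish_ok: dict[int, bool]) -> list[int]:
    """Return the replica ids whose publish must be retried.

    `publish_ok` maps replica id to whether its publish succeeded.
    Returns an empty list when the quorum is already reached.
    """
    succeeded = sum(publish_ok.values())
    if succeeded >= quorum_size(len(publish_ok)):
        return []
    return sorted(rid for rid, ok in publish_ok.items() if not ok)


print(replicas_to_retry({101: True, 102: False, 103: False}))  # [102, 103]
print(replicas_to_retry({101: True, 102: True, 103: False}))   # []
```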
2023-03-10 12:02:15 +08:00
Pxl
1a549edac2 [Chore](third-party) upgrade thrift from 0.13 to 0.16 (#17202)
Upgrade thrift from 0.13 to 0.16.
Thrift's release notes: https://github.com/apache/thrift/blob/master/CHANGES.md
2023-03-10 11:33:16 +08:00
f84b8b7c8b [fix](priv) fix extract real user name when do privilege check (#17488)
fix extract real user name of root/admin
2023-03-10 10:22:13 +08:00
fe6361f4b5 [regression-test](p0) fix some unstable p0 cases (#17518)
drop database before create
remove some large, unused debug log
2023-03-10 10:21:39 +08:00