Commit Graph

4047 Commits

c29582bd57 [pipeline](split by segment)support segment split by scanner (#17738)
* support segment split by scanner

* revise code per code review comments
2023-03-16 15:25:52 +08:00
b3d8be7cac [fix](cooldown)add push conf for alter storage policy (#17818)
* add push conf for alter storage policy
2023-03-16 14:27:27 +08:00
ee7226348d [FIX](Map) fix map compaction error (#17795)
In the compaction case, in-memory map offsets arrive at the same OLAP converter as a range from 0 to 0+size,
but they should be continuous across pages within one segment writer.
E.g.:
last block with map offsets: [3, 6, 8, ..., 100]
this block with map offsets: [5, 10, 15, ..., 100]
The same converter should record the last offset so that later incoming offsets follow it.
After conversion, the current offsets should be [105, 110, 115, ..., 200]; the column writer then just calls append_data() to append pages with the correct offset data.
2023-03-16 13:54:01 +08:00
0086fdbbdb [enhancement](planner) support delete from using syntax (#17787)
Support the DELETE ... USING syntax; this syntax is only supported on the UNIQUE KEY model.

Use the result of `t2` joined with `t3` to remove rows from `t1`:

```sql
-- create t1, t2, t3 tables
CREATE TABLE t1
  (id INT, c1 BIGINT, c2 STRING, c3 DOUBLE, c4 DATE)
UNIQUE KEY (id)
DISTRIBUTED BY HASH (id)
PROPERTIES('replication_num'='1', "function_column.sequence_col" = "c4");

CREATE TABLE t2
  (id INT, c1 BIGINT, c2 STRING, c3 DOUBLE, c4 DATE)
DISTRIBUTED BY HASH (id)
PROPERTIES('replication_num'='1');

CREATE TABLE t3
  (id INT)
DISTRIBUTED BY HASH (id)
PROPERTIES('replication_num'='1');

-- insert data
INSERT INTO t1 VALUES
  (1, 1, '1', 1.0, '2000-01-01'),
  (2, 2, '2', 2.0, '2000-01-02'),
  (3, 3, '3', 3.0, '2000-01-03');

INSERT INTO t2 VALUES
  (1, 10, '10', 10.0, '2000-01-10'),
  (2, 20, '20', 20.0, '2000-01-20'),
  (3, 30, '30', 30.0, '2000-01-30'),
  (4, 4, '4', 4.0, '2000-01-04'),
  (5, 5, '5', 5.0, '2000-01-05');

INSERT INTO t3 VALUES
  (1),
  (4),
  (5);

-- remove rows from t1
DELETE FROM t1
  USING t2 INNER JOIN t3 ON t2.id = t3.id
  WHERE t1.id = t2.id;
```

The expected result is that only the row where id = 1 is removed from table t1:

```
+----+----+----+--------+------------+
| id | c1 | c2 | c3     | c4         |
+----+----+----+--------+------------+
| 2  | 2  | 2  |    2.0 | 2000-01-02 |
| 3  | 3  | 3  |    3.0 | 2000-01-03 |
+----+----+----+--------+------------+
```
2023-03-16 13:12:00 +08:00
bece027135 [enhancement](profile) Add HTTP interface for q-error (#17786)
1. Add an HTTP interface for query q-error
2. Fix the selectivity calculation of inner join; previously it was always 0 when there was only one join condition
2023-03-16 12:19:23 +08:00
c2edca7bda [fix](Nereids) construct project with all slots in semi-semi-transpose-project rule (#17811)
Error message in TPC-H Q20:
```
SlotRef have invalid slot id: , desc: 22, slot_desc: tuple_desc_map: [Tuple(id=10 slots=[Slot(id=51 type=DECIMALV2(27, 9) col=-1, colname= null=(offset=0 mask=80)), Slot(id=52 type=INT col=-1, colname= null=(offset=0 mask=0)), Slot(id=53 type=INT col=-1, colname= null=(offset=0 mask=0)), Slot(id=54 type=INT col=-1, colname= null=(offset=0 mask=0)), Slot(id=55 type=INT col=-1, colname= null=(offset=0 mask=0))] has_varlen_slots=0)] tuple_id_map: [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0] tuple_is_nullable: [0] , desc_tbl: Slot(id=22 type=INT col=-1, colname= null=(offset=0 mask=0))
```

Previously we only used slots from `hashJoin` conditions to construct projects, which may lose some slots in `project`, for example:
```
LOGICAL_SEMI_JOIN_LOGICAL_JOIN_TRANSPOSE_PROJECT

LogicalJoin[1135] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(PS_PARTKEY#0 = P_PARTKEY#6)], otherJoinConjuncts=[] )
|--LogicalProject[1128] ( distinct=false, projects=[PS_PARTKEY#0, PS_SUPPKEY#1], excepts=[], canEliminate=true )
|  +--LogicalJoin[1120] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(L_PARTKEY#17 = PS_PARTKEY#0), (L_SUPPKEY#18 = PS_SUPPKEY#1)], otherJoinConjuncts=[(cast(PS_AVAILQTY#2 as DECIMAL(27, 9)) > (0.5 * sum(L_QUANTITY))#33)] )
|     |--GroupPlan( GroupId#2 )
|     +--GroupPlan( GroupId#7 )
+--GroupPlan( GroupId#12 )
----------------------after----------------------
LogicalJoin[1141] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(L_PARTKEY#17 = PS_PARTKEY#0), (L_SUPPKEY#18 = PS_SUPPKEY#1)], otherJoinConjuncts=[(cast(PS_AVAILQTY#2 as DECIMAL(27, 9)) > (0.5 * sum(L_QUANTITY))#33)] )
|--LogicalProject[1140] ( distinct=false, projects=[PS_PARTKEY#0, PS_SUPPKEY#1], excepts=[], canEliminate=true )
|  +--LogicalJoin[1139] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(PS_PARTKEY#0 = P_PARTKEY#6)], otherJoinConjuncts=[] )
|     |--GroupPlan( GroupId#2 )
|     +--GroupPlan( GroupId#12 )
+--GroupPlan( GroupId#7 )
```
`PS_AVAILQTY#2` was lost in the project.

Now we use all slots to construct projects.
2023-03-16 11:53:32 +08:00
ebe651dae9 [Fix](Planner)Add call once logic to analyze of function aes_decrypt #17829
The problem is an exception during analyze:
java.lang.IllegalStateException: exceptions :
errCode = 2, detailMessage = select list expression not produced by aggregation output (missing from GROUP BY clause?): xxx

The scenario is:
select aes_decrypt(xxx,xxx) as c0 from table group by c0;

Analysis of the problem:
The direct cause is a mismatched SlotRef, which stems from a mismatch in the number of parameters of the aes_decrypt function. When debugging, we can see that the SlotRef of the group-by column is added to ExprSubstitutionMap but cannot be matched with the select result columns. This happens because substituting an expr triggers analyze again, so the implicit parameter is added twice. The function signature then no longer matches, the expr is not substituted with a SlotRef, and the exception is thrown.

Fix:
Add call-once logic when appending the third parameter to aes_decrypt-type functions: compare the child to be added with the last child of the function; if they are the same, do not add it again.
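A concrete reproduction of the scenario (hypothetical table and column names; `'doris'` is an illustrative key):

```sql
-- Before the fix, grouping by an alias of aes_decrypt failed with
-- "select list expression not produced by aggregation output",
-- because re-analysis appended the implicit third argument a second time.
SELECT aes_decrypt(ciphertext_col, 'doris') AS c0
FROM example_tbl
GROUP BY c0;
```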
2023-03-16 11:04:21 +08:00
1da3e7596e [fix](point query) Fix NegativeArraySizeException when prepared statement contains a long string (#17651) 2023-03-16 10:24:33 +08:00
f8ad01f55d [fix](fe) fix drop frontend removeUnReadyElectableNode incorrectly (#17680)
* When adding two non-existent FEs and then dropping them, we may hit an exception like this:
```
    java.lang.IllegalArgumentException: com.sleepycat.je.config.IntConfigParam:
             param je.rep.electableGroupSizeOverride doesn't validate, -1 is less than min of 0
    at com.sleepycat.je.config.IntConfigParam.validate(IntConfigParam.java:47)
    at com.sleepycat.je.config.IntConfigParam.validateValue(IntConfigParam.java:75)
    at com.sleepycat.je.dbi.DbConfigManager.setVal(DbConfigManager.java:648)
    at com.sleepycat.je.dbi.DbConfigManager.setIntVal(DbConfigManager.java:694)
    at com.sleepycat.je.rep.ReplicationMutableConfig.setElectableGroupSizeOverrideVoid(ReplicationMutableConfig.java:523)
    at com.sleepycat.je.rep.ReplicationMutableConfig.setElectableGroupSizeOverride(ReplicationMutableConfig.java:512)
    at org.apache.doris.ha.BDBHA.removeUnReadyElectableNode(BDBHA.java:236)
    at org.apache.doris.catalog.Env.dropFrontend(Env.java:2533)
```
2023-03-16 10:22:42 +08:00
b043b9798d [feature](bdbje) Add config param for bdbje logging level (#17064)
Add new config param bdbje_file_logging_level
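A sketch of how it might look in fe.conf (the parameter name is from this commit; the value is illustrative):

```
bdbje_file_logging_level = INFO
```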
2023-03-16 09:50:44 +08:00
a53d46e317 [Fix](array function) fix array_pushfront function with DecimalV3 #17760
Support array_pushfront function with DecimalV3

Issue Number: close #xxx
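A minimal sketch of the fixed behavior (hypothetical literals; assumes the array constructor and DecimalV3 element types):

```sql
-- Push a DecimalV3 value onto the front of a decimal array;
-- this previously failed for DecimalV3 element types.
SELECT array_pushfront(array(2.50, 3.75), 1.25);
```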
2023-03-16 09:03:52 +08:00
d4dc56c99e [fix](insert) Fragment is not cancelled when the client quits without committing or rolling back an insert transaction (#17678) 2023-03-15 21:46:40 +08:00
990dce9a47 [fix](load) fix load channel timeout too fast in routine load task (#17796)
Enlarge the load channel timeout in routine load tasks.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-03-15 21:12:02 +08:00
0ad459fea5 [fix](multi-catalog) fix forward to master throw NPE (#17791)
The ConnectionContext may be null, and actually we don't need the ConnectionContext in MasterCatalogExecutor.
2023-03-15 20:50:29 +08:00
e4a1e57d6f [feature](multi-catalog) support sap hana jdbc catalog and jdbc external table (#17780) 2023-03-15 20:37:36 +08:00
54a51933b6 [improvement](FQDN) Support existing cluster upgrade (#17659) 2023-03-15 20:13:13 +08:00
1fdc265083 [fix](Nereids) lock tables when analyze may lead to dead lock (#17776)
Nereids uses AutoCloseable to take and release table read locks.
However, if an AutoCloseable throws an exception while opening the resource, its close function will not be called.
So we close manually when an exception is thrown during the opening stage.
2023-03-15 14:00:27 +08:00
1e3da95359 [fix](Nereids): fix LAsscom split conjuncts. (#17792) 2023-03-15 13:39:08 +08:00
ceff7e851d [fix](cooldown)Check cooldown ttl and datetime when alter storage policy (#17779)
* Check cooldown ttl and datetime when alter storage policy
2023-03-15 12:19:30 +08:00
c8de04f9d7 [fix][Nereids] fix not correct condition to checkReorder in InnerJoinRightAssociate. (#17799) 2023-03-15 11:49:03 +08:00
97bf07fe26 [enhancement](Nereids) add new distributed cost model (#17556)
Add a new distributed cost model in Nereids. The new cost model models the cost of the pipeline execution engine by dividing cost into start and run costs:
* START COST: the cost from starting to emitting the first tuple
* RUN COST: the cost from emitting the first tuple to emitting all tuples

For the parent operator and child operator, we assume the timeline of them is:
  ```
  child start ---> child run --------------------> finish
             |---> parent start ---> parent run -> finish
  ```

Therefore, in the parallel model, we can get:
  ```
  start_cost(parent) = start_cost(child) + start_cost(parent)
  run_cost(parent) = max(run_cost(child), start_cost(parent) + run_cost(parent))
  ```
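As a worked example (illustrative numbers only): suppose the child has start cost 2 and run cost 10, while the parent itself contributes start cost 3 and run cost 8. Then:
  ```
  start_cost(parent) = 2 + 3 = 5
  run_cost(parent)   = max(10, 3 + 8) = 11
  ```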
2023-03-15 11:22:31 +08:00
85080ee3c3 [vectorized](function) support array_map function (#17581) 2023-03-15 10:51:29 +08:00
5ab758674e [fix](planner) nested loop join with left semi generate repeat result (#17767) 2023-03-15 09:56:44 +08:00
45fcdaabc7 [Bug](catalog) Fix fetching information_schema table timed out(#17692) (#17694)
Co-authored-by: hugoluo <hugoluo@tencent.com>
2023-03-15 09:56:24 +08:00
16a4dc0a85 [enhancement](profile) Disable profiling for the internal query (#17720) 2023-03-15 09:48:29 +08:00
9b047d2c94 Feat: Add byte size to TTypedesc in TExpr, which will be used to carry scalarType information. (#17757)
Co-authored-by: libinfeng <libinfeng@selectdb.com>
2023-03-15 08:24:32 +08:00
7872f3626a [feature](Nereids): Rewrite InPredicate to disjunction if there are fewer than 3 elements in the InPredicate (#17646)
* [feature](Nereids): Rewrite InPredicate to disjunction if there are fewer than 3 elements in the InPredicate

* fix SimplifyRange
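For instance (a hypothetical sketch with illustrative table and column names), a two-element IN list becomes a disjunction of equality predicates that later rules such as SimplifyRange can work with:

```sql
-- Before the rewrite:
SELECT * FROM t WHERE c IN (1, 2);
-- After the rewrite, the equivalent disjunction form:
SELECT * FROM t WHERE c = 1 OR c = 2;
```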
2023-03-15 08:23:56 +08:00
02220560c5 [Improvement](multi catalog)Hive splitter. Get HDFS/S3 splits by using FileSystem api (#17706)
Use the FileSystem API to get splits for files in HDFS/S3 instead of calling InputFormat.getSplits.
The splits are based on the blocks in HDFS/S3.
2023-03-15 00:25:00 +08:00
b28f31f98d [fix](meta) fix show create table result of hive table (#17677)
Make the output usable in Hive.

Current issue: the type of the partition column is wrapped in backticks, which is illegal in Hive. One problem case:

```sql
CREATE TABLE t3p_parquet(
  id int,
  name string)
PARTITIONED BY (
  dt int)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://path/to/t3p_parquet'
TBLPROPERTIES (
  'transient_lastDdlTime'='1671700883')
```
2023-03-14 22:50:35 +08:00
e46077fbf4 print group id for physical plan node (#17742) 2023-03-14 22:35:08 +08:00
6348819c27 [fix](Nereids) remove bitmap_union_int(bigint) signature (#17356) 2023-03-14 20:42:47 +08:00
ff9e03e2bf [Feature](add bitmap udaf) add the bitmap intersection and difference set for mixed calculation of udaf (#15588)
* Add the bitmap intersection and difference set for mixed calculation of udaf

Co-authored-by: zhangbinbin05 <zhangbinbin05@baidu.com>
2023-03-14 20:40:37 +08:00
65f71d9e06 [enhance](nereids) broadcast cost calculate (#17711)
Update the broadcast join cost estimate according to the BE implementation.
There is an enhancement on the BE side: in a broadcast join, the BE builds only one hash table, not instanceNum hash tables.
2023-03-14 19:45:03 +08:00
699159698e [enhancement](planner) support update from syntax (#17639)
Support the UPDATE ... FROM syntax.

Note: enable_concurrent_update is not supported yet.

```
UPDATE <target_table>
  SET <col_name> = <value> [ , <col_name> = <value> , ... ]
  [ FROM <additional_tables> ]
  [ WHERE <condition> ]
```

for example:
t1
```
+----+----+----+-----+------------+
| id | c1 | c2 | c3  | c4         |
+----+----+----+-----+------------+
| 3  | 3  | 3  | 3.0 | 2000-01-03 |
| 2  | 2  | 2  | 2.0 | 2000-01-02 |
| 1  | 1  | 1  | 1.0 | 2000-01-01 |
+----+----+----+-----+------------+
```

t2
```
+----+----+----+------+------------+
| id | c1 | c2 | c3   | c4         |
+----+----+----+------+------------+
| 4  | 4  | 4  |  4.0 | 2000-01-04 |
| 2  | 20 | 20 | 20.0 | 2000-01-20 |
| 5  | 5  | 5  |  5.0 | 2000-01-05 |
| 1  | 10 | 10 | 10.0 | 2000-01-10 |
| 3  | 30 | 30 | 30.0 | 2000-01-30 |
+----+----+----+------+------------+
```

t3
```
+----+
| id |
+----+
| 1  |
| 5  |
| 4  |
+----+
```

perform the update:
```sql
 update t1 set t1.c1 = t2.c1, t1.c3 = t2.c3 * 100 from t2 inner join t3 on t2.id = t3.id where t1.id = t2.id;
```

the result
```
+----+----+----+--------+------------+
| id | c1 | c2 | c3     | c4         |
+----+----+----+--------+------------+
| 3  | 3  | 3  |    3.0 | 2000-01-03 |
| 2  | 2  | 2  |    2.0 | 2000-01-02 |
| 1  | 10 | 1  | 1000.0 | 2000-01-01 |
+----+----+----+--------+------------+
```
2023-03-14 19:26:30 +08:00
f1dde20315 [enhancement](nereids) Refactor statistics (#17637)
1. Support more expression types
2. Support derivation with histograms
3. Use StatisticRange to abstract the logic
4. Use Statistics rather than StatsDeriveResult
2023-03-14 13:10:55 +08:00
be3a7e69cd [refactor](Nereids): polish code SemiJoinLogicalJoinTranspose. (#17740) 2023-03-14 12:48:58 +08:00
3a97190661 [fix](Nereids) Compare plan with their output rather than string in UnrankTest (#17698)
After adding a unique ID, UnrankTest fails because each plan has a different ID in its string representation.
To avoid the effect of the unique ID, compare plans by their output rather than by their string representation.
2023-03-14 11:10:06 +08:00
5b39fa9843 [Feature](vec)(quantile_state): support quantile state in vectorized engine (#16562)
* [Feature](vectorized)(quantile_state): support vectorized quantile state functions
1. now the quantile column only supports non-nullable values
2. add up some regression test cases
3. set default enable_quantile_state_type = true
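A minimal usage sketch (assumes a QUANTILE_STATE aggregate column `qs` in a hypothetical table `example_tbl`, and the QUANTILE_UNION / QUANTILE_PERCENT functions):

```sql
-- Merge pre-aggregated quantile states and read the median.
SELECT QUANTILE_PERCENT(QUANTILE_UNION(qs), 0.5) FROM example_tbl;
```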
---------

Co-authored-by: spaces-x <weixiang06@meituan.com>
2023-03-14 10:54:04 +08:00
f3c6ee5961 [Enhance](ComputeNode) ES Scan node support to be scheduled to compute node (#16533)
Allow the ES scan node to be scheduled to compute nodes.
2023-03-14 00:13:24 +08:00
9b7596f1c6 [Feature](Dynamic schema table) step1 support schema change expression (#17494)
1. introduce a new type `VARIANT` to encapsulate dynamically generated columns, hiding the details of the types and names of newly generated columns
2. introduce a new expression `SchemaChangeExpr` for performing schema changes, with extensibility in mind
2023-03-13 15:12:42 +08:00
c302fa2564 [Feature](array-function) Support array_pushfront function (#17584) 2023-03-13 14:26:02 +08:00
ac944e2ac1 [fix](cooldown)Fix bug for storage policy in dynamic partition (#17665)
* fix bug for partition storage policy
2023-03-13 14:13:55 +08:00
be5147c32e [enhancement](feservice) catch throwable and print log for frontend service (#17708)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-13 11:27:00 +08:00
782001c75b [fix](planner) project should be done inside subquery (#17630)
```sql
WITH t0 AS(
    SELECT report.date1 AS date2 FROM(
        SELECT DATE_FORMAT(date, '%Y%m%d') AS date1 FROM cir_1756_t1
    ) report GROUP BY report.date1
),
t3 AS(
    SELECT date_format(date, '%Y%m%d') AS date3
    FROM cir_1756_t2
)
SELECT row_number() OVER(ORDER BY date2)
FROM(
    SELECT t0.date2 FROM t0 LEFT JOIN t3 ON t0.date2 = t3.date3
) tx;
```

The DATE_FORMAT(date, '%Y%m%d') was calculated in the GROUP BY node, which is wrong. This expr should be calculated inside the subquery.
2023-03-13 11:10:27 +08:00
55c42da511 [Feature](array) Support array<decimalv3> data type (#16640) 2023-03-13 10:48:13 +08:00
39b5682d59 [Pipeline](shared_scan_opt) Support shared scan opt in pipeline exec engine 2023-03-13 10:33:57 +08:00
a0a2809324 [Enhancement](multi-catalog) support hms event deserialization for HDP/CDH Hive versions. (#17660)
Some HDP/CDH Hive versions use gzip to compress the message body of HMS NotificationEvent,
so com.qihoo.finance.hms.event.MetastoreEventFactory cannot deserialize it correctly.
2023-03-13 09:47:28 +08:00
b0d1166989 [fix](meta) fix concurrent modification exception and potential NPE (#17602) 2023-03-12 22:12:07 +08:00
46dcf69644 [fix](jdbc-catalog) avoid calculate driver's md5 when replaying edit log (#17693) 2023-03-12 22:11:45 +08:00
54e5c71e52 [fix](planner) Fix NPE when update stats by profile 2023-03-12 21:40:47 +08:00