Commit Graph

9299 Commits

Author SHA1 Message Date
bbf88ecc49 [Bug](datetimev2) Fix BE crash if scale is invalid (#17763) 2023-03-15 12:08:23 +08:00
049b70b957 [test](Nereids) add yandex metrica p2 regression case (#17082) 2023-03-15 11:50:00 +08:00
c8de04f9d7 [fix][Nereids] fix not correct condition to checkReorder in InnerJoinRightAssociate. (#17799) 2023-03-15 11:49:03 +08:00
97bf07fe26 [enhancement](Nereids) add new distributed cost model (#17556)
Add a new distributed cost model in Nereids. The new model estimates the cost of the pipeline execution engine by splitting it into a start cost and a run cost:
* START COST: the cost from starting to emitting the first tuple
* RUN COST: the cost from emitting the first tuple to emitting all tuples

For a parent operator and its child operator, we assume the following timeline:
  ```
  child start ---> child run --------------------> finish
             |---> parent start ---> parent run -> finish
  ```

Therefore, in the parallel model, we can get:
  ```
  start_cost(parent) = start_cost(child) + start_cost(parent)
  run_cost(parent) = max(run_cost(child), start_cost(parent) + run_cost(parent))
  ```
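
The two formulas above can be sketched in a few lines of Python. This is only an illustration, not the Nereids implementation: the names `Cost` and `combine` are hypothetical, and it assumes that the right-hand side terms `start_cost(parent)` and `run_cost(parent)` refer to the parent operator's own cost before the child is folded in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    start: float  # cost from starting to emitting the first tuple
    run: float    # cost from emitting the first tuple to emitting all tuples


def combine(child: Cost, parent_own: Cost) -> Cost:
    """Fold a child's cost into the parent's own cost (assumed reading of the formulas)."""
    # The parent cannot start before the child has emitted its first tuple.
    start = child.start + parent_own.start
    # Child and parent run in parallel, so the longer of the two dominates.
    run = max(child.run, parent_own.start + parent_own.run)
    return Cost(start, run)


# Hypothetical numbers: a slow scan feeding a cheap filter.
scan = Cost(start=1.0, run=10.0)
filter_own = Cost(start=0.5, run=2.0)
total = combine(scan, filter_own)
```

With these made-up numbers the child's run cost dominates, which is the case the `max` is meant to capture.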
2023-03-15 11:22:31 +08:00
66f3ef568e (functions) optimize const_column to full convert 2023-03-15 10:57:03 +08:00
85080ee3c3 [vectorized](function) support array_map function (#17581) 2023-03-15 10:51:29 +08:00
ca0367d846 FIX: es doc (#17771) 2023-03-15 10:40:53 +08:00
5ab758674e [fix](planner) nested loop join with left semi generate repeat result (#17767) 2023-03-15 09:56:44 +08:00
45fcdaabc7 [Bug](catalog) Fix fetching information_schema table timed out(#17692) (#17694)
Co-authored-by: hugoluo <hugoluo@tencent.com>
2023-03-15 09:56:24 +08:00
16a4dc0a85 [enhancement](profile) Disable profiling for the internal query (#17720) 2023-03-15 09:48:29 +08:00
64c2437be5 [fix](coalesce) support coalesce function for bitmap (#17798) 2023-03-15 09:34:44 +08:00
9b047d2c94 Feat: Add byte size to TTypedesc in TExpr, which will be used to carry scalarType information. (#17757)
Co-authored-by: libinfeng <libinfeng@selectdb.com>
2023-03-15 08:24:32 +08:00
7872f3626a [feature](Nereids): Rewrite InPredicate to disjunction if there exist items < 3 elements in InPredicate (#17646)
* [feature](Nereids): Rewrite InPredicate to disjunction if there exists < 3 elements in InPredicate

* fix SimplifyRange
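
As a rough illustration of the rewrite described above (not the Nereids rule itself), the sketch below turns a small `IN` list into a disjunction of equalities. The function name `rewrite_in_predicate` and the string-based representation are hypothetical; only the threshold of 3 comes from the commit title.

```python
from typing import List


def rewrite_in_predicate(column: str, options: List[int], threshold: int = 3) -> str:
    """Rewrite `column IN (...)` to OR-ed equalities when it has fewer than `threshold` options."""
    literals = [str(o) for o in options]
    if 0 < len(literals) < threshold:
        # Small lists become a disjunction: a IN (1, 2)  ->  a = 1 OR a = 2
        return " OR ".join(f"{column} = {lit}" for lit in literals)
    # Larger lists are left as an IN predicate.
    return f"{column} IN ({', '.join(literals)})"


small = rewrite_in_predicate("a", [1, 2])
large = rewrite_in_predicate("a", [1, 2, 3])
```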
2023-03-15 08:23:56 +08:00
02220560c5 [Improvement](multi catalog)Hive splitter. Get HDFS/S3 splits by using FileSystem api (#17706)
Use FileSystem API to get splits for file in HDFS/S3 instead of calling InputFormat.getSplits.
The splits are based on the blocks in HDFS/S3.
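
To illustrate what block-based splitting means, here is a minimal arithmetic sketch. It does not call the Hadoop FileSystem API; the helper `block_splits` is hypothetical and assumes one split per block, with the final split covering whatever remains of the file.

```python
from typing import List, Tuple


def block_splits(file_length: int, block_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs, one per block of the file (assumed layout)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        (offset, min(block_size, file_length - offset))
        for offset in range(0, file_length, block_size)
    ]


# A 300-byte file with 128-byte blocks yields two full splits and one short one.
splits = block_splits(300, 128)
```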
2023-03-15 00:25:00 +08:00
b28f31f98d [fix](meta) fix show create table result of hive table (#17677)
Make the SHOW CREATE TABLE result usable in Hive.

Current issue: the types of partition columns are wrapped in ``, which is illegal in Hive. One problem case:

CREATE TABLE t3p_parquet(
id int,
name string)
PARTITIONED BY (
dt int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://path/to/t3p_parquet'
TBLPROPERTIES (
'transient_lastDdlTime'='1671700883')
2023-03-14 22:50:35 +08:00
76f486980a [docs](user)update the users number (#17749) 2023-03-14 22:42:51 +08:00
e46077fbf4 print group id for physical plan node (#17742) 2023-03-14 22:35:08 +08:00
7180cf3d9b [Improve](row store) avoid serialize null slot into a jsonb row (#17734)
This could save some disk space
2023-03-14 22:13:41 +08:00
6348819c27 [fix](Nereids) remove bitmap_union_int(bigint) signature (#17356) 2023-03-14 20:42:47 +08:00
ff9e03e2bf [Feature](add bitmap udaf) add the bitmap intersection and difference set for mixed calculation of udaf (#15588)
* Add the bitmap intersection and difference set for mixed calculation of udaf

Co-authored-by: zhangbinbin05 <zhangbinbin05@baidu.com>
2023-03-14 20:40:37 +08:00
65f71d9e06 [enhance](nereids) broadcast cost calculate (#17711)
Update the broadcast join cost estimate to match the BE implementation.
BE has been enhanced so that a broadcast join builds only one hash table instead of instanceNum hash tables.
2023-03-14 19:45:03 +08:00
699159698e [enhancement](planner) support update from syntax (#17639)
support update from syntax

note: enable_concurrent_update is not supported now

```
UPDATE <target_table>
  SET <col_name> = <value> [ , <col_name> = <value> , ... ]
  [ FROM <additional_tables> ]
  [ WHERE <condition> ]
```

for example:
t1
```
+----+----+----+-----+------------+
| id | c1 | c2 | c3  | c4         |
+----+----+----+-----+------------+
| 3  | 3  | 3  | 3.0 | 2000-01-03 |
| 2  | 2  | 2  | 2.0 | 2000-01-02 |
| 1  | 1  | 1  | 1.0 | 2000-01-01 |
+----+----+----+-----+------------+
```

t2
```
+----+----+----+------+------------+
| id | c1 | c2 | c3   | c4         |
+----+----+----+------+------------+
| 4  | 4  | 4  |  4.0 | 2000-01-04 |
| 2  | 20 | 20 | 20.0 | 2000-01-20 |
| 5  | 5  | 5  |  5.0 | 2000-01-05 |
| 1  | 10 | 10 | 10.0 | 2000-01-10 |
| 3  | 30 | 30 | 30.0 | 2000-01-30 |
+----+----+----+------+------------+
```

t3
```
+----+
| id |
+----+
| 1  |
| 5  |
| 4  |
+----+
```

do update
```sql
 update t1 set t1.c1 = t2.c1, t1.c3 = t2.c3 * 100 from t2 inner join t3 on t2.id = t3.id where t1.id = t2.id;
```

the result
```
+----+----+----+--------+------------+
| id | c1 | c2 | c3     | c4         |
+----+----+----+--------+------------+
| 3  | 3  | 3  |    3.0 | 2000-01-03 |
| 2  | 2  | 2  |    2.0 | 2000-01-02 |
| 1  | 10 | 1  | 1000.0 | 2000-01-01 |
+----+----+----+--------+------------+
```
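
The example above can be replayed outside the database to check the expected result. The following Python sketch only mimics the semantics of this one statement on the sample rows. It is not how the planner executes it, and it assumes `id` is unique in each table.

```python
# Sample rows from the example, keyed by id.
t1 = {
    3: {"c1": 3, "c2": 3, "c3": 3.0, "c4": "2000-01-03"},
    2: {"c1": 2, "c2": 2, "c3": 2.0, "c4": "2000-01-02"},
    1: {"c1": 1, "c2": 1, "c3": 1.0, "c4": "2000-01-01"},
}
t2 = {
    4: {"c1": 4, "c2": 4, "c3": 4.0, "c4": "2000-01-04"},
    2: {"c1": 20, "c2": 20, "c3": 20.0, "c4": "2000-01-20"},
    5: {"c1": 5, "c2": 5, "c3": 5.0, "c4": "2000-01-05"},
    1: {"c1": 10, "c2": 10, "c3": 10.0, "c4": "2000-01-10"},
    3: {"c1": 30, "c2": 30, "c3": 30.0, "c4": "2000-01-30"},
}
t3 = {1, 5, 4}


def update_from(target, source, filter_ids):
    """Mimic: UPDATE t1 SET c1 = t2.c1, c3 = t2.c3 * 100 FROM t2 JOIN t3 ... WHERE t1.id = t2.id."""
    # FROM t2 INNER JOIN t3 ON t2.id = t3.id
    joined = {row_id: row for row_id, row in source.items() if row_id in filter_ids}
    # WHERE t1.id = t2.id
    for row_id, row in target.items():
        if row_id in joined:
            row["c1"] = joined[row_id]["c1"]
            row["c3"] = joined[row_id]["c3"] * 100
    return target


result = update_from(t1, t2, t3)
```

Only the row with `id = 1` appears in all three tables, so it is the only row that changes.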
2023-03-14 19:26:30 +08:00
f999b823fc [feature](array) support array for apache arrow convertor (#17682)
* support array type for arrow

* fix builder.Append() for each array row

* fix array child column append start offset
2023-03-14 17:53:16 +08:00
f1dde20315 [enhancement](nereids) Refactor statistics (#17637)
1. Support more expression types
2. Support deriving statistics with histograms
3. Use StatisticRange to abstract the logic
4. Use Statistics rather than StatisDeriveResult
2023-03-14 13:10:55 +08:00
be3a7e69cd [refactor](Nereids): polish code SemiJoinLogicalJoinTranspose. (#17740) 2023-03-14 12:48:58 +08:00
77ab2fac20 [refactor](functioncontext) remove function context impl class (#17715)
* [refactor](functioncontext) remove function context impl class


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-03-14 11:21:45 +08:00
3a97190661 [fix](Nereids) Compare plan with their output rather than string in UnrankTest (#17698)
After adding a unique ID, the UnrankTest fails because each plan has a different ID in its string form.
To avoid the effect of the unique ID, compare the plans by their output rather than by their string.
2023-03-14 11:10:06 +08:00
5b39fa9843 [Feature](vec)(quantile_state): support quantile state in vectorized engine (#16562)
* [Feature](vectorized)(quantile_state): support vectorized quantile state functions
1. quantile columns currently support only non-nullable values
2. add some regression test cases
3. set enable_quantile_state_type = true by default
---------

Co-authored-by: spaces-x <weixiang06@meituan.com>
2023-03-14 10:54:04 +08:00
36a0d40ac3 Fix errors in the data-partition.md (#17756) 2023-03-14 10:44:57 +08:00
ba0f5a2355 [test](mv) Add mv case from fe ut (#17204)
Add some MV cases from the FE UT MaterializedViewFunctionTest.
2023-03-14 10:29:43 +08:00
2e0af4e33c [Enhancement](inverted-index) use read buffer when read index bytes in compound reader (#17306)
Read IO can become a bottleneck when reading the inverted index from disk.
Use a read buffer to reduce IO.
Set the use-buffer flag to true when reading internal bytes in the compound reader for the inverted index.
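
The effect of a read buffer on the number of underlying reads can be shown with a small Python sketch. It has nothing to do with the actual compound reader code in BE: `CountingRaw` and `count_raw_reads` are hypothetical names, and an in-memory byte string stands in for the index file.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory stand-in for a file that counts how often the 'disk' is read."""

    def __init__(self, data: bytes):
        super().__init__()
        self._data = data
        self._pos = 0
        self.raw_reads = 0

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # Every call here represents one IO request to the underlying storage.
        self.raw_reads += 1
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def count_raw_reads(buffered: bool, total: int = 1024, step: int = 4) -> int:
    """Read `total` bytes in `step`-byte pieces and report the raw read count."""
    raw = CountingRaw(bytes(total))
    reader = io.BufferedReader(raw, buffer_size=4096) if buffered else raw
    for _ in range(total // step):
        reader.read(step)
    return raw.raw_reads


unbuffered_reads = count_raw_reads(buffered=False)
buffered_reads = count_raw_reads(buffered=True)
```

Without the buffer, every small read reaches the underlying source. With the buffer, most small reads are served from memory.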
2023-03-14 10:10:59 +08:00
7d91114304 [fix](join) fix wrong result of null aware left anti join (#17752) 2023-03-14 09:35:46 +08:00
c6630a06c1 [Fix](multi-catalog) Fix "test_hive_other" regression test. (#17611) 2023-03-14 09:16:48 +08:00
76458cf091 [typo](partition)Modify the list partition document #17744 2023-03-14 08:27:26 +08:00
883ae8a86d [typo](docs) Add some content for bitmap_hash.md. (#17747) 2023-03-14 08:27:07 +08:00
f3c6ee5961 [Enhance](ComputeNode) ES Scan node support to be scheduled to compute node (#16533)
ES Scan node support to be scheduled to compute node.
2023-03-14 00:13:24 +08:00
9b7596f1c6 [Feature](Dynamic schema table) step1 support schema change expression (#17494)
1. introduce a new type `VARIANT` to encapsulate dynamically generated columns, hiding the types and names of newly generated columns
2. introduce a new expression `SchemaChangeExpr` to perform schema changes in an extensible way
2023-03-13 15:12:42 +08:00
c302fa2564 [Feature](array-function) Support array_pushfront function (#17584) 2023-03-13 14:26:02 +08:00
ac944e2ac1 [fix](cooldown)Fix bug for storage policy in dynamic partition (#17665)
* fix bug for partition storage policy
2023-03-13 14:13:55 +08:00
be5147c32e [enhancement](feservice) catch throwable and print log for frontend service (#17708)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-13 11:27:00 +08:00
2b31fc1472 [fix](regression) segcompaction timeout too short (#16731) (#17565)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-03-13 11:19:21 +08:00
b9fac82fb1 [fix](regression) adjust regression pipeline config(tablet_create_timeout_second) for avoiding create partition timeout (#17668)
This pull request addresses the following problem:
failing cases in the regression pipeline always hit the error "Failed to create partition. Timeout. Unfinished mark: 10003=57059", so tablet_create_timeout_second is adjusted to 100.
2023-03-13 11:18:03 +08:00
5fccbac81b [fix](demo)add Sync full database for versions below doris 1.2 (#17669) 2023-03-13 11:17:29 +08:00
Pxl
16fc3a0e22 [Chore](compile) remove some unused static on inline function to reduce compile time (#17603)
remove some unused static on inline function to reduce compile time
2023-03-13 11:11:59 +08:00
782001c75b [fix](planner) project should be done inside subquery (#17630)
WITH t0 AS(
SELECT report.date1 AS date2 FROM(
SELECT DATE_FORMAT(date, '%Y%m%d') AS date1 FROM cir_1756_t1
) report GROUP BY report.date1
),
t3 AS(
SELECT date_format(date, '%Y%m%d') AS date3
FROM cir_1756_t2
)
SELECT row_number() OVER(ORDER BY date2)
FROM(
SELECT t0.date2 FROM t0 LEFT JOIN t3 ON t0.date2 = t3.date3
) tx;

The DATE_FORMAT(date, '%Y%m%d') was calculated in GROUP BY node, which is wrong. This expr should be calculated inside the subquery.
2023-03-13 11:10:27 +08:00
55c42da511 [Feature](array) Support array<decimalv3> data type (#16640) 2023-03-13 10:48:13 +08:00
3a6c0e7867 [fix](regression) fix test_array_export and test_map_export dir conflict #17636
The regression tests test_array_export and test_map_export use the same output dir; if they run at the same time, the cases will fail.
2023-03-13 10:35:50 +08:00
39b5682d59 [Pipeline](shared_scan_opt) Support shared scan opt in pipeline exec engine 2023-03-13 10:33:57 +08:00
edb2d90852 [fix](routine load) fix ROUTINE LOAD bug,kafka commit a lack of one(#17282) (#17291)
Co-authored-by: hugoluo <hugoluo@tencent.com>
2023-03-13 10:20:59 +08:00
a0a2809324 [Enhancement](multi-catalog) support hms event deserialization for HDP/CDH Hive versions. (#17660)
Some HDP/CDH Hive versions use gzip to compress the message body of hms NotificationEvent,
so com.qihoo.finance.hms.event.MetastoreEventFactory cannot convert it correctly.
2023-03-13 09:47:28 +08:00