Commit Graph

926 Commits

Author SHA1 Message Date
ffda858f01 [fix](regression) fix unstable test cases and remove redundant cases (#17845)
aggregate_strategies execution too slow, use smaller table valued function to speed up
add a p2 case nereids_syntax_p2/aggregate_strategies to use larger table valued function to ensure correct
remove case nereids_syntax_p0/test_join_nereids since it redundant with nereids_p0/join/test_join
remove unstable case in query_p0/aggregate/aggregate
2023-03-16 15:59:26 +08:00
ee7226348d [FIX](Map) fix map compaction error (#17795)
When compaction case, memory map offsets coming to  same olap convertor which is from 0 to 0+size
but it should be continue in different pages when in one segment writer . 
eg : 
last block with map offset : [3, 6, 8, ... 100] 
this block with map offset : [5, 10, 15 ..., 100] 
the same convertor should record last offset to make later coming offset followed last offset.
so after convertor : 
the current offset should [105, 110, 115, ... 200], then column writer just call append_data() to make the right offset data append pages
2023-03-16 13:54:01 +08:00
0086fdbbdb [enhancement](planner) support delete from using syntax (#17787)
support syntax delete using, this syntax only support UNIQUE KEY model

use the result of `t2` join `t3` to romve rows from `t1`

```sql
-- create t1, t2, t3 tables
CREATE TABLE t1
  (id INT, c1 BIGINT, c2 STRING, c3 DOUBLE, c4 DATE)
UNIQUE KEY (id)
DISTRIBUTED BY HASH (id)
PROPERTIES('replication_num'='1', "function_column.sequence_col" = "c4");

CREATE TABLE t2
  (id INT, c1 BIGINT, c2 STRING, c3 DOUBLE, c4 DATE)
DISTRIBUTED BY HASH (id)
PROPERTIES('replication_num'='1');

CREATE TABLE t3
  (id INT)
DISTRIBUTED BY HASH (id)
PROPERTIES('replication_num'='1');

-- insert data
INSERT INTO t1 VALUES
  (1, 1, '1', 1.0, '2000-01-01'),
  (2, 2, '2', 2.0, '2000-01-02'),
  (3, 3, '3', 3.0, '2000-01-03');

INSERT INTO t2 VALUES
  (1, 10, '10', 10.0, '2000-01-10'),
  (2, 20, '20', 20.0, '2000-01-20'),
  (3, 30, '30', 30.0, '2000-01-30'),
  (4, 4, '4', 4.0, '2000-01-04'),
  (5, 5, '5', 5.0, '2000-01-05');

INSERT INTO t3 VALUES
  (1),
  (4),
  (5);

-- remove rows from t1
DELETE FROM t1
  USING t2 INNER JOIN t3 ON t2.id = t3.id
  WHERE t1.id = t2.id;
```

the expect result is only remove the row where id = 1 in table t1

```
+----+----+----+--------+------------+
| id | c1 | c2 | c3     | c4         |
+----+----+----+--------+------------+
| 2  | 2  | 2  |    2.0 | 2000-01-02 |
| 3  | 3  | 3  |    3.0 | 2000-01-03 |
+----+----+----+--------+------------+
```
2023-03-16 13:12:00 +08:00
1da3e7596e [fix](point query) Fix NegativeArraySizeException when prepared statement contains a long string (#17651) 2023-03-16 10:24:33 +08:00
a53d46e317 [Fix](array function) fix array_pushfront function with DecimalV3 #17760
Support array_pushfront function with DecimalV3

Issue Number: close #xxx
2023-03-16 09:03:52 +08:00
079e6a3e12 [regression-test](vectorized) remove unused vectorization flag (#17662) 2023-03-15 17:59:22 +08:00
049b70b957 [test](Nereids) add yandex metrica p2 regression case (#17082) 2023-03-15 11:50:00 +08:00
66f3ef568e (functions) optimize const_column to full convert 2023-03-15 10:57:03 +08:00
85080ee3c3 [vectorized](function) support array_map function (#17581) 2023-03-15 10:51:29 +08:00
5ab758674e [fix](planner) nested loop join with left semi generate repeat result (#17767) 2023-03-15 09:56:44 +08:00
64c2437be5 [fix](coalesce) support coalesce function for bitmap (#17798) 2023-03-15 09:34:44 +08:00
699159698e [enhancement](planner) support update from syntax (#17639)
support update from syntax

note: enable_concurrent_update is not supported now

```
UPDATE <target_table>
  SET <col_name> = <value> [ , <col_name> = <value> , ... ]
  [ FROM <additional_tables> ]
  [ WHERE <condition> ]
```

for example:
t1
```
+----+----+----+-----+------------+
| id | c1 | c2 | c3  | c4         |
+----+----+----+-----+------------+
| 3  | 3  | 3  | 3.0 | 2000-01-03 |
| 2  | 2  | 2  | 2.0 | 2000-01-02 |
| 1  | 1  | 1  | 1.0 | 2000-01-01 |
+----+----+----+-----+------------+
```

t2
```
+----+----+----+------+------------+
| id | c1 | c2 | c3   | c4         |
+----+----+----+------+------------+
| 4  | 4  | 4  |  4.0 | 2000-01-04 |
| 2  | 20 | 20 | 20.0 | 2000-01-20 |
| 5  | 5  | 5  |  5.0 | 2000-01-05 |
| 1  | 10 | 10 | 10.0 | 2000-01-10 |
| 3  | 30 | 30 | 30.0 | 2000-01-30 |
+----+----+----+------+------------+
```

t3
```
+----+
| id |
+----+
| 1  |
| 5  |
| 4  |
+----+
```

do update
```sql
 update t1 set t1.c1 = t2.c1, t1.c3 = t2.c3 * 100 from t2 inner join t3 on t2.id = t3.id where t1.id = t2.id;
```

the result
```
+----+----+----+--------+------------+
| id | c1 | c2 | c3     | c4         |
+----+----+----+--------+------------+
| 3  | 3  | 3  |    3.0 | 2000-01-03 |
| 2  | 2  | 2  |    2.0 | 2000-01-02 |
| 1  | 10 | 1  | 1000.0 | 2000-01-01 |
+----+----+----+--------+------------+
```
2023-03-14 19:26:30 +08:00
5b39fa9843 [Feature](vec)(quantile_state): support quantile state in vectorized engine (#16562)
* [Feature](vectorized)(quantile_state): support vectorized quantile state functions
1. now quantile column only support not nullable
2. add up some regression test cases
3. set default enable_quantile_state_type = true
---------

Co-authored-by: spaces-x <weixiang06@meituan.com>
2023-03-14 10:54:04 +08:00
ba0f5a2355 [test](mv) Add mv case from fe ut (#17204)
add some mv case from fe ut MaterializedViewFunctionTest
2023-03-14 10:29:43 +08:00
c6630a06c1 [Fix](multi-catalog) Fix "test_hive_other" regression test. (#17611) 2023-03-14 09:16:48 +08:00
c302fa2564 [Feature](array-function) Support array_pushfront function (#17584) 2023-03-13 14:26:02 +08:00
782001c75b [fix](planner) project should be done inside subquery (#17630)
WITH t0 AS(
SELECT report.date1 AS date2 FROM(
SELECT DATE_FORMAT(date, '%Y%m%d') AS date1 FROM cir_1756_t1
) report GROUP BY report.date1
),
t3 AS(
SELECT date_format(date, '%Y%m%d') AS date3
FROM cir_1756_t2
)
SELECT row_number() OVER(ORDER BY date2)
FROM(
SELECT t0.date2 FROM t0 LEFT JOIN t3 ON t0.date2 = t3.date3
) tx;

The DATE_FORMAT(date, '%Y%m%d') was calculated in GROUP BY node, which is wrong. This expr should be calculated inside the subquery.
2023-03-13 11:10:27 +08:00
55c42da511 [Feature](array) Support array<decimalv3> data type (#16640) 2023-03-13 10:48:13 +08:00
39b5682d59 [Pipeline](shared_scan_opt) Support shared scan opt in pipeline exec engine 2023-03-13 10:33:57 +08:00
13e05c4a5d [Enhencement](stream load) add some regression test for json format streamload (#17520) 2023-03-12 20:13:07 +08:00
455c800405 [feature](parquet-reader) add rle bool and delta decoder to read AWS Glue (#17112)
Support delta encoding and rle(bool) to read Glue data
add delta bit pack decoder,
add delta length byte array decoder,
add delta byte array decoder.
add rle bool decoder.

We find some data type is read with delta encoding on AWS Glue, so it should be supported.
The definition of delta encoding can refer to the delta encoding in parquet.
2023-03-12 20:09:58 +08:00
Pxl
8328ab69ad [Chore](Materialized-View) add some mv regression test case (#17345)
1. add some mv regression test case
2. rename materialized_view_p0 to mv_p0 (avoid create database failed because long db name)
2023-03-11 10:55:11 +08:00
6dcd791b74 [feature](struct-type) support CAST AS Struct type (#17553)
1. add support `CAST AS Struct` from Struct type;
2. fix crash while `CAST('{}' AS Struct)`;
3. `CAST('' AS complext_type)` should return NULL instead of empty object;

---------

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2023-03-10 21:21:16 +08:00
a79b8ede88 [Bug](ColumnArray) Fix array column replicate replicate_offsets not matched (#17616)
the input replicate_offsets should be the same size as ColumnArray's offset.
```
IColumn::Offsets replicate_offsets(get_offsets().size(), 0);
// |---------------------|-------------------------|-------------------------|
// [0, begin)             [begin, begin + count_sz)  [begin + count_sz, size())
//  do not need to copy    copy counts[n] times       do not need to copy
```

we should
2023-03-10 11:52:22 +08:00
e1bf9411de [feature](array function) add support for array_enumerate_uniq (#17541)
add support for array_enumerate_uniq()
2023-03-10 10:20:49 +08:00
4ba93efc98 [Enhance](DOE)Support parse default es iso datetime string (#17412)
* support parse default es iso datetime string
2023-03-10 09:59:20 +08:00
006f7a91ac [fix](planner) should not turn on push agg op when olapscan has conjuncts on it (#17598)
we should not set PushAggOp to any type, if olap scan already has conjunct on it.
2023-03-10 09:33:08 +08:00
a745ab1703 [fix](schema scanner) fix query some schema table report invalid parameter (#17626)
Example:

SELECT ROUTINE_SCHEMA AS PROCEDURE_CAT, NULL AS PROCEDURE_SCHEM,ROUTINE_NAME AS PROCEDURE_NAME,NULL AS NUM_INPUT_PARAMS,NULL AS NUM_OUTPUT_PARAMS,NULL AS NUM_RESULT_SETS,ROUTINE_COMMENT AS REMARKS,IF(ROUTINE_TYPE = 'FUNCTION', 2,IF(ROUTINE_TYPE= 'PROCEDURE', 1, 0)) AS PROCEDURE_TYPE FROM INFORMATION_SCHEMA.ROUTINES WHERE ROUTINE_SCHEMA = DATABASE();
ERROR 1105 (HY000): errCode = 2, detailMessage = invalid parameter

This wrong and some BI tools could not work correctly.
2023-03-10 08:52:09 +08:00
08f0170895 [fix](olap) The 'scan key' generated by the 'is null' expression causes incorrect query results (#17569) 2023-03-10 08:51:06 +08:00
f9baf9c556 [improvement](scan) Support pushdown execute expr ctx (#15917)
In the past, only simple predicates (slot=const), and, like, or (only bitmap index) could be pushed down to the storage layer. scan process:

Read part of the column first, and calculate the row ids with a simple push-down predicate.
Use row ids to read the remaining columns and pass them to the scanner, and the scanner filters the remaining predicates.
This pr will also push-down the remaining predicates (functions, nested predicates...) in the scanner to the storage layer for filtering. scan process:

Read part of the column first, and use the push-down simple predicate to calculate the row ids, (same as above)
Use row ids to read the columns needed for the remaining predicates, and use the pushed-down remaining predicates to reduce the number of row ids again.
Use row ids to read the remaining columns and pass them to the scanner.
2023-03-10 08:35:32 +08:00
849b5b7b8f [fix](sequence) fix that the result is wrong when load multiple duplicate keys (#17575) 2023-03-09 20:59:23 +08:00
6c894be007 [enhancement](Nereids) support decimalv3 and precision derive (#17393) 2023-03-09 14:12:10 +08:00
4ef46159ae [vectorized](udaf) support array type for java-udaf (#17351) 2023-03-09 11:30:07 +08:00
06dee69174 [Refactor](map) remove using column array in map to reduce offset column (#17330)
1. remove column array in map 
2. add offsets column in map 
Aim to reduce duplicate offset  from key-array and value-array in disk
2023-03-09 11:22:26 +08:00
368e6a4f9c [Bug](array filter) Fix bug due to ColumnArray::filter_generic invalid inplace size_at after set_end_ptr (#17554)
We should make a new PodArray to add items instead of do it inplace
2023-03-09 10:59:29 +08:00
00727e8c11 [fix](in-bitmap) fix result may be wrong if the left side of the in bitmap predicate is a constant (#17570) 2023-03-09 10:59:05 +08:00
397cc011c4 [fix](function) fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17420)
ECB algorithm, block_encryption_mode does not take effect, it only takes effect when init vector is provided.
Solved: 192/256 supports calculation without init vector

For other algorithms, an error should be reported when there is no init vector

Initialization Vector. The default value for the block_encryption_mode system variable is aes-128-ecb, or ECB mode, which does not require an initialization vector. The alternative permitted block encryption modes CBC, CFB1, CFB8, CFB128, and OFB all require an initialization vector.

Reference: https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-decrypt

Note: This fix does not support smooth upgrades. during upgrade process, query may report error: funciton not found
2023-03-09 09:51:41 +08:00
2b6d971c2f [fix](nereids)fix first_value/lead/lag window function bug in nereids (#17315)
* [fix](nereids)fix first_value/lead/lag window function bug in nereids

* add more test

* add order by to fix test case

* fix test cases
2023-03-09 09:35:27 +08:00
4822b9811a [feature](nereids)support bitmap runtime filter on nereids (#16927)
* A in(B) -> bitmap_contains(bitmap_union(B), A)
support bitmap runtime filter on nereids

* GroupPlan -> Plan

* fmt

* fix target cast problem
remove test code
2023-03-09 09:30:24 +08:00
f0bd002911 [fix](DOE) Fix esquery not working (#17566)
Function esquery does not work because there is a problem parsing the first parameter type.
The first parameter, which is SlotRef, will be cast to CastExpr. This will cause error while generating ES DSL.
Add more types to adapt esquery function.
2023-03-08 21:51:17 +08:00
bd5ed2b0c2 [enhancement](histogram) optimize the histogram bucketing strategy, etc (#17264)
* optimize the histogram bucketing strategy, etc

* fix p0 regression of histogram
2023-03-08 20:12:05 +08:00
eea6d770d7 [fix](bitmap) fix wrong result of bitmap_or for null (#17456)
Result of select bitmap_to_string(bitmap_or(to_bitmap(1), null)) should be 1 instead of null.

This PR fix logic of bitmap_or and bitmap_or_count.

Other count related funcitons should also be checked and fix, they will be fixed in another PR.
2023-03-08 16:29:01 +08:00
aab14922af [Feature](Nereids) support MarkJoin (#16616)
# Proposed changes
1.The new optimizer supports the combination of subquery and disjunction.In the way of MarkJoin, it behaves the same as the old optimizer. For design details see:https://emmymiao87.github.io/jekyll/update/2021/07/25/Mark-Join.html.
2.Implicit type conversion is performed when conjects are generated after subquery parsing
3.Convert the unnesting of scalarSubquery in filter from filter+join to join + Conjuncts.
2023-03-08 14:26:24 +08:00
626fbc34f9 [bugfix](jsonb) Fix create mv using jsonb key cause be crash (#17430) 2023-03-08 14:18:26 +08:00
4ea0d6c5fa [feature](array_function) add support for array_popfront (#17416) 2023-03-08 13:57:38 +08:00
b1d65f855d [Feature](array-function) Support array_concat function (#17436) 2023-03-08 13:57:16 +08:00
778acb3c5b [opt](string) optimize string equal comparision (#17336)
Optimize string equal and not-equal comparison by using memequal_small_allow_overflow15.
2023-03-08 11:30:00 +08:00
4b743061b4 [feature](function) support type template in SQL function (#17344)
A new way just like c++ template is proposed in this PR. The previous functions can be defined much simpler using template function. 

    # map element extract template function
    [['element_at', '%element_extract%'], 'E', ['ARRAY<E>', 'BIGINT'], 'ALWAYS_NULLABLE', ['E']],

    # map element extract template function
    [['element_at', '%element_extract%'], 'V', ['MAP<K, V>', 'K'], 'ALWAYS_NULLABLE', ['K', 'V']],


BTW, the plain type function is not affected and the legacy ARRAY_X MAP_K_V is still supported for compatability.
2023-03-08 10:51:31 +08:00
69c62b6c6c [Fix](vectorization) fixed that when a column's _fixed_values exceeds the max_pushdown_conditions_per_column limit, the column will not perform predicate pushdown, but if there are subsequent columns that need to be pushed down, the subsequent column pushdown will be misplaced in _scan_keys and it causes query results to be wrong (#17405)
the max_pushdown_conditions_per_column limit, the column will not perform predicate pushdown, but if there are subsequent columns that need to be pushed down, the subsequent column pushdown will be misplaced in _scan_keys and it causes query results to be wrong
Co-authored-by: tongyang.hty <hantongyang@douyu.tv>
2023-03-08 07:23:56 +08:00
06468ba627 [vectorized](bug) fix array constructor function change origin column from block (#17296) 2023-03-07 16:42:23 +08:00