Commit Graph

1394 Commits

Author SHA1 Message Date
fefc0d6814 [Fix](planner)fix create view ignore order by info bug. (#18197) 2023-03-30 20:17:46 +08:00
99bd5ec022 [fix](Nereids) fix some bugs in Subquery to window rule (#18233)
This rule was introduced in PR #17968, but some corner cases are
not handled correctly. This PR fixes these bugs:
1. fix window function generation method, replace inner slot with
   equivalent outer slot
2. forbid the following scenarios
    a. inner has a mapping project
    b. inner has an unexpected filter
    c. outer has a mapping project
    d. outer has an unexpected filter
    e. outer has additional table
    f. outer has same table
    g. outer and inner have different join conditions
    h. outer and inner have the same table with different join conditions
2023-03-30 16:09:16 +08:00
ea41d94582 [Improve](complex-type) Support Count(complexType) (#17868)
Support count function for ARRAY/MAP/STRUCT type
2023-03-30 15:43:32 +08:00
e3bd812887 [fix](stream-load) find line delimiter in csv should start with no offset (#18161)
When loading a big file with a multi-byte line delimiter, a line record may be incomplete because of _output_buf_limit. The incomplete data is then moved to the beginning of the output buf and more data is read into the output buf. In this case, the search for the line delimiter should start with no offset, to avoid a bug that parses two lines as one line.
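The effect can be reproduced with a toy splitter. This is a minimal sketch, not the BE code: `split_lines` and its `search_from_start` flag are hypothetical names, and the buffer handling is simplified to a single `bytes` object.

```python
def split_lines(chunks, delimiter, search_from_start=True):
    """Split a stream of byte chunks into records on a multi-byte delimiter."""
    lines = []
    buf = b""
    for chunk in chunks:
        # Incomplete data stays at the beginning of buf; new data is appended.
        leftover_len = len(buf)
        buf += chunk
        # The buggy variant skips the leftover bytes when searching.
        start = 0 if search_from_start else leftover_len
        while True:
            pos = buf.find(delimiter, start)
            if pos < 0:
                break
            lines.append(buf[:pos])
            buf = buf[pos + len(delimiter):]
            start = 0
    if buf:
        lines.append(buf)
    return lines


# The delimiter "\r\n" straddles the boundary between the two chunks.
chunks = [b"a,1\r", b"\nb,2\r\n"]
good = split_lines(chunks, b"\r\n", search_from_start=True)
bad = split_lines(chunks, b"\r\n", search_from_start=False)
```

With the search starting at offset 0, `good` holds two records. With the offset set to the leftover length, the first half of the straddling delimiter is skipped and `bad` holds a single merged record.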
2023-03-30 14:42:34 +08:00
Pxl
cec983b7ef [Chore](materialized-view) forbiden create mv with where clause contained aggregate column (#18168)
Forbid creating a materialized view whose where clause contains an aggregate column.

create table a_table(
	k1 int null,
	k2 int not null,
    k3 bigint null,
	k4 bigint sum null,
    k5 bitmap bitmap_union null,
    k6 hll hll_union null
)
aggregate key (k1,k2,k3)
distributed BY hash(k1) buckets 3
properties("replication_num" = "1");
create materialized view where_1 as select k1,k4 from a_table where k4 =1; -- invalid: an mv on an agg table needs group by
create materialized view where_2 as select k1,sum(k4) from a_table where k4 =1 group by k1; -- invalid: k4 is an agg column
create materialized view where_3 as select k1,sum(k4) from a_table where k1+k4 =1 group by k1; -- invalid: k4 is an agg column
2023-03-30 13:03:03 +08:00
Pxl
c8ad62a3cd [Enchancement](materialized-view) enchance materialized view where clause match (#18179)
Enhance materialized view where clause matching.
2023-03-30 13:02:21 +08:00
c8ea5bff1d [Fix](planner) fix nested udf bind arguments exception (#18188)
Nested alias functions cause a bind-argument exception, with SQL like:
``` sql
CREATE ALIAS FUNCTION f1(DATETIMEV2(3), INT)
            with PARAMETER (datetime1, int1) as date_trunc(days_sub(datetime1, int1), 'day')

CREATE ALIAS FUNCTION f2(DATETIMEV2(3), int)
            with PARAMETER (datetime1, int1) as DATE_FORMAT(HOURS_ADD(
                date_trunc(datetime1, 'day'),
                add(multiply(floor(divide(HOUR(datetime1), divide(24,int1))), 1), 1)
            ), '%Y%m%d:%H')

select f2(f1(now(3), 2), 3)
```

The bug is in FunctionCallExpr#rewriteExpr(): retExpr is replaced with originExpr to change the alias function into a builtin function, but retExpr.fn is not null, so on return to the outer scope the fn is overwritten. For example:
```
f1(f1()) -> date_trunc(days_sub(date_trunc(days_sub()))) is correct, and
f1(f1()) -> date_trunc(days_sub(days_sub())) is the bug.
``` 

This PR fixes it.
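The expected expansion can be illustrated with a small recursive rewriter. This is a sketch under assumptions, not FunctionCallExpr#rewriteExpr(): expressions are modelled as tuples, and `ALIASES`, `substitute`, `expand`, and `render` are hypothetical helpers. Only f1 from the message above is modelled.

```python
# Alias name -> (parameter names, body). Expressions are (name, arg, ...) tuples
# and leaves are plain strings.
ALIASES = {
    "f1": (("datetime1", "int1"),
           ("date_trunc", ("days_sub", "datetime1", "int1"), "'day'")),
}


def substitute(body, bindings):
    """Replace parameter names in an alias body with the bound arguments."""
    if isinstance(body, str):
        return bindings.get(body, body)
    name, *args = body
    return (name, *[substitute(a, bindings) for a in args])


def expand(expr):
    """Expand alias functions bottom-up so nested calls keep their own body."""
    if isinstance(expr, str):
        return expr
    name, *args = expr
    args = [expand(a) for a in args]
    if name in ALIASES:
        params, body = ALIASES[name]
        return expand(substitute(body, dict(zip(params, args))))
    return (name, *args)


def render(expr):
    """Print an expression tree as a function-call string."""
    if isinstance(expr, str):
        return expr
    name, *args = expr
    return f"{name}({', '.join(render(a) for a in args)})"


nested = render(expand(("f1", ("f1", "now(3)", "2"), "3")))
```

Here `nested` keeps both `date_trunc` layers, which is the shape the message calls correct.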
2023-03-30 11:39:02 +08:00
b3657959c9 [fix](planner )need add LateralViewRef's id into TableRef's allTableRefIds (#18220)
1. add LateralViewRef's id into TableRef's allTableRefIds, so the caller won't miss LateralViewRef when trying to get all the tableref ids.
2. TableFunctionNode should use child node's output tuple id as the input tuple id
2023-03-30 11:32:18 +08:00
525f15dddf [vectorized](function) support array_sortby function (#18071) 2023-03-30 11:07:49 +08:00
9877143210 [fix](like) fix wrong result of like pattern with backslash (#18039)
The query select * from person where address like '%\\\\%'; returns an empty result, but MySQL returns one row.

CREATE TABLE `person` (
  `id` int(11) NULL,
  `name` text NULL,
  `age` int(11) NULL,
  `class` int(11) NULL,
  `address` text NULL
) ENGINE=OLAP
UNIQUE KEY(`id`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2",
"disable_auto_compaction" = "false"
); 

insert into person values (10001,'test1',30,2,'test\\\\,xxx');
Adding logs:

select * from person where address like '%\\\\%';

I0323 10:26:15.907760 2387043 like.cpp:558] arg str: %\\%, size: 4, pattern LIKE_ENDS_WITH_RE: (?:%+)(((\\%)|(\\_)|([^%_]))+), size: 30
I0323 10:26:15.907789 2387043 like.cpp:562] match 0: \\%, size: 3
I0323 10:26:15.907801 2387043 like.cpp:562] match 1: \%, size: 2
I0323 10:26:15.907811 2387043 like.cpp:562] match 2: \%, size: 2
I0323 10:26:15.907821 2387043 like.cpp:562] match 3: , size: 0
I0323 10:26:15.907830 2387043 like.cpp:562] match 4: \, size: 1
I0323 10:26:15.907842 2387043 like.cpp:615] search_string : \\%
I0323 10:26:15.907855 2387043 like.cpp:619] search_string escape removed: \%
The pattern is matched against LIKE_ENDS_WITH_RE, which is wrong. The meaning of the SQL should be: match strings that contain a backslash in any position.
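The intended semantics can be sketched by translating a LIKE pattern to a regular expression. This is an illustration only, assuming backslash is the escape character; `like_to_regex` and `like` are hypothetical helpers and do not reflect how like.cpp classifies patterns.

```python
import re


def like_to_regex(pattern, escape="\\"):
    """Translate a LIKE pattern into an anchored regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # An escaped character, including an escaped backslash, is literal.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            out.append(".*")   # any sequence of characters
        elif ch == "_":
            out.append(".")    # exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.DOTALL)


def like(value, pattern):
    return like_to_regex(pattern).fullmatch(value) is not None


# After SQL string unescaping the pattern is the 4 characters %\\%
pattern = r"%\\%"
stored = r"test\\,xxx"   # the inserted address value after unescaping
```

Under this reading `like(stored, pattern)` is true, while a value without any backslash does not match.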
2023-03-30 11:05:09 +08:00
2ee1468576 [improvement](executor) Support task group schedule in pipeline engine (#17615) 2023-03-30 10:49:50 +08:00
3b04d42779 [fix](bitmap) fix bug: orthogonal_bitmap_union_count coredump when arg is nullable (#18182)
The following query causes a BE core dump:

select    ORTHOGONAL_BITMAP_UNION_COUNT(     cast(null as bitmap)) from   t;
2023-03-30 09:31:58 +08:00
55bf38dbab [feature-wip](MTMV) Use SSB ddl to test (#18150)
Add regression tests for MTMV.
2023-03-30 00:11:38 +08:00
6964d9f99c [fix](function) resubmit-fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17907)
* Revert "[fix](function) fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17420)"

This reverts commit 397cc011c4f1ba5a25c770258c13f1cd3f28b47d.

* [fix-resubmit](function) fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17420)

With the ECB algorithm, block_encryption_mode does not take effect; it only takes effect when an init vector is provided.
Solved: 192/256 now support calculation without an init vector.

For other algorithms, an error should be reported when there is no init vector.

Initialization Vector. The default value for the block_encryption_mode system variable is aes-128-ecb, or ECB mode, which does not require an initialization vector. The alternative permitted block encryption modes CBC, CFB1, CFB8, CFB128, and OFB all require an initialization vector.

Reference: https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-decrypt
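
The rule quoted above can be written as a small check. This is a sketch of the rule only, not the Doris implementation: `check_iv` is a hypothetical helper, and it assumes mode names of the form aes-<bits>-<mode> as in the MySQL documentation.

```python
# Modes that the quoted MySQL text says require an initialization vector.
MODES_REQUIRING_IV = {"cbc", "cfb1", "cfb8", "cfb128", "ofb"}


def check_iv(block_encryption_mode, iv):
    """Return the block mode, raising if a required init vector is missing."""
    mode = block_encryption_mode.lower().rsplit("-", 1)[-1]
    if mode in MODES_REQUIRING_IV and iv is None:
        raise ValueError(f"{block_encryption_mode} requires an init vector")
    return mode


ecb_mode = check_iv("aes-128-ecb", None)          # ECB works without an IV
cbc_mode = check_iv("aes-256-cbc", b"0" * 16)     # CBC works when an IV is given
```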

Note: This fix does not support smooth upgrades. During the upgrade process, queries may report the error: function not found
2023-03-29 21:13:01 +08:00
b92087dee8 [Fix](Nereids) ReorderJoin rule cannot process MarkJoin correctly (#18159)
Fixes two problems:
1. When reorderJoin is executed on a logical join that contains a MarkJoinSlotReference column, it builds a plan->MarkJoinSlotReference structure, and the MarkJoinSlotReference column is restored after the reorder is completed. But when filter+crossJoin exists, the rules transform it into an innerJoin, so the map lookup fails, the corresponding plan cannot be found, and the MarkJoinSlotReference column is lost.
2. Originally, the MarkJoinSlotReference column was used as the NonUserVisibleOutput of logicalJoin. When logicalApply was generated, the added logicalProject did not include the MarkJoinSlotReference column, and the invalid logicalProject was deleted by other rules, which ensured that LogicalApply was directly under the logicalFilter and the filter could recognize the MarkJoinSlotReference column. But problems arise if the logicalProject cannot be deleted.

Repair method:
1. For a logicalJoin containing a MarkJoinSlotReference, the rules of reorderJoin are not executed.
2. Use MarkJoinSlotReference as the output of logicalJoin and also as the output of LogicalApply.
3. When generating LogicalApply, if a MarkJoinSlotReference is included, add an additional logicalProject above the logicalFilter to remove the MarkJoinSlotReference column.

Example:
```
logicalFilter(subquery with disjunction)

after SubqueryToApply

logicalProject(without markJoinSlotReference)
+-- logicalFilter(markJoinSlotReference)
    +-- logicalProject(with markJoinSlotReference)
        +-- logicalApply()
```

```
SELECT * FROM sub_query_correlated_subquery1 WHERE k1 IN (SELECT k1 FROM sub_query_correlated_subquery3) OR k1 < 10;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject[60] ( distinct=false, projects=[k1#0, k2#1], excepts=[], canEliminate=true )                                                                                                        |
| +--LogicalProject[59] ( distinct=false, projects=[k1#0, k2#1], excepts=[], canEliminate=true )                                                                                                     |
|    +--LogicalFilter[58] ( predicates=($c$1#7#false OR (k1#0 < 10)) )                                                                                                                               |
|       +--LogicalProject[57] ( distinct=false, projects=[k1#0, k2#1, $c$1#7#false], excepts=[], canEliminate=true )                                                                                 |
|          +--LogicalApply ( correlationSlot=[], correlationFilter=Optional.empty, isMarkJoin=true, MarkJoinSlotReference=$c$1#7#false, scalarSubCorrespondingSlot=empty )                           |
|             |--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery1, indexName=<index_not_selected>, selectedIndexId=63105, preAgg=ON )    |
|             +--LogicalProject[34] ( distinct=false, projects=[k1#2], excepts=[], canEliminate=true )                                                                                               |
|                +--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery3, indexName=<index_not_selected>, selectedIndexId=63115, preAgg=ON ) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
2023-03-29 16:12:42 +08:00
Pxl
503c6bf38e [Chore](materialized-view) forbiden create mv with some constant expr and curdate() (#18145)
Forbid creating a materialized view with some constant expressions and curdate().
2023-03-29 16:08:48 +08:00
238223fb8b [regression-test](log) add log for malforamt response of stream load (#18173) 2023-03-29 15:52:44 +08:00
c0797c4be3 [test](decimal) Update output data for P1 regression (#18199) 2023-03-29 15:13:12 +08:00
Pxl
0c01df6bb2 [Bug](view) fix AES_ENCRYPT have wrong result on view (#18034) 2023-03-29 10:49:39 +08:00
Pxl
fd18e34c0c [Chore](planner) add error information for OnClause contain ExistsPredicates (#18090) 2023-03-29 10:47:41 +08:00
Pxl
664fbffcba [Enchancement](table-function) optimization for vectorized table function (#17973) 2023-03-29 10:45:00 +08:00
32ccf0c68d [test](case)add external hive parquet case 0328 #18169
Add a case for external Hive Parquet.
2023-03-29 09:13:03 +08:00
4f2135f869 [test](Nereids) Add regression test for query empty table of tpcds (#18172)
Add a regression test for querying empty TPC-DS tables; this can prevent test fallback.
2023-03-28 20:34:27 +08:00
012f7bd031 [feature](function)Add ST_Area function (#18138) 2023-03-28 19:36:09 +08:00
d27201f331 [fix](nested_loop_join)got incorrect result from nested loop join without condition (#18139) 2023-03-28 16:20:05 +08:00
ba1b159ad2 [fix](regression) deal with output order and timeout for segcompaction p1 (#18162)
1. Add `order by` to regulate the output order to avoid false-negative
    mismatch for dup table.
2. Increase load timeout.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-03-28 16:00:27 +08:00
b9161295b7 [Fix](plan) fix bug that the case sensibility of column name may impact join method (#17904)
Issue Number: close #17876
2023-03-28 15:18:30 +08:00
d7dcdfcba9 [Fix](Create View) support create view from tvf (#18087)
Support create view as select * from tvf()
2023-03-28 15:07:32 +08:00
1956f04aa2 [feature](multi-catalog) add specified_database_list PROPERTY for jdbc/hms/iceberg catalog (#17803)
Add the specified_database_list PROPERTY for the jdbc catalog, so that a user can use multiple databases specified in the jdbc catalog.
2023-03-28 14:04:41 +08:00
Pxl
d2839eb41f [Chore](Materialized-View) add some mv regression test case (#18095)
add some mv regression test case
2023-03-28 10:31:37 +08:00
Pxl
9c1e86f84f [Bug](materialized-view) add some limit for create mv on aggregate table (#18141)
Add some limits for creating a materialized view on an aggregate table.
```sql
CREATE TABLE t1 (   
p1 INT,   
p2 INT,   
p3 INT,   
v1 INT SUM,   
v2 INT MAX,   
v3 INT MIN ) AGGREGATE KEY (p1, p2, p3) DISTRIBUTED BY HASH (p1) BUCKETS 1 PROPERTIES ('replication_num' = '1');


CREATE MATERIALIZED VIEW mv_1 AS SELECT p1, SUM(v3) FROM t1 GROUP BY p1;  -- invalid aggregate type
CREATE MATERIALIZED VIEW mv_2 AS SELECT p1, MIN(v3+v3) FROM t1 GROUP BY p1; -- invalid: expression calculated on an aggregate column
CREATE MATERIALIZED VIEW mv_3 AS SELECT p1, SUM(v1) FROM t1 GROUP BY p1; -- cast v1 as bigint, ok
CREATE MATERIALIZED VIEW mv_4 AS SELECT p1, SUM(abs(v1)) FROM t1 GROUP BY p1; -- invalid: expression calculated on an aggregate column

```
2023-03-28 10:28:29 +08:00
09e346e47c [fix](type) Data precision is lost when converting DOUBLE type data to DECIMAL (#17191) (#17562)
1. Fix bug when converting DOUBLE to DECIMAL;
2. Fix bug when converting DOUBLE to DECIMALV3;
2023-03-28 09:46:43 +08:00
c95b81f950 [fix](order by) fix bug of order by desc when rowsets is no overlapping (#18100)
When rowsets do not overlap and the sort order is desc, the logic of VCollectIterator::Level0Iterator::init_for_union is followed. In this function, the row ref pos of the first level0 iterator is set to 0, and the row pos of every other level0 iterator is set to -1.

But in the Level1Iterator, when rowsets are non-overlapping and ordered desc, the list of rowset iterators is reversed, so the row ref pos of the first level0 iterator in the list is -1, which makes the block reader think that the entire tablet has no data.
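
A toy model of the description above, assuming the reader only inspects the first iterator in the list; `init_for_union` and `tablet_looks_empty` here are hypothetical Python stand-ins, not the C++ functions.

```python
def init_for_union(num_iterators):
    """The first level0 iterator gets row pos 0, all others get -1."""
    return [0] + [-1] * (num_iterators - 1)


def tablet_looks_empty(row_positions):
    """Toy reader check that looks only at the first iterator in the list."""
    return row_positions[0] == -1


positions = init_for_union(3)
reversed_positions = list(reversed(positions))  # desc order reverses the list
```

In this model `positions` passes the check, while `reversed_positions` starts with -1 and is mistaken for an empty tablet.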
2023-03-28 09:31:37 +08:00
99427d409d [vectorized](udaf) fix java-udaf case is unstable with fuzzy mode #18146
The reason the udaf case is unstable:
when fuzzy mode sets enable_pipeline_engine=true, the agg function case runs with only 1 instance,
so the default value is not merged; but with instance>1, the default value is merged.
2023-03-28 09:30:49 +08:00
115e52c16c [Opt](array) optimize_array_sort (#18123) 2023-03-27 22:01:24 +08:00
ee80c12815 [feature](json) add json_extract function (#17808) 2023-03-27 21:19:47 +08:00
894f38a517 [fix](planner) fix conjunct planned on exchange node (#18042)
sql like: 
select k5, k6, SUM(k3) AS k3 
from ( 
    select
        k5,
        date_format(k6, '%Y-%m-%d') as k6,
        count(distinct k3) as k3 
    from t 
    group by k5, k6
) AS temp where 1=1
group by k5, k6;

throws an exception because conjuncts are planned on the exchange node, which cannot handle conjuncts. We now skip the exchange node when planning conjuncts, which fixes the bug.
Note: the bug occurs iff the conjunct is always true, like 1=1 above.
2023-03-27 17:50:52 +08:00
902629adb6 [fix](planner) fix targetTypeDef NPE when value is null (#18072)
sql like:
select * from (select *, null as top from v1)t where top = 5;
select * from (select *, null as top from v1)t where top is not null;
cause an NPE because targetTypeDef is null when the value is null. Now we use the cast target type as the targetTypeDef.
2023-03-27 17:29:14 +08:00
8b07021f5f [enhancement](regression-test) add hint to disable nereids planner for some cases (#18066) 2023-03-27 14:06:50 +08:00
bcf95cd920 [feature](function)Add ST_Angle_Sphere function (#17919) 2023-03-27 10:14:46 +08:00
78abb40fdc [improvement](string) throw exception instead of log fatal if string column exceed total size limit (#17989)
Throw an exception instead of logging fatal if a string column exceeds the total size limit, so that we can catch it and let the query fail instead of causing the BE to exit.
2023-03-27 08:55:26 +08:00
a0b100d38e [enhancement](regression-test) prove setting default value to session var will be detected #18113 2023-03-26 12:56:15 +08:00
2a0890d803 [feature](datatype) add show data types stmt (#18111) 2023-03-26 12:37:06 +08:00
96f274b8f3 [fix](global-variable) fix bug that set default value for global variable will cause NullPointerException (#18004) 2023-03-25 22:45:26 +08:00
df0eca4003 [improvement] (schema change) Lightweight schema change of modify column with varchar length (#17207)
Signed-off-by: Yisong Han <yisong8686@gmail.com>
2023-03-25 22:38:19 +08:00
74fdb6c116 [refactor](regression-test) refactor ssl test from p0 to p2 (#17847) 2023-03-25 22:37:26 +08:00
360d3050bc [Feature](array-function) Support array_reverse_sort function (#17754)
Co-authored-by: zhangyu209 <zhangyu209@meituan.com>
2023-03-25 21:58:11 +08:00
50eeb2d9a4 [fix](json) change int to bigint for json function (#17769) 2023-03-25 21:57:29 +08:00
Pxl
a8753faeb1 [Bug](function) fix column complex not resize after filter (#18043) 2023-03-25 21:48:13 +08:00
f84481886b [feature](string_functions) The 'split_part' function supports non-constant parameters (#18029) 2023-03-25 12:03:11 +08:00