Commit Graph

560 Commits

Author SHA1 Message Date
151842a1fe [feature](inverted index)WIP inverted index api: SQL syntax and metadata (#13430)
Introduce a SQL syntax for creating inverted index and related metadata changes.

```
-- create table with INVERTED index 

CREATE TABLE httplogs (
  ts datetime,
  clientip varchar(20),
  request string,
  status smallint,
  size int,
  INDEX idx_size (size) USING INVERTED,
  INDEX idx_status (status) USING INVERTED,
  INDEX idx_clientip (clientip) USING INVERTED PROPERTIES("parser"="none")
)
DUPLICATE KEY(ts)
DISTRIBUTED BY RANDOM BUCKETS 10

-- add an INVERTED index  to a table

CREATE INDEX idx_request ON httplogs(request) USING INVERTED PROPERTIES("parser"="english");
```
2022-11-08 23:46:53 +08:00
826cfdaf93 [feature](information_schema) add backends information_schema table (#13086) 2022-11-08 22:15:10 +08:00
115c6bd411 [fix](keyranges) fix the split error of keyranges (#14049)
fix the split error of keyranges
2022-11-08 22:09:16 +08:00
54c07f8782 [regression](Nereids) add back tpch regression test cases (#13826)
1. add back TPC-H regression test cases
2. fix decimal problem on aggregate function sum and agg introduced by #13764 
3. fix memo merge group NPE introduced by #13900
2022-11-08 16:40:46 +08:00
f7ecb6d79f [Bug](Bitmap) fix sub_bitmap calculate wrong result to return null (#13978)
fix sub_bitmap calculate wrong result to return null
2022-11-08 14:10:12 +08:00
34f43ac781 [bug](like function)fix like '' (empty string) get wrong result with all rows #14035 2022-11-08 08:51:39 +08:00
1c2532b9dc [Bug](udf) Make UDF's type always nullable (#14002) 2022-11-07 20:51:31 +08:00
bb9182d602 [fix](repeat)remove unmaterialized expr from repeat node (#13953) 2022-11-07 14:13:05 +08:00
e8d2fb6778 [feature](function)add search functions: multi_search_all_positions & multi_match_any (#13763)
Co-authored-by: yiliang qiu <yiliang.qiu@qq.com>
2022-11-07 11:50:55 +08:00
7ffe88b579 [feature-array](array-type) Add array function array_popback (#13641)
Remove the last element from array.

```
mysql> select array_popback(['test', NULL, 'value']);
+-----------------------------------------------------+
| array_popback(ARRAY('test', NULL, 'value')) |
+-----------------------------------------------------+
| [test, NULL]                                        |
+-----------------------------------------------------+
```
2022-11-07 10:48:16 +08:00
1724faf9a5 [test](java-udf)add java udf RegressionTest about the currently supported data types #13972 2022-11-05 19:25:58 +08:00
06a1efdb01 [fix](Nerieds) fix tpch and support trace plan's change event (#13957)
This pr fix some bugs for run tpc-h
1. fix the avg(decimal) crash the backend. The fix code in `Avg.getFinalType()` and every child class of `ComputeSinature`
2. fix the ReorderJoin dead loop. The fix code in `ReorderJoin.findInnerJoin()`
3. fix the TimestampArithmetic can not bind the functions in the child. The fix code in `BindFunction.FunctionBinder.visitTimestampArithmetic()`

New feature: support trace the plan's change event, you can `set enable_nereids_trace=true` to open trace log and see some log like this:
```
2022-11-03 21:07:38,391 INFO (mysql-nio-pool-0|208) [Job.printTraceLog():128] ========== RewriteBottomUpJob ANALYZE_FILTER_SUBQUERY ==========
before:
LogicalProject ( projects=[S_ACCTBAL#17, S_NAME#13, N_NAME#4, P_PARTKEY#19, P_MFGR#21, S_ADDRESS#14, S_PHONE#16, S_COMMENT#18] )
+--LogicalFilter ( predicates=((((((((P_PARTKEY#19 = PS_PARTKEY#7) AND (S_SUPPKEY#12 = PS_SUPPKEY#8)) AND (P_SIZE#24 = 15)) AND (P_TYPE#23 like '%BRASS')) AND (S_NATIONKEY#15 = N_NATIONKEY#3)) AND (N_REGIONKEY#5 = R_REGIONKEY#0)) AND (R_NAME#1 = 'EUROPE')) AND (PS_SUPPLYCOST#10 =  (SCALARSUBQUERY) (QueryPlan: LogicalAggregate ( phase=LOCAL, outputExpr=[min(PS_SUPPLYCOST#31) AS `min(PS_SUPPLYCOST)`#33], groupByExpr=[] )), (CorrelatedSlots: [P_PARTKEY#19, S_SUPPKEY#12, S_NATIONKEY#15, N_NATIONKEY#3, N_REGIONKEY#5, R_REGIONKEY#0, R_NAME#1]))) )
   +--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] )
      |--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] )
      |  |--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] )
      |  |  |--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] )
      |  |  |  |--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.part, output=[P_PARTKEY#19, P_NAME#20, P_MFGR#21, P_BRAND#22, P_TYPE#23, P_SIZE#24, P_CONTAINER#25, P_RETAILPRICE#26, P_COMMENT#27], candidateIndexIds=[], selectedIndexId=11076, preAgg=ON )
      |  |  |  +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.supplier, output=[S_SUPPKEY#12, S_NAME#13, S_ADDRESS#14, S_NATIONKEY#15, S_PHONE#16, S_ACCTBAL#17, S_COMMENT#18], candidateIndexIds=[], selectedIndexId=11124, preAgg=ON )
      |  |  +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.partsupp, output=[PS_PARTKEY#7, PS_SUPPKEY#8, PS_AVAILQTY#9, PS_SUPPLYCOST#10, PS_COMMENT#11], candidateIndexIds=[], selectedIndexId=11092, preAgg=ON )
      |  +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.nation, output=[N_NATIONKEY#3, N_NAME#4, N_REGIONKEY#5, N_COMMENT#6], candidateIndexIds=[], selectedIndexId=11044, preAgg=ON )
      +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.region, output=[R_REGIONKEY#0, R_NAME#1, R_COMMENT#2], candidateIndexIds=[], selectedIndexId=11108, preAgg=ON )

after:
LogicalProject ( projects=[S_ACCTBAL#17, S_NAME#13, N_NAME#4, P_PARTKEY#19, P_MFGR#21, S_ADDRESS#14, S_PHONE#16, S_COMMENT#18] )
+--LogicalFilter ( predicates=((((((((P_PARTKEY#19 = PS_PARTKEY#7) AND (S_SUPPKEY#12 = PS_SUPPKEY#8)) AND (P_SIZE#24 = 15)) AND (P_TYPE#23 like '%BRASS')) AND (S_NATIONKEY#15 = N_NATIONKEY#3)) AND (N_REGIONKEY#5 = R_REGIONKEY#0)) AND (R_NAME#1 = 'EUROPE')) AND (PS_SUPPLYCOST#10 = min(PS_SUPPLYCOST)#33)) )
   +--LogicalProject ( projects=[P_PARTKEY#19, P_NAME#20, P_MFGR#21, P_BRAND#22, P_TYPE#23, P_SIZE#24, P_CONTAINER#25, P_RETAILPRICE#26, P_COMMENT#27, S_SUPPKEY#12, S_NAME#13, S_ADDRESS#14, S_NATIONKEY#15, S_PHONE#16, S_ACCTBAL#17, S_COMMENT#18, PS_PARTKEY#7, PS_SUPPKEY#8, PS_AVAILQTY#9, PS_SUPPLYCOST#10, PS_COMMENT#11, N_NATIONKEY#3, N_NAME#4, N_REGIONKEY#5, N_COMMENT#6, R_REGIONKEY#0, R_NAME#1, R_COMMENT#2, min(PS_SUPPLYCOST)#33] )
      +--LogicalApply ( correlationSlot=[P_PARTKEY#19, S_SUPPKEY#12, S_NATIONKEY#15, N_NATIONKEY#3, N_REGIONKEY#5, R_REGIONKEY#0, R_NAME#1], correlationFilter=Optional.empty )
         |--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] )
         |  |--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] )
         |  |  |--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] )
         |  |  |  |--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] )
         |  |  |  |  |--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.part, output=[P_PARTKEY#19, P_NAME#20, P_MFGR#21, P_BRAND#22, P_TYPE#23, P_SIZE#24, P_CONTAINER#25, P_RETAILPRICE#26, P_COMMENT#27], candidateIndexIds=[], selectedIndexId=11076, preAgg=ON )
         |  |  |  |  +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.supplier, output=[S_SUPPKEY#12, S_NAME#13, S_ADDRESS#14, S_NATIONKEY#15, S_PHONE#16, S_ACCTBAL#17, S_COMMENT#18], candidateIndexIds=[], selectedIndexId=11124, preAgg=ON )
         |  |  |  +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.partsupp, output=[PS_PARTKEY#7, PS_SUPPKEY#8, PS_AVAILQTY#9, PS_SUPPLYCOST#10, PS_COMMENT#11], candidateIndexIds=[], selectedIndexId=11092, preAgg=ON )
         |  |  +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.nation, output=[N_NATIONKEY#3, N_NAME#4, N_REGIONKEY#5, N_COMMENT#6], candidateIndexIds=[], selectedIndexId=11044, preAgg=ON )
         |  +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.region, output=[R_REGIONKEY#0, R_NAME#1, R_COMMENT#2], candidateIndexIds=[], selectedIndexId=11108, preAgg=ON )
         +--LogicalAggregate ( phase=LOCAL, outputExpr=[min(PS_SUPPLYCOST#31) AS `min(PS_SUPPLYCOST)`#33], groupByExpr=[] )
            +--LogicalFilter ( predicates=(((((P_PARTKEY#19 = PS_PARTKEY#28) AND (S_SUPPKEY#12 = PS_SUPPKEY#29)) AND (S_NATIONKEY#15 = N_NATIONKEY#3)) AND (N_REGIONKEY#5 = R_REGIONKEY#0)) AND (CAST(R_NAME AS STRING) = CAST(EUROPE AS STRING))) )
               +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.partsupp, output=[PS_PARTKEY#28, PS_SUPPKEY#29, PS_AVAILQTY#30, PS_SUPPLYCOST#31, PS_COMMENT#32], candidateIndexIds=[], selectedIndexId=11092, preAgg=ON )

```
2022-11-04 15:01:06 +08:00
0a228a68d6 [Improvement](javaudf) support different date argument for date/datetime type (#13920) 2022-11-03 20:33:20 +08:00
5d7b51dcc2 [BugFix](Concat) output of string concat function exceeds UINT makes crash (#13916) 2022-11-03 19:44:44 +08:00
d183199319 [Bug](array-type) Fix array product calculate decimal type return wrong result (#13794) 2022-11-03 17:26:34 +08:00
8043418db4 [optimization](array-type) update the exception message when create table with array column (#13731)
This pr is used to update the exception message when create table with array column.
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-11-03 17:12:17 +08:00
5fe3342aa3 [Vectorized](function) support bitmap_to_array function (#13926) 2022-11-03 14:29:28 +08:00
29e01db7ce [Fix](Nereids) add comments to CostAndEnforcerJob and fix view test case (#13046)
1. add comments to cost and enforcer job as some code is too hard to understand
2. fix nereids_syntax_p0/view.groovy's multi-answer bug.
2022-11-03 12:12:24 +08:00
bfba058ecf [Feature](join) Support null aware left anti join (#13871) 2022-11-03 12:11:25 +08:00
57ee5c4a65 [feature](nereids) Support authentication (#13434)
Add a rule to check the permission of a user who are executing a query. Forbid users who don't have SELECT_PRIV on some tables from executing queries on these tables.
2022-11-03 11:58:14 +08:00
a4a991207b [fix](agg)fix group by constant value bug (#13827)
* [fix](agg)fix group by constant value bug

* keep only one const grouping exprs if no agg exprs
2022-11-03 10:26:59 +08:00
37e4a1769d [fix](sequence) fix that update table core dump with sequence column (#13847)
* [fix](sequence) fix that update table core dump with sequence column

* update
2022-11-03 09:02:21 +08:00
cc37ef0f2b [regression-test](query) Add the regression case of the query under the large wide table. #13897
Co-authored-by: smallhibiscus <844981280>
2022-11-03 08:47:53 +08:00
7b4c2cabb4 [feature](new-scan) support transactional insert in new scan framework (#13858)
Support running transactional insert operation with new scan framework. eg:

admin set frontend config("enable_new_load_scan_node" = "true");
begin;
insert into tbl1 values(1,2);
insert into tbl1 values(3,4);
insert into tbl1 values(5,6);
commit;
Add some limitation to transactional insert

Do not support non-literal value in insert stmt
Fix some issue about array type:

Forbid cast other non-array type to NESTED array type, it may cause BE crash.
Add getStringValueForArray() method for Expr, to get valid string-formatted array type value.
Add useLocalSessionState=true in regression-test jdbc url
without this config, the jdbc driver will send some init cmd each time it connect to server, such as
select @@session.tx_read_only.
But when we use transactional insert, after begin command, Doris do not support any other type of
stmt except for insert, commit or rollback.
So adding this config to let the jdbc NOT send cmd when connecting.
2022-11-03 08:36:07 +08:00
Fy
e021705053 [feature](nereids) support common table expression (#12742)
Support common table expression(CTE) in Nereids:
- Just implemented inline CTE, which means we will copy the logicalPlan of CTE everywhere it is referenced;
- If the name of CTE is the same as an existing table or view, we will choose CTE first;
2022-11-02 23:41:53 +08:00
b83744d2f6 [feature](function)add regexp functions: regexp_replace_one, regexp_extract_all (#13766) 2022-11-02 23:15:57 +08:00
0ea7f85986 [fix](keyword) add BIN as keyword (#13907) 2022-11-02 22:30:43 +08:00
374303186c [Vectorized](function) support topn_array function (#13869) 2022-11-02 19:49:23 +08:00
6eea855e78 [feature](Nereids) Support lots of scalar function and fix some bug (#13764)
Proposed changes
1. function interfaces that can search the matched signature, say ComputeSignature. It's equal to the Function.CompareMode.
   - IdenticalSignature: equal to Function.CompareMode.IS_IDENTICAL
   - NullOrIdenticalSignature: equal to Function.CompareMode.IS_INDISTINGUISHABLE
   - ImplicitlyCastableSignature: equal to Function.CompareMode.IS_SUPERTYPE_OF
   - ExplicitlyCastableSignature: equal to Function.CompareMode.IS_NONSTRICT_SUPERTYPE_OF
3. generate lots of scalar functions
4. bug-fix: disassemble avg function compute wrong result because the wrong input type, the AggregateParam.inputTypesBeforeDissemble is use to save the origin input type and pass to backend to find the correct global aggregate function.
5. bug-fix: subquery with OneRowRelation will crash because wrong nullable property


Note:
1. currently no more unit test/regression test for the scalar functions, I will add the test until migrate aggregate functions for unified processing.
2. A known problem is can not invoke the variable length function, I will fix it later.
2022-11-02 18:01:08 +08:00
f2a0adf34e [fix](fe) Inconsistent behavior for string comparison in FE and BE (#13604) 2022-11-02 15:32:13 +08:00
e6080a6e4c [regression](join) add right anti join with other predicate regression case (#13815) 2022-11-02 13:27:58 +08:00
277025b046 [fix](join)ColumnNullable need handle const column with nullable const value (#13866) 2022-11-02 08:52:49 +08:00
287a739510 [javaudf](string) Fix string format in java udf (#13854) 2022-11-01 21:25:12 +08:00
61c817f4cc [feature](syntax) support SELECT * EXCEPT (#13844)
* [feature](syntax) support SELECT * EXCEPT: add regression test
2022-11-01 19:41:25 +08:00
e63608b556 [Bug](test) fix some test case result is ramdom (#13837) 2022-11-01 14:06:47 +08:00
42b2725f03 [Bug](delete) Fix wrong delete operation (#13840) 2022-11-01 13:38:43 +08:00
b27714542d [fix](planner) infer predicate could generate predicates in another scope (#13691)
* [fix](planner) infer predicate could generate predicates in another scope
2022-11-01 09:03:41 +08:00
d2c5c1af3b [feature](regression) add custom config file for Regression: regression-conf-custom.groovy (#13783) 2022-10-31 22:49:06 +08:00
ba177a15cb [feature-wip](recover) new recover ddl and support show catalog recycle bin (#13067) 2022-10-31 17:44:56 +08:00
f49a0daf54 [fix](regression) Fix concurrent regression failure (#13798) 2022-10-31 15:57:45 +08:00
53e5f3939e [fix](plan)result exprs should be substituted in the same way as agg exprs (#13744)
* [fix](cast)ignore implicit cast when comparing two exprs

* fix fe ut
2022-10-31 10:19:32 +08:00
61b7c2c96c [fix](join) fix incorrect result when using anti join with other join predicates (#13743) 2022-10-31 09:51:34 +08:00
b15e0a9fb5 [Bug](function) fix bug of if function of nullable column process (#13779) 2022-10-31 08:38:53 +08:00
753c2ccfd1 [fix](test) drop table before create it (#13791) 2022-10-30 22:45:08 +08:00
98cc32aa0e [BugFix](regression-test) add order by in left/right join test case (#13774)
The result of right join is unordered, so we need add order by to guarantee results consistent.
2022-10-30 18:00:08 +08:00
efe813ba60 [fix](test)(explain) add full qualified name for scan node explain string (#13777)
1.
In the "explain" result of SQL, the table name in `ScanNode` should be full qualified with dbname.
And for olap scan node, the selected index name should not be "null".

2.
Remove `tpch_sf1_p1/tpch_sf1/nereids/` in regression test, it will be fixed later.
2022-10-30 13:24:48 +08:00
f325119362 [fix](regression-test) update table name string in tpch_sf1 explain case (#13724) 2022-10-28 11:13:29 +08:00
5805011629 [Feature](string-function) Add function mask/mask_first_n/mask_last_n (#13694)
Implementation of mask function from hive.
2022-10-28 10:43:56 +08:00
20363edc73 [BugFix](function) fix reverse function dynamic buffer overflow due to illegal character (#13671)
Previous logic of reverse function might not be strong enough to handle illegal character. For example, one one byte size character would be mistaken as one utf-8 character which occupies more than one byte space. And unfortunately exceeding the buffer space during future process.
2022-10-28 08:44:08 +08:00
859ffa6304 [bugfix](concat) be crash caused by function concat(ifnull) (#13693) 2022-10-28 08:42:51 +08:00