Commit Graph

3029 Commits

Author SHA1 Message Date
151842a1fe [feature](inverted index)WIP inverted index api: SQL syntax and metadata (#13430)
Introduce a SQL syntax for creating inverted index and related metadata changes.

```
-- create table with INVERTED index 

CREATE TABLE httplogs (
  ts datetime,
  clientip varchar(20),
  request string,
  status smallint,
  size int,
  INDEX idx_size (size) USING INVERTED,
  INDEX idx_status (status) USING INVERTED,
  INDEX idx_clientip (clientip) USING INVERTED PROPERTIES("parser"="none")
)
DUPLICATE KEY(ts)
DISTRIBUTED BY RANDOM BUCKETS 10

-- add an INVERTED index  to a table

CREATE INDEX idx_request ON httplogs(request) USING INVERTED PROPERTIES("parser"="english");
```
2022-11-08 23:46:53 +08:00
826cfdaf93 [feature](information_schema) add backends information_schema table (#13086) 2022-11-08 22:15:10 +08:00
3f3f2eb098 [Nereids][Improve] infer predicate after push down predicate (#12996)
This PR implements the function of predicate inference

For example:

``` sql
select * from student left join score on student.id = score.sid where score.sid > 1
```
transformed logical plan tree:

                    left join
             /                    \
       filter(sid >1)     filter(id > 1) <---- inferred predicate
         |                           |
      scan                      scan  

See `InferPredicatesTest`  for more cases

 The logic is as follows:
  1. poll up bottom predicate then infer additional predicates
    for example:
    select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id
    1. poll up bottom predicate
       select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t.id = 1
    2. infer
       select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t.id = 1 and t2.id = 1
    finally transformed sql:
       select * from (select * from t1 where t1.id = 1) t join t2 on t.id = t2.id and t2.id = 1
  2. put these predicates into `otherJoinConjuncts` , these predicates are processed in the next
    round of predicate push-down


Now only support infer `ComparisonPredicate`.

TODO: We should determine whether `expression` satisfies the condition for replacement
             eg: Satisfy `expression` is non-deterministic
2022-11-08 21:36:17 +08:00
b6f91b6eff [improvement](profile) support ordinary user to get query profile via http api (#14016) 2022-11-08 20:39:01 +08:00
ecfdf0320d [fix](statistics) ColumnStatistics was changed unexpectedly when show stats (#14068)
The logic of show stats would change the internal collected ColumnStat unexpectedly which would cause inaccurate cost and inefficient plan
2022-11-08 20:26:37 +08:00
cdc635610b [enhancement](Nereids) tpch q21 anti and semi join reorder (#14037)
estimation of anti and semi join need re-work. we just let tpch q21 pass.
2022-11-08 17:21:50 +08:00
54c07f8782 [regression](Nereids) add back tpch regression test cases (#13826)
1. add back TPC-H regression test cases
2. fix decimal problem on aggregate function sum and agg introduced by #13764 
3. fix memo merge group NPE introduced by #13900
2022-11-08 16:40:46 +08:00
1c07a01038 [feature](multi-catalog) Support data on s3-compatible oss and support aliyun DLF (#13994)
Support Aliyun DLF
Support data on s3-compatible object storage, such as aliyun oss.
Refactor some interface of catalog, to make it more tidy.
Fix bug that the default text format field delimiter of hive should be \x01
Add a new class PooledHiveMetaStoreClient to wrap the IMetaStoreClient.
2022-11-08 14:02:41 +08:00
61d4974ba1 [fix](Nereids) Use simple cost to calculate benefit and avoid unuseless calculation (#14056)
In GraphSimplifier, we can use simple cost to calculate the benefit.
And only when the best neighbor of the apply step is the processing edge, we need to update recursively.
2022-11-08 13:11:38 +08:00
e6b12ce8e8 [feature](Nereids) support query that group by use alias generated in aggregate output (#14030)
support query having alias in group by list, such as:
SELECT c1 AS a, SUM(c2) FROM t GROUP BY a;
2022-11-08 11:02:42 +08:00
b09e5ced97 [fix](priv) fix meta replay bug when upgrading from 1.1.x to 1.2.x (#14046) 2022-11-08 10:43:33 +08:00
6ed443c7e8 [enhancement](profile) add instanceNum, tableIds to profile. (#13985) 2022-11-08 08:49:16 +08:00
17a4746a08 [enhancement](Nereids) support otherJoinConjuncts in cascades join reorder (#13681) 2022-11-08 00:08:44 +08:00
1c2532b9dc [Bug](udf) Make UDF's type always nullable (#14002) 2022-11-07 20:51:31 +08:00
4ea1b39cb2 [enhancement](Nereids) remove unnecessary decimal cast (#13745) 2022-11-07 19:24:10 +08:00
f2978fb6ff [feat](Nereids) add graph simplifier (#14007) 2022-11-07 18:45:45 +08:00
22b4c6af20 [feature](Nereids) support statement having aggregate function in order by list (#13976)
1. add a feature that support statement having aggregate function in order by list. such as:
    SELECT COUNT(*) FROM t GROUP BY c1 ORDER BY COUNT(*) DESC;
2. add clickbench analyze unit tests
2022-11-07 17:01:31 +08:00
bb9182d602 [fix](repeat)remove unmaterialized expr from repeat node (#13953) 2022-11-07 14:13:05 +08:00
3c8524b9d8 [security](fe jar) upgrade commons-codec:commons-codec to 1.13 #13951 2022-11-07 13:50:07 +08:00
27549564a7 [feature](table-valued-function) Support S3 tvf (#13959)
This pr does three things:

1. Modified the framework of table-valued-function(tvf).
2. be support `fetch_table_schema` rpc.
3. Implemented `S3(path, AK, SK, format)` table-valued-function.
2022-11-06 11:04:26 +08:00
fb5a3e118a [feature-wip](dlf) prepare to support aliyun dlf (#13969)
[What is DLF](https://www.alibabacloud.com/product/datalake-formation)

This PR is a preparation for support DLF, with some changes of multi catalog

1. Add RuntimeException for most of hive meta store or es client visit operation.
2. Add DLF related dependencies.
3. Move the checks of es catalog properties to the analysis phase of creating es catalog

TODO(in next PR):

1. Refactor the `getSplit` method to support not only hdfs, but s3-compatible object storage.
2. Finish the implementation of supporting DLF
2022-11-06 10:01:57 +08:00
d01f7c546a [refactor](iceberg-hudi) disable iceberg and hudi table by default (#13932) 2022-11-05 19:22:27 +08:00
wxy
620a137bd7 [enhancement](test) support tablet repair and balance process in ut (#13940) 2022-11-05 19:20:23 +08:00
2ee7ba79a8 [Improvement](javaudf) improve java loader usage (#13962) 2022-11-05 13:20:04 +08:00
06a1efdb01 [fix](Nerieds) fix tpch and support trace plan's change event (#13957)
This pr fix some bugs for run tpc-h
1. fix the avg(decimal) crash the backend. The fix code in `Avg.getFinalType()` and every child class of `ComputeSinature`
2. fix the ReorderJoin dead loop. The fix code in `ReorderJoin.findInnerJoin()`
3. fix the TimestampArithmetic can not bind the functions in the child. The fix code in `BindFunction.FunctionBinder.visitTimestampArithmetic()`

New feature: support trace the plan's change event, you can `set enable_nereids_trace=true` to open trace log and see some log like this:
```
2022-11-03 21:07:38,391 INFO (mysql-nio-pool-0|208) [Job.printTraceLog():128] ========== RewriteBottomUpJob ANALYZE_FILTER_SUBQUERY ==========
before:
LogicalProject ( projects=[S_ACCTBAL#17, S_NAME#13, N_NAME#4, P_PARTKEY#19, P_MFGR#21, S_ADDRESS#14, S_PHONE#16, S_COMMENT#18] )
+--LogicalFilter ( predicates=((((((((P_PARTKEY#19 = PS_PARTKEY#7) AND (S_SUPPKEY#12 = PS_SUPPKEY#8)) AND (P_SIZE#24 = 15)) AND (P_TYPE#23 like '%BRASS')) AND (S_NATIONKEY#15 = N_NATIONKEY#3)) AND (N_REGIONKEY#5 = R_REGIONKEY#0)) AND (R_NAME#1 = 'EUROPE')) AND (PS_SUPPLYCOST#10 =  (SCALARSUBQUERY) (QueryPlan: LogicalAggregate ( phase=LOCAL, outputExpr=[min(PS_SUPPLYCOST#31) AS `min(PS_SUPPLYCOST)`#33], groupByExpr=[] )), (CorrelatedSlots: [P_PARTKEY#19, S_SUPPKEY#12, S_NATIONKEY#15, N_NATIONKEY#3, N_REGIONKEY#5, R_REGIONKEY#0, R_NAME#1]))) )
   +--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] )
      |--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] )
      |  |--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] )
      |  |  |--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] )
      |  |  |  |--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.part, output=[P_PARTKEY#19, P_NAME#20, P_MFGR#21, P_BRAND#22, P_TYPE#23, P_SIZE#24, P_CONTAINER#25, P_RETAILPRICE#26, P_COMMENT#27], candidateIndexIds=[], selectedIndexId=11076, preAgg=ON )
      |  |  |  +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.supplier, output=[S_SUPPKEY#12, S_NAME#13, S_ADDRESS#14, S_NATIONKEY#15, S_PHONE#16, S_ACCTBAL#17, S_COMMENT#18], candidateIndexIds=[], selectedIndexId=11124, preAgg=ON )
      |  |  +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.partsupp, output=[PS_PARTKEY#7, PS_SUPPKEY#8, PS_AVAILQTY#9, PS_SUPPLYCOST#10, PS_COMMENT#11], candidateIndexIds=[], selectedIndexId=11092, preAgg=ON )
      |  +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.nation, output=[N_NATIONKEY#3, N_NAME#4, N_REGIONKEY#5, N_COMMENT#6], candidateIndexIds=[], selectedIndexId=11044, preAgg=ON )
      +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.region, output=[R_REGIONKEY#0, R_NAME#1, R_COMMENT#2], candidateIndexIds=[], selectedIndexId=11108, preAgg=ON )

after:
LogicalProject ( projects=[S_ACCTBAL#17, S_NAME#13, N_NAME#4, P_PARTKEY#19, P_MFGR#21, S_ADDRESS#14, S_PHONE#16, S_COMMENT#18] )
+--LogicalFilter ( predicates=((((((((P_PARTKEY#19 = PS_PARTKEY#7) AND (S_SUPPKEY#12 = PS_SUPPKEY#8)) AND (P_SIZE#24 = 15)) AND (P_TYPE#23 like '%BRASS')) AND (S_NATIONKEY#15 = N_NATIONKEY#3)) AND (N_REGIONKEY#5 = R_REGIONKEY#0)) AND (R_NAME#1 = 'EUROPE')) AND (PS_SUPPLYCOST#10 = min(PS_SUPPLYCOST)#33)) )
   +--LogicalProject ( projects=[P_PARTKEY#19, P_NAME#20, P_MFGR#21, P_BRAND#22, P_TYPE#23, P_SIZE#24, P_CONTAINER#25, P_RETAILPRICE#26, P_COMMENT#27, S_SUPPKEY#12, S_NAME#13, S_ADDRESS#14, S_NATIONKEY#15, S_PHONE#16, S_ACCTBAL#17, S_COMMENT#18, PS_PARTKEY#7, PS_SUPPKEY#8, PS_AVAILQTY#9, PS_SUPPLYCOST#10, PS_COMMENT#11, N_NATIONKEY#3, N_NAME#4, N_REGIONKEY#5, N_COMMENT#6, R_REGIONKEY#0, R_NAME#1, R_COMMENT#2, min(PS_SUPPLYCOST)#33] )
      +--LogicalApply ( correlationSlot=[P_PARTKEY#19, S_SUPPKEY#12, S_NATIONKEY#15, N_NATIONKEY#3, N_REGIONKEY#5, R_REGIONKEY#0, R_NAME#1], correlationFilter=Optional.empty )
         |--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] )
         |  |--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] )
         |  |  |--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] )
         |  |  |  |--LogicalJoin ( type=CROSS_JOIN, hashJoinCondition=[], otherJoinCondition=[] )
         |  |  |  |  |--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.part, output=[P_PARTKEY#19, P_NAME#20, P_MFGR#21, P_BRAND#22, P_TYPE#23, P_SIZE#24, P_CONTAINER#25, P_RETAILPRICE#26, P_COMMENT#27], candidateIndexIds=[], selectedIndexId=11076, preAgg=ON )
         |  |  |  |  +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.supplier, output=[S_SUPPKEY#12, S_NAME#13, S_ADDRESS#14, S_NATIONKEY#15, S_PHONE#16, S_ACCTBAL#17, S_COMMENT#18], candidateIndexIds=[], selectedIndexId=11124, preAgg=ON )
         |  |  |  +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.partsupp, output=[PS_PARTKEY#7, PS_SUPPKEY#8, PS_AVAILQTY#9, PS_SUPPLYCOST#10, PS_COMMENT#11], candidateIndexIds=[], selectedIndexId=11092, preAgg=ON )
         |  |  +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.nation, output=[N_NATIONKEY#3, N_NAME#4, N_REGIONKEY#5, N_COMMENT#6], candidateIndexIds=[], selectedIndexId=11044, preAgg=ON )
         |  +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.region, output=[R_REGIONKEY#0, R_NAME#1, R_COMMENT#2], candidateIndexIds=[], selectedIndexId=11108, preAgg=ON )
         +--LogicalAggregate ( phase=LOCAL, outputExpr=[min(PS_SUPPLYCOST#31) AS `min(PS_SUPPLYCOST)`#33], groupByExpr=[] )
            +--LogicalFilter ( predicates=(((((P_PARTKEY#19 = PS_PARTKEY#28) AND (S_SUPPKEY#12 = PS_SUPPKEY#29)) AND (S_NATIONKEY#15 = N_NATIONKEY#3)) AND (N_REGIONKEY#5 = R_REGIONKEY#0)) AND (CAST(R_NAME AS STRING) = CAST(EUROPE AS STRING))) )
               +--LogicalOlapScan ( qualified=default_cluster:regression_test_tpch_sf1_p1_tpch_sf1.partsupp, output=[PS_PARTKEY#28, PS_SUPPKEY#29, PS_AVAILQTY#30, PS_SUPPLYCOST#31, PS_COMMENT#32], candidateIndexIds=[], selectedIndexId=11092, preAgg=ON )

```
2022-11-04 15:01:06 +08:00
dc01fb4085 [enhancement](Nereids) remove unnecessary string cast (#13730)
convert string like literal to the cast type instead of run cast in runtime
2022-11-04 11:18:22 +08:00
9bf20a7b5d [enhancement](Nereids) remove unnecessary int cast (#13881) 2022-11-04 11:07:59 +08:00
efb2596c7a [enhancment](Nereids) enable push down filter through aggregation (#13938) 2022-11-04 11:04:00 +08:00
f2d84d81e6 [feature-wip][refactor](multi-catalog) Persist external catalog related metadata. (#13746)
Persist external catalog/db/table, including the columns of external tables.
After this change, external objects could have their own uniq ID through their lifetime,
this is required for the statistic information collection.
2022-11-04 09:04:00 +08:00
698541e58d [improvement](exec) add more debug info on fragment exec error (#13899) 2022-11-04 08:55:31 +08:00
5d56fe6d32 [fix](meta)(recover) fix recover info persist bug (#13948)
introduced from #13830
2022-11-04 07:40:21 +08:00
0a228a68d6 [Improvement](javaudf) support different date argument for date/datetime type (#13920) 2022-11-03 20:33:20 +08:00
8043418db4 [optimization](array-type) update the exception message when create table with array column (#13731)
This pr is used to update the exception message when create table with array column.
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-11-03 17:12:17 +08:00
c1438cbad6 [revert](Nereids): revert GroupExpression Children ImmutableList. (#13918) 2022-11-03 16:29:54 +08:00
b1816d49e7 [fix](typo) check catalog enable exception message spelling mistake (#13925) 2022-11-03 14:44:37 +08:00
29e01db7ce [Fix](Nereids) add comments to CostAndEnforcerJob and fix view test case (#13046)
1. add comments to cost and enforcer job as some code is too hard to understand
2. fix nereids_syntax_p0/view.groovy's multi-answer bug.
2022-11-03 12:12:24 +08:00
bfba058ecf [Feature](join) Support null aware left anti join (#13871) 2022-11-03 12:11:25 +08:00
57ee5c4a65 [feature](nereids) Support authentication (#13434)
Add a rule to check the permission of a user who are executing a query. Forbid users who don't have SELECT_PRIV on some tables from executing queries on these tables.
2022-11-03 11:58:14 +08:00
31d8fdd9e4 [fix](Nereids) finalize local aggregate should not turn on stream pre agg (#13922) 2022-11-03 11:08:06 +08:00
a4a991207b [fix](agg)fix group by constant value bug (#13827)
* [fix](agg)fix group by constant value bug

* keep only one const grouping exprs if no agg exprs
2022-11-03 10:26:59 +08:00
b3c6af0059 [Bugfix](MV) Fixed load negative values into bitmap type materialized views successfully under non-vectorization (#13719)
* [Bugfix](MV) Fixed load negative values into bitmap type materialized views successfully under non-vectorization
2022-11-03 09:21:38 +08:00
7b4c2cabb4 [feature](new-scan) support transactional insert in new scan framework (#13858)
Support running transactional insert operation with new scan framework. eg:

admin set frontend config("enable_new_load_scan_node" = "true");
begin;
insert into tbl1 values(1,2);
insert into tbl1 values(3,4);
insert into tbl1 values(5,6);
commit;
Add some limitation to transactional insert

Do not support non-literal value in insert stmt
Fix some issue about array type:

Forbid cast other non-array type to NESTED array type, it may cause BE crash.
Add getStringValueForArray() method for Expr, to get valid string-formatted array type value.
Add useLocalSessionState=true in regression-test jdbc url
without this config, the jdbc driver will send some init cmd each time it connect to server, such as
select @@session.tx_read_only.
But when we use transactional insert, after begin command, Doris do not support any other type of
stmt except for insert, commit or rollback.
So adding this config to let the jdbc NOT send cmd when connecting.
2022-11-03 08:36:07 +08:00
Fy
e021705053 [feature](nereids) support common table expression (#12742)
Support common table expression(CTE) in Nereids:
- Just implemented inline CTE, which means we will copy the logicalPlan of CTE everywhere it is referenced;
- If the name of CTE is the same as an existing table or view, we will choose CTE first;
2022-11-02 23:41:53 +08:00
0ea7f85986 [fix](keyword) add BIN as keyword (#13907) 2022-11-02 22:30:43 +08:00
53814e466b [Enhancement](Nereids)optimize merge group in memo #13900 2022-11-02 20:42:55 +08:00
374303186c [Vectorized](function) support topn_array function (#13869) 2022-11-02 19:49:23 +08:00
b26d8f284c [fix](rpc) The proxy removed when rpc exception occurs is not an abnormal proxy (#13836)
`BackendServiceProxy.getInstance()` uses the round robin strategy to obtain the proxy,
so when the current RPC request is abnormal, the proxy removed by 
`BackendServiceProxy.getInstance().removeProxy(...)` is not an abnormal proxy.
2022-11-02 19:39:33 +08:00
6eea855e78 [feature](Nereids) Support lots of scalar function and fix some bug (#13764)
Proposed changes
1. function interfaces that can search the matched signature, say ComputeSignature. It's equal to the Function.CompareMode.
   - IdenticalSignature: equal to Function.CompareMode.IS_IDENTICAL
   - NullOrIdenticalSignature: equal to Function.CompareMode.IS_INDISTINGUISHABLE
   - ImplicitlyCastableSignature: equal to Function.CompareMode.IS_SUPERTYPE_OF
   - ExplicitlyCastableSignature: equal to Function.CompareMode.IS_NONSTRICT_SUPERTYPE_OF
3. generate lots of scalar functions
4. bug-fix: disassemble avg function compute wrong result because the wrong input type, the AggregateParam.inputTypesBeforeDissemble is use to save the origin input type and pass to backend to find the correct global aggregate function.
5. bug-fix: subquery with OneRowRelation will crash because wrong nullable property


Note:
1. currently no more unit test/regression test for the scalar functions, I will add the test until migrate aggregate functions for unified processing.
2. A known problem is can not invoke the variable length function, I will fix it later.
2022-11-02 18:01:08 +08:00
a871fef815 [Improve](Nereids): refactor eliminate outer join (#13402)
Refactor eliminate outer join #12985

Evaluate the expression with ConstantFoldRule. If the evaluation result is NULL or FALSE, then the elimination condition is satisfied.
2022-11-02 17:39:05 +08:00
1bafb26217 [fix](Nereids) throw NPE when call getOutputExprIds in LogicalProperties (#13898) 2022-11-02 16:52:18 +08:00