Commit Graph

1079 Commits

Author SHA1 Message Date
a2b9b9edd7 [fix](planner) fix bug in agg on constant column (#16442)
For performance reasons, we want to remove constant columns from groupingExprs.
For example:
                `select sum(T.A) from T group by T.B, 'xyz'` is equivalent to `select sum(T.A) from T group by T.B`
We can remove the constant column `'xyz'` from groupingExprs.

But there is an exception when all groupingExprs are constant.
For example:

                sql1: `select 'abc' from t group by 'abc'`
                 is not equivalent to
                sql2: `select 'abc' from t`

                sql3: `select 'abc', sum(a) from t group by 'abc'`
                 is not equivalent to
                sql4: `select 1, sum(a) from t`
                (when t is empty, sql3 returns 0 tuples, sql4 returns 1 tuple)

We need to keep some constant columns if all groupingExprs are constant.
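
The empty-table case can be checked outside Doris. The following is a minimal sketch that uses SQLite (through Python's standard `sqlite3` module) as a stand-in for standard SQL semantics; the table schema and the helper name `row_counts_on_empty_table` are assumptions made for illustration, not part of this commit.

```python
import sqlite3


def row_counts_on_empty_table():
    """Run sql3 and sql4 against an empty table and return their row counts."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: a single integer column; the table is left empty on purpose.
    conn.execute("CREATE TABLE t (a INTEGER)")
    sql3 = "SELECT 'abc', sum(a) FROM t GROUP BY 'abc'"
    sql4 = "SELECT 1, sum(a) FROM t"
    counts = (
        len(conn.execute(sql3).fetchall()),  # no groups exist, so no rows
        len(conn.execute(sql4).fetchall()),  # scalar aggregate always yields one row
    )
    conn.close()
    return counts


if __name__ == "__main__":
    print(row_counts_on_empty_table())
```

With a GROUP BY clause an empty input produces no groups, while an aggregate without GROUP BY always produces exactly one row, which is why dropping every grouping expression changes the result.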

Consider sql5 `select a from (select "abc" as a, 'def' as b) T group by b, a;`
If the constant column `a` is in the select list, it should not be removed.
sql5 is transformed to
sql6 `select a from (select "abc" as a, 'def' as b) T group by a;`
2023-02-13 11:26:08 +08:00
46dd887ae2 [fix](nereids) make slot binding compatible to original planner (#16612)
SELECT a,2 as a FROM (SELECT '1' as a) b HAVING a=1

In the original planner, binding the HAVING clause fails for this query. Make Nereids fail too.
2023-02-13 11:14:17 +08:00
f41a2055d3 [feature](Load)Remove user/password in properties for mysql load to avoid double auth. (#16073)
Use the FE cluster token to authenticate stream load.
This token auth is only enabled on BE; FE still supports only HTTP basic auth.

MySQL load will use this auth to issue a stream load from FE to BE without user credentials,
which avoids authenticating twice in MySQL load.
See the design doc for more information.
2023-02-13 10:00:08 +08:00
cf739e7496 [Enhancement](Stmt) Set insert_into timeout session variable separately (#16343) 2023-02-12 16:56:10 +08:00
3c3110b253 [Fix](Jdbc Catalog) jdbc catalog support to connect to doris database (#16527)
Doris can use mysql-jdbc-jar to connect to a Doris database, but Doris has some data types that MySQL lacks,
such as DecimalV3 and Date/DatetimeV2.
This change adds case handling in `Mysql Catalog` so that the JDBC catalog can identify Doris data types.
2023-02-10 20:24:40 +08:00
43eca4f209 [Feature-WIP](inverted index) Implementation for alter inverted index. (#16371)
implementation for add/drop inverted index.
2023-02-10 17:56:17 +08:00
6a5277b391 [fix](sequence-column) MergeIterator does not use the correct seq column for comparison (#16494) 2023-02-10 17:51:15 +08:00
b99e2dc727 [bug](jdbc) fix jdbc can't get object of PGobject (#16496)
When a PG table has unsupported column types such as point, polygon, or jsonb,
the JDBC catalog converts them to string type in Doris, but the object read from the result set in Java is an org.postgresql.util.PGobject.

Some tests need this PR: #16442
2023-02-10 16:19:02 +08:00
8758cd412f [feature](auth)Implementing privilege management with rbac model (#16091)
Change the auth implementation to an RBAC model.

Each user has one default role, which cannot be dropped.

Privileges granted to a user are granted to that user's default role.

In the current PR, a user can still have only one role other than the default role; in the future, users and roles will be many-to-many.

Rename PaloRole, PaloAuth, PaloPrivilege to Role, Auth, Privilege.
2023-02-10 12:30:49 +08:00
379bef598d [fix-core](block) clear block row_same_bit when block reuse (#16172) 2023-02-10 12:21:27 +08:00
1b3902baa2 [Feature](Complex-type) Add struct and map type to Doris (#16444)
This commit supports:
1. Insert + select for struct/map types
2. JSON stream load for struct type
3. m[key] function for map type

How to use:
Set the following FE configs to allow creating tables with struct and map types:
1. admin set frontend config("enable_struct_type" = "true");
2. admin set frontend config("enable_map_type" = "true");

#16547

Co-authored-by: xy720 <xuyang25@baidu.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2023-02-10 11:00:33 +08:00
0c20c607b2 fix stats (#16556) 2023-02-10 11:00:01 +08:00
e68299113e [fix](regression test) fix test_array_index.groovy without 'order by' lead to result mismatch (#16575) 2023-02-10 08:53:22 +08:00
438daaaf1c [enchancement](mv) forbidden create useless mv in fe (#16286)
Forbid creating useless materialized views in FE.
2023-02-09 23:00:09 +08:00
05ed1f751b [fix](planner)(Nereids) add date and datev2 signature to greatest and least function (#16565) 2023-02-09 21:36:53 +08:00
851a3575ae [fix](regression case) exclude test_broker_load suite, reopen after bug fix (#16554)
There is something wrong with the `test_broker_load` suite(s3 auth problem).
So I ignore this case temporarily.
cc @wsjz , please help to solve it and add it back
2023-02-09 15:51:32 +08:00
e48a033338 [Bug](pipeline) Support projection in UnionSourceOperator (#16525) 2023-02-09 14:43:44 +08:00
e1f1386395 [fix](cooldown) Rewrite update cooldown conf (#16488)
Remove the error-prone CooldownJob and use CooldownConfHandler to update a tablet's cooldown conf.
Also fix some cooldown bugs.
2023-02-09 09:12:55 +08:00
f0b0eedbc5 [fix](planner)group_concat lost order by info in second phase merge agg (#16479) 2023-02-08 20:48:52 +08:00
a512469537 [fix](planner) cannot process more than one subquery in disjunct (#16506)
Before this PR, Doris could not process SQL like the following:
```sql
CREATE TABLE `test_sq_dj1` (
    `c1` int(11) NULL,
    `c2` int(11) NULL,
    `c3` int(11) NULL
) ENGINE=OLAP
DUPLICATE KEY(`c1`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`c1`) BUCKETS 3
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "in_memory" = "false",
    "storage_format" = "V2",
    "disable_auto_compaction" = "false"
);

CREATE TABLE `test_sq_dj2` (
    `c1` int(11) NULL,
    `c2` int(11) NULL,
    `c3` int(11) NULL
) ENGINE=OLAP
DUPLICATE KEY(`c1`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`c1`) BUCKETS 3
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "in_memory" = "false",
    "storage_format" = "V2",
    "disable_auto_compaction" = "false"
);

insert into test_sq_dj1 values(1, 2, 3), (10, 20, 30), (100, 200, 300);
insert into test_sq_dj2 values(10, 20, 30);

-- caused a core dump before this PR
SELECT * FROM test_sq_dj1 WHERE c1 IN (SELECT c1 FROM test_sq_dj2) OR c1 IN (SELECT c1 FROM test_sq_dj2) OR c1 < 10;

-- failed with an invalid slot error before this PR
SELECT * FROM test_sq_dj1 WHERE c1 IN (SELECT c1 FROM test_sq_dj2) OR c1 IN (SELECT c2 FROM test_sq_dj2) OR c1 < 10;
```

There are two problems:
1. We should remove redundant sub-queries within one conjunct to avoid generating useless join nodes.
2. When there is more than one sub-query in one disjunct, we should put the conjunct that contains the disjunct at the top node of the set of mark join nodes, and pop the mark slot up to the top node.
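
As a reference for what the two queries above should return, here is a minimal sketch that runs them on SQLite through Python's `sqlite3` module. SQLite is only a stand-in for standard SQL semantics: the Doris-specific DDL clauses (`ENGINE`, `DUPLICATE KEY`, `DISTRIBUTED BY`, `PROPERTIES`) are dropped, and the helper name `run_disjunct_queries` is hypothetical.

```python
import sqlite3


def run_disjunct_queries():
    """Return the results of both disjunct queries, ordered by c1."""
    conn = sqlite3.connect(":memory:")
    # Simplified schema: the same three nullable int columns as in the commit.
    for name in ("test_sq_dj1", "test_sq_dj2"):
        conn.execute(f"CREATE TABLE {name} (c1 INTEGER, c2 INTEGER, c3 INTEGER)")
    conn.execute(
        "INSERT INTO test_sq_dj1 VALUES (1, 2, 3), (10, 20, 30), (100, 200, 300)"
    )
    conn.execute("INSERT INTO test_sq_dj2 VALUES (10, 20, 30)")

    # Two identical sub-queries in one disjunct (the redundant case).
    same_subquery = conn.execute(
        "SELECT * FROM test_sq_dj1 "
        "WHERE c1 IN (SELECT c1 FROM test_sq_dj2) "
        "OR c1 IN (SELECT c1 FROM test_sq_dj2) "
        "OR c1 < 10 ORDER BY c1"
    ).fetchall()

    # Two different sub-queries in one disjunct.
    different_subquery = conn.execute(
        "SELECT * FROM test_sq_dj1 "
        "WHERE c1 IN (SELECT c1 FROM test_sq_dj2) "
        "OR c1 IN (SELECT c2 FROM test_sq_dj2) "
        "OR c1 < 10 ORDER BY c1"
    ).fetchall()
    conn.close()
    return same_subquery, different_subquery


if __name__ == "__main__":
    for rows in run_disjunct_queries():
        print(rows)
```

Both queries select the row with `c1 = 1` (through `c1 < 10`) and the row with `c1 = 10` (through the first sub-query); the `ORDER BY` is added here only to make the output deterministic.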
2023-02-08 18:46:06 +08:00
f71fc3291f [Bug](fix) right anti join error result when batch size is low (#16510) 2023-02-08 17:26:19 +08:00
f6a20f844b [fix](hashjoin) join produce blocks with rows larger than batch size: handle join with other conjuncts (#16402) 2023-02-08 14:26:35 +08:00
41947c73eb [Feature](array-function) Support array functions for nested type datev2 and datetimev2 (#16382) 2023-02-08 12:51:07 +08:00
81dbed70c2 [fix](Nereids) back off on tpch p1 (#16478)
Adjusting nullable on an empty set should be applied after sub-queries are unnested.
Some functions should propagate nullable when their arguments are datev2 or datetimev2.
Add back the tpch sf0.1 Nereids regression test.
2023-02-08 10:43:13 +08:00
289a4b2ea4 [fix](func) fix truncate float type result error (#16468)
When the argument of the truncate function is of float type, it can match both truncate(DECIMALV3) and truncate(DOUBLE). If it matches truncate(DECIMALV3), precision is lost when the float is converted to DECIMALV3(38, 0).

For now, this change makes it match truncate(DOUBLE); the precision loss when converting float to DECIMALV3 may still need to be solved separately.
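
The effect can be illustrated with a small Python sketch. It only mimics the two code paths with the standard `math` and `decimal` modules; the helper names and the rounding mode used for the scale-0 conversion are assumptions, not Doris's actual implementation.

```python
import math
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def truncate_as_double(x: float, d: int) -> float:
    """Truncate x to d decimal places while staying in floating point."""
    scale = 10 ** d
    return math.trunc(x * scale) / scale


def truncate_via_scale0_decimal(x: float, d: int) -> Decimal:
    """Mimic converting x to a decimal with scale 0 before truncating.

    The fractional digits are dropped by the conversion itself, so the
    later truncation has nothing left to keep. The rounding mode of the
    conversion is an assumption for this sketch.
    """
    as_scale0 = Decimal(x).quantize(Decimal(1), rounding=ROUND_HALF_UP)
    return as_scale0.quantize(Decimal(10) ** -d, rounding=ROUND_DOWN)


if __name__ == "__main__":
    print(truncate_as_double(3.14159, 2))
    print(truncate_via_scale0_decimal(3.14159, 2))
```

In the scale-0 path the value is already an integer before truncation, which is the precision loss described above.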
2023-02-08 08:57:43 +08:00
a4c28e6efa [Fix](Nereids) runtime filter cannot generate when expression is cast. (#16120) 2023-02-07 20:28:07 +08:00
1d0fdff98a [Bug](sort) disable 2phase read for sort by expressions exclude slotref (#16460)
```
create table tbl1 (k1 varchar(100), k2 string) distributed by hash(k1) buckets 1 properties("replication_num" = "1");

insert into tbl1 values(1, "alice");

select cast(k1 as INT) as id from tbl1 order by id limit 2;
```

The above query could pass `checkEnableTwoPhaseRead` because the order by element is a SlotRef, but it actually refers to a function call expr.
2023-02-07 19:42:54 +08:00
91229bb87d [Bug](makr join) Fix mark join with other conjuncts (#16435) 2023-02-07 09:31:41 +08:00
a13beca0de [Fix](load)Use lower case for load column names. #16422
Column names in stream load and broker load were case sensitive; make them case insensitive. This is consistent with queries, because column names in query SQL are case insensitive.
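
A minimal sketch of the idea, assuming the load path simply lowercases names before matching them against the table schema. The function `normalize_load_columns` is hypothetical and not the actual FE code.

```python
def normalize_load_columns(load_columns, table_columns):
    """Resolve load column names against table columns, ignoring case."""
    # Index the table's columns by their lowercase form.
    by_lower = {name.lower(): name for name in table_columns}
    resolved = []
    for col in load_columns:
        key = col.lower()
        if key not in by_lower:
            raise KeyError(f"unknown column: {col}")
        resolved.append(by_lower[key])
    return resolved


if __name__ == "__main__":
    print(normalize_load_columns(["K1", "City"], ["k1", "city"]))
```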
2023-02-07 09:18:37 +08:00
36a5e0a2a9 [bugfix](array) fix element revert on error in DataTypeArray::from_string (#16434)
* fix array from_string element revert on error

* add testcase
2023-02-06 18:27:36 +08:00
c1400a34eb [regression](fix) 1. fix broker load test case and add orc test 2. se… (#16373)
* [regression](fix) 1. fix broker load test case and add orc test 2. set enableBrokerLoad=true in pipeline

* add a load test for the orc file and let it run in the TeamCity pipeline.

--This pr may not pass P0 Regression check since the bug of orc load has not been fixed.--
change the column name in the load sql to lowercase to pass P0 Regression check.
corrected: it's not a bug but a feature.
2023-02-06 16:10:25 +08:00
118e769dac [fix](regression-test) fix external table p2 case bug (#16405)
Remove the hive table test case; hive tables are no longer supported on master.
Fix unstable sort results.
2023-02-06 15:55:05 +08:00
dccd04a3ba [fix](fe)predicate is wrongly pushed through CUBE function (#15831) 2023-02-06 11:29:15 +08:00
f2fd47f238 [Improve](row-store) support row cache (#16263) 2023-02-06 11:16:39 +08:00
f940cf4cf6 [fix](multi-catalog) fix recursive get schema cache bug (#16415) 2023-02-06 09:23:07 +08:00
09870098af [fix](func) fix core dump when the pattern of the regexp_extract_all function does not contain subpatterns (#16408) 2023-02-05 01:16:54 +08:00
df3a6e2412 [fix](fe)only set column info for slots in sortTupleDesc (#16407) 2023-02-04 23:14:25 +08:00
ca7b2e27a8 [regression-test](function) add regression test for money_format with truncate (#16052) 2023-02-04 23:10:01 +08:00
dd63897757 [fix](be)the set operation node should accept both nullable and non-nullable data from child node (#16126) 2023-02-04 23:08:59 +08:00
1146bde695 [feature-wip](MTMV) Support refresh mtmv (#16218)
Support refreshing an mtmv manually with the following SQL. It generates an mtmv task immediately.

```
REFRESH MATERIALIZED VIEW test_mv_view [complete];
```

You can use `show mtmv task` to show the latest task.

In this PR, I also clear the mtmv tasks when the mtmv is dropped, so that the test suite behaves correctly.
2023-02-04 20:17:45 +08:00
918004c016 [Bug](date) Fix BE crash caused by function datediff (#16397)
* [Bug](date) Fix BE crash caused by function `datediff`

* update
2023-02-04 18:43:23 +08:00
f443ebfd9a [Improvement](statistics) optimise histogram keyword (#16369) 2023-02-03 23:02:41 +08:00
125b60b4b9 [improvement](compatibility) add DATA_TYPE in information schema for new types #16391
Add DATA_TYPE in information schema for the types datev2, datetimev2, decimal, and jsonb. It was 'unknown' for these types, which caused problems for tools, such as BI tools, that use information schema.
2023-02-03 22:28:42 +08:00
4f778c38a1 [feature](nereids) support explore 4 phase aggregation (#16298)
Support 4-phase aggregation.
Example:
`select count(distinct k1), sum(k2) from t`
Suppose t.k0 is the distribution key.

We get the plan:
```
Agg(DISTINCT_GLOBAL)
  |
Exchange(Gather)
  |
Agg(DISTINCT_LOCAL)
  |
Agg(GLOBAL)
  |
Exchange(hash distribute by k1)
  |
Agg(LOCAL)
  |
scan
```

Limitations:
1. Only SQL with a single distinct aggregate is supported.
Not supported: `select count(distinct k1), count(distinct k2) from t`
2. The distinct must be on a single column.
Not supported: `select count(distinct k1, k2) from t`
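
The plan above can be simulated in plain Python to see why four phases give the right answer. This is an illustrative sketch only: each "instance" is a list of `(k1, k2)` rows, all function names are hypothetical, and NULL handling is ignored.

```python
from collections import defaultdict


def local_agg(rows):
    """Agg(LOCAL): per instance, group by k1 and pre-aggregate sum(k2)."""
    partial = defaultdict(int)
    for k1, k2 in rows:
        partial[k1] += k2
    return dict(partial)


def hash_exchange(partials, n):
    """Exchange(hash distribute by k1): send each k1 to exactly one instance."""
    buckets = [[] for _ in range(n)]
    for partial in partials:
        for k1, k2_sum in partial.items():
            buckets[hash(k1) % n].append((k1, k2_sum))
    return buckets


def global_agg(bucket):
    """Agg(GLOBAL): merge partial sums; afterwards every k1 is unique overall."""
    return local_agg(bucket)


def distinct_local_agg(groups):
    """Agg(DISTINCT_LOCAL): count the k1 groups and sum k2 within one instance."""
    return len(groups), sum(groups.values())


def distinct_global_agg(partials):
    """Agg(DISTINCT_GLOBAL): after Exchange(Gather), add up per-instance results."""
    return sum(c for c, _ in partials), sum(s for _, s in partials)


def four_phase(instances, n=2):
    """Return (count(distinct k1), sum(k2)) computed through the four phases."""
    phase1 = [local_agg(rows) for rows in instances]
    phase2 = [global_agg(bucket) for bucket in hash_exchange(phase1, n)]
    phase3 = [distinct_local_agg(groups) for groups in phase2]
    return distinct_global_agg(phase3)


if __name__ == "__main__":
    data = [[(1, 10), (2, 20), (1, 5)], [(2, 1), (3, 7)]]
    print(four_phase(data))
```

Because the hash exchange places every `k1` value on exactly one instance, counting groups per instance and adding the counts never counts a value twice.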
2023-02-03 21:51:10 +08:00
5e232a30d8 [fix](planner) Doris returns empty sets when select from a inline view (#16370)
Doris always delays the evaluation of expressions as long as it can, including the expansion of constant expressions. Given the SQL below:

```sql
select i from (select 'abc' as i, sum(birth) as j from  subquerytest2) as tmp
```

The aggregation would be eliminated, since its output is not required by the outer block. However, the expansion of the constant expression would be done in the final result expr, and since the aggregate output has been eliminated, the expansion would actually do nothing, finally causing an empty result.

To fix this, we materialize the result exprs in the inner block for such SQL. It may affect performance, but that is better than letting the system produce a wrong result.
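
For reference, the expected result of the query can be checked with SQLite through Python's `sqlite3` module. This is a sketch under assumptions: the schema of `subquerytest2` (a single integer column `birth`) and the sample rows are made up for illustration, and SQLite stands in for standard SQL semantics rather than Doris.

```python
import sqlite3


def select_constant_from_inline_view():
    """Run the inline-view query and return its rows."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema and data; only the column `birth` is named in the commit.
    conn.execute("CREATE TABLE subquerytest2 (birth INTEGER)")
    conn.execute("INSERT INTO subquerytest2 VALUES (1990), (2001)")
    rows = conn.execute(
        "SELECT i FROM "
        "(SELECT 'abc' AS i, sum(birth) AS j FROM subquerytest2) AS tmp"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(select_constant_from_inline_view())
```

The inner block is an aggregate without GROUP BY, so it yields exactly one row, and the outer query should return that row's constant rather than an empty set.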
2023-02-03 21:23:52 +08:00
a5d9aca7ba [test](Nereids) enable G-K and L-Q scalar function regression test cases (#16169)
1. delete invalid signature of nvl function 
2. fix some test cases that failed because of malformed function name
2023-02-03 21:18:43 +08:00
87fbb8341a [Bug](datev2) Fix bug when cast datev2 to date (#16394) 2023-02-03 20:50:16 +08:00
929b31bd3c [Feature](Nereids) Support CaseWhen with subquery (#16385)
Co-authored-by: jianghaochen <jianghaochen@meituan.com>
2023-02-03 18:20:47 +08:00
3f4ca3da32 [Bug](CURRENT_TIMESTAMP) Fix wrong default value after schema change (#16364)
* [Bug](CURRENT_TIMESTAMP) Fix wrong default value after schema change

* update

* update
2023-02-03 17:06:24 +08:00
6294b29f0a [chore](regression-test) Remove array config in regression test (#16376)
The fe config "enable_array_type" is not used, this commit removes it from regression test.
2023-02-03 14:44:03 +08:00