Commit Graph

5755 Commits

Author SHA1 Message Date
cfb110c905 [fix](nereids) fix some nereids bugs (#15714)
1. remove forcing nullable for slot on EmptySetNode.
2. order by xxx desc should use nulls last as default order.
3. don't create runtime filter if runtime filter mode is OFF.
4. group by a constant value needs to check that the corresponding expr doesn't have any aggregation functions.
5. fix two left outer join reorder bug (A left join B left join C).
6. fix semi join and left outer join reorder bug (A left join B semi join C).
7. fix group by NULL bug.
8. change ceil and floor function to correct signature.
9. add literal comparison for string and date type.
10. fix the bug that the getOnClauseUsedSlots method may not return a valid value.
11. the tightest common type of string and date should be date.
12. the nullability of set operation node's result exprs is not set correctly.
13. Sort node should remove redundant ordering exprs.
2023-01-11 17:18:44 +08:00
d4e4e18b47 [fix](DOE): Fix query _id error and es properties error (#15792)
Fix the `_id` query error.
`_id` has no entry in the ES mapping, but BE can still query it, so the mapping-existence check must be skipped for it.
2023-01-11 17:00:59 +08:00
18a3b75626 [fix](QueryDetail) fix QueryDetail may be incorrect and null pointer exception (#15765)
* [fix](QueryDetail) fix QueryDetail may be incorrect and null pointer exception
2023-01-11 16:38:55 +08:00
4424874237 [fix](Nereids): move parentExpression in moveOwnership() (#15786) 2023-01-11 15:47:37 +08:00
006b3bd61a [fix](nereids) orthogonal_bitmap_intersect's return type should be bitmap (#15784) 2023-01-11 12:53:37 +08:00
7f2c433e08 [feature](Nereids) add relation id to unboundTVFRelation to avoid incorrect group expression comparison (#15740) 2023-01-11 12:49:14 +08:00
Pxl
2587095811 [Bug](mv) fix mv selector check group expr && forbid create dup mv with bitmap/hll && add some case (#15738) 2023-01-11 11:38:56 +08:00
3c8c31a5f8 [chore](Session) remove unused codes for enable_lateral_view
The session variable `enable_lateral_view` was removed a long time ago.
This change only removes the leftover variable name `enable_lateral_view`.
2023-01-11 11:24:28 +08:00
89c21af87d [chore](fe) update fe snapshot to 1.2 and fix auditloader compile error (#15787)
PR #14925 changed some fields of AuditEvent, so the fe-core SNAPSHOT needs to be upgraded to 1.2,
because auditloader depends on fe-core.

The 1.2-SNAPSHOT has already been pushed to
https://repository.apache.org/content/repositories/snapshots/org/apache/doris/fe-core/1.2-SNAPSHOT/
2023-01-11 08:46:48 +08:00
8f31a36429 [feature] support spill to disk for sort node (#15624) 2023-01-11 08:40:58 +08:00
bc34a44f06 [Fix](Nereids) fix type coercion for binary arithmetic (#15185)
Support SQL like `select true + 1 + '2.0'` and reject `select true + 1 + 'x'`.
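The coercion rule can be pictured with a small Python sketch. This is not the Nereids implementation: `to_number` and `add_all` are hypothetical helpers, and using a float as the common type is an assumption made only for illustration.

```python
def to_number(value):
    """Coerce a literal to a float, or raise if it cannot be read as a number."""
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value)          # '2.0' parses, 'x' does not
        except ValueError:
            raise TypeError(f"cannot cast string {value!r} to a number") from None
    raise TypeError(f"unsupported operand type: {type(value).__name__}")


def add_all(*operands):
    """Add operands left to right after coercing each one to a number."""
    total = 0.0
    for operand in operands:
        total += to_number(operand)
    return total


print(add_all(True, 1, "2.0"))   # accepted, like select true + 1 + '2.0'
try:
    add_all(True, 1, "x")        # rejected, like select true + 1 + 'x'
except TypeError as err:
    print("rejected:", err)
```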
2023-01-11 02:55:44 +08:00
c87a9a5949 [fix](Nereids) Add varchar literal compare (#15672)
support "1" = "123"
2023-01-11 02:41:50 +08:00
280603b253 [fix](nereids) bind sort key priority problem (#15646)
`a.b.c` should only bind on `a.b.c`, not on `b.c` or `c`
2023-01-11 02:03:09 +08:00
ab2e0fd397 [fix](tvf) cancel strict restrictions on tvf parameters (#15764)
Cancel strict restrictions on tvf parameters.
2023-01-10 22:40:19 +08:00
79b24cdb1f [fix](JdbcResource) fix that JdbcResource does not support the jdbcurl of Oracle and SQLServer (#15757)
Actually, `JdbcResource` should support `Oracle` jdbcurl and `SQLServer` jdbcurl for jdbc external table.
2023-01-10 22:38:30 +08:00
90a92f0643 [feature-wip](multi-catalog) add iceberg tvf to read snapshots (#15618)
Support the new table-valued function `iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")`.
The SQL `select * from iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")` returns the snapshot info of a table. Other Iceberg metadata will be supported later when needed.

One usage:

Before running a time travel query such as:
`select * from ice_table FOR TIME AS OF "2022-10-10 11:11:11"`;
`select * from ice_table FOR VERSION AS OF "snapshot_id"`;
we can use the snapshots metadata to get the `committed time` or `snapshot_id`,
and then use it as the time or version in the time travel clause.
2023-01-10 22:37:35 +08:00
542542a4b2 [fix](nereids) fix bug in estimation of min/max of Year (#15712)
1. fix bug in estimation of min/max of Year
2. remove Utils.getLocalDatetimeFromLong(Long). This method throws an exception if the input parameter is too big, and it is no longer used once the above bug is fixed.
2023-01-10 21:29:16 +08:00
fec89ad58c [fix](nereids) week should be able to recognized as function name in function call context (#15735) 2023-01-10 19:54:59 +08:00
7767931aca [ehancement](nereids) let parser support utf8 identifier (#15721)
After this PR, the SQL below can also be parsed:
- SELECT k1 AS 测试 FROM  test;
- SELECT k1 AS テスト FROM test;
2023-01-10 19:43:04 +08:00
bb28144c76 [fix](schema change) bugfix for light schema change while with rollup (#15681)
This problem comes from PR #11494.

Adding a column to a rollup index also changes the column UniqueId inside the base index.
2023-01-10 19:03:06 +08:00
a67cea2d27 [Enhancement](metric) add current edit log metric (#15657) 2023-01-10 18:46:57 +08:00
503b6ee4da [chore](vulnerability) fix fe high risk vulnerability scanned by bug scanner (#15649) 2023-01-10 17:44:18 +08:00
47097a3db8 [fix](having) revert 15143 and fix having clause with multi-conditions (#15745)
First, the HAVING clause in MySQL is very complex and it is hard to follow all of its rules, so we revert PR 15143 to keep the logic the same as before.

Second, the original implementation has a problem when the HAVING clause has multiple conditions.
For example:

case1: here v2 inside the HAVING clause refers to the table column test_having_alias_tb.v2
SELECT id, v1-2 as v, sum(v2) v2 FROM test_having_alias_tb GROUP BY id,v having(v2>1);
ERROR 1105 (HY000): errCode = 2, detailMessage = HAVING clause not produced by aggregation output (missing from GROUP BY clause?): (`v2` > 1)
case2: here v2 inside the HAVING clause refers to the alias v2 = sum(test_having_alias_tb.v2); the additional condition makes v2 resolve differently.
SELECT id, v1-2 as v, sum(v2) v2 FROM test_having_alias_tb GROUP BY id,v having(v>0 AND v2>1) ORDER BY id,v;
+------+------+------+
| id   | v    | v2   |
+------+------+------+
|    2 |    1 |    3 |
+------+------+------+
So here we try to keep the HAVING clause rules simple:
Rule1: if an alias name inside the HAVING clause is the same as a column name, we use the column, not the alias;
Rule2: if an alias name inside the HAVING clause does not match any column name, we use the alias.
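
Rule1 and Rule2 can be read as a simple name-resolution order. The Python sketch below only illustrates that order; `resolve_having_name` is a hypothetical helper, not a function in the Doris code base.

```python
def resolve_having_name(name, table_columns, select_aliases):
    """Resolve a name used in HAVING: table columns win over select-list aliases."""
    if name in table_columns:            # Rule1: same name as a column -> use the column
        return ("column", name)
    if name in select_aliases:           # Rule2: no such column -> use the alias
        return ("alias", select_aliases[name])
    raise KeyError(f"unknown name in HAVING clause: {name}")


# Names taken from the example above: SELECT id, v1-2 as v, sum(v2) v2 ...
columns = {"id", "v1", "v2"}
aliases = {"v": "v1-2", "v2": "sum(v2)"}

print(resolve_having_name("v2", columns, aliases))  # resolves to the table column v2
print(resolve_having_name("v", columns, aliases))   # resolves to the alias expression v1-2
```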

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2023-01-10 15:57:29 +08:00
dec79c000b [fix](MTMV) build mode is missing after restart FE (#15551) 2023-01-10 11:38:56 +08:00
1888aba301 [fix](MTMV) fix replayReplaceTable error when restart fe (#15564) 2023-01-10 11:36:17 +08:00
025623a124 [feature](Nereids) Support lots of aggregate functions (#15671)
1. generate lots of aggregate functions
2. support `group_concat(columns order by order_columns)` grammar
3. support and generate array aggregate/scalar functions, like `array_union`. we should support array grammar in the future, e.g. `select [1, 2, 3]`
4. add `checkLegalityBeforeTypeCoercion` and `checkLegalityAfterRewrite` function to check the legality of expression before type coercion and after rewrite, copy the semantic check of `FunctionCallExpr` to the checkLegality; remove the `ForbiddenMetricTypeArguments`; move the check of aes/sm4 crypto function from translator to checkLegalityBeforeTypeCoercion
5. refactor the `NullableAggregateFunction`: distinct is the first parameter, alwaysNullable is the second parameter; fix some wrong initialization orders: some functions invoke super(distinct, alwaysNullable) but others invoke super(alwaysNullable, distinct)
2023-01-10 11:20:27 +08:00
601d9af23b [fix](planner) disconjunct in sub-query failed when plan it on hash join (#15653)
All conjuncts should be added before HashJoinNode init. Otherwise, some slots in the conjuncts are linked to a tuple that is not in the intermediate tuple of the HashJoinNode.
2023-01-10 11:10:12 +08:00
c19e391d32 [fix](profile) show query profile for pipeline engine (#15687) 2023-01-10 10:12:34 +08:00
9e3a61989b [refactor](es) remove BE generated dsl for es query #15751
remove fe config enable_new_es_dsl and all related code.
Now the DSL for es is always generated on FE side.
2023-01-10 08:40:32 +08:00
05f6e4c48a [fix](predicate) fix be core dump caused by pushing down the double column predicate (#15693) 2023-01-09 19:31:04 +08:00
2b0e5e42a5 [ehancement](nereids) Support list parttion prune (#15724) 2023-01-09 19:00:53 +08:00
67ceb83294 [enhance](Nereids): polish test format, add more comment. (#15662) 2023-01-09 15:40:27 +08:00
5ceb5441f4 [feature](nereids) let set operation syntax campatible with lagecy planner (#15664)
Although this syntax is not supported by many other systems, since the ORDER BY clause here is almost redundant and useless, we have to stay consistent with the legacy Doris syntax.

Here is an example:
SELECT * FROM (SELECT k1, k3 FROM tbl1 ORDER BY k3 UNION ALL SELECT k1, k5 FROM tbl2) t;
2023-01-09 15:31:29 +08:00
7543d677fa [fix](nereids) Fix the bugs of data distribution calculation on OlapScan (#15699)
When more than one OLAP table partition needs to be scanned and the table is not a colocate table or its colocate group is unstable, the distribution must be treated as ANY even if the distribution type is Hash.
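The condition above can be written as one small decision function. This is a sketch only: the function name, its parameters and the string labels `"HASH"` and `"ANY"` are assumptions for illustration, not the Nereids API.

```python
def effective_distribution(declared, selected_partitions, is_colocate, colocate_group_stable):
    """Return the distribution the planner may rely on for an OLAP scan."""
    colocate_usable = is_colocate and colocate_group_stable
    if selected_partitions > 1 and not colocate_usable:
        # Several partitions without a stable colocate group: fall back to ANY,
        # even when the table itself is declared as hash-distributed.
        return "ANY"
    return declared


print(effective_distribution("HASH", 1, False, False))  # single partition keeps HASH
print(effective_distribution("HASH", 3, False, False))  # non-colocate table becomes ANY
print(effective_distribution("HASH", 3, True, False))   # unstable colocate group becomes ANY
print(effective_distribution("HASH", 3, True, True))    # stable colocate group keeps HASH
```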
2023-01-09 15:25:54 +08:00
e2492cf7fc [Bug](DECIMALV3) Fix binary predicate between decimalv3 and float (#15696) 2023-01-09 15:16:59 +08:00
2c9c7c48ac [improvement](decimalv3) Java UDF and array type support DECIMALV3 (#15674) 2023-01-09 15:13:16 +08:00
4c50c4906b [fix](Nereids) add implicit casting for arithmetic expression (#15630)
Add implicit casting for arithmetic expression to support select "1" + "2"
2023-01-09 15:10:35 +08:00
4f2bea86ee [fix](Nereids) divide operator return type is not same with lagecy planner (#15707) 2023-01-09 14:50:24 +08:00
93b941baeb [fix](tvf) use virtual-hosted style when s3('uri'='s3://xxx') (#15617)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>

2023-01-09 14:09:40 +08:00
211cc66d02 [fix](multi-catalog) fix image loading failture when create catalog with resource (#15692)
Bug fix
Fix image loading failure when creating a catalog with a resource.
When a jdbc catalog is created with a resource, the metadata image fails to load:
loading the jdbc catalog image tries to get the resource from ResourceMgr,
but ResourceMgr has not been loaded yet, so an NPE is thrown.

This PR fixes this bug and refactors some logic about catalog and resource.

When loading the jdbc catalog image, it no longer gets the resource from ResourceMgr.
Users can now create a catalog with both a resource and properties, like:

create catalog jdbc_catalog with resource jdbc_resource
properties("user" = "user1");
The properties in the "properties" clause overwrite the properties in "jdbc_resource".
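
The overwrite order can be sketched in Python as a dictionary merge. `effective_properties` is a hypothetical helper, and the `password` key and all values are made up for the example.

```python
def effective_properties(resource_props, catalog_props):
    """Merge properties so that catalog-level values override resource values."""
    merged = dict(resource_props)   # start from the resource's properties
    merged.update(catalog_props)    # the "properties" clause wins on conflicts
    return merged


resource = {"user": "root", "password": "secret"}   # stands in for jdbc_resource
catalog = {"user": "user1"}                         # from properties("user" = "user1")
print(effective_properties(resource, catalog))
```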

Force adding tinyInt1isBit=false to the jdbc url
The default value of tinyInt1isBit is true, which causes tinyint in MySQL to be read as the bit type.
Force adding tinyInt1isBit=false to the jdbc url so that tinyint in MySQL stays tinyint in Doris.
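
A minimal Python sketch of appending the parameter to a URL follows. It assumes the parameter is only appended when it is not already present; how the real code treats an existing `tinyInt1isBit` value is not stated here. The function name, host and database are hypothetical.

```python
def with_tinyint_param(jdbc_url):
    """Append tinyInt1isBit=false to a MySQL JDBC URL if it is not set yet."""
    if "tinyInt1isBit=" in jdbc_url:
        return jdbc_url                      # assumption: leave an explicit setting alone
    separator = "&" if "?" in jdbc_url else "?"
    return f"{jdbc_url}{separator}tinyInt1isBit=false"


print(with_tinyint_param("jdbc:mysql://127.0.0.1:3306/demo"))
print(with_tinyint_param("jdbc:mysql://127.0.0.1:3306/demo?useSSL=false"))
```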

Avoid calculating the checksum of the jdbc driver jar multiple times
Refactor
Refactor the notification logic when updating properties in a resource.
Previously, updating properties in a resource notified the corresponding catalog to update its own properties.
This PR changes this logic. After properties in a resource are updated, only the catalog's internal
objects, such as the "jdbc client" or "hms client", are uninitialized. These objects are re-initialized lazily.

All properties are fetched from the Resource at runtime, so the latest properties are always used.
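
The "uninitialize now, re-initialize lazily" pattern described above can be sketched as follows. `LazyCatalog`, `on_resource_updated` and `client_factory` are hypothetical names, not classes or methods from the Doris code base.

```python
class LazyCatalog:
    """Holds a client that is built on first use and dropped when the resource changes."""

    def __init__(self, resource_props, client_factory):
        self._resource_props = resource_props   # shared, always read at runtime
        self._client_factory = client_factory
        self._client = None

    def on_resource_updated(self):
        """Called after the resource's properties change: only drop the client."""
        self._client = None

    def client(self):
        """Build the client lazily from the latest resource properties."""
        if self._client is None:
            self._client = self._client_factory(dict(self._resource_props))
        return self._client


props = {"user": "user1"}
catalog = LazyCatalog(props, client_factory=lambda p: {"connected_as": p["user"]})
print(catalog.client())          # built from the current properties
props["user"] = "user2"
catalog.on_resource_updated()
print(catalog.client())          # rebuilt lazily with the updated properties
```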

Regression test cases
Because tinyInt1isBit=false is now added to the jdbc url, some of the cases need to be changed.
2023-01-09 09:56:26 +08:00
Pxl
1514b5ab5c [Feature](Materialized-View) support advanced Materialized-View (#15212) 2023-01-09 09:53:11 +08:00
97cea9b5c9 [improvement](bdbje) add more log to make bdbje DatabaseNotFoundException problem easily solved (#15715)
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-01-09 08:55:21 +08:00
wxy
6829d361cb [Feature](audit) add errorCode and errorMessage in audit log (#14925)
* [feat] add errorCode and errorMessage in audit log.

* [Feature](audit) add errorCode and errorMessage in audit log

Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2023-01-09 08:47:57 +08:00
f256bb8d39 [fix](meta) fix priv table load bug when upgrading to 1.2.x (#15706)
In old versions, NODE_PRIV was incorrectly assigned to normal users.
So when upgrading to 1.2.x, the upgrade fails to handle this unexpected case.
This PR fixes this by removing NODE_PRIV from normal users.
2023-01-09 08:38:26 +08:00
36590da24b [fix](regression p0) add the alias function hist to histogram and fix p0 (#15708)
add the alias function hist to histogram and fix p0
2023-01-08 11:31:23 +08:00
500c7fb702 [improvement](multi-catalog) support unsupported column type (#15660)
When creating an external catalog, Doris automatically syncs the table schemas from the external catalog.
But some column types, such as struct and map, are not supported by Doris yet.

Previously, when it met such an unsupported column, Doris threw an exception and the corresponding
table could not be synced. But the user may just want to query the other, supported columns.

This PR adds a new column type: UNSUPPORTED. For now it is only used for external table schema sync.
An unsupported column is synced as a column of UNSUPPORTED type.

When querying such a table, there are several situations:

- select * from table: throws the error Unsupported type 'UNSUPPORTED_TYPE' xxx
- select k1 from table: k1 has a supported type, so the query is OK.
- select * except(k2): k2 has an unsupported type, so the query is OK.
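
The three situations reduce to one check on the projected columns. The Python sketch below is illustrative only: `check_projection` and the error text are hypothetical, and the schema is a made-up two-column table.

```python
UNSUPPORTED = "UNSUPPORTED"


def check_projection(schema, selected=None, excluded=()):
    """Return the projected column names, failing if any has the UNSUPPORTED type.

    schema maps column name -> type; selected=None stands for `select *`.
    """
    names = list(schema) if selected is None else list(selected)
    names = [name for name in names if name not in excluded]
    for name in names:
        if schema[name] == UNSUPPORTED:
            raise ValueError(f"unsupported type for column {name}")
    return names


schema = {"k1": "INT", "k2": UNSUPPORTED}
print(check_projection(schema, selected=["k1"]))   # select k1: OK
print(check_projection(schema, excluded=("k2",)))  # select * except(k2): OK
try:
    check_projection(schema)                       # select *: fails on k2
except ValueError as err:
    print("error:", err)
```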
2023-01-08 10:07:10 +08:00
5dfdacd278 [enhancement](histogram) add histogram syntax and perstist histogram statistics (#15490)
Histogram statistics are more expensive to collect, so we collect and persist them separately.

This PR does the following work:
1. Add histogram syntax and add keyword `TABLE`
2. Add the task of collecting histogram statistics
3. Persist histogram statistics
4. Replace fastjson with gson
5. Add unit tests...

Relevant syntax examples:
> Following some databases such as MySQL, the keyword `TABLE` is added.

```SQL
-- collect column statistics
ANALYZE TABLE statistics_test;

-- collect histogram statistics
ANALYZE TABLE statistics_test UPDATE HISTOGRAM ON col1,col2;
```

Based on #15317
2023-01-07 00:55:42 +08:00
76ad599fd7 [enhancement](histogram) optimise aggregate function histogram (#15317)
This PR mainly optimizes the histogram (👉🏻 https://github.com/apache/doris/pull/14910) aggregate function, including the following:
1. Support input parameters `sample_rate` and `max_bucket_num`
2. Add UT and regression test
3. Add documentation
4. Optimize function implementation logic
 
Parameter description:
- `sample_rate`: Optional. The proportion of sample data used to generate the histogram. The default is 0.2.
- `max_bucket_num`: Optional. Limit the number of histogram buckets. The default value is 128.

---

Example:

```
MySQL [test]> SELECT histogram(c_float) FROM histogram_test;
+-------------------------------------------------------------------------------------------------------------------------------------+
| histogram(`c_float`)                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------+
| {"sample_rate":0.2,"max_bucket_num":128,"bucket_num":3,"buckets":[{"lower":"0.1","upper":"0.1","count":1,"pre_sum":0,"ndv":1},...]} |
+-------------------------------------------------------------------------------------------------------------------------------------+

MySQL [test]> SELECT histogram(c_string, 0.5, 2) FROM histogram_test;
+-------------------------------------------------------------------------------------------------------------------------------------+
| histogram(`c_string`)                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------+
| {"sample_rate":0.5,"max_bucket_num":2,"bucket_num":2,"buckets":[{"lower":"str1","upper":"str7","count":4,"pre_sum":0,"ndv":3},...]} |
+-------------------------------------------------------------------------------------------------------------------------------------+
```

Query result description:

```
{
    "sample_rate": 0.2, 
    "max_bucket_num": 128, 
    "bucket_num": 3, 
    "buckets": [
        {
            "lower": "0.1", 
            "upper": "0.2", 
            "count": 2, 
            "pre_sum": 0, 
            "ndv": 2
        }, 
        {
            "lower": "0.8", 
            "upper": "0.9", 
            "count": 2, 
            "pre_sum": 2, 
            "ndv": 2
        }, 
        {
            "lower": "1.0", 
            "upper": "1.0", 
            "count": 2, 
            "pre_sum": 4, 
            "ndv": 1
        }
    ]
}
```

Field description:
- sample_rate: Rate of sampling
- max_bucket_num: Limit the maximum number of buckets
- bucket_num: The actual number of buckets
- buckets: All buckets
    - lower: Lower bound of the bucket
    - upper: Upper bound of the bucket
    - count: The number of elements contained in the bucket
    - pre_sum: The total number of elements in all preceding buckets
    - ndv: The number of different values in the bucket

> Total number of histogram elements = number of elements in the last bucket (count) + total number of elements in the preceding buckets (pre_sum).
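
As a worked check of this formula, the Python sketch below parses the sample result shown above (trimmed to the fields it needs) and computes the total. The helper names are hypothetical; only the JSON field names come from the result format.

```python
import json

# Buckets copied from the query result description above.
HISTOGRAM_JSON = """
{
    "sample_rate": 0.2, "max_bucket_num": 128, "bucket_num": 3,
    "buckets": [
        {"lower": "0.1", "upper": "0.2", "count": 2, "pre_sum": 0, "ndv": 2},
        {"lower": "0.8", "upper": "0.9", "count": 2, "pre_sum": 2, "ndv": 2},
        {"lower": "1.0", "upper": "1.0", "count": 2, "pre_sum": 4, "ndv": 1}
    ]
}
"""


def total_elements(histogram):
    """Total = count of the last bucket + its pre_sum."""
    last = histogram["buckets"][-1]
    return last["count"] + last["pre_sum"]


def pre_sums_are_cumulative(histogram):
    """Check that each pre_sum equals the sum of counts of the buckets before it."""
    running = 0
    for bucket in histogram["buckets"]:
        if bucket["pre_sum"] != running:
            return False
        running += bucket["count"]
    return True


histogram = json.loads(HISTOGRAM_JSON)
print(total_elements(histogram))           # 2 + 4 = 6
print(pre_sums_are_cumulative(histogram))  # True for the sample above
```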
2023-01-07 00:50:32 +08:00
9c8fcd805c [feature](Nereids) support variable type expression (#15659) 2023-01-07 00:32:57 +08:00
08d439cde7 [feature](Nereids) add keyword rlike (#15647) 2023-01-07 00:28:21 +08:00