Commit Graph

4359 Commits

Author SHA1 Message Date
7dd96bc341 [fix](olap) remove zorder support when create table (#18698) 2023-04-16 09:24:18 +08:00
8f0d4ae625 [Fix](fe)Upgrade hive-catalog-shade version to 1.0.3 (#18690) 2023-04-15 22:10:45 +08:00
bcff3710ca [fix] set execution timeout for brokerload and use query timeout when… (#18694)
If the execution timeout is not set, fall back to the query timeout so that upgraded clusters keep working.
2023-04-15 20:41:04 +08:00
d2efc619b0 [Enhancement](statistics) Show histogram statistics, show specified column statistics (#18657) 2023-04-14 22:36:40 +08:00
f7e129934e [fix](nereids) only order by slot reference could use topn opt (#18622)
select cast(k1 as INT) as id from tbl1 order by id limit 2; 

is not valid for topN optimization, because 'id' is
a cast expr, not a table column from the scan node.
This PR addresses this issue.
2023-04-14 20:59:06 +08:00
683d64b361 [Refactor](multi catalog)Remove redundant param context for FileQueryScanNode (#18636)
Remove redundant param context for FileQueryScanNode.
Remove duplicated code for QueryScanProviders.
2023-04-14 20:20:21 +08:00
e1b3955e05 [refactor](jdbc) using jvm parameters to init jdbc datasource (#18670)
Use JVM parameters to initialize the JDBC datasource connection pool.
If you do not need to keep connections alive, set JDBC_MIN_POOL=0.
2023-04-14 18:45:29 +08:00
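The JVM-parameter approach described in e1b3955e05 could look roughly like the sketch below. It assumes that `JDBC_MIN_POOL` is read as a JVM system property (for example `-DJDBC_MIN_POOL=0`) and that the fallback value is 1; the class `JdbcPoolSettings` and its method are hypothetical and are not taken from the Doris code base.

```java
// Hypothetical helper, not a Doris class: reads the minimum pool size
// from a JVM system property.
final class JdbcPoolSettings {
    // Assumed default when the property is absent or not a valid integer.
    static final int DEFAULT_MIN_POOL = 1;

    // Integer.getInteger looks up the system property with the given name
    // and falls back to the default when it is unset or unparsable.
    static int minPoolSize() {
        return Integer.getInteger("JDBC_MIN_POOL", DEFAULT_MIN_POOL);
    }
}
```

Under these assumptions, starting the JVM with `-DJDBC_MIN_POOL=0` makes `JdbcPoolSettings.minPoolSize()` return 0, so the pool keeps no idle connections.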
f2d75cb492 [fix](Nereids) fix signature precision round for decimalv3 (#18639)
add decimalv3 signature to below functions:
ceil
dceil
dfloor
dround
floor
round
round_bankers
truncate
fix ComputePrecisionForRound to get correct signature
2023-04-14 18:18:41 +08:00
5acf764d9c [fix](trino catalog) To specify both catalog and database, run the show table command (#18645)
* [fix](trino catalog) To specify both catalog and database, run the show table command

* fix
2023-04-14 17:51:50 +08:00
362b5a34ae [feat](stats) Support to delete expired stats periodically (#18614)
Support to delete expired stats periodically and manually.

The default cleaner interval is 2 days.

The syntax for manual cleaning is:
```sql
DROP EXPIRED STATS
```

TODO:
1. process stats of external catalogs
2. run the drop at an appointed time
3. sleep for a short time after dropping each batch
2023-04-14 17:32:51 +08:00
65f9db90c8 [feature](nereids) forbid unknown col stats #18617
Add session variable forbid_unknown_col_stats. When this variable is true, Nereids refuses to use unknown column stats.
The main purpose of this PR is to reduce debugging effort.
2023-04-14 17:13:39 +08:00
5d1abe4507 [Bugfix](Mtmv)Fix mtmv meta load failed (#18605)
MTMV meta load fail since meta was public to the CI System
2023-04-14 16:29:18 +08:00
4174d5a707 [opt](nereids) optimize aggregation estimation #18607
`select count(*) from T group by A, B`
suppose `ndv(A) > ndv(B)`
the estimated row count of the aggregate is between ndv(A) and ndv(A) * ndv(B)

In the previous version, we chose the upper bound, that is, ndv(A) * ndv(B). The drawback of this choice is that the estimated row count is often bigger than the row count of T.

In this version, we choose the lower bound.
2023-04-14 16:13:25 +08:00
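The two bounds discussed in 4174d5a707 can be sketched as follows. This is a minimal illustration rather than the Nereids implementation: the class `AggRowEstimate` and its methods are hypothetical, NDVs are passed in as plain doubles, and the cap by the input row count is an assumption added for the sketch.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Doris: illustrates the bounds for
// `select count(*) from T group by A, B`.
final class AggRowEstimate {
    // Lower bound: the largest NDV among the group-by columns.
    // With no group-by columns a single row is assumed.
    static double lowerBound(double... ndvs) {
        return Arrays.stream(ndvs).max().orElse(1.0);
    }

    // Upper bound: the product of all NDVs (every combination occurs).
    static double upperBound(double... ndvs) {
        return Arrays.stream(ndvs).reduce(1.0, (a, b) -> a * b);
    }

    // Assumption for this sketch: use the lower bound, and never
    // return more rows than the aggregate receives as input.
    static double estimate(double inputRows, double... ndvs) {
        return Math.min(inputRows, lowerBound(ndvs));
    }
}
```

For example, with ndv(A) = 1000, ndv(B) = 50 and a table of 10,000 rows, the upper bound is 50,000, which exceeds the table size, while the lower bound of 1000 does not.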
73e087d79c [feature](Nereids): support eager agg for Plan inside project. (#18637) 2023-04-14 15:30:33 +08:00
9634d21a28 [fix](info_db) avoid infodb query timeout when external catalog info is too large or is not reachable (#18662)
Queries on tables in the information_schema database may time out when:

1. There are external catalogs with too many tables.
2. An external catalog is unreachable.

So this PR adds a new FE config, infodb_support_ext_catalog.
The default is false, which means that queries on tables in the information_schema database
do not return information about tables in external catalogs.
2023-04-14 14:40:31 +08:00
db5ec6f6b0 [FIX](thrift)Fix with 1.2 version for thrift #18658 2023-04-14 14:07:42 +08:00
4d18ea30f4 [fix](Nereids) get_json_bigint should return bigint type (#18626) 2023-04-14 14:01:44 +08:00
e009c459bf [enhancement](planner) remove date function if its child's type is date (#18593)
If we have an expression like the one below
```
date(c1) -- c1's type is date or datev2
```
the result of the expression is exactly the same as c1, so we should
remove the date function. This optimization simplifies the
expression, speeds up execution, and increases the opportunity to
push filters down to the storage layer.
2023-04-14 14:01:20 +08:00
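The rewrite described in e009c459bf can be illustrated with a toy expression tree. Everything in this sketch is hypothetical (`DateFnSimplifier`, `Column`, `DateFn` and the `Type` enum are not Doris classes), and it assumes that `date(x)` always produces a value of type DATE.

```java
// Hypothetical mini expression tree, not the Doris planner classes.
final class DateFnSimplifier {
    enum Type { DATE, DATEV2, DATETIME }

    interface Expr { Type type(); }

    // A column reference with a known type.
    static final class Column implements Expr {
        final String name;
        final Type type;
        Column(String name, Type type) { this.name = name; this.type = type; }
        @Override public Type type() { return type; }
    }

    // The date(child) function call.
    static final class DateFn implements Expr {
        final Expr child;
        DateFn(Expr child) { this.child = child; }
        // Assumption for this sketch: date(...) always yields DATE.
        @Override public Type type() { return Type.DATE; }
    }

    // Drop date(...) when its (already simplified) child is a date type.
    static Expr simplify(Expr e) {
        if (!(e instanceof DateFn)) {
            return e;
        }
        Expr child = simplify(((DateFn) e).child);
        boolean alreadyDate = child.type() == Type.DATE || child.type() == Type.DATEV2;
        return alreadyDate ? child : new DateFn(child);
    }
}
```

In this sketch `date(c1)` with `c1` of type DATEV2 simplifies to `c1` itself, while `date(ts)` with `ts` of type DATETIME is kept as a function call.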
008ae4984b [feature](Nereids): convert rightSemi to leftSemi for matching more rule. (#18648) 2023-04-14 11:20:22 +08:00
e6b0e05840 [fix](Nereids) Fix some bugs in binding and type coercion (#18548)
1. fix the ambiguous-slot binding exception raised when the same slot is selected more than once
2. fix SetOperation being bound multiple times because of a CTE
3. fix CASE WHEN clauses not being coerced to the same type
4. fix an exception when a set_var hint exists in a subquery or CTE
2023-04-14 11:00:24 +08:00
183800e1ad [Fix](variables) fix session variable does not take effect immediately when set global variable in follower FE (#18609) 2023-04-14 10:35:03 +08:00
ca891d880f [fix](es) ClassCastException when getting root schema (#18438)
* [fix](es) ClassCastException when getting root schema
2023-04-14 09:04:09 +08:00
b6b4408283 [fix](meta) avoid NPE when save meta (#18600)
Introduced from #16878,
the newly added string field can not be null, or NPE will be thrown when calling `Text.writeString()`
2023-04-14 08:52:09 +08:00
b39846c2c7 [Fix](Catalog)Delete duplicate defined dependencies to avoid class loading exceptions (#18628)
`iceberg-hive-metastore` and `hive-storage-api` have been defined in hive-catalog-shade,
and some classes in the shade have been renamed, so we cannot declare them again.
The classes in the shade should be kept.

The `hive-metastore-api` used in `ranger` can also use the jar in the `shade`.
Since we rename the tool class used inside the `hive`, this has no effect.
2023-04-13 22:12:19 +08:00
1d3699a70c [refactor](jdbc) refactor jdbc connection num in datasource (#18563)
JDBC may currently hold too many connections without releasing them,
so change the datasource properties: init = 1, min = 1, max = 100, and an idle time of 10 minutes.
2023-04-13 22:08:08 +08:00
281ceee3cc [feature-wip](resource-group) Support resource group tvf (#18519)
related: #18098
2023-04-13 20:11:20 +08:00
33eec9096f [Enhancement](FE) use customized grpc threadpool to get better metric for grpc from FE to BE (#13983)
Previously, Doris FE had no dedicated thread pool for grpc-client-channel;
by default the underlying netty logic used one dynamic, unbounded cached thread pool.
The workload of this grpc thread pool was not observable.
Use ThreadpoolMgr to create a customized thread pool that exposes Prometheus-compatible metric data.
2023-04-13 20:09:26 +08:00
aa6b3cc537 [fix](planner)keep all agg functions if there is any virtual slots in group by list (#18630)
Because of a limitation of ProjectPlanner, we have to keep all agg functions materialized if there are any virtual slots, such as 'GROUPING_ID', in the group by list.
2023-04-13 19:44:46 +08:00
40a352959d [Pipeline](exec) Support shared scan in colo agg (#18457) 2023-04-13 17:25:41 +08:00
99558153f5 [minor](Nereids): rename func and add TODO. (#18633) 2023-04-13 17:17:43 +08:00
b72c71dec0 [fix](stats) Analysis jobs didn't get persisted properly (#18602)
In the previous implementation, Doris persisted only one task to track analysis job status. After this PR, each column analysis task is persisted, and a record whose task_id is -1 is stored to represent the job for the user-submitted AnalyzeStmt.

AnalyzeStmt <---1-1---> AnalysisJob
AnalysisJob <---1-n---> AnalysisTask
2023-04-13 16:36:06 +08:00
2f64a8b387 [feature](GEO)Support read/write WKB/EWKB to gis types (#18526)
Support conversion between WKB and GIS types. The EWKB format is also supported.
https://cwiki.apache.org/confluence/display/DORIS/DSIP-033%3A+More+GEO+functions
2023-04-13 16:25:18 +08:00
c4e9808382 [feature](multi-catalog) support trino jdbc catalog and jdbc external table (#18497) 2023-04-13 16:00:09 +08:00
Pxl
eb46bcb304 [Bug](materialized-view) fix match wrong index on some scan node (#18561)
fix match wrong index on some scan node
2023-04-13 11:50:14 +08:00
d57371da13 [feature](struct-type) support basic struct constructor function (#18190)
This commit adds support for the struct and named_struct functions.
2023-04-13 09:18:00 +08:00
af0cf0c050 [Fix](multi catalog)Refresh table object while refresh external table. (#18592)
Refresh the table object when refreshing an external table. This covers
refresh catalog, refresh database and refresh table.
Before visiting a database, the catalog must have been initialized.
Before visiting a table, the catalog and database must have been initialized.
2023-04-13 08:49:44 +08:00
a9f9366736 [fix](nereids) the data type of compareExpr and listQuery should be the same when creating InSubquery (#18539)
Consider sql

select table_B_alias.b from table_B_alias where table_B_alias.b in ( select a from table_A_alias );

If table_B_alias.b is int and table_A_alias.a is bigint,
we should cast(b as bigint) so that both sides of the InSubquery have the same data type.
2023-04-12 20:02:37 +08:00
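The coercion described in a9f9366736 amounts to picking a common type and casting the narrower side. The sketch below covers integer types only and is purely illustrative: `InSubqueryCoercion`, `IntType`, `commonType` and `castIfNeeded` are hypothetical names, and expressions are represented as plain strings instead of plan nodes.

```java
import java.util.Locale;

// Hypothetical helper, not the Nereids type coercion code.
final class InSubqueryCoercion {
    // Ordered from narrow to wide; integer types only in this sketch.
    enum IntType { TINYINT, SMALLINT, INT, BIGINT }

    // The common type of two integer types is the wider one.
    static IntType commonType(IntType left, IntType right) {
        return left.ordinal() >= right.ordinal() ? left : right;
    }

    // Wrap the expression in a cast only when its type differs from the target.
    static String castIfNeeded(String expr, IntType from, IntType target) {
        if (from == target) {
            return expr;
        }
        return "cast(" + expr + " as " + target.name().toLowerCase(Locale.ROOT) + ")";
    }
}
```

With `b` of type INT compared against a subquery column of type BIGINT, the common type is BIGINT, so the compare expression becomes `cast(b as bigint)` and the subquery column is left unchanged.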
3cf4f49444 [FixBug](jdbc Catalog) fix sqlserver column type mapping (#18518)
For the SQL Server type int identity, the column type name read from JDBC is "int identity", so we need to handle this case.
2023-04-12 19:58:30 +08:00
edbe3e40b3 [fix](nereids) remove unused visitDateTimeV2Literal method (#18568)
BE supports date v2 literal and datetime v2 literal now, so remove visitDateTimeV2Literal method
2023-04-12 19:52:22 +08:00
09a4e9fd6b [enhancement](Nereids) Simplify the codes for runtime filter validation (#18571)
The goal of the `ColumnStatistic#coverage` function is to determine whether the build side range is completely enclosed by the range of the probe side. If so, as the comment of `RuntimeFilterPruner` explains, the corresponding runtime filter may be considered useless and pruned.

However, the original logic of this method is quite confusing.

Simplify its logic by this formula:

```java
!(this.maxValue >= other.maxValue && this.maxValue <= other.maxValue)
```
2023-04-12 17:55:29 +08:00
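The notion of one range being enclosed by another, which 09a4e9fd6b relies on, can be written down as a small check. This sketch is not the `ColumnStatistic#coverage` code and does not reproduce the formula quoted in the commit message: the class `ValueRange` and the method `encloses` are hypothetical, and the bounds are assumed to be inclusive.

```java
// Hypothetical value range with inclusive bounds, not a Doris class.
final class ValueRange {
    final double min;
    final double max;

    ValueRange(double min, double max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.min = min;
        this.max = max;
    }

    // True when `inner` lies entirely within this range.
    boolean encloses(ValueRange inner) {
        return this.min <= inner.min && inner.max <= this.max;
    }
}
```

Following the prose of the commit message, a pruning rule would call `probe.encloses(build)`: a probe range of [0, 100] encloses a build range of [10, 20] but not one of [50, 150].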
db44970685 [feature](stats) Support sync analyze (#18567)
Grammar:

```
ANALYZE [SYNC] TABLE ....
```

Add this feature so that we could test and tune stats framework conveniently.
2023-04-12 17:49:30 +08:00
b93e04ab66 [test](Nereids) add regression test to check join order for tpch queries (#18543)
Using the explain shape plan command with stats injection, add regression tests that check the plan shape of TPC-H queries.
2023-04-12 15:43:21 +08:00
5dbc7e1c0e [fix](fe) add fe isReady check before getMasterIp (#18417)
When the FE node is not ready, getMasterIp returns "" and the redirect fails.

---------

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-04-12 14:33:31 +08:00
75fd4b70fa [improve](fe)Optimize fe binary package packaging (#18554) 2023-04-12 12:58:45 +08:00
39a7a4cc55 [feat](Nereids): a new CBO rule: Eager Split/GroupByCount (#18556) 2023-04-12 12:13:06 +08:00
155e4e547b [pipeline](profile) Show each instance profile in FE (#18544) 2023-04-12 11:25:46 +08:00
cb644d5bc3 [feature](function) support any type in SQL function (#18392)
Add AnyType to Doris.
Support Inference function in fe SQL function.
2023-04-11 19:45:02 +08:00
876b4efdf1 [fix](nereids)remove redundant session Var ENABLE_NEREIDS_RUNTIME_FILTER (#18523)
remove redundant session Var ENABLE_NEREIDS_RUNTIME_FILTER
2023-04-11 18:48:54 +08:00
79cd50e1ff [enhancement](statistics) update semi/anti cardinality estimation algorithm (#18524) 2023-04-11 16:51:24 +08:00
25008bbf7f [feat](Nereids): a new CBO rule: Eager Count/GroupBy. (#18511) 2023-04-11 16:37:59 +08:00