Commit Graph

9906 Commits

Author SHA1 Message Date
d4928c60c8 [vectorized](profile) fix pipeline profile can't get result under more instances (#18525)
When the pipeline engine is enabled and the number of instances is greater than 1, all scan nodes share the scanners, so the profile of a scan node may be entirely empty.
Now all scan nodes are shown, and the info of those whose `_num_scanners->value() == 0` is removed.
2023-04-14 18:20:19 +08:00
4cde3d4f21 [Enhancement](Expr) Change small fix container size of In set to 8. (#18492)
In #17976, we introduced a small fixed container to optimize the IN expression. This PR changes the small fixed container size of the IN set to 8. According to the perf test, this performs better when the size is greater than 8.
2023-04-14 18:19:45 +08:00
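Here is a minimal Python sketch of the idea. The IN set keeps up to 8 values in a small fixed container that is scanned linearly, and switches to a hash set beyond that. The class name `InSet`, its methods, and the exact switching rule are assumptions for illustration. The real container lives in the Doris BE (C++) and may differ.

```python
# Hypothetical sketch, not the Doris BE implementation.
SMALL_SET_LIMIT = 8  # threshold taken from the commit message


class InSet:
    def __init__(self, values):
        unique = list(dict.fromkeys(values))  # de-duplicate, keep order
        if len(unique) <= SMALL_SET_LIMIT:
            # Small case: a fixed-size tuple scanned linearly
            self._small = tuple(unique)
            self._hash = None
        else:
            # Large case: fall back to a hash set
            self._small = None
            self._hash = frozenset(unique)

    def uses_small_container(self):
        return self._small is not None

    def contains(self, value):
        if self._small is not None:
            return any(v == value for v in self._small)
        return value in self._hash
```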
f2d75cb492 [fix](Nereids) fix signature precision round for decimalv3 (#18639)
Add decimalv3 signatures to the following functions:
ceil
dceil
dfloor
dround
floor
round
round_bankers
truncate
Fix ComputePrecisionForRound to get the correct signature.
2023-04-14 18:18:41 +08:00
4284fc4e75 [chore] Download apache orc source code from github if git does not work in build.sh. (#18625)
* [chore] Download apache orc source code from github if git does not work in build.sh.

* add cd "${DORIS_HOME}"

* Fix blank issue.
2023-04-14 17:54:14 +08:00
5acf764d9c [fix](trino catalog) To specify both catalog and database, run the show table command (#18645)
* [fix](trino catalog) To specify both catalog and database, run the show table command

* fix
2023-04-14 17:51:50 +08:00
90f4e4feff [Fix](thrift) add SCH_BACKENDS in TSchemaTableType (#18647) 2023-04-14 17:47:25 +08:00
362b5a34ae [feat](stats) Support to delete expired stats periodically (#18614)
Support deleting expired stats both periodically and manually.

The default cleaner interval is 2 days.

The syntax for manual cleanup is:
```sql
DROP EXPIRED STATS
```

TODO:
1. Process external catalogs' stats.
2. Run the drop at the appointed time.
3. Sleep briefly after dropping each batch.
2023-04-14 17:32:51 +08:00
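Here is a minimal Python sketch of the cleanup loop. It assumes stats records carry a hypothetical `expired` flag, because the commit does not say how expiry is decided. `drop_expired_stats`, `StatsRecord`, and the batch size are illustrative names. The pause between batches corresponds to TODO item 3, which this PR leaves open.

```python
import time
from dataclasses import dataclass
from datetime import timedelta

# Default interval of the background cleaner, per the commit message.
CLEAN_INTERVAL = timedelta(days=2)


@dataclass
class StatsRecord:
    table_id: int
    column: str
    expired: bool  # hypothetical flag: the commit does not define "expired"


def drop_expired_stats(store, batch_size=100, pause_seconds=0.0):
    """Delete expired records from `store` (a list) in batches.

    Returns the number of dropped records.
    """
    expired = [r for r in store if r.expired]
    for start in range(0, len(expired), batch_size):
        for record in expired[start:start + batch_size]:
            store.remove(record)
        time.sleep(pause_seconds)  # optional pause between batches
    return len(expired)
```

In this sketch, a background thread would call `drop_expired_stats` every `CLEAN_INTERVAL`, and `DROP EXPIRED STATS` would call it once on demand.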
65f9db90c8 [feature](nereids) forbid unknown col stats #18617
Add the session variable forbid_unknown_col_stats. When it is true, Nereids refuses to use unknown column stats.
The main purpose of this PR is to save debugging effort.
2023-04-14 17:13:39 +08:00
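The check can be pictured with the hypothetical Python sketch below. `lookup_column_stats` and `UnknownColumnStatsError` are invented names. The variable itself would be toggled per session with a statement such as `SET forbid_unknown_col_stats = true;`.

```python
class UnknownColumnStatsError(Exception):
    """Raised when a column has no statistics and they are forbidden."""


def lookup_column_stats(col_stats, column, forbid_unknown_col_stats=False):
    """Return the stats for `column`, or None when unknown and allowed."""
    stats = col_stats.get(column)
    if stats is None and forbid_unknown_col_stats:
        # Fail fast instead of planning with made-up statistics
        raise UnknownColumnStatsError(f"unknown column stats: {column}")
    return stats
```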
5d1abe4507 [Bugfix](Mtmv)Fix mtmv meta load failed (#18605)
MTMV meta failed to load because the meta was public to the CI system.
2023-04-14 16:29:18 +08:00
4174d5a707 [opt](nereids) optimize aggregation estimation #18607
`select count(*) from T group by A, B`
Suppose `ndv(A) > ndv(B)`.
The estimated row count of the aggregate lies between ndv(A) and ndv(A) * ndv(B).

In the previous version, we chose the upper bound, ndv(A) * ndv(B). The drawback of this choice is that the estimated row count is often bigger than the row count of T.

In this version, we choose the lower bound.
2023-04-14 16:13:25 +08:00
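The two choices can be compared with a small Python sketch, assuming per-column NDVs are already known. `estimate_group_by_rows` is a hypothetical helper. Returning 1 for an empty group-by list is an assumption.

```python
from math import prod


def estimate_group_by_rows(ndvs, use_lower_bound=True):
    """Estimate output rows of GROUP BY over columns with the given NDVs."""
    if not ndvs:
        return 1  # no grouping keys: a single output row
    if use_lower_bound:
        return max(ndvs)  # new choice: the largest single-column NDV
    return prod(ndvs)  # old choice: the product of all NDVs
```

For example, take 10,000 rows in T with ndv(A) = 1000 and ndv(B) = 50. The old estimate is 50,000 rows, five times the input, while the new estimate is 1,000.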
73e087d79c [feature](Nereids): support eager agg for Plan inside project. (#18637) 2023-04-14 15:30:33 +08:00
9634d21a28 [fix](info_db) avoid infodb query timeout when external catalog info is too large or is not reachable (#18662)
When querying tables in the information_schema database, the query may time out because:

* an external catalog has too many tables, or
* an external catalog is unreachable.

So I added a new FE config, infodb_support_ext_catalog. The default is false. This means that when selecting from tables in the information_schema database,
the result will not contain information about tables in external catalogs.
2023-04-14 14:40:31 +08:00
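Here is a minimal Python sketch of how such a flag could gate external catalogs when building `information_schema.tables`. The `Catalog` type and the `list_info_schema_tables` function are hypothetical. Only the config name and its default of false come from the commit.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    name: str
    is_external: bool
    tables: list = field(default_factory=list)


def list_info_schema_tables(catalogs, infodb_support_ext_catalog=False):
    """Return (catalog, table) rows for information_schema.tables."""
    rows = []
    for catalog in catalogs:
        if catalog.is_external and not infodb_support_ext_catalog:
            continue  # skip: may be huge or unreachable
        rows.extend((catalog.name, t) for t in catalog.tables)
    return rows
```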
db5ec6f6b0 [FIX](thrift)Fix with 1.2 version for thrift #18658 2023-04-14 14:07:42 +08:00
4d18ea30f4 [fix](Nereids) get_json_bigint should return bigint type (#18626) 2023-04-14 14:01:44 +08:00
e009c459bf [enhancement](planner) remove date function if its child's type is date (#18593)
If we have an expression like the one below:
```
date(c1) -- c1's type is date or datev2
```
the expression's result is exactly the same as c1, so we should
remove the date function. This optimization simplifies the
expression, speeds up execution, and increases the opportunity to
push filters down to the storage layer.
2023-04-14 14:01:20 +08:00
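The rewrite can be sketched in Python on a toy expression tree. `Column`, `FuncCall`, and `simplify` are hypothetical stand-ins for the planner's expression classes. This sketch only handles a column argument, whereas the commit states the rule in terms of the child's type for any child expression.

```python
from dataclasses import dataclass

DATE_TYPES = {"date", "datev2"}


@dataclass(frozen=True)
class Column:
    name: str
    type: str  # e.g. "date", "datev2", "datetime"


@dataclass(frozen=True)
class FuncCall:
    name: str
    arg: object


def simplify(expr):
    """Rewrite date(c) to c when c already has a date/datev2 type."""
    if isinstance(expr, FuncCall):
        arg = simplify(expr.arg)  # simplify children first
        if expr.name == "date" and isinstance(arg, Column) and arg.type in DATE_TYPES:
            return arg  # date() is a no-op here, so drop it
        return FuncCall(expr.name, arg)
    return expr
```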
81799d614e [feature-wip](resource-group) support resource group interface in be. (#18588) 2023-04-14 14:00:49 +08:00
008ae4984b [feature](Nereids): convert rightSemi to leftSemi for matching more rule. (#18648) 2023-04-14 11:20:22 +08:00
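The conversion relies on the equivalence `A RIGHT SEMI JOIN B` = `B LEFT SEMI JOIN A`: both return the rows of B that have a match in A. Below is a hypothetical Python sketch of the child swap, with a toy evaluator that checks both forms return the same rows. It assumes an equality join on the whole value. `Join`, `right_semi_to_left_semi`, and `evaluate` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Join:
    join_type: str  # "LEFT_SEMI" or "RIGHT_SEMI"
    left: tuple
    right: tuple


def right_semi_to_left_semi(join):
    """Swap children so rules written for LEFT_SEMI can match."""
    if join.join_type == "RIGHT_SEMI":
        return Join("LEFT_SEMI", join.right, join.left)
    return join


def evaluate(join):
    """Toy semi-join evaluation: equality on the whole value."""
    if join.join_type == "LEFT_SEMI":
        keys = set(join.right)
        return [r for r in join.left if r in keys]
    if join.join_type == "RIGHT_SEMI":
        keys = set(join.left)
        return [r for r in join.right if r in keys]
    raise ValueError(join.join_type)
```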
e6b0e05840 [fix](Nereids) Fix some bugs in binding and type coercion (#18548)
1. Fix the ambiguous-slot exception during binding when the same slots are selected.
2. Fix binding a SetOperation multiple times because of CTE.
3. Fix CASE WHEN clauses not being coerced to the same type.
4. Fix an exception when a set_var hint exists in a subquery or CTE.
2023-04-14 11:00:24 +08:00
c704351273 [enhancement](memory) Refactor memory limit exceeded behavior (#18590)
The mem hook no longer checks the mem tracker limit or cancels tasks; this is now done only in Allocator. This makes memory issues easier to analyze and reduces performance loss.
PODArray, hash table, and arena memory allocations will use Allocator.

Optimize mem limit exceeded log printing

Optimize compilation time
2023-04-14 10:42:35 +08:00
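Here is a minimal Python sketch of the new split of responsibilities. It assumes the limit check happens only when memory is requested through an allocator. `MemTracker`, `Allocator`, and `MemLimitExceeded` are hypothetical names. The actual BE code is C++, and its error handling may differ.

```python
class MemLimitExceeded(Exception):
    pass


class MemTracker:
    """Only records consumption; it never rejects allocations itself."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.consumed = 0


class Allocator:
    """Hypothetical allocator that owns the limit check."""

    def __init__(self, tracker):
        self.tracker = tracker

    def alloc(self, size):
        t = self.tracker
        if t.consumed + size > t.limit:
            raise MemLimitExceeded(
                f"try alloc {size} bytes, consumed {t.consumed}, limit {t.limit}")
        t.consumed += size
        return bytearray(size)

    def free(self, buf):
        self.tracker.consumed -= len(buf)
```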
8751f08d5a [bugfix](GEO)fix precision problem (#18642) 2023-04-14 10:39:19 +08:00
f422fe888c [Doc](typo) Remove redundant words #18659 2023-04-14 10:35:26 +08:00
183800e1ad [Fix](variables) fix session variable does not take effect immediately when set global variable in follower FE (#18609) 2023-04-14 10:35:03 +08:00
56d84739c1 [Opt](pipeline) opt the scanner ctx schedule in pipeline engine (#18545) 2023-04-14 09:59:03 +08:00
2294fb46a5 [refactor](minor) update scan concurrency for pipeline (#18650) 2023-04-14 09:45:12 +08:00
dedcfd7c28 [Doc] (Show) add doc for show create repository statement (#18542) 2023-04-14 09:44:54 +08:00
cc24e2ae13 [doc](readme)add Backend C++ Coding Specification (#18649) 2023-04-14 09:37:18 +08:00
72236d2b08 [typo](docs) add row to column doc (#18546)
* [typo](docs) add row to column doc
2023-04-14 09:04:55 +08:00
ca891d880f [fix](es) ClassCastException when getting root schema (#18438)
* [fix](es) ClassCastException when getting root schema
2023-04-14 09:04:09 +08:00
b6b4408283 [fix](meta) avoid NPE when saving meta (#18600)
Introduced by #16878:
the newly added string field cannot be null, otherwise an NPE will be thrown when calling `Text.writeString()`.
2023-04-14 08:52:09 +08:00
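Here is a hedged Python sketch of the failure mode and one possible guard. `write_string` uses a simple 4-byte length prefix rather than the real `Text.writeString()` encoding. `write_field` and its default of an empty string for a null field are assumptions, not necessarily what this PR does.

```python
import io
import struct


def write_string(out, s):
    """Write `s` as a 4-byte big-endian length followed by UTF-8 bytes."""
    data = s.encode("utf-8")  # fails on None, analogous to the Java NPE
    out.write(struct.pack(">i", len(data)))
    out.write(data)


def write_field(out, value):
    # Assumed guard: serialize a missing (None) field as an empty string
    write_string(out, "" if value is None else value)
```

Usage: `write_field(io.BytesIO(), None)` succeeds, while `write_string(io.BytesIO(), None)` raises.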
d28030e1e5 [chore](third-party) Configure the search paths for pkg-config and cmake (#18624)
Currently, our third-party libraries are built with autotools or CMake. In some scenarios, system-wide headers or libraries may be picked up when building them, which can make the build fail.

We can configure the search paths explicitly to help autotools and CMake find the right dependencies.
2023-04-14 08:43:27 +08:00
b39846c2c7 [Fix](Catalog)Delete duplicate defined dependencies to avoid class loading exceptions (#18628)
`iceberg-hive-metastore` and `hive-storage-api` have been defined in hive-catalog-shade,
and some classes in the shade have been renamed, so we cannot declare them again.
The classes in the shade should be kept.

The `hive-metastore-api` used in `ranger` can also use the jar in the `shade`.
Since we renamed the utility classes used inside `hive`, this has no effect.
2023-04-13 22:12:19 +08:00
1d3699a70c [refactor](jdbc) refactor jdbc connection num in datasource (#18563)
JDBC may currently open too many connections that are never released,
so the datasource properties are changed to: init = 1, min = 1, max = 100, with an idle time of 10 minutes.
2023-04-13 22:08:08 +08:00
6c0af24e9d [Improve](simdjson reader) support UTF-8 unicode (with BOM) (#18585) 2023-04-13 21:58:44 +08:00
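A UTF-8 BOM is the three-byte prefix `EF BB BF`. Here is a minimal Python sketch of skipping it before parsing JSON. `parse_json_bytes` is a hypothetical helper, not part of the simdjson reader.

```python
import json

UTF8_BOM = b"\xef\xbb\xbf"


def parse_json_bytes(data: bytes):
    """Parse UTF-8 JSON, skipping a leading byte order mark if present."""
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    return json.loads(data.decode("utf-8"))
```

Python's built-in `utf-8-sig` codec performs the same stripping during decoding.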
281ceee3cc [feature-wip](resource-group) Support resource group tvf (#18519)
related: #18098
2023-04-13 20:11:20 +08:00
33eec9096f [Enhancement](FE) use customized grpc threadpool to get better metric for grpc from FE to BE (#13983)
Previously, Doris FE had no dedicated thread pool for the gRPC client channel.
By default, the underlying Netty logic used one dynamic, unbounded cached thread pool,
so the workload of this gRPC thread pool was not visible.
This PR uses ThreadpoolMgr to create a customized thread pool that exposes Prometheus-compatible metrics.
2023-04-13 20:09:26 +08:00
aa6b3cc537 [fix](planner)keep all agg functions if there is any virtual slots in group by list (#18630)
Because of a limitation of ProjectPlanner, we have to keep all agg functions materialized if there are any virtual slots in the group by list, such as 'GROUPING_ID'.
2023-04-13 19:44:46 +08:00
2519931a04 [vectorized](function) support time_to_sec function (#18354)
support time_to_sec function
2023-04-13 19:31:12 +08:00
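Here is a Python sketch of the conversion. It assumes MySQL-compatible semantics (hours * 3600 + minutes * 60 + seconds, with the sign preserved) and a plain `HH:MM:SS` string input. `time_to_sec` here is a stand-in for the SQL function, not its implementation.

```python
def time_to_sec(value: str) -> int:
    """Convert a signed 'HH:MM:SS' string (hours may exceed 23) to seconds."""
    sign = -1 if value.startswith("-") else 1
    hours, minutes, seconds = (int(p) for p in value.lstrip("+-").split(":"))
    return sign * (hours * 3600 + minutes * 60 + seconds)
```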
05badac053 [Improve](docs)new libraries check (#18634) 2023-04-13 17:57:38 +08:00
40a352959d [Pipeline](exec) Support shared scan in colo agg (#18457) 2023-04-13 17:25:41 +08:00
99558153f5 [minor](Nereids): rename func and add TODO. (#18633) 2023-04-13 17:17:43 +08:00
b72c71dec0 [fix](stats) Analysis jobs didn't get persisted properly (#18602)
In the previous implementation, Doris persisted only one task to track analysis job status. After this PR, each column analysis task is persisted, and a record whose task_id is -1 is stored to represent the job of the user-submitted AnalyzeStmt.

AnalyzeStmt <---1-1---> AnalysisJob
AnalysisJob <---1-n---> AnalysisTask
2023-04-13 16:36:06 +08:00
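The persisted layout can be sketched in Python as one job-level record with `task_id = -1` plus one record per column task. `AnalysisRecord`, `persist_analyze`, the `state` field, and numbering tasks from 0 are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

JOB_TASK_ID = -1  # per the commit: the job-level record uses task_id -1


@dataclass
class AnalysisRecord:
    job_id: int
    task_id: int
    column: Optional[str]  # None for the job-level record (assumption)
    state: str = "PENDING"


def persist_analyze(job_id: int, columns: List[str]) -> List[AnalysisRecord]:
    """One record for the AnalyzeStmt job, then one per column task."""
    records = [AnalysisRecord(job_id, JOB_TASK_ID, None)]
    for task_id, column in enumerate(columns):
        records.append(AnalysisRecord(job_id, task_id, column))
    return records
```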
2f64a8b387 [feature](GEO)Support read/write WKB/EWKB to gis types (#18526)
Support conversion between WKB and GIS types; the EWKB format is also supported.
https://cwiki.apache.org/confluence/display/DORIS/DSIP-033%3A+More+GEO+functions
2023-04-13 16:25:18 +08:00
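For reference, a 2D WKB Point has three parts:

* one byte-order byte (1 = little endian, 0 = big endian),
* a 4-byte geometry type (1 = Point),
* two 8-byte doubles.

PostGIS EWKB sets the `0x20000000` flag in the type and inserts a 4-byte SRID after it. Here is a minimal Python sketch that parses only this case. `parse_point` is hypothetical and does not reflect how Doris implements the conversion.

```python
import struct

EWKB_SRID_FLAG = 0x20000000
WKB_POINT = 1


def parse_point(data: bytes):
    """Parse a 2D WKB or EWKB Point; returns (x, y, srid or None)."""
    endian = "<" if data[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", data, 1)
    offset = 5
    srid = None
    if geom_type & EWKB_SRID_FLAG:
        # EWKB: a 4-byte SRID follows the type field
        (srid,) = struct.unpack_from(endian + "i", data, offset)
        offset += 4
    if geom_type & ~EWKB_SRID_FLAG != WKB_POINT:
        raise ValueError(f"only 2D points are handled, got type {geom_type:#x}")
    x, y = struct.unpack_from(endian + "dd", data, offset)
    return x, y, srid
```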
c4e9808382 [feature](multi-catalog) support trino jdbc catalog and jdbc external table (#18497) 2023-04-13 16:00:09 +08:00
2ae0bb7f13 [minor](test) remove unused function to improve test coverage (#18598) 2023-04-13 15:30:53 +08:00
Pxl
eb46bcb304 [Bug](materialized-view) fix match wrong index on some scan node (#18561)
fix match wrong index on some scan node
2023-04-13 11:50:14 +08:00
726402b53b [bugfix](topn) fix topn runtime predicate crash in short circuit evaluate for types like string decimal (#18409) 2023-04-13 11:10:59 +08:00
df0aaece1d [Function](test) add some test cases for agg functions (#18610) 2023-04-13 10:23:41 +08:00
4335c9998f [chore](ARM) Add some vectorization compatibility code on aarch64 (#18553)
Update sse2neon to support more SSE code on ARM CPUs.
2023-04-13 10:15:33 +08:00
6d91635c5b [fix](json_reader) Do not increase the value of read_rows for empty line (#18611)
If the row count is incremented when an empty line is read, it becomes larger than the actual column size, which causes a core dump.
2023-04-13 10:08:11 +08:00
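Here is a minimal Python sketch of the corrected counting, using a hypothetical `read_json_lines` helper over newline-delimited JSON. Empty lines are skipped without incrementing `read_rows`, so the count stays equal to the number of parsed rows.

```python
import json


def read_json_lines(text: str):
    """Parse newline-delimited JSON; only non-empty lines count as rows."""
    rows = []
    read_rows = 0
    for line in text.splitlines():
        if not line.strip():
            continue  # empty line: do not bump read_rows
        rows.append(json.loads(line))
        read_rows += 1
    return rows, read_rows
```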
3c3364ba27 [chore](row store) ignore serialize block to row column if no row store column (#18601) 2023-04-13 10:02:33 +08:00