Commit Graph

9922 Commits

Author SHA1 Message Date
042cf2a1bf [enhancement](ut) add ut for buffered reader (#18667) 2023-04-16 18:08:22 +08:00
70a418c4a8 [improvement](release) add release download scripts (#18703)
Currently, the release binaries of Doris are split into 3 packages: https://doris.apache.org/download
Users may be confused about how to download and deploy these packages.

So I provide a download script for each release. Users can simply download the script and run it, for example:

```
> sh download_x64_apache.sh

Begin to download FE from "https://mirrors.tuna.tsinghua.edu.cn/apache/doris/1.2/1.2.3-rc02/apache-doris-fe-1.2.3-bin-x86_64.tar.xz" to "apache-doris-1.2.3-bin/" ...
Total size: 408078012 Bytes
#################################################### 100.0%
Begin to download BE from "https://mirrors.tuna.tsinghua.edu.cn/apache/doris/1.2/1.2.3-rc02/apache-doris-be-1.2.3-bin-x86_64.tar.xz" to "apache-doris-1.2.3-bin/" ...
Total size: 606211324 Bytes
#################################################### 100.0%
Begin to download DEPS from "https://mirrors.tuna.tsinghua.edu.cn/apache/doris/1.2/1.2.3-rc02/apache-doris-dependencies-1.2.3-bin-x86_64.tar.xz" to "apache-doris-1.2.3-bin/" ...
Total size: 253869148 Bytes
#################################################### 100.0%
Begin to assemble the binaries ...
Move java-udf-jar-with-dependencies.jar to be/lib/ ...
Download complete!
You can now deploy Apache Doris from apache-doris-1.2.3-bin/
```

The script will do the rest.

This script will later be published on the Download page of the Apache Doris website, so that users can easily find
and use it.

Currently it only supports the Linux platform; other platforms are untested.
2023-04-16 16:49:59 +08:00
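For commit 70a418c4a8 above, the URLs printed in the sample output follow a visible pattern: mirror, major.minor version, version-rc directory, then one tarball each for FE, BE and DEPS. The sketch below rebuilds those URLs in Python. It assumes the same pattern holds for other versions and architectures. `release_urls` and its parameters are hypothetical and are not part of the actual download script.

```python
MIRROR = "https://mirrors.tuna.tsinghua.edu.cn/apache/doris"

def release_urls(version: str, rc: str, arch: str = "x86_64") -> dict[str, str]:
    """Build FE/BE/DEPS package URLs following the pattern in the sample output.

    Hypothetical helper: the real script may construct its URLs differently.
    """
    major_minor = ".".join(version.split(".")[:2])  # "1.2.3" -> "1.2"
    base = f"{MIRROR}/{major_minor}/{version}-{rc}"
    names = {
        "FE": f"apache-doris-fe-{version}-bin-{arch}.tar.xz",
        "BE": f"apache-doris-be-{version}-bin-{arch}.tar.xz",
        "DEPS": f"apache-doris-dependencies-{version}-bin-{arch}.tar.xz",
    }
    return {part: f"{base}/{name}" for part, name in names.items()}

for part, url in release_urls("1.2.3", "rc02").items():
    print(part, url)
```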
69ae14f228 [Bug](pipeline) regression heap use after free (#18701) 2023-04-16 16:22:41 +08:00
3ec52dc7da [tpch](nereids) add regression test for tpch_sf500 plan shape #18631
Add a regression test that checks the tpch_sf500 plan shape with `explain shape plan`.
2023-04-16 11:37:33 +08:00
7dd96bc341 [fix](olap) remove zorder support when create table (#18698) 2023-04-16 09:24:18 +08:00
8f0d4ae625 [Fix](fe)Upgrade hive-catalog-shade version to 1.0.3 (#18690) 2023-04-15 22:10:45 +08:00
bcff3710ca [fix] set execution timeout for brokerload and use query timeout when… (#18694)
If the execution timeout is not set, we should fall back to the query timeout so that upgrades keep working.
2023-04-15 20:41:04 +08:00
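For commit bcff3710ca above, the fallback rule can be sketched in a few lines of Python. This is a minimal illustration and makes two assumptions: an execution timeout of `None`, or one that is not positive, means "not set"; and `effective_timeout` is a hypothetical name, not the FE's actual method.

```python
from typing import Optional

def effective_timeout(execution_timeout: Optional[int], query_timeout: int) -> int:
    """Return the timeout (in seconds) to apply to a broker load job.

    Assumption: None or a non-positive value means the execution timeout was
    never set (e.g. a job created before the upgrade), so use the query timeout.
    """
    if execution_timeout is None or execution_timeout <= 0:
        return query_timeout
    return execution_timeout

print(effective_timeout(None, 300))
print(effective_timeout(600, 300))
```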
cc4778a271 [Fix](orc-reader) Check hasNulls() firstly when use notNull data in ColumnVectorBatch. #18674 2023-04-15 19:48:31 +08:00
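For commit cc4778a271 above, the point is the ordering of the checks. ORC's C++ `ColumnVectorBatch` carries a `hasNulls` flag and a `notNull` buffer, and the buffer should only be read when `hasNulls` is true. The toy Python model below shows that ordering. `ColumnBatch` and `null_map` are hypothetical stand-ins, not the Doris reader code.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnBatch:
    """Toy stand-in for ORC's ColumnVectorBatch (hypothetical)."""
    num_elements: int
    has_nulls: bool
    not_null: list[int] = field(default_factory=list)  # only valid if has_nulls

def null_map(batch: ColumnBatch) -> list[bool]:
    """Return True for each row that is NULL."""
    if not batch.has_nulls:
        # notNull may be stale or unset here, so never read it.
        return [False] * batch.num_elements
    return [not batch.not_null[i] for i in range(batch.num_elements)]
```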
d653a64fb9 [minor](thrift) modify identifier to compatible with 1.2-lts (#18641)
* [minor](thrift) modify identifier

* update
2023-04-15 17:49:09 +08:00
Pxl
975b373896 [Chore](thrift) add some check on client cache && remove some unused code && catch st… #18683 2023-04-15 17:47:51 +08:00
98b8bef05b [bugfix](inverted index) fix inverted index to support NULL value filter (#18302) 2023-04-15 13:20:26 +08:00
d2efc619b0 [Enchancement](statistics) Show histogram statistics, show specified column statistics (#18657) 2023-04-14 22:36:40 +08:00
30a783908e [test][typo](alter) enhance the suspicious rollup case for alter LSC and add experimental docs (#18612)
* improve rollup case

* add docs and mark it as experimental

* adjust the unique key in uniq-rollup-case to add flexibility
2023-04-14 21:15:04 +08:00
f7e129934e [fix](nereids) only order by slot reference could use topn opt (#18622)
select cast(k1 as INT) as id from tbl1 order by id limit 2;

is not valid for topN optimization, because `id` is
a cast expression, not a table column from the scan node.
This PR addresses this issue.
2023-04-14 20:59:06 +08:00
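For commit f7e129934e above, the eligibility rule is simple: topN optimization applies only when every ORDER BY key is a bare slot reference. The toy Python sketch below illustrates it. `SlotRef`, `Cast` and `can_use_topn_opt` are hypothetical stand-ins for Nereids expression nodes and rules, not its real classes.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class SlotRef:
    name: str  # a column produced directly by the scan node

@dataclass
class Cast:
    child: "Expr"
    target_type: str

Expr = Union[SlotRef, Cast]

def can_use_topn_opt(order_keys: list[Expr]) -> bool:
    """TopN opt is only valid when all sort keys are bare slot references."""
    return all(isinstance(key, SlotRef) for key in order_keys)

# ORDER BY id, where id = cast(k1 as INT): not eligible
print(can_use_topn_opt([Cast(SlotRef("k1"), "INT")]))  # False
print(can_use_topn_opt([SlotRef("k1")]))  # True
```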
683d64b361 [Refactor](multi catalog)Remove redundant param context for FileQueryScanNode (#18636)
Remove redundant param context for FileQueryScanNode.
Remove duplicated code for QueryScanProviders.
2023-04-14 20:20:21 +08:00
e1b3955e05 [refactor](jdbc) using jvm parameters to init jdbc datasource (#18670)
Use JVM parameters to initialize the JDBC datasource connection pool.
If you don't need to keep connections alive, you can set JDBC_MIN_POOL=0.
2023-04-14 18:45:29 +08:00
d4928c60c8 [vectorized](profile) fix pipeline profile can't get result under more instances (#18525)
When the pipeline engine is enabled and the number of instances is > 1,
all scan nodes share the scanners, so the profile of a scan node may be entirely empty.
Now all scan nodes are shown, and the info of those with `_num_scanners->value() == 0` is removed.
2023-04-14 18:20:19 +08:00
4cde3d4f21 [Enhancement](Expr) Change small fix container size of In set to 8. (#18492)
In #17976, we introduced a small fixed container to optimize the IN expression. This PR changes the small fixed container size of the IN set to 8, which, according to the perf test, performs better when size > 8.
2023-04-14 18:19:45 +08:00
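For commit 4cde3d4f21 above, the general idea of a small fixed container is this: for a handful of constants, a linear scan over a flat array is cheap, and above a size threshold a hash set takes over. The Python sketch below illustrates that idea. Only the threshold of 8 comes from the commit. The class names and selection logic are hypothetical and are not Doris's C++ implementation.

```python
SMALL_SET_MAX_SIZE = 8  # threshold chosen in this commit

class SmallFixedSet:
    """Linear-scan container for a few IN-list constants (hypothetical)."""

    def __init__(self, values):
        assert len(values) <= SMALL_SET_MAX_SIZE
        self._values = list(values)

    def contains(self, v) -> bool:
        return any(x == v for x in self._values)

class HashInSet:
    """Hash-based container used for larger IN lists (hypothetical)."""

    def __init__(self, values):
        self._values = set(values)

    def contains(self, v) -> bool:
        return v in self._values

def make_in_set(values):
    values = list(dict.fromkeys(values))  # deduplicate, keep order
    if len(values) <= SMALL_SET_MAX_SIZE:
        return SmallFixedSet(values)
    return HashInSet(values)
```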
f2d75cb492 [fix](Nereids) fix signature precision round for decimalv3 (#18639)
Add decimalv3 signatures to the following functions:
ceil
dceil
dfloor
dround
floor
round
round_bankers
truncate
Fix ComputePrecisionForRound to get the correct signature.
2023-04-14 18:18:41 +08:00
4284fc4e75 [chore] Download apache orc source code from github if git does not work in build.sh. (#18625)
* [chore] Download apache orc source code from github if git does not work in build.sh.

* add cd "${DORIS_HOME}"

* Fix blank issue.
2023-04-14 17:54:14 +08:00
5acf764d9c [fix](trino catalog) To specify both catalog and database, run the show table command (#18645)
* [fix](trino catalog) To specify both catalog and database, run the show table command

* fix
2023-04-14 17:51:50 +08:00
90f4e4feff [Fix](thrift) add SCH_BACKENDS in TSchemaTableType (#18647) 2023-04-14 17:47:25 +08:00
362b5a34ae [feat](stats) Support to delete expired stats periodically (#18614)
Support deleting expired stats both periodically and manually.

The default cleaner running interval is 2 days.

The syntax for manual cleaning is:
```sql
DROP EXPIRED STATS
```

TODO:
1. process external catalog's stats
2. run drop at the appointed time
3. sleep briefly after dropping each batch
2023-04-14 17:32:51 +08:00
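For commit 362b5a34ae above, the sketch below shows the core loop of a batched cleaner in Python. What counts as "expired" is deliberately left to a caller-supplied predicate, because the commit does not define it. `drop_expired`, `is_expired` and `drop_batch` are hypothetical names. Only the 2-day default interval comes from the commit.

```python
from typing import Callable, Iterable

CLEAN_INTERVAL_SECONDS = 2 * 24 * 60 * 60  # default cleaner interval from this commit

def drop_expired(stats_ids: Iterable[str],
                 is_expired: Callable[[str], bool],
                 drop_batch: Callable[[list[str]], None],
                 batch_size: int = 100) -> int:
    """Drop expired stats in batches and return how many were dropped.

    is_expired and drop_batch are hypothetical hooks supplied by the caller.
    """
    batch: list[str] = []
    dropped = 0
    for sid in stats_ids:
        if is_expired(sid):
            batch.append(sid)
            if len(batch) == batch_size:
                drop_batch(batch)
                dropped += len(batch)
                batch = []
    if batch:  # flush the final partial batch
        drop_batch(batch)
        dropped += len(batch)
    return dropped
```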
65f9db90c8 [feature](nereids) forbid unknown col stats #18617
Add the session variable forbid_unknown_col_stats. When this variable is true, Nereids refuses to use unknown column stats.
The main purpose of this PR is to save debugging effort.
2023-04-14 17:13:39 +08:00
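For commit 65f9db90c8 above, the intended behaviour is to fail fast instead of silently planning on unknown statistics. The Python sketch below assumes stats are looked up per column in a dict, with a missing entry meaning "unknown". `get_column_stats` and `UnknownColumnStatsError` are hypothetical; only the variable name `forbid_unknown_col_stats` comes from the commit.

```python
from typing import Optional

class UnknownColumnStatsError(Exception):
    """Raised when planning would rely on missing column statistics."""

def get_column_stats(stats: dict, column: str,
                     forbid_unknown_col_stats: bool) -> Optional[dict]:
    col_stats = stats.get(column)
    if col_stats is None and forbid_unknown_col_stats:
        # Fail loudly so a bad plan is not silently built on default estimates.
        raise UnknownColumnStatsError(f"column stats of '{column}' are unknown")
    return col_stats  # None means "unknown"; a planner would fall back to defaults
```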
5d1abe4507 [Bugfix](Mtmv)Fix mtmv meta load failed (#18605)
MTMV meta failed to load because the meta was exposed to the CI system.
2023-04-14 16:29:18 +08:00
4174d5a707 [opt](nereids) optimze aggregation estimation #18607
For `select count(*) from T group by A, B`,
suppose `ndv(A) > ndv(B)`;
the estimated row count of the aggregate is between ndv(A) and ndv(A) * ndv(B).

In the previous version, we chose the upper bound, that is, ndv(A) * ndv(B). The drawback of this choice is that the estimated row count is often bigger than the row count of T.

In this version, we choose the lower bound.
2023-04-14 16:13:25 +08:00
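For commit 4174d5a707 above, the sketch below shows how the two bounds compare. The numbers are illustrative, not taken from the commit: ndv(A) = 1000, ndv(B) = 50, and T has 20000 rows. The old upper bound, 50000 groups, exceeds the table's row count, while the lower bound is 1000. The function names are hypothetical.

```python
import math

def group_by_row_bounds(ndvs: list[float]) -> tuple[float, float]:
    """(lower, upper) bounds on the number of groups for GROUP BY over these columns."""
    return max(ndvs), math.prod(ndvs)

def estimate_group_by_rows(ndvs: list[float], use_lower_bound: bool = True) -> float:
    lower, upper = group_by_row_bounds(ndvs)
    return lower if use_lower_bound else upper

# ndv(A) = 1000, ndv(B) = 50, row_count(T) = 20000 (illustrative values)
print(estimate_group_by_rows([1000, 50], use_lower_bound=False))  # 50000: exceeds row_count(T)
print(estimate_group_by_rows([1000, 50]))  # 1000: the new estimate
```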
73e087d79c [feature](Nereids): support eager agg for Plan inside project. (#18637) 2023-04-14 15:30:33 +08:00
9634d21a28 [fix](info_db) avoid infodb query timeout when external catalog info is too large or is not reachable (#18662)
When querying tables in the information_schema database, the query may time out because:

1. an external catalog has too many tables, or
2. an external catalog is unreachable.

So I added a new FE config, infodb_support_ext_catalog.
The default is false, which means that when selecting from tables in the information_schema database,
the result will not contain information about tables in external catalogs.
2023-04-14 14:40:31 +08:00
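For commit 9634d21a28 above, the effect of the new config is a simple filter. The Python sketch below shows it. `Catalog`, `catalogs_visible_in_infodb` and the catalog name `hive_prod` are hypothetical. Only the config name `infodb_support_ext_catalog` and its default of false come from the commit.

```python
from typing import NamedTuple

class Catalog(NamedTuple):
    name: str
    is_external: bool

def catalogs_visible_in_infodb(catalogs: list[Catalog],
                               infodb_support_ext_catalog: bool = False) -> list[str]:
    """Names of catalogs whose tables information_schema will report."""
    return [c.name for c in catalogs
            if not c.is_external or infodb_support_ext_catalog]
```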
db5ec6f6b0 [FIX](thrift)Fix with 1.2 version for thrift #18658 2023-04-14 14:07:42 +08:00
4d18ea30f4 [fix](Nereids) get_json_bigint should return bigint type (#18626) 2023-04-14 14:01:44 +08:00
e009c459bf [enhancement](planner) remove date function if its child's type is date (#18593)
If we have an expression like the one below:
```
date(c1) -- c1's type is date or datev2
```
the expression's result is exactly the same as c1, so we should
remove the date function. This optimization simplifies the
expression, speeds up execution, and increases the opportunity to
push filters down to the storage layer.
2023-04-14 14:01:20 +08:00
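For commit e009c459bf above, the rewrite rule can be shown on a toy expression tree. This is a simplified Python sketch: only column leaves carry a type here. `Column`, `FuncCall` and `simplify` are hypothetical stand-ins for the planner's expression classes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    name: str
    type: str

@dataclass(frozen=True)
class FuncCall:
    name: str
    arg: object

DATE_TYPES = {"DATE", "DATEV2"}

def simplify(expr):
    """Drop date(c) when c is already DATE/DATEV2 (toy: only columns carry types)."""
    if isinstance(expr, FuncCall):
        arg = simplify(expr.arg)
        if expr.name.lower() == "date" and isinstance(arg, Column) and arg.type in DATE_TYPES:
            return arg  # date(c1) is exactly c1, so the call can be removed
        return FuncCall(expr.name, arg)
    return expr
```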
81799d614e [feature-wip](resource-group) support resource group interface in be. (#18588) 2023-04-14 14:00:49 +08:00
008ae4984b [feature](Nereids): convert rightSemi to leftSemi for matching more rule. (#18648) 2023-04-14 11:20:22 +08:00
e6b0e05840 [fix](Nerieds) Fix some bugs in binding and type coercion (#18548)
1. fix the ambiguous-slot exception during binding when the same slot is selected more than once
2. fix SetOperation being bound multiple times because of CTE
3. fix CASE WHEN clauses not being coerced to the same type
4. fix an exception when a set_var hint exists in a subquery or CTE
2023-04-14 11:00:24 +08:00
c704351273 [enhancement](memory) Refactor memory limit exceeded behavior (#18590)
The mem hook no longer checks the mem tracker limit or cancels tasks; this is now done only in Allocator. This makes memory issues easier to analyze and reduces performance loss.
PODArray, hash table, and arena memory allocations use Allocator.

Optimize the printing of mem-limit-exceeded logs.

Optimize compilation time.
2023-04-14 10:42:35 +08:00
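For commit c704351273 above, the sketch below is a toy Python model of the new split of responsibilities. Allocations go through one Allocator that checks the tracker's limit and raises, instead of a global allocation hook doing the check. `MemTracker`, `Allocator.alloc`/`free` and `MemLimitExceeded` are hypothetical names, not the BE's C++ classes.

```python
class MemLimitExceeded(Exception):
    pass

class MemTracker:
    def __init__(self, limit_bytes: int):
        self.limit = limit_bytes
        self.consumed = 0

class Allocator:
    """Single choke point where the limit is checked (not in a malloc hook)."""

    def __init__(self, tracker: MemTracker):
        self.tracker = tracker

    def alloc(self, size: int) -> bytearray:
        t = self.tracker
        if t.consumed + size > t.limit:
            raise MemLimitExceeded(
                f"alloc {size} B failed: consumed {t.consumed} B, limit {t.limit} B")
        t.consumed += size
        return bytearray(size)

    def free(self, buf: bytearray) -> None:
        self.tracker.consumed -= len(buf)
```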
8751f08d5a [bugfix](GEO)fix precision problem (#18642) 2023-04-14 10:39:19 +08:00
f422fe888c [Doc](typo) Remove redundant words #18659 2023-04-14 10:35:26 +08:00
183800e1ad [Fix](variables) fix session variable does not take effect immediately when set global variable in follower FE (#18609) 2023-04-14 10:35:03 +08:00
56d84739c1 [Opt](pipeline) opt the scanner ctx schedule in pipeline engine (#18545) 2023-04-14 09:59:03 +08:00
2294fb46a5 [refactor](minor) update scan concurrency for pipeline (#18650) 2023-04-14 09:45:12 +08:00
dedcfd7c28 [Doc] (Show) add doc for show create repository statement (#18542) 2023-04-14 09:44:54 +08:00
cc24e2ae13 [doc](readme)add Backend C++ Coding Specification (#18649) 2023-04-14 09:37:18 +08:00
72236d2b08 [typo](docs) add row to column doc (#18546)
2023-04-14 09:04:55 +08:00
ca891d880f [fix](es) ClassCastException when getting root schema (#18438)
2023-04-14 09:04:09 +08:00
b6b4408283 [fix](meta) void NPE when save meta (#18600)
Introduced by #16878:
the newly added string field cannot be null, or an NPE will be thrown when calling `Text.writeString()`.
2023-04-14 08:52:09 +08:00
d28030e1e5 [chore](third-party) Configure the search paths for pkg-config and cmake (#18624)
Currently, our third-party libraries are built with autotools or CMake. In some scenarios, system-wide headers or libraries may be picked up when building them, which can make the build fail.

We can configure the search paths explicitly to help autotools and cmake find the right dependencies.
2023-04-14 08:43:27 +08:00
b39846c2c7 [Fix](Catalog)Delete duplicate defined dependencies to avoid class loading exceptions (#18628)
`iceberg-hive-metastore` and `hive-storage-api` have been defined in hive-catalog-shade,
and some classes in the shade have been renamed, so we cannot declare them again.
The classes in the shade should be kept.

The `hive-metastore-api` used by `ranger` can also use the jar in the `shade`.
Since we renamed the utility classes used inside `hive`, this has no effect.
2023-04-13 22:12:19 +08:00
1d3699a70c [refactor](jdbc) refactor jdbc connection num in datasource (#18563)
JDBC may currently open too many connections without releasing them,
so change the datasource properties to: init = 1, min = 1, max = 100, and an idle time of 10 minutes.
2023-04-13 22:08:08 +08:00
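For commit 1d3699a70c above, the new pool settings can be written as a small config object with a sanity check. The field names below are hypothetical, since the commit does not name the underlying datasource properties. Only the values (init 1, min 1, max 100, 10-minute idle time) come from the commit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class JdbcPoolConfig:
    """Hypothetical field names; values are the defaults from this commit."""
    initial_size: int = 1
    min_idle: int = 1
    max_active: int = 100
    idle_timeout_ms: int = 10 * 60 * 1000  # 10 minutes

    def validate(self) -> None:
        if not 0 <= self.min_idle <= self.max_active:
            raise ValueError("min_idle must be between 0 and max_active")
        if not 0 <= self.initial_size <= self.max_active:
            raise ValueError("initial_size must be between 0 and max_active")
        if self.idle_timeout_ms <= 0:
            raise ValueError("idle_timeout_ms must be positive")
```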
6c0af24e9d [Improve](simdjson reader) support UTF-8 unicode (with BOM) (#18585) 2023-04-13 21:58:44 +08:00
281ceee3cc [feature-wip](resource-group) Support resource group tvf (#18519)
related: #18098
2023-04-13 20:11:20 +08:00