Currently, the release binaries of Doris are split into three packages: https://doris.apache.org/download
Users may be confused about how to download and deploy these packages.
So I provide a download script for each release; users can simply download the script and run it, like:
```
> sh download_x64_apache.sh
Begin to download FE from "https://mirrors.tuna.tsinghua.edu.cn/apache/doris/1.2/1.2.3-rc02/apache-doris-fe-1.2.3-bin-x86_64.tar.xz" to "apache-doris-1.2.3-bin/" ...
Total size: 408078012 Bytes
#################################################### 100.0%
Begin to download BE from "https://mirrors.tuna.tsinghua.edu.cn/apache/doris/1.2/1.2.3-rc02/apache-doris-be-1.2.3-bin-x86_64.tar.xz" to "apache-doris-1.2.3-bin/" ...
Total size: 606211324 Bytes
#################################################### 100.0%
Begin to download DEPS from "https://mirrors.tuna.tsinghua.edu.cn/apache/doris/1.2/1.2.3-rc02/apache-doris-dependencies-1.2.3-bin-x86_64.tar.xz" to "apache-doris-1.2.3-bin/" ...
Total size: 253869148 Bytes
#################################################### 100.0%
Begin to assemble the binaries ...
Move java-udf-jar-with-dependencies.jar to be/lib/ ...
Download complete!
You can now deploy Apache Doris from apache-doris-1.2.3-bin/
```
The script will do the rest.
This script will later be published on the Download page of the Apache Doris website, so that users can easily get it and use it.
Currently it supports only the Linux platform; other platforms are untested.
`select cast(k1 as INT) as id from tbl1 order by id limit 2;`
is not valid for topN optimization, because `id` is a cast expr, not a table column from the scan node.
This PR addresses this issue.
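For contrast, a hypothetical query on the same table that orders by a plain column keeps the optimization:
```sql
-- Ordering by the original column keeps the topN optimization applicable,
-- because the sort key maps directly to a column produced by the scan node.
select k1 as id from tbl1 order by id limit 2;
```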
When the pipeline engine is enabled and the number of instances is set to more than 1,
all scan nodes share the scanners, so the profiles of some scan nodes may be completely empty.
Now all scan nodes are shown in the profile, and some info is removed for scan nodes whose `_num_scanners->value() == 0`.
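A minimal sketch of the setup described above; the session variable names `enable_pipeline_engine` and `parallel_fragment_exec_instance_num` are assumed here:
```sql
-- Assumed session variable names: enable the pipeline engine and
-- run each fragment with more than one instance.
SET enable_pipeline_engine = true;
SET parallel_fragment_exec_instance_num = 2;
```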
In #17976, we introduced a small fixed-size container to optimize the IN expression. This PR changes the small fixed-size container size of the IN set to 8, which shows better performance when the size is greater than 8 according to the perf test.
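For reference, the optimization targets IN predicates like the following hypothetical one, where the number of values is compared against the container-size threshold mentioned above:
```sql
-- Hypothetical IN predicate with more than 8 values, the threshold discussed above.
select * from tbl1 where k1 in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
```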
Support deleting expired stats periodically and manually.
The default cleaner running interval is 2 days.
The manual clean syntax is
```sql
DROP EXPIRED STATS
```
TODO:
1. process external catalog's stats
2. run drop at the appointed time
3. sleep a short time after dropping one batch
Add the session variable forbid_unknown_col_stats. When this variable is true, Nereids refuses to use unknown column stats.
The main purpose of this PR is to save debug effort.
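A minimal usage sketch:
```sql
-- Make the Nereids planner reject plans that would rely on unknown column statistics.
SET forbid_unknown_col_stats = true;
```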
`select count(*) from T group by A, B`
suppose `ndv(A) > ndv(B)`
The estimated row count of the aggregate is between ndv(A) and ndv(A) * ndv(B).
In the previous version, we chose the upper bound, that is, ndv(A) * ndv(B). The drawback of this choice is that the estimated row count is often bigger than the row count of T.
In this version, we choose the lower bound.
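For illustration, with hypothetical statistics:
```sql
-- Hypothetical stats: T has 800 rows, ndv(A) = 100, ndv(B) = 10.
-- The estimate lies between ndv(A) = 100 and ndv(A) * ndv(B) = 1000.
-- Previous behavior: pick the upper bound, 1000 rows, which exceeds the 800 rows of T.
-- New behavior: pick the lower bound, 100 rows.
select count(*) from T group by A, B;
```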
When querying tables in the information_schema database, the query may time out because:
1. there are external catalogs with too many tables;
2. the external catalog is unreachable.
So I add a new FE config infodb_support_ext_catalog.
The default is false, which means that when selecting from tables in the information_schema database,
the result will not contain information about tables in external catalogs.
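A sketch of turning the config on, assuming it can be changed at runtime via the standard FE config statement (otherwise it would be set in fe.conf):
```sql
-- Assumption: infodb_support_ext_catalog is mutable at runtime.
ADMIN SET FRONTEND CONFIG ("infodb_support_ext_catalog" = "true");
-- With the default (false), this query does not include tables from external catalogs:
select table_schema, table_name from information_schema.tables;
```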
If we have an expression like the one below
```
date(c1) -- c1's type is date or datev2
```
the expr's result is exactly the same as c1, so we should remove the date function. This expression optimization simplifies the expr, speeds up execution, and increases the opportunity to push filters down to the storage layer.
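For example, a filter on a hypothetical table benefits from this rewrite:
```sql
-- Before the rewrite: the date() call wraps the column.
select * from tbl1 where date(c1) = '2023-01-01';
-- After the rewrite (c1 is DATE or DATEV2): the function is removed,
-- so the predicate can be pushed down to the storage layer.
select * from tbl1 where c1 = '2023-01-01';
```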
1. fix the ambiguous-slots bind exception caused by selecting the same slots
2. fix binding a SetOperation multiple times because of CTE
3. fix CASE WHEN clauses not being coerced to the same type
4. fix an exception when a set_var hint exists in a subquery or CTE
Do not check the mem tracker limit and do not cancel tasks in the mem hook; do this only in Allocator. This helps in clearer analysis of memory issues and reduces performance loss.
PODArray/hash table/arena memory allocation will use Allocator.
Optimize mem limit exceeded log printing
Optimize compilation time
Currently, our third-party libraries are built with autotools or CMake. In some scenarios, system-wide headers or libraries may be used to build them, which can make the build process fail.
We can configure the search paths explicitly to help autotools and CMake find the right dependencies.
`iceberg-hive-metastore` and `hive-storage-api` have been defined in hive-catalog-shade,
and some classes in the shade have been renamed, so we cannot declare them again.
The classes in the shade should be kept.
The `hive-metastore-api` used in `ranger` can also use the jar in the `shade`.
Since we renamed the tool class used inside `hive`, this has no effect.
Currently JDBC may have a problem where there are too many connections and they are not released,
so this PR changes the datasource properties: init = 1, min = 1, max = 100, and an idle time of 10 minutes.