1. Set the replica count for the stats table to `Math.max(Config.statistic_internal_table_replica_num, Config.min_replication_num_per_tablet)`.
2. Update the comment for the stats table: remove the `'` symbol.
This commit introduces a transformation for SQL queries that contain multiple distinct aggregate functions: when more than one such function appears, they are converted into multi_distinct functions for more efficient handling.
Example:
```
SELECT COUNT(DISTINCT c1), SUM(DISTINCT c2) FROM tbl GROUP BY c3
-- Transformed to
SELECT MULTI_DISTINCT_COUNT(c1), MULTI_DISTINCT_SUM(c2) FROM tbl GROUP BY c3
```
The following functions can be transformed:
- COUNT
- SUM
- AVG
- GROUP_CONCAT
If any unsupported functions are encountered, an error is now reported during the optimization phase.
To ensure the absence of such cases, a final check has been implemented after the rewriting phase.
We used to convert the input parameters to double for the functions ceil, floor, and round,
because DecimalV2 could not perform these operations. Since we introduced DecimalV3,
we should convert all parameters to DecimalV3 to get the correct result.
For example, when we use double as parameters, we get a wrong result:
```sql
select round(341/20000,4),341/20000,round(0.01705,4);
+-------------------------+---------------+-------------------+
| round((341 / 20000), 4) | (341 / 20000) | round(0.01705, 4) |
+-------------------------+---------------+-------------------+
|                   0.017 |       0.01705 |            0.0171 |
+-------------------------+---------------+-------------------+
```
DecimalV3 gets the correct result:
```sql
select round(341/20000,4),341/20000,round(0.01705,4);
+-------------------------+---------------+-------------------+
| round((341 / 20000), 4) | (341 / 20000) | round(0.01705, 4) |
+-------------------------+---------------+-------------------+
|                  0.0171 |       0.01705 |            0.0171 |
+-------------------------+---------------+-------------------+
```
According to the implementation in the execution engine, all order keys
in SortNode are output, so we must normalize LogicalSort to match.
We push down all non-slot order keys in the sort and materialize them
in a projection below the sort. As a result, every order key is a slot,
and SortNode does not need to do any projection itself.
This simplifies the translation of SortNode by avoiding the generation of
resolvedTupleExprs and sortTupleDesc.
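A minimal sketch of the normalization, assuming a hypothetical table `t(a INT)`:
```sql
-- Before normalization: the order key a + 1 is an expression, not a slot.
SELECT a FROM t ORDER BY a + 1;
-- After normalization, the plan conceptually becomes:
--   LogicalSort(orderKey = c)
--     LogicalProject(a, a + 1 AS c)
--       LogicalOlapScan(t)
-- Every order key of the sort is now a slot, so SortNode outputs slots directly.
```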
A DEFAULT value in the first cell of a VALUES clause raised a cast exception. We now filter it when checking the types of the values in INSERT: when the literal is a string and its value is the special default-value marker string, we skip the type check.
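A minimal reproduction sketch, assuming a hypothetical table `t(k INT DEFAULT '0', v INT)`:
```sql
-- The DEFAULT in the first cell used to raise a cast exception during
-- type checking; it is now recognized by its marker string and skipped.
INSERT INTO t VALUES (DEFAULT, 2);
```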
columnStatistics.minExpr and maxExpr are useful when we derive stats for the cast function.
This PR:
1. maintains the min/max expr during stats derivation for the filter conditions col < literal, col > literal, and col = literal
2. adjusts the column stats range for the cast function (currently only cast from string to other types is supported; see the sketch below)
ds9's plan is changed, but there is no performance issue: on tpcds_sf100_rf the execution time is 1.5~1.6 sec, the same as master.
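A hedged illustration, assuming a hypothetical string column `dt` holding dates:
```sql
-- With minExpr/maxExpr maintained (e.g. from predicates such as
-- dt > '2023-01-01'), the range of CAST(dt AS DATE) can now be derived
-- instead of being treated as unknown.
SELECT COUNT(*) FROM tbl WHERE CAST(dt AS DATE) > '2023-06-01';
```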
The exception may be thrown before LOG is initialized,
for example on a wrong config value. So we need to print it to fe.out,
otherwise we can't know what's wrong.
After this PR, the error can be found in fe.out, such as:
```
java.lang.NumberFormatException: For input string: "3g"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.apache.doris.common.ConfigBase.setConfigField(ConfigBase.java:253)
at org.apache.doris.common.ConfigBase.setFields(ConfigBase.java:232)
at org.apache.doris.common.ConfigBase.initConf(ConfigBase.java:146)
at org.apache.doris.common.ConfigBase.init(ConfigBase.java:112)
at org.apache.doris.DorisFE.start(DorisFE.java:101)
at org.apache.doris.DorisFE.main(DorisFE.java:73)
```
We should define the metric name only once, like the following:
```
# HELP doris_fe_query_latency_ms
# TYPE doris_fe_query_latency_ms summary
doris_fe_query_latency_ms{quantile="0.75"} 1.0
doris_fe_query_latency_ms{quantile="0.95"} 2.0
doris_fe_query_latency_ms{quantile="0.98"} 100.0
doris_fe_query_latency_ms{quantile="0.99"} 100.0
doris_fe_query_latency_ms{quantile="0.999"} 100.0
doris_fe_query_latency_ms{quantile="0.75",user="default_cluster:test1"} 1.0
doris_fe_query_latency_ms{quantile="0.95",user="default_cluster:test1"} 1.0
doris_fe_query_latency_ms{quantile="0.98",user="default_cluster:test1"} 1.0
doris_fe_query_latency_ms{quantile="0.99",user="default_cluster:test1"} 1.0
doris_fe_query_latency_ms{quantile="0.999",user="default_cluster:test1"} 1.0
```
JdbcScanNode needs the conjuncts to generate the SQL in its finalize function, but the conjuncts had not been passed to JdbcScanNode yet when finalize was called. This PR passes the conjuncts to the scan node before they are used, to avoid scanning the whole table.
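A sketch of the effect, assuming a hypothetical JDBC external table `jdbc_tbl` with a column `id`:
```sql
-- Doris query over the JDBC external table:
SELECT * FROM jdbc_tbl WHERE id = 1;
-- SQL generated for the external database before the fix (conjuncts missing,
-- whole table scanned):
--   SELECT ... FROM jdbc_tbl
-- SQL generated after the fix (conjuncts passed before finalize):
--   SELECT ... FROM jdbc_tbl WHERE id = 1
```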
When useBulkCopyForBatchInsert=false, the JDBC driver will not use SQL Server's Bulk Copy API for batch insertions. Thus, during the batch insertion process, each insert statement needs to be individually sent to the SQL Server, leading to a higher number of network roundtrips. Network latency could potentially become a significant factor contributing to performance degradation. For this reason, we recommend setting this parameter to true by default to enhance the performance of PreparedStatement batch insertions.
In this manner, when performing batch insertions, the JDBC driver will send all insertion data to SQL Server in one go via the Bulk Copy API, rather than sending each insert statement individually. This can significantly reduce the number of network roundtrips, thereby improving performance.
Please note that this option is only effective for fully parameterized INSERT statements. If your INSERT statement is mixed with other SQL statements, or if it contains values specified directly in the statement, then the JDBC driver will not use the Bulk Copy API, but instead will use the standard insert method.
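A hedged example of setting the flag in a Doris JDBC catalog definition (the catalog name and connection details are hypothetical; useBulkCopyForBatchInsert is a standard SQL Server JDBC URL property):
```sql
-- useBulkCopyForBatchInsert=true in the jdbc_url makes the driver use the
-- Bulk Copy API for fully parameterized batch INSERTs.
CREATE CATALOG sqlserver_catalog PROPERTIES (
    "type" = "jdbc",
    "user" = "sa",
    "password" = "******",
    "jdbc_url" = "jdbc:sqlserver://127.0.0.1:1433;databaseName=demo;useBulkCopyForBatchInsert=true",
    "driver_url" = "mssql-jdbc-11.2.3.jre8.jar",
    "driver_class" = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
);
```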
Nereids LogicalCatalogRelation and PhysicalCatalogRelation's getDatabase function only searches InternalCatalog to find a table. This causes all queries on external tables to fail, because the external database cannot be found in the internal catalog.
```
mysql> explain select count(*) from multi_partition_orc;
ERROR 1105 (HY000): AnalysisException, msg: Database [default_cluster:multi_partition] does not exist.
```
This PR uses the catalog name to find the correct catalog first, and then tries to get the database from that catalog.
The current runtime filter can't be pushed down into complicated plan patterns, such as a set operation as a join child, or a CTE sender as a filter target before shuffling. This PR refines the push-down ability so the filter can be pushed into different plan tree layers recursively, such as nested subqueries, set ops, CTE senders, etc.
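A hedged sketch of a plan shape the filter can now reach (tables t1, t2, t3 are hypothetical):
```sql
-- The runtime filter built from the join key t3.k can now be pushed through
-- the UNION ALL into both branches, instead of stopping at the set operation.
SELECT u.k
FROM (SELECT k FROM t1 UNION ALL SELECT k FROM t2) u
JOIN t3 ON u.k = t3.k;
```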
Add a new FE config `force_olap_table_replication_num`.
If this config is larger than 0, the replication num of a table will be forced to this value
when creating the table.
The default is 0, which has no effect.
This config only affects the create OLAP table operation; other operations such as `add partition` and
`modify table properties` are not affected.
The motivation for this config is that most regression test cases create tables with a single replica,
which makes the regression tests run well in the p0 and p1 pipelines.
But we also need to run these cases on a multi-backend Doris cluster, so we need the test cases to use multiple replicas.
It is hard to modify every test case, so I added this config so that we can simply set it to create all tables with the
specified replication number.
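A usage sketch (assuming the config is mutable at runtime; it can also be set in fe.conf):
```sql
-- Force every newly created OLAP table to use 3 replicas, regardless of the
-- replication_num specified in its CREATE TABLE properties.
ADMIN SET FRONTEND CONFIG ("force_olap_table_replication_num" = "3");
```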
Implement the CD-A algorithm in order to support other (non-inner) joins in DPHyper.
The algorithm details are in the paper "On the Correct and Complete Enumeration of the Core Search Space".
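For example, a hedged sketch of a query shape this enables (tables are hypothetical):
```sql
-- Mixing non-inner joins used to fall outside DPHyper's enumeration;
-- with CD-A the core search space for this shape can be enumerated.
SELECT *
FROM t1
LEFT JOIN t2 ON t1.k = t2.k
JOIN t3 ON t2.k = t3.k;
```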
1. The markConstantConjunct method shouldn't change the input conjunct.
2. Use Expr's comeFrom method to check whether the pushed expr is one of the group-by exprs; this is the correct way to check whether the conjunct can be pushed down through the agg node (see the sketch after this list).
3. migrateConstantConjuncts should substitute the conjuncts using inlineViewRef's analyzer, so that the analyzer recognizes the columns in the conjuncts in the following analyze phase.
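A hedged sketch of the check in item 2, assuming a hypothetical table `t(c2 INT, c3 INT)`:
```sql
-- The conjunct c3 > 10 comes from a group-by expr, so it can be pushed
-- below the aggregation; a conjunct on SUM(c2) could not be pushed down.
SELECT c3, SUM(c2) FROM t GROUP BY c3 HAVING c3 > 10;
```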
The following operations update this time:
1. When a catalog is refreshed, the catalog's lastUpdateTime is updated.
2. When a db is refreshed, the db's lastUpdateTime is updated.
3. When a table's schema is reloaded into the cache, the table's lastUpdateTime is updated.
4. When an add/drop table event is received, the db's lastUpdateTime is updated.
5. When an alter table event is received, the table's lastUpdateTime is updated.
Failed to run SQL:
```sql
create alias function f1(int) with parameter(n) as dayofweek(hours_add('2023-06-18', n))
create alias function f2(int) with parameter(n) as dayofweek(hours_add(makedate(year('2023-06-18'), f1(3)), n))
select f2(f1(3))
```
it will throw an exception: f1 is not a builtin-function,
because f2's original function contains f1, and f1 is not a builtin function, so it should be rewritten first.
We should avoid this case for now; we will support it later.
1. Check the MinIO region: set a default region if the user does not provide one, and surface the MinIO error msg.
2. Support reading the root path, e.g. s3://bucket1.
3. Fix MaxCompute public access.
The getTable function in CascadesContext only handles the internal catalog case (it tries to find the table only in the
internal catalog and its dbs). However, it should take all the external catalogs into consideration; otherwise, it fails to find a
table or gets the wrong table when querying an external table. This PR fixes this bug.
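A hedged sketch of a query pattern that hit this bug (catalog, db, and table names are hypothetical):
```sql
-- Resolving hive_db.hive_tbl used to search only the internal catalog,
-- so the table could not be found (or the wrong table was returned).
SWITCH hive_catalog;
SELECT COUNT(*) FROM hive_db.hive_tbl;
```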
Add more profile information for external table plan time, including init and finalize scan node time, getSplits time, create scan range time, get all partitions time, and get all files for all partitions time. Also adjusted the indentation to make it easier to read.
This is an example output of the new profile summary.
```
Execution Summary:
  - Analysis Time: 3ms
  - Plan Time: 26s885ms
    - JoinReorder Time: N/A
    - CreateSingleNode Time: N/A
    - QueryDistributed Time: N/A
    - Init Scan Node Time: 1ms
    - Finalize Scan Node Time: 26s868ms
      - Get Splits Time: 26s554ms
        - Get PARTITIONS Time: 20s189ms
        - Get PARTITION FILES Time: 6s289ms
      - Create Scan Range Time: 314ms
  - Schedule Time: 1s67ms
  - Fetch Result Time: 56ms
  - Write Result Time: 0ms
  - Wait and Fetch Result Time: 57ms
```
When altering a catalog, we did not verify its new parameters; this PR adds that verification.
My changes:
When altering a catalog, a full inspection of the new parameters is carried out, and if an exception occurs, the parameters are rolled back.
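A hedged example of the statement now being validated (catalog name and property are hypothetical):
```sql
-- Invalid new properties are now detected during ALTER, and the catalog's
-- previous parameters are restored (rolled back) on exception.
ALTER CATALOG hive_catalog SET PROPERTIES ("hive.metastore.uris" = "thrift://bad-host:9083");
```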
We should not remove any limit from an uncorrelated subquery. For example:
```sql
-- should return nothing, but returns all tuples of t if we remove the limit from EXISTS
SELECT * FROM t WHERE EXISTS (SELECT * FROM t LIMIT 0);
-- should return the tuple with the smallest c1 in t,
-- but reports an error if we remove the limit from the scalar subquery
SELECT * FROM t WHERE c1 = (SELECT c1 FROM t ORDER BY c1 LIMIT 1);
```