Commit Graph

5302 Commits

Author SHA1 Message Date
2e20ff8cab [feature](metric) Support collect query counter and error query counter metric in user level (#22125)
1. Support collecting the query counter and error query counter metrics at user level.
2. Add back the sum and count of the histogram metric, which were mistakenly deleted in PR #22045.
2023-07-25 11:16:38 +08:00
3c58e9bac9 [Fix](Nereids) Fix problem of infer predicates not completely (#22145)
Problem:
When inferring predicates in Nereids, newly inferred predicates cannot become sources for the next round. For example:

create table tt1(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1');
create table tt2(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1');
create table tt3(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1');
explain select * from tt1 left join tt2 on tt1.c1 = tt2.c1 left join tt3 on tt2.c1 = tt3.c1 where tt1.c1 = 123;

We expect to infer tt3.c1 = 123, but we only get tt2.c1 = 123. This is because, when inferring from tt1.c1 = 123 and tt2.c1 = tt3.c1,
no relationship between these two predicates can be found.

Solution:
Cache intermediate results such as tt2.c1 = 123 in the example, and use them as source predicates in the next round.
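A minimal sketch of the fixed-point idea, assuming equalities are modeled as pairs of column names and constant predicates as a column-to-value map. The `PredicateInference` class is hypothetical and ignores outer-join nullability rules, so it is not the Nereids rule itself: each inferred `col = constant` is fed back as a source until no new predicate appears.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each equality is a pair of column names,
// constants map a column name to the value it is known to equal.
final class PredicateInference {
    /**
     * Propagates column = constant predicates across column = column
     * equalities until a fixed point is reached. Newly inferred constants
     * are reused as sources in the next round (the cached middle results).
     */
    static Map<String, Long> infer(List<String[]> equalities, Map<String, Long> constants) {
        Map<String, Long> known = new HashMap<>(constants);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] eq : equalities) {
                String left = eq[0];
                String right = eq[1];
                if (known.containsKey(left) && !known.containsKey(right)) {
                    known.put(right, known.get(left));
                    changed = true;
                } else if (known.containsKey(right) && !known.containsKey(left)) {
                    known.put(left, known.get(right));
                    changed = true;
                }
            }
        }
        return known;
    }
}
```

With the equalities `tt2.c1 = tt3.c1` and `tt1.c1 = tt2.c1` (in that order) and the source `tt1.c1 = 123`, the first round only yields `tt2.c1 = 123`; the second round reuses it to reach `tt3.c1 = 123`.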
2023-07-25 10:05:00 +08:00
fc67929e34 [improvement](catalog) optimize ldap and support more character in user and table name (#21968)
- Common names support `-`, because MySQL database names support `-`.
- Table names support `-`.
- User names support `.`, because LDAP user names support `.`.
- Add LDAP documentation.
- LDAP supports RBAC.
2023-07-24 22:04:37 +08:00
7fcf702081 [improvement](multi catalog)paimon support filesystem metastore (#21910)
1. Support filesystem metastore.

2. Support predicates and projection when generating splits.

3. Fix a query error on partitioned tables.

TODO: for now, paimon-s3-0.4.0-incubating.jar must be put manually into be/lib/java_extensions when using the S3 filesystem.

Doc PR: #21966
2023-07-24 22:02:57 +08:00
82bdcb3da8 [fix](Nereids) translate partition topn order key on wrong tuple (#22168)
The partition key should be on the child tuple, and the sort key should be on the partition TopN's tuple.
2023-07-24 20:46:27 +08:00
2d52d8d926 [opt](stats) Update stats table config and comment (#22070)
1. Set the replica count of the stats table to `Math.max(Config.statistic_internal_table_replica_num, Config.min_replication_num_per_tablet)`.
2. Update the comments of the stats table to remove the `'` symbol.
2023-07-24 20:43:55 +08:00
0677b261b5 [fix](Nereids) should not process prepare command by Nereids (#22167) 2023-07-24 20:11:40 +08:00
0205f540ac [enhancement](config) Enlarge broker scanner bytes conf to 500G, 5G is still not enough (#22126) 2023-07-24 19:49:39 +08:00
cf30ea914a [fix](Nereids) forbid gather sort with explicit shuffle (#22153)
Gather sort with an explicit shuffle is usually bad, so forbid it.
2023-07-24 19:45:18 +08:00
3ba3690f93 [Fix](Http-API)Check and replace user sensitive characters (#22148) 2023-07-24 18:21:42 +08:00
68bd4a1a96 [opt](Nereids) check multiple distinct functions that cannot be transformed into multi_distinct (#21626)
This commit introduces a transformation for SQL queries that contain multiple distinct aggregate functions. When the number of distinct values processed by these functions is greater than 1, they are converted into multi_distinct functions for more efficient handling.

Example:
```
SELECT COUNT(DISTINCT c1), SUM(DISTINCT c2) FROM tbl GROUP BY c3
-- Transformed to
SELECT MULTI_DISTINCT_COUNT(c1), MULTI_DISTINCT_SUM(c2) FROM tbl GROUP BY c3
```

The following functions can be transformed:
- COUNT
- SUM
- AVG
- GROUP_CONCAT

If any unsupported functions are encountered, an error is now reported during the optimization phase.

To ensure the absence of such cases, a final check has been implemented after the rewriting phase.
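The check can be sketched as follows, assuming the rewrite applies when a query block has more than one DISTINCT aggregate and that each supported function maps to a `MULTI_DISTINCT_`-prefixed name as in the example above. The `MultiDistinctCheck` class and its exception message are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class MultiDistinctCheck {
    // Functions this commit message lists as transformable.
    private static final Set<String> SUPPORTED = Set.of("COUNT", "SUM", "AVG", "GROUP_CONCAT");

    /** Rewrites DISTINCT aggregates to MULTI_DISTINCT_* when there is more than one. */
    static List<String> rewrite(List<String> distinctFunctions) {
        List<String> out = new ArrayList<>();
        if (distinctFunctions.size() <= 1) {
            // A single DISTINCT aggregate is left unchanged in this sketch.
            out.addAll(distinctFunctions);
            return out;
        }
        for (String fn : distinctFunctions) {
            String upper = fn.toUpperCase(Locale.ROOT);
            if (!SUPPORTED.contains(upper)) {
                // Reported during optimization instead of failing later.
                throw new IllegalStateException(upper + " cannot be transformed into multi_distinct");
            }
            out.add("MULTI_DISTINCT_" + upper);
        }
        return out;
    }
}
```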
2023-07-24 16:34:17 +08:00
21deb57a4d [fix](Nereids) remove double signature of ceil, floor and round (#22134)
We converted the input parameters of ceil, floor and round to double,
because DecimalV2 could not perform these operations. Since DecimalV3 has been introduced,
all parameters should be converted to DecimalV3 to get correct results.
For example, using double parameters gives a wrong result:
```sql
select round(341/20000,4),341/20000,round(0.01705,4);
+-------------------------+---------------+-------------------+
| round((341 / 20000), 4) | (341 / 20000) | round(0.01705, 4) |
+-------------------------+---------------+-------------------+
| 0.017                   | 0.01705       | 0.0171            |
+-------------------------+---------------+-------------------+
```
DecimalV3 gets the correct result:
```sql
select round(341/20000,4),341/20000,round(0.01705,4);
+-------------------------+---------------+-------------------+
| round((341 / 20000), 4) | (341 / 20000) | round(0.01705, 4) |
+-------------------------+---------------+-------------------+
| 0.0171                  | 0.01705       | 0.0171            |
+-------------------------+---------------+-------------------+
```
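The difference can be reproduced with Java's `BigDecimal`, used here only to illustrate binary versus decimal arithmetic (it is not Doris's implementation; the `RoundComparison` class is hypothetical). `341/20000` computed as a double is stored as a binary value slightly below 0.01705, so rounding half-up to four places yields 0.0170, while exact decimal division keeps 0.01705 and rounds to 0.0171.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class RoundComparison {
    /** Rounds the quotient after computing it in binary floating point. */
    static BigDecimal roundViaDouble(long numerator, long denominator, int scale) {
        double quotient = (double) numerator / denominator;
        // new BigDecimal(double) keeps the exact binary value (0.017049999... here).
        return new BigDecimal(quotient).setScale(scale, RoundingMode.HALF_UP);
    }

    /** Rounds the quotient after computing it exactly in decimal. */
    static BigDecimal roundViaDecimal(long numerator, long denominator, int scale) {
        // divide() without a MathContext is exact; it throws if the expansion does not terminate.
        BigDecimal quotient = BigDecimal.valueOf(numerator).divide(BigDecimal.valueOf(denominator));
        return quotient.setScale(scale, RoundingMode.HALF_UP);
    }
}
```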
2023-07-24 16:08:00 +08:00
ac9480123c [refactor](Nereids) push down all non-slot order key in sort and prune them upper sort (#22034)
According to the implementation in the execution engine, all order keys
in SortNode are output, so LogicalSort must be normalized
accordingly.
We push down all non-slot order keys in the sort and materialize them
below the sort. As a result, every order key is a slot, and SortNode
does not need to do any projection itself.
This simplifies the translation of SortNode by avoiding generating
resolvedTupleExprs and sortTupleDesc.
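A simplified sketch of the normalization, modeling slots as plain identifiers and any other order key as an expression string. The `SortKeyNormalizer` class and the `__sort_key_N` alias naming are hypothetical: non-slot keys are materialized by a projection below the sort, so the sort only references slots.

```java
import java.util.ArrayList;
import java.util.List;

final class SortKeyNormalizer {
    /** Projection to place below the sort, plus the slot-only sort keys. */
    static final class Normalized {
        final List<String> projectBelowSort;
        final List<String> sortKeys;

        Normalized(List<String> projectBelowSort, List<String> sortKeys) {
            this.projectBelowSort = projectBelowSort;
            this.sortKeys = sortKeys;
        }
    }

    static Normalized normalize(List<String> childSlots, List<String> orderKeys) {
        List<String> project = new ArrayList<>(childSlots);
        List<String> sortKeys = new ArrayList<>();
        int nextAlias = 0;
        for (String key : orderKeys) {
            if (childSlots.contains(key)) {
                // Already a slot: sort on it directly.
                sortKeys.add(key);
            } else {
                // Non-slot key: compute it below the sort under a fresh alias.
                String alias = "__sort_key_" + nextAlias++;
                project.add(key + " AS " + alias);
                sortKeys.add(alias);
            }
        }
        return new Normalized(project, sortKeys);
    }
}
```

The extra aliases are not part of the query's output, which is why the commit also prunes them with a projection above the sort.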
2023-07-24 15:36:33 +08:00
667e4ea99b [Fix](binlog) Fix bugs in tombstone (#22031) 2023-07-24 14:33:16 +08:00
b5f27b5349 [enhance](nereids) enable wf partition topn by default (#21860) 2023-07-24 14:21:45 +08:00
66fa1bef6d [refactor](Nereids): avoid useless groupByColStats Map (#22000) 2023-07-24 12:13:52 +08:00
ea35437c44 [Fix](Nereids)fix insert into default value exception (#21924)
A default value in the first cell of VALUES raised a cast exception. We now filter it when checking the types of the values in an insert: when the literal is a string whose value is the specific default-value string, the type check is skipped.
2023-07-24 12:08:43 +08:00
e141409171 [Fix](planner) fix rewritten alias function's original function is not analyzed again (#21497)
fn is null because the alias function's original function is not analyzed again after rewriting; we fix it by adding an analysis phase.
2023-07-24 11:40:00 +08:00
138e6c2f01 [stats](nereids)keep min/max expr in colstats (#22064)
columnStatistics.minExpr and maxExpr are useful when deriving stats for the cast function.
This PR:
1. maintains the min/max expr during stats derivation for the filter conditions col < literal, col > literal and col = literal;
2. adjusts the column stats range for the cast function (currently only casts from string to other types are supported).

The plan of ds9 changes, but there is no performance regression: on tpcds_sf100_rf its execution time is 1.5~1.6 s, the same as on master.
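A minimal sketch of how the three filter conditions narrow a column's range while remembering which literal produced each bound, assuming bounds are kept as doubles next to the literal text. The `ColumnRange` class is hypothetical and is not Doris's column statistics class.

```java
final class ColumnRange {
    final double min;
    final double max;
    // Literal text that produced each bound; null when the bound did not come from a literal.
    final String minExpr;
    final String maxExpr;

    ColumnRange(double min, double max, String minExpr, String maxExpr) {
        this.min = min;
        this.max = max;
        this.minExpr = minExpr;
        this.maxExpr = maxExpr;
    }

    /** col < literal: only the upper bound can shrink. */
    ColumnRange lessThan(double value, String literal) {
        return value < max ? new ColumnRange(min, value, minExpr, literal) : this;
    }

    /** col > literal: only the lower bound can grow. */
    ColumnRange greaterThan(double value, String literal) {
        return value > min ? new ColumnRange(value, max, literal, maxExpr) : this;
    }

    /** col = literal: both bounds collapse to the literal. */
    ColumnRange equalTo(double value, String literal) {
        return new ColumnRange(value, value, literal, literal);
    }
}
```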
2023-07-24 10:28:36 +08:00
c78341b728 [improvement](spark-load) support datev2 and datetimev2 #21839 2023-07-24 09:07:53 +08:00
ff9811fa1b [Bug][Colocate] when adding a table to the colocate group, we should check that the number of buckets per partition is the same (#21906)
For example:

CREATE TABLE `colocate_a` (
 dt date,
 k1 int,
 v1 int
) ENGINE=OLAP
DUPLICATE KEY(`k1`)
PARTITION BY RANGE(`dt`)
(PARTITION p1 VALUES [('2022-10-02'), ('2022-10-03'))
DISTRIBUTED BY HASH(`k1`) BUCKETS 2
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false",
"storage_format" = "V2"
);

ALTER TABLE colocate_a set ("colocate_with" = "ab");

CREATE TABLE `colocate_b` (
 dt date,
 k1 int,
 v1 int
) ENGINE=OLAP
DUPLICATE KEY(`k1`)
PARTITION BY RANGE(`dt`)
(PARTITION p1 VALUES [('2022-10-02'), ('2022-10-03'))
DISTRIBUTED BY HASH(`k1`) BUCKETS 2
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false",
"storage_format" = "V2"
);

ALTER TABLE colocate_b ADD PARTITION p2 VALUES [("2022-10-03"),("2022-10-04")) DISTRIBUTED BY HASH(k1) BUCKETS 10;

ALTER TABLE colocate_b set ("colocate_with" = "ab");
Partition p2 of table colocate_b has 10 buckets, but the table is still added to group ab.

Then ColocateTableCheckerAndBalancer.matchGroup fails with:

java.lang.IllegalStateException: 2 vs. 10
        at com.google.common.base.Preconditions.checkState(Preconditions.java:508) ~[guava-30.0-jre.jar:?]
        at org.apache.doris.clone.ColocateTableCheckerAndBalancer.matchGroup(ColocateTableCheckerAndBalancer.java:242) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.clone.ColocateTableCheckerAndBalancer.runAfterCatalogReady(ColocateTableCheckerAndBalancer.java:95) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.common.util.Daemon.run(Daemon.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]
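A sketch of the check this fix calls for, run before a table joins a colocate group. The `ColocateBucketCheck` class, the partition-to-bucket-count map and the error message are hypothetical, not the actual FE code.

```java
import java.util.Map;

final class ColocateBucketCheck {
    /**
     * Rejects the table if any partition's bucket number differs from the
     * colocate group's bucket number. partitionBuckets maps partition name to bucket count.
     */
    static void checkBucketNum(String table, Map<String, Integer> partitionBuckets, int groupBucketNum) {
        for (Map.Entry<String, Integer> e : partitionBuckets.entrySet()) {
            if (e.getValue() != groupBucketNum) {
                throw new IllegalArgumentException(String.format(
                        "table %s partition %s has %d buckets, but the colocate group requires %d",
                        table, e.getKey(), e.getValue(), groupBucketNum));
            }
        }
    }
}
```

Rejecting the table at this point keeps `matchGroup` from ever seeing a mismatch like `2 vs. 10`.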
---------

Co-authored-by: shizhiqiang03 <shizhiqiang03@meituan.com>
2023-07-24 09:01:16 +08:00
64348055a1 [improvement](iceberg) Optimize the split to the user-specified size #22078
According to the specified split size, the split tasks are merged to keep a single task near the expected size.
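A simple greedy version of such merging, assuming splits are represented only by their sizes in bytes and kept in order. The `SplitMerger` class is hypothetical, and the commit may use a different packing strategy.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitMerger {
    /**
     * Greedily packs consecutive split sizes into tasks whose total stays at or
     * below targetSize; a single split larger than the target forms its own task.
     */
    static List<List<Long>> merge(List<Long> splitSizes, long targetSize) {
        List<List<Long>> tasks = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long size : splitSizes) {
            if (!current.isEmpty() && currentSize + size > targetSize) {
                // Adding this split would overshoot the target: close the current task.
                tasks.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(size);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            tasks.add(current);
        }
        return tasks;
    }
}
```

For sizes 32, 32, 32, 100, 16 and a target of 64, this yields the tasks [32, 32], [32], [100], [16].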
2023-07-24 08:48:10 +08:00
a5099a2d3b [minor](log) print error msg to fe.out before log is initialized (#22106)
An exception may be thrown before LOG is initialized, for example because of
a wrong config value. So we need to print it to fe.out; otherwise
we cannot tell what went wrong.

After this PR, the error can be found in fe.out, such as:

```
java.lang.NumberFormatException: For input string: "3g"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Long.parseLong(Long.java:589)
        at java.lang.Long.parseLong(Long.java:631)
        at org.apache.doris.common.ConfigBase.setConfigField(ConfigBase.java:253)
        at org.apache.doris.common.ConfigBase.setFields(ConfigBase.java:232)
        at org.apache.doris.common.ConfigBase.initConf(ConfigBase.java:146)
        at org.apache.doris.common.ConfigBase.init(ConfigBase.java:112)
        at org.apache.doris.DorisFE.start(DorisFE.java:101)
        at org.apache.doris.DorisFE.main(DorisFE.java:73)
```
2023-07-23 19:20:10 +08:00
22aa54e335 [enhancement](config) enlarge max_bytes_per_broker_scanner to 5G #22099 2023-07-23 12:00:32 +08:00
dfb5d4bc13 [fix](catalog) do not call makeSureInitialized when create/drop table/db from hms meta event (#21941)
Supplement to #21104
2023-07-23 11:24:20 +08:00
8cb532230a [fix](metric) fix prometheus metric format error (#22045)
The metric name should be declared only once, as follows:

# HELP doris_fe_query_latency_ms 
# TYPE doris_fe_query_latency_ms summary
doris_fe_query_latency_ms{quantile="0.75"} 1.0
doris_fe_query_latency_ms{quantile="0.95"} 2.0
doris_fe_query_latency_ms{quantile="0.98"} 100.0
doris_fe_query_latency_ms{quantile="0.99"} 100.0
doris_fe_query_latency_ms{quantile="0.999"} 100.0
doris_fe_query_latency_ms{quantile="0.75",user="default_cluster:test1"} 1.0
doris_fe_query_latency_ms{quantile="0.95",user="default_cluster:test1"} 1.0
doris_fe_query_latency_ms{quantile="0.98",user="default_cluster:test1"} 1.0
doris_fe_query_latency_ms{quantile="0.99",user="default_cluster:test1"} 1.0
doris_fe_query_latency_ms{quantile="0.999",user="default_cluster:test1"} 1.0
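A minimal sketch of writing one summary family so that the `# HELP` and `# TYPE` lines appear once, followed by both the global and the per-user samples. The `PrometheusSummaryWriter` class and its sample representation (label string mapped to value) are hypothetical, and the `_sum` and `_count` series of a full summary are omitted.

```java
import java.util.List;
import java.util.Map;

final class PrometheusSummaryWriter {
    /**
     * Writes HELP and TYPE once for the metric name, then every sample.
     * Each sample key is a label string such as quantile="0.75",user="...".
     */
    static String write(String name, String help, List<Map.Entry<String, Double>> samples) {
        StringBuilder sb = new StringBuilder();
        sb.append("# HELP ").append(name).append(' ').append(help).append('\n');
        sb.append("# TYPE ").append(name).append(" summary\n");
        for (Map.Entry<String, Double> sample : samples) {
            sb.append(name).append('{').append(sample.getKey()).append("} ")
              .append(sample.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```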
2023-07-22 22:38:29 +08:00
3d0f952934 [FIX](complex-type)delete enable_map/struct_type switch #21957 2023-07-22 15:29:32 +08:00
50c8563f35 [fix](partial update) fix some bugs of sequence column (#21896) 2023-07-22 15:26:48 +08:00
355ac18363 [Fix](jdbc catalog) Pass conjuncts to JdbcScanNode and FileScanNode before doing finalize. (#21998)
JdbcScanNode needs the conjuncts to generate SQL in its finalize function, but the conjuncts had not yet been passed to JdbcScanNode when finalize was called. This PR passes the conjuncts to the scan node before they are used, to avoid scanning the whole table.
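A sketch of why the order matters, assuming conjuncts are already rendered as SQL strings. The `JdbcQueryBuilder` class is hypothetical: if the conjunct list is still empty when the query is generated, no WHERE clause is emitted and the remote table is scanned in full.

```java
import java.util.List;
import java.util.stream.Collectors;

final class JdbcQueryBuilder {
    /** Builds the query sent to the remote database from columns and conjuncts. */
    static String buildQuery(String table, List<String> columns, List<String> conjuncts) {
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(String.join(", ", columns))
                .append(" FROM ")
                .append(table);
        if (!conjuncts.isEmpty()) {
            // Parenthesize each conjunct so an OR inside one predicate keeps its meaning.
            sql.append(" WHERE ").append(conjuncts.stream()
                    .map(c -> "(" + c + ")")
                    .collect(Collectors.joining(" AND ")));
        }
        return sql.toString();
    }
}
```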
2023-07-22 14:08:44 +08:00
42ec92fd12 [enhancement](jdbc catalog) Add sqlserver jdbc url param useBulkCopyForBatchInsert=true (#22032)
When useBulkCopyForBatchInsert=false, the JDBC driver will not use SQL Server's Bulk Copy API for batch insertions. Thus, during the batch insertion process, each insert statement needs to be individually sent to the SQL Server, leading to a higher number of network roundtrips. Network latency could potentially become a significant factor contributing to performance degradation. For this reason, we recommend setting this parameter to true by default to enhance the performance of PreparedStatement batch insertions.

In this manner, when performing batch insertions, the JDBC driver will send all insertion data to SQL Server in one go via the Bulk Copy API, rather than sending each insert statement individually. This can significantly reduce the number of network roundtrips, thereby improving performance.

Please note that this option is only effective for fully parameterized INSERT statements. If your INSERT statement is mixed with other SQL statements, or if it contains values specified directly in the statement, then the JDBC driver will not use the Bulk Copy API, but instead will use the standard insert method.
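A sketch of adding the property to a SQL Server JDBC URL only when the user has not set it already; SQL Server URLs separate properties with `;`. The `SqlServerUrl` helper is hypothetical and is not Doris's actual URL handling.

```java
import java.util.Locale;

final class SqlServerUrl {
    private static final String PROPERTY = "useBulkCopyForBatchInsert";

    static String withBulkCopy(String url) {
        // Respect an explicit user setting (true or false); compare case-insensitively to be conservative.
        if (url.toLowerCase(Locale.ROOT).contains(PROPERTY.toLowerCase(Locale.ROOT) + "=")) {
            return url;
        }
        String separator = url.endsWith(";") ? "" : ";";
        return url + separator + PROPERTY + "=true";
    }
}
```

The property only pays off for batched, fully parameterized inserts, that is `PreparedStatement.addBatch()` followed by `executeBatch()`.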
2023-07-22 11:32:21 +08:00
82f5a3f684 [Fix] (multi catalog)Fix external table couldn't find db bug (#22074)
The getDatabase function of Nereids LogicalCatalogRelation and PhysicalCatalogRelation only searched InternalCatalog. As a result, every query on an external table failed, because the external database could not be found in the internal catalog.
```
mysql> explain select count(*) from multi_partition_orc;
ERROR 1105 (HY000): AnalysisException, msg: Database [default_cluster:multi_partition] does not exist.
```

This PR uses the catalog name to find the correct catalog first, and then gets the database from that catalog.
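The lookup order can be sketched as follows, modeling each catalog as a map from database name to a hypothetical database id. The `CatalogResolver` class is an illustration, not the Nereids code.

```java
import java.util.Map;
import java.util.Optional;

final class CatalogResolver {
    /**
     * Selects the catalog by name first, then looks the database up inside it,
     * instead of always searching the internal catalog.
     * catalogs maps catalog name to (database name to database id).
     */
    static Optional<Long> findDatabase(Map<String, Map<String, Long>> catalogs,
                                       String catalogName, String dbName) {
        Map<String, Long> catalog = catalogs.get(catalogName);
        if (catalog == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(catalog.get(dbName));
    }
}
```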
2023-07-22 00:13:26 +08:00
93f9a8cbf5 [fix](nereids)PredicatePropagation only support integer types for now (#22096) 2023-07-21 23:40:08 +08:00
0b1c82b021 [opt](nereids) enhance runtime filter pushdown (#21883)
Currently, runtime filters cannot be pushed down into complicated plan patterns, such as a set operation as a join child, or a CTE sender as a filter before shuffling. This PR refines the pushdown so that filters can be pushed recursively into different layers of the plan tree, such as nested subqueries, set operations and CTE senders.
2023-07-21 23:31:30 +08:00
ef01988ae1 [opt](inverted index) support the same column create different type index (#21972) 2023-07-21 23:02:39 +08:00
acf4aa2818 [fix](planner)shouldn't force push down conjuncts for union statement (#22079)
2023-07-21 21:12:56 +08:00
85cc044aaa [feature](create-table) support setting replication num for creating table operation globally (#21848)
Add a new FE config `force_olap_table_replication_num`.
If this config is larger than 0, the replication num of every table created by a create table operation is
forcibly set to this value.
The default is 0, which has no effect.
This config only affects the create olap table operation; other operations such as `add partition` and
`modify table properties` are not affected.

The motivation for this config is that most regression test cases create tables with a single replica,
which lets the regression tests run well in the p0 and p1 pipelines.
But we also need to run these cases in a multi-backend Doris cluster, so we need test cases with multiple replicas.
It is hard to modify each test case, so I added this config, which lets us simply create all tables with
the specified replication number.
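The override rule can be sketched as below; `forceReplicationNum` stands in for `force_olap_table_replication_num`, and the `ReplicationNumResolver` class is hypothetical. It would only be consulted on the create table path, not for `add partition` or `modify table properties`.

```java
final class ReplicationNumResolver {
    /**
     * Returns the replication num used when creating an OLAP table:
     * a positive forced value overrides the requested one, and 0 keeps the request.
     */
    static int resolve(int requestedReplicationNum, int forceReplicationNum) {
        return forceReplicationNum > 0 ? forceReplicationNum : requestedReplicationNum;
    }
}
```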
2023-07-21 19:36:04 +08:00
e489b60ea3 [feature](load) support line delimiter for old broker load (#22030) 2023-07-21 19:31:19 +08:00
b76d0d84ac [enhancement](Nereids) support other join framework in DPHyper (#21835)
Implement the CD-A algorithm to support other join types in DPHyper.
The algorithm is described in the paper "On the Correct and Complete Enumeration of the Core Search Space".
2023-07-21 18:31:52 +08:00
7cac36d9e8 [chore](Nereids) fix typo in some plan visitor (#21830) 2023-07-21 18:22:20 +08:00
94e2c3cf0f [fix](tablet clone) sched wait slot if has be path (#22015) 2023-07-21 13:27:40 +08:00
74313c7d54 [feature-wip](autoinc)(step-3) add auto increment support for unique table (#22036) 2023-07-21 13:24:41 +08:00
6512893257 [refactor](vectorized) Remove useless control variables to simplify aggregation node code (#22026)
2023-07-21 12:45:23 +08:00
fb5b412698 [fix](planner)fix bug of pushing conjuncts into inlineview (#21962)
1. markConstantConjunct method shouldn't change the input conjunct
2. Use Expr's comeFrom method to check if the pushed expr is one of the group by exprs, this is the correct way to check if the conjunct can be pushed down through the agg node.
3. migrateConstantConjuncts should substitute the conjuncts using inlineViewRef's analyzer to make the analyzer recognize the column in the conjuncts in the following analyze phase
2023-07-21 11:34:56 +08:00
b09c4d490a [fix](test) should not create and read internal table when use mock cluster in UT (#21660) 2023-07-21 11:30:26 +08:00
0b2b1cbd58 [improvement](multi-catalog)add last sync time for external catalog (#21873)
The following operations update this time:

1. Refreshing a catalog updates the catalog's lastUpdateTime.
2. Refreshing a db updates the db's lastUpdateTime.
3. Reloading a table schema into the cache updates the table's lastUpdateTime.
4. Receiving an add/drop table event updates the db's lastUpdateTime.
5. Receiving an alter table event updates the table's lastUpdateTime.
2023-07-21 09:42:35 +08:00
f3d9a843dd [Fix](planner)fix ctas incorrect string types of the target table. (#21754)
String types from the source table were replaced with the text type in the CTAS table; we now keep the types corresponding to the source table.
2023-07-20 22:14:43 +08:00
a151326268 [Fix](planner)fix failed running alias function with an alias function in original function. (#21024)
The following SQL fails:
```sql
create alias function f1(int) with parameter(n) as dayofweek(hours_add('2023-06-18', n));
create alias function f2(int) with parameter(n) as dayofweek(hours_add(makedate(year('2023-06-18'), f1(3)), n));

select f2(f1(3));
```
It throws the exception: f1 is not a builtin-function.
This is because f2's original function contains f1, which is not a builtin function and should have been rewritten first.
We avoid this case for now and will support it later.
2023-07-20 22:12:10 +08:00
ab11dea98d [Enhancement](config) optimize behavior of default_storage_medium (#20739) 2023-07-20 22:00:11 +08:00
7d488688b4 [fix](multi-catalog)fix minio default region and throw minio error msg, support s3 bucket root path (#21994)
1. Check the MinIO region: set a default region if the user does not provide one, and throw MinIO's error message.
2. Support reading the bucket root path s3://bucket1.
3. Fix MaxCompute public access.
2023-07-20 20:48:55 +08:00
eabd5d386b [Fix](multi catalog)Fix nereids context table always use internal catalog bug (#21953)
The getTable function in CascadesContext only handled the internal catalog case (it tried to find the table only in the internal
catalog and its dbs). However, it should take all external catalogs into consideration; otherwise it fails to find a
table, or gets the wrong table, when querying an external table. This PR fixes this bug.
2023-07-20 20:32:01 +08:00