doris

Author	SHA1	Message	Date
caiconghui	2e20ff8cab	[feature](metric) Support collecting query counter and error query counter metrics at user level (#22125 ) 1. Support collecting query counter and error query counter metrics at user level. 2. Re-add sum and count for histogram metrics, which were mistakenly deleted in PR #22045.	2023-07-25 11:16:38 +08:00
LiBinfeng	3c58e9bac9	[Fix](Nereids) Fix incomplete predicate inference (#22145 ) Problem: When inferring predicates in Nereids, newly inferred predicates cannot be used as sources for the next round. For example: create table tt1(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1'); create table tt2(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1'); create table tt3(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1'); explain select * from tt1 left join tt2 on tt1.c1 = tt2.c1 left join tt3 on tt2.c1 = tt3.c1 where tt1.c1 = 123; We expect to get tt3.c1 = 123, but we only get tt2.c1 = 123. This is because, given tt1.c1 = 123 and tt2.c1 = tt3.c1, we cannot find any relationship between these two predicates. Solution: Cache intermediate results of source predicates, such as tt2.c1 = 123 in this example.	2023-07-25 10:05:00 +08:00
zhangdong	fc67929e34	[improvement](catalog) optimize ldap and support more characters in user and table names (#21968 ) - common name supports `-`, reason: MySQL's db name supports `-` - table name supports `-` - username supports `.`, reason: LDAP's username supports `.` - ldap doc - ldap supports rbac	2023-07-24 22:04:37 +08:00
zhangdong	7fcf702081	[improvement](multi catalog)paimon support filesystem metastore (#21910 ) 1. support filesystem metastore 2. support predicate and project when splitting 3. fix partition table query error todo: Now you need to manually put paimon-s3-0.4.0-incubating.jar in be/lib/java_extensions when using the s3 filesystem. doc pr: #21966	2023-07-24 22:02:57 +08:00
morrySnow	82bdcb3da8	[fix](Nereids) translate partition topn order key on wrong tuple (#22168 ) The partition key should be on the child tuple, and the sort key should be on the partition TopN's tuple.	2023-07-24 20:46:27 +08:00
AKIRA	2d52d8d926	[opt](stats) Update stats table config and comment (#22070 ) 1. Set the replica count for the stats table to "Math.max(Config.statistic_internal_table_replica_num,Config.min_replication_num_per_tablet)" 2. Update the comment for the stats table to remove the symbol `'`	2023-07-24 20:43:55 +08:00
morrySnow	0677b261b5	[fix](Nereids) should not process prepare command by Nereids (#22167 )	2023-07-24 20:11:40 +08:00
Siyang Tang	0205f540ac	[enhancement](config) Enlarge broker scanner bytes conf to 500G, 5G is still not enough (#22126 )	2023-07-24 19:49:39 +08:00
morrySnow	cf30ea914a	[fix](Nereids) forbid gather sort with explicit shuffle (#22153 ) Gather sort with explicit shuffle is usually bad, so forbid it.	2023-07-24 19:45:18 +08:00
Calvin Kirs	3ba3690f93	[Fix](Http-API)Check and replace user sensitive characters (#22148 )	2023-07-24 18:21:42 +08:00
谢健	68bd4a1a96	[opt](Nereids) check multiple distinct functions that cannot be transformed into multi_distinct (#21626 ) This commit introduces a transformation for SQL queries that contain multiple distinct aggregate functions. When the number of distinct values processed by these functions is greater than 1, they are converted into multi_distinct functions for more efficient handling. Example: ``` SELECT COUNT(DISTINCT c1), SUM(DISTINCT c2) FROM tbl GROUP BY c3 -- Transformed to SELECT MULTI_DISTINCT_COUNT(c1), MULTI_DISTINCT_SUM(c2) FROM tbl GROUP BY c3 ``` The following functions can be transformed: - COUNT - SUM - AVG - GROUP_CONCAT If any unsupported functions are encountered, an error is now reported during the optimization phase. To ensure the absence of such cases, a final check has been implemented after the rewriting phase.	2023-07-24 16:34:17 +08:00
morrySnow	21deb57a4d	[fix](Nereids) remove double signature of ceil, floor and round (#22134 ) We converted input parameters to double for the functions ceil, floor and round because DecimalV2 could not perform these operations. Since DecimalV3 has been introduced, we should convert all parameters to DecimalV3 to get correct results. For example, when double is used as the parameter type, we get a wrong result: ```sql select round(341/20000,4),341/20000,round(0.01705,4); +-------------------------+---------------+-------------------+ | round((341 / 20000), 4) | (341 / 20000) | round(0.01705, 4) | +-------------------------+---------------+-------------------+ | 0.017 | 0.01705 | 0.0171 | +-------------------------+---------------+-------------------+ ``` DecimalV3 gets the correct result: ```sql select round(341/20000,4),341/20000,round(0.01705,4); +-------------------------+---------------+-------------------+ | round((341 / 20000), 4) | (341 / 20000) | round(0.01705, 4) | +-------------------------+---------------+-------------------+ | 0.0171 | 0.01705 | 0.0171 | +-------------------------+---------------+-------------------+ ```	2023-07-24 16:08:00 +08:00
morrySnow	ac9480123c	[refactor](Nereids) push down all non-slot order keys in sort and prune them upper sort (#22034 ) According to the implementation in the execution engine, all order keys in SortNode are output, so we must normalize LogicalSort accordingly. We push down all non-slot order keys in sort to materialize them below the sort. As a result, all order keys are slots, and SortNode does not need to do the projection itself. This simplifies the translation of SortNode by avoiding generating resolvedTupleExprs and sortTupleDesc.	2023-07-24 15:36:33 +08:00
DeadlineFen	667e4ea99b	[Fix](binlog) Fix bugs in tombstone (#22031 )	2023-07-24 14:33:16 +08:00
xzj7019	b5f27b5349	[enhance](nereids) enable wf partition topn by default (#21860 )	2023-07-24 14:21:45 +08:00
jakevin	66fa1bef6d	[refactor](Nereids): avoid useless groupByColStats Map (#22000 )	2023-07-24 12:13:52 +08:00
mch_ucchi	ea35437c44	[Fix](Nereids)fix insert into default value exception (#21924 ) A default value in the first cell of VALUES raised a cast exception. We now filter it when checking the types of values in insert: when the literal is a string and its value is the specific default value string, we skip the type check.	2023-07-24 12:08:43 +08:00
mch_ucchi	e141409171	[Fix](planner) fix rewritten alias function's original function is not analyzed again (#21497 ) fn is null because the alias function's original function is not analyzed again; we fix it by adding an analysis phase.	2023-07-24 11:40:00 +08:00
minghong	138e6c2f01	[stats](nereids)keep min/max expr in colstats (#22064 ) columnStatistics.minExpr and maxExpr are useful when we derive stats for the cast function. This PR 1. maintains the min/max expr during stats derivation for the filter conditions col<literal, col>literal and col=literal 2. adjusts the column stats range for the cast function (currently only casts from string to other types are supported) ds9 is changed, but there is no performance issue: on tpcds_sf100_rf the execution time is 1.5~1.6 sec, the same as master	2023-07-24 10:28:36 +08:00
gnehil	c78341b728	[improvement](spark-load) support datev2 and datetimev2 #21839	2023-07-24 09:07:53 +08:00
shee	ff9811fa1b	[Bug][Colocate] when adding a table to the colocate group, we should check that the number of buckets per partition is the same (#21906 ) For example: CREATE TABLE `colocate_a` ( dt date, k1 int, v1 int ) ENGINE=OLAP DUPLICATE KEY(`k1`) PARTITION BY RANGE(`dt`) (PARTITION p1 VALUES [('2022-10-02'), ('2022-10-03')) DISTRIBUTED BY HASH(`k1`) BUCKETS 2 PROPERTIES ( "replication_num" = "3", "in_memory" = "false", "storage_format" = "V2" ); ALTER TABLE colocate_a set ("colocate_with" = "ab"); CREATE TABLE `colocate_b` ( dt date, k1 int, v1 int ) ENGINE=OLAP DUPLICATE KEY(`k1`) PARTITION BY RANGE(`dt`) (PARTITION p1 VALUES [('2022-10-02'), ('2022-10-03')) DISTRIBUTED BY HASH(`k1`) BUCKETS 2 PROPERTIES ( "replication_num" = "3", "in_memory" = "false", "storage_format" = "V2" ); ALTER TABLE colocate_b ADD PARTITION p2 VALUES [("2022-10-03"),("2022-10-04")) DISTRIBUTED BY HASH(k1) BUCKETS 10; ALTER TABLE colocate_b set ("colocate_with" = "ab"); Partition p2 of table colocate_b has 10 buckets, yet the table is still added to group ab. matchGroup in ColocateTableCheckerAndBalancer then throws: java.lang.IllegalStateException: 2 vs. 10 at com.google.common.base.Preconditions.checkState(Preconditions.java:508) ~[guava-30.0-jre.jar:?] at org.apache.doris.clone.ColocateTableCheckerAndBalancer.matchGroup(ColocateTableCheckerAndBalancer.java:242) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.clone.ColocateTableCheckerAndBalancer.runAfterCatalogReady(ColocateTableCheckerAndBalancer.java:95) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.common.util.Daemon.run(Daemon.java:116) ~[doris-fe.jar:1.2-SNAPSHOT] --------- Co-authored-by: shizhiqiang03 <shizhiqiang03@meituan.com>	2023-07-24 09:01:16 +08:00
wuwenchi	64348055a1	[improvement](iceberg) Optimize the split to the user-specified size #22078 According to the specified split size, the split tasks are merged to keep a single task near the expected size.	2023-07-24 08:48:10 +08:00
Mingyu Chen	a5099a2d3b	[minor](log) print error msg to fe.out before log is initialized (#22106 ) The exception may be thrown before LOG is initialized, for example when a config value is wrong. So we need to print it to fe.out; otherwise we can't know what's wrong. After this PR, the error can be found in fe.out, such as: ``` java.lang.NumberFormatException: For input string: "3g" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:589) at java.lang.Long.parseLong(Long.java:631) at org.apache.doris.common.ConfigBase.setConfigField(ConfigBase.java:253) at org.apache.doris.common.ConfigBase.setFields(ConfigBase.java:232) at org.apache.doris.common.ConfigBase.initConf(ConfigBase.java:146) at org.apache.doris.common.ConfigBase.init(ConfigBase.java:112) at org.apache.doris.DorisFE.start(DorisFE.java:101) at org.apache.doris.DorisFE.main(DorisFE.java:73) ```	2023-07-23 19:20:10 +08:00
Siyang Tang	22aa54e335	[enhancement](config) enlarge max_bytes_per_broker_scanner to 5G #22099	2023-07-23 12:00:32 +08:00
zhangdong	dfb5d4bc13	[fix](catalog) do not call makeSureInitialized when create/drop table/db from hms meta event (#21941 ) Supplement to #21104	2023-07-23 11:24:20 +08:00
caiconghui	8cb532230a	[fix](metric) fix prometheus metric format error (#22045 ) We should define the metric name only once, like the following: # HELP doris_fe_query_latency_ms # TYPE doris_fe_query_latency_ms summary doris_fe_query_latency_ms{quantile="0.75"} 1.0 doris_fe_query_latency_ms{quantile="0.95"} 2.0 doris_fe_query_latency_ms{quantile="0.98"} 100.0 doris_fe_query_latency_ms{quantile="0.99"} 100.0 doris_fe_query_latency_ms{quantile="0.999"} 100.0 doris_fe_query_latency_ms{quantile="0.75",user="default_cluster:test1"} 1.0 doris_fe_query_latency_ms{quantile="0.95",user="default_cluster:test1"} 1.0 doris_fe_query_latency_ms{quantile="0.98",user="default_cluster:test1"} 1.0 doris_fe_query_latency_ms{quantile="0.99",user="default_cluster:test1"} 1.0 doris_fe_query_latency_ms{quantile="0.999",user="default_cluster:test1"} 1.0	2023-07-22 22:38:29 +08:00
amory	3d0f952934	[FIX](complex-type)delete enable_map/struct_type switch #21957	2023-07-22 15:29:32 +08:00
zhannngchen	50c8563f35	[fix](partial update) fix some bugs of sequence column (#21896 )	2023-07-22 15:26:48 +08:00
Jibing-Li	355ac18363	[Fix](jdbc catalog) Pass conjuncts to JdbcScanNode and FileScanNode before doing finalize. (#21998 ) JdbcScanNode needs the conjuncts to generate SQL in its finalize function, but the conjuncts have not yet been passed to JdbcScanNode when finalize is called. This PR passes the conjuncts to the scan node before they are used, which avoids scanning the whole table.	2023-07-22 14:08:44 +08:00
zy-kkk	42ec92fd12	[enhancement](jdbc catalog) Add sqlserver jdbc url param `useBulkCopyForBatchInsert=true` (#22032 ) When useBulkCopyForBatchInsert=false, the JDBC driver will not use SQL Server's Bulk Copy API for batch insertions. Thus, during the batch insertion process, each insert statement needs to be individually sent to the SQL Server, leading to a higher number of network roundtrips. Network latency could potentially become a significant factor contributing to performance degradation. For this reason, we recommend setting this parameter to true by default to enhance the performance of PreparedStatement batch insertions. In this manner, when performing batch insertions, the JDBC driver will send all insertion data to SQL Server in one go via the Bulk Copy API, rather than sending each insert statement individually. This can significantly reduce the number of network roundtrips, thereby improving performance. Please note that this option is only effective for fully parameterized INSERT statements. If your INSERT statement is mixed with other SQL statements, or if it contains values specified directly in the statement, then the JDBC driver will not use the Bulk Copy API, but instead will use the standard insert method.	2023-07-22 11:32:21 +08:00
Jibing-Li	82f5a3f684	[Fix] (multi catalog)Fix external table couldn't find db bug (#22074 ) The getDatabase function of Nereids LogicalCatalogRelation and PhysicalCatalogRelation only searches InternalCatalog to find a table. This causes every query on an external table to fail, because the external database cannot be found in the internal catalog. ``` mysql> explain select count(*) from multi_partition_orc; ERROR 1105 (HY000): AnalysisException, msg: Database [default_cluster:multi_partition] does not exist. ``` This PR uses the catalog name to find the correct catalog first, and then gets the database from that catalog.	2023-07-22 00:13:26 +08:00
starocean999	93f9a8cbf5	[fix](nereids)PredicatePropagation only support integer types for now (#22096 )	2023-07-21 23:40:08 +08:00
xzj7019	0b1c82b021	[opt](nereids) enhance runtime filter pushdown (#21883 ) The current runtime filter can't be pushed down into complicated plan patterns, such as a set operation as a join child, or a cte sender as a filter before shuffling. This PR refines the pushdown ability so that the filter can be pushed recursively into different plan tree layers, such as nested subqueries, set operations, cte senders, etc.	2023-07-21 23:31:30 +08:00
YueW	ef01988ae1	[opt](inverted index) support the same column create different type index (#21972 )	2023-07-21 23:02:39 +08:00
starocean999	acf4aa2818	[fix](planner)shouldn't force push down conjuncts for union statement (#22079 )	2023-07-21 21:12:56 +08:00
Mingyu Chen	85cc044aaa	[feature](create-table) support setting replication num for creating table operation globally (#21848 ) Add a new FE config `force_olap_table_replication_num`. If this config is larger than 0, the replication num of a newly created table is forcibly set to this value. The default is 0, which has no effect. This config only affects the create olap table operation; other operations such as `add partition` and `modify table properties` are not affected. The motivation for this config is that most regression test cases create tables with a single replica, which lets the regression tests run well in the p0 and p1 pipelines. But we also need to run these cases in a multi-backend Doris cluster, so we need test cases with multiple replicas. It is hard to modify each test case, so I added this config, so that we can simply set it to create all tables with the specified replication number.	2023-07-21 19:36:04 +08:00
Siyang Tang	e489b60ea3	[feature](load) support line delimiter for old broker load (#22030 )	2023-07-21 19:31:19 +08:00
谢健	b76d0d84ac	[enhancement](Nereids) support other join framework in DPHyper (#21835 ) Implement the CD-A algorithm to support other join types in DPHyper. The algorithm details are in "On the correct and complete enumeration of the core search"	2023-07-21 18:31:52 +08:00
mch_ucchi	7cac36d9e8	[chore](Nereids) fix typo in some plan visitor (#21830 )	2023-07-21 18:22:20 +08:00
yujun	94e2c3cf0f	[fix](tablet clone) sched wait slot if has be path (#22015 )	2023-07-21 13:27:40 +08:00
bobhan1	74313c7d54	[feature-wip](autoinc)(step-3) add auto increment support for unique table (#22036 )	2023-07-21 13:24:41 +08:00
ZenoYang	6512893257	[refactor](vectorized) Remove useless control variables to simplify aggregation node code (#22026 )	2023-07-21 12:45:23 +08:00
starocean999	fb5b412698	[fix](planner)fix bug of pushing conjuncts into inlineview (#21962 ) 1. markConstantConjunct method shouldn't change the input conjunct 2. Use Expr's comeFrom method to check if the pushed expr is one of the group by exprs, this is the correct way to check if the conjunct can be pushed down through the agg node. 3. migrateConstantConjuncts should substitute the conjuncts using inlineViewRef's analyzer to make the analyzer recognize the column in the conjuncts in the following analyze phase	2023-07-21 11:34:56 +08:00
谢健	b09c4d490a	[fix](test) should not create and read internal table when use mock cluster in UT (#21660 )	2023-07-21 11:30:26 +08:00
zhangdong	0b2b1cbd58	[improvement](multi-catalog)add last sync time for external catalog (#21873 ) The following operations update this time: 1. when a catalog is refreshed, lastUpdateTime of the catalog is updated 2. when a db is refreshed, lastUpdateTime of the db is updated 3. when a table schema is reloaded into the cache, lastUpdateTime of the table is updated 4. when an add/drop table event is received, lastUpdateTime of the db is updated 5. when an alter table event is received, lastUpdateTime of the table is updated	2023-07-21 09:42:35 +08:00
mch_ucchi	f3d9a843dd	[Fix](planner)fix ctas incorrect string types of the target table. (#21754 ) String types from the source table were replaced with the text type in the CTAS table; we change them to match the corresponding types in the source table.	2023-07-20 22:14:43 +08:00
mch_ucchi	a151326268	[Fix](planner)fix failed running alias function with an alias function in original function. (#21024 ) The following SQL fails to run: ```sql create alias function f1(int) with parameter(n) as dayofweek(hours_add('2023-06-18', n)) create alias function f2(int) with parameter(n) as dayofweek(hours_add(makedate(year('2023-06-18'), f1(3)), n)) select f2(f1(3)) ``` It throws the exception "f1 is not a builtin-function". This happens because f2's original function contains f1, which is not a builtin function and should have been rewritten first. We avoid this case for now and will support it later.	2023-07-20 22:12:10 +08:00
Shiyuan Ji	ab11dea98d	[Enhancement](config) optimize behavior of default_storage_medium (#20739 )	2023-07-20 22:00:11 +08:00
slothever	7d488688b4	[fix](multi-catalog)fix minio default region and throw minio error msg, support s3 bucket root path (#21994 ) 1. check minio region, set default region if user region is not provided, and throw minio error msg 2. support read root path s3://bucket1 3. fix max compute public access	2023-07-20 20:48:55 +08:00
Jibing-Li	eabd5d386b	[Fix](multi catalog)Fix nereids context table always use internal catalog bug (#21953 ) The getTable function in CascadesContext only handles the internal catalog case: it tries to find the table only in the internal catalog and its dbs. However, it should take all the external catalogs into consideration; otherwise it fails to find a table, or gets the wrong table, when querying an external table. This PR fixes this bug.	2023-07-20 20:32:01 +08:00
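
Commit 21deb57a4d (#22134) reports round(341/20000, 4) returning 0.017 with the double signature and 0.0171 with DecimalV3. The Java sketch below is an illustration only and does not reproduce Doris's BE rounding code. It shows the likely underlying cause: 0.01705 has no exact binary representation, and the nearest double lies slightly below it, so half-up rounding to four places goes down. The class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical demo class: compares rounding a binary double with rounding an exact decimal.
final class RoundingDemo {
    // Rounds the exact binary value of a double, which is what a DOUBLE signature operates on.
    static String roundViaDouble(double value, int scale) {
        // new BigDecimal(double) keeps the exact binary value (0.017049999... for 341.0 / 20000.0).
        return new BigDecimal(value).setScale(scale, RoundingMode.HALF_UP).toPlainString();
    }

    // Rounds an exact decimal quotient, closer to what a DECIMALV3 signature operates on.
    static String roundViaDecimal(long numerator, long denominator, int scale) {
        // divide(BigDecimal) is exact; it throws ArithmeticException for non-terminating
        // quotients such as 1/3, but 341/20000 = 0.01705 terminates.
        BigDecimal quotient = BigDecimal.valueOf(numerator).divide(BigDecimal.valueOf(denominator));
        return quotient.setScale(scale, RoundingMode.HALF_UP).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(roundViaDouble(341.0 / 20000.0, 4)); // 0.0170
        System.out.println(roundViaDecimal(341, 20000, 4));     // 0.0171
    }
}
```

The double result prints as 0.0170 here, which matches the 0.017 shown in the commit once trailing zeros are dropped.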
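
The gap described in commit 3c58e9bac9 (#22145) comes from using only the original predicates as sources for inference. Below is a minimal fixpoint sketch that feeds each inferred `column = constant` predicate back into the next round. It assumes predicates are reduced to column-to-column and column-to-constant equalities. The class and method names are hypothetical, and it ignores the outer-join direction rules that Nereids must respect for the left joins in that example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the Nereids implementation.
final class PredicateInference {
    // Propagates column = constant through column equalities until no new predicate appears.
    static Map<String, Integer> inferConstants(List<String[]> equalities, Map<String, Integer> constants) {
        Map<String, Integer> known = new TreeMap<>(constants);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] eq : equalities) {
                String left = eq[0], right = eq[1];
                // A newly inferred constant becomes a source in the following iterations.
                if (known.containsKey(left) && !known.containsKey(right)) {
                    known.put(right, known.get(left));
                    changed = true;
                } else if (known.containsKey(right) && !known.containsKey(left)) {
                    known.put(left, known.get(right));
                    changed = true;
                }
            }
        }
        return known;
    }

    public static void main(String[] args) {
        // Join conditions tt2.c1 = tt3.c1 and tt1.c1 = tt2.c1, plus WHERE tt1.c1 = 123.
        List<String[]> joins = List.of(new String[] {"tt2.c1", "tt3.c1"}, new String[] {"tt1.c1", "tt2.c1"});
        System.out.println(inferConstants(joins, Map.of("tt1.c1", 123)));
        // {tt1.c1=123, tt2.c1=123, tt3.c1=123}
    }
}
```

A single pass over the equalities in this order would stop at tt2.c1 = 123. The loop only reaches tt3.c1 = 123 because tt2.c1 = 123 is reused as a source.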
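
Commit 68bd4a1a96 (#21626) rewrites queries with several DISTINCT aggregates into MULTI_DISTINCT_* calls and reports an error for unsupported functions. The sketch below shows that check-then-rewrite shape on a simplified representation of aggregate calls. The `DistinctAgg` record, the method names, and the choice to leave a lone DISTINCT aggregate untouched are my assumptions, drawn from a reading of the commit message rather than the Nereids code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the multi_distinct check and rewrite.
final class MultiDistinctRewrite {
    // Functions the commit message lists as transformable.
    private static final Set<String> SUPPORTED = Set.of("COUNT", "SUM", "AVG", "GROUP_CONCAT");

    // A DISTINCT aggregate call such as COUNT(DISTINCT c1), reduced to its name and argument.
    record DistinctAgg(String function, String argument) {}

    static List<String> rewrite(List<DistinctAgg> aggs) {
        List<String> out = new ArrayList<>();
        for (DistinctAgg agg : aggs) {
            String fn = agg.function().toUpperCase(Locale.ROOT);
            if (aggs.size() < 2) {
                // Assumption: a single DISTINCT aggregate is left as it is.
                out.add(fn + "(DISTINCT " + agg.argument() + ")");
            } else if (SUPPORTED.contains(fn)) {
                out.add("MULTI_DISTINCT_" + fn + "(" + agg.argument() + ")");
            } else {
                // Unsupported functions are reported instead of being planned.
                throw new IllegalStateException(fn + "(DISTINCT) cannot be transformed into multi_distinct");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // SELECT COUNT(DISTINCT c1), SUM(DISTINCT c2) FROM tbl GROUP BY c3
        System.out.println(rewrite(List.of(new DistinctAgg("count", "c1"), new DistinctAgg("sum", "c2"))));
        // [MULTI_DISTINCT_COUNT(c1), MULTI_DISTINCT_SUM(c2)]
    }
}
```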
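
The colocate bug in commit ff9811fa1b (#21906) surfaces late, as `IllegalStateException: 2 vs. 10` inside the balancer. A pre-check can catch it when the table joins the group. The sketch below is hypothetical, including its names and error text. It assumes the group's bucket count is known and the table's partitions are given as a name-to-bucket-count map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical pre-check: every partition must match the colocate group's bucket count.
final class ColocateBucketCheck {
    static void checkBuckets(String table, Map<String, Integer> bucketsByPartition, int groupBuckets) {
        for (Map.Entry<String, Integer> e : bucketsByPartition.entrySet()) {
            // Integer vs int comparison unboxes, so this compares values, not references.
            if (e.getValue() != groupBuckets) {
                throw new IllegalArgumentException(String.format(
                        "partition %s of table %s has %d buckets, but the colocate group has %d",
                        e.getKey(), table, e.getValue(), groupBuckets));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> colocateB = new LinkedHashMap<>();
        colocateB.put("p1", 2);  // created with BUCKETS 2
        colocateB.put("p2", 10); // added later with BUCKETS 10
        try {
            checkBuckets("colocate_b", colocateB, 2);
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Rejecting the ALTER at this point gives the user a clear error. The alternative is a table that joins the group and then trips a precondition in a background daemon.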
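
Commit 64348055a1 (#22078) merges Iceberg split tasks so that each task stays near a user-specified size. The sketch below uses a simple greedy packing of consecutive splits and is only one way to do this. It is an assumption, not the Doris code. Iceberg's own planner uses bin packing with an open-file cost, and the real implementation may follow that instead. Names and sizes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical greedy split merger.
final class SplitMerger {
    // Packs consecutive splits into tasks whose total stays at or below targetSize;
    // a split that alone exceeds targetSize becomes its own task.
    static List<List<Long>> merge(List<Long> splitSizes, long targetSize) {
        List<List<Long>> tasks = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long size : splitSizes) {
            // Close the current task if this split would push it past the target.
            if (!current.isEmpty() && currentSize + size > targetSize) {
                tasks.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(size);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            tasks.add(current);
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Split sizes in MB with a 128 MB target.
        System.out.println(merge(List.of(32L, 32L, 32L, 32L, 200L, 16L), 128L));
        // [[32, 32, 32, 32], [200], [16]]
    }
}
```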
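
Commit 8cb532230a (#22045) fixes the FE output so that `# HELP` and `# TYPE` appear once per metric name, as the Prometheus text format expects. The sketch below groups samples by name before rendering, so per-user samples that arrive interleaved with other metrics still share one header. The `Sample` record and `render` method are hypothetical. The empty HELP text mirrors the commit's example output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical renderer for the Prometheus text exposition format.
final class PrometheusWriter {
    // labels is an already-rendered label set, e.g. quantile="0.75",user="default_cluster:test1".
    record Sample(String name, String labels, double value) {}

    static String render(Map<String, String> typeByName, List<Sample> samples) {
        // Group samples by metric name, keeping the order in which names first appear.
        Map<String, List<Sample>> byName = new LinkedHashMap<>();
        for (Sample s : samples) {
            byName.computeIfAbsent(s.name(), k -> new ArrayList<>()).add(s);
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<Sample>> e : byName.entrySet()) {
            String name = e.getKey();
            // Emit the HELP/TYPE header exactly once per metric name.
            sb.append("# HELP ").append(name).append('\n');
            sb.append("# TYPE ").append(name).append(' ')
              .append(typeByName.getOrDefault(name, "untyped")).append('\n');
            for (Sample s : e.getValue()) {
                sb.append(name);
                if (!s.labels().isEmpty()) {
                    sb.append('{').append(s.labels()).append('}');
                }
                sb.append(' ').append(s.value()).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(Map.of("doris_fe_query_latency_ms", "summary"), List.of(
                new Sample("doris_fe_query_latency_ms", "quantile=\"0.75\"", 1.0),
                new Sample("doris_fe_query_latency_ms", "quantile=\"0.75\",user=\"default_cluster:test1\"", 1.0))));
    }
}
```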


5302 Commits