doris

Author	SHA1	Message	Date
zhangdong	dfb5d4bc13	[fix](catalog) do not call makeSureInitialized when create/drop table/db from hms meta event (#21941 ) Supplement to #21104	2023-07-23 11:24:20 +08:00
caiconghui	8cb532230a	[fix](metric) fix prometheus metric format error (#22045 ) we should define metric name only once like following: # HELP doris_fe_query_latency_ms # TYPE doris_fe_query_latency_ms summary doris_fe_query_latency_ms{quantile="0.75"} 1.0 doris_fe_query_latency_ms{quantile="0.95"} 2.0 doris_fe_query_latency_ms{quantile="0.98"} 100.0 doris_fe_query_latency_ms{quantile="0.99"} 100.0 doris_fe_query_latency_ms{quantile="0.999"} 100.0 doris_fe_query_latency_ms{quantile="0.75",user="default_cluster:test1"} 1.0 doris_fe_query_latency_ms{quantile="0.95",user="default_cluster:test1"} 1.0 doris_fe_query_latency_ms{quantile="0.98",user="default_cluster:test1"} 1.0 doris_fe_query_latency_ms{quantile="0.99",user="default_cluster:test1"} 1.0 doris_fe_query_latency_ms{quantile="0.999",user="default_cluster:test1"} 1.0	2023-07-22 22:38:29 +08:00
amory	3d0f952934	[FIX](complex-type)delete enable_map/struct_type switch #21957	2023-07-22 15:29:32 +08:00
zhannngchen	50c8563f35	[fix](partial update) fix some bugs of sequence column (#21896 )	2023-07-22 15:26:48 +08:00
Jibing-Li	355ac18363	[Fix](jdbc catalog) Pass conjuncts to JdbcScanNode and FileScanNode before doing finalize. (#21998 ) JdbcScanNode needs the conjuncts to generate SQL in its finalize function, but the conjuncts had not yet been passed to JdbcScanNode when finalize was called. This PR passes the conjuncts to the scan node before they are used, to avoid scanning the whole table.	2023-07-22 14:08:44 +08:00
zy-kkk	42ec92fd12	[enhancement](jdbc catalog) Add sqlserver jdbc url param `useBulkCopyForBatchInsert=true` (#22032 ) When useBulkCopyForBatchInsert=false, the JDBC driver will not use SQL Server's Bulk Copy API for batch insertions. Thus, during the batch insertion process, each insert statement needs to be individually sent to the SQL Server, leading to a higher number of network roundtrips. Network latency could potentially become a significant factor contributing to performance degradation. For this reason, we recommend setting this parameter to true by default to enhance the performance of PreparedStatement batch insertions. In this manner, when performing batch insertions, the JDBC driver will send all insertion data to SQL Server in one go via the Bulk Copy API, rather than sending each insert statement individually. This can significantly reduce the number of network roundtrips, thereby improving performance. Please note that this option is only effective for fully parameterized INSERT statements. If your INSERT statement is mixed with other SQL statements, or if it contains values specified directly in the statement, then the JDBC driver will not use the Bulk Copy API, but instead will use the standard insert method.	2023-07-22 11:32:21 +08:00
Jibing-Li	82f5a3f684	[Fix] (multi catalog)Fix external table couldn't find db bug (#22074 ) The getDatabase functions of Nereids LogicalCatalogRelation and PhysicalCatalogRelation only search InternalCatalog to find a table. This causes every query on an external table to fail, because the external database cannot be found in the internal catalog. ``` mysql> explain select count(*) from multi_partition_orc; ERROR 1105 (HY000): AnalysisException, msg: Database [default_cluster:multi_partition] does not exist. ``` This PR uses the catalog name to find the correct catalog first, and then gets the database from that catalog.	2023-07-22 00:13:26 +08:00
starocean999	93f9a8cbf5	[fix](nereids)PredicatePropagation only support integer types for now (#22096 )	2023-07-21 23:40:08 +08:00
xzj7019	0b1c82b021	[opt](nereids) enhance runtime filter pushdown (#21883 ) Currently, runtime filters can't be pushed down into complicated plan patterns, such as a set operation as a join child, or a CTE sender as a filter before shuffling. This PR refines the pushdown so that the filter can be pushed recursively into different layers of the plan tree, such as nested subqueries, set operations, CTE senders, etc.	2023-07-21 23:31:30 +08:00
YueW	ef01988ae1	[opt](inverted index) support the same column create different type index (#21972 )	2023-07-21 23:02:39 +08:00
starocean999	acf4aa2818	[fix](planner)shouldn't force push down conjuncts for union statement (#22079 ) * [fix](planner)shouldn't force push down conjuncts for union statement	2023-07-21 21:12:56 +08:00
Mingyu Chen	85cc044aaa	[feature](create-table) support setting replication num for creating table operation globally (#21848 ) Add a new FE config `force_olap_table_replication_num`. If this config is larger than 0, the replication num of a table is forced to this value when the table is created. The default is 0, which has no effect. This config only affects the create OLAP table operation; other operations such as `add partition` and `modify table properties` are not affected. The motivation is that most regression test cases create tables with a single replica, which lets the regression tests run well in the p0 and p1 pipelines. But we also need to run these cases in a multi-backend Doris cluster, so we need test cases with multiple replicas, and it is hard to modify each test case. With this config, we can simply set it to create all tables with the specified replication number.	2023-07-21 19:36:04 +08:00
Siyang Tang	e489b60ea3	[feature](load) support line delimiter for old broker load (#22030 )	2023-07-21 19:31:19 +08:00
谢健	b76d0d84ac	[enhancement](Nereids) support other join framework in DPHyper (#21835 ) Implement the CD-A algorithm in order to support other joins in DPHyper. The algorithm details are in the paper "On the Correct and Complete Enumeration of the Core Search Space".	2023-07-21 18:31:52 +08:00
mch_ucchi	7cac36d9e8	[chore](Nereids) fix typo in some plan visitor (#21830 )	2023-07-21 18:22:20 +08:00
yujun	94e2c3cf0f	[fix](tablet clone) sched wait slot if has be path (#22015 )	2023-07-21 13:27:40 +08:00
bobhan1	74313c7d54	[feature-wip](autoinc)(step-3) add auto increment support for unique table (#22036 )	2023-07-21 13:24:41 +08:00
ZenoYang	6512893257	[refactor](vectorized) Remove useless control variables to simplify aggregation node code (#22026 ) * [refactor](vectorized) Remove useless control variables to simplify aggregation node code * fix	2023-07-21 12:45:23 +08:00
starocean999	fb5b412698	[fix](planner)fix bug of pushing conjuncts into inlineview (#21962 ) 1. markConstantConjunct method shouldn't change the input conjunct 2. Use Expr's comeFrom method to check if the pushed expr is one of the group by exprs, this is the correct way to check if the conjunct can be pushed down through the agg node. 3. migrateConstantConjuncts should substitute the conjuncts using inlineViewRef's analyzer to make the analyzer recognize the column in the conjuncts in the following analyze phase	2023-07-21 11:34:56 +08:00
谢健	b09c4d490a	[fix](test) should not create and read internal table when use mock cluster in UT (#21660 )	2023-07-21 11:30:26 +08:00
zhangdong	0b2b1cbd58	[improvement](multi-catalog)add last sync time for external catalog (#21873 ) Operations that update this time: 1. when refreshing a catalog, the lastUpdateTime of the catalog is updated 2. when refreshing a db, the lastUpdateTime of the db is updated 3. when reloading a table schema into the cache, the lastUpdateTime of the dbtable is updated 4. when receiving an add/drop table event, the lastUpdateTime of the db is updated 5. when receiving an alter table event, the lastUpdateTime of the table is updated	2023-07-21 09:42:35 +08:00
mch_ucchi	f3d9a843dd	[Fix](planner)fix ctas incorrect string types of the target table. (#21754 ) String types from the source table were replaced with the text type in the CTAS table; we change them to correspond to the source table.	2023-07-20 22:14:43 +08:00
mch_ucchi	a151326268	[Fix](planner)fix failed running alias function with an alias function in original function. (#21024 ) The following SQL failed to run: ```sql create alias function f1(int) with parameter(n) as dayofweek(hours_add('2023-06-18', n)) create alias function f2(int) with parameter(n) as dayofweek(hours_add(makedate(year('2023-06-18'), f1(3)), n)) select f2(f1(3)) ``` It throws an exception: f1 is not a builtin-function. This is because f2's original function contains f1, which is not a builtin function and would have to be rewritten first. We should avoid this for now; support will be added later.	2023-07-20 22:12:10 +08:00
Shiyuan Ji	ab11dea98d	[Enhancement](config) optimize behavior of default_storage_medium (#20739 )	2023-07-20 22:00:11 +08:00
slothever	7d488688b4	[fix](multi-catalog)fix minio default region and throw minio error msg, support s3 bucket root path (#21994 ) 1. check minio region, set default region if user region is not provided, and throw minio error msg 2. support read root path s3://bucket1 3. fix max compute public access	2023-07-20 20:48:55 +08:00
Jibing-Li	eabd5d386b	[Fix](multi catalog)Fix nereids context table always use internal catalog bug (#21953 ) The getTable function in CascadesContext only handles the internal catalog case (it tries to find the table only in the internal catalog and its dbs). However, it should take all the external catalogs into consideration; otherwise, it will fail to find a table or get the wrong table when querying an external table. This PR fixes this bug.	2023-07-20 20:32:01 +08:00
Jibing-Li	e4ac52b2aa	[Improvement](profile)Add init and finalize external scan node time in profile (#21749 ) Add more profile information for external table plan time. Including init and finalize scan node time, getSplits time, create scan range time, get all partitions time and get all files for all partitions time. Also modified the Indentation to make it easier to read. This is an example output of the new profile summary. ``` Execution Summary: - Analysis Time: 3ms - Plan Time: 26s885ms - JoinReorder Time: N/A - CreateSingleNode Time: N/A - QueryDistributed Time: N/A - Init Scan Node Time: 1ms - Finalize Scan Node Time: 26s868ms - Get Splits Time: 26s554ms - Get PARTITIONS Time: 20s189ms - Get PARTITION FILES Time: 6s289ms - Create Scan Range Time: 314ms - Schedule Time: 1s67ms - Fetch Result Time: 56ms - Write Result Time: 0ms - Wait and Fetch Result Time: 57ms ```	2023-07-20 20:29:18 +08:00
Hongshuai Zhang	0e8432526e	[fix](multi-catalog) check properties when alter catalog (#20130 ) Previously, altering a catalog did not verify the catalog's new parameters; this PR adds verification. My changes: when a catalog is altered, a full check is carried out, and if an exception occurs, the parameters are rolled back.	2023-07-20 20:18:14 +08:00
minghong	aabe379527	[fix](stats) support utf-8 string range compare (#22024 ) In the previous version, some UTF-8 string literals were mapped to negative doubles, which made our range check malfunction.	2023-07-20 18:39:41 +08:00
morrySnow	ee65e0a6b1	[fix](Nereids) should not remove any limit from uncorrelated subquery (#21976 ) We should not remove any limit from uncorrelated subquery. For Example ```sql -- should return nothing, but return all tuple of t if we remove limit from exists SELECT * FROM t WHERE EXISTS (SELECT * FROM t limit 0); -- should return the tuple with smallest c1 in t, -- but report error if we remove limit from scalar subquery SELECT * FROM t WHERE c1 = (SELECT * FROM t ORDER BY c1 LIMIT 1); ```	2023-07-20 18:37:04 +08:00
bobhan1	367ad9164a	[feature-wip](auto-inc)(step-2) support auto-increment column for duplicate table (#19917 )	2023-07-20 18:03:39 +08:00
shuke	be2754e1a2	[fuzzy](modify) enable pipeline and nereids in regression env by default (#21824 ) enable pipeline and nereids in regression env by default	2023-07-20 17:12:21 +08:00
谢健	4cfe990095	[enhancement](Nereids) add test framework for otherjoin (#21887 )	2023-07-20 16:35:55 +08:00
HonestManXin	365afb5389	[fix](sparkdpp) Hive table properties not take effect when create spark session (#21881 ) When creating a Hive external table for Spark loading, the Hive external table includes related information such as the Hive Metastore. However, when submitting a job, it is required to have the hive-site.xml file in the Spark conf directory; otherwise, the Spark job may fail with an error message indicating that the corresponding Hive table cannot be found. The SparkEtlJob.initSparkConfigs method sets the properties of the external table into the Spark conf. However, at this point, the Spark session has already been created, and the Hive-related parameters will not take effect. To ensure that the Spark Hive catalog properly loads Hive tables, you need to set the Hive-related parameters before creating the Spark session. Co-authored-by: zhangshixin <zhangshixin@youzan.com>	2023-07-20 14:36:00 +08:00
starocean999	86d7233b06	[fix](nereids) ExtractAndNormalizeWindowExpression rule should push down correct exprs to child (#21827 ) Consider the window function: ```sql substr( ref_1.cp_type, sum(CASE WHEN ref_1.cp_type = 0 THEN 3 ELSE 2 END) OVER (), 1) ``` Before this PR, only "CASE WHEN ref_1.cp_type = 0 THEN 3 ELSE 2 END" was pushed down. But both "ref_1.cp_type" and "CASE WHEN ref_1.cp_type = 0 THEN 3 ELSE 2 END" should be pushed down. This PR fixes it.	2023-07-20 11:47:55 +08:00
Kaijie Chen	0f116ce148	Revert "[Enhancement](Nereids)enable nereids DML by default. (#21539 )" (#22013 ) This reverts commit f668b3965effbd5df4902f20b496cb6b6642414c.	2023-07-20 11:32:54 +08:00
morrySnow	a859a93b22	[fix](Nereids) should not push down project to the nullable side of outer join (#21999 )	2023-07-20 11:10:11 +08:00
Siyang Tang	3d0832d973	[fix](stmt-forward) fix forward null packet (#21979 )	2023-07-20 10:45:16 +08:00
Mingyu Chen	5c7c4d90b4	[improvement](catalog) return the root cause of error when forwarding init request to master FE (#22001 )	2023-07-20 10:42:29 +08:00
lihangyu	e7f143c266	[Fix](topn opt) forbid outfile when using 2phase read (#21991 ) "Enabling two-phase query for similar select * from tbl into outfile "file:/xxx/" format as orc; queries can lead to performance issues due to the fetch operation."	2023-07-20 10:32:30 +08:00
zhangstar333	c364196577	[fuzzy](test) set topnOptLimitThreshold to 0 in fuzzy test temporarily (#21952 ) The P0 pipeline test currently has some failed cases about topn that can't be reproduced locally, so set this threshold to 0 temporarily.	2023-07-20 10:22:22 +08:00
Jack Drogon	28a6a2e44d	[Enhancement](binlog) Add partitionRange && indexIds in UpsertRecord && PartitionCommitInfo (#22005 )	2023-07-20 09:52:21 +08:00
zy-kkk	2daad2151d	[enhancement](jdbc catalog) Add mysql jdbc catalog function to filter push-down identification (#21745 )	2023-07-19 23:48:23 +08:00
LiBinfeng	58f2593ba1	[Fix](Nereids) Add cast comparison with slot reference when inferring predicate (#21171 ) Problem: When inferring predicates, we assume that slot references need to be inferred. But in this case: create table tb1(l1 smallint) ...; create table tb2(l2 int) ...; select * from tb1 inner join tb2 where tb1.l1 = tb2.l2 and tb2.l2 = 1; we cannot get the filter tb1.l1 = 1, because a cast is added to l1: (Cast smallint to int l1) = l2. Solved: Take casts into consideration when inferring predicates, and also handle cast expressions when judging equality with a slot reference. But inferring a predicate through a cast from a bigger type to a smaller type is a logical error. For example: select * from tb1 inner join tb2 where tb1.l1 = cast(tb2.l2 as smallint) and tb2.l2 = (number between smallint max and intmax); the tb2.l2 value cannot be inferred to the left side because tb1.l1 would be a false value, and if we add one more condition like tb1.l1 = tb3.l3 (smallint), it would cause this predicate to be false.	2023-07-19 23:14:26 +08:00
zhangstar333	a51aab6d29	[FE](compile) fix master fe compile failed (#21971 ) fix master fe compile failed	2023-07-19 18:02:00 +08:00
jakevin	0fa3efae1d	[fix](Nereids): removePhysicalExpression() should clear empty Group. (#21951 )	2023-07-19 14:41:06 +08:00
minghong	bd40767754	[stats](nereids) dump col stats for all physical plan node and cost details in memo #21902 1. print cost detail 2. dump col stats in memo	2023-07-19 14:10:26 +08:00
mch_ucchi	f668b3965e	[Enhancement](Nereids)enable nereids DML by default. (#21539 ) TODO: fix cast agg_state type when do insert	2023-07-19 13:52:15 +08:00
Xiaocc	d8272b16e9	[fix](fe) fd leak of ssl #19645	2023-07-19 12:45:54 +08:00
morrySnow	d987f782d2	[refactor](Nereids) refactor cte analyze, rewrite and reuse code (#21727 ) REFACTOR: 1. Generate CTEAnchor, CTEProducer, CTEConsumer when analyzing. For example, take the statement `WITH cte1 AS (SELECT * FROM t) SELECT * FROM cte1`. Before this PR, we got an analyzed plan like this: ``` logicalCTE(LogicalSubQueryAlias(cte1)) +-- logicalProject() +-- logicalCteConsumer() ``` We only have LogicalCteConsumer in the plan, but not LogicalCteProducer. This is not a valid plan, and should not be the final result of analysis. After this PR, we get an analyzed plan like this: ``` logicalCteAnchor() |-- logicalCteProducer() +-- logicalProject() +-- logicalCteConsumer() ``` This is a valid plan with LogicalCteProducer and LogicalCteConsumer. 2. Replace re-analyzing the unbound plan with a deepCopy of the plan when doing CTEInline. Because we generate LogicalCteAnchor and LogicalCteProducer when analyzing, we can no longer re-analyze to generate the CTE inline plan. Another reason is that we reuse the relation id between unbound and bound relations. So, if we re-analyze an unresolved CTE plan, we will get two relations with the same RelationId. This is wrong, because we use RelationId to distinguish two different relations. This PR implements two helper classes to deep copy a new plan from CTEProducer: `LogicalPlanDeepCopier` and `ExpressionDeepCopier`. 3. New rewrite framework to ensure CTEInline is done in the right way. Before this PR, we did CTEInline before applying any rewrite rule. But sometimes, some CteConsumers could be eliminated after rewrite. After this PR, we do CTEInline after the plans relying on CTEProducer have been rewritten. So we can do CTEInline if the count of CTEConsumers decreases below the threshold of CTEInline. 4. add relation id to all relation plan nodes 5. let all relations generated from a table implement the trait CatalogRelation 6. reuse relation id between the unbound relation and the relation after bind ENHANCEMENT: 1. Pull up CTEAnchor before RBO to avoid breaking other rules' patterns. Before this PR, we generated CTEAnchor and LogicalCTE in the middle of the plan. So all rules had to process LogicalCTEAnchor, otherwise they would generate an unexpected plan. For example, push down filter and push down project had to add patterns like: ``` logicalProject(logicalCTE) ... logicalFilter(logicalCteAnchor) ... ``` Project and filter must be pushed through these virtual plan nodes to ensure all projects and filters can be merged together and end up in the right order. For example: ``` logicalProject +-- logicalFilter +-- logicalCteAnchor +-- logicalProject +-- logicalFilter +-- logicalOlapScan ``` The plan above will lead to a translation error, because we cannot do filter and project twice on the bottom logicalOlapScan. BUGFIX: 1. Recursively analyze LogicalCTE to avoid binding an outer relation on an inner CTE. For example ```sql SELECT * FROM (WITH cte1 AS (SELECT * FROM t1) SELECT * FROM cte1)v1, cte1 v2; ``` Before this PR, we used the nested cte name to bind the outer plan. So the outer cte1 with alias v2 was bound on the inner cte1. After this PR, the sql will throw a Table not exists exception when binding. 2. Do withChildren in CTEProducer the right way and remove projects from it. Before this PR, we added an attr named projects in CTEProducer to represent its output. This is because we could not get its correct output by calling the `getOutput` method on it. The root cause is the wrong implementation of computeOutput of LogicalCteProducer. This PR fixes this problem and removes the projects attr of CTEProducer. 3. The adjust nullable rule updates CTEConsumer's output by CTEProducer's output. This PR processes nullable on LogicalCteConsumer to ensure CteConsumer's output has the right nullable info if the CteProducer's output nullable has been adjusted. 4. Binding a set operation expression should not change the nullable of the children's output. This PR fixes a problem introduced by previous PR #21168. The nullable info of SetOperation's children should not be changed after binding SetOperation.	2023-07-19 11:41:41 +08:00
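
The message of commit 8cb532230a (#22045) above states that a Prometheus metric name should be defined only once, with a single `# HELP` / `# TYPE` pair followed by all of its samples. The following is a minimal Python sketch of that grouping, not the actual Doris FE code; the function name `render_summary` and the `(name, labels, value)` tuple shape are hypothetical and used only for illustration.

```python
from collections import OrderedDict


def render_summary(samples, help_text=""):
    """Render summary samples in Prometheus text format.

    `samples` is a list of (metric_name, labels_dict, value) tuples
    (a hypothetical shape chosen for this sketch). Samples are grouped
    by metric name so the HELP and TYPE lines appear once per name.
    """
    grouped = OrderedDict()
    for name, labels, value in samples:
        grouped.setdefault(name, []).append((labels, value))

    lines = []
    for name, entries in grouped.items():
        # Emit the metadata lines exactly once per metric name.
        lines.append(f"# HELP {name} {help_text}".rstrip())
        lines.append(f"# TYPE {name} summary")
        for labels, value in entries:
            label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
            lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [
        ("doris_fe_query_latency_ms", {"quantile": "0.75"}, 1.0),
        ("doris_fe_query_latency_ms", {"quantile": "0.95"}, 2.0),
        ("doris_fe_query_latency_ms",
         {"quantile": "0.75", "user": "default_cluster:test1"}, 1.0),
    ]
    print(render_summary(demo), end="")
```

Grouping before rendering keeps the per-user samples under the same metadata block as the global ones, which is the layout shown in the commit message.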


5278 Commits