doris

Author	SHA1	Message	Date
minghong	be8fb68712	[fix](nereids)distribute node missing rows and cost #20943 in dumped memo, distribute node missed estimated rows and cost.	2023-06-19 23:42:01 +08:00
minghong	f20ef165fe	[opt](Nereids) update join stats derive (#20895 ) in hash join condition, some equals are trustable, some are not. an equal is trustable if one side is almost unique, like primary key. for such equal condition we could estimate more accurate. the problem is in rewriten q20, the are 2 equal condition, one is trustable, another is not. But we treat both of them as trustable. Test result: on tpch100, from 2.2 sec to 0.44 sec no impact on tpch other queries no performance impact on tpcds queries	2023-06-19 23:40:44 +08:00
AKIRA	010861b7ec	[enhancement](Nereids) Don't write to table_statistics when create sync analyze job anymore #20956 1. Don't write to table_statistics when create sync analyze job anymore since it's meaningless 2. Capture exceptions when creating each system analyze job to avoid the failure of creation of all automatic collection jobs due to a single job creation failure. 3. Mark auto triggered period job's job type as system	2023-06-19 20:00:41 +08:00
morrySnow	08ac55291f	[opt](Nereids) change log level to debug to avoid log explode (#20954 )	2023-06-19 18:50:06 +08:00
Mingyu Chen	415f1053a4	[minor](progress) do not update progress if job id is not set (#20949 )	2023-06-19 18:13:43 +08:00
Xiangyu Wang	63b9684696	[Feature](broker-load) Support priority for Broker Load job. (#20628 ) Support priority for Broker Load job.	2023-06-19 14:16:48 +08:00
yongjinhou	26cca5e00a	[Enhancement](tvf) Add frontends table-valued-function (#20857 )	2023-06-19 13:57:40 +08:00
Mingyu Chen	170cc46b12	[fix](hive) check hive transactional table's file format (#20888 ) Sometimes we meet a hive table with parameter: "transactional" = "true", but format is parquet, which is not supported. So we need to check the input format for transactional table.	2023-06-19 12:33:00 +08:00
starocean999	4c6f1de42b	[feature](nereids) enable nereids planner by default (#20761 ) 1. enable nereids planner by default 2. add fuzzy test on enable_nereids_planner 3. fix some ut that result different under legacy planner and nereids	2023-06-19 12:24:47 +08:00
Pxl	85c5d7c6a9	[Chore](materialized-view) add ssb_flat mv test case (#20869 ) add ssb_flat mv test case	2023-06-19 10:51:50 +08:00
Kang	0bed52d86b	[bugfix](inverted index) fix not writing inverted index file if upgrade from old version #20945 The metadata indexes in MaterializedIndexMeta is introduced in 2.0-beta version and it's relied by writing inverted index. When upgrade doris from old version to 2.0-beta, the metadata indexes in MaterializedIndexMeta is empty and no inverted index file will be written. This PR add compatibility logic to use metadata indexes in table if indexes in MaterializedIndexMeta is empty and indexes in MaterializedIndexMeta indexId is equal to table baseIndexId. This PR also fix metadata indexes missing for schema change.	2023-06-19 09:11:36 +08:00
Zhiyu Hu	1efd345963	[Enhancement](table) adding information_schema.parameters table (#20259 ) this is a virtual table for compatibility information_schema parameters table	2023-06-19 09:05:46 +08:00
slothever	8af77315e0	[fix](multi-catalog)fix s3 check, complete catalog properties (#20591 ) stability and some fixes 1. fix s3 availability check 2. add independent minio properties 3. add job conf cache 4. remove extra s3 propertie when convert catalog properties 5. add some ut case to check conveted properties	2023-06-18 20:21:59 +08:00
Siyang Tang	8366ce7a81	[enhancement](insert-stmt) Make `insert into tbl values();` compatible with mysql (#20694 )	2023-06-18 19:56:07 +08:00
jakevin	ac3290021d	[fix](Nereids): MergeSetOperations can merge SetOperation ALL. (#20902 )	2023-06-18 17:49:03 +08:00
mch_ucchi	5ae14549d1	[Feature](Nereids) support delete using syntax to delete data from unique key table (#20452 )	2023-06-18 16:22:21 +08:00
morrySnow	905ba3fa86	[fix](Nereids & planner) fix 3 plan errors about mv selection (#20939 ) fix 3 issues about mv selection: 1. in legacy planner, should not consider aux expr when do mv selection 2. in Nereids, should not hit mv when agg function on value column is distinct 3. select mv cannot rewrite agg function in on table with column name in uppercase	2023-06-18 12:45:34 +08:00
morrySnow	c6d78c2d7b	[fix](Nereids) cannot get output when partitioned table without any parititon (#20937 )	2023-06-18 12:44:36 +08:00
GoGoWen	e5015b472e	[Fix](Error) fix some typo error in code (#20612 )	2023-06-17 08:34:26 +08:00
zy-kkk	fe18cfa2fb	[improvement](pg jdbc)Support for automatically obtaining the precision of the postgresql timestamp type (#20909 )	2023-06-16 23:41:09 +08:00
zy-kkk	367f64e7bd	[improvement](jdbc) support insert autoinc and default value column to mysql (#20765 ) In JdbcMysqlClient, I've added methods to retrieve auto-increment and default value columns from MySQL. These columns are then mapped into Doris metadata to make them visible to users. When handling the InsertStmt into an execution plan, Doris used to automatically fill in NULL or default values for columns not specified in the InsertStmt. However, in the JDBC catalog, we don't need Doris to handle these unspecified columns, so I've made changes to skip them directly. For the insert prepared statement required for writing, our previous behavior was to obtain all columns for placeholders. So, the change I made is to pass in the columns processed by the execution plan during the sink task generation stage for dynamic generation.	2023-06-16 23:38:11 +08:00
zy-kkk	e834637a5b	[improvement](ck jdbc) Support for automatically getting the precision of clickhouse's datetime64 type (#20887 )	2023-06-16 23:37:30 +08:00
minghong	bf197ee8d2	[opt](nereids) adjust cost model for BroadCastJoin and PartitionJoin (#20713 ) we add penalty for broadcast join (bc for brief in the following). the intuition of penalty is as follow: 1. if the build side is very small (< 1M), we prefer bc, and set `penalty=1`, which means no penalty 2. if build side is more than 1M, we consider the ratio of the probe row count to the build row count. the less the ratio is, the higher penalty is. this pr has positive impact on tpch queries. Only q3 is changed. in out test (tpch 1T, 3BE) q3 improved from 5.1sec to 2.5 sec. this pr has positive impact on tpcds queries. test on tpcds sf100 (3BE), cold run improve from 163 sec to 156 sec, hot run improves from 155 sec to 149 sec	2023-06-16 22:49:04 +08:00
morrySnow	bb4f10b457	[fix](Nereids) lost having when analyze sort-having-agg (#20914 )	2023-06-16 22:13:28 +08:00
morrySnow	7ee744ff5a	[opt](Nereids) add more unexpected expression check (#20901 ) 1. check sub-query after rewrite, should not meet any sub-query there 2. check below expression on specific plan - Aggreagate - TableGeneratingFunction - Filter - AggregateFunction - GroupingScalarFunction - TableGeneratingFunction - WindowExpression - Generate - AggregateFunction - GroupingScalarFunction - WindowExpression - Having - TableGeneratingFunction - WindowExpression - Join - AggregateFunction - GroupingScalarFunction - TableGeneratingFunction - WindowExpression - Project - TableGeneratingFunction - Sort - AggregateFunction - GroupingScalarFunction - TableGeneratingFunction - WindowExpression - Window - GroupingScalarFunction - TableGeneratingFunction	2023-06-16 22:12:39 +08:00
xzj7019	ab32299ba4	[feature](nereids) Support multi target rf #20714 Support multi target runtime filter, mainly for set operation, such as union/intersect/except.	2023-06-16 20:26:00 +08:00
YueW	1cc611a913	[fix](match) fix regression case test_index_match_select and test_index_match_phrase (#20860 ) 1. add more checks for match expression in nereids: - match expression only support in filter - match expression left child and right child must all be string type - left child for match expression must be sloftRef, right child for match expression must be Literal 2. to fix regression case test_index_match_select and test_index_match_phrase	2023-06-16 20:18:29 +08:00
morrySnow	5dc0f90c7f	[opt](Nereids) revert convert IN with 2 options to OR expression rule (#20894 ) revert this rule because it has negative effect on predicate push-down-to-storage-layer	2023-06-16 19:11:37 +08:00
jakevin	6cde7bc8ad	[feature](Nereids) just reserve logical expression in memo after do dphyp (#20843 ) After DPHyp, clear all physicalExpression and other, just keep logicalExpression as original plan as input of cascades optimize.	2023-06-16 18:12:39 +08:00
starocean999	536bf56a35	[fix](planner) strip trailing zeros when cast string literal to decimal literal (#20743 )	2023-06-16 17:08:36 +08:00
AKIRA	e63739e729	[fix](nereids) add regression test for stats analyze and fix some bugs (#20865 ) 1. Add regression test case for analyze to make sure show/drop/analyze stats would work as expected 2. Remove useless code, which would block the clean for expired stats 3. Fix bug of DropStats, before this PR drop the whole table stats would casuse a NPE exception when parsing stmt	2023-06-16 16:43:49 +08:00
mch_ucchi	0bd9aecfcc	[Fix](planner&Nereids) fold constant invalid yyyy-mm-31 to the last day of month incorrectly (#20685 ) currently, expression: cast('20230631' as date) will be evaluate to 2023-06-30 incorrectly, and '20230632' will be null, we fix the bug and evaluate all the invalid date to null.	2023-06-16 16:23:02 +08:00
amory	c3b9e99350	[fix](regress-test)update config for disable_nested_complex_type (#20735 )	2023-06-16 15:51:41 +08:00
starocean999	ccfd6f1d23	[fix](nereids) only setHasColocatePlanNode when currentFragment's data partition is not RANDOM (#20885 )	2023-06-16 14:54:22 +08:00
Yongqiang YANG	1a62c79970	[fix](replica) do not delete the only one replica (#20872 )	2023-06-16 13:57:18 +08:00
Jack Drogon	9d41edd9eb	[Feature](binlog) Add binlog gc && Auth master_token (#20854 ) Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>	2023-06-16 11:25:11 +08:00
jakevin	21bb7fea07	[minor](Nereids): convert if condition into a check (#20855 )	2023-06-15 20:10:56 +08:00
zy-kkk	d9b3c2aba2	[improvement](jdbc) support support get mysql information_schema's table and clickhouse system's table (#20768 )	2023-06-15 14:53:51 +08:00
Pxl	01e53f4e67	[Bug](materialized-view) fix problems about create mv on ssb_flat q4.1 failed (#20658 ) fix problems about create mv on ssb_flat q4.1 failed	2023-06-15 14:38:21 +08:00
Jibing-Li	31c17f1088	[improvement](tvf)Support hdfs and s3 tvf for nereids (#20829 ) Support hdfs and s3 tvf for nereids.	2023-06-15 10:30:09 +08:00
weizuo93	71e8cb061c	[Log](query) Add log of fragment instance num for query (#20597 ) Co-authored-by: weizuo <weizuo@xiaomi.com>	2023-06-15 09:52:13 +08:00
zy-kkk	09d187ec77	[improvement](ck jdbc) Optimized reading of datetime and ip types of the ClickHouse JDBC Catalog (#20804 )	2023-06-14 23:28:08 +08:00
Jibing-Li	f82e43b96a	[Improvement](jdbc external table)Support jdbc external table for nereids. (#20799 ) Nereids planner only support JDBC external catalog, this pr is to support JDBC external table for nereids.	2023-06-14 23:25:43 +08:00
morrySnow	7ed03f6b86	[fix](Nereids) EmptySetRelation should be Gather not Any (#20801 )	2023-06-14 19:24:33 +08:00
YueW	1c9f107185	[feature](nereids) support match syntax (#20781 ) Support match syntax in nereids. match syntax use like: ```sql select * from test where msg match "hello"; select * from test where msg match_any "hello"; select * from test where msg match_all "hello hi"; select * from test where msg match_phrase "hello world"; ``` `match` is same as `match_any`. the pr of match syntax in original planner: https://github.com/apache/doris/pull/14211	2023-06-14 17:30:27 +08:00
Ashin Gau	062641e8f8	[fix](hudi) set default class loader for hudi serializer (#20680 ) hudi serializer `org.apache.hudi.common.util.SerializationUtils$KryoInstantiator.newKryo` throws error like `java.lang.IllegalArgumentException: classLoader cannot be null`. Set the default class loader for scan thread. ``` public Kryo newKryo() { Kryo kryo = new Kryo(); ... // Thread.currentThread().getContextClassLoader() returns null kryo.setClassLoader(Thread.currentThread().getContextClassLoader()); ... return kryo; } ```	2023-06-14 16:02:56 +08:00
morrySnow	54d42244fe	[feature](Nereids) add cbo rewrite framework (#20746 ) The changes in this PR: 1. rename BatchRewriteJob to AbstractBatchJobExecutor 2. add a new rewrite job type, CostBasedRewriteJob. It receive a RewriteJob as input, compare the cost of two candidate plans using or not using the input RewriteJob and return the lower cost plan as the rewrite result. 3. do some small refactor on NereidsPlanner for better abstraction 4. do some refactor on dir structure of Nereids The usage of cbo rewrite framework: if you want let a rule or a rule list to be run in cbo rewrite frame work, you just need to wrap the rule / rule list with costBased function of class Rewriter, for example ```java ... costBased( custom(RuleType.AGG_SCALAR_SUBQUERY_TO_WINDOW_FUNCTION, AggScalarSubQueryToWindowFunction::new) ), ... ```	2023-06-14 15:57:42 +08:00
caiconghui	bcf103e993	[enhancement](log4j) support high performance mode for log4j to escape potential bottleneck for doris read and write (#20759 ) As we know, log4j2 some times may be bottleneck in doris fe when there are many logs to be output in sync mode while asynchronous logging has a better performance， and we find that capturing caller location has a similar impact across all logging libraries, and slows down asynchronous logging by about 30-100x. so, here we provide three log mode for log4j2 to meet the needs of different users. refer to https://logging.apache.org/log4j/2.x/performance.html	2023-06-14 15:16:04 +08:00
AKIRA	f707dc9395	[fix](stats) Fix NPE when analyze database sync (#20775 )	2023-06-14 15:01:02 +08:00
Gabriel	20ac940711	[Bug](pipeline) fix bug for file scan node on pipeline engine (#20763 )	2023-06-14 12:52:56 +08:00

1 2 3 4 5 ...

4961 Commits