Support match syntax in Nereids.
The match syntax is used as follows:
```sql
select * from test where msg match "hello";
select * from test where msg match_any "hello";
select * from test where msg match_all "hello hi";
select * from test where msg match_phrase "hello world";
```
`match` is the same as `match_any`.
The PR that added match syntax to the original planner: https://github.com/apache/doris/pull/14211
The hudi serializer `org.apache.hudi.common.util.SerializationUtils$KryoInstantiator.newKryo` throws an error like `java.lang.IllegalArgumentException: classLoader cannot be null`. This PR sets the default class loader for the scan thread.
```java
public Kryo newKryo() {
Kryo kryo = new Kryo();
...
// Thread.currentThread().getContextClassLoader() returns null
kryo.setClassLoader(Thread.currentThread().getContextClassLoader());
...
return kryo;
}
```
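A minimal sketch of the described fix, assuming the scan runs in a task whose thread we control (the class and method names below are illustrative, not the actual Doris code): give the scan thread a non-null context class loader before any Hudi Kryo deserialization runs on it.
```java
// Hypothetical sketch: ensure the scan thread has a non-null context
// class loader so KryoInstantiator.newKryo() does not fail.
public class HudiScanTask implements Runnable {
    @Override
    public void run() {
        Thread current = Thread.currentThread();
        if (current.getContextClassLoader() == null) {
            // Fall back to the loader that loaded this class.
            current.setContextClassLoader(HudiScanTask.class.getClassLoader());
        }
        // ... perform the Hudi scan, which may deserialize via Kryo ...
    }
}
```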
The changes in this PR:
1. rename BatchRewriteJob to AbstractBatchJobExecutor
2. add a new rewrite job type, CostBasedRewriteJob. It receives a RewriteJob as input, compares the costs of the two candidate plans produced with and without the input RewriteJob, and returns the lower-cost plan as the rewrite result.
3. do some small refactoring on NereidsPlanner for better abstraction
4. refactor the directory structure of Nereids
Usage of the cost-based rewrite framework:
If you want a rule or rule list to run in the cost-based rewrite framework, just wrap the rule / rule list with the costBased function of the Rewriter class, for example:
```java
...
costBased(
custom(RuleType.AGG_SCALAR_SUBQUERY_TO_WINDOW_FUNCTION,
AggScalarSubQueryToWindowFunction::new)
),
...
```
As we know, log4j2 can sometimes be a bottleneck in Doris FE when many logs are output in sync mode, while asynchronous logging performs much better. We also find that capturing caller location has a similar impact across all logging libraries, slowing down asynchronous logging by about 30-100x. So here we provide three log modes for log4j2 to meet the needs of different users.
Refer to https://logging.apache.org/log4j/2.x/performance.html for details.
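For context, a minimal sketch of how fully asynchronous loggers are enabled in stock log4j2 (the standard log4j2 mechanism, not necessarily the exact wiring used by Doris FE; it requires the LMAX Disruptor on the classpath):
```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class AsyncLoggingDemo {
    public static void main(String[] args) {
        // Must be set before the first logger is created; switches all
        // loggers to the asynchronous implementation.
        System.setProperty("log4j2.contextSelector",
                "org.apache.logging.log4j.core.async.AsyncLoggerContextSelector");
        Logger logger = LogManager.getLogger(AsyncLoggingDemo.class);
        // Note: using %C, %M, %L or %location in the layout pattern forces
        // a stack walk per event, which erases most of the async speedup.
        logger.info("asynchronous logging enabled");
    }
}
```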
1. cast from string literal to date-like types should not be an implicit cast
2. the string representation of float-like types should not use scientific notation
3. the data type of the like function's regex expr should be string type even if it is a null literal
4. add -Xss4m in fe.conf to prevent stack overflow in some cases
This PR refactors common methods that were originally located in the ODBC classes but were used by the JDBC classes. These methods have been moved into the JDBC classes to improve code readability and maintainability.
In addition, we have disabled the creation of ODBC external tables by default. This does not affect existing ODBC usage: you can still enable ODBC external tables through the enable_odbc_table setting. Please be aware that we plan to remove ODBC external tables completely in a future version, so we recommend using the JDBC Catalog instead.
The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as: java-common, java-udf, jdbc-scanner, hudi-scanner, paimon-scanner.
Co-authored-by: lexluo <lexluo@tencent.com>
After supporting insert-only transactional hive tables in #19518 and #19419, this PR supports transactional hive full acid tables.
Hive3 transactional full acid tables are supported.
Hive2 transactional full acid tables need to have major compactions run first.
When executing a routine load job, a StackOverflowException may be thrown.
This is because the exprs in the column setting list are analyzed for each routine load sub-task,
and there is a self-reference bug that may cause an endless loop when analyzing an expr.
The following columns expr list may trigger this bug:
```
columns(col1, col2,
col2=null_or_empty(col2),
col1=null_or_empty(col2))
```
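Here `col2` is redefined in terms of itself, so naive expansion of the column mapping never terminates. A minimal standalone sketch of the loop (hypothetical code, not the actual Doris analyzer):
```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: expanding col2 = null_or_empty(col2) substitutes
// col2 with an expression that again contains col2, so the recursion
// never terminates.
public class SelfReferenceDemo {
    static String expand(String col, Map<String, String> mapping) {
        String expr = mapping.get(col);
        if (expr == null) {
            return col; // plain source column, nothing to expand
        }
        for (String ref : mapping.keySet()) {
            if (expr.contains(ref)) {
                // Recurses forever when expr for col2 references col2 itself.
                expr = expr.replace(ref, expand(ref, mapping));
            }
        }
        return expr;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new HashMap<>();
        mapping.put("col2", "null_or_empty(col2)");
        expand("col2", mapping); // throws StackOverflowError, mirroring the bug
    }
}
```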
This fix has been verified by the user, but I can't add a regression test for this case, because I can't submit a routine load job
in our regression tests, and this bug can only be triggered by routine load.
This PR impacts the TPC-H q16 Agg strategy, but causes no performance issue.
This PR improves TPC-DS sf100:
before:
- cold: 141 sec
- hot: 133 sec
after:
- cold: 137 sec
- hot: 128 sec
Issue Number: close #20669
RewriteInPredicateRule may cast an InPredicate expr's two children to the same type. For example, for `where cast(age as char) in ('11')` where the type of age is int, RewriteInPredicateRule will cast both children of the expr to int. In the example above, child 0 will have this structure:
```
child 0: type: int
|--- child: type : char
|-- child: type : int
```
Because RewriteInPredicateRule casts the type of the expr to int, the stmt is reanalyzed; but the stmt is reset before reanalysis, and the reset operation changes child 0 to this structure:
```
child: type : char
|-- child: type : int
```
As a result, both children are cast to varchar in the function castAllToCompatibleType, and the logic of RewriteInPredicateRule becomes useless.
In 1.1-lts and 1.2-lts, a case like `where cast(age as char) in ('11')` can't work at all, because castAllToCompatibleType casts int to char, and int can't be cast to char (master works because castAllToCompatibleType casts int to varchar in this case):
```
MySQL [test]> select user_id from test_cast where cast(age as char) in ('45');
ERROR 1105 (HY000): errCode = 2, detailMessage = type not match, originType=INT, targeType=CHAR(*)
```
Currently, many profiles use add child profile to organize the profile into blocks. But this is wrong: a child profile carries a total time counter, while what we actually need is just a label, for example:
- MemoryUsage:
- HashTable: 23.98 KB
- SerializeKeyArena: 446.75 KB
This PR adds a new macro, ADD_LABEL_COUNTER, to add just a label to the profile.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Currently, we check that an OLAP table's state is NORMAL outside the write lock scope, so the table state may change to an abnormal one while we perform the alter operation.
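A minimal sketch of the fix pattern (a hypothetical Table class, not the actual Doris code): the state check must happen inside the write lock, otherwise another thread can change the state between the check and the alter.
```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration of checking table state under the write lock.
class Table {
    enum State { NORMAL, SCHEMA_CHANGE }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private State state = State.NORMAL;

    void alter() {
        lock.writeLock().lock();
        try {
            // Check under the lock: no other thread can change the
            // state between this check and the alter below.
            if (state != State.NORMAL) {
                throw new IllegalStateException("table state is " + state);
            }
            state = State.SCHEMA_CHANGE; // ... perform the alter ...
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```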
---------
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
The formula used to compute ndv after a filter implies that the new rowCount is smaller than the original rowCount. When we apply this formula to a join, we should add a branch for the case where the new row count is bigger than the original row count:
when the new row count is bigger, the ndv is not changed.
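A minimal sketch of the guarded scaling (the concrete formula in Doris may differ; the proportional shrink below is only illustrative):
```java
// Hypothetical illustration: ndv is only scaled down when the row count
// shrinks; a join can enlarge the row count, in which case ndv is kept.
static double ndvAfter(double ndv, double oldRowCount, double newRowCount) {
    if (newRowCount >= oldRowCount) {
        return ndv; // row count grew (e.g. under a join): ndv unchanged
    }
    return ndv * (newRowCount / oldRowCount); // shrink under a filter
}
```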
Support collecting statistics for HMS external tables with specific partitions. Add session variables to limit the partitions scanned when collecting whole-table row count and column statistics.
PR https://github.com/apache/doris/pull/19909 implemented the framework of the hudi reader for MOR tables. This PR completes all functions of reading MOR tables and enables end-to-end queries.
Key implementations:
1. Use hudi meta information to generate the table schema, instead of getting it from the hive client.
2. Use the hive client to list hudi partitions, so this strongly depends on the sync tool (https://hudi.apache.org/docs/syncing_metastore/) which syncs the partitions of hudi into the hive metastore. However, we may later get the hudi partitions directly from the .hoodie directory.
3. Remove `HudiHMSExternalCatalog`, because other catalogs like glue are compatible with the hive catalog.
4. COW tables are still read natively in C++ as before.
5. The hudi RecordReader uses ProcessBuilder to start a hotspot debugger process, which may get stuck when attaching to the original JNI process, so I use a tricky method to kill this useless process.
We have some pruning-path logic in the cascades framework. However, it does not work as we expected: if we prune one Group, we may need to run optimization on its parent thousands of times without any successful result. This PR removes this pruning provisionally. We will add pruning back when we re-design it.
If a BE crashed, the error would be logged and the analysis task would be marked as finished, which is incorrect.
In this PR, the analysis task is updated according to the query state.