doris

Author	SHA1	Message	Date
lihangyu	50c1d55769	[Improve](dynamic schema) support filtering invalid data (#21160) * [Improve](dynamic schema) support filtering invalid data 1. Support filtering illegal data with dynamic schema. 2. Expand the regular expression for ColumnName to support more column names. 3. Be compatible with PropertyAnalyzer and support legacy tables. 4. Disable parsing of multi-dimensional arrays by default, since some bugs are unresolved.	2023-06-26 19:32:43 +08:00
airborne12	1ac8cdec7e	[Fix](inverted index) fix inverted query cache for chinese tokenizer (#21106) 1. The query cache for the Chinese tokenizer is confusing when w_char is simply converted to char. 2. Separate query_type from inverted_index_reader to clean up the code.	2023-06-25 22:04:02 +08:00
minghong	2d1163c4d8	[refactor](nereids) update Agg stats derive method #21036 This PR has no effect on tpch queries. Some tpcds queries are impacted: 4/11/23/24/47/51/57/65/74, of which 4 and 51 are improved.	2023-06-25 21:47:32 +08:00
AKIRA	638aa41988	[fix](planner) fix push filter through agg #21080 In the previous implementation, the check for groupby exprs was missing. Add this necessary check to make sure it works. You can reproduce it by running the SQL below: CREATE TABLE t_push_filter_through_agg (col1 varchar(11451) not null, col2 int not null, col3 int not null) UNIQUE KEY(col1) DISTRIBUTED BY HASH(col1) BUCKETS 3 PROPERTIES( "replication_num"="1" ); CREATE VIEW `view_i` AS SELECT `b`.`col1` AS `col1`, `b`.`col2` AS `col2` FROM ( SELECT `col1` AS `col1`, sum(`cost`) AS `col2` FROM ( SELECT `col1` AS `col1`, sum(CAST(`col3` AS INT)) AS `cost` FROM `t_push_filter_through_agg` GROUP BY `col1` ) a GROUP BY `col1` ) b; SELECT SUM(`total_cost`) FROM view_a WHERE `dt` BETWEEN '2023-06-12' AND '2023-06-18' LIMIT 1;	2023-06-25 19:14:20 +08:00
Mryange	6896776034	[test](regression) update some case in p2 (#21094) update some case in p2	2023-06-25 11:16:56 +08:00
starocean999	8b561cfb03	[fix](nereids)create datev2 and datetimev2 literal if enable_date_conversion is true (#21065)	2023-06-21 20:29:36 +08:00
airborne12	6ac0bfeceb	[Feature](inverted index) add unicode parser for inverted index (#21035)	2023-06-21 20:14:06 +08:00
zhannngchen	cc53391c9a	Revert "[feature](merge-on-write) enable merge on write by default (#… (#21041)	2023-06-21 18:36:46 +08:00
HHoflittlefish777	2beed11256	[Bug](streamload) fix inconsistent load result of be and fe (#20950)	2023-06-21 18:12:51 +08:00
Mryange	8bcd42d3f6	[test](regression) update some case in brown_p2 #21037	2023-06-21 16:25:07 +08:00
DeadlineFen	4d84cd8ca1	Revert "Revert "[Test](regression) CCR syncer thrift interface regression test (#20935)" (#20990)" (#21022) This reverts commit 2a294801f1324a999570158eea3224239eefbb29.	2023-06-21 15:20:21 +08:00
Qi Chen	bad22dd4e2	[Fix](orc-reader) Fix orc dict filter null value issue in `_convert_dict_cols_to_string_cols` which caused incorrect result. (#21047) Query results should not have empty values. ``` use regresssion.multi_catalog; select commit_id from github_events_orc WHERE (event_type = 'CommitCommentEvent') AND commit_id != "" limit 10; ``` ``` +------------------------------------------+ | commit_id | +------------------------------------------+ | 685c1fd8dbbdc10c042932f9a9f88be00ff96c75 | | 685c1fd8dbbdc10c042932f9a9f88be00ff96c75 | | 4e3ab2ff2d2474f5d51334b9b0fdf17e9845a166 | | | | | | | | | | | | | | 7191c20cb49da07a7fc16aa32dc0de4faff528b2 | +------------------------------------------+ 10 rows in set (0.54 sec) ```	2023-06-21 14:54:01 +08:00
Pxl	5f0bb49d46	[Feature](materialized-view) support create mv contain aggstate column (#20812) support create mv contain aggstate column	2023-06-21 13:06:52 +08:00
amory	18beb822a3	[FIX](array-type) fix array string output with fe const expr (#21042) The FE foldconstRule turns an array() function expr with const literals into an array literal and does not pass this array literal to BE, so the FE array string output format must be the same as the BE array string output format.	2023-06-21 11:52:02 +08:00
dujl	0cf9de8cef	[fix](decimalv3) fix result error when cast a round decimalv3 to double (#20678)	2023-06-21 00:02:48 +08:00
Kang	2c11ce0a02	[bugfix](topn) fix key topn merge block conflict with index predicate result columns (#20820)	2023-06-20 21:23:00 +08:00
LiBinfeng	f10258577b	[Fix](Planner) Fix group concat with multi distinct and segs (#20912) Problem: when using select group_concat(distinct a, 'seg1'), group_concat(distinct b, 'seg2') ..., an error is raised. Reason: the group_concat function also treats 'seg' as an argument, so a multi distinct column error is raised. Solution: let the multi distinct group_concat function take only its first argument as the real argument.	2023-06-20 21:00:18 +08:00
zy-kkk	7e01f074e2	[improvement](jdbc mysql) support auto calculate the precision of timestamp/datetime (#20788)	2023-06-20 10:39:34 +08:00
zzzzzzzs	824bc02603	[Function] Support date function: microsecond() (#20044)	2023-06-20 10:32:54 +08:00
jakevin	d02ecef406	[fix](Nereids): revert `push down alias into union` (#20991) Temporarily revert #20543 to avoid problems.	2023-06-20 09:32:26 +08:00
Mryange	5a28b6f9fc	[fix](datetime) Fix the error in date calculation that includes constants (#20863) before ``` mysql> select hours_add('2023-03-30 22:23:45.23452',8); +-------------------------------------+ | hours_add('2023-03-30 22:23:45', 8) | +-------------------------------------+ | 2023-03-31 06:23:45 | +-------------------------------------+ mysql> select date_add('2023-03-30 22:23:45.23452',8); +------------------------------------+ | date_add('2023-03-30 22:23:45', 8) | +------------------------------------+ | 2023-04-07 22:23:45 | +------------------------------------+ mysql [test]>select hours_add('2023-03-30 22:23:45.23452',8); +-------------------------------------------+ | hours_add('2023-03-30 22:23:45.23452', 8) | +-------------------------------------------+ | 2023-03-31 06:23:45.000234 | +-------------------------------------------+ ``` after ``` mysql [test]>select hours_add('2023-03-30 22:23:45.23452',8); +-------------------------------------------+ | hours_add('2023-03-30 22:23:45.23452', 8) | +-------------------------------------------+ | 2023-03-31 06:23:45.23452 | +-------------------------------------------+ 1 row in set (0.01 sec) mysql [test]>select date_add('2023-03-30 22:23:45.23452',8); +------------------------------------------+ | date_add('2023-03-30 22:23:45.23452', 8) | +------------------------------------------+ | 2023-04-07 22:23:45.23452 | +------------------------------------------+ 1 row in set (0.00 sec) mysql [test]>set enable_nereids_planner=true; Query OK, 0 rows affected (0.00 sec) mysql [test]>set enable_fallback_to_original_planner=false; Query OK, 0 rows affected (0.00 sec) mysql [test]>select hours_add('2023-03-30 22:23:45.23452',8); +-------------------------------------------+ | hours_add('2023-03-30 22:23:45.23452', 8) | +-------------------------------------------+ | 2023-03-31 06:23:45.23452 | +-------------------------------------------+ 1 row in set (0.03 sec) mysql [test]>select date_add('2023-03-30 22:23:45.23452',8); +------------------------------------------+ | days_add('2023-03-30 22:23:45.23452', 8) | +------------------------------------------+ | 2023-04-07 22:23:45.23452 | +------------------------------------------+ 1 row in set (0.00 sec) ```	2023-06-19 23:44:30 +08:00
starocean999	e6f50c04f1	[fix](nereids)SubqueryToApply rule lost is null condition (#20971) * [fix](nereids)SubqueryToApply rule lost is null condition	2023-06-19 23:43:40 +08:00
minghong	f20ef165fe	[opt](Nereids) update join stats derive (#20895) In hash join conditions, some equalities are trustable and some are not. An equality is trustable if one side is almost unique, like a primary key; for such an equality we can estimate more accurately. The problem is that in the rewritten q20 there are 2 equality conditions, one trustable and the other not, but we treated both of them as trustable. Test result: on tpch100, from 2.2 sec to 0.44 sec; no impact on other tpch queries; no performance impact on tpcds queries.	2023-06-19 23:40:44 +08:00
DeadlineFen	2a294801f1	Revert "[Test](regression) CCR syncer thrift interface regression test (#20935)" (#20990) This reverts commit dd482b74c849b022862e7cfb1f1d0b933a84e3d2.	2023-06-19 21:38:03 +08:00
Yongqiang YANG	dd5ecea36a	[fix](compress) snappy does not work right (#20934)	2023-06-19 14:11:10 +08:00
TengJianPing	fb9fcf460a	[fix](leftjoin) fix bug of left and full join with other conjuncts (#20946) Fix bug of left and full outer join with other conjuncts. When the equal matched row count of a probe row exceeds batch_size, sometimes the _join_node->_is_any_probe_match_row_output flag is not set correctly, which results in outputting extra rows for the probe row.	2023-06-19 12:27:06 +08:00
Pxl	85c5d7c6a9	[Chore](materialized-view) add ssb_flat mv test case (#20869) add ssb_flat mv test case	2023-06-19 10:51:50 +08:00
Zhiyu Hu	1efd345963	[Enhancement](table) adding information_schema.parameters table (#20259) this is a virtual table for compatibility information_schema parameters table	2023-06-19 09:05:46 +08:00
Siyang Tang	8366ce7a81	[enhancement](insert-stmt) Make `insert into tbl values();` compatible with mysql (#20694)	2023-06-18 19:56:07 +08:00
jakevin	ac3290021d	[fix](Nereids): MergeSetOperations can merge SetOperation ALL. (#20902)	2023-06-18 17:49:03 +08:00
mch_ucchi	5ae14549d1	[Feature](Nereids) support delete using syntax to delete data from unique key table (#20452)	2023-06-18 16:22:21 +08:00
DeadlineFen	dd482b74c8	[Test](regression) CCR syncer thrift interface regression test (#20935)	2023-06-18 00:13:09 +08:00
zy-kkk	fe18cfa2fb	[improvement](pg jdbc)Support for automatically obtaining the precision of the postgresql timestamp type (#20909)	2023-06-16 23:41:09 +08:00
zy-kkk	367f64e7bd	[improvement](jdbc) support insert autoinc and default value column to mysql (#20765) In JdbcMysqlClient, I've added methods to retrieve auto-increment and default value columns from MySQL. These columns are then mapped into Doris metadata to make them visible to users. When handling the InsertStmt into an execution plan, Doris used to automatically fill in NULL or default values for columns not specified in the InsertStmt. However, in the JDBC catalog, we don't need Doris to handle these unspecified columns, so I've made changes to skip them directly. For the insert prepared statement required for writing, our previous behavior was to obtain all columns for placeholders. So, the change I made is to pass in the columns processed by the execution plan during the sink task generation stage for dynamic generation.	2023-06-16 23:38:11 +08:00
zy-kkk	e834637a5b	[improvement](ck jdbc) Support for automatically getting the precision of clickhouse's datetime64 type (#20887)	2023-06-16 23:37:30 +08:00
minghong	bf197ee8d2	[opt](nereids) adjust cost model for BroadCastJoin and PartitionJoin (#20713) We add a penalty for broadcast join (bc for short). The intuition behind the penalty is as follows: 1. if the build side is very small (< 1M), we prefer bc and set `penalty=1`, which means no penalty; 2. if the build side is more than 1M, we consider the ratio of the probe row count to the build row count: the smaller the ratio, the higher the penalty. This PR has a positive impact on tpch queries; only q3 is changed. In our test (tpch 1T, 3BE), q3 improved from 5.1 sec to 2.5 sec. This PR also has a positive impact on tpcds queries: on tpcds sf100 (3BE), cold run improves from 163 sec to 156 sec and hot run improves from 155 sec to 149 sec.	2023-06-16 22:49:04 +08:00
morrySnow	5dc0f90c7f	[opt](Nereids) revert convert IN with 2 options to OR expression rule (#20894) revert this rule because it has negative effect on predicate push-down-to-storage-layer	2023-06-16 19:11:37 +08:00
mch_ucchi	5573858cb4	[Enhance](regression-test) use another db than test_query_db for nereids_p0 (#19467) Replace test_query_db with nereids_test_query_db in the nereids_p0 directory, and split test_join into five files to run faster.	2023-06-16 16:11:14 +08:00
yuxuan-luo	97135a1cbb	[Feature] (json)add json_contains function (#20824)	2023-06-16 15:10:12 +08:00
Mryange	179b933ccc	[test](regression) update some case in p2 (#20683)	2023-06-16 10:09:22 +08:00
zy-kkk	d9b3c2aba2	[improvement](jdbc) support getting mysql information_schema tables and clickhouse system tables (#20768)	2023-06-15 14:53:51 +08:00
Pxl	01e53f4e67	[Bug](materialized-view) fix problems about create mv on ssb_flat q4.1 failed (#20658) fix problems about create mv on ssb_flat q4.1 failed	2023-06-15 14:38:21 +08:00
zy-kkk	09d187ec77	[improvement](ck jdbc) Optimized reading of datetime and ip types of the ClickHouse JDBC Catalog (#20804)	2023-06-14 23:28:08 +08:00
Pxl	a0d4f11667	[Bug](function) catch error state in function cast to avoid core dump (#20751) catch error state in function cast to avoid core dump	2023-06-14 17:34:34 +08:00
YueW	1c9f107185	[feature](nereids) support match syntax (#20781) Support match syntax in nereids. Match syntax usage: ```sql select * from test where msg match "hello"; select * from test where msg match_any "hello"; select * from test where msg match_all "hello hi"; select * from test where msg match_phrase "hello world"; ``` `match` is the same as `match_any`. The PR for match syntax in the original planner: https://github.com/apache/doris/pull/14211	2023-06-14 17:30:27 +08:00
zy-kkk	affe36d32e	[test](find_in_set) add find_in_set function test case (#20718)	2023-06-14 09:43:48 +08:00
starocean999	7636dd1fdc	[fix](nereids) always use colocate scan when agg's fragment has olap scan (#20695)	2023-06-13 17:59:17 +08:00
starocean999	7942bd0bf9	[fix](planner) cast string literal to date like type should not be an implicit cast (#20709) 1. Casting a string literal to a date like type should not be an implicit cast. 2. The string representation of a float like type should not use scientific notation. 3. The data type of the like function's regex expr should be string type even if it's a null literal. 4. Add -Xss4m in fe.conf to prevent stack overflow in some cases.	2023-06-13 17:57:14 +08:00
TengJianPing	feb21fc9e9	[fix](group_concat) use default separator ',' instead of ', ' for group_concat, to be consistent with mysql (#20741)	2023-06-13 17:20:29 +08:00
TengJianPing	2adf5169e6	[improvement](test) improve p2 case of githubevents (#20727) Check rows of github_events table after restore finish.	2023-06-13 14:31:24 +08:00

1338 Commits