doris

Author	SHA1	Message	Date
TengJianPing	2019bb3870	[fix](bitmap) fix wrong result of bitmap intersect functions (#22735 ) * [fix](bitmap) fix wrong result of bitmap intersect functions * fix test case	2023-08-09 18:31:24 +08:00
HappenLee	4608dcb2d9	[fix](agg) fix coredump caused by push down count aggregation (#22699 ) fix coredump caused by push down count aggregation	2023-08-09 10:21:20 +08:00
zzzzzzzs	66784cef71	[Enhancement](Load) Stream Load using SQL (#22509 ) This PR was originally #16940, but it had not been updated by its original author @Cai-Yao for a long time. For now, we are merging part of the code into master first. Thanks @Cai-Yao @yiguolei	2023-08-08 13:49:04 +08:00
starocean999	1617368ee1	[fix](planner) fix bug of push constant conjuncts through set operation node (#22695 ) When pushing a constant conjunct down into a set operation node, we should assign the conjunct to the agg node if there is one. This is consistent with pushing a constant conjunct into an inline view.	2023-08-08 12:25:42 +08:00
zhangguoqiang	91b15183e7	[enhance][external]enhance and fix external cases 0807 (#22689 ) enhance and fix external cases 0807	2023-08-08 10:53:08 +08:00
Mingyu Chen	c9dc715c5d	[fix](broker-load) fix error when using multi data description for same table in load stmt (#22666 ) For a load request, there are 2 tuples on the scan node: an input tuple and an output tuple. The input tuple is used for reading the file, and it is converted to the output tuple based on the user-specified column mappings. Broker load supports different column mappings in different data descriptions for the same table (or partition). So for each scanner, the output tuples are the same but the input tuple can differ. The previous implementation saved the input tuple at the scan node level, causing different scanners to use the same input tuple, which is incorrect. This PR removes the input tuple from the scan node and saves it in each scanner.	2023-08-07 20:03:03 +08:00
Mryange	bc697ca9d6	[fix](time) fix error in time_to_sec	2023-08-07 17:33:24 +08:00
AlexYue	f036cdfde6	[feature](compaction) support delete in cumulative compaction (#19609 )	2023-08-07 15:22:21 +08:00
Mingyu Chen	c31226b144	[refractor](regression-test) sort out test cases of external tables (#22640 ) Sort out the test cases of external tables. After this change, there are 2 directories: 1. `external_table_p0`: all p0 cases of external tables: hive, es, jdbc and tvf 2. `external_table_p2`: all p2 cases of external tables: hive, es, mysql, pg, iceberg and tvf. This lets us run them with a one-line command such as: `sh run-regression-test.sh --run -d external_table_p0,external_table_p2`	2023-08-07 11:12:30 +08:00
czzmmc	1a8a1e5b16	[Feature](count_by_enum) support count_by_enum function (#22071 ) count_by_enum(expr1, expr2, ... , exprN); Treats the data in a column as an enumeration and counts the number of values in each enumeration. Returns the number of enumerated values for each column, and the number of non-null values versus the number of null values.	2023-08-06 16:05:14 +08:00
zhangstar333	d3b50e3b2a	[BUG](date_trunc) fix date_trunc function only handle lower string (#22602 ) fix date_trunc function only handle lower string	2023-08-05 12:53:13 +08:00
zzzxl	fe6bae2924	[fix](invert index) supports utf8 and non-utf8 strings (#22570 ) supports utf8 and non-utf8 strings: [fix] compatible with utf8 and invalid utf8 doris-thirdparty#110	2023-08-05 12:52:53 +08:00
Xujian Duan	3024b82918	[fix](load)Fix wrong default value for char and varchar of reading json data (#22626 ) If a column is defined as `col VARCHAR/CHAR NULL` with no default value, and we then load JSON data that is missing column col, the query result is incorrect: +------+ | col | +------+ | 1 | +------+ but the expected result is: +------+ | col | +------+ | NULL | +------+ --------- Co-authored-by: duanxujian <duanxujian@jd.com>	2023-08-05 12:47:27 +08:00
Kang	7fe08c74fe	[fix](inverted index) return empty result instead of error for empty match query (#22592 ) return empty result instead of error for empty match query as follows: `SELECT * FROM t WHERE msg MATCH ''` `SELECT * FROM t WHERE msg MATCH 'stop_word'`	2023-08-04 17:36:32 +08:00
starocean999	ef53a27887	[fix](nereids) allow in or exits subquery in binary operator (#22391 ) support subquery in binary operator like if( xx in ( subquery ), 1, 0 )	2023-08-04 15:35:19 +08:00
谢健	658d75c816	[feature](Nereids): normalize join condition after expanding or condition NLJ (#22555 )	2023-08-04 13:37:37 +08:00
minghong	f828a3d826	[shape](nereids) ssb sf100 plan shape check (#22596 )	2023-08-04 13:12:21 +08:00
minghong	62b1a7bcf3	[tpcds](nereids) add rule to eliminate empty relation #22203 1. eliminate empty relations, 2. constant-fold after filter pushdown	2023-08-04 12:49:53 +08:00
minghong	0e9fad4fe9	[stats](nereids) improve Anti join stats estimation #22444 No impact on TPC-H. On TPC-DS, queries 16, 69 and 94 improved.	2023-08-04 12:48:39 +08:00
mch_ucchi	3447a70b25	[Fix](planner)fix delete stmt contains where but delete all data. (#22563 )	2023-08-03 23:44:05 +08:00
amory	469886eb4e	[FIX](array)fix if function for array() #22553	2023-08-03 19:40:45 +08:00
谢健	4322fdc96d	[feature](Nereids): add or expansion in CBO(#22465 )	2023-08-03 13:29:33 +08:00
Ashin Gau	938f768aba	[fix](parquet) resolve offset check failed in parquet map type (#22510 ) Fix an error when reading empty map values in Parquet: `offsets.back()` does not equal the number of elements in the map's key column. ### How does this happen A map in Parquet is stored as a repeated group, and `repeated_parent_def_level` is set incorrectly when parsing the map node in the Parquet schema. ``` the map definition in parquet: optional group <name> (MAP) { repeated group map (MAP_KEY_VALUE) { required <type> key; optional <type> value; } } ``` ### How to fix Set the `repeated_parent_def_level` of the key/value nodes to the definition level of the map node. `repeated_parent_def_level` is the definition level of the first ancestor node whose `repetition_type` equals `REPEATED`. Empty array/map values are not stored in the Doris column, so we have to use `repeated_parent_def_level` to skip empty or null values in the ancestor node. For instance, consider an array of strings with 3 rows: `null, [], [a, b, c]`. We can store four elements in the data column: `null, a, b, c`, with the offsets column `1, 1, 4` and the null map `1, 0, 0`. For the `i-th` row in the array column, the range from `offsets[i - 1]` to `offsets[i]` holds the elements of that row, so empty array/map values cannot be stored in the Doris data column. By comparison, Spark does not require `repeated_parent_def_level`, because the Spark column stores empty array/map values and uses another length column to indicate empty values. See: https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java Furthermore, we can also avoid storing null array/map values in the Doris data column. For the same three rows, only three elements need to be stored in the data column: `a, b, c`, with the offsets column `0, 0, 3` and the null map `1, 0, 0`	2023-08-02 22:33:10 +08:00
airborne12	0cd5183556	[Refactor](inverted index) refact tokenize function for inverted index (#22313 )	2023-08-02 19:12:22 +08:00
Mryange	ddd90855a9	[vectorized](udaf) java udaf support with map type (#22397 ) * test * remove some unused * update * add case	2023-08-02 15:03:44 +08:00
Mryange	bf50f9fa7f	[fix](decimal) fix cast rounding half up with negative number (#22450 )	2023-08-01 21:47:42 +08:00
qiye	b8399148ef	[fix](DOE) es catalog not working with pipeline,datetimev2, array and esquery (#22046 )	2023-08-01 21:45:16 +08:00
minghong	d5d82b7c31	[stats](nereids) fix bug for avg-size (#22421 )	2023-08-01 17:13:00 +08:00
Pxl	8d16f1bb09	[Chore](materialized-view) update documentation about materialized-view and update test (#22350 ) update documentation about materialized-view and update test	2023-08-01 15:13:34 +08:00
Gabriel	7a2ff56863	[regression](fix) fix test_round case (#22441 )	2023-08-01 11:35:44 +08:00
Jerry Hu	c1f36639fd	[fix](sort) VSortedRunMerger does not return any rows with a large offset value (#22191 )	2023-07-31 22:28:13 +08:00
starocean999	450e0b1078	[fix](nereids) recompute logical properties in plan post process (#22356 ) The join commute rule swaps the left and right children, which changes the logical properties. So we need to recompute the logical properties in plan post-processing to get the correct result.	2023-07-31 21:04:39 +08:00
LiBinfeng	3a1d678ca9	[Fix](Planner) fix parse error of view with group_concat order by (#22196 ) Problem: creating a view whose projection contains group_concat(xxx, xxx order by orderkey) fails during the second parse of the inline view. For example, this works: "SELECT id, group_concat(`name`, "," ORDER BY id) AS test_group_column FROM test GROUP BY id" but creating a view does not: "create view test_view as SELECT id, group_concat(`name`, "," ORDER BY id) AS test_group_column FROM test GROUP BY id" Reason: when creating a view, we parse view.toSql() again to check for syntax errors. When toSql() is called on group_concat with ORDER BY, it adds a separator ', ' between the second parameter and ORDER BY, so the second parse fails because its semantics differ from the original statement: group_concat(`name`, "," ORDER BY id) ==> group_concat(`name`, "," , ORDER BY id) Solution: change toSql of group_concat, and analyze the ORDER BY clause in analyze() of group_concat in the Planner, because an ORDER BY taken from the view statement does not work if it is not analyzed and its slot references are not bound.	2023-07-31 17:20:23 +08:00
amory	7261845b3d	[FIX](complex-type)fix complex type nested col_const (#22375 ) For array/map/struct in mysql_writer, unpack_if_const only unpacks the column itself, not its nested columns, so col_const should not be used in nested columns.	2023-07-31 14:53:18 +08:00
zclllyybb	f2919567df	[feature](datetime) Support timezone when insert datetime value (#21898 )	2023-07-31 13:08:28 +08:00
TengJianPing	79289e32dc	[fix](cast) fix wrong result of casting empty string to array date (#22281 )	2023-07-30 21:15:03 +08:00
Jibing-Li	03761c37cd	[Improvement](multi catalog) Support Iceberg, Paimon and MaxCompute table in nereids. (#22338 )	2023-07-29 21:43:35 +08:00
Mryange	47c2cc5c74	[vectorized](udf) java udf support with return map type (#22300 )	2023-07-29 12:52:27 +08:00
daidai	ae8a26335c	[opt](hive)opt select count() stmt push down agg on parquet in hive . (#22115 ) Optimize the "select count() from table" statement by pushing the "count" aggregation type down to BE. Supported file types in Hive: Parquet, ORC. 1. 4k files, 600 million rows: before 1 min 37.70 sec, after 50.18 sec 2. 50 files, 600 million rows: before 1.12 sec, after 0.82 sec	2023-07-29 00:31:01 +08:00
zhannngchen	53d255f482	[fix](partial update) remove CHECK on illegal number of partial columns (#22319 )	2023-07-28 23:11:58 +08:00
xzj7019	f7c106c709	[opt](nereids) enhance broadcast join cost calculation (#22092 ) Enhance broadcast join cost calculation, by considering both the build side effort from building bigger hash table, and more probe side effort from bigger cost of ProbeWhenBuildSideOutput and ProbeWhenSearchHashTable, if parallel_fragment_exec_instance_num is more than 1. Current solution gives a penalty factor on rightRowCount, and the factor is the total instance number to the power of 2. Penalty on outputRows is not taken currently and will be refined in next generation cost model. Also brings some update for shape checking: update original control variable in shape file parallel_fragment_exec_instance_num to parallel_pipeline_task_num, if pipeline is enabled. fix a be_number variable inactive issue.	2023-07-28 23:06:02 +08:00
Kaijie Chen	2f43e59535	[test](regression) add partial update seq_col delete cases (#22340 )	2023-07-28 17:36:55 +08:00
HHoflittlefish777	05abfbc5ef	[improvement](regression-test) add compression algorithm regression test (#22303 )	2023-07-28 17:28:52 +08:00
starocean999	5a0ad09856	[fix](nereids) SubqueryToApply may lost conjunct (#22262 ) Consider the SQL: ``` SELECT * FROM sub_query_correlated_subquery1 t1 WHERE coalesce(bitand( cast( (SELECT sum(k1) FROM sub_query_correlated_subquery3 ) AS int), cast(t1.k1 AS int)), coalesce(t1.k1, t1.k2)) is NULL ORDER BY t1.k1, t1.k2; ``` The `is NULL` conjunct is lost in the SubqueryToApply rule. This PR fixes it.	2023-07-28 15:08:56 +08:00
bobhan1	0c734a861e	[Enhancement](delete) eliminate reading the old values of non-key columns for delete stmt (#22270 )	2023-07-28 14:37:33 +08:00
morrySnow	5da5fac37a	[refactor](Nereids) add result sink node (#22254 ) use ResultSink as query root node to let plan of query statement has the same pattern with insert statement	2023-07-28 11:31:09 +08:00
zhangy5	adc44d9f46	[regression-test] add list partition case and multi partition keys case (#22042 ) * [regression-test] add list partition case and multi partition keys case * fix delete failed	2023-07-28 10:12:35 +08:00
Ashin Gau	0d7d9b92db	[fix](multi-catalog) complex types parsing failed, with unexpected nulls and rows (#22228 ) Fix two bugs: 1. Unexpected null values in an array column. If 65535 consecutive values are not null in a nullable array column, this error is triggered. The reason is that the array parser did not handle boundary conditions. 2. The number of rows of the key field and that of the value field in a map column are not equal. Similarly, the number of rows among fields in a struct column is not the same. This is triggered when the number of rows is not equal among Parquet pages of different columns in a row group.	2023-07-28 10:03:08 +08:00
Qi Chen	8caa5a9ba4	[Fix](mutli-catalog) Fix null partitions error in iceberg tables. (#22185 ) ### Issue When a table has null partitions, it throws the error `Failed to fill partition column: t_int=null` ### Resolution - Fix the null partition error in Iceberg tables by replacing null partition values with '\N'. - Add a regression test for Hive null partitions.	2023-07-27 23:57:35 +08:00
Jerry Hu	b5fa29e138	[fix](bitmap) incorrect result of function 'bitmap_from_array' (#22305 )	2023-07-27 22:44:06 +08:00


1517 Commits
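The array layout described in commit 938f768aba (#22510) can be checked with a small C++17 sketch. `FlatArrayColumn`, `Elem`, `Row` and `row_at` are hypothetical names invented for illustration, not Doris's actual column classes; they only model the data/offsets/null-map encoding from the commit message and show that both layouts it mentions decode `null, [], [a, b, c]` the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical element type of the nested column; std::nullopt marks a null element.
using Elem = std::optional<std::string>;
// A decoded row: std::nullopt for a NULL array, otherwise its elements.
using Row = std::optional<std::vector<Elem>>;

// Hypothetical flattened array column, modeled on the layout in the commit message.
struct FlatArrayColumn {
    std::vector<Elem> data;            // all nested elements, row after row
    std::vector<std::size_t> offsets;  // offsets[i] = end of row i in data
    std::vector<bool> null_map;        // null_map[i] == true means row i is NULL
};

// Row i spans data[offsets[i - 1], offsets[i]), with offsets[-1] taken as 0.
Row row_at(const FlatArrayColumn& col, std::size_t i) {
    assert(i < col.offsets.size() && i < col.null_map.size());
    if (col.null_map[i]) {
        return std::nullopt;  // NULL row: whatever range it spans in data is ignored
    }
    const std::size_t begin = (i == 0) ? 0 : col.offsets[i - 1];
    const std::size_t end = col.offsets[i];
    // begin == end encodes an empty array: no element is stored for it
    return std::vector<Elem>(col.data.begin() + static_cast<std::ptrdiff_t>(begin),
                             col.data.begin() + static_cast<std::ptrdiff_t>(end));
}
```

For example, `FlatArrayColumn{{std::nullopt, "a", "b", "c"}, {1, 1, 4}, {true, false, false}}` (the null row occupies one placeholder slot) and `FlatArrayColumn{{"a", "b", "c"}, {0, 0, 3}, {true, false, false}}` (nothing stored for the null row) both yield a NULL row, an empty row and `[a, b, c]`. Because a row is read only through its offset range, and only when the null map says it is non-null, the placeholder in the first layout is never inspected, which is why the more compact second layout also works.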