This patchset applies the following changes:
use the vertical compaction mechanism to do segcompaction
basic (WIP) refactoring to separate segcompaction logic from BetaRowsetWriter
add segcompaction-specific unit tests and regression tests
Make database, table, column, and other names support Unicode by changing the LABEL_REGEX, COMMON_NAME_REGIEX, COMMON_TABLE_NAME_REGEX, and COLUMN_NAME_REGEX regular expressions in class FeNameFormat.
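A minimal sketch of what this enables, assuming the relaxed patterns accept CJK characters in identifiers (the table and column names below are hypothetical):
```sql
-- hypothetical Unicode identifiers that should now pass the FeNameFormat checks
CREATE TABLE `用户表` (
    `编号` INT NULL
) ENGINE=OLAP
DUPLICATE KEY(`编号`)
DISTRIBUTED BY HASH(`编号`) BUCKETS 1
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1"
);
```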
P.S. @SharpRay has transferred PR #13467 to me, and I'm responsible for the task now. There will be some modifications during the review period, so I created a new PR and the original #13467 can be closed. Thanks.
bug: some Chinese words are not sorted by pinyin in GBK encoding
CREATE TABLE `test_convert` (
`a` varchar(100) NULL
) ENGINE=OLAP
DUPLICATE KEY(`a`)
DISTRIBUTED BY HASH(`a`) BUCKETS 3
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
insert into test_convert values("b"), ("a"), ("c"), ("睿"), ("多"), ("丝");
Query OK, 6 rows affected (0.03 sec)
{'label':'insert_ca73a6acc2194d5b_888218a3949355a6', 'status':'VISIBLE', 'txnId':'18068'}
mysql [test]>select * from test_convert;
+------+
| a |
+------+
| a |
| c |
| 丝 |
| b |
| 多 |
| 睿 |
+------+
6 rows in set (0.01 sec)
mysql [test]>select * from test_convert order by convert(a using gbk);
+------+
| a |
+------+
| a |
| b |
| c |
| 多 |
| 丝 |
| 睿 |
+------+
6 rows in set (0.01 sec)
1. Make sure all sub types supported by STRUCT work correctly;
2. remove unused variable `_need_validate_data`;
3. lazily init min/max decimal to support nested DecimalV2 column validation;
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
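A hedged example of the kind of schema this validation path covers, with a DECIMAL sub type nested inside a STRUCT (assuming the experimental STRUCT column syntax; names are illustrative):
```sql
-- illustrative table: a STRUCT with a nested DECIMAL sub type exercises the nested validation
CREATE TABLE struct_tbl (
    `id` INT NOT NULL,
    `s` STRUCT<f1:INT, f2:DECIMAL(10, 2)> NULL
) ENGINE=OLAP
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1"
);
```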
Hive stores all the data without partition columns in a default partition named __HIVE_DEFAULT_PARTITION__.
Doris will fail to get this partition when the partition column type is INT or anything else that
__HIVE_DEFAULT_PARTITION__ cannot be converted to.
This PR adds support for the Hive default partition by setting the column value to NULL for the missing partition columns.
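A sketch of the expected behavior after this change, assuming a Hive table whose INT partition column has rows under __HIVE_DEFAULT_PARTITION__ (catalog, database, table, and column names are illustrative):
```sql
-- rows from __HIVE_DEFAULT_PARTITION__ should now surface with a NULL partition value
SELECT * FROM hive_catalog.db1.tbl1 WHERE part_col IS NULL;
```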
Loading a big local file will cause an `[INTERNAL_ERROR]too many filtered rows` issue, because the ByteBuffer from the MySQL client always reuses the same byte array,
so later bytes overwrite the earlier ones and the bytes are sent over the network in the wrong order.
Fix: copy the byte array and then write the copy to the network.
Enhance the aggregate functions `collect_set` and `collect_list` to support an optional `max_size` param,
which makes it possible to limit the number of elements in the result array.
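A brief usage sketch of the new parameter (table and column names are illustrative):
```sql
-- at most 3 elements are kept per group when max_size is given
SELECT k1, collect_list(k2, 3), collect_set(k2, 3) FROM tbl GROUP BY k1;
```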
* [Optimize](simd json reader) Cache search results from the previous row (keyed by index in the JSON object) and use them as a hint.
`_simdjson_set_column_value` can become a hot spot while parsing JSON in simdjson mode,
so introduce `_prev_positions` to cache the results from the previous row (keyed by index in the JSON object); since the JSON field order
should be roughly the same between lines, the cached positions serve as a good hint.
* fix case
The Oracle data type `NUMBER(p,s)` has somewhat different semantics from the Doris decimal type.
For the Oracle NUMBER(p,s) type:
1. If s<0, the value is an integer. This `NUMBER(p,s)` has (p+|s|) significant digits,
and rounding is performed at position s.
e.g.: if we insert 1234567 into a `NUMBER(5,-2)` column, Oracle stores 1234500. In this case,
Doris will use an int type (`TINYINT/SMALLINT/INT/.../LARGEINT`).
2. If s>=0 && s<p, it behaves just like Doris Decimal(p,s).
3. If s>=0 && s>p, the value is a decimal (like 0.xxxxx).
p limits how many significant digits may appear after the decimal point,
and digits beyond position s after the decimal point are rounded. e.g.: we cannot insert 0.0123456 into a `NUMBER(5,7)` column,
because there must be two zeros right after the decimal point,
but we can insert 0.0012345 into a `NUMBER(5,7)` column. In this case, Doris will use `DECIMAL(s,s)`.
4. If we don't specify p and s, as in a plain `NUMBER`,
then p and s are uncertain. In this case, Doris cannot determine p and s,
and therefore cannot determine the data type.
This PR implements the list default partition referred to in related issue #15507.
It's similar to Greenplum's default partition, which stores all data that does not satisfy any prior partition key
constraint; the optimizer does not prune the default partition, which means the default partition is scanned
every time you select data from a table that has one.
Users can either create a table with a default partition or add one via ALTER.
```sql
PARTITION LIST(key) {
    PARTITION p1 values in (xx,xx),
    PARTITION DEFAULT
}
ALTER TABLE XXX ADD PARTITION DEFAULT
```
We don't support automatically migrating data inside the default partition that meets a newly added partition key's
constraint into that new partition when altering to add it. Users should select from the default partition using the new
constraint as a predicate and insert the result into the new partition.
```sql
insert into tbl select * from tbl partition default where partition_key=xx;
```
Consider the SQL below:
select sum(cc.qlnm) as qlnm
FROM
outerjoin_A
left join (SELECT
outerjoin_B.b,
coalesce(outerjoin_C.c, 0) AS qlnm
FROM
outerjoin_B
inner JOIN outerjoin_C ON outerjoin_B.b = outerjoin_C.c
) cc on outerjoin_A.a = cc.b
group by outerjoin_A.a;
The coalesce(outerjoin_C.c, 0) was calculated in the agg node, which is wrong.
This PR corrects this; the expr is now calculated in the inner join node.
- change for Nereids
1. add a variable-length parameter to the ctor of Count for better error reporting of Count(a, b)
2. refactor StringRegexPredicate to inherit from ScalarFunction
3. remove useless class TypeCollection
4. use catalog.Type.Collection to check expression argument types
5. change type coercion for TimestampArithmetic, divide, integral divide, comparison predicate, case when and in predicate, making them the same as the legacy planner
- change for legacy planner
1. change the common type of floating point and Decimal from Decimal to Double
1. forbid some cases in create mv with group by:
select k1+1,sum(abs(k2+2)+k3+3) from d_table group by k1;
2. fix select failure when a grouping column has a different expr from the select list:
create materialized view k1p2ap3psg as select k1+1,sum(abs(k2+2)+k3+3) from d_table group by k1+1;
mysql [test]>explain select k1+1,sum(abs(k2+2)+k3+3) from d_table group by k1;
ERROR 1105 (HY000): errCode = 2, detailMessage = select list expression not produced by aggregation output (missing from GROUP BY clause?): `k1` + 1
1. support stream load with JSON and CSV formats for MAP
2. fix the OLAP convertor when a compaction action hits a MAP column containing nulls
3. support select into outfile for MAP (a sketch follows this list)
4. add some regression tests
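A hedged sketch of exporting a MAP column via SELECT ... INTO OUTFILE (file path, table, and column names are illustrative):
```sql
-- illustrative export of a MAP column to CSV files on local disk
SELECT id, m FROM tbl_with_map
INTO OUTFILE "file:///tmp/map_export_"
FORMAT AS CSV;
```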
The background is described in issue #15723,
where users previously used Apache Druid to satisfy such lambda-architecture requirements.
We will not make Doris automatically drop data that does not belong to the current time window like Druid does,
since that is not flexible. Instead we need the ability to support mutable/immutable partitions. The PR works this way:
1. Support a mutable property for a partition.
2. The mutable property of a partition is passed from FE to BE in a load procedure.
3. If a record's partition is immutable, we mark the row as "unselected", so it is not included in the computation of 'max_filter_ratio';
data written to an immutable partition is therefore ignored and does not cause load failure.
Use Example:
1. Add an immutable partition, or modify a partition to be immutable:
- alter table test_tbl add [temporary] partition xxx values less than ('xxx') ('mutable' = 'true');
- alter table test_tbl modify partition xx set ('mutable' = 'false');
2. Write 5 records into the table, two of which belong to an immutable partition.
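A hedged end-to-end sketch of the behavior described above (partition, table, and values are illustrative):
```sql
-- mark an existing partition immutable, then load rows; rows targeting the
-- immutable partition are counted as unselected instead of failing the load
alter table test_tbl modify partition p202201 set ('mutable' = 'false');
insert into test_tbl values
    ('2022-01-15', 1),   -- falls into the immutable partition p202201: counted as unselected
    ('2022-02-15', 2);   -- falls into a mutable partition: loaded normally
```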
Introduced a new function non_nullable in the BE, which extracts the concrete data column from a nullable column. If the input argument is not a nullable column, an error is raised.
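A brief usage sketch, assuming the function is exposed through SQL (table and column names are illustrative):
```sql
-- k1 is a nullable column; non_nullable unwraps it after NULLs are filtered out
SELECT non_nullable(k1) FROM tbl WHERE k1 IS NOT NULL;
```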