doris

Author	SHA1	Message	Date
xy720	48ef61780d	[refactor](struct-type) refactor and clean unused code for struct type (#17257 ) remove unused code for struct type	2023-03-01 15:49:31 +08:00
xy720	0732eb54bc	[feature](struct-type) support csv format stream load for struct type (#17143 ) Refactor from_string method in data_type_struct.cpp to support csv format stream load for struct type.	2023-03-01 15:48:48 +08:00
Gabriel	b8ebcdff78	[Bug](bloomfilter) Fix wrong result using bloomfilter with date type (#17225 )	2023-03-01 12:29:20 +08:00
Gabriel	979cf42d7a	[Bug](decimalv3) Use correct decimal scale for function round (#17232 ) Co-authored-by: maochongxin <maochongxin@gmail.com>	2023-03-01 12:28:41 +08:00
zhengyu	62ec74f4e7	segcompaction featuring verticalcompaction (#16731 ) This patchset applies the following changes: (1) use the vertical compaction mechanism to do segcompaction; (2) basic (WIP) refactoring to separate segcompaction logic from BetaRowsetWriter; (3) add segcompaction-specific unit tests and regression tests.	2023-03-01 10:55:40 +08:00
Yongqiang YANG	e687f3badd	Revert "[feature-wip](BE http)Support BE http service using brpc (#16123 )" (#17219 ) This reverts commit 049ecccc578802496e5421db19e21e7eb256699d. Merge back after streamload is handled.	2023-03-01 09:18:25 +08:00
Ashin Gau	2f471de675	[fix](FileCache) load file cache before start up daemon threads (#17199 ) Daemon threads in doris_main.cpp upload tablet metrics periodically, which uses StorageEngine::instance(). However, the file cache is loaded in the main thread; when loading takes a long time, StorageEngine::instance() is still a null pointer when the daemon threads call it.	2023-03-01 08:35:57 +08:00
yiguolei	e22a9ecc3b	[enhancement](execute model) using thread pool to execute report or join task instead of staring too many thread (#17212 ) Doris starts a report thread and a join thread during fragment execution. Creating and destroying threads very frequently causes many problems; jemalloc may not behave well and may crash (jemalloc/jemalloc#1405). It is better to use a thread pool for these tasks. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-01 08:35:27 +08:00
WenYao	68e9a66aa0	[Enchancement](schema scanner) add SchemaScanner profile (#17230 ) Add some profile information to the schema scanner to facilitate performance optimization. Example: SchemaScanner: - FillBlockTime: 9s131ms - GetDbTime: 12.816ms - GetDescribeTime: 1s645ms - GetTableTime: 25.433ms	2023-03-01 08:34:27 +08:00
zxealous	7f6209ede4	[fix](routine load) fix be core dump while use routine load (#17222 )	2023-02-28 21:01:38 +08:00
huangzhaowei	9bcc3ae283	[Fix](DOE)Fix be core dump when parse es epoch_millis date format (#17100 )	2023-02-28 20:09:35 +08:00
Gabriel	459874be50	Revert "[Bug](log) add some log to find out bug (#16518 )" (#17178 ) This reverts commit d1c6b8114053e8c754c979d8d3fbf5c880d361d2.	2023-02-28 19:23:12 +08:00
lvliang	34813bae13	[improvement](meta) make database,table,column names to support unicode (replace PR #13467 with this) (#14531 ) Make database, table, column and other names support Unicode by changing the LABEL_REGEX, COMMON_NAME_REGIEX, COMMON_TABLE_NAME_REGEX and COLUMN_NAME_REGEX regular expressions in class FeNameFormat. P.S. @SharpRay has transferred PR #13467 to me, and I'm responsible for the task now. There will be some modifications during the review period, so I created a new PR and the original #13467 can be closed. Thanks.	2023-02-28 18:50:36 +08:00
zhangstar333	1dd2a41e38	[vectorized](bug) fix window function can't handle first row of beyond (#17084 ) Issue Number: close #16845	2023-02-28 17:30:23 +08:00
chenlinzhong	79e49dad93	[fix](brpc) solve bthread hang problem (#17206 )	2023-02-28 17:10:05 +08:00
Kang	f8e20ceca2	[Improvement](jsonb) add suport for JSONB type for arrow (#16869 ) Add support for the JSONB type in Arrow, which is used by the Doris Spark/Flink connectors.	2023-02-28 17:04:13 +08:00
Jerry Hu	a1db5c6f52	[fix](vec) crash caused by not-implemented function in ColumnFixedLengthObject (#17215 )	2023-02-28 15:27:06 +08:00
HappenLee	3e40467ce6	[Bug](vec) Fix chinese pinyin order by (#17152 ) Bug: some Chinese words are not sorted by pinyin under GBK encoding. CREATE TABLE `test_convert` ( `a` varchar(100) NULL ) ENGINE=OLAP DUPLICATE KEY(`a`) DISTRIBUTED BY HASH(`a`) BUCKETS 3 PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); insert into test_convert values("b"), ("a"), ("c"), ("睿"), ("多"), ("丝"); Query OK, 6 rows affected (0.03 sec) {'label':'insert_ca73a6acc2194d5b_888218a3949355a6', 'status':'VISIBLE', 'txnId':'18068'} mysql [test]>select * from test_convert; +------+ | a | +------+ | a | | c | | 丝 | | b | | 多 | | 睿 | +------+ 6 rows in set (0.01 sec) mysql [test]>select * from test_convert order by convert(a using gbk); +------+ | a | +------+ | a | | b | | c | | 多 | | 丝 | | 睿 | +------+ 6 rows in set (0.01 sec)	2023-02-28 14:29:56 +08:00
Ashin Gau	bf5037d6d5	[fix](OrcReader) typo in anaylize null values (#17156 ) typographical error in analyzing null values for OrcReader.	2023-02-28 14:29:13 +08:00
slothever	598038e674	[improvement](parquet-reader)support parquet data page v2 (#17054 ) Support Parquet data page v2. Parquet data on AWS Glue now uses data page v2, which was not supported before.	2023-02-28 14:23:45 +08:00
camby	4d8b310de0	[fix](struct-type) fix struct subtype support (#17081 ) 1. Make sure all subtypes that STRUCT supports work correctly; 2. remove the unused variable `_need_validate_data`; 3. lazily initialize the min or max decimal to support validating nested DecimalV2 columns. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2023-02-28 11:37:07 +08:00
luozenglin	1771d1e5e7	[fix](value-range) fix the value range of non-nullable column contains null causes query short key index error. (#16943 )	2023-02-28 11:15:32 +08:00
plat1ko	26a46d8c3f	[fix](cooldown) Handle full clone with cooldowned rowsets (#17069 )	2023-02-28 11:04:01 +08:00
zhannngchen	00723e36cf	[enhancement](merge-on-write) add delete bitmap correctness check for single load (#17147 ) For a Unique Key MoW table, if a single load job contains duplicate keys and produces multiple segments, we need to calculate the delete bitmap to mark these duplicate keys as deleted. Add a check here to detect any bugs that might cause duplicate keys.	2023-02-28 10:06:36 +08:00
奕冷	049ecccc57	[feature-wip](BE http)Support BE http service using brpc (#16123 ) Now, streamload is not supported.	2023-02-28 09:59:29 +08:00
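xueweizhang	e0cd8599d2	[fix](delete) fix delete from bug which can get wrong result (#17146 ) In theory, for two independent deletes such as delete from table where a=1; delete from table where a=2; this optimization should be usable. However, the current code puts the delete predicates of all versions and all columns together, losing the version information and the possible AND relationship between predicates. Everything is weakened to the assumption that delete predicates are independent, so a page is dropped as soon as any one delete predicate matches. This PR builds on the current code: page pruning is only guaranteed to be correct when there is exactly one delete predicate, so the delete predicates are now passed down only when their count == 1. Evaluating pages against delete predicates of different versions and different columns as one complete, rigorous logic would require considerably more design changes. The current approach prioritizes fixing the bug; later, the logic that uses delete predicates to speed up zone checks and prune pages can be refined so that delete predicates apply in more scenarios.	2023-02-28 09:20:10 +08:00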
Zhengguo Yang	b51ce415e7	[Feature](load) Add submitter and comments to load job (#16878 )	2023-02-28 09:06:19 +08:00
zhannngchen	84413f33b8	[enhancement](merge-on-write) add skip_delete_bitmap session variable for debug purpose (#17127 )	2023-02-27 23:31:28 +08:00
Xin Liao	d5b1d3403f	[fix](merge-on-write) fix that the version of delete bitmap is incorrect when calculate delete bitmap between segments (#17095 ) Different version numbers are used to calculate the delete bitmap between segments and rowsets, resulting in the failure of the last update of the delete bitmap.	2023-02-27 17:17:25 +08:00
Pxl	b06f3da96c	[Bug] fix not close when pipeline context prepare failed (#17061 )	2023-02-27 14:24:39 +08:00
奕冷	c0360f80bb	[enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases (#15339 ) Enhance the aggregate functions `collect_set` and `collect_list` to support an optional `max_size` param, which limits the number of elements in the result array.	2023-02-27 14:22:30 +08:00
Pxl	0723e55f76	[Bug](build) fix compile fail on unused value #17165 error: variable 'nullcount' set but not used [-Werror,-Wunused-but-set-variable] int nullcount = 0;	2023-02-27 14:19:44 +08:00
yiguolei	33acaa067b	[refactor](mempool) remove mempool parameter from key decoder methods (#17137 ) One decode method is only used for big int, and the other decode methods are only used in unit tests. This removes the useless methods, which allows the mempool parameter to be removed from the decode method. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-27 11:16:14 +08:00
TengJianPing	aab8dad191	[fix](sort) fix bug of sort (#17151 ) The logic of topn and full sort is wrong when both an offset and a limit are present: the offset is not considered when doing the max heap optimization, which leads to wrong results.	2023-02-27 10:55:12 +08:00
lihangyu	29dc08fc45	[Optimize](simd json reader) Cached search results for previous row (keyed as index in JSON object) - used as a hint. (#17124 ) `_simdjson_set_column_value` can become a hot spot while parsing JSON in simdjson mode. Introduce `_prev_positions` to cache the search results for the previous row (keyed as index in JSON object), since the order of field names in the JSON should be nearly the same from line to line. * fix case	2023-02-27 10:39:22 +08:00
Xinyi Zou	857d38e24b	[fix](scan) Default enable function(Like) pushdown #17154 Function pushdown: #10355. NGram BloomFilter Index applies like pushdown: #11579. Enabled by default to make sure it stays active. If the NGram BloomFilter Index is not used, this like pushdown can be replaced by #15917, which can push down all expressions including like.	2023-02-27 09:58:37 +08:00
DuRipeng	aefcc98715	[Enhancement](datetimev2-enhance) support 'microseconds_sub' function for datetimev2 (#17130 ) Based on #16970 , introduce microseconds_sub function for datetimev2	2023-02-27 08:47:30 +08:00
Kang	7cb6c522b0	[Enhancement](array) vectorized string equal comparasion in array_contains function Use StringRef instead of the string_view operator == in the vectorized implementation of the array_contains function. - test data: 10,000,000 rows with an ARRAY<STRING> column. Each row's array has 10 elements with an average length of 11 chars. - test SQL: `select count() from test_like_array where array_contains(s_arr, 'xxxxxxxx');` - test result: 0.76 sec vs. 0.52 sec, about 30% less time	2023-02-26 19:42:26 +08:00
zxealous	a0782a1855	[fix](file reader) fix be core in broker file reader (#17039 ) A const-reference class member was bound to a temporary object, which can no longer be accessed after the temporary is destroyed. This causes a BE core dump when debug-level logging is enabled, because _broker_addr has already been destroyed in BrokerFileReader.	2023-02-26 12:35:31 +08:00
zhangstar333	94927b3b1c	[vectorized](bug) fix open fold constant cause be core dump (#17055 ) 1. Add a defer in fold constant to call close. 2. Add more types when calling the _get_result function in fold constant. 3. Fix IN not handling NULL, e.g. select 1 in (2, NULL, 1); 4. In a Java UDF, jni_ctx can be nullptr, so calling close causes a core dump.	2023-02-26 12:30:03 +08:00
Tiewei Fang	f6ce072297	[Enhencement](csv-reader) Optimize csv_reader `_split_value` and fix json_reader case sensitive (#17093 ) 1. Enhancement: for a single-character column separator, csv_reader uses another method of `split value`. 2. Bug fix: make `json` file format loading case-sensitive.	2023-02-26 09:03:04 +08:00
Ashin Gau	c43e521d29	[feature](multi-catalog) support map&struct type in parquet&orc reader (#17087 ) Support parsing map&struct type in parquet&orc reader. ## Remaining Problems 1. Doris uses the array type to build the key and value columns of a `map`, but doesn't fill the offsets in the value column, so the offsets in the value column are wasted. 2. Parquet supports reading only the key or value column in a `map`; this PR doesn't support that yet. 3. Parquet supports reading partial columns in a `struct`; this PR doesn't support that yet.	2023-02-26 08:55:39 +08:00
Ashin Gau	e42465ae59	[fix](OrcReader) handle null values in orc reader for string type (#17135 ) Orc doesn't fill null values in a new batch, but the former batch has already been released. Other types like int/long/timestamp... are flat types without pointers in them, so they do not need to be handled separately like string.	2023-02-26 08:10:40 +08:00
shee	6eeba204f9	[Enhancement] path scan causes disk io to skyrocket (#16968 )	2023-02-25 09:15:15 +08:00
Xin Liao	c071c327e7	[fix](load) fix add broken tablet core dump (#17104 )	2023-02-24 23:59:03 +08:00
YueW	5f2dad29ca	[enhancement](inverted index) Support inverted index without specified parser to use match query (#17110 )	2023-02-24 20:34:55 +08:00
ZhaoChangle	b5d67781a2	[Fix](function)fix datatime-diff function's overflow (#16935 )	2023-02-24 20:06:06 +08:00
AlexYue	c39914c0a0	[feature](partition)add default list partition (#15509 ) This PR implements the list default partition referred to in the related #15507. It is similar to Greenplum's default partition: it stores all data that does not satisfy the prior partition keys' constraints, and the optimizer does not filter out the default partition, which means the default partition is scanned every time you select data from a table that has one. Users can either create a table with a default partition or add one with ALTER. ```sql PARTITION LIST(key) { PARTITION p1 values in (xx,xx), PARTITION DEFAULT } ALTER TABLE XXX ADD PARTITION DEFAULT ``` When a new partition is added with ALTER, data in the default partition that meets the new partition key's constraint is not migrated to it automatically. Users should select from the default partition using the new constraints as the predicate and insert the rows into the new partition. ```sql insert into tbl select * from tbl partition default where partition_key=xx; ```	2023-02-24 15:24:59 +08:00
yiguolei	03a4fe6f39	[enhancement](streamload) make stream load context as shared ptr and save it in global load mgr (#16996 )	2023-02-24 11:15:29 +08:00
Tiewei Fang	be047f11aa	[BugFix](csv_reader) csv_reader support datev2/datetimev2 (#17031 )	2023-02-24 11:13:48 +08:00

3915 Commits