doris

Author	SHA1	Message	Date
starocean999	27352afdf6	[fix](fe)support multi distinct group_concat (#17237 ) * [fix](fe)support multi distinct group_concat * update based on comments	2023-03-02 17:53:13 +08:00
Jerry Hu	823d968452	[fix](expr) avoid crashing caused by big depth of expression tree (#17314 )	2023-03-02 16:55:53 +08:00
Mingyu Chen	39f59f554a	[improvement](dry-run)(tvf) support csv schema in tvf and add "dry_run_query" variable (#16983 ) This CL mainly changes: Support specifying csv schema manually in s3/hdfs table valued function s3 ( 'URI' = 'https://bucket1/inventory.dat', 'ACCESS_KEY'= 'ak', 'SECRET_KEY' = 'sk', 'FORMAT' = 'csv', 'column_separator' = '\|', 'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)', 'use_path_style'='true' ) Add new session variable dry_run_query If set to true, the real query result will not be returned, instead, it will only return the number of returned rows. mysql> select * from bigtable; +--------------+ \| ReturnedRows \| +--------------+ \| 10000000 \| +--------------+ This can avoid large result set transmission time and focus on real execution time of query engine. For debug and analysis purpose.	2023-03-02 16:51:27 +08:00
yiguolei	17f4990bd3	[enhancement](functioncontext) function context should use shared ptr and simply function context (#17311 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-02 16:23:54 +08:00
xueweizhang	9f088f6e90	[feature](json) add json_valid function (#17247 ) add json_valid function Signed-off-by: nextdreamblue <zxw520blue1@163.com>	2023-03-02 14:08:52 +08:00
xueweizhang	9155d8b9d1	[fix](delete) fix 'is null' or 'is not null' delete predicate will get wrong result (#17190 ) fix 'is null' or 'is not null' delete predicate will get wrong result Signed-off-by: nextdreamblue <zxw520blue1@163.com>	2023-03-02 14:05:44 +08:00
YueW	707f814fc2	[fix](inverted index) fix still execute match query after drop inverted index (#17293 ) Background: at the moment, a match query requires an inverted index. Problem: after dropping an inverted index that is the only index in the table, a match query can still be run on that index column. Fix: the index should be updated on BE regardless of whether the indexes_desc from FE is empty.	2023-03-02 11:12:54 +08:00
Mingyu Chen	30df268c1f	[fix](hdfs)(catalog) fix BE crash when hdfs-site.xml not exist in be/conf and fix compute node logic (#17244 ) We set LIBHDFS3_CONF env in start_be.sh, so libhdfs3 will try to read this hdfs-site.xml, if file does not exist, it will throw error. But Doris does not handle this error, cause BE crash. This CL mainly changes: Modify start_be.sh to only set LIBHDFS3_CONF if hdfs-site.xml exist. Refactor the HDFSCommonBuilder so that it can return error correctly. Add BE IP info in status, so that we can get ip from error msg like: ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]failed to init reader for file 000.snappy.orc, err: [INTERNAL_ERROR][172.21.0.101]failed to init HDFSCommonBuilder, please check check be/conf/hdfs-site.xml The logic of prefer compute node is wrong, which causing the external table query can only assign up to 3 backends. This CL refactor this logic and also change some FE config: prefer_compute_node_for_external_table If set to true, query on external table will prefer to assign to compute node. And the max number of compute node is controlled by min_backend_num_for_external_table. If set to false, query on external table will assign to any node. min_backend_num_for_external_table Only take effect when prefer_compute_node_for_external_table is true. If the compute node number is less than this value, query on external table will try to get some mix node to assign, to let the total number of node reach this value. If the compute node number is larger than this value, query on external table will assign to compute node only.	2023-03-02 11:09:55 +08:00
yixiutt	de5112bd90	[bugfix](merger) traverse rs_meta in lock (#17271 ) tablet_schema(version) will traverse rowset_meta and it should call in meta_lock.	2023-03-02 09:47:44 +08:00
Xinyi Zou	b7677beab7	[enhancement](memtracker) Add special counter for memtracker and fix thread create and destroy track #17301 Add a special counter for memtracker, faster, but relaxed ordering and not accurate in real time Track thread create and destroy memory, which was previously removed due to performance loss and added back	2023-03-02 08:55:00 +08:00
Gabriel	d7ee542dd4	[refactor](function) refine function geo #17289 remove unused constant args	2023-03-02 08:42:16 +08:00
Pxl	527eb5b059	[Enchancement](function) nullable inline refactor of min_max_by/bitmap && add register_functio… (#17228 ) 1. nullable inline refactor of min_max_by/bitmap/group_concat/histogram/topn 2. add register_function_both method 3. add datetimev2 type creator of min_max_by 4. remove uint16/32/64 in FOR_INTEGER_TYPES	2023-03-02 00:00:01 +08:00
HappenLee	1244eed1cd	[Opt](exec) opt the dispose nullable column logic (#17192 )	2023-03-01 23:25:40 +08:00
Gabriel	633f2d52a4	[minor](log) add some logs (#17287 )	2023-03-01 22:41:50 +08:00
Gabriel	6de02f1f46	[minor](jvm) add more error logs for JNI (#17270 )	2023-03-01 22:09:57 +08:00
xueweizhang	34c5e84e9f	[fix](insert) fix txn error reason clearly (#16997 ) Signed-off-by: nextdreamblue <zxw520blue1@163.com>	2023-03-01 20:28:41 +08:00
Tiewei Fang	f1db0d9501	[Enhencement](File Reader) delete old file_reader (#17261 ) * delete old file_reader * fix 1	2023-03-01 20:24:03 +08:00
YueW	b839353c2d	[fix](inverted index) fix BE coredump because of not ignore case ensitivity for column name when create index (#17276 )	2023-03-01 19:32:39 +08:00
Xinyi Zou	3871e989ac	[fix](memory) Avoid repeating meaningless memory gc #17258	2023-03-01 19:23:33 +08:00
Xinyi Zou	a1e3b908d7	[fix](memory) split mem usage thread and gc thread to different threads (#17213 ) Ensure that the memory status is refreshed in time Avoid frequent GC	2023-03-01 19:19:05 +08:00
xy720	48ef61780d	[refactor](struct-type) refactor and clean unused code for struct type (#17257 ) remove unused code for struct type	2023-03-01 15:49:31 +08:00
xy720	0732eb54bc	[feature](struct-type) support csv format stream load for struct type (#17143 ) Refactor from_string method in data_type_struct.cpp to support csv format stream load for struct type.	2023-03-01 15:48:48 +08:00
Gabriel	b8ebcdff78	[Bug](bloomfilter) Fix wrong result using bloomfilter with date type (#17225 )	2023-03-01 12:29:20 +08:00
Gabriel	979cf42d7a	[Bug](decimalv3) Use correct decimal scale for function round (#17232 ) Co-authored-by: maochongxin <maochongxin@gmail.com>	2023-03-01 12:28:41 +08:00
zhengyu	62ec74f4e7	segcompaction featuring verticalcompaction (#16731 ) This patchset applies the following changes: use the vertical compaction mechanism to do segcompaction; basic (WIP) refactoring to separate segcompaction logic from BetaRowsetWriter; add segcompaction-specific ut and regression tests	2023-03-01 10:55:40 +08:00
Yongqiang YANG	e687f3badd	Revert "[feature-wip](BE http)Support BE http service using brpc (#16123 )" (#17219 ) This reverts commit 049ecccc578802496e5421db19e21e7eb256699d. Merge back after streamload is handled.	2023-03-01 09:18:25 +08:00
Ashin Gau	2f471de675	[fix](FileCache) load file cache before start up daemon threads (#17199 ) Daemon threads in doris_main.cpp will upload tablet metrics periodically, which will use StorageEngine::instance(). However loading file cache is a process in main thread, when it takes a lot of time to load file cache, StorageEngine::instance() will be a null pointer in daemon threads.	2023-03-01 08:35:57 +08:00
yiguolei	e22a9ecc3b	[enhancement](execute model) using thread pool to execute report or join task instead of staring too many thread (#17212 ) * [enhancement](execute model) using thread pool to execute report or join task instead of staring too many thread Doris will start report thread and join thread during fragment execution. There are many problems if create and destroy thread very frequently. Jemalloc may not behave very well, it may crashed. jemalloc/jemalloc#1405 It is better to using thread pool to do these tasks. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-01 08:35:27 +08:00
WenYao	68e9a66aa0	[Enchancement](schema scanner) add SchemaScanner profile (#17230 ) Add some profile information to the schema scanner to facilitate performance optimization. Example: SchemaScanner: - FillBlockTime: 9s131ms - GetDbTime: 12.816ms - GetDescribeTime: 1s645ms - GetTableTime: 25.433ms	2023-03-01 08:34:27 +08:00
zxealous	7f6209ede4	[fix](routine load) fix be core dump while use routine load (#17222 )	2023-02-28 21:01:38 +08:00
huangzhaowei	9bcc3ae283	[Fix](DOE)Fix be core dump when parse es epoch_millis date format (#17100 )	2023-02-28 20:09:35 +08:00
Gabriel	459874be50	Revert "[Bug](log) add some log to find out bug (#16518 )" (#17178 ) This reverts commit d1c6b8114053e8c754c979d8d3fbf5c880d361d2.	2023-02-28 19:23:12 +08:00
lvliang	34813bae13	[improvement](meta) make database,table,column names to support unicode (replace PR #13467 with this) (#14531 ) Make database, table, column and other names support unicode by changing the LABEL_REGEX, COMMON_NAME_REGIEX, COMMON_TABLE_NAME_REGEX and COLUMN_NAME_REGEX regular expressions in class FeNameFormat. P.S. @SharpRay has transferred PR #13467 to me, and I'm responsible for the task now. There will be some modifications during the review period, so I created a new PR and the original #13467 can be closed. Thanks.	2023-02-28 18:50:36 +08:00
zhangstar333	1dd2a41e38	[vectorized](bug) fix window function can't handle first row of beyond (#17084 ) Issue Number: close #16845	2023-02-28 17:30:23 +08:00
chenlinzhong	79e49dad93	[fix](brpc) solve bthread hang problem (#17206 )	2023-02-28 17:10:05 +08:00
Kang	f8e20ceca2	[Improvement](jsonb) add suport for JSONB type for arrow (#16869 ) add suport for JSONB type for arrow, which is used by doris spark/flink connector.	2023-02-28 17:04:13 +08:00
Jerry Hu	a1db5c6f52	[fix](vec) crash caused by not-implemented function in ColumnFixedLengthObject (#17215 )	2023-02-28 15:27:06 +08:00
HappenLee	3e40467ce6	[Bug](vec) Fix chinese pinyin order by (#17152 ) bug: some chinese word not sort by pinyin in GBK coding CREATE TABLE `test_convert` ( `a` varchar(100) NULL ) ENGINE=OLAP DUPLICATE KEY(`a`) DISTRIBUTED BY HASH(`a`) BUCKETS 3 PROPERTIES ( "replication_allocation" = "tag.location.default: 1" ); insert into test_convert values("b"), ("a"), ("c"), ("睿"), ("多"), ("丝"); Query OK, 6 rows affected (0.03 sec) {'label':'insert_ca73a6acc2194d5b_888218a3949355a6', 'status':'VISIBLE', 'txnId':'18068'} mysql [test]>select * from test_convert; +------+ \| a \| +------+ \| a \| \| c \| \| 丝 \| \| b \| \| 多 \| \| 睿 \| +------+ 6 rows in set (0.01 sec) mysql [test]>select * from test_convert order by convert(a using gbk); +------+ \| a \| +------+ \| a \| \| b \| \| c \| \| 多 \| \| 丝 \| \| 睿 \| +------+ 6 rows in set (0.01 sec)	2023-02-28 14:29:56 +08:00
Ashin Gau	bf5037d6d5	[fix](OrcReader) typo in anaylize null values (#17156 ) typographical error in analyzing null values for OrcReader.	2023-02-28 14:29:13 +08:00
slothever	598038e674	[improvement](parquet-reader)support parquet data page v2 (#17054 ) Support parquet data page v2 Now the parquet data on AWS glue use data page v2, but we didn't support before.	2023-02-28 14:23:45 +08:00
camby	4d8b310de0	[fix](struct-type) fix struct subtype support (#17081 ) 1. Make sure all sub types which STRUCT supported work correctly; 2. remove unused variable `_need_validate_data`; 3. lazy init min or max decimal to support nested DecimalV2 column validate; Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2023-02-28 11:37:07 +08:00
luozenglin	1771d1e5e7	[fix](value-range) fix the value range of non-nullable column contains null causes query short key index error. (#16943 ) * [fix](value-range) fix the value range of non-nullable column contains null causes query short key index error.	2023-02-28 11:15:32 +08:00
plat1ko	26a46d8c3f	[fix](cooldown) Handle full clone with cooldowned rowsets (#17069 )	2023-02-28 11:04:01 +08:00
zhannngchen	00723e36cf	[enhancement](merge-on-write) add delete bitmap correctness check for single load (#17147 ) For Unique Key MoW table, if there are duplicate keys in one single load job and there's multiple segments, we need to calculate delete bitmap to mark these duplicate keys deleted. Add a check here to detect any bugs that might cause duplicate keys.	2023-02-28 10:06:36 +08:00
奕冷	049ecccc57	[feature-wip](BE http)Support BE http service using brpc (#16123 ) Now, streamload is not supported.	2023-02-28 09:59:29 +08:00
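xueweizhang	e0cd8599d2	[fix](delete) fix delete from bug which can get wrong result (#17146 ) In theory, for two independent deletes, e.g. delete from table where a=1; delete from table where a=2;, delete predicates should be usable here. However, the current code puts the delete predicates of all versions and all columns together, which loses the version information and the possible AND relationship between predicates. They are uniformly weakened into independent delete predicates: if any one delete predicate is satisfied, the page is dropped. This PR builds on the current code: the correctness of the subsequent page elimination can only be guaranteed when there is exactly one delete predicate, so a == 1 check is added before delete predicates are passed on. Treating delete predicates from different versions and different columns as complete, rigorous logic for judging pages would require considerably more design changes. The current approach prioritizes fixing the bug; later, the logic that uses delete predicates to speed up zone checks and eliminate pages can be improved, so that delete predicates can be used in more scenarios.	2023-02-28 09:20:10 +08:00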
Zhengguo Yang	b51ce415e7	[Feature](load) Add submitter and comments to load job (#16878 ) * [Feature](load) Add submitter and comments to load job	2023-02-28 09:06:19 +08:00
zhannngchen	84413f33b8	[enhancement](merge-on-write) add skip_delete_bitmap session variable for debug purpose (#17127 )	2023-02-27 23:31:28 +08:00
Xin Liao	d5b1d3403f	[fix](merge-on-write) fix that the version of delete bitmap is incorrect when calculate delete bitmap between segments (#17095 ) Different version numbers are used to calculate the delete bitmap between segments and rowsets, resulting in the failure of the last update of the delete bitmap.	2023-02-27 17:17:25 +08:00
Pxl	b06f3da96c	[Bug] fix not close when pipeline context prepare failed (#17061 )	2023-02-27 14:24:39 +08:00


3935 Commits