doris

Author	SHA1	Message	Date
Adonis Ling	982c5f06b5	[fix](build) Resolve the conflicts when building be with java-udf (#11938 )	2022-08-20 18:24:32 +08:00
Pxl	64dc3b360f	[Bug](function) fix dcheck fail on close vexpr ctx (#11908 )	2022-08-19 19:11:10 +08:00
carlvinhust2012	f66e42f848	[optimization](array-type) support the decimal/datetime as the nest type of array in print_value (#11784 ) Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-08-19 17:59:09 +08:00
yixiutt	01bd7f224b	[bugfix](compaction) fix filter_delete if schema has sequence column (#11909 ) The bug was introduced in #11721, which uses the last column as the delete sign; that is wrong when a sequence column exists. Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-08-19 14:56:06 +08:00
Gabriel	1f9eec5462	[Regression](datev2) Add test cases for datev2/datetimev2 (#11831 )	2022-08-19 10:57:55 +08:00
chenlinzhong	7a505cf040	[remote-udaf](optimize) Optimize RPC exception handling logic (#11680 )	2022-08-19 10:25:01 +08:00
slothever	124b4f7694	[feature-wip](parquet-reader) row group reader ut finish (#11887 ) Co-authored-by: jinzhe <jinzhe@selectdb.com>	2022-08-18 17:18:14 +08:00
Gabriel	1da39771e3	[Bug](runtime filter) Fix bug for runtime filter in concurrent scanners (#11848 )	2022-08-18 14:47:08 +08:00
Gabriel	b8a33d2629	[Improvement](load) turn `enable_vectorized_load` on by default (#11833 )	2022-08-18 14:43:09 +08:00
Pxl	cac317430f	[Bug](aggregation) fix core dump on 2nd phase aggregate (#11843 )	2022-08-18 14:42:34 +08:00
HappenLee	d505d1a5ae	[Vectorized](compaction) filter delete data in base compaction (#11721 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-08-18 14:22:59 +08:00
Stalary	e1a1a04c2f	[Enhancement](Doe) BE queries ES using the DSL generated by FE. (#11840 )	2022-08-18 10:31:17 +08:00
lihangyu	cfb90b39c7	(vec-stream-load-json) simdjson throws exception leading to core dump (#11880 ) When config::enable_simdjson_parser=true in vectorized stream load, JSON input with an invalid format such as '{ "a', or input whose fields are all null such as '{}', can make the simdjson library throw an unhandled exception like `Objects and arrays can only be iterated when they are first encountered`, which leads to a core dump. We should handle these cases. Signed-off-by: eldenmoon <15605149486@163.com>	2022-08-18 10:27:34 +08:00
AlexYue	8b10a1a3f7	[enhancement](VSlotRef) enhance column_id check in execute function during runtime (#11862 ) The previous column id check in VSlotRef::execute was too strict for the fuzz test to keep producing random queries, so the check logic is temporarily loosened. In addition, some callers of VExpr::get_const_col do not check its result, although it may return nullptr. This remains an underlying problem.	2022-08-18 09:12:26 +08:00
AlexYue	50ef6e35be	[enhancement](RowDescriptor) enhance tuple_idx check during runtime (#11835 )	2022-08-17 17:50:48 +08:00
zhangstar333	7df8c6f493	[vectorized](improvement) improve agg function of bitmap_union with f… (#11822 ) * [vectorized](improvement) improve agg function of bitmap_union with fastunion	2022-08-17 14:13:01 +08:00
wangbo	3a49156e30	[performance] (vectorization)optimize In Expr (#11826 ) Co-authored-by: Wang Bo <wangbo36@meituan.com>	2022-08-17 10:46:37 +08:00
camby	fadc78c6cf	[fix](str_to_date) str_to_date support format without leading zero (#11817 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-08-16 18:23:16 +08:00
slothever	f39f57636b	[feature-wip](parquet-reader) update column read model and add page index (#11601 )	2022-08-16 15:04:07 +08:00
lihangyu	01383c3217	[Enhancement](stream-load-json) using simdjson to parse json (#11665 ) Currently we use rapidjson to parse JSON documents. It is fast, but not as fast as simdjson. simdjson has a parsing front-end called simdjson::ondemand, which parses JSON as fields are accessed and can strip a field token from the original document. First, this reduces the cost of string copies: we currently convert everything to a string literal in _write_data_to_column with sprintf, which shows up as a hotspot in the flame graph, whereas simdjson::to_json_string strips the token (a string piece) as a std::string_view, which is exactly what we need. Second, in _set_column_value we can iterate through the JSON document with for (auto field: object_val) {xxx}, which is much faster than looking up a field by its name, as in objectValue.FindMember("k1"). Third, the at_pointer interface that simdjson provides can get a JSON field directly from the original document.	2022-08-16 14:49:50 +08:00
Xinyi Zou	c124470408	[enhancement](memory) Fix too much cache leading to less memory available for queries (#11751 ) Disable Chunk Allocator in the vectorized Allocator to reduce the memory cache. For highly concurrent queries, using Chunk Allocator with the vectorized Allocator can reduce the impact of the gperftools tcmalloc central lock. Jemalloc and google tcmalloc have a core cache, so Chunk Allocator may no longer be needed after gperftools tcmalloc is replaced.	2022-08-16 14:35:57 +08:00
Kang	4be6e70f1c	[fix](query) fix orderby keys limit returning fewer or no results (#11757 ) The bug is caused by using _num_rows_read for the limit check. _num_rows_read counts the rows read from storage, some of which may then be removed by filter_block for the WHERE predicate. Add _num_rows_return, which counts the rows remaining after filter_block, to track the rows actually returned.	2022-08-16 14:31:47 +08:00
Xinyi Zou	2a1803c646	[enhancement](memtracker) Optimize query memory accuracy (#11740 ) Currently, only the virtual memory used by the query can be tracked through the tcmalloc hook. When allocated memory is not fully used, the recorded virtual memory is larger than the physical memory. At present, this is mainly because PODArray does not memset the memory to 0 when allocating it, and blocks allocated through PODArray in places such as VOlapScanNode::_free_blocks are usually kept for memory reuse and are not fully used.	2022-08-16 14:23:28 +08:00
ZenoYang	288b440b14	[improvement](vectorized) Improve count distinct performance by using fastunion (#11516 ) Tests on real user data show a 10-40% performance improvement.	2022-08-16 12:18:46 +08:00
luozenglin	5104982614	[enhancement](tracing) append the profile counter to trace. (#11458 ) 1. append the profile counter and infos to span attributes. 2. output traceid to audit log.	2022-08-15 21:36:38 +08:00
Ashin Gau	0b9bfd15b7	[feature-wip](parquet-reader) parquet physical type to doris logical type (#11769 ) Two improvements have been added: 1. Translate parquet physical type into doris logical type. 2. Decode parquet column chunk into doris ColumnPtr, and add unit tests to show how to use related API.	2022-08-15 16:08:11 +08:00
carlvinhust2012	ab9529f6b5	[enhancement](array-type) support export files in 'select into outfile' (#11703 ) this pr is used to support export array type in 'select into outfile'. Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-08-15 12:34:31 +08:00
carlvinhust2012	8c8f48c4c2	[feature-wip](array-type) add the array_join function (#11406 ) this pr is used to add the array_join function. Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-08-15 11:43:17 +08:00
Gabriel	abd2eb4fa1	[Bug](date function) Fix bug for date format %T (#11729 )	2022-08-12 19:29:58 +08:00
pengxiangyu	1c4927eac3	[fix](core)fix bug for status not init(#11730 )	2022-08-12 17:42:37 +08:00
Gabriel	e353be7dcb	[Bug](date function) Return null if date format is invalid (#11720 )	2022-08-12 14:07:55 +08:00
Gabriel	15abafee71	[Bug](runtime filters) support late-arrival runtime filters (#11599 )	2022-08-12 11:55:15 +08:00
zhannngchen	0ab43c51e8	[Feature](unique-key-merge-on-write) some fix on delete bitmap usage (#11623 )	2022-08-12 11:54:31 +08:00
Gabriel	7d97aa194b	[feature-wip](datev2) Support to use datev2 as partition column (#11618 )	2022-08-12 11:54:01 +08:00
plat1ko	4047c3577d	[enhancement](Status) Optimize Status implementation	2022-08-12 11:39:35 +08:00
Jibing-Li	9b9ed1aef1	[data lake](arrow scanner)Fix file arrow scanner column index out of range core. (#11691 )	2022-08-12 11:34:29 +08:00
Yongqiang YANG	9950501fdf	[fix](profile) close eof scanner before transfer done (#11705 ) We should close EOF scanners before the transfer is done; otherwise they are not closed until the scannode is closed. Because the plan is closed after it finishes, the query profile would lack stats from scanners closed by scannode::close, e.g. SegmentTotalNum in the profile is lower than it should be.	2022-08-12 11:28:43 +08:00
wangbo	4c8cc7f03e	[fix](storage)fix column dict incorrect result (#11694 ) Co-authored-by: Wang Bo <wangbo36@meituan.com>	2022-08-12 11:05:57 +08:00
Pxl	f5fe622a1b	[Bug](materialized view) fix create materialized view fail 1. remove referenced_column(seems unused now). 2. fix mv slot ref id wrong. 3. add type check for hll_hash. 4. enable non-nullable column change to nullable column.	2022-08-12 09:49:16 +08:00
Xin Liao	5d66839035	[feature-wip](unique-key-merge-on-write) push down runtime filter on unique key with merge on write table (#11695 )	2022-08-11 22:50:13 +08:00
Gabriel	2068bf2dea	[Refactor](predicate) Use primitive type as template argument for predicate (#11647 )	2022-08-11 12:06:44 +08:00
Ashin Gau	8f5aed27ec	[feature-wip](parquet-reader)read and decode parquet physical type (#11637 ) Proposed changes: read and decode parquet physical type. 1. The encoding type of boolean is bit-packing; this PR introduces the bit-packing implementation from Impala. 2. Create a parquet file including all the primitive types supported by hive. Remaining problems: 1. At present, only physical types are decoded, and there is no mapping or conversion to doris logical types. 2. Decimal / Timestamp / Date types are not parsed or processed. 3. Int_8 / Int_16 are stored as Int_32; how to resolve these types is still open.	2022-08-11 10:17:32 +08:00
zhannngchen	70b39475cf	[fix](scanner) delete predicates might be inconsistent with rowset readers (#11598 )	2022-08-10 19:40:54 +08:00
Jerry Hu	c8418d13b5	[improvement](config)Use session variable to replace configuration for 'enable_function_pushdown' (#11641 )	2022-08-10 19:25:02 +08:00
Jerry Hu	0291f84a9e	[fix](like-predicate) Add missing functions in LikeColumnPredicate (#11631 )	2022-08-10 15:03:14 +08:00
starocean999	601f28dd90	[fix](regexpr)regexpr functions' contexts should be THREAD_LOCAL (#11595 )	2022-08-10 06:58:24 +08:00
camby	01e4522612	[fix]collect_list/collect_set without GROUP BY for NOT NULL column (#11529 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-08-09 20:49:37 +08:00
carlvinhust2012	df47b6941d	[feature-wip](array-type) support the array type in reverse function (#11213 ) Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-08-09 20:49:09 +08:00
Kang	f9b151744d	optimize topn query if order by columns is prefix of sort keys of table (#10694 ) * [feature](planner): push limit to olapscan when meet sort. * if olap_scan_node's sort_info is set, push sort_limit, read_orderby_key and read_orderby_key_reverse for olap scanner * There is a common query pattern to find the latest time series data, e.g. SELECT * from t_log WHERE t>t1 AND t<t2 ORDER BY t DESC LIMIT 100. If the ORDER BY columns are a prefix of the sort key of the table, the query can be greatly optimized to read far less data instead of reading all data between t1 and t2. Because the ORDER BY columns and the sort key of the table share the same order, it is enough to read the LIMIT N rows from each related segment and merge them. 1. set read_orderby_key to true for read_params and _reader_context if olap_scan_node's sort info is set. 2. set read_orderby_key_reverse to true for read_params and _reader_context if is_asc_order is false. 3. rowset reader forces merge read of segments if read_orderby_key is true. 4. block reader and tablet reader force merge read of rowsets if read_orderby_key is true. 5. for ORDER BY DESC, read and compare in reverse order 5.1 segment iterator reads backward using a new BackwardBitmapRangeIterator and reverses the result block before returning to the caller. 5.2 VCollectIterator::LevelIteratorComparator, VMergeIteratorContext return the opposite result for _is_reverse order in their compare functions. Co-authored-by: jackwener <jakevingoo@gmail.com>	2022-08-09 09:08:44 +08:00
Gabriel	ed7f7dead9	[Refactor](push-down predicate) Derive push-down predicate from vconjuncts (#11468 )	2022-08-08 19:19:26 +08:00
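
The limit-check fix described in 4be6e70f1c (#11757) can be illustrated with a small standalone sketch. This is not Doris code: `ScanResult`, `scan_with_limit`, the batch loop and the `check_returned_rows` switch are hypothetical names made up for this example, and only the idea of counting rows after filtering rather than rows read from storage comes from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a scanner's counters; not the Doris classes.
struct ScanResult {
    std::size_t rows_read = 0;      // rows pulled from storage, before filtering
    std::size_t rows_returned = 0;  // rows that survived the WHERE-style filter
    std::vector<int> out;           // rows handed back to the caller
};

// Reads `storage` in batches (batch_size is assumed to be > 0), applies `keep`,
// and stops once the limit check is satisfied. With check_returned_rows == false
// the check uses rows_read (the behaviour the commit message describes as the
// bug); with true it uses rows_returned (the described fix).
ScanResult scan_with_limit(const std::vector<int>& storage,
                           const std::function<bool(int)>& keep,
                           std::size_t limit, std::size_t batch_size,
                           bool check_returned_rows) {
    ScanResult r;
    std::size_t pos = 0;
    while (pos < storage.size()) {
        const std::size_t counted =
                check_returned_rows ? r.rows_returned : r.rows_read;
        if (counted >= limit) break;
        const std::size_t end = std::min(pos + batch_size, storage.size());
        for (; pos < end; ++pos) {
            ++r.rows_read;
            if (keep(storage[pos]) && r.rows_returned < limit) {
                r.out.push_back(storage[pos]);
                ++r.rows_returned;
            }
        }
    }
    return r;
}

// Usage sketch: 10 rows, keep even values, LIMIT 3, batches of 4 rows.
inline void demo() {
    const std::vector<int> storage{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    const auto even = [](int v) { return v % 2 == 0; };
    // Counting rows read stops after the first batch with only 2 rows returned.
    assert(scan_with_limit(storage, even, 3, 4, false).out.size() == 2);
    // Counting rows after the filter keeps reading until 3 rows are returned.
    assert(scan_with_limit(storage, even, 3, 4, true).out.size() == 3);
}
```

In this sketch the first variant returns fewer rows than the limit even though more matching rows exist, which matches the "fewer or no results" symptom named in the commit title.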

519 Commits