doris

Author	SHA1	Message	Date
Stalary	e1a1a04c2f	[Enhancement](DOE) BE queries ES using the DSL generated by FE. (#11840 )	2022-08-18 10:31:17 +08:00
lihangyu	cfb90b39c7	(vec-stream-load-json) simdjson throws an exception that leads to a core dump (#11880 ) When config::enable_simdjson_parser=true in vectorized stream load, an invalid JSON input such as '{ "a', or an input in which all fields are null, such as '{}', may make the simdjson library throw an unhandled exception like `Objects and arrays can only be iterated when they are first encountered`, which leads to a core dump. We should handle these cases. Signed-off-by: eldenmoon <15605149486@163.com>	2022-08-18 10:27:34 +08:00
AlexYue	50ef6e35be	[enhancement](RowDescriptor) enhance tuple_idx check during runtime (#11835 )	2022-08-17 17:50:48 +08:00
wangbo	3a49156e30	[performance] (vectorization)optimize In Expr (#11826 ) Co-authored-by: Wang Bo <wangbo36@meituan.com>	2022-08-17 10:46:37 +08:00
slothever	f39f57636b	[feature-wip](parquet-reader) update column read model and add page index (#11601 )	2022-08-16 15:04:07 +08:00
lihangyu	01383c3217	[Enhancement](stream-load-json) using simdjson to parse json (#11665 ) Currently we use rapidjson to parse JSON documents. It is fast, but not as fast as simdjson. simdjson has a parsing front end called simdjson::ondemand, which parses JSON lazily as fields are accessed and can return a field's raw token from the original document. This feature reduces the cost of string copies: for example, _write_data_to_column converts everything to a string literal with sprintf, which showed up as a hotspot in the flame graph, whereas simdjson::to_json_string returns the token (a piece of the original string) as a std::string_view, which is exactly what we need. Second, in _set_column_value we can iterate through the JSON document with for (auto field : object_val) {xxx}, which is much faster than looking up each field by name as in objectValue.FindMember("k1"). The third optimization is the at_pointer interface that simdjson provides, which fetches a JSON field directly from the original document.	2022-08-16 14:49:50 +08:00
Kang	4be6e70f1c	[fix](query) fix ORDER BY key queries with LIMIT returning fewer or no results (#11757 ) The bug is caused by using _num_rows_read for the limit check. _num_rows_read counts the rows read from storage, but some of those rows may then be filtered out by filter_block for the WHERE predicate. Add _num_rows_return, which counts the rows remaining after filter_block applies the WHERE predicate, i.e. the rows actually returned.	2022-08-16 14:31:47 +08:00
ZenoYang	288b440b14	[improvement](vectorized) Improve count distinct performance by using fastunion (#11516 ) Testing on real user data shows a 10-40% performance improvement.	2022-08-16 12:18:46 +08:00
luozenglin	5104982614	[enhancement](tracing) append the profile counter to trace. (#11458 ) 1. Append the profile counters and info entries to span attributes. 2. Output the trace ID to the audit log.	2022-08-15 21:36:38 +08:00
Ashin Gau	0b9bfd15b7	[feature-wip](parquet-reader) parquet physical type to doris logical type (#11769 ) Two improvements have been added: 1. Translate parquet physical type into doris logical type. 2. Decode parquet column chunk into doris ColumnPtr, and add unit tests to show how to use related API.	2022-08-15 16:08:11 +08:00
pengxiangyu	1c4927eac3	[fix](core) fix bug where a status was not initialized (#11730 )	2022-08-12 17:42:37 +08:00
Gabriel	15abafee71	[Bug](runtime filters) support late-arrival runtime filters (#11599 )	2022-08-12 11:55:15 +08:00
zhannngchen	0ab43c51e8	[Feature](unique-key-merge-on-write) some fix on delete bitmap usage (#11623 )	2022-08-12 11:54:31 +08:00
Gabriel	7d97aa194b	[feature-wip](datev2) Support to use datev2 as partition column (#11618 )	2022-08-12 11:54:01 +08:00
Jibing-Li	9b9ed1aef1	[data lake](arrow scanner) Fix core dump caused by an out-of-range column index in the file arrow scanner. (#11691 )	2022-08-12 11:34:29 +08:00
Yongqiang YANG	9950501fdf	[fix](profile) close eof scanner before transfer done (#11705 ) We should close EOF scanners before the transfer is done; otherwise they are not closed until the scan node is closed. Because the plan is closed only after it has finished, the query profile misses stats from scanners closed by scannode::close, e.g. SegmentTotalNum in the profile is lower than expected.	2022-08-12 11:28:43 +08:00
Xin Liao	5d66839035	[feature-wip](unique-key-merge-on-write) push down runtime filter on unique key with merge on write table (#11695 )	2022-08-11 22:50:13 +08:00
Ashin Gau	8f5aed27ec	[feature-wip](parquet-reader)read and decode parquet physical type (#11637 ) Proposed changes: read and decode parquet physical types. 1. Booleans are encoded with bit-packing; this PR introduces a bit-packing implementation taken from Impala. 2. Create a parquet file that includes all the primitive types supported by Hive. Remaining problems: 1. At present, only physical types are decoded; there is no mapping or conversion to Doris logical types yet. 2. Decimal, Timestamp, and Date types are not yet parsed or processed. 3. Int_8 / Int_16 are stored as Int_32; how to resolve these types is still open.	2022-08-11 10:17:32 +08:00
zhannngchen	70b39475cf	[fix](scanner) delete predicates might be inconsistent with rowset readers (#11598 )	2022-08-10 19:40:54 +08:00
Jerry Hu	c8418d13b5	[improvement](config)Use session variable to replace configuration for 'enable_function_pushdown' (#11641 )	2022-08-10 19:25:02 +08:00
Jerry Hu	0291f84a9e	[fix](like-predicate) Add missing functions in LikeColumnPredicate (#11631 )	2022-08-10 15:03:14 +08:00
camby	01e4522612	[fix]collect_list/collect_set without GROUP BY for NOT NULL column (#11529 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-08-09 20:49:37 +08:00
Kang	f9b151744d	optimize topn query if order by columns are a prefix of the table's sort keys (#10694 ) * [feature](planner): push limit down to olapscan when it meets a sort. * If olap_scan_node's sort_info is set, push sort_limit, read_orderby_key and read_orderby_key_reverse down to the olap scanner. * A common query pattern is finding the latest time-series data, e.g. SELECT * FROM t_log WHERE t>t1 AND t<t2 ORDER BY t DESC LIMIT 100. If the ORDER BY columns are a prefix of the table's sort key, this can be greatly optimized to read far less data instead of all data between t1 and t2: because the ORDER BY columns share the order of the table's sort key, it suffices to read only LIMIT N rows from each relevant segment and merge them into N rows. 1. Set read_orderby_key to true for read_params and _reader_context if olap_scan_node's sort info is set. 2. Set read_orderby_key_reverse to true for read_params and _reader_context if is_asc_order is false. 3. The rowset reader force-merges segment reads if read_orderby_key is true. 4. The block reader and tablet reader force-merge rowset reads if read_orderby_key is true. 5. For ORDER BY DESC, read and compare in reverse order: 5.1 the segment iterator reads backward using a new BackwardBitmapRangeIterator and reverses the result block before returning it to the caller; 5.2 VCollectIterator::LevelIteratorComparator and VMergeIteratorContext return the opposite result in their compare functions when _is_reverse is set. Co-authored-by: jackwener <jakevingoo@gmail.com>	2022-08-09 09:08:44 +08:00
Gabriel	ed7f7dead9	[Refactor](push-down predicate) Derive push-down predicate from vconjuncts (#11468 )	2022-08-08 19:19:26 +08:00
lihangyu	9349746987	[Fix](stream-load-json) fix VJsonReader::_write_data_to_column invalid column type cast when encountering null (#11564 ) column_ptr is a non-nullable column pointer after `column_ptr = &nullable_column->get_nested_column()`, so we should not cast column_ptr to ColumnNullable anymore.	2022-08-08 15:57:39 +08:00
Ashin Gau	37d1180cca	[feature-wip](parquet-reader)decode parquet data (#11536 )	2022-08-08 12:44:06 +08:00
Pxl	2cd3bf80dc	[bugfix](schema change)fix core dump on vectorized_alter_table (#11538 )	2022-08-08 10:45:28 +08:00
slothever	e8a344b683	[feature-wip](parquet-reader) add predicate filter and column reader (#11488 )	2022-08-08 10:21:24 +08:00
slothever	95753ec868	[feature](parquet-reader) add group filter util (#11533 ) * [feature-wip](parquet-reader) add group filter util Co-authored-by: jinzhe <jinzhe@selectdb.com>	2022-08-05 14:02:48 +08:00
yiguolei	321107cb40	[refactor](schema change) Using tablet schema shared ptr instead of raw ptr (#11475 ) * Using tabletschema shared ptr instead of raw ptrs Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-08-05 11:04:38 +08:00
huangzhaowei	6eb8ac0ebf	[feature-wip][multi-catalog]Support caseSensitive field name in file scan node (#11310 ) * Implement case sensitivity in file scan node	2022-08-05 08:03:16 +08:00
starocean999	092a394782	[improvement](agg)limit the output of agg node (#11461 )	2022-08-05 07:53:55 +08:00
Ashin Gau	aed0282046	[feature-wip](parquet-reader)get compressed parquet page data (#11493 )	2022-08-04 17:44:52 +08:00
Pxl	ec3c911f97	[Feature][Materialized-View] support materialized view on vectorized engine (#10792 )	2022-08-04 14:07:48 +08:00
Xinyi Zou	ecbf87d77b	[bugfix](memtracker)fix exceed memory limit log (#11485 )	2022-08-04 10:22:20 +08:00
slothever	1b4d6a620a	(feature-wip)[parquet-reader] support page index serde (#11415 )	2022-08-03 10:36:06 +08:00
Jerry Hu	842a5b8e24	[refactor](agg) Abstract the hash operation into a method (#11399 )	2022-08-02 17:27:19 +08:00
HappenLee	38ffe685b5	[Bug](ODBC) fix vectorized null value error report in odbc scan node (#11420 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-08-02 15:44:12 +08:00
Ashin Gau	44a1a20e65	[feature-wip](parquet-reader)parse parquet schema (#11381 ) Analyze schema elements in parquet FileMetaData and generate the hierarchy of nested fields. For example: 1. Primitive type: thrift `optional int32 <column-name>;` maps to the SQL definition `<column-name> int32;` 2. Nested type: thrift `optional group <column-name> (LIST) { repeated group bag { optional group array_element (LIST) { repeated group bag { optional int32 array_element } } } }` maps to the SQL definition `<column-name> array<array<int32>>`	2022-08-02 10:56:13 +08:00
luozenglin	1cf57a985d	[fix] Fix the query result error caused by the grouping sets statement grouping as an expression (#11316 )	2022-08-01 13:52:18 +08:00
Zhengguo Yang	4f5e1601df	[bug](scanner) Improve limit query performance on olapScannode and avoid infinite loop (#11301 ) 1. Fix a bug where querying a large column table may cause an infinite loop. 2. Optimize the query logic for LIMIT: when the limit value is relatively small, reduce the scanner parallelism to cut unnecessary resource consumption, which increases the number of similar queries the system can serve concurrently and speeds such queries up by more than 60%.	2022-08-01 13:50:12 +08:00
Lightman	b35daf0a04	[improvement](light-schema-change) Support tablet schema cache (#11131 )	2022-08-01 12:18:00 +08:00
Jerry Hu	0325fa436e	[fix](agg)Add field of 'is_first_phase' in TAggregationNode (#11321 )	2022-08-01 11:49:50 +08:00
Jerry Hu	d360974dce	[improvement](agg)Use phmap::flat_hash_set in AggregateFunctionUniq (#11363 ) This reverts commit 688b55053dd1fc5113343a6f565ad732ddd9612a.	2022-08-01 10:36:11 +08:00
Mingyu Chen	688b55053d	Revert "[improvement]Use phmap::flat_hash_set in AggregateFunctionUniq (#11257 )" (#11356 ) This reverts commit a7199fb98e18b925664b38460b667d04cbee8e01.	2022-07-30 23:15:36 +08:00
zhangstar333	1f30e563a7	[refactor][vectorized] refactor first/last value agg functions (#10661 ) * refactor first and last [refactor][vectorized] refactor first/last value agg functions * add some change * remove first/last about always nullable * remove always nullable and register it * refactor value remove bool null flag * refactor win first last to ptr and pos	2022-07-30 18:38:56 +08:00
Xinyi Zou	18864ab7fe	weak relationship between MemTracker and MemTrackerLimiter (#11347 )	2022-07-30 18:33:54 +08:00
Luwei	d6f937cb01	(performance)[scanner] Isolate local and remote queries using different scanner… (#11006 )	2022-07-29 19:14:46 +08:00
Ashin Gau	84ce2a1e98	[feature-wip](multi-catalog)(fix) partition value error when a block contains multiple splits (#11260 ) `FileArrowScanner::get_next` returns a block only once it is full, so the block may contain multiple splits for small files or span two splits for large files. However, a block can only be filled with partition values from one file. Different splits may come from different files, which produces wrong embedded partition values.	2022-07-29 18:48:59 +08:00
Jerry Hu	a7199fb98e	[improvement]Use phmap::flat_hash_set in AggregateFunctionUniq (#11257 )	2022-07-29 16:55:22 +08:00
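Commit 4be6e70f1c (#11757) explains that the LIMIT check used _num_rows_read, the rows read from storage, rather than the rows left after the WHERE filter. The minimal Python sketch below shows why that matters. It models storage as a list of row batches and the WHERE predicate as a plain callable. scan_with_limit and its arguments are hypothetical illustrations, not Doris code.

```python
def scan_with_limit(batches, predicate, limit):
    """Return up to `limit` rows that satisfy `predicate` (hypothetical helper)."""
    result = []
    num_rows_read = 0    # rows pulled from storage, before the WHERE filter
    num_rows_return = 0  # rows that survived the WHERE filter
    for batch in batches:
        num_rows_read += len(batch)
        kept = [row for row in batch if predicate(row)]
        # Take only as many filtered rows as the limit still allows.
        take = kept[: limit - num_rows_return]
        result.extend(take)
        num_rows_return += len(take)
        # The stop condition uses the post-filter count. Checking
        # num_rows_read >= limit instead could stop after reading enough
        # raw rows and return fewer (or no) rows than requested.
        if num_rows_return >= limit:
            break
    return result
```

For example, `scan_with_limit([[1, 2, 3, 4], [5, 6, 7, 8]], lambda r: r % 2 == 0, 3)` keeps reading until three even rows have been returned. A check based on the read count would stop after the first batch with only two.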
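Commit 8f5aed27ec (#11637) notes that parquet booleans are bit-packed. According to the Apache Parquet format specification, PLAIN-encoded BOOLEAN values are stored one bit per value, least-significant bit first. The Python sketch below only illustrates that layout. It is not the Impala-derived implementation the commit ports, and the helper names are hypothetical.

```python
from typing import List, Sequence


def encode_plain_booleans(values: Sequence[bool]) -> bytes:
    """Pack booleans one bit each, least-significant bit first (hypothetical helper)."""
    out = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def decode_plain_booleans(data: bytes, num_values: int) -> List[bool]:
    """Unpack `num_values` bit-packed booleans, least-significant bit first."""
    if num_values > len(data) * 8:
        raise ValueError("not enough bytes for the requested number of values")
    # Value i lives in byte i // 8, at bit position i % 8 counted from the LSB.
    return [bool((data[i // 8] >> (i % 8)) & 1) for i in range(num_values)]
```

Nine values need two bytes. For example, `[True, False, True, True, False, False, False, False, True]` packs to `b'\x0d\x01'`: bits 0, 2 and 3 of the first byte, plus bit 0 of the second.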
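Commit f9b151744d (#10694) reads only LIMIT N rows from each relevant segment and merges them when the ORDER BY columns are a prefix of the table's sort key. The minimal Python sketch below illustrates that idea under a few assumptions:

- Each segment is a list already sorted ascending by the sort key.
- The WHERE range filter is omitted.
- topn_from_sorted_segments is a hypothetical helper, not a Doris API.

```python
import heapq
from itertools import islice


def topn_from_sorted_segments(segments, limit, descending=False, key=lambda row: row):
    """Return the first `limit` rows in ORDER BY order (hypothetical helper).

    Each segment is assumed to be sorted ascending by `key`.
    """
    heads = []
    for seg in segments:
        # Read backward for ORDER BY ... DESC, forward otherwise.
        rows = reversed(seg) if descending else iter(seg)
        # Only the first `limit` rows of a segment can reach the final result.
        heads.append(list(islice(rows, limit)))
    # k-way merge of the per-segment heads; with reverse=True, heapq.merge
    # expects each input sorted largest-first, which the backward reads ensure.
    merged = heapq.merge(*heads, key=key, reverse=descending)
    return list(islice(merged, limit))
```

With `[[1, 4, 7, 10], [2, 5, 8], [3, 6, 9, 11]]` and a limit of 3, the sketch returns `[1, 2, 3]` in ascending order and `[11, 10, 9]` with `descending=True`. It touches at most three rows per segment instead of scanning them fully. The backward read loosely mirrors the reverse-order reading described in the commit.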


197 Commits