doris

Author	SHA1	Message	Date
Gabriel	ba3e0b3f96	[feature](compaction) allow to set disable_auto_compaction for tables (#11743 )	2022-08-17 11:05:47 +08:00
Xin Liao	12c4d1f4dd	[feature-wip](unique-key-merge-on-write) unique key table with MOW supports sequence column (#11808 )	2022-08-17 10:56:14 +08:00
Xin Liao	c3e6a841c1	[feature-wip](unique-key-merge-on-write) fix that sort segments by segment id in descending order (#11811 )	2022-08-17 10:54:30 +08:00
wangbo	3a49156e30	[performance] (vectorization)optimize In Expr (#11826 ) Co-authored-by: Wang Bo <wangbo36@meituan.com>	2022-08-17 10:46:37 +08:00
yiguolei	c715209a7e	[refactor](dpp) remove original dpp writer (#11838 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-08-17 10:42:29 +08:00
Lightman	3e13b7d2c2	[Bugfix](light-shema-change) fix _finish_clone dead lock (#11823 ) In engine_clone_task.cpp, tablet->tablet_schema() is used to create the rowset, but that method needs a lock that is already held at engine_clone_task.cpp:514. The code originally used cloned_tablet_meta->tablet_schema(), but this was changed in #11131. This commit reverts to using cloned_tablet_meta->tablet_schema().	2022-08-17 09:10:08 +08:00
camby	fadc78c6cf	[fix](str_to_date) str_to_date support format without leading zero (#11817 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-08-16 18:23:16 +08:00
plat1ko	30175010c7	Fix nullptr in perform_remote_tablet_gc (#11820 )	2022-08-16 16:50:21 +08:00
slothever	f39f57636b	[feature-wip](parquet-reader) update column read model and add page index (#11601 )	2022-08-16 15:04:07 +08:00
lihangyu	01383c3217	[Enhancement](stream-load-json) using simdjson to parse json (#11665 ) Currently we use rapidjson to parse JSON documents. It is fast, but not as fast as simdjson. simdjson has a parsing front-end called simdjson::ondemand, which parses JSON as fields are accessed and can strip the field token from the original document. This enables three optimizations. First, it reduces the cost of string copies: we convert everything to a string literal in _write_data_to_column with sprintf, which shows up as a hotspot in the flame graph, whereas simdjson::to_json_string strips the token (a string piece) as a std::string_view, which is exactly what we need. Second, in _set_column_value we can iterate through the JSON document with for (auto field: object_val) {xxx}, which is much faster than looking up a field by its name, as in objectValue.FindMember("k1"). Third, the at_pointer interface provided by simdjson can fetch a JSON field directly from the original document.	2022-08-16 14:49:50 +08:00
plat1ko	fecfdd78bf	[enhancement](status) Fix Status related macros to enable RVO or move ctor (#11753 )	2022-08-16 14:40:35 +08:00
Xinyi Zou	c124470408	[enhancement](memory) Fix too much cache leads to less memory available for queries (#11751 ) Disable Chunk Allocator in the vectorized Allocator, which reduces the memory cache. For highly concurrent queries, using Chunk Allocator with the vectorized Allocator can reduce the impact of the gperftools tcmalloc central lock. Jemalloc and google tcmalloc have a core cache, so Chunk Allocator may no longer be needed after gperftools tcmalloc is replaced.	2022-08-16 14:35:57 +08:00
Kang	4be6e70f1c	[fix](query) fix orderby keys limit return less or no result (#11757 ) The bug is caused by using _num_rows_read for the limit check. _num_rows_read is the count of rows read from storage, but some of those rows may be removed by filter_block for the WHERE predicate. Add _num_rows_return, the number of rows remaining after filter_block applies the WHERE predicate, to count the rows actually returned.	2022-08-16 14:31:47 +08:00
Xinyi Zou	7d836cf0c7	[fix](memtracker) Fix flush memtable to reduce load channel mem not executed (#11771 ) The memory value automatically tracked by the tcmalloc hook in the DeltaWriter is smaller than the value recorded manually in the memtable, because the first 4096-byte Chunk requested by each MemPool when the memtable is initialized is not tracked to the DeltaWriter by the hook. The two values are therefore not equal, so the mem_consumption() == _mem_table->memory_usage branch condition is never satisfied.	2022-08-16 14:30:45 +08:00
Xinyi Zou	2a1803c646	[enhancement](memtracker) Optimize query memory accuracy (#11740 ) Currently, only the virtual memory used by the query can be tracked through the tcmalloc hook. When allocated memory is not fully used, the recorded virtual memory is larger than the physical memory. At present, this is mainly because PODArray does not memset the memory to 0 when allocating it, and blocks allocated through PODArray in places such as VOlapScanNode::_free_blocks are usually kept for memory reuse and are not fully used.	2022-08-16 14:23:28 +08:00
yixiutt	573588693c	[bugfix](load) get max versio in read lock (#11806 ) Introduced by #11195. In multi-threaded load, the max version should be read from the tablet meta while holding the read lock.	2022-08-16 12:25:29 +08:00
ZenoYang	288b440b14	[improvement](vectorized) Improve count distinct performance by using fastunion (#11516 ) Testing on real user data shows a 10-40% performance improvement.	2022-08-16 12:18:46 +08:00
Xinyi Zou	d2bb3ad08e	[fix](memtracker) Fix core in logout task mem tracker (#11797 )	2022-08-16 11:28:06 +08:00
luozenglin	5104982614	[enhancement](tracing) append the profile counter to trace. (#11458 ) 1. append the profile counter and infos to span attributes. 2. output traceid to audit log.	2022-08-15 21:36:38 +08:00
Lightman	71df82696d	[fix](schema change) fix memory exceeded when schema change (#11748 ) In row-mode schema change, the operation sometimes fails because memory is exceeded. When the remaining memory is enough for sorting but not enough for the next block, it does not flush row_block_arr, whose data is held in memory, and continues to allocate the next block, so the allocation fails and it returns directly. With this fix, if memory for a block cannot be allocated, it flushes row_block_arr and tries again, unless row_block_arr is empty.	2022-08-15 17:57:39 +08:00
luozenglin	0f75bd0e38	[fix](delete) fix query result error after delete (#11754 ) convert dictionary code for delete predicates.	2022-08-15 17:52:03 +08:00
Ashin Gau	0b9bfd15b7	[feature-wip](parquet-reader) parquet physical type to doris logical type (#11769 ) Two improvements have been added: 1. Translate parquet physical type into doris logical type. 2. Decode parquet column chunk into doris ColumnPtr, and add unit tests to show how to use related API.	2022-08-15 16:08:11 +08:00
Zhengguo Yang	805c13aaa1	[fix](backup) fix backup restore raise `Storage backend not initialized.` error (#11736 )	2022-08-15 13:24:38 +08:00
carlvinhust2012	ab9529f6b5	[enhancement](array-type) support export files in 'select into outfile' (#11703 ) this pr is used to support export array type in 'select into outfile'. Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-08-15 12:34:31 +08:00
carlvinhust2012	8c8f48c4c2	[feature-wip](array-type) add the array_join function (#11406 ) this pr is used to add the array_join function. Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-08-15 11:43:17 +08:00
Gabriel	77e241cbb0	[refactor](date) Use uint32 as predicate type for date type (#11708 )	2022-08-15 11:12:33 +08:00
bin41215	ec5d4e3d17	print physical memory and virtual memory separately. (#11747 )	2022-08-13 13:56:49 +08:00
Gabriel	abd2eb4fa1	[Bug](date function) Fix bug for date format %T (#11729 )	2022-08-12 19:29:58 +08:00
yiguolei	408dbf840b	[bugfix](schema change) when there is a string column with delete predicate, the schema change may core (#11739 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-08-12 19:29:22 +08:00
pengxiangyu	1c4927eac3	[fix](core)fix bug for status not init(#11730 )	2022-08-12 17:42:37 +08:00
TengJianPing	58822c7b55	[bugfix](odbc) return error if convert unicode failed (#11728 )	2022-08-12 17:28:48 +08:00
Gabriel	e353be7dcb	[Bug](date function) Return null if date format is invalid (#11720 )	2022-08-12 14:07:55 +08:00
pengxiangyu	e5c2bb9699	[fix](remote)Fix bug for Cache Reader (#11629 )	2022-08-12 13:40:32 +08:00
Gabriel	15abafee71	[Bug](runtime filters) support late-arrival runtime filters (#11599 )	2022-08-12 11:55:15 +08:00
zhannngchen	0ab43c51e8	[Feature](unique-key-merge-on-write) some fix on delete bitmap usage (#11623 )	2022-08-12 11:54:31 +08:00
Gabriel	7d97aa194b	[feature-wip](datev2) Support to use datev2 as partition column (#11618 )	2022-08-12 11:54:01 +08:00
carlvinhust2012	b36680796f	[optimization] (be-log) modify the backendservice log (#11689 ) Co-authored-by: hucheng01 <hucheng01@baidu.com>	2022-08-12 11:52:24 +08:00
plat1ko	4047c3577d	[enhancement](Status) Optimize Status implementation	2022-08-12 11:39:35 +08:00
Jibing-Li	9b9ed1aef1	[data lake](arrow scanner)Fix file arrow scanner column index out of range core. (#11691 )	2022-08-12 11:34:29 +08:00
Yongqiang YANG	9950501fdf	[fix](profile) close eof scanner before transfer done (#11705 ) We should close EOF scanners before the transfer is done; otherwise they are not closed until the scan node is closed. Because the plan is closed only after it has finished, the query profile would miss stats from scanners closed by scannode::close, e.g. SegmentTotalNum in the profile is lower than it should be.	2022-08-12 11:28:43 +08:00
wangbo	4c8cc7f03e	[fix](storage)fix column dict incorrect result (#11694 ) Co-authored-by: Wang Bo <wangbo36@meituan.com>	2022-08-12 11:05:57 +08:00
Pxl	f5fe622a1b	[Bug](materialized view) fix create materialized view fail 1. remove referenced_column(seems unused now). 2. fix mv slot ref id wrong. 3. add type check for hll_hash. 4. enable non-nullable column change to nullable column.	2022-08-12 09:49:16 +08:00
Xin Liao	5d66839035	[feature-wip](unique-key-merge-on-write) push down runtime filter on unique key with merge on write table (#11695 )	2022-08-11 22:50:13 +08:00
yiguolei	ea57bf6370	[refactor](delete predicate) Unify delete to segmentiterator (#11650 ) * remove seek columns and unify delete columns in rowset reader Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-08-11 15:12:43 +08:00
Gabriel	2068bf2dea	[Refactor](predicate) Use primitive type as template argument for predicate (#11647 )	2022-08-11 12:06:44 +08:00
Ashin Gau	8f5aed27ec	[feature-wip](parquet-reader)read and decode parquet physical type (#11637 ) Proposed changes: read and decode parquet physical type. 1. The encoding type of boolean is bit-packing; this PR introduces the bit-packing implementation from Impala. 2. Create a parquet file including all the primitive types supported by hive. Remaining problems: 1. At present, only physical types are decoded, and there is no mapping or conversion to doris logical types. 2. Decimal / Timestamp / Date types are not parsed or processed. 3. Int_8 / Int_16 are stored as Int_32; how to resolve these types is still open.	2022-08-11 10:17:32 +08:00
Gabriel	a3714981fd	[Bug](schema change) Fix bug for vectorized schema change (#11652 )	2022-08-10 21:42:51 +08:00
zhannngchen	70b39475cf	[fix](scanner) delete predicates might be inconsistent with rowset readers (#11598 )	2022-08-10 19:40:54 +08:00
Jerry Hu	c8418d13b5	[improvement](config)Use session variable to replace configuration for 'enable_function_pushdown' (#11641 )	2022-08-10 19:25:02 +08:00
Jerry Hu	0291f84a9e	[fix](like-predicate) Add missing functions in LikeColumnPredicate (#11631 )	2022-08-10 15:03:14 +08:00

2490 Commits