doris

Author	SHA1	Message	Date
Mingyu Chen	5f73668626	[log] add more error info for hdfs reader writer (#10475 )	2022-06-29 12:02:27 +08:00
huangzhaowei	abd10f0f3e	[feature-wip](multi-catalog) Impl FileScanNode in be (#10402 ) Define a new file scanner node for hms table in be. This file scanner node differs from the broker scan node as follows: 1. The broker scan node defines a src slot and a dest slot, so there are two memory copies: first from the file to the src slot, then from the src slot to the dest slot. FileScanNode has only one memory copy, directly from the file to the dest slot. 2. The broker scan node reads all the fields in the file into the src slot, while FileScanNode reads only the needed fields. 3. The broker scan node converts every type to string for the src slot and then uses cast to convert to the dest slot type, whereas FileScanNode produces the final type directly. For now FileScanNode is standalone code, but we will unify the file scan and broker scan in the future.	2022-06-29 11:04:01 +08:00
minghong	8cbdbb5658	[Enhancement] a better vec version for count_zero_num (#10472 )	2022-06-29 10:26:42 +08:00
Xinyi Zou	deeb3028ad	[Enhancement] [Memory] [Vectorized] Stress test and optimize memory allocation (#9581 ) * vec stress test, Allocator introduce chunkallocator * fix comment	2022-06-29 02:57:51 +08:00
Jerry Hu	7898c818e9	Revert "[improvement]Do not lazily read dict encoded columns (#10420 )" (#10466 ) Reason: 1. Some queries performance degradation 2. Coredump bug: #10419 This reverts commit 904e7576797c796b809823647a769bc1d4569115.	2022-06-28 15:43:48 +08:00
Tiewei Fang	17eb8c00d3	[feature] add table valued function framework and numbers table valued function (#10214 )	2022-06-28 14:01:57 +08:00
Jerry Hu	904e757679	[improvement]Do not lazily read dict encoded columns (#10420 )	2022-06-26 22:08:48 +08:00
Gabriel	ca94867b4e	[Feature-wip] add date v2 type (#9916 )	2022-06-26 16:07:56 +08:00
Stalary	79ad05eec6	[fix](doe) fix doe on es v8 (#10391 ) Doris on ES 8 does not work because of the mapping type change: the use of types is no longer recommended in ES 7, and support for types has been removed from ES 8. 1. /_mapping does not support include_type_name 2. /_search does not support using a type	2022-06-26 09:51:29 +08:00
Pxl	4750e94746	By default, do not build benchmark-tool, and use lld/gold (#10215 )	2022-06-25 22:31:11 +08:00
Gabriel	eebfbd0c91	Revert "[fix](vectorized) Support outer join for vectorized exec engine (#10323 )" (#10424 ) This reverts commit 2cc670dba697a330358ae7d485d856e4b457c679.	2022-06-25 22:18:08 +08:00
Mingyu Chen	7fe4b20da3	[feature-wip](multi-catalog) refactor catalog interface (#10320 )	2022-06-25 21:51:54 +08:00
HappenLee	f12b22a51e	[Bug][Vectorized] Fix core dump caused by BloomFilter not supporting DATE type (#10417 )	2022-06-25 21:29:32 +08:00
ZenoYang	4ca257a1cd	[improvement] Modify the default value of doris_scan_range_max_mb (#10232 ) * [improvement] Modify the default value of doris_scan_range_max_mb * fix regression-test	2022-06-25 19:48:49 +08:00
Gabriel	14a9a676e7	[BUG] fix DCHECK failed (#10396 )	2022-06-25 17:08:40 +08:00
Kidd	eb25df5a2c	[fix] (mem tracker) Fix inaccurate mem tracker leads to load OOM (#10409 ) * fix load tracker * fix comment	2022-06-25 14:13:02 +08:00
Jibing-Li	8abd00dcd5	[feature-wip](multi-catalog) Add catalog name to information schema. (#10349 ) Information schema database need to show catalog name after multi-catalog is supported. This part is step 1, add catalog name for schemata table.	2022-06-25 11:53:04 +08:00
Jerry Hu	7921320124	[fix]Make sure only call once set_dict_encoding_type for each ColumnReader (#10389 )	2022-06-25 04:31:19 +08:00
Jerry Hu	df908873bb	[improvement]Use std::iota to set values of _block_rowids in SegmentIterator::_read_columns_by_index (#10386 )	2022-06-25 04:30:23 +08:00
carlvinhust2012	89860fd0e3	[opt] delete the redundant parameter of _execute_non_nullable (#10173 ) 1. This pr is used to delete the redundant parameter of _execute_non_nullable. 2. This modification will not affect the function "element_at".	2022-06-24 19:22:50 +08:00
Gabriel	476be35961	[TYPO] fix typo 'destory' -> 'destroy' (#10373 )	2022-06-24 19:11:28 +08:00
Mingyu Chen	8a49c7ef04	[chore] Rename Doris binary output format	2022-06-24 15:30:05 +08:00
Mingyu Chen	9036f93df4	Revert "[improvement](function) optimize substr performance (#10169 )" (#10390 ) This reverts commit 2335d233f1f52eb64a380b4c9959becdf182b71b.	2022-06-24 14:38:52 +08:00
HappenLee	2cc670dba6	[fix](vectorized) Support outer join for vectorized exec engine (#10323 ) In a vectorized scenario, the query plan will generate a new tuple for the join node. This tuple mainly describes the output schema of the join node. Adding this tuple mainly solves the problem that the input schema of the join node is different from the output schema. For example: 1. The case where the null side column caused by outer join is converted to nullable. 2. The projection of the outer tuple.	2022-06-24 08:59:30 +08:00
yinzhijian	1bd0d7ded5	[typo] Fix typos in comments (#10252 )	2022-06-24 08:57:54 +08:00
Kang	2335d233f1	[improvement](function) optimize substr performance (#10169 ) optimize substr performance about 1.5~2x speedup.	2022-06-24 08:57:31 +08:00
Yongqiang YANG	b1d9b54805	BetaRowsetReader::next_block does not return 0 rows before eof (#10367 )	2022-06-24 07:22:45 +08:00
Jerry Hu	2e661ac63f	[improvement]Support vectorized predicates for dict columns (#10370 )	2022-06-24 07:21:26 +08:00
carlvinhust2012	1541dcd919	fix some typo in comments (#10374 )	2022-06-24 07:20:08 +08:00
yiguolei	b8d2c96842	[refactor]Remove load_delete job (#10353 )	2022-06-24 00:04:38 +08:00
yiguolei	3370c10528	[profile] add more detail profile in segment iterator (#10352 )	2022-06-23 15:32:43 +08:00
Yongqiang YANG	f466668d48	[improvement] each tuple starting at aligned address to build with ubsan enabled (#8831 ) When I built doris be with ubsan and vectorization enabled, be core dumped at doris::DecimalV2Value::operator long(). It crashed because SSE instructions accessed a non-aligned address. With ubsan enabled, the compiler generates different assembly code, including SSE instructions. A sender serializes tuples into a contiguous memory area, while a receiver just copies it, so we should align each tuple offset to 16 bytes. For compatibility, we should use a config to control it. BTW: with tools like ubsan, asan and tsan we can find bugs more easily, e.g. #8815, which would be difficult to find without ubsan. We should use modern tools to be more productive.	2022-06-23 14:03:01 +08:00
HappenLee	fa13bef3da	[Bug][Vectorized] Fix core dump when other join conjunct is const expr (#10223 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-06-23 13:27:32 +08:00
wangbo	0c39e1018c	[fixbug]opt nullable (#10346 ) Co-authored-by: Wang Bo <wangbo36@meituan.com>	2022-06-23 12:37:43 +08:00
wangbo	d73f170eeb	[optimize](storage)optimize date in storage layer (#8967 ) * opt date in storage * code style Co-authored-by: Wang Bo <wangbo36@meituan.com>	2022-06-23 12:29:10 +08:00
Gabriel	139cd3d11a	[Improvement] remove olap filters when use in key ranges (#10278 )	2022-06-23 09:12:29 +08:00
Gabriel	ed1e130ef6	[BUGFIX] fix wrong children quantity in debug string (#10348 )	2022-06-23 09:10:30 +08:00
Yongqiang YANG	274a0f2603	[fix] do not read seq column when reading a compacted rowset (#10344 ) SEQ_COL is used on tables with unique key to order data in one transaction (rowset). When there is only one rowset and it is compacted, the rows in the rowset are sorted and rows with the same keys are resolved by compaction, so the scanner sets direct_mode to let the read iterator skip sorting and aggregating, and the iterators do not need SEQ_COL. However, init_return_columns adds SEQ_COL to return_columns, which is passed to SegmentIterator. SegmentIterator is then called via get_next with a block that lacks SEQ_COL, so it creates the columns included in return_columns but not in the block. SEQ_COL is nullable and SegmentIterator does not handle that, so a core dump happens. In this case SegmentIterator does not need to read SEQ_COL at all. When SEQ_COL is really needed, the iterators create the SEQ_COL column in the block, so SegmentIterator never needs to create it.	2022-06-23 08:44:43 +08:00
Gabriel	200557052a	[BUGFIX] wrong answer with `with as` + two phase agg (#10303 )	2022-06-22 14:39:39 +08:00
TengJianPing	994feb9dbe	[bugfix][compaction][vectorized]fix compaction OOM (#10289 )	2022-06-22 14:38:30 +08:00
Kidd	f7ed2817ad	[fix] [ubsan] Fix TCMalloc Hook deadlocks when ThreadContext is initialized (#10310 )	2022-06-22 14:37:48 +08:00
camby	5248b21a01	[fix UT] for pr10249 evaluate interface changed (#10269 ) * UT fix for pr10249: the evaluate interface changed, but the UT did not. * fix be code format Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-06-22 08:49:53 +08:00
Adonis Ling	d056f5873b	[Fix](compile) Fix compilation errors reported by clang (#10221 ) fix failed to build the codebase by clang	2022-06-21 11:04:22 +08:00
Gabriel	84f57398d9	[Improvement] set debug string for VExpressions (#10166 )	2022-06-21 07:43:25 +08:00
Gabriel	f5e5880fb6	[Improvement] make expression for template argument a constexpr (#10268 )	2022-06-21 07:42:02 +08:00
chenlinzhong	5974e452bc	[enhancement] CRC32 instructions compatible arm arch (#10261 ) The performance of some CPUs that do not implement CRC instructions is particularly poor	2022-06-20 17:49:06 +08:00
minghong	c3743ec9aa	[enhancement] optimize 2 cases in seg_iter: all/none rows passed predicate (#10259 ) * [enhancement] optimize 2 cases: all/none rows passed predicate in seg_iter. * format	2022-06-20 17:47:52 +08:00
Jerry Hu	57327e6236	[improvement]Separate input and output parameters in ColumnPredicate (#10249 ) ```cpp for (uint16_t i = 0; i < *size; ++i) { // some code here } ``` The value of size is read for each conditional test, which also prevents possible vectorization.	2022-06-20 15:04:57 +08:00
Gabriel	588634ddf6	[feature] support runtime filter on vectorized engine (#10103 )	2022-06-20 09:46:38 +08:00
hongbin	ecdf8bcfdd	[comments]Replace some Chinese comments in production code (#10243 )	2022-06-20 09:24:19 +08:00
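
Commit 8cbdbb5658 (#10472) rewrites count_zero_num with a vectorized version whose code is not shown here. As a hedged illustration only, zero bytes can be counted 16 at a time with SSE2: compare against zero, turn the comparison into a bit mask, and popcount it. The name `count_zero_bytes` is hypothetical, and the SSE2 path assumes GCC or Clang (for `__builtin_popcount`) on x86; other targets use only the scalar loop.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper (not the Doris implementation): count bytes equal to 0.
size_t count_zero_bytes(const int8_t* data, size_t size) {
    size_t count = 0;
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    for (; i + 16 <= size; i += 16) {
        __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        // One mask bit per byte lane that compared equal to zero.
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        count += static_cast<size_t>(__builtin_popcount(static_cast<unsigned>(mask)));
    }
#endif
    // Scalar tail, or the whole input when SSE2 is unavailable.
    for (; i < size; ++i) {
        count += static_cast<size_t>(data[i] == 0);
    }
    return count;
}
```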
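
Commit df908873bb (#10386) fills `_block_rowids` with `std::iota` instead of a hand-written loop. A minimal sketch of that idea, assuming the batch covers a contiguous range of row ids starting at `start`; `contiguous_rowids` is a hypothetical helper, not Doris code.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: row ids start, start + 1, ..., start + n - 1.
std::vector<uint32_t> contiguous_rowids(uint32_t start, size_t n) {
    std::vector<uint32_t> rowids(n);
    // std::iota writes start, then increments by one for each following slot.
    std::iota(rowids.begin(), rowids.end(), start);
    return rowids;
}
```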
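
Commit f466668d48 (#8831) aligns each serialized tuple offset to 16 bytes so that SSE accesses on the receiver land on aligned addresses. A minimal sketch of that offset computation, assuming a power-of-two alignment; `align_up` and `aligned_tuple_offsets` are hypothetical names, and the config switch mentioned in the commit is left out.

```cpp
#include <cstddef>
#include <vector>

// Round offset up to the next multiple of alignment (alignment must be a power of two).
constexpr size_t align_up(size_t offset, size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical: start offset of each tuple when tuples are packed back to back
// into one buffer, padding so every tuple begins on an `alignment` boundary.
std::vector<size_t> aligned_tuple_offsets(const std::vector<size_t>& tuple_sizes,
                                          size_t alignment = 16) {
    std::vector<size_t> offsets;
    offsets.reserve(tuple_sizes.size());
    size_t cursor = 0;
    for (size_t sz : tuple_sizes) {
        cursor = align_up(cursor, alignment);  // pad up to the next boundary
        offsets.push_back(cursor);
        cursor += sz;
    }
    return offsets;
}
```

For tuples of 10, 20 and 5 bytes this yields offsets 0, 16 and 48; the padding costs some space in exchange for aligned loads.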
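
Commit 57327e6236 (#10249) notes that reading `*size` on every loop test blocks optimization. When the selection vector and the size are both `uint16_t*`, any write through `sel` may alias `*size`, so the compiler must reload the bound each iteration. A hedged sketch of the separated form: the input count is passed by value and the surviving count is returned. `filter_greater` and its signature are hypothetical, not the real ColumnPredicate interface.

```cpp
#include <cstdint>

// Hypothetical predicate "value > threshold" over the rows listed in sel.
// Input: `size` entries of sel (by value). Output: surviving rows are compacted
// to the front of sel, and their count is returned.
uint16_t filter_greater(const int32_t* values, uint16_t* sel, uint16_t size,
                        int32_t threshold) {
    uint16_t new_size = 0;
    for (uint16_t i = 0; i < size; ++i) {
        uint16_t idx = sel[i];
        // Safe in place: new_size <= i, so unread entries are never overwritten.
        sel[new_size] = idx;
        // Branch-free: advance only when the row passes the predicate.
        new_size += static_cast<uint16_t>(values[idx] > threshold);
    }
    return new_size;
}
```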

2257 Commits