2241 Commits

Author SHA1 Message Date
8abd00dcd5 [feature-wip](multi-catalog) Add catalog name to information schema. (#10349)
The information schema database needs to show the catalog name once multi-catalog is supported.
This is step 1: add the catalog name to the schemata table.
2022-06-25 11:53:04 +08:00
7921320124 [fix]Make sure only call once set_dict_encoding_type for each ColumnReader (#10389) 2022-06-25 04:31:19 +08:00
df908873bb [improvement]Use std::iota to set values of _block_rowids in SegmentIterator::_read_columns_by_index (#10386) 2022-06-25 04:30:23 +08:00
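The `std::iota` technique named in this commit title can be illustrated with a minimal sketch. The helper `make_block_rowids` and its parameters are hypothetical and are not taken from `SegmentIterator`; the sketch only shows how `std::iota` fills a buffer with consecutive row ids in place of a hand-written loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not the Doris code): returns `count` consecutive
// row ids starting at `first_rowid`.
std::vector<uint32_t> make_block_rowids(uint32_t first_rowid, std::size_t count) {
    std::vector<uint32_t> rowids(count);
    // std::iota writes first_rowid, first_rowid + 1, ... into the range.
    std::iota(rowids.begin(), rowids.end(), first_rowid);
    return rowids;
}
```

Calling `make_block_rowids(5, 3)` yields the ids 5, 6 and 7.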
89860fd0e3 [opt] delete the redundant parameter of _execute_non_nullable (#10173)
1. This PR deletes the redundant parameter of _execute_non_nullable.
2. This modification does not affect the function "element_at".
2022-06-24 19:22:50 +08:00
476be35961 [TYPO] fix typo 'destory' -> 'destroy' (#10373) 2022-06-24 19:11:28 +08:00
8a49c7ef04 [chore] Rename Doris binary output format 2022-06-24 15:30:05 +08:00
9036f93df4 Revert "[improvement](function) optimize substr performance (#10169)" (#10390)
This reverts commit 2335d233f1f52eb64a380b4c9959becdf182b71b.
2022-06-24 14:38:52 +08:00
2cc670dba6 [fix](vectorized) Support outer join for vectorized exec engine (#10323)
In a vectorized scenario, the query plan will generate a new tuple for the join node.
This tuple mainly describes the output schema of the join node.
Adding this tuple mainly solves the problem that the input schema of the join node is different from the output schema.
For example:
1. The case where the null side column caused by outer join is converted to nullable.
2. The projection of the outer tuple.
2022-06-24 08:59:30 +08:00
1bd0d7ded5 [typo] Fix typos in comments (#10252) 2022-06-24 08:57:54 +08:00
2335d233f1 [improvement](function) optimize substr performance (#10169)
Optimize substr performance for roughly a 1.5~2x speedup.
2022-06-24 08:57:31 +08:00
b1d9b54805 BetaRowsetReader::next_block does not return 0 rows before eof (#10367) 2022-06-24 07:22:45 +08:00
2e661ac63f [improvement]Support vectorized predicates for dict columns (#10370) 2022-06-24 07:21:26 +08:00
1541dcd919 fix some typo in comments (#10374) 2022-06-24 07:20:08 +08:00
b8d2c96842 [refactor]Remove load_delete job (#10353) 2022-06-24 00:04:38 +08:00
3370c10528 [profile] add more detail profile in segment iterator (#10352) 2022-06-23 15:32:43 +08:00
f466668d48 [improvement] each tuple starting at aligned address to build with ubsan enabled (#8831)
When I built the Doris BE with ubsan and vectorization enabled,
the BE core dumped at doris::DecimalV2Value::operator long(). It crashed
because an SSE instruction accessed a non-aligned address.

With ubsan enabled, the compiler generates different assembly code, which
includes SSE instructions.

A sender serializes tuples to a contiguous memory area, while a receiver
just copies it. So we should align each tuple offset to 16 bytes.

For compatibility, we should use a config to control it.

BTW: with tools like ubsan, asan, and tsan we can find bugs more easily,
e.g. #8815. It is difficult to find that bug without ubsan.

Anyway, we should use modern tools to be more productive.
2022-06-23 14:03:01 +08:00
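The 16-byte alignment described in this commit message can be sketched as follows. The names `kTupleAlignment`, `align_up` and `aligned_tuple_offsets` are assumptions made for illustration and do not come from the Doris source; the sketch only shows how each tuple's start offset in a contiguous buffer can be rounded up to a 16-byte boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed constant: the commit message says tuple offsets are aligned to 16 bytes.
constexpr std::size_t kTupleAlignment = 16;

// Rounds `offset` up to the next multiple of `alignment` (a power of two).
inline std::size_t align_up(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: computes the aligned start offset of each tuple
// when tuples of the given sizes are laid out back to back.
std::vector<std::size_t> aligned_tuple_offsets(const std::vector<std::size_t>& tuple_sizes) {
    std::vector<std::size_t> offsets;
    offsets.reserve(tuple_sizes.size());
    std::size_t cursor = 0;
    for (std::size_t size : tuple_sizes) {
        cursor = align_up(cursor, kTupleAlignment);  // pad to the boundary
        offsets.push_back(cursor);
        cursor += size;
    }
    return offsets;
}
```

For tuple sizes 10, 20 and 3, the start offsets are 0, 16 and 48. Sender and receiver must apply the same rule, which is why the commit gates it behind a config for compatibility.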
fa13bef3da [Bug][Vectorized] Fix coredump in other join conjunt is const expr (#10223)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-23 13:27:32 +08:00
0c39e1018c [fixbug]opt nullable (#10346)
Co-authored-by: Wang Bo <wangbo36@meituan.com>
2022-06-23 12:37:43 +08:00
d73f170eeb [optimize](storage)optimize date in storage layer (#8967)
* opt date in storage

* code style

Co-authored-by: Wang Bo <wangbo36@meituan.com>
2022-06-23 12:29:10 +08:00
139cd3d11a [Improvement] remove olap filters when use in key ranges (#10278) 2022-06-23 09:12:29 +08:00
ed1e130ef6 [BUGFIX] fix wrong children quantity in debug string (#10348) 2022-06-23 09:10:30 +08:00
274a0f2603 [fix] do not read seq column when reading a compacted rowset (#10344)
SEQ_COL is used on tables with a unique key to order data within one transaction (rowset).
When there is only one rowset and the rowset is compacted, rows in the rowset are sorted
and rows with the same keys have been resolved by compaction, so a scanner sets direct_mode to
optimize the read iterator and avoid sorting and aggregating; the iterators do not need SEQ_COL.
However, init_return_columns adds SEQ_COL to return_columns, which is passed to SegmentIterator.
The segment iterator is then called via get_next with a block without SEQ_COL, and it
creates the columns that are in return_columns but not in the block. SEQ_COL is nullable and the
segment iterator does not handle that, so a core dump happens.

Actually, in the above case, the segment iterator does not need to read SEQ_COL.
When SEQ_COL is really needed, the iterators create the SEQ_COL column in the block,
so the segment iterator does not need to create SEQ_COL at all.
2022-06-23 08:44:43 +08:00
200557052a [BUGFIX] wrong answer with with as + two phase agg (#10303) 2022-06-22 14:39:39 +08:00
994feb9dbe [bugfix][compaction][vectorized]fix compaction OOM (#10289) 2022-06-22 14:38:30 +08:00
f7ed2817ad [fix] [ubsan] Fix TCMalloc Hook deadlocks when ThreadContext is initialized (#10310) 2022-06-22 14:37:48 +08:00
5248b21a01 [fix UT] for pr10249 evaluate interface changed (#10269)
* UT fix for pr10249: the evaluate interface changed, but the UT was not updated.

* fix be code format

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-06-22 08:49:53 +08:00
d056f5873b [Fix](compile) Fix compilation errors reported by clang (#10221)
Fix failures when building the codebase with clang.
2022-06-21 11:04:22 +08:00
84f57398d9 [Improvement] set debug string for VExpressions (#10166) 2022-06-21 07:43:25 +08:00
f5e5880fb6 [Improvement] make expression for template argument a constexpr (#10268) 2022-06-21 07:42:02 +08:00
5974e452bc [enhancement] CRC32 instructions compatible arm arch (#10261)
Performance is particularly poor on some CPUs that do not implement CRC instructions.
2022-06-20 17:49:06 +08:00
c3743ec9aa [enhancement] optmize 2 cases in seg_iter: all/none rows passed predicate (#10259)
* [enhancement] optmize 2 cases: all/none rows passed predicate in seg_iter.

* format
2022-06-20 17:47:52 +08:00
57327e6236 [improvement]Separate input and output parameters in ColumnPredicate (#10249)
```cpp
for (uint16_t i = 0; i < *size; ++i) {
	// some code here
}
```
The value of size is read for each conditional test, which also prevents possible vectorization.
2022-06-20 15:04:57 +08:00
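The effect described in this commit message can be sketched with a hypothetical predicate. The function `filter_less_than` and its signature are assumptions for illustration, not the actual `ColumnPredicate` interface: the input row count is passed by value and the output count is returned, so the loop bound is a plain local instead of a pointer dereference that is re-read on every iteration.

```cpp
#include <cstdint>

// Hypothetical predicate (not the Doris interface): keeps the selected rows
// whose value is below `limit`. `size` is an input passed by value; the new
// selection size is the return value, so input and output are separate.
uint16_t filter_less_than(const int32_t* values, uint16_t* sel, uint16_t size, int32_t limit) {
    uint16_t new_size = 0;
    for (uint16_t i = 0; i < size; ++i) {  // bound is a local, not *size
        const uint16_t idx = sel[i];
        // Branchless compaction: always write, advance only on a match.
        // new_size never exceeds i, so unread entries are not overwritten.
        sel[new_size] = idx;
        new_size += static_cast<uint16_t>(values[idx] < limit);
    }
    return new_size;
}
```

With values `{5, 1, 9, 3}`, selection `{0, 1, 2, 3}` and limit 4, the function returns 2 and the first two selection entries become 1 and 3.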
588634ddf6 [feature] support runtime filter on vectorized engine (#10103) 2022-06-20 09:46:38 +08:00
ecdf8bcfdd [comments]Replace some chinese comments in product Code (#10243) 2022-06-20 09:24:19 +08:00
1c9ce29440 [improvement]Avoid frequently allocating and releasing flags in InListPredicate (#10248) 2022-06-20 09:08:02 +08:00
ab29ad2144 [typo] Fix typos in comments (#10247) 2022-06-20 09:06:29 +08:00
67f341f44e [TLP](step-1) Remove incubator prefix (#10230)
Remove some `incubator-` prefixes in the source code.
The documentation is not modified; that will be done in the next PR.
2022-06-19 19:34:52 +08:00
6ad024a2bf [fix] (mem tracker) Refactor memtable mem tracker, fix flush memtable DCHECK failed (#10156)
1. Add memory leak detection for the `DeltaWriter` and `MemTable` mem trackers.
2. Make the memtable mem tracker virtual to avoid frequent recursive consumption of the parent tracker.
3. Stop the memtable flush thread from attaching the memtable tracker, to ensure that the memtable mem tracker is completely accurate.
4. Set `memory_verbose_track=false`. At present, frequently switching the thread mem tracker causes a performance problem.
      - The mem tracker is held as a shared_ptr in thread-local storage. Each switch decrements the atomic use_count in the shared_ptr of the current tracker and increments the use_count of the replacement tracker, and frequent changes to the same tracker shared_ptr from multiple threads are slow.
      - TODO: 1. Reduce unnecessary thread mem tracker switches. 2. Consider using raw pointers for the mem tracker in thread-local storage.
2022-06-19 16:48:42 +08:00
70450d04ba [typo] Fix typos in comments (#10172) 2022-06-19 10:30:17 +08:00
ffe466cbc7 [fix](reader)replace an auto with size_t to avoid integer overflow (#10163) 2022-06-19 10:29:01 +08:00
5fdd995b4c [fix] Fix heap-use-after-free when using type array<string> (#10127) 2022-06-19 10:27:36 +08:00
1d3496c6ab [feature] support backup/restore connect to HDFS (#10081) 2022-06-19 10:26:20 +08:00
0e404edf54 [improvement] Change array offset type from UInt32 to UInt64 (#10070)
Currently the column `Array<T>` contains the columns `offsets` and `data`, and the type of column `offsets` is UInt32.
If we call array_union to merge arrays repeatedly, the array size may overflow.
So we need to widen it before the `Array Data Type` release.
2022-06-19 10:24:08 +08:00
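The overflow risk mentioned in this commit message can be shown with a small sketch. The template `build_offsets` is hypothetical and unrelated to the Doris column classes; it only accumulates end offsets for a sequence of array lengths so that the 32-bit and 64-bit offset types can be compared.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: builds cumulative end offsets for arrays of the
// given lengths, using OffsetT as the offset type.
template <typename OffsetT>
std::vector<OffsetT> build_offsets(const std::vector<uint64_t>& array_lengths) {
    std::vector<OffsetT> offsets;
    offsets.reserve(array_lengths.size());
    OffsetT end = 0;
    for (uint64_t len : array_lengths) {
        // Unsigned conversion wraps modulo 2^N when the sum does not fit.
        end = static_cast<OffsetT>(end + len);
        offsets.push_back(end);
    }
    return offsets;
}
```

For two arrays of 3,000,000,000 elements each, `build_offsets<uint32_t>` wraps the second offset to 1,705,032,704, while `build_offsets<uint64_t>` keeps the correct value 6,000,000,000.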
7a85e8d525 [bug](be) fix be block_reader.cc::_update_agg_value() mem leak.(#10216) (#10218) 2022-06-17 21:25:52 +08:00
f7789f4bc4 [fix]InListPredicate wrong result (#10211)
* fix

* reg test

Co-authored-by: Wang Bo <wangbo36@meituan.com>
2022-06-17 18:34:25 +08:00
f35b235c3b [opt](compaction) optimize compaction in concurrent load (#10153)
Add some logic to optimize compaction:
1. Separate base and cumulative compaction, in case base compaction runs too long and
affects cumulative compaction.
2. Fix the level size in cumulative compaction so that files below 64M get the right level
size. When choosing rowsets for compaction, the policy ignores big rowsets,
which reduces CPU usage by about 25% under high-frequency concurrent load.
3. Remove the skip window restriction so a rowset can be compacted right after it is
generated, because we will not delete the rowset after compaction. This greatly
reduces the compaction score under concurrent load.
4. Remove the version consistency check in can_do_compaction. We choose
consecutive rowsets for compaction, so this check is useless.

With the logic above, compaction score and CPU cost improve substantially
under concurrent load.

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-06-17 17:49:45 +08:00
60147ad7a5 [Improvement] build runtime filters asynchronously (#10186) 2022-06-17 11:09:13 +08:00
5e47b03595 [feature-wip](array-type) Add array aggregation functions (#10108) 2022-06-17 11:07:49 +08:00
fd0bd395ac [Enhancement] Remove some unused include (#10035) 2022-06-17 10:47:25 +08:00
44e979e43b [Vectorized][Function] add orthogonal bitmap agg functions (#10126)
* [Vectorized][Function] add orthogonal bitmap agg functions
save some file about orthogonal bitmap function
add some file to rebase
update functions file

* refactor union_count function
refactor orthogonal union count functions

* remove bool is_variadic
2022-06-17 08:48:41 +08:00