Commit Graph

18 Commits

Author SHA1 Message Date
b15ccdbe98 [Pick](Variant) pick some fix (#37922)
#37674
#37839
#37883 
#37857 
#37794
2024-07-16 21:38:47 +08:00
3cd7b88868 [Fix](Variant) fix variant with empty key (#35671)
In some scenarios, an empty key causes a crash such as:

```
*** tablet *** SIGSEGV unknown detail explain (@0x0) received by PID 1527747 (
TID 1544788 OR 0x7f3302988700) from PID 0; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*
, void*) at /mnt/disk2/lihangyu/doris/be/src/common/signal_handler.h:429
 1# 0x00007F4880A12B50 in /lib64/libc.so.6
 2# doris::vectorized::PathInDataBuilder::append(std::basic_string_view<char,
std::char_traits<char> >, bool) at /mnt/disk2/lihangyu/doris/be/src/vec/json/p
ath_in_data.cpp:193
 3# doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false
>::traverseObject(doris::vectorized::SimdJSONParser::Object const&, doris::vec
torized::JSONDataParser<doris::vectorized::SimdJSONParser, false>::ParseContex
t&) at /mnt/disk2/lihangyu/doris/be/src/vec/json/json_parser.cpp:121
 4# doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false
>::traverse(doris::vectorized::SimdJSONParser::Element const&, doris::vectoriz
ed::JSONDataParser<doris::vectorized::SimdJSONParser, false>::ParseContext&) a
t /mnt/disk2/lihangyu/doris/be/src/vec/json/json_parser.cpp:95
 5# doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false
>::parse(char const*, unsigned long) at /mnt/disk2/lihangyu/doris/be/src/vec/j
son/json_parser.cpp:81
```
2024-05-30 19:55:25 +08:00
691f3c5ee7 [Performance](Variant) Improve load performance for variant type (#33890)
1. remove phmap for padding rows
2. add SimpleFieldVisitorToScarlarType for short circuit type deducing
3. correct type coercion for conflicting integer types
4. improve nullable column performance
5. remove the shared_ptr dependency on DataType; use TypeIndex instead
6. optimize by caching the order of fields (which is almost always the same)
and doing a quick check to match the next expected field, instead of searching the hash table
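
The field-order caching in item 6 can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `FieldOrderCache`, `lookup`, and `fast_hits` are hypothetical names, and it assumes fields are looked up by their position within each row.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not Doris code): maps field names to column slots and
// remembers which field was seen at each position in earlier rows.
class FieldOrderCache {
public:
    // Returns the slot for `name`, assigning a new slot on first sight.
    // `pos` is the position of the field within the current row.
    std::size_t lookup(std::string_view name, std::size_t pos) {
        // Fast path: the field at this position is the one we expected.
        if (pos < order_.size() && order_[pos].name == name) {
            ++fast_hits_;
            return order_[pos].slot;
        }
        // Slow path: fall back to the hash table.
        auto it = slots_.try_emplace(std::string(name), slots_.size()).first;
        // Remember what was seen at this position for the next row.
        if (pos < order_.size()) {
            order_[pos] = Entry{it->first, it->second};
        } else if (pos == order_.size()) {
            order_.push_back(Entry{it->first, it->second});
        }
        return it->second;
    }

    // Number of lookups answered without touching the hash table.
    std::size_t fast_hits() const { return fast_hits_; }

private:
    struct Entry {
        std::string name;
        std::size_t slot;
    };
    std::vector<Entry> order_;
    std::unordered_map<std::string, std::size_t> slots_;
    std::size_t fast_hits_ = 0;
};
```

When consecutive rows list their fields in the same order, every lookup after the first row is a single string comparison; a reordered or new field simply falls through to the hash table.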

Benchmark:
On ClickBench data, load performance:
12m36.799s -> 7m10.934s, about a 43% reduction in latency

In variant_p2/performance.groovy:
3min44s20 -> 1min15s80, about a 66% reduction in latency
2024-05-18 17:58:33 +08:00
25358564ca [Fix](compile) Fix gcc compile on master (#33864)
This was introduced by #33511, which wrongly used

ColumnStr<T> ();

which violates the C++20 standard (see https://wg21.cmeerw.net/cwg/issue2237) but has so far still been accepted by clang (see llvm/llvm-project#58112).
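
For illustration, the pattern looks roughly like this. `ColumnStrSketch` and its member are hypothetical stand-ins, not the actual Doris class, and how a given compiler version reacts to the commented-out form (error, warning, or silent acceptance) may vary.

```cpp
#include <cstddef>

// Hypothetical stand-in for a class template such as ColumnStr<T>.
template <typename T>
struct ColumnStrSketch {
    // ColumnStrSketch<T>();  // template-id as constructor name: not allowed
    //                        // in C++20 per CWG 2237
    ColumnStrSketch() = default;  // portable form: use the injected-class-name

    std::size_t offset_count = 0;  // hypothetical member, for illustration only
};
```

Dropping the template argument list from the constructor declaration is valid in every C++ standard version, so this form compiles with both gcc and clang.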
2024-04-19 23:41:37 +08:00
1300317723 [Exec](join) Support column string64 to avoid join failed in string size overflow the uint32 (#33511) (#33850) 2024-04-18 19:43:08 +08:00
0da010603e [Improve](TabletSchemaCache) reduce duplicated memory consumption for column name and column path (#31141)
Both can be references to the related fields in TabletColumn. Also use shared_ptr for TabletColumn in TabletSchema for later memory reuse.
2024-03-09 19:44:42 +08:00
7b79b77cc9 [Optimize](Variant) make tablet schema more well-organized (#99) (#30922) 2024-02-18 11:50:17 +08:00
e9e1e2894b [performance](variant) support topn 2phase read for variant column (#28318)
2023-12-25 11:50:41 +08:00
341822ec05 [regression-test](Variant) add compaction case for variant and fix bugs (#28066) 2023-12-08 12:18:46 +08:00
7398c3daf1 [Feature-Variant](Variant Type) support variant type query and index (#27676) 2023-11-29 10:37:28 +08:00
44b51bf0b9 [Feature](Variant) support variant load (#26572) 2023-11-08 00:37:57 -06:00
2664d1cffb [chore](vec) Make this copy constructor of StringRef explicit (#25337) 2023-10-12 14:12:46 +08:00
1ef22d7f7c [Feature](variant) add variant type (#24170)
Add variant type for metadata. Add persistent information for variant, including the paths of variant sub-columns, persisting them to the segment footer and the tablet schema of the rowset.
2023-09-14 14:21:53 +08:00
9e21318834 [refactor](dynamic table) Make segment_writer unaware of dynamic schema, and ensure parsing is exception-safe. (#19594)
1. make ColumnObject exception safe
2. introduce FlushContext and construct schema at memtable flush stage to make segment independent from dynamic schema
3. add more test cases
2023-06-01 10:25:04 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
Pxl
e2ac06d6d6 [Chore](execution) change PipelineTaskState to enum class && remove some row-based code (#17300)
1. change PipelineTaskState to enum class
2. remove some row-based code in FoldConstantExecutor::_get_result
3. reduce memcpy in the minmax runtime filter function (we can now guarantee that the input data is aligned)
4. add the Wunused-template check, remove some unused functions, and change some static functions to inline functions
2023-03-08 12:41:15 +08:00
be9385d40a [improvement](lock raii) use raii to lock and unlock (#16652)
This is part of the exception-safety work: #16366.
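
As a minimal sketch of why RAII locking helps with exception safety (the `Counter` class and its members are hypothetical, not taken from the commit): a `std::lock_guard` releases the mutex in its destructor, so the lock is dropped even when the guarded code throws.

```cpp
#include <mutex>
#include <stdexcept>

// Hypothetical example class, not Doris code.
class Counter {
public:
    // Adds `delta` under the lock; throws on negative input.
    void add_checked(int delta) {
        std::lock_guard<std::mutex> guard(mutex_);  // locked here
        if (delta < 0) {
            // The guard's destructor unlocks the mutex during stack unwinding.
            throw std::invalid_argument("negative delta");
        }
        value_ += delta;
    }  // unlocked here on the normal path

    int value() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return value_;
    }

private:
    mutable std::mutex mutex_;
    int value_ = 0;
};
```

With manual `lock()` and `unlock()` calls, the early throw would leave the mutex held and the next call would block forever.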

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-13 14:06:36 +08:00
37d1519316 [WIP](dynamic-table) support dynamic schema table (#16335)
Issue Number: close #16351

A dynamic schema table is a special type of table whose schema changes during the loading procedure. We implemented this feature mainly for semi-structured data such as JSON: since JSON is self-describing, we can extract schema information from the original documents and infer the final type information. This special table reduces manual schema change operations, makes it easy to import semi-structured data, and extends its schema automatically.
2023-02-11 13:37:50 +08:00