
8 Commits

Author SHA1 Message Date
3cd7b88868 [Fix](Variant) fix variant with empty key (#35671)
In some scenarios, an empty key causes a crash like this:

```
*** tablet *** SIGSEGV unknown detail explain (@0x0) received by PID 1527747 (TID 1544788 OR 0x7f3302988700) from PID 0; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /mnt/disk2/lihangyu/doris/be/src/common/signal_handler.h:429
 1# 0x00007F4880A12B50 in /lib64/libc.so.6
 2# doris::vectorized::PathInDataBuilder::append(std::basic_string_view<char, std::char_traits<char> >, bool) at /mnt/disk2/lihangyu/doris/be/src/vec/json/path_in_data.cpp:193
 3# doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false>::traverseObject(doris::vectorized::SimdJSONParser::Object const&, doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false>::ParseContext&) at /mnt/disk2/lihangyu/doris/be/src/vec/json/json_parser.cpp:121
 4# doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false>::traverse(doris::vectorized::SimdJSONParser::Element const&, doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false>::ParseContext&) at /mnt/disk2/lihangyu/doris/be/src/vec/json/json_parser.cpp:95
 5# doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false>::parse(char const*, unsigned long) at /mnt/disk2/lihangyu/doris/be/src/vec/json/json_parser.cpp:81
```
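The trace shows the crash in `PathInDataBuilder::append`, reached from `traverseObject` while walking an object's keys. The commit message does not show the actual fix, so the following is only a hypothetical sketch of checking for an empty key at the point where keys are appended; `PathBuilderSketch` and its reject-empty-keys policy are assumptions, not Doris code:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical path builder; NOT Doris's PathInDataBuilder. It only shows
// where an empty-key check can sit while walking a JSON object's keys.
class PathBuilderSketch {
public:
    // Appends `key` as a path component and returns true. For an empty key
    // (valid JSON, e.g. {"": 1}) it returns false without storing anything.
    // Assumed policy: rejecting empty keys up front lets later code rely on
    // non-empty components, e.g. call front() without undefined behavior.
    bool append(std::string_view key) {
        if (key.empty()) {
            return false;
        }
        parts_.emplace_back(key);
        return true;
    }

    // Joins the components with '.', e.g. {"a", "b"} -> "a.b".
    std::string to_string() const {
        std::string out;
        for (std::size_t i = 0; i < parts_.size(); ++i) {
            if (i > 0) {
                out += '.';
            }
            out += parts_[i];
        }
        return out;
    }

private:
    std::vector<std::string> parts_;  // owned copies of the accepted keys
};
```

Whether an empty key should be skipped, rejected, or stored as an explicit empty component is a design choice; the sketch simply makes the case explicit instead of letting it reach code that assumes a non-empty key.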

## Proposed changes

Issue Number: close #xxx

2024-05-30 19:55:25 +08:00
691f3c5ee7 [Performance](Variant) Improve load performance for variant type (#33890)
1. remove phmap for padding rows
2. add SimpleFieldVisitorToScarlarType for short-circuit type deduction
3. correct type coercion for conflicting types between integers
4. improve nullable column performance
5. remove the shared_ptr dependency for DataType; use TypeIndex instead
6. optimize by caching the order of fields (which is almost always the same) and quickly checking whether the next field matches the expected one, instead of searching the hash table
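Item 6 can be sketched as follows. This is a hypothetical illustration, not the Doris implementation: `FieldOrderCacheSketch`, its slot numbering, and `start_row()` are assumptions. It tries the slot that followed the previously matched field before falling back to the hash table:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: map field names to column slots, trying the slot
// after the previously matched one before doing a hash lookup.
class FieldOrderCacheSketch {
public:
    // Call at the start of each row so the fast path expects the first field.
    void start_row() { next_ = 0; }

    // Returns the slot index for `name`, creating a new slot on first sight.
    std::size_t find_or_insert(std::string_view name) {
        // Fast path: rows usually list fields in the same order, so the
        // expected field is the one right after the previously matched slot.
        if (next_ < order_.size() && order_[next_] == name) {
            ++fast_hits_;
            return advance(next_);
        }
        // Slow path: search the hash table, inserting the name if unseen.
        auto it = index_.find(std::string(name));
        if (it == index_.end()) {
            const std::size_t slot = order_.size();
            order_.emplace_back(name);
            it = index_.emplace(order_.back(), slot).first;
        }
        return advance(it->second);
    }

    std::size_t fast_hits() const { return fast_hits_; }

private:
    // Records that the next field is expected in the following slot.
    std::size_t advance(std::size_t slot) {
        next_ = slot + 1;
        return slot;
    }

    std::vector<std::string> order_;                      // slot -> name
    std::unordered_map<std::string, std::size_t> index_;  // name -> slot
    std::size_t next_ = 0;       // slot expected to match the next field
    std::size_t fast_hits_ = 0;  // lookups served without hashing
};
```

When every row lists its fields in the same order, every lookup after the first row takes the fast path; a row with a different order still gets correct slots, only through the hash table.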

benchmark:
On ClickBench data, load time:
12m36.799s -> 7m10.934s, about a 43% latency reduction

In variant_p2/performance.groovy:
3min44s20 -> 1min15s80, about a 66% latency reduction
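As a quick arithmetic check of the figures above, the relative latency reduction is (before - after) / before. The sketch converts both timings to seconds; reading `3min44s20` as 3 min 44.20 s (and `1min15s80` as 1 min 15.80 s) is an assumption about the notation:

```cpp
#include <cassert>

// Fractional latency reduction: (before - after) / before.
double latency_reduction(double before_s, double after_s) {
    assert(before_s > 0.0);
    return (before_s - after_s) / before_s;
}

// ClickBench load: 12m36.799s and 7m10.934s, in seconds.
constexpr double kClickBenchBefore = 12 * 60 + 36.799;  // 756.799 s
constexpr double kClickBenchAfter = 7 * 60 + 10.934;    // 430.934 s

// variant_p2/performance.groovy, ASSUMING "3min44s20" means 3 min 44.20 s
// and "1min15s80" means 1 min 15.80 s.
constexpr double kVariantBefore = 3 * 60 + 44.20;  // 224.20 s
constexpr double kVariantAfter = 1 * 60 + 15.80;   // 75.80 s
```

With these inputs, `latency_reduction` gives about 0.4306 for ClickBench and about 0.6619 for variant_p2, matching the rounded 43% and 66%.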
2024-05-18 17:58:33 +08:00
44b51bf0b9 [Feature](Variant) support variant load (#26572) 2023-11-08 00:37:57 -06:00
2664d1cffb [chore](vec) Make this copy constructor of StringRef explicit (#25337) 2023-10-12 14:12:46 +08:00
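To illustrate what an `explicit` copy constructor changes in C++ in general (the real `StringRef` is not shown here), here is a hypothetical `StringRefSketch`: direct-initialization still copies, while copy-initialization, such as `T b = a;` or passing by value, no longer compiles, which makes accidental copies visible at the call site.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical simplified string reference; NOT Doris's StringRef.
struct StringRefSketch {
    const char* data = nullptr;
    std::size_t size = 0;

    StringRefSketch(const char* d, std::size_t n) : data(d), size(n) {}

    // Copies must now be written as direct-initialization.
    explicit StringRefSketch(const StringRefSketch&) = default;
};

// The following would NOT compile, because each is copy-initialization:
//   StringRefSketch b = a;                           // '=' form
//   void take(StringRefSketch r); ... take(a);      // pass by value

// Makes an explicit copy and checks that it refers to the same bytes
// (copying a reference type never copies the underlying characters).
bool copy_refers_to_same_bytes(const StringRefSketch& original) {
    StringRefSketch copy(original);  // OK: direct-initialization
    return copy.data == original.data && copy.size == original.size;
}
```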
9e21318834 [refactor](dynamic table) Make segment_writer unaware of dynamic schema, and ensure parsing is exception-safe. (#19594)
1. make ColumnObject exception-safe
2. introduce FlushContext and construct the schema at the memtable flush stage, making segments independent of the dynamic schema
3. add more test cases
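What "exception-safe" means for a column is not spelled out above. As a hypothetical sketch of one common approach, the strong guarantee via rollback, assuming a nullable column whose value and null-map vectors must always have the same length (`NullableColumnSketch` is illustrative, not Doris's ColumnObject):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical nullable column: invariant values_.size() == null_map_.size().
class NullableColumnSketch {
public:
    // Appends a batch with the strong guarantee: if any row fails to parse
    // (std::stoll throws std::invalid_argument or std::out_of_range) or an
    // allocation fails, the column is restored to its size before the call.
    void append_batch(const std::vector<std::string>& texts) {
        const std::size_t old_size = values_.size();
        try {
            for (const std::string& t : texts) {
                if (t == "null") {
                    values_.push_back(0);
                    null_map_.push_back(1);
                } else {
                    const std::int64_t v = std::stoll(t);  // parse before mutating
                    values_.push_back(v);
                    null_map_.push_back(0);
                }
            }
        } catch (...) {
            // Roll back; shrinking vectors of trivial types does not throw.
            values_.resize(old_size);
            null_map_.resize(old_size);
            throw;
        }
    }

    std::size_t size() const { return values_.size(); }
    bool consistent() const { return values_.size() == null_map_.size(); }

private:
    std::vector<std::int64_t> values_;
    std::vector<std::uint8_t> null_map_;  // 1 = null, 0 = value present
};
```

Rolling back by truncation avoids copying the whole column on every batch, which a copy-and-swap approach would require.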
2023-06-01 10:25:04 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, the codebase contains some unnecessary includes. The include-what-you-use tool can clean them up, and enforcing a strict include-what-you-use policy, where each file directly includes the headers for the symbols it uses and nothing more, brings many benefits.
2023-04-19 23:11:48 +08:00
be9385d40a [improvement](lock raii) use raii to lock and unlock (#16652)
* [improvement](lock raii) use raii to lock and unlock

This is part of the exception-safety work: #16366.
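A minimal sketch of what RAII locking buys for exception safety, using the standard `std::lock_guard` (the hypothetical `GuardedCounter` is not Doris code): the mutex is released when the guard goes out of scope, including during stack unwinding, whereas a manual `lock()`/`unlock()` pair would leave it locked if the code in between throws.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical counter guarded by a mutex; illustrates RAII locking only.
class GuardedCounter {
public:
    // std::lock_guard locks in its constructor and unlocks in its destructor,
    // so the mutex is released even when the body throws. With a manual
    // mutex_.lock() ... mutex_.unlock(), the throw below would skip unlock().
    void add(int delta) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (delta < 0) {
            throw std::invalid_argument("negative delta");
        }
        value_ += delta;
    }

    int value() {
        std::lock_guard<std::mutex> guard(mutex_);
        return value_;
    }

    // Demonstration helper: true if no one currently holds the mutex.
    bool is_unlocked() {
        if (mutex_.try_lock()) {
            mutex_.unlock();
            return true;
        }
        return false;
    }

private:
    std::mutex mutex_;
    int value_ = 0;
};
```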

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-13 14:06:36 +08:00
37d1519316 [WIP](dynamic-table) support dynamic schema table (#16335)
Issue Number: close #16351

A dynamic schema table is a special type of table whose schema changes during the loading procedure. We have implemented this feature mainly for semi-structured data such as JSON: since JSON is self-describing, we can extract schema information from the original documents and infer the final type information. This special table can reduce manual schema change operations, makes semi-structured data easy to import, and extends its schema automatically.
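The schema-extension idea can be sketched as follows, assuming documents have already been flattened into (path, value kind) pairs. `KindSketch`, `extend_schema`, and the widening rule (Int64 + Double becomes Double, any other conflict becomes String) are hypothetical simplifications, not Doris's actual type rules:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical value kinds; a real type system is much richer.
enum class KindSketch { Int64, Double, String };

// Assumed widening rule: equal kinds stay, Int64 + Double widens to Double,
// and any other conflict falls back to String.
KindSketch widen(KindSketch a, KindSketch b) {
    if (a == b) {
        return a;
    }
    const bool numeric_pair =
        (a == KindSketch::Int64 && b == KindSketch::Double) ||
        (a == KindSketch::Double && b == KindSketch::Int64);
    return numeric_pair ? KindSketch::Double : KindSketch::String;
}

// One already-flattened document: (path, kind of the value at that path).
using DocSketch = std::vector<std::pair<std::string, KindSketch>>;

// Extends `schema` with the paths in `doc`: unseen paths become new columns,
// and known paths have their kind widened to cover both observed values.
void extend_schema(std::map<std::string, KindSketch>& schema, const DocSketch& doc) {
    for (const auto& [path, kind] : doc) {
        auto [it, inserted] = schema.emplace(path, kind);
        if (!inserted) {
            it->second = widen(it->second, kind);
        }
    }
}
```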
2023-02-11 13:37:50 +08:00