doris

Author	SHA1	Message	Date
lihangyu	b15ccdbe98	[Pick](Variant) pick some fix (#37922 ) #37674 #37839 #37883 #37857 #37794	2024-07-16 21:38:47 +08:00
lihangyu	3cd7b88868	[Fix](Variant) fix variant with empty key (#35671 ) In some scenarios an empty key will cause a crash like ``` * tablet * SIGSEGV unknown detail explain (@0x0) received by PID 1527747 ( TID 1544788 OR 0x7f3302988700) from PID 0; stack trace: *** 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t* , void) at /mnt/disk2/lihangyu/doris/be/src/common/signal_handler.h:429 1# 0x00007F4880A12B50 in /lib64/libc.so.6 2# doris::vectorized::PathInDataBuilder::append(std::basic_string_view<char, std::char_traits<char> >, bool) at /mnt/disk2/lihangyu/doris/be/src/vec/json/path_in_data.cpp:193 3# doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false >::traverseObject(doris::vectorized::SimdJSONParser::Object const&, doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false>::ParseContext&) at /mnt/disk2/lihangyu/doris/be/src/vec/json/json_parser.cpp:121 4# doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false >::traverse(doris::vectorized::SimdJSONParser::Element const&, doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false>::ParseContext&) at /mnt/disk2/lihangyu/doris/be/src/vec/json/json_parser.cpp:95 5# doris::vectorized::JSONDataParser<doris::vectorized::SimdJSONParser, false >::parse(char const, unsigned long) at /mnt/disk2/lihangyu/doris/be/src/vec/json/json_parser.cpp:81 ```	2024-05-30 19:55:25 +08:00
lihangyu	691f3c5ee7	[Performance](Variant) Improve load performance for variant type (#33890 ) 1. remove phmap for padding rows 2. add SimpleFieldVisitorToScarlarType for short-circuit type deduction 3. correct type coercion for conflicting types between integers 4. improve nullable column performance 5. remove shared_ptr dependency for DataType, use TypeIndex instead 6. optimize by caching the order of fields (which is almost always the same) and doing a quick check to match the next expected field, instead of searching the hash table. Benchmark: in clickbench data, load performance: 12m36.799s -> 7m10.934s, about 43% latency reduction. In variant_p2/performance.groovy: 3min44s20 -> 1min15s80, about 66% latency reduction	2024-05-18 17:58:33 +08:00
zclllyybb	25358564ca	[Fix](compile) Fix gcc compile on master (#33864 ) This was introduced by #33511, which wrongly used ColumnStr<T> (); this violates the C++20 standard (see https://wg21.cmeerw.net/cwg/issue2237) but has still been supported by clang up until now (see llvm/llvm-project#58112)	2024-04-19 23:41:37 +08:00
HappenLee	1300317723	[Exec](join) Support column string64 to avoid join failure when the string size overflows uint32 (#33511 ) (#33850 )	2024-04-18 19:43:08 +08:00
lihangyu	0da010603e	[Improve](TabletSchemaCache) reduce duplicated memory consumption for column name and column path (#31141 ) Both could be references to the related fields in TabletColumn. Also use shared_ptr for TabletColumn in TabletSchema for later memory reuse	2024-03-09 19:44:42 +08:00
lihangyu	7b79b77cc9	[Optimize](Variant) make tablet schema more well-organized (#99 ) (#30922 )	2024-02-18 11:50:17 +08:00
lihangyu	e9e1e2894b	[performance](variant) support topn 2phase read for variant column (#28318 )	2023-12-25 11:50:41 +08:00
lihangyu	341822ec05	[regression-test](Variant) add compaction case for variant and fix bugs (#28066 )	2023-12-08 12:18:46 +08:00
lihangyu	7398c3daf1	[Feature-Variant](Variant Type) support variant type query and index (#27676 )	2023-11-29 10:37:28 +08:00
lihangyu	44b51bf0b9	[Feature](Variant) support variant load (#26572 )	2023-11-08 00:37:57 -06:00
Jerry Hu	2664d1cffb	[chore](vec) Make this copy constructor of StringRef explicit (#25337 )	2023-10-12 14:12:46 +08:00
lihangyu	1ef22d7f7c	[Feature](variant) add variant type (#24170 ) Add variant type for metadata. Add persistent information for variant, including the path of variant sub-columns, persisting them to the segment footer and tablet schema of the rowset.	2023-09-14 14:21:53 +08:00
lihangyu	9e21318834	[refactor](dynamic table) Make segment_writer unaware of dynamic schema, and ensure parsing is exception-safe. (#19594 ) 1. make ColumnObject exception safe 2. introduce FlushContext and construct schema at memtable flush stage to make segment independent from dynamic schema 3. add more test cases	2023-06-01 10:25:04 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
Pxl	e2ac06d6d6	[Chore](execution) change PipelineTaskState to enum class && remove some row-based code (#17300 ) 1. change PipelineTaskState to enum class 2. remove some row-based code on FoldConstantExecutor::_get_result 3. reduce memcpy on minmax runtime filter function (now we can guarantee that the input data is aligned) 4. add Wunused-template check, remove some unused functions, and change some static functions to inline functions.	2023-03-08 12:41:15 +08:00
yiguolei	be9385d40a	[improvement](lock raii) use raii to lock and unlock (#16652 ) This is part of exception safety: #16366. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-13 14:06:36 +08:00
lihangyu	37d1519316	[WIP](dynamic-table) support dynamic schema table (#16335 ) Issue Number: close #16351 A dynamic schema table is a special type of table whose schema changes with the loading procedure. We have implemented this feature mainly for semi-structured data such as JSON: since JSON is self-describing, we can extract schema info from the original documents and infer the final type information. This special table can reduce manual schema change operations, making it easy to import semi-structured data and extend its schema automatically.	2023-02-11 13:37:50 +08:00

18 Commits