Commit Graph

21 Commits

Author SHA1 Message Date
007f152e5e [Improve](compile) add __AVX2__ macro for JsonbParser (#28754)
* [Improve](compile) add `__AVX2__` macro for JsonbParser

* throw exception instead of CHECK
2023-12-21 10:25:26 +08:00
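As a hedged illustration of the `__AVX2__` guard pattern the commit title refers to (the function below is hypothetical, not the JsonbParser code): AVX2 intrinsics are compiled only when the target actually supports them, with a scalar fallback otherwise.

```
#include <cstddef>
#include <cstdint>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Count bytes equal to `ch` in a buffer; illustrative only.
size_t count_byte(const uint8_t* data, size_t len, uint8_t ch) {
    size_t count = 0;
    size_t i = 0;
#if defined(__AVX2__)
    // Fast path: compare 32 bytes at a time, only compiled when AVX2 is available.
    const __m256i needle = _mm256_set1_epi8(static_cast<char>(ch));
    for (; i + 32 <= len; i += 32) {
        __m256i chunk = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
        uint32_t mask = static_cast<uint32_t>(
                _mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, needle)));
        count += __builtin_popcount(mask);
    }
#endif
    // Scalar tail (and the whole loop when __AVX2__ is not defined).
    for (; i < len; ++i) {
        count += (data[i] == ch);
    }
    return count;
}
```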
942450a2e5 [Fix](Variant) ColumnObject needs to be finalized when doing ColumnObject::update_hash_with_value (#28119)
Otherwise, accessing rows at `n` leads to a heap buffer overflow:

```
 5# SipHash::update(char const*, unsigned long) at /home/zcp/repo_center/doris_master/doris/be/src/vec/common/sip_hash.h:132
 6# doris::vectorized::ColumnString::update_hash_with_value(unsigned long, SipHash&) const at /home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_string.h:452
 7# doris::vectorized::ColumnObject::update_hash_with_value(unsigned long, SipHash&) const at /home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_object.cpp:1433
 8# doris::vectorized::Block::update_hash(SipHash&) const at /home/zcp/repo_center/doris_master/doris/be/src/vec/core/block.cpp:721
 9# doris::EngineChecksumTask::_compute_checksum() at
```
2023-12-07 18:48:05 +08:00
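A minimal sketch (hypothetical types, not the actual Doris classes) of why the fix above finalizes the column first: until pending data is merged into the materialized part, accessing row `n` can read past the end of the buffer.

```
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct SketchColumn {
    std::vector<std::string> materialized;  // data already finalized
    std::vector<std::string> pending;       // data not yet merged

    void finalize() {
        materialized.insert(materialized.end(), pending.begin(), pending.end());
        pending.clear();
    }
    size_t size() const { return materialized.size() + pending.size(); }

    // Safe only on a finalized column; otherwise `n` may exceed
    // materialized.size() and read out of bounds.
    void update_hash_with_value(size_t n, size_t& hash) const {
        hash ^= std::hash<std::string>{}(materialized[n]);
    }
};

void hash_all_rows(SketchColumn& col, size_t& hash) {
    col.finalize();  // the fix: materialize all rows before hashing
    for (size_t n = 0; n < col.size(); ++n) {
        col.update_hash_with_value(n, hash);
    }
}
```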
Pxl
e3d2425d47 [Improvement](join) remove insert_indices_from_join and special judge for -1 (#27779)
remove insert_indices_from_join and special judge for -1
2023-12-04 11:03:22 +08:00
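For context, a rough sketch of the "-1 special judge" the title refers to: in index-based inserts after a join, an index of -1 conventionally means "no matching row, insert a default value". Names and shapes are illustrative only, not the Doris code that was removed.

```
#include <cstdint>
#include <vector>

// Illustrative only: copy src values selected by `indices` into dst,
// treating -1 as "no match, use the default value".
void insert_indices_from(std::vector<int64_t>& dst, const std::vector<int64_t>& src,
                         const std::vector<int32_t>& indices, int64_t default_value) {
    for (int32_t idx : indices) {
        if (idx == -1) {
            dst.push_back(default_value);  // the special case for -1
        } else {
            dst.push_back(src[idx]);
        }
    }
}
```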
7e3d6bc9f1 [Fix](Variant) Implement ColumnObject::update_hash_with_value (#27873) 2023-12-01 20:14:47 +08:00
f3a1abf20b [chore](compile) fix compile error in ColumnObject (#27739)
This issue was caused by two PRs being merged without a merge conflict.
2023-11-29 13:39:32 +08:00
7398c3daf1 [Feature-Variant](Variant Type) support variant type query and index (#27676) 2023-11-29 10:37:28 +08:00
Pxl
d969047b50 [Refactor](join) refactor of hash join (#27557)
Improve performance on the TPC-H data set by reworking the join-related code and the way the hash table is used.

Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: BiteTheDDDDt <pxl290@qq.com>
2023-11-28 19:46:00 +08:00
a5565f68b2 [Refactor](opentelemetry) Remove opentelemetry (#26605) 2023-11-09 18:05:34 +08:00
44b51bf0b9 [Feature](Variant) support variant load (#26572) 2023-11-08 00:37:57 -06:00
2664d1cffb [chore](vec) Make this copy constructor of StringRef explicit (#25337) 2023-10-12 14:12:46 +08:00
913282b29b [refactor](column) remove get_data_type in IColumn (#25242) 2023-10-10 20:27:15 +08:00
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
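As a hedged illustration of the `[[nodiscard]]` change above (the `Status` shape below is hypothetical): marking the class `[[nodiscard]]` makes the compiler warn whenever a returned `Status` is silently dropped, which is what forces call sites to handle it.

```
#include <string>
#include <utility>

class [[nodiscard]] Status {
public:
    static Status OK() { return Status(); }
    static Status Error(std::string msg) { return Status(std::move(msg)); }
    bool ok() const { return _msg.empty(); }

private:
    Status() = default;
    explicit Status(std::string msg) : _msg(std::move(msg)) {}
    std::string _msg;
};

// Some operation that can fail (placeholder body).
Status do_write(bool fail) {
    return fail ? Status::Error("disk full") : Status::OK();
}

Status caller() {
    // do_write(true);  // with [[nodiscard]] this now triggers a compiler warning
    if (Status st = do_write(false); !st.ok()) {
        return st;       // propagate instead of silently ignoring the failure
    }
    return Status::OK();
}
```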
50c1d55769 [Improve](dynamic schema) support filtering invalid data (#21160)
* [Improve](dynamic schema) support filtering invalid data

1. Support filtering illegal data for dynamic schema (see the hypothetical sketch after this entry).
2. Expand the regular expression for ColumnName to support more column names.
3. Stay compatible with PropertyAnalyzer and support legacy tables.
4. Disable parsing of multi-dimensional arrays by default, since some bugs remain unresolved.
2023-06-26 19:32:43 +08:00
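A hedged sketch of the filtering described in item 1 above; the regular expression and names are hypothetical, not the exact ColumnName pattern Doris uses.

```
#include <regex>
#include <string>
#include <vector>

// Hypothetical validity rule for dynamically generated column names.
bool is_valid_column_name(const std::string& name) {
    static const std::regex kPattern(R"(^[A-Za-z_][A-Za-z0-9_.]{0,255}$)");
    return std::regex_match(name, kPattern);
}

// Keep only acceptable names; rows referencing rejected names would be
// treated as invalid data and filtered out.
std::vector<std::string> filter_invalid_names(const std::vector<std::string>& names) {
    std::vector<std::string> kept;
    for (const auto& n : names) {
        if (is_valid_column_name(n)) kept.push_back(n);
    }
    return kept;
}
```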
9e21318834 [refactor](dynamic table) Make segment_writer unaware of dynamic schema, and ensure parsing is exception-safe. (#19594)
1. Make ColumnObject exception-safe (see the sketch after this entry).
2. Introduce FlushContext and construct the schema at memtable flush stage, making the segment writer independent of the dynamic schema.
3. Add more test cases.
2023-06-01 10:25:04 +08:00
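A minimal sketch of one way to make parsing exception-safe, as item 1 describes (all names are hypothetical): parse into a scratch copy and commit only on success, so an exception thrown mid-parse cannot leave the destination column half-mutated.

```
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Column = std::vector<std::string>;

// Placeholder parser that may throw on malformed input.
void parse_document_into(Column& col, const std::string& doc) {
    if (doc.empty()) throw std::runtime_error("malformed document");
    col.push_back(doc);
}

bool parse_exception_safe(Column& dst, const std::string& doc) {
    Column scratch = dst;          // work on a copy
    try {
        parse_document_into(scratch, doc);
    } catch (const std::exception&) {
        return false;              // dst is untouched on failure
    }
    dst = std::move(scratch);      // commit only after success
    return true;
}
```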
8cc0af150a [Fix](dynamic table) fix dynamic table with insert into and column al… (#18808)
1. The num_rows should be correctly set
2. insert into has no dynamic column
2023-04-21 11:19:00 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently there are many unnecessary includes in the codebase. The include-what-you-use tool can be used to remove them; enforcing a strict include-what-you-use policy keeps headers minimal and speeds up builds.
2023-04-19 23:11:48 +08:00
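An illustrative (non-Doris) example of the kind of change include-what-you-use suggests: a header that only stores a pointer does not need the full class definition, so a forward declaration can replace the include and cut rebuild fan-out.

```
// Before: the header pulled in a full definition it did not need.
// #include "vec/columns/column_object.h"

// After: a forward declaration is enough for a pointer member.
namespace doris::vectorized {
class ColumnObject;
}

class SegmentWriterSketch {
public:
    explicit SegmentWriterSketch(doris::vectorized::ColumnObject* col) : _col(col) {}

private:
    doris::vectorized::ColumnObject* _col;  // only a pointer, no definition required
};
```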
f38e00b4c0 [refactor](typesystem) using typeindex to create column instead of type name because type name is not stable (#18328)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-09 18:08:31 +08:00
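A hedged sketch of the type_index-based creation the title above describes: keying a factory map on std::type_index avoids depending on type-name strings, which the commit notes are not stable. All type and function names here are hypothetical.

```
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <typeindex>
#include <unordered_map>
#include <vector>

struct IColumnSketch { virtual ~IColumnSketch() = default; };
struct Int64Column : IColumnSketch { std::vector<int64_t> data; };
struct StringColumn : IColumnSketch { std::vector<std::string> data; };

using ColumnFactory = std::function<std::unique_ptr<IColumnSketch>()>;

// Factory registry keyed by std::type_index rather than a type-name string.
std::unordered_map<std::type_index, ColumnFactory>& column_registry() {
    static std::unordered_map<std::type_index, ColumnFactory> registry = {
            {std::type_index(typeid(int64_t)), [] { return std::make_unique<Int64Column>(); }},
            {std::type_index(typeid(std::string)), [] { return std::make_unique<StringColumn>(); }},
    };
    return registry;
}

template <typename T>
std::unique_ptr<IColumnSketch> create_column() {
    return column_registry().at(std::type_index(typeid(T)))();
}
```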
043f77200f [Bug](dynamic-table) Fix column alignment logic and support filtering null values when slot is not null (#17842)
Before this PR, when null values were encountered in columns specified as `NOT NULL`, they were not filtered; this behavior does not match the original load behavior.
Second, the column alignment logic has a bug:

```
template <typename ColumnInserterFn>
void align_variant_by_name_and_type(ColumnObject& dst, const ColumnObject& src, size_t row_cnt,
                                    ColumnInserterFn inserter) {
    CHECK(dst.is_finalized() && src.is_finalized());
    // Use rows() here instead of size(), since size() calls check_consistency(),
    // which we cannot do here: num_rows gets bumped even when src and dst are
    // empty. We simply increase dst's num_rows and fill in default values
    // whenever new data arrives.
    size_t num_rows = dst.rows();
```
2023-03-17 16:53:30 +08:00
9b7596f1c6 [Feature](Dynamic schema table) step1 support schema change expression (#17494)
1. Introduce a new type `VARIANT` to encapsulate dynamically generated columns, hiding the details of the types and names of newly generated columns.
2. Introduce a new expression `SchemaChangeExpr` for performing schema changes, for extensibility.
2023-03-13 15:12:42 +08:00
Pxl
e2ac06d6d6 [Chore](execution) change PipelineTaskState to enum class && remove some row-based code (#17300)
1. Change PipelineTaskState to an enum class (see the sketch after this entry).
2. Remove some row-based code in FoldConstantExecutor::_get_result.
3. Reduce memcpy in the min/max runtime filter function (we can now guarantee that the input data is aligned).
4. Add the -Wunused-template check, remove some unused functions, and change some static functions to inline functions.
2023-03-08 12:41:15 +08:00
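A small illustration of the enum-to-enum-class change in item 1 (the enumerator names below are made up, not the exact Doris ones): a scoped enum neither leaks its enumerators into the surrounding namespace nor converts implicitly to int, so task states cannot be mixed up with plain integers.

```
enum class PipelineTaskState {
    NOT_READY,
    RUNNABLE,
    BLOCKED,
    FINISHED,
};

bool is_runnable(PipelineTaskState state) {
    // Must be qualified with the enum name; comparisons like `state == 1`
    // no longer compile, unlike with an unscoped enum.
    return state == PipelineTaskState::RUNNABLE;
}
```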
37d1519316 [WIP](dynamic-table) support dynamic schema table (#16335)
Issue Number: close #16351

A dynamic schema table is a special type of table whose schema changes along with the loading procedure. The feature is implemented mainly for semi-structured data such as JSON: since JSON is self-describing, schema information can be extracted from the original documents and the final type information inferred from them. Such a table reduces manual schema change operations and makes it easy to import semi-structured data while extending the schema automatically.
2023-02-11 13:37:50 +08:00
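A toy sketch of the schema-inference idea described above; the value representation and widening rules are illustrative, not the Doris implementation: look at a field's parsed values across documents and pick the widest type that fits.

```
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

using JsonScalar = std::variant<int64_t, double, std::string>;
enum class InferredType { INT64, DOUBLE, STRING };

// Widen the inferred type as new values are observed:
// any string forces STRING, any double widens INT64 to DOUBLE.
InferredType infer_type(const std::vector<JsonScalar>& values) {
    InferredType result = InferredType::INT64;
    for (const auto& v : values) {
        if (std::holds_alternative<std::string>(v)) return InferredType::STRING;
        if (std::holds_alternative<double>(v)) result = InferredType::DOUBLE;
    }
    return result;
}
```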