doris

Author	SHA1	Message	Date
lihangyu	527293aa41	[refactor](dynamic table) remove dynamic table (#23298 )	2023-08-23 14:15:14 +08:00
Gabriel	ba882dea21	[pipelineX](dependency) Build DAG between pipelines (#23355 )	2023-08-23 13:21:32 +08:00
Jerry Hu	14296ee87f	[fix](window_function) wrong order by range (#23346 )	2023-08-23 11:23:00 +08:00
Pxl	8ed4045df9	[Chore](primitive-type) remove VecPrimitiveTypeTraits (#22842 )	2023-08-23 08:37:40 +08:00
Pxl	e6d20f842c	[Bug](compile) fix compile failed on function case (#23335 )	2023-08-22 22:10:53 +08:00
HappenLee	5c2fae7ce5	[pipeline](exec) Refactor the table sink code in remove unless code (#23223 ) Refactor the table sink code in remove unless code	2023-08-22 20:42:14 +08:00
Pxl	1a1f86486d	[Improvement](function) opt for case when (#23068 ) opt for case when	2023-08-22 18:31:40 +08:00
camby	0b51e6d8e1	[refractor](FunctionArrayIndex) make the codes more simple	2023-08-22 17:48:59 +08:00
Ashin Gau	9d2e23b1aa	[fix](parquet) A row of complex type may be stored across more pages (#23277 ) A row of complex type may be stored across two(or more) pages, and the parameter `align_rows` indicates that whether the reader should read the remaining value of the last row in previous page.	2023-08-22 14:47:10 +08:00
Ashin Gau	5ff7b57fc1	[fix](parquet) parquet reader confuses logical/physical/slot id of columns (#23198 ) `ParquetReader` confuses logical/physical/slot id of columns. If only reading the scalar types, there's nothing wrong, but when reading complex types, `RowGroup` and `PageIndex` will get wrong statistics. Therefore, if the query contains complex types and pushed-down predicates, the probability of the result set is incorrect.	2023-08-22 13:35:29 +08:00
Gabriel	12075f9853	[pipelineX](projection) Support projection and blocking agg (#23256 )	2023-08-21 22:23:02 +08:00
Gabriel	dcd6c3c022	[pipelineX](refactor) propose a new pipeline execution model (#22562 )	2023-08-21 15:38:45 +08:00
plat1ko	d4694167a8	[Enhancement](chore) Some Status relevant enhancement (#23072 )	2023-08-21 14:14:38 +08:00
zhangstar333	37b49f60b7	[refactor](conf) add be conf for partition topn partitions threshold (#23220 ) add be conf for partition topn partitions threshold	2023-08-21 10:52:41 +08:00
amory	33dfa0c454	[Improve](serde) support text serde for nested type-array/map (#22738 ) Now we can not support nested type array/map so this pr aim to: 1. add format option for string convert defined datatype to keep with origin from_string 2. support array map can nested array and map	2023-08-21 10:32:28 +08:00
Jerry Hu	0967d7ec04	[improvement](agg) Do not serialize bitmap to string (#23172 )	2023-08-21 10:10:15 +08:00
Pxl	a11e0e3bc4	[Bug](agg) fix QUANTILE_UNION many problems (#23181 ) fix QUANTILE_UNION many problems	2023-08-21 10:04:27 +08:00
Ashin Gau	4bf055c818	[fix](parquet) the key colum of map type in parquet may be nullable (#23180 ) Fix errors when reading map type with nullable key column in parquet file. `ParquetReader` support to read nullable key column, but add a check to prevent reading nullable key column. Unfortunately, this check error was not thrown correctly, causing the BE to crash, and thrown meaningless error logs in be.out: ``` ... 11# doris::vectorized::ParquetReader::get_columns(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, doris::TypeDescriptor, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, doris::TypeDescriptor> > >, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >) at /root/doris/be/src/vec/exec/format/parquet/vparquet_reader.cpp:508 12# doris::vectorized::VFileScanner::_get_next_reader() in /root/yun_you_external/output/be/lib/doris_be 13# doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState, doris::vectorized::Block, bool*) at /root/doris/be/src/vec/exec/scan/vfile_scanner.cpp:241 ... ```	2023-08-20 22:59:18 +08:00
HappenLee	433a6103ab	[Enhancement](scanner) allocate blocks in scanner_context on demand and free them on close (#23182 ) Introduced #19389 , removed #20785	2023-08-19 12:13:24 +08:00
Tiewei Fang	0838ff4bf4	[fix](Outfile) fix bug that the `fileSize` is not correct when outfile is completed (#22951 )	2023-08-18 22:31:44 +08:00
daidai	419e922a69	[fix](json)Fix the bug that does not stop when reading json files (#23062 ) * [fix](json)Fix the bug that does not stop when reading json files	2023-08-18 18:23:19 +08:00
Pxl	477961dc21	[Chore](agg) refactor of hash map (#22958 ) refactor of hash map	2023-08-18 17:59:30 +08:00
HappenLee	3d4ec1ac88	[pipeline](exec) support async writer in jdbc sink in pipeline query engine (#23144 ) support async writer in jdbc sink in pipeline query engine	2023-08-18 17:07:57 +08:00
ZenoYang	1c3cc77a54	[fix](function) to_bitmap parameter parsing failure returns null instead of bitmap_empty (#21236 ) * [fix](function) to_bitmap parameter parsing failure returns null instead of bitmap_empty * add ut * fix nereids * fix regression-test	2023-08-18 14:37:49 +08:00
Ashin Gau	795006ea3d	[fix](multi-catalog) conversion of compatible numerical types (#23113 ) Hive support schema change, but doesn't rewrite the parquet file, so the physical type of parquet file may not equal the logical type of table schema.	2023-08-18 14:05:33 +08:00
wuwenchi	a5ca6cadd6	[Improvement] Optimize count operation for iceberg (#22923 ) Iceberg has its own metadata information, which includes count statistics for table data. If the table does not contain equli'ty delete, we can get the count data of the current table directly from the count statistics.	2023-08-18 09:57:51 +08:00
Qi Chen	314f5a5143	[Fix](orc-reader) Fix filling partition or missing column used incorrect row count. (#23096 ) [Fix](orc-reader) Fix filling partition or missing column used incorrect row count. `_row_reader->nextBatch` returns number of read rows. When orc lazy materialization is turned on, the number of read rows includes filtered rows, so caller must look at `numElements` in the row batch to determine how many rows were not filtered which will to fill to the block. In this case, filling partition or missing column used incorrect row count which will cause be crash by `filter.size() != offsets.size()` in filter column step. When orc lazy materialization is turned off, add `_convert_dict_cols_to_string_cols(block, nullptr)` if `(block->rows() == 0)`.	2023-08-17 23:26:11 +08:00
starocean999	57568ba472	[fix](be)shouldn't use arena to alloc memory for SingleValueDataString (#23075 ) * [fix](be)shouldn't use arena to alloc memory for SingleValueDataString * format code	2023-08-17 22:18:09 +08:00
Jerry Hu	c5c984b79b	[refactor](bitmap) using template to reduce duplicate code (#23060 ) * [refactor](bitmap) support for batch value insertion * fix values was not filled for int8 and int16	2023-08-17 18:14:29 +08:00
TengJianPing	b252c49071	[fix](hash join) fix heap-use-after-free of HashJoinNode (#23094 )	2023-08-17 16:29:47 +08:00
Mryange	e289e03a1a	[fix](executor)fix no return with old type in time_round	2023-08-17 15:34:26 +08:00
Pxl	cf1865a1c8	[Bug](scan) fix core dump due to store_path_map (#23084 ) fix core dump due to store_path_map	2023-08-17 15:24:43 +08:00
zzzxl	8b51da0523	[Fix](load) fix partiotion Null pointer exception (#22965 )	2023-08-17 14:09:47 +08:00
TengJianPing	343a6dc29d	[improvement](hash join) Return result early if probe side has no data (#23044 )	2023-08-17 09:17:09 +08:00
amory	390c52f73a	[Improve](complex-type) update for array/map element_at with nested complex type with local tvf (#22927 )	2023-08-16 20:47:36 +08:00
Pxl	d5df3bae25	[Bug](exchange) fix dcheck fail when VDataStreamRecvr input empty block (#22992 ) fix dcheck fail when VDataStreamRecvr input empty block	2023-08-16 10:21:19 +08:00
Gabriel	f191736bfe	[bug](shuffle) Fix DCHECK failure if exchange node has limit (#22993 )	2023-08-15 19:14:37 +08:00
HappenLee	9b2323b7fd	[Pipeline](exec) support async writer in pipelien query engine (#22901 )	2023-08-15 17:32:53 +08:00
TengJianPing	50f66b1246	[fix](pipeline) fix bug of datastream sender when doing BUCKET_SHFFULE_HASH_PARTITIONED shuffle (#22988 ) This issue is introduced by #22765, if #22765 is picked to 2.0, then also need to pick this PR. When shuffle type is BUCKET_SHFFULE_HASH_PARTITIONED, since data of multi buckets maybe sent to the same channel, send eos too early may cause data lost.	2023-08-15 17:30:27 +08:00
Mryange	f1864d9fcf	[fix](function) fix str_to_date with specific format #22981	2023-08-15 15:30:48 +08:00
Jerry Hu	9b42093742	[feature](agg) Make 'map_agg' support array type as value (#22945 )	2023-08-15 14:44:50 +08:00
zhangstar333	c2ff940947	[refactor](parquet)change decimal type export as fixed-len-byte on parquet write (#22792 ) before the parquet write export decimal as byte-binary, but can't be import those fied to Hive. Now, change to export decimal as fixed-len-byte-array in order to import hive directly.	2023-08-15 13:17:50 +08:00
Mryange	94bf8fb3c5	[performance](executor) optimize time_round function only one arg (#22855 )	2023-08-15 13:16:42 +08:00
airborne12	d431a35721	[Fix](inverted index) fix non-index match function core (#22959 )	2023-08-15 11:27:12 +08:00
xy	b5ea3454a6	[Bug](aggregation)fix for map_agg when columns[1] is nullable (#22932 ) In the map_agg handler function, added the judgment on columns[1]->is_nullable()	2023-08-15 11:26:03 +08:00
Pxl	3f55d5d4d5	[Chore](excution) change some log fatal and dcheck to exception (#22890 ) change some log fatal and dcheck to exception	2023-08-15 10:45:00 +08:00
TengJianPing	8318dfa9a3	[fix](datastream sender) fix wrong result of BUCKET_SHFFULE_HASH_PARTITIONED shuffle (#22973 ) fix wrong result of BUCKET_SHFFULE_HASH_PARTITIONED shuffle	2023-08-15 10:21:14 +08:00
zhangstar333	911bd0e818	[bug](if) fix if function not handle const nullable value (#22823 ) fix if function not handle const nullable value	2023-08-15 10:16:48 +08:00
Siyang Tang	b49dc8042d	[feature](load) refactor CSV reading process during scanning, and support enclose and escape for stream load (#22539 ) ## Proposed changes Refactor thoughts: close #22383 Descriptions about `enclose` and `escape`: #22385 ## Further comments 2023-08-09: It's a pity that experiment shows that the original way for parsing plain CSV is faster. Therefor, the refactor is only applied on enclose related code. The plain CSV parser use the original logic. Fallback of performance is unavoidable anyway. From the `CSV reader`'s perspective, the real weak point may be the write column behavior, proved by the flame graph. Trimming escape will be enable after fix: #22411 is merged Cases should be discussed: 1. When an incomplete enclose appears in the beginning of a large scale data, the line delimiter will be unreachable till the EOF, will the buffer become extremely large? 2. What if an infinite line occurs in the case? Essentially, `1.` is equivalent to this. Only support stream load as trial in this PR, avoid too many unrelated changes. Docs will be added when `enclose` and `escape` is available for all kinds of load.	2023-08-15 09:23:53 +08:00
TengJianPing	7bc98748cf	[fix](datastream sender) fix wrong result of broadcast join; fix wrong result of pipeline (#22942 ) Fix bug of #22765 Close #22924	2023-08-14 18:59:19 +08:00

1 2 3 4 5 ...

2085 Commits