doris

Author	SHA1	Message	Date
Ashin Gau	4bf055c818	[fix](parquet) the key column of map type in parquet may be nullable (#23180 ) Fix errors when reading a map type with a nullable key column in a parquet file. `ParquetReader` supports reading nullable key columns, but a check was added to prevent reading them. Unfortunately, the error from this check was not thrown correctly, causing the BE to crash and write meaningless error logs to be.out: ``` ... 11# doris::vectorized::ParquetReader::get_columns(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, doris::TypeDescriptor, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, doris::TypeDescriptor> > >, std::unordered_set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >) at /root/doris/be/src/vec/exec/format/parquet/vparquet_reader.cpp:508 12# doris::vectorized::VFileScanner::_get_next_reader() in /root/yun_you_external/output/be/lib/doris_be 13# doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState, doris::vectorized::Block, bool*) at /root/doris/be/src/vec/exec/scan/vfile_scanner.cpp:241 ... ```	2023-08-20 22:59:18 +08:00
HappenLee	433a6103ab	[Enhancement](scanner) allocate blocks in scanner_context on demand and free them on close (#23182 ) Introduced #19389 , removed #20785	2023-08-19 12:13:24 +08:00
Tiewei Fang	0838ff4bf4	[fix](Outfile) fix bug that the `fileSize` is not correct when outfile is completed (#22951 )	2023-08-18 22:31:44 +08:00
daidai	419e922a69	[fix](json)Fix the bug that does not stop when reading json files (#23062 ) * [fix](json)Fix the bug that does not stop when reading json files	2023-08-18 18:23:19 +08:00
Pxl	477961dc21	[Chore](agg) refactor of hash map (#22958 ) refactor of hash map	2023-08-18 17:59:30 +08:00
HappenLee	3d4ec1ac88	[pipeline](exec) support async writer in jdbc sink in pipeline query engine (#23144 ) support async writer in jdbc sink in pipeline query engine	2023-08-18 17:07:57 +08:00
ZenoYang	1c3cc77a54	[fix](function) to_bitmap parameter parsing failure returns null instead of bitmap_empty (#21236 ) * [fix](function) to_bitmap parameter parsing failure returns null instead of bitmap_empty * add ut * fix nereids * fix regression-test	2023-08-18 14:37:49 +08:00
Ashin Gau	795006ea3d	[fix](multi-catalog) conversion of compatible numerical types (#23113 ) Hive supports schema change but doesn't rewrite the parquet file, so the physical type of the parquet file may not equal the logical type of the table schema.	2023-08-18 14:05:33 +08:00
wuwenchi	a5ca6cadd6	[Improvement] Optimize count operation for iceberg (#22923 ) Iceberg has its own metadata information, which includes count statistics for table data. If the table does not contain equality deletes, we can get the count data of the current table directly from the count statistics.	2023-08-18 09:57:51 +08:00
Qi Chen	314f5a5143	[Fix](orc-reader) Fix filling partition or missing column used incorrect row count. (#23096 ) [Fix](orc-reader) Fix filling partition or missing column used incorrect row count. `_row_reader->nextBatch` returns the number of rows read. When orc lazy materialization is turned on, the number of rows read includes filtered rows, so the caller must look at `numElements` in the row batch to determine how many rows were not filtered and will be filled into the block. In this case, filling partition or missing columns used an incorrect row count, which caused the BE to crash with `filter.size() != offsets.size()` in the filter column step. When orc lazy materialization is turned off, add `_convert_dict_cols_to_string_cols(block, nullptr)` if `(block->rows() == 0)`.	2023-08-17 23:26:11 +08:00
starocean999	57568ba472	[fix](be)shouldn't use arena to alloc memory for SingleValueDataString (#23075 ) * [fix](be)shouldn't use arena to alloc memory for SingleValueDataString * format code	2023-08-17 22:18:09 +08:00
Jerry Hu	c5c984b79b	[refactor](bitmap) using template to reduce duplicate code (#23060 ) * [refactor](bitmap) support for batch value insertion * fix values was not filled for int8 and int16	2023-08-17 18:14:29 +08:00
TengJianPing	b252c49071	[fix](hash join) fix heap-use-after-free of HashJoinNode (#23094 )	2023-08-17 16:29:47 +08:00
Mryange	e289e03a1a	[fix](executor)fix no return with old type in time_round	2023-08-17 15:34:26 +08:00
Pxl	cf1865a1c8	[Bug](scan) fix core dump due to store_path_map (#23084 ) fix core dump due to store_path_map	2023-08-17 15:24:43 +08:00
zzzxl	8b51da0523	[Fix](load) fix partition Null pointer exception (#22965 )	2023-08-17 14:09:47 +08:00
TengJianPing	343a6dc29d	[improvement](hash join) Return result early if probe side has no data (#23044 )	2023-08-17 09:17:09 +08:00
amory	390c52f73a	[Improve](complex-type) update for array/map element_at with nested complex type with local tvf (#22927 )	2023-08-16 20:47:36 +08:00
Pxl	d5df3bae25	[Bug](exchange) fix dcheck fail when VDataStreamRecvr input empty block (#22992 ) fix dcheck fail when VDataStreamRecvr input empty block	2023-08-16 10:21:19 +08:00
Gabriel	f191736bfe	[bug](shuffle) Fix DCHECK failure if exchange node has limit (#22993 )	2023-08-15 19:14:37 +08:00
HappenLee	9b2323b7fd	[Pipeline](exec) support async writer in pipeline query engine (#22901 )	2023-08-15 17:32:53 +08:00
TengJianPing	50f66b1246	[fix](pipeline) fix bug of datastream sender when doing BUCKET_SHFFULE_HASH_PARTITIONED shuffle (#22988 ) This issue was introduced by #22765. If #22765 is picked to 2.0, this PR also needs to be picked. When the shuffle type is BUCKET_SHFFULE_HASH_PARTITIONED, data from multiple buckets may be sent to the same channel, so sending eos too early may cause data loss.	2023-08-15 17:30:27 +08:00
Mryange	f1864d9fcf	[fix](function) fix str_to_date with specific format #22981	2023-08-15 15:30:48 +08:00
Jerry Hu	9b42093742	[feature](agg) Make 'map_agg' support array type as value (#22945 )	2023-08-15 14:44:50 +08:00
zhangstar333	c2ff940947	[refactor](parquet)change decimal type export as fixed-len-byte on parquet write (#22792 ) Previously, the parquet writer exported decimal as byte-binary, but those fields could not be imported into Hive. Now decimal is exported as fixed-len-byte-array so that it can be imported into Hive directly.	2023-08-15 13:17:50 +08:00
Mryange	94bf8fb3c5	[performance](executor) optimize time_round function only one arg (#22855 )	2023-08-15 13:16:42 +08:00
airborne12	d431a35721	[Fix](inverted index) fix non-index match function core (#22959 )	2023-08-15 11:27:12 +08:00
xy	b5ea3454a6	[Bug](aggregation)fix for map_agg when columns[1] is nullable (#22932 ) In the map_agg handler function, added the judgment on columns[1]->is_nullable()	2023-08-15 11:26:03 +08:00
Pxl	3f55d5d4d5	[Chore](excution) change some log fatal and dcheck to exception (#22890 ) change some log fatal and dcheck to exception	2023-08-15 10:45:00 +08:00
TengJianPing	8318dfa9a3	[fix](datastream sender) fix wrong result of BUCKET_SHFFULE_HASH_PARTITIONED shuffle (#22973 ) fix wrong result of BUCKET_SHFFULE_HASH_PARTITIONED shuffle	2023-08-15 10:21:14 +08:00
zhangstar333	911bd0e818	[bug](if) fix if function not handle const nullable value (#22823 ) fix if function not handle const nullable value	2023-08-15 10:16:48 +08:00
Siyang Tang	b49dc8042d	[feature](load) refactor CSV reading process during scanning, and support enclose and escape for stream load (#22539 ) ## Proposed changes Refactor thoughts: close #22383 Descriptions about `enclose` and `escape`: #22385 ## Further comments 2023-08-09: Unfortunately, experiments show that the original way of parsing plain CSV is faster. Therefore, the refactor is only applied to enclose-related code; the plain CSV parser uses the original logic. A performance regression is unavoidable anyway. From the `CSV reader`'s perspective, the real weak point may be the column-writing behavior, as the flame graph shows. Trimming escape will be enabled after fix #22411 is merged. Cases to be discussed: 1. When an incomplete enclose appears at the beginning of large-scale data, the line delimiter is unreachable until EOF; will the buffer become extremely large? 2. What if an infinite line occurs in this case? Essentially, `1.` is equivalent to this. Only stream load is supported as a trial in this PR, to avoid too many unrelated changes. Docs will be added when `enclose` and `escape` are available for all kinds of load.	2023-08-15 09:23:53 +08:00
TengJianPing	7bc98748cf	[fix](datastream sender) fix wrong result of broadcast join; fix wrong result of pipeline (#22942 ) Fix bug of #22765 Close #22924	2023-08-14 18:59:19 +08:00
Pxl	d371101bfd	[Improvement](aggregation) make fixed hashmap's bitmap_size flexible (#22573 ) make fixed hashmap's bitmap_size flexible	2023-08-14 10:47:06 +08:00
Gabriel	abc9de07b3	[Bug](pipeline) make sure sink is not blocked before try close (#22765 ) make sure sink is not blocked before try close	2023-08-13 13:20:48 +08:00
Jack Drogon	395840cbbb	[Chore](refactor) Split IndexChannel from vtablet_sink.h into vtablet_sink.cc (#22848 ) Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>	2023-08-13 10:21:12 +08:00
flynn	4e880288c6	[refactor]use clear concept to replace std::enable_if_t (#22801 ) --------- Signed-off-by: flynn <fenglv15@mails.ucas.ac.cn>	2023-08-12 15:10:30 +08:00
amory	5e2748d2b4	[Improve](complex-type)update orc reader for complex type and add regress tests (#22856 )	2023-08-12 07:06:12 +08:00
DongLiang-0	db69457576	[fix](avro)Fix S3 TVF avro format reading failure (#22199 ) This PR fixes two issues: 1. When using the s3 TVF to query files in AVRO format, due to the change of `TFileType`, the originally queried `FILE_S3` becomes `FILE_LOCAL`, causing the query to fail. 2. Currently, both parameters `s3.virtual.key` and `s3.virtual.bucket` are removed. A new `S3Utils` in jni-avro parses the bucket and key of s3. The main purpose of this change is to unify the parameters of s3.	2023-08-11 17:22:48 +08:00
Mryange	72e264dd59	[fix](executor)fix error when FixedContainer with null (#22850 )	2023-08-11 17:20:50 +08:00
TengJianPing	bcac160013	[fix](broadcast shuffle) fix wrong result of broadcast shuffle (#22847 ) When the data stream sender is doing a broadcast shuffle, it accumulates blocks up to the batch size and then sends them to destinations, but for local receivers it ONLY sends the current block, which causes data loss. This issue was introduced by #22218. If #22218 is picked to the 2.0 branch, this PR also needs to be picked.	2023-08-11 17:01:11 +08:00
slothever	209f36f1bf	[fix](multi-catalog)fix jdbc loader (#22814 )	2023-08-11 14:36:19 +08:00
amory	be1e0dcd27	[new-feature](complex-type) support read nested parquet and orc file with complex type (#22793 )	2023-08-10 18:23:07 +08:00
Pxl	56392e21ae	[Bug](decimalv3) fix decimalv3 keyrange set wrong number #22818	2023-08-10 18:15:40 +08:00
Qi Chen	f2658dc7bd	[Feature](multi-catalog) Truncate char or varchar columns if size is smaller than file columns or not found in the file column schema. (#22318 ) Truncate char or varchar columns if size is smaller than file columns or not found in the file column schema by session var `truncate_char_or_varchar_columns`.	2023-08-10 14:37:20 +08:00
daidai	f1db6bd8c1	[feature](hive)append support for struct and map column type on textfile format of hive table (#22347 ) 1. Append support for struct and map column types on the textfile format of hive tables. 2. Optimize the code for the array column type. ```mysql +------+------------------------------------+ \| id \| perf \| +------+------------------------------------+ \| 1 \| {"key1":"value1", "key2":"value2"} \| \| 1 \| {"key1":"value1", "key2":"value2"} \| \| 2 \| {"name":"John", "age":"30"} \| +------+------------------------------------+ ``` ```mysql +---------+------------------+ \| column1 \| column2 \| +---------+------------------+ \| 1 \| {10, "data1", 1} \| \| 2 \| {20, "data2", 0} \| \| 3 \| {30, "data3", 1} \| +---------+------------------+ ``` Summary of support for complex types (a delimiter can be assigned): 1. array< primitive_type > and array< array< ... > > 2. map< primitive_type , primitive_type > 3. Struct< primitive_type , primitive_type ... >	2023-08-10 13:47:58 +08:00
Jerry Hu	57fb9799b5	[feature](agg) add aggregation function 'bitmap_agg' (#22768 ) This function can be used to replace bitmap_union(to_bitmap(expr)), because bitmap_union(to_bitmap(expr)) first needs to create many small bitmaps and then merge them into a single bitmap. bitmap_agg converts the column values into a bitmap directly, so it performs better than bitmap_union(to_bitmap(expr)). In our test, there is about a 30% improvement.	2023-08-10 12:18:25 +08:00
herry2038	eafdab0cfd	[Enhancement](tvf) Add frontends_disks table-valued-function (#22568 ) --------- Co-authored-by: yuxianbing <yuxianbing@yy.com> Co-authored-by: yuxianbing <iloveqaz123>	2023-08-10 10:40:24 +08:00
Mryange	b25d52b736	[feature](cast) remove some unused in functioncast and support some function in nereids (#22729 ) 1. ConvertImplGenericFromString does not need a template StringColumnType. 2. Remove timev1 in function cast. 3. Support time_to_sec and sec_to_time in nereids.	2023-08-10 10:10:32 +08:00
slothever	919bfd73f1	[improvement](multi-catalog)add scanner isolation class loader (#22247 ) Add a scanner isolation class loader so that plugins do not conflict with each other. The BE gets scanner classes via JNI calls and uses JniClassLoader to load them. In the last version, we always got scanner classes from the system class path by default, so the classes for each scanner could not be isolated.	2023-08-10 10:02:46 +08:00
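
The reasoning in commit 57fb9799b5 (`bitmap_agg` versus `bitmap_union(to_bitmap(expr))`) can be illustrated with a minimal sketch. This is not Doris code: Python sets stand in for bitmaps, and the three helper functions are hypothetical models named after the SQL functions. It only shows that both paths yield the same result while the first one builds one intermediate bitmap per row.

```python
def to_bitmap(value):
    """Hypothetical model of to_bitmap: one single-element bitmap per value."""
    return frozenset([value])


def bitmap_union(bitmaps):
    """Hypothetical model of bitmap_union: merge many bitmaps into one."""
    result = set()
    for bitmap in bitmaps:
        result |= bitmap
    return result


def bitmap_agg(values):
    """Hypothetical model of bitmap_agg: add each value to one bitmap directly."""
    result = set()
    for value in values:
        result.add(value)
    return result


column = [3, 1, 3, 7, 1]

# Path 1: one small bitmap per row, then a merge of all of them.
small_bitmaps = [to_bitmap(v) for v in column]
via_union = bitmap_union(small_bitmaps)

# Path 2: a single bitmap filled in place, with no intermediate objects.
via_agg = bitmap_agg(column)
```

Here `small_bitmaps` holds one object per input row, which is the overhead the commit message describes; the roughly 30% figure is the commit author's measurement and is not reproduced by this sketch.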


2068 Commits