doris

Author	SHA1	Message	Date
huanghaibin	0070909d30	[fix](group commit)Fix the issue of duplicate addition of wal path when encountering an exception (#28691 )	2023-12-21 20:27:33 +08:00
Ashin Gau	bcf2683b9d	[fix](scanner) fix concurrency bugs when scanner is stopped or finished (#28650 ) `ScannerContext` would schedule scanners even after being stopped, and it confused `_is_finished` with `_should_stop`. This only fixes the concurrency bugs that occur when a scanner is stopped or finished, as reported in https://github.com/apache/doris/pull/28384	2023-12-21 10:37:58 +08:00
lihangyu	36857006cd	[Fix](json reader) fix json reader crash due to `fmt::format_to` (#28737 ) ``` 4# __gnu_cxx::__verbose_terminate_handler() [clone .cold] at ../../../../libstdc++-v3/libsupc++/vterminate.cc:75 5# __cxxabiv1::__terminate(void (*)()) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48 6# 0x00005622F33D22B1 in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be 7# 0x00005622F33D2404 in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be 8# fmt::v7::detail::error_handler::on_error(char const*) in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be 9# char const* fmt::v7::detail::parse_replacement_field<char, fmt::v7::detail::format_handler<fmt::v7::detail::buffer_appender<char>, char, fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<char>, char> >&>(char const*, char const*, fmt::v7::detail::format_handler<fmt::v7::detail::buffer_appender<char>, char, fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<char>, char> >&) in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be 10# void fmt::v7::detail::vformat_to<char>(fmt::v7::detail::buffer<char>&, fmt::v7::basic_string_view<char>, fmt::v7::basic_format_args<fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<fmt::v7::type_identity<char>::type>, fmt::v7::type_identity<char>::type> >, fmt::v7::detail::locale_ref) in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be 11# doris::vectorized::NewJsonReader::_append_error_msg(rapidjson::GenericValue<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool*) at /root/doris/be/src/vec/exec/format/json/new_json_reader.cpp:924 12# doris::vectorized::NewJsonReader::_set_column_value ```	2023-12-20 19:58:30 +08:00
wudongliang	111185407c	[Improve](tvf)jni-avro support split file (#27933 )	2023-12-19 16:37:34 +08:00
huanghaibin	66fbb22ad7	[fix](group commit) Fix some wal problems on group commit (#28554 )	2023-12-19 09:51:03 +08:00
wuwenchi	97e63516b7	[fix](streamload) catch exception when reading arrow data (#28558 )	2023-12-18 22:03:57 +08:00
Qi Chen	eb99e4270d	[Fix](parquet_reader) Fix dict filtering doesn't work with plain dict encoding in parquet reader. (#28290 )	2023-12-15 09:27:02 +08:00
lihangyu	48937fef48	[Performance](json reader) optimize filling default values (#25542 ) Add a faster path for filling default values, since looking up value map is relatively slow	2023-12-14 10:20:29 +08:00
Ashin Gau	ec91dd1129	[opt](vfilescanner) interrupt running parquet/orc readers when scannode is finished (#28223 ) VScanNode::get_next checks whether the ScanNode has reached its limit condition and sends eos to TaskScheduler, which then tries to close the ScanNode. However, ScanNode must wait for all running scanners to finish, so even if it has reached the limit condition, it can't be closed immediately. This PR tries to interrupt the running readers so that ScanNode ends as soon as possible.	2023-12-13 19:31:08 +08:00
Qi Chen	9861cfc4bc	[Fix](Transactional-Hive) Fix transactional hive core dump when `TransactionalHiveReader::init_row_filters()`. (#28238 )	2023-12-12 14:17:26 +08:00
julic20s	d8d8f15bf3	[improvement](vectorization) Use requires instead of specialization for doris::vectorized::Decimal (#28027 )	2023-12-08 09:59:52 +08:00
HHoflittlefish777	f9d4690023	[improve](stack_trace) avoid print stack trace in csv and json reader #28129	2023-12-07 22:45:18 +08:00
HHoflittlefish777	cb9a6f63ab	[refactor](simd_json_reader) refactor simd json parse to adapt stream parse (#27972 )	2023-12-07 14:45:15 +08:00
wuwenchi	54d062ddee	[feature](stream load) (step one)Add arrow data type for stream load (#26709 ) By using the Arrow data format, we can reduce the amount of data transferred during stream load and improve data import performance	2023-12-06 23:29:46 +08:00
Mingyu Chen	3e8c75e246	[minor](orc) opt the log info in orc reader (#27951 )	2023-12-06 20:47:36 +08:00
Qi Chen	2b4c4bb442	[Fix][Opt](parquet-reader) Fix filter push down with decimal types in parquet reader. (#27897 ) Fix filter push down with decimal types in parquet reader introduced by #22842	2023-12-04 22:25:39 +08:00
HHoflittlefish777	97d36b4f38	[fix](csv_reader) fix trim_double_quotes behavior change (#27882 )	2023-12-03 22:57:55 +08:00
Qi Chen	fc8b32be7a	[Opt](multi-catalog) Opt parquet orc reader numeric copy by `memcpy()` and `memset()`. (#27545 ) Opt parquet orc reader null map decoding by memset().	2023-12-03 09:55:05 +08:00
HHoflittlefish777	54b5d04ff9	[improve](csv_reader) handle csv reader error (#27892 )	2023-12-02 10:05:02 +08:00
slothever	1706699e7e	[fix](multi-catalog)support the max compute partition prune (#27154 ) 1. Max compute partition prune: previously we only supported filtering mc partitions by '=', which can select just one partition. Partition prune is now supported for multiple partition filters and range operators ('>', '<', '>=', ..). 2. Add max compute row count cache and partitionValues cache. 3. Add max compute regression case.	2023-12-01 22:28:26 +08:00
HHoflittlefish777	3e910e2978	[refactor](simd_json_reader) refactor simd json reader to adapt to parse multi json (#27272 )	2023-11-30 15:01:06 +08:00
Qi Chen	e4149c6e4c	[Fix](parquet-reader) Fix null map issue in parquet reader. (#27777 ) Fix a null map issue in the parquet reader that causes incorrect results for functions such as `min()` and `max()`. To avoid copying, the null map is shared between the parquet converted src column and the dst column. The tricky part is that this calls the mutable function `doris_nullable_column->get_null_map_column_ptr()`, which sets `_need_update_has_null = true`, because some operations such as agg call `has_null()`, which sets `_need_update_has_null = false`.	2023-11-30 13:55:37 +08:00
HHoflittlefish777	498d27c905	[improve](json_reader) add prompt when all fields are null (#27630 )	2023-11-29 18:26:42 +08:00
ShowCode	f565f60bc3	[refactor](standard)BE:Initialize pointer variables in the class to nullptr by default (#27587 )	2023-11-28 13:02:30 +08:00
Ashin Gau	dd65cc1d14	[opt](MergedIO) no need to merge large columns (#27315 ) 1. Fix a profile bug of `MergeRangeFileReader`, and add a profile `ApplyBytes` to show the total bytes of ranges. 2. There's no need to merge large columns, because `MergeRangeFileReader` will increase the copy time.	2023-11-23 19:15:47 +08:00
huanghaibin	5d548935e0	[improvement](insert) support schema change and decommission for group commit (#26359 )	2023-11-17 21:41:38 +08:00
Ashin Gau	52995c528e	[fix](iceberg) iceberg uses a custom method to encode special characters of field name (#27108 ) Fix two bugs: 1. Missing column is case sensitive, so change the column name to lower case in FE for hive/iceberg/hudi. 2. Iceberg uses a custom method to encode special characters in column names. Decode the column name to match the right column in the parquet reader.	2023-11-17 18:38:55 +08:00
Qi Chen	a0661ed9d2	[Fix](multi-catalog) Fix complex type crash when using dict filter facility in the parquet-reader. (#27151 ) - Fix complex type crash when using the dict filter facility in the parquet-reader by turning off the dict filter facility in this case. - Add orc complex types regression test.	2023-11-17 13:43:58 +08:00
daidai	3585c7e216	[test](parquet)append parquet reader byte_array_decimal and rle_bool case (#26751 )	2023-11-14 15:05:10 +08:00
Ashin Gau	ec40603b93	[fix](parquet) compressed_page_size has the same meaning in page v1 and v2 (#26783 ) 1. Parquet files with page v2 are parsed incorrectly when using any codec other than snappy. Because `compressed_page_size` has the same meaning in page v1 and v2, it always contains the bytes of the definition levels, repetition levels and compressed data. 2. Add regression tests for the `fix_length_byte_array` stored decimal type and the dictionary encoded date/datetime types.	2023-11-14 08:30:42 +08:00
Qi Chen	c07a70e22a	[Fix](orc-reader) Add missing `break` introduced by #26548 . (#26633 ) Add missing break introduced by #26548. Sorry for this mistake.	2023-11-09 18:29:44 +08:00
zhiqiang	a5565f68b2	[Refactor](opentelemetry) Remove opentelemetry (#26605 )	2023-11-09 18:05:34 +08:00
wudongliang	22bf2889e5	[feature](tvf)(jni-avro)jni-avro scanner add complex data types (#26236 ) Support avro's enum, record, union data types	2023-11-09 13:58:49 +08:00
Qi Chen	d1438a8563	[Fix](orc-reader) Fix orc complex types when late materialization was turned on by disabling late materialization in this case. (#26548 ) Fix orc complex types when late materialization was turned on in orc reader by disabling late materialization in this case.	2023-11-09 12:05:43 +08:00
Qi Chen	3bce6d3828	[Opt](orc-reader) Optimize orc string dict filter in not_single_conjunct case. (#26386 ) Optimize the orc/parquet string dict filter in the not_single_conjunct case: filter the block by dict code first, then by the not_single_conjunct. Because the dict code is an int, it filters faster than a string. For example: ``` select count(l_receiptdate) from lineitem_date_as_string where l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate and l_receiptdate >= '1994-01-01' and l_receiptdate < '1995-01-01'; ``` `l_receiptdate` and `l_shipmode` use string dict filtering, and `l_commitdate < l_receiptdate` is a not_single_conjunct that contains a dict filter field. ### Test Result: Before: mysql> select count(l_receiptdate) from lineitem_date_as_string where l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate and l_receiptdate >= '1994-01-01' and l_receiptdate < '1995-01-01'; +----------------------+ | count(l_receiptdate) | +----------------------+ | 49314694 | +----------------------+ 1 row in set (6.87 sec) After: mysql> select count(l_receiptdate) from lineitem_date_as_string where l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate and l_receiptdate >= '1994-01-01' and l_receiptdate < '1995-01-01'; +----------------------+ | count(l_receiptdate) | +----------------------+ | 49314694 | +----------------------+ 1 row in set (4.85 sec)	2023-11-08 18:03:18 +08:00
lihangyu	44b51bf0b9	[Feature](Variant) support variant load (#26572 )	2023-11-08 00:37:57 -06:00
daidai	a4e415ab09	[feature](hive)Support hive tables after alter type. (#25138 ) 1. Rework the decode logic for reading parquet: the parquet reader first reads the data according to the parquet physical type, and then performs a type conversion. 2. Support hive alter table.	2023-11-02 00:24:21 +08:00
Tiewei Fang	3e10e5af39	[Fix](Serde) Fix content displayed by complex types in MySQL Client (#25946 ) This PR makes three changes to the display of complex types: 1. A NULL value in a complex type is displayed as `null`, not `NULL`. 2. The struct type is displayed as "column_name": column_value. 3. Time types such as `datetime` and `date` are displayed with double quotes in complex types, like `{1, "2023-10-26 12:12:12"}`. This PR also refactors the code: 1. nesting_level becomes a member variable of `DataTypeSerDe` rather than a method parameter. In addition, this PR fixes a bug where fileSize is incorrect, introduced by #25854	2023-11-01 23:48:55 +08:00
Siyang Tang	aafd53766b	[chore](file-reader) rm unused interface from generic reader (#26205 )	2023-11-01 18:43:14 +08:00
Pxl	696ecc8c83	[Chore](log) adjust error code on too many filtered rows (#26168 )	2023-11-01 00:15:56 +08:00
wuwenchi	b98744ae90	[Bug](iceberg)fix read partitioned iceberg without partition path (#25503 ) Iceberg does not require partition values to exist on file paths, so we should get the partition value from `PartitionScanTask.partition`.	2023-10-31 18:09:53 +08:00
plat1ko	6dd60c6ebb	[Enhance](BE) Add -Wshadow-field compile option to avoid unexpected shadowing behavior (#25698 ) * Fix `Tablet::_meta_lock` shadows member inherited from `BaseTablet` * Add -Wshadow-field compile option to avoid unexpected shadowing behavior	2023-10-26 10:00:28 +08:00
TengJianPing	693982fd1a	[feature](decimal) support decimal256 (#25386 )	2023-10-25 15:47:51 +08:00
Siyang Tang	88dd480c2e	[enhancement](CSV-reader) enhance err log for csv reading containing enclose or escape (#25816 )	2023-10-24 22:10:08 +08:00
Ashin Gau	d62e914205	[opt](profile) set datalake profile level as 1 (#25686 ) Follow #25491, only the profile marked as 1 will be shown in simplified profile.	2023-10-24 09:55:25 +08:00
daidai	0e0f8090f7	[refactor](text_convert)Use serde to replace text_convert. (#25543 ) Remove text_convert and use serde to replace it.	2023-10-24 09:52:43 +08:00
Qi Chen	08832d9f3a	[Fix](exec) Fix date dict dead loop. (#25570 )	2023-10-24 02:51:43 +08:00
Siyang Tang	9006e2b8a5	[fix](prefetch-read) make prefetch range correct to accelerate S3 load and fix its speed unbalance (#25775 )	2023-10-23 20:02:24 +08:00
Pxl	642c149e6a	remove datetime_value and move vecdatetime_value to doris namespace (#25695 )	2023-10-20 22:08:17 +08:00
YueW	e4a83a22d1	[opt](error msg) Make data codec error clearly when load csv data can't display (#25540 ) Co-authored-by: Tanya-W <tanya1218w@163,com>	2023-10-18 16:12:22 +08:00

323 Commits