Commit Graph

117 Commits

Author SHA1 Message Date
e923acef1b branch-2.1: [Fix](JsonReader) Fix the issue where the null bitmap of the JSON reader was not initialized when the JSON path is specified as '$.' #52211 (#52268)
Cherry-picked from #52211

Co-authored-by: lihangyu <lihangyu@selectdb.com>
2025-06-28 14:21:38 +08:00
8356141e03 branch-2.1: [enhancement](case) add cases for mow table load empty file #49843 (#49858)
Cherry-picked from #49843

Co-authored-by: MoanasDaddyXu <xujianxu@selectdb.com>
2025-04-08 14:04:30 +08:00
c9381b0285 [fix](load) Fix import failure when the stream load parameter specifies Transfer-Encoding:chunked (#48196) (#48503)
pick from master  #48196
2025-03-04 10:12:54 +08:00
547e88b1ee branch-2.1: [fix](csv reader) fix core dump when parsing csv with enclose #45485 (#45889)
Cherry-picked from #45485

Co-authored-by: hui lai <laihui@selectdb.com>
2024-12-25 12:09:20 +08:00
6dddd4c499 [function](cast)Make string casting to integers more like MySQL's behavior (#38847) (#41541)
https://github.com/apache/doris/pull/38847
## Proposed changes

There are two issues here. First, the results of casting are
inconsistent between FE and BE.
```
FE
mysql [(none)]>select cast('3.000' as int); 
+----------------------+
| cast('3.000' as INT) |
+----------------------+
|                    3 |
+----------------------+

mysql [(none)]>set debug_skip_fold_constant = true;

BE
mysql [(none)]>select cast('3.000' as int);
+----------------------+
| cast('3.000' as INT) |
+----------------------+
|                 NULL |
+----------------------+
```
The second issue is that casting on BE converts '3.0' to NULL. Here, the
casting logic for FE and BE has been unified.
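
For reference, a minimal sketch of the unified behavior (illustrative only, not taken from this change's tests): after the fix, both FE constant folding and BE evaluation are expected to return the integer value instead of NULL.
```
mysql [(none)]>select cast('3.000' as int), cast('3.0' as int);
+----------------------+--------------------+
| cast('3.000' as INT) | cast('3.0' as INT) |
+----------------------+--------------------+
|                    3 |                  3 |
+----------------------+--------------------+
```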


---------

Co-authored-by: Xinyi Zou <zouxinyi02@gmail.com>
2024-10-11 09:32:00 +08:00
a45dc8796a [fix](Nereids) fix wrong simplification of decimal comparison when casting to a smaller scale (#41151) (#41618)
pick from master #41151
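
Illustrative only (table and column names are hypothetical): the kind of predicate affected is a comparison where a decimal value is cast to a smaller scale before comparing.
```
-- price is assumed to be DECIMAL(10, 3); the cast narrows the scale to 1
SELECT * FROM orders WHERE CAST(price AS DECIMAL(10, 1)) = 12.5;
```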
2024-10-09 23:03:01 +08:00
677435cef8 [Pick](Branch-2.1) pick json reader fix and support specify $. as column (#39271)
#39206
#38213
2024-08-13 17:44:45 +08:00
338fa32303 [pick](simdjson) fix simdjson with object array when jsonroot is not empty (#38633)
## Proposed changes
backport: https://github.com/apache/doris/pull/38490
2024-08-01 11:04:54 +08:00
d9fd419e47 [Fix](JsonReader) fix json with duplicate key entry that may result in out-of-bound exception (#38147)
#38146
2024-07-19 22:53:02 +08:00
9ff129b630 [fix](stream_load) fix stream load failure caused by column names that are keywords (#35822) (#37890)
#35938 #35822 
Make the following keywords non-reserved so they can be used as plain identifiers (a usage sketch follows the list):
KW_SQL,
KW_CACHE,
KW_COLOCATE,
KW_COMPRESS_TYPE,
KW_DORIS_INTERNAL_TABLE_ID,
KW_HOTSPOT,
KW_PRIVILEGES,
KW_RECENT,
KW_STAGES,
KW_WARM,
KW_UP,
KW_CONVERT_LSC
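
A minimal, hypothetical sketch (table and column names invented for illustration) of what non-reserved status allows: these words can be used directly as column names, so column lists sent with a stream load no longer fail to parse.
```
CREATE TABLE kw_demo (
    recent INT,
    warm   INT,
    up     INT
)
DISTRIBUTED BY HASH(recent) BUCKETS 1
PROPERTIES ("replication_num" = "1");
```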


---------

Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2024-07-16 20:20:34 +08:00
c66df8d9e6 [branch-2.1](load) fix no error url if no partition can be found (#36831) (#37401)
## Proposed changes

pick #36831

before
```
Stream load result: {
    "TxnId": 2014,
    "Label": "83ba46bd-280c-4e22-b581-4eb126fd49cf",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Fail",
    "Message": "[DATA_QUALITY_ERROR]Encountered unqualified data, stop processing",
    "NumberTotalRows": 1,
    "NumberLoadedRows": 1,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 1669,
    "LoadTimeMs": 58,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 10,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 47,
    "CommitAndPublishTimeMs": 0
}
```

after
```
Stream load result: {
    "TxnId": 2014,
    "Label": "83ba46bd-280c-4e22-b581-4eb126fd49cf",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Fail",
    "Message": "[DATA_QUALITY_ERROR]too many filtered rows",
    "NumberTotalRows": 1,
    "NumberLoadedRows": 0,
    "NumberFilteredRows": 1,
    "NumberUnselectedRows": 0,
    "LoadBytes": 1669,
    "LoadTimeMs": 58,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 10,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 47,
    "CommitAndPublishTimeMs": 0,
    "ErrorURL": "http://XXXX:8040/api/_load_error_log?file=__shard_4/error_log_insert_stmt_c6461270125a615b-2873833fb48d56a3_c6461270125a615b_2873833fb48d56a3"
}
```

2024-07-08 10:41:33 +08:00
ceef9ee123 [feature](serde) support presto compatible output format (#37039) (#37253)
bp #37039
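
A rough sketch of the intended usage (the session variable name below is an assumption, not confirmed by this log):
```
-- assumption: the output dialect is controlled by a session variable named serde_dialect
set serde_dialect = "presto";
select array(1, 2, 3);
```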
2024-07-04 13:56:05 +08:00
a6a84b8ecc [improvement](stream load)(cherry-pick) support hll_from_base64 for stream load column mapping (#36819)
picked from https://github.com/apache/doris/pull/35923
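
In a stream load, this allows a column mapping such as `col=hll_from_base64(src_col)` to decode base64-encoded HLL data. The query below is only an illustrative round trip and assumes a companion hll_to_base64 function is available:
```
-- expected to return 1 if both base64 helpers are available
select hll_cardinality(hll_from_base64(hll_to_base64(hll_hash(1))));
```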
2024-06-26 20:12:40 +08:00
cbaff8a700 [fix](nereids)change the decimal's precision and scale for cast(xx as decimal) (#36540)
pick from master #36316

The data type of the expression cast(xx as decimal) may be decimalv3 or decimalv2,
depending on the enable_decimal_conversion value in the FE conf file. If
enable_decimal_conversion is true, the data type is decimalv3(9, 0), but it was
decimalv3(38, 9) in 2.0 releases. So this PR changes the data type to match the
2.0 releases and keep the behavior consistent.
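
A sketch of the restored behavior (result type noted in a comment; not taken from this change's tests): with the unparameterized decimal target resolving to decimal(38, 9) again, the fractional part is kept.
```
-- target type resolves to decimalv3(38, 9) as in 2.0, not decimalv3(9, 0)
select cast('123.456' as decimal);
```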
2024-06-20 17:46:11 +08:00
48d4601ee3 [regression-test](load) add a json case for keys like $.tag.[a.b] (#35134) 2024-05-31 22:45:09 +08:00
8da260ee0d [fix](hdfs)read 'fs.defaultFS' from core-site.xml for hdfs load which has no default fs (#34217) (#34372)
bp #34217
Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
2024-05-01 00:31:49 +08:00
cd1c9edd71 [fix](pipeline-load) fix no error url when data quality error and total rows is negative (#34072) (#34204)
Co-authored-by: HHoflittlefish777 <77738092+HHoflittlefish777@users.noreply.github.com>
2024-04-27 18:19:08 +08:00
55d5ed9ab6 [test](streamload) add load empty file regression test (#34110) 2024-04-26 07:42:09 +08:00
080c07ad87 [bug](random distribution) fix data loss and incorrect data in random distribution table #33962 2024-04-24 17:13:50 +08:00
5a5063be20 [bug](fix) heap use after free when json parse failed (#33955) 2024-04-22 22:33:24 +08:00
36a70ba1e7 [Fix](Csv-Reader)Fix the issue of BE core dump caused by improper configuration of column_separator and line_delimiter. (#33693) 2024-04-20 20:06:48 +08:00
2648a92594 [FIX](load)fix load with split-by-string (#33713) 2024-04-17 23:42:14 +08:00
b85bf3b6b0 [test](cast) add test for stream load cast (#33189) 2024-04-10 15:26:09 +08:00
b0b5f84e40 [feature](load) support compressed JSON format data for broker load (#30809) 2024-04-10 14:20:53 +08:00
8bd101129a [behavior change](output) change float output format (#32049) 2024-03-21 14:07:22 +08:00
278b232e76 [Bug](json reader) object should stop processing when encountering an error (#31159)
If a DATA_QUALITY_ERROR is encountered, we should stop processing this document. Otherwise there will be UB in simdjson.
2024-02-21 13:53:32 +08:00
0d32aeeaf6 [improvement](load) Enable lzo & Remove dependency on Markus F.X.J. Oberhumer's lzo library (#30573)
Issue Number: close #29406

1. increase the lzop version to 0x1040,
    set to 0x1040 only to allow decompressing lzo files compressed by a higher version of lzop,
    no change to the decompressing logic,
    strictly, 0x1040 should have the "F_H_FILTER" feature,
    but it is mainly for audio and image data, so we do not support it.
2. use orc::lzoDecompress() instead of lzo1x_decompress_safe() to decompress lzo data
3. use crc32c::Extend() instead of lzo_crc32()
4. use olap_adler32() instead of lzo_adler32()
5. thus, remove the dependency on Markus F.X.J. Oberhumer's lzo library
6. remove DORIS_WITH_LZO, so lzo files are supported by stream load and broker load by default
7. add some regression tests
2024-02-05 22:00:24 +08:00
17cf4ab2c1 [case](regression) streamload publish timeout (#29457)
Co-authored-by: qinhao <qinhao@newland.com.cn>
2024-01-07 19:50:16 +08:00
8a169b9906 [case](regression) Test enable pipeline load (#28172)
Co-authored-by: qinhao <qinhao@newland.com.cn>
2023-12-28 10:49:19 +08:00
172f68480b [Enhancement](load) Limit the number of incorrect data drops and add documents (#27727)
In the load process, if there are problems with the original data, we store the error data in an error_log file on disk for subsequent debugging. However, if there is a lot of error data, it will occupy a lot of disk space. Now we want to limit the amount of error data that is saved to disk.

Be familiar with the usage of Doris' import function and its internal implementation process
Add a new BE configuration item load_error_log_limit_bytes (default value 200MB)
Use the newly added threshold to limit the amount of data that RuntimeState::append_error_msg_to_file writes to disk
Write regression cases for testing and verification

Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
2023-12-22 10:43:18 +08:00
4f5821407f [case]Load data with load_parallelism=any > 1 and stream load with compress type (#27306) 2023-12-13 18:41:14 +08:00
3f3843fd4f [fix](load) fix loaded rows error (#28061) 2023-12-07 14:43:51 +08:00
605257ccb7 [Enhancement](group commit) Add regression case for wal limit (#27949) 2023-12-06 14:23:50 +08:00
358d73a0ae [FIX](complextype) fix empty quote with complex type (#27942) 2023-12-05 12:25:26 +08:00
97d36b4f38 [fix](csv_reader) fix trim_double_quotes behavior change (#27882) 2023-12-03 22:57:55 +08:00
f4afcae452 [case](regression) Stream load 2pc exceptions (#27804)
Co-authored-by: qinhao <qinhao@newland.com.cn>
2023-12-01 22:27:40 +08:00
498d27c905 [improve](json_reader) add prompt when all fields are null (#27630) 2023-11-29 18:26:42 +08:00
13b26ee920 [Fix](core) Fix wal space back pressure core and add regression test (#27311) 2023-11-27 15:10:26 +08:00
42c32c584b [case](regression) test invalid jsonpaths (#27359)
Co-authored-by: qinhao <qinhao@newland.com.cn>
2023-11-23 10:16:34 +08:00
c0f22e8feb [FIX](complextype) fix struct nested complex collection type and add regression test (#26973) 2023-11-20 22:29:12 +08:00
7164b7cebb [case](regression) add read json by line test case (#26670)
Co-authored-by: qinhao <qinhao@newland.com.cn>
2023-11-20 17:33:40 +08:00
0ece18d6cd [FIX](regression-test) fix test_map_nested_array csv file for id (#27105) 2023-11-17 04:20:02 -06:00
c7d961cb11 [regression test](stream load) add case for strict_mode=true and max_filter_ratio=0.5 (#27125) 2023-11-17 13:39:01 +08:00
54989175fb [case] Load json data with enable_simdjson_reader=false (#26601) 2023-11-16 14:40:59 +08:00
230d8af777 [regression test](temporary_partitions) add case for temporary_partitions #27063 2023-11-16 11:49:37 +08:00
f1169d3c58 [regression-test](TRIM_DOUBLE_QUOTES) add case for TRIM_DOUBLE_QUOTES (#26998) 2023-11-14 23:52:40 +08:00
cd7ad99de0 [improvement](regression-test) add chunked transfer json test (#26902) 2023-11-14 10:31:30 +08:00
4ecaa921f9 [regression-test](num_as_string) test num_as_string (#26842) 2023-11-13 21:48:13 +08:00
607a5d25f1 [feature](streamload) support HTTP request with chunked transfer (#26520) 2023-11-08 10:07:05 +08:00
3e10e5af39 [Fix](Serde) Fix content displayed by complex types in MySQL Client (#25946)
This pr makes three changes to the display of complex types:
1. NULL values in complex types are displayed as `null`, not `NULL`
2. the struct type is displayed as "column_name": column_value
3. time types such as `datetime` and `date` are displayed with double quotes in complex types, like
    `{1, "2023-10-26 12:12:12"}`

This pr also does a code refactor:
1. nesting_level is made a member variable of the `DataTypeSerDe`, rather than a parameter of its methods.

What's more, this pr fixes a bug where fileSize was not correct, introduced by pr #25854
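
An illustrative sketch of the intended rendering in the MySQL client (output shown as a comment; exact formatting may differ): struct fields appear as "name": value and datetime values are double-quoted.
```
select named_struct('id', 1, 'ts', cast('2023-10-26 12:12:12' as datetime));
-- expected style of output: {"id": 1, "ts": "2023-10-26 12:12:12"}
```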
2023-11-01 23:48:55 +08:00