doris

Author	SHA1	Message	Date
lihangyu	278b232e76	[Bug](json reader) object should stop processing when encountering an error (#31159) If DATA_QUALITY_ERROR is encountered, we should stop processing this document; otherwise there will be undefined behavior (UB) in simdjson.	2024-02-21 13:53:32 +08:00
HowardQin	0d32aeeaf6	[improvement](load) Enable lzo & Remove dependency on Markus F.X.J. Oberhumer's lzo library (#30573) Issue Number: close #29406 1. Increase the lzop version to 0x1040. It is set to 0x1040 only so that lzo files compressed by higher versions of lzop can be decompressed; the decompression logic is unchanged. Actually, 0x1040 should have the "F_H_FILTER" feature, but it is mainly for audio and image data, so we do not support it. 2. Use orc::lzoDecompress() instead of lzo1x_decompress_safe() to decompress lzo data. 3. Use crc32c::Extend() instead of lzo_crc32(). 4. Use olap_adler32() instead of lzo_adler32(). 5. Thus, remove the dependency on Markus F.X.J. Oberhumer's lzo library. 6. Remove DORIS_WITH_LZO, so lzo files are supported by stream load and broker load by default. 7. Add some regression tests.	2024-02-05 22:00:24 +08:00
HowardQin	17cf4ab2c1	[case](regression) streamload publish timeout (#29457) Co-authored-by: qinhao <qinhao@newland.com.cn>	2024-01-07 19:50:16 +08:00
HowardQin	8a169b9906	[case](regression) Test enable pipeline load (#28172) Co-authored-by: qinhao <qinhao@newland.com.cn>	2023-12-28 10:49:19 +08:00
lw112	172f68480b	[Enhancement](load) Limit the number of incorrect data drops and add documents (#27727) During loading, if the source data has problems, the erroneous data is stored in an error_log file on disk for later debugging. However, a large amount of erroneous data occupies a lot of disk space, so we want to limit how much error data is saved to disk. Steps: become familiar with the usage and internal implementation of Doris' import function; add a new BE configuration item, load_error_log_limit_bytes, with a default value of 200MB; use the new threshold to limit the amount of data that RuntimeState::append_error_msg_to_file writes to disk; write regression cases for testing and verification. Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>	2023-12-22 10:43:18 +08:00
Zhiyu Hu	4f5821407f	[case] Load data with load_parallelism=any > 1 and stream load with compress type (#27306)	2023-12-13 18:41:14 +08:00
HHoflittlefish777	3f3843fd4f	[fix](load) fix loaded rows error (#28061)	2023-12-07 14:43:51 +08:00
abmdocrt	605257ccb7	[Enhancement](group commit) Add regression case for wal limit (#27949)	2023-12-06 14:23:50 +08:00
amory	358d73a0ae	[FIX](complextype) fix empty quote with complex type (#27942)	2023-12-05 12:25:26 +08:00
HHoflittlefish777	97d36b4f38	[fix](csv_reader) fix trim_double_quotes behavior change (#27882)	2023-12-03 22:57:55 +08:00
HowardQin	f4afcae452	[case](regression) Stream load 2pc exceptions (#27804) Co-authored-by: qinhao <qinhao@newland.com.cn>	2023-12-01 22:27:40 +08:00
HHoflittlefish777	498d27c905	[improve](json_reader) add prompt when all fields are null (#27630)	2023-11-29 18:26:42 +08:00
abmdocrt	13b26ee920	[Fix](core) Fix wal space back pressure core and add regression test (#27311)	2023-11-27 15:10:26 +08:00
HowardQin	42c32c584b	[case](regression) test invalid jsonpaths (#27359) Co-authored-by: qinhao <qinhao@newland.com.cn>	2023-11-23 10:16:34 +08:00
amory	c0f22e8feb	[FIX](complextype) fix struct nested complex collection type and add regression test (#26973)	2023-11-20 22:29:12 +08:00
HowardQin	7164b7cebb	[case](regression) add read json by line test case (#26670) Co-authored-by: qinhao <qinhao@newland.com.cn>	2023-11-20 17:33:40 +08:00
amory	0ece18d6cd	[FIX](regresstest) fix test_map_nested_array csv file for id (#27105)	2023-11-17 04:20:02 -06:00
Guangdong Liu	c7d961cb11	[regression test](stream load) add case for strict_mode=true and max_filter_ratio=0.5 (#27125)	2023-11-17 13:39:01 +08:00
HowardQin	54989175fb	[case] Load json data with enable_simdjson_reader=false (#26601)	2023-11-16 14:40:59 +08:00
Guangdong Liu	230d8af777	[regression test](temporary_partitions) add case for temporary_partitions #27063	2023-11-16 11:49:37 +08:00
Guangdong Liu	f1169d3c58	[regression-test](TRIM_DOUBLE_QUOTES) add case for TRIM_DOUBLE_QUOTES (#26998)	2023-11-14 23:52:40 +08:00
HHoflittlefish777	cd7ad99de0	[improvement](regression-test) add chunked transfer json test (#26902)	2023-11-14 10:31:30 +08:00
Guangdong Liu	4ecaa921f9	[regression-test](num_as_string) test num_as_string (#26842)	2023-11-13 21:48:13 +08:00
HHoflittlefish777	607a5d25f1	[feature](streamload) support HTTP request with chunked transfer (#26520)	2023-11-08 10:07:05 +08:00
Tiewei Fang	3e10e5af39	[Fix](Serde) Fix content displayed by complex types in MySQL Client (#25946) This PR makes three changes to the display of complex types: 1. A NULL value inside a complex type is displayed as `null`, not `NULL`. 2. A struct type is displayed as "column_name": column_value. 3. Time types such as `datetime` and `date` are displayed with double quotes inside complex types, like `{1, "2023-10-26 12:12:12"}`. This PR also does a code refactor: 1. nesting_level is now a member variable of `DataTypeSerDe` rather than a method parameter. In addition, this PR fixes a bug, introduced by #25854, where fileSize was incorrect.	2023-11-01 23:48:55 +08:00
amory	8a436d8ecc	[FIX](collectiontype) fix shrink char column in map/struct (#25725) Fix shrinking of char columns in map/struct. Previously, when a char with a specific length was defined in a map or struct field and fewer characters than that length were inserted, selecting the map or struct returned the char column padded to the specified length, whereas MySQL shows only the inserted characters without padding. --------- Co-authored-by: yiguolei <676222867@qq.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2023-10-25 17:16:13 +08:00
amory	c0f8c0af39	[FIX](collectiontype) fix collection type with char which without length (#25703) Fix creating complex types such as array/map/struct with a char that has no length.	2023-10-24 02:16:37 -05:00
YueW	e4a83a22d1	[opt](error msg) Make data codec error clearly when load csv data can't display (#25540) Co-authored-by: Tanya-W <tanya1218w@163,com>	2023-10-18 16:12:22 +08:00
lihangyu	b74836050a	[chore](config) turnoff fuzzy for `enable_simdjson_reader` (#25521)	2023-10-17 18:42:11 +08:00
qiye	b2e3ecb81d	[opt](load) change `load_to_single_tablet` tablet search algorithm from random to round-robin (#25256) At present, the `load_to_single_tablet` import implementation picks a tablet by taking a random number modulo the number of tablets, which does not distribute load evenly. This leads to uneven disk IO and uneven use of cluster resources. To solve this problem, we implement round-robin selection over each partition's tablets on every import, so that load is spread evenly across tablets. When the load query plan is generated, the index of the tablet currently being loaded is passed to BE. A daemon task is added in FE to regularly clean up `loadTabletRecordMap`. The map gets the bucket_number of the partition and updates `load_tablet_index` when `getCurrentLoadTabletIndex` is called.	2023-10-16 16:43:25 +08:00
HHoflittlefish777	79fa1d1640	[enhancement](regression-test) add stream load json case (#25168)	2023-10-09 16:40:39 +08:00
zzzzzzzs	6d27a016b9	[Improvement](regression-test) add http_stream case (#24930)	2023-09-27 09:55:52 +08:00
meiyi	55d1090137	[feature](insert) Support group commit stream load (#24304)	2023-09-26 20:57:02 +08:00
HHoflittlefish777	bc747be511	[Improvement](regression-test) add stream load case (#24396)	2023-09-26 15:35:19 +08:00
HHoflittlefish777	8d4fd76a16	[Feature](StreamLoad2PC) Support commit and abort streamload2PC by label (#24613)	2023-09-25 22:21:27 +08:00
amory	ce79711b0d	[FIX](serde) fix map/array deserialize string with quote pair (#24808)	2023-09-23 01:12:20 +08:00
wangqt	7c32b2b4ed	[Fix](broker load) broker load with or predicate error fix #24157 Co-authored-by: wangqingtao6 <wangqingtao6@jd.com>	2023-09-20 14:56:32 +08:00
daidai	c704497d02	[fix](csv_reader) Fixed a bug when parsing multi-character delimiters. (#24572)	2023-09-20 12:41:35 +08:00
daidai	4b5cea1ef8	[enhancement](fix) change the null value of ordinary types to \N and the null value of complex types to null (#24207)	2023-09-16 21:46:42 +08:00
amory	268c867679	[Improve](serde) replace function_cast from_string with serde (#24087) Previously, stream load could not support columns of map/array type nested inside map/array; serde can handle this now, so we can replace function_cast with it. Note: if an item in complex-type data is empty, we return an error instead of making up a default value, because we cannot currently define a correct default for complex types.	2023-09-14 13:53:16 +08:00
Mingyu Chen	650af8f4df	[fix](test) fix broker load with default value test case (#24123)	2023-09-10 10:28:22 +08:00
daidai	f9a75b5c4f	[feature](csv_serde) 1. Append a CSV serde for serializing to and deserializing from CSV. 2. Let csvReader use the CSV serde instead of text_converter. (#23352)	2023-09-10 00:16:21 +08:00
Calvin Kirs	4dac2d3b94	[Fix](Plan) StreamLoad cannot be parsed correctly when it contains complex where conditions (#23874)	2023-09-05 11:26:59 +08:00
Kaijie Chen	a5761a25c5	[feature](move-memtable)[7/7] add regression tests (#23515) Co-authored-by: laihui <1353307710@qq.com>	2023-08-26 17:52:10 +08:00
daidai	2dda44d7b5	[fix](csv-reader) fix bug of multi-char delimiter in csv reader. Fix a bug in how csv_reader parses a line to get columns.	2023-08-23 15:19:13 +08:00
Siyang Tang	b49dc8042d	[feature](load) refactor CSV reading process during scanning, and support enclose and escape for stream load (#22539) Proposed changes: Refactor thoughts: close #22383. Descriptions of `enclose` and `escape`: #22385. Further comments: 2023-08-09: Unfortunately, experiments show that the original way of parsing plain CSV is faster. Therefore, the refactor is applied only to enclose-related code; the plain CSV parser uses the original logic. Some performance regression is unavoidable anyway. From the `CSV reader`'s perspective, the real weak point may be the column-writing behavior, as the flame graph shows. Trimming escape will be enabled after fix #22411 is merged. Cases to be discussed: 1. When an incomplete enclose appears at the beginning of large-scale data, the line delimiter will be unreachable until EOF; will the buffer become extremely large? 2. What if an infinite line occurs? Essentially, case `1.` is equivalent to this. Only stream load is supported in this PR, as a trial, to avoid too many unrelated changes. Docs will be added when `enclose` and `escape` are available for all kinds of load.	2023-08-15 09:23:53 +08:00
zzzzzzzs	66784cef71	[Enhancement](Load) Stream Load using SQL (#22509) This PR was originally #16940, but the original author @Cai-Yao has not updated it for a long time. For now, we are merging part of the code into master first. Thanks @Cai-Yao @yiguolei	2023-08-08 13:49:04 +08:00
Xujian Duan	3024b82918	[fix](load) Fix wrong default value for char and varchar when reading json data (#22626) If a column is defined as `col VARCHAR/CHAR NULL` with no default value, and we then load json data that is missing column col, the queried result is incorrect: +------+ | col | +------+ | 1 | +------+ Expected: +------+ | col | +------+ | NULL | +------+ --------- Co-authored-by: duanxujian <duanxujian@jd.com>	2023-08-05 12:47:27 +08:00
amory	7261845b3d	[FIX](complex-type) fix complex type nested col_const (#22375) For array/map/struct in mysql_writer, unpack_if_const only unpacks the column itself, not its nested columns, so col_const should not be used in nested columns.	2023-07-31 14:53:18 +08:00
lihangyu	40299d280d	[Fix](json reader) fix rapidjson `array->PushBack` may take ownership… (#21988) With the json paths `["$.data","$.data.datatimestamp"]`, after `array_obj->PushBack` ownership of the `data` field is transferred to array_obj, which leads to null values for json path `$.data.datatimestamp`. Rapidjson doc: ``` //! Append a GenericValue at the end of the array. \note The ownership of \c value will be transferred to this array on success. */ GenericValue& PushBack(GenericValue& value, Allocator& allocator); ```	2023-07-21 17:02:01 +08:00

92 Commits