Commit Graph

149 Commits

Author SHA1 Message Date
278b232e76 [Bug](json reader) object should stop processing when encounter error (#31159)
If DATA_QUALITY_ERROR is encountered, we should stop processing this document; otherwise there will be UB in simdjson.
2024-02-21 13:53:32 +08:00
0d32aeeaf6 [improvement](load) Enable lzo & Remove dependency on Markus F.X.J. Oberhumer's lzo library (#30573)
Issue Number: close #29406

1. increase lzop version to 0x1040.
    I set it to 0x1040 only to decompress lzo files compressed by higher versions of lzop,
    with no change to the decompression logic.
    Actually, 0x1040 should have the "F_H_FILTER" feature,
    but it is mainly for audio and image data, so we do not support it.
2. use orc::lzoDecompress() instead of lzo1x_decompress_safe() to decompress lzo data
3. use crc32c::Extend() instead of lzo_crc32()
4. use olap_adler32() instead of lzo_adler32()
5. thus, remove the dependency on Markus F.X.J. Oberhumer's lzo library
6. remove DORIS_WITH_LZO, so lzo files are supported by stream and broker load by default
7. add some regression tests
2024-02-05 22:00:24 +08:00
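Commit 0d32aeeaf6 replaces the library checksum routine `lzo_adler32()` with `olap_adler32()`. The Doris implementation is not shown in this log, so the following is only a minimal sketch of the standard Adler-32 algorithm that such a function computes. The name `adler32` and its `value` parameter are illustrative, not the Doris API.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16, as defined by Adler-32


def adler32(data: bytes, value: int = 1) -> int:
    """Compute Adler-32 over data, continuing from a previous checksum.

    Hypothetical helper for illustration only; it is not olap_adler32().
    """
    a = value & 0xFFFF          # running sum of bytes
    b = (value >> 16) & 0xFFFF  # running sum of the 'a' values
    for byte in data:
        a = (a + byte) % MOD_ADLER
        b = (b + a) % MOD_ADLER
    return (b << 16) | a


if __name__ == "__main__":
    payload = b"lzo block payload"
    # Cross-check the sketch against the reference implementation in zlib.
    assert adler32(payload) == zlib.adler32(payload)
    print(hex(adler32(payload)))
```

Because the checksum can be continued from a previous value, a reader can verify a block incrementally as chunks arrive instead of buffering the whole block first.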
009bca9652 [regression test](broker load) add partition load case (#28259) 2024-01-30 15:30:39 +08:00
5f20d7c5d0 [regression test](stream load) test for enable_profile (#28534) 2024-01-30 15:30:39 +08:00
17cf4ab2c1 [case](regression) streamload publish timeout (#29457)
Co-authored-by: qinhao <qinhao@newland.com.cn>
2024-01-07 19:50:16 +08:00
8a169b9906 [case](regression) Test enable pipeline load (#28172)
Co-authored-by: qinhao <qinhao@newland.com.cn>
2023-12-28 10:49:19 +08:00
172f68480b [Enhancement](load) Limit the number of incorrect data drops and add documents (#27727)
In the load process, if there are problems with the original data, we store the error data in an error_log file on disk for subsequent debugging. However, if there is a lot of error data, it occupies a lot of disk space. Now we want to limit the amount of error data that is saved to disk.

Become familiar with the usage of Doris' import function and its internal implementation process
Add a new BE configuration item load_error_log_limit_bytes, with a default value of 200MB
Use the newly added threshold to limit the amount of data that RuntimeState::append_error_msg_to_file writes to disk
Write regression cases for testing and verification

Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
2023-12-22 10:43:18 +08:00
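The byte limit described in commit 172f68480b can be sketched as a writer that tracks how much it has written and refuses further messages once the threshold would be exceeded. This is an illustration under stated assumptions, not the Doris code: the class `ErrorLogWriter` and its fields are hypothetical, and whether Doris drops individual messages or stops logging entirely after the limit is not stated in the commit message.

```python
import io

DEFAULT_LIMIT_BYTES = 200 * 1024 * 1024  # mirrors the 200MB default named in the commit


class ErrorLogWriter:
    """Hypothetical writer that caps the bytes of error data written to a sink."""

    def __init__(self, sink, limit_bytes: int = DEFAULT_LIMIT_BYTES):
        self.sink = sink            # any binary file-like object
        self.limit_bytes = limit_bytes
        self.written = 0            # bytes written so far
        self.dropped = 0            # messages skipped because of the limit

    def append(self, message: str) -> bool:
        """Write one error message; return False if the limit blocks it."""
        data = (message + "\n").encode("utf-8")
        if self.written + len(data) > self.limit_bytes:
            self.dropped += 1       # assumption: over-limit messages are dropped
            return False
        self.sink.write(data)
        self.written += len(data)
        return True


if __name__ == "__main__":
    writer = ErrorLogWriter(io.BytesIO(), limit_bytes=32)
    for row in range(10):
        writer.append(f"bad row {row}")
    print(writer.written, writer.dropped)
```

Counting dropped messages keeps the total number of bad rows visible to the user even when the log file itself is truncated.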
1e08845fc5 [regression test](broker load) add case for sequence col (#27583) 2023-12-16 22:47:20 +08:00
4f5821407f [case]Load data with load_parallelism=any > 1 and stream load with compress type (#27306) 2023-12-13 18:41:14 +08:00
43327383c3 [regression test](broker load) add exception case for merge type (#27840) 2023-12-13 18:34:34 +08:00
45b2dbab6a [improve](group commit) Group commit support max filter ratio when rows is less than value in config (#28139) 2023-12-12 16:33:36 +08:00
e49ed3d885 [regression test](memtable) add case for aggregation memtable (#28056)
1. create aggregation table
2. insert some data
3. drop the table and create again
4. modify some parameters for some branch
5. insert some data
6. change the parameters back to their defaults
2023-12-12 11:14:59 +08:00
1e5ff40e17 [refactor](group commit) remove future block (#27720)
Co-authored-by: huanghaibin <284824253@qq.com>
2023-12-11 08:41:51 +08:00
3f3843fd4f [fix](load) fix loaded rows error (#28061) 2023-12-07 14:43:51 +08:00
616ba2823f [regression test](broker load) Test case missing files (#27580) 2023-12-07 10:17:07 +08:00
605257ccb7 [Enhancement](group commit) Add regression case for wal limit (#27949) 2023-12-06 14:23:50 +08:00
358d73a0ae [FIX](complextype) fix empty quote with complex type (#27942) 2023-12-05 12:25:26 +08:00
97d36b4f38 [fix](csv_reader) fix trim_double_quotes behavior change (#27882) 2023-12-03 22:57:55 +08:00
f4afcae452 [case](regression) Stream load 2pc exceptions (#27804)
Co-authored-by: qinhao <qinhao@newland.com.cn>
2023-12-01 22:27:40 +08:00
498d27c905 [improve](json_reader) add prompt when all fields are null (#27630) 2023-11-29 18:26:42 +08:00
4ea69ed390 [regression test](broker load) add case for num_as_string (#27588) 2023-11-27 21:25:59 +08:00
13b26ee920 [Fix](core) Fix wal space back pressure core and add regression test (#27311) 2023-11-27 15:10:26 +08:00
ff1a06abcf [test](regression) add routine load sequence and error test (#27519) 2023-11-25 23:30:20 +08:00
160adbaa69 [regression test](routine test) add case for num_as_string (#27436) 2023-11-24 10:15:47 +08:00
5adbe47d3a [test](regression) add stream load tvf properties regression test (#27467) 2023-11-23 23:04:10 +08:00
c884e46e6c [regression test](routine test) add case for desired_concurrent_number (#27372) 2023-11-23 15:11:01 +08:00
6253f7d6c7 [test](regression) add routine load condition test (#27430) 2023-11-23 14:37:35 +08:00
42c32c584b [case](regression) test invalid jsonpaths (#27359)
Co-authored-by: qinhao <qinhao@newland.com.cn>
2023-11-23 10:16:34 +08:00
a2a6a722eb [test](regression) add routine load command test (#27384) 2023-11-22 18:55:35 +08:00
c1435c0589 [regression test](routine test) add case for send_batch_parallelism (#27333) 2023-11-21 20:43:20 +08:00
c0f22e8feb [FIX](complextype) fix struct nested complex collection type and regresstest (#26973) 2023-11-20 22:29:12 +08:00
7164b7cebb [case](regression) add read json by line test case (#26670)
Co-authored-by: qinhao <qinhao@newland.com.cn>
2023-11-20 17:33:40 +08:00
4b6330cb93 [regression test](http stream) add case for strict_mode (#27098) 2023-11-18 00:10:02 +08:00
0ece18d6cd [FIX](regresstest) fix test_map_nested_array csv file for id (#27105) 2023-11-17 04:20:02 -06:00
c7d961cb11 [regression test](stream load) add case for strict_mode=true and max_filter_ratio=0.5 (#27125) 2023-11-17 13:39:01 +08:00
ee08958526 [regression test](http_stream) case for timezone (#27149)
It does not work now; anyway, we need a case.
2023-11-17 13:36:41 +08:00
54989175fb [case] Load json data with enable_simdjson_reader=false (#26601) 2023-11-16 14:40:59 +08:00
230d8af777 [regression test](temporary_partitions) add case for temporary_partitions (#27063) 2023-11-16 11:49:37 +08:00
f1169d3c58 [regression-test](TRIM_DOUBLE_QUOTES) add case for TRIM_DOUBLE_QUOTES (#26998) 2023-11-14 23:52:40 +08:00
cd7ad99de0 [improvement](regression-test) add chunked transfer json test (#26902) 2023-11-14 10:31:30 +08:00
4ecaa921f9 [regression-test](num_as_string) test num_as_string (#26842) 2023-11-13 21:48:13 +08:00
607a5d25f1 [feature](streamload) support HTTP request with chunked transfer (#26520) 2023-11-08 10:07:05 +08:00
3e10e5af39 [Fix](Serde) Fix content displayed by complex types in MySQL Client (#25946)
This PR makes three changes to the display of complex types:
1. NULL values in complex types are displayed as `null`, not `NULL`
2. struct type is displayed as "column_name": column_value
3. Time types such as `datetime` and `date` are displayed with double quotes in complex types, like
    `{1, "2023-10-26 12:12:12"}`

This PR also does a code refactor:
1. nesting_level is now a member variable of `DataTypeSerDe`, rather than a parameter of its methods.

What's more, this PR fixes a bug where fileSize is not correct, introduced by PR #25854
2023-11-01 23:48:55 +08:00
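The three display rules in commit 3e10e5af39 can be illustrated with a small renderer that tracks nesting depth. This is only a sketch under assumptions: `render` is a hypothetical function, a Python dict stands in for a struct and a list for an array, the `", "` separator and the top-level `NULL` spelling are guesses, and only the value types listed here are handled. It does not reproduce `DataTypeSerDe`.

```python
from datetime import date, datetime


def render(value, nesting_level: int = 0) -> str:
    """Render a value for display; nesting_level > 0 means inside a complex type."""
    nested = nesting_level > 0
    if value is None:
        # Rule 1: nulls inside complex types are lowercase.
        return "null" if nested else "NULL"
    if isinstance(value, datetime):  # checked before date: datetime subclasses date
        text = value.strftime("%Y-%m-%d %H:%M:%S")
        # Rule 3: time types are double-quoted inside complex types.
        return f'"{text}"' if nested else text
    if isinstance(value, date):
        text = value.isoformat()
        return f'"{text}"' if nested else text
    if isinstance(value, dict):
        # Rule 2: struct fields are shown as "column_name": column_value.
        fields = ", ".join(
            f'"{name}": {render(field, nesting_level + 1)}'
            for name, field in value.items()
        )
        return "{" + fields + "}"
    if isinstance(value, list):
        items = ", ".join(render(item, nesting_level + 1) for item in value)
        return "[" + items + "]"
    return str(value)


if __name__ == "__main__":
    row = {"id": 1, "ts": datetime(2023, 10, 26, 12, 12, 12), "note": None}
    print(render(row))
```

Passing the depth down the recursion is the simplest form for a sketch; the commit instead stores `nesting_level` as a member of the serializer so that method signatures stay unchanged.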
4eb9a52ace [regression](s3load) Add regression testing and modify description text for s3load (#25947) 2023-11-01 07:39:16 +08:00
e783ef716f [fix](multi-table) fix unknown source slot descriptor when load multi table (#25762) 2023-10-25 21:52:01 +08:00
8a436d8ecc [FIX](collectiontype) fix shrink char column in map/struct (#25725)
fix shrink char column in map/struct
Previously, when a char with a specific length was defined in a map or struct field
and fewer characters than that length were inserted, selecting the map or struct returned the char column padded to the full length,
but MySQL shows only the inserted characters, without padding to the specified length.
---------

Co-authored-by: yiguolei <676222867@qq.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-10-25 17:16:13 +08:00
c0f8c0af39 [FIX](collectiontype) fix collection type with char which without length (#25703)
fix creating complex types like array/map/struct with a char that has no length
2023-10-24 02:16:37 -05:00
e4a83a22d1 [opt](error msg) Make data codec error clearly when load csv data can't display (#25540)
Co-authored-by: Tanya-W <tanya1218w@163,com>
2023-10-18 16:12:22 +08:00
b74836050a [chore](config) turnoff fuzzy for enable_simdjson_reader (#25521) 2023-10-17 18:42:11 +08:00
85b8497624 [fix](Tvf) return empty set when tvf queries an empty file or an error uri (#25280)
### Before:
return errors when tvf queries an empty file or an error uri:
1. get parsed schema failed, empty csv file
2. Can not get first file, please check uri.

### Now:
we just return empty set when tvf queries an empty file or an error uri.
```sql
mysql> select * from s3( 
"uri" = "https://error_uri/exp_1.csv", 
"s3.access_key"= "xx", 
"s3.secret_key" = "yy", 
"format" = "csv") limit 10;

Empty set (1.29 sec)
```
2023-10-17 09:52:53 +08:00
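The behavior change in commit 85b8497624 (return an empty set instead of an error for an empty file or an unreachable URI) can be sketched as follows. The function `query_csv` and its `open_source` callback are hypothetical names for illustration; the sketch reads local text streams and does not model the `s3()` table-valued function itself.

```python
import csv
import io


def query_csv(open_source, limit=None):
    """Return CSV rows from a source, or an empty list if it is empty or unreadable.

    open_source is a hypothetical callable that returns a text file-like
    object, or raises OSError when the location cannot be opened.
    """
    try:
        with open_source() as handle:
            rows = list(csv.reader(handle))
    except OSError:
        # "Error uri" case: report no rows rather than failing the query.
        return []
    # Empty file case: csv.reader yields nothing, so rows is already [].
    return rows[:limit]


if __name__ == "__main__":
    print(query_csv(lambda: io.StringIO("")))                 # empty file
    print(query_csv(lambda: io.StringIO("1,a\n2,b\n"), 1))    # normal file with limit
```

One trade-off of this design is that a mistyped URI and a genuinely empty file become indistinguishable to the caller, since both produce an empty result.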