Commit Graph

87 Commits

Author SHA1 Message Date
4f5821407f [case]Load data with load_parallelism=any > 1 and stream load with compress type (#27306) 2023-12-13 18:41:14 +08:00
3f3843fd4f [fix](load) fix loaded rows error (#28061) 2023-12-07 14:43:51 +08:00
605257ccb7 [Enhancement](group commit) Add regression case for wal limit (#27949) 2023-12-06 14:23:50 +08:00
358d73a0ae [FIX](complextype) fix empty quote with complex type (#27942) 2023-12-05 12:25:26 +08:00
97d36b4f38 [fix](csv_reader) fix trim_double_quotes behavior change (#27882) 2023-12-03 22:57:55 +08:00
f4afcae452 [case](regression) Stream load 2pc exceptions (#27804)
Co-authored-by: qinhao <qinhao@newland.com.cn>
2023-12-01 22:27:40 +08:00
498d27c905 [improve](json_reader) add prompt when all fields is null (#27630) 2023-11-29 18:26:42 +08:00
13b26ee920 [Fix](core) Fix wal space back pressure core and add regression test (#27311) 2023-11-27 15:10:26 +08:00
42c32c584b [case](regression) test invalid jsonpaths (#27359)
Co-authored-by: qinhao <qinhao@newland.com.cn>
2023-11-23 10:16:34 +08:00
c0f22e8feb [FIX](complextype)fix struct nested complex collection type and and regresstest (#26973) 2023-11-20 22:29:12 +08:00
7164b7cebb [case](regression) add read json by line test case (#26670)
Co-authored-by: qinhao <qinhao@newland.com.cn>
2023-11-20 17:33:40 +08:00
0ece18d6cd [FIX](regresstest) fix test_map_nested_array csv file for id(#27105) 2023-11-17 04:20:02 -06:00
c7d961cb11 [regression test](stream load) add case for strict_mode=true and max_filter_ratio=0.5 (#27125) 2023-11-17 13:39:01 +08:00
54989175fb [case] Load json data with enable_simdjson_reader=false (#26601) 2023-11-16 14:40:59 +08:00
230d8af777 [regression test](temporary_partitions) add case for temporary_partitions #27063 2023-11-16 11:49:37 +08:00
f1169d3c58 [regression-test](TRIM_DOUBLE_QUOTES) add case for TRIM_DOUBLE_QUOTES (#26998) 2023-11-14 23:52:40 +08:00
cd7ad99de0 [improvement](regression-test) add chunked transfer json test (#26902) 2023-11-14 10:31:30 +08:00
4ecaa921f9 [regression-test](num_as_string) test num_as_string (#26842) 2023-11-13 21:48:13 +08:00
607a5d25f1 [feature](streamload) support HTTP request with chunked transfer (#26520) 2023-11-08 10:07:05 +08:00
3e10e5af39 [Fix](Serde) Fix content displayed by complex types in MySQL Client (#25946)
This pr makes three changes to the display of complex types:
1. NULL value in complex types refers to being displayed as `null`, not `NULL`
2. struct type is displayed as "column_name": column_value
3. Time types such as `datetime` and `date`, are displayed with double quotes in complex types. like
    `{1, "2023-10-26 12:12:12"}`

This pr also do a code refactor:
1. nesting_level is set to a member variable of the `DataTypeSerDe`, rather than a parameter in methods.

What's more, this pr fix a bug that fileSize is not correct, introduced by this pr: #25854
2023-11-01 23:48:55 +08:00
8a436d8ecc [FIX](collectiontype) fix shrink char column in map/struct (#25725)
fix shrink char column in map/struct
before we has char with specific length defined in map or struct field
we select map or struct , the char column in which one has been padding if we just insert less than specific length chars
but in mysql here just show inserted chars not padding specific length chars
---------

Co-authored-by: yiguolei <676222867@qq.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-10-25 17:16:13 +08:00
c0f8c0af39 [FIX](collectiontype) fix collection type with char which without length (#25703)
fix create complex type like array/map/struct with char which without length
2023-10-24 02:16:37 -05:00
e4a83a22d1 [opt](error msg) Make data codec error clearly when load csv data can't display (#25540)
Co-authored-by: Tanya-W <tanya1218w@163,com>
2023-10-18 16:12:22 +08:00
b74836050a [chore](config) turnoff fuzzy for enable_simdjson_reader (#25521) 2023-10-17 18:42:11 +08:00
b2e3ecb81d [opt](load)change load_to_single_tablet tablet search algorithm from random to round-robin (#25256)
At present, `load_to_singlt_tablet` import implementation refers to simple random number remainder, which cannot achieve true averaging. This will lead to uneven disk IO and uneven use of cluster resources. To solve this problem, we are preparing to implement round-robin for each partition tablet imported each time, in order to achieve average load to each tablet.

When generating the load query plan, the tablet index record currently imported is passed to BE.
Add a deamon task in FE to regularly clean up the `loadTabletRecordMap`. The map will get the bucket_number of the partition and update the `load_tablet_index` when `getCurrentLoadTabletIndex`.
2023-10-16 16:43:25 +08:00
79fa1d1640 [enhancement](regression-test) add stream load json case (#25168) 2023-10-09 16:40:39 +08:00
6d27a016b9 [Improvement](regression-test) add http_stream case (#24930) 2023-09-27 09:55:52 +08:00
55d1090137 [feature](insert) Support group commit stream load (#24304) 2023-09-26 20:57:02 +08:00
bc747be511 [Improvement](regression-test) add stream load case (#24396) 2023-09-26 15:35:19 +08:00
8d4fd76a16 [Feature](StreamLoad2PC) Support commit and abort streamload2PC by label (#24613) 2023-09-25 22:21:27 +08:00
ce79711b0d [FIX](serde) fix map/array deserialize string with quote pair (#24808) 2023-09-23 01:12:20 +08:00
7c32b2b4ed [Fix](broker load) broker load with or predicate error fix #24157
Co-authored-by: wangqingtao6 <wangqingtao6@jd.com>
2023-09-20 14:56:32 +08:00
c704497d02 [fix](csv_reader)Fixed bug when parsing multi-character delimiters. (#24572)
Fixed bug when parsing multi-character delimiters.
2023-09-20 12:41:35 +08:00
4b5cea1ef8 [enhancement](fix)change ordinary type null value is \N,complex type null value is null (#24207) 2023-09-16 21:46:42 +08:00
268c867679 [Improve](serde)replace function_cast from_string to serde (#24087)
Now we can not support streamload with column which is map/array nested map/array
serde can do this now , so we can replace it
Notice. if item data in complex type data is empty we just return error, instead of makeup default value , because now we can not define right default for complex type
2023-09-14 13:53:16 +08:00
650af8f4df [fix](test) fix broker load with default value test case (#24123) 2023-09-10 10:28:22 +08:00
f9a75b5c4f [feature](csv_serde)1.append csv serde for serialize to csv and deserialize from csv. 2.let csvReader use csv serde not text_converter. (#23352)
1. append csv serde for serialize to csv and deserialize from csv.
2. let csvReader use csv serde not text_converter.
2023-09-10 00:16:21 +08:00
4dac2d3b94 [Fix](Plan)StreamLoad cannot be parsed correctly when it contains complex where conditions (#23874) 2023-09-05 11:26:59 +08:00
a5761a25c5 [feature](move-memtable)[7/7] add regression tests (#23515)
Co-authored-by: laihui <1353307710@qq.com>
2023-08-26 17:52:10 +08:00
2dda44d7b5 [fix](csv-reader)fix bug of multi-char delimiter in csv reader
fix bug that csv_reader parse line in order to get column.
2023-08-23 15:19:13 +08:00
b49dc8042d [feature](load) refactor CSV reading process during scanning, and support enclose and escape for stream load (#22539)
## Proposed changes

Refactor thoughts: close #22383
Descriptions about `enclose` and `escape`: #22385

## Further comments

2023-08-09: 
It's a pity that experiment shows that the original way for parsing plain CSV is faster. Therefor, the refactor is only applied on enclose related code. The plain CSV parser use the original logic.

Fallback of performance is unavoidable anyway. From the `CSV reader`'s perspective, the real weak point may be the write column behavior, proved by the flame graph.
 
Trimming escape will be enable after fix: #22411 is merged

Cases should be discussed: 

1. When an incomplete enclose appears in the beginning of a large scale data, the line delimiter will be unreachable till the EOF, will the buffer become extremely large?
2. What if an infinite line occurs in the case? Essentially,  `1.` is equivalent to this.  

Only support stream load as trial in this PR, avoid too many unrelated changes. Docs will be added when `enclose` and `escape` is available for all kinds of load.
2023-08-15 09:23:53 +08:00
66784cef71 [Enhancement](Load) Stream Load using SQL (#22509)
This PR was originally #16940 , but it has not been updated for a long time due to the original author @Cai-Yao . At present, we will merge some of the code into the master first.

thanks @Cai-Yao @yiguolei
2023-08-08 13:49:04 +08:00
3024b82918 [fix](load)Fix wrong default value for char and varchar of reading json data (#22626)
If a column is defined as: col VARCHAR/CHAR NULL and no default value. Then we load json data which misses column col, the result queried is not correct:
+------+
| col |
+------+
| 1 |
+------+
But expect:
+------+
| col |
+------+
| NULL |
+------+

---------

Co-authored-by: duanxujian <duanxujian@jd.com>
2023-08-05 12:47:27 +08:00
7261845b3d [FIX](complex-type)fix complex type nested col_const (#22375)
for array/map/struct in mysql_writer unpack_if_const only unpack self column not nested , so col_const should not used in nested column.
2023-07-31 14:53:18 +08:00
40299d280d [Fix](json reader) fix rapidjson array->PushBack may take ownership… (#21988)
With bellow json path
`["$.data","$.data.datatimestamp"]`

After `array_obj->PushBack` the `data` field owner will be taken from array_obj, and lead to null values for json path `$.data.datatimestamp`

Rapidjson doc:
```
//! Append a GenericValue at the end of the array.
  \note The ownership of \c value will be transferred to this array on success.
 */
GenericValue& PushBack(GenericValue& value, Allocator& allocator);
```
2023-07-21 17:02:01 +08:00
20242d9a0e [Improve](simdjson) put unescaped string value after parsed (#21866)
In some cases, it is necessary to unescape the original value, such as when converting a string to JSONB.
If not unescape, then later jsonb parse will be failed
2023-07-20 10:33:17 +08:00
6a0a21d8b0 [regression-test](load) add streamload default value test (#21536) 2023-07-06 10:14:13 +08:00
2beed11256 [Bug](streamload) fix inconsistent load result of be and fe (#20950) 2023-06-21 18:12:51 +08:00
0b228b3414 [fix](load)Support load json data with default value (#20624)
* support json default value

---------

Co-authored-by: duanxujian <duanxujian@jd.com>
2023-06-12 14:51:31 +08:00
55ccddb62c [Conf](decimalv3) enable decimalv3 by default 2023-05-29 15:38:31 +08:00