Commit Graph

21 Commits

SHA1 Message Date
9cd9e195d8 [test](load) add some s3 load regression test (#24399) 2023-09-23 22:40:45 +08:00
7c32b2b4ed [Fix](broker load) fix broker load error with OR predicate #24157
Co-authored-by: wangqingtao6 <wangqingtao6@jd.com>
2023-09-20 14:56:32 +08:00
f61e6483bf [enhancement](broker-load) support compress type for old broker load, and split compress type from file format (#23882) 2023-09-14 21:42:28 +08:00
268c867679 [Improve](serde) replace function_cast from_string with serde (#24087)
Previously, stream load did not support columns whose type is map/array nested inside map/array.
Serde can handle this now, so we replace the cast with it.
Note: if an item in complex type data is empty we return an error instead of making up a default value, because we cannot currently define a correct default for complex types.
2023-09-14 13:53:16 +08:00
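
A minimal sketch of the kind of nested complex-type column the commit above enables for stream load; the table name, column names, and DDL options are hypothetical, not taken from the change itself:

CREATE TABLE nested_demo (
    k1 INT,
    v1 ARRAY<ARRAY<INT>>,        -- array nested in array
    v2 MAP<STRING, ARRAY<INT>>   -- array nested in map
)
DUPLICATE KEY(k1)
DISTRIBUTED BY HASH(k1) BUCKETS 1
PROPERTIES ("replication_num" = "1");
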
232b58a27d [fix](broker-load) make sequence column name case insensitive (#24071) 2023-09-10 10:51:07 +08:00
55ccddb62c [Conf](decimalv3) enable decimalv3 by default 2023-05-29 15:38:31 +08:00
ee34b6de2d [Refactor](serde) refactor mysql serde with data type (#19543)
Refactor MySQL output (de)serialization to use data type serde, avoiding the switch-case over primitive types written in mysqlWriter.
2023-05-26 14:11:17 +08:00
425101bf53 [fix](test)Move broker test to p2. Move test data to cos in Beijing region (#18893)
Fix broker load p2 test case error.
1. Move test data from the cos Hong Kong region to the Beijing region.
2. Move broker load test to p2 group.
3. Fix error message mismatch error.
2023-04-21 22:15:52 +08:00
ea41d94582 [Improve](complex-type) Support Count(complexType) (#17868)
Support the count function for ARRAY/MAP/STRUCT types
2023-03-30 15:43:32 +08:00
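
A sketch of the aggregation the commit above enables; the table and column names are hypothetical:

SELECT count(arr_col), count(map_col), count(struct_col)
FROM complex_demo;
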
4dbe30d37b [regression](vectorized) delete vectorized config in regression tests (#15126) 2022-12-16 17:08:29 +08:00
eab0af7afe [optimization](array-type) optimize the export precision of floating point numbers (#14261)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-11-18 18:24:11 +08:00
3ea9d3f2e1 [enhancement](array) support read list(Array) type from orc file (#14132)
Before this PR, the BE would crash when loading an ORC file containing native list (array) type data.
Complex types in an ORC file consist of multiple physical columns, so we need to filter columns by column names;
otherwise we cannot read all the columns we need.
Arrow release-7.0.0 only supports creating a stripe reader by column index, so we patch it to also support creating one by column names.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-11-15 17:48:17 +08:00
322ac5cf89 [refactor](array) refactor DataTypeArray from_string (#13905)
Refactor DataTypeArray from_string to make it clearer;
support ',' and ']' inside string elements, for example: ['hello,,,', 'world][]']
support empty elements, such as [,] ==> [0,0]
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-11-09 16:58:08 +08:00
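
An illustration of the parsing rules described above. Whether a direct CAST from string to array is available depends on the version, so treat these statements as hypothetical; they only show the text format that from_string now accepts:

SELECT CAST("['hello,,,', 'world][]']" AS ARRAY<STRING>);  -- ',' and ']' are kept inside string elements
SELECT CAST('[,]' AS ARRAY<INT>);                          -- empty elements fall back to defaults, e.g. [0,0]
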
c418bbd2d1 [feature-wip](new-scan) support Json reader (#13546)
Issue Number: close #12574
This PR adds `NewJsonReader`, which implements the GenericReader interface to support reading JSON format files.

TODO:
1. modify `_scann_eof` later.
2. Rename `NewJsonReader` to `JsonReader` once the old `JsonReader` is deleted.
2022-10-26 12:52:21 +08:00
9a3c1f0867 [Improvement](decimal) print decimal according to the real precision and scale (#13437) 2022-10-21 10:00:01 +08:00
32b1456b28 [feature-wip](array) remove array config and check array nested depth (#13428)
1. remove FE config `enable_array_type`
2. Limit the nested depth of array on the FE side.
3. Fix a bug where, when loading arrays from Parquet, the decimal type was treated as bigint.
4. Fix loading arrays from CSV (vec engine), handling null and "null".
5. Change the CSV array loading behavior: if the array string format in CSV is invalid, it is converted to null.
6. Remove `check_array_format()`, because its logic is wrong and meaningless.
7. Add stream load CSV test cases and more Parquet broker load tests.
2022-10-20 15:52:31 +08:00
88e08a92d8 [fix](array-type) fix the wrong result when importing array elements with double quotes (#12786)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-10-13 23:07:19 +08:00
cf2b93532b [fix](file-scanner) fix some logic about broker load with parquet with new file scanner (#13135)
Fix some logic about broker load using the new file scanner with parquet format:

1. If columns are specified in the load stmt but none of them exist in the parquet file,
    an error will be thrown like `err: No columns found in file`. See `parquet_s3_case4`.

2. If the first column of the table is not in the file, the resulting number of rows is wrong.
    See `parquet_s3_case8`.

3. If a column specified in `columns` in the load stmt exists in neither the file nor the table,
    an error will be thrown like: `failed to find default value expr for slot: x1`. See `parquet_s3_case2`.
2022-10-08 13:08:08 +08:00
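
The cases above are exercised by S3 broker load statements roughly shaped like the following; the label, paths, column names, and credentials are placeholders rather than the actual regression-test values:

LOAD LABEL example_db.parquet_s3_demo
(
    DATA INFILE("s3://bucket/path/file.parquet")
    INTO TABLE t
    FORMAT AS "parquet"
    (x1, k2)    -- columns listed here must exist in the file or the table
)
WITH S3
(
    "AWS_ENDPOINT" = "<endpoint>",
    "AWS_ACCESS_KEY" = "<access_key>",
    "AWS_SECRET_KEY" = "<secret_key>",
    "AWS_REGION" = "<region>"
);
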
fa8ed2bccc [fix](array-type) fix the invalid format load for stream load (#12424)
This PR fixes handling of invalid array formats in stream load.
Before the change, loading data containing an invalid array format failed the whole load with an error.
The original file to load:
1 [1, 2, 3]
2 [4, 5, 6]
3 \N
4 [7, \N, 8]
5 10, 11, 12
[hugo@xafj-palo]$ sh curl_cmd.sh
{
"TxnId": 11035,
"Label": "11c9f111-188e-4616-9a50-aec8b7814513",
"TwoPhaseCommit": "false",
"Status": "Fail",
"Message": "Array does not start with '[' character, found '1'",
"NumberTotalRows": 0,
"NumberLoadedRows": 0,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 55,
"LoadTimeMs": 7,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 2,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 3,
"CommitAndPublishTimeMs": 0
}
After this change, the load succeeds and an error URL reporting the failed line is returned:
[hugo@xafj-palo]$ sh curl_cmd.sh
{
"TxnId": 11046,
"Label": "249808ee-55f4-4c08-b671-b3d82689d614",
"TwoPhaseCommit": "false",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 5,
"NumberLoadedRows": 4,
"NumberFilteredRows": 1,
"NumberUnselectedRows": 0,
"LoadBytes": 55,
"LoadTimeMs": 39,
"BeginTxnTimeMs": 0,
"StreamLoadPutTimeMs": 2,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 19,
"CommitAndPublishTimeMs": 16,
"ErrorURL": "http://10.81.85.89:8502/api/_load_error_log?file=__shard_3/error_log_insert_stmt_8d4130f0c18aeb0a-ad7ffd4233c41893_8d4130f0c18aeb0a_ad7ffd4233c41893"
}

The SQL select result:
MySQL [example_db]> select * from array_test06;
+------+--------------+
| k1   | k2           |
+------+--------------+
|    1 | [1, 2, 3]    |
|    2 | [4, 5, 6]    |
|    3 | NULL         |
|    4 | [7, NULL, 8] |
+------+--------------+
4 rows in set (0.019 sec)

The URL page shows us:
"Reason: Invalid format for array column(k2). src line [10, 11, 12]; "

Issue Number: #7570
2022-09-19 08:52:59 +08:00
44c4a45f72 [fix](array-type) fix the wrong data when using stream load to import '\N' (#12102)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-29 09:53:37 +08:00
ff1971f916 [improvement](test) add dryRun option and group all cases into either p0 or p1 (#11576)
1. add dryRun option to list tests
2. group all cases into p0 p1 p2
2022-08-17 22:45:53 +08:00