Commit Graph

18 Commits

SHA1 Message Date
5f20d7c5d0 [regression test](stream load) test for enable_profile (#28534) 2024-01-30 15:30:39 +08:00
45b2dbab6a [improve](group commit) Group commit support max filter ratio when rows is less than value in config (#28139) 2023-12-12 16:33:36 +08:00
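For context, a hedged example of the feature above: a group commit stream load that also sets a filter ratio (`max_filter_ratio` and `column_separator` are standard stream load headers; the `group_commit: true` value is an assumption for illustration):

curl --location-trusted -u root: -H "group_commit: true" -H "max_filter_ratio: 0.1" -H "column_separator: ," -T example.csv http://127.0.0.1:8030/api/test/t1/_stream_load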
1e5ff40e17 [refactor](group commit) remove future block (#27720)
Co-authored-by: huanghaibin <284824253@qq.com>
2023-12-11 08:41:51 +08:00
5adbe47d3a [test](regression) add stream load tvf properties regression test (#27467) 2023-11-23 23:04:10 +08:00
4b6330cb93 [regression test](http stream) add case for strict_mode (#27098) 2023-11-18 00:10:02 +08:00
ee08958526 [regression test](http_stream) case for timezone (#27149)
It does not work now, but we need a case anyway.
2023-11-17 13:36:41 +08:00
3e10e5af39 [Fix](Serde) Fix content displayed by complex types in MySQL Client (#25946)
This PR makes three changes to the display of complex types:
1. NULL values in complex types are displayed as `null`, not `NULL`
2. struct types are displayed as `"column_name": column_value`
3. Time types such as `datetime` and `date` are displayed with double quotes in complex types, like
    `{1, "2023-10-26 12:12:12"}`

This PR also does a code refactor:
1. `nesting_level` is now a member variable of `DataTypeSerDe`, rather than a parameter of its methods.

In addition, this PR fixes a bug where fileSize was incorrect, introduced by #25854.
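As an illustration, a hedged sketch of the new display in the MySQL client (the struct literal expression is an assumption; the displayed value is taken from the example above):

mysql> select struct(1, cast('2023-10-26 12:12:12' as datetime)) as s;
+----------------------------+
| s                          |
+----------------------------+
| {1, "2023-10-26 12:12:12"} |
+----------------------------+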
2023-11-01 23:48:55 +08:00
5a55e47acd [Enhancement](Load) stream tvf support two phase commit (#23800) 2023-10-09 14:15:56 +08:00
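For context, a hedged sketch of the two phase commit flow above (the `_http_stream_2pc` endpoint name is an assumption modeled on stream load's `_stream_load_2pc`; the `two_phase_commit`, `txn_id`, and `txn_operation` headers follow stream load conventions):

curl -v --location-trusted -u root: -H "two_phase_commit: true" -H "sql: insert into test.t1 select c1,c2 from http_stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_http_stream

Then commit (or abort) the returned transaction:

curl -X PUT --location-trusted -u root: -H "txn_id: 2064" -H "txn_operation: commit" http://127.0.0.1:8030/api/test/_http_stream_2pc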
727fa2c0cd [opt](tvf) refine the class of ExternalFileTableValuedFunction (#24706)
`ExternalFileTableValuedFunction` now has 3 derived classes:

- LocalTableValuedFunction
- HdfsTableValuedFunction
- S3TableValuedFunction

All these tvfs are for reading data from files. The difference is where the file is read from, e.g., from HDFS or from the local filesystem.

So I refined the fields and methods of these classes.
Now there are 3 kinds of properties for these tvfs (see the example at the end of this message):

1. File format properties

	File format properties, such as `format` and `column_separator`, are common to all these tvfs,
	so they should be analyzed in the parent class `ExternalFileTableValuedFunction`.
	
2. URI or file path

	The URI or file path property indicates the file location. The URI format differs between
	storage systems, so it should be analyzed in each derived class.
	
3. Other properties

	All other properties are specific to a certain tvf,
	so they should also be analyzed in each derived class.
	
There are 2 new classes:

- `FileFormatConstants`: Define some common property names or variables related to file format.
- `FileFormatUtils`: Define some util methods related to file format.

After this PR, if we want to add a common property for all these tvfs, we only need to handle it in
`ExternalFileTableValuedFunction`, avoiding the risk of missing it in any one of them.

### Behavior change

1. Remove the `fs.defaultFS` property in `hdfs()`; it can be derived from `uri`
2. Use `\t` as the default column separator for the csv format, same as stream load
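As an illustration, a hedged example of the three property kinds in an s3() call (the `s3.access_key`/`s3.secret_key` property names are assumptions for illustration, not taken from this PR):

select * from s3(
    "uri" = "s3://bucket/path/example.csv",  -- 2. URI / file path, analyzed in S3TableValuedFunction
    "format" = "csv",                        -- 1. file format property, analyzed in ExternalFileTableValuedFunction
    "column_separator" = ",",                -- 1. file format property
    "s3.access_key" = "ak",                  -- 3. tvf-specific property, analyzed in S3TableValuedFunction
    "s3.secret_key" = "sk"
);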
2023-10-07 12:44:04 +08:00
4ce5213b1c [fix](insert) Fix test_group_commit_stream_load and add more regression in test_group_commit_http_stream (#24954) 2023-10-03 20:56:24 +08:00
4ff1ab7a4d [fix](regression-test) regenerate test_http_stream_properties.out file (#24946) 2023-09-28 10:39:15 +08:00
452318a9fc [Enhancement](streamload) stream tvf support user specified label (#24219)
stream tvf support user specified label
example:

curl -v --location-trusted -u root: -H "sql: insert into test.t1 WITH LABEL label1 select c1,c2 from http_stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_http_stream
return:

{
    "TxnId": 2064,
    "Label": "label1",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 2,
    "NumberLoadedRows": 2,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 27,
    "LoadTimeMs": 152,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 83,
    "ReadDataTimeMs": 92,
    "WriteDataTimeMs": 41,
    "CommitAndPublishTimeMs": 24
}
2023-09-27 12:09:35 +08:00
6d27a016b9 [Improvement](regression-test) add http_stream case (#24930) 2023-09-27 09:55:52 +08:00
a6a0e78f32 [Enhancement](streamload) stream tvf support compress (#24303) 2023-09-26 20:58:20 +08:00
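For context, a hedged example of loading a gzip-compressed file through the stream tvf (the `compress_type` property name is an assumption borrowed from stream load conventions):

curl -v --location-trusted -u root: -H "sql: insert into test.t1(c1, c2) select c1,c2 from http_stream(\"format\" = \"CSV\", \"column_separator\" = \",\", \"compress_type\" = \"gz\")" -T example.csv.gz http://127.0.0.1:8030/api/_http_stream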
55d1090137 [feature](insert) Support group commit stream load (#24304) 2023-09-26 20:57:02 +08:00
e525e021ee [Enhancement](Load) stream tvf support csv header (#23797)
Co-authored-by: yiguolei <676222867@qq.com>
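For context, a hedged example of loading a csv file whose first row is a header (the `csv_with_names` format value follows Doris csv conventions; treat it as an assumption here):

curl -v --location-trusted -u root: -H "sql: insert into test.t1(c1, c2) select c1,c2 from http_stream(\"format\" = \"csv_with_names\", \"column_separator\" = \",\")" -T example_with_header.csv http://127.0.0.1:8030/api/_http_stream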
2023-09-05 11:15:45 +08:00
6630f92878 [Enhancement](Load) stream tvf support json (#23752)
stream tvf support json

[{"id":1, "name":"ftw", "age":18}]
[{"id":2, "name":"xxx", "age":17}]
[{"id":3, "name":"yyy", "age":19}]
example:

curl -v --location-trusted -u root: -H "sql: insert into test.t1(c1, c2) select id, name from http_stream(\"format\" = \"json\", \"strip_outer_array\" = \"true\", \"read_json_by_line\" = \"true\")" -T /root/json_file.json http://127.0.0.1:8030/api/_http_stream
2023-09-02 01:09:06 +08:00
05771e8a14 [Enhancement](Load) stream Load using SQL (#23362)
Using stream load in SQL mode

for example:
example.csv

10000,北京
10001,天津
curl -v --location-trusted -u root: -H "sql: insert into test.t1(c1, c2) select c1,c2 from stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
curl -v --location-trusted -u root: -H "sql: insert into test.t2(c1, c2, c3) select c1,c2, 'aaa' from stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
curl -v --location-trusted -u root: -H "sql: insert into test.t3(c1, c2) select c1, count(1) from stream(\"format\" = \"CSV\", \"column_separator\" = \",\") group by c1" -T example.csv http://127.0.0.1:8030/api/_stream_load_with_sql
2023-08-30 19:02:48 +08:00