12 Commits

Author SHA1 Message Date
727fa2c0cd [opt](tvf) refine the class of ExternalFileTableValuedFunction (#24706)
`ExternalFileTableValuedFunction` now has 3 derived classes:

- LocalTableValuedFunction
- HdfsTableValuedFunction
- S3TableValuedFunction

All of these tvfs read data from files. They differ only in where the file is read from, e.g., HDFS or the local filesystem.

So this PR refines the fields and methods of these classes.
There are now 3 kinds of properties for these tvfs:

1. File format properties

	File format properties, such as `format` and `column_separator`, are common to all these tvfs.
	So these properties should be analyzed in the parent class `ExternalFileTableValuedFunction`.
	
2. URI or file path

	The URI or file path property indicates the file location. The format of the URI differs between storage systems.
	So it should be analyzed in each derived class.
	
3. Other properties

	All other properties, which are specific to a particular tvf.
	So they should be analyzed in each derived class.
	
There are 2 new classes:

- `FileFormatConstants`: Define some common property names or variables related to file format.
- `FileFormatUtils`: Define some util methods related to file format.

After this PR, if we want to add a common property for all these tvfs, we only need to handle it in
`ExternalFileTableValuedFunction`, which avoids forgetting to handle it in one of them.
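
The split described above can be illustrated with a minimal sketch. It is not the actual Doris code: the class names `FileTvfSketch`, `HdfsTvfSketch` and `LocalTvfSketch`, the helper methods, and the `file_path` and `hadoop.username` keys are assumptions chosen for illustration. Only `format`, `column_separator` and `uri` are taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for ExternalFileTableValuedFunction.
abstract class FileTvfSketch {
    protected String format = "csv";
    protected String columnSeparator = "\t";
    protected String location;
    protected Map<String, String> otherProps = new HashMap<>();

    // Kind 1: file format properties are consumed once, in the parent class.
    // Returns a copy of the properties that were not consumed here.
    protected Map<String, String> parseCommonProperties(Map<String, String> props) {
        Map<String, String> rest = new HashMap<>(props);
        String f = rest.remove("format");
        if (f != null) {
            format = f;
        }
        String sep = rest.remove("column_separator");
        if (sep != null) {
            columnSeparator = sep;
        }
        return rest;
    }

    // Removes and returns a required property, failing if it is absent.
    protected static String require(Map<String, String> props, String key) {
        String value = props.remove(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("missing required property: " + key);
        }
        return value;
    }
}

// Hypothetical derived class: owns its location property (kind 2)
// and keeps whatever is left as tvf-specific properties (kind 3).
class HdfsTvfSketch extends FileTvfSketch {
    HdfsTvfSketch(Map<String, String> props) {
        Map<String, String> rest = parseCommonProperties(props);
        String uri = require(rest, "uri");
        if (!uri.startsWith("hdfs://")) {
            throw new IllegalArgumentException("expected an hdfs:// uri, got: " + uri);
        }
        location = uri;
        otherProps = rest;
    }
}

// Hypothetical derived class for the local filesystem; "file_path" is an assumed key.
class LocalTvfSketch extends FileTvfSketch {
    LocalTvfSketch(Map<String, String> props) {
        Map<String, String> rest = parseCommonProperties(props);
        location = require(rest, "file_path");
        otherProps = rest;
    }
}
```

With this shape, adding a new common property means touching only `parseCommonProperties`, and every derived class picks it up.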

### Behavior change

1. Remove the `fs.defaultFS` property from `hdfs()`; it can be derived from `uri`
2. Use `\t` as the default column separator for the csv format, same as stream load
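
As a rough illustration of the first behavior change, the default filesystem can be recovered from the scheme and authority of the `uri`. The class and method names below are hypothetical, and the actual implementation may derive the value differently.

```java
import java.net.URI;

// Hypothetical helper; not the code used in the PR.
class DefaultFsSketch {
    // Returns "scheme://authority", e.g. "hdfs://host:port" for "hdfs://host:port/path/file".
    static String defaultFsFromUri(String uri) {
        URI parsed = URI.create(uri);
        if (parsed.getScheme() == null || parsed.getAuthority() == null) {
            throw new IllegalArgumentException("uri must contain a scheme and an authority: " + uri);
        }
        return parsed.getScheme() + "://" + parsed.getAuthority();
    }
}
```

For example, `DefaultFsSketch.defaultFsFromUri("hdfs://127.0.0.1:8020/user/doris/data.csv")` yields `hdfs://127.0.0.1:8020`, which is the value users previously had to pass as `fs.defaultFS`.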
2023-10-07 12:44:04 +08:00
a27349c83a [fix](Export) Concatenation the outfile sql for Export (#23635)
In the original logic, the `Export` statement generates a `SelectStmt` for execution, but there is no way to make that `SelectStmt` use the new optimizer.

Now the `Export` statement generates the outfile SQL instead, and the new optimizer parses that SQL, so outfile can use the new optimizer.
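
A minimal sketch of what concatenating an outfile SQL string could look like, assuming the `SELECT ... INTO OUTFILE ... FORMAT AS ... PROPERTIES(...)` form. The class name, method name and parameters are hypothetical, and the sketch does no quoting or escaping of its inputs, which real code would need.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical builder; not the code used in the PR.
class OutfileSqlSketch {
    // Concatenates an outfile statement from the pieces of an export job.
    // Inputs are assumed to be trusted and already valid SQL fragments.
    static String buildOutfileSql(String table, String path, String format, Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        sb.append("SELECT * FROM ").append(table)
          .append(" INTO OUTFILE \"").append(path).append("\"")
          .append(" FORMAT AS ").append(format);
        if (!props.isEmpty()) {
            StringJoiner joiner = new StringJoiner(", ", " PROPERTIES(", ")");
            props.forEach((k, v) -> joiner.add("\"" + k + "\" = \"" + v + "\""));
            sb.append(joiner.toString());
        }
        return sb.toString();
    }
}
```

The resulting string can then be handed to whichever parser and planner the session uses, which is what lets the export path pick up the new optimizer.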
2023-09-08 10:20:18 +08:00
e54cd6a35d [fix](regression)fix case test_outfile_orc_max_file_size by replace table_export_name #23648
Fix the test_outfile_orc_max_file_size case by replacing table_export_name.
2023-08-31 18:51:13 +08:00
25b8831afd [fix](Outfile) fix core dump when export data to orc file format using outfile (#23586)
* fix

* add test
2023-08-30 19:01:44 +08:00
0838ff4bf4 [fix](Outfile) fix bug that the fileSize is not correct when outfile is completed (#22951) 2023-08-18 22:31:44 +08:00
10abbd2b62 [Feauture](Export) support parallel export job using Job Schedule (#22854) 2023-08-18 22:24:42 +08:00
e6b835617b [fix](regression) fix export case (#22790)
Fix an export case: when there are multiple nodes, the machine that performs the export is chosen at random.
2023-08-10 10:00:17 +08:00
12784f863d [fix](Export) Fixed the bug that would be core when exporting large amounts of data (#21761)
A heap-buffer-overflow error occurs when exporting large amounts of data to the orc format.
Reserve 50B for the buffer to avoid this problem.
2023-07-18 00:06:38 +08:00
4ad3a7a8de [fix](exec) run exec_plan_fragment in pthread to avoid BE crash (#21343)
If a query plan has only one fragment, FE calls the `exec_plan_fragment` rpc on BE.
On the BE side, `exec_plan_fragment()` is executed directly in a bthread, but it may call
JNI methods such as `AttachCurrentThread()`, which return an error in a bthread.

So this change makes sure `exec_plan_fragment` is executed in a pthread pool.
2023-07-01 12:29:22 +08:00
61d9bd2ba1 [fix](regression) fix export file test cases (#20463) 2023-06-06 20:07:31 +08:00
90cd791789 [fix](tvf) s3 tvf specify region and s3.region params failed (#19921) 2023-06-01 10:00:49 +08:00
e78149cb65 [Enhencement](Export) add property for outfile/export and add test (#18997)
This PR does three things:
1. Add the `delete_existing_files` property for outfile/export. If `delete_existing_files = true`, export/outfile first deletes all files under file_path.
2. Add a p2 test for export.
3. Update the docs.
2023-05-08 14:02:20 +08:00