pick from master #36316
The datatype of the expression `cast( xx as decimal )` may be decimalv3 or decimalv2,
depending on the value of `enable_decimal_conversion` in the FE conf file. If
`enable_decimal_conversion` is true, the datatype is decimalv3(9, 0), but
it was decimalv3(38, 9) in the 2.0 releases. So this PR changes the
datatype to match the 2.0 releases and keep the behavior consistent.
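A minimal sketch of the difference (the column `xx` and table `t` are hypothetical):

```sql
-- With enable_decimal_conversion = true in fe.conf, an unparameterized
-- DECIMAL cast resolves to different types across versions:
SELECT CAST(xx AS DECIMAL) FROM t;
-- before this PR: result type is DECIMALV3(9, 0)
-- 2.0 releases (and after this PR): result type is DECIMALV3(38, 9)
```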
## Proposed changes
Linked PR: #35389
## Further comments
This PR is based on @sjyango's work in #32326, which was intended to be merged into the master branch but has remained a draft and unmaintained for a long time, hence this new PR.
Co-authored-by: sjyango <sjyang2022@zju.edu.cn>
* (feature)(cloud) Use dynamic allocator instead of static buffer pool for better elasticity. (#28983)
* [fix](outfile) Fix unable to export empty data (#30703)
Issue Number: close #30600
Fix being unable to export empty data to HDFS/S3. This behavior was inconsistent with version 1.2.7,
which can export empty data to HDFS/S3 and produces exported files on S3/HDFS.
* [fix](file-writer) avoid empty file for segment writer (#31169)
---------
Co-authored-by: AlexYue <yj976240184@gmail.com>
Co-authored-by: zxealous <zhouchangyue@baidu.com>
The UTF-8 format on Windows systems has a BOM.
We add a new user property to `Outfile/Export`. Therefore, when exporting Doris data, users can choose whether to prepend a BOM at the beginning of the CSV file.
**Usage:**
```sql
-- outfile:
select * from demo.student
into outfile "file:///xxx/export/exp_"
format as csv
properties(
"column_separator" = ",",
"with_bom" = "true"
);
-- Export:
EXPORT TABLE student TO "file:///xx/tmpdata/export/exp_"
PROPERTIES(
"format" = "csv",
"with_bom" = "true"
);
```
### How to reproduce
1. create a database db1 and a table tbl1;
2. insert some data and export with label L1;
3. drop db1 and tbl1, then recreate them with the same names;
4. insert some data and export with the same label L1 (see the sketch after this list).
Expected: the export succeeds.
Actual: error: Label L1 have already been used.
This PR fixes it.
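A minimal SQL sketch of the reproduction, assuming a single-replica table and a local export path (both hypothetical):

```sql
CREATE DATABASE db1;
CREATE TABLE db1.tbl1 (k INT) DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ("replication_num" = "1");
INSERT INTO db1.tbl1 VALUES (1);
EXPORT TABLE db1.tbl1 TO "file:///tmp/export/exp_"
PROPERTIES ("label" = "L1", "format" = "csv");

-- drop and recreate with the same names
DROP DATABASE db1 FORCE;
CREATE DATABASE db1;
CREATE TABLE db1.tbl1 (k INT) DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ("replication_num" = "1");
INSERT INTO db1.tbl1 VALUES (2);

-- before this fix, reusing the label failed with
-- "Label L1 have already been used":
EXPORT TABLE db1.tbl1 TO "file:///tmp/export/exp_"
PROPERTIES ("label" = "L1", "format" = "csv");
```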
This PR does two things:
1. Infer the column name when the column is an expression in `select into outfile`. The rule for column name generation is described in PR #24990.
2. Fix a bug that caused a core dump when `_schema` failed to build in the open phase in vorc_transformer.cpp.
TODO:
1. Support inferring the column name when the column is an expression in `select into outfile` in the new optimizer (Nereids).
In the original logic, the `Export` statement generated a `SelectStmt` for execution, but there was no way to make the `SelectStmt` use the new optimizer.
Now the `Export` statement generates the outfile SQL instead, and the new optimizer parses that SQL, so outfile can use the new optimizer.
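A rough sketch of the rewrite (the table name and path are hypothetical):

```sql
-- An EXPORT statement like this:
EXPORT TABLE student TO "file:///tmp/export/exp_"
PROPERTIES ("format" = "csv");

-- is now turned into outfile SQL roughly equivalent to:
SELECT * FROM student
INTO OUTFILE "file:///tmp/export/exp_"
FORMAT AS csv;
```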
Problem:
It returns a result even when a wrong ak/sk/bucket name is used, such as:
```sql
mysql> select * from demo.student
-> into outfile "s3://xxxx/exp_"
-> format as csv
-> properties(
-> "s3.endpoint" = "https://cos.ap-beijing.myqcloud.com",
-> "s3.region" = "ap-beijing",
-> "s3.access_key"= "xxx",
-> "s3.secret_key" = "yyyy"
-> );
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
| 1 | 3 | 26 | s3://xxxx/exp_2ae166e2981d4c08-b577290f93aa82ba_ |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
1 row in set (0.15 sec)
```
The reason is that we did not catch the error returned in the `close()` phase.
This problem was caused by #21197.
Fixed an issue where the `csv_with_names` and `csv_with_names_and_types` file formats could not be exported with the Nereids optimizer when using `select...into outfile`.
Problem:
When running show export, the limit was applied before the sort, so the result was not as expected: the limit always cut the result set first and we could not get what we wanted.
Example:
Suppose we have export jobs job1 and job2 with JobId1 > JobId2, and we want the job with JobId1:
`show export from db order by JobId desc limit 1;`
Because `limit 1` was applied first, we would probably get job2, since JobIds are assigned from small to large.
Fix:
Do not cut the results first when there is an `order by` clause; cut the result set after sorting.
1. Support exporting the `LARGEINT` data type to the Parquet/ORC file formats (see the sketch after this list).
2. Export the Doris `DATE/DATETIME` types to the `Date/Timestamp` logical types of the Parquet file format.
3. Fix incorrect data when `DATE`-type data is exported to ORC.
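A minimal outfile sketch exercising these types (table, columns, and path are hypothetical):

```sql
-- large_col is LARGEINT, date_col is DATE, dt_col is DATETIME; with this
-- change, LARGEINT columns can be exported, and DATE/DATETIME map to
-- Parquet's Date/Timestamp logical types.
SELECT large_col, date_col, dt_col FROM demo.t
INTO OUTFILE "file:///tmp/export/exp_"
FORMAT AS parquet;
```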
Set the FE config `enable_new_load_scan_node` to true by default,
so that all load tasks (broker load, stream load, routine load, insert into) use FileScanNode instead of BrokerScanNode
to read data.
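For reference, the new default behaves as if this FE config had been set explicitly (a sketch; it can also be switched back off the same way):

```sql
ADMIN SET FRONTEND CONFIG ("enable_new_load_scan_node" = "true");
```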
1. Support loading Parquet files in stream load with the new load scan node.
2. Fix a bug where the new Parquet reader could not read columns without a logical or converted type.
3. Change the jsonb parse function to `jsonb_parse_error_to_null`,
so that if the input string is not a valid JSON string, the jsonb column is set to null in the load task (a sketch follows this list).
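A minimal sketch of the error-to-null behavior (the literals are illustrative):

```sql
SELECT jsonb_parse_error_to_null('{"k": 1}');  -- valid JSON: returns the parsed value
SELECT jsonb_parse_error_to_null('not json');  -- invalid JSON: returns NULL instead of an error
```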
1. Add "sync" to avoid potential meta sync problems when running regression tests on a multi-node cluster (see the sketch after this list).
2. Use the /tmp dir as the destination dir of the outfile test, to avoid a "No such file or directory" error.
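A sketch of the pattern, assuming Doris's `SYNC` statement and a hypothetical table, as used in the regression tests:

```sql
INSERT INTO db1.tbl1 VALUES (1);
SYNC;                      -- wait for metadata to be synced across FE nodes
SELECT * FROM db1.tbl1;    -- subsequent statements see a consistent view
```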