- Implements ORC lazy materialization, integrate with the implementation of https://github.com/apache/doris-thirdparty/pull/56 and https://github.com/apache/doris-thirdparty/pull/62.
- Refactor code: Move `execute_conjuncts()` and `execute_conjuncts_and_filter_block()` in `parquet_group_reader `to `VExprContext`, used by parquet reader and orc reader.
- Add session variables `enable_parquet_lazy_materialization` and `enable_orc_lazy_materialization` to control whether enable lazy materialization.
- Modify `build.sh` to update apache-orc submodule or download package every time.
when I use mysql-jdbc 5.1.47 create a doris jdbc catalog, the largeint cannot select
When mysql-jdbc reads largeint, it will convert the format to string because it is too long
mysql> select `largeint` from type3;
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Fail to convert jdbc type of java.lang.String to doris type LARGEINT on column: largeint. You need to check this column type between external table and doris table.
Support querying data from the Nebula graph database
This feature comes from the needs of commercial customers who have used Doris and Nebula, hoping to connect these two databases
changes mainly include:
* add New Graph Database JDBC Type
* Adapt the type and map the graph to the Doris type
Fix errors when read boolean type from external doris cluster by jdbc catalog:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = (172.16.10.11)[INTERNAL_ERROR]Fail to convert jdbc type of java.lang.Integer to doris type BOOL on column: deleted.
You need to check this column type between external table and doris table.
```
MySQL Types and Return Values for GetColumnTypeName and GetColumnClassName are presented in https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-type-conversions.html.
However when tinyInt1isBit=false, GetColumnClassName of MySQL returns java.lang.Boolean, while that of Doris returns java.lang.Integer. In order to be compatible with both MySQL and Doris, Jdbc params should set tinyInt1isBit=true&transformedBitIsBoolean=true
This pr does three things:
1. add `delete_existing_files` property for outfile/export. If `delete_existing_files = true`, export/outfile will delete all files under file_path first.
2. add p2 test for export
3. modify docs
Fix dict cols not be converted back to string type in some cases, which includes introduced by #19039.
For dict cols, we will convert dict cols to int32 type firstly, then convert back to string type after read block.
The block will be reuse it, so it is necessary to convert it back.
This work is in the early stage, current progress is not accurate because the scan range will be too large
for gathering information, what's more, only file scan node and import job support new progress manager
## How it works
for example, when we use the following load query:
```
LOAD LABEL test_broker_load
(
DATA INFILE("XXX")
INTO TABLE `XXX`
......
)
```
Initial Progress: the query will call `BrokerLoadJob` to create job, then `coordinator` is called to calculate scan range and its location.
Update Progress: BE will report runtime_state to FE and FE update progress status according to jobID and fragmentID
we can use `show load` to see the progress
PENDING:
```
State: PENDING
Progress: 0.00%
```
LOADING:
```
State: LOADING
Progress: 14.29% (1/7)
```
FINISH:
```
State: FINISHED
Progress: 100.00% (7/7)
```
At current time, full output of `show load\G` looks like:
```
*************************** 1. row ***************************
JobId: 25052
Label: test_broker
State: LOADING
Progress: 0.00% (0/7)
Type: BROKER
EtlInfo: NULL
TaskInfo: cluster:N/A; timeout(s):250000; max_filter_ratio:0.0
ErrorMsg: NULL
CreateTime: 2023-05-03 20:53:13
EtlStartTime: 2023-05-03 20:53:15
EtlFinishTime: 2023-05-03 20:53:15
LoadStartTime: 2023-05-03 20:53:15
LoadFinishTime: NULL
URL: NULL
JobDetails: {"Unfinished backends":{"5a9a3ecd203049bc-85e39a765c043228":[10080]},"ScannedRows":39611808,"TaskNumber":1,"LoadBytes":7398908902,"All backends":{"5a9a3ecd203049bc-85e39a765c043228":[10080]},"FileNumber":1,"FileSize":7895697364}
TransactionId: 14015
ErrorTablets: {}
User: root
Comment:
```
## TODO:
1. The current partition granularity of scan range is too large, resulting in an uneven loading process for progress."
2. Only broker load supports the new Progress Manager, support progress for other query
This PR enables periodic collection of statistics and is a precursor to automatic statistics collection. It mainly includes the following contents:
support periodic collection of statistics.
Change the type of Date in statistics p0 to DateV2(see [Enhancement](data-type) add FE config to prohibit create date and decimalv2 type #19077) for test locally. complement cases(remove Chinese characters, optimize code, etc) , improve stability.
Supports setting whether to keep records of statistics synchronization job info, convenient for use in p0 testing.
The statistics job table was modified, and some auxiliary judgments were added to avoid the user perceiving the modification. This function was removed when the table schema is stable.