`isAdjustedToUTC` was interpreted as exactly the opposite of the Parquet specification (https://github.com/apache/parquet-format/blob/master/LogicalTypes.md) in the Parquet reader, so timestamps with `isAdjustedToUTC=true` were read eight hours ahead (UTC+8).
A Parquet file with `isAdjustedToUTC=true` can be produced by spark-sql with the following configuration:
```
--conf spark.sql.session.timeZone=UTC
--conf spark.sql.parquet.outputTimestampType=TIMESTAMP_MICROS
```
However, with the following configuration there is no logical or converted type in the Parquet metadata, so the time read by Doris will also be eight hours ahead (UTC+8). Users need to set the UTC time zone in Doris themselves (https://doris.apache.org/docs/dev/advanced/time-zone/); an example follows the configuration below.
```
--conf spark.sql.session.timeZone=UTC
--conf spark.sql.parquet.outputTimestampType=INT96
```
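For example, a minimal way to set the session (or global) time zone in Doris, following the time-zone doc linked above:
```
SET time_zone = 'UTC';
-- or, for all sessions:
SET GLOBAL time_zone = 'UTC';
```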
For an HDFS tvf like:
```
select count(*) from hdfs(
"uri" = "hdfs://HDFS8000871/path/to/1.parquet",
"fs.defaultFS" = "hdfs://HDFS8000871/",
"format" = "parquet"
);
```
Previously, if `fs.defaultFS` ended with `/`, the query would fail with an error like:
```
reason: RemoteException: File does not exist: /user/doris/path/to/1.parquet
```
You can see the path is wrong: it has an unexpected `/user/doris` prefix.
Users had to set `fs.defaultFS` to `hdfs://HDFS8000871` (without the trailing slash) to avoid this error.
This PR fixes the issue.
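For reference, the previous workaround was to drop the trailing slash from `fs.defaultFS`:
```
select count(*) from hdfs(
"uri" = "hdfs://HDFS8000871/path/to/1.parquet",
"fs.defaultFS" = "hdfs://HDFS8000871",
"format" = "parquet"
);
```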
Previously, the local tvf could only query data on one BE node.
But if the storage is shared (e.g., NAS), the query can be executed on multiple nodes.
This PR mainly changes:
1. Add a new property `"shared_storage" = "true/false"`.
Default is false. If set to true, `backend_id` is optional: if `backend_id` is set, the query
is still executed on that BE; if it is not set, `shared_storage` must be `true`
and the query will be executed on multiple nodes. See the usage sketch after the doc link below.
Doc: https://github.com/apache/doris-website/pull/494
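A hedged usage sketch (the file path is hypothetical), running on multiple nodes over shared storage:
```
select * from local(
"file_path" = "path/to/shared/1.csv",
"format" = "csv",
"shared_storage" = "true"
);
```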
1. Check the return value of the avro reader's `init_fetch_table_schema_reader()`.
2. Also fix a bug where the parse exception of Nereids may suppress the real exception from the old planner,
making it impossible to see the real error message.
PR #32217 found a problem where getting the jni env may fail,
and added a workaround to avoid a BE crash.
This PR follows up on that issue, to avoid a BE crash during `close()` of JniConnector
when getting the jni env fails.
The `close()` method will return an error when:
1. It fails to get the jni env.
2. It fails to release jni resources.
This PR ignores the first error, and still logs fatal for the second.
Previously, when FQDN was enabled, Doris called the DNS resolver to get the IP from a hostname
every time 1) the FE got a BE's grpc client, or 2) a BE got another BE's brpc client.
So in high-concurrency cases, the DNS resolver could be overloaded and fail to resolve the hostname.
This PR mainly changes:
1. Add a DNSCache for both FE and BE.
The DNSCache runs on every FE and BE node. It holds a cache whose key is the hostname and whose value is the IP.
Callers can get an IP by hostname from this cache; if the hostname does not exist, the cache will try to resolve it
and update itself.
In addition, DNSCache has a daemon thread that refreshes the cache every minute, in case the IP
changes at any time.
There are other implementations of this DNS cache:
1. 36fed13997
This one is for the BE side, but it does not handle the IP change case.
2. https://github.com/apache/doris/pull/28479
This one is for the FE side, but it only works on the Master FE; other FE nodes will not be aware of the IP change.
Also, there are a bunch of BackendServiceProxy instances, and that PR only handles the cache in one of them.
MaxScannerThreadNum in the file scan operator is incorrect when pipelineX is turned on; it costs a lot of memory and causes performance degradation. This PR fixes it.
In the previous implementation, the row count cache expired after 10 minutes (by default),
and after expiration, the next row count request would miss the cache, causing unstable query plans.
In this PR, the cache is refreshed after `Config.external_cache_expire_time_minutes_after_access`,
so that the cache entry remains fresh.
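A hedged example of tuning this window at runtime (the config name comes from this PR; if it is not runtime-mutable in your version, set it in fe.conf instead):
```
ADMIN SET FRONTEND CONFIG ("external_cache_expire_time_minutes_after_access" = "10");
```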
Some MySQL connectors (e.g., dotnet MySQL.Data) rely on a variable's column type when establishing the connection.
For example, `select @@autocommit` should come back with column type `BIGINT`, not `BIT`, otherwise the connector throws an error like:
```
System.FormatException: The input string 'True' was not in a correct format.
at System.Number.ThrowFormatException[TChar](ReadOnlySpan`1 value)
at System.Convert.ToInt32(String value)
at MySql.Data.MySqlClient.Driver.LoadCharacterSetsAsync(MySqlConnection connection, Boolean execAsync, CancellationToken cancellationToken)
at MySql.Data.MySqlClient.Driver.ConfigureAsync(MySqlConnection connection, Boolean execAsync, CancellationToken cancellationToken)
at MySql.Data.MySqlClient.MySqlConnection.OpenAsync(Boolean execAsync, CancellationToken cancellationToken)
at MySql.Data.MySqlClient.MySqlConnection.Open()
```
In this PR, I add a new field to `VarAttr`: `convertBoolToLongMethod`. If set, it converts the boolean value to a long,
and it is set for `autocommit`.
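An illustrative session after the change (the column is now reported as `BIGINT`, so the value arrives as an integer):
```
mysql> select @@autocommit;
+--------------+
| @@autocommit |
+--------------+
|            1 |
+--------------+
```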
The close or finish method can take a lot of time, so the lock is held for a long time. If there is a bug in the close or finish method, it will affect the pipeline execution thread.
The writer's close method needs this lock, so it will hang when the close method is called.
* [fix](Nereids) support variant column with index when create table (#32948)
* [opt](Nereids) support create table with variant type (#32953)
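A hedged DDL sketch of what these two changes enable (table, column, and index names are hypothetical):
```
CREATE TABLE example_tbl (
    k BIGINT,
    v VARIANT,
    INDEX idx_v (v) USING INVERTED
)
DUPLICATE KEY(k)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ("replication_num" = "1");
```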
---------
Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>