`isAdjustedToUTC` is interpreted in exactly the opposite way in the Parquet reader (https://github.com/apache/parquet-format/blob/master/LogicalTypes.md), so timestamps with `isAdjustedToUTC=true` are shifted forward by eight hours (UTC+8).
A Parquet file with `isAdjustedToUTC=true` can be produced by spark-sql with the following configuration:
```
--conf spark.sql.session.timeZone=UTC
--conf spark.sql.parquet.outputTimestampType=TIMESTAMP_MICROS
```
However, with the following configuration there is no logical or converted type in the Parquet metadata, so timestamps read by Doris will also be shifted forward by eight hours (UTC+8). Users need to set the UTC time zone in Doris themselves (https://doris.apache.org/docs/dev/advanced/time-zone/); see the sketch after the configuration below.
```
--conf spark.sql.session.timeZone=UTC
--conf spark.sql.parquet.outputTimestampType=INT96
```
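A minimal sketch of that workaround, using the session-level `time_zone` variable from the Doris time-zone doc linked above:
```
-- set the session time zone to UTC so INT96 timestamps written with
-- spark.sql.session.timeZone=UTC are not shifted by +8 hours on read
SET time_zone = 'UTC';
```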
* [Fix](Variant) temporarily forbid segment compaction for tables with a variant type
TODO: fix this correctly in later work
* [Bug](Variant) use a lower-case name for the variant's root, since the backend treats parent columns as lower case
This PR addresses the problem below:
```
errCode = 2, detailMessage = (172.16.56.137)[CANCELLED]failed to initialize storage reader. tablet=17136, res=[INTERNAL_ERROR]Not found field_name, field_name:Tags.tag_key1, schema:[Thread(8), Tags(9), Source(5), tags.tag_key1(-1), Title(6), Level(3), Time(2), CreateDate(1), Message(7), IP(4), AppId(0)]
```
1. Fix Iceberg catalog bug
PR #30198 changed the logic of `IcebergHMSExternalCatalog.java`
to get the locationUrl by calling the Hive metastore's `getCatalog()` method.
But this method only exists in Hive 3+, so it fails when using Hive 2.x.
I temporarily removed that logic, because it is only used for Iceberg table writing,
which is still under development. We will rethink this logic later.
2. Fix test cases
Some P2 test cases were missing `order_qt`. And because the output format of the floating-point
type has changed, some results in the `.out` files need to be regenerated.
Refactored the SQL rewrite to handle distinct and nullable values when computing the sum of `v + 1` over table `t`, rewriting
```
select sum(v + 1), sum(distinct v + 1) from t
```
into
```
select sum(v) + count(v), sum(distinct v) + count(distinct v)
```
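A small worked example of why the rewrite preserves NULL semantics (the sample data is hypothetical):
```
-- hypothetical table t(v) with rows: 1, 2, NULL
-- sum(v + 1)        = (1+1) + (2+1) = 5   -- the NULL row contributes nothing
-- sum(v) + count(v) = 3 + 2         = 5   -- count(v) skips NULLs, so both agree
```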
1. Only push HAVING down as the aggregate's parent if the HAVING clause uses only slots from the aggregate's output.
2. Show a user-friendly error message when an item appears in the select list but not in the aggregate node's output; see the sketch below.
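A minimal, hypothetical query that should trigger the friendlier error message (table and column names are made up):
```
-- `a` is neither grouped nor wrapped in an aggregate function, so it
-- cannot appear in the select list of this aggregation
select a, sum(b) from t group by c;
```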
In order to support Paimon with Hive 2, we need to modify the original HiveMetastoreClient.java
to make it compatible with both Hive 2 and Hive 3.
This modified HiveMetastoreClient must be at the front of the CLASSPATH, so that
it overrides the HiveMetastoreClient in the Hadoop jar.
This PR mainly changes:
1. Copy HiveMetastoreClient.java in FE to BE's preload jar.
2. Split the origin `preload-extensions-jar-with-dependencies.jar` into 2 jars
1. `preload-extensions-project.jar`, which contains the modified HiveMetastoreClient.
2. `preload-extensions-jar-with-dependencies.jar`, which contains other dependency jars.
3. Modify `start_be.sh` so that `preload-extensions-project.jar` is loaded first.
4. Change the way the JNI scanner jar is assembled.
We only need to assemble the project jar, without other dependencies,
because we actually only use classes under the `org.apache.doris` package.
Removing the unused dependency jars also reduces the output size of BE.
5. Fix a bug where the prefix of Paimon properties should be `paimon.`, not `paimon`.
6. Support Paimon with Hive 2.
Users can set `hive.version` in the Paimon catalog properties to specify the Hive version; see the sketch below.
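A minimal sketch of such a catalog definition (the metastore URI and warehouse path are placeholders, and the property names other than `hive.version` follow common Doris Paimon catalog examples and may differ by version):
```
CREATE CATALOG paimon_hive2 PROPERTIES (
    "type" = "paimon",
    "paimon.catalog.type" = "hms",
    "warehouse" = "hdfs://ns/user/hive/warehouse",
    "hive.metastore.uris" = "thrift://hms-host:9083",
    "hive.version" = "2.3.7"
);
```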
A query like `select * from ut_p partitions(p2) where cast(var['a'] as int) > 0` will miss partition/tablet pruning, since its plan looks like:
```
mysql> explain analyzed plan select * from ut_p where id = 3 and cast(var['a'] as int) = 789;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalResultSink[26] ( outputExprs=[id#0, var#1] ) |
| +--LogicalProject[25] ( distinct=false, projects=[id#0, var#1], excepts=[] ) |
| +--LogicalFilter[24] ( predicates=((cast(var#4 as INT) = 789) AND (id#0 = 3)) ) |
| +--LogicalFilter[23] ( predicates=(0 = __DORIS_DELETE_SIGN__#2) ) |
| +--LogicalProject[22] ( distinct=false, projects=[id#0, var#1, __DORIS_DELETE_SIGN__#2, __DORIS_VERSION_COL__#3, element_at(var#1, 'a') AS `var`#4], excepts=[] ) |
| +--LogicalOlapScan ( qualified=regression_test_variant_p0.ut_p, indexName=<index_not_selected>, selectedIndexId=10145, preAgg=ON ) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
6 rows in set (0.01 sec)
```
There is an extra LogicalProject on top of LogicalOlapScan, so we should handle this case so that partition/tablet pruning still applies.
* [test](nereids) Add TPC-H tests for query rewrite by materialized view (#30870)
The TPC-H queries covered by materialized view rewrite are:
q1, q5, q6, q8, q9, q12, q14
The others are not supported yet and will be supported later; a hypothetical sketch of the rewrite follows.
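As a hypothetical illustration of the kind of rewrite these tests cover (the MV name and column subset are made up; TPC-H's `lineitem` table is assumed, and build/refresh options are omitted):
```
-- a materialized view whose shape matches a query can serve that query
CREATE MATERIALIZED VIEW mv_lineitem_agg AS
SELECT l_returnflag, l_linestatus, sum(l_quantity) AS sum_qty
FROM lineitem
GROUP BY l_returnflag, l_linestatus;

-- this query can be transparently rewritten to read mv_lineitem_agg
SELECT l_returnflag, l_linestatus, sum(l_quantity)
FROM lineitem
GROUP BY l_returnflag, l_linestatus;
```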
* change code usage