A Hive table path may contain temporary staging directories like these:
drwxrwxrwx - root supergroup 0 2023-03-22 21:03 /usr/hive/warehouse/datalake_performance.db/clickbench_parquet_hits/.hive-staging_hive_2023-03-22_21-03-12_047_8461238469577574033-1
drwxrwxrwx - root supergroup 0 2023-05-18 15:03 /usr/hive/warehouse/datalake_performance.db/clickbench_parquet_hits/.hive-staging_hive_2023-05-18_15-03-52_780_3065787006787646235-1
BE reports an error when it tries to read the files under these directories, so they need to be filtered out during FE planning.
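A minimal sketch of such a filter, not the actual FE code. The class and method names are hypothetical, and the rule (skip any path component below the table location that starts with "." or "_") is an assumption based on the usual Hadoop hidden-file convention:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of Doris.
final class HiddenPathFilter {
    private HiddenPathFilter() {}

    // Returns true if any path component below tableLocation starts with '.' or '_'
    // (for example ".hive-staging_hive_..." or "_SUCCESS").
    static boolean isHidden(String tableLocation, String filePath) {
        // Only inspect the part below the table location, so that a warehouse
        // directory with an unusual name does not hide every file.
        String relative = filePath.startsWith(tableLocation)
                ? filePath.substring(tableLocation.length())
                : filePath;
        for (String part : relative.split("/")) {
            if (part.startsWith(".") || part.startsWith("_")) {
                return true;
            }
        }
        return false;
    }

    // Keeps only the files that should be handed to BE for scanning.
    static List<String> visibleOnly(String tableLocation, List<String> filePaths) {
        return filePaths.stream()
                .filter(p -> !isHidden(tableLocation, p))
                .collect(Collectors.toList());
    }
}
```

With this rule, a file under a .hive-staging_hive_... directory is dropped, while a file under a partition directory such as dt=2023-03-22 is kept.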
The error message shown by show column stats for a nonexistent column was not helpful:
```
MySQL [hive.tpch100]> show column stats `lineitem` (l_extendedpric);
ERROR 1105 (HY000): errCode = 2, detailMessage = Unexpected exception: null
```
This PR shows a meaningful message instead:
```
mysql> show column stats `lineitem` (l_extendedpric);
ERROR 1105 (HY000): errCode = 2, detailMessage = Column: l_extendedpric not exists
```
Use three new plan nodes to represent deferred materialization of TopN.
Example:
```
-- SQL
select * from t1 order by c1 limit 10;
-- PLAN
+------------------------------------------+
| Explain String |
+------------------------------------------+
| PhysicalDeferMaterializeResultSink |
| --PhysicalDeferMaterializeTopN |
| ----PhysicalDistribute |
| ------PhysicalDeferMaterializeTopN |
| --------PhysicalDeferMaterializeOlapScan |
+------------------------------------------+
```
1. Add more logs and make error messages clearer.
2. Sleep for a while between analyze retries.
3. Make the concurrency of sync analyze configurable.
4. Ignore internal columns such as the delete sign to save resources.
Problem:
When creating a view that joins specific table partitions, querying the view raises an error like "Unknown column".
Example:
CREATE VIEW my_view AS SELECT t1.* FROM t1 PARTITION(p1) JOIN t2 PARTITION(p2) ON t1.k1 = t2.k1;
select * from my_view ==> errCode = 2, detailMessage = Unknown column 'k1' in 't2'
Reason:
When creating a view, we call toSql first in order to persist the view SQL. When generating the toSql string of a table reference, the PARTITION keyword
was removed to keep the SQL string neat. But once the keyword is removed, what follows the table name is regarded as an alias.
So the "PARTITION" keyword cannot be removed.
Solved:
Add the "PARTITION" keyword back to the toSql string.
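A rough sketch of the idea, assuming a simplified table reference made of a table name, an optional partition list and an optional alias. The class and method below are hypothetical and do not mirror the real TableRef code:

```java
import java.util.List;

// Hypothetical, simplified stand-in for a table reference's toSql logic.
final class TableRefSql {
    private TableRefSql() {}

    static String toSql(String tableName, List<String> partitions, String alias) {
        StringBuilder sb = new StringBuilder(tableName);
        if (partitions != null && !partitions.isEmpty()) {
            // Keep the PARTITION keyword so the list is not re-parsed as an alias.
            sb.append(" PARTITION(").append(String.join(", ", partitions)).append(")");
        }
        if (alias != null && !alias.isEmpty()) {
            sb.append(" ").append(alias);
        }
        return sb.toString();
    }
}
```

For the view above, the reference to t2 would be persisted as "t2 PARTITION(p2)", which parses back to the same table reference.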
The join commute rule swaps the left and right children, which changes the logical properties. So we need to recompute the logical properties in plan post-processing to get the correct result.
Problem:
When creating a view whose projection contains group_concat(xxx, xxx order by orderkey), it fails during the second parse of the inline view.
For example, this query works:
"SELECT id, group_concat(`name`, "," ORDER BY id) AS test_group_column FROM test GROUP BY id"
but creating a view from it does not work:
"create view test_view as SELECT id, group_concat(`name`, "," ORDER BY id) AS test_group_column FROM test GROUP BY id"
Reason:
When creating a view, we parse view.toSql() again to check whether it has any syntax errors. When toSql() is applied to group_concat with order by, it adds a separator ', ' between the second parameter and order by. So parsing again
fails, because the generated statement no longer matches the original one:
group_concat(`name`, "," ORDER BY id) ==> group_concat(`name`, "," , ORDER BY id)
Solved:
Change toSql of group_concat so that it no longer emits the separator before order by. Also add analyze() of the group_concat order by clause in the Planner, because an order by clause obtained from the view statement is otherwise never analyzed and has no slot references bound to it.
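The toSql part of the fix can be sketched as follows. This is a simplified illustration with hypothetical names, where arguments and order by elements are assumed to be already rendered as SQL strings:

```java
import java.util.List;

// Hypothetical, simplified stand-in for the group_concat toSql logic.
final class GroupConcatSql {
    private GroupConcatSql() {}

    static String toSql(List<String> args, List<String> orderByElements) {
        StringBuilder sb = new StringBuilder("group_concat(");
        // Arguments are separated by ", " ...
        sb.append(String.join(", ", args));
        // ... but ORDER BY is appended with a plain space, never with a comma.
        if (orderByElements != null && !orderByElements.isEmpty()) {
            sb.append(" ORDER BY ").append(String.join(", ", orderByElements));
        }
        return sb.append(")").toString();
    }
}
```

Rendering the arguments `name` and "," with order by id then yields group_concat(`name`, "," ORDER BY id), which matches the original statement and parses again.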
Cache the Iceberg table. When the same table is accessed again, its metadata is loaded only once.
Cache the snapshot of the table to optimize the performance of the Iceberg table function.
Add cache support for the content of Iceberg manifest files.
In a simple test, the time of a repeated query dropped from 2.0s to 0.8s:
before
mysql> refresh table tb3;
Query OK, 0 rows affected (0.03 sec)
mysql> select * from tb3;
+------+------+------+
| id | par | data |
+------+------+------+
| 1 | a | a |
| 2 | a | b |
| 3 | a | c |
....
| 68 | a | a |
| 69 | a | b |
| 70 | a | c |
+------+------+------+
70 rows in set (2.10 sec)
mysql> select * from tb3;
+------+------+------+
| id | par | data |
+------+------+------+
| 1 | a | a |
| 2 | a | b |
| 3 | a | c |
...
| 68 | a | a |
| 69 | a | b |
| 70 | a | c |
+------+------+------+
70 rows in set (2.00 sec)
after
mysql> refresh table tb3;
Query OK, 0 rows affected (0.03 sec)
mysql> select * from tb3;
+------+------+------+
| id | par | data |
+------+------+------+
| 1 | a | a |
| 2 | a | b |
...
| 68 | a | a |
| 69 | a | b |
| 70 | a | c |
+------+------+------+
70 rows in set (2.05 sec)
mysql> select * from tb3;
+------+------+------+
| id | par | data |
+------+------+------+
| 1 | a | a |
| 2 | a | b |
| 3 | a | c |
...
| 68 | a | a |
| 69 | a | b |
| 70 | a | c |
+------+------+------+
70 rows in set (0.80 sec)
When adding a BE, the host:port string is required to contain exactly one colon; otherwise an error is reported. However, an IPv6 address contains many colons, so it is always rejected by this check:
```
String[] pair = hostPort.split(":");
if (pair.length != 2) {
    throw new AnalysisException("Invalid host port: " + hostPort);
}
```
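One possible way to accept IPv6 addresses is to require the bracketed form [host]:port for them, as URIs do. The sketch below is an assumption about how such parsing could look, not the actual fix; the HostPort class is hypothetical and throws IllegalArgumentException instead of AnalysisException to stay self-contained:

```java
// Hypothetical value class for illustration; not the Doris implementation.
final class HostPort {
    final String host;
    final int port;

    private HostPort(String host, int port) {
        this.host = host;
        this.port = port;
    }

    static HostPort parse(String hostPort) {
        String s = hostPort.trim();
        String host;
        String portStr;
        if (s.startsWith("[")) {
            // Bracketed IPv6 literal, e.g. [fe80::1]:9050
            int close = s.indexOf(']');
            if (close < 0 || close + 1 >= s.length() || s.charAt(close + 1) != ':') {
                throw new IllegalArgumentException("Invalid host port: " + hostPort);
            }
            host = s.substring(1, close);
            portStr = s.substring(close + 2);
        } else {
            // Hostname or IPv4: exactly one colon is allowed. An unbracketed
            // IPv6 address is ambiguous and therefore rejected.
            int idx = s.lastIndexOf(':');
            if (idx <= 0 || idx != s.indexOf(':')) {
                throw new IllegalArgumentException("Invalid host port: " + hostPort);
            }
            host = s.substring(0, idx);
            portStr = s.substring(idx + 1);
        }
        int port;
        try {
            port = Integer.parseInt(portStr);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Invalid host port: " + hostPort);
        }
        if (host.isEmpty() || port < 1 || port > 65535) {
            throw new IllegalArgumentException("Invalid host port: " + hostPort);
        }
        return new HostPort(host, port);
    }
}
```

With this sketch, both 192.168.1.1:9050 and [fe80::1]:9050 are accepted, while fe80::1:9050 is rejected because it is unclear where the address ends and the port begins.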