This PR refactors the old way of writing data to JDBC External Table & JDBC Catalog, mainly including the following tasks:
1. Continue the work of @BePPPower's PR #18594: replace the logic that splices INSERT SQL strings with logic that reads from off-heap memory and writes data via `PreparedStatement` setters (a minimal sketch follows below)
2. Add write support for the LARGEINT type, mainly adapting it to `java.math.BigInteger`, which is handled with binary (byte-level) operations
3. Remove the SQL-splicing logic from the write-related code of JDBC External Table & JDBC Catalog
TODO: binary types, such as BIT, BINARY, BLOB...
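For illustration, here is a minimal Java sketch of the idea, not the exact code of this PR: values are bound through `PreparedStatement` setters instead of being spliced into the SQL string, and a LARGEINT arriving as raw 16 bytes (assumed little-endian here) is adapted to `java.math.BigInteger`. The table name, column names, and helper names are hypothetical.

```java
import java.math.BigInteger;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class JdbcWriteSketch {
    // Convert a 16-byte LARGEINT (assumed little-endian two's complement) to BigInteger,
    // which expects big-endian two's complement bytes.
    static BigInteger largeIntFromBytes(byte[] littleEndian) {
        byte[] bigEndian = new byte[littleEndian.length];
        for (int i = 0; i < littleEndian.length; i++) {
            bigEndian[i] = littleEndian[littleEndian.length - 1 - i];
        }
        return new BigInteger(bigEndian);
    }

    // Write one row through placeholders instead of splicing an INSERT string.
    static void writeRow(Connection conn, int id, byte[] largeIntBytes) throws Exception {
        String sql = "INSERT INTO target_table (id, big_val) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, id);
            ps.setObject(2, largeIntFromBytes(largeIntBytes)); // most drivers map BigInteger to NUMERIC
            ps.executeUpdate();
        }
    }
}
```

Binding through placeholders also avoids the quoting and escaping issues that string splicing has to handle by hand.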
Finally, special thanks to @BePPPower and @AshinGau for their work.
Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>
`required_slots` in `TFileScanRangeParams` for an external Hive table may be updated after `FileQueryScanNode` finalize. For text files, we need to use the original `required_slots` in the params so that the list can be updated later. Otherwise, querying a text file may hit the following error:
[INTERNAL_ERROR]Unknown source slot descriptor, slot_id=3
Currently, compaction is executed separately for each backend, and the reconstruction of the index during compaction leads to high CPU usage. To address this, we are introducing single replica compaction, where a specific primary replica is selected to perform compaction, and the remaining replicas fetch the compaction results from the primary replica.
The Backend (BE) requests replica information for all peers corresponding to a tablet from the Frontend (FE). This information includes the host where the replica is located and the replica_id. By calculating hash(replica_id), the replica with the smallest hash value is responsible for executing compaction, while the remaining replicas are responsible for fetching the compaction results from this replica.
The compaction task producer thread, before submitting a compaction task, checks whether the local replica should fetch from its peer. If it should, the task is then submitted to the single replica compaction thread pool.
When performing single replica compaction, the process begins by requesting rowset versions from the target replica. These rowset_versions are then compared with the local rowset versions. The first version that can be fetched is selected.
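The selection rule can be sketched as below; the `PeerReplica` type, the hash function, and the method names are illustrative assumptions, not the BE's actual C++ implementation.

```java
import java.util.List;

public class CompactionPeerSelection {
    // Illustrative stand-in for the replica info returned by the FE.
    record PeerReplica(long replicaId, String host) {}

    // Placeholder hash; the BE uses its own hash function over replica_id.
    static long hash(long replicaId) {
        return Long.hashCode(replicaId * 0x9E3779B97F4A7C15L);
    }

    // The replica with the smallest hash(replica_id) executes compaction.
    static PeerReplica pickCompactionLeader(List<PeerReplica> peers) {
        PeerReplica leader = peers.get(0); // assumes a non-empty peer list
        for (PeerReplica p : peers) {
            if (hash(p.replicaId()) < hash(leader.replicaId())) {
                leader = p;
            }
        }
        return leader;
    }

    // Every other replica fetches the compaction result from the leader.
    static boolean shouldFetchFromPeer(long localReplicaId, List<PeerReplica> peers) {
        return pickCompactionLeader(peers).replicaId() != localReplicaId;
    }
}
```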
## Problem summary
When we want to push a filter through a union, we should check whether the union's children are `OneRowRelation` or not. If some children are `OneRowRelation`, we shouldn't push the filter down to that part. A sketch of the check follows the examples below.
Before this PR
```
mysql> select * from (select 1 as a, 2 as b union all select 3, 3) t where a = 1;
+------+------+
| a    | b    |
+------+------+
|    1 |    2 |
|    3 |    3 |
+------+------+
2 rows in set (0.01 sec)
```
After this PR
```
mysql> select * from (select 1 as a, 2 as b union all select 3, 3) t where a = 1;
+------+------+
| a    | b    |
+------+------+
|    1 |    2 |
+------+------+
1 row in set (0.38 sec)
```
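A minimal sketch of the check, using simplified `Plan` and `OneRowRelation` stand-ins rather than the actual Nereids classes: only children that are not `OneRowRelation` are candidates for pushdown, and the filter stays above the union for the rest.

```java
import java.util.ArrayList;
import java.util.List;

public class PushFilterThroughUnionSketch {
    interface Plan {}
    static class OneRowRelation implements Plan {}

    // Only union children that are not OneRowRelation may receive the pushed-down filter;
    // for the remaining children the filter has to stay above the union.
    static List<Plan> childrenSafeForPushdown(List<Plan> unionChildren) {
        List<Plan> safe = new ArrayList<>();
        for (Plan child : unionChildren) {
            if (!(child instanceof OneRowRelation)) {
                safe.add(child);
            }
        }
        return safe;
    }
}
```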
1. Fix a replay bug when creating a catalog with a resource.
If a user creates a catalog using `create catalog hive with resource xxx`, there is a bug when replaying the edit log:
the resource may have been dropped, causing an NPE and preventing FE from starting.
In this PR, I add a new FE config `disallow_create_catalog_with_resource`, default `true`,
so that `with resource` is no longer allowed; it will be deprecated later.
The replay bug is also fixed to avoid the NPE.
2. Fix an issue when creating two Hive catalogs, one with and one without Kerberos authentication.
When a user creates two Hive catalogs, one using simple auth and the other using Kerberos auth,
queries may fail with an error like: `Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.`
So I add a default property for Hive catalogs: `"ipc.client.fallback-to-simple-auth-allowed" = "true"`.
This property is added automatically when the user creates a Hive catalog, to avoid the problem (see the sketch after this list).
3. Fix an issue when calling `hdfsExists()`.
When `hdfsExists()` returns a non-zero code, we should check whether it hit a real error or the file simply does not exist.
4. Some code refactoring.
Avoid importing `org.apache.parquet.Strings`.
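A minimal sketch of the default-property behavior from item 2; the class and method names are hypothetical, and only the property key and value come from this PR.

```java
import java.util.HashMap;
import java.util.Map;

public class HiveCatalogDefaults {
    static final String FALLBACK_KEY = "ipc.client.fallback-to-simple-auth-allowed";

    // Add the fallback property by default when a Hive catalog is created,
    // unless the user already set it explicitly.
    static Map<String, String> withDefaults(Map<String, String> userProps) {
        Map<String, String> props = new HashMap<>(userProps);
        props.putIfAbsent(FALLBACK_KEY, "true");
        return props;
    }
}
```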
before
mysql [test]>select cast(1 as DECIMALV3(16, 2)) / cast(3 as DECIMALV3(16, 2));
+-----------------------------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / CAST(3 AS DECIMALV3(16, 2)) |
+-----------------------------------------------------------+
| 0.00 |
+-----------------------------------------------------------+
mysql [test]>select * from divtest;
+------+------+
| id   | val  |
+------+------+
|    3 | 5.00 |
|    2 | 4.00 |
|    1 | 3.00 |
+------+------+
mysql [test]>select cast(1 as decimalv3(16,2)) / val from divtest;
+-------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / `val` |
+-------------------------------------+
| 0 |
| 0 |
| 0 |
+-------------------------------------+
after
mysql [test]>select cast(1 as DECIMALV3(16, 2)) / cast(3 as DECIMALV3(16, 2));
+-----------------------------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / CAST(3 AS DECIMALV3(16, 2)) |
+-----------------------------------------------------------+
| 0.33 |
+-----------------------------------------------------------+
mysql [test]>select cast(1 as decimalv3(16,2)) / val from divtest;
+-------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / `val` |
+-------------------------------------+
| 0.250000 |
| 0.200000 |
| 0.333333 |
+-------------------------------------+
This is because in the previous code, the constant 1.000 would be transformed into 1.
remove "ReduceType
* [Improve](performance) introduce SchemaCache to cache TabletSchema & Schema
1. When the system is under high-concurrency load with wide-table point queries, the frequent memory allocation and deallocation of Schema becomes an evident system bottleneck. Additionally, the initialization of TabletSchema and Schema also becomes a CPU hotspot. Therefore, a SchemaCache is introduced to cache these objects for reuse (a minimal sketch follows this list).
2. Wrap some variables in `std::unique_ptr`
Performance:
| Setting              | QPS | Avg latency | P99 latency |
|----------------------|-----|-------------|-------------|
| SchemaCache enabled  | 501 | 20 ms       | 34 ms       |
| SchemaCache disabled | 321 | 31 ms       | 61 ms       |
* handle schema change with schema version
* remove useless header
* rebase
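The real SchemaCache lives in the C++ BE; the following Java sketch only illustrates the caching idea, and the `(tablet id, schema version)` key shape is an assumption for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

public class SchemaCacheSketch {
    static class Schema {} // stand-in for the real schema object

    private final ConcurrentHashMap<String, Schema> cache = new ConcurrentHashMap<>();

    // Reuse one schema object per (tablet id, schema version) instead of rebuilding it
    // for every point query; a bumped schema version naturally produces a new entry.
    Schema getOrCreate(long tabletId, int schemaVersion) {
        String key = tabletId + "-" + schemaVersion;
        return cache.computeIfAbsent(key, k -> buildSchema(tabletId, schemaVersion));
    }

    private Schema buildSchema(long tabletId, int schemaVersion) {
        // the expensive construction the cache is meant to avoid repeating
        return new Schema();
    }
}
```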
1. Delete the stats for data size, since collecting it costs too much time while being of little use
2. Make the task timeout configurable, since it is common to analyze a very large table for which the default 10 min is not suitable
This PR fixes the bug introduced by PR #19909.
The bug failed to set the column unique id for Iceberg tables, which caused the query results for Iceberg tables to be all NULL.
```
mysql> select * from iceberg_partition_lower_case_parquet limit 1;
+------+------+------+---------+
| k1   | k2   | k3   | city    |
+------+------+------+---------+
| NULL | NULL | NULL | Beijing |
+------+------+------+---------+
1 row in set (0.60 sec)
```
After fix:
```
mysql> select * from iceberg_partition_lower_case_parquet limit 1;
+------+------+------+---------+
| k1   | k2   | k3   | city    |
+------+------+------+---------+
|    1 | k2_1 | k3_1 | Beijing |
+------+------+------+---------+
1 row in set (0.35 sec)
```
When an OLAP table has dynamic partition enabled and the table is dropped and then recovered, it should be added to the DynamicPartitionScheduler again.
---------
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity.
By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed.
This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.
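A minimal sketch of the design, using generic Java types rather than the BE's ExecNode code: conditions live in a flat list of conjuncts, and a runtime filter is simply appended to that list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class ConjunctListSketch<Row> {
    private final List<Predicate<Row>> conjuncts = new ArrayList<>();

    // Planner-provided condition: just another element, no merged expression tree.
    void addConjunct(Predicate<Row> condition) {
        conjuncts.add(condition);
    }

    // Runtime filter arriving later: appended individually, no merge step required.
    void addRuntimeFilter(Predicate<Row> filter) {
        conjuncts.add(filter);
    }

    // A row passes only if every conjunct accepts it.
    boolean accept(Row row) {
        for (Predicate<Row> p : conjuncts) {
            if (!p.test(row)) {
                return false;
            }
        }
        return true;
    }
}
```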
Extend the functionality of advanced materialized views.
This feature is already supported by the legacy planner via PR #19650.
This PR implements it in Nereids, covering the features below:
1. Support multiple columns in an aggregate function, e.g. select sum(c1 + c2) from t1;
2. Support complex expressions, e.g. select abs(c1), sum(abc(c1+1) + 1) from t1;
TODO:
1. Support a WHERE clause in materialized views
* [Bug](point query) checkAndSetPointQuery before checkEnableTwoPhaseRead
1. `checkEnableTwoPhaseRead` relies on the short circuit flag
2. Add more metrics to display the lookup profile
* fix rebase
We cannot use a quoted string as the variable key in a hint.
Before this PR
mysql> SET enable_nereids_planner=true;
Query OK, 0 rows affected (0.01 sec)
mysql> set enable_fallback_to_original_planner=false;
Query OK, 0 rows affected (0.10 sec)
mysql> explain select /*+ SET_var("enable_nereids_planner" = "false") */ 1;
ERROR 1105 (HY000): Exception, msg: Nereids cannot parse the SQL, and fallback disabled. caused by:
no viable alternative at input 'select /*+ SET_var("enable_nereids_planner"'(line 1, pos 27)
After this PR
mysql> SET enable_nereids_planner=true;
Query OK, 0 rows affected (0.01 sec)
mysql> set enable_fallback_to_original_planner=false;
Query OK, 0 rows affected (0.10 sec)
mysql> select /*+ SET_var("enable_nereids_planner" = "false") */ 1;
+------+
| 1    |
+------+
|    1 |
+------+
1 row in set (0.00 sec)
Support quoted strings as hint keys in the parser.
Before this change, when optimizing with the Nereids planner, the input was always serialized to memory first, and when a bug happened it was dumped to a minidump file while catching the exception.
We found that this serialization hurts performance when the statistics message is too large or when the optimization time is short.
So minidump is now only used when the user explicitly enables the switch (`set enable_minidump=true;`).
1. Before this PR, if a rowset did not contain a column that should be read for the related SlotDescriptor, `insert_default` was called on the column, but that is not the real default value. Information about the real default value should be provided by the frontend side.
2. Support fetch when light schema change is not enabled, but disable it for the AGG and UNIQUE MoR models
When the PostgreSQL bit type has size 1, it is read as a `java.lang.Boolean` via JDBC; if we map it to a string, it is displayed as true or false. But the expected display is a number, so when the bit size is detected to be 1, it is mapped to boolean instead.
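A minimal sketch of the mapping, assuming a hypothetical helper rather than the actual JDBC catalog code.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

public class PgBitMappingSketch {
    // For bit(1) the driver hands back java.lang.Boolean, so convert it to a number
    // instead of letting it print as "true"/"false"; wider bit columns are left as strings here.
    static Object readBitColumn(ResultSet rs, int columnIndex, int bitSize) throws SQLException {
        if (bitSize == 1) {
            boolean value = rs.getBoolean(columnIndex);
            return rs.wasNull() ? null : (value ? (byte) 1 : (byte) 0);
        }
        return rs.getString(columnIndex);
    }
}
```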