Support user-defined variables.
After this PR, we can use `set @a = xx` to define a user variable and use it in queries such as `select @a`.
The changes in this PR:
1. Support the grammar for `set user variable` in the parser.
2. Add `userVars` in `VariableMgr` to store the user-defined variables.
3. For `set @a = xx`, store the variable name and its value in `userVars` in `VariableMgr`.
4. For `select @a`, look up the value for the variable name in `userVars`.
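A minimal sketch of the `userVars` idea described above, assuming string-valued variables for simplicity; the class and method names are illustrative, not the actual `VariableMgr` code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the userVars idea (names are illustrative, not the real Doris code).
public class VariableMgrSketch {
    // User-defined variables, keyed by name; @a and @A refer to the same variable.
    private static final Map<String, String> userVars = new ConcurrentHashMap<>();

    // Called for `set @a = xx`: remember the value under the variable name.
    public static void setUserVar(String name, String value) {
        userVars.put(name.toLowerCase(), value);
    }

    // Called for `select @a`: look the value up; unknown variables resolve to null (NULL).
    public static String getUserVar(String name) {
        return userVars.getOrDefault(name.toLowerCase(), null);
    }

    public static void main(String[] args) {
        setUserVar("a", "xx");
        System.out.println(getUserVar("a")); // xx
        System.out.println(getUserVar("b")); // null
    }
}
```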
* [fix](load) in strict mode, return error for load and insert if data type conversion fails
Revert "[fix](MySQL) the way Doris handles boolean type is consistent with MySQL (#19416)"
This reverts commit 68eb420cabe5b26b09d6d4a2724ae12699bdee87.
Since it changed other behaviors: e.g., in strict mode, `insert into t_int values ("a")`
would insert 0 into the table, but it should return an error instead.
* fix be ut
* fix regression tests
1. Update the signature order of all date-related functions.
1.1. If the return value needs to be computed with time info, put the datetimev2 arguments at the top of the list, followed by datev2, datetime, and date.
1.2. If the return value needs to be computed with only date info, put the datev2 arguments at the top of the list, followed by datetimev2, date, and datetime.
2. Prefer datev2 if we must cast date to datev2 or datetime/datetimev2.
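As a rough illustration of these two orderings, here is a sketch that sorts candidate argument types by the stated priority; the enum and sorting code are simplified stand-ins, not the actual signature-matching logic:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch of the signature ordering rule; the enum and sorting are illustrative only.
public class DateSignatureOrder {
    enum DateType { DATETIMEV2, DATEV2, DATETIME, DATE }

    // Order used when the return value depends on time info:
    // datetimev2 first, then datev2, datetime, date.
    static final List<DateType> TIME_AWARE_ORDER =
            Arrays.asList(DateType.DATETIMEV2, DateType.DATEV2, DateType.DATETIME, DateType.DATE);

    // Order used when the return value depends on date info only:
    // datev2 first, then datetimev2, date, datetime.
    static final List<DateType> DATE_ONLY_ORDER =
            Arrays.asList(DateType.DATEV2, DateType.DATETIMEV2, DateType.DATE, DateType.DATETIME);

    // Sort candidate signatures so that earlier entries are preferred during matching.
    static List<DateType> sortCandidates(List<DateType> candidates, List<DateType> order) {
        candidates.sort(Comparator.comparingInt(order::indexOf));
        return candidates;
    }

    public static void main(String[] args) {
        List<DateType> candidates = new ArrayList<>(
                Arrays.asList(DateType.DATE, DateType.DATETIMEV2, DateType.DATEV2, DateType.DATETIME));
        System.out.println(sortCandidates(candidates, TIME_AWARE_ORDER));
        // [DATETIMEV2, DATEV2, DATETIME, DATE]
    }
}
```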
This commit supports a function that returns a field column from a named struct column.
Since the function can return any type, this commit also supports ANY_STRUCT_TYPE
and ANY_ELEMENT_TYPE.
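A minimal sketch of the return-type resolution idea behind ANY_STRUCT_TYPE/ANY_ELEMENT_TYPE, using invented stand-in types rather than the real Doris type system: given a named struct and a field name, the result type is that field's type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: given a named struct type and a field name, resolve the field's type.
// ANY_STRUCT_TYPE / ANY_ELEMENT_TYPE in the real signature are stand-ins that get
// bound to the concrete struct and field type at analysis time.
public class StructFieldSketch {
    // Simplified struct type: field name -> field type name.
    static class StructType {
        final Map<String, String> fields = new LinkedHashMap<>();
        StructType field(String name, String type) { fields.put(name, type); return this; }
    }

    // Resolve the return type of a struct field access: the type of the named field.
    static String resolveFieldType(StructType struct, String fieldName) {
        String type = struct.fields.get(fieldName);
        if (type == null) {
            throw new IllegalArgumentException("no such field: " + fieldName);
        }
        return type;
    }

    public static void main(String[] args) {
        StructType s = new StructType().field("id", "INT").field("name", "STRING");
        System.out.println(resolveFieldType(s, "name")); // STRING
    }
}
```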
There should be 2 kinds of ScanNode:
OlapScanNode
ExternalScanNode
The Backends used for ExternalScanNode should be controlled by FederationBackendPolicy.
But currently, only FileScanNode is controlled by FederationBackendPolicy; other scan nodes such as MysqlScanNode and
JdbcScanNode will use Mix Backends even if we enable and prefer to use Compute Backends.
In this PR, I modified the hierarchy of ExternalScanNode. The new hierarchy is:
```
ScanNode
    OlapScanNode
    SchemaScanNode
    ExternalScanNode
        MetadataScanNode
        DataGenScanNode
        EsScanNode
        OdbcScanNode
        MysqlScanNode
        JdbcScanNode
        FileScanNode
            FileLoadScanNode
            FileQueryScanNode
                MaxComputeScanNode
                IcebergScanNode
                TVFScanNode
                HiveScanNode
                    HudiScanNode
```
Previously, the BackendPolicy was a member of FileScanNode; now I moved it to ExternalScanNode,
so that all subtypes of ExternalScanNode can use the BackendPolicy to choose Compute Backends to execute the query.
All ExternalScanNodes should implement the abstract method createScanRangeLocations().
For scan nodes like JdbcScanNode/MysqlScanNode, the scan range locations will be selected randomly from
compute nodes (if preferred).
As for compute node selection: if all scan nodes are external scan nodes and prefer_compute_node_for_external_table
is set to true, only compute nodes will be selected as BEs for this query.
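A rough sketch of the resulting shape; every class, field, and method body below is a simplified placeholder, not the real Doris code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the refactored hierarchy: the backend policy lives in the external base
// class, and every external scan node builds its own scan range locations.
abstract class ScanNodeSketch {
    abstract void createScanRangeLocations();
}

abstract class ExternalScanNodeSketch extends ScanNodeSketch {
    // Moved up from FileScanNode so every external scan node can pick compute backends.
    protected final FederationBackendPolicySketch backendPolicy = new FederationBackendPolicySketch();
}

class JdbcScanNodeSketch extends ExternalScanNodeSketch {
    @Override
    void createScanRangeLocations() {
        // For JDBC/MySQL-style scans, pick one backend (compute node if preferred) at random.
        String be = backendPolicy.selectRandomBackend();
        System.out.println("jdbc scan assigned to " + be);
    }
}

class FederationBackendPolicySketch {
    private final List<String> backends = new ArrayList<>(List.of("cn-1", "cn-2", "be-1"));

    String selectRandomBackend() {
        return backends.get((int) (Math.random() * backends.size()));
    }
}

public class ScanHierarchyDemo {
    public static void main(String[] args) {
        new JdbcScanNodeSketch().createScanRangeLocations();
    }
}
```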
1. Derive analytic node stats; add support for rank().
2. Update filter estimation stats derivation; update the row count of the filter column.
3. Use ColumnStatistics.orginal to replace ColumnStatistics.orginalNdv, where ColumnStatistics.orginal is the column statistics obtained from TableScan.
TPC-DS query 70 on tpcds_sf100 improved from 23 sec to 2 sec.
This PR has no performance downgrade on other TPC-DS and TPC-H queries.
This PR does the following:
1. This PR is a substantial refactor of the JDBC client architecture. The previous monolithic JDBC client has been split into an abstract base class `JdbcClient` and a set of database-specific subclasses (e.g., `JdbcMySQLClient`, `JdbcOracleClient`, etc.), and the configuration required by `JdbcClient` has been abstracted into its own object. This allows for improved modularity, easier addition of support for new databases, and cleaner, more maintainable code. The change is backward-compatible and does not affect existing functionality. A rough sketch of the new shape follows this description.
2. As a result of the client refactoring, OceanBaseClient can automatically recognize whether it is running in MySQL or Oracle mode, so we removed the oceanbase_mode property from the JDBC Catalog. Because of this removal, when creating a single OceanBase JDBC table, the table type needs to be filled in as oceanbase (MySQL mode) or oceanbase_oracle (Oracle mode). Please note that this is a change in usage behavior.
3. For the PostgreSQL JDBC Catalog, I did two things:
   1. Added adaptation for MATERIALIZED VIEW and FOREIGN TABLE.
   2. Fixed reading jsonb, which had been incorrectly changed to json in a previous PR.
4. Fix some JDBC catalog test cases.
5. Update the OceanBase JDBC doc.
And thanks @wolfboys for the guidance.
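A minimal sketch of the refactored client shape, assuming a simple URL-prefix dispatch and illustrative per-database queries; none of these names or bodies reflect the actual implementation:

```java
// Sketch: an abstract base class, per-database subclasses, and the required config
// pulled into its own object. Illustrative only.
abstract class JdbcClientSketch {
    protected final JdbcClientConfigSketch config;

    protected JdbcClientSketch(JdbcClientConfigSketch config) {
        this.config = config;
    }

    // Database-specific behavior lives in the subclasses.
    abstract String listDatabasesSql();

    // Factory: choose the subclass from the config (here, by JDBC URL prefix).
    static JdbcClientSketch create(JdbcClientConfigSketch config) {
        if (config.jdbcUrl.startsWith("jdbc:mysql")) {
            return new JdbcMySQLClientSketch(config);
        } else if (config.jdbcUrl.startsWith("jdbc:oracle")) {
            return new JdbcOracleClientSketch(config);
        }
        throw new IllegalArgumentException("unsupported jdbc url: " + config.jdbcUrl);
    }
}

class JdbcMySQLClientSketch extends JdbcClientSketch {
    JdbcMySQLClientSketch(JdbcClientConfigSketch config) { super(config); }
    @Override String listDatabasesSql() { return "SHOW DATABASES"; }
}

class JdbcOracleClientSketch extends JdbcClientSketch {
    JdbcOracleClientSketch(JdbcClientConfigSketch config) { super(config); }
    @Override String listDatabasesSql() { return "SELECT USERNAME FROM ALL_USERS"; }
}

// The "required config abstracted into an object" part of the description.
class JdbcClientConfigSketch {
    final String jdbcUrl;
    final String user;
    JdbcClientConfigSketch(String jdbcUrl, String user) {
        this.jdbcUrl = jdbcUrl;
        this.user = user;
    }
}

public class JdbcClientDemo {
    public static void main(String[] args) {
        JdbcClientSketch client =
                JdbcClientSketch.create(new JdbcClientConfigSketch("jdbc:mysql://127.0.0.1:3306/db", "root"));
        System.out.println(client.listDatabasesSql()); // SHOW DATABASES
    }
}
```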
The Nereids planner includes all column indexes in TFileScanRangeParams, which may cause incorrect column projection for
text format tables, because the csv reader uses the column index position to split a line, and extra column indexes lead to
wrong split results. This PR resets the column indexes after Projection and removes the useless ones.
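A tiny sketch of the idea, with invented parameter shapes rather than the real planner structures: after Projection, keep only the column positions that the projection actually reads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of the fix's idea: drop column positions the projection does not need, so the
// csv reader splits lines against the right columns. Not the actual planner code.
public class ColumnIndexReset {
    static List<Integer> resetColumnIdxs(List<Integer> allColumnIdxs, Set<Integer> requiredByProjection) {
        List<Integer> kept = new ArrayList<>();
        for (Integer idx : allColumnIdxs) {
            if (requiredByProjection.contains(idx)) {
                kept.add(idx);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Table has 5 columns but the projection only needs columns 0 and 3.
        System.out.println(resetColumnIdxs(List.of(0, 1, 2, 3, 4), Set.of(0, 3))); // [0, 3]
    }
}
```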
fix 3 bugs:
1. failed to insert into a table with mv.
```sql
create table t (
id int,
c1 int,
c2 int,
c3 int
) duplicate key(id)
distributed by hash(id) buckets 4;
create materialized view k12s3m as select id, sum(c1), max(c3) from t group by id;
insert into t select -4, -4, -4, 'd';
```
insert will raise an exception because the mv column is not handled. Now we add a target column and value as defineExpr.
2. failed to insert into a table when not all columns are specified.
```sql
insert into t(c1, c2) select c1, c2 from t
```
where t has columns (id ukey, c1, c2, c3); this inserted too much data. We fix it by changing the output partitions.
3. failed to insert into a table with a complex select.
When the select statement has a join or agg, we fix the bug in a way similar to the 2nd bug.
1. make ColumnObject exception safe
2. introduce FlushContext and construct schema at memtable flush stage to make segment independent from dynamic schema
3. add more test cases
When using Nereids, if a comparison operator is applied to a bitmap type, an analysis exception needs to be thrown,
like:
select id from (select BITMAP_EMPTY() as c0 from expr_test) as ref0 where c0 = 1 order by id
where c0 in the subquery is a bitmap type; this scenario is not supported right now.
The test query includes the conversion of string types to other types, and the processing of materialized columns for nested subqueries; it is the regression test for the bug fix (#18783).
Currently, compaction is executed separately for each backend, and the reconstruction of the index during compaction leads to high CPU usage. To address this, we are introducing single replica compaction, where a specific primary replica is selected to perform compaction, and the remaining replicas fetch the compaction results from the primary replica.
The Backend (BE) requests replica information for all peers corresponding to a tablet from the Frontend (FE). This information includes the host where the replica is located and the replica_id. By calculating hash(replica_id), the replica with the smallest hash value is responsible for executing compaction, while the remaining replicas are responsible for fetching the compaction results from this replica.
The compaction task producer thread, before submitting a compaction task, checks whether the local replica should fetch from its peer. If it should, the task is then submitted to the single replica compaction thread pool.
When performing single replica compaction, the process begins by requesting rowset versions from the target replica. These rowset_versions are then compared with the local rowset versions. The first version that can be fetched is selected.
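A minimal sketch of the selection rule, assuming a simplified hash function; the real BE code computes the hash and exchanges replica info differently:

```java
import java.util.List;

// Sketch of the replica-selection rule: the replica whose hashed id is smallest performs
// compaction, and all other replicas fetch the result from it. The hash and data shapes
// are simplified stand-ins, not the BE implementation.
public class SingleReplicaCompactionPick {
    // Returns the replica_id that should execute compaction for this tablet.
    static long pickCompactionReplica(List<Long> peerReplicaIds) {
        long chosen = -1;
        long minHash = Long.MAX_VALUE;
        for (long replicaId : peerReplicaIds) {
            long h = Long.hashCode(replicaId * 0x9E3779B97F4A7C15L); // illustrative hash
            if (h < minHash) {
                minHash = h;
                chosen = replicaId;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Long> peers = List.of(10001L, 10002L, 10003L);
        long executor = pickCompactionReplica(peers);
        System.out.println("replica " + executor + " compacts; the others fetch from it");
    }
}
```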
## Problem summary
When we want to push a filter through a union, we should check whether the union's children are `OneRowRelation` or not. If some children are `OneRowRelation`, we shouldn't push the filter down to that part (see the sketch after the examples below).
Before this PR
```
mysql> select * from (select 1 as a, 2 as b union all select 3, 3) t where a = 1;
+------+------+
| a | b |
+------+------+
| 1 | 2 |
| 3 | 3 |
+------+------+
2 rows in set (0.01 sec)
```
After this PR
```
mysql> select * from (select 1 as a, 2 as b union all select 3, 3) t where a = 1;
+------+------+
| a | b |
+------+------+
| 1 | 2 |
+------+------+
1 row in set (0.38 sec)
```
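A minimal sketch of that guard, using placeholder plan node types rather than the actual Nereids classes:

```java
import java.util.List;

// Sketch of the guard described above: before pushing a filter through a union, make sure
// none of the union's children is a OneRowRelation.
public class PushFilterThroughUnionGuard {
    interface Plan {}
    record OneRowRelation(List<Object> constants) implements Plan {}
    record Scan(String table) implements Plan {}
    record Union(List<Plan> children) implements Plan {}

    // The push-down only applies when every child can actually absorb the predicate.
    static boolean canPushFilterThroughUnion(Union union) {
        return union.children().stream().noneMatch(c -> c instanceof OneRowRelation);
    }

    public static void main(String[] args) {
        Union mixed = new Union(List.of(new OneRowRelation(List.of(1, 2)), new Scan("t")));
        Union scansOnly = new Union(List.of(new Scan("t1"), new Scan("t2")));
        System.out.println(canPushFilterThroughUnion(mixed));     // false
        System.out.println(canPushFilterThroughUnion(scansOnly)); // true
    }
}
```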
1. Fix the replay bug for creating a catalog with resource.
If a user creates a catalog using `create catalog hive with resource xxx`, there is a bug when replaying the edit log:
the resource may have been dropped, causing an NPE and the FE failing to start.
In this PR, I add a new FE config `disallow_create_catalog_with_resource`, default true,
so that `with resource` will not be allowed, and it will be deprecated later.
The replay bug is also fixed to avoid the NPE.
2. Fix an issue when creating 2 hive catalogs, one with and one without Kerberos authentication.
When a user creates 2 hive catalogs, one using simple auth and the other using Kerberos auth,
a query may fail with an error like: `Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.`
So I add a default property for hive catalogs: `"ipc.client.fallback-to-simple-auth-allowed" = "true"`.
This property is added automatically when a user creates a hive catalog, to avoid this problem (a sketch of the defaulting follows this list).
3. Fix an issue with calling `hdfsExists()`.
When `hdfsExists()` returns a non-zero code, we should check whether it hit a real error or the file simply does not exist.
4. Some code refactoring.
Avoid importing `org.apache.parquet.Strings`.
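A minimal sketch of the property defaulting from item 2, with invented class and method names; the real catalog code wires this in elsewhere:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the default-property behavior: when a hive catalog is created without this
// key, fill it in so the client may fall back to SIMPLE auth. Illustrative only.
public class HiveCatalogDefaults {
    private static final String FALLBACK_KEY = "ipc.client.fallback-to-simple-auth-allowed";

    static Map<String, String> withDefaults(Map<String, String> userProps) {
        Map<String, String> props = new HashMap<>(userProps);
        // Only add the default when the user has not set it explicitly.
        props.putIfAbsent(FALLBACK_KEY, "true");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(withDefaults(Map.of("hive.metastore.uris", "thrift://127.0.0.1:9083")));
    }
}
```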