when a query (#3492) contain “2 DistinctAggregation with one column” and “1
DistinctAggregation with two columns”, it will produce wrong result.
This pull request is not to solve this problem really, but to generate exceptions to avoid
getting wrong results.
This problem needs a real repair in future.
**1. phenomenon:**
The following two statements are the same, but a query has results and the other query has no results
mysql> select * from (select '积极' as kk1, sum(k2) from table_range where k1 = '2013-01-01' group by kk1)tt where kk1 = '积极';
+--------+-----------+
| kk1 | sum(`k2`) |
+--------+-----------+
| 积极 | 1 |
+--------+-----------+
1 row in set (0.01 sec)
mysql> select * from (select '积极' as kk1, sum(k2) from table_range where k1 = '2013-01-01' group by kk1)tt where kk1 in ('积极');
Empty set (0.01 sec)
**2. reason:**
In partition prune, constant in predicate(‘积极’ in ‘积极’) is mistakenly considered to meet partition prune conditions, and mistakenly regarded as partition prune column. Then in partition prune , no partition is considered to meet the requirements, so it is planned to be 0 partition in query planning
This PR is the first step to make Doris stream load more robust with higher concurrent
performance(#3368),the main work is to support txn management in db level isolation
and use ArrayDeque to stored final status txns.
For SQL like:
```
select * from
join1 left semi join join2
on join1.id = join2.id and join2.id > 1;
```
the predicate `join2.id > 1` can not be pushed down to table join1.
When there is subquery in where clause, the query will be rewritten to join operation.
And some auxiliary binary predicates will be generated. These binary predicates
will not go through the ExprRewriteRule, so they are not normalized as
"column to the left and constant to the right" format.
We need to take this case into account so that the `canPushDownPredicate()` judgement
will not throw exception.
Current implement is very simply and conservative, because our query planner is error-prone.
After we implement the new query planner, we could do this work by `Predicate Equivalence Class` and `PredicatePushDown` rule like presto.
* [Fix] Fix bug that rowset meta is deleted after compaction
After compaction, the tablet rowset meta will be modified by
adding to new output rowsets and deleting the old input rowsets.
The output version may equals to the input version.
So we should delete the "input" version from _rs_version_map
before adding the "output" version to _rs_version_map. Otherwise,
the new "output" version will be lost in _rs_version_map.
1. The correlated slot ref should be bound by the agg tuple of outer query.
However, the correlated having clause can not be analyzed correctly so the result is incorrect.
For example:
```
SELECT k1 FROM test GROUP BY k1
HAVING EXISTS(SELECT k1 FROM baseall GROUP BY k1 HAVING SUM(test.k1) = k1);
```
The correlated predicate is not executed.
2. The limit offset should also be rewritten when there is subquery in having clause.
For example:
```
select k1, count(*) cnt from test group by k1 having k1 in
(select k1 from baseall order by k1 limit 2) order by k1 limit 5 offset 3;
```
The new stmt should has a limit element with offset.
This CL add new command to set replication number of table in one time.
```
alter table test_tbl set ("replication_num" = "3");
```
It changes replication num of a unpartitioned table.
and
```
alter table test_tbl set ("default.replication_num" = "3");
```
It changes default replication num of the specified table.
This CL mainly solve the problem that when recycle `OlapTableSink`
object, GC thread will not do it immediately because the class override
the `finalize` method, and it will cause OOM.
This CL mainly made the following modifications:
1. Reorganized SegmentV2 upgrade document.
2. When the variable `use_v2_rollup` is set to true, the base rollup in v2 format is forcibly queried for verifying the data.
3. Fix a problem that there is no persistent storage format information in the schema change operation that performs v2 conversion.
4. Allow users to directly create v2 format tables.
when add a parition with storage_cooldown_time property like this:
alter table tablexxx ADD PARTITION p20200421 VALUES LESS THAN("1588262400") ("storage_medium" = "SSD", "storage_cooldown_time" = "2020-05-01 00:00:00");
and show partitions from tablexxx;
the CooldownTime is wrong: 2610-02-17 10:16:40, and what is more, the storage migration is based on the wrong timestamp.
The reason is that the result of DateLiteral.getLongValue is not timestamp.
This CL solve problem:
- FE can't aware Coordinate BE down and cancel the txns because the txns can't finish.
- Do some code style refactor
NOTICE: FE meta version upgrade to 83
Our current DELETE strategy reuses the LoadChecker framework.
LoadChecker runs jobs in different stages by polling them in every 5 seconds.
There are four stages of a load job, Pending/ETL/Loading/Quorum_finish,
each of them is allocated to a LoadChecker. Four example, if a load job is submitted,
it will be initialized to the Pending state, then wait for running by the Pending LoadChecker.
After the pending job is ran, its stage will change to ETL stage, and then wait for
running by the next LoadChecker(ETL). Because interval time of the LoadChecker is 5s,
in worst case, a pending job need to wait for 20s during its life cycle.
In particular, the DELETE jobs do not need to wait for polling, they can run the pushTask()
function directly to delete. In this commit, I add a delete handler to concurrently
processing delete tasks.
All delete tasks will push to BE immediately, not required to wait for LoadCheker,
without waiting for 2 LoadChecker(delete job started in LOADING state),
at most 10s will be save(5s per LoadCheker). The delete process now is synchronized
and users get response only after the delete finished or be canceled.
If a delete is running over a certain period of time, it will be cancelled with a timeout exception.
NOTICE: this CL upgrade FE meta version to 82
I don't why, but I found that sometimes when I use "==" to judge the equality of type,
it return false, even if the types are exactly same.
ISSUE: #3309
This CL only changes == to equals() to solve the problem, but the reason is still unknown.
This PR is to limit the replica usage, admin need to know the replica usage for every db and
table, be able to set replica quota for every db.
```
ALTER DATABASE db_name SET REPLICA QUOTA quota;
```
Now, column with REPLACE/REPLACE_IF_NOT_NULL can be filtered by ZoneMap/BloomFilter
when the rowset is base(version starts with zero). Always we think is an optimization.
But when some case, it will occurs bug.
create table test(
k1 int,
v1 int replace,
v2 int sum
);
If I have two records on different two versions
1 2 2 on version [0-10]
1 3 1 on version 11
If I perform a query
select * from test where k1 = 1 and v1 = 3;
The result will be 1 3 1, this is not right because of the first record is filtered.
The right answer is 1 3 3, the v2 should be summed.
Remove this optimization is necessity to make the result is right.
1. Avoid losing plugin if plugin failed to load when replaying
When in replay process, the plugin should always be added to the plugin manager,
even if that plugin failed to be loaded.
2. `show plugin` statement should show all plugins, not only the successfully installed plugins.
3. plugin's name should be unique globally and case insensitive.
4. Avoid creating new instances of plugins when doing metadata checkpoint.
5. Add a __builtin_ prefix for builtin plugins.
When the result type is a date type, the result expr type should not be cast.
Because in the FE function, the specific type of the date type is determined by the actual
type of the return value, not by the function return value type.
For example, the function `str_to_date` may return DATE or DATETIME, depends on the
format pattern.
DATE:
```
mysql> select str_to_date('11/09/2011', '%m/%d/%Y');
+---------------------------------------+
| str_to_date('11/09/2011', '%m/%d/%Y') |
+---------------------------------------+
| 2011-11-09 |
+---------------------------------------+
```
DATETIME:
```
mysql> select str_to_date('2014-12-21 12:34:56', '%Y-%m-%d %H:%i:%s');
+---------------------------------------------------------+
| str_to_date('2014-12-21 12:34:56', '%Y-%m-%d %H:%i:%s') |
+---------------------------------------------------------+
| 2014-12-21 12:34:56 |
+---------------------------------------------------------+
Queries like below cannot finish in a acceptable time, `store_sales` has 2800w rows, `customer_address` has 5w rows, for now Doris will create only one cross join node to execute this sql,
the time of eval the where clause is about 200-300 ns, the total count of eval will be 2800w * 5w, this is extremely large, and this will cost 2800w * 5w * 250 ns = 4 billion seconds;
```
select avg(ss_quantity)
,avg(ss_ext_sales_price)
,avg(ss_ext_wholesale_cost)
,sum(ss_ext_wholesale_cost)
from store_sales, customer_address
where ((ss_addr_sk = ca_address_sk
and ca_country = 'United States'
and ca_state in ('CO', 'IL', 'MN')
and ss_net_profit between 100 and 200
) or
(ss_addr_sk = ca_address_sk
and ca_country = 'United States'
and ca_state in ('OH', 'MT', 'NM')
and ss_net_profit between 150 and 300
) or
(ss_addr_sk = ca_address_sk
and ca_country = 'United States'
and ca_state in ('TX', 'MO', 'MI')
and ss_net_profit between 50 and 250
))
```
but this sql can be rewrite to
```
select avg(ss_quantity)
,avg(ss_ext_sales_price)
,avg(ss_ext_wholesale_cost)
,sum(ss_ext_wholesale_cost)
from store_sales, customer_address
where ss_addr_sk = ca_address_sk
and ca_country = 'United States' and (((ca_state in ('CO', 'IL', 'MN')
and ss_net_profit between 100 and 200
) or
(ca_state in ('OH', 'MT', 'NM')
and ss_net_profit between 150 and 300
) or
(ca_state in ('TX', 'MO', 'MI')
and ss_net_profit between 50 and 250
))
)
```
there for we can do a hash join first and then use
```
(((ca_state in ('CO', 'IL', 'MN')
and ss_net_profit between 100 and 200
) or
(ca_state in ('OH', 'MT', 'NM')
and ss_net_profit between 150 and 300
) or
(ca_state in ('TX', 'MO', 'MI')
and ss_net_profit between 50 and 250
))
)
```
to filter the value,
in TPCDS 10g dataset, the rewritten sql only cost about 1 seconds.
Currently we have implemented the plugin framework in FE.
This CL make the original audit log logic pluggable.
The following classes are mainly implemented:
1. AuditPlugin
The interface of audit plugin
2. AuditEvent
An AuditEvent contains all information about an audit event, such as a query, or a connection.
3. AuditEventProcessor
Audit event processor receive all audit events and deliver them to all installed audit plugins.
This CL implements two audit module plugins:
1. The builtin plugin `AuditLogBuilder`, which act same as the previous logic, to save the
audit log to the `fe.audit.log`
2. An optional plugin `AuditLoader`, which will periodically inserts the audit log into a Doris table
specified by the user. In this way, users can conveniently use SQL to query and analyze this
audit log table.
Some documents are added:
1. HELP docs of install/uninstall/show plugin.
2. Rename the `README.md` in `fe_plugins/` dir to `plugin-development-manual.md` and move
it to the `docs/` dir
3. `audit-plugin.md` to introduce the usage of `AuditLoader` plugin.
ISSUE: #3226
SchemaChangeJobV2 will use too much memory in FE, which may cause FullGC. But these data is useless after job is done, so we need to clean it up.
NOTICE: update FE meta version to 80
After doris support aggregation materialized view on duplicate table,
desc stmt of metadata is confused in sometimes. The reason is that
there is no grouping information in desc stmt of metadata.
For example:
There are two materialized view as following.
1. create materialized view k1_k2 as select k1, k2 from table;
2. create materialzied view deduplicated_k1_k2 as select k1, k2 from table group by k1, k2;
Before this commit, the metatdata in desc stmt is the same.
```
+-----------------------+-------+----------+------+-------+---------+-------+
| IndexName | Field | Type | Null | Key | Default | Extra |
+-----------------------+-------+----------+------+-------+---------+-------+
| k1_k2 | k1 | TINYINT | Yes | true | N/A | |
| | k2 | SMALLINT | Yes | true | N/A | |
| deduplicated_k1_k2 | k1 | TINYINT | Yes | true | N/A | |
| | k2 | SMALLINT | Yes | true | N/A | |
+-----------------------+-------+----------+------+-------+---------+-------+
```
So, we need to show the KeysType of materialized view in desc stmt.
Now, the desc stmt of all mvs is changed as following:
```
+-----------------------+---------------+-------+----------+------+-------+---------+-------+
| IndexName | IndexKeysType | Field | Type | Null | Key | Default | Extra |
+-----------------------+---------------+-------+----------+------+-------+---------+-------+
| k1_k2 | DUP_KEYS | k1 | TINYINT | Yes | true | N/A | |
| | | k2 | SMALLINT | Yes | true | N/A | |
| deduplicated_k1_k2 | AGG_KEYS | k1 | TINYINT | Yes | true | N/A | |
| | | k2 | SMALLINT | Yes | true | N/A | |
+-----------------------+---------------+-------+----------+------+-------+---------+-------+
```
NOTICE: this modify the the column of `desc` stmt.
The bug is described in issue: #3200.
This CL solve the problem by:
1. Refactor the alter operation conflict checking logic by introducing new classes `AlterOperations` and `AlterOpType`.
2. Allow add/drop temporary partition when dynamic partition feature is enabled.
3. Allow modifying table's property when there is temporary partition in table.
4. Make the properties `dynamic_partition.enable` optional, and default is true.
Doris support choose medium when create table, and the cluster balance strategy is dependent
between different storage medium, and most use will not specify the storage medium when create table,
even they kown that they should choose a storage medium, they have no idea about the
cluster's storage medium, so, I think we should make storage_medium and storage_cooldown_time
configurable, and this should be the admin's responsibility.
For Example, if the cluster's storage medium is HDD, but we need to change part of machines to SSD,
if we change the machine, the tablets before change is stored in HDD and they can't find a dest path
to migrate, and user will create table as usual, it will make all tablets stored in old machines and
the new machines will only store a little tablets. Without this config the only way is admin need
to traverse all partitions in cluster and change the property of storage_medium, it will increase
operational and maintenance costs.
So I add a FE config default_storage_medium, so that user can set the default storage medium.
This PR is to reduce the time cost for waiting transactions to be completed in same db by filter the running transactions in table level.
NOTICE: Update FE meta version to 79
The subquery in having clause should be rewritten too.
If not, ExprRewriteRule will not be apply in subquery.
For example:
select k1, sum (k2) from table group by k1 having sum(k2) > (select t1 from table2 where t2 between 1 and 2);
```t1 between 1 and 2``` should be rewritten to ```t1 >=1 and t1<=2```.
Fixed#3205. TPC-DS 14 will be passed after this commit.
issue #2344
* Add install/unintall Plugin statement
* Add show plugin statement
* Support install plugin through two ways:
* Built-in Plugin: use PluginMgr's register method.
* Dynamic Plugin: install by SQL statement, and the process:
1. check Plugin has already install?
2. download Plugin file from remote source or copy from local source
3. extract Plugin's .zip
4. read Plugin's plugin.properties, and check Plugin's Value
5. dynamic load .jar and init Plugin's main Class
6. invoke Plugin's init method
7. register Plugin into PluginMgr.
8. update meta
* Support FE Plugin dynamic uninstall process
1. check Plugin has install?
2. invoke Plugin's close method
3. delete Plugin from PluginMgr
4. update meta
* Add audit plugin interface
* Add plugin enable flags in Config
* Add plugin install path in Config, default plugin will install in ${DORIS_FE_PATH}/plugins
* Add FE plugins project
* Add audit plugin demo
The usage:
```
// install plugin and show plugins;
mysql>
mysql> install plugin from "/home/users/seaven/auditplugin.zip";
Query OK, 0 rows affected (0.05 sec)
mysql>
mysql> show plugins;
+-------------------+-------+---------------+---------+-------------+------------------------+--------+---------------------------------------+
| Name | Type | Description | Version | JavaVersion | ClassName | SoName | Sources |
+-------------------+-------+---------------+---------+-------------+------------------------+--------+---------------------------------------+
| audit_plugin_demo | AUDIT | just for test | 0.11.0 | 1.8.31 | plugin.AuditPluginDemo | NULL | /home/users/hekai/auditplugindemo.zip |
+-------------------+-------+---------------+---------+-------------+------------------------+--------+---------------------------------------+
1 row in set (0.00 sec)
mysql> show plugins;
+-------------------+-------+---------------+---------+-------------+------------------------+--------+---------------------------------------+
| Name | Type | Description | Version | JavaVersion | ClassName | SoName | Sources |
+-------------------+-------+---------------+---------+-------------+------------------------+--------+---------------------------------------+
| audit_plugin_demo | AUDIT | just for test | 0.11.0 | 1.8.31 | plugin.AuditPluginDemo | NULL | /home/users/hekai/auditplugindemo.zip |
+-------------------+-------+---------------+---------+-------------+------------------------+--------+---------------------------------------+
1 row in set (0.00 sec)
mysql> uninstall plugin audit_plugin_demo;
Query OK, 0 rows affected (0.04 sec)
mysql> show plugins;
Empty set (0.00 sec)
```
TODO:
*Config.plugin_dir should be created if missing