Process CastExpr predicates such as `k (float) > 2.0` or `k (int) > 3.2`: Doris On ES should ignore Doris' native cast transformation of every row's column value and instead push this `cast semantic` down to Elasticsearch.
I believe that in this `predicate` situation, pushing the cast down will decrease the amount of data transferred.
k1 is float:
```
k1 >= 5
````
push-down filter:
```
{"range":{"k1":{"gte":"5.000000"}}}
```
k2 is int:
```
k2 > 3.2
```
push-down filter:
```
{"range":{"k2":{"gte":"3.2"}}}
```
When adding a partition with the storage_cooldown_time property like this:
```
alter table tablexxx ADD PARTITION p20200421 VALUES LESS THAN("1588262400") ("storage_medium" = "SSD", "storage_cooldown_time" = "2020-05-01 00:00:00");
```
and then running `show partitions from tablexxx;`,
the CooldownTime shown is wrong: 2610-02-17 10:16:40. What is worse, the storage migration is based on this wrong timestamp.
The reason is that the result of DateLiteral.getLongValue is not a Unix timestamp.
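A rough check shows where the bogus date comes from (assuming getLongValue returns the datetime packed as yyyyMMddHHmmss and that this value is later treated as milliseconds since the epoch; the exact unit is an assumption):
```
"2020-05-01 00:00:00"  --getLongValue-->  20200501000000
20200501000000 ms ≈ 233,802 days ≈ 640 years
1970-01-01 + ~640 years ≈ 2610-02  (matching the bogus CooldownTime)
```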
related issue: #3306
Note: this PR just removes es_scan_node_test.cpp, which is useless.
For the moment, this just adds a simple EXPLAIN syntax for EsTable without translating the native predicates to ES query DSL; that part is better finished together with moving the predicate translation from Doris BE to Doris FE. The whole work is still WIP.
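For instance, a minimal sketch of the new syntax (the table name is illustrative):
```
EXPLAIN SELECT * FROM es_table WHERE k1 > 5;
```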
This CL solves the following problems:
- FE is not aware that the Coordinator BE is down and does not cancel its txns, so those txns can never finish.
- Some code style refactoring
NOTICE: FE meta version is upgraded to 83
Our current DELETE strategy reuses the LoadChecker framework.
LoadChecker runs jobs in different stages by polling them every 5 seconds.
A load job has four stages, Pending/ETL/Loading/Quorum_finish, and each stage is handled by its
own LoadChecker. For example, when a load job is submitted, it is initialized to the Pending state
and waits to be run by the Pending LoadChecker. After the pending job has run, its stage changes
to ETL and it waits for the next LoadChecker (ETL). Because the LoadChecker interval is 5s,
in the worst case a load job needs to wait up to 20s during its life cycle.
DELETE jobs, in particular, do not need to wait for polling; they can call the pushTask()
function directly to delete. In this commit, I add a delete handler to process delete
tasks concurrently.
All delete tasks are now pushed to BE immediately instead of waiting for the LoadChecker.
Since a delete job used to start in the LOADING state and pass through 2 LoadCheckers,
up to 10s (5s per LoadChecker) is saved. The delete process is now synchronous,
and users get a response only after the delete has finished or been cancelled.
If a delete runs longer than a certain period of time, it is cancelled with a timeout exception.
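For reference, a delete of the following form (table, partition, and predicate are illustrative) now returns only after it has finished, been cancelled, or timed out:
```
DELETE FROM tablexxx PARTITION p20200421 WHERE k1 = 3;
```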
NOTICE: this CL upgrades the FE meta version to 82
I don't know why, but I found that sometimes when I use "==" to judge the equality of types,
it returns false even if the types are exactly the same.
ISSUE: #3309
This CL only changes == to equals() to solve the problem, but the reason is still unknown.
This PR is to limit replica usage: admins need to know the replica usage of every db and
table, and should be able to set a replica quota for every db.
```
ALTER DATABASE db_name SET REPLICA QUOTA quota;
```
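For example, with an illustrative database name and quota value:
```
ALTER DATABASE example_db SET REPLICA QUOTA 102400;
```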
This PR is just a transitional approach, but it is better to move the predicate transformation from Doris BE to Doris FE; in this way, Doris BE is only responsible for fetching data from ES.
Add an `enable_keyword_sniff` configuration item for creating an external Elasticsearch table. It defaults to true; when enabled, Doris sniffs the `keyword` sub-field of a `text` analyzed field and returns the `json_path` that substitutes for the original column name.
```
CREATE EXTERNAL TABLE `test` (
`k1` varchar(20) COMMENT "",
`create_time` datetime COMMENT ""
) ENGINE=ELASTICSEARCH
PROPERTIES (
"hosts" = "http://10.74.167.16:8200",
"user" = "root",
"password" = "root",
"index" = "test",
"type" = "doc",
"enable_keyword_sniff" = "true"
);
```
Note: `enable_keyword_sniff` defaults to "true".
Run this SQL:
```
select * from test where k1 = "wu yun feng"
```
Output predicate DSL:
```
{"term":{"k1.keyword":"wu yun feng"}}
```
Also in this PR, I remove the Elasticsearch version detection logic, since it is useless for now; it may be needed again in the future.
Multiple subqueries in the HAVING clause need to be rewritten into joins against multiple tables, which would require reworking the current rewrite rules.
This usage is not common, and there is no strong requirement from the business side,
so this function will be added later if it is required.
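A hypothetical query of the unsupported shape looks like this (table and column names are illustrative):
```
SELECT k1, SUM(v1)
FROM t
GROUP BY k1
HAVING SUM(v1) > (SELECT AVG(v1) FROM t)
   AND COUNT(v2) > (SELECT MIN(v1) FROM t);
```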
Currently, columns with REPLACE/REPLACE_IF_NOT_NULL can be filtered by ZoneMap/BloomFilter
when the rowset is a base rowset (version starts with zero). We have always treated this as an
optimization, but in some cases it produces wrong results.
```
create table test(
    k1 int,
    v1 int replace,
    v2 int sum
);
```
Suppose there are two records in two different versions:
1 2 2 on version [0-10]
1 3 1 on version 11
If I perform a query:
```
select * from test where k1 = 1 and v1 = 3;
```
The result will be 1 3 1, which is not right because the first record is filtered out.
The right answer is 1 3 3: the v2 values should be summed.
Removing this optimization is necessary to make the result correct.
Main refactor points are:
- Use a single get_absolute_tablet_path function instead of 3 independent functions
- Remove the meaningless return value of register_tablet and deregister_tablet
- Fix some typos and formatting
1. Avoid losing a plugin if it fails to load during replay
During replay, the plugin should always be added to the plugin manager,
even if it failed to load.
2. The `show plugin` statement should show all plugins, not only the successfully installed ones.
3. Plugin names should be globally unique and case insensitive.
4. Avoid creating new instances of plugins when doing metadata checkpoint.
5. Add a __builtin_ prefix for builtin plugins.
This PR enhances the performance of txn management tasks. When there are many txns in
BE, the single txn_map_lock plus the additional _txn_locks may cause poor performance, so
we now remove the additional _txn_locks and split txn_map_lock into many smaller locks.
When the result type is a date type, the result expr should not be cast,
because in the FE function the specific date type is determined by the actual
type of the return value, not by the declared function return type.
For example, the function `str_to_date` may return DATE or DATETIME, depending on the
format pattern.
DATE:
```
mysql> select str_to_date('11/09/2011', '%m/%d/%Y');
+---------------------------------------+
| str_to_date('11/09/2011', '%m/%d/%Y') |
+---------------------------------------+
| 2011-11-09 |
+---------------------------------------+
```
DATETIME:
```
mysql> select str_to_date('2014-12-21 12:34:56', '%Y-%m-%d %H:%i:%s');
+---------------------------------------------------------+
| str_to_date('2014-12-21 12:34:56', '%Y-%m-%d %H:%i:%s') |
+---------------------------------------------------------+
| 2014-12-21 12:34:56 |
+---------------------------------------------------------+
```
Queries like the one below cannot finish in an acceptable time. `store_sales` has 28 million rows and `customer_address` has 50,000 rows, and for now Doris creates only one cross join node to execute this SQL.
Evaluating the where clause takes about 200-300 ns per row pair, and the total number of evaluations is 28 million * 50,000, which is extremely large; this costs roughly 28,000,000 * 50,000 * 250 ns ≈ 350,000 seconds (about 4 days).
```
select avg(ss_quantity)
,avg(ss_ext_sales_price)
,avg(ss_ext_wholesale_cost)
,sum(ss_ext_wholesale_cost)
from store_sales, customer_address
where ((ss_addr_sk = ca_address_sk
and ca_country = 'United States'
and ca_state in ('CO', 'IL', 'MN')
and ss_net_profit between 100 and 200
) or
(ss_addr_sk = ca_address_sk
and ca_country = 'United States'
and ca_state in ('OH', 'MT', 'NM')
and ss_net_profit between 150 and 300
) or
(ss_addr_sk = ca_address_sk
and ca_country = 'United States'
and ca_state in ('TX', 'MO', 'MI')
and ss_net_profit between 50 and 250
))
```
but this SQL can be rewritten as
```
select avg(ss_quantity)
,avg(ss_ext_sales_price)
,avg(ss_ext_wholesale_cost)
,sum(ss_ext_wholesale_cost)
from store_sales, customer_address
where ss_addr_sk = ca_address_sk
and ca_country = 'United States' and (((ca_state in ('CO', 'IL', 'MN')
and ss_net_profit between 100 and 200
) or
(ca_state in ('OH', 'MT', 'NM')
and ss_net_profit between 150 and 300
) or
(ca_state in ('TX', 'MO', 'MI')
and ss_net_profit between 50 and 250
))
)
```
Therefore we can do a hash join first and then use
```
(((ca_state in ('CO', 'IL', 'MN')
and ss_net_profit between 100 and 200
) or
(ca_state in ('OH', 'MT', 'NM')
and ss_net_profit between 150 and 300
) or
(ca_state in ('TX', 'MO', 'MI')
and ss_net_profit between 50 and 250
))
)
```
to filter the values.
On the TPC-DS 10GB dataset, the rewritten SQL takes only about 1 second.
Related issue: https://github.com/apache/incubator-doris/issues/3248
SQL:
```
select * from test where (k2 = 6 and k3 = 1) or (k2 = 2 and k3 =3 and k4 = 'beijing');
```
Output filter:
```
((#k2:[6 TO 6] #k3:[1 TO 1]) (#(#k2:[2 TO 2] #k3:[3 TO 3]) #k4:beijing))~1
```
SQL:
```
select * from test where (k2 = 6 or k3 = 7) or (k2 = 2 and k3 =3 and (k4 = 'beijing' or k4 = 'zhaochun'));
```
Output filter:
```
(k2:[6 TO 6] k3:[7 TO 7] (#(#k2:[2 TO 2] #k3:[3 TO 3]) #((k4:beijing k4:zhaochun)~1)))~1
```
SQL:
```
select * from test where (k2 = 6 or k3 = 7) or (k2 = 2 and abs(k3) =3 and (k4 = 'beijing' or k4 = 'zhaochun'));
```
Output filter (`abs` cannot be pushed down to ES, so Doris On ES does not process this scenario):
```
match_all
```
This CL fixes #3270 by skipping recently added versions when performing cumulative compaction. A new config named "cumulative_compaction_skip_window_seconds" is added to adjust the time window.
In the past, when we wanted to modify some BE configs, we had to modify be.conf and then restart BE.
This patch provides a way to modify configs of the 'threshold', 'interval', and 'enable flag' kind
while BE is running, without restarting it.
You can update a single config at a time through BE's HTTP API: `be_host:be_http_port/api/update_config?config_name=new_value`
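For example, instantiating the pattern above with an illustrative config name and value (whether this particular config is mutable at runtime is an assumption):
```
be_host:be_http_port/api/update_config?streaming_load_max_mb=1024
```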
It's not possible to insert duplicate transaction ids for a specific tablet, so we can use map<TabletInfo, vector<int64_t>> instead of map<TabletInfo, set<int64_t>> for expire_txn_map.
When calculating the cumulative point for the first time, we should stop increasing
the cumulative point when we meet a rowset whose overlap flag is OVERLAPPING,
even if it has only one segment.
Currently we have implemented the plugin framework in FE.
This CL makes the original audit log logic pluggable.
The following classes are mainly implemented:
1. AuditPlugin
The interface of audit plugin
2. AuditEvent
An AuditEvent contains all information about an audit event, such as a query, or a connection.
3. AuditEventProcessor
The audit event processor receives all audit events and delivers them to all installed audit plugins.
This CL implements two audit module plugins:
1. The builtin plugin `AuditLogBuilder`, which acts the same as the previous logic, saving the
audit log to `fe.audit.log`.
2. An optional plugin `AuditLoader`, which periodically inserts the audit logs into a Doris table
specified by the user. In this way, users can conveniently use SQL to query and analyze the
audit log table.
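As a hypothetical illustration only (the target table and column names depend entirely on how the user configures AuditLoader):
```
SELECT client_ip, `user`, query_time, stmt
FROM audit_db.audit_log
ORDER BY query_time DESC
LIMIT 10;
```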
Some documents are added:
1. HELP docs of install/uninstall/show plugin.
2. Rename the `README.md` in `fe_plugins/` dir to `plugin-development-manual.md` and move
it to the `docs/` dir
3. `audit-plugin.md` to introduce the usage of `AuditLoader` plugin.
ISSUE: #3226