Persist stale rowset meta. When BE reboots, stale rowset meta can be restored, so stale versions remain readable before the stale GC time.
ISSUE: #4453
1. Fix a core dump caused by a wild pointer in PlanFragmentExecutor, fixes issue #4447
2. Fix a core dump caused by a wild pointer in json load, fixes issue #4452
3. Change the declaration order of the ODBC type in thrift for compatibility
* Implement the grammar of batch delete #4051
* Handle CREATE TABLE and ALTER TABLE when the table has a delete sign column
* Support the syntax for enabling the delete sign column (see the sketch after the TODO list below)
* Automatically filter deleted data in SELECT statements
* Automatically add the delete sign column when creating a rollup table
TODO:
* Optimize the reading and compaction logic on the BE side, so that data marked as deleted is completely removed during base compaction
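A rough usage sketch of the feature above. The exact statement and the hidden column name `__DORIS_DELETE_SIGN__` are assumptions for illustration, not confirmed by this change log:
```
-- Assumed syntax for enabling the delete sign column on an existing table.
ALTER TABLE tbl ENABLE FEATURE "BATCH_DELETE";

-- Ordinary queries automatically filter rows whose delete sign is set,
-- roughly as if the hidden predicate below were appended.
SELECT k1, v1 FROM tbl;
-- SELECT k1, v1 FROM tbl WHERE __DORIS_DELETE_SIGN__ = 0;
```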
`replace` is a newly added function that replaces all occurrences of an old substring with a new substring in a string, as follows:
mysql> select replace("http://www.baidu.com:9090", "9090", "");
+--------------------------------------------------+
| replace('http://www.baidu.com:9090', '9090', '') |
+--------------------------------------------------+
| http://www.baidu.com:                             |
+--------------------------------------------------+
In some very special circumstances, such as code bugs or human misoperation, all replicas of some tablets may be lost. In this case, the data is substantially lost.
However, in some scenarios, the business still wants queries not to report errors even if there is data loss, to reduce the impact on the user layer.
In this case, we can use a blank tablet to fill in for the missing replica, so that the query can be executed normally.
Add a new FE config `recover_with_empty_tablet`. Default is false; true means an empty tablet is used to fill in for the missing one.
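A minimal sketch of switching the config on, assuming it can be changed at runtime through the standard admin statement (otherwise it would go into fe.conf):
```
-- Assumed runtime toggle; the config name comes from this change.
ADMIN SET FRONTEND CONFIG ("recover_with_empty_tablet" = "true");
```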
Also fix a bug in #4274
* [Feature][Cache] Cache proxy and coordinator #2581
1. Abstract cache proxy class and the BE cache implementation
2. Cache coordinator implemented with consistent hashing
* Adjusted code formatting, naming and variables according to review comments
This PR adds IN-predicate support to the DELETE statement,
and adds the `max_allowed_in_element_num_of_delete` variable to
limit the number of elements of an IN predicate in a DELETE statement.
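A minimal sketch of the new capability; the table, partition and value list are illustrative:
```
-- Delete rows using an IN predicate (support added by this PR).
DELETE FROM tbl PARTITION p1 WHERE k1 IN (1, 2, 3);

-- Cap the number of elements allowed in such an IN predicate
-- (variable name from this PR; the value is only an example).
SET max_allowed_in_element_num_of_delete = 1024;
```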
+ Build the materialized view function for schema_change here based on defineExpr.
+ This is a workaround because the current storage layer does not support expression evaluation.
+ A count-distinct materialized view sets mv_expr to to_bitmap or hll_hash.
+ A count materialized view sets mv_expr to count.
+ Support regenerating historical data in BE when a new materialized view is created.
+ Support the to_bitmap function
+ Support the hll_hash function
+ Support the count(field) function (see the examples after this list)
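Hedged examples of materialized views using these functions; the table and column names are illustrative:
```
-- count distinct via to_bitmap
CREATE MATERIALIZED VIEW mv_bitmap AS
SELECT k1, bitmap_union(to_bitmap(k2)) FROM tbl GROUP BY k1;

-- approximate count distinct via hll_hash
CREATE MATERIALIZED VIEW mv_hll AS
SELECT k1, hll_union(hll_hash(k2)) FROM tbl GROUP BY k1;

-- count(field)
CREATE MATERIALIZED VIEW mv_count AS
SELECT k1, count(k2) FROM tbl GROUP BY k1;
```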
For #3344
This CL mainly changes:
1. Reorganized the code logic to limit the supported JSON formats to two, making the import behavior more consistent.
2. Modified how error rows are counted when loading data in JSON format, so that error rows are counted correctly.
3. See `load-json-format.md` for details on loading data in JSON format.
Fix: #3946
CL:
1. Add a prepare phase for the `from_unixtime()`, `date_format()` and `convert_tz()` functions, to handle the format string once for all rows.
2. Look up the cctz timezone when initializing the `runtime state`, so we don't need to look up the timezone for each row.
3. Add a constant rewrite rule for `utc_timestamp()`
4. Add a doc for `to_date()`
5. Comment out `push_handler_test`; it cannot run in DEBUG mode and will be fixed later.
6. Remove `timezone_db.h/cpp` and add `timezone_utils.h/cpp`
The performance is shown below:
11,000,000 rows
SQL1: `select count(from_unixtime(k1)) from tbl1;`
Before: 8.85s
After: 2.85s
SQL2: `select count(from_unixtime(k1, '%Y-%m-%d %H:%i:%s')) from tbl1 limit 1;`
Before: 10.73s
After: 4.85s
Date string formatting still seems slow; we may need a further enhancement for it.
* 1. Add `enable_spilling` to the query options and support spilling to disk in Analytic_Eval_Node. FE can turn on spilling via
set enable_spilling = true;
Now both Sort Node and Analytic_Eval_Node can spill to disk (see the example after this list).
2. Delete the unused merge_sorter code.
3. Replace buffered_tuple_stream with buffered_tuple_stream2 in Analytic_Eval_Node and support spilling to disk. Delete the now-useless code of buffered_block_mgr and buffered_tuple_stream.
4. Add a DataStreamRecvr profile. Move the counters belonging to DataStreamRecvr from the fragment to the DataStreamRecvr profile to make the running profile clearer.
* Change some hints in code
* Replace disable_spill with enable_spill, which is more compatible with FE
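A minimal illustrative session, assuming a sort that is too large for memory; the table and column names are made up:
```
-- Allow the current session to spill large sorts / analytic evaluations to disk.
SET enable_spilling = true;

-- A query whose sort and window evaluation may now spill instead of failing on memory.
SELECT k1, row_number() OVER (PARTITION BY k2 ORDER BY k3) AS rn
FROM big_tbl
ORDER BY k1;
```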
This CL mainly changes:
1. Add a new BE config `max_pushdown_conditions_per_column` to limit the number of conditions of a single column that can be pushed down to the storage engine.
2. Add 2 new session variables, `max_scan_key_num` and `doris_max_scan_key_num`, which can be set at the session level and override the config value in BE, as shown below.
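A minimal sketch of the session-level override; the variable name comes from this change and the value is only illustrative:
```
-- Raise the per-session scan key limit for queries with many pushed-down conditions.
SET max_scan_key_num = 48;
```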
This CL mainly changes:
1. Support the `SELECT INTO OUTFILE` command.
2. Support exporting query results to a file via Broker.
3. Support the CSV export format with a specified column separator and line delimiter.
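A sketch of exporting a query result through a broker; the broker name, HDFS path and property keys are illustrative assumptions:
```
SELECT k1, v1 FROM tbl
INTO OUTFILE "hdfs://host:port/user/result_"
FORMAT AS CSV
PROPERTIES
(
    "broker.name" = "my_broker",
    "column_separator" = ",",
    "line_delimiter" = "\n"
);
```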
In materialized view 2.0 the define expr should be set in the column.
For example, the to_bitmap function on an integer column should be defined in the mv column.
```
create materialized view mv as select bitmap_union(to_bitmap(k1)) from table.
```
The meta of the mv is as follows:
column name: __doris_materialized_view_bitmap_k1
column aggregate type: bitmap_union
column define expr: to_bitmap(k1)
This is a WIP PR for materialized view 2.0.
#3344
Fix #3390
This CL adds more info to the `JobDetails` column of the `SHOW LOAD` result for Broker Load jobs.
For example:
```
{
    "Unfinished backends": {
        "9c3441027ff948a0-8287923329a2b6a7": [10002]
    },
    "All backends": {
        "9c3441027ff948a0-8287923329a2b6a7": [10002, 10004, 10006]
    },
    "ScannedRows": 2390016,
    "TaskNumber": 1,
    "FileNumber": 1,
    "FileSize": 1073741824
}
```
2 newly added keys:
`Unfinished backends` indicates the BEs whose tasks are not yet finished.
`All backends` indicates the BEs on which this job has tasks.
One more thing: I pass the backend id along with the heartbeat msg from FE to BE, so that each BE can
know its own id.
This PR is just a transitional step, but it is better to move the predicate transformation from Doris BE to Doris FE; in this way, Doris BE is only responsible for fetching data from ES.
Add an `enable_keyword_sniff` configuration item when creating an external Elasticsearch table. It defaults to true, and it sniffs the `keyword` sub-field of an analyzed `text` field and returns the `json_path` which substitutes the original column name.
```
CREATE EXTERNAL TABLE `test` (
`k1` varchar(20) COMMENT "",
`create_time` datetime COMMENT ""
) ENGINE=ELASTICSEARCH
PROPERTIES (
"hosts" = "http://10.74.167.16:8200",
"user" = "root",
"password" = "root",
"index" = "test",
"type" = "doc",
"enable_keyword_sniff" = "true"
);
```
Note: `enable_keyword_sniff` defaults to "true".
Run this SQL:
```
select * from test where k1 = "wu yun feng"
```
Output predicate DSL:
```
{"term":{"k1.keyword":"wu yun feng"}}
```
Also in this PR, I removed the Elasticsearch version detection logic, since it is useless for now; it may be needed again in the future.
The main refactoring points are:
- Use a single get_absolute_tablet_path function instead of 3 independent functions
- Remove the meaningless return values of register_tablet and deregister_tablet
- Fix some typos and formatting
Support the BE plugin framework, including:
* Update PluginManager, support a plugin find method
* Support the builtin-plugin register method
* Plugin install/uninstall process
* PluginLoader:
  * dynamically install and check the plugin .so file
  * dynamically uninstall and check the plugin status
* PluginZip:
  * support downloading and extracting remote/local plugin .zip files
TODO:
* Support a PluginContext to transmit the necessary system variables when the plugin's init/close methods are invoked
* Add the entry for the BE dynamic plugin install/uninstall process, including:
  * The FE sends install/uninstall plugin statements (via RPC)
  * The FE meta update request with the plugin list information
  * The FE operation request (update/query) with the plugin (maybe not needed)
* Add a way to upload the plugin status
* Load already-installed plugins when BE starts
Related issues: #2663, #2828.
This CL supports loading data into specified temporary partitions.
```
INSERT INTO tbl TEMPORARY PARTITIONS(tp1, tp2, ..) ....;
curl .... -H "temporary_partition: tp1, tp, .. " ....
LOAD LABEL db1.label1 (
DATA INFILE("xxxx")
INTO TABLE `tbl2`
TEMPORARY PARTITION(tp1, tp2, ...)
...
```
NOTICE: this CL changes the FE meta version to 77.
There are 3 major changes in this CL:
## Syntax reorganization
Reorganized the syntax related to specifying partitions. Removed some redundant syntax
definitions, and unified the `specify-partitions` syntax under one syntax entry, as sketched below.
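Assumed shapes of the unified syntax; the table and partition names are illustrative:
```
-- Query a formal partition and a temporary partition with the same syntax entry.
SELECT * FROM tbl PARTITION (p1, p2);
SELECT * FROM tbl TEMPORARY PARTITION (tp1);

-- Load into temporary partitions through INSERT.
INSERT INTO tbl TEMPORARY PARTITION (tp1, tp2) SELECT * FROM src_tbl;
```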
## Meta refactor
In order to be able to support specifying temporary partitions,
I made some changes to the way the partition information in the table is stored.
Partition information is now organized as follows:
The following two maps are reserved in OlapTable for storing formal partitions:
```
idToPartition
nameToPartition
```
Use the `TempPartitions` class for storing temporary partitions.
All the partition attributes of the formal partition and the temporary partition,
such as the range, the number of replicas, and the storage medium, are all stored
in the `partitionInfo` of the OlapTable.
In `partitionInfo`, we use two maps to store the range of formal partition
and temporary partition:
```
idToRange
idToTempRange
```
Separate maps are used because the partition ranges of formal partitions and
temporary partitions may overlap; separate maps make it easier to check the partition range.
All partition attributes except the partition range are stored using the same map,
and the partition id is used as the map key.
## Method to get partition
A table may contain both formal and temporary partitions.
There are several methods to get the partition of a table.
Typically divided into two categories:
1. Get partition by id
2. Get partition by name
Depending on the requirement, the caller may want to obtain
a formal partition or a temporary partition. The methods are
described below so that the partition is obtained in the correct way.
1. Get by name
This type of request usually comes from a user with partition names. Such as
`select * from tbl partition(p1);`.
This type of request has clear information to indicate whether to obtain a
formal or temporary partition.
Therefore, we need to get the partition through this method:
`getPartition(String partitionName, boolean isTemp)`
To avoid modifying too much code, we keep `getPartition(String
partitionName)`, which is the same as:
`getPartition(partitionName, false)`
2. Get by id
This type of request usually means that the previous step has obtained
certain partition ids in some way,
so we only need to get the corresponding partition through this method:
`getPartition(long partitionId)`.
This method will try to get both formal partitions and temporary partitions.
3. Get all partition instances
Depending on the requirements, the caller may want to obtain all formal
partitions,
all temporary partitions, or all partitions. Therefore we provide 3 methods,
the caller chooses according to needs.
`getPartitions()`
`getTempPartitions()`
`getAllPartitions()`
2 Changes in this CL:
## Support multiple statements in one request like:
```
select 10; select 20; select 30;
```
ISSUE: #3049
To simply test this CL, you can use the mysql-client shell command tool:
```
mysql> delimiter //
mysql> select 1; select 2; //
+------+
| 1 |
+------+
| 1 |
+------+
1 row in set (0.01 sec)
+------+
| 2 |
+------+
| 2 |
+------+
1 row in set (0.02 sec)
Query OK, 0 rows affected (0.02 sec)
```
I added a new class called `OriginStatement.java` to save the original statement string together with an index. This class mainly serves the following case:
1. A user sends a multi-statement to a non-master FE:
`DDL1; DDL2; DDL3`
2. Currently we cannot separate the original string of a single statement from the multi-statement, so we have to forward the entire statement to the Master FE. Therefore I add an index in the forward request: `DDL1`'s index is 0, `DDL2`'s index is 1, ...
3. When the Master FE handles the forwarded request, it parses the entire statement, gets 3 DDL statements, and uses the `index` to pick the specified statement.
## Optimized the display of syntax errors
I have also optimized the display of syntax errors so that longer syntax errors can be fully displayed.
In a large-scale cluster, we may rolling-upgrade BEs. This patch adds a
column named 'Version' to the 'show backends;' command, as well as to the web page
'/system?path=//backends', to provide a way to check whether
any BE has not been upgraded.
This bug occurred when BE made a snapshot: the version required by FE had already been merged into the cumulative version, so the snapshot task could not complete even if it retried. To solve this problem, the BackupJob can be set to CANCELLED, and the user can then retry the job.
Fix #3057
Fixes #2892
IMPORTANT NOTICE: this CL makes incompatible changes to the V2 storage format; developers need to create new tables for testing.
This CL refactors the metadata and page format for segment_v2 in order to
* make it easy to extend existing page types
* make it easy to add new page types without sacrificing code reuse
* make it possible to use SIMD to speed up page decoding
Here we summarize the main code changes:
* Page and index metadata is redesigned, please see `segment_v2.proto`
* The new class `PageIO` is the single place for reading and writing all pages. This removes lots of duplicated code. `PageCompressor` and `PageDecompressor` are now useless and removed.
* The type of the value ordinal is changed from `rowid_t` to 64-bit `ordinal_t`; this affects the ordinal index as well.
* A column's ordinal index is now implemented with IndexPage, the same as IndexedColumn.
* Zone map index is now implemented with IndexedColumn
The `TStatusCode` struct is used in all FEs and BEs. In order to
avoid errors when identifying status codes in RPC while upgrading Doris
(updating and restarting the servers one by one), we must ensure that each element
always has a fixed value.
If each element is not explicitly assigned a constant, the value of
each element is assigned starting from 0 in turn, which requires us to be very
careful when adding and removing elements, to avoid the same element being
recognized as a different value on different machines. I.e., new elements
can only be added at the end, and only elements at the end can be deleted.
Unfortunately, this implicit constraint is likely to be ignored by
programmers when coding, especially those who are new to Doris.
No functional change in this patch.
1. Fix the bug introduced by https://github.com/apache/incubator-doris/pull/2947.
The following SQL returns 0000, which is wrong; the result should be 1601:
```
select date_format('2020-02-19 16:01:12','%H%i');
```
2. Add a constant expression plan test to ensure that the FE constant expression computation result is right.
3. Remove the `castToInt` function in `FEFunctions`, which duplicates `CastExpr::getResultValue`.
4. Implement the `getNodeExplainString` method for `UnionNode`.
The logic chain is as follows:
1. `date_format(if(, NULL, `dt`), '%Y%m%d')` is used as the HASH_PARTITIONED exprs, which is not right; we should use the Agg intermediate materialized slot.
2. We don't use the Agg intermediate materialized slot as the HASH_PARTITIONED exprs, because
```
// the parent fragment is partitioned on the grouping exprs;
// substitute grouping exprs to reference the *output* of the agg, not the input
partitionExprs = Expr.substituteList(partitionExprs,
node.getAggInfo().getIntermediateSmap(), ctx_.getRootAnalyzer(), false);
parentPartition = DataPartition.hashPartitioned(partitionExprs);
```
the partitionExprs substitution failed.
3. The partitionExprs substitution failed because partitionExprs has a cast-to-date child, but the agg info's getIntermediateSmap has a cast-to-datetime child.
4. The cast-to-date or cast-to-datetime child exists because `TupleIsNullPredicate` inserts an `if` Expr. We don't have an `if date` fn, so Doris uses the `if int` Expr.
5. The `date` in the cast-to-date depends on the slot dt's date type. The `datetime` in the cast-to-datetime depends on the datetime arg type of the `date_format` function.
So we could fix this issue by making the `if` fn support the date type, or making the `date_format` fn support the date type.