1. Disable the MySQL client and LZO library by default when building Doris.
The MySQL client library is used for the MySQL external table feature.
This feature will soon be replaced by the new ODBC external table.
The LZO library is used to compress/decompress data in some old Doris data formats
that are no longer used.
2. Add missing license to some files.
3. All non-Apache-licensed code is explained in the NOTICE file, and the corresponding license is declared.
4. Remove the JS source code from webroot; it will be downloaded as a thirdparty dependency.
Since Segment V2 has been released for a long time, we should make it the default storage format for newly created tables.
This CL mainly changes:
1. For all newly created tables, the default storage format is Segment V2.
2. For all existing tablets, the storage format remains unchanged.
3. Fix the bugs described in #4384 and #4385.
We have changed most of our serialization methods to JSON. To stay compatible with previous data, these classes still retain the readFields method. PRs that modify metadata often modify the readFields method as well. To avoid this, we should mark these methods as Deprecated. #4398
* Implements the grammar of the batch delete #4051
* Process create and alter table when the table has a delete sign column
* Support the syntax for enabling the delete column
* Automatically filter deleted data in select statements
* Automatically add the delete sign when creating a rollup table
TODO:
* Optimize the reading and compaction logic on the BE side so that data marked as deleted is completely removed during base compaction
Define Expr will not be serialized in Column `toThrift`.
1. When adding a partition, each index should use its own keys type
instead of uniformly using the keys type of the base table.
2. There are two kinds of define expr in Column: one is analyzed, and the other is not.
Currently, an analyzed define expr is only used when creating materialized views, so the define expr in RollupJob must be analyzed.
In other cases, such as the define expr in `MaterializedIndexMeta`, it may not be analyzed after being replayed.
When executing a load, an analyzed define expr (such as to_bitmap(cast(k1, varchar))) is not analyzed again.
Only a cast function, which is also analyzed, is added to the inner layer (such as to_bitmap(cast(cast(k1, int), varchar))).
A define expr that has not been analyzed (such as cast(k1, varchar)) is analyzed when executing the load.
1. Pass the correct keys type when an mv is updated.
The keys type of the mv should be used in the schema change job rather than the keys type of the base table.
Otherwise, the BE will core dump and the exception "Create replicas failed" will be thrown.
2. Forbid adding a non-key column directly to an agg mv when the base table uses the duplicate model.
If a dup table has an agg mv, the user cannot add a non-key column to the mv.
The non-key column can only be added to the dup index.
The rewrite rule named `CountToSum` does not distinguish between `Count` and `Count distinct`, so `Count distinct` is incorrectly rewritten as `Sum`.
This commit modifies the matching rule:
when the function is `Count distinct`, the rewrite rule does not take effect.
Fixed #4381
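The guard described for `CountToSum` can be illustrated with a minimal sketch. This is not the Doris implementation (the FE is written in Java); the `AggCall` type, the `count_to_sum` function and the `mv_columns` mapping are hypothetical names used only to show that a DISTINCT count is left untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggCall:
    """Hypothetical, simplified aggregate call: name(arg) or name(DISTINCT arg)."""
    name: str
    arg: str
    distinct: bool = False


def count_to_sum(call: AggCall, mv_columns: dict) -> AggCall:
    """Rewrite count(col) to sum(mv_col), but never touch count(DISTINCT col).

    mv_columns maps a base column name to the (assumed) pre-aggregated
    count column of the materialized view.
    """
    if call.name.lower() != "count" or call.distinct:
        # count(DISTINCT x) cannot be answered by summing pre-counted rows.
        return call
    mv_col = mv_columns.get(call.arg)
    if mv_col is None:
        # No matching materialized view column: leave the call unchanged.
        return call
    return AggCall("sum", mv_col)
```

With a mapping such as `{"k1": "mv_count_k1"}`, a plain `count(k1)` becomes `sum(mv_count_k1)`, while `count(DISTINCT k1)` is returned as is.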
replace is a user-defined function that replaces every occurrence of an old substring in a string with a new substring, as follows:
mysql> select replace("http://www.baidu.com:9090", "9090", "");
+--------------------------------------------------+
| replace('http://www.baidu.com:9090', '9090', '') |
+--------------------------------------------------+
| http://www.baidu.com:                            |
+--------------------------------------------------+
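The "replace all occurrences" semantics shown above can be sketched in a few lines. This is only an illustration of the behavior, not the Doris implementation; the handling of an empty `old` string is an assumption, since the commit message does not specify it.

```python
def replace(s: str, old: str, new: str) -> str:
    """Replace every occurrence of `old` in `s` with `new`."""
    if old == "":
        # Assumption: an empty search string leaves the input unchanged.
        return s
    # Splitting on `old` and joining with `new` replaces all occurrences.
    return new.join(s.split(old))


print(replace("http://www.baidu.com:9090", "9090", ""))
```

The call mirrors the query in the example and yields the same string as the result row.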
When setting global variables, such as `set global default_rowset_type=beta`,
the operation is not correctly persisted.
This CL changes the FE meta version to 90.
---------------
The main reason for this problem is that, to modify a global variable,
we directly use Java's reflection mechanism to modify static member variables in the `GlobalVariable` class.
But in the persistence method of the `set` operation, we only persist the value stored
in the `globalSessionVariable` variable, which does not contain the fields of `GlobalVariable`.
So I added a new OperationType, `OP_GLOBAL_VARIABLE_V2`,
and a `GlobalVarPersistInfo` class to record all changes.
In some very special circumstances, such as code bugs or human error,
all replicas of some tablets may be lost. In this case, the data is effectively lost.
However, in some scenarios, the business still wants queries to run without
errors even when data is lost, so that the loss is less visible to users.
In that case, we can use a blank tablet to fill in for the missing replica so that queries can execute normally.
Add a new FE config `recover_with_empty_tablet`. The default is false; true means an empty tablet is used to fill in for the missing one.
Also fix the bug described in #4274.
The partition column of a table must also be a key in the materialized view.
If not, the BE will core dump when the user adds a partition to the table.
The materialized view cannot create the partition correctly when the partition column has been aggregated.
If the user tries to create an mv without grouping on an aggregation table, Doris throws an exception.
The correct approach is to explicitly declare the grouping columns.
For example:
Agg table: k1, k2, sum(k3)
Create materialized view stmt: select k1, k2 from agg_table group by k1, k2.
Fixed #4316
**Describe the bug**
Predicate pushdown may throw an NPE when the subquery contains DISTINCT.
**To Reproduce**
Steps to reproduce the behavior:
1. Create a table like
```
+--------------+--------------+------+-------+---------+---------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-------+---------+---------+
| event_day | DATETIME | No | true | NULL | |
| title | VARCHAR(600) | No | true | NULL | |
| report_value | VARCHAR(50) | No | false | NULL | REPLACE |
+--------------+--------------+------+-------+---------+---------+
```
2. Execute the query
```
SELECT
*
FROM
(
SELECT
DISTINCT event_day,
title
FROM
click_show_window
) a
WHERE
a.title IS NOT NULL
```
3. See the error
```
ERROR 1064 (HY000): errCode = 2, detailMessage = Unexpected exception: null
```
This is because DISTINCT generates grouping exprs in agginfo, but the query does not have a GROUP BY clause.
* [Feature][Cache] Cache proxy and coordinator #2581
1. Cache's abstract proxy class and BE's Cache implementation
2. Cache coordinator implemented by consistent hashing
* Adjusted the formatting code, naming and variables according to the comments
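As an illustration of how a coordinator based on consistent hashing can map a cache key to a BE, here is a minimal sketch. It is not the code from this change: the class name, the MD5-based hash, the number of virtual nodes and the BE names are all assumptions chosen for the example.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, virtual_nodes=16):
        ring = []
        for node in nodes:
            # Each node is placed on the ring several times to even out load.
            for i in range(virtual_nodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._hashes = [h for h, _ in ring]

    @staticmethod
    def _hash(value: str) -> int:
        # Assumption: any stable hash works; MD5 is used here for simplicity.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def locate(self, key: str) -> str:
        """Return the node owning `key`: the first ring point at or after its hash."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = ConsistentHashRing(["be1", "be2", "be3"])
owner = ring.locate("select count(*) from t")
```

The property that makes this attractive for a cache is that removing one node only remaps the keys that node owned; keys owned by the remaining nodes keep their placement.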
The column types of the materialized view and the base table can differ.
When the mv is selected in the query plan, the type of the slot should be changed to the mv column type.
For example:
base table: k1 int, k2 int
mv table: k1 int, k2 bigint sum
The type of the k2 slot ref should be changed from int to bigint.
Closed. #4271
This PR adds InPredicate support to the delete statement
and adds the max_allowed_in_element_num_of_delete variable to
limit the number of elements in an InPredicate in a delete statement.
The new function approx_count_distinct is an alias of the function ndv.
So Doris also needs to rewrite approx_count_distinct to the hll function when it is possible to match the hll materialized view.
Support ALTER ROUTINE LOAD JOB stmt, for example:
```
alter routine load db1.label1
properties
(
"desired_concurrent_number"="3",
"max_batch_interval" = "5",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false",
"timezone" = "+08:00"
)
```
Details can be found in `alter-routine-load.md`
Revert “Change type of sum, min, max function column in mv”
This PR reverts PR #4199.
The daily test core dumps when the type of the mv column is changed,
so I am reverting the PR.
The daily core will be fixed in the future. After that, PR #4199 will be enabled again.
Change-Id: Ie04fcfacfcd38480121addc5e454093d4ae75181
Currently, if a URL is longer than 4096 bytes, Netty rejects the request.
The case can be reproduced by constructing a very long URL (longer than 4096 bytes).
Add 2 http server params:
1. http_max_line_length
2. http_max_header_size
If the agg function is sum, the type of the mv column will be bigint.
The only exception is that if the base column is largeint, the type of the mv column will be largeint.
If the agg function is min or max, the type of the mv column will be the same as the type of the base column.
For example, if the base column is smallint and the agg function is min, the type of the mv column is smallint.
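The type rule above can be written down as a small function. This is a sketch of the rule as stated in this message, not the FE code; the function name is hypothetical and type names are plain lowercase strings for illustration.

```python
def mv_column_type(agg_function: str, base_type: str) -> str:
    """Return the mv column type for an aggregate over a base column."""
    fn = agg_function.lower()
    base = base_type.lower()
    if fn == "sum":
        # sum widens to bigint; largeint is the only stated exception.
        return "largeint" if base == "largeint" else "bigint"
    if fn in ("min", "max"):
        # min and max keep the type of the base column.
        return base
    raise ValueError(f"rule not defined for aggregate function: {agg_function}")


for fn, base in [("sum", "int"), ("sum", "largeint"), ("min", "smallint")]:
    print(fn, base, "->", mv_column_type(fn, base))
```

Only sum, min and max are covered because those are the only functions the rule mentions; anything else raises an error rather than guessing.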
Suppose table1 and table2 are colocated using columns k1 and k2.
A query must join on all of k1 and k2 for the colocation algorithm to apply.
A query like select * from table1 inner join table2 where t1.k1 = t2.k1 cannot use a colocate join.
We add this rule to avoid the problem.
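The colocation condition can be sketched as a set check. This is an illustration under simplifying assumptions, not the planner code: both tables are assumed to use the same names for their distribution columns, and join predicates are given as hypothetical `(left_column, right_column)` pairs.

```python
def can_colocate_join(distribution_columns, equality_pairs) -> bool:
    """True only if the equi-join predicates cover every distribution column.

    equality_pairs holds (left_column, right_column) pairs taken from
    predicates of the form t1.col = t2.col.
    """
    matched = {left for left, right in equality_pairs if left == right}
    return set(distribution_columns) <= matched


# t1.k1 = t2.k1 alone does not cover k2, so colocation cannot be used.
print(can_colocate_join(["k1", "k2"], [("k1", "k1")]))
# Joining on both k1 and k2 satisfies the rule.
print(can_colocate_join(["k1", "k2"], [("k1", "k1"), ("k2", "k2")]))
```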
This PR mainly does three things:
1. Fix the FE meta version bug introduced by #4029 while resolving a conflict with #4086
2. Make drop check code easy to read
3. Add doc content for drop meta check
Try to select a BE that already holds a replica as the destination BE for a
REPLICA_RELOCATING clone task.
Fix #4147
Also add 2 new FE configs `max_clone_task_timeout_sec` and `min_clone_task_timeout_sec`
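The destination preference for REPLICA_RELOCATING clone tasks can be illustrated with a short sketch. It is not the scheduler code; the function name, the BE identifiers and the fallback to the first candidate are assumptions made for the example.

```python
def choose_dest_backend(candidates, backends_with_replica):
    """Prefer a candidate BE that already holds a replica of the tablet.

    candidates: ordered list of BE ids that are otherwise eligible.
    backends_with_replica: BE ids that already hold a replica.
    Returns None when there is no candidate at all.
    """
    holders = set(backends_with_replica)
    for be in candidates:
        if be in holders:
            return be
    # Assumption: fall back to the first eligible candidate.
    return candidates[0] if candidates else None


print(choose_dest_backend([10001, 10002, 10003], [10003]))
```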