Commit Graph

5755 Commits

Author SHA1 Message Date
67b842ce04 [License] Organize and modify the license of the code (#4371)
1. Disable the MySQL client and LZO library by default when building Doris.

    The MySQL client library is used for the MySQL external table feature.
    This feature will soon be replaced by the new ODBC external table.

    The LZO library is used to compress/decompress data in an old Doris data format,
    which is no longer used.

2. Add missing license to some files.

3. All non-Apache-licensed code is explained in the NOTICE file, and the corresponding license is declared.

4. Remove the JS source code from webroot; it will be downloaded as a thirdparty dependency.
2020-08-24 21:51:55 +08:00
976820ba20 [SegmentV2] Change the default storage format to SegmentV2 (#4387)
Since Segment V2 has been released for a long time, we should make it the default storage format for newly created tables.

This CL mainly changes:
1. For all newly created tables, the default storage format is Segment V2.
2. For all existing tablets, the storage format remains unchanged.
3. Fix the bugs described in #4384 and #4385.
2020-08-24 21:51:17 +08:00
af2b749a87 make some readFields Deprecated (#4399)
We have changed most of our serialization methods to JSON. To stay compatible with previous data, these classes still retain the readFields method. PRs that modify metadata often modify the readFields method as well. To avoid this, we should mark these methods as Deprecated (#4398).
2020-08-21 22:58:08 +08:00
d61c10b761 [Delete] Support batch delete [part 1] (#4310)
* Implement the grammar of the batch delete #4051
* Handle CREATE TABLE and ALTER TABLE when the table has a delete sign column
* Support the syntax for enabling the delete column
* Automatically filter deleted data in the select statement
* Automatically add the delete sign when creating a rollup table
TODO:
 * Optimize the reading and compaction logic on the BE side, so that data marked as deleted is completely removed during base compaction
2020-08-21 22:57:16 +08:00
76a04de6c4 [MV] Input correct keys type of index meta when Add Partition (#4408)
Define Expr will not be serialized in Column `toThrift`.

1. When adding a partition, each index should use its own keys type
instead of uniformly using the keys type of the base table.

2. There are two kinds of define expr in Column: one is analyzed, and the other is not.
Currently, the analyzed define expr is only used when creating materialized views, so the define expr in RollupJob must be analyzed.
In other cases, such as the define expr in `MaterializedIndexMeta`, it may not be analyzed after being replayed.
When executing the load, an analyzed define expr (such as to_bitmap(cast(k1, varchar))) will not be analyzed again.
Only a cast function, which is also analyzed, will be added to the inner layer (such as to_bitmap(cast(cast(k1, int), varchar))).
A define expr that has not been analyzed (such as cast(k1, varchar)) will be analyzed when executing the load.
2020-08-21 10:42:41 +08:00
09b1965499 [MV] Fix errors when alter materialized view which based on dup table (#4375)
1. Input the correct keys type when the mv is updated.
The keys type of the mv should be used in the schema change job rather than the keys type of the base table.
Otherwise, the BE will core dump and throw the exception "Create replicas failed".

2. Forbid adding a non-key column directly on an agg mv when the base table uses the duplicate model.
If a dup table has an agg mv, the user cannot add a non-key column to the mv.
The non-key column can only be added to the dup index.
2020-08-21 10:36:03 +08:00
6bb111b42c Modify mv rewrite rule on 'Count distinct' (#4382)
The rewrite rule named `CountToSum` does not distinguish between `Count` and `Count distinct`, so `Count distinct` is incorrectly rewritten as `Sum`.
This commit modifies the matching rule:
when the function is `Count distinct`, the rewrite rule does not take effect.

Fixed #4381
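
The matching rule described above can be sketched as follows. This is only an illustration, not the actual Doris FE code: `AggCall`, `count_to_sum`, and the `mv_count_column` parameter are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggCall:
    """Hypothetical stand-in for an aggregate function call in a query."""
    name: str             # e.g. "count" or "sum"
    column: str           # column the aggregate is applied to
    distinct: bool = False


def count_to_sum(call: AggCall, mv_count_column: str) -> AggCall:
    """Rewrite count(col) to sum(mv_count_column).

    count(distinct col) is returned unchanged, because summing
    pre-aggregated counts does not yield the number of distinct values.
    """
    if call.name.lower() != "count" or call.distinct:
        return call
    return AggCall(name="sum", column=mv_count_column)
```

With this guard, a plain `count(k1)` is rewritten to a sum over the hypothetical materialized count column, while `count(distinct k1)` passes through untouched.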
2020-08-20 09:30:35 +08:00
bfb39a2826 [SQL][Function] Add replace() function (#4347)
replace is a user-defined function that replaces every occurrence of an old substring in a string with a new substring, as follows:
mysql> select replace("http://www.baidu.com:9090", "9090", "");
+--------------------------------------------------+
| replace('http://www.baidu.com:9090', '9090', '') |
+--------------------------------------------------+
| http://www.baidu.com:                            |
+--------------------------------------------------+
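
The semantics shown in the query above can be modeled in a few lines of Python. This is a sketch of the behavior only, not the BE implementation; returning the input unchanged for an empty search string is an assumption that the commit message does not confirm.

```python
def replace(text: str, old: str, new: str) -> str:
    """Replace every occurrence of `old` in `text` with `new`."""
    if old == "":
        # Assumption: an empty search string leaves the input unchanged.
        return text
    return text.replace(old, new)


print(replace("http://www.baidu.com:9090", "9090", ""))
```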
2020-08-20 09:28:53 +08:00
38a2a7a269 [Bug] Fix bug that modification of global variable can not be persisted. (#4324)
When setting global variables, such as `set global default_rowset_type=beta`,
the operation is not correctly persisted.

This CL changes the FE meta version to 90.

---------------

The main reason for this problem is that for the modification of global variable,
we directly use Java's reflection mechanism to modify static member variables in `GlobalVariable` class.

But in the persistence method of the `set` operation, we only persist the value stored
in the `globalSessionVariable` variable, and this variable does not contain the members of `GlobalVariable`.

So I added a new OperationType: `OP_GLOBAL_VARIABLE_V2`,
and added a `GlobalVarPersistInfo` class to record all changes.
2020-08-18 16:54:35 +08:00
3359467b9a [Tablet][Recovery] Support using empty tablet to repair the damaged or missing tablet (#4255)
In some very special circumstances, such as code bugs or human error,
all replicas of some tablets may be lost. In this case, the data has effectively been lost.
However, in some scenarios, the business still wants queries to run without
reporting errors even when data is lost, to reduce the impact visible to users.
In that case, we can use an empty tablet to fill in for the missing replica so that queries can execute normally.

Add a new FE config `recover_with_empty_tablet`. The default is false; true means an empty tablet is used to fill in for the missing one.

Also fix the bug described in #4274.
2020-08-18 06:13:53 +00:00
53d00d92cc [Doris On ES][Bug-Fix] ES queries always route at same 3 BE nodes (#4351) (#4352)
Resolve the problem that queries on ES tables are always routed to the same 3 BE nodes because of the random strategy.
2020-08-18 10:36:18 +08:00
e69496feaf [MysqlCompatibility] Support collate field option in expr (#4365)
Support SQL like:
```
select
collation_name,
character_set_name,
is_default collate utf8_general_ci = 'Yes' as is_default
from information_schema.collations
```
2020-08-17 22:52:57 +08:00
38921d4343 [MV]Forbidden aggregated partition key column on mv (#4343)
The partition column of the table must also be a key column in the materialized view.
If it is not, the BE will core dump when the user adds a partition to the table,
because the materialized view cannot create the partition correctly when the partition column has been aggregated.
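
A check of this kind might look like the sketch below. The function name and arguments are hypothetical, and the real validation lives in the Doris FE; this only illustrates the rule that every partition column must appear among the materialized view's key columns.

```python
def check_mv_partition_columns(partition_columns, mv_key_columns):
    """Raise ValueError if a partition column is not a key column of the mv."""
    keys = set(mv_key_columns)
    missing = [col for col in partition_columns if col not in keys]
    if missing:
        raise ValueError(
            "partition column(s) must be key columns in the materialized view: "
            + ", ".join(missing)
        )
```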
2020-08-15 11:38:50 +08:00
4fa35c9f39 [Bug][RoutineLoad] Fix routine load timezone property invalid (#4339) 2020-08-13 23:40:54 +08:00
ac9c7741e9 [SQL]Support datagrip show database information (#4332)
Support show schema()
2020-08-13 23:39:05 +08:00
790779fb6f [SparkLoad]remove unnecessary conversion from dataframe to rdd (#4304) 2020-08-13 23:37:38 +08:00
48d89e06c3 [Bug fix]fix query id assign bug (#4291) 2020-08-12 22:42:36 +08:00
98fe80dd5a [MV]Forbidden no grouping mv on aggregation table (#4317)
If a user tries to create a mv without grouping on an aggregation table, Doris will throw an exception.
The correct approach is to explicitly declare the grouping columns.
For example:
Agg table: k1, k2, sum(k3)
Create materialized view stmt: select k1, k2 from agg_table group by k1, k2.

Fixed #4316
2020-08-12 20:57:25 +08:00
3354645c77 [BugFix][ColocateJoin] Fix bug of issue 4305 (#4306)
This PR uses fragmentIdToSeqToAddressMap to replace seqtoAddresss,
because the SeqBucket-to-Address mapping should be bound to a fragment.
2020-08-12 12:11:47 +08:00
48f3ba35ec [Doris On ES][Bug-Fix] Resolve NullPointerException when multi fields with text type (#4300) 2020-08-11 12:09:17 +08:00
493c88c1d6 [BUG] Fix NPE when distinct in predicate push down (#4294)
**Describe the bug**
Predicate push down may throw an NPE when the subquery contains DISTINCT.

**To Reproduce**
Steps to reproduce the behavior:
1. Create a table like
```
+--------------+--------------+------+-------+---------+---------+
| Field        | Type         | Null | Key   | Default | Extra   |
+--------------+--------------+------+-------+---------+---------+
| event_day    | DATETIME     | No   | true  | NULL    |         |
| title        | VARCHAR(600) | No   | true  | NULL    |         |
| report_value | VARCHAR(50)  | No   | false | NULL    | REPLACE |
+--------------+--------------+------+-------+---------+---------+
```
2. Execute the query
```
SELECT
    *
FROM
    (
        SELECT
            DISTINCT event_day,
            title
        FROM
            click_show_window
    ) a
WHERE
    a.title IS NOT NULL
```
3. See error

```
ERROR 1064 (HY000): errCode = 2, detailMessage = Unexpected exception: null
```

This is because DISTINCT generates grouping exprs in agginfo, but this query does not have a group by clause.
2020-08-11 11:07:51 +08:00
a480dec7a4 Do not wrap NULL type tuple (#4245)
Do not wrap a NULL type expr in IF(TupleIsNull(tids), NULL, expr)
2020-08-11 09:38:42 +08:00
6abb374d0c Fix duplicate table export fail (#4293) 2020-08-11 09:37:43 +08:00
4ad943e45d [Feature][Cache] Cache proxy and coordinator #2581 (#4248)
* [Feature][Cache] Cache proxy and coordinator #2581
1. Cache's abstract proxy class and BE's Cache implementation
2. Cache coordinator implemented by consistent hashing

* Adjusted the code formatting, naming and variables according to the review comments
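
For readers unfamiliar with the technique, a coordinator based on consistent hashing can be sketched as below. This is a generic illustration, not the Doris implementation: the class name, the use of SHA-256, and the number of virtual nodes are all assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Minimal consistent-hash ring that maps cache keys to backend nodes."""

    def __init__(self, nodes, virtual_nodes=16):
        # Each node is placed on the ring several times (virtual nodes)
        # so that keys are spread more evenly.
        ring = []
        for node in nodes:
            for i in range(virtual_nodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's hash position."""
        if not self._ring:
            raise ValueError("the ring has no nodes")
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The appeal of this design is that removing one node only remaps the keys that node owned; keys owned by the remaining nodes keep their placement, so most cached results stay reachable.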
2020-08-10 16:40:25 +08:00
411ced5715 Secure singleton mode (#4257)
Co-authored-by: wangxixu <wangxixu@xiaomi.com>
2020-08-10 11:26:56 +08:00
f516172f23 Fix window function with limit zero bug 2 (#4235) 2020-08-10 10:29:05 +08:00
47fff6841b [Bug][ColocateJoin] Fix bug of #4287 and #4285 of Colocatejoin (#4289)
1. A table joined with itself must have the same single partition for the colocate join to be valid.
2. Check the column order of eqjoinConjuncts to validate the colocate join.
2020-08-09 20:48:36 +08:00
a54b0eab0c [Bug]fix cancel query bug (#4275)
ConnectContext.kill() uses the executor to cancel the query, but the executor has never been set.
2020-08-08 20:29:32 +08:00
d5909ae503 [MaterializedView]Change the type of slot when mv is selected (#4272)
The column types of the materialized view and the base table are different.
When the mv is selected in the query plan, the type of the slot should be changed to the mv column type.
For example:
base table: k1 int, k2 int
mv table: k1 int, k2 bigint sum
The k2 type of slot ref should be changed from int to bigint.
Closed. #4271
2020-08-08 20:29:07 +08:00
eefad13107 [Feature] Support InPredicate in delete statement (#4006)
This PR adds InPredicate support to the delete statement
and adds the max_allowed_in_element_num_of_delete variable to
limit the number of elements of an InPredicate in a delete statement.
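
The limit can be pictured as a simple length check before the delete is planned. The sketch below is illustrative only: the function name is hypothetical, and the default of 1024 is an assumption rather than a value stated in this commit.

```python
def check_delete_in_predicate(values, max_allowed_in_element_num_of_delete=1024):
    """Reject an IN list in a delete statement that exceeds the configured limit.

    The default limit of 1024 is an assumption made for this sketch.
    """
    values = list(values)
    if len(values) > max_allowed_in_element_num_of_delete:
        raise ValueError(
            f"IN predicate has {len(values)} elements, "
            f"limit is {max_allowed_in_element_num_of_delete}"
        )
    return values
```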
2020-08-06 23:19:40 +08:00
4c05eddc10 [SQL] Support approx_count_distinct rewrite to hll union in mv rewriter (#4239)
The new function approx_count_distinct is the alias of function ndv.
So Doris also needs to rewrite approx_count_distinct to the hll function when it is possible to match the hll materialized view.
2020-08-06 23:16:15 +08:00
237c0807a4 [RoutineLoad] Support modify routine load job (#4158)
Support ALTER ROUTINE LOAD JOB stmt, for example:

```
alter routine load db1.label1
properties
(
"desired_concurrent_number"="3",
"max_batch_interval" = "5",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false",
"timezone" = "+08:00"
)
```

Details can be found in `alter-routine-load.md`
2020-08-06 23:11:02 +08:00
c98b411500 [Bug] Revert part of #4199 to avoid BE crash(#4269)
Revert “Change type of sum, min, max function column in mv”

This PR reverts PR #4199.
The daily test core dumps when the type of a mv column has been changed,
so I reverted the PR.
The daily core will be fixed in the future. After that, PR #4199 will be re-enabled.

Change-Id: Ie04fcfacfcd38480121addc5e454093d4ae75181
2020-08-06 19:06:00 +08:00
173bc09833 [Alter]Analyze define expr before replay Rollup job (#4236)
The define expr should be analyzed after replay RollupJob.
The slot desc of the define expr is used to transform it to thrift and send it to the backend.
2020-08-05 21:47:18 +08:00
a4f3d43e15 fix version check bug (#4244)
Co-authored-by: gengjun <gengjun@dorisdb.com>
2020-08-05 21:45:36 +08:00
1b341601fe Generate java files using maven (#4133)
Generate the generated-java files using maven instead of build.sh
2020-08-05 15:20:39 +08:00
5caa347e86 [ColocateJoin] ColocateJoin support table join itself (#4230) (#4231)
If the left table and the right table are the same table, they are naturally colocated.
2020-08-02 22:05:45 +08:00
85e0a68783 [SQL][Bug] Fix multi predicate in correlation subquery analyze fail (#4211) 2020-08-02 22:05:23 +08:00
d64d65322b [Bug][DynamicPartition]Fix bug that modifying a dynamic partition property in a non-dynamic partition table will throw an Exception (#4127) 2020-08-02 22:03:57 +08:00
bdaef84a10 [FE] [HttpServer] Config netty param in HttpServer (#4225)
Currently, if a URL is longer than 4096 bytes, netty refuses the request.
This can be reproduced by constructing a very long URL (longer than 4096 bytes).

Add 2 http server params:
1. http_max_line_length
2. http_max_header_size
2020-08-01 17:59:01 +08:00
116d7ffa3c [SQL][Function] Add approx_count_distinct() function (#4221)
Add approx_count_distinct() function to replace the ndv() function
2020-08-01 17:54:19 +08:00
c32ddce0b5 [SQL][BUG]Fix window function with limit zero bug (#4207) 2020-08-01 17:43:47 +08:00
25f3420855 [MaterializedView] Change type of sum, min, max function column in mv (#4199)
If the agg function is sum, the type of mv column will be bigint.
The only exception is that if the base column is largeint, the type of mv column will be largeint.

If the agg function is min or max, the type of the mv column will be the same as the type of the base column.
For example, if the base column is smallint and the agg function is min, the type of the mv column is smallint.
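
The type rule above can be written as a small mapping function. This sketch follows only the cases spelled out in the message; `mv_column_type` is a hypothetical name, and base types the message does not mention (for example floating-point columns under sum) are simply treated like any other non-largeint type here.

```python
def mv_column_type(agg_function: str, base_type: str) -> str:
    """Return the mv column type for sum/min/max as described in the message."""
    agg = agg_function.lower()
    base = base_type.lower()
    if agg == "sum":
        # sum widens to bigint, except that largeint stays largeint.
        return "largeint" if base == "largeint" else "bigint"
    if agg in ("min", "max"):
        # min and max keep the type of the base column.
        return base
    raise ValueError(f"unsupported agg function for this sketch: {agg_function}")
```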
2020-08-01 17:43:23 +08:00
f412f99511 [Bug][ColocateJoin] Make a wrong choice of colocate join (#4216)
Suppose table1 and table2 are colocated using columns k1 and k2.
A query must join on all of k1 and k2 for the colocation algorithm to apply.
A query like select * from table1 inner join table2 where t1.k1 = t2.k1 cannot be executed as a colocate join.
We add this rule to avoid the problem.
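
The rule can be sketched as a coverage check over the equality join conditions. This is a simplified illustration that assumes both tables use the same column names for distribution; `can_colocate` and its arguments are hypothetical and do not mirror the FE planner code.

```python
def can_colocate(distribution_columns, eq_join_pairs):
    """Return True only if every distribution column appears in an equality join.

    eq_join_pairs is a list of (left_column, right_column) tuples taken from
    conditions such as t1.k1 = t2.k1.
    """
    joined = {left for left, right in eq_join_pairs if left == right}
    return all(col in joined for col in distribution_columns)
```

For the example in the message, joining only on k1 yields False, while joining on both k1 and k2 yields True.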
2020-07-31 15:18:00 +08:00
1ebd156b99 [Feature]Add fetch/update/clear proto of fe&be for cache (#4190) 2020-07-31 13:23:24 +08:00
b4cb8fb9b2 [Feature][Cache]Add interface, metric, variable and config for query cache (#4159) 2020-07-30 11:24:20 +08:00
fdcc223ad2 [Bug][Json] Refactor the json load logic to fix some bugs
1. Add `json_root` for nested json data.
2. Remove `_jmap` to make the logic more reasonable.
2020-07-30 10:36:34 +08:00
237271c764 [Bug] Fix fe meta version problem, make drop meta check code easy to read and add doc content for drop meta check (#4205)
This PR mainly does three things:
1. Fix the FE meta version bug introduced by #4029 when resolving the conflict with #4086
2. Make the drop check code easier to read
3. Add doc content for the drop meta check
2020-07-30 09:54:20 +08:00
8a169981cf [Bug][TabletRepair] Fix bug that too many replicas generated when decommission BE (#4148)
Try to select a BE with an existing replica as the destination BE for the
REPLICA_RELOCATING clone task.
Fix #4147

Also add 2 new FE configs `max_clone_task_timeout_sec` and `min_clone_task_timeout_sec`
2020-07-30 09:46:33 +08:00
abeb25d2a9 Fix large int literal (#4168) 2020-07-30 00:53:50 +08:00