Commit Graph

184 Commits

Author SHA1 Message Date
ef6cd9ae25 Add files to gitignore (#2753) 2020-01-14 22:29:56 +08:00
273edced77 Replace PowerMock/EasyMock by Jmockit (1/3) (#2732)
This commit replaces the PowerMock/EasyMock in our unit tests, But not all.
PS.(The tests relevant to DescribeStmt are ignored until I find a way to fix it)
2020-01-13 21:28:18 +08:00
425b1cf29b Fix port already in use (#2716) 2020-01-09 16:01:17 +08:00
f7cea6dda5 CreateViewStmt/AlterViewStmt support cte and fix bug (#2641)
This commit contains the following changes:
1. Let create/alter view statement support cte sql. (Issue #2625 )

e.g.
```
Alter view test_tbl_view (h1, h2)
as
with testTbl_cte (w1, w2) as 
(
    select col1, col2 from testDb.testTbl
)
select w1 as c1, sum(w2) as c2 from testTbl_cte 
where w1 > 10 
group by w1 order by w1
```

2. Fix the bug that view's schema remains unchanged after replaying alter view. (Issue #2624 )
2020-01-08 23:11:38 +08:00
d4a3b34319 [Meta Serialization] Support GSON serialization for class "Type" (#2709)
"Type" is a abstract class, it has 4 sub classes:
1. ScalarType
2. ArrayType
3. MapType
4. StructType

This CL only support ScalarType. Other types can be added later.
2020-01-08 19:56:56 +08:00
af9529a207 [Dynamic Partition] Support for automatically adding partitions
In some scenarios, when a user creates an olap table that is range partition by time, the user needs to periodically add and remove partitions to ensure that the data is valid. As a result, adding and removing partitions dynamically can be very useful for users.
2020-01-03 23:45:04 +08:00
42dfe1369b Add filter conditions for 'show partitions from table' syntax (#2553)
Add filter conditions for show partitions from table syntax, to filter partitions needed
2020-01-03 19:52:25 +08:00
c098178f7a [Index] Implements create drop show index syntax for bitmap index [#2487] (#2573)
### create table with index 
```
CREATE TABLE table1
(
    siteid INT DEFAULT '10',
    citycode SMALLINT,
    username VARCHAR(32) DEFAULT '',
    pv BIGINT SUM DEFAULT '0',
    INDEX index_name [USING BITMAP] (siteid, citycode) COMMENT 'balabala'
)
AGGREGATE KEY(siteid, citycode, username)
DISTRIBUTED BY HASH(siteid) BUCKETS 10
PROPERTIES("replication_num" = "1");
```
### create index 
```
CREATE INDEX index_name  ON table1 (siteid, citycod) [USING BITMAP] COMMENT 'balabala';
or 
ALTER TABLE table1 ADD  INDEX index_name  [USING BITMAP] (siteid, citycod) COMMENT 'balabala';
```
### drop index
```
DROP INDEX index_name ON table1;
or
ALTER TABLE table1 DROP INDEX index_name 
```

### show index
```
SHOW INDEX[ES] FROM table1
```
output
```
+---------+-------------+-----------------+------------+---------+
| Table   | Index_name  | Column_name     | Index_type | Comment |
+---------+-------------+-----------------+------------+---------+
| table1  | index_name  | siteid,citycode | BITMAMP    | balabala|
+---------+-------------+-----------------+------------+---------+
```
2020-01-03 17:41:26 +08:00
1113f951c3 Alter view stmt (#2522)
This commit adds a new statement named alter view, like
ALTER VIEW view_name
(
col_1,
col_2,
col_3,
)
AS SELECT k1, k2, SUM(v1) FROM exampleDb.testTbl GROUP BY k1,k2
2019-12-27 14:02:56 +08:00
6f3c50a95c [Document] Add example for using CTE in INSERT operation (#2572) 2019-12-26 10:00:34 +08:00
11b78008cd Timezone variable support one digital time (#2513)
Support time zone variable like "-8:00","+8:00","8:00"
Time zone variable like "-8:00" is illegal in time-zone ID ,so we mush transfer it to standard format
2019-12-20 07:45:29 +08:00
b1bac4d0cd Support to create materialized view (#2431)
Support to create materialized view

This commit support to create materiliazed view.
The syntax of stmt is following:
CREATE Materialized View [MV name] AS
  SELECT select_expr[, select_expr ...]
  FROM [Base table name]
  GROUP BY column_name[, column_name ...]
  ORDER BY column_name[, column_name ...]

The CreateMaterializedViewClause is used to check the semantic of stmt in the first step.
Now, the where, having, limit clause is forbidden in CREATE MATERIALIZED VIEW.
Also the aggregation function is restricted in SUM/MIN/MAX.

The second step is to validate stmt according to metadata of base table.
For example, the aggregate type of mv column must be same as the aggregate type of base column in aggregate table.

The last step is to prepare index of mv and add this new mvJob in Handler.
The handler will asynchronous process this new mvJob.
2019-12-17 21:12:24 +08:00
e65a645138 Add classes related to "tag". (#2343)
[Tag System] 
This CL includes 2 parts:

    Add classes related to "tag"
        Resource: is the collective name of the nodes that provide various service capabilities in Doris cluster.
        Tag: A Tag consists of type and name.
        TagSet: TagSet represents a set of tags.
        TagManager: maintains 2 indexes:
        one is from tag to resource.
        one is from resource to tags

    ISSUE #1723

    Using JSON as serialization methods of metadata

    Introduce GSON library to serialize the new classes mentioned above.

    ISSUE #2415 #2389

GSON's version is updated to 2.8.6
2019-12-15 20:13:29 +08:00
a17b28ccc1 Modify FE QueryPlan UT test failure by accident (#2455) 2019-12-13 21:28:54 +08:00
8ba3c9d777 [Tag System] Forbid cluster related operations (#2429)
The multi cluster feature will be deprecated soon.
Add a FE config "disable_cluster_feature", and default is true, to
forbid any cluster related operations, include:

    * create/drop cluster
    * add free backend/add backend to cluster/decommission cluster balance
    * change the backends num of cluster
    * link/migration db

* fix ut
2019-12-13 10:11:30 +08:00
59f5851c29 Fix bug for show tables from unknown database doesn't throw error (#2445) 2019-12-12 23:18:52 +08:00
3af03d6283 Fix sql mode Bug (#2374)
This commit fixs the bug below,

FE throws a unexpected exception when encounter a query like :
Set sql_mode = '0,PIPES_AS_CONCAT'.

and make some change to sql mode analyze process, now the analyze process is no longer put in SetVar.class, but in VariableMgr.class.
2019-12-12 17:50:35 +08:00
c39d35df4c Add tablet compaction score metrics (#2427)
[Metric] Add tablet compaction score metrics

Backend:
    Add metric "tablet_max_compaction_score" to monitor the current max compaction
    score of tablets on this Backend. This metric will be updated each time
    the compaction thread picking tablets to compact.

Frontend:
    Add metric "tablet_max_compaction_score" for each Backend. These metrics will
    be updated when backends report tablet.
    And also add a calculated metric "max_tablet_compaction_core" to monitor the
    max compaction core of tablets on all Backends.
2019-12-12 17:46:59 +08:00
7f2144e7e5 Upgrade JMockit from version 1.13 to 1.48 (#2423) 2019-12-12 12:03:17 +08:00
8e6535053c [Tag System] Remove the 'isRestore' flag when creating table or partition (#2363)
'isRestore' flag is for the old version of backup and restore process,
which is deprecated long time ago. Remove it.

This commit is also for making a further step to  ISSUE #1723.
2019-12-10 16:37:44 +08:00
a3b7cf484b Set the load channel's timeout to be the same as the load job's timeout (#2405)
[Load] 

When performing a long-time load job, the following errors may occur. Causes the load to fail.

load channel manager add batch with unknown load id: xxx

There is a case of this error because Doris opened an unrelated channel during the load
process. This channel will not receive any data during the entire load process. Therefore,
after a fixed timeout, the channel will be released.

And after the entire load job is completed, it will try to close all open channels. When it try to
close this channel, it will find that the channel no longer exists and an error is reported.

This CL will pass the timeout of load job to the load channel, so that the timeout of load channels
will be same as load job's.
2019-12-06 21:51:00 +08:00
55d64e3be8 Remove the readFields() method in Writable interface (#2394)
All classes that implement the Wriable interface need only implement the write() method.
The read() method should be implemented by itself according to the situation of different
classes.
2019-12-06 21:46:21 +08:00
a46bf1ada3 [Authorization] Modify the authorization checking logic (#2372)
**Authorization checking logic**

There are some problems with the current password and permission checking logic. For example:
First, we create a user by:
`create user cmy@"%" identified by "12345";`

And then 'cmy' can login with password '12345' from any hosts.

Second, we create another user by:
`create user cmy@"192.168.%" identified by "abcde";`

Because "192.168.%" has a higher priority in the permission table than "%". So when "cmy" try
to login in by password "12345" from host "192.168.1.1", it should match the second permission
entry, and will be rejected because of invalid password.
But in current implementation, Doris will continue to check password on first entry, than let it pass. So we should change it.

**Permission checking logic**

After a user login, it should has a unique identity which is got from permission table. For example,
when "cmy" from host "192.168.1.1" login, it's identity should be `cmy@"192.168.%"`. And Doris
should use this identity to check other permission, not by using the user's real identity, which is
`cmy@"192.168.1.1"`.

**Black list**
Functionally speaking, Doris only support adding WHITE LIST, which is to allow user to login from
those hosts in the white list. But is some cases, we do need a BLACK LIST function.
Fortunately, by changing the logic described above, we can simulate the effect of the BLACK LIST.

For example, First we add a user by:
`create user cmy@'%' identified by '12345';`

And now user 'cmy' can login from any hosts. and if we don't want 'cmy' to login from host A, we
can add a new user by:
`create user cmy@'A' identified by 'other_passwd';`

Because "A" has a higher priority in the permission table than "%". If 'cmy' try to login from A using password '12345', it will be rejected.
2019-12-06 17:45:56 +08:00
9fbc1c7ee6 Support where/orderby/limit after “SHOW ALTER TABLE COLUMN“ syntax (#2380)
Features:
1、Support WHERE/ORDER BY/LIMIT
2、Columns:TableName、CreatTime、FinishTime、State
3、Only “And” between conditions
4、TableName and State column only support "=" operator
5、CreateTime and FinishTime column support “=”,“>=”,"<=",">","<","!=" operators
6、CreateTime and FinishTime column support Date and DateTime string, eg:"2019-12-04" or "2019-12-04 17:18:00"

TestCase:
MySQL [haibotest]> show alter table column where State='FINISHED' and CreateTime > '2019-12-03' order by FinishTime desc limit 0,2;
+-------+---------------+---------------------+---------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
| JobId | TableName | CreateTime | FinishTime | IndexName | IndexId | OriginIndexId | SchemaVersion | TransactionId | State | Msg | Progress | Timeout |
+-------+---------------+---------------------+---------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
| 11134 | test_schema_2 | 2019-12-03 19:21:42 | 2019-12-03 19:22:11 | test_schema_2 | 11135 | 11059 | 1:192010000 | 3 | FINISHED | | N/A | 86400 |
| 11096 | test_schema_3 | 2019-12-03 19:21:31 | 2019-12-03 19:21:51 | test_schema_3 | 11097 | 11018 | 1:2063361382 | 2 | FINISHED | | N/A | 86400 |
+-------+---------------+---------------------+---------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
2 rows in set (0.00 sec)
2019-12-06 16:24:44 +08:00
597a8b2146 Revert "Fix arithmetic operation between numeric and non-numeric (#2362)" (#2398)
This reverts commit 6857ffe1c5976ef06003aa479279368bafc581f1.
2019-12-06 14:58:38 +08:00
6857ffe1c5 Fix arithmetic operation between numeric and non-numeric (#2362)
fix arithmetic operation between numeric and non-numeric will cause unexpected value.
After this patch you will get
mysql> select 1 +  "kks";
+-----------+
| 1 + 'kks' |
+-----------+
|         1 |
+-----------+
1 row in set (0.02 sec)

mysql> select 1 -  "kks";
+-----------+
| 1 - 'kks' |
+-----------+
|         1 |
+-----------+
1 row in set (0.01 sec)
2019-12-06 10:33:06 +08:00
27d6794b81 Support subquery with non-scalar result in Binary predicate and Between-and predicate (#2360)
This commit add a new plan node named AssertNumRowsNode
which is used to determine whether the number of rows exceeds the limit.
The subquery in Binary predicate and Between-and predicate should be added a AssertNumRowsNode
which is used to determine whether the number of rows in subquery is more than 1.
If the number of rows in subquery is more than 1, the query will be cancelled.

For example:
There are 4 rows in table t1.
Query: select c1 from t1 where c1=(select c2 from t1);
Result: ERROR 1064 (HY000): Expected no more than 1 to be returned by expression select c2 from t1

ISSUE-2270
TPC-DS 6,54,58
2019-12-05 21:27:33 +08:00
4f39d405ee Fix some load bugs (#2384)
For #2383 
1. Limit the concurrent transactions of routine load job
2. Create new routine load task when txn is VISIBLE, not after COMMITTED.

For #2267 
1. All non-master daemon thread should also be started after catalog is ready.

For #2354 
1. `fixLoadJobMetaError()` should be called after all meta data is read, including image and edit logs.
2. Mini load job should set to CANCELLED when corresponding transaction is not found, instead
of UNKNOWN.
2019-12-05 13:41:04 +08:00
c8cff85c94 Fixed a bug that HttpServer in unit test does not start correctly. (#2361)
Because the http client in unit test try to connect to the server when
server is not ready yet.
2019-12-03 20:34:16 +08:00
086bb82fd2 Fixed a bug that Load job's state is incorrect when upgrading from 0.10.x to 0.11.x (#2356)
There is bug in Doris version 0.10.x. When a load job in PENDING or LOADING
state was replayed from image (not through the edit log), we forgot to add
the corresponding callback id in the CallbackFactory. As a result, the
subsequent finish txn edit logs cannot properly finish the job during the
replay process. This results in that when the FE restarts, these load jobs
that should have been completed are re-entered into the pending state,
resulting in repeated submission load tasks.

Those wrong images are unrecoverable, so that we have to reset all load jobs
in PENDING or LOADING state when restarting FE, depends on its corresponding
txn's status, to avoid submit jobs repeatedly.

If corresponding txn exist, set load job' state depends on txn's status.
If txn does not exist, may be the txn has been removed due to label expiration.
So that we don't know the txn is aborted or visible. So we have to set the job's state
as UNKNOWN, which need handle it manually.
2019-12-03 16:02:50 +08:00
875790eb13 Remove VersionHash used to comparation in Fe (#2335) 2019-12-02 19:59:13 +08:00
a2d7c42042 Add a variable to specifically limit the memory usage of the load part in the insert operation (#2305)
This variable is mainly for INSERT operation, because INSERT operation has both query and load part.
Using only the exec_mem_limit variable does not make a good distinction of memory limit between the two parts.
2019-11-28 13:03:11 +08:00
d5aeb9a6b7 Add document for session variables. (#2284)
Also make the variable effective in current session when setting it globally.
2019-11-24 22:47:05 +08:00
46181c0880 Fix some bugs about load label (#2241) 2019-11-23 00:04:45 +08:00
79ff0ad2a4 Add pipes_as_concat_mode (#2252)
This commit will add a new sql mode named MODE_PIPES_AS_CONCAT:
Description:
1、If this mode is active, '||' will be handled different from the original way ('||' and 'or' are seen as the same symbols in Doris) that it can be used to concat two exps and returns a new string. For example, 'a' || 'b' = 'ab' and 1 || 0 = '10'.
2. User can active this mode by "SET sql_mode = PIPES_AS_CONCAT", and deactive it by "SET sql_mode = '' ".
2019-11-22 15:01:53 +08:00
297542bd3f Delay start master only daemon threads (#2268)
These daemon thread should be started after catalog is ready,
otherwise it may cause some undefined behavior.
2019-11-22 14:39:37 +08:00
4fb498a1dc fix unit test failure for show columns from unknown table (#2261) 2019-11-21 21:38:36 +08:00
9b5eeaec19 Fix bug that DeployManager should start working after catalog is ready. (#2244)
Otherwise, it can not get master ip/port from not-ready catalog.
2019-11-20 09:49:09 +08:00
48d9318d07 Support date_add function to support partition prune (#2154)
Currently in the date_add/date_sub functions (DATE_ADD(DATETIME date,INTERVAL expr type)), the expr parameter is the interval you want to add.
Doris will convert these functions to xxx_sub/xxx_add. However, there is only the days_add function in fe, which causes other date_add formats, such as select date_add('2010-11-30 23:59:59', INTERVAL 2 DAY), cannot be pruned.

So I've added other functions to support fe partition prune
2019-11-08 18:57:21 +08:00
af79485eb2 Ignore --helper start argument if not first time to start FE (#2159) 2019-11-08 08:48:11 +08:00
89dc461f91 Fix UT and remove unused code (#2160) 2019-11-08 08:47:48 +08:00
5a4908e99a Forward stmt with stmt id generated on origin FE. (#2129)
Some stmt, such as DDL and DML stmt will be forwarded from non-master FE
to Master FE. But these stmt will be logged in non-master FE's audit log
with its origin stmt id generated on non-master FE.

So we should also pass this origin stmt id to Master, so that we can track
this stmt's execution process more easily.
2019-11-07 10:28:15 +08:00
65c3b0907a Support aggregation type of REPLACE_IF_NOT_NULL (#2127)
Some use has the requirment that only some of columns will be update in
one load operation, and others will retain as original. However, Doris
can't handle this situation, because user must specify value for all
columns. Then if a column aggregation method is REPLACE, use must query
original value to overwrite it. This often needs some work for user to
do.

If this CL is applied, user can use REPLACE_IF_NOT_NULL instead of
REPLACE. Then when load data to table, if user don't intent to change
value of this column, user can specify NULL for this column. Doris will
retain original value for this column.
2019-11-05 18:08:34 +08:00
ac5dd0c9f2 Support sql mode (#2083)
At present, we do not support SQL MODE which is similar to MySQL. In MySQL, SQL MODE is stored in global session and session with a 64 bit address,and every bit 0 or 1 on this address represents a mode state. Besides, MySQL supports combine mode which is composed of several modes.

We should support SQL MODE to deal with sql dialect problems. We can heuristically use the MySQL way to store SQL MODE in session and parse it into string when we need to return it back to client.

This commit suggests a solution to support SQL MODE. But it's just a sample, and the mode types in SqlModeHelper.java are not really meaningful from now on.
2019-11-01 23:21:00 +08:00
45df6aae08 Fix some routine load bugs (#2093)
Mainly fix the following issues:

1. A null pointer exception is raised when a database or table is dropped. The expected behavior is that the routine load job is stopped.

2. Memory leaks. Batch routine load task submissions are no longer performed, and modifications are submitted separately for each task.

3. Unreasonable task timeout.
    Routine load tasks should not be queued in the BE thread pool for execution. The task sent to the BE should be executed immediately, otherwise the task in the FE will be timeout first. Eventually leads to constant timeout for all subsequent tasks.

4. All routine load job should be scheduled once it being submitted. Not waiting the available BE slot. Otherwise, all later submitted jobs may not be scheduled forever.
2019-10-31 21:53:03 +08:00
5e8c96f28b Optimize FE start logic (#2052) 2019-10-31 11:11:50 +08:00
3c12af4dcc Limit the memory consumption of broker scan node (#1996)
If memory exceed limit, no more row batch will be pushed to batch queue
2019-10-17 14:40:16 +08:00
ac16318c9b [Bug-fix][Broker-load] Fix the bug of the label already exists when the txn has been finished (#1992)
If FE is restarted between txn committed and visible, the load job will be rescheduled and failed with label already exists.
The reason is that there are inconsistency between transaction of load job and meta of load job.
So, the replay of the txn attachment need to be done in function replayOnCommitted.
The load job state and progress is correct after that.
2019-10-16 16:35:18 +08:00
41e55cfca9 Modify fixed partition feature (#1989)
1. Not support MAVALUE in multi partition column.
2. Fix the incorrect show create table stmt.
2019-10-16 16:03:46 +08:00
ec7c8a2c6f Support adding fixed range partition
eg: ALTER TABLE test_table ADD PARTITION p0125 VALUES [("20190125"), ("20190126"));
2019-10-15 09:50:30 +08:00