Commit Graph

70 Commits

Author SHA1 Message Date
fcb651329c [Plugin] Making FE audit module pluggable (#3219)
Currently we have implemented the plugin framework in FE. 
This CL make the original audit log logic pluggable.
The following classes are mainly implemented:

1. AuditPlugin
    The interface of audit plugin

2. AuditEvent
    An AuditEvent contains all information about an audit event, such as a query, or a connection.

3. AuditEventProcessor
    Audit event processor receive all audit events and deliver them to all installed audit plugins.

This CL implements two audit module plugins:

1. The builtin plugin `AuditLogBuilder`, which act same as the previous logic, to save the 
    audit log to the `fe.audit.log`

2. An optional plugin `AuditLoader`, which will periodically inserts the audit log into a Doris table
    specified by the user. In this way, users can conveniently use SQL to query and analyze this
    audit log table.

Some documents are added:

1. HELP docs of install/uninstall/show plugin.
2. Rename the `README.md` in `fe_plugins/` dir to `plugin-development-manual.md` and move
    it to the `docs/` dir
3. `audit-plugin.md` to introduce the usage of `AuditLoader` plugin.

ISSUE: #3226
2020-04-03 09:53:50 +08:00
29b37dad49 Sql reference of materialized view (#3208)
* Sql reference of materialized view

Sql reference of Create and drop materialized view in English and Chinese.

* Change discription
2020-04-01 21:22:19 +08:00
aa8b2f86c4 [Bug][Refactor] Fix the conflict of temp partition and dynamic partition operations (#3201)
The bug is described in issue: #3200.

This CL solve the problem by:
1. Refactor the alter operation conflict checking logic by introducing new classes `AlterOperations` and `AlterOpType`.
2. Allow add/drop temporary partition when dynamic partition feature is enabled.
3. Allow modifying table's property when there is temporary partition in table.
4. Make the properties `dynamic_partition.enable` optional, and default is true.
2020-03-27 20:25:15 +08:00
c1969a3fb3 [Conf] Make default_storage_medium configurable (#2980)
Doris support choose medium when create table, and the cluster balance strategy is dependent
between different storage medium, and most use will not specify the storage medium when create table,
even they kown that they should choose a storage medium, they have no idea about the
cluster's storage medium, so, I think we should make storage_medium and storage_cooldown_time
configurable, and this should be the admin's responsibility.

For Example, if the cluster's storage medium is HDD, but we need to change part of machines to SSD,
if we change the machine, the tablets before change is stored in HDD and they can't find a dest path
to migrate, and user will create table as usual, it will make all tablets stored in old machines and
the new machines will only store a little tablets. Without this config the only way is admin need
to traverse all partitions in cluster and change the property of storage_medium, it will increase
operational and maintenance costs.

So I add a FE config default_storage_medium, so that user can set the default storage medium.
2020-03-27 20:22:18 +08:00
d4c1938b5c Open datetime min value limit (#3158)
the min_value in olap/type.h of datetime is 0000-01-01 00:00:00, so we don't need restrict datetime min in tablet_sink
2020-03-24 10:52:57 +08:00
14c088161c [New Stmt] Support setting replica status manually (#1522)
Sometimes a replica is broken on BE, but FE does not notice that.
In this case, we have to manually delete that replica on BE.
If there are hundreds of replicas need to be handled, this is a disaster.

So I add a new stmt:

    ADMIN SET REPLICA STATUS

which support setting tablet on specified BE as BAD or OK.
2020-03-16 13:42:30 +08:00
cf219ddf18 [ConsistencyCheck] Support checking replica consistency of tablet manually (#3067) 2020-03-10 15:25:25 +08:00
cc1a5fb8ea [Function] Support '%' in date format string (#3037)
eg:
select str_to_date('2014-12-21 12%3A34%3A56', '%Y-%m-%d %H%%3A%i%%3A%s');
select unix_timestamp('2007-11-30 10:30%3A19', '%Y-%m-%d %H:%i%%3A%s');

This also enable us to extract column fields from HDFS file path with contains '%'.
2020-03-05 08:56:02 +08:00
078e35a62e Support Amazon S3 data source in Broker Load (#3004) 2020-03-01 12:53:50 +08:00
cc0d41277c [Alter] Add more schema change to varchar type (#2777) 2020-02-19 23:14:43 +08:00
625411bd28 Doris support in memory olap table (#2847) 2020-02-18 10:45:54 +08:00
59dd2a0d7f Fix some typo in bitmap index docs. (#2879) 2020-02-11 19:03:42 +08:00
1f001481ae Support batch add and drop rollup indexes #2671 (#2781) 2020-02-11 12:58:01 +08:00
58ff952837 [Stmt] Support new show functions syntax to make user search function more conveniently (#2800)
SHOW [FULL] [BUILTIN] FUNCTIONS [IN|FROM db] [LIKE 'function_pattern'];
2020-01-20 14:14:42 +08:00
fc55423032 [SQL] Support Grouping Sets, Rollup and Cube to extend group by statement
Support Grouping Sets, Rollup and Cube to extend group by statement
support GROUPING SETS syntax 
```
SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a), (b), ( ) );
```
cube  or rollup like 
```
SELECT a, b,c, SUM( d ) FROM tab1 GROUP BY ROLLUP|CUBE(a,b,c)
```

[ADD] support grouping functions in expr like grouping(a) + grouping(b) (#2039)
[FIX] fix analyzer error in window function(#2039)
2020-01-17 16:24:02 +08:00
3b24287251 Support 64 bits integers for BITMAP type (#2772)
Fixes #2771 

Main changes in this CL
* RoaringBitmap is renamed to BitmapValue and moved into bitmap_value.h
* leveraging Roaring64Map to support unsigned BIGINT for BITMAP type
* introduces two new format (SINGLE64 and BITMAP64) for BITMAP type

So far we have three storage format for BITMAP type

```
EMPTY := TypeCode(0x00)
SINGLE32 := TypeCode(0x01), UInt32LittleEndian
BITMAP32 := TypeCode(0x02), RoaringBitmap(defined by https://github.com/RoaringBitmap/RoaringFormatSpec/)
```

In order to support BIGINT element and keep backward compatibility, introduce two new format

```
SINGLE64 := TypeCode(0x03), UInt64LittleEndian
BITMAP64 := TypeCode(0x04), CustomRoaringBitmap64
```

Please note that SINGLE64/BITMAP64 doesn't replace SINGLE32/BITMAP32. Doris will choose the smaller (in terms of space) type automatically during serializing. For example, BITMAP32 is preferred over BITMAP64 when the maximum element is <= UINT32_MAX. This will also make BE rollback possible as long as user didn't write element larger than UINT32_MAX into bitmap column.

Another important design decision is that we fork and maintain our own version of Roaring64Map instead of using the one in "roaring/roaring64map.hh". The reasons are

1. RoaringBitmap doesn't define a standard for the binary format of 64-bits bitmap. As a result, different implementations of Roaring64Map use different format. For example the [C++ version](https://github.com/RoaringBitmap/CRoaring/blob/v0.2.60/cpp/roaring64map.hh#L545) is different from the [Java version](35104c564e/src/main/java/org/roaringbitmap/longlong/Roaring64NavigableMap.java (L1097)). Even for CRoaring, the format may change in future releases. However Doris require the serialized format to be stable across versions. Fork is a safe way to achieve this.
2. We may want to make some code changes to Roaring64Map according to our needs. For example, in order to use the BITMAP32 format when the maximum element can be represented in 32 bits, we may want to access the private member of Roaring64Map. Another example is we want to further customize and optimize the format for BITMAP64 case, such as using vint64 instead of uint64 for map size.
2020-01-17 14:13:38 +08:00
9e54751098 [Snapshot] Modify the prefer snapshot version (#2748)
In this CL, prefer snapshot version in snapshot request is defined
in thrift. So that both FE and BE can use this version value.
2020-01-15 15:10:14 +08:00
e5717efc5a [Insert] Return more info of insert operation (#2718)
Standardize the return results of INSERT operations,
which is convenient for users to use and locate problems.

More details can be found in insert-into-manual.md
2020-01-15 10:39:53 +08:00
a36193dfab Support decimal and timestamp type in orc load (#2759) 2020-01-15 07:40:30 +08:00
ccaa97a5ac Make bitmap functions accept any expression that returns bitmap (#2728)
This CL make bitmap_count, bitmap_union, and bitmap_union_count accept any expression whose return type is bitmap as input so that we can support flexible bitmap expression such as bitmap_count(bitmap_and(to_bitmap(1), to_bitmap(2))).

This CL also create separate documentation for each bitmap UDF to conform with other functions.
2020-01-11 14:02:12 +08:00
6bc54ef3f0 [Document] Add dynamic partition docs (#2711) 2020-01-08 23:08:48 +08:00
42dfe1369b Add filter conditions for 'show partitions from table' syntax (#2553)
Add filter conditions for show partitions from table syntax, to filter partitions needed
2020-01-03 19:52:25 +08:00
c098178f7a [Index] Implements create drop show index syntax for bitmap index [#2487] (#2573)
### create table with index 
```
CREATE TABLE table1
(
    siteid INT DEFAULT '10',
    citycode SMALLINT,
    username VARCHAR(32) DEFAULT '',
    pv BIGINT SUM DEFAULT '0',
    INDEX index_name [USING BITMAP] (siteid, citycode) COMMENT 'balabala'
)
AGGREGATE KEY(siteid, citycode, username)
DISTRIBUTED BY HASH(siteid) BUCKETS 10
PROPERTIES("replication_num" = "1");
```
### create index 
```
CREATE INDEX index_name  ON table1 (siteid, citycod) [USING BITMAP] COMMENT 'balabala';
or 
ALTER TABLE table1 ADD  INDEX index_name  [USING BITMAP] (siteid, citycod) COMMENT 'balabala';
```
### drop index
```
DROP INDEX index_name ON table1;
or
ALTER TABLE table1 DROP INDEX index_name 
```

### show index
```
SHOW INDEX[ES] FROM table1
```
output
```
+---------+-------------+-----------------+------------+---------+
| Table   | Index_name  | Column_name     | Index_type | Comment |
+---------+-------------+-----------------+------------+---------+
| table1  | index_name  | siteid,citycode | BITMAMP    | balabala|
+---------+-------------+-----------------+------------+---------+
```
2020-01-03 17:41:26 +08:00
3022955f32 Fix recover time in docs (#2588) 2019-12-27 14:51:41 +08:00
1113f951c3 Alter view stmt (#2522)
This commit adds a new statement named alter view, like
ALTER VIEW view_name
(
col_1,
col_2,
col_3,
)
AS SELECT k1, k2, SUM(v1) FROM exampleDb.testTbl GROUP BY k1,k2
2019-12-27 14:02:56 +08:00
f7032b07f3 Support more schema change from VARCHAR type (#2501) 2019-12-26 22:38:53 +08:00
c81b1db406 Support convert VARCHAR type to DATE type (#2489) 2019-12-18 12:58:47 +08:00
89003b774b Support Convert Varchar to INT (#2481) 2019-12-17 22:02:28 +08:00
bf31bd238b Change default storage model from aggregate to duplicate(#2318) (#2412)
change default storage model from aggregate to duplicate
for sql  `create table t (k1 int) DISTRIBUTED BY HASH(k1) BUCKETS 10 PROPERTIES("replication_num" = "1");`
before: 
```
 CREATE TABLE `t` (
  `k1` int(11) NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`k1`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`k1`) BUCKETS 10
PROPERTIES (
"storage_type" = "COLUMN"
);
```
after:
```
CREATE TABLE `t` (
  `k1` int(11) NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`k1`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`k1`) BUCKETS 10
PROPERTIES (
"storage_type" = "COLUMN"
);
```

#2318
2019-12-12 14:30:30 +08:00
5951a0eaea Add more schema change docs (#2411)
Add explanation about converting:

DATE -> DATETIME
DATETIME -> DATE
INT->DATE
2019-12-10 16:46:41 +08:00
a46bf1ada3 [Authorization] Modify the authorization checking logic (#2372)
**Authorization checking logic**

There are some problems with the current password and permission checking logic. For example:
First, we create a user by:
`create user cmy@"%" identified by "12345";`

And then 'cmy' can login with password '12345' from any hosts.

Second, we create another user by:
`create user cmy@"192.168.%" identified by "abcde";`

Because "192.168.%" has a higher priority in the permission table than "%". So when "cmy" try
to login in by password "12345" from host "192.168.1.1", it should match the second permission
entry, and will be rejected because of invalid password.
But in current implementation, Doris will continue to check password on first entry, than let it pass. So we should change it.

**Permission checking logic**

After a user login, it should has a unique identity which is got from permission table. For example,
when "cmy" from host "192.168.1.1" login, it's identity should be `cmy@"192.168.%"`. And Doris
should use this identity to check other permission, not by using the user's real identity, which is
`cmy@"192.168.1.1"`.

**Black list**
Functionally speaking, Doris only support adding WHITE LIST, which is to allow user to login from
those hosts in the white list. But is some cases, we do need a BLACK LIST function.
Fortunately, by changing the logic described above, we can simulate the effect of the BLACK LIST.

For example, First we add a user by:
`create user cmy@'%' identified by '12345';`

And now user 'cmy' can login from any hosts. and if we don't want 'cmy' to login from host A, we
can add a new user by:
`create user cmy@'A' identified by 'other_passwd';`

Because "A" has a higher priority in the permission table than "%". If 'cmy' try to login from A using password '12345', it will be rejected.
2019-12-06 17:45:56 +08:00
9fbc1c7ee6 Support where/orderby/limit after “SHOW ALTER TABLE COLUMN“ syntax (#2380)
Features:
1、Support WHERE/ORDER BY/LIMIT
2、Columns:TableName、CreatTime、FinishTime、State
3、Only “And” between conditions
4、TableName and State column only support "=" operator
5、CreateTime and FinishTime column support “=”,“>=”,"<=",">","<","!=" operators
6、CreateTime and FinishTime column support Date and DateTime string, eg:"2019-12-04" or "2019-12-04 17:18:00"

TestCase:
MySQL [haibotest]> show alter table column where State='FINISHED' and CreateTime > '2019-12-03' order by FinishTime desc limit 0,2;
+-------+---------------+---------------------+---------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
| JobId | TableName | CreateTime | FinishTime | IndexName | IndexId | OriginIndexId | SchemaVersion | TransactionId | State | Msg | Progress | Timeout |
+-------+---------------+---------------------+---------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
| 11134 | test_schema_2 | 2019-12-03 19:21:42 | 2019-12-03 19:22:11 | test_schema_2 | 11135 | 11059 | 1:192010000 | 3 | FINISHED | | N/A | 86400 |
| 11096 | test_schema_3 | 2019-12-03 19:21:31 | 2019-12-03 19:21:51 | test_schema_3 | 11097 | 11018 | 1:2063361382 | 2 | FINISHED | | N/A | 86400 |
+-------+---------------+---------------------+---------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
2 rows in set (0.00 sec)
2019-12-06 16:24:44 +08:00
46181c0880 Fix some bugs about load label (#2241) 2019-11-23 00:04:45 +08:00
d8cfbbedf7 Support bitmap_empty function (#2227) 2019-11-18 20:37:00 +08:00
6759e83a07 Add license header for md files and fix some translation's error (#2137) 2019-11-06 21:35:07 +08:00
65c3b0907a Support aggregation type of REPLACE_IF_NOT_NULL (#2127)
Some use has the requirment that only some of columns will be update in
one load operation, and others will retain as original. However, Doris
can't handle this situation, because user must specify value for all
columns. Then if a column aggregation method is REPLACE, use must query
original value to overwrite it. This often needs some work for user to
do.

If this CL is applied, user can use REPLACE_IF_NOT_NULL instead of
REPLACE. Then when load data to table, if user don't intent to change
value of this column, user can specify NULL for this column. Doris will
retain original value for this column.
2019-11-05 18:08:34 +08:00
95a3b4ccfe Add object type (#1948)
Add a new type: Object. Currently, it's mainly for complex aggregate metrics(HLL , Bitmap).

The Object type has the following constraints:
1 Object type could not as key column type
2 Object type doesn't support all indices (BloomFilter, short key, zone map, invert index)
3 Object type doesn't support filter and group by

In the implementation:

The Object type reuse the StringValue and StringVal, because in storage engine, the Object type is binary, it has a pointer and length.
2019-10-31 21:42:58 +08:00
41e55cfca9 Modify fixed partition feature (#1989)
1. Not support MAVALUE in multi partition column.
2. Fix the incorrect show create table stmt.
2019-10-16 16:03:46 +08:00
63fa260d3f Support prepare/close in UDF (#1985)
The prepare/close step of scalar function is already supported in execution framework, We only need to do is that support it in syntax and meta in frontend.

In addition, 'Hive' binary type of scalar function NOT supports prepare/close step, we need to make it supports.
2019-10-16 07:19:20 +08:00
ec7c8a2c6f Support adding fixed range partition
eg: ALTER TABLE test_table ADD PARTITION p0125 VALUES [("20190125"), ("20190126"));
2019-10-15 09:50:30 +08:00
62acf5d098 Limit the memory usage of Loading process (#1954) 2019-10-15 09:26:20 +08:00
b84ef013eb Fix the mistake for HLL in mini load (#1981)
[Docs] Fix mistakes for HLL column in mini load
2019-10-14 19:46:23 +08:00
ccc236484b Fix bug that failed to add KEY column to DUPLICATE KEY table (#1973) 2019-10-14 16:40:34 +08:00
ec3aa03c45 Add more routine load example (#1902) 2019-09-27 20:42:52 +08:00
40b9c3571b Support hll_empty function (#1825) 2019-09-25 09:28:02 +08:00
e8da855cd2 Support setting timezone for stream load and routine load (#1831) 2019-09-20 07:55:05 +08:00
e70e48c01e Add a ALTER operation to change distribution type from RANDOM to HASH (#1823)
Random distribution is no longer supported since version 0.9.
And we need a way to convert the random distribution to hash distribution.

    ALTER TABLE db.tbl SET ("distribution_type" = "hash");
2019-09-18 14:16:26 +08:00
714dca8699 Support table comment and column comment for view (#1799) 2019-09-18 09:45:28 +08:00
054a3f48bc Add where expr in broker load (#1812)
The where predicate in broker load is responsible for filtering transformed data.
The docs of help and operator has been changed.
2019-09-17 11:32:40 +08:00
9aa2045987 Refactor alter job (#1695) 2019-09-12 16:31:29 +08:00