Commit Graph

2647 Commits

Author SHA1 Message Date
d0f87728e0 [Doc] Add example of timeout property in alter table stmt (#3274) 2020-04-07 19:51:16 +08:00
c9c58342b2 [License] Add License to codes (#3272) 2020-04-07 16:35:13 +08:00
2ed184e06a Add config: tablet writer open rpc timeout (#3258) 2020-04-03 16:43:56 +08:00
fcb651329c [Plugin] Making FE audit module pluggable (#3219)
Currently we have implemented the plugin framework in FE. 
This CL make the original audit log logic pluggable.
The following classes are mainly implemented:

1. AuditPlugin
    The interface of audit plugin

2. AuditEvent
    An AuditEvent contains all information about an audit event, such as a query, or a connection.

3. AuditEventProcessor
    Audit event processor receive all audit events and deliver them to all installed audit plugins.

This CL implements two audit module plugins:

1. The builtin plugin `AuditLogBuilder`, which act same as the previous logic, to save the 
    audit log to the `fe.audit.log`

2. An optional plugin `AuditLoader`, which will periodically inserts the audit log into a Doris table
    specified by the user. In this way, users can conveniently use SQL to query and analyze this
    audit log table.

Some documents are added:

1. HELP docs of install/uninstall/show plugin.
2. Rename the `README.md` in `fe_plugins/` dir to `plugin-development-manual.md` and move
    it to the `docs/` dir
3. `audit-plugin.md` to introduce the usage of `AuditLoader` plugin.

ISSUE: #3226
2020-04-03 09:53:50 +08:00
29b37dad49 Sql reference of materialized view (#3208)
* Sql reference of materialized view

Sql reference of Create and drop materialized view in English and Chinese.

* Change discription
2020-04-01 21:22:19 +08:00
d3555e3624 [Conf][API Change] Change the default FE meta dir and BE storage_root_path
1. Change word of palo to doris in conf file.
2. Set default meta_dir to ${DORIS_HOME}/doris-meta
3. Comment out FE meta_dir, leave it to ${DORIS_HOME}/doris-meta, as exsting in FE Config.java.
4. Comment out BE storage_root_path, leave it to ${DORIS_HOME}/storage, as exsting in BE config.h.

NOTICE: default config is changed.
2020-03-27 20:42:12 +08:00
aa8b2f86c4 [Bug][Refactor] Fix the conflict of temp partition and dynamic partition operations (#3201)
The bug is described in issue: #3200.

This CL solve the problem by:
1. Refactor the alter operation conflict checking logic by introducing new classes `AlterOperations` and `AlterOpType`.
2. Allow add/drop temporary partition when dynamic partition feature is enabled.
3. Allow modifying table's property when there is temporary partition in table.
4. Make the properties `dynamic_partition.enable` optional, and default is true.
2020-03-27 20:25:15 +08:00
c1969a3fb3 [Conf] Make default_storage_medium configurable (#2980)
Doris support choose medium when create table, and the cluster balance strategy is dependent
between different storage medium, and most use will not specify the storage medium when create table,
even they kown that they should choose a storage medium, they have no idea about the
cluster's storage medium, so, I think we should make storage_medium and storage_cooldown_time
configurable, and this should be the admin's responsibility.

For Example, if the cluster's storage medium is HDD, but we need to change part of machines to SSD,
if we change the machine, the tablets before change is stored in HDD and they can't find a dest path
to migrate, and user will create table as usual, it will make all tablets stored in old machines and
the new machines will only store a little tablets. Without this config the only way is admin need
to traverse all partitions in cluster and change the property of storage_medium, it will increase
operational and maintenance costs.

So I add a FE config default_storage_medium, so that user can set the default storage medium.
2020-03-27 20:22:18 +08:00
8fa328c344 [Doc]Update doc for dynamic partition (#3093)
Add explain of dynamic dropping partition.
2020-03-25 20:45:13 +08:00
3b32938140 [Doc] Create CONTRIBUTING.md (#3180) 2020-03-24 13:42:21 +08:00
d4c1938b5c Open datetime min value limit (#3158)
the min_value in olap/type.h of datetime is 0000-01-01 00:00:00, so we don't need restrict datetime min in tablet_sink
2020-03-24 10:52:57 +08:00
0f14408f13 [Temp Partition] Support loading data into temp partitions (#3120)
Related issue: #2663, #2828.

This CL support loading data into specified temporary partitions.

```
INSERT INTO tbl TEMPORARY PARTITIONS(tp1, tp2, ..) ....;

curl .... -H "temporary_partition: tp1, tp, .. "  ....

LOAD LABEL db1.label1 (
DATA INFILE("xxxx") 
INTO TABLE `tbl2`
TEMPORARY PARTITION(tp1, tp2, ...)
...
```

NOTICE: this CL change the FE meta version to 77.

There 3 major changes in this CL

## Syntax reorganization

Reorganized the syntax related to the `specify-partitions`. Removed some redundant syntax
 definitions, and unified the syntax related to the `specify-partitions` under one syntax entry.

## Meta refactor

In order to be able to support specifying temporary partitions, 
I made some changes to the way the partition information in the table is stored.

Partition information is now organized as follows:

The following two maps are reserved in OlapTable for storing formal partitions:

    ```
    idToPartition
    nameToPartition
    ```

Use the `TempPartitions` class for storing temporary partitions.

All the partition attributes of the formal partition and the temporary partition,
such as the range, the number of replicas, and the storage medium, are all stored
in the `partitionInfo` of the OlapTable.

In `partitionInfo`, we use two maps to store the range of formal partition
and temporary partition:

    ```
    idToRange
    idToTempRange
    ```

Use separate map is because the partition ranges of the formal partition and
the temporary partition may overlap. Separate map can more easily check the partition range.

All partition attributes except the partition range are stored using the same map,
and the partition id is used as the map key.

## Method to get partition

A table may contain both formal and temporary partitions.
There are several methods to get the partition of a table.
Typically divided into two categories:

1. Get partition by id
2. Get partition by name

According to different requirements, the caller may want to obtain
a formal partition or a temporary partition. These methods are
described below in order to obtain the partition by using the correct method.

1. Get by name

This type of request usually comes from a user with partition names. Such as
`select * from tbl partition(p1);`.
This type of request has clear information to indicate whether to obtain a
formal or temporary partition.
Therefore, we need to get the partition through this method:

`getPartition(String partitionName, boolean isTemp)`

To avoid modifying too much code, we leave the `getPartition(String
partitionName)`, which is same as:

`getPartition(partitionName, false)`

2. Get by id

This type of request usually means that the previous step has obtained
certain partition ids in some way,
so we only need to get the corresponding partition through this method:

`getPartition(long partitionId)`.

This method will try to get both formal partitions and temporary partitions.

3. Get all partition instances

Depending on the requirements, the caller may want to obtain all formal
partitions,
all temporary partitions, or all partitions. Therefore we provide 3 methods,
the caller chooses according to needs.

`getPartitions()`
`getTempPartitions()`
`getAllPartitions()`
2020-03-19 15:07:01 +08:00
f6374fa9a5 Use default_rowset_type to replace compaction_rowset_type (#3101)
* use default_rowset_type to replace compaction_rowset_type

* segment v2 usage document
2020-03-16 22:23:48 +08:00
14c088161c [New Stmt] Support setting replica status manually (#1522)
Sometimes a replica is broken on BE, but FE does not notice that.
In this case, we have to manually delete that replica on BE.
If there are hundreds of replicas need to be handled, this is a disaster.

So I add a new stmt:

    ADMIN SET REPLICA STATUS

which support setting tablet on specified BE as BAD or OK.
2020-03-16 13:42:30 +08:00
5f18e99cdb [Doc] Update add fe node description (#3100) 2020-03-13 18:05:09 +08:00
d8c756260b Rewrite count distinct to bitmap and hll (#3096) 2020-03-13 11:44:40 +08:00
cf219ddf18 [ConsistencyCheck] Support checking replica consistency of tablet manually (#3067) 2020-03-10 15:25:25 +08:00
7400535b37 [Doc] Update compaction-action_EN.md (#3060)
fix typo
2020-03-09 22:09:43 +08:00
6e46dccd39 [Doc] Update compaction-action.md (#3059)
fix typo
2020-03-09 21:14:09 +08:00
765f284dcd [Doc] Add Downloads page to Doris website (#3039) 2020-03-09 09:42:46 +08:00
cd7207c869 Add ORC help doc (#3041) 2020-03-05 12:44:47 +08:00
cc1a5fb8ea [Function] Support '%' in date format string (#3037)
eg:
select str_to_date('2014-12-21 12%3A34%3A56', '%Y-%m-%d %H%%3A%i%%3A%s');
select unix_timestamp('2007-11-30 10:30%3A19', '%Y-%m-%d %H:%i%%3A%s');

This also enable us to extract column fields from HDFS file path with contains '%'.
2020-03-05 08:56:02 +08:00
70cc6df415 [Doc] Fix some typo (#3024) 2020-03-02 22:13:47 +08:00
511c5eed50 [Doc] Modify format of some docs (#3021)
Format of some docs are incorrect for building the doc website.
* fix a bug that `gensrc` dir can not be built with -j.
* fix ut bug of CreateFunctionTest
2020-03-02 19:07:52 +08:00
ef4bb0c011 [RoutineLoad] Auto Resume RoutineLoadJob (#2958)
When all backends restart, the routine load job can be resumed.
2020-03-02 13:27:35 +08:00
df56588bb5 [Temp Partition] Support add/drop/replace temp partitions (#2828)
This CL implements 3 new operations:

```
ALTER TABLE tbl ADD TEMPORARY PARTITION ...;
ALTER TABLE tbl DROP TEMPORARY PARTITION ...;
ALTER TABLE tbl REPLACE TEMPORARY PARTITION (p1, p2, ...);
```

User manual can be found in document:
`docs/documentation/cn/administrator-guide/alter-table/alter-table-temp-partition.md`

I did not update the grammar manual of `alter-table.md`.
This manual is too confusing and too big, I will reorganize this manual after.

This is the first part to implement the "overwrite load" feature mentioned in issue #2663.
I will implement the "load to temp partition" feature in next PR.

This CL also add GSON serialization method for the following classes (But not used):

```
Partition.java
MaterializedIndex.java
Tablet.java
Replica.java
```
2020-03-01 21:30:34 +08:00
0d1e28746e [Function] Support null_or_empty function (#2977)
It returns true if the string is empty or NULL. Otherwise it returns false.
2020-03-01 17:35:45 +08:00
078e35a62e Support Amazon S3 data source in Broker Load (#3004) 2020-03-01 12:53:50 +08:00
2ac07a8c07 [Doc] Fix docs mixed Chinese and English (#3017) 2020-02-28 16:36:37 +08:00
54b7828c3f [Doc] The doc of max_running_txn_num_per_db config (#3007) 2020-02-27 14:57:46 +08:00
57483ade00 [Doc] Fix typo in Chinese document (#2963)
Fix some errors in Chinese document
2020-02-25 22:30:21 +08:00
wyb
fc2d92d68a Update spark load doc (#2973) 2020-02-22 12:00:50 +08:00
96248058a1 [Doc] Modify the default port of restore meta instance in document (#2971) 2020-02-21 22:05:30 +08:00
70d2ccf384 Add spark load design (#2856) 2020-02-21 14:32:18 +08:00
cc0d41277c [Alter] Add more schema change to varchar type (#2777) 2020-02-19 23:14:43 +08:00
3994b52f34 [Alter] Change max create replicas timeout configurable (#2945) 2020-02-19 17:47:27 +08:00
87a84a793e Add more doc content about hdfs broker auth and config detail (#2935) 2020-02-18 21:15:33 +08:00
625411bd28 Doris support in memory olap table (#2847) 2020-02-18 10:45:54 +08:00
3e160aeb66 [GroupingSet] fix a bug when using grouping set without all column in a grouping set item (#2877)
fix a bug when using grouping sets without all column in a grouping set item will produce wrong value.
fix grouping function check will not work in group by clause
2020-02-12 21:50:12 +08:00
59dd2a0d7f Fix some typo in bitmap index docs. (#2879) 2020-02-11 19:03:42 +08:00
1f001481ae Support batch add and drop rollup indexes #2671 (#2781) 2020-02-11 12:58:01 +08:00
d8a47f9a7f [Doc] Fix CANCEL DECOMMISSION syntax bug (#2860)
Fix CANCEL DECOMMISSION syntax bug
2020-02-08 13:07:19 +08:00
581c771ff3 [Doc] Add create index usage document (#2832) 2020-02-05 15:35:41 +08:00
89c7234c1c Support starts_with (str, prefix) function (#2813)
Support starts_with function
2020-01-21 14:09:08 +08:00
acc89411dc Fix docs sequence error (#2814) 2020-01-20 22:35:40 +08:00
010f6cd1c1 Update installing/compilation.md (#2816)
Fix docker images version
2020-01-20 22:27:22 +08:00
58ff952837 [Stmt] Support new show functions syntax to make user search function more conveniently (#2800)
SHOW [FULL] [BUILTIN] FUNCTIONS [IN|FROM db] [LIKE 'function_pattern'];
2020-01-20 14:14:42 +08:00
fc55423032 [SQL] Support Grouping Sets, Rollup and Cube to extend group by statement
Support Grouping Sets, Rollup and Cube to extend group by statement
support GROUPING SETS syntax 
```
SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a), (b), ( ) );
```
cube  or rollup like 
```
SELECT a, b,c, SUM( d ) FROM tab1 GROUP BY ROLLUP|CUBE(a,b,c)
```

[ADD] support grouping functions in expr like grouping(a) + grouping(b) (#2039)
[FIX] fix analyzer error in window function(#2039)
2020-01-17 16:24:02 +08:00
3b24287251 Support 64 bits integers for BITMAP type (#2772)
Fixes #2771 

Main changes in this CL
* RoaringBitmap is renamed to BitmapValue and moved into bitmap_value.h
* leveraging Roaring64Map to support unsigned BIGINT for BITMAP type
* introduces two new format (SINGLE64 and BITMAP64) for BITMAP type

So far we have three storage format for BITMAP type

```
EMPTY := TypeCode(0x00)
SINGLE32 := TypeCode(0x01), UInt32LittleEndian
BITMAP32 := TypeCode(0x02), RoaringBitmap(defined by https://github.com/RoaringBitmap/RoaringFormatSpec/)
```

In order to support BIGINT element and keep backward compatibility, introduce two new format

```
SINGLE64 := TypeCode(0x03), UInt64LittleEndian
BITMAP64 := TypeCode(0x04), CustomRoaringBitmap64
```

Please note that SINGLE64/BITMAP64 doesn't replace SINGLE32/BITMAP32. Doris will choose the smaller (in terms of space) type automatically during serializing. For example, BITMAP32 is preferred over BITMAP64 when the maximum element is <= UINT32_MAX. This will also make BE rollback possible as long as user didn't write element larger than UINT32_MAX into bitmap column.

Another important design decision is that we fork and maintain our own version of Roaring64Map instead of using the one in "roaring/roaring64map.hh". The reasons are

1. RoaringBitmap doesn't define a standard for the binary format of 64-bits bitmap. As a result, different implementations of Roaring64Map use different format. For example the [C++ version](https://github.com/RoaringBitmap/CRoaring/blob/v0.2.60/cpp/roaring64map.hh#L545) is different from the [Java version](35104c564e/src/main/java/org/roaringbitmap/longlong/Roaring64NavigableMap.java (L1097)). Even for CRoaring, the format may change in future releases. However Doris require the serialized format to be stable across versions. Fork is a safe way to achieve this.
2. We may want to make some code changes to Roaring64Map according to our needs. For example, in order to use the BITMAP32 format when the maximum element can be represented in 32 bits, we may want to access the private member of Roaring64Map. Another example is we want to further customize and optimize the format for BITMAP64 case, such as using vint64 instead of uint64 for map size.
2020-01-17 14:13:38 +08:00
8df63bc191 [Doc] Add en doc for dynamic partition feature (#2764) 2020-01-16 21:54:26 +08:00