Commit Graph

250 Commits

Author SHA1 Message Date
fc55423032 [SQL] Support Grouping Sets, Rollup and Cube to extend group by statement
Support Grouping Sets, Rollup and Cube to extend group by statement
support GROUPING SETS syntax 
```
SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a), (b), ( ) );
```
cube  or rollup like 
```
SELECT a, b,c, SUM( d ) FROM tab1 GROUP BY ROLLUP|CUBE(a,b,c)
```

[ADD] support grouping functions in expr like grouping(a) + grouping(b) (#2039)
[FIX] fix analyzer error in window function(#2039)
2020-01-17 16:24:02 +08:00
3b24287251 Support 64 bits integers for BITMAP type (#2772)
Fixes #2771 

Main changes in this CL
* RoaringBitmap is renamed to BitmapValue and moved into bitmap_value.h
* leveraging Roaring64Map to support unsigned BIGINT for BITMAP type
* introduces two new format (SINGLE64 and BITMAP64) for BITMAP type

So far we have three storage format for BITMAP type

```
EMPTY := TypeCode(0x00)
SINGLE32 := TypeCode(0x01), UInt32LittleEndian
BITMAP32 := TypeCode(0x02), RoaringBitmap(defined by https://github.com/RoaringBitmap/RoaringFormatSpec/)
```

In order to support BIGINT element and keep backward compatibility, introduce two new format

```
SINGLE64 := TypeCode(0x03), UInt64LittleEndian
BITMAP64 := TypeCode(0x04), CustomRoaringBitmap64
```

Please note that SINGLE64/BITMAP64 doesn't replace SINGLE32/BITMAP32. Doris will choose the smaller (in terms of space) type automatically during serializing. For example, BITMAP32 is preferred over BITMAP64 when the maximum element is <= UINT32_MAX. This will also make BE rollback possible as long as user didn't write element larger than UINT32_MAX into bitmap column.

Another important design decision is that we fork and maintain our own version of Roaring64Map instead of using the one in "roaring/roaring64map.hh". The reasons are

1. RoaringBitmap doesn't define a standard for the binary format of 64-bits bitmap. As a result, different implementations of Roaring64Map use different format. For example the [C++ version](https://github.com/RoaringBitmap/CRoaring/blob/v0.2.60/cpp/roaring64map.hh#L545) is different from the [Java version](35104c564e/src/main/java/org/roaringbitmap/longlong/Roaring64NavigableMap.java (L1097)). Even for CRoaring, the format may change in future releases. However Doris require the serialized format to be stable across versions. Fork is a safe way to achieve this.
2. We may want to make some code changes to Roaring64Map according to our needs. For example, in order to use the BITMAP32 format when the maximum element can be represented in 32 bits, we may want to access the private member of Roaring64Map. Another example is we want to further customize and optimize the format for BITMAP64 case, such as using vint64 instead of uint64 for map size.
2020-01-17 14:13:38 +08:00
8df63bc191 [Doc] Add en doc for dynamic partition feature (#2764) 2020-01-16 21:54:26 +08:00
0ddca59d36 Add timestampadd/timestampdiff function (#2725) 2020-01-15 21:47:07 +08:00
9e54751098 [Snapshot] Modify the prefer snapshot version (#2748)
In this CL, prefer snapshot version in snapshot request is defined
in thrift. So that both FE and BE can use this version value.
2020-01-15 15:10:14 +08:00
7768629f08 Add bitmap_contains and bitmap_has_any functions (#2752) 2020-01-15 14:31:44 +08:00
e5717efc5a [Insert] Return more info of insert operation (#2718)
Standardize the return results of INSERT operations,
which is convenient for users to use and locate problems.

More details can be found in insert-into-manual.md
2020-01-15 10:39:53 +08:00
a36193dfab Support decimal and timestamp type in orc load (#2759) 2020-01-15 07:40:30 +08:00
f071d5a307 Support ends_with function (#2746) 2020-01-14 22:37:20 +08:00
e5197eff94 Update the doc of doris to fix some mistakes (#2758) 2020-01-14 22:26:49 +08:00
a99a49a444 Add bitamp_to_string function (#2731)
This CL changes:

1. add function bitmap_to_string and bitmap_from_string, which will
 convert a bitmap to/from string which contains all bit in bitmap
2. add function murmur_hash3_32, which will compute murmur hash for
input strings
3. make the function cast float to string the same with user result
logic
2020-01-13 12:31:37 +08:00
4e868252fc Add .clang-format and docs (#2724)
The problem of inconsistence style in Doris code is too big, it's hard to minimize modification when reformatting code.
So here, our aim is to make the style rules, tune the config in .clang-format.

Note: I choose clang-format-8.0+ to support richer sytle options.
2020-01-11 20:54:20 +08:00
ccaa97a5ac Make bitmap functions accept any expression that returns bitmap (#2728)
This CL make bitmap_count, bitmap_union, and bitmap_union_count accept any expression whose return type is bitmap as input so that we can support flexible bitmap expression such as bitmap_count(bitmap_and(to_bitmap(1), to_bitmap(2))).

This CL also create separate documentation for each bitmap UDF to conform with other functions.
2020-01-11 14:02:12 +08:00
6bc54ef3f0 [Document] Add dynamic partition docs (#2711) 2020-01-08 23:08:48 +08:00
a028c52edd Add BE function bitmap_or and bitmap_and (#2707) 2020-01-08 19:59:44 +08:00
e94d4656d8 Fix roles typo in privilege document (#2702) 2020-01-08 15:39:10 +08:00
367b4c058c [Doc]Remove used doc content about be_rpc_port (#2694) 2020-01-07 22:20:50 +08:00
42dfe1369b Add filter conditions for 'show partitions from table' syntax (#2553)
Add filter conditions for show partitions from table syntax, to filter partitions needed
2020-01-03 19:52:25 +08:00
c098178f7a [Index] Implements create drop show index syntax for bitmap index [#2487] (#2573)
### create table with index 
```
CREATE TABLE table1
(
    siteid INT DEFAULT '10',
    citycode SMALLINT,
    username VARCHAR(32) DEFAULT '',
    pv BIGINT SUM DEFAULT '0',
    INDEX index_name [USING BITMAP] (siteid, citycode) COMMENT 'balabala'
)
AGGREGATE KEY(siteid, citycode, username)
DISTRIBUTED BY HASH(siteid) BUCKETS 10
PROPERTIES("replication_num" = "1");
```
### create index 
```
CREATE INDEX index_name  ON table1 (siteid, citycod) [USING BITMAP] COMMENT 'balabala';
or 
ALTER TABLE table1 ADD  INDEX index_name  [USING BITMAP] (siteid, citycod) COMMENT 'balabala';
```
### drop index
```
DROP INDEX index_name ON table1;
or
ALTER TABLE table1 DROP INDEX index_name 
```

### show index
```
SHOW INDEX[ES] FROM table1
```
output
```
+---------+-------------+-----------------+------------+---------+
| Table   | Index_name  | Column_name     | Index_type | Comment |
+---------+-------------+-----------------+------------+---------+
| table1  | index_name  | siteid,citycode | BITMAMP    | balabala|
+---------+-------------+-----------------+------------+---------+
```
2020-01-03 17:41:26 +08:00
3b5f608df7 Delete documents of count_distinct function (#2642) 2020-01-02 23:35:10 +08:00
4220e09724 Change introduction document (#2640)
change FE default maximum memory introduction document from 2G to 4G in english
2020-01-02 23:03:56 +08:00
b440ff286f Change introduction document (#2639)
Change FE default maximum memory introduction document  from 2G to 4G
2020-01-02 23:03:17 +08:00
9783fb7221 Fix: UDF version `GLIBCXX_3.4.21' not found (#2629) 2019-12-31 18:32:42 +08:00
3022955f32 Fix recover time in docs (#2588) 2019-12-27 14:51:41 +08:00
1113f951c3 Alter view stmt (#2522)
This commit adds a new statement named alter view, like
ALTER VIEW view_name
(
col_1,
col_2,
col_3,
)
AS SELECT k1, k2, SUM(v1) FROM exampleDb.testTbl GROUP BY k1,k2
2019-12-27 14:02:56 +08:00
1421a9be41 [Compaction] Support compact only one rowset (#2558)
Support compaction operation to compact only one rowset.
After the modification, the last rowset of the tablet will
also be compacted.

At the same time, we added a `segments_overlap_pb` field to
the rowset meta. Used to describe whether the segment data
in the rowset overlaps. This field is set by `rowset_writer`.
Initially UNKNOWN for compatibility with existing data.

In addition, the version hash of the rowset generated after
compaction is directly set to the version hash of last rowset
participating in compaction, to ensure that the tablet's
version hash remains unchanged after compaction.
2019-12-27 10:08:41 +08:00
f7032b07f3 Support more schema change from VARCHAR type (#2501) 2019-12-26 22:38:53 +08:00
6f3c50a95c [Document] Add example for using CTE in INSERT operation (#2572) 2019-12-26 10:00:34 +08:00
e7be52fa58 Update basic-usage_EN.md (#2530) 2019-12-23 16:04:27 +08:00
20abfc5f6f Modify stream-load-manual_EN.md (#2528) 2019-12-23 15:34:19 +08:00
008e59476d Add curdate function doc (#2520) 2019-12-20 21:24:56 +08:00
6815979ba5 Fix invalid to_bitmap input lead to BE core (#2510) 2019-12-19 21:28:00 +08:00
63ea05f9c7 Add convert tablet rowset type (#2294)
to solve the issue #2246.

scheme is as following:

    add a optional preferred_rowset_type in TabletMeta for V2 format rollup index tablet
    add a boolean session variable use_v2_rollup, if set true, the query will v2 storage format rollup index to process the query.
    test queries will be sent to online service to verify the correctness of segment-v2 by send the the same queries to fe with use_v2_rollup set or not to check whether the returned results are the same.
2019-12-18 18:49:47 +08:00
222f8390c7 [Compaction] Fix the bug that cumulative point grows unreasonably (#2490)
When there are to many segment in one rowset, which is larger than
BE config 'max_cumulative_compaction_num_singleton_deltas', the
cumulative compaction will not work and just increase the cumulative
point, because there is only once rowset being selected.

So when selecting rowset for cumulative compaction, we should meet 2
requirments before finishing the selection logic:

1. compaction score is larger than 'max_cumulative_compaction_num_singleton_deltas'
2. at least 2 rowsets are selected.
2019-12-18 12:59:17 +08:00
c81b1db406 Support convert VARCHAR type to DATE type (#2489) 2019-12-18 12:58:47 +08:00
89003b774b Support Convert Varchar to INT (#2481) 2019-12-17 22:02:28 +08:00
e1ba0efbc7 Optimize compaction strategy of tablet on BE (#2473)
The current compaction selection strategy and cumulative point update logic
will cause the cumulative compaction to not work, and all compaction tasks
will be completed only by the base compaction. This can cause a large number
of data versions to pile up.

In the current cumulative point update logic, when a cumulative cannot select
enough number of rowsets, it will directly increase the cumulative point.
Therefore, when the data version generates the same speed as the cumulative
compaction polling, it will cause the cumulative point to continuously increase
without triggering the cumulative compaction.

The new strategy mainly modifies the update logic of cumulative point to ensure
that the above problems do not occur. At the same time, the new strategy also
takes into account the problem that compaction cannot be performed if cumulative
points stagnate for a long time. Cumulative points will be forced to increase
through threshold settings to ensure that compaction has a chance to execute.

Also add a new HTTP API to view the compaction status of specified tablet.
See `compaction-action.md` for details.
2019-12-17 10:30:43 +08:00
55cb1cd1f1 Update date_format.md (#2476) 2019-12-16 20:43:55 +08:00
b20a76163b Update from_unixtime.md (#2475) 2019-12-16 19:39:54 +08:00
9244db40f7 Update bitmap doc (#2467) 2019-12-16 18:56:53 +08:00
ebb6506924 Fix doc (#2449) 2019-12-12 20:56:25 +08:00
bf31bd238b Change default storage model from aggregate to duplicate(#2318) (#2412)
change default storage model from aggregate to duplicate
for sql  `create table t (k1 int) DISTRIBUTED BY HASH(k1) BUCKETS 10 PROPERTIES("replication_num" = "1");`
before: 
```
 CREATE TABLE `t` (
  `k1` int(11) NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`k1`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`k1`) BUCKETS 10
PROPERTIES (
"storage_type" = "COLUMN"
);
```
after:
```
CREATE TABLE `t` (
  `k1` int(11) NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`k1`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`k1`) BUCKETS 10
PROPERTIES (
"storage_type" = "COLUMN"
);
```

#2318
2019-12-12 14:30:30 +08:00
c07f37d78c [Segment V2] Add a control framework between FE and BE through heartbeat #2247 (#2364)
The control framework is implemented through heartbeat message. Use uint64_t as flags to control different functions. 
Now add a flag to set the default rowset type to beta.
2019-12-12 12:18:32 +08:00
5951a0eaea Add more schema change docs (#2411)
Add explanation about converting:

DATE -> DATETIME
DATETIME -> DATE
INT->DATE
2019-12-10 16:46:41 +08:00
a46bf1ada3 [Authorization] Modify the authorization checking logic (#2372)
**Authorization checking logic**

There are some problems with the current password and permission checking logic. For example:
First, we create a user by:
`create user cmy@"%" identified by "12345";`

And then 'cmy' can login with password '12345' from any hosts.

Second, we create another user by:
`create user cmy@"192.168.%" identified by "abcde";`

Because "192.168.%" has a higher priority in the permission table than "%". So when "cmy" try
to login in by password "12345" from host "192.168.1.1", it should match the second permission
entry, and will be rejected because of invalid password.
But in current implementation, Doris will continue to check password on first entry, than let it pass. So we should change it.

**Permission checking logic**

After a user login, it should has a unique identity which is got from permission table. For example,
when "cmy" from host "192.168.1.1" login, it's identity should be `cmy@"192.168.%"`. And Doris
should use this identity to check other permission, not by using the user's real identity, which is
`cmy@"192.168.1.1"`.

**Black list**
Functionally speaking, Doris only support adding WHITE LIST, which is to allow user to login from
those hosts in the white list. But is some cases, we do need a BLACK LIST function.
Fortunately, by changing the logic described above, we can simulate the effect of the BLACK LIST.

For example, First we add a user by:
`create user cmy@'%' identified by '12345';`

And now user 'cmy' can login from any hosts. and if we don't want 'cmy' to login from host A, we
can add a new user by:
`create user cmy@'A' identified by 'other_passwd';`

Because "A" has a higher priority in the permission table than "%". If 'cmy' try to login from A using password '12345', it will be rejected.
2019-12-06 17:45:56 +08:00
9fbc1c7ee6 Support where/orderby/limit after “SHOW ALTER TABLE COLUMN“ syntax (#2380)
Features:
1、Support WHERE/ORDER BY/LIMIT
2、Columns:TableName、CreatTime、FinishTime、State
3、Only “And” between conditions
4、TableName and State column only support "=" operator
5、CreateTime and FinishTime column support “=”,“>=”,"<=",">","<","!=" operators
6、CreateTime and FinishTime column support Date and DateTime string, eg:"2019-12-04" or "2019-12-04 17:18:00"

TestCase:
MySQL [haibotest]> show alter table column where State='FINISHED' and CreateTime > '2019-12-03' order by FinishTime desc limit 0,2;
+-------+---------------+---------------------+---------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
| JobId | TableName | CreateTime | FinishTime | IndexName | IndexId | OriginIndexId | SchemaVersion | TransactionId | State | Msg | Progress | Timeout |
+-------+---------------+---------------------+---------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
| 11134 | test_schema_2 | 2019-12-03 19:21:42 | 2019-12-03 19:22:11 | test_schema_2 | 11135 | 11059 | 1:192010000 | 3 | FINISHED | | N/A | 86400 |
| 11096 | test_schema_3 | 2019-12-03 19:21:31 | 2019-12-03 19:21:51 | test_schema_3 | 11097 | 11018 | 1:2063361382 | 2 | FINISHED | | N/A | 86400 |
+-------+---------------+---------------------+---------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
2 rows in set (0.00 sec)
2019-12-06 16:24:44 +08:00
d937f7b51e Fix the error of stream load doc (#2340) 2019-11-29 16:33:08 +08:00
8bf00afa25 Create table with nullable column for default (#2256)
Change the default column null property to nullable
2019-11-29 11:11:31 +08:00
e7b05f7eb3 Date format support java date style "yyyy-MM-dd HH:mm:ss" (#2309) 2019-11-28 14:34:31 +08:00
c33789ee49 Update insert-into-manual_EN.md (#2323)
Modify the description error for the enable_insert_strict parameter
2019-11-28 14:00:33 +08:00