Commit Graph

3158 Commits

Author SHA1 Message Date
59dd2a0d7f Fix some typo in bitmap index docs. (#2879) 2020-02-11 19:03:42 +08:00
1f001481ae Support batch add and drop rollup indexes #2671 (#2781) 2020-02-11 12:58:01 +08:00
d8a47f9a7f [Doc] Fix CANCEL DECOMMISSION syntax bug (#2860)
Fix CANCEL DECOMMISSION syntax bug
2020-02-08 13:07:19 +08:00
581c771ff3 [Doc] Add create index usage document (#2832) 2020-02-05 15:35:41 +08:00
89c7234c1c Support starts_with (str, prefix) function (#2813)
Support starts_with function
2020-01-21 14:09:08 +08:00
acc89411dc Fix docs sequence error (#2814) 2020-01-20 22:35:40 +08:00
010f6cd1c1 Update installing/compilation.md (#2816)
Fix docker images version
2020-01-20 22:27:22 +08:00
58ff952837 [Stmt] Support new show functions syntax to make user search function more conveniently (#2800)
SHOW [FULL] [BUILTIN] FUNCTIONS [IN|FROM db] [LIKE 'function_pattern'];
2020-01-20 14:14:42 +08:00
fc55423032 [SQL] Support Grouping Sets, Rollup and Cube to extend group by statement
Support Grouping Sets, Rollup and Cube to extend group by statement
support GROUPING SETS syntax 
```
SELECT a, b, SUM( c ) FROM tab1 GROUP BY GROUPING SETS ( (a, b), (a), (b), ( ) );
```
cube  or rollup like 
```
SELECT a, b,c, SUM( d ) FROM tab1 GROUP BY ROLLUP|CUBE(a,b,c)
```

[ADD] support grouping functions in expr like grouping(a) + grouping(b) (#2039)
[FIX] fix analyzer error in window function(#2039)
2020-01-17 16:24:02 +08:00
3b24287251 Support 64 bits integers for BITMAP type (#2772)
Fixes #2771 

Main changes in this CL
* RoaringBitmap is renamed to BitmapValue and moved into bitmap_value.h
* leveraging Roaring64Map to support unsigned BIGINT for BITMAP type
* introduces two new format (SINGLE64 and BITMAP64) for BITMAP type

So far we have three storage format for BITMAP type

```
EMPTY := TypeCode(0x00)
SINGLE32 := TypeCode(0x01), UInt32LittleEndian
BITMAP32 := TypeCode(0x02), RoaringBitmap(defined by https://github.com/RoaringBitmap/RoaringFormatSpec/)
```

In order to support BIGINT element and keep backward compatibility, introduce two new format

```
SINGLE64 := TypeCode(0x03), UInt64LittleEndian
BITMAP64 := TypeCode(0x04), CustomRoaringBitmap64
```

Please note that SINGLE64/BITMAP64 doesn't replace SINGLE32/BITMAP32. Doris will choose the smaller (in terms of space) type automatically during serializing. For example, BITMAP32 is preferred over BITMAP64 when the maximum element is <= UINT32_MAX. This will also make BE rollback possible as long as user didn't write element larger than UINT32_MAX into bitmap column.

Another important design decision is that we fork and maintain our own version of Roaring64Map instead of using the one in "roaring/roaring64map.hh". The reasons are

1. RoaringBitmap doesn't define a standard for the binary format of 64-bits bitmap. As a result, different implementations of Roaring64Map use different format. For example the [C++ version](https://github.com/RoaringBitmap/CRoaring/blob/v0.2.60/cpp/roaring64map.hh#L545) is different from the [Java version](35104c564e/src/main/java/org/roaringbitmap/longlong/Roaring64NavigableMap.java (L1097)). Even for CRoaring, the format may change in future releases. However Doris require the serialized format to be stable across versions. Fork is a safe way to achieve this.
2. We may want to make some code changes to Roaring64Map according to our needs. For example, in order to use the BITMAP32 format when the maximum element can be represented in 32 bits, we may want to access the private member of Roaring64Map. Another example is we want to further customize and optimize the format for BITMAP64 case, such as using vint64 instead of uint64 for map size.
2020-01-17 14:13:38 +08:00
8df63bc191 [Doc] Add en doc for dynamic partition feature (#2764) 2020-01-16 21:54:26 +08:00
0ddca59d36 Add timestampadd/timestampdiff function (#2725) 2020-01-15 21:47:07 +08:00
9e54751098 [Snapshot] Modify the prefer snapshot version (#2748)
In this CL, prefer snapshot version in snapshot request is defined
in thrift. So that both FE and BE can use this version value.
2020-01-15 15:10:14 +08:00
7768629f08 Add bitmap_contains and bitmap_has_any functions (#2752) 2020-01-15 14:31:44 +08:00
e5717efc5a [Insert] Return more info of insert operation (#2718)
Standardize the return results of INSERT operations,
which is convenient for users to use and locate problems.

More details can be found in insert-into-manual.md
2020-01-15 10:39:53 +08:00
a36193dfab Support decimal and timestamp type in orc load (#2759) 2020-01-15 07:40:30 +08:00
f071d5a307 Support ends_with function (#2746) 2020-01-14 22:37:20 +08:00
e5197eff94 Update the doc of doris to fix some mistakes (#2758) 2020-01-14 22:26:49 +08:00
a99a49a444 Add bitamp_to_string function (#2731)
This CL changes:

1. add function bitmap_to_string and bitmap_from_string, which will
 convert a bitmap to/from string which contains all bit in bitmap
2. add function murmur_hash3_32, which will compute murmur hash for
input strings
3. make the function cast float to string the same with user result
logic
2020-01-13 12:31:37 +08:00
4e868252fc Add .clang-format and docs (#2724)
The problem of inconsistence style in Doris code is too big, it's hard to minimize modification when reformatting code.
So here, our aim is to make the style rules, tune the config in .clang-format.

Note: I choose clang-format-8.0+ to support richer sytle options.
2020-01-11 20:54:20 +08:00
ccaa97a5ac Make bitmap functions accept any expression that returns bitmap (#2728)
This CL make bitmap_count, bitmap_union, and bitmap_union_count accept any expression whose return type is bitmap as input so that we can support flexible bitmap expression such as bitmap_count(bitmap_and(to_bitmap(1), to_bitmap(2))).

This CL also create separate documentation for each bitmap UDF to conform with other functions.
2020-01-11 14:02:12 +08:00
6bc54ef3f0 [Document] Add dynamic partition docs (#2711) 2020-01-08 23:08:48 +08:00
a028c52edd Add BE function bitmap_or and bitmap_and (#2707) 2020-01-08 19:59:44 +08:00
e94d4656d8 Fix roles typo in privilege document (#2702) 2020-01-08 15:39:10 +08:00
367b4c058c [Doc]Remove used doc content about be_rpc_port (#2694) 2020-01-07 22:20:50 +08:00
42dfe1369b Add filter conditions for 'show partitions from table' syntax (#2553)
Add filter conditions for show partitions from table syntax, to filter partitions needed
2020-01-03 19:52:25 +08:00
c098178f7a [Index] Implements create drop show index syntax for bitmap index [#2487] (#2573)
### create table with index 
```
CREATE TABLE table1
(
    siteid INT DEFAULT '10',
    citycode SMALLINT,
    username VARCHAR(32) DEFAULT '',
    pv BIGINT SUM DEFAULT '0',
    INDEX index_name [USING BITMAP] (siteid, citycode) COMMENT 'balabala'
)
AGGREGATE KEY(siteid, citycode, username)
DISTRIBUTED BY HASH(siteid) BUCKETS 10
PROPERTIES("replication_num" = "1");
```
### create index 
```
CREATE INDEX index_name  ON table1 (siteid, citycod) [USING BITMAP] COMMENT 'balabala';
or 
ALTER TABLE table1 ADD  INDEX index_name  [USING BITMAP] (siteid, citycod) COMMENT 'balabala';
```
### drop index
```
DROP INDEX index_name ON table1;
or
ALTER TABLE table1 DROP INDEX index_name 
```

### show index
```
SHOW INDEX[ES] FROM table1
```
output
```
+---------+-------------+-----------------+------------+---------+
| Table   | Index_name  | Column_name     | Index_type | Comment |
+---------+-------------+-----------------+------------+---------+
| table1  | index_name  | siteid,citycode | BITMAMP    | balabala|
+---------+-------------+-----------------+------------+---------+
```
2020-01-03 17:41:26 +08:00
3b5f608df7 Delete documents of count_distinct function (#2642) 2020-01-02 23:35:10 +08:00
4220e09724 Change introduction document (#2640)
change FE default maximum memory introduction document from 2G to 4G in english
2020-01-02 23:03:56 +08:00
b440ff286f Change introduction document (#2639)
Change FE default maximum memory introduction document  from 2G to 4G
2020-01-02 23:03:17 +08:00
9783fb7221 Fix: UDF version `GLIBCXX_3.4.21' not found (#2629) 2019-12-31 18:32:42 +08:00
3022955f32 Fix recover time in docs (#2588) 2019-12-27 14:51:41 +08:00
1113f951c3 Alter view stmt (#2522)
This commit adds a new statement named alter view, like
ALTER VIEW view_name
(
col_1,
col_2,
col_3,
)
AS SELECT k1, k2, SUM(v1) FROM exampleDb.testTbl GROUP BY k1,k2
2019-12-27 14:02:56 +08:00
1421a9be41 [Compaction] Support compact only one rowset (#2558)
Support compaction operation to compact only one rowset.
After the modification, the last rowset of the tablet will
also be compacted.

At the same time, we added a `segments_overlap_pb` field to
the rowset meta. Used to describe whether the segment data
in the rowset overlaps. This field is set by `rowset_writer`.
Initially UNKNOWN for compatibility with existing data.

In addition, the version hash of the rowset generated after
compaction is directly set to the version hash of last rowset
participating in compaction, to ensure that the tablet's
version hash remains unchanged after compaction.
2019-12-27 10:08:41 +08:00
f7032b07f3 Support more schema change from VARCHAR type (#2501) 2019-12-26 22:38:53 +08:00
6f3c50a95c [Document] Add example for using CTE in INSERT operation (#2572) 2019-12-26 10:00:34 +08:00
e7be52fa58 Update basic-usage_EN.md (#2530) 2019-12-23 16:04:27 +08:00
20abfc5f6f Modify stream-load-manual_EN.md (#2528) 2019-12-23 15:34:19 +08:00
008e59476d Add curdate function doc (#2520) 2019-12-20 21:24:56 +08:00
6815979ba5 Fix invalid to_bitmap input lead to BE core (#2510) 2019-12-19 21:28:00 +08:00
63ea05f9c7 Add convert tablet rowset type (#2294)
to solve the issue #2246.

scheme is as following:

    add a optional preferred_rowset_type in TabletMeta for V2 format rollup index tablet
    add a boolean session variable use_v2_rollup, if set true, the query will v2 storage format rollup index to process the query.
    test queries will be sent to online service to verify the correctness of segment-v2 by send the the same queries to fe with use_v2_rollup set or not to check whether the returned results are the same.
2019-12-18 18:49:47 +08:00
222f8390c7 [Compaction] Fix the bug that cumulative point grows unreasonably (#2490)
When there are to many segment in one rowset, which is larger than
BE config 'max_cumulative_compaction_num_singleton_deltas', the
cumulative compaction will not work and just increase the cumulative
point, because there is only once rowset being selected.

So when selecting rowset for cumulative compaction, we should meet 2
requirments before finishing the selection logic:

1. compaction score is larger than 'max_cumulative_compaction_num_singleton_deltas'
2. at least 2 rowsets are selected.
2019-12-18 12:59:17 +08:00
c81b1db406 Support convert VARCHAR type to DATE type (#2489) 2019-12-18 12:58:47 +08:00
89003b774b Support Convert Varchar to INT (#2481) 2019-12-17 22:02:28 +08:00
e1ba0efbc7 Optimize compaction strategy of tablet on BE (#2473)
The current compaction selection strategy and cumulative point update logic
will cause the cumulative compaction to not work, and all compaction tasks
will be completed only by the base compaction. This can cause a large number
of data versions to pile up.

In the current cumulative point update logic, when a cumulative cannot select
enough number of rowsets, it will directly increase the cumulative point.
Therefore, when the data version generates the same speed as the cumulative
compaction polling, it will cause the cumulative point to continuously increase
without triggering the cumulative compaction.

The new strategy mainly modifies the update logic of cumulative point to ensure
that the above problems do not occur. At the same time, the new strategy also
takes into account the problem that compaction cannot be performed if cumulative
points stagnate for a long time. Cumulative points will be forced to increase
through threshold settings to ensure that compaction has a chance to execute.

Also add a new HTTP API to view the compaction status of specified tablet.
See `compaction-action.md` for details.
2019-12-17 10:30:43 +08:00
55cb1cd1f1 Update date_format.md (#2476) 2019-12-16 20:43:55 +08:00
b20a76163b Update from_unixtime.md (#2475) 2019-12-16 19:39:54 +08:00
9244db40f7 Update bitmap doc (#2467) 2019-12-16 18:56:53 +08:00
ebb6506924 Fix doc (#2449) 2019-12-12 20:56:25 +08:00
bf31bd238b Change default storage model from aggregate to duplicate(#2318) (#2412)
change default storage model from aggregate to duplicate
for sql  `create table t (k1 int) DISTRIBUTED BY HASH(k1) BUCKETS 10 PROPERTIES("replication_num" = "1");`
before: 
```
 CREATE TABLE `t` (
  `k1` int(11) NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`k1`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`k1`) BUCKETS 10
PROPERTIES (
"storage_type" = "COLUMN"
);
```
after:
```
CREATE TABLE `t` (
  `k1` int(11) NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`k1`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`k1`) BUCKETS 10
PROPERTIES (
"storage_type" = "COLUMN"
);
```

#2318
2019-12-12 14:30:30 +08:00