Commit Graph

53 Commits

Author SHA1 Message Date
e5717efc5a [Insert] Return more info of insert operation (#2718)
Standardize the return results of INSERT operations,
which is convenient for users to use and locate problems.

More details can be found in insert-into-manual.md
2020-01-15 10:39:53 +08:00
a36193dfab Support decimal and timestamp type in orc load (#2759) 2020-01-15 07:40:30 +08:00
ccaa97a5ac Make bitmap functions accept any expression that returns bitmap (#2728)
This CL make bitmap_count, bitmap_union, and bitmap_union_count accept any expression whose return type is bitmap as input so that we can support flexible bitmap expression such as bitmap_count(bitmap_and(to_bitmap(1), to_bitmap(2))).

This CL also create separate documentation for each bitmap UDF to conform with other functions.
2020-01-11 14:02:12 +08:00
6bc54ef3f0 [Document] Add dynamic partition docs (#2711) 2020-01-08 23:08:48 +08:00
42dfe1369b Add filter conditions for 'show partitions from table' syntax (#2553)
Add filter conditions for show partitions from table syntax, to filter partitions needed
2020-01-03 19:52:25 +08:00
c098178f7a [Index] Implements create drop show index syntax for bitmap index [#2487] (#2573)
### create table with index 
```
CREATE TABLE table1
(
    siteid INT DEFAULT '10',
    citycode SMALLINT,
    username VARCHAR(32) DEFAULT '',
    pv BIGINT SUM DEFAULT '0',
    INDEX index_name [USING BITMAP] (siteid, citycode) COMMENT 'balabala'
)
AGGREGATE KEY(siteid, citycode, username)
DISTRIBUTED BY HASH(siteid) BUCKETS 10
PROPERTIES("replication_num" = "1");
```
### create index 
```
CREATE INDEX index_name  ON table1 (siteid, citycod) [USING BITMAP] COMMENT 'balabala';
or 
ALTER TABLE table1 ADD  INDEX index_name  [USING BITMAP] (siteid, citycod) COMMENT 'balabala';
```
### drop index
```
DROP INDEX index_name ON table1;
or
ALTER TABLE table1 DROP INDEX index_name 
```

### show index
```
SHOW INDEX[ES] FROM table1
```
output
```
+---------+-------------+-----------------+------------+---------+
| Table   | Index_name  | Column_name     | Index_type | Comment |
+---------+-------------+-----------------+------------+---------+
| table1  | index_name  | siteid,citycode | BITMAMP    | balabala|
+---------+-------------+-----------------+------------+---------+
```
2020-01-03 17:41:26 +08:00
3022955f32 Fix recover time in docs (#2588) 2019-12-27 14:51:41 +08:00
1113f951c3 Alter view stmt (#2522)
This commit adds a new statement named alter view, like
ALTER VIEW view_name
(
col_1,
col_2,
col_3,
)
AS SELECT k1, k2, SUM(v1) FROM exampleDb.testTbl GROUP BY k1,k2
2019-12-27 14:02:56 +08:00
f7032b07f3 Support more schema change from VARCHAR type (#2501) 2019-12-26 22:38:53 +08:00
c81b1db406 Support convert VARCHAR type to DATE type (#2489) 2019-12-18 12:58:47 +08:00
89003b774b Support Convert Varchar to INT (#2481) 2019-12-17 22:02:28 +08:00
bf31bd238b Change default storage model from aggregate to duplicate(#2318) (#2412)
change default storage model from aggregate to duplicate
for sql  `create table t (k1 int) DISTRIBUTED BY HASH(k1) BUCKETS 10 PROPERTIES("replication_num" = "1");`
before: 
```
 CREATE TABLE `t` (
  `k1` int(11) NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`k1`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`k1`) BUCKETS 10
PROPERTIES (
"storage_type" = "COLUMN"
);
```
after:
```
CREATE TABLE `t` (
  `k1` int(11) NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`k1`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`k1`) BUCKETS 10
PROPERTIES (
"storage_type" = "COLUMN"
);
```

#2318
2019-12-12 14:30:30 +08:00
5951a0eaea Add more schema change docs (#2411)
Add explanation about converting:

DATE -> DATETIME
DATETIME -> DATE
INT->DATE
2019-12-10 16:46:41 +08:00
a46bf1ada3 [Authorization] Modify the authorization checking logic (#2372)
**Authorization checking logic**

There are some problems with the current password and permission checking logic. For example:
First, we create a user by:
`create user cmy@"%" identified by "12345";`

And then 'cmy' can login with password '12345' from any hosts.

Second, we create another user by:
`create user cmy@"192.168.%" identified by "abcde";`

Because "192.168.%" has a higher priority in the permission table than "%". So when "cmy" try
to login in by password "12345" from host "192.168.1.1", it should match the second permission
entry, and will be rejected because of invalid password.
But in current implementation, Doris will continue to check password on first entry, than let it pass. So we should change it.

**Permission checking logic**

After a user login, it should has a unique identity which is got from permission table. For example,
when "cmy" from host "192.168.1.1" login, it's identity should be `cmy@"192.168.%"`. And Doris
should use this identity to check other permission, not by using the user's real identity, which is
`cmy@"192.168.1.1"`.

**Black list**
Functionally speaking, Doris only support adding WHITE LIST, which is to allow user to login from
those hosts in the white list. But is some cases, we do need a BLACK LIST function.
Fortunately, by changing the logic described above, we can simulate the effect of the BLACK LIST.

For example, First we add a user by:
`create user cmy@'%' identified by '12345';`

And now user 'cmy' can login from any hosts. and if we don't want 'cmy' to login from host A, we
can add a new user by:
`create user cmy@'A' identified by 'other_passwd';`

Because "A" has a higher priority in the permission table than "%". If 'cmy' try to login from A using password '12345', it will be rejected.
2019-12-06 17:45:56 +08:00
9fbc1c7ee6 Support where/orderby/limit after “SHOW ALTER TABLE COLUMN“ syntax (#2380)
Features:
1、Support WHERE/ORDER BY/LIMIT
2、Columns:TableName、CreatTime、FinishTime、State
3、Only “And” between conditions
4、TableName and State column only support "=" operator
5、CreateTime and FinishTime column support “=”,“>=”,"<=",">","<","!=" operators
6、CreateTime and FinishTime column support Date and DateTime string, eg:"2019-12-04" or "2019-12-04 17:18:00"

TestCase:
MySQL [haibotest]> show alter table column where State='FINISHED' and CreateTime > '2019-12-03' order by FinishTime desc limit 0,2;
+-------+---------------+---------------------+---------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
| JobId | TableName | CreateTime | FinishTime | IndexName | IndexId | OriginIndexId | SchemaVersion | TransactionId | State | Msg | Progress | Timeout |
+-------+---------------+---------------------+---------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
| 11134 | test_schema_2 | 2019-12-03 19:21:42 | 2019-12-03 19:22:11 | test_schema_2 | 11135 | 11059 | 1:192010000 | 3 | FINISHED | | N/A | 86400 |
| 11096 | test_schema_3 | 2019-12-03 19:21:31 | 2019-12-03 19:21:51 | test_schema_3 | 11097 | 11018 | 1:2063361382 | 2 | FINISHED | | N/A | 86400 |
+-------+---------------+---------------------+---------------------+---------------+---------+---------------+---------------+---------------+----------+------+----------+---------+
2 rows in set (0.00 sec)
2019-12-06 16:24:44 +08:00
46181c0880 Fix some bugs about load label (#2241) 2019-11-23 00:04:45 +08:00
d8cfbbedf7 Support bitmap_empty function (#2227) 2019-11-18 20:37:00 +08:00
6759e83a07 Add license header for md files and fix some translation's error (#2137) 2019-11-06 21:35:07 +08:00
65c3b0907a Support aggregation type of REPLACE_IF_NOT_NULL (#2127)
Some use has the requirment that only some of columns will be update in
one load operation, and others will retain as original. However, Doris
can't handle this situation, because user must specify value for all
columns. Then if a column aggregation method is REPLACE, use must query
original value to overwrite it. This often needs some work for user to
do.

If this CL is applied, user can use REPLACE_IF_NOT_NULL instead of
REPLACE. Then when load data to table, if user don't intent to change
value of this column, user can specify NULL for this column. Doris will
retain original value for this column.
2019-11-05 18:08:34 +08:00
95a3b4ccfe Add object type (#1948)
Add a new type: Object. Currently, it's mainly for complex aggregate metrics(HLL , Bitmap).

The Object type has the following constraints:
1 Object type could not as key column type
2 Object type doesn't support all indices (BloomFilter, short key, zone map, invert index)
3 Object type doesn't support filter and group by

In the implementation:

The Object type reuse the StringValue and StringVal, because in storage engine, the Object type is binary, it has a pointer and length.
2019-10-31 21:42:58 +08:00
41e55cfca9 Modify fixed partition feature (#1989)
1. Not support MAVALUE in multi partition column.
2. Fix the incorrect show create table stmt.
2019-10-16 16:03:46 +08:00
63fa260d3f Support prepare/close in UDF (#1985)
The prepare/close step of scalar function is already supported in execution framework, We only need to do is that support it in syntax and meta in frontend.

In addition, 'Hive' binary type of scalar function NOT supports prepare/close step, we need to make it supports.
2019-10-16 07:19:20 +08:00
ec7c8a2c6f Support adding fixed range partition
eg: ALTER TABLE test_table ADD PARTITION p0125 VALUES [("20190125"), ("20190126"));
2019-10-15 09:50:30 +08:00
62acf5d098 Limit the memory usage of Loading process (#1954) 2019-10-15 09:26:20 +08:00
b84ef013eb Fix the mistake for HLL in mini load (#1981)
[Docs] Fix mistakes for HLL column in mini load
2019-10-14 19:46:23 +08:00
ccc236484b Fix bug that failed to add KEY column to DUPLICATE KEY table (#1973) 2019-10-14 16:40:34 +08:00
ec3aa03c45 Add more routine load example (#1902) 2019-09-27 20:42:52 +08:00
40b9c3571b Support hll_empty function (#1825) 2019-09-25 09:28:02 +08:00
e8da855cd2 Support setting timezone for stream load and routine load (#1831) 2019-09-20 07:55:05 +08:00
e70e48c01e Add a ALTER operation to change distribution type from RANDOM to HASH (#1823)
Random distribution is no longer supported since version 0.9.
And we need a way to convert the random distribution to hash distribution.

    ALTER TABLE db.tbl SET ("distribution_type" = "hash");
2019-09-18 14:16:26 +08:00
714dca8699 Support table comment and column comment for view (#1799) 2019-09-18 09:45:28 +08:00
054a3f48bc Add where expr in broker load (#1812)
The where predicate in broker load is responsible for filtering transformed data.
The docs of help and operator has been changed.
2019-09-17 11:32:40 +08:00
9aa2045987 Refactor alter job (#1695) 2019-09-12 16:31:29 +08:00
c354f30767 Fix mistake in docs (#1796) 2019-09-12 14:15:06 +08:00
b327643132 Fix bug that failed to limit the mem usage of HLL column when loading (#1778)
Should use arena to allocate mem for HyperLogLog column.
2019-09-11 10:20:46 +08:00
044489b92f Optimize some kinds of load jobs (#1762)
1. Support specifying label to Insert Into stmt.

    INSERT INTO tbl1 WITH LABEL label1 ...;

2. Return job' state corresponding to the existing label in result of stream load.

    ...
    "Status": "Label Already Exists",
    "ExistingJobStatus": "FINISHED"
    ...

3. Return the recent 2000 transactions in SHOW PROC '/transactions'
2019-09-09 22:11:12 +08:00
3a33f3d350 Make bitmap_union agg column support insert into and broker load (#1721) 2019-08-30 14:44:51 +08:00
6865f4238b Add limit to show tablet stmt (#1547)
Also add some where predicates for filtering results
ISSUE #1687
2019-08-28 16:25:12 +08:00
1e4dd77d2a Add bitmap agg type and udaf (#1610) 2019-08-26 14:24:42 +08:00
978b1ee1af Add strict mode in Routine load, Stream load and Mini load (#1677) 2019-08-20 21:56:45 +08:00
8e6814cfcd Support setting timeout for stream load (#1670) 2019-08-20 15:43:03 +08:00
ba6d728f26 Enable parsing columns from file path for Broker Load (#1582) (#1635)
Currently, we do not support parsing encoded/compressed columns in file path, eg: extract column k1 from file path /path/to/dir/k1=1/xxx.csv

This patch is able to parse columns from file path like in Spark(Partition Discovery).

This patch parse partition columns at BrokerScanNode.java and save parsing result of each file path as a property of TBrokerRangeDesc, then the broker reader of BE can read the value of specified partition column.
2019-08-19 09:39:21 +08:00
0e6560ceca Fix document typo (#1657) 2019-08-16 14:52:32 +08:00
1ed25ad83d Add kafka_default_offsets when no partiotion specify
Support read kafka partition from start (#1642)
2019-08-16 13:30:26 +08:00
add6266c71 Broker load supports function (#1592)
* Broker load supports function
The commit support the column function in broker load.
The grammar of LoadStmt has not been changed.
Example:
columns terminated by ',' (tmp_c1, tmp_c2) set (c1=tmp_c1+tmp_c2)

Also, the old function is compatible such as default_value, strftime etc.
After this commit, there are no difference in column function between stream load and broker load except old function.
2019-08-09 13:27:31 +08:00
326d765c64 Add doc of modify replication num upon partition (#1611) 2019-08-08 16:47:32 +08:00
fd2accbcf9 Modify some docs' format to make it work with document website (#1604) 2019-08-08 14:47:38 +08:00
4c2a3d6da4 Merge Help document to documentation (#1586)
Help document collation (integration of help and documentation documents)
2019-08-07 21:31:53 +08:00
000e9cf53c Add administrator guide of load (#1488)
The catalogue of load docs:
---- load-manual.md
---- broker-load-manual.md
---- insert-into-manual.md
---- stream-load-manual.md

This commit also changes max/min_stream_load_timeout to max/min_load_timeout.
The old config named stream_load_timeout means the max timeout suited for all types of load.
So the config name has been changed.
2019-07-25 21:02:32 +08:00
756a680143 Add a website builder of Doris documentations (#1396)
The build script locates in docs/website.
Built with Sphinx using a theme provided by Read the Docs.
2019-06-26 19:10:39 +08:00