Commit Graph

130 Commits

SHA1 Message Date
9aa2045987 Refactor alter job (#1695) 2019-09-12 16:31:29 +08:00
76987275b9 Fix result of unix_timestamp() (#1727) 2019-08-30 21:39:16 +08:00
3a33f3d350 Make bitmap_union agg column support insert into and broker load (#1721) 2019-08-30 14:44:51 +08:00
378ce8ca04 Use double when converting TIME type value (#1722)
TIME type values are stored as DOUBLE, so converting with DOUBLE rather than int64 preserves the full time range.
2019-08-29 21:19:19 +08:00
0c2e344f45 Refactor DateLiteral class in FE (#1644)
1. Add FE time zone function support
2. Refactor DateLiteral class in FE
ISSUE #1583
2019-08-27 22:20:06 +08:00
7e981b2b14 Limit the disk usage to avoid running out of disk capacity (#1702)
Set a high watermark and a flood stage for disk usage,
and forbid some operations when disk usage is too high.
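For instance, assuming the thresholds are exposed as configs named roughly as below (the names and values are assumptions, not confirmed by this commit), they could be adjusted like this:
```
-- Assumed config names; check the actual keys introduced by this change
ADMIN SET FRONTEND CONFIG ("storage_high_watermark_usage_percent" = "85");
ADMIN SET FRONTEND CONFIG ("storage_flood_stage_usage_percent" = "95");
```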
2019-08-27 22:18:17 +08:00
b6b860c808 Make the max recursion depth of distribution pruner configurable (#1709)
Add a new FE config 'max_distribution_pruner_recursion_depth'.
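A minimal sketch of changing it, assuming the config is mutable via ADMIN SET FRONTEND CONFIG (otherwise it would be set in fe.conf); the value is illustrative:
```
-- Illustrative value; the actual default is not stated in this commit
ADMIN SET FRONTEND CONFIG ("max_distribution_pruner_recursion_depth" = "100");
```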
2019-08-27 22:17:07 +08:00
b28f4242c3 Add config max_concurrent_task_num_per_be (#1693)
This config controls the max number of concurrent tasks per BE.
The cluster-wide max concurrent task num = max_concurrent_task_num_per_be * number of BEs.
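As a sketch, assuming the config is mutable at runtime (an assumption), a cluster of 10 BEs with the value below would allow at most 100 * 10 = 1000 concurrent tasks:
```
-- Illustrative value only
ADMIN SET FRONTEND CONFIG ("max_concurrent_task_num_per_be" = "100");
```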
2019-08-24 00:56:40 +08:00
00f8040bf3 Fix bug that 2 identical stream load jobs may both be executed successfully (#1690)
This would cause the 2 jobs to write to the same file and damage it.
2019-08-22 19:38:16 +08:00
2b2bc82ae2 Add timeout on snapshot of data (#1672)
Release snapshots when finishing or cancelling a backup/restore job.
Snapshots may take a lot of disk space if not released in time.
2019-08-21 21:18:53 +08:00
ba6d728f26 Enable parsing columns from file path for Broker Load (#1582) (#1635)
Currently, we do not support parsing encoded/compressed columns in the file path, e.g. extracting column k1 from the file path /path/to/dir/k1=1/xxx.csv.

This patch makes it possible to parse columns from the file path, as in Spark (Partition Discovery).

This patch parses partition columns in BrokerScanNode.java and saves the parsing result of each file path as a property of TBrokerRangeDesc, so the broker reader on the BE can read the value of the specified partition column.
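
A hedged sketch of how such a load might look; the COLUMNS FROM PATH AS clause, label, paths and broker name below are illustrative assumptions rather than syntax confirmed by this commit:

```
LOAD LABEL example_db.label_path_cols
(
    DATA INFILE("hdfs://host:port/path/to/dir/k1=*/xxx.csv")
    INTO TABLE example_tbl
    COLUMNS TERMINATED BY ","
    (k2, k3)
    COLUMNS FROM PATH AS (k1)
)
WITH BROKER "my_broker";
```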
2019-08-19 09:39:21 +08:00
6d73658207 Support checking error data row when doing INSERT (#1597)
If strict mode is true and at least one row is filtered, the insert operation fails and a URL is returned for retrieving the error rows.

```
ERROR 1064 (HY000): all partitions have no load data. url: http://host:ip/api/_load_error_log?file=__shard_2/error_log_insert_stmt_e0a620e93dc54461-b89ec64768367d25_e0a620e93dc54461_b89ec64768367d25
```

If all rows are good, insert returns OK with the number of affected rows:

```
Query OK, 1 row affected (0.26 sec)
```

If strict mode is false and at least one row is good, the insert operation returns OK with affected rows and warnings. If there are error rows, a label is also returned:

```
Query OK, 1 row affected, 1 warning (0.32 sec)
{'label':'7d66c457-658b-4a3e-bdcf-8beee872ef2c'}
```
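
Strict mode for INSERT is typically toggled per session; a minimal sketch, assuming the session variable is named enable_insert_strict and using an illustrative table (both assumptions, not stated in this commit):

```
-- Assumed variable and table names
SET enable_insert_strict = true;
INSERT INTO example_tbl VALUES (1, "a"), (2, "b");
```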
2019-08-16 21:40:29 +08:00
b85bd334de Remove temporarily failing UT (#1659) 2019-08-16 11:26:41 +08:00
780a255112 Change the prefix of table info apis (#1625)
The PathTrie could not distinguish different param keys with the same prefix path.
So the prefix of the table info APIs has been changed to /api/external, which is used by spark-doris-connector.
2019-08-13 11:30:32 +08:00
e3348c46a9 Expose data pruned-filter-scan ability (#1527) 2019-08-11 12:59:24 +08:00
add6266c71 Broker load supports functions (#1592)
* Broker load supports functions
This commit supports column functions in broker load.
The grammar of LoadStmt has not been changed.
Example:
columns terminated by ',' (tmp_c1, tmp_c2) set (c1=tmp_c1+tmp_c2)

Also, the old functions such as default_value, strftime etc. remain compatible.
After this commit, there is no difference in column functions between stream load and broker load, except for the old functions.
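
For context, a sketch of this columns clause inside a full broker load statement; the label, table, path and broker names below are illustrative assumptions:

```
LOAD LABEL example_db.label_set_cols
(
    DATA INFILE("hdfs://host:port/path/to/file.csv")
    INTO TABLE example_tbl
    COLUMNS TERMINATED BY ","
    (tmp_c1, tmp_c2)
    SET (c1 = tmp_c1 + tmp_c2)
)
WITH BROKER "my_broker";
```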
2019-08-09 13:27:31 +08:00
4c2a3d6da4 Merge Help document into documentation (#1586)
Help document consolidation (integration of the help documents into the documentation).
2019-08-07 21:31:53 +08:00
f7a05d8580 Support setting timezone variable in FE (#1587) 2019-08-07 09:25:26 +08:00
93a3577baa Support multi partition column when creating table (#1574)
When creating a table with the OLAP engine, users can specify multiple partition columns.
e.g.:

PARTITION BY RANGE(`date`, `id`)
(
    PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
    PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
    PARTITION `p201703_all`  VALUES LESS THAN ("2017-04-01")
)

Note that load via a Hadoop cluster does not support tables with multiple partition columns.
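
A sketch of a full CREATE TABLE using such a partition clause; the column types, bucketing and properties are illustrative assumptions:

```
CREATE TABLE example_db.multi_part_tbl
(
    `date` DATE,
    `id`   BIGINT,
    `v`    INT SUM
)
AGGREGATE KEY(`date`, `id`)
PARTITION BY RANGE(`date`, `id`)
(
    PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
    PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
    PARTITION `p201703_all`  VALUES LESS THAN ("2017-04-01")
)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES ("replication_num" = "3");
```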
2019-08-05 16:16:43 +08:00
938c6d4cdf Throw TabletQuorumFailedException in commitTxn (#1575)
TabletQuorumFailedException is thrown in commitTxn when the number of successful replicas of a tablet is less than the quorum replica num.
Hadoop load does not handle this exception because the push task will retry later.
Streaming broker load, insert, stream load and mini load catch this exception and abort the txn afterwards.
2019-08-04 15:54:03 +08:00
0694b6a6fa Fix bugs of Broker load (#1546)
Use same UUID as query ID and load ID of a load execution plan.
Each load execution plan has a load ID, and as a plan, there is also a query ID.
We can use same UUID as query ID and load ID, for tracing the load process more easily.

Change the load ID when retrying a load execution plan.
When a load execution plan is retried, the load ID should be changed, otherwise the BE cannot
distinguish the old and new load requests.

Cancel the running loading task when cancelling the broker load.
When a user cancels a broker load, the running loading task should also be cancelled, or
it may occupy a worker thread for a long time.

Remove the unnecessary query report when doing load execution plan.
Only the last query report is needed.

Add a new BE config tablet_writer_rpc_timeout_sec.
It is used for the RPC of the tablet sink. The default is 600 seconds, which is long enough for flushing
about 6GB of data. The long timeout reduces the possibility of encountering a 'fail to send batch' error when loading.

Use streaming_load_max_mb instead of mini_load_max_mb in BE config.

Add more logs for tracing a broker load process easily.
2019-07-27 20:17:05 +08:00
69040572fb Use different ID instead of table ID for base index of an OLAP table (#1524) 2019-07-23 15:48:45 +08:00
1f3f3f76a2 Fix the duplicated request bug of mini load (#1504)
The function miniLoadBegin returns the txn_id.
If the backend sends a duplicated request to the frontend, the frontend returns the txn_id that was created by the same mini load.

The issue is that the frontend returns the txn_id even when the previous identical request has not yet begun the txn.
In that case the frontend returns zero, the initial txn_id, and the BE cannot execute the load plan with an incorrect txn_id.

The commit combines `createLoadJob` and `execute` under the write lock, which protects the atomicity of `create` and `beginTxn`.
So a duplicated request cannot get the txn id before the previous identical request has finished.
2019-07-18 23:52:12 +08:00
2551248a52 Support grant GRANT_PRIV on database or table level (#1472)
Currently, GRANT_PRIV can only be granted at the global level, which means
it can only be granted on *.*. Granting it on db.* or db.tbl is not allowed.

This does not meet the requirement of creating a user who has the privilege
to grant privileges to other users on a specified database or table, such as:

GRANT SELECT_PRIV ON db1.* TO cmy@'%';

So I extend the scope of GRANT_PRIV. Users can now grant GRANT_PRIV at
database or even table level, such as:

GRANT GRANT_PRIV ON db1.* TO cmy@'%';

And after being granted, the user cmy@'%' can now grant GRANT_PRIV on db1.* to
other users.
2019-07-16 19:25:18 +08:00
0d48a3961c Refactor Storage Engine (#1478)
NOTE: This patch modifies all Backends' data,
which will cause a very long time to restart a BE.
So to avoid disrupting your production environment,
you should upgrade the Backends one by one.

1. Refactoring the BE is to clarify the structure of the code.
2. Use a unique id to identify a rowset.
   Naming a rowset with tablet_id and version leads to
   many conflicts among compaction, clone and restore.
3. Extract a rowset interface to encapsulate rowsets
   with different formats.
2019-07-15 21:18:22 +08:00
863eb83cb1 Delete deprecated code in Frontend (#1463)
1. Delete Clone/CloneJob/CloneChecker
    The old clone framework is deprecated; TabletChecker/TabletScheduler are used instead
2. Delete old BackupJob/RestoreJob
3. Delete OP_DROP_USER edit log
4. Delete CLONE_DONE edit log
2019-07-12 13:34:05 +08:00
81f062dd4c Bug-fix: query es table would fail when thrift_port configuration not set (#1455) 2019-07-11 12:29:18 +08:00
9c96a688c3 Fix bug that user can set null default value to non-nullable column in create table stmt (#1453)
In create table stmt, column definition `k1 INT NOT NULL DEFAULT NULL`
should not be allowed
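For illustration, a sketch of the rejected definition next to an accepted one; the table names and the rest of the schema are assumptions:
```
-- Rejected after this fix: non-nullable column with a NULL default
CREATE TABLE example_db.t_bad (k1 INT NOT NULL DEFAULT NULL, k2 INT)
DISTRIBUTED BY HASH(k1) BUCKETS 1;

-- Accepted: non-nullable column with a non-null default
CREATE TABLE example_db.t_ok (k1 INT NOT NULL DEFAULT "0", k2 INT)
DISTRIBUTED BY HASH(k1) BUCKETS 1;
```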
2019-07-10 23:48:29 +08:00
645f0a5279 Persist auth info in LoadJob (#1443)
A new class named 'AuthorizationInfo' is used to save the auth info in jobs.
The job no longer needs to retrieve the auth info by meta id, which may throw an exception when the db or table has been dropped or renamed.
The persistence of 'AuthorizationInfo' takes effect from META_VERSION 56.
2019-07-09 20:50:55 +08:00
bde362c3cd Modify insert operation's behavior (#1444)
Before changing the default insert operation to streaming load, if the select result
of an insert stmt was empty, a label was still returned to the user, and the user
could use this label to check the insert load job's status.

After changing the insert operation, if the select result is empty, an exception
is thrown to the user client directly, without any label.

This new usage pattern is not friendly to existing users, as it forces
them to change the way they use the insert operation.

So I add a new FE config 'using_old_load_usage_pattern', default false.
If set to true, a label is returned to the user even if the select result is empty.
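
A sketch of switching the old behavior back on, assuming the config is mutable at runtime; otherwise it would go into fe.conf and require an FE restart:

```
ADMIN SET FRONTEND CONFIG ("using_old_load_usage_pattern" = "true");
```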
2019-07-09 10:17:09 +08:00
7eab12a40e Support reading Parquet file when loading data (#1173) 2019-07-01 18:39:27 +08:00
6b83440b59 Get table name from DataSourceInfo instead of DataDesc (#1405) 2019-06-29 11:20:12 +08:00
5c1b4f641e Add report version for publish task (#1401) 2019-06-28 20:15:08 +08:00
e807064a88 Modify colocation creation logic (#1289) 2019-06-25 21:20:18 +08:00
322de9cd8e Add sql-function doc of cast_to_bigint (#1370) 2019-06-24 19:40:57 +08:00
120e7e9119 Add more UT for FEFunctions (#1344) 2019-06-21 21:54:14 +08:00
7550b2f09b Convert mini load to streaming mini load (#1323)
* This commit contributes streaming mini load
The operation of streaming mini load is the same as before; users can still check the load via the frontend.
The difference is that streaming mini load finishes the task before the REST API replies, while the non-streaming version only registers a load.

* When upgrading Doris
Upgrading either FE or BE first is supported. After both FE and BE are upgraded, streaming mini load takes effect.

* For multi mini load
The non-streaming mini load is still used by multi mini load. The behavior of multi mini load has not been changed.

* Add an interface named isSupportedFunction
This function is used to protect the correctness of new features that involve both BE and FE during upgrading.
2019-06-21 19:34:50 +08:00
ea71277094 Support MySQL client 8.0 connecting to FE (#1349)
for example:
mysql --default-auth=mysql_native_password -P9030 -utest -ptest123456 -hA.B.C.D
2019-06-21 19:15:34 +08:00
b002ba04d9 Fix the error of duplicated label (#1303) 2019-06-14 14:13:38 +08:00
ff0dd0d2da Support SSL authentication with Kafka in routine load job (#1235) 2019-06-07 16:29:01 +08:00
f424321625 Fix IllegalArgumentException in LoadManager (#1240) 2019-06-04 22:23:13 +08:00
309b779a7d Check colocate table name should be case-sensitive (#1224) 2019-05-30 22:47:22 +08:00
180d8e5cbd Modify some thirdparties (#1228)
1. Change the Kafka Java client from 2.0.0 to 0.10.1.1, because a high-version client may not support a low-version server.
2. Enable SSL in librdkafka
2019-05-30 21:23:37 +08:00
fa4ac9f751 Replay GlobalVariable by Annotation (#1219) 2019-05-29 19:21:42 +08:00
f648bdd968 Fix datediff function (#1208) 2019-05-28 15:55:31 +08:00
f985ea99fc Add support column reference in LOAD statement (#1162) 2019-05-15 20:26:10 +08:00
ffe3eaa1a7 Implement adddate, days_add and from_unixtime function in FE (#1149) 2019-05-13 16:59:52 +08:00
15c9be4dfe Fix bug that balance task always choose high usage path (#1143) 2019-05-11 22:07:17 +08:00
ae18cebe0b Improve colocate table balance logic for backend added (#1139)
1. Improve colocate table balance logic for backend added
2. Add more comments
3. Break loop early
2019-05-11 21:49:51 +08:00
1eeb5ea891 Add str_to_date function in fe (#1118) 2019-05-09 17:20:44 +08:00