Commit Graph

499 Commits

Author SHA1 Message Date
e8da855cd2 Support setting timezone for stream load and routine load (#1831) 2019-09-20 07:55:05 +08:00
7bf02d0ae7 Fix bug that routine load may mistakenly skip some data (#1832)
Reproduce:
1. Start a routine load and send a routine load task to BE.
2. BE executes the task successfully and commits to FE.
3. The commit request fails on FE because the database has been renamed (a "db not found" exception is thrown).
4. After the commit fails, BE sends a rollback request to FE.
5. FE receives this rollback request and mistakenly updates the routine load progress,
   because the number of loaded rows in the rollback request's attachment is larger than 0.
2019-09-20 07:54:11 +08:00
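
The failure mode in the reproduce steps above can be illustrated with a minimal Python sketch. It is not the Doris code and not necessarily the fix applied in this commit; the class names, the `new_offset` field, and the callback names are all hypothetical. It only shows the general idea that a rollback should not advance progress just because its attachment reports loaded rows.

```python
from dataclasses import dataclass


@dataclass
class TxnAttachment:
    """Hypothetical attachment sent by BE with a commit or rollback request."""
    loaded_rows: int
    new_offset: int


class RoutineLoadProgress:
    """Hypothetical FE-side progress tracker for one routine load job."""

    def __init__(self, offset=0):
        self.offset = offset

    def after_committed(self, attachment):
        # Data is durable, so the consumed offset may advance.
        self.offset = attachment.new_offset

    def after_aborted(self, attachment):
        # The data was rolled back. Even if attachment.loaded_rows > 0,
        # the offset must stay where it was so the data is consumed again.
        return


progress = RoutineLoadProgress(offset=100)
progress.after_aborted(TxnAttachment(loaded_rows=50, new_offset=150))
print(progress.offset)
```

With this shape, the rolled-back task leaves the offset unchanged, so the same range is read again by the next task instead of being skipped.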
e516eba940 Remove the "author" tag (#1829) 2019-09-19 16:59:08 +08:00
e70e48c01e Add an ALTER operation to change distribution type from RANDOM to HASH (#1823)
Random distribution is no longer supported since version 0.9,
so we need a way to convert random distribution to hash distribution.

    ALTER TABLE db.tbl SET ("distribution_type" = "hash");
2019-09-18 14:16:26 +08:00
714dca8699 Support table comment and column comment for view (#1799) 2019-09-18 09:45:28 +08:00
3f63bde5cb Fix 'Invalid Column Name' error when loading parquet file (#1820) 2019-09-17 21:17:55 +08:00
c4e28f0d13 Update FeConstants meta version to VERSION_62 (#1822)
This should be modified along with commit a232a56c0
2019-09-17 17:30:22 +08:00
054a3f48bc Add where expr in broker load (#1812)
The where predicate in broker load is responsible for filtering transformed data.
The help docs and operator docs have been updated accordingly.
2019-09-17 11:32:40 +08:00
ede51da777 Resolve reduce/reduce conflict in our syntax (#1811) 2019-09-16 20:25:05 +08:00
a232a56c06 Add parallel_exchange_instance_num to set parallel after exchange (#1788) 2019-09-16 16:41:14 +08:00
86feddb5d7 Fix bug that a deadlock may happen when dropping a table during the alter table process (#1800)
The cancel() function tries to acquire the database's write lock, while its caller may already
hold the database's read lock.
2019-09-16 00:12:00 +08:00
dcea6daf4f Fix Cluster meta write error (#1802) 2019-09-13 22:06:55 +08:00
9aa2045987 Refactor alter job (#1695) 2019-09-12 16:31:29 +08:00
a85ffa1c2a Fix FE log error (#1785) 2019-09-11 16:13:34 +08:00
044489b92f Optimize some kinds of load jobs (#1762)
1. Support specifying a label for the Insert Into stmt.

    INSERT INTO tbl1 WITH LABEL label1 ...;

2. Return the state of the job that already uses the label in the result of stream load.

    ...
    "Status": "Label Already Exists",
    "ExistingJobStatus": "FINISHED"
    ...

3. Return the most recent 2000 transactions in SHOW PROC '/transactions'.
2019-09-09 22:11:12 +08:00
8b663bf416 Fix bug: unknown column from the inline view (#1770)
Revert code from PR-1617. A column that belongs to an inline view needs to be initialized by alias.
2019-09-09 20:57:42 +08:00
cd5cfea5cc Encapsulate HLL logic (#1756) 2019-09-09 15:52:10 +08:00
b85cb0071b Bug-fix: error result of union stmt (#1758)
ISSUES-1725: The result of a union stmt whose child is an outer join stmt is incorrect.

Example:
sql: (select k1 from empty) union all (select b.k1 k1 from left_table a left join empty b on a.k2 = b.k2);
context: the empty table has no data.
incorrect result: 0
expected result: null

Reason:
The judgment of whether column k1 of the union tuple is nullable is incorrect.
It cannot be determined from the slot attributes of the children alone when the slot is produced by an outer join.
Slot A itself is not nullable, while the outer join result that corresponds to slot A is nullable.
So the judgment needs to consider whether the slot comes from an outer join.
2019-09-08 21:26:31 +08:00
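
The nullability rule described under "Reason" above can be sketched in Python. This is an illustration under assumptions, not the planner code: the `Slot` class, its fields, and `union_slot_is_nullable` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Hypothetical stand-in for a planner slot descriptor."""
    name: str
    is_nullable: bool
    # True when the slot is produced by the nullable side of an outer join.
    from_outer_join: bool = False


def union_slot_is_nullable(child_slots):
    """A union output slot is nullable if any child slot is nullable,
    either by its own attribute or because an outer join can null it."""
    return any(s.is_nullable or s.from_outer_join for s in child_slots)


# Mirrors the example: b.k1 is declared NOT NULL but comes from a left join.
left_child = Slot("empty.k1", is_nullable=False)
right_child = Slot("b.k1", is_nullable=False, from_outer_join=True)
print(union_slot_is_nullable([left_child, right_child]))
```

Looking only at `is_nullable` would report the union column as not nullable, which matches the incorrect result described in the commit message.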
f23ac0eadd Planner supports pushing down predicates past agg, win and sort (#1471) 2019-09-08 09:30:46 +08:00
2f52ae7988 Add PreAgg Hint (#1617)
e.g.:
SELECT xxx FROM tbl /*+ PREAGGOPEN */
This forcibly enables pre-aggregation for the specified table.
2019-09-06 19:47:18 +08:00
a84c64785f Shuffle partitioned instance to avoid skew (#1744) 2019-09-04 18:31:18 +08:00
726509e9b9 Add MIN/MAX aggregate function compatible with char/varchar (#1739) 2019-09-04 17:28:27 +08:00
fddfffe4c0 Fix bug that creating a new partition fails when the table has no partitions (#1688) 2019-09-04 13:36:17 +08:00
6f4feca3dc Add rowset id generator to FE and BE (#1678) 2019-09-02 18:51:31 +08:00
ba170aa9e6 Fix NPE of DataDescription (#1735)
When the user does not specify a column mapping in BrokerLoadStmt, an NPE may be thrown.
2019-09-02 16:03:26 +08:00
76987275b9 Fix result of unix_timestamp() (#1727) 2019-08-30 21:39:16 +08:00
06b87d998a Error check about column which has no default value (#1728)
This commit checks all parsed columns, including those defined by hadoop functions and by other functions.
Otherwise, the load throws the "Column has no default value" exception even though the column has been defined by a non-hadoop function.
2019-08-30 20:23:32 +08:00
3a33f3d350 Make bitmap_union agg column support insert into and broker load (#1721) 2019-08-30 14:44:51 +08:00
378ce8ca04 Use double when converting TIME type value (#1722)
TIME type value is saved in DOUBLE, so using int64 can extend the time range.
2019-08-29 21:19:19 +08:00
c541c3fd59 Fix bug that failed to get enough normal replica because path hash is not set. (#1714)
The path hash of a replica in metadata should be set immediately after the replica is created.
We should also not depend on the path hash to find replicas, because the path hash may be set
with a delay.
2019-08-28 19:37:38 +08:00
6865f4238b Add limit to show tablet stmt (#1547)
Also add some where predicates for filtering results
ISSUE #1687
2019-08-28 16:25:12 +08:00
0c2e344f45 Refactor DateLiteral class in FE (#1644)
1. Add FE time zone function support
2. Refactor DateLiteral class in FE
ISSUE #1583
2019-08-27 22:20:06 +08:00
7e981b2b14 Limit the disk usage to avoid running out of disk capacity (#1702)
Set a high watermark and a flood stage for disk usage,
and forbid some operations if disk usage is too high.
2019-08-27 22:18:17 +08:00
b6b860c808 Make the max recursion depth of distribution pruner configurable (#1709)
Add a new FE config 'max_distribution_pruner_recursion_depth'.
2019-08-27 22:17:07 +08:00
a1b92768dd Add loaded rows to SHOW LOAD result (#1686)
Loaded rows are updated periodically by query report, so that
users can see whether a load job is still running or is blocked.
2019-08-27 14:13:47 +08:00
1e4dd77d2a Add bitmap agg type and udaf (#1610) 2019-08-26 14:24:42 +08:00
b28f4242c3 Add config max_concurrent_task_num_per_be (#1693)
This config controls the max concurrent task num per BE.
The cluster max concurrent task num = max_concurrent_task_num_per_be * number of BEs.
2019-08-24 00:56:40 +08:00
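
As a worked example of the formula above, here is a short Python sketch. The function name and the sample values (5 tasks per BE, 3 BEs) are made up for illustration and are not defaults taken from the source.

```python
def cluster_max_concurrent_tasks(max_concurrent_task_num_per_be, be_count):
    """Cluster-wide limit = per-BE limit multiplied by the number of BEs."""
    return max_concurrent_task_num_per_be * be_count


# Hypothetical values: 5 concurrent tasks per BE on a 3-BE cluster.
print(cluster_max_concurrent_tasks(5, 3))
```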
00f8040bf3 Fix bug that 2 identical stream load jobs may both execute successfully (#1690)
This causes the 2 jobs to write the same file, which corrupts the file.
2019-08-22 19:38:16 +08:00
2b2bc82ae2 Add timeout on snapshot of data (#1672)
Release the snapshot when finishing or cancelling a backup/restore job.
Snapshots may take a lot of disk space if they are not released in time.
2019-08-21 21:18:53 +08:00
0792e06eed Fix NPE of insert load job persist operation (#1683)
tracking url may be null
2019-08-21 20:30:55 +08:00
9f50f84b68 Fix bug: "SHOW DATA" or "SHOW PARTITIONS", the DATA-SIZE less than 0 (#1680) 2019-08-21 15:33:26 +08:00
978b1ee1af Add strict mode in Routine load, Stream load and Mini load (#1677) 2019-08-20 21:56:45 +08:00
0a27ef030b Reduce the number of partition info in BrokerScanNode param (#1675)
We should reduce the number of partition info entries in the BrokerScanNode param if the user has already
set target partitions to load, instead of adding all partitions' info.
Otherwise the RPC packet becomes too large.
2019-08-20 19:30:57 +08:00
8e6814cfcd Support setting timeout for stream load (#1670) 2019-08-20 15:43:03 +08:00
731f78accc Don't persist the data source info in broker load (#1665) 2019-08-19 15:45:21 +08:00
ba6d728f26 Enable parsing columns from file path for Broker Load (#1582) (#1635)
Currently, we do not support parsing encoded/compressed columns in the file path, e.g. extracting column k1 from the file path /path/to/dir/k1=1/xxx.csv.

This patch makes it possible to parse columns from the file path, as in Spark (Partition Discovery).

This patch parses partition columns in BrokerScanNode.java and saves the parsing result of each file path as a property of TBrokerRangeDesc, so that the broker reader of BE can read the value of the specified partition column.
2019-08-19 09:39:21 +08:00
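
The path parsing described above can be sketched in Python. This is a simplified illustration, not the BrokerScanNode.java logic: the function name is hypothetical, and raising an error for a missing column is an assumption, since the commit message does not say how that case is handled.

```python
def parse_columns_from_path(file_path, columns_from_path):
    """Extract key=value directory components for the requested columns."""
    found = {}
    # Only directory components are inspected; the file name is skipped.
    for part in file_path.split("/")[:-1]:
        key, sep, value = part.partition("=")
        if sep and key in columns_from_path:
            found[key] = value
    missing = [c for c in columns_from_path if c not in found]
    if missing:
        # Assumption: a requested column absent from the path is an error.
        raise ValueError("columns not found in path: %s" % missing)
    return found


print(parse_columns_from_path("/path/to/dir/k1=1/xxx.csv", ["k1"]))
```

For the path used in the commit message, the sketch yields the value `1` for column `k1`, always as a string; any type conversion would happen later.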
6d73658207 Support checking error data row when doing INSERT (#1597)
If strict mode is true and at least one row is filtered, the insert operation fails and a URL is given to get the error rows.

```
ERROR 1064 (HY000): all partitions have no load data. url: http://host:ip/api/_load_error_log?file=__shard_2/error_log_insert_stmt_e0a620e93dc54461-b89ec64768367d25_e0a620e93dc54461_b89ec64768367d25
```

If all rows are good, insert will return OK with affected rows:

```
Query OK, 1 row affected (0.26 sec)
```

If strict mode is false and at least one row is good, the insert operation returns OK with affected rows and warnings. If there are error rows, a label is also returned:

```
Query OK, 1 row affected, 1 warning (0.32 sec)
{'label':'7d66c457-658b-4a3e-bdcf-8beee872ef2c'}
```
2019-08-16 21:40:29 +08:00
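
The cases listed in the commit message above can be summarized as a small decision function. This Python sketch is an illustration only: the function name and the result strings are hypothetical, and the case the message does not cover (strict mode off with no good rows) is marked as unspecified instead of guessed.

```python
def insert_outcome(strict_mode, good_rows, filtered_rows):
    """Map the cases from the commit message to an outcome string."""
    if strict_mode:
        # Strict mode: any filtered row fails the whole insert.
        if filtered_rows > 0:
            return "ERROR_WITH_URL"
        return "OK"
    if good_rows > 0:
        # Non-strict mode: good rows are loaded; error rows yield
        # warnings plus a label.
        return "OK_WITH_WARNINGS_AND_LABEL" if filtered_rows > 0 else "OK"
    # Not described in the commit message.
    return "UNSPECIFIED"


print(insert_outcome(strict_mode=True, good_rows=1, filtered_rows=1))
print(insert_outcome(strict_mode=False, good_rows=1, filtered_rows=1))
```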
82d0afc1ba FROM_UNIXTIME should only convert timestamp from 0 to 253402271999 (#1658)
This range corresponds to 1970-01-01 00:00:00 ~ 9999-12-31 23:59:59; for any other input, NULL is returned.
2019-08-16 18:29:57 +08:00
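
The range check from this commit can be sketched in Python. This is not the Doris implementation: the function name is hypothetical, and the fixed +08:00 offset is an assumption, chosen because 253402271999 equals 9999-12-31 23:59:59 only when interpreted at UTC+8.

```python
from datetime import datetime, timedelta

MAX_TS = 253402271999                 # upper bound named in the commit title
ASSUMED_OFFSET = timedelta(hours=8)   # assumption: session time zone is +08:00
EPOCH = datetime(1970, 1, 1)


def from_unixtime(ts):
    """Return a formatted datetime string, or None when ts is out of range."""
    if ts < 0 or ts > MAX_TS:
        return None
    local = EPOCH + timedelta(seconds=ts) + ASSUMED_OFFSET
    return local.strftime("%Y-%m-%d %H:%M:%S")


print(from_unixtime(MAX_TS))
print(from_unixtime(MAX_TS + 1))
```

Plain `timedelta` arithmetic is used instead of `datetime.fromtimestamp` so that the sketch does not depend on the platform's handling of very large timestamps.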
1ed25ad83d Add kafka_default_offsets when no partition is specified
Support reading Kafka partitions from the start (#1642)
2019-08-16 13:30:26 +08:00
b85bd334de Remove temporarily failing UT (#1659) 2019-08-16 11:26:41 +08:00