Commit Graph

18263 Commits

SHA1 Message Date
82d0afc1ba FROM_UNIXTIME should only convert timestamps from 0 to 253402271999 (#1658)
which corresponds to 1970-01-01 00:00:00 ~ 9999-12-31 23:59:59; otherwise, return NULL
2019-08-16 18:29:57 +08:00
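As an aside, here is a minimal sketch of the range check described in the commit above; it is illustrative only, and the function name, signature, and NULL handling in the actual Doris BE differ:

    #include <cstdint>
    #include <ctime>
    #include <optional>
    #include <string>

    // Illustrative sketch of the FROM_UNIXTIME range check; not the real Doris code.
    std::optional<std::string> from_unixtime_sketch(int64_t ts) {
        // Only timestamps in [0, 253402271999] are convertible, i.e.
        // 1970-01-01 00:00:00 ~ 9999-12-31 23:59:59; anything else yields NULL.
        constexpr int64_t kMaxConvertibleTs = 253402271999LL;
        if (ts < 0 || ts > kMaxConvertibleTs) {
            return std::nullopt;  // stands in for SQL NULL
        }
        std::time_t t = static_cast<std::time_t>(ts);
        std::tm tm_buf;
        gmtime_r(&t, &tm_buf);  // UTC here; the real function honors the session time zone
        char buf[32];
        std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", &tm_buf);
        return std::string(buf);
    }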
57a1a718c7 Print logs when parsing scroll result fails (#1661) 2019-08-16 17:48:23 +08:00
0e6560ceca Fix document typo (#1657) 2019-08-16 14:52:32 +08:00
1ed25ad83d Add kafka_default_offsets when no partition is specified
Support reading kafka partitions from the start (#1642)
2019-08-16 13:30:26 +08:00
b85bd334de Remove temporarily failing UT (#1659) 2019-08-16 11:26:41 +08:00
4f27129368 Fix getting label when using StreamLoad (#1655) 2019-08-16 09:56:20 +08:00
b94892082b Use same dir during schema change (#1653) 2019-08-15 18:20:09 +08:00
a551abba58 Modify timediff documents (#1600) 2019-08-15 12:45:53 +08:00
38c82c039f Prepare _input_row_num and _input_rowsets_size before compaction (#1643) 2019-08-14 22:54:27 +08:00
4cc2285094 Make http server and thrift server backlog num configurable (#1638) 2019-08-14 19:58:48 +08:00
85e89b79d5 Print src tuple in error_sample file (#1641)
The src tuple could not be printed in the error_sample file when the value is filtered by strict mode.
This commit fixes this issue.
2019-08-14 19:58:09 +08:00
cc7a2a3eb8 Check all tablet using partition tablet map during publish version (#1619) 2019-08-14 18:24:52 +08:00
03b99ddd37 Fix bug that a bad replica cannot be synchronized when reported (#1634)
When a replica recovers from its bad state on the BE, the report process
should change the replica's bad status on the FE to false; otherwise, the replica
cannot be recovered.
2019-08-14 09:49:44 +08:00
199ff968dc Fix time zone compatibility (#1631) 2019-08-13 18:44:35 +08:00
dcb75729db Change cumulative compaction for decoupling storage from computation (#1576)
1. Calculate the cumulative point when loading the tablet for the first time.
2. Simplify the rowset-picking logic with respect to delete predicates.
3. Save meta and modify rowsets only once after cumulative compaction.
2019-08-13 18:25:56 +08:00
582c313190 Fix HLLContext cast error (#1632) 2019-08-13 14:38:21 +08:00
780a255112 Change the prefix of table info apis (#1625)
The pathtrie could not distinguish different param keys with the same prefix path,
so the prefix of the table info APIs has been changed to /api/external, which is used by spark-doris-connector.
2019-08-13 11:30:32 +08:00
032d0b41bb Fix compile error (#1630) 2019-08-13 10:00:18 +08:00
c8352a9e4d Make INSERT SELECT statement keep the same semantics as MySQL (#1626) (#1628) 2019-08-13 09:56:26 +08:00
1e2a4c3b9b Fix tablet restore API in BE (#1623) (#1624) 2019-08-13 09:34:24 +08:00
69af50aa8c Time zone related BE functions (#1598)
Details can be found in the time-zone.md document
2019-08-12 20:57:59 +08:00
c0253a17fc Add block compression codec and remove unused codecs (#1622) 2019-08-12 20:47:16 +08:00
af8256be2a Implement BetaRowsetWriter (#1590)
BetaRowsetWriter is used to write rowset in V2 segment format.

This PR contains several interface changes:
1. Rowset.make_snapshot() is renamed to `link_files_to` because hard links are also useful in the copy task, linked schema change, etc.
2. Rowset.copy_files_to_path() is renamed to `copy_files_to` to be consistent with other names.
3. RowsetWriter.mem_pool() is removed because not all rowset writers use MemPool.
4. RowsetWriter.garbage_collection() is removed because it is not used by clients.
5. SegmentGroup's make_snapshot() is removed because link_segments_to_path() provides similar functionality.
2019-08-12 16:41:47 +08:00
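A rough C++ sketch of the renamed Rowset interface, reconstructed only from the method names listed in the commit message above; the real Doris headers use the project's Status type and contain many more members:

    #include <string>

    // Sketch reconstructed from the rename list above, not the actual Doris header.
    class Rowset {
    public:
        virtual ~Rowset() = default;
        // Formerly make_snapshot(): hard-links this rowset's files into dir;
        // hard links are also useful for the copy task, linked schema change, etc.
        virtual bool link_files_to(const std::string& dir) = 0;
        // Formerly copy_files_to_path(): renamed to copy_files_to for naming consistency.
        virtual bool copy_files_to(const std::string& dir) = 0;
    };

    // RowsetWriter::mem_pool() and RowsetWriter::garbage_collection() were removed,
    // since not every rowset writer uses a MemPool and no caller needed
    // garbage_collection(); SegmentGroup::make_snapshot() was dropped in favor of
    // link_segments_to_path().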
3080139e78 Avoid load or query failures when doing alter job
2 cases:

1. Sometimes a replica with a missing version cannot be repaired, which may cause queries to fail
with the error: failed to initialize storage reader. tablet=xxx, res=-214

2. Cancelling the rollup job while there are load jobs on that table may cause those load jobs to fail.
We should ignore the "table not found" exception when committing the txn.
2019-08-12 16:27:34 +08:00
b4ba77a594 Fix bug of encountering "No more data to read" when accessing broker (#1621)
The errorMsg in TBrokerOperationStatus is set to null because of
an invalid string concatenation operation.
2019-08-12 14:06:47 +08:00
2bd01b23c7 Add page cache for column page in BetaRowset (#1607) 2019-08-12 10:42:00 +08:00
cf2155cf45 Add spark-doris-connector overview (#1526) 2019-08-11 13:00:16 +08:00
e3348c46a9 Expose data pruned-filter-scan ability (#1527) 2019-08-11 12:59:24 +08:00
add6266c71 Broker load supports function (#1592)
* Broker load supports function
This commit supports column functions in broker load.
The grammar of LoadStmt has not been changed.
Example:
columns terminated by ',' (tmp_c1, tmp_c2) set (c1=tmp_c1+tmp_c2)

Also, the old functions such as default_value, strftime, etc. remain compatible.
After this commit, there is no difference in column functions between stream load and broker load, except for the old functions.
2019-08-09 13:27:31 +08:00
a6d3099a68 Fix bug: localtime is not thread-safe, so it was changed to localtime_r. (#1614) 2019-08-08 22:00:43 +08:00
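For context, a minimal example of the thread-safe pattern this commit switches to; this is plain POSIX usage, not Doris code:

    #include <ctime>
    #include <string>

    // localtime() returns a pointer to shared static storage, so concurrent callers
    // can overwrite each other's result. localtime_r() fills a caller-provided
    // buffer and is therefore safe to call from multiple threads.
    std::string format_local_time(std::time_t t) {
        std::tm tm_buf;
        localtime_r(&t, &tm_buf);  // thread-safe replacement for localtime(&t)
        char buf[32];
        std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", &tm_buf);
        return buf;
    }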
69de5df167 Fix bug that cluster balance may cause load jobs to fail (#1581)
The bug is described in issue #1580. This patch fixes 2 cases of cluster balance:

1. After the new replica has been added, its version may not have caught up with
the visible version, so the new replica may be treated as a stale and redundant replica, which
would be deleted in the next tablet checking round.

I add a flag named needFurtherRepair to the newly added replica, and set it only when that replica's version has not caught up with the visible version. This replica will receive a further repair in the next tablet checking round instead of being deleted.

2. When deleting redundant replicas, there may be load jobs running on them. Deleting these replicas may cause those load jobs to fail.

Before deleting a redundant replica, I first mark the next txn id on that replica and set the replica's
state to CLONE. The CLONE state ensures that no more load jobs will land on that replica, and we
wait for all load jobs before the marked txn id to finish. After that, the replica can be deleted safely.
2019-08-08 18:38:30 +08:00
326d765c64 Add doc for modifying replication num of a partition (#1611) 2019-08-08 16:47:32 +08:00
f4ad2381e6 Fix error DCHECK for partition_columns (#1606) 2019-08-08 16:29:08 +08:00
fd2accbcf9 Modify some docs' format to make it work with document website (#1604) 2019-08-08 14:47:38 +08:00
b937887133 Include header file for 'preadv', whose absence broke the build on Ubuntu 18.04 (#1602) 2019-08-08 09:30:21 +08:00
60d997fe67 Fix errors when ES username and password are empty (#1601) 2019-08-08 09:29:23 +08:00
4c2a3d6da4 Merge help documents into documentation (#1586)
Help document collation (integration of the help and documentation documents)
2019-08-07 21:31:53 +08:00
41cbedf57d Manage tablet by partition id (#1591) 2019-08-07 20:54:50 +08:00
dc4a5e6c10 Support Decimal type when loading Parquet files (#1595) 2019-08-07 19:52:23 +08:00
9402456f5b Fix the case where a Parquet directory has empty files (#1593) 2019-08-07 15:08:22 +08:00
f7a05d8580 Support setting timezone variable in FE (#1587) 2019-08-07 09:25:26 +08:00
343b913f0d Fix a serious bug that would cause all replicas to be deleted. (#1589)
Revert commit: eda55a7394fcec2f7b6c0aefd1628f9d63911815
2019-08-06 19:23:53 +08:00
b2e678dfc1 Support Segment for BetaRowset (#1577)
We create a new segment format for BetaRowset. The new format merges
the data file and the index file into one file. We also create a new format
for the short key index. In the original code, the index is stored in a
RowCursor-like format, which is not efficient to compare. Now we encode multiple
columns into a binary key, and we ensure that this binary key sorts the same
as the key columns.
2019-08-06 17:15:11 +08:00
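To illustrate the order-preserving idea mentioned above, here is a generic memcomparable-style integer encoding; it is only an analogy, not the actual Doris short key index format:

    #include <cstdint>
    #include <string>

    // Generic order-preserving (memcomparable) encoding for one int32 key column:
    // flipping the sign bit and writing big-endian bytes makes memcmp() on the
    // concatenated key agree with the numeric order of the original columns.
    // The real Doris short key index encoding differs in detail.
    void encode_int32_key(int32_t v, std::string* out) {
        uint32_t u = static_cast<uint32_t>(v) ^ 0x80000000u;  // flip sign bit
        for (int shift = 24; shift >= 0; shift -= 8) {
            out->push_back(static_cast<char>((u >> shift) & 0xFF));
        }
    }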
ec7b9e421f Acquire tablet map write lock during tablet gc (#1588) 2019-08-06 17:14:39 +08:00
d938f9a6ea Implement the initial version of BetaRowset (#1568) 2019-08-06 10:40:16 +08:00
eda55a7394 Fix bug of being unable to delete a replica if its version is missing (#1585)
If there is a redundant replica on a BE whose version is missing,
the tablet report logic cannot drop it correctly.
2019-08-05 16:19:05 +08:00
93a3577baa Support multiple partition columns when creating table (#1574)
When creating a table with the OLAP engine, users can specify multiple partition columns.
eg:

PARTITION BY RANGE(`date`, `id`)
(
    PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
    PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
    PARTITION `p201703_all`  VALUES LESS THAN ("2017-04-01")
)

Notice that load via a Hadoop cluster does not support multi-partition-column tables.
2019-08-05 16:16:43 +08:00
938c6d4cdf Throw TabletQuorumFailedException in commitTxn (#1575)
The TabletQuorumFailedException will be thrown in commitTxn when the number of successful replicas of a tablet is less than the quorum replica num.
The Hadoop load does not handle this exception because the push task will retry later.
The streaming broker, insert, stream and mini load will catch this exception and abort the txn afterwards.
2019-08-04 15:54:03 +08:00
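The quorum mentioned above is the usual majority rule; a tiny illustrative check is sketched below (the real logic lives in the FE's commitTxn and uses Doris' own exception types):

    #include <stdexcept>

    // Illustrative majority-quorum check; a stand-in for the FE-side logic.
    void check_tablet_quorum(int success_replica_num, int replication_num) {
        int quorum = replication_num / 2 + 1;  // majority of the tablet's replicas
        if (success_replica_num < quorum) {
            // stands in for the TabletQuorumFailedException thrown in commitTxn
            throw std::runtime_error("tablet quorum not satisfied");
        }
    }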
6c21a5a484 Switch MAKE_TEST off in build.sh (#1579) 2019-08-03 22:49:35 +08:00
cefe1794d4 Fix bug that replicas of a tablet may be located on the same host (#1517)
Doris supports deploying multiple BEs on one host. So when allocating BEs for the replicas of
a tablet, we should select different hosts. But there is a bug in the tablet scheduler
where the same host may be selected for one tablet. This patch fixes this problem.

There are some places related to this problem:

1. Create Table
    There is no bug in the Create Table process.

2. Tablet Scheduler
    Fixed when selecting a BE for REPLICA_MISSING and REPLICA_RELOCATING.
    Fixed when balancing tablets.

3. Colocate Table Balancer
    Fixed when selecting a BE for repairing the colocate backend sequence.
    Not fixed in colocate group balance; left to colocate repairing.

4. Tablet report
    Tablet report may add a replica to the catalog, but I did not check the host here;
    the Tablet Scheduler will fix it.
2019-08-01 10:26:06 +08:00