Commit Graph

153 Commits

Author SHA1 Message Date
0341ffde67 Revert commit 'Add log to detect empty load file' (#447)
It looks like we still need to send the push task without a file
to the Backend, or the load job will fail.
We will fix this later.
2018-12-19 12:29:12 +08:00
5a6e5cfd07 Add log to detect empty load file (#445)
We found that a load file may not be generated for a rollup tablet,
so we add a log to observe this.
2018-12-18 12:44:36 +08:00
b9201ece0b Parse thrift port from cluster state (#443) 2018-12-18 11:28:42 +08:00
7f014bdb11 Check meta context when update partition version (#438)
Partition.updateVisibleVersionAndVersionHash() is the only method that
may call Catalog.getCurrentCatalogJournalVersion() in a non-replay thread.

So we have to check whether MetaContext is null. If MetaContext is null, this is a
non-replay thread and we do not need to call Catalog.getCurrentCatalogJournalVersion().
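For illustration, a minimal sketch of the check described above (the surrounding method body and field names are assumptions, not the actual implementation):

```java
// Sketch only: illustrates the null check, not the real method body.
public void updateVisibleVersionAndVersionHash(long visibleVersion, long visibleVersionHash) {
    this.visibleVersion = visibleVersion;
    this.visibleVersionHash = visibleVersionHash;

    // MetaContext is only set on replay/checkpoint threads. A null context means this is
    // a non-replay thread, so we do not need to ask for the current journal version.
    if (MetaContext.get() != null) {
        int journalVersion = Catalog.getCurrentCatalogJournalVersion();
        // ... version-dependent compatibility handling would go here ...
    }
}
```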

Also modify the load logic so that delete jobs finish more quickly.
2018-12-17 18:46:27 +08:00
45e42bd003 Redesign the access to meta version (#436)
The meta version is only used when saving and loading the catalog, so currently this
version is a field of the Catalog class, and we can get it only by calling
Catalog.getCurrentCatalogJournalVersion().

But in the restore process, we need to read metadata that was saved with a specific
meta version. So we need a flexible way to read metadata using a specified meta
version, not only the version from the Catalog.

So we create a new class called MetaContext. Currently it has only one field,
'journalVersion', which stores the current journal version. It is kept in a
thread-local variable, so that we can create a MetaContext anywhere we want and set
the 'journalVersion' we want to use when reading meta.
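A minimal sketch of what such a thread-local context could look like (only 'MetaContext' and 'journalVersion' come from this commit; the accessor names and the use of InheritableThreadLocal are assumptions):

```java
// Sketch only. InheritableThreadLocal is one way to let threads created by the Frontend
// starting thread (see the thread list below) inherit its context automatically.
public class MetaContext {
    private static final InheritableThreadLocal<MetaContext> THREAD_LOCAL = new InheritableThreadLocal<>();

    private int journalVersion;

    public static void set(MetaContext context) {
        THREAD_LOCAL.set(context);
    }

    public static MetaContext get() {
        return THREAD_LOCAL.get();
    }

    public int getJournalVersion() {
        return journalVersion;
    }

    public void setJournalVersion(int journalVersion) {
        this.journalVersion = journalVersion;
    }
}
```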

Currently, there are 4 threads related to metadata saving and loading:

1. The Frontend starting thread, which calls Catalog.initialize() to load the image.
2. The Frontend state listener thread, which listens for state changes and calls
   transferToMaster() or transferToNonMaster().
3. The edit log replay thread, which is created when calling transferToNonMaster()
   and replays the edit log.
4. The checkpoint thread, which is created when calling transferToMaster() and does
   the checkpoint periodically.

Notice that we need the 'current meta version' only when READING the meta (not WRITING),
so we only need to take care of the reading threads.
We create a MetaContext thread-local variable for these 4 threads, and the meta contexts
of threads 2, 3 and 4 inherit from thread 1's, because thread 1 loads the original image
file and gets the very first meta version.

We leave the name of Catalog.getCurrentCatalogJournalVersion() unchanged and only
change its implementation, because we don't want to change a lot of code this time.

On the other hand, we add the current meta version to the backup job info file when
running a backup job, so that when restoring from a backup snapshot we know which
meta version to use when reading the meta.
We also add a new property "meta_version" to the Restore stmt, so that the meta
version used for reading backup meta can be specified explicitly. This is for old
backup snapshots that do not have a meta version saved in the backup job info file.
2018-12-17 10:05:16 +08:00
548da0546a Fix compile error in run-fe-ut.sh (#415) 2018-12-11 17:46:13 +08:00
81ee15ed25 Fix compile failure in RLTaskTxnCommitAttachment (#414) 2018-12-11 16:00:07 +08:00
8913c23134 Fix compile failure in GlobalTransactionMgrTest (#412) 2018-12-11 13:53:38 +08:00
fc41842c18 Add a frontend interface for committing RoutineLoadTask (#368)
1. Add a needSchedulerTasksQueue in LoadManager: the RoutineLoadTaskScheduler will poll tasks from this queue and schedule them.
2. Add a frontend interface named rlTaskCommit: commit the txn, update the offset and renew a task for the same partitions.
3. Add an extra property in the transaction state: in rlTaskCommit, the extra property looks like {"job_id": xxx, "progress": xxx}.
When the FE initializes routine load job meta from the logs, all txn states related to a routine load job are used to initialize the job's progress.

Add a TxnStateChangeListener interface for transactions (see the sketch below):
1. onCommitted, onAborted and beforeAborted will be called for different kinds of txns.
2. RoutineLoadJob will update the job progress and renew a task in the onCommitted callback.
3. Add the TxnStateChangeListener to TransactionState.
4. Setting a transactionState to committed will call the onCommitted callback if the callback is not null.
5. Setting a transactionState to aborted will call beforeAborted and onAborted.
6. beforeAborted in RoutineLoadJob checks whether there is a related task when the TxnStatusChangeReason is TIMEOUT. If a related task exists, it prevents the abort by throwing a TransactionException.
7. Other abort reasons do not prevent the abort: onAborted will be called and the job state will be changed to paused.
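For illustration, a minimal sketch of the callback interface described in the list above (the parameter lists are assumptions; the real interface may carry more context):

```java
// Sketch only: the three callbacks named above, with assumed signatures.
public interface TxnStateChangeListener {
    // Called after the transaction state is set to committed; RoutineLoadJob updates
    // the job progress and renews a task here.
    void onCommitted(TransactionState txnState);

    // Called before the transaction is aborted. May veto the abort by throwing a
    // TransactionException, e.g. when the reason is TIMEOUT but a related routine
    // load task still exists.
    void beforeAborted(TransactionState txnState, TxnStatusChangeReason reason) throws TransactionException;

    // Called after the transaction state is set to aborted; RoutineLoadJob changes
    // the job state to paused here.
    void onAborted(TransactionState txnState, TxnStatusChangeReason reason);
}
```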

Change extra to TxnCommitAttachment in TLoadTxnCommitRequest:
1. The KAFKA value of TTxnSourceType means that this is a routine load task commit, and the TRLTaskTxnCommitAttachment is the commit info of this task.
2. TRLTaskTxnCommitAttachment will be converted to RLTaskTxnCommitAttachment, which includes the progress of this task, the task id, numOfErrorData, etc.

Add a TxnCommitAttachment parameter to commitTransaction:
1. The TxnCommitAttachment will be updated in commitTransaction.
2018-12-11 11:06:25 +08:00
ac01da4984 Clear client pool when heartbeat failed (#408)
When a heartbeat fails, we should clear the connections cached in the client pool,
or we will get broken connections from the pool. Since we don't have REOPEN logic
(which would lead to ugly code), a broken connection may cause an RPC to block and
fail. Clearing them all and recreating them when needed is a simple way to resolve
this problem.

We only clear connections in the backend and broker pools. There is no need to clear
the heartbeat pool, because heartbeats are so frequent that its invalid connections
are cycled out automatically.
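As a hedged illustration of this policy (all names here are hypothetical; the commit does not show the actual pool API):

```java
// Sketch only: drop cached connections to a node whose heartbeat just failed.
void onHeartbeatFailed(String host, int port) {
    // Broken connections would otherwise block and fail later RPCs, and there is no
    // REOPEN logic, so simply clear everything; connections are recreated on demand.
    backendClientPool.clearPool(host, port);   // hypothetical pool handle and method
    brokerClientPool.clearPool(host, port);    // hypothetical pool handle and method
    // The heartbeat pool is intentionally left alone: heartbeats are frequent enough
    // that its invalid connections are cycled out automatically.
}
```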
2018-12-10 18:52:51 +08:00
b5737ee59a Refactor heartbeat logic (#403)
* Refactor heartbeat logic

Currently we only have Backend heartbeats. Without Frontend or Broker heartbeats,
we don't know the status of these nodes and thus can't perform failover logic in
some cases.

1. Add Frontend and Broker heartbeats.
    Frontend heartbeat uses the BootstrapFinish HTTP REST API.
    Broker heartbeat uses the ping() RPC.
2. All heartbeats are managed in HeartbeatMgr (see the sketch below).
3. Rename BrokerAddress to FsBroker.
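A hypothetical outline of how one manager can cover all three node types (only HeartbeatMgr, FsBroker, the BootstrapFinish REST API and the broker ping() RPC are named in this commit; everything else is assumed):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch only: not the real HeartbeatMgr.
public class HeartbeatMgr {
    private final ExecutorService executor = Executors.newCachedThreadPool();

    public void runOneCycle(List<Frontend> frontends, List<Backend> backends, List<FsBroker> brokers) {
        for (Frontend fe : frontends) {
            executor.submit(() -> probeFrontend(fe));   // BootstrapFinish HTTP REST API
        }
        for (Backend be : backends) {
            executor.submit(() -> probeBackend(be));    // existing Backend heartbeat RPC
        }
        for (FsBroker broker : brokers) {
            executor.submit(() -> probeBroker(broker)); // broker ping() RPC
        }
    }

    private void probeFrontend(Frontend fe) { /* hypothetical */ }
    private void probeBackend(Backend be) { /* hypothetical */ }
    private void probeBroker(FsBroker broker) { /* hypothetical */ }
}
```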
2018-12-10 14:41:12 +08:00
b4d89b19e8 Fix bug that ColumnType is no longer used (#400) 2018-12-06 19:19:29 +08:00
088a914e11 Support Colocate Join (#245) (#246)
* Support colocate join

Colocate join means two tables are distributed by the columns being joined, so we
can join them locally on each backend.

Colocate join requires no data movement and allows more concurrency.
2018-12-06 18:59:17 +08:00
7b2007f852 Revert 'Support 'NO_BACKSLASH_ESCAPES' sql_mode' (#392) 2018-12-05 20:23:34 +08:00
cb7e8ff2bb Fix compile failure in ScanNode (#384) 2018-12-04 16:51:48 +08:00
31d1630149 Support 'NO_BACKSLASH_ESCAPES' sql_mode (#382) 2018-12-04 11:33:04 +08:00
d9eb8a2ca1 Fix cast error in BrokerScanNode (#383) 2018-12-04 11:30:03 +08:00
c556ed13f6 Support TRUNCATE TABLE stmt (#377)
* Support TRUNCATE TABLE stmt

Users can use the TRUNCATE TABLE stmt to empty a table or its partitions completely.
Unlike DELETE, it drops the tablets directly, without any performance impact.

* Fix bug that a new partition should use a new ID

* Use equals() to compare Integer

* Fix compile bug

* Fix bug on single range partition

* Check table's state again after creating partition
2018-12-01 21:18:27 +08:00
9447a349ec Substitute ColumnType with Type (#366)
* Substitute ColumnType with Type
2018-11-30 16:30:30 +08:00
5694bcbd78 Fix stream load failure when target table contains HLL and insert failure when it contains subquery (#359) 2018-11-29 15:40:04 +08:00
f1718578f3 Fix insert error when it contains HLL (#358) 2018-11-27 17:10:41 +08:00
cfefa71daa Fix cast error in StreamLoadScanNode (#356) 2018-11-27 16:03:33 +08:00
b2d89dfee9 Add connection id to CurrentQueryStatisticsProcDir (#355) 2018-11-27 14:28:39 +08:00
cddd864d83 Avoid 'No more data to read' error when handling stream load RPC (#354)
* Avoid 'No more data to read' error when handling stream load rpc

1. Catch throwables in all stream load RPC handlers.
2. Avoid setting a null string as the error msg of the RPC result status (see the sketch below).

* Change setError_msgs to addToError_msgs
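For illustration, a minimal sketch of the two points above (TStatus and addToError_msgs follow the thrift-style naming mentioned in this commit; the surrounding handler and variable names are assumptions):

```java
// Sketch only: catch everything and never put a null string into the error msg list.
try {
    handleStreamLoadRpc(request, result);            // hypothetical handler
} catch (Throwable t) {
    String msg = t.getMessage();
    TStatus status = new TStatus(TStatusCode.INTERNAL_ERROR);
    // getMessage() may be null; fall back to the exception class name instead.
    status.addToError_msgs(msg == null ? t.getClass().getSimpleName() : msg);
    result.setStatus(status);
}
```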
2018-11-27 14:18:41 +08:00
dedfccfaf5 Optimize the publish logic of streaming load (#350)
1. Only collect all error replicas if the publish task times out.
2. Add 2 metrics to monitor the success or failure of txns.
3. Change the publish timeout to Config.load_straggler_wait_second.
2018-11-26 19:01:50 +08:00
bbdf4fba4a Add distributor which schedules tasks fairly for routine load jobs (#333)
Step 1: updateBeIdTaskMaps: remove unavailable BEs and add newly alive BEs.
Step 2: Process timed-out tasks: if a task has already been allocated to a BE but is not finished before DEFAULT_TASK_TIMEOUT, it will be discarded.
        At the same time, the partitions belonging to the old task will be allocated to a new task. The new task, with a new signature, will be added to the needSchedulerRoutineLoadTask queue.
Step 3: Process all needSchedulerRoutineLoadTasks and allocate each task to a BE, which will execute it (see the sketch below).
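The three steps as a hedged Java sketch (the helper and queue names are assumptions):

```java
// Sketch only: one scheduling round of the distributor described above.
void runOneRound() {
    // Step 1: refresh the BE/task maps.
    updateBeIdTaskMaps();                                       // drop unavailable BEs, add newly alive BEs

    // Step 2: discard timed-out tasks and re-wrap their partitions into new tasks.
    for (RoutineLoadTask task : allocatedTasks()) {
        if (task.elapsedMs() > DEFAULT_TASK_TIMEOUT_MS) {
            discard(task);
            needSchedulerRoutineLoadTasks.add(renewTask(task)); // new task, new signature, same partitions
        }
    }

    // Step 3: allocate every pending task to a BE; the BE executes it.
    RoutineLoadTask pending;
    while ((pending = needSchedulerRoutineLoadTasks.poll()) != null) {
        allocateToBackend(pending);
    }
}
```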
2018-11-23 10:35:10 +08:00
485db34f1e Modify partition's version name to what it means (#334)
* Modify partition's version name to what it means.

1. committedVersion(Hash) -> visibleVersion(Hash)
2. currentVersion(Hash) -> committedVersion(Hash)
3. add some comments to make the code more readable

* Check if editlog is null in CatalogIdGenerator
    to avoid unit test failures
2018-11-21 19:21:16 +08:00
791e89568e Change PaloMetrics' name and Catalog's Id generator (#329)
* Change PaloMetrics' name and Catalog's Id generator
1. Remove the 'Palo' prefix from the Metric classes.
2. Add a new CatalogIdGenerator to replace the old AtomicLong, to avoid writing too many edit logs (see the sketch below).
3. Add a new histogram to monitor the write latency of edit log writes.
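The commit does not spell out how CatalogIdGenerator reduces the number of edit logs; one common approach is to persist ids in batches. A hedged sketch of that idea (an assumption, not necessarily what CatalogIdGenerator does):

```java
// Sketch only: allocate ids in batches so only one edit log entry is written per batch,
// instead of one entry per id as with a plain logged AtomicLong.
public class BatchIdGenerator {
    private static final int BATCH_SIZE = 1000;

    private long nextId;
    private long batchEndId;   // every id below this bound is already covered by an edit log entry

    public synchronized long getNextId() {
        if (nextId >= batchEndId) {
            batchEndId = nextId + BATCH_SIZE;
            logBatchEnd(batchEndId);   // hypothetical: persist only the new upper bound
        }
        return nextId++;
    }

    private void logBatchEnd(long newBatchEndId) {
        // write the new upper bound to the edit log; after a restart, ids resume from this bound
    }
}
```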

* modify next id logic

* fix a bug that Metric is not initialized before HISTO_EDIT_LOG_WRITE_LATENCY is used

* fix a problem
2018-11-20 18:59:18 +08:00
9a2ad18428 Add path info of replica in catalog (#327)
Add path info of replica in catalog

Also fix a bug: when calling check_none_row_oriented_table, store is null and cannot
be used to create the table. Instead, OLAPHeader can be used to get the storage type
information.
2018-11-19 17:42:46 +08:00
44029937e4 Add scheduler routine load job for stream load (#313)
1. Fetch routine load jobs in the need_scheduler state.
2. Calculate the current concurrent task number of each job.
3. Divide kafka partitions into tasks.
2018-11-15 21:04:22 +08:00
8ac9492b11 Fix SHOW BACKENDS returning ERROR (#320)
In some cases, errMsg in Backend may be null. We now check whether it is null before
using it.

Issue: #317
2018-11-15 20:14:39 +08:00
d7ee57e881 Optimize quota unit (#309)
Originally, we could only set the quota in bytes. This commit adds the quota units
K/KB/M/MB/G/GB/T/TB/P/PB for convenience.
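A self-contained sketch of one way to parse such suffixes (the method name is hypothetical and this is not the actual parser):

```java
// Sketch only: "10GB" -> 10 * 2^30, "512K" -> 512 * 2^10, "1048576" -> 1048576 bytes.
static long parseQuota(String quota) {
    String s = quota.trim().toUpperCase();
    // Drop the optional trailing 'B' of KB/MB/GB/TB/PB so only the scale letter remains.
    if (s.endsWith("B") && s.length() >= 2 && Character.isLetter(s.charAt(s.length() - 2))) {
        s = s.substring(0, s.length() - 1);
    }
    long unit;
    switch (s.charAt(s.length() - 1)) {
        case 'K': unit = 1L << 10; break;
        case 'M': unit = 1L << 20; break;
        case 'G': unit = 1L << 30; break;
        case 'T': unit = 1L << 40; break;
        case 'P': unit = 1L << 50; break;
        default:  return Long.parseLong(s);   // plain byte count, no unit suffix
    }
    return Long.parseLong(s.substring(0, s.length() - 1)) * unit;
}
```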
2018-11-15 14:03:52 +08:00
063f7d7a9a Fix code LICENSE for file modified from LevelDB. (#300) 2018-11-12 16:09:40 +08:00
ae8d16c81e Fix failed cases in regression test (#299) 2018-11-12 11:15:39 +08:00
2081b7fea5 Be compatible with old RPC (#296)
Add palo.PInternalService, which can serve clients of old palo versions.

Issue: #293
2018-11-10 15:46:45 +08:00
6f206ae9c6 Fix some license (#290) 2018-11-09 14:30:09 +08:00
1d8fc4bb69 Improve cardinality, avgRowSize, numNodes stat info in OlapScanNode (#256)
Currently, the cardinality, avgRowSize and numNodes stat info in OlapScanNode is
missing, so the broadcastCost and partitionCost are both wrong and Doris can't
automatically choose the best join strategy.

So we should make the statistical information in OlapScanNode more precise.
2018-11-07 13:59:05 +08:00
fc8f78d81c Fix unit test failure (#286) 2018-11-07 12:53:11 +08:00
0c4edc2b3c Fix BE can't be grayscale upgraded (#285) 2018-11-07 09:34:39 +08:00
370e73ce5d Fix truncation error in CastExpr (#283) 2018-11-06 18:57:13 +08:00
8d7bd01a71 Simplify constant Expr (#255)
Simplifying constant Exprs can improve Partition Pruning. Examples of constant Expr simplification:

1 + 1 + 1 --> 3
date_add('2018-08-08', 1) --> 2018-08-09
year('2018-07-24')*12 + month('2018-07-24') -> 24223
2018-11-06 17:24:54 +08:00
acb332833a Fix view missed parenthesis bug (#253) 2018-11-06 15:25:40 +08:00
cb36e411e9 Support AnalyticExpr in View (#248) 2018-11-05 20:39:21 +08:00
8b665a41c8 Support NULLS LAST and NULLS FIRST syntax (#252)
Allow users to specify the null ordering.

NULLS FIRST: specifies that NULL values should be returned before
non-NULL values.
NULLS LAST: specifies that NULL values should be returned after
non-NULL values.
2018-11-05 20:35:10 +08:00
9ae631adb6 Fix InsertStmt reAnalyze bug (#251) 2018-11-05 15:36:40 +08:00
69f3b02485 Fix a bug that a user can not kill its own connection (#276) 2018-11-02 16:36:59 +08:00
312dfd10bb Change SQL built-in function's symbol (#274) 2018-11-02 16:24:21 +08:00
847d29e394 Delete useless debug log (#250) 2018-11-02 16:06:01 +08:00
c92892bbb9 Fix UnionStmt toSql bug (#249) 2018-11-02 14:50:09 +08:00
ad12d907da Failed to register equal conjuncts which refer to more than three tuples (#266)
Change-Id: I7eaf28ee6db35671971108f3edefe908d46ae87f
2018-11-01 20:34:48 +08:00