Commit Graph

54 Commits

Author SHA1 Message Date
5f9359d618 Use SleepFor() instead of usleep() (#3211) 2020-03-29 14:18:19 +08:00
a77515fe03 [Backup] Fix backup job block at SNAPSHOTING phase (#3058)
This bug occurred when BE make snapshot, the version required by fe had been merged into the cumulative version, so the snapshot task could not complete the task even if it retried. In order to solve this problem, the BackupJob could be set to CANCELLED, and the user could continue to retry the job.

Fix #3057
2020-03-11 14:05:02 +08:00
aa58cd99d9 Fix disks_total_capacity metric bug (#2988)
Now disks_total_capacity metric is a user specified capacity, but
disks_avail_capacity is the disk's actual available capacity, so
disks_total_capacity may be less than disks_avail_capacity, and
UsedPct on FE may be a negative number as a result.
We'd better to use disk actual capacity for disks_total_capacity metric.
2020-03-02 19:09:50 +08:00
625411bd28 Doris support in memory olap table (#2847) 2020-02-18 10:45:54 +08:00
9ee1704859 [util] Import util tools from KUDU (#2905)
1. MonoTime/MonoDelta
   MonoTime: The MonoTime represents a particular point in time, relative to some fixed but unspecified reference point.
   MonoDelta: The MonoDelta class represents an elapsed duration of time, the delta between two MonoTime instances.

2. CountDownLatch
   This is a C++ implementation of the Java CountDownLatch
2020-02-14 18:01:16 +08:00
d0e2fc3305 Remove resource_info related members from TaskWorkerPool (#2704)
The `TResourceInfo` was used to help `cgruops` to isolate resources,
but it is no longer used.

In fact, the `TResourceInfo` information is no longer carried in
the requests from FE to BE.
2020-01-16 14:39:08 +08:00
c39d35df4c Add tablet compaction score metrics (#2427)
[Metric] Add tablet compaction score metrics

Backend:
    Add metric "tablet_max_compaction_score" to monitor the current max compaction
    score of tablets on this Backend. This metric will be updated each time
    the compaction thread picking tablets to compact.

Frontend:
    Add metric "tablet_max_compaction_score" for each Backend. These metrics will
    be updated when backends report tablet.
    And also add a calculated metric "max_tablet_compaction_core" to monitor the
    max compaction core of tablets on all Backends.
2019-12-12 17:46:59 +08:00
c5ce72215d Optimize tablet report with expired transaction. (#2215)
When there are lots of expired transactions on BE, and with large
number of tablet, the report thread may become to slow. Because it
has to iterate the whole transaction map for each tablet.

But this is unnecessary. We should first build a expired transaction
map with 'tablet id' as key. And for each tablet, we only need to seek
the expired transaction map once with tablet id, instead of traversing
the whole transaction map.
2019-11-15 23:03:21 +08:00
11872d5cf6 Sending clear txn task explicitly after transaction being aborted (#2182) 2019-11-13 11:22:45 +08:00
0bcfddab92 Remove clear_alter_task (#2056)
Alter task has been refactored and clear_alter_task is not necessary.
2019-10-24 18:57:14 +08:00
58c882fa2a Remove SchemaChangeV1 (#2014) 2019-10-21 15:07:28 +08:00
f130bd3e7b Use Env function to operate directory (#1980)
Now Env has unify all environment operation, such as file operation.
However some of our old functions don't leverage it. This change unify
FileUtils::scan_dir to use Env's function.
2019-10-15 09:25:12 +08:00
7df1418ff4 Check transaction_id in TClearTransactionTaskRequest (#1872) 2019-09-26 10:15:43 +08:00
9aa2045987 Refactor alter job (#1695) 2019-09-12 16:31:29 +08:00
7e981b2b14 Limit the disk usage to avoid running out of disk capacity (#1702)
Set high watermark and flood stage of disk used capacity.
And forbid some operations if disk usage is too high.
2019-08-27 22:18:17 +08:00
2b2bc82ae2 Add timeout on snapshot of data (#1672)
Release snapshot when finishing or cancelling backup/restore job.
Snapshot may takes a lot disk space if not releasing them in time.
2019-08-21 21:18:53 +08:00
a6f0b5c789 Change RowsetWriter num_rows() return int64_t (#1535) 2019-07-24 10:44:38 +08:00
c34b35e6c4 Add ALTER_TABLET task in be (#1497)
This a for the new implementation of alter table process.
2019-07-23 15:16:21 +08:00
755b12cd75 Add partition id to tablet meta in be (#1490)
FE uses partition_id to publish version. BE should check whether all tablets related with this partition have the version. But Tablet in BE does not have partition id in its metadata. So that BE could not check it.

This patch will add partition id to tablet meta during report task.
Sync at most 10k tablets during set tablet meta.
2019-07-17 14:07:55 +08:00
0d48a3961c Refactor Storage Engine (#1478)
NOTE: This patch would modify all Backend's data.
And this will cause a very long time to restart be.
So if you want to interferer your product environment,
you should upgrade backend one by one.

1. Refactoring be is to clarify the structure the codes.
2. Use unique id to indicate a rowset.
   Nameing rowset with tablet_id and version will lead to
   many conflicts among compaction, clone, restore.
3. Extract an rowset interface to encapsulate rowsets
   with different format.
2019-07-15 21:18:22 +08:00
5c1b4f641e Add report version for publish task (#1401) 2019-06-28 20:15:08 +08:00
9d03ba236b Uniform Status (#1317) 2019-06-14 23:38:31 +08:00
488e3825f7 Fix bug that restore process in BE causes BE crash (#1193)
When calling SnapshotLoader.move(), all files should be revoked if they
are in GC queue, or the file may be deleted after move() success.
2019-05-23 19:22:29 +08:00
a08170fd50 Enhance the usabilities (#1100)
* Enhence the usabilities

1. Add metrics to monitor transactions and steaming load process in BE.
2. Modify BE config 'result_buffer_cancelled_interval_time' to 300s.
3. Modify FE config 'enable_metric_calculator' to true.
4. Add more log for tracing broker load process.
5. Modify the query report process, to cancel query immediately if some instance failed.

* Fix bugs
1. Avoid NullPointer when enabling colocation join with broker load
2. Return immediately when pull load task coordinator execution failed
2019-05-07 15:55:04 +08:00
d3251a19f7 Modify the method to obtain some metrics (#904) 2019-04-10 19:37:48 +08:00
daa9d975ca Fix bugs of Tablet Scheduler (#600) 2019-01-29 15:35:07 +08:00
9a272f0592 Optimize something (#585)
1. Optimize the error msg of Tablet scheduler.
2. Missing helper nodes info when modify Frontends.
3. Fix bug that olap tablet's header lock is not released.
2019-01-25 14:27:33 +08:00
798a66e6a0 Implement new tablet repair and balance framework (#336)
More detail, see issue #540
2019-01-16 13:29:17 +08:00
5b1e3d3f40 Optimize backup & restore process (#460)
1. Print broker address for debug.
2. Do not letting backup job cancelled if it already in state UPLOAD_INFO.
3. Cancel task on Backends when job is cancelled.
4. Show detail progress of backup and restore job.
5. Make 'show snapshot' result more readable.
6. Change upload and download thread num of backup and restore in Backend to 1.
2018-12-24 16:49:16 +08:00
0341ffde67 Revert commit 'Add log to detect empty load file' (#447)
Looks like we still need to send push task without file
to Backend, or load job will fail.
Fix it later.
2018-12-19 12:29:12 +08:00
5a6e5cfd07 Add log to detect empty load file (#445)
We find that a load file may not be generated for rollup tablet,
add a log to observe.
2018-12-18 12:44:36 +08:00
e99468c387 Add HttpClient class (#441)
Replace FileDownloader with HttpClient, this patch change clone_copy and
pusher's download.
2018-12-18 09:55:11 +08:00
ff95f23615 Remove OLAP_LOG_DEBUG AND OLAP_LOG_TRACE log format (#378)
Use VLOG(3) and VLOG(10) instead
2018-12-03 10:08:21 +08:00
3d324e38ea Remove OLAP_LOG_INFO log format. Use LOG(INFO) instead (#372) 2018-11-30 20:59:40 +08:00
49302955c8 Revert "Remove OLAP_LOG_INFO log format. Use LOG(INFO) instead (#370)" (#371)
This reverts commit a816925776de06dc7503ea7429802cad9042d0e4.
2018-11-30 20:56:51 +08:00
a816925776 Remove OLAP_LOG_INFO log format. Use LOG(INFO) instead (#370)
* Remove unused row-oriented format flags

* Remove unused row-oriented format flags

* Remove OLAP_LOG_INFO log format. Use LOG(INFO) instead
2018-11-30 20:36:58 +08:00
9a2ad18428 Add path info of replica in catalog (#327)
Add path info of replica in catalog

Also fix a bug that when calling check_none_row_oriented_table,
store is null, it cannot be used to create table.
Instead, OLAPHeader can be used to get storage type information.
2018-11-19 17:42:46 +08:00
37b4cafe87 Change variable and namespace name in BE (#268)
Change 'palo' to 'doris'
2018-11-02 10:22:32 +08:00
2868793b6b Change license to Apache License 2.0 (#262) 2018-11-01 09:06:01 +08:00
051aced48d Missing many files in last commit
In last commit, a lot of files has been missed
2018-10-31 16:19:21 +08:00
4f6f8572de Added: Add 3 new metrics of Backends: host_fd_metrics, process_fd_metrics and process_thread_metrics, to monitor open file number and thread number.
Added: Support getting column size and precision info of table or view using JDBC.

Updated: Change the promethues type name GAUGE to lowercase, to fit the latest promethues version.
Updated: Backend ip saved in FE will be compared with BE's local ip when doing heartbeat, to avoid false positive heartbeat response.
Updated: Using version_num of tablet instead of calculating nice value to select cumulative compaction candicates.

Fixed: Predicates should not be pushed down to subquery which contains limit clause.
Fixed: Fix the formula of calculating BE load score.
Fixed: Fix a bug that in some edge cases, non-master Fontend may wait for a unnecessary long timeout after forwarding cmd to Master FE.
Fixed: A bug that granting privs on more than one table does not work.
Fixed: Support 'Insert into' table which contains HLL columns.
Fixed: ExportStmt' toSql() method may throw NullPointer Exception if table does not exist.
Fixed: Remove unnecessary 'get capacity' operation to avoid IO impact.

Internal commit id: merge to c16bd603a53dfe2089ff95704c698a738c317792
2018-10-26 14:48:21 +08:00
cc74efb3c5 merge to ddb65b69f9c788e359e191889cb31f15279c41ec (#224)
1. Apache HDFS broker support HDFS HA and Hadoop kerberos authentication.
2. New Backup and Restore function. Use Fs Broker to backup your data to HDFS or restore them from HDFS.
3. Table-Level Privileges. Grant fine-grained privileges on table-level to specified user.
4. A lot of bugs fixed.
5. Performance improvement.
2018-08-24 17:12:26 +08:00
19997510a6 merge to 9625ef157dd44c58802d63cb7547f037b75fd710 (#208)
1. Implement Backend http server using libevent instead of mongoose.
2. Remove Old Hypertable rpc framework, use brpc instead.
3. Change rpc from FE to BE to brpc.
4. Fs broker support HDFS HA.
5. add more metrics to monitor.
6. Lots of bug fixed.
2018-07-17 09:20:30 +08:00
9f7b1ea6d4 merge to 87fd4ebd9977afb1e1193429dd75c7c82caab204 (#202)
1. ix bugs in query layer.
2. remove some redundant code in BE
3. support specify multi helper node when starting FE
4. add proc 'cluster_load_statistic' to show load balance situation of Palo
2018-06-08 08:42:23 +08:00
2419384e8a push 3.3.19 to github (#193)
* push 3.3.19 to github

* merge to 20ed420122a8283200aa37b0a6179b6a571d2837
2018-05-15 20:38:22 +08:00
5de798fdd6 Merge code to github (#187)
* merge to 95787f8be1fd0ff215708fb0f49997b632876586
* Bugs fixed
2018-03-23 14:04:55 +08:00
ca3470119c find disk damage, report immediately (#183) 2018-02-01 15:19:17 +08:00
1eddd0d7ad report the real disk available capacity to fe (#156) 2017-12-14 10:49:19 +08:00
51d5c727a7 make UUID to be authentication token (#107) 2017-09-20 21:25:10 +08:00
db8c40e5f0 add authentication to DownloadAction (#91)
* add authentication to DownloadAction

1. use cluster_id as token;
2. add dir limit, only files in data dir can be accessed.

* enable authentication in DownloadAction by default
2017-09-13 16:54:00 +08:00