2889 Commits

Author SHA1 Message Date
ad3a0fb79d [Metric] Add metrics of tablet version num distribution (#5665)
Add metrics (P50, P75, P90, P95, P99, etc.) to show the distribution of tablet version counts.

```
# TYPE doris_be_tablet_version_num_distribution histogram
doris_be_tablet_version_num_distribution{quantile="0.50"} 9.21429
doris_be_tablet_version_num_distribution{quantile="0.75"} 11.7949
doris_be_tablet_version_num_distribution{quantile="0.90"} 13
doris_be_tablet_version_num_distribution{quantile="0.95"} 13
doris_be_tablet_version_num_distribution{quantile="0.99"} 13
doris_be_tablet_version_num_distribution_sum 950
doris_be_tablet_version_num_distribution_count 100
```
2021-04-23 21:23:22 +08:00
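As a rough illustration of what the quantile lines in the commit above summarize, the sketch below computes the same set of percentiles, plus a sum and a count, from a list of per-tablet version counts. The names `percentile` and `version_num_distribution` and the linear-interpolation method are assumptions made for this example; the BE's histogram may estimate quantiles differently, so the numbers need not match the output shown in the commit.

```python
import math


def percentile(sorted_values, q):
    """Return the q-quantile (0 <= q <= 1) of pre-sorted values using linear interpolation."""
    if not sorted_values:
        raise ValueError("no samples")
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def version_num_distribution(version_counts):
    """Summarize per-tablet version counts (hypothetical helper, not Doris code)."""
    values = sorted(version_counts)
    result = {f"{q:.2f}": percentile(values, q) for q in (0.50, 0.75, 0.90, 0.95, 0.99)}
    result["sum"] = sum(values)
    result["count"] = len(values)
    return result
```

The `sum` and `count` entries correspond to the `_sum` and `_count` lines, so their ratio gives the mean number of versions per tablet.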
12b2447724 [Optimize] Optimize the assign logic of compaction tasks to avoid starvation (#5683)
1. Reserve a slot to ensure that the cumulative compaction can be executed.
2. Ensure that the compaction score metric can be updated.
2021-04-23 09:48:37 +08:00
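The slot reservation in point 1 of the commit above can be pictured with a small sketch. This is not the Doris implementation: the function `pick_compaction_type`, its parameters, and the rule "the last free slot goes to cumulative compaction when none is running" are assumptions chosen only to show how reserving one slot prevents base compaction from starving cumulative compaction.

```python
def pick_compaction_type(running_base, running_cumulative, max_slots, preferred):
    """Decide which compaction type may take the next free slot on a disk.

    Returns "base", "cumulative", or None when no slot is free.
    Hypothetical policy: if no cumulative compaction is running and only one
    slot is left, that slot is reserved for cumulative compaction.
    """
    used = running_base + running_cumulative
    if used >= max_slots:
        return None
    last_slot = used == max_slots - 1
    if preferred == "base" and running_cumulative == 0 and last_slot:
        return "cumulative"
    return preferred
```

Under this assumed policy, a disk with a single slot always runs cumulative compaction, which is one reason a real scheduler would need a more nuanced rule.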
b93e841688 [Optimize] Remove expired txns in batch to avoid holding lock for too long (#5675)
This CL mainly changes:

1.  Add a config to control the expiration time of load jobs

    Add a new FE config "streaming_label_keep_max_second" to control
    the expiration time of high-frequency load jobs such as INSERT and STREAM LOAD.

2. Remove expired txns in batches to avoid holding the transaction lock for a long time
2021-04-23 09:47:30 +08:00
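A minimal sketch of the batching idea in the commit above, written in Python rather than the FE's Java. The class `TxnStore`, its fields, and the `batch_size` parameter are hypothetical; only the config name `streaming_label_keep_max_second` comes from the commit message. The point is that the lock is released between batches, so other threads can make progress while a large backlog of expired transactions is cleaned up.

```python
import threading
from collections import deque


class TxnStore:
    """Toy transaction store; finished txns are kept oldest-first."""

    def __init__(self):
        self.lock = threading.Lock()
        self.finished = deque()  # (finish_time, txn_id), oldest first
        self.by_id = {}

    def add_finished(self, txn_id, finish_time):
        with self.lock:
            self.finished.append((finish_time, txn_id))
            self.by_id[txn_id] = finish_time

    def remove_expired(self, now, keep_max_second, batch_size=100):
        """Remove expired txns, holding the lock for at most batch_size removals at a time."""
        removed = 0
        while True:
            with self.lock:
                n = 0
                while (n < batch_size and self.finished
                       and now - self.finished[0][0] > keep_max_second):
                    _, txn_id = self.finished.popleft()
                    del self.by_id[txn_id]
                    n += 1
            removed += n
            if n < batch_size:  # backlog drained, nothing more to remove
                return removed
```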
ec29322c10 [Bug] Avoid waiting too long when rpc is slow. (#5669)
Total execution time should not be longer than the stream load timeout.
2021-04-23 09:46:40 +08:00
b12399657b [Bug] Fix StackOverFlow bug after rewriting the column descs of load stmt (#5656)
1. Fix a self-referencing bug.
2. Also fix a display bug of SHOW BROKER.
2021-04-23 09:45:39 +08:00
4fa25b6eb9 [Optimize] make tablet meta checkpoint to be threadpool model (#5654)
Currently, tablet meta checkpoint is a memory-intensive operation.
If a host has 12 disks, it starts 12 threads to do tablet meta checkpoints.
In our experience, the data size of one tablet can be as high as 2 GB.
If 12 threads do the checkpoint at the same time, it may cause OOM.

Therefore, this PR tries to solve this problem.
First, it starts only one thread to produce tablet meta checkpoint tasks.
Second, it creates a thread pool to handle these tasks.
You can configure the size of the thread pool to control the parallelism and avoid OOM.
This is a producer-consumer model.
2021-04-23 09:45:15 +08:00
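The producer-consumer arrangement described in the commit above can be sketched as follows. This is a Python illustration, not the BE's C++ code: `run_checkpoints`, `do_checkpoint`, and `pool_size` are hypothetical names. A single producer thread submits one task per tablet, and a fixed-size pool bounds how many checkpoints run at once, which is what limits peak memory.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_checkpoints(tablet_ids, do_checkpoint, pool_size=2):
    """One producer submits a task per tablet; pool_size workers consume them."""
    results = {}
    results_lock = threading.Lock()

    def task(tablet_id):
        value = do_checkpoint(tablet_id)
        with results_lock:
            results[tablet_id] = value

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        def producer():
            for tablet_id in tablet_ids:
                pool.submit(task, tablet_id)

        producer_thread = threading.Thread(target=producer)
        producer_thread.start()
        producer_thread.join()
    # Leaving the with-block waits for every submitted task to finish.
    return results
```

Lowering `pool_size` trades checkpoint throughput for a smaller memory footprint, which mirrors the configurable thread pool size mentioned in the commit.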
7eea811f6b [Feature] Flink Doris Connector (#5372) (#5375) 2021-04-23 09:43:48 +08:00
f5cf008bcc [Bug] Fix stream load UT failed (#5692)
Also move the stream load rocksdb dir to the first of storage root paths
2021-04-23 09:33:42 +08:00
a803ceea86 [refactor] Remove boost mutex, use std::mutex instead (#5684)
* Remove boost mutex, use std::mutex instead

* replace shared_mutex
2021-04-22 11:29:36 +08:00
8332581df8 [Optimize] Filter partitions by where header when generate stream load plan (#5667) 2021-04-21 16:56:17 +08:00
9b0d6ecaf0 [Log] Add error msg when tablet not found (#5659)
Before dropping a tablet, it will try to find the tablet in the tablet map.
But the tablet may no longer exist.
Therefore, it is better to print the error message and error status.
2021-04-21 16:37:47 +08:00
a4f8194111 [Audit][Stream Load] Support audit function for stream load (#5452)
Record finished stream load jobs (both successful and failed) in the audit log
so that we can see when each stream load job was executed and check its details.
2021-04-21 16:36:12 +08:00
b121ad6b95 [Refactor] Remove jprotobuf and use grpc client to connect brpc service (#5650) 2021-04-21 10:25:58 +08:00
d15fe05f3c [Metrics] Add metrics to monitor BE's agent task queue size (#5648)
* [Metrics] Add metrics to monitor BE's agent task queue size

Sometimes a user's DDL or background task may take a long time,
and it is not easy to find out which procedure has a problem.
This patch adds a metric to monitor the BE's agent task queue size,
which is helpful for troubleshooting.

The raw metrics on the BE look like:
doris_be_agent_task_queue_size{type="REPORT_OLAP_TABLE"} 0
doris_be_agent_task_queue_size{type="REPORT_DISK_STATE"} 0
doris_be_agent_task_queue_size{type="REPORT_TASK"} 0
doris_be_agent_task_queue_size{type="CHECK_CONSISTENCY"} 0
doris_be_agent_task_queue_size{type="DELETE"} 0
doris_be_agent_task_queue_size{type="CLEAR_TRANSACTION_TASK"} 0
doris_be_agent_task_queue_size{type="PUBLISH_VERSION"} 0
doris_be_agent_task_queue_size{type="UPLOAD"} 0
doris_be_agent_task_queue_size{type="DROP_TABLE"} 0
doris_be_agent_task_queue_size{type="CREATE_TABLE"} 39
doris_be_agent_task_queue_size{type="RELEASE_SNAPSHOT"} 0
doris_be_agent_task_queue_size{type="STORAGE_MEDIUM_MIGRATE"} 245
doris_be_agent_task_queue_size{type="CLONE"} 0
doris_be_agent_task_queue_size{type="MOVE"} 0
doris_be_agent_task_queue_size{type="ALTER_TABLE"} 0
doris_be_agent_task_queue_size{type="DOWNLOAD"} 0
doris_be_agent_task_queue_size{type="PUSH"} 0
doris_be_agent_task_queue_size{type="UPDATE_TABLET_META_INFO"} 0
doris_be_agent_task_queue_size{type="MAKE_SNAPSHOT"} 0

* fix typo
2021-04-21 09:23:33 +08:00
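One way to use the queue size metrics from the commit above when troubleshooting is to scan the raw text for queues that are not empty. The sketch below assumes the line format shown in the commit message; the function name `busy_queues` and the `threshold` parameter are made up for this example.

```python
import re

# Matches lines such as: doris_be_agent_task_queue_size{type="CLONE"} 0
LINE_RE = re.compile(r'^doris_be_agent_task_queue_size\{type="([A-Z_]+)"\}\s+(\d+)$')


def busy_queues(metrics_text, threshold=0):
    """Return {task_type: queue_size} for queues whose size exceeds threshold."""
    busy = {}
    for line in metrics_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and int(match.group(2)) > threshold:
            busy[match.group(1)] = int(match.group(2))
    return busy
```

Applied to the sample in the commit message, this would single out `CREATE_TABLE` and `STORAGE_MEDIUM_MIGRATE` as the queues with pending tasks.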
ab64dbe65d not need to deserialize again (#5644)
Co-authored-by: wangxixu <wangxixu@xiaomi.com>
2021-04-21 09:23:12 +08:00
be733cfa9c [Metrics] Add some large memtrackers' metric (#5614)
MemTracker can report memory consumption so that we can find out which
module consumes more memory, but it only gives the current value. This patch
adds metrics for some large memory consumers so that we can see
which module consumes more memory over time, which is useful for
troubleshooting OOM problems and optimizing configs.
2021-04-21 09:15:04 +08:00
6f46c52f1d Make the max stmt length to be loaded in audit table can be defined by user. (#5673) 2021-04-21 09:14:32 +08:00
18c260913b [BE] Set RowBlock's parent memtracker as instance tracker. (#5607) 2021-04-21 09:14:14 +08:00
a2e83e65d2 [BE] Add scanner/etl thread pool queue size metric. (#5619)
* [BE] Add scanner/etl thread pool queue size metric.

* Fix compilation problem.
2021-04-20 09:14:57 +08:00
7445051174 [Refactor] fix warning in gcc8+, update rapidjson (#5649) 2021-04-20 09:14:44 +08:00
caa7af3d1f [Metric] Standardise histogram metric output for prometheus (#5671)
Update the histogram metric's output to the Prometheus standard. The output
looks like the following:

test_registry_task_duration{quantile="0.50"} 50
test_registry_task_duration{quantile="0.75"} 75
test_registry_task_duration{quantile="0.90"} 95.8333
test_registry_task_duration{quantile="0.95"} 100
test_registry_task_duration{quantile="0.99"} 100
test_registry_task_duration_sum 5050
test_registry_task_duration_count 100
2021-04-20 09:14:28 +08:00
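The output layout shown in the commit above (one line per quantile, followed by `_sum` and `_count`) can be reproduced with a short formatter. This is only a sketch of the text layout; `format_histogram` and its arguments are hypothetical and say nothing about how the BE computes the values.

```python
def format_histogram(name, quantiles, total, count):
    """Render precomputed quantiles in the layout shown above.

    quantiles maps a quantile (e.g. 0.90) to its value. Values are printed
    with %g-style formatting, which keeps at most 6 significant digits.
    """
    lines = [f'{name}{{quantile="{q:.2f}"}} {v:g}' for q, v in sorted(quantiles.items())]
    lines.append(f"{name}_sum {total}")
    lines.append(f"{name}_count {count}")
    return "\n".join(lines)
```

Calling it with the quantile values from the commit message (50, 75, 95.8333, 100, 100), a sum of 5050, and a count of 100 yields the same seven lines.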
2b7d7e3385 fix typo (#5652) 2021-04-20 09:13:56 +08:00
d74c2b7092 update README.md about the version of docker image (#5653)
Co-authored-by: weizuo <weizuo@xiaomi.com>
2021-04-20 09:13:43 +08:00
bf4159c74d Remove unused keyword (#5676) 2021-04-20 09:11:45 +08:00
b4a4c29651 (#5638) stale rowset can't be access after clone finish (#5639)
* (#5638) stale rowset can't be access after clone finish

* clear stale rowset after clone
2021-04-19 09:27:41 +08:00
4313639157 [Metric] Add metrics for routine load (#5641)
* [Metric] Add metrics for routine load

Add the following metrics for routine load:
doris_fe_job{job="load", type="ROUTINE_LOAD", state="NEED_SCHEDULE"} 0
doris_fe_job{job="load", type="ROUTINE_LOAD", state="RUNNING"} 1
doris_fe_job{job="load", type="ROUTINE_LOAD", state="PAUSED"} 0
doris_fe_job{job="load", type="ROUTINE_LOAD", state="STOPPED"} 0
doris_fe_job{job="load", type="ROUTINE_LOAD", state="CANCELLED"} 0

* change UTC
2021-04-19 09:26:58 +08:00
6be03f339c [Bug] Fix bug that tablets are not dropped when replacing tables (#5627)
When replacing a table with swap = false, the original table's tablets
should be removed from the tablet inverted index.

Co-authored-by: xxiao2018 <benghua3_1@sina.com>
2021-04-19 09:26:19 +08:00
6f000c2ea4 (#5621) using KeyMayExist instead of Get when visit rocksdb (#5622) 2021-04-19 09:25:59 +08:00
b6c0767754 [Bug] Fix alter table failed when none of new load jobs succeed on alter replica (#5617)
* [Bug] Fix alter table failed when none of new load jobs succeed on altering replica

Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-04-15 15:55:57 +08:00
892fbf6ded Update s3_reader_test.cpp (#5658) 2021-04-15 10:59:30 +08:00
c4cc681d14 remove boost_foreach, using c++ foreach instead (#5611) 2021-04-15 10:52:29 +08:00
50ffae44b1 [BUG] Fix bug that Unique/AGG key will read all key columns when there are two rowsets (#5632) 2021-04-14 00:12:05 +08:00
9403157da4 [DOC] Add docs of warning of Docker env 1.3 of JDK 11 and some doc of FE config. (#5628) 2021-04-14 00:10:30 +08:00
489aae5021 [Doc] Update README.md about gcc version (#5624)
`BE` code compiles successfully with `GCC 7.3.0`, but compiling `BE UT` code with `GCC 7.3.0` fails with errors. After upgrading GCC to `10.2.1`, the same version as in the docker image, `BE UT` compiles successfully.
2021-04-14 00:10:08 +08:00
bb98c08489 [LICENSE] Remove unused font file and fix license (#5642)
* remove unused font file and fix license
2021-04-13 14:21:38 +08:00
70b0d113e1 fix license header (#5630) 2021-04-13 11:01:29 +08:00
75db273b93 [Doris On ES][WIP] Support external ES table with SSL secured and configurable node sniffing (#5325)
Support external ES tables secured with `SSL` and configurable node sniffing
2021-04-12 11:23:49 +08:00
91043bb116 [BuildEnv] Update Dockerfile add libasan (#5616) 2021-04-11 22:02:49 +08:00
a25e3afa5b [Colocate plan][Step1] Colocate join covers more situations (#5521)
The old colocate join can only cover the case where the child is hash or scan.
In fact, as long as the child's data distribution meets the requirements,
no matter what the plan node on the child node is, a colocate join can be performed.
2021-04-11 22:02:03 +08:00
9c7d8d2e98 [Bug] Fix bug that isPreAggregation is incorrectly set (#5608)
1. The MaterializedViewSelector should be reset for each scan node
2. On the BE side, columns with delete conditions must be added to the return column.
2021-04-09 14:13:06 +08:00
40f53ac71f fix bitmap unit test failed (#5610) 2021-04-08 10:25:59 +08:00
2b5e4dc951 Add fmt library to speed up mysql text result serialization (#5554)
* Add fmt library to speed up mysql text result serialization

* use BUILD_SYSTEM instead of make

Co-authored-by: gaodayue <gaodayue@bytedance.com>
2021-04-08 09:16:58 +08:00
b423274f17 [Enhance] Make MemTracker more accurate (#5515) (#5516)
* [Enhance] Make MemTracker more accurate (#5515)
 This PR is mainly about:
 1. Improve the readability of MemTrackers' names
 2. Add MemTrackers for:
    * Load
    * Compaction
    * SchemaChange
    * StoragePageCache
    * TabletManager
 3. Change SchemaChange to a singleton

* revise some code for Code Review

* change the name of mem_tracker

* keep reader_context's lifetime the same as rowset_reader's in schema change.

* change vlog notice to log(warning) in schema change
2021-04-08 09:14:55 +08:00
514d245a1f make like predicate operator public (#5552)
Co-authored-by: wangxixu <wangxixu@xiaomi.com>
2021-04-08 09:14:15 +08:00
904a2ac86a [Bug] keytab file maybe not thread-safe (#5578)
* make keytab file thread-safe

* remove delete file code

Co-authored-by: wangxixu <wangxixu@xiaomi.com>
2021-04-08 09:12:45 +08:00
5a0a039026 [refacor] Remove minizip source code (#5571) 2021-04-08 09:12:22 +08:00
d641a26490 [Refactor] Remove boost filesystem (#5579)
* use std::filesystem instead of boost
Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>
2021-04-08 09:11:59 +08:00
1e8c4584ab [Function] Add BE udf bitmap_min (#2538) (#5581)
This function returns the minimum value in the input bitmap.
2021-04-08 09:11:32 +08:00
3e34fe2529 [FE] [BUG] GroupingFunctionCallExpr: realChildren should be copied too. (#5584) 2021-04-08 09:11:11 +08:00
771cb64290 Solve the situation that the hardware information of the Web UI home page cannot be loaded (#5585)
Co-authored-by: zhangjf@shuhaisc.com <zhangfeng800729>
2021-04-08 09:10:56 +08:00