* [Bug] Filter out unavaiable backends when getting scan range location
In the previous implementation, we will eliminate non-surviving BEs in the Coordinator phase.
But for Spark or Flink Connector, there is no such logic, so when a BE node is down,
it will cause the problem of querying errors through the Connector.
* fix ut
* fix compiule
To avoid showing too many memtracker on BE web pages.
The MemTracker level now has 3 levels: OVERVIEW, TASK and VERBOSE.
OVERVIEW Mainly used for main memory consumption module such as Query/Load/Metadata.
TASK is mainly used to record the memory overhead of a single task such as a single query, load, and compaction task.
VERBOSE is used for other more detailed memtrackers.
1. Add /api/compaction/run_status to show the running compaction tasks.
2. Support do base and cumulative compaction for one tablet at same time.
3. Modify some log level.
4. Add a feedback document.
The previous replay logic does not record the size of the map, which eventually resulted in EOF when reading the log.
This pr replaces the replay logic directly with json.
At the same time, the replay logic of image is supplemented.
The pr ensure that the attributes 'lastStreamLoadTime' of backend can be correctly recorded in the image.
Record finished stream load job (both successful job and failed job) into audit log
so that we can see when the stream load job was executed and check the details of stream load jobs.
* [Enhance] Make MemTracker more accurate (#5515)
This PR main about:
1. Improve the readability of MemTrackers' name
2. Add the MemTracker of:
* Load
* Compaction
* SchemaChange
* StoragePageCache
* TabletManager
3. Change SchemaChange to a Singleon
* revise some code for Code Review
* change the name of mem_tracker
* keep reader_context have the same lifetime of rowset_reader in schema change.
* change vlog notice to log(warning) in schema change
* update gcc to gcc 10 and support c++17
update brpc to 0.9.7
update boost to 1.73
remove third-party boost 1.54 for mysql
* update cmake version
* ignore jdk version
* remove unused patch
* avoid use SYS_getrandom call
In version 0.13, we support a more efficient compaction logic.
This logic will maintain multiple version paths of the tablet.
This can avoid -230 errors and can also support incremental clone.
But the previous incremental clone uses the incremental rowset meta recorded in `incr_rs_meta`.
At present, the incremental rowset meta recorded in `incr_rs_meta` and the records
in `stale_rs_meta` are duplicated, and the current clone logic does not adapt to the
new multi-version path, resulting in many cases not triggering incremental clone.
This CL mainly modified:
1. Removed `incr_rs_meta` metadata
2. Modified the clone logic. When the clone is incremented, it will try to read the rowset in `stale_rs_meta`.
3. Delete a lot of code that was previously used for version compatibility.
At present, the application of vlog in the code is quite confusing.
It is inherited from impala VLOG_XX format, and there is also VLOG(number) format.
VLOG(number) format does not have a unified specification, so this pr standardizes the use of VLOG
Based on PR #4475, this patch add a new feature for single tablet migration between different disks by http.
Co-authored-by: weizuo <weizuo@xiaomi.com>
ThreadSanitizer, aka TSAN, is a useful tool to detect multi-thread
problems, such as data race, mutex problems, etc.
We should detect TSAN problems for Doris BE, both unit tests and
server should pass through TSAN mode, to make Doris more robustness.
This is the very beginning patch to fix TSAN problems, and some
difficult problems are suppressed in file 'tsan_suppressions', you
can suppress these problems by setting:
export TSAN_OPTIONS="suppressions=tsan_suppressions"
before running:
`BUILD_TYPE=tsan ./run-be-ut.sh --run`
For the task of rebalancing tablet among different disks on the same BE,
It might be an effective strategy to ensure all tablets under the same partition
evenly distribute on the different disks. Thus, it is necessary to obtain the
distribution of tablets under the same partition between different disks on a BE.
This patch add a new http interface for BE to acquire the distribution of tablets
under a partition between different disks on the same BE.
add a flag of fuzzy_parse, if the json file all object keys are the same and has same order, we only need to parse the first row, and then use index instead key to parse value
It would be helpful to monitor the count of timeout canceled fragments
when there is any issuse cause fragments execute failed or queued too
long time.
The name and another config name are close to each other and are indistinguishable.
So this pr modify the name.
The document description has also been changed
(1) The implementation logic of `tablets web page` is that:
Firstly, get all the tablets of the BE;
Secondly, return specific number tablets in front to web page according to the value of request parameter `limit`.
This patch optimize the implementation logic of `tablets web page` that getting specific number tablets in BE
according to the value of request parameter `limit` and then return them to web page.
(2) It will return default `1000` tablets through http interface `http://be_host:webserver_port/tablets_page`.
If the number of tablet in the BE is less than 1000, there will be `coredump` in the following code:
be/src/http/action/tablets_info_action.cpp
Support persistence of configuration items modified at runtime via HTTP API.
```
FE:
GET /api/_set_config?key=value&persist=true
BE
POST /api/update_config?key=value&persist=true
```
The modified config will be saved in `fe_custom.conf` or `be_custom.conf`.
And when process starts, it will load `fe.conf/be.conf` first, then `fe_custom.conf/be_custom.conf`.
After tablet level metrics is supported, the http metrics API may response
a very large body when a BE holds a large number of tablets, and cause heavy
network traffic.
This patch introduce http content compression to reduce network traffic.
BE can not graceful exit because some threads are running in endless
loop. This patch do the following optimization:
- Use the well encapsulated Thread and ThreadPool instead of std::thread
and std::vector<std::thread>
- Use CountDownLatch in thread's loop condition to avoid endless loop
- Introduce a new class Daemon for daemon works, like tcmalloc_gc,
memory_maintenance and calculate_metrics
- Decouple statistics type TaskWorkerPool and StorageEngine notification
by submit tasks to TaskWorkerPool's queue
- Reorder objects' stop and deconstruct in main(), i.e. stop network
services at first, then internal services
- Use libevent in pthreads mode, by calling evthread_use_pthreads(),
then EvHttpServer can exit gracefully in multi-threads
- Call brpc::Server's Stop() and ClearServices() explicitly
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `.
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
1. Disable the MySQL client and LZO library by default when building the Doris.
MySQL client library is used for MySQL external table feature.
This feature will be replaced by the new ODBC external table soon.
LZO library is used to compress/decompress data of some old data format of Doris,
which is no longer used anymore.
2. Add missing license to some files.
3. For all non-Apache-License code, all are explained in NOTICE file and the corresponding license is declared.
4. Remove the js source code from webroot, it will be downloaded as thirdparty
* Implements the grammar of the batch delete #4051
* Process create, alter table when table has delete sign column
* Support the syntax for enabling the delete column
* Automatically filtered deleted data in the select statement.
* Automatically add delete sign when create rollup table
TODO:
* Optimize the reading and compaction logic on the be side, so that the data marked as deleted will be completely deleted during base compaction