Commit Graph

45 Commits

Author SHA1 Message Date
0131c33966 [Enhance] Improve the readability of memtrackers' name (#5455)
Improve the readability of memtrackers' name, then you will be happy to read website be_ip:port/mem_tracker
2021-03-11 22:33:31 +08:00
7eae3e280a [optimization] use inline optimize ExprContext::get_value (#5385) 2021-02-16 22:35:14 +08:00
51ccd44865 [Load Parallel][3/3] Support parallel delta writer (#5369)
In the previous broker load, multiple OlapTableSinks would send data to the same LoadChannel,
and because of the lock granularity problem, LoadChannel could only process these requests serially,
which made it impossible to make full use of cluster resources.

This CL modifies the related locks so that LoadChannel can process these requests in parallel.

In the test, with a size of 20G, the load speed of 334 million rows of data in 3 nodes has been
increased from 9min to 5min, and after enabling 2 concurrency, it can be increased to 3min.

Also modify the profile of load job.
2021-02-07 22:42:18 +08:00
93a4c7efc1 [LOG] Standardize the use of VLOG in code (#5264)
At present, the application of vlog in the code is quite confusing.
It is inherited from impala VLOG_XX format, and there is also VLOG(number) format.
VLOG(number) format does not have a unified specification, so this pr standardizes the use of VLOG
2021-01-21 12:09:09 +08:00
58e58c94d8 [TSAN] Fix tsan bugs (part 1) (#5162)
ThreadSanitizer, aka TSAN, is a useful tool to detect multi-thread
problems, such as data race, mutex problems, etc.
We should detect TSAN problems for Doris BE, both unit tests and
server should pass through TSAN mode, to make Doris more robustness.
This is the very beginning patch to fix TSAN problems, and some
difficult problems are suppressed in file 'tsan_suppressions', you
can suppress these problems by setting:
export TSAN_OPTIONS="suppressions=tsan_suppressions"

before running:
`BUILD_TYPE=tsan ./run-be-ut.sh --run`
2021-01-15 09:45:11 +08:00
5d6a1a7290 [Load] support ignoring eovercrowded when tablet sink (#5156)
If adding the ignore_eovercrowded flag, the `PTabletWriterAddBatchRequest`
won't failed on `EOVERCROWDED` to avoid load jobs failed in this error.
It only effects the NodeChannel(the load job), other rpc requests will still check if overcrowded.
2021-01-09 23:40:51 +08:00
ca9e5c4785 [Bug] Add a flag to prevent repeated close operation of OlapTabletSink (#5034)
The close method of OlapTabletSink may be called twice.
In the open_internal() method of plan_fragment_executor, close is called once.
If an error occurs in this call, it will be called again in fragment_mgr.
So here we use a flag to prevent repeated close operations.

Co-authored-by: morningman <chenmingyu@baidu.com>
2020-12-09 09:30:09 +08:00
6fedf5881b [CodeFormat] Clang-format cpp sources (#4965)
Clang-format all c++ source files.
2020-11-28 18:36:49 +08:00
09f97f8a05 [Refactor] Fixes some be typo part 2 (#4747) 2020-10-20 09:28:57 +08:00
83f6f46c34 [Config] Limit the version number of tablet (#4687)
Add a BE config `max_tablet_version_num` to limit the version number of a single tablet.
To avoid too many versions
2020-10-13 10:08:16 +08:00
75e0ba32a1 Fixes some be typo (#4714) 2020-10-13 09:37:15 +08:00
b780df697a [refactor] Optimize threads usage mode in BE (#4440)
BE can not graceful exit because some threads are running in endless
loop. This patch do the following optimization:
- Use the well encapsulated Thread and ThreadPool instead of std::thread
  and std::vector<std::thread>
- Use CountDownLatch in thread's loop condition to avoid endless loop
- Introduce a new class Daemon for daemon works, like tcmalloc_gc,
  memory_maintenance and calculate_metrics
- Decouple statistics type TaskWorkerPool and StorageEngine notification
  by submit tasks to TaskWorkerPool's queue
- Reorder objects' stop and deconstruct in main(), i.e. stop network
  services at first, then internal services
- Use libevent in pthreads mode, by calling evthread_use_pthreads(),
  then EvHttpServer can exit gracefully in multi-threads
- Call brpc::Server's Stop() and ClearServices() explicitly
2020-09-06 20:19:14 +08:00
498b06fbe2 [Metrics] Support tablet level metrics (#4428)
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `. 
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
2020-09-02 10:39:41 +08:00
e71152132c [metrics] Redesign metrics to 3 layers (#4115)
Redesign metrics to 3 layers:
    MetricRegistry - MetricEntity - Metrics
    MetricRegistry : the register center
    MetricEntity : the entity registered on MetricRegistry. Generally a MetricRegistry can be registered on several 
        MetricEntities, each of MetricEntity is an independent entity, such as server, disk_devices, data_directories, thrift 
        clients and servers, and so on. 
    Metric : metrics of an entity. Such as fragment_requests_total on server entity, disk_bytes_read on a disk_device entity, 
        thrift_opened_clients on a thrift_client entity.
    MetricPrototype: the type of a metric. MetricPrototype is a global variable, can be shared by the same metrics across 
        different MetricEntities.
2020-08-08 11:23:01 +08:00
10f822eb43 [MemTracker] make all MemTrackers shared (#4135)
We make all MemTrackers shared, in order to show MemTracker real-time consumptions on the web.
As follows:
1. nearly all MemTracker raw ptr -> shared_ptr
2. Use CreateTracker() to create new MemTracker(in order to add itself to its parent)
3. RowBatch & MemPool still use raw ptrs of MemTracker, it's easy to ensure RowBatch & MemPool destructor exec 
     before MemTracker's destructor. So we don't change these code.
4. MemTracker can use RuntimeProfile's counter to calc consumption. So RuntimeProfile's counter need to be shared 
    too. We add a shared counter pool to store the shared counter, don't change other counters of RuntimeProfile.
Note that, this PR doesn't change the MemTracker tree structure. So there still have some orphan trackers, e.g. RowBlockV2's MemTracker. If you find some shared MemTrackers are little memory consumption & too time-consuming, you could make them be the orphan, then it's fine to use the raw ptr.
2020-07-31 21:57:21 +08:00
fdd65c50c4 [Bug] fix mem_tracker use-after-free & add UT for it (#3899) 2020-06-20 19:08:53 +08:00
51367abce7 [Bug] Fix bug that BE crash when doing Insert Operation (#3872)
Mainly change:
1. Fix the bug in `update_status(status)` of `PlanFragmentExecutor`.
2. When the FE Coordinator executes `execRemoteFragmentAsync()`, if it finds an RPC error, return a Future with an error code instead of exception.
3. Protect the `_status` in RuntimeState with lock
4. Move the `_runtime_profile` of RuntimeState before the `_obj_pool`, so that the profile will be
deconstructed after the object pool.
5. Remove the unused `ObjectPool` param in RuntimeProfile constructor. If I don't remove it,
RuntimeProfile will depends on the `_obj_pool` in RuntimeProfile.
2020-06-19 17:09:04 +08:00
7591527977 [Bug] Fix a bug that insert null bitmap crashes BE (#3830)
INSERT INTO VALUES to_bitmap('xx') may insert null into bitmap column, which may cause dirty data to be written.
2020-06-12 18:03:02 +08:00
1cc78fe69b [Enhancement] Convert metric to Json format (#3635)
Add a JSON format for existing metrics like this.
```
{
    "tags":
    {
        "metric":"thread_pool",
        "name":"thrift-server-pool",
        "type":"active_thread_num"
    },
    "unit":"number",
    "value":3
}
```
I add a new JsonMetricVisitor to handle the transformation.
It's not to modify existing PrometheusMetricVisitor and SimpleCoreMetricVisitor.
Also I add
1.  A unit item to indicate the metric better 
2. Cloning tablet statistics divided by database.
3. Use white space to replace newline in audit.log
2020-05-27 08:49:30 +08:00
fb02bb5cd9 [Load] Fix mem limit in NodeChannel (#3643) 2020-05-22 09:11:59 +08:00
c85d847b1e [CompileBug] fix a compile error (#3502)
NodeChannel::mark_close() missing `return`
2020-05-07 23:01:46 +08:00
94539e7120 Non blocking OlapTableSink (#3143)
ImplementaItion Notes
NodeChannel
_cur_batch -> _pending_batches: when _cur_batch is filled up, move it to _pending_batches.
add_row() just produce batches.
try_send_and_fetch_status() tries to consume one pending batch. If has in flight packet, skip send in this round.
So we can add one sender thread to be in charge of all node channels try_send.

IndexChannel
init(), open() stay the same.
Use for_each_node_channel() to expose the detailed changes of NodeChannel.(It's more easy to read & modify)
Sender thread
See func OlapTableSink::_send_batch_process()

Why use polling?
If we use wait/notify, it will notify when generate a new batch. We can't skip sending this batch, coz it won't notify the same batch again. So wait/notify can't avoid blocking simply.
So I choose polling.
It's wasting to continuously try_send(), but it's difficult to set the suitable polling interval. Thus, I add std::this_thread::yield() to give up the time slice, give priority to other process/threads (if there are other process/threads waiting in the queue).
2020-05-07 10:43:41 +08:00
0430714ca9 Remove redundant call function _wait_in_flight_packet() (#3399)
The function `_wait_in_flight_packet` has been called in `_send_cur_batch`.
No need to call twice.
2020-04-27 20:45:25 +08:00
2ed184e06a Add config: tablet writer open rpc timeout (#3258) 2020-04-03 16:43:56 +08:00
d4c1938b5c Open datetime min value limit (#3158)
the min_value in olap/type.h of datetime is 0000-01-01 00:00:00, so we don't need restrict datetime min in tablet_sink
2020-03-24 10:52:57 +08:00
8eb413fa69 [Bug][RoutineLoad] Fix bug that routine Load encounter "label already used" exception (#2959)
This CL modify 2 things:

1. When a routine load task submit failed, it will not be put back to the task queue.
2. The rpc timeout when executing a routine load task in BE is set to `query_timeout` of the task plan.

ISSUE: #2964
2020-02-22 22:01:14 +08:00
3c539aac54 [Refactor] Some tiny refactor on streaming-load related code (#2891)
Mainly contains the following modifications:
1. Use `std::unique_ptr` to replace some naked pointers
2. Modify some methods from member-method to local-static-function
3. Modify some methods do not need to be public to private
4. Some formatting changes: such as wrapping lines that are too long
5. Remove some useless variables
6. Add or modify some comments for easier understanding

No functional changes in this patch.
2020-02-13 10:42:52 +08:00
1c9cfa7e0f Fix invalid to_bitmap input lead to BE core (#2706) 2020-01-08 22:14:37 +08:00
6815979ba5 Fix invalid to_bitmap input lead to BE core (#2510) 2019-12-19 21:28:00 +08:00
a3b7cf484b Set the load channel's timeout to be the same as the load job's timeout (#2405)
[Load] 

When performing a long-time load job, the following errors may occur. Causes the load to fail.

load channel manager add batch with unknown load id: xxx

There is a case of this error because Doris opened an unrelated channel during the load
process. This channel will not receive any data during the entire load process. Therefore,
after a fixed timeout, the channel will be released.

And after the entire load job is completed, it will try to close all open channels. When it try to
close this channel, it will find that the channel no longer exists and an error is reported.

This CL will pass the timeout of load job to the load channel, so that the timeout of load channels
will be same as load job's.
2019-12-06 21:51:00 +08:00
a2d7c42042 Add a variable to specifically limit the memory usage of the load part in the insert operation (#2305)
This variable is mainly for INSERT operation, because INSERT operation has both query and load part.
Using only the exec_mem_limit variable does not make a good distinction of memory limit between the two parts.
2019-11-28 13:03:11 +08:00
95a3b4ccfe Add object type (#1948)
Add a new type: Object. Currently, it's mainly for complex aggregate metrics(HLL , Bitmap).

The Object type has the following constraints:
1 Object type could not as key column type
2 Object type doesn't support all indices (BloomFilter, short key, zone map, invert index)
3 Object type doesn't support filter and group by

In the implementation:

The Object type reuse the StringValue and StringVal, because in storage engine, the Object type is binary, it has a pointer and length.
2019-10-31 21:42:58 +08:00
62acf5d098 Limit the memory usage of Loading process (#1954) 2019-10-15 09:26:20 +08:00
cbf6214762 Add a miss break (#1923) 2019-09-30 20:32:05 +08:00
8f016d3ab2 Make HLL be able to handle invalid data (#1908)
In this change list
1. validate HLL column when loading data, if data is invalid, this row
will be filtered.
2. seems as empty HLL when serializing invalid type of HLL data, with
this change, all ingested data will be valid.
3. seems as empty HLL when deserializing nullptr or invalid type of HLL data.
With this change, dirty data can be handled normally.
4. rename function empty_hll to hll_empty.
5. disable memtable_flush_execute_test because this will fails
sometimes. When tearing down, some thread is not joined, and they will
visit destroyed resource, which is invalid.
2019-09-29 10:55:23 +08:00
3a33f3d350 Make bitmap_union agg column support insert into and broker load (#1721) 2019-08-30 14:44:51 +08:00
a1b92768dd Add a loaded rows in SHOW LOAD result (#1686)
Loaded rows will be updated periodically by query report. So that
user can see that a load job is still running or being blocked.
2019-08-27 14:13:47 +08:00
cd2b8373c2 Fix Stream load double NumberTotalRows (#1664) 2019-08-19 12:23:43 +08:00
6d73658207 Support checking error data row when doing INSERT (#1597)
If strict mode is true, and at least one row is filtered, the insert operation will fail and a url will be given to get the error rows.

```
ERROR 1064 (HY000): all partitions have no load data. url: http://host:ip/api/_load_error_log?file=__shard_2/error_log_insert_stmt_e0a620e93dc54461-b89ec64768367d25_e0a620e93dc54461_b89ec64768367d25
```

 If all rows are good, insert will return OK with affected rows:

```
Query OK, 1 row affected (0.26 sec)
```

If strict mode is false, and at least one row is good, the insert operation will return OK with affected rows and warnings. If has error row num, a label will be returned:

```
Query OK, 1 row affected, 1 warning (0.32 sec)
{'label':'7d66c457-658b-4a3e-bdcf-8beee872ef2c'}
```
2019-08-16 21:40:29 +08:00
0694b6a6fa Fix bugs of Broker load (#1546)
Use same UUID as query ID and load ID of a load execution plan.
Each load execution plan has a load ID, and as a plan, there is also a query ID.
We can use same UUID as query ID and load ID, for tracing the load process more easily.

Change the load ID when retrying a load execution plan.
When a load execution plan retry, the load ID should be changed, otherwise BE can not
distinguish the old and new load requests.

Cancel the running loading task when cancelling the broker load.
When user cancel a broker load, the running loading task should also be cancelled, or
it may occupies the worker thread for a long time.

Remove the unnecessary query report when doing load execution plan.
Only the last query report is needed.

Add a new BE config tablet_writer_rpc_timeout_sec.
It is used for RPC of tablet sink. The default is 600 seconds. which is long enough for flushing
about 6GB data. The long timeout config will reduce the possibility of encountering fail to send batch error when loading.

Use streaming_load_max_mb instead of mini_load_max_mb in BE config.

Add more logs for tracing a broker load process easily.
2019-07-27 20:17:05 +08:00
a88b55e649 Add more logs and metrics to trace the broker load process (#1530)
The Operator wants to known when the job being scheduled as PENDING
and LOADING. And how long it takes to finish these sub states.

Also add 2 metrics on BE to monitor the memtable's flush time.
`memtable_flush_total` and `memtable_flush_duration_us`
2019-07-23 21:42:44 +08:00
69040572fb Use different ID instead of table ID for base index of an OLAP table (#1524) 2019-07-23 15:48:45 +08:00
6c1f95c3a0 Fix bug that BE may crash when closing OlapTableSink (#1507)
The `_profile` in OlapTableSink may not be initialized if `prepare()`
method is not called. So when close the OlapTableSink, we should
check if `_profile` is initialized.
2019-07-19 10:30:44 +08:00
4e043e66e2 Modify the result json format of mini load (#1487)
Mini load is now using stream load framework. But we should keep the
mini load return behavior and result json format be same as old.
So PUBLISH_TIMEOUT error should be treated as OK in mini load.

Also add 2 counters for OlapTableSink profile:
SerializeBatchTime: time of serializing all row batch.
WaitInFlightPacketTime: time of waiting last send packet
2019-07-16 19:15:41 +08:00
0d48a3961c Refactor Storage Engine (#1478)
NOTE: This patch would modify all Backend's data.
And this will cause a very long time to restart be.
So if you want to interferer your product environment,
you should upgrade backend one by one.

1. Refactoring be is to clarify the structure the codes.
2. Use unique id to indicate a rowset.
   Nameing rowset with tablet_id and version will lead to
   many conflicts among compaction, clone, restore.
3. Extract an rowset interface to encapsulate rowsets
   with different format.
2019-07-15 21:18:22 +08:00