Commit Graph

90 Commits

Author SHA1 Message Date
498b06fbe2 [Metrics] Support tablet level metrics (#4428)
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `. 
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
2020-09-02 10:39:41 +08:00
97d963468a [Code Cleanup] Template nest convert to c++11 syntax and style (#4442) 2020-08-26 10:51:52 +08:00
4c571cb6f5 Revert "[Metrics] Support tablet level metrics (#4327)" (#4397)
This reverts commit 56260a65c87830ffe34109195ee4d6f1d543e630.

Co-authored-by: morningman <chenmingyu@baidu.com>
2020-08-19 22:37:52 +08:00
56260a65c8 [Metrics] Support tablet level metrics (#4327)
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `.
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
2020-08-18 16:56:12 +08:00
e71152132c [metrics] Redesign metrics to 3 layers (#4115)
Redesign metrics to 3 layers:
    MetricRegistry - MetricEntity - Metrics
    MetricRegistry : the register center
    MetricEntity : the entity registered on MetricRegistry. Generally a MetricRegistry can be registered on several 
        MetricEntities, each of MetricEntity is an independent entity, such as server, disk_devices, data_directories, thrift 
        clients and servers, and so on. 
    Metric : metrics of an entity. Such as fragment_requests_total on server entity, disk_bytes_read on a disk_device entity, 
        thrift_opened_clients on a thrift_client entity.
    MetricPrototype: the type of a metric. MetricPrototype is a global variable, can be shared by the same metrics across 
        different MetricEntities.
2020-08-08 11:23:01 +08:00
10f822eb43 [MemTracker] make all MemTrackers shared (#4135)
We make all MemTrackers shared, in order to show MemTracker real-time consumptions on the web.
As follows:
1. nearly all MemTracker raw ptr -> shared_ptr
2. Use CreateTracker() to create new MemTracker(in order to add itself to its parent)
3. RowBatch & MemPool still use raw ptrs of MemTracker, it's easy to ensure RowBatch & MemPool destructor exec 
     before MemTracker's destructor. So we don't change these code.
4. MemTracker can use RuntimeProfile's counter to calc consumption. So RuntimeProfile's counter need to be shared 
    too. We add a shared counter pool to store the shared counter, don't change other counters of RuntimeProfile.
Note that, this PR doesn't change the MemTracker tree structure. So there still have some orphan trackers, e.g. RowBlockV2's MemTracker. If you find some shared MemTrackers are little memory consumption & too time-consuming, you could make them be the orphan, then it's fine to use the raw ptr.
2020-07-31 21:57:21 +08:00
50e6a2c8a0 [SQL][Function] Fix from/to_base64 may return incorrect value (#4183)
from/to_base64 may return incorrect value when the value is null #4130 
remove the duplicated base64 code
fix the base64 encoded string length is wrong, and this will cause the memory error
2020-07-27 22:55:05 +08:00
8500d8b695 [metrics] Use atomic instead of SpinLock for integer metric (#4036) 2020-07-17 11:01:33 +08:00
d07a23ece3 [webserver] Introduce mustache to simplify BE's website render (#4062)
cpp-mustache is a C++ implementation of a Mustache template engine
with support for RapidJSON, and in order to simplify RapidJSON object
building, we introduce class EasyJson from Apache Kudu.
2020-07-16 22:39:51 +08:00
fdd65c50c4 [Bug] fix mem_tracker use-after-free & add UT for it (#3899) 2020-06-20 19:08:53 +08:00
6c4d7c60dd [Feature] Add QueryDetail to store query statistics. (#3744)
1. Store the query statistics in memory.
2. Supporting RESTFUL interface to get the statistics.
2020-06-15 18:16:54 +08:00
2211cb0ee0 [Metrics] Add metrics document and 2 new metrics of TCP (#3835) 2020-06-15 09:48:09 +08:00
60f93b2142 Fix bitmap type (#3749) 2020-06-03 10:07:58 +08:00
1cc78fe69b [Enhancement] Convert metric to Json format (#3635)
Add a JSON format for existing metrics like this.
```
{
    "tags":
    {
        "metric":"thread_pool",
        "name":"thrift-server-pool",
        "type":"active_thread_num"
    },
    "unit":"number",
    "value":3
}
```
I add a new JsonMetricVisitor to handle the transformation.
It's not to modify existing PrometheusMetricVisitor and SimpleCoreMetricVisitor.
Also I add
1.  A unit item to indicate the metric better 
2. Cloning tablet statistics divided by database.
3. Use white space to replace newline in audit.log
2020-05-27 08:49:30 +08:00
6788cacb94 Fix unit test failed (#3642)
Fix some unittest failed due to glog, this may be we change the ut build dir,and the log path is not exist in new build dir, so we change the log from file to stdout
2020-05-25 18:55:19 +08:00
7fb74db0a1 [Trace] Introduce trace util to BE
Ref https://github.com/apache/incubator-doris/issues/3566
Introduce trace utility from Kudu to BE. This utility has been widely used in Kudu,
Impala also import this trace utility.
This trace util is used for tracing each phases in a thread, and can be dumped to
string to see each phases' time cost and diagnose which phase cost more time.
This util store a Trace object as a threadlocal variable, we can add trace entries
which record the current file name, line number, user specified symbols and
timestamp to this object, and it's able to add some counters to this Trace
object. And then, it can be dumped to human readable string.
There are some helpful macros defined in trace.h, here is a simple example for
usage:
```
  scoped_refptr<Trace> t1(new Trace);            // New 2 traces
  scoped_refptr<Trace> t2(new Trace);
  t1->AddChildTrace("child_trace", t2.get());    // t1 add t2 as a child named "child_trace"

  TRACE_TO(t1, "step $0", 1);  // Explicitly trace to t1
  usleep(10);
  // ... do some work
  ADOPT_TRACE(t1.get());   // Explicitly adopt to trace to t1
  TRACE("step $0", 2);     // Implicitly trace to t1
  {
    // The time spent in this scope is added to counter t1.scope_time_cost
    TRACE_COUNTER_SCOPE_LATENCY_US("scope_time_cost");
    ADOPT_TRACE(t2.get());  // Adopt to trace to t2 for the duration of the current scope
    TRACE("sub start");     // Implicitly trace to t2
    usleep(10);
    // ... do some work
    TRACE("sub before loop");
    for (int i = 0; i < 10; ++i) {
      TRACE_COUNTER_INCREMENT("iterate_count", 1);  // Increase counter t2.iterate_count

      MicrosecondsInt64 start_time = GetMonoTimeMicros();
      usleep(10);
      // ... do some work
      MicrosecondsInt64 end_time = GetMonoTimeMicros();
      int64_t dur = end_time - start_time;
      // t2's simple histogram metric with name prefixed with "lbm_writes"
      const char* counter = BUCKETED_COUNTER_NAME("lbm_writes", dur);
      TRACE_COUNTER_INCREMENT(counter, 1);
    }
    TRACE("sub after loop");
  }
  TRACE("goodbye $0", "cruel world");     // Automatically restore to trace to t1
  std::cout << t1->DumpToString(Trace::INCLUDE_ALL) << std::endl;
```
output looks like:
```
0514 02:16:07.988054 (+     0us) trace_test.cpp:76] step 1
0514 02:16:07.988112 (+    58us) trace_test.cpp:80] step 2
0514 02:16:07.988863 (+   751us) trace_test.cpp:103] goodbye cruel world
Related trace 'child_trace':
0514 02:16:07.988120 (+     0us) trace_test.cpp:85] sub start
0514 02:16:07.988188 (+    68us) trace_test.cpp:88] sub before loop
0514 02:16:07.988850 (+   662us) trace_test.cpp:101] sub after loop
Metrics: {"scope_time_cost":744,"child_traces":[["child_trace",{"iterate_count":10,"lbm_writes_lt_1ms":10}]]}
```
Exclude the original source code, this patch
do the following work to adapt to Doris:
- Rename "kudu" namespace to "doris"
- Update some names to the existing function names in Doris, i.g. strings::internal::SubstituteArg::kNoArg -> strings::internal::SubstituteArg::NoArg
- Use doris::SpinLock instead of kudu::simple_spinlock which hasn't been imported
- Use manual malloc() and free() instead of kudu::Arena which hasn't been imported
- Use manual rapidjson::Writer instead of kudu::JsonWriter which hasn't been imported
- Remove all TRACE_EVENT related unit tests since TRACE_EVENT is not imported this time
- Update CMakeLists.txt

NOTICE(#3622):
This is a "revert of revert pull request".
This pr is mainly used to synthesize the PRs whose commits were
scattered and submitted due to the wrong merge method into a complete single commit.
2020-05-18 14:55:11 +08:00
69a63f6f53 Revert "[trace] Introduce trace util to BE" (#3614)
This revert is used to correct the mess of the commit
timeline caused by the wrong merge method.
2020-05-18 13:16:39 +08:00
8406723912 adapt to Doris 2020-05-13 12:13:47 +00:00
e066791e47 import original files 2020-05-13 19:03:20 +08:00
b576e54fe6 [ASAN] Fix some address problems detected by ASAN (#3495)
LSAN detected errors have been fixed by a prior pathch (#3326), but
there are still some ASAN detected errors.
This patch try to fix these errors to make Doris BE more robustness.
And then we can add CI run in LSAN/ASAN mode to detect memory errors
as early as possible.
2020-05-11 10:30:45 +08:00
b58b1b3953 [metrics] Make DorisMetrics to be a real singleton (#3417) 2020-05-04 09:20:53 +08:00
72f3082358 [Metrics] Add some metrics for container size in BE (#3246)
We can observe the workload of BE, and also it's a way to check
whether there is any problem in BE, like some container increase
too large and lead to OOM.

This patch add the following metrics:
```
Name                                   Description
rowset_count_generated_and_in_use      The total count of rowset id generated and in use since BE last start
unused_rowsets_count                   The total count of unused rowset waiting to be GC
broker_count                           The total count of brokers in management
data_stream_receiver_count             The total count of data stream receivers in management
fragment_endpoint_count                The total count of fragment endpoints of data stream in management, should always equal to data_stream_receiver_count
active_scan_context_count              The total count of active scan contexts
plan_fragment_count                    The total count of plan fragments in executing
load_channel_count                     The total count of load channels in management
result_buffer_block_count              The total count of result buffer blocks for queries, each block has a limited queue size (default 1024)
result_block_queue_count               The total count of queues for fragments, each queue has a limited size (default 20, by config::max_memory_sink_batch_count)
routine_load_task_count                The total count of routine load tasks in executing
small_file_cache_count                 The total count of cached small files' digest info
stream_load_pipe_count                 The total count of stream load pipes, each pipe has a limited buffer size (default 1M)
tablet_writer_count                    The total count of tablet writers
brpc_endpoint_stub_count               The total count of brpc endpoints
```
2020-04-25 16:13:39 +08:00
4a7a88ede1 [LSAN] Fix some memory leak detected by LSAN (#3326) 2020-04-22 22:59:44 +08:00
8fc284d593 [config] Support to modify configs when BE is running without restarting (#3264)
In the past, when we want to modify some BE configs, we have to modify be.conf and then restart BE.
This patch provides a way to modify configs in the type of 'threshold', 'interval', 'enable flag'
when BE is running without restarting it.
You can update a single config once by BE's http API: `be_host:be_http_port/api/update_config?config_name=new_value`
2020-04-08 11:17:47 +08:00
5f9359d618 Use SleepFor() instead of usleep() (#3211) 2020-03-29 14:18:19 +08:00
8aa8b8c96d [Code Refactor] Using block manager to unify the data file access. (#3189)
Earlier we introduced `BlockManager` to separate data access logic from
underlying file read and write logic.

This CL further unifies all `SegmentV2` data access to the `BlockManager`, 
removes the previous `FileManager` class, and move the file cache to the `FileBlockManager`.

There are no logical changes to this CL.

After this CL, all user table data is read through the `WritableBlock` and `ReadableBlock` 
returned by the `BlockManager`, and no file operations are performed directly.
2020-03-25 20:39:07 +08:00
099e0f74bd Remove unused LLVM related codes of directory:be/src/exprs (#2910) (#2972)
Remove unused LLVM related codes of directory (step 3):be/src/exprs (#2910)

there are many LLVM related codes in code base, but these codes are not really used.
The higher version of GCC is not compatible with the LLVM 3.4.2 version currently used by Doris.
The PR delete all LLVM related code of directory: be/src/exprs
2020-02-24 18:23:08 +08:00
3e6dfa31c4 [UnitTest] Fix BE unit test randomly failed (#2970)
* fix http server related unit test failed due to http port has been used
* fix unit test failed in DEBUG build type
2020-02-21 22:21:02 +08:00
c617fc9064 Fix the flush_status bug in flush-executor (#2933)
For a tablet, there may be multiple memtables, which will be
flushed to disk one by one in the order of generation.

If a memtable flush fails, then the load job will definitely
fail, but the previous implementation will overwrite `_flush_status`,
which may make the error can not be detected, leads to an error
load job to be success.

This patch also have two other changes:
1. Use `std::bind` to replace `boost::bind`;
2. Removes some unneeded headers.
2020-02-19 20:23:19 +08:00
f20eb12457 [util] Import ThreadPool and Thread from KUDU (#2915)
Thread pool design point:
  All tasks submitted directly to the thread pool enter a FIFO queue and are
dispatched to a worker thread when one becomes free. Tasks may also be
submitted via ThreadPoolTokens. The token wait() and shutdown() functions
can then be used to block on logical groups of tasks.
  A token operates in one of two ExecutionModes, determined at token
construction time:
  1. SERIAL: submitted tasks are run one at a time.
  2. CONCURRENT: submitted tasks may be run in parallel.
     This isn't unlike submitted without a token, but the logical grouping that tokens
     impart can be useful when a pool is shared by many contexts (e.g. to
     safely shut down one context, to derive context-specific metrics, etc.).
Tasks submitted without a token or via ExecutionMode::CONCURRENT tokens are
processed in FIFO order. On the other hand, ExecutionMode::SERIAL tokens are
processed in a round-robin fashion, one task at a time. This prevents them
from starving one another. However, tokenless (and CONCURRENT token-based)
tasks can starve SERIAL token-based tasks.

Thread design point:
  1. It is a thin wrapper around pthread that can register itself with the singleton ThreadMgr
(a private class implemented in thread.cpp entirely, which tracks all live threads so
that they may be monitored via the debug webpages). This class has a limited subset of
boost::thread's API. Construction is almost the same, but clients must supply a
category and a name for each thread so that they can be identified in the debug web
UI. Otherwise, join() is the only supported method from boost::thread.
  2. Each Thread object knows its operating system thread ID (TID), which can be used to
attach debuggers to specific threads, to retrieve resource-usage statistics from the
operating system, and to assign threads to resource control groups.
  3. Threads are shared objects, but in a degenerate way. They may only have
up to two referents: the caller that created the thread (parent), and
the thread itself (child). Moreover, the only two methods to mutate state
(join() and the destructor) are constrained: the child may not join() on
itself, and the destructor is only run when there's one referent left.
These constraints allow us to access thread internals without any locks.
2020-02-17 11:22:09 +08:00
9ee1704859 [util] Import util tools from KUDU (#2905)
1. MonoTime/MonoDelta
   MonoTime: The MonoTime represents a particular point in time, relative to some fixed but unspecified reference point.
   MonoDelta: The MonoDelta class represents an elapsed duration of time, the delta between two MonoTime instances.

2. CountDownLatch
   This is a C++ implementation of the Java CountDownLatch
2020-02-14 18:01:16 +08:00
4e151b1551 Remove boost exception when parse store path (#2861) 2020-02-10 17:50:52 +08:00
e7817053cc [Uitls] ParseUtil::parse_mem_spec support K and T suffix (#2854) 2020-02-07 09:31:35 +08:00
a27e89065b Add file cache for v2 (#2782)
Add file descriptor cache for segment v2 to solve too many open file problems
2020-02-04 00:16:01 +08:00
c71eefa2ac Add path util (#2747)
Note that the methods in path_util are only related to path processing,
and do not involve any file and IO operations

The upcoming patch will use these util methods, used to extract operations
such as concatenation of directory strings from processing logic.
2020-01-18 00:05:00 +08:00
3b24287251 Support 64 bits integers for BITMAP type (#2772)
Fixes #2771 

Main changes in this CL
* RoaringBitmap is renamed to BitmapValue and moved into bitmap_value.h
* leveraging Roaring64Map to support unsigned BIGINT for BITMAP type
* introduces two new format (SINGLE64 and BITMAP64) for BITMAP type

So far we have three storage format for BITMAP type

```
EMPTY := TypeCode(0x00)
SINGLE32 := TypeCode(0x01), UInt32LittleEndian
BITMAP32 := TypeCode(0x02), RoaringBitmap(defined by https://github.com/RoaringBitmap/RoaringFormatSpec/)
```

In order to support BIGINT element and keep backward compatibility, introduce two new format

```
SINGLE64 := TypeCode(0x03), UInt64LittleEndian
BITMAP64 := TypeCode(0x04), CustomRoaringBitmap64
```

Please note that SINGLE64/BITMAP64 doesn't replace SINGLE32/BITMAP32. Doris will choose the smaller (in terms of space) type automatically during serializing. For example, BITMAP32 is preferred over BITMAP64 when the maximum element is <= UINT32_MAX. This will also make BE rollback possible as long as user didn't write element larger than UINT32_MAX into bitmap column.

Another important design decision is that we fork and maintain our own version of Roaring64Map instead of using the one in "roaring/roaring64map.hh". The reasons are

1. RoaringBitmap doesn't define a standard for the binary format of 64-bits bitmap. As a result, different implementations of Roaring64Map use different format. For example the [C++ version](https://github.com/RoaringBitmap/CRoaring/blob/v0.2.60/cpp/roaring64map.hh#L545) is different from the [Java version](35104c564e/src/main/java/org/roaringbitmap/longlong/Roaring64NavigableMap.java (L1097)). Even for CRoaring, the format may change in future releases. However Doris require the serialized format to be stable across versions. Fork is a safe way to achieve this.
2. We may want to make some code changes to Roaring64Map according to our needs. For example, in order to use the BITMAP32 format when the maximum element can be represented in 32 bits, we may want to access the private member of Roaring64Map. Another example is we want to further customize and optimize the format for BITMAP64 case, such as using vint64 instead of uint64 for map size.
2020-01-17 14:13:38 +08:00
a99a49a444 Add bitamp_to_string function (#2731)
This CL changes:

1. add function bitmap_to_string and bitmap_from_string, which will
 convert a bitmap to/from string which contains all bit in bitmap
2. add function murmur_hash3_32, which will compute murmur hash for
input strings
3. make the function cast float to string the same with user result
logic
2020-01-13 12:31:37 +08:00
4b8f7f9c32 Use cgroups memory limit and cpu cores in container (#2710) 2020-01-10 00:45:50 +08:00
de4d1778c6 Fix incompatibility with arm architecture in util and gutil (#2650)
1. upgrade  gutil code from imapla  to new verison, include `cpuinfo`, `spinlock` and `linux_syscall_support `
2. impliments arm version  utf8 check code
3. remove incompatible code from stopwatch
2020-01-06 18:39:31 +08:00
1648226927 Adapt arrow 0.15 API (#2657)
This CL supports arrow's zero copy read interface, which can make code
comply with arrow 0.15.
And the schema change unit test has some problem, I disable it in run-ut.sh
2020-01-04 15:54:29 +08:00
4ed87964fe Add zip util(#2348) (#2441)
Support .zip file extract by minizip
2019-12-27 10:10:21 +08:00
cf6d705df9 Add intersect_count UDAF (#2418)
1 Because we don't support array type currently, so I use variable arguments instead.

2 intersect_count directly return final count, not bitmap like bitmap_union, because intersect_count return bitmap is more complex and need more serialize. If we really need bitmap format from intersect_count, we could do that in another PR and which won't have compatibility problems.
2019-12-13 16:12:05 +08:00
83b5455be5 [Load] Fix several races in stream load that could cause BE crash (#2414)
This CL fixes the following problems
1. check whether TabletsChannel has been closed/cancelled in `reduce_mem_usage` to avoid using a closed DeltaWriter
2. make `FlushHandle.wait` wait for all submitted tasks to finish so that memtable is deallocated before its delta writer
3. make `~MemTracker()` release its consumption bytes to accommodate situations in aggregate_func.h that bitmap and hll call `MemTracker::consume` without corresponding `MemTracker::release`, which cause the consumption of root tracker never drops to zero
2019-12-10 21:59:05 +08:00
f635552a20 Port latest faststring (#2403) 2019-12-06 20:39:56 +08:00
ba3d16f4c7 Add BinaryPrefixPage (#2308) 2019-11-29 07:39:11 +08:00
14769b0beb Improve to_bitmap parse int performance (#2223) 2019-11-19 18:00:19 +08:00
ccc1b9d98c Optimize percentile_approx through radix sort (#2102) (#2107) 2019-11-05 09:25:47 +08:00
f53f188c5d Add arrow IPC serialization for Doris-Spark-Connector (#2013) 2019-10-31 10:32:06 +08:00
05643dc403 Replace Arena with MemPool (#2012)
After replacing Arena with MemPool, we can achieve one copy for string
value read from segment v2. We can exchange MemPool's chunk between
RowBlockV2 and RowBlock. This change only replace Arena, this work will
be done in other change list.
2019-10-19 15:53:24 +08:00
024348d74b Enable auto convert when check in (#1926)
Leverage gitattributes to enable auto convert end-of-line to LF when
checking in. Convert already exist CRLF to LF by removing all files and
checking out with new .gitattributes file. Except .gitattributes, all
files are only modified at the end of line.
2019-10-09 22:31:27 +08:00