doris

Author	SHA1	Message	Date
Xinyi Zou	80f0b5fd1c	[BUG] Fix calculation error when the memory parameter is a float value percentage (#5916 ) When parsing memory parameters in `ParseUtil::parse_mem_spec`, convert the percentage to `double` instead of `int`. The currently affected parameters include `mem_limit` and `storage_page_cache_limit`	2021-05-27 22:06:50 +08:00
HappenLee	d0462f4383	[Bug] Fix Backend UT Problem (#5784 ) (#5785 ) 1. relocation R_X86_64_32 against `__gxx_personality_v0' can not be used when making a shared object; recompile with -fPIC 2. warning: the use of `tmpnam' is dangerous, better use `mkstemp' 3. Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test couldn't detect the number of threads.	2021-05-17 11:51:59 +08:00
stdpain	a359b1cb8b	[UT] fix ut failed in new_metrics_test (#5817 )	2021-05-17 11:51:22 +08:00
Zhengguo Yang	98e80aa65e	[refactor] Replace boost::function with std::function (#5700 ) Replace boost::function with std::function	2021-05-09 22:00:48 +08:00
Zhengguo Yang	a803ceea86	[refactor] Remove boost mutex, use std::mutex instead (#5684 ) * Remove boost mutex, use std::mutex instead * replace shared_mutex	2021-04-22 11:29:36 +08:00
Yingchun Lai	caa7af3d1f	[Metric] Standardise histogram metric output for prometheus (#5671 ) Update the histogram metric's output to the Prometheus standard. The output looks like the following:
test_registry_task_duration{quantile="0.50"} 50
test_registry_task_duration{quantile="0.75"} 75
test_registry_task_duration{quantile="0.90"} 95.8333
test_registry_task_duration{quantile="0.95"} 100
test_registry_task_duration{quantile="0.99"} 100
test_registry_task_duration_sum 5050
test_registry_task_duration_count 100	2021-04-20 09:14:28 +08:00
Zhengguo Yang	d641a26490	[Refactor] Remove boost filesystem (#5579 ) * use std::filesystem instead of boost Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>	2021-04-08 09:11:59 +08:00
Zhengguo Yang	6ede4c6ec1	[Feature] Support backup,restore,load,export directly connect to s3 (#5399 ) * [doris-1008] support backup and restore directly to cloud storage via aws s3 protocol * [Internal][S3DirectAccess] Support backup,restore,load,export directly connect to s3 1. Support load and export data from/to s3 directly. 2. Add a config to auto convert broker access to s3 access when available Change-Id: Iac96d4b3670776708bc96a119ff491db8cb4cde7 (cherry picked from commit 2f03832ca52221cc7436069b96c45c48c4bc7201) * [Internal][S3DirectAccess] File path glob compatible with broker Change-Id: Ie55e07a547aa22c6fa8d432ca926216c10384e68 (cherry picked from commit d4fb25544c0dc06d23e1ada571ec3f8edd4ba56f) * [internal] [doris-1008] fix log4j class not found Change-Id: I468176aca0d821383c74ee658d461aba9e7d5be3 (cherry picked from commit 029adaa9d6ded8503acbd6644c1519456f3db232) * add poms Co-authored-by: yangzhengguo01 <yangzhengguo01@baidu.com>	2021-02-22 16:07:56 +08:00
stdpain	a841905184	[optimization] use replace top instead of push pop in priority #5312 (#5313 )	2021-02-04 09:21:54 +08:00
HuangWei	64b3660be2	[UT] fix the bug of getting current running dir (#5193 ) Fixed the logic after `readlink`, add a test_util function `GetCurrentRunningDir()`.	2021-01-19 10:23:50 +08:00
Skysheepwang	0d3564c2e1	[Feature] Implementation of histogram metric (#5148 ) #5146 Add histogram metrics into util/metrics.h. The data structure of histogram is implemented in util/histogram.h, which could also be used in other situations that in need of histogram. Unit tests added as well.	2021-01-04 09:32:46 +08:00
Yingchun Lai	11c0aafa5c	[UT] Speed up BE unit test (#5131 ) There are some long loops and sleeps in unit tests, so running all unit tests takes a very long time, especially in TSAN mode. This patch speeds up unit tests by shortening long loops and sleeps; in my environment all unit tests finish in 1 minute. It's useful for basic functional unit tests. You can control this mode through a new environment variable 'DORIS_ALLOW_SLOW_TESTS'. For example, you can set: export DORIS_ALLOW_SLOW_TESTS=1 and you can disable it by setting: export DORIS_ALLOW_SLOW_TESTS=0	2020-12-27 22:19:56 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
sduzh	10e1e29711	Remove header file common/names.h (#4945 )	2020-11-26 17:00:48 +08:00
Yingchun Lai	6cbefd5621	[LRUCache] Expose LRU Cache status to metrics (#4688 ) Expose LRU Cache status to metrics would be helpful to diagnose problems like high usage, low hit rate.	2020-10-22 21:37:02 +08:00
Zhengguo Yang	09f97f8a05	[Refactor] Fixes some be typo part 2 (#4747 )	2020-10-20 09:28:57 +08:00
Yingchun Lai	64ebea2e43	[Feature] Support gzip compression for http response (#4533 ) After tablet level metrics is supported, the http metrics API may response a very large body when a BE holds a large number of tablets, and cause heavy network traffic. This patch introduce http content compression to reduce network traffic.	2020-09-06 20:30:12 +08:00
Yingchun Lai	498b06fbe2	[Metrics] Support tablet level metrics (#4428 ) Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet, but we have no insight about tablets in the cluster. This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `. However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request, and not return tablet level metrics by default.	2020-09-02 10:39:41 +08:00
ZhangYu0123	97d963468a	[Code Cleanup] Template nest convert to c++11 syntax and style (#4442 )	2020-08-26 10:51:52 +08:00
Mingyu Chen	4c571cb6f5	Revert "[Metrics] Support tablet level metrics (#4327 )" (#4397 ) This reverts commit 56260a65c87830ffe34109195ee4d6f1d543e630. Co-authored-by: morningman <chenmingyu@baidu.com>	2020-08-19 22:37:52 +08:00
Yingchun Lai	56260a65c8	[Metrics] Support tablet level metrics (#4327 ) Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet, but we have no insight about tablets in the cluster. This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `. However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request, and not return tablet level metrics by default.	2020-08-18 16:56:12 +08:00
Yingchun Lai	e71152132c	[metrics] Redesign metrics to 3 layers (#4115 ) Redesign metrics to 3 layers: MetricRegistry - MetricEntity - Metrics MetricRegistry : the register center MetricEntity : the entity registered on MetricRegistry. Generally a MetricRegistry can be registered on several MetricEntities, each of MetricEntity is an independent entity, such as server, disk_devices, data_directories, thrift clients and servers, and so on. Metric : metrics of an entity. Such as fragment_requests_total on server entity, disk_bytes_read on a disk_device entity, thrift_opened_clients on a thrift_client entity. MetricPrototype: the type of a metric. MetricPrototype is a global variable, can be shared by the same metrics across different MetricEntities.	2020-08-08 11:23:01 +08:00
HuangWei	10f822eb43	[MemTracker] make all MemTrackers shared (#4135 ) We make all MemTrackers shared, in order to show MemTracker real-time consumptions on the web. As follows: 1. nearly all MemTracker raw ptr -> shared_ptr 2. Use CreateTracker() to create new MemTracker(in order to add itself to its parent) 3. RowBatch & MemPool still use raw ptrs of MemTracker, it's easy to ensure RowBatch & MemPool destructor exec before MemTracker's destructor. So we don't change these code. 4. MemTracker can use RuntimeProfile's counter to calc consumption. So RuntimeProfile's counter need to be shared too. We add a shared counter pool to store the shared counter, don't change other counters of RuntimeProfile. Note that, this PR doesn't change the MemTracker tree structure. So there still have some orphan trackers, e.g. RowBlockV2's MemTracker. If you find some shared MemTrackers are little memory consumption & too time-consuming, you could make them be the orphan, then it's fine to use the raw ptr.	2020-07-31 21:57:21 +08:00
Zhengguo Yang	50e6a2c8a0	[SQL][Function] Fix from/to_base64 may return incorrect value (#4183 ) from/to_base64 may return incorrect value when the value is null #4130 remove the duplicated base64 code fix the base64 encoded string length is wrong， and this will cause the memory error	2020-07-27 22:55:05 +08:00
Yingchun Lai	8500d8b695	[metrics] Use atomic instead of SpinLock for integer metric (#4036 )	2020-07-17 11:01:33 +08:00
Yingchun Lai	d07a23ece3	[webserver] Introduce mustache to simplify BE's website render (#4062 ) cpp-mustache is a C++ implementation of a Mustache template engine with support for RapidJSON, and in order to simplify RapidJSON object building, we introduce class EasyJson from Apache Kudu.	2020-07-16 22:39:51 +08:00
HuangWei	fdd65c50c4	[Bug] fix mem_tracker use-after-free & add UT for it (#3899 )	2020-06-20 19:08:53 +08:00
lichaoyong	6c4d7c60dd	[Feature] Add QueryDetail to store query statistics. (#3744 ) 1. Store the query statistics in memory. 2. Supporting RESTFUL interface to get the statistics.	2020-06-15 18:16:54 +08:00
Mingyu Chen	2211cb0ee0	[Metrics] Add metrics document and 2 new metrics of TCP (#3835 )	2020-06-15 09:48:09 +08:00
HangyuanLiu	60f93b2142	Fix bitmap type (#3749 )	2020-06-03 10:07:58 +08:00
lichaoyong	1cc78fe69b	[Enhancement] Convert metric to Json format (#3635 ) Add a JSON format for existing metrics like this. ``` { "tags": { "metric":"thread_pool", "name":"thrift-server-pool", "type":"active_thread_num" }, "unit":"number", "value":3 } ``` I add a new JsonMetricVisitor to handle the transformation. It's not to modify existing PrometheusMetricVisitor and SimpleCoreMetricVisitor. Also I add 1. A unit item to indicate the metric better 2. Cloning tablet statistics divided by database. 3. Use white space to replace newline in audit.log	2020-05-27 08:49:30 +08:00
yangzhg	6788cacb94	Fix unit test failed (#3642 ) Fix some unit tests that failed due to glog. This may be because we changed the UT build dir and the log path does not exist in the new build dir, so we change the log output from file to stdout	2020-05-25 18:55:19 +08:00
Mingyu Chen	7fb74db0a1	[Trace] Introduce trace util to BE
Ref https://github.com/apache/incubator-doris/issues/3566
Introduce trace utility from Kudu to BE. This utility has been widely used in Kudu, Impala also import this trace utility. This trace util is used for tracing each phases in a thread, and can be dumped to string to see each phases' time cost and diagnose which phase cost more time. This util store a Trace object as a threadlocal variable, we can add trace entries which record the current file name, line number, user specified symbols and timestamp to this object, and it's able to add some counters to this Trace object. And then, it can be dumped to human readable string. There are some helpful macros defined in trace.h, here is a simple example for usage:
```
scoped_refptr<Trace> t1(new Trace); // New 2 traces
scoped_refptr<Trace> t2(new Trace);
t1->AddChildTrace("child_trace", t2.get()); // t1 add t2 as a child named "child_trace"
TRACE_TO(t1, "step $0", 1); // Explicitly trace to t1
usleep(10); // ... do some work
ADOPT_TRACE(t1.get()); // Explicitly adopt to trace to t1
TRACE("step $0", 2); // Implicitly trace to t1
{
    // The time spent in this scope is added to counter t1.scope_time_cost
    TRACE_COUNTER_SCOPE_LATENCY_US("scope_time_cost");
    ADOPT_TRACE(t2.get()); // Adopt to trace to t2 for the duration of the current scope
    TRACE("sub start"); // Implicitly trace to t2
    usleep(10); // ... do some work
    TRACE("sub before loop");
    for (int i = 0; i < 10; ++i) {
        TRACE_COUNTER_INCREMENT("iterate_count", 1); // Increase counter t2.iterate_count
        MicrosecondsInt64 start_time = GetMonoTimeMicros();
        usleep(10); // ... do some work
        MicrosecondsInt64 end_time = GetMonoTimeMicros();
        int64_t dur = end_time - start_time;
        // t2's simple histogram metric with name prefixed with "lbm_writes"
        const char* counter = BUCKETED_COUNTER_NAME("lbm_writes", dur);
        TRACE_COUNTER_INCREMENT(counter, 1);
    }
    TRACE("sub after loop");
}
TRACE("goodbye $0", "cruel world"); // Automatically restore to trace to t1
std::cout << t1->DumpToString(Trace::INCLUDE_ALL) << std::endl;
```
output looks like:
```
0514 02:16:07.988054 (+ 0us) trace_test.cpp:76] step 1
0514 02:16:07.988112 (+ 58us) trace_test.cpp:80] step 2
0514 02:16:07.988863 (+ 751us) trace_test.cpp:103] goodbye cruel world
Related trace 'child_trace':
0514 02:16:07.988120 (+ 0us) trace_test.cpp:85] sub start
0514 02:16:07.988188 (+ 68us) trace_test.cpp:88] sub before loop
0514 02:16:07.988850 (+ 662us) trace_test.cpp:101] sub after loop
Metrics: {"scope_time_cost":744,"child_traces":[["child_trace",{"iterate_count":10,"lbm_writes_lt_1ms":10}]]}
```
Exclude the original source code, this patch do the following work to adapt to Doris:
- Rename "kudu" namespace to "doris"
- Update some names to the existing function names in Doris, e.g. strings::internal::SubstituteArg::kNoArg -> strings::internal::SubstituteArg::NoArg
- Use doris::SpinLock instead of kudu::simple_spinlock which hasn't been imported
- Use manual malloc() and free() instead of kudu::Arena which hasn't been imported
- Use manual rapidjson::Writer instead of kudu::JsonWriter which hasn't been imported
- Remove all TRACE_EVENT related unit tests since TRACE_EVENT is not imported this time
- Update CMakeLists.txt
NOTICE(#3622): This is a "revert of revert pull request". This pr is mainly used to synthesize the PRs whose commits were scattered and submitted due to the wrong merge method into a complete single commit.	2020-05-18 14:55:11 +08:00
Mingyu Chen	69a63f6f53	Revert "[trace] Introduce trace util to BE" (#3614 ) This revert is used to correct the mess of the commit timeline caused by the wrong merge method.	2020-05-18 13:16:39 +08:00
Yingchun Lai	8406723912	adapt to Doris	2020-05-13 12:13:47 +00:00
Yingchun Lai	e066791e47	import original files	2020-05-13 19:03:20 +08:00
Yingchun Lai	b576e54fe6	[ASAN] Fix some address problems detected by ASAN (#3495 ) LSAN detected errors have been fixed by a prior patch (#3326), but there are still some ASAN detected errors. This patch tries to fix these errors to make Doris BE more robust. And then we can add CI run in LSAN/ASAN mode to detect memory errors as early as possible.	2020-05-11 10:30:45 +08:00
Yingchun Lai	b58b1b3953	[metrics] Make DorisMetrics to be a real singleton (#3417 )	2020-05-04 09:20:53 +08:00
Yingchun Lai	72f3082358	[Metrics] Add some metrics for container size in BE (#3246 ) We can observe the workload of BE, and also it's a way to check whether there is any problem in BE, like some container increase too large and lead to OOM. This patch add the following metrics:
```
Name                               Description
rowset_count_generated_and_in_use  The total count of rowset id generated and in use since BE last start
unused_rowsets_count               The total count of unused rowset waiting to be GC
broker_count                       The total count of brokers in management
data_stream_receiver_count         The total count of data stream receivers in management
fragment_endpoint_count            The total count of fragment endpoints of data stream in management, should always equal to data_stream_receiver_count
active_scan_context_count          The total count of active scan contexts
plan_fragment_count                The total count of plan fragments in executing
load_channel_count                 The total count of load channels in management
result_buffer_block_count          The total count of result buffer blocks for queries, each block has a limited queue size (default 1024)
result_block_queue_count           The total count of queues for fragments, each queue has a limited size (default 20, by config::max_memory_sink_batch_count)
routine_load_task_count            The total count of routine load tasks in executing
small_file_cache_count             The total count of cached small files' digest info
stream_load_pipe_count             The total count of stream load pipes, each pipe has a limited buffer size (default 1M)
tablet_writer_count                The total count of tablet writers
brpc_endpoint_stub_count           The total count of brpc endpoints
```	2020-04-25 16:13:39 +08:00
Yingchun Lai	4a7a88ede1	[LSAN] Fix some memory leak detected by LSAN (#3326 )	2020-04-22 22:59:44 +08:00
Yingchun Lai	8fc284d593	[config] Support to modify configs when BE is running without restarting (#3264 ) In the past, when we want to modify some BE configs, we have to modify be.conf and then restart BE. This patch provides a way to modify configs in the type of 'threshold', 'interval', 'enable flag' when BE is running without restarting it. You can update a single config once by BE's http API: `be_host:be_http_port/api/update_config?config_name=new_value`	2020-04-08 11:17:47 +08:00
HuangWei	5f9359d618	Use SleepFor() instead of usleep() (#3211 )	2020-03-29 14:18:19 +08:00
Mingyu Chen	8aa8b8c96d	[Code Refactor] Using block manager to unify the data file access. (#3189 ) Earlier we introduced `BlockManager` to separate data access logic from underlying file read and write logic. This CL further unifies all `SegmentV2` data access to the `BlockManager`, removes the previous `FileManager` class, and move the file cache to the `FileBlockManager`. There are no logical changes to this CL. After this CL, all user table data is read through the `WritableBlock` and `ReadableBlock` returned by the `BlockManager`, and no file operations are performed directly.	2020-03-25 20:39:07 +08:00
trueeyu	099e0f74bd	Remove unused LLVM related codes of directory:be/src/exprs (#2910 ) (#2972 ) Remove unused LLVM related codes of directory (step 3):be/src/exprs (#2910) there are many LLVM related codes in code base, but these codes are not really used. The higher version of GCC is not compatible with the LLVM 3.4.2 version currently used by Doris. The PR delete all LLVM related code of directory: be/src/exprs	2020-02-24 18:23:08 +08:00
yangzhg	3e6dfa31c4	[UnitTest] Fix BE unit test randomly failed (#2970 ) * fix http server related unit test failed due to http port has been used * fix unit test failed in DEBUG build type	2020-02-21 22:21:02 +08:00
LingBin	c617fc9064	Fix the flush_status bug in flush-executor (#2933 ) For a tablet, there may be multiple memtables, which will be flushed to disk one by one in the order of generation. If a memtable flush fails, then the load job will definitely fail, but the previous implementation will overwrite `_flush_status`, which may make the error can not be detected, leads to an error load job to be success. This patch also have two other changes: 1. Use `std::bind` to replace `boost::bind`; 2. Removes some unneeded headers.	2020-02-19 20:23:19 +08:00
lichaoyong	f20eb12457	[util] Import ThreadPool and Thread from KUDU (#2915 )
Thread pool design point:
All tasks submitted directly to the thread pool enter a FIFO queue and are dispatched to a worker thread when one becomes free. Tasks may also be submitted via ThreadPoolTokens. The token wait() and shutdown() functions can then be used to block on logical groups of tasks. A token operates in one of two ExecutionModes, determined at token construction time:
1. SERIAL: submitted tasks are run one at a time.
2. CONCURRENT: submitted tasks may be run in parallel. This isn't unlike tasks submitted without a token, but the logical grouping that tokens impart can be useful when a pool is shared by many contexts (e.g. to safely shut down one context, to derive context-specific metrics, etc.).
Tasks submitted without a token or via ExecutionMode::CONCURRENT tokens are processed in FIFO order. On the other hand, ExecutionMode::SERIAL tokens are processed in a round-robin fashion, one task at a time. This prevents them from starving one another. However, tokenless (and CONCURRENT token-based) tasks can starve SERIAL token-based tasks.
Thread design point:
1. It is a thin wrapper around pthread that can register itself with the singleton ThreadMgr (a private class implemented in thread.cpp entirely, which tracks all live threads so that they may be monitored via the debug webpages). This class has a limited subset of boost::thread's API. Construction is almost the same, but clients must supply a category and a name for each thread so that they can be identified in the debug web UI. Otherwise, join() is the only supported method from boost::thread.
2. Each Thread object knows its operating system thread ID (TID), which can be used to attach debuggers to specific threads, to retrieve resource-usage statistics from the operating system, and to assign threads to resource control groups.
3. Threads are shared objects, but in a degenerate way. They may only have up to two referents: the caller that created the thread (parent), and the thread itself (child). Moreover, the only two methods to mutate state (join() and the destructor) are constrained: the child may not join() on itself, and the destructor is only run when there's one referent left. These constraints allow us to access thread internals without any locks.	2020-02-17 11:22:09 +08:00
lichaoyong	9ee1704859	[util] Import util tools from KUDU (#2905 ) 1. MonoTime/MonoDelta MonoTime: The MonoTime represents a particular point in time, relative to some fixed but unspecified reference point. MonoDelta: The MonoDelta class represents an elapsed duration of time, the delta between two MonoTime instances. 2. CountDownLatch This is a C++ implementation of the Java CountDownLatch	2020-02-14 18:01:16 +08:00
LingBin	4e151b1551	Remove boost exception when parse store path (#2861 )	2020-02-10 17:50:52 +08:00
kangkaisen	e7817053cc	[Utils] ParseUtil::parse_mem_spec support K and T suffix (#2854 )	2020-02-07 09:31:35 +08:00
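
Two commits in this list touch memory-spec parsing: #2854 adds the K and T suffixes, and #5916 makes percentage values parse as `double` rather than `int`. The sketch below is a minimal illustration of both behaviours, not the Doris implementation of `ParseUtil::parse_mem_spec`. The function name `parse_mem_spec_sketch`, its signature, the use of binary multiples (1K = 1024 bytes), and the choice to pass physical memory in as a parameter are all assumptions made for the example. Overflow and percentages above 100 are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical sketch, NOT Doris's ParseUtil::parse_mem_spec.
// Accepts "512", "4K", "16M", "1.5G", "1T" or "12.5%" and writes the size in
// bytes to *bytes. Percentages are taken relative to physical_mem_bytes.
// Returns false when the spec is malformed.
bool parse_mem_spec_sketch(const std::string& spec, int64_t physical_mem_bytes,
                           int64_t* bytes) {
    assert(bytes != nullptr && physical_mem_bytes >= 0);
    if (spec.empty()) return false;

    std::string number = spec;
    int64_t multiplier = 1;  // assumed binary multiples: 1K = 1024 bytes
    bool is_percent = false;
    switch (std::tolower(static_cast<unsigned char>(spec.back()))) {
    case 'k': multiplier = int64_t{1} << 10; number.pop_back(); break;
    case 'm': multiplier = int64_t{1} << 20; number.pop_back(); break;
    case 'g': multiplier = int64_t{1} << 30; number.pop_back(); break;
    case 't': multiplier = int64_t{1} << 40; number.pop_back(); break;
    case '%': is_percent = true; number.pop_back(); break;
    default: break;  // no suffix: plain byte count
    }
    if (number.empty()) return false;

    // Parse as double so that fractional values such as "12.5%" keep their
    // fractional part instead of being truncated to 12.
    char* end = nullptr;
    const double value = std::strtod(number.c_str(), &end);
    if (end != number.c_str() + number.size() || !std::isfinite(value) || value < 0) {
        return false;
    }

    if (is_percent) {
        *bytes = static_cast<int64_t>(value / 100.0 * static_cast<double>(physical_mem_bytes));
    } else {
        *bytes = static_cast<int64_t>(value * static_cast<double>(multiplier));
    }
    return true;
}
```

With 1024 bytes of physical memory, `"12.5%"` yields 128 bytes in this sketch, whereas truncating the percentage to an integer first would yield 122.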

107 Commits