Commit Graph

453 Commits

Author SHA1 Message Date
1999a0c26b [optimization] open gcc strict-aliasing optimization (#6034)
* open gcc strict-aliasing optimization

* use -Werror=strick-alias
2021-06-18 11:39:24 +08:00
5cfe081b05 [Bug] Remove duplicate memtracker (#6041)
* [Enhanece] Remove duplicate memtracker

This problem will cause frequent creation of memtracker and affect query concurrency.
2021-06-18 11:28:37 +08:00
d57c2344e1 [MemTracker] Refactored the hierarchical structure of memtracker (#5956)
To avoid showing too many memtracker on BE web pages.
The MemTracker level now has 3 levels: OVERVIEW, TASK and VERBOSE.

OVERVIEW Mainly used for main memory consumption module such as Query/Load/Metadata.
TASK is mainly used to record the memory overhead of a single task such as a single query, load, and compaction task.
VERBOSE is used for other more detailed memtrackers.
2021-06-16 09:44:24 +08:00
bde60280b8 [Optimize] use string_view instead of std::string in string function (#6010) 2021-06-16 09:40:13 +08:00
6d6c3d9703 [Enhancement] Reduce memory consumption by releasing readers earier (#5811)
We created multiple rowset readers to read data of one tablet,
after one rowset reader has reached EOF, it can be released to
reduce resource (typically memory) consumption.
As the same, we can release segment reader when it reach EOF.
2021-06-16 09:37:50 +08:00
d33a6d1b98 [Function] Support date function: yearweek(), week(), makedate(). (#6000) 2021-06-10 17:38:25 +08:00
4d64612b96 [ARRAY]Save array's size instead of offset. (#5983)
* Save array's size instead of offset.

* Optimize variable name

* Fix comment
2021-06-10 12:32:58 +08:00
d790cc6a50 [BUG] Fixed the problem that substring function may access illegal address (#5952) 2021-06-03 18:38:10 +08:00
81ecf3d097 [Bug] Rebuilt version graph of a tablet when there are too many orphan vertex (#5945)
The version information of the tablet will be stored in the memory
in an adjacency graph data structure.
And as the new version is written and the old version is deleted,
the data structure will begin to have empty vertex with no edge associations(orphan vertex).

These orphan vertexs should be removed somehow.
2021-06-03 09:59:20 +08:00
63c99eb4cb [Cache][Enhancement] Assure sql cache only one version (#5793)
For PR #5792. This patch add a new param `cache type` to distinguish sql cache and partition cache.
When update sql cache,  we make assure one sql key only has one version cache.
2021-05-28 13:45:47 +08:00
80f0b5fd1c [BUG] Fix calculation error when the memory parameter is a float value percentage (#5916)
When parsing memory parameters in `ParseUtil::parse_mem_spec`, convert the percentage to `double` instead of `int`.

The currently affected parameters include `mem_limit` and `storage_page_cache_limit`
2021-05-27 22:06:50 +08:00
ba38973209 use virtual hosted-style request to access object store (#5894)
* use virtual hosted-style access request object store
2021-05-27 15:52:07 +08:00
d0462f4383 [Bug] Fix Backend UT Problem (#5784) (#5785)
1. relocation R_X86_64_32 against `__gxx_personality_v0' can not be used when making a shared object; recompile with -fPIC
2. warning: the use of `tmpnam' is dangerous, better use `mkstemp'
3. Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test couldn't detect the number of threads.
2021-05-17 11:51:59 +08:00
a359b1cb8b [UT] fix ut failed in new_metrics_test (#5817) 2021-05-17 11:51:22 +08:00
01a45e8691 add read buffer when use s3 reader (#5791) 2021-05-17 11:46:38 +08:00
b686205b97 [Optimize] Reduce lock conflicts in ThreadResourceMgr of be (#5772)
Removed some useless code that caused lock conflicts in ThreadResourceMgr of be.
2021-05-12 10:59:53 +08:00
98e80aa65e [refactor] Replace boost::function with std::function (#5700)
Replace boost::function with std::function
2021-05-09 22:00:48 +08:00
11cce06962 [Feature] Support create history dynamic partition (#5703)
1. Add a new dynamic partition property `create_history_partition`.
    If set to true, Doris will create all partitions from `start` to `end`.

2. Add a new FE config `max_dynamic_partition_num`
    To limit the number of partitions created when creating one table.
2021-05-08 12:05:19 +08:00
e519a24c9a dynamic adjust compaction policy (#5651)
Co-authored-by: weizuo <weizuo@xiaomi.com>
2021-04-26 12:39:13 +08:00
de87f4ae84 [Feature] Add list partition support (#5529)
Add list partition support
2021-04-24 17:42:27 +08:00
a803ceea86 [refactor] Remove boost mutex, use std::mutex instead (#5684)
* Remove boost mutex, use std::mutex instead

* replace shared_mutex
2021-04-22 11:29:36 +08:00
be733cfa9c [Metrics] Add some large memtrackers' metric (#5614)
MemTracker can provide memory consumption for us to find out which
module consume more memory, but it's just a current value, this patch
add metrics for some large memory consumers, then we can find out
which module consume more memory in timeline, it would be useful to
troubleshoot OOM problems and optimize configs.
2021-04-21 09:15:04 +08:00
caa7af3d1f [Metric] Standardise histogram metric output for prometheus (#5671)
Update histogram metric's output to prometheus standard, the output
like following:

test_registry_task_duration{quantile="0.50"} 50
test_registry_task_duration{quantile="0.75"} 75
test_registry_task_duration{quantile="0.90"} 95.8333
test_registry_task_duration{quantile="0.95"} 100
test_registry_task_duration{quantile="0.99"} 100
test_registry_task_duration_sum 5050
test_registry_task_duration_count 100
2021-04-20 09:14:28 +08:00
892fbf6ded Update s3_reader_test.cpp (#5658) 2021-04-15 10:59:30 +08:00
c4cc681d14 remove boost_foreach, using c++ foreach instead (#5611) 2021-04-15 10:52:29 +08:00
40f53ac71f fix bitmap unit test failed (#5610) 2021-04-08 10:25:59 +08:00
b423274f17 [Enhance] Make MemTracker more accurate (#5515) (#5516)
* [Enhance] Make MemTracker more accurate (#5515)
 This PR main about:
 1. Improve the readability of MemTrackers' name
 2. Add the MemTracker of:
    * Load
    * Compaction
    * SchemaChange
    * StoragePageCache
    * TabletManager
 3. Change SchemaChange to a Singleon

* revise some code for Code Review

* change the name of mem_tracker

* keep reader_context have the same lifetime of rowset_reader in schema change.

* change vlog notice to log(warning) in schema change
2021-04-08 09:14:55 +08:00
d641a26490 [Refactor] Remove boost filesystem (#5579)
* use std::filesystem instead of boost
Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>
2021-04-08 09:11:59 +08:00
1e8c4584ab [Function] Add BE udf bitmap_min (#2538) (#5581)
this function will return the min result of the input bitmap .
2021-04-08 09:11:32 +08:00
ad67dd34a0 update gcc to gcc 10 and support c++17 (#5394)
* update gcc to gcc 10 and support c++17
    update brpc to 0.9.7
    update boost to 1.73
    remove third-party boost 1.54 for mysql

* update cmake version

* ignore jdk version

* remove unused patch

* avoid use SYS_getrandom call
2021-03-25 09:30:38 +08:00
bfeb717abe [Refactor] fix some warning in gcc higher than 7 make decimal12_t as a POD type (#5547) 2021-03-23 09:37:10 +08:00
cef3cbc53a [Bug] Fix bug that the last column may be null when using multibytes separator (#5534) 2021-03-23 09:35:30 +08:00
a91888a68b [BUG] fix memory limit failure and optimize memory usage in join stage (#5514)
This patch works well on tpcds-1T query-24
2021-03-21 11:32:51 +08:00
c9a25aa29e [UT] fix memory tracker ut (#5501)
* [UT] fix memory tracker ut

* Update mem_limit_test.cpp
2021-03-12 13:45:04 +08:00
8ead0aaad8 [Enhance] Sort directories by available space when do trash sweep (#5498)
* [Enhance] Sort directories by available space when do trash sweep

In the case when one disk is about to be full, we want to sweep trash
data on this disk as quickly as possible. The currently trash sweep
function is to remove trashed files order by path's name, however, disk
data directories may have some large different available space because
of the load balance algorithm, this patch improve it to remove files by
directories' available space.

* add log
2021-03-12 13:43:27 +08:00
0131c33966 [Enhance] Improve the readability of memtrackers' name (#5455)
Improve the readability of memtrackers' name, then you will be happy to read website be_ip:port/mem_tracker
2021-03-11 22:33:31 +08:00
e023ef5404 [Load] Support multi bytes LineDelimiter and ColumnSeparator (#5462)
* [Internal][Support Multibytes Separator] doris-1079
support multi bytes LineDelimiter and ColumnSeparator
2021-03-09 09:35:39 +08:00
c38a1c799f [Config] Support config validating when BE bootstrap and update BE's config by API (#5379)
Some invalid config value may cause BE work in an unexpected behavior,
this patch aim to support config validating when BE bootstrap and update BE's config by API
to reject invalid value.
This is a work to accomplish PR #4423
2021-03-04 22:21:49 +08:00
47d6b1ff0b Fix ut failed for topn_function_test (#5449)
Co-authored-by: caiconghui [蔡聪辉] <caiconghui@xiaomi.com>
2021-03-04 21:53:52 +08:00
9c8766356a [Bug-Fix][Bitmap][Be] Resolve bitmap_not calculate wrong result(#5440) (#5441)
bitmap_not calculate wrong result(#5440)

Execute follow sql, and expect response ''
```
select bitmap_to_string(bitmap_not(bitmap_from_string('1'), bitmap_from_string('2,1'))); 
```

Co-authored-by: lanhuajian <lanhuajian@sankuai.com>
2021-03-04 15:46:42 +08:00
6ede4c6ec1 [Feature] Support backup,restore,load,export directly connect to s3 (#5399)
* [doris-1008] support backup and restore directly to cloud storage via aws s3 protocol

* Internal][S3DirectAccess] Support backup,restore,load,export directlyconnect to s3
1. Support load and export data from/to s3 directly.
2. Add a config to auto convert broker access to s3 acces when available

Change-Id: Iac96d4b3670776708bc96a119ff491db8cb4cde7

(cherry picked from commit 2f03832ca52221cc7436069b96c45c48c4bc7201)

* [Internal][S3DirectAccess] File path glob compatible with broker

Change-Id: Ie55e07a547aa22c6fa8d432ca926216c10384e68
(cherry picked from commit d4fb25544c0dc06d23e1ada571ec3f8edd4ba56f)

* [internal] [doris-1008] fix log4j class not found

Change-Id: I468176aca0d821383c74ee658d461aba9e7d5be3
(cherry picked from commit 029adaa9d6ded8503acbd6644c1519456f3db232)

* add poms

Co-authored-by: yangzhengguo01 <yangzhengguo01@baidu.com>
2021-02-22 16:07:56 +08:00
7eae3e280a [optimization] use inline optimize ExprContext::get_value (#5385) 2021-02-16 22:35:14 +08:00
51ccd44865 [Load Parallel][3/3] Support parallel delta writer (#5369)
In the previous broker load, multiple OlapTableSinks would send data to the same LoadChannel,
and because of the lock granularity problem, LoadChannel could only process these requests serially,
which made it impossible to make full use of cluster resources.

This CL modifies the related locks so that LoadChannel can process these requests in parallel.

In the test, with a size of 20G, the load speed of 334 million rows of data in 3 nodes has been
increased from 9min to 5min, and after enabling 2 concurrency, it can be increased to 3min.

Also modify the profile of load job.
2021-02-07 22:42:18 +08:00
462efeaf39 [Performance Optimization and Refactor] (#5358) (#5364)
1. Add BlockColumnPredicate support OR and AND column predicate in RowBlockV2
2. Support evaluate vectorization delete predicate in storage engine not in Reader in SegmentV2
2021-02-07 22:41:33 +08:00
a1808c1a71 [Function] Add BE udf bitmap_not (#5346) (#5357)
this function will return the not result of inputs two bitmap.
2021-02-07 22:39:17 +08:00
780900ac9c [Feature] Support preceding filter original data when loading (#5338)
Support conditional filtering of original data in broker load and routine load
eg:

```
LOAD LABEL `label1`
(
DATA INFILE ('bos://cmy-repo/1.csv')
INTO TABLE tbl2
COLUMNS TERMINATED BY '\t'
(event_day, product_id, ocpc_stage, user_id)
SET (
	ocpc_stage = ocpc_stage + 100
)
PRECEDING FILTER user_id = 1381035
WHERE ocpc_stage > 30
)
...
```
2021-02-07 22:37:48 +08:00
a6e2c3e3f1 [Bug][Clone] Fix the bug that incremental clone is not triggered (#5230)
In version 0.13, we support a more efficient compaction logic. 
This logic will maintain multiple version paths of the tablet.
This can avoid -230 errors and can also support incremental clone.

But the previous incremental clone uses the incremental rowset meta recorded in `incr_rs_meta`.
At present, the incremental rowset meta recorded in `incr_rs_meta` and the records
in `stale_rs_meta` are duplicated, and the current clone logic does not adapt to the
new multi-version path, resulting in many cases not triggering incremental clone.

This CL mainly modified:

1. Removed `incr_rs_meta` metadata
2. Modified the clone logic. When the clone is incremented, it will try to read the rowset in `stale_rs_meta`.
3. Delete a lot of code that was previously used for version compatibility.
2021-02-06 22:04:48 +08:00
a841905184 [optimization] use replace top instead of push pop in priority #5312 (#5313) 2021-02-04 09:21:54 +08:00
wyb
128752b4f9 [Routine load] Fix kafka load too many task bug (#5327) 2021-02-03 13:23:30 +08:00
bf0cb78b67 [optimization] avoid extra memory copy while build hash table (#5301)
avoid extra memory copy while build hash table
2021-01-30 20:32:12 +08:00