doris

Author	SHA1	Message	Date
luozenglin	b686205b97	[Optimize] Reduce lock conflicts in ThreadResourceMgr of be (#5772 ) Removed some useless code that caused lock conflicts in ThreadResourceMgr of be.	2021-05-12 10:59:53 +08:00
Zhengguo Yang	98e80aa65e	[refactor] Replace boost::function with std::function (#5700 ) Replace boost::function with std::function	2021-05-09 22:00:48 +08:00
Mingyu Chen	11cce06962	[Feature] Support create history dynamic partition (#5703 ) 1. Add a new dynamic partition property `create_history_partition`. If set to true, Doris will create all partitions from `start` to `end`. 2. Add a new FE config `max_dynamic_partition_num` To limit the number of partitions created when creating one table.	2021-05-08 12:05:19 +08:00
weizuo93	e519a24c9a	dynamic adjust compaction policy (#5651 ) Co-authored-by: weizuo <weizuo@xiaomi.com>	2021-04-26 12:39:13 +08:00
qiye	de87f4ae84	[Feature] Add list partition support (#5529 ) Add list partition support	2021-04-24 17:42:27 +08:00
Zhengguo Yang	a803ceea86	[refactor] Remove boost mutex, use std::mutex instead (#5684 ) * Remove boost mutex, use std::mutex instead * replace shared_mutex	2021-04-22 11:29:36 +08:00
Yingchun Lai	be733cfa9c	[Metrics] Add some large memtrackers' metric (#5614 ) MemTracker reports memory consumption so we can find out which module consumes the most memory, but only as a current value. This patch adds metrics for some large memory consumers so that per-module memory consumption can be tracked over time, which is useful for troubleshooting OOM problems and tuning configs.	2021-04-21 09:15:04 +08:00
Yingchun Lai	caa7af3d1f	[Metric] Standardise histogram metric output for prometheus (#5671 ) Update histogram metric's output to prometheus standard, the output like following: test_registry_task_duration{quantile="0.50"} 50 test_registry_task_duration{quantile="0.75"} 75 test_registry_task_duration{quantile="0.90"} 95.8333 test_registry_task_duration{quantile="0.95"} 100 test_registry_task_duration{quantile="0.99"} 100 test_registry_task_duration_sum 5050 test_registry_task_duration_count 100	2021-04-20 09:14:28 +08:00
Mingyu Chen	892fbf6ded	Update s3_reader_test.cpp (#5658 )	2021-04-15 10:59:30 +08:00
Zhengguo Yang	c4cc681d14	remove boost_foreach, using c++ foreach instead (#5611 )	2021-04-15 10:52:29 +08:00
Zhengguo Yang	40f53ac71f	fix bitmap unit test failed (#5610 )	2021-04-08 10:25:59 +08:00
HappenLee	b423274f17	[Enhance] Make MemTracker more accurate (#5515 ) (#5516 ) * [Enhance] Make MemTracker more accurate (#5515) This PR is mainly about: 1. Improve the readability of MemTrackers' name 2. Add the MemTracker of: * Load * Compaction * SchemaChange * StoragePageCache * TabletManager 3. Change SchemaChange to a Singleton * revise some code for Code Review * change the name of mem_tracker * keep reader_context have the same lifetime of rowset_reader in schema change. * change vlog notice to log(warning) in schema change	2021-04-08 09:14:55 +08:00
Zhengguo Yang	d641a26490	[Refactor] Remove boost filesystem (#5579 ) * use std::filesystem instead of boost Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>	2021-04-08 09:11:59 +08:00
Patrick	1e8c4584ab	[Function] Add BE udf bitmap_min (#2538 ) (#5581 ) This function returns the minimum value in the input bitmap.	2021-04-08 09:11:32 +08:00
stdpain	ad67dd34a0	update gcc to gcc 10 and support c++17 (#5394 ) * update gcc to gcc 10 and support c++17 update brpc to 0.9.7 update boost to 1.73 remove third-party boost 1.54 for mysql * update cmake version * ignore jdk version * remove unused patch * avoid use SYS_getrandom call	2021-03-25 09:30:38 +08:00
stdpain	bfeb717abe	[Refactor] fix some warning in gcc higher than 7 make decimal12_t as a POD type (#5547 )	2021-03-23 09:37:10 +08:00
Mingyu Chen	cef3cbc53a	[Bug] Fix bug that the last column may be null when using multibytes separator (#5534 )	2021-03-23 09:35:30 +08:00
stdpain	a91888a68b	[BUG] fix memory limit failure and optimize memory usage in join stage (#5514 ) This patch works well on tpcds-1T query-24	2021-03-21 11:32:51 +08:00
stdpain	c9a25aa29e	[UT] fix memory tracker ut (#5501 ) * [UT] fix memory tracker ut * Update mem_limit_test.cpp	2021-03-12 13:45:04 +08:00
Yingchun Lai	8ead0aaad8	[Enhance] Sort directories by available space when do trash sweep (#5498 ) * [Enhance] Sort directories by available space when do trash sweep When one disk is about to be full, we want to sweep trash data on that disk as quickly as possible. The current trash sweep function removes trashed files in order of path name; however, data directories may differ greatly in available space because of the load balance algorithm. This patch changes the sweep to process directories in order of available space. * add log	2021-03-12 13:43:27 +08:00
Yingchun Lai	0131c33966	[Enhance] Improve the readability of memtrackers' name (#5455 ) Improve the readability of memtrackers' name, then you will be happy to read website be_ip:port/mem_tracker	2021-03-11 22:33:31 +08:00
Zhengguo Yang	e023ef5404	[Load] Support multi bytes LineDelimiter and ColumnSeparator (#5462 ) * [Internal][Support Multibytes Separator] doris-1079 support multi bytes LineDelimiter and ColumnSeparator	2021-03-09 09:35:39 +08:00
Yingchun Lai	c38a1c799f	[Config] Support config validating when BE bootstrap and update BE's config by API (#5379 ) Some invalid config values may cause BE to behave unexpectedly. This patch adds config validation at BE bootstrap and when BE's config is updated by API, so that invalid values are rejected. This work completes PR #4423	2021-03-04 22:21:49 +08:00
caiconghui	47d6b1ff0b	Fix ut failed for topn_function_test (#5449 ) Co-authored-by: caiconghui [蔡聪辉] <caiconghui@xiaomi.com>	2021-03-04 21:53:52 +08:00
924060929	9c8766356a	[Bug-Fix][Bitmap][Be] Resolve bitmap_not calculate wrong result(#5440 ) (#5441 ) bitmap_not calculates a wrong result (#5440). Execute the following SQL; the expected response is '' ``` select bitmap_to_string(bitmap_not(bitmap_from_string('1'), bitmap_from_string('2,1'))); ``` Co-authored-by: lanhuajian <lanhuajian@sankuai.com>	2021-03-04 15:46:42 +08:00
Zhengguo Yang	6ede4c6ec1	[Feature] Support backup,restore,load,export directly connect to s3 (#5399 ) * [doris-1008] support backup and restore directly to cloud storage via aws s3 protocol * [Internal][S3DirectAccess] Support backup,restore,load,export directly connect to s3 1. Support load and export data from/to s3 directly. 2. Add a config to auto convert broker access to s3 access when available Change-Id: Iac96d4b3670776708bc96a119ff491db8cb4cde7 (cherry picked from commit 2f03832ca52221cc7436069b96c45c48c4bc7201) * [Internal][S3DirectAccess] File path glob compatible with broker Change-Id: Ie55e07a547aa22c6fa8d432ca926216c10384e68 (cherry picked from commit d4fb25544c0dc06d23e1ada571ec3f8edd4ba56f) * [internal] [doris-1008] fix log4j class not found Change-Id: I468176aca0d821383c74ee658d461aba9e7d5be3 (cherry picked from commit 029adaa9d6ded8503acbd6644c1519456f3db232) * add poms Co-authored-by: yangzhengguo01 <yangzhengguo01@baidu.com>	2021-02-22 16:07:56 +08:00
stdpain	7eae3e280a	[optimization] use inline optimize ExprContext::get_value (#5385 )	2021-02-16 22:35:14 +08:00
Mingyu Chen	51ccd44865	[Load Parallel][3/3] Support parallel delta writer (#5369 ) In the previous broker load, multiple OlapTableSinks would send data to the same LoadChannel, and because of the lock granularity problem, LoadChannel could only process these requests serially, which made it impossible to make full use of cluster resources. This CL modifies the related locks so that LoadChannel can process these requests in parallel. In the test, with a size of 20G, the load speed of 334 million rows of data in 3 nodes has been increased from 9min to 5min, and after enabling 2 concurrency, it can be increased to 3min. Also modify the profile of load job.	2021-02-07 22:42:18 +08:00
HappenLee	462efeaf39	[Performance Optimization and Refactor] (#5358 ) (#5364 ) 1. Add BlockColumnPredicate support OR and AND column predicate in RowBlockV2 2. Support evaluate vectorization delete predicate in storage engine not in Reader in SegmentV2	2021-02-07 22:41:33 +08:00
HappenLee	a1808c1a71	[Function] Add BE udf bitmap_not (#5346 ) (#5357 ) This function returns the NOT result of the two input bitmaps.	2021-02-07 22:39:17 +08:00
Mingyu Chen	780900ac9c	[Feature] Support preceding filter original data when loading (#5338 ) Support conditional filtering of original data in broker load and routine load eg: ``` LOAD LABEL `label1` ( DATA INFILE ('bos://cmy-repo/1.csv') INTO TABLE tbl2 COLUMNS TERMINATED BY '\t' (event_day, product_id, ocpc_stage, user_id) SET ( ocpc_stage = ocpc_stage + 100 ) PRECEDING FILTER user_id = 1381035 WHERE ocpc_stage > 30 ) ... ```	2021-02-07 22:37:48 +08:00
Mingyu Chen	a6e2c3e3f1	[Bug][Clone] Fix the bug that incremental clone is not triggered (#5230 ) In version 0.13, we support a more efficient compaction logic. This logic will maintain multiple version paths of the tablet. This can avoid -230 errors and can also support incremental clone. But the previous incremental clone uses the incremental rowset meta recorded in `incr_rs_meta`. At present, the incremental rowset meta recorded in `incr_rs_meta` and the records in `stale_rs_meta` are duplicated, and the current clone logic does not adapt to the new multi-version path, resulting in many cases not triggering incremental clone. This CL mainly modified: 1. Removed `incr_rs_meta` metadata 2. Modified the clone logic. When the clone is incremented, it will try to read the rowset in `stale_rs_meta`. 3. Delete a lot of code that was previously used for version compatibility.	2021-02-06 22:04:48 +08:00
stdpain	a841905184	[optimization] use replace top instead of push pop in priority #5312 (#5313 )	2021-02-04 09:21:54 +08:00
wyb	128752b4f9	[Routine load] Fix kafka load too many task bug (#5327 )	2021-02-03 13:23:30 +08:00
stdpain	bf0cb78b67	[optimization] avoid extra memory copy while build hash table (#5301 ) avoid extra memory copy while build hash table	2021-01-30 20:32:12 +08:00
HappenLee	a5298d617d	[Performance Improve] Push Down _conjunctf of 'not in' and '!=' to Storage Engine. (#5207 )	2021-01-23 21:07:01 +08:00
Zhengguo Yang	93a4c7efc1	[LOG] Standardize the use of VLOG in code (#5264 ) At present, the application of vlog in the code is quite confusing. It is inherited from impala VLOG_XX format, and there is also VLOG(number) format. VLOG(number) format does not have a unified specification, so this pr standardizes the use of VLOG	2021-01-21 12:09:09 +08:00
HuangWei	64b3660be2	[UT] fix the bug of getting current running dir (#5193 ) Fixed the logic after `readlink`, add a test_util function `GetCurrentRunningDir()`.	2021-01-19 10:23:50 +08:00
Yingchun Lai	58e58c94d8	[TSAN] Fix tsan bugs (part 1) (#5162 ) ThreadSanitizer, aka TSAN, is a useful tool to detect multi-thread problems, such as data races, mutex problems, etc. We should detect TSAN problems for Doris BE; both unit tests and the server should pass in TSAN mode, to make Doris more robust. This is the very first patch to fix TSAN problems, and some difficult problems are suppressed in file 'tsan_suppressions'. You can suppress these problems by setting: export TSAN_OPTIONS="suppressions=tsan_suppressions" before running: `BUILD_TYPE=tsan ./run-be-ut.sh --run`	2021-01-15 09:45:11 +08:00
Skysheepwang	6c098e45fc	[Optimize][Cache]Implementation of Separated Page Cache (#5008 ) #4995 Implementation of Separated Page Cache - Add config "index_page_cache_ratio" to set the ratio of capacity of index page cache - Change the member of StoragePageCache to maintain two type of cache - Change the interface of StoragePageCache for selecting type of cache - Change the usage of page cache in read_and_decompress_page in page_io.cpp - add page type as argument - check if current page type is available in StoragePageCache (cover the situation of ratio == 0 or 1) - Add type as argument in superior call of read_and_decompress_page - Change Unit Test	2021-01-04 12:19:24 +08:00
Skysheepwang	0d3564c2e1	[Feature] Implementation of histogram metric (#5148 ) #5146 Add histogram metrics into util/metrics.h. The data structure of histogram is implemented in util/histogram.h, which could also be used in other situations that need a histogram. Unit tests added as well.	2021-01-04 09:32:46 +08:00
HappenLee	5807413ad0	[UT] Add ut for column predicate of ColumnBlock (#5123 ) Add ut for column predicate of ColumnBlock	2021-01-04 09:29:30 +08:00
HappenLee	f2cf8d2c5e	[Bug-Fix] Fix the bug of `PERCENTILE_APPROX` return error result `nan` and add `PERCENTILE_APPROX` UT (#5172 )	2021-01-03 15:45:22 +08:00
HappenLee	9e19b6b133	[Performance Improve] Push Down _conjunct of 'A is NULL' and 'B is not NULL' to Storage Engine. (#5092 ) This patch mainly do the following: - Support #5086 - Refactor ColumnRangeValue to support contain null	2021-01-03 15:45:07 +08:00
HuangWei	5e1a80bb22	[UT][Bug] fix LOOP_LESS_OR_MORE (#5157 ) This bug was introduced by #5131. When AllowSlowTests() is true, we should loop more.	2020-12-29 09:48:19 +08:00
Yingchun Lai	11c0aafa5c	[UT] Speed up BE unit test (#5131 ) There are some long loops and sleeps in unit tests, so it takes a very long time to run all unit tests, especially in TSAN mode. This patch speeds up unit tests by shortening long loops and sleeps; in my environment all unit tests finished in 1 minute. It's useful to do basic functional unit tests. You can switch to run in this mode by adding a new environment variable 'DORIS_ALLOW_SLOW_TESTS'. For example, you can set: export DORIS_ALLOW_SLOW_TESTS=1 and also you can disable it by setting: export DORIS_ALLOW_SLOW_TESTS=0	2020-12-27 22:19:56 +08:00
HuangWei	85076b5678	[UT] fix test_env & add a sample (#5085 ) Easily create tests.	2020-12-27 22:14:30 +08:00
xinghuayu007	9ddf434f6b	[Bug-Fix] Fix partition cache match bug (#5060 ) When partitions are not cached contiguously, a range query may fail. For example, if partition keys 20201011 and 20201013 are cached but the range query is between 20201011 and 20201013, the query will not hit the cache. issue:#5059	2020-12-19 11:17:44 +08:00
Youngwb	650536d53e	[Feature] Add Topn udaf (#4803 ) For #4674 This is a udaf for approximate topn using the Space-Saving algorithm. At present, we can only calculate the frequent items and their frequencies in a certain column, based on which we can implement similar topN functions supported by Kylin in the future. I have also added a test to calculate the accuracy of this algorithm. The following is a rough running result. The total amount of data is 1 million lines and follows the Zipfian distribution, where Element Cardinality represents the data cardinality, and 20X, 50X, ... denote space_expand_rate values of 20, 50, ..., which set the number of counters in the Space-Saving algorithm ``` zf exponent = 0.5 Element cardinality 20X 50X 100X 1000 100% 100% 100% 10000 100% 100% 100% 100000 100% 100% 100% 500000 94% 98% 99% zf exponent = 0.6, 1 Element cardinality 20X 50X 100X 1000 100% 100% 100% 10000 100% 100% 100% 100000 100% 100% 100% 500000 100% 100% 100% ```	2020-12-16 21:58:34 +08:00
HuangWei	49f26f4413	[UT] cleanup storage engine creation in tablet_mgr_test etc (#5077 ) Mistakenly use the string '_engine_data_path' as the path, actually the storage engine is not open, so option/path is needless. Cleanup it to avoid any doubt about the file path management.	2020-12-15 09:30:32 +08:00
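
The Space-Saving algorithm named in commit 650536d53e (#4803) can be illustrated with a minimal sketch. This is not the Doris implementation: the class name `SpaceSavingSketch`, its `add` and `top` methods, and the `capacity` parameter are hypothetical names chosen for illustration, and the linear scan for the smallest counter is a simplification. The idea is to keep a fixed number of counters; when a new item arrives and all counters are in use, the item with the smallest count is evicted and the newcomer inherits that count plus one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical illustration of Space-Saving; not the Doris TopN code.
class SpaceSavingSketch {
public:
    // capacity: fixed number of counters kept (an assumed parameter name).
    explicit SpaceSavingSketch(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    void add(const std::string& item) {
        auto it = counters_.find(item);
        if (it != counters_.end()) {
            ++it->second;  // already tracked: just count it
            return;
        }
        if (counters_.size() < capacity_) {
            counters_.emplace(item, 1);  // free counter available
            return;
        }
        // All counters in use: evict the item with the smallest count.
        // The newcomer inherits that count plus one, so counts may
        // overestimate true frequencies but never underestimate them.
        auto victim = std::min_element(
                counters_.begin(), counters_.end(),
                [](const auto& a, const auto& b) { return a.second < b.second; });
        const std::uint64_t inherited = victim->second + 1;
        counters_.erase(victim);
        counters_.emplace(item, inherited);
    }

    // Returns up to n tracked items, highest estimated count first
    // (ties broken by item name so the order is deterministic).
    std::vector<std::pair<std::string, std::uint64_t>> top(std::size_t n) const {
        std::vector<std::pair<std::string, std::uint64_t>> result(
                counters_.begin(), counters_.end());
        std::sort(result.begin(), result.end(),
                  [](const auto& a, const auto& b) {
                      if (a.second != b.second) return a.second > b.second;
                      return a.first < b.first;
                  });
        if (result.size() > n) result.resize(n);
        return result;
    }

private:
    std::size_t capacity_;
    std::unordered_map<std::string, std::uint64_t> counters_;
};
```

With two counters and the stream a, a, a, b, a, c, the sketch ends with a at 4 and c at 2: c appeared only once, but it inherited the evicted count of b plus one. This overestimate is why using more counters improves accuracy at higher cardinalities.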

438 Commits