doris

Author	SHA1	Message	Date
Yingchun Lai	be733cfa9c	[Metrics] Add some large memtrackers' metric (#5614 ) MemTracker can report memory consumption to help us find out which module consumes more memory, but it only provides a current value. This patch adds metrics for some large memory consumers, so we can find out which module consumes more memory over time. This is useful for troubleshooting OOM problems and optimizing configs.	2021-04-21 09:15:04 +08:00
Zhengguo Yang	40f53ac71f	fix bitmap unit test failed (#5610 )	2021-04-08 10:25:59 +08:00
Patrick	1e8c4584ab	[Function] Add BE udf bitmap_min (#2538 ) (#5581 ) This function returns the minimum value in the input bitmap.	2021-04-08 09:11:32 +08:00
stdpain	ad67dd34a0	update gcc to gcc 10 and support c++17 (#5394 ) * update gcc to gcc 10 and support c++17 update brpc to 0.9.7 update boost to 1.73 remove third-party boost 1.54 for mysql * update cmake version * ignore jdk version * remove unused patch * avoid use SYS_getrandom call	2021-03-25 09:30:38 +08:00
caiconghui	47d6b1ff0b	Fix ut failed for topn_function_test (#5449 ) Co-authored-by: caiconghui [蔡聪辉] <caiconghui@xiaomi.com>	2021-03-04 21:53:52 +08:00
924060929	9c8766356a	[Bug-Fix][Bitmap][Be] Resolve bitmap_not calculate wrong result(#5440 ) (#5441 ) bitmap_not calculates a wrong result (#5440). Execute the following SQL; the expected response is '': ``` select bitmap_to_string(bitmap_not(bitmap_from_string('1'), bitmap_from_string('2,1'))); ``` Co-authored-by: lanhuajian <lanhuajian@sankuai.com>	2021-03-04 15:46:42 +08:00
stdpain	7eae3e280a	[optimization] use inline optimize ExprContext::get_value (#5385 )	2021-02-16 22:35:14 +08:00
HappenLee	a1808c1a71	[Function] Add BE udf bitmap_not (#5346 ) (#5357 ) This function returns the NOT result of the two input bitmaps.	2021-02-07 22:39:17 +08:00
Zhengguo Yang	93a4c7efc1	[LOG] Standardize the use of VLOG in code (#5264 ) At present, the application of vlog in the code is quite confusing. It is inherited from impala VLOG_XX format, and there is also VLOG(number) format. VLOG(number) format does not have a unified specification, so this pr standardizes the use of VLOG	2021-01-21 12:09:09 +08:00
HappenLee	f2cf8d2c5e	[Bug-Fix] Fix the bug of `PERCENTILE_APPROX` return error result `nan` and add `PERCENTILE_APPROX` UT (#5172 )	2021-01-03 15:45:22 +08:00
Yingchun Lai	11c0aafa5c	[UT] Speed up BE unit test (#5131 ) There are some long loops and sleeps in the unit tests, so running all of them takes a very long time, especially in TSAN mode. This patch speeds up the unit tests by shortening long loops and sleeps; in my environment all unit tests finished in 1 minute. This is useful for basic functional unit tests. You can switch to run in this mode by adding a new environment variable 'DORIS_ALLOW_SLOW_TESTS'. For example, you can set: export DORIS_ALLOW_SLOW_TESTS=1 and you can disable it by setting: export DORIS_ALLOW_SLOW_TESTS=0	2020-12-27 22:19:56 +08:00
Youngwb	650536d53e	[Feature] Add Topn udaf (#4803 ) For #4674 This is a udaf for approximate topn using the Space-Saving algorithm. At present, we can only calculate the frequent items and their frequencies in a certain column, based on which we can implement topN functions similar to those supported by Kylin in the future. I have also added a test to calculate the accuracy of this algorithm. The following is a rough running result. The total amount of data is 1 million lines and follows the Zipfian distribution, where Element Cardinality represents the data cardinality, and 20X, 50X, ... mean that space_expand_rate is 20, 50, ..., which is used to set the counter number in the Space-Saving algorithm ``` zf exponent = 0.5 Element cardinality 20X 50X 100X 1000 100% 100% 100% 10000 100% 100% 100% 100000 100% 100% 100% 500000 94% 98% 99% zf exponent = 0.6, 1 Element cardinality 20X 50X 100X 1000 100% 100% 100% 10000 100% 100% 100% 100000 100% 100% 100% 500000 100% 100% 100% ```	2020-12-16 21:58:34 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
Mingyu Chen	bfdb15c730	[Bug] Fix some date functions to make their result same as MySQL (#4786 ) dayofweek, dayofmonth, dayofyear, weekofyear, timediff Also fix ut compilation problem	2020-10-27 12:52:44 +08:00
Zhengguo Yang	09f97f8a05	[Refactor] Fixes some be typo part 2 (#4747 )	2020-10-20 09:28:57 +08:00
HappenLee	c00a5cb543	[Bug] Fix the core problem of function `split_part` and add the UT of core case (#4721 ) issue:#4720	2020-10-13 10:09:39 +08:00
Zhengguo Yang	75e0ba32a1	Fixes some be typo (#4714 )	2020-10-13 09:37:15 +08:00
Zhengguo Yang	98e71a8b9f	[Bug][Function] Fix rand() function return same value (#4709 ) fix rand function return same value when no parameter	2020-10-11 15:40:38 +08:00
ccoffline	f3cdf167d1	[Feature] Add time_round builtin functions (#4640 ) #4619 Add time_round functions that provide `time_floor` & `time_ceil` at each time unit. Fix two related bugs. - #4618 - Fix `struct TimeInterval` to use `int64_t` instead of `int32_t`, in case the second diff overflows	2020-10-09 16:05:51 +08:00
Yingchun Lai	b1853caeed	[UDF] Improve performance of function money_format (#4672 ) Use a static local variable instead of creating it on every call. The time cost of the newly added unit benchmark test can be reduced from about 60 seconds to 10 seconds.	2020-09-28 13:39:41 +08:00
xinghuayu007	1a30bcbf36	[SQL Function][Bug] Fix parse_url() bug (#4429 ) The 'part' parameter of the parse_url function does not support lower case, and the protocol is not parsed correctly. The function also does not support parsing 'port'. This PR tries to make the parse_url function case insensitive and to support parsing 'port'. The issue: #4451	2020-09-03 17:06:09 +08:00
Yingchun Lai	498b06fbe2	[Metrics] Support tablet level metrics (#4428 ) Sometimes we want to detect the hotspots of a cluster, for example, hot scanned tablets or hot written tablets, but we have no insight into the tablets in the cluster. This patch introduces tablet level metrics to help achieve this goal, and now supports 4 metrics on tablets: `query_scan_bytes`, `query_scan_rows`, `flush_bytes`, `flush_count`. However, one BE may hold hundreds of thousands of tablets, so I added a parameter to the metrics HTTP request, and tablet level metrics are not returned by default.	2020-09-02 10:39:41 +08:00
xinghuayu007	bfb39a2826	[SQL][Function] Add replace() function (#4347 ) replace is a user defined function that replaces all old substrings with a new substring in a string, as follows: mysql> select replace("http://www.baidu.com:9090", "9090", ""); +------------------------------------------------------+ \| replace('http://www.baidu.com:9090', '9090', '') \| +------------------------------------------------------+ \| http://www.baidu.com: \| +------------------------------------------------------+	2020-08-20 09:28:53 +08:00
Zhengguo Yang	10e3fc2778	[BUG] Fix abs function cannot handle bigint or bigger data type (#4326 )	2020-08-12 20:58:35 +08:00
Mingyu Chen	912547260a	[UnitTest] Refactor BE unit test script (#4266 ) 1. Rename run-ut.sh to run-be-ut.sh 2. Find all test files from build dir instead of declaring separately in the script 3. Add gtest output to collect the result of unit test.	2020-08-11 10:23:51 +08:00
Zhengguo Yang	50e6a2c8a0	[SQL][Function] Fix from/to_base64 may return incorrect value (#4183 ) from/to_base64 may return an incorrect value when the value is null (#4130). Remove the duplicated base64 code. Fix the wrong base64 encoded string length, which causes a memory error.	2020-07-27 22:55:05 +08:00
Mingyu Chen	1bfb105ec1	[Bug] Fix bug that routine load task throw exception when calling afterVisible() (#3979 )	2020-07-01 09:22:33 +08:00
HuangWei	fdd65c50c4	[Bug] fix mem_tracker use-after-free & add UT for it (#3899 )	2020-06-20 19:08:53 +08:00
worker24h	e76f712bb3	[Bug] Load data is error in json load	2020-05-28 17:28:33 +08:00
yangzhg	ba7d2dbf7b	[Function] Support utf-8 encoding in instr, locate, locate_pos, lpad, rpad (#3638 ) Support utf-8 encoding for string function `instr`, `locate`, `locate_pos`, `lpad`, `rpad` and add unit test for them	2020-05-22 14:34:26 +08:00
EmmyMiao87	16deac96a9	[UT][Bug] Fix the ut error of bitmap_intersect (#3664 ) Change-Id: Id32fd9381119f30786acae9b4ac61b0d5ef9df48	2020-05-22 10:29:12 +08:00
worker24h	ef8fd1fcbe	[Load] Support load json-data into Doris by RoutineLoad or StreamLoad (#3553 ) Doris support load json-data by RoutineLoad or StreamLoad	2020-05-21 13:00:49 +08:00
EmmyMiao87	0d66e6bd15	Support bitmap_intersect (#3571 ) * Support bitmap_intersect Support aggregate function Bitmap Intersect, which is mainly used to take the intersection of grouped data. The function 'bitmap_intersect(expr)' calculates the intersection of bitmap columns and returns a bitmap object. The definition is as follows: FunctionName: bitmap_intersect, InputType: bitmap, OutputType: bitmap The scenario is as follows: Query which users satisfy the three tags a, b, and c at the same time. ``` select bitmap_to_string(bitmap_intersect(user_id)) from ( select bitmap_union(user_id) user_id from bitmap_intersect_test where tag in ('a', 'b', 'c') group by tag ) a ``` Closed #3552. * Add docs of bitmap_union and bitmap_intersect * Support null of bitmap_intersect	2020-05-20 21:12:02 +08:00
Yingchun Lai	b576e54fe6	[ASAN] Fix some address problems detected by ASAN (#3495 ) LSAN detected errors have been fixed by a prior patch (#3326), but there are still some ASAN detected errors. This patch tries to fix these errors to make Doris BE more robust. Then we can add a CI run in LSAN/ASAN mode to detect memory errors as early as possible.	2020-05-11 10:30:45 +08:00
Youngwb	a656a7ddd4	Support append_trailing_char_if_absent function (#3439 )	2020-05-09 08:59:34 +08:00
yangzhg	94b3a2bd50	[Bug] Fix string functions not support multibyte string (#3345 ) Let string functions support utf8 encoding	2020-05-08 12:52:46 +08:00
HangyuanLiu	ad6698cd31	[Performance] Use Google/CCTZ to replace boost at timezone function (#3300 ) NOTICE: the thirdparty dependency needs to be upgraded to add libcctz.	2020-04-23 09:26:04 +08:00
Yingchun Lai	4a7a88ede1	[LSAN] Fix some memory leak detected by LSAN (#3326 )	2020-04-22 22:59:44 +08:00
Yingchun Lai	8fc284d593	[config] Support to modify configs when BE is running without restarting (#3264 ) In the past, when we want to modify some BE configs, we have to modify be.conf and then restart BE. This patch provides a way to modify configs in the type of 'threshold', 'interval', 'enable flag' when BE is running without restarting it. You can update a single config once by BE's http API: `be_host:be_http_port/api/update_config?config_name=new_value`	2020-04-08 11:17:47 +08:00
kangkaisen	fca6c4e523	Fix bitmap null crash (#3042 )	2020-03-05 21:30:32 +08:00
Lishi	0d1e28746e	[Function] Support null_or_empty function (#2977 ) It returns true if the string is empty or NULL. Otherwise it returns false.	2020-03-01 17:35:45 +08:00
yangzhg	3e6dfa31c4	[UnitTest] Fix BE unit test randomly failed (#2970 ) * fix http server related unit test failed due to http port has been used * fix unit test failed in DEBUG build type	2020-02-21 22:21:02 +08:00
kangkaisen	a76f2b8211	bitmap_union_count support window function (#2902 )	2020-02-19 14:33:05 +08:00
Lishi	89c7234c1c	Support starts_with (str, prefix) function (#2813 ) Support starts_with function	2020-01-21 14:09:08 +08:00
Dayue Gao	3b24287251	Support 64 bits integers for BITMAP type (#2772 ) Fixes #2771 Main changes in this CL * RoaringBitmap is renamed to BitmapValue and moved into bitmap_value.h * leveraging Roaring64Map to support unsigned BIGINT for BITMAP type * introduces two new formats (SINGLE64 and BITMAP64) for BITMAP type So far we have three storage formats for BITMAP type ``` EMPTY := TypeCode(0x00) SINGLE32 := TypeCode(0x01), UInt32LittleEndian BITMAP32 := TypeCode(0x02), RoaringBitmap(defined by https://github.com/RoaringBitmap/RoaringFormatSpec/) ``` In order to support BIGINT elements and keep backward compatibility, introduce two new formats ``` SINGLE64 := TypeCode(0x03), UInt64LittleEndian BITMAP64 := TypeCode(0x04), CustomRoaringBitmap64 ``` Please note that SINGLE64/BITMAP64 don't replace SINGLE32/BITMAP32. Doris will choose the smaller (in terms of space) type automatically during serialization. For example, BITMAP32 is preferred over BITMAP64 when the maximum element is <= UINT32_MAX. This will also make BE rollback possible as long as the user didn't write elements larger than UINT32_MAX into a bitmap column. Another important design decision is that we fork and maintain our own version of Roaring64Map instead of using the one in "roaring/roaring64map.hh". The reasons are 1. RoaringBitmap doesn't define a standard for the binary format of 64-bit bitmaps. As a result, different implementations of Roaring64Map use different formats. For example the [C++ version](https://github.com/RoaringBitmap/CRoaring/blob/v0.2.60/cpp/roaring64map.hh#L545) is different from the [Java version](`35104c564e/src/main/java/org/roaringbitmap/longlong/Roaring64NavigableMap.java (L1097)`). Even for CRoaring, the format may change in future releases. However, Doris requires the serialized format to be stable across versions. Forking is a safe way to achieve this. 2. We may want to make some code changes to Roaring64Map according to our needs. For example, in order to use the BITMAP32 format when the maximum element can be represented in 32 bits, we may want to access the private members of Roaring64Map. Another example is that we want to further customize and optimize the format for the BITMAP64 case, such as using vint64 instead of uint64 for map size.	2020-01-17 14:13:38 +08:00
HangyuanLiu	0ddca59d36	Add timestampadd/timestampdiff function (#2725 )	2020-01-15 21:47:07 +08:00
DanyBin	7768629f08	Add bitmap_contains and bitmap_has_any functions (#2752 )	2020-01-15 14:31:44 +08:00
frwrdt	f071d5a307	Support ends_with function (#2746 )	2020-01-14 22:37:20 +08:00
kangkaisen	1c9cfa7e0f	Fix invalid to_bitmap input lead to BE core (#2706 )	2020-01-08 22:14:37 +08:00
DanyBin	a028c52edd	Add BE function bitmap_or and bitmap_and (#2707 )	2020-01-08 19:59:44 +08:00
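
Note on commit 650536d53e (Topn udaf): the message names the Space-Saving algorithm but does not show it. The following is a minimal Python sketch of the textbook algorithm, not the Doris BE implementation. The function names are hypothetical, and the rule that the counter budget equals `n * space_expand_rate` is an assumption made only for illustration.

```python
def space_saving(stream, num_counters):
    """Track approximate frequencies using at most num_counters counters."""
    counts = {}
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < num_counters:
            counts[item] = 1
        else:
            # All counters are in use: evict the item with the smallest
            # count and let the newcomer inherit that count plus one, so a
            # tracked item's count never underestimates its true frequency.
            victim = min(counts, key=counts.get)
            counts[item] = counts.pop(victim) + 1
    return counts


def approx_topn(stream, n, space_expand_rate=20):
    # Assumption for this sketch: counter budget = n * space_expand_rate.
    counts = space_saving(stream, n * space_expand_rate)
    # Sort by descending count, breaking ties by item for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

A larger counter budget trades memory for accuracy, which is consistent with the accuracy table in the commit message improving from 20X to 100X at high cardinality.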


74 Commits
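
Note on commit 3b24287251 (64-bit BITMAP): the message lists five type codes and says the smaller format is chosen automatically during serialization. The following Python sketch shows one way such a choice could look. It is not the Doris code: the function names are hypothetical, and picking a SINGLE format whenever the bitmap holds exactly one element is an assumption inferred from the format names. Only the type codes, the little-endian integer layouts, and the `<= UINT32_MAX` rule come from the commit message.

```python
import struct

UINT32_MAX = 0xFFFFFFFF

# Type codes as listed in the commit message.
EMPTY, SINGLE32, BITMAP32, SINGLE64, BITMAP64 = 0x00, 0x01, 0x02, 0x03, 0x04


def choose_type_code(values):
    """Pick the smallest format that can hold the given unsigned integers."""
    elems = set(values)
    if not elems:
        return EMPTY
    fits32 = max(elems) <= UINT32_MAX
    if len(elems) == 1:
        # Assumption: a one-element bitmap uses a SINGLE format.
        return SINGLE32 if fits32 else SINGLE64
    return BITMAP32 if fits32 else BITMAP64


def serialize_single(value):
    """Serialize a one-element bitmap: type code byte + little-endian integer."""
    if value <= UINT32_MAX:
        return struct.pack("<BI", SINGLE32, value)
    return struct.pack("<BQ", SINGLE64, value)
```

Because 32-bit formats are kept whenever every element fits, data written this way stays readable by an older BE, which is the rollback property the commit message describes.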