doris

Author	SHA1	Message	Date
stdpain	bd88309346	[Refactor] fix warning in gcc8+, fix warning from brpc, s2 (#5763 ) Fix warning from brpc, S2 Fix -Warray-bounds	2021-05-12 10:38:46 +08:00
Zhengguo Yang	a803ceea86	[refactor] Remove boost mutex, use std::mutex instead (#5684 ) * Remove boost mutex, use std::mutex instead * replace shared_mutex	2021-04-22 11:29:36 +08:00
stdpain	7445051174	[Refactor] fix warning in gcc8+, update rapidjson (#5649 )	2021-04-20 09:14:44 +08:00
Patrick	1e8c4584ab	[Function] Add BE udf bitmap_min (#2538 ) (#5581 ) This function returns the minimum value in the input bitmap.	2021-04-08 09:11:32 +08:00
stdpain	a56e7e2192	[Refactor] make uint24_t,OLAPIndexFixedHeader as a POD type (#5559 )	2021-03-27 18:59:23 +08:00
stdpain	a1bce25677	[BUG] Fix Memory Leak in SchemaChange And Fix some DCHECK error (#5491 )	2021-03-17 09:27:05 +08:00
stdpain	7eae3e280a	[optimization] use inline optimize ExprContext::get_value (#5385 )	2021-02-16 22:35:14 +08:00
HappenLee	a1808c1a71	[Function] Add BE udf bitmap_not (#5346 ) (#5357 ) This function returns the NOT result of the two input bitmaps.	2021-02-07 22:39:17 +08:00
Zhengguo Yang	93a4c7efc1	[LOG] Standardize the use of VLOG in code (#5264 ) At present, the use of VLOG in the code is inconsistent. Some call sites use the VLOG_XX format inherited from Impala, and others use the VLOG(number) format, which has no unified specification. This PR standardizes the use of VLOG.	2021-01-21 12:09:09 +08:00
lihuigang	05ac7fcd4a	[Function] Add BE udf bitmap_xor (#5098 ) This function returns the XOR result of the two input bitmaps.	2021-01-04 09:27:46 +08:00
HappenLee	f2cf8d2c5e	[Bug-Fix] Fix the bug of `PERCENTILE_APPROX` return error result `nan` and add `PERCENTILE_APPROX` UT (#5172 )	2021-01-03 15:45:22 +08:00
Youngwb	650536d53e	[Feature] Add Topn udaf (#4803 ) For #4674 This is a udaf for approximate topn using the Space-Saving algorithm. At present, it can only calculate the frequent items and their frequencies in a given column; topN functions similar to those supported by Kylin can be built on this in the future. I have also added a test that measures the accuracy of this algorithm. The following is a rough running result. The total amount of data is 1 million rows following a Zipfian distribution. Element cardinality is the data cardinality, and the columns 20X, 50X, 100X are the values of space_expand_rate (20, 50, 100), which sets the number of counters in the Space-Saving algorithm. ``` zf exponent = 0.5 Element cardinality 20X 50X 100X 1000 100% 100% 100% 10000 100% 100% 100% 100000 100% 100% 100% 500000 94% 98% 99% zf exponent = 0.6，1 Element cardinality 20X 50X 100X 1000 100% 100% 100% 10000 100% 100% 100% 100000 100% 100% 100% 500000 100% 100% 100% ```	2020-12-16 21:58:34 +08:00
Xinyi Zou	b1b99ae884	[Function] Support Decimal to calculate variance and standard deviation (#4959 )	2020-12-06 08:49:01 +08:00
Mingyu Chen	5215727b45	[Function] Let "str_to_date" return correct type (#5004 ) The return type of str_to_date depends on whether the time part is included in the format. If included, it is DATETIME, otherwise it is DATE. If the format parameter is not constant, the return type will be DATETIME. The above judgment has been completed in the FE query planning stage, so here we directly set the value type to the return type set in the query plan. For example: A table with one column k1 varchar, and has 2 lines: "%Y-%m-%d" "%Y-%m-%d %H:%i:%s" Query: SELECT str_to_date("2020-09-01", k1) from tbl; Result will be: 2020-09-01 00:00:00 2020-09-01 00:00:00 Query: SELECT str_to_date("2020-09-01", "%Y-%m-%d"); Return type is DATE Query: SELECT str_to_date("2020-09-01", "%Y-%m-%d %H:%i:%s"); Return type is DATETIME	2020-12-03 09:33:26 +08:00
lichaoyong	bb36de52a6	[Bug] Fix locate bug when start_pos larger than str len (#4975 ) ``` select locate('', 'abc', 10); ``` Return 0 not 10	2020-11-29 10:38:30 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
sduzh	10e1e29711	Remove header file common/names.h (#4945 )	2020-11-26 17:00:48 +08:00
HappenLee	f0e89395e6	[Bug] Fix DCHECK failed in group_concat (#4850 ) issue:#4849	2020-11-11 21:21:37 +08:00
Mingyu Chen	bfdb15c730	[Bug] Fix some date functions to make their result same as MySQL (#4786 ) dayofweek, dayofmonth, dayofyear, weekofyear, timediff Also fix ut compilation problem	2020-10-27 12:52:44 +08:00
Zhengguo Yang	09f97f8a05	[Refactor] Fixes some be typo part 2 (#4747 )	2020-10-20 09:28:57 +08:00
HappenLee	c00a5cb543	[Bug] Fix the core problem of function `split_part` and add the UT of core case (#4721 ) issue:#4720	2020-10-13 10:09:39 +08:00
Zhengguo Yang	75e0ba32a1	Fixes some be typo (#4714 )	2020-10-13 09:37:15 +08:00
Zhengguo Yang	98e71a8b9f	[Bug][Function] Fix rand() function return same value (#4709 ) fix rand function return same value when no parameter	2020-10-11 15:40:38 +08:00
ccoffline	f3cdf167d1	[Feature] Add time_round builtin functions (#4640 ) #4619 Add time_round functions that provide `time_floor` & `time_ceil` at each time unit. Fix two related bugs. - #4618 - Fix `struct TimeInterval` to use `int64_t` instead of `int32_t`, in case the difference in seconds overflows	2020-10-09 16:05:51 +08:00
Yingchun Lai	b1853caeed	[UDF] Improve performance of function money_format (#4672 ) Use a static local variable instead of creating it on every call. The time cost of the newly added unit benchmark test drops from about 60 seconds to 10 seconds.	2020-09-28 13:39:41 +08:00
HuangWei	4caa6f9b33	[Bug] fix get_parsed_paths() subscript out of range (#4585 )	2020-09-12 16:04:21 +08:00
xinghuayu007	1a30bcbf36	[SQL Function][Bug] Fix parse_url() bug (#4429 ) The 'part' parameter of the parse_url function does not support lower case, and the protocol is not parsed correctly. The function also does not support parsing 'port'. This PR tries to make the parse_url function case insensitive and to support parsing 'port'. The issue: #4451	2020-09-03 17:06:09 +08:00
ZhangYu0123	97d963468a	[Code Cleanup] Template nest convert to c++11 syntax and style (#4442 )	2020-08-26 10:51:52 +08:00
xinghuayu007	bfb39a2826	[SQL][Function] Add replace() function (#4347 ) replace is a user-defined function that replaces all occurrences of an old substring in a string with a new substring, as follows: mysql> select replace("http://www.baidu.com:9090", "9090", ""); +------------------------------------------------------+ | replace('http://www.baidu.com:9090', '9090', '') | +------------------------------------------------------+ | http://www.baidu.com: | +------------------------------------------------------+	2020-08-20 09:28:53 +08:00
ZhangYu0123	11ec7bbe24	[Bug]Add LargeInt cast to Date and Datetime, add timezone to stale_version_path_json_doc (#4321 ) (1) Add LargeInt cast to date and datetime, see #3864 LargeInt can be cast to date and datetime. Fix this error: Unable to find _ZN5doris13CastFunctions16cast_to_date_valEPN9doris_udf15FunctionContextERKNS1_11LargeIntValE (2) Add local timezone info to stale_version_path_json_doc rest api Add timezone to "last create time" field. { "path id": "1", "last create time": "1970-01-01 10:46:40 +0800", "path list": "1 -> [2-3] -> [4-5]" }, and add timezone to the test unix, see #4121 .	2020-08-13 23:38:30 +08:00
Zhengguo Yang	10e3fc2778	[BUG] Fix abs function cannot handle bigint or bigger data type (#4326 )	2020-08-12 20:58:35 +08:00
ZhangYu0123	bdbe59a41a	[Bug]Fix be crash caused by decimal to date (#4282 ) Fix a BE crash caused by casting decimal to date. The crash was caused by the error: Unable to find _ZN5doris18DecimalV2Operators16cast_to_date_val. Also see #4281	2020-08-09 20:47:43 +08:00
HuangWei	10f822eb43	[MemTracker] make all MemTrackers shared (#4135 ) We make all MemTrackers shared, in order to show real-time MemTracker consumption on the web. As follows: 1. Nearly all MemTracker raw ptr -> shared_ptr 2. Use CreateTracker() to create a new MemTracker (so that it adds itself to its parent) 3. RowBatch & MemPool still use raw ptrs of MemTracker; it is easy to ensure that the RowBatch & MemPool destructors run before the MemTracker's destructor, so we don't change this code. 4. MemTracker can use RuntimeProfile's counter to calculate consumption, so RuntimeProfile's counter needs to be shared too. We add a shared counter pool to store the shared counters and don't change the other counters of RuntimeProfile. Note that this PR doesn't change the MemTracker tree structure, so there are still some orphan trackers, e.g. RowBlockV2's MemTracker. If you find that some shared MemTrackers track little memory consumption but are too time-consuming, you can make them orphans, and then it is fine to use the raw ptr.	2020-07-31 21:57:21 +08:00
worker24h	fdcc223ad2	[Bug][Json] Refactor the json load logic to fix some bug 1. Add `json_root` for nest json data. 2. Remove `_jmap` to make the logic reasonable.	2020-07-30 10:36:34 +08:00
Zhengguo Yang	50e6a2c8a0	[SQL][Function] Fix from/to_base64 may return incorrect value (#4183 ) from/to_base64 may return an incorrect value when the value is null #4130. Remove the duplicated base64 code. Fix the wrong base64 encoded string length, which causes a memory error.	2020-07-27 22:55:05 +08:00
Mingyu Chen	c3d9feed75	[Load][Json] Refactor json load logic to make it more reasonable (#4020 ) This CL mainly changes: 1. Reorganized the code logic to limit the supported json format to two, and the import behavior is more consistent. 2. Modified the statistical behavior of the number of error rows when loading in json format, so that the error rows can be counted correctly. 3. See `load-json-format.md` to get details of loading json format.	2020-07-07 23:07:28 +08:00
yangzhg	5ade21b55d	[Load] Support load true or false as boolean value (#3898 ) Fixes #3831 After this PR insert into: `1/"1" -> 1, 0/"0"->0, true/"true"->1, false/"false" -> 0, "10"->null, "xxxx" -> null` load: `1/true -> 1, 0/false -> 0` other -> null	2020-07-02 13:58:24 +08:00
Mingyu Chen	af1beb6ce4	[Enhance] Add prepare phase for some timestamp functions (#3947 ) Fix: #3946 CL: 1. Add a prepare phase for the `from_unixtime()`, `date_format()` and `convert_tz()` functions, to handle the format string once and for all. 2. Find the cctz timezone when initializing `runtime state`, so that the timezone does not need to be looked up for each row. 3. Add a constant rewrite rule for `utc_timestamp()` 4. Add doc for `to_date()` 5. Comment out the `push_handler_test`; it cannot run in DEBUG mode and will be fixed later. 6. Remove `timezone_db.h/cpp` and add `timezone_utils.h/cpp` The performance is shown below: 11,000,000 rows SQL1: `select count(from_unixtime(k1)) from tbl1;` Before: 8.85s After: 2.85s SQL2: `select count(from_unixtime(k1, '%Y-%m-%d %H:%i:%s')) from tbl1 limit 1;` Before: 10.73s After: 4.85s Date string formatting still seems slow; it may need further enhancement.	2020-06-29 19:15:09 +08:00
sdgshawn	61be7132a9	fix for be server crash which throwing syntax error when parse json … (#3846 ) Fix a BE server crash caused by a syntax error thrown when parsing JSON from a Kafka message	2020-06-13 12:45:16 +08:00
kangkaisen	559714f3d4	Fix largeint max min bug (#3793 )	2020-06-08 21:01:30 +08:00
HangyuanLiu	0f6e74f3f9	[BUG] Fix location url in agg_fn_evaluator (#3780 )	2020-06-06 11:34:12 +08:00
EmmyMiao87	e16873a6c1	Fix large string val allocation failure (#3724 ) * Fix large string val allocation failure A large bitmap needs to use StringVal to allocate a large amount of memory, which can be larger than MAX_INT. The overflow causes serialization failure of the bitmap. Fixed #3600	2020-06-03 17:07:54 +08:00
worker24h	e76f712bb3	[Bug] Load data is error in json load	2020-05-28 17:28:33 +08:00
worker24h	fb66bac5fe	[Bug] Fix null pointer access in json-load (#3692 ) Add check for null pointer to avoid core dump	2020-05-26 22:41:30 +08:00
yangzhg	ba7d2dbf7b	[Function] Support utf-8 encoding in instr, locate, locate_pos, lpad, rpad (#3638 ) Support utf-8 encoding for string function `instr`, `locate`, `locate_pos`, `lpad`, `rpad` and add unit test for them	2020-05-22 14:34:26 +08:00
worker24h	ef8fd1fcbe	[Load] Support load json-data into Doris by RoutineLoad or StreamLoad (#3553 ) Doris supports loading json-data by RoutineLoad or StreamLoad	2020-05-21 13:00:49 +08:00
EmmyMiao87	0d66e6bd15	Support bitmap_intersect (#3571 ) * Support bitmap_intersect Support the aggregate function Bitmap Intersect, which is mainly used to take the intersection of grouped data. The function 'bitmap_intersect(expr)' calculates the intersection of bitmap columns and returns a bitmap object. The definition is as follows: FunctionName: bitmap_intersect, InputType: bitmap, OutputType: bitmap The scenario is as follows: Query which users satisfy the three tags a, b, and c at the same time. ``` select bitmap_to_string(bitmap_intersect(user_id)) from ( select bitmap_union(user_id) user_id from bitmap_intersect_test where tag in ('a', 'b', 'c') group by tag ) a ``` Closed #3552. * Add docs of bitmap_union and bitmap_intersect * Support null of bitmap_intersect	2020-05-20 21:12:02 +08:00
Dayue Gao	b62b310864	[Bug] Fix BE crash when input to hll_merge is null (#3521 )	2020-05-09 11:01:48 +08:00
Youngwb	a656a7ddd4	Support append_trailing_char_if_absent function (#3439 )	2020-05-09 08:59:34 +08:00
yangzhg	94b3a2bd50	[Bug] Fix string functions not support multibyte string (#3345 ) Let string functions support utf8 encoding	2020-05-08 12:52:46 +08:00

151 Commits