Commit Graph

277 Commits

Author SHA1 Message Date
Pxl
2d83167e50 [Feature] [Lateral-View] support outer combinator of table function (#9147) 2022-04-24 12:09:40 +08:00
ae680b4248 [UDF] support RPC UDAF part 1: support creating RPC UDAFs in FE (#8510) 2022-04-21 17:38:58 +08:00
869fdff2f0 [refactor] add reference path for source file from impala (#9115)
According to the requirements of the APLv2, the referenced code needs to be marked with the path of the source code.
2022-04-20 12:29:57 +08:00
9ac6d23a44 [Feature] support stddev/variance agg functions as window functions (#8962) 2022-04-14 12:07:26 +08:00
290366787c [refactor] refactor code, replace some files with STL libs (#8759)
1. replace ConditionVariables with std::condition_variable
2. replace Mutex with std::mutex
3. replace MonoTime with std::chrono
2022-04-13 09:55:29 +08:00
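For commit 290366787c, here is a minimal sketch of what the standard-library replacements look like in practice. `ReadyFlag` and `elapsed_ms` are hypothetical names used only for illustration. They are not taken from the Doris code, which this log does not show.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper (not from the Doris codebase): a flag that one thread
// sets and other threads wait on, built only from standard-library types.
class ReadyFlag {
public:
    // Mark the flag as ready and wake all waiters.
    void set() {
        {
            std::lock_guard<std::mutex> lock(_mutex);  // replaces a custom Mutex
            _ready = true;
        }
        _cv.notify_all();  // replaces a custom ConditionVariable
    }

    // Wait up to `timeout` for the flag; returns true if it became ready.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(_mutex);
        // The predicate overload re-checks _ready, so spurious wake-ups are handled.
        return _cv.wait_for(lock, timeout, [this] { return _ready; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _ready = false;
};

// Elapsed wall time measured with a monotonic clock (replaces MonoTime).
long long elapsed_ms(std::chrono::steady_clock::time_point start) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(
                   std::chrono::steady_clock::now() - start)
            .count();
}
```

`std::chrono::steady_clock` is the monotonic clock in the standard library, so it never jumps backwards when the system time is adjusted.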
5a44eeaf62 [refactor] Unify all unit tests into one binary file (#8958)
1. Solve the previous problems that the unit test files were too large (1.7G+) and took too long to link
2. Unify all unit tests into one binary, significantly reducing unit test execution time to less than 3 minutes
3. Temporarily disable stream_load_test.cpp, metrics_action_test.cpp, and load_channel_mgr_test.cpp, because they re-implement part of the code and would affect other tests
2022-04-12 15:30:40 +08:00
6ed59bb98b [refactor](code_style) remove useless inline #8933
1. Member functions defined inside a class are implicitly inline, so the keyword does not need to be added
2. inline is a keyword that belongs with the function implementation; it has no effect when placed before a function declaration
2022-04-10 18:29:55 +08:00
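A short illustration of the points in commit 6ed59bb98b. `Counter` is a hypothetical class; the sketch only shows where the `inline` keyword is redundant and where an out-of-class definition in a header still needs it.

```cpp
#include <cassert>

// Hypothetical class used only to illustrate where `inline` matters.
class Counter {
public:
    // Defined inside the class body: implicitly inline, so writing
    // `inline int get() const` here would be redundant.
    int get() const { return _value; }

    // Declared here, defined after the class.
    void add(int delta);

private:
    int _value = 0;
};

// An out-of-class member definition placed in a header must be inline so it
// can appear in several translation units without violating the
// one-definition rule. Here the keyword is written on the definition.
inline void Counter::add(int delta) { _value += delta; }
```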
c5718928df [feature-wip](array-type) support explode and explode_outer table function (#8766)
explode(ArrayColumn) desc:
> Create a row for each element in the array column. 

explode_outer(ArrayColumn) desc:
> Create a row for each element in the array column. Unlike explode, if the array is null or empty, it returns a single NULL row.

Usage example:
1. Create a table with an array column and insert some data;
2. Enable the session variables enable_lateral_view and enable_vectorized_engine:
```
set enable_lateral_view = true;
set enable_vectorized_engine=true;
```
3. Use explode_outer:
```
> select * from array_test;
+------+------+--------+
| k1   | k2   | k3     |
+------+------+--------+
|    3 | NULL | NULL   |
|    1 |    2 | [1, 2] |
|    2 |    3 | NULL   |
|    4 | NULL | []     |
+------+------+--------+

> select k1,explode_column from array_test LATERAL VIEW explode_outer(k3) TempExplodeView as explode_column;
+------+----------------+
| k1   | explode_column |
+------+----------------+
|    1 |              1 |
|    1 |              2 |
|    2 |           NULL |
|    4 |           NULL |
|    3 |           NULL |
+------+----------------+
```
4. Use explode. explode returns no rows when the array is null or empty:
```
> select k1,explode_column from array_test LATERAL VIEW explode(k3) TempExplodeView as explode_column;
+------+----------------+
| k1   | explode_column |
+------+----------------+
|    1 |              1 |
|    1 |              2 |
+------+----------------+
```
2022-04-08 12:11:04 +08:00
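The row semantics of `explode` and `explode_outer` from commit c5718928df can be modeled roughly as follows. This is a sketch of the documented behavior over the sample `array_test` data, not the Doris table-function implementation; `Row`, `OutRow`, and `lateral_explode` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical row model: k1 plus a nullable ARRAY<INT> column k3.
struct Row {
    int k1;
    std::optional<std::vector<int>> k3;  // std::nullopt models SQL NULL
};

using OutRow = std::pair<int, std::optional<int>>;  // (k1, explode_column)

// explode emits one row per array element and nothing for NULL or empty
// arrays; explode_outer additionally emits a single NULL row in those cases.
std::vector<OutRow> lateral_explode(const std::vector<Row>& rows, bool outer) {
    std::vector<OutRow> out;
    for (const Row& r : rows) {
        if (!r.k3 || r.k3->empty()) {
            if (outer) {
                out.emplace_back(r.k1, std::nullopt);
            }
            continue;
        }
        for (int v : *r.k3) {
            out.emplace_back(r.k1, v);
        }
    }
    return out;
}
```

Output order here follows input order; the SQL results above are not guaranteed to come back in any particular order.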
Pxl
dbbc6549bd [feature](vectorized) support vexplode_bitmap (#8890) 2022-04-08 09:20:26 +08:00
0b98d78664 [improvement](hll) Optimize Hyperloglog (#8829)
At Meituan, PR #6625 was reverted due to an OOM problem.
We are now modifying the old HyperLogLog implementation, building on PR #8555.
In our tests, the result performs better than both the old HLL and the HLL on apache:master.

Changes summary:

- use SIMD max to speed up the expensive function _merge_registers
- use phmap::flat_hash_set rather than std::set
- replace std::max
- other small changes
2022-04-08 09:06:08 +08:00
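For commit 0b98d78664, the core of HyperLogLog merging is an element-wise maximum over the register arrays. The helper below is a hypothetical scalar sketch; it does not reproduce the explicit SIMD code the commit describes, and it assumes both sketches use the same register count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Merge HLL register array `other` into `dst` by taking the element-wise
// maximum, which is how two HyperLogLog sketches with the same register
// layout are combined. Hypothetical standalone helper, not Doris code.
void merge_registers(std::vector<uint8_t>& dst, const std::vector<uint8_t>& other) {
    assert(dst.size() == other.size());
    uint8_t* d = dst.data();
    const uint8_t* o = other.data();
    const size_t n = dst.size();
    // A simple loop over contiguous bytes; optimizing compilers can usually
    // auto-vectorize it into packed unsigned-byte max instructions.
    for (size_t i = 0; i < n; ++i) {
        d[i] = d[i] > o[i] ? d[i] : o[i];
    }
}
```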
519305cb22 [feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage (#8669)
Based on #8605, separate out the memory usage of each operator from the Query/Load/StorageEngine mem trackers.
2022-04-08 09:02:26 +08:00
e53c90fbef min and max window function bug fix (#8822)
[Fix bug] min and max window function bug fix #8822
2022-04-07 08:36:33 +08:00
6b0a642390 [feature][vectorized] Support explode json array func #8526 (#8539) 2022-04-03 10:06:47 +08:00
a75e4a1469 Window funnel (#8485)
Add new feature window funnel
2022-04-02 22:08:50 +08:00
6c5bbc6e4c fix agg functions check failed from empty table (#8785)
fix agg functions check failed from empty table
2022-04-02 10:44:55 +08:00
82792726ab [ubsan] fix some ubsan complaints on vector and pointer (#8733) 2022-03-31 13:50:25 +08:00
bf73ab69f2 [Bug] Fix DCHECK failed in runtime filter and mutable block (#8720)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-03-31 11:13:05 +08:00
bea9a7ba4f [feature] Support pre-aggregation for quantile type (#8234)
Add a new column-type to speed up the approximation of quantiles.
1. The new column type is named `quantile_state`, with the fixed aggregation function `quantile_union`; it stores the intermediate results of pre-aggregated approximate quantile calculations.
2. Support pre-aggregation of the new column type and quantile_state-related functions.
2022-03-24 09:11:34 +08:00
e44038caf3 [feature-wip](array-type) Array data can be loaded in stream load. (#8368) (#8585)
Please refer to #8367.
2022-03-22 15:25:40 +08:00
Pxl
be3d203289 [feature][vectorized] support table function explode_numbers() (#8509) 2022-03-22 11:38:00 +08:00
eeae516e37 [Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476)
Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G

Implement a new way of memory statistics based on TCMalloc New/Delete Hook,
MemTracker and TLS, and it is expected that all memory new/delete/malloc/free
of the BE process can be counted.
2022-03-20 23:06:54 +08:00
2ec0b81030 [improvement](storage) Low cardinality string optimization in storage layer (#8318)
Low cardinality string optimization in storage layer
2022-03-20 23:04:25 +08:00
Pxl
a824c3e489 [feature](vectorized) support lateral view (#8448) 2022-03-17 10:04:24 +08:00
d39c021d71 [fix] min function on a NOT NULL varchar column returns a wrong result (#8479) 2022-03-16 11:38:55 +08:00
3ba4de0d27 [fix](ut) fix some UT compile or run failed cases (#8489) 2022-03-16 11:38:35 +08:00
e17aef9467 [refactor] refactor the implement of MemTracker, and related usage (#8322)
Modify the implementation of MemTracker:
1. Simplify a lot of useless logic;
2. Add MemTrackerTaskPool as the ancestor of all query and import trackers; it is used to track the local memory usage of all executing tasks;
3. Add a consume/release cache: a consume/release is triggered only when the accumulated memory exceeds the parameter mem_tracker_consume_min_size_bytes;
4. Add a new memory leak detection mode (experimental feature), controlled by the parameter memory_leak_detection: when a MemTracker is destructed and its remaining statistical value is greater than the specified range, an exception is thrown and the accurate statistical value is printed over HTTP;
5. Add Virtual MemTracker, whose consume/release is not synced to its parent. It will be used later, when the TCMalloc hook is introduced to record memory, to record specified memory independently;
6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later;
7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env;
8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.;

Modify where MemTracker is used:
1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code;
2. Added trackers for global objects such as ChunkAllocator and StorageEngine;
3. Added more fine-grained trackers such as ExprContext;
4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode;
5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;
2022-03-11 22:04:23 +08:00
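Item 3 of commit e17aef9467 (the consume/release cache) can be sketched as a local counter that is flushed to a shared total only once the net change crosses a threshold. `CachedConsumer` is a hypothetical class, and `min_size_bytes` merely stands in for `mem_tracker_consume_min_size_bytes`; this is not the actual MemTracker interface.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a consume/release cache: small consumes and
// releases accumulate locally and are pushed to the shared counter only
// when the net cached amount reaches +/- min_size_bytes.
class CachedConsumer {
public:
    CachedConsumer(std::atomic<int64_t>& shared, int64_t min_size_bytes)
            : _shared(shared), _min_size_bytes(min_size_bytes) {}

    // Anything still cached is reported when the consumer goes away.
    ~CachedConsumer() { flush(); }

    void consume(int64_t bytes) { add(bytes); }
    void release(int64_t bytes) { add(-bytes); }

    // Push the cached net amount to the shared counter.
    void flush() {
        if (_untracked != 0) {
            _shared.fetch_add(_untracked, std::memory_order_relaxed);
            _untracked = 0;
        }
    }

private:
    void add(int64_t delta) {
        _untracked += delta;
        // Touch the shared atomic only when the cached change is large enough.
        if (_untracked >= _min_size_bytes || _untracked <= -_min_size_bytes) {
            flush();
        }
    }

    std::atomic<int64_t>& _shared;
    int64_t _min_size_bytes;
    int64_t _untracked = 0;
};
```

The trade-off is accuracy for contention: the shared total can lag the true value by up to the threshold per consumer, in exchange for far fewer atomic updates on hot allocation paths.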
e0ef9b8f6c [refactor](vectorized) to_bitmap(-1) returns NULL instead of a parse-failed error message (#8373) 2022-03-11 17:21:47 +08:00
68dd799796 [improvement](vectorized) Support function tuple is null (#8442) 2022-03-11 16:54:37 +08:00
Pxl
10c3712aa1 [fix](vectorized) fix arithmetic calculation returning wrong result (#8226) 2022-03-09 13:03:57 +08:00
Pxl
8214f32003 [fix] fix core dump on minmax_filter with decimal type (#8381) 2022-03-08 18:56:48 +08:00
454b45bea3 [feature](vectorize)(function) support regexp&&sm4&&aes functions (#8307) 2022-03-08 13:14:02 +08:00
d9c2c2cac6 Revert "[refactor] remove unused new_in_predicate code (#8263)" (#8372)
This reverts commit 757e35744d4f6319e936fca84b4be13cf043a578.
2022-03-07 15:55:38 +08:00
Pxl
0ee53be883 [fix][improvement](runtime-filter) fix string type length limit error && add runtime filter decimal support (#8282) 2022-03-03 22:44:49 +08:00
09bfb8b9d3 [fix] (rpc-udf) Fixed the problem that the query could not be interrupted (#8248)
if an error occurred in the RPC server during the execution of an rpc-udf.
Add Java, C++, and Python demos of an rpc-udf server
2022-03-03 09:30:03 +08:00
246ac4e37a [fix] fix a bug where encryption functions with an IV may return wrong results (#8277) 2022-03-02 17:26:44 +08:00
940efc6014 [Fix]Remove duplicated destructor function in MinMaxFuncBase (#8287) 2022-03-01 18:38:09 +08:00
2b9b0fc1ec [Fix] Function percentile returns null for null input (#8238) 2022-03-01 14:42:48 +08:00
757e35744d [refactor] remove unused new_in_predicate code (#8263)
remove unused code of new_in_predicate.h/cpp
2022-03-01 11:11:42 +08:00
e77e2b0bf0 [improvement](lateral-view) Add number rows filtered in profile (#8251)
Add a `RowsFiltered` counter to the TableFunctionNode profile,
so that we can know the total number of rows that TableFunctionNode processed.
2022-03-01 11:04:57 +08:00
8642fa38b9 [Bug] Double/Float % 0 should be NULL (#8230)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-25 11:03:42 +08:00
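The rule in commit 8642fa38b9 can be sketched with `std::optional<double>` standing in for a nullable SQL value. `sql_mod` is a hypothetical helper, not the Doris function.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Floating-point modulo that yields NULL (std::nullopt) for a zero divisor.
// Without the check, std::fmod(x, 0.0) typically yields NaN, which has no
// useful SQL meaning. Hypothetical helper for illustration.
std::optional<double> sql_mod(double x, double y) {
    if (y == 0.0) {  // also true for -0.0
        return std::nullopt;  // x % 0 -> NULL
    }
    return std::fmod(x, y);  // the result has the same sign as x
}
```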
Pxl
4c5d7c27df [Bug] group_concat(value, null) does not return null 2022-02-25 11:03:23 +08:00
a6bc9cbe53 [Function] Refactor the function code of log (#8199)
1. Support returning null when input is invalid
2. Delete the useless code in vec function

Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-02-24 11:06:58 +08:00
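For commit a6bc9cbe53, here is a sketch of a `log` that returns NULL instead of NaN or infinity. The set of inputs treated as invalid (x <= 0, base <= 0, or base == 1) is an assumption based on the mathematical domain of the logarithm, and `sql_log` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// log(base, x) returning NULL (std::nullopt) for inputs outside the domain.
// Assumed invalid inputs: x <= 0, base <= 0, or base == 1 (division by log(1) = 0).
std::optional<double> sql_log(double base, double x) {
    if (x <= 0.0 || base <= 0.0 || base == 1.0) {
        return std::nullopt;
    }
    return std::log(x) / std::log(base);  // change-of-base formula
}
```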
c47368f80c [fix] (udf) fix check_fn and fn_call function name not same (#8132) 2022-02-22 09:18:07 +08:00
16020cbdf9 [fix](lateral-view) Fix bug that explode_json_array_string return unstable result (#8152)
Co-authored-by: morningman <chenmingyu@baidu.com>
2022-02-21 09:38:36 +08:00
826738d97f [docs]Some doc improvements and typo fix (#8153) 2022-02-21 09:36:01 +08:00
50864aca7d [refactor] fix warnings when compiling with clang (#8069) 2022-02-19 11:29:02 +08:00
7a73645eee [refactor] remove some unused code (#8022) 2022-02-12 15:17:28 +08:00
5029ef46c9 [fix] fix ltrim result may be incorrect in some cases (#7963)
Fix ltrim results that may be incorrect in some cases.
According to https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html:
Built-in Function: int __builtin_clz / __builtin_ctz (unsigned int x)
If x is 0, the result is undefined.
So we handle the case of 0 separately.

These functions return different results under GCC and Clang when x is 0.
2022-02-09 13:06:37 +08:00
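Here is a sketch of the zero guard described in commit 5029ef46c9. The ltrim code itself is not shown in this log, so this only illustrates handling x == 0 before calling the GCC/Clang builtins. `safe_clz` and `safe_ctz` are hypothetical names. In C++20, `std::countl_zero` and `std::countr_zero` from `<bit>` are defined for zero and return the bit width.

```cpp
#include <cassert>
#include <limits>

// __builtin_clz/__builtin_ctz are undefined for x == 0 (GCC/Clang builtins).
// A zero input has every bit clear, so both counts are defined here as the
// full bit width of unsigned int.
constexpr int kUintBits = std::numeric_limits<unsigned int>::digits;

int safe_clz(unsigned int x) {
    return x == 0 ? kUintBits : __builtin_clz(x);
}

int safe_ctz(unsigned int x) {
    return x == 0 ? kUintBits : __builtin_ctz(x);
}
```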
Pxl
0553ce2944 [feature](vectorization) support function topn && remove some unused code (#7793) 2022-02-09 13:05:31 +08:00
f8d086d87f [feature](rpc) (experimental) Support implementing UDFs through the gRPC protocol. (#7519)
Support implementing UDFs through the gRPC protocol. This brings several benefits:
1. UDF implementation is no longer limited to C++; users can implement UDFs in any language they are familiar with
2. UDFs are decoupled from Doris: a UDF cannot cause a Doris coredump, UDF computing resources are separated from Doris, and Doris services are not affected

However, RPC UDFs have a fixed overhead, so they are much slower than C++ UDFs, especially when the amount of data is large.

Create a function like this:

```
CREATE FUNCTION rpc_add(INT, INT) RETURNS INT PROPERTIES (
  "SYMBOL"="add_int",
  "OBJECT_FILE"="127.0.0.1:9999",
  "TYPE"="RPC"
);
```
The function service needs to implement the `check_fn` and `fn_call` methods.
Note:
THIS IS AN EXPERIMENTAL FEATURE; THE INTERFACE AND DATA STRUCTURES MAY CHANGE IN THE FUTURE!!!
2022-02-08 09:25:09 +08:00