Commit Graph

46 Commits

Author SHA1 Message Date
95f2f43c02 [fix](macOS) Failed to run BE UT due to syscall to map cache into shared region failed (#15641)
According to the post https://developer.apple.com/forums/thread/676684, an executable larger than 2G may fail to start. The executable `doris_be_test` generated by run-be-ut.sh is now 2.1G (> 2G), so it cannot run on macOS (arm64).

We can separate the debug info from the executable `doris_be_test` to reduce the size. After that, we can run `doris_be_test` successfully.
2023-01-06 01:23:37 +08:00
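A minimal sketch of the size check and the debug-info split described in the commit above. The function names `exceeds_size_limit` and `split_debug_info` are hypothetical, and this log does not show the commands that run-be-ut.sh actually uses. The sketch assumes the usual tools: `dsymutil` and `strip -S` on macOS, `objcopy` and `strip --strip-debug` elsewhere.

```shell
# Hypothetical helpers; not taken from run-be-ut.sh.

# exceeds_size_limit FILE LIMIT_BYTES
# Succeeds when FILE is larger than LIMIT_BYTES.
exceeds_size_limit() {
    file="$1"
    limit="$2"
    size="$(wc -c < "${file}" | tr -d ' ')"
    [ "${size}" -gt "${limit}" ]
}

# split_debug_info BINARY
# Moves the debug info out of BINARY so the executable itself shrinks.
split_debug_info() {
    binary="$1"
    if [ "$(uname -s)" = 'Darwin' ]; then
        # Collect the debug info into a .dSYM bundle, then strip it from the binary.
        dsymutil "${binary}" -o "${binary}.dSYM"
        strip -S "${binary}"
    else
        # GNU binutils equivalent: keep the debug info in a side file.
        objcopy --only-keep-debug "${binary}" "${binary}.dbg"
        strip --strip-debug "${binary}"
        objcopy --add-gnu-debuglink="${binary}.dbg" "${binary}"
    fi
}

# Example use (assumed path, 2 GiB limit):
#   if exceeds_size_limit ./doris_be_test $((2 * 1024 * 1024 * 1024)); then
#       split_debug_info ./doris_be_test
#   fi
```

The split keeps the symbols available to a debugger while the file that the loader has to map stays under the limit.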
388f067300 [chore](workflow) Disable memory tracker by default on BE UT (macOS) (#14508) 2022-11-23 16:25:42 +08:00
249b688663 [chore](github) Add a workflow to check BE UT on macOS (#14506) 2022-11-23 08:38:28 +08:00
Pxl
bcd641877f [Enhancement](scan) disable build key range and filters when push down agg work (#14248)
Disable building key ranges and filters when aggregation pushdown takes effect.
2022-11-21 12:47:57 +08:00
74a1e28af3 [Opt](exec) prevent the scan key split whole range (#14088)
prevent the scan key split whole range
2022-11-11 15:46:00 +08:00
2ef8f3f6f4 [enhancement](java-udf) Support loading libjvm at runtime (#13660) 2022-10-28 08:45:12 +08:00
125def5102 [enhancement](macOS M1) Support building from source on macOS (M1) (#13195)
# Proposed changes

This PR fixed lots of issues when building from source on macOS with Apple M1 chip.

## ATTENTION

Supporting macOS on the Apple M1 chip is a large job, and many runtime issues remain unresolved:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...

Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.

This PR kicks off the job. Anyone interested is welcome to continue fixing these runtime issues.

## Use case

```shell
./build.sh -j 8 --be --clean

cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```

## Something else

It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with the M1 chip. The development experience on macOS will improve greatly once the adaptation job is finished.
2022-10-18 13:10:13 +08:00
de9b9b3e8e [chore](ut) enable asan core dump when running be ut (#12371) 2022-09-07 10:09:18 +08:00
b4101d46f0 [fix](workflow) Fix the errors when using sh to run shell scripts (#11898) 2022-08-19 21:28:52 +08:00
e63c83e8e1 [fix](script) Support starting BE without Java environment (#11910) 2022-08-19 17:58:40 +08:00
4fa53b4cdb [chore](workflow) Add shellcheck to check shell scripts (#11744) 2022-08-18 16:07:28 +08:00
573ebf235e [enhancement](build) Support customizing extra compile flags (#11444) 2022-08-03 11:02:17 +08:00
5215d95064 [enhancement](workflow) Use ccache to speed the BE UT (Clang) up (#11339) 2022-07-29 21:19:26 +08:00
d9095922d9 [Enhancement] [Memory] add strict memory usage compile option STRICT_MEMORY_USE (#10936)
With `STRICT_MEMORY_USE=ON` (strict memory usage mode), once the capacity of the vectorized hash table exceeds 2G, the table grows only when it is 75% full, and the memory usage of the vectorized join drops to 50% of its previous value.

`STRICT_MEMORY_USE=ON` is intended to make BE use less memory and to prioritize stability when cluster memory is limited.
2022-07-18 16:16:43 +08:00
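The growth rule in the commit above can be sketched with integer arithmetic. This is an illustration only, not the BE implementation: the function name `should_grow` is hypothetical, the 50% default threshold is an assumption (the commit states only the 75% strict threshold), and capacity is assumed to be measured in bytes.

```shell
# Hypothetical sketch of the growth decision; not the BE code.
TWO_GIB=$((2 * 1024 * 1024 * 1024))

# should_grow USED CAPACITY STRICT
# Succeeds when a table holding USED out of CAPACITY should grow.
should_grow() {
    used="$1"
    capacity="$2"
    strict="$3"
    threshold_pct=50   # assumed default fill threshold
    if [ "${strict}" = 'ON' ] && [ "${capacity}" -gt "${TWO_GIB}" ]; then
        threshold_pct=75   # strict mode: large tables grow later
    fi
    # Compare used/capacity with threshold_pct/100 without floating point.
    [ $((used * 100)) -ge $((capacity * threshold_pct)) ]
}

# Example: a 4 GiB table that is 60% full.
#   should_grow 2576980378 4294967296 OFF   -> succeeds (grow)
#   should_grow 2576980378 4294967296 ON    -> fails (keep the current size)
```

Under these assumptions, a table whose fill sits between 50% and 75% stays at its current size in strict mode instead of doubling, which is consistent with the halved memory usage the commit reports.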
b04a791895 [Enhancement] support compile with jemalloc (#10542)
A test feature to use jemalloc as default malloc.
2022-07-11 12:15:35 +08:00
Pxl
4750e94746 Do not build benchmark-tool by default and use lld/gold (#10215) 2022-06-25 22:31:11 +08:00
90f229c038 [refactor] remove useless plugin test code (#10061)
* remove plugin test code

* remove plugin test

Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-06-16 10:43:28 +08:00
39a2785ce2 [enhancement] support simd instructions on arm cpus through sse2neon (#10068)
* [enhancement] support simd instructions on arm cpus through sse2neon
2022-06-14 09:17:09 +08:00
e0cf2677a0 [dependency][enhancement] support build libhdfs in arm cpus (#10018)
Supports native HDFS functionality on ARM CPUs.
This PR mainly upgrades libhdfs3 so that it runs on ARM, and builds libhdfs3 with Kerberos enabled by default.
2022-06-10 19:40:41 +08:00
ca05d1ee01 [fix](memory tracker) Fix lru cache, compaction tracker, add USE_MEM_TRACKER compile (#9661)
1. Fix the Lru Cache MemTracker consumption value being negative.
2. Fix the compaction Cache MemTracker not tracking memory.
3. Add USE_MEM_TRACKER compile option.
4. Make sure the malloc/free hook is not stopped at any time.
2022-05-25 08:56:17 +08:00
b3a2a92bf5 [deps] libhdfs3 build enable kerberos support (#9524)
Currently, the libhdfs3 library integrated by Doris BE does not support accessing clusters with Kerberos authentication enabled. We found that the Kerberos-related dependencies (gsasl and krb5) were not added when building libhdfs3.

This PR therefore enables Kerberos support and rebuilds libhdfs3 with the gsasl and krb5 dependencies:

- gsasl version: 1.8.0
- krb5 version: 1.19
2022-05-22 20:58:19 +08:00
119ff2c02d [enhancement] Improve debugging experience. (#9677) 2022-05-19 16:36:37 +08:00
87fc46f84c update comments in run-be-ut.sh (#9092) 2022-04-26 12:48:35 +08:00
Pxl
d161161767 [refactor](script) remove unused parameter on run-be-ut.sh (#9000)
The -v parameter no longer works in run-be-ut.sh.
2022-04-14 11:45:35 +08:00
5a44eeaf62 [refactor] Unify all unit tests into one binary file (#8958)
1. Solve the previously deferred problems that the unit test files are too large (1.7G+) and that linking the unit tests takes too long.
2. Unify all unit tests into one file to significantly reduce unit test execution time to less than 3 mins.
3. Temporarily disable stream_load_test.cpp, metrics_action_test.cpp and load_channel_mgr_test.cpp, because they re-implement part of the code and affect other tests.
2022-04-12 15:30:40 +08:00
0d761f9909 [feature-wip][UDF][DIP-1] Support variable-size input and output for Java UDF (#8678)
This feature is proposed in DSIP-1. This PR supports variable-length input and output for Java UDFs.
2022-04-11 09:36:16 +08:00
76e0634030 [fix](ut) fix be ut in olap (#8739) 2022-03-30 09:53:21 +08:00
Pxl
a8af8d2981 [fix](vectorized) fix core dump on get_json_string and add some ut (#8496) 2022-03-17 10:08:31 +08:00
77b21fba03 [chore] make options of build.sh and run-be-ut.sh work (#8271)
The h option of build.sh and j option of run-be-ut.sh do not work
in the docker with image
apache/incubator-doris:build-env-ldb-toolchain-latest.
2022-03-02 10:17:50 +08:00
f8d086d87f [feature](rpc) (experimental)Support implement UDF through GRPC protocol. (#7519)
Support implementing UDFs through the gRPC protocol. This brings several benefits:
1. The UDF implementation language is no longer limited to C++; users can implement UDFs in any language they are familiar with.
2. UDFs are decoupled from Doris: a UDF cannot cause a Doris core dump, and UDF computing resources are separated from Doris, so Doris services are not affected.

However, an RPC UDF has a fixed overhead, so it is much slower than a C++ UDF, especially when the amount of data is large.

Create a function like this:

```
CREATE FUNCTION rpc_add(INT, INT) RETURNS INT PROPERTIES (
  "SYMBOL"="add_int",
  "OBJECT_FILE"="127.0.0.1:9999",
  "TYPE"="RPC"
);
```
The function service needs to implement the `check_fn` and `fn_call` methods.
Note:
THIS IS AN EXPERIMENTAL FEATURE, THE INTERFACE AND DATA STRUCTURE MAY BE CHANGED IN FUTURE !!!
2022-02-08 09:25:09 +08:00
e7103bfd08 [chore] Set the full path of make program to CMake (#7909) 2022-02-06 08:34:08 +08:00
Pxl
3ee000c13c [chore] support build with libc++ && add some build config (#7903)
support LIBCPP/LDD/BUILD_META_TOOL for build.sh
2022-01-30 16:47:22 +08:00
fb6e22f4ca [Fix] fix memory leak in be unit test (#7857)
1. Fix the memory leak in the BE unit tests.
2. Ignore the minidump test when running with ASAN.
2022-01-29 01:00:38 +08:00
Pxl
cd73a6b84b [chore] fix clang compile error (#7883) 2022-01-26 12:53:35 +08:00
1c711705d7 [chore] Use ccache to speed recompiling test code up. (#7811) 2022-01-22 10:18:52 +08:00
e1d7233e9c [feature](vectorization) Support Vectorized Exec Engine In Doris (#7785)
# Proposed changes

Issue Number: close #6238

    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
    Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
    Co-authored-by: wangbo <506340561@qq.com>
    Co-authored-by: emmymiao87 <522274284@qq.com>
    Co-authored-by: Pxl <952130278@qq.com>
    Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
    Co-authored-by: thinker <zchw100@qq.com>
    Co-authored-by: Zeno Yang <1521564989@qq.com>
    Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
    Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
    Co-authored-by: xinghuayu007 <1450306854@qq.com>
    Co-authored-by: weizuo93 <weizuo@apache.org>
    Co-authored-by: yiguolei <guoleiyi@tencent.com>
    Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
    Co-authored-by: awakeljw <993007281@qq.com>
    Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
    Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>


## Problem Summary:

### 1. Some code from clickhouse

**ClickHouse is an excellent implementation of the vectorized execution engine database,
so here we have referenced and learned a lot from its excellent implementation in terms of
data structure and function implementation.
We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers.**

The following comment has been added to the code taken from ClickHouse, e.g.:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris

### 2. Support exec node and query:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node

The vectorized exec engine can run the SSB and TPC-H standard query test sets and 70% of the TPC-DS standard query test set.

### 3. Data Model

The vectorized exec engine supports **Dup/Agg/Unq** tables and a vectorized block reader.
Vectorized segment reading is a work in progress.

### 4. How to use

1. Set the session variable `set enable_vectorized_engine = true;` (required)
2. Set the session variable `set batch_size = 4096;` (recommended)

### 5. Some diff from origin exec engine

https://github.com/doris-vectorized/doris-vectorized/issues/294

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
2022-01-18 10:07:15 +08:00
760fc02bfe Added brpc stub cache check and reset api, used to test whether the brpc stub cache is available, and reset the brpc stub cache (#6916)
Add an API to check and reset the brpc stub cache; it tests whether the brpc stub cache is available and resets it.
Add a config for automatically checking and resetting the brpc stub.
2021-11-05 09:45:37 +08:00
Pxl
4dd610c28d [Feature] Support for storage layer benchmark (#6506)
* add benchmark tool
2021-09-02 09:57:19 +08:00
9148bcb673 [Build] Reduce the parallel of build (#6469) 2021-08-18 15:24:19 +08:00
9216735cfa [New Feature] Support Vectorization Execution Engine Interface For Doris (#6329)
1. FE vectorized plan code
2. Function register vec function
3. Diff function nullable type
4. New thirdparty code and new thrift struct
2021-08-11 14:54:06 +08:00
109b55ee5f [Shell] Add build parallel option (#5819)
Add a build parallelism option so that the project can be built with a user-specified degree of parallelism instead of a fixed value.
2021-05-19 09:32:58 +08:00
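A minimal sketch of how such a `-j` option might be parsed, with a fallback to a default. The function name `parse_parallel` is hypothetical, and the sketch uses the shell builtin `getopts`; this log does not show how the real script parses its options.

```shell
# Hypothetical sketch; not the option parsing used by the real scripts.

# parse_parallel DEFAULT [ARGS...]
# Prints the value given with -j, or DEFAULT when -j is absent.
parse_parallel() {
    parallel="$1"
    shift
    OPTIND=1
    while getopts ':j:' opt "$@"; do
        case "${opt}" in
            j) parallel="${OPTARG}" ;;
            *) ;;   # other options are ignored in this sketch
        esac
    done
    echo "${parallel}"
}

# Example: default to the number of online CPUs unless -j is given.
#   PARALLEL="$(parse_parallel "$(getconf _NPROCESSORS_ONLN)" "$@")"
#   make -j "${PARALLEL}"
```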
029a8a046b [Build] Turn on glibc compatibility by default for upgrading gcc10 (#5528) 2021-03-21 11:22:53 +08:00
db2120a7f2 [Build][BE] Fix GLIBC_COMPATIBILITY can not compile in centos6 (#5472)
Add option to disable glibc_compatibility
2021-03-07 20:47:13 +08:00
e536823f92 [Thirdparty] Fix build thirdparty may be failed (#5187)
1. Fix thirdparty build failures on some operating systems, caused by the default library path being `lib` or `lib64`, and `arrow` build failures caused by `brotli` and `zstd`.
2. Fix being unable to extract `.tar.bz2` files.
2021-01-04 15:21:18 +08:00
279ae1cb75 Add fuzzy_parse option to speed up json import (#5114)
Add a `fuzzy_parse` flag: if all objects in the JSON file have the same keys in the same order, only the keys of the first row need to be parsed; values in later rows are then looked up by index instead of by key.
2020-12-25 09:19:42 +08:00
912547260a [UnitTest] Refactor BE unit test script (#4266)
1. Rename run-ut.sh to run-be-ut.sh
2. Find all test files from build dir instead of declaring separately in the script
3. Add gtest output to collect the result of unit test.
2020-08-11 10:23:51 +08:00
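Points 2 and 3 of the commit above can be sketched as a discovery loop. This is an illustration, not the contents of run-be-ut.sh: the function name `run_all_tests` and the `*_test` naming pattern are assumptions, while `--gtest_output=xml:<path>` is the standard googletest flag for writing an XML report.

```shell
# Hypothetical sketch; not the contents of run-be-ut.sh.

# run_all_tests BUILD_DIR REPORT_DIR
# Runs every executable named *_test under BUILD_DIR and prints
# the number of test executables that failed.
run_all_tests() {
    build_dir="$1"
    report_dir="$2"
    mkdir -p "${report_dir}"
    find "${build_dir}" -type f -name '*_test' -perm -u+x | sort | {
        failed=0
        while IFS= read -r test_bin; do
            name="$(basename "${test_bin}")"
            # Test output goes to stderr so that stdout carries only the count;
            # stdin is detached so a test cannot consume the file list.
            if ! "${test_bin}" --gtest_output="xml:${report_dir}/${name}.xml" </dev/null >&2; then
                failed=$((failed + 1))
            fi
        done
        echo "${failed}"
    }
}

# Example (assumed paths):
#   failures="$(run_all_tests be/ut_build/test be/ut_build/gtest_output)"
```

Discovering the executables from the build directory means a new test does not have to be registered in the script by hand.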