Commit Graph

5948 Commits

Author SHA1 Message Date
e63afc1a3c [feature-wip](remote storage)(step2) add storage_backend_mgr on BE side (#8663)
1. add storage backend mgr
2. remove env_remote
2022-03-31 11:13:14 +08:00
bf73ab69f2 [Bug] Fix DCHECK failed in runtime filter and mutable block (#8720)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-03-31 11:13:05 +08:00
3a8ca80eab [fix](doc) fix typo for show tablets command (#8740) 2022-03-30 10:22:00 +08:00
b98da02611 [chore][fix](httpv2) Use mariadb-java-client for http query api (#8716)
In #8319, I removed the mysql-connector-java dependency because of license incompatibility.
But we still need a MySQL-compatible driver for the HTTP query API, so I chose mariadb-java-client,
which is licensed under the LGPL.
2022-03-30 09:59:45 +08:00
76e0634030 [fix](ut) fix be ut in olap (#8739) 2022-03-30 09:53:21 +08:00
ba91b44553 [fix](load) fix bug that NodeChannel can not be destroyed ontime (#8705)
After the ReusableClosure is reset, we must not call the join() method, or it will block forever.
2022-03-30 09:52:11 +08:00
46e1b05490 [refactor] Fix some code comments typo and cleanup unused include (#8684) 2022-03-30 09:51:48 +08:00
22cf6ea17c [chore] Modify build.sh and refactor dependency of FE submodules (#8732)
This PR fixes #8731 and refactors the `build.sh` script.

The build.sh script is currently responsible for the compilation of the following Doris components.
1. FE
    - fe-common
    - fe-core
    - spark-dpp
    - hive-udf
    - java-udf
    - ui
2. BE
    - palo_be
    - meta_tool
3. broker

In the FE module.
- The 4 submodules `fe-common, fe-core, spark-dpp and ui` together form Frontend.
- `spark-dpp, hive-udf and java-udf` can be compiled separately to produce jar packages for individual use.

In the BE module.
- `palo_be` can start the BE process separately.
- `meta_tool` can be compiled separately to produce binaries.

The modified build.sh script has the following changes:

1. There is no longer an option to compile `ui` separately; it is built together with `--fe`.
2. `fe/be/spark-dpp/hive-udf/java-udf/palo_be/meta_tool` can be compiled separately.
3. All components except `java-udf` are compiled by default (`java-udf` is still in development).

Remaining issues:

Several submodules of FE have messy dependencies.
For example, `java-udf` depends on `fe-core`, and `fe-core` depends on `spark-dpp`,
resulting in a large binary jar of `java-udf`.
It needs to be reorganized afterwards.
2022-03-30 00:13:24 +08:00
3724f94728 [refactor][optimize](storage) Code optimization and refactoring for low-cardinality columns in storage layer (#8627)
* Optimize predicate calculation and refactor
2022-03-29 19:11:54 +08:00
3f5bc5206d [Improvement] broker load with hdfs support wildcard (#8718)
broker load with hdfs support wildcard
2022-03-29 18:21:41 +08:00
1ddfe20950 fix typo (#8714)
fix typo
2022-03-29 18:21:16 +08:00
92b95e1f57 [doc] Update VARCHAR.md (#8703)
* Update VARCHAR.md
2022-03-29 18:20:30 +08:00
23155e0f37 [typo] Fix runtime filter docs (#8702)
Fix runtime filter docs
2022-03-29 18:20:09 +08:00
c7bdf3e7c1 [doc] Update flink-doris-connector.md (#8696)
* Update flink-doris-connector.md
2022-03-29 18:19:47 +08:00
da87e0c4ee optimize create tpch table statements to achieve higher performance (#8683)
optimize create tpch table statements to achieve higher performance
2022-03-29 18:19:22 +08:00
b20af5ffa2 [Vectorized][refactor] refactor stddev/variance agg functions (#8660)
* [Vectorized][refactor] refactor stddev agg functions
2022-03-29 18:18:06 +08:00
Pxl
a9d185fcc4 [Enhancement] add clang-tidy config && add C++ Code Diagnostic document (#8642)
add clang-tidy config && add C++ Code Diagnostic document
2022-03-29 18:17:09 +08:00
8d2d8893d3 [fix] Fix a typo caused by a refactoring (#8724) 2022-03-29 16:37:17 +08:00
ba933d1e5e [refactor](storage_engine) Remove mem tablet from be (#8694) 2022-03-29 15:06:40 +08:00
66a3c574df [Vectorized][Bug] fix percentile_approx function to return always nullable (#8572) 2022-03-29 14:47:39 +08:00
23b348456b [Bug] Read bitmap/hll column failed for storage layer vectorization (#8560)
* fix bitmap error

* Update be/src/olap/rowset/segment_v2/segment_iterator.cpp

Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>

Co-authored-by: Wang Bo <wangbo36@meituan.com>
Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>
2022-03-29 14:18:59 +08:00
0d43f8e130 [refactor] remove atomic.h/cpp use std::atomic instead (#8693) 2022-03-29 12:41:41 +08:00
f3659c87c1 [fix][chore](repository)(fe) check reponame when creating repository and modify build.sh (#8671)
1. We need to check repo name when creating repository
2. modify build.sh to not install spark-dpp when spark-dpp is not compiled
2022-03-29 11:32:52 +08:00
d82c138a60 [fix](user-property) Fix bug that can not set exec_mem_limit at user level (#8710) 2022-03-29 10:03:33 +08:00
365eba0b92 [fix] fix core dump when avg on not null decimal in empty table (#8681) 2022-03-28 12:41:00 +08:00
db5299d63e [fix] fix compile error (#8688)
Introduced by PR #8643.
A condition_variable can only wait on a unique_lock.
2022-03-28 11:53:42 +08:00
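
The compile error noted in db5299d63e follows from the standard library interface: `std::condition_variable::wait` accepts only a `std::unique_lock<std::mutex>`, so a `std::lock_guard` cannot be passed to it. A minimal sketch, using a hypothetical `Gate` class that is not taken from the Doris code base:

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot gate, for illustration only (not Doris code).
class Gate {
public:
    void open() {
        {
            // lock_guard is fine here because this thread never waits.
            std::lock_guard<std::mutex> guard(_mutex);
            _open = true;
        }
        _cv.notify_all();
    }

    void wait_until_open() {
        // wait() must unlock and relock the mutex while blocking,
        // so it requires std::unique_lock<std::mutex>.
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait(lock, [this] { return _open; });
    }

    bool is_open() {
        std::lock_guard<std::mutex> guard(_mutex);
        return _open;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _open = false;
};

// Opens the gate from a second thread and waits for it on the calling thread.
bool run_gate_demo() {
    Gate gate;
    std::thread opener([&gate] { gate.open(); });
    gate.wait_until_open();
    opener.join();
    return gate.is_open();
}
```

If a different lock type really is needed, `std::condition_variable_any` works with any lockable type, at some extra cost.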
b67596ba2a [fix](ut) fix be ut failed (#8682) 2022-03-28 10:50:41 +08:00
Rio
7cf39fe885 [typo] Optimize some code comments (#8673) 2022-03-28 10:38:10 +08:00
aa1592b932 [community] Add more collaborators (#8672)
1. add dataroaring
2. remove qidaye because he has become a committer of Doris

discuss thread: https://lists.apache.org/thread/8bxnj7qw2p120v077nm8gny52m65d22r
2022-03-28 10:37:31 +08:00
e4c0dd97ed [doc] fix buffer pool default value (#8670) 2022-03-28 10:37:12 +08:00
Pxl
8eef5c337a [doc] fix sql-mode document (#8662) 2022-03-28 10:35:27 +08:00
079e35f3d3 [doc] update doc of vec-execution-engine (#8655) 2022-03-28 10:26:28 +08:00
d45026171d [test] regression framework use RollingFileAppender by default (#8654) 2022-03-28 10:25:34 +08:00
79be81a8a4 [chore] Optimize build_lz4 in build-thirdparty.sh (#8653) 2022-03-28 10:24:32 +08:00
727e8842d4 [test] limit memory used by regression test framework (#8651) 2022-03-28 10:24:12 +08:00
6cbc5014b9 [doc] update export.md (#8650)
"where" should be in front of "to".
2022-03-28 10:23:53 +08:00
7cfce63a13 [fix](mini-load) Remove mini load in LOADING and PENDING state (#8649)
1. Remove some unused code.
2. handle mini load with wrong state
    1. For historical reasons, some mini load jobs in LOADING state have not been cleared.
        As a result, new load jobs cannot be committed.
    2. If a mini load job is created right before an FE restart, the job stays in PENDING state forever.
        It should eventually be removed.
2022-03-28 10:22:17 +08:00
57e038120f [chore] add -rtlib=compiler-rt for UBSAN under clang (#8647) 2022-03-28 10:21:55 +08:00
887301474d [doc] Update compilation.md (#8646)
Added to the FAQ solutions for the "fatal error: Killed signal terminated program ..."
problem encountered when compiling with Docker.
2022-03-28 10:21:31 +08:00
70fd5c0735 [doc] optimize some doc expression (#8645) 2022-03-28 10:20:38 +08:00
ea45940ef0 [fix] fix memory leak in VDataStreamRecvr::SenderQueue (#8643)
After `VDataStreamRecvr::SenderQueue::close` clears `_block_queue`, calling 
`VDataStreamRecvr::SenderQueue::add_block` again will cause a memory leak.

So, change the lock position, like the other add_block and add_batch.
2022-03-28 10:19:22 +08:00
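
One way such a problem can arise is when the "already closed" check in `add_block` is made outside the lock that `close` holds while clearing the queue. The sketch below shows the general pattern of doing the check and the push under the same lock. `SenderQueueSketch`, `Block`, and the member names are hypothetical stand-ins, not the actual `VDataStreamRecvr::SenderQueue` implementation:

```cpp
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a data block (not the Doris type).
struct Block {
    int rows = 0;
};

class SenderQueueSketch {
public:
    // Returns false if the queue is already closed; the block is then dropped
    // instead of being parked in a queue nobody will drain.
    bool add_block(std::unique_ptr<Block> block) {
        std::lock_guard<std::mutex> guard(_lock); // lock taken BEFORE checking _closed
        if (_closed) {
            return false;
        }
        _block_queue.push_back(std::move(block));
        return true;
    }

    // Marks the queue closed and releases everything still queued.
    void close() {
        std::lock_guard<std::mutex> guard(_lock);
        _closed = true;
        _block_queue.clear();
    }

    std::size_t size() {
        std::lock_guard<std::mutex> guard(_lock);
        return _block_queue.size();
    }

private:
    std::mutex _lock;
    bool _closed = false;
    std::deque<std::unique_ptr<Block>> _block_queue;
};
```

Because the flag check and the push happen under one lock, a concurrent `close` cannot slip in between them.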
cdf0a016c3 [fix](vec) fix coredump for aggregate function when delete large_data, due to alloc-dealloc-mismatch (#8641) 2022-03-28 10:17:13 +08:00
11f9f5fe4d [chore][be-test] Link gtest_main to provide default main function definition. (#8631) 2022-03-28 10:14:48 +08:00
726eaa68ea [fix](vectorization) Vectorization decimal arithmetic inconsistent (#8626) 2022-03-28 10:12:39 +08:00
HB
39717a85a2 [fix](load) Fix null column bug in load's mapping column setting (#8625) 2022-03-28 10:08:00 +08:00
f96bc62573 [feature](balance) Support balance between disks on a single BE (#8553)
Currently a Doris cluster may be balanced as a whole while the disks of a single backend are unbalanced.
For example, backend A has two disks, disk1 and disk2: disk1's usage is 98%, but disk2's usage is only 40%.
disk1 cannot take more data, so only one disk of backend A can accept new data,
and the available write throughput of backend A is only half of its capacity. This cannot currently be resolved through load or
partition rebalancing.

So we introduce a disk rebalancer. Unlike the other rebalancers (load or partition),
which take care of cluster-wide data balancing, it takes care of data balancing within a single backend.

[For more details see #8550](https://github.com/apache/incubator-doris/issues/8550)
2022-03-28 10:03:21 +08:00
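
The imbalance described in f96bc62573 can be expressed as the gap between the fullest and the emptiest disk of one backend. The sketch below is only an illustration of that idea: the function names and the `threshold` parameter are assumptions and do not reflect the policy implemented in the PR.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper, not Doris code: gap between the fullest and the
// emptiest disk, with usage expressed as a fraction in [0, 1].
double disk_usage_gap(const std::vector<double>& usages) {
    if (usages.empty()) {
        return 0.0;
    }
    auto range = std::minmax_element(usages.begin(), usages.end());
    return *range.second - *range.first;
}

// The threshold is an illustrative parameter, not a real Doris setting.
bool needs_disk_rebalance(const std::vector<double>& usages, double threshold) {
    return disk_usage_gap(usages) > threshold;
}
```

With the figures from the commit message (98% and 40%), the gap is roughly 0.58, so any threshold below that would flag backend A for rebalancing.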
b2861f36c4 [chore] optimize aws thirdparty package download. (#8637) 2022-03-28 09:35:51 +08:00
Pxl
02612c7ec0 [Refactor] Remove unused file (#8657) 2022-03-27 01:41:06 +08:00
aeee738af0 Revert "[Refactor][agent_task] Remove etl mgr and etl job pool from be (#8635)" (#8666)
This reverts commit 6bc982c37436acf288f566cf10e084731b80fa44.
2022-03-25 18:32:50 +08:00
e285d09157 [Enhancement](load) speed up stream load for duplicate table, use template for faster get_type_info. (#8500) 2022-03-25 15:18:43 +08:00