Commit Graph

168 Commits

Author SHA1 Message Date
b6b8ef3a18 [chore](script) avoid failure while building on a non-git repository (#23982)
Co-authored-by: yiguolei <676222867@qq.com>
2023-09-08 10:08:00 +08:00
7727535b91 [Enhancement](build) try to download commit-specific source code when git submodule fails (#23846) 2023-09-04 20:12:46 +08:00
57ca7d66d3 [Fix](multi-catalog) Fix zlib init error by using Doris's zlib shared library, and fix jni.log not being output. (#23260) 2023-09-02 21:44:14 +08:00
5ba505ebf4 [fix](multi-catalog)fix avro and jdbc scanner dependency (#23015)
Add a preload-extensions module and move all conflicting dependencies to its pom.xml.
2023-08-20 19:28:17 +08:00
919bfd73f1 [improvement](multi-catalog)add scanner isolation class loader (#22247)
Add a scanner isolation class loader to make each plugin non-conflicting.
The BE obtains scanner classes via JNI calls and loads them with JniClassLoader.
Previously, scanner classes were always loaded from the system class path by default,
so they could not be isolated per scanner. A sketch of the idea follows.
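A minimal sketch, assuming a BE-side C++ caller and an already-created loader object; apart from the standard JNI/ClassLoader API, the names here are illustrative, not Doris's actual JniClassLoader wiring:

```cpp
#include <jni.h>

// Load a scanner class through a dedicated, per-scanner class loader
// instead of env->FindClass(), so each scanner resolves classes against
// its own isolated class path. Assumes a valid JNIEnv* and loader object.
jclass load_scanner_class(JNIEnv* env, jobject scanner_loader, const char* class_name) {
    jclass loader_cls = env->GetObjectClass(scanner_loader);
    // java.lang.ClassLoader#loadClass(String) -> Class
    jmethodID load_class = env->GetMethodID(
            loader_cls, "loadClass", "(Ljava/lang/String;)Ljava/lang/Class;");
    jstring jname = env->NewStringUTF(class_name);
    jobject cls = env->CallObjectMethod(scanner_loader, load_class, jname);
    env->DeleteLocalRef(jname);
    if (env->ExceptionCheck()) {
        env->ExceptionClear();
        return nullptr; // class not found on this scanner's class path
    }
    return static_cast<jclass>(cls);
}
```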
2023-08-10 10:02:46 +08:00
8371171e44 [Feature](inverted index) add inverted index tool (#22207) 2023-07-27 21:28:34 +08:00
b2be42c31c [fix](jdbc catalog) fix jdbc catalog like expr query error (#22141) 2023-07-25 22:30:28 +08:00
f84af95ac4 [feature](Nereids) Add minidump replay and refactor user feature of minidump (#20716)
### Two main changes:
- 1. Add minidump replay.
- 2. Change minidump serialization of statistics messages and some interfaces between the main logic of the Nereids optimizer and minidump.

### Use of Nereids UT:
- 1. Save minidump files:
        Execute these commands in mysql-client:
```
set enable_nereids_planner=true;
set enable_minidump=true;
```
        Then execute the SQL in mysql-client.
- 2. Use the nereids-ut script to replay a directory of minidump files:
```
cp -r ${DORIS_HOME}/minidump ${DORIS_HOME}/output/fe && cd ${DORIS_HOME}/output/fe
./nereids_ut --d ${directory_of_minidump_files}
```

### Refactor of minidump
- Move the serialization used by statistics into the input serialization, serialized together with the catalogs.
- Generate the minidump file only when the enable_minidump flag is set; the minidump module interacts with the main optimizer only through:
serializeInputsToDumpFile(catalog, statistics, query) && serializeOutputsToDumpFile(outputplan).
2023-07-25 15:26:19 +08:00
d180ed418d [fix](stacktrace) Speed up stack trace (#21755)
Introduce libunwind to get stack traces; the cost is negligible and the traces have line numbers.
Use StackTraceCache and PHDRCache to speed things up; the behavior is customizable and has some optimizations.
Other stack trace tools (glog, boost, glibc) remain available in case they are needed. A minimal libunwind sketch appears after the TODO list.

TODO:

- Currently supports Linux __x86_64__, __arm__, __powerpc__; __FreeBSD__ and APPLE are not supported.
  (Note: __arm__ and __powerpc__ have not been verified.)
- Support signal handlers.
- libunwind: support unw_backtrace for jemalloc.
- Handle the currently undefined compile option USE_MUSL later.
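For reference, a minimal sketch of in-process unwinding with the generic libunwind API — this shows only the underlying mechanism, not Doris's StackTraceCache/PHDRCache wrapper:

```cpp
#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <cstdio>

// Walk the current thread's stack, printing each frame's symbol and offset.
void print_stack_trace() {
    unw_context_t context;
    unw_cursor_t cursor;
    unw_getcontext(&context);          // capture the current machine state
    unw_init_local(&cursor, &context); // local (in-process) unwinding
    while (unw_step(&cursor) > 0) {
        unw_word_t ip = 0, offset = 0;
        char symbol[256];
        unw_get_reg(&cursor, UNW_REG_IP, &ip);
        if (unw_get_proc_name(&cursor, symbol, sizeof(symbol), &offset) == 0) {
            std::printf("%#lx: %s+%#lx\n", (unsigned long)ip, symbol, (unsigned long)offset);
        } else {
            std::printf("%#lx: <unknown>\n", (unsigned long)ip);
        }
    }
}
```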
2023-07-19 15:43:14 +08:00
fde73b6cc6 [Fix](multi-catalog) Fix Hadoop short-circuit reading not being enabled in some environments. (#21516)
- Revert #21430 because it causes a performance degradation issue.
- Add `$HADOOP_CONF_DIR` to `$CLASSPATH`.
- Remove the empty `hdfs-site.xml`, because in some environments it prevents Hadoop short-circuit reading from being enabled.
- Copy the Hadoop common native libs (taken from https://github.com/apache/doris-thirdparty/pull/98) and add them to `LD_LIBRARY_PATH`, because in some environments `LD_LIBRARY_PATH` doesn't contain the Hadoop common native libs, which also prevents short-circuit reading from being enabled.
2023-07-06 15:00:26 +08:00
a6b51ec19a [Feature](avro) Support Apache Avro file format (#19990)
Support reading Avro files via hdfs() or s3().
```sql
select * from s3(
         "uri" = "http://127.0.0.1:9312/test2/person.avro",
         "ACCESS_KEY" = "ak",
         "SECRET_KEY" = "sk",
         "FORMAT" = "avro");
+--------+--------------+-------------+-----------------+
| name   | boolean_type | double_type | long_type       |
+--------+--------------+-------------+-----------------+
| Alyssa |            1 |     10.0012 | 100000000221133 |
| Ben    |            0 |    5555.999 |      4009990000 |
| lisi   |            0 | 5992225.999 |      9099933330 |
+--------+--------------+-------------+-----------------+

select * from hdfs(
                "uri" = "hdfs://127.0.0.1:9000/input/person2.avro",
                "fs.defaultFS" = "hdfs://127.0.0.1:9000",
                "hadoop.username" = "doris",
                "format" = "avro");
+--------+--------------+-------------+-----------+
| name   | boolean_type | double_type | long_type |
+--------+--------------+-------------+-----------+
| Alyssa |            1 |  8888.99999 |  89898989 |
+--------+--------------+-------------+-----------+
```

The current Avro reader only supports common data types; complex data types will be supported later.
2023-06-28 21:15:35 +08:00
1dec592e91 [improvement](fs_bench) optimize the usage of fs benchmark tool for hdfs (#21154)
Optimize the usage of the fs benchmark tool:

1. Remove the `Open` benchmark; it is not useful.
2. Remove the `Delete` benchmark; it is dangerous.
3. Add a `SingleRead` benchmark; users can specify an existing file to test the read operation:

    `sh bin/run-fs-benchmark.sh --conf=conf/hdfs_read.conf --fs_type=hdfs --operation=single_read`

4. Modify `run-fs-benchmark.sh`: remove the `OPTS` section and use `fs_benchmark_tool` options directly.
5. Add some custom counters to the benchmark result, e.g.:

```
--------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                      Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------------------
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1              6864 ms         2385 ms            1 ReadRate=200.936M/s
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1              3919 ms         1828 ms            1 ReadRate=351.96M/s
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1              3839 ms         1819 ms            1 ReadRate=359.265M/s
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_mean         4874 ms         2011 ms            3 ReadRate=304.054M/s
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_median       3919 ms         1828 ms            3 ReadRate=351.96M/s
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_stddev       1724 ms          324 ms            3 ReadRate=89.3768M/s
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_cv          35.37 %         16.11 %             3 ReadRate=29.40%
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_max          6864 ms         2385 ms            3 ReadRate=359.265M/s
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_min          3839 ms         1819 ms            3 ReadRate=200.936M/s
```

- For `open_read` and `single_read`, add `ReadRate` as `bytes per second` (see the sketch below).
- For `create_write`, add `WriteRate` as `bytes per second`.
- For `exists` and `rename`, add `ExistsCost` and `RenameCost` as `time cost per one operation`.
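The output above is in Google Benchmark's format; a minimal sketch of how such a rate counter can be reported with its custom-counter API (the benchmark body here is a stub, not the real HDFS read):

```cpp
#include <benchmark/benchmark.h>
#include <cstdint>

// Stub standing in for one read of the target file (pretend 4 MiB).
static int64_t do_one_read() { return 4 * 1024 * 1024; }

static void BM_SingleRead(benchmark::State& state) {
    int64_t bytes_read = 0;
    for (auto _ : state) {
        bytes_read += do_one_read();
    }
    // kIsRate divides the accumulated value by the elapsed time, so the
    // counter prints as bytes per second, e.g. ReadRate=351.96M/s.
    state.counters["ReadRate"] =
            benchmark::Counter(static_cast<double>(bytes_read), benchmark::Counter::kIsRate);
}
BENCHMARK(BM_SingleRead)->Iterations(1)->Repetitions(3);
BENCHMARK_MAIN();
```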
2023-06-26 11:37:14 +08:00
615778924e [feature](fs) add fs benchmark tool framework (#20770)
Add an optional executable binary, fs_benchmark_tool, for testing the performance of file systems such as HDFS and S3.
Usage:

```
./fs_benchmark_tool --conf my.conf --fs_type=s3 --operation=read --iterations=5
```
In my.conf, you can add any config key-value pair in the following format:
```
key1=value1
key2=value2
```
By default, this binary will not be built. Only build it when setting BUILD_FS_BENCHMARK=ON.
The binary will be installed in output/be/lib.

For developers: you can add a new subclass of BaseBenchmark to add your own benchmark.
See be/src/io/fs/benchmark/s3_benchmark.hpp for an example, and the sketch below.
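A rough sketch of what such a subclass might look like; the actual BaseBenchmark interface lives under be/src/io/fs/benchmark/, and the shape assumed below (a `run` method taking the parsed conf map) is an illustration, not the real API:

```cpp
#include <iostream>
#include <map>
#include <string>

// Assumed shape of the framework's base class (illustrative only).
class BaseBenchmark {
public:
    virtual ~BaseBenchmark() = default;
    // conf holds the key=value pairs parsed from the --conf file.
    virtual void run(const std::map<std::string, std::string>& conf) = 0;
};

// A new benchmark: time whatever file-system operation you care about.
class MyListBenchmark : public BaseBenchmark {
public:
    void run(const std::map<std::string, std::string>& conf) override {
        auto it = conf.find("uri"); // hypothetical conf key
        std::cout << "listing " << (it != conf.end() ? it->second : "<unset>") << " ...\n";
        // ... issue list calls against the file system and record timings ...
    }
};

int main() {
    MyListBenchmark bm;
    bm.run({{"uri", "s3://bucket/path"}});
    return 0;
}
```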
2023-06-14 17:50:06 +08:00
cd46f459db [minor](script) fix typo in build.sh (#20757) 2023-06-14 09:05:01 +08:00
5d2758cb8f [improvement](build) move BE extension jars to java_extensions dir (#20740)
Follow #20185
Move all BE java extension jars to the `be/lib/java_extensions/` dir.
Also remove the `udf` dir, used for BE native UDFs, which have been deprecated since v1.2.

The final output is:

```
output
├── be
│   ├── bin
│   ├── conf
│   ├── dict
│   ├── lib
│   │   └── java_extensions
│   │       ├── hudi-scanner-jar-with-dependencies.jar
│   │       ├── java-udf-jar-with-dependencies.jar
│   │       ├── jdbc-scanner-jar-with-dependencies.jar
│   │       ├── max-compute-scanner-jar-with-dependencies.jar
│   │       └── paimon-scanner-jar-with-dependencies.jar
│   ├── LICENSE-dist.txt
│   ├── licenses
│   ├── log
│   ├── NOTICE.txt
│   ├── storage
│   └── www
└── fe
    ├── bin
    ├── conf
    ├── doris-meta
    ├── lib
    ├── LICENSE-dist.txt
    ├── licenses
    ├── log
    ├── mysql_ssl_default_certificate
    ├── NOTICE.txt
    ├── spark-dpp
    └── webroot
```
2023-06-13 18:55:12 +08:00
57656b2459 [Enhancement](java-udf) java-udf module split to sub modules (#20185)
The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as: java-common, java-udf, jdbc-scanner, hudi-scanner, paimon-scanner.

Co-authored-by: lexluo <lexluo@tencent.com>
2023-06-13 09:41:22 +08:00
a0c3ebeeab [Chore](build) fix typo in build.sh (#19846) 2023-05-19 11:49:20 +08:00
cbe12cfb69 [chore](build) Support specifying output path (#19669)
Now we can run:

./build.sh --be --fe --output PATH_TO_BINARY_YOU_LIKE

The default output path is not modified.
2023-05-19 09:22:36 +08:00
3d795de2d5 [chore](build) avoid generating generated code every time (#19813)
When calling generated-source.sh in build.sh, do not remove the gensrc/build dir.
2023-05-19 08:47:36 +08:00
b3ce4593b1 [deps](libhdfs) update hadoop libhdfs to 3.3.4.1 for doris (#19832) 2023-05-19 08:44:32 +08:00
f32deb18e9 [Update](build) change clucene from thirdparty to git module (#19352) 2023-05-19 08:25:51 +08:00
3cc8bbb93f [chore](Java UDF)remove the error code and add the copy jar (#19503)
2023-05-11 16:17:29 +08:00
b72ff93c7a [chore](java udf)Add Java UDF compilation options (#19468) 2023-05-10 10:51:11 +08:00
096aa25ca6 [improvement](orc-reader) Implements ORC lazy materialization (#18615)
- Implements ORC lazy materialization (see the sketch after this list), integrating with the implementations of https://github.com/apache/doris-thirdparty/pull/56 and https://github.com/apache/doris-thirdparty/pull/62.
- Refactor code: move `execute_conjuncts()` and `execute_conjuncts_and_filter_block()` from `parquet_group_reader` to `VExprContext`, so they can be used by both the Parquet and ORC readers.
- Add session variables `enable_parquet_lazy_materialization` and `enable_orc_lazy_materialization` to control whether lazy materialization is enabled.
- Modify `build.sh` to update the apache-orc submodule or download the package every time.
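In spirit, lazy materialization decodes the predicate columns first, evaluates the conjuncts, and only then materializes the remaining columns for the rows that survived. A toy sketch of the idea with stub types — not the actual reader code:

```cpp
#include <cstdint>
#include <vector>

using Column = std::vector<int64_t>; // toy columnar batch
using Filter = std::vector<uint8_t>; // 1 = row survives the conjuncts

// Stubs standing in for the reader's column decoding and predicate evaluation.
Column decode_column(int col_id, const Filter* filter) {
    (void)col_id;
    return Column(filter ? filter->size() : 8, 0);
}
Filter evaluate_conjuncts(const Column& pred_col) {
    Filter keep(pred_col.size(), 0);
    for (size_t i = 0; i < pred_col.size(); ++i) keep[i] = pred_col[i] > 0;
    return keep;
}

std::vector<Column> read_stripe_lazily(int pred_col_id, const std::vector<int>& other_col_ids) {
    // Phase 1: decode only the column(s) the conjuncts need.
    Column pred = decode_column(pred_col_id, nullptr);
    Filter keep = evaluate_conjuncts(pred);
    // Phase 2: decode the remaining columns with the filter pushed down,
    // skipping decode work for rows that never qualify.
    std::vector<Column> out;
    for (int id : other_col_ids) {
        out.push_back(decode_column(id, &keep));
    }
    return out;
}
```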
2023-05-09 23:33:33 +08:00
e08de52ee7 [chore](compile) using PCH for compilation acceleration under clang (#19303) 2023-05-08 19:51:06 +08:00
c936810e83 [fix](compile) fix bug in build.sh (#19314)
fix path for $(dirname $0)/generated-source.sh to enable docker build
2023-05-06 10:00:20 +08:00
c9fa10ac10 [fix](doc) avoid generating config doc automatically (#19302)
After #19246, compiling the FE would automatically generate the Config and Session Variables docs and overwrite the original ones.
This needs to be avoided because the feature is not ready to use yet.
2023-05-05 20:39:05 +08:00
9d18be9dd3 [doc](thrift) update doc for thrift 0.16 (#19217)
2023-05-02 16:00:10 +08:00
Pxl
ec517a53a8 [Chore](build) upgrade clang-format version to 16 && move thrift to fe-common (#19155)
Upgrade clang-format version to 16.
Move thrift to fe-common.
Fix a core dump in the pipeline engine when an operator is canceled but not prepared.
2023-04-28 14:14:51 +08:00
Pxl
c40860aba4 [Chore](thrift) generate thrift java code to make code analysis work well (#18793)
2023-04-19 19:33:17 +08:00
afbbf84675 [chore](build) add apache-orc git submodule path (#18695)
1. Add an apache-orc git submodule update path instead of updating all modules.
When running build.sh, updating all modules fails several times because of the unstable GitHub network,
which wastes a lot of time.

2. Add a gitignore entry for be/src/apache-orc/ to avoid committing it by mistake.
2023-04-17 00:08:25 +08:00
4284fc4e75 [chore] Download apache orc source code from github if git does not work in build.sh. (#18625)
* add cd "${DORIS_HOME}"

* Fix blank issue.
2023-04-14 17:54:14 +08:00
2209b714d1 [chore](orc) Update the orc lib to the thirdparty lib (1.8.3) using a git submodule. (#18531) 2023-04-12 10:37:50 +08:00
12a9214448 [chore](build) Build java udf by default (#18255) 2023-04-09 10:33:39 +08:00
Pxl
76d76f672c [Chore](build) enhancement for backend build time usage (#18344) 2023-04-06 11:13:21 +08:00
aff260c06f [Enhancement](HttpServer) Support https interface (#16834)
1. Organize http documents
2. Add http interface authentication for FE
3. **Support https interface for FE**
4. Provide authentication interface
5. Add http interface authentication for BE
6. Support https interface for BE
2023-04-03 14:18:17 +08:00
7e61a85331 [refactor](libhdfs) introduce hadoop libhdfs (#18204)
1. Introduce hadoop libhdfs.
2. For the Linux-X86 platform, use the hadoop libhdfs.
3. For other platforms, use libhdfs3, because we currently don't have hadoop libhdfs binaries for them.

Co-authored-by: adonis0147 <adonis0147@gmail.com>
2023-03-31 18:41:39 +08:00
e5793249cd [opt](hashtable) Modify default filled strategy to 75% (#18242) 2023-03-31 09:28:11 +08:00
f9c4542d04 [chore](build) Porting to Clang-16 (#18196)
This PR ports the codebase to Clang-16.

Upgrade some third-party libraries:
1. Apache BRPC: 1.2.0 -> 1.4.0 (Some bugs are fixed and all patches for 1.2.0 can be removed.)
2. Boost: 1.73.0 -> 1.81.0 (Porting to Clang-16)
3. libclucene: 2.4.6 -> 2.4.8 (Porting to Clang-16)
2023-03-30 10:36:29 +08:00
0bb04c08aa [improvement](coverage) build BE with coverage enabled, so coverage data can be obtained with llvm-cov-15 (#17995) 2023-03-23 12:07:19 +08:00
82df2ae9d8 [feature](mysql) Support secure MySQL connection to FE (#17138)
Background:
Doris currently does not support SSL connections from MySQL clients, which is not secure enough in some cases, especially when accessing Doris via the public internet.

Solution:
- Use the TLS 1.2 protocol to encrypt information.
- Implementation details
  * server <--- connect <--- client
  * if SSL is enabled:
    * server <--- SSL connection request packet <--- client
    * server <--- SSL exchange ---> client (we add this `if` logic in this PR)
  * server ---> handshake request packet ---> client
  * server <--- encrypted data ---> client (this part is realized in this PR; a toy sketch follows the references)
- reference1 https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_connection_phase.html#sect_protocol_connection_phase_initial_handshake_ssl_handshake
- reference2 https://www.rfc-editor.org/rfc/rfc5246
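A toy sketch of the added branch — the FE is Java, so the types and method names below are illustrative stand-ins; only the CLIENT_SSL flag value comes from the MySQL protocol:

```cpp
#include <cstdint>
#include <iostream>

// MySQL protocol capability flag: the client requests an SSL connection.
constexpr uint32_t CLIENT_SSL = 0x00000800;

// Toy channel standing in for the real connection objects.
struct Channel {
    uint32_t client_capabilities = 0;
    bool upgrade_to_tls() {
        // ... wrap the underlying socket in a TLS 1.2 session here ...
        return true;
    }
};

// If the client's capability flags include CLIENT_SSL, it has sent an SSL
// connection request packet; do the TLS exchange now so the rest of the
// handshake and all later traffic are encrypted.
bool maybe_enable_ssl(Channel& ch) {
    if (ch.client_capabilities & CLIENT_SSL) {
        return ch.upgrade_to_tls();
    }
    return true; // plain connection
}

int main() {
    Channel ch;
    ch.client_capabilities = CLIENT_SSL;
    std::cout << (maybe_enable_ssl(ch) ? "ssl path" : "ssl failed") << "\n";
}
```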

close #16313

Signed-off-by: Yukang Lian <yukang.lian2022@gmail.com>
Co-authored-by: Gavin Chou <gavineaglechou@gmail.com>
Co-authored-by: morningman <morningman@163.com>
2023-03-04 12:14:48 +08:00
3b94ca5ceb [chore](macOS) Use LLVM Clang by default (#17292)
2023-03-03 14:18:02 +08:00
Pxl
b1347f4c38 [Chore](build) make compile options work on C objects && some refactoring of CMakeLists (#16451)
2023-02-14 13:35:20 +08:00
0142ef8b95 [improvement](scanner) Supports bthread scanner (#16031) 2023-02-09 10:24:56 +08:00
2810029d24 [fix](multi-catalog) fix bug that replay init catalog may happen after catalog is dropped (#15919) 2023-01-14 09:41:37 +08:00
be110ffaf6 [thirdparty](clucene) add clucene deps for doris inverted index (#15807)
As part of the Inverted Index DSIP steps, we'd like to contribute our inverted index implementation step by step.
First of all, we need to introduce clucene into the Doris thirdparty libs, because the inverted index implementation is based on
the Lucene API and index file format. We also add our own features and performance improvements on top of clucene, so we
need to maintain the repo ourselves.
2023-01-12 21:59:19 +08:00
89c21af87d [chore](fe) update fe snapshot to 1.2 and fix auditloader compile error (#15787)
PR #14925 changed some fields of AuditEvent, so we need to upgrade fe-core's SNAPSHOT version to 1.2,
because auditloader depends on fe-core.

The 1.2-SNAPSHOT has already been pushed to
https://repository.apache.org/content/repositories/snapshots/org/apache/doris/fe-core/1.2-SNAPSHOT/
2023-01-11 08:46:48 +08:00
95f2f43c02 [fix](macOS) Failed to run BE UT due to syscall to map cache into shared region failed (#15641)
According to the post https://developer.apple.com/forums/thread/676684, an executable bigger than 2 GB may fail to start. The executable `doris_be_test` generated by run-be-ut.sh is now 2.1 GB (> 2 GB), and we can't run it on macOS (arm64).

We can separate the debug info from the executable `doris_be_test` to reduce the size. After that, we can run `doris_be_test` successfully.
2023-01-06 01:23:37 +08:00
f3aea7f0f0 [Enhancement](status) Unify error codes and enable custom error messages for BE internal errors (#14744) 2022-12-11 23:33:18 +08:00
Pxl
c804024e5d [Chore](workflow) add clang-tidy workflow (#14737)
2022-12-02 14:10:29 +08:00