Add a scanner isolation class loader so that plugins do not conflict with each other.
The BE obtains scanner classes through a JNI call and uses `JniClassLoader` to load them.
Previously, scanner classes were always loaded from the system class path by default,
so the classes could not be isolated per scanner.
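For reference, a minimal C++ sketch of the BE-side idea, assuming a per-scanner `JniClassLoader` object has already been created and is held as a JNI global reference; the helper name and the example class name are illustrative, not the actual Doris implementation:

```
// Sketch: load a scanner class through a dedicated class loader instead of
// JNIEnv::FindClass(), which resolves classes from the system class path.
#include <jni.h>
#include <string>

// `scanner_loader` is assumed to be a global reference to this scanner's
// JniClassLoader instance (how it is created is omitted here).
jclass load_scanner_class(JNIEnv* env, jobject scanner_loader, const std::string& class_name) {
    jclass loader_clazz = env->GetObjectClass(scanner_loader);
    jmethodID load_class = env->GetMethodID(loader_clazz, "loadClass",
                                            "(Ljava/lang/String;)Ljava/lang/Class;");
    // Class name in binary form, e.g. "org.apache.doris.jdbc.JdbcExecutor" (illustrative).
    jstring jname = env->NewStringUTF(class_name.c_str());
    // Delegates to the scanner's own class loader, so classes from different
    // scanner plugins no longer conflict on the system class path.
    jclass clazz = static_cast<jclass>(env->CallObjectMethod(scanner_loader, load_class, jname));
    env->DeleteLocalRef(jname);
    env->DeleteLocalRef(loader_clazz);
    return clazz;
}
```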
### Two main changes:
- 1. Add minidump replay.
- 2. Change the minidump serialization of statistics messages and some interfaces between the main logic of the Nereids optimizer and minidump.
### Use of nereids ut:
- 1. Save minidump files:
Execute the following commands via mysql-client:
```
set enable_nereids_planner=true;
set enable_minidump=true;
```
Then execute SQL in mysql-client.
- 2. Use the nereids-ut script to run a directory of minidump files:
```
cp -r ${DORIS_HOME}/minidump ${DORIS_HOME}/output/fe && cd ${DORIS_HOME}/output/fe
./nereids_ut --d ${directory_of_minidump_files}
```
### Refactor of minidump
- Move statistics serialization into the input serialization, so statistics are serialized together with the catalogs.
- Generate the minidump file only when the `enable_minidump` flag is set; the minidump module interacts with the main optimizer only through:
`serializeInputsToDumpFile(catalog, statistics, query)` and `serializeOutputsToDumpFile(outputplan)`.
Introduce libunwind to collect stack traces; its cost is negligible and it provides line numbers.
Use StackTraceCache and PHDRCache to speed it up; it is customizable and includes some optimizations.
The other stack trace tools (glog, boost, glibc) remain available in case they are needed.
TODO:
- Currently supports Linux `__x86_64__`, `__arm__`, and `__powerpc__`; `__FreeBSD__` and Apple platforms are not supported.
  Note: `__arm__` and `__powerpc__` have not been verified.
- Support signal handlers.
- libunwind supports `unw_backtrace` for jemalloc.
- Use of the currently undefined compile option `USE_MUSL` is left for later.
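For reference, a minimal sketch of local stack unwinding with the libunwind API (a generic usage example, not the BE's actual implementation; the BE additionally speeds this up with StackTraceCache/PHDRCache, and resolving file/line numbers is a separate step omitted here):

```
// Walk the current thread's stack and print instruction pointers with symbol names.
#define UNW_LOCAL_ONLY
#include <libunwind.h>
#include <cstdio>

void print_stack_trace() {
    unw_context_t context;
    unw_cursor_t cursor;
    unw_getcontext(&context);           // capture the current register state
    unw_init_local(&cursor, &context);  // in-process (local) unwinding only

    while (unw_step(&cursor) > 0) {
        unw_word_t ip = 0, offset = 0;
        char symbol[256] = {0};
        unw_get_reg(&cursor, UNW_REG_IP, &ip);
        if (unw_get_proc_name(&cursor, symbol, sizeof(symbol), &offset) == 0) {
            std::printf("0x%lx: %s + 0x%lx\n", (unsigned long)ip, symbol, (unsigned long)offset);
        } else {
            std::printf("0x%lx: <unknown>\n", (unsigned long)ip);
        }
    }
}
```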
Fix Hadoop short-circuit reading not being enabled in some environments.
- Revert #21430 because it causes a performance degradation issue.
- Add `$HADOOP_CONF_DIR` to `$CLASSPATH`.
- Remove the empty `hdfs-site.xml`, because in some environments it prevents Hadoop short-circuit reading from being enabled.
- Copy the Hadoop common native libs (taken from https://github.com/apache/doris-thirdparty/pull/98) and add them to `LD_LIBRARY_PATH`, because in some environments `LD_LIBRARY_PATH` does not contain the Hadoop common native libs, which prevents Hadoop short-circuit reading from being enabled.
Optimize the usage of the fs benchmark tool:
1. Remove the `Open` benchmark; it is useless.
2. Remove the `Delete` benchmark; it is dangerous.
3. Add a `SingleRead` benchmark; the user can specify an existing file to test the read operation:
`sh bin/run-fs-benchmark.sh --conf=conf/hdfs_read.conf --fs_type=hdfs --operation=single_read`
4. Modify `run-fs-benchmark.sh`: remove the `OPTS` section and use the options of `fs_benchmark_tool` directly.
5. Add some custom counters to the benchmark result (see the sketch after the counter descriptions below), e.g.:
```
--------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------------------
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 6864 ms 2385 ms 1 ReadRate=200.936M/s
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 3919 ms 1828 ms 1 ReadRate=351.96M/s
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 3839 ms 1819 ms 1 ReadRate=359.265M/s
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_mean 4874 ms 2011 ms 3 ReadRate=304.054M/s
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_median 3919 ms 1828 ms 3 ReadRate=351.96M/s
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_stddev 1724 ms 324 ms 3 ReadRate=89.3768M/s
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_cv 35.37 % 16.11 % 3 ReadRate=29.40%
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_max 6864 ms 2385 ms 3 ReadRate=359.265M/s
HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_min 3839 ms 1819 ms 3 ReadRate=200.936M/s
```
- For `open_read` and `single_read`, add `ReadRate` as `bytes per second`.
- For `create_write`, add `WriteRate` as `bytes per second`.
- For `exists` and `rename`, add `ExistsCost` and `RenameCost` as `time cost per operation`.
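For reference, a minimal sketch of how a rate counter such as `ReadRate` can be attached to a result using the benchmark library's counter API (a generic example, not the tool's actual code; the read itself is a placeholder):

```
#include <benchmark/benchmark.h>

static void HdfsReadBenchmark(benchmark::State& state) {
    int64_t total_bytes = 0;
    for (auto _ : state) {
        int64_t bytes_read = 0;
        double elapsed_seconds = 0.0;
        // ... read the target file here and fill in bytes_read / elapsed_seconds (placeholder) ...
        state.SetIterationTime(elapsed_seconds); // paired with UseManualTime()
        total_bytes += bytes_read;
    }
    // kIsRate divides the value by the measured time, so the counter is reported
    // in bytes per second, e.g. "ReadRate=200.936M/s" in the output above.
    state.counters["ReadRate"] =
            benchmark::Counter(static_cast<double>(total_bytes), benchmark::Counter::kIsRate);
}
BENCHMARK(HdfsReadBenchmark)->Iterations(1)->Repetitions(3)->UseManualTime();
```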
Add an optional executable binary `fs_benchmark_tool` for testing the performance of file systems such as HDFS and S3.
Usage:
./fs_benchmark_tool --conf my.conf --fs_type=s3 --operation=read --iterations=5
In `my.conf`, you can add any config key-value pairs in the following format:
key1=value1
key2=value2
By default, this binary is not built. It is only built when `BUILD_FS_BENCHMARK=ON` is set.
The binary will be installed in output/be/lib.
As a developer, you can add a new subclass of `BaseBenchmark` to add your own benchmark.
See be/src/io/fs/benchmark/s3_benchmark.hpp for an example, and the sketch below.
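Below is a rough, hypothetical sketch of what such a subclass might look like; the actual `BaseBenchmark` interface lives in be/src/io/fs/benchmark/ and may differ, so the constructor and method names here are assumptions for illustration only:

```
#include <map>
#include <string>

// Hypothetical base class; see be/src/io/fs/benchmark/ for the real interface.
class BaseBenchmark {
public:
    BaseBenchmark(std::string name, int iterations)
            : _name(std::move(name)), _iterations(iterations) {}
    virtual ~BaseBenchmark() = default;
    // Called once per iteration; implement the file-system operation to measure.
    virtual void run() = 0;

protected:
    std::string _name;
    int _iterations;
};

// Hypothetical subclass measuring a single read of one file on the local file system.
class LocalSingleReadBenchmark : public BaseBenchmark {
public:
    LocalSingleReadBenchmark(const std::map<std::string, std::string>& conf, int iterations)
            : BaseBenchmark("LocalSingleReadBenchmark", iterations),
              _file_path(conf.at("file_path")) {}

    void run() override {
        // Open _file_path, read it fully, and record bytes read / elapsed time here.
    }

private:
    std::string _file_path;
};
```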
The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as java-common, java-udf, jdbc-scanner, hudi-scanner, and paimon-scanner.
Co-authored-by: lexluo <lexluo@tencent.com>
- Implement ORC lazy materialization, integrating with the implementations of https://github.com/apache/doris-thirdparty/pull/56 and https://github.com/apache/doris-thirdparty/pull/62.
- Refactor code: move `execute_conjuncts()` and `execute_conjuncts_and_filter_block()` from `parquet_group_reader` to `VExprContext`, so they can be used by both the Parquet reader and the ORC reader.
- Add session variables `enable_parquet_lazy_materialization` and `enable_orc_lazy_materialization` to control whether lazy materialization is enabled.
- Modify `build.sh` to update the apache-orc submodule or download the package every time.
After #19246, compiling the FE automatically generates the Config and Session Variables docs and overwrites the original ones.
This needs to be avoided because the feature is not ready to use yet.
1. Update only the apache-orc git submodule by path instead of updating all modules.
When running `sh build.sh`, updating all modules often fails several times because of the unstable GitHub network, which wastes a lot of time.
2. Add a gitignore entry for be/src/apache-orc/ to avoid accidental commits.
1. Organize HTTP documents
2. Add HTTP interface authentication for the FE
3. **Support HTTPS interface for the FE**
4. Provide an authentication interface
5. Add HTTP interface authentication for the BE
6. Support HTTPS interface for the BE
1. Introduce Hadoop libhdfs.
2. For the Linux x86 platform, use Hadoop libhdfs.
3. For other platforms, use libhdfs3, because we currently don't have Hadoop libhdfs binaries for other platforms.
Co-authored-by: adonis0147 <adonis0147@gmail.com>
This PR ports the codebase to Clang-16.
Upgrade some third-party libraries:
1. Apache BRPC: 1.2.0 -> 1.4.0 (Some bugs are fixed and all patches for 1.2.0 can be removed.)
2. Boost: 1.73.0 -> 1.81.0 (Porting to Clang-16)
3. libclucene: 2.4.6 -> 2.4.8 (Porting to Clang-16)
As part of the Inverted Index DSIP steps, we'd like to contribute our inverted index implementation step by step.
First of all, we need to introduce clucene into the Doris third-party libs, because the inverted index implementation is based on the Lucene API and index file format. We also add our own features and performance improvements on top of clucene, so we need to maintain the repo ourselves.
According to the post https://developer.apple.com/forums/thread/676684, an executable whose size is bigger than 2 GB may fail to start. The size of the executable `doris_be_test` generated by run-be-ut.sh is now 2.1 GB (> 2 GB), so we can't run it on macOS (arm64).
We can separate the debug info from the executable `doris_be_test` to reduce its size. After that, we can run `doris_be_test` successfully.