doris

Author	SHA1	Message	Date
Mingyu Chen	38c5030f97	[opt](log) refactor the log dir config (#32933) Refactor the config for log dir of FE and BE TLDR: - Use env variable `LOG_DIR` to set root log dir - Remove `sys_log_dir` for FE and BE Details: 1. FE 1. The root log dir is set by env variable `LOG_DIR` in `fe.conf` 2. The default value of `audit_log_dir` is the same as `${LOG_DIR}/` 3. The default value of `spark_launcher_log_dir` is `${LOG_DIR}/spark_launcher_log` 4. The default value of `nereids_trace_log_dir` is `${LOG_DIR}/nereids_trace_log` 5. The original `sys_log_dir` is deprecated, and its default value is `""`. But for compatibility, if the user has already set `sys_log_dir`, Doris will still use it as the root log dir. 2. BE 1. The root log dir is set by env variable `LOG_DIR` in `be.conf` 2. Remove `pipeline_tracing_log_dir`, use `${LOG_DIR}` directly. 3. The original `sys_log_dir` is deprecated, and its default value is `""`. But for compatibility, if the user has already set `sys_log_dir`, Doris will still use it as the root log dir.	2024-04-17 23:41:59 +08:00
Mingyu Chen	c0d7a5660e	[fix](paimon) support paimon with hive2 (#32455) In order to support paimon with hive2, we need to modify the original HiveMetastoreClient.java to make it compatible with both hive2 and hive3. This modified HiveMetastoreClient should be at the front of the CLASSPATH, so that it can override the HiveMetastoreClient in the hadoop jar. This PR mainly changes: 1. Copy HiveMetastoreClient.java in FE to BE's preload jar. 2. Split the original `preload-extensions-jar-with-dependencies.jar` into 2 jars: 1. `preload-extensions-project.jar`, which contains the modified HiveMetastoreClient. 2. `preload-extensions-jar-with-dependencies.jar`, which contains other dependency jars. 3. Modify `start_be.sh` to let `preload-extensions-project.jar` be loaded first. 4. Change the way the jni scanner jar is assembled: only the project jar needs to be assembled, without other dependencies, because we actually only use classes under the `org.apache.doris` package. Removing other unused dependency jars also reduces the output size of BE. 5. Fix a bug: the prefix of paimon properties should be `paimon.`, not `paimon`. 6. Support paimon with hive2: users can set `hive.version` in paimon catalog properties to specify the hive version.	2024-03-26 15:31:07 +08:00
zy-kkk	c33277a957	[fix](script) Fix the JAVA_OPTS set on macOS (#32170 )	2024-03-15 18:01:56 +08:00
yiguolei	6ea5218ee8	Revert "[Enhencement](env) Checking Master branch must use JDK17 (#31587 )" This reverts commit fa499cc200344eaaf837fd52211820dc7b7b9296.	2024-03-06 13:13:49 +08:00
Tiewei Fang	fa499cc200	[Enhencement](env) Checking Master branch must use JDK17 (#31587 ) Add to check the JDK version in `env.sh`, and force master to use java 17 version	2024-03-06 13:05:58 +08:00
Tiewei Fang	ea427e8c51	[fix](JDK17) It will report an exception when we start BE with JDK17 and query AVRO table: InaccessibleObjectException (#30541) * [fix](JDK17) It will report an exception when we start BE with JDK17 and query AVRO table: InaccessibleObjectException (#30003)	2024-01-30 15:33:40 +08:00
slothever	b1a9370004	[fix](glue)support access glue iceberg with credential list (#30473 ) merge from #30292	2024-01-28 18:23:07 +08:00
Mingyu Chen	9773fef4a1	[fix](class-loader) fix class loader conflict on BE side (#29942 ) 1. make `hadoop-common` in be java extension as `provided`. 2. must load be java extension jars before hadoop jars	2024-01-16 18:37:06 +08:00
slothever	25428bd7fb	[fix](kerberos) fix BE kerberos ccache renew, optimize kerberos options (#29291) 1. We need to remove BE kinit and use JNI login with keytab, because kinit cannot renew the TGT for Doris in many complex cases. > This pull request will support new instance from keytab: https://github.com/apache/doris-thirdparty/pull/173, so now we won't need the kinit cmd, just login with keytab and principal. 2. Add `kerberos_ccache_path` to set the kerberos credentials cache path manually. 3. Add `max_hdfs_file_handle_cache_time_ms` to set the hdfs fs handle cache time.	2024-01-16 18:35:29 +08:00
Mingyu Chen	12af86176a	[fix](class-loader) fix class loader conflict on BE side (#29942 ) 1. make `hadoop-common` in be java extension as `provided`. 2. must load be java extension jars before hadoop jars	2024-01-14 15:53:33 +08:00
zy-kkk	8fc9c18c85	[improvement](jdbc catalog) Put the jdbc connection pool parameters into catalog properties (#29195 )	2024-01-12 11:40:28 +08:00
Guangming Lu	10368a71a4	[fix][security]security optimize for executable binary file doris_be access should be restricted (#29303 )	2023-12-30 23:39:16 +08:00
Adonis Ling	d2eea9b3ae	[chore](macOS) Reduce the size of executables on macOS arm64 (#26894 ) Like #15641, we should reduce the size of executables on macOS arm64. Otherwise, we can not run doris_be and doris_be_test with ASAN build type on macOS arm64 now.	2023-11-14 12:21:08 +08:00
Mingyu Chen	f41b6a5fc3	[minor](doc) update the doc for docker env and custom_lib dir (#25088 ) 1. Update the doc for `apache/doris:build-env-for-2.0` 2. Update the doc for `custom_dir`	2023-10-09 09:50:31 +08:00
yujun	07f9f27fa9	[improvement](start script) start script can not set http proxy (#25086) BE clones snapshots over HTTP; if an HTTP proxy is set, the snapshot clone will fail. So the start script forbids setting the http proxy env.	2023-10-08 10:06:06 +08:00
Mingyu Chen	9d0f4c0094	[minor](be) set fd number check to 60000 for BE start script (#25078) Modify the BE fd number check to 60000, because the default fd number on some systems is 65535, which is smaller than the previous threshold of 65536. Reducing it to 60000 lets Doris start normally on most systems.	2023-10-07 19:02:39 +08:00
Dongyang Li	8a226bbd63	[fix](start_be) ignore output from command -v (#24739 )	2023-09-21 19:57:43 +08:00
Dongyang Li	4c79a76491	[improve](script) echo infos if java cmd is not valid when starting be (#24714 ) Co-authored-by: stephen <hello-stephen@qq.com>	2023-09-21 12:43:24 +08:00
Calvin Kirs	1a553f7e14	[Improve](start-shell)Optimize fe&be startup (#24556) - sh start_fe/start_be --console is used to instruct the program to run in console mode. - sh start_fe/start_be --daemon is used to instruct the program to run in daemon mode. - sh start_fe/start_be without options starts the program in the background and records output and error logs to the specified file	2023-09-19 23:00:59 +08:00
Mingyu Chen	e090b83e33	[improvemnt](script) support custom lib dir to save custom libs (#23887) Sometimes, users need to add some custom libs to the cluster, such as lzo.jar, orai18n.jar, etc. Previously, these lib files were placed in fe/lib or be/lib. But when upgrading the cluster, the lib dir is replaced by the new lib dir, so all custom libs are lost. In this PR, I add a new dir custom_lib for FE and BE, and users can place custom lib files in it.	2023-09-05 11:54:19 +08:00
zzzzzzzs	774a771e0c	[Improve](be)check swap (#18891 ) Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>	2023-09-05 09:39:55 +08:00
Adonis Ling	2885de1d63	[chore](macOS) Fix invalid option errors in start_be.sh (#23861 )	2023-09-05 09:07:53 +08:00
Qi Chen	57ca7d66d3	[Fix](multi-catalog) Fix zlib init error by using doris's zlib shared library and `jni.log` does not output. (#23260 )	2023-09-02 21:44:14 +08:00
zhangdong	ffadf09eec	[fix](catalog)add custom jar (#23406 ) - allow put custom jar in `${DORIS_HOME}/lib/java_extensions/custom_extension` such as `paimon-s3-0.4.0-incubating.jar` - add some note for paimon and fqdn	2023-08-25 11:10:53 +08:00
slothever	5ba505ebf4	[fix](multi-catalog)fix avro and jdbc scanner dependency (#23015 ) add preload-extensions module, put all conflict dependencies to pom.xml in preload-extensions	2023-08-20 19:28:17 +08:00
slothever	919bfd73f1	[improvement](multi-catalog)add scanner isolation class loader (#22247) Add scanner isolation class loader to make each plugin non-conflicting. The BE will get scanner classes by JNI call and use JniClassLoader to load them. In the previous version, we always got scanner classes from the system class path by default, so it could not isolate the classes for each scanner	2023-08-10 10:02:46 +08:00
Xinyi Zou	96a46302e8	[fix](stacktrace) Fix Jemalloc enable profile fail to run BE after rewrites dl_iterate_phdr (#22549) Jemalloc heap profile follows libgcc's way of backtracing by default. Rewriting dl_iterate_phdr will cause Jemalloc to fail to run after profile is enabled. TODO, two solutions: - Jemalloc specifies GNU libunwind as the prof backtracing way, but my test failed, see jemalloc/jemalloc#2504 (--enable-prof-libunwind not work) - ClickHouse/libunwind solves Jemalloc profile backtracing, but the branch of ClickHouse/libunwind has been out of touch with GNU libunwind and LLVM libunwind, which would leave our fate to others.	2023-08-03 19:32:36 +08:00
Xinyi Zou	bc87002028	[opt](conf) remote scanner thread num is changed to core num * 10 (#22427 )	2023-08-01 23:09:49 +08:00
Tiewei Fang	e8f4323e0f	[Fix](jdbcCatalog) fix typo of some variable #22214	2023-07-26 08:34:45 +08:00
Xinyi Zou	1afe090486	[improvement](memory) modify jemalloc conf in be.conf (#21943 ) modify jemalloc conf in be.conf disable je_purge_all_arena_dirty_pages	2023-07-20 10:34:31 +08:00
Qi Chen	fde73b6cc6	[Fix](multi-catalog) Fix hadoop short circuit reading cannot be enabled in some environments. (#21516) Fix hadoop short circuit reading cannot be enabled in some environments. - Revert #21430 because it causes a performance degradation issue. - Add `$HADOOP_CONF_DIR` to `$CLASSPATH`. - Remove the empty `hdfs-site.xml`, because in some environments it prevents hadoop short circuit reading from being enabled. - Copy the hadoop common native libs (copied from https://github.com/apache/doris-thirdparty/pull/98 ) and add them to `LD_LIBRARY_PATH`, because in some environments `LD_LIBRARY_PATH` doesn't contain the hadoop common native libs, which prevents hadoop short circuit reading from being enabled.	2023-07-06 15:00:26 +08:00
Mingyu Chen	242a35fa80	[fix](s3) fix s3 fs benchmark tool (#21401 ) 1. fix concurrency bug of s3 fs benchmark tool, to avoid crash on multi thread. 2. Add `prefetch_read` operation to test prefetch reader. 3. add `AWS_EC2_METADATA_DISABLED` env in `start_be.sh` to avoid call ec2 metadata when creating s3 client. 4. add `AWS_MAX_ATTEMPTS` env in `start_be.sh` to avoid warning log of s3 sdk.	2023-07-05 16:20:58 +08:00
Ashin Gau	9adbca685a	[opt](hudi) use spark bundle to read hudi data (#21260 ) Use spark-bundle to read hudi data instead of using hive-bundle to read hudi data. Advantage for using spark-bundle to read hudi data: 1. The performance of spark-bundle is more than twice that of hive-bundle 2. spark-bundle using `UnsafeRow` can reduce data copying and GC time of the jvm 3. spark-bundle support `Time Travel`, `Incremental Read`, and `Schema Change`, these functions can be quickly ported to Doris Disadvantage for using spark-bundle to read hudi data: 1. More dependencies make hudi-dependency.jar very cumbersome(from 138M -> 300M) 2. spark-bundle only provides `RDD` interface and cannot be used directly	2023-07-04 17:04:49 +08:00
Qi Chen	88b2d81873	[Fix](multi-catalog) Add hadoop system classpath to CLASSPATH to resolve hadoop short circuit reading not being enabled in some environments. (#21430) Add hadoop system classpath to CLASSPATH to resolve hadoop short circuit reading not being enabled in some environments.	2023-07-03 14:51:34 +08:00
zy-kkk	53b2fe5db6	[improvement](jdbc) Set the JDBC connection timeout to be conf (#21000 )	2023-06-20 14:23:48 +08:00
zy-kkk	9c30fb5a21	[fix](script)Fix the JAVA_OPTS version error of the BE start script (#20766 )	2023-06-14 15:25:00 +08:00
Mingyu Chen	5d2758cb8f	[improvement](build) move add BE extension jars to java_extensions dir (#20740) Follow #20185 Move all BE java extension jars to `be/lib/java_extensions/` dir. Also remove `udf` dir, used for BE native udf, which is deprecated since v1.2 The final output is: ``` output ├── be │ ├── bin │ ├── conf │ ├── dict │ ├── lib │ ├── java_extensions │ ├── hudi-scanner-jar-with-dependencies.jar │ ├── java-udf-jar-with-dependencies.jar │ ├── jdbc-scanner-jar-with-dependencies.jar │ ├── max-compute-scanner-jar-with-dependencies.jar │ └── paimon-scanner-jar-with-dependencies.jar │ ├── LICENSE-dist.txt │ ├── licenses │ ├── log │ ├── NOTICE.txt │ ├── storage │ └── www └── fe ├── bin ├── conf ├── doris-meta ├── lib ├── LICENSE-dist.txt ├── licenses ├── log ├── mysql_ssl_default_certificate ├── NOTICE.txt ├── spark-dpp └── webroot ```	2023-06-13 18:55:12 +08:00
Adonis Ling	8c4f3d4126	[chore](macOS) Fix JAVA_OPTS in start_be.sh (#19267 ) We should set -XX:-MaxFDLimit on macOS if we enable java support for BE otherwise BE may fail to start up.	2023-05-08 14:01:10 +08:00
yongkang.zhong	7b02fa5cd6	[optimization](conf) optimization JAVA_OPTS for be conf and be bin (#19029 )	2023-04-27 13:48:46 +08:00
zhangdong	04d18eec59	[Improve](be)check max open file #18888	2023-04-22 08:42:43 +08:00
Pxl	9e64951721	[Chore](asan) set decrementOutputRecursionDepth to suppressions and remove some unu… (#18845 ) 18845	2023-04-20 23:33:25 +08:00
Mingyu Chen	7e61a85331	[refactor](libhdfs) introduce hadoop libhdfs (#18204 ) 1. Introduce hadoop libhdfs 2. For Linux-X86 platform, use the hadoop libhdfs 3. For other platform, use libhdfs3, because currently we don't have hadoop libhdfs binary for other platform Co-authored-by: adonis0147 <adonis0147@gmail.com>	2023-03-31 18:41:39 +08:00
Xinyi Zou	01d012bab7	[fix](memory) Remove page cache regular clear, disabled jemalloc prof by default (#18218) Remove page cache regular clear: the page cache is now turned off by default. If the user manually enables the page cache, it can be assumed that the user accepts the memory usage of the page cache; adding a manual clear command for the cache can be considered later. Fix memory gc cancel top memory query. Jemalloc prof is not enabled by default.	2023-03-30 09:39:37 +08:00
Xinyi Zou	f36465e76e	[enhancement](memory) optimize jemalloc heap profile doc (#18094 )	2023-03-25 13:04:45 +08:00
Adonis Ling	f21508baec	[chore](macOS) Disable detect_container_overflow at BE startup (#17514 ) BE failed to start up due to container-overflow errors reported by address sanitizer.	2023-03-08 10:21:45 +08:00
Mingyu Chen	30df268c1f	[fix](hdfs)(catalog) fix BE crash when hdfs-site.xml not exist in be/conf and fix compute node logic (#17244) We set the LIBHDFS3_CONF env in start_be.sh, so libhdfs3 will try to read this hdfs-site.xml; if the file does not exist, it throws an error. But Doris does not handle this error, which causes BE to crash. This CL mainly changes: Modify start_be.sh to only set LIBHDFS3_CONF if hdfs-site.xml exists. Refactor the HDFSCommonBuilder so that it can return errors correctly. Add BE IP info in status, so that we can get the ip from an error msg like: ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]failed to init reader for file 000.snappy.orc, err: [INTERNAL_ERROR][172.21.0.101]failed to init HDFSCommonBuilder, please check check be/conf/hdfs-site.xml The logic of prefer compute node is wrong, which causes external table queries to be assigned to at most 3 backends. This CL refactors this logic and also changes some FE configs: prefer_compute_node_for_external_table: If set to true, queries on external tables will prefer to be assigned to compute nodes, and the max number of compute nodes is controlled by min_backend_num_for_external_table. If set to false, queries on external tables will be assigned to any node. min_backend_num_for_external_table: Only takes effect when prefer_compute_node_for_external_table is true. If the number of compute nodes is less than this value, queries on external tables will try to get some mix nodes so that the total number of nodes reaches this value. If the number of compute nodes is larger than this value, queries on external tables will be assigned to compute nodes only.	2023-03-02 11:09:55 +08:00
spaces-x	9f9651b2f2	[Enhancement](Jemalloc): correct the variable name of malloc_conf & enable prof (#15382) Enable profile and correct the conf name in Jemalloc.	2022-12-28 09:50:59 +08:00
Mingyu Chen	5cf88a5339	[improvement](config) opt the message when missing JAVA_HOME for BE (#15045 ) Make the error message easy to understand	2022-12-14 23:17:46 +08:00
Yongqiang YANG	44eb1cf1c3	[fix](chore) read max_map_count from proc and make notice much more understandable (#14137) Some users cannot use sysctl as non-root on Linux, so we read max_map_count from proc. Notify users that they can change max_map_count as root.	2022-11-11 23:05:54 +08:00
Yongqiang YANG	a58ac48a6e	[chore](bin) do not set heap limit for tcmalloc until doris no longer allocates large unused memory (#13761) We set a heap limit for tcmalloc to avoid OOM introduced by tcmalloc, which allocates memory for cache even when a machine has little free memory. However, Doris allocates large unused memory in some cases, so tcmalloc would throw an OOM exception even when there is a lot of free memory on the machine. We can set the limit again after we fix the problem.	2022-11-08 19:26:30 +08:00


82 Commits