Fix hadoop short circuit reading can not enabled in some environments.
- Revert #21430 because it will cause performance degradation issue.
- Add `$HADOOP_CONF_DIR` to `$CLASSPATH`.
- Remove empty `hdfs-site.xml`. Because in some environments it will cause hadoop short circuit reading can not enabled.
- Copy the hadoop common native libs(which is copied from https://github.com/apache/doris-thirdparty/pull/98
) and add it to `LD_LIBRARY_PATH`. Because in some environments `LD_LIBRARY_PATH` doesn't contain hadoop common native libs, which will cause hadoop short circuit reading can not enabled.
Usage: ./build-thirdparty.sh [options...] [packages...]
Optional options:
-j <num> build thirdparty parallel
--clean clean the extracted data
--continue <package> continue to build the remaining packages (starts from the specified package)
Examples:
1. Specify packages to build.
Build gflags, gtest and glog by executing ./build-thirdparty.sh gflags gtest glog.
2. Continue to build the remaining packages.
Build the remaining packages (starts from sse2neon) by executing ./build-thirdparty.sh --continue sse2neon.
Currently, our third party libraries are built by autotools or cmake. Under some scenarios, we may use system-wide headers or libraries to build them which may make the build process fail.
We can configure the search paths explicitly to help autotools and cmake find the right dependencies.
This PR ports the codebase to Clang-16.
Upgrade some third-party libraries:
1. Apache BRPC: 1.2.0 -> 1.4.0 (Some bugs are fixed and all patches for 1.2.0 can be removed.)
2. Boost: 1.73.0 -> 1.81.0 (Porting to Clang-16)
3. libclucene: 2.4.6 -> 2.4.8 (Porting to Clang-16)
This is the first step to introduce official hadoop libhdfs to Doris.
Because the current hdfs client libhdfs3 lacks some important feature and is hard to maintain.
Download the hadoop 3.3.4 binary from hadoop website: https://hadoop.apache.org/releases.html
Extract libs and headers which are used for libhdfs, and pack them into hadoop_lib_3.3.4-x86.tar.gz
Upload it to https://github.com/apache/doris-thirdparty/releases/tag/hadoop-libs-3.3.4
TODO:
The hadoop libs for arm is missing, we need to find a way to build it
1. change clucene version from 2.4.4->2.4.6
2. update build-thirdparty.sh clucene's build block, adding USE_BTHREAD CMAKE flag, this flag is inherited from doris's USE_BTHREAD_SCANNER.
When building CLucene, CMake may find the wrong Boost and zlib. We should pass the search paths to the build command for CLucene explicitly to find the correct dependencies.
As part of Inverted Index DSIP steps, we'd like to contribute our inverted index implementations step by step.
First of all we need to introduce clucene to doris thirdparty libs, because inverted index implementations are based on
lucence API and index file format, also we add our features and performance improvements base on clucene, so we
need to maintain the repo ourselves
HOW to reproduce?
Add export CMAKE_BUILD_TYPE=DEBUG in custom_env.sh. Then build thirdparty in MAC.
There are two problems:
build vectorscan with DEBUG type, will got unused-but-set-variable error:
doris/thirdparty/src/vectorscan-vectorscan-5.4.7/src/nfa/mcclellancompile.cpp:1485:13: error: variable 'total_daddy' set but not used [-Werror,-Wunused-but-set-variable]
u16 total_daddy = 0;
gflags will output libgflags_debug.a instead of libgflags.a while build with DEBUG type. Then we will got error can not find library gflags error.
To avoid these errors, we set CMAKE_BUILD_TYPE while build vectorscan and gflags.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Co-authored-by: Adonis Ling <adonis0147@gmail.com>
In #15037, I modified the build script of libgsasl to enable GSSAPI,
but it is still wrong, because the PATH does not include the `thirdparty/installed/bin`,
so when building libgsasl, it will report error:
`WARNING: MIT Kerberos krb5-config not found, disabling GSSAPI`
but `krb5-config` is in `thirdparty/installed/bin`.
Without GSSAPI, the libhdfs3 can not access hdfs with kerberos authentication.
On macOS, we need some extra libraries to build the codebase,
therefore two packages were introduced to the project. They are `binutils` and `gettext`.
It takes a lot of time to build these packages completely. This PR introduces a way to build the needed libraries
and other stuff are skipped to build. It can save the time to build the third-party libraries on macOS.
Currently, we may fail to build the third-party libraries if we keep the outdated extracted data.
Considering the following scenario, Bob added patches to some libraries and Alice updates the codebase and builds
the third-party libraries. If Alice kept the outdated extracted data, she should fail to build the third-party libraries
because the patches are not applied due to the outdated `patched_marks`.
This PR introduces a way to clean the outdated data before building the third-party libraries.
There are some issues with the compile flag `-Wno-unused-but-set-variable` for clang.
1. `-Wno-unused-but-set-variable` should be set when building source by clang-15 on Linux. (#13000#13016)
2. On macOS Monterey, Apple Clang 13 may treat it as a unknown warning option and the compilation process may interrupt.
This PR introduces a better way to make this compile flag more portable.
1. Test whether the compiler recognizes this flag.
2. Add this flag if the compiler recognizes it.