Why upgrade? Is anything wrong?
This tries to fix the problem with opentelemetry::v1::ext::http::client::curl::HttpOperation::Send(); I have updated the PR description.
Usage: ./build-thirdparty.sh [options...] [packages...]
Optional options:
  -j <num>              build the third-party libraries in parallel
  --clean               clean the extracted data
  --continue <package>  continue to build the remaining packages (starting from the specified package)
Examples:
1. Specify the packages to build.
   Build gflags, gtest and glog by executing `./build-thirdparty.sh gflags gtest glog`.
2. Continue to build the remaining packages.
   Build the remaining packages (starting from sse2neon) by executing `./build-thirdparty.sh --continue sse2neon`.
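A minimal sketch of how such a `--continue` flag can work, assuming the packages are kept in a `PACKAGES` array and built by a `build_package` function; these names are illustrative, not the actual internals of build-thirdparty.sh:

```bash
#!/usr/bin/env bash
# Hypothetical package list and build dispatcher; the real script differs.
PACKAGES=(gflags gtest glog sse2neon)
build_package() { echo "building $1 ..."; }

START_PACKAGE="${1:-}"          # e.g. the value passed via --continue <package>
started=false
for pkg in "${PACKAGES[@]}"; do
    # Skip everything before the requested package, then build the rest.
    if [[ -z "$START_PACKAGE" || "$pkg" == "$START_PACKAGE" ]]; then
        started=true
    fi
    if [[ "$started" == true ]]; then
        build_package "$pkg"
    fi
done
```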
If we use Clang-16 to build the third-party libraries and then build doris_be_test against them, we cannot run doris_be_test successfully; some errors related to BRPC occur.
I tested this on Linux (x86_64) and macOS (x86_64/arm64), and the errors were raised every time.
Currently, our third-party libraries are built with autotools or CMake. In some scenarios, system-wide headers or libraries may be picked up during the build, which can make the build fail.
We can configure the search paths explicitly to help autotools and CMake find the right dependencies, as sketched below.
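A sketch of pinning both build systems to the locally installed prefix, assuming the prefix is `thirdparty/installed`; the actual flags used in build-thirdparty.sh may differ:

```bash
# Point both build systems at the locally built dependencies instead of
# whatever happens to be installed system-wide.
TP_INSTALL_DIR="$(pwd)/thirdparty/installed"   # assumed install prefix

# autotools: expose headers and libraries via the standard variables.
export CPPFLAGS="-I${TP_INSTALL_DIR}/include"
export LDFLAGS="-L${TP_INSTALL_DIR}/lib"
./configure --prefix="${TP_INSTALL_DIR}"

# CMake: restrict the dependency search to the same prefix.
cmake .. \
    -DCMAKE_PREFIX_PATH="${TP_INSTALL_DIR}" \
    -DCMAKE_INSTALL_PREFIX="${TP_INSTALL_DIR}"
```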
This PR ports the codebase to Clang-16.
Upgrade some third-party libraries:
1. Apache BRPC: 1.2.0 -> 1.4.0 (Some bugs are fixed and all patches for 1.2.0 can be removed.)
2. Boost: 1.73.0 -> 1.81.0 (Porting to Clang-16)
3. libclucene: 2.4.6 -> 2.4.8 (Porting to Clang-16)
Add the <optional> header to solve the compilation issue.
Use 3.12.9 as the version of the protoc artifact, because there is no 3.12.21 available.
See: https://repo.maven.apache.org/maven2/com/google/protobuf/protoc/
Remove the --show-progress argument of wget because it is not supported by older versions of wget.
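For reference, one way to keep the flag only where it is supported would be to probe for it first; this is just a sketch (the change itself simply drops the flag), and the URL below is a placeholder:

```bash
# Probe whether the local wget understands --show-progress before using it.
WGET_OPTS=(--quiet)
if wget --help 2>&1 | grep -q -- '--show-progress'; then
    WGET_OPTS+=(--show-progress)
fi
wget "${WGET_OPTS[@]}" -O /tmp/example.tar.gz "https://example.com/example.tar.gz"
```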
This is the first step to introduce the official Hadoop libhdfs to Doris, because the current HDFS client, libhdfs3, lacks some important features and is hard to maintain.
1. Download the Hadoop 3.3.4 binary from the Hadoop website: https://hadoop.apache.org/releases.html
2. Extract the libs and headers used by libhdfs, and pack them into hadoop_lib_3.3.4-x86.tar.gz.
3. Upload it to https://github.com/apache/doris-thirdparty/releases/tag/hadoop-libs-3.3.4
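Roughly, steps 1 and 2 above look like the following sketch; the download mirror and the paths inside the Hadoop tarball are assumptions based on the standard Hadoop binary layout, and the actual packed archive may contain more than this:

```bash
# Download the official Hadoop 3.3.4 binary release (mirror URL assumed).
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
tar -xzf hadoop-3.3.4.tar.gz

# Collect only the pieces libhdfs needs: the native libraries and headers.
mkdir -p hadoop_lib_3.3.4-x86/lib/native hadoop_lib_3.3.4-x86/include
cp -r hadoop-3.3.4/lib/native/* hadoop_lib_3.3.4-x86/lib/native/
cp -r hadoop-3.3.4/include/*    hadoop_lib_3.3.4-x86/include/

# Pack them up for upload to the doris-thirdparty release page.
tar -czf hadoop_lib_3.3.4-x86.tar.gz hadoop_lib_3.3.4-x86
```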
TODO:
The Hadoop libs for ARM are missing; we need to find a way to build them.
1. Change the CLucene version from 2.4.4 to 2.4.6.
2. Update the CLucene build block in build-thirdparty.sh, adding the USE_BTHREAD CMake flag; this flag is inherited from Doris's USE_BTHREAD_SCANNER.
When building CLucene, CMake may find the wrong Boost and zlib, so we should pass the search paths to the CLucene build command explicitly to find the correct dependencies (see the sketch below).
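A sketch of the kind of flags involved, assuming the local install prefix is `thirdparty/installed`; the exact command in build-thirdparty.sh may differ:

```bash
# Build CLucene against the locally built Boost and zlib rather than
# whatever system-wide copies CMake might otherwise pick up.
TP_INSTALL_DIR="$(pwd)/thirdparty/installed"   # assumed install prefix

cmake .. \
    -DUSE_BTHREAD=ON \
    -DBOOST_ROOT="${TP_INSTALL_DIR}" \
    -DZLIB_ROOT="${TP_INSTALL_DIR}" \
    -DCMAKE_INSTALL_PREFIX="${TP_INSTALL_DIR}"
make -j "$(nproc)" && make install
```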
Fix the conflicting name TCHAR between ODBC's sqltypes.h and CLucene's clucene-config.h.
Change TCHAR to TWCHAR in ODBC's sqltypes.h, because the ODBC TCHAR is not used anywhere in Doris,
while there are too many call sites that use CLucene's TCHAR (a sketch of the rename follows the snippets below).
thirdparty/installed/include/sqltypes.h:
`typedef char TCHAR;`
thirdparty/installed/include/CLucene/clucene-config.h:
`typedef wchar_t TCHAR;`
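A sketch of how such a rename can be applied to the installed header with a simple sed pass; the actual change may be carried as a patch instead, and the word-boundary syntax below assumes GNU sed:

```bash
# Rename the ODBC TCHAR so it no longer collides with CLucene's TCHAR.
SQLTYPES_H="thirdparty/installed/include/sqltypes.h"

# Replace whole-word occurrences of TCHAR with TWCHAR in the ODBC header.
# \b is GNU sed word-boundary syntax; macOS users would need gsed or an
# equivalent pattern.
sed -i.bak 's/\bTCHAR\b/TWCHAR/g' "${SQLTYPES_H}"
```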
As part of the Inverted Index DSIP, we'd like to contribute our inverted index implementations step by step.
First of all, we need to introduce CLucene into the Doris third-party libs, because our inverted index implementations are based on
the Lucene API and index file format. We also add our own features and performance improvements on top of CLucene, so we
need to maintain the repository ourselves.
How to reproduce?
Add export CMAKE_BUILD_TYPE=DEBUG to custom_env.sh, then build the third-party libraries on macOS.
There are two problems:
1. Building vectorscan with the DEBUG type raises an unused-but-set-variable error:
   doris/thirdparty/src/vectorscan-vectorscan-5.4.7/src/nfa/mcclellancompile.cpp:1485:13: error: variable 'total_daddy' set but not used [-Werror,-Wunused-but-set-variable]
   u16 total_daddy = 0;
2. gflags outputs libgflags_debug.a instead of libgflags.a when built with the DEBUG type, so we then get a "cannot find library gflags" error.
To avoid these errors, we set CMAKE_BUILD_TYPE explicitly while building vectorscan and gflags, as sketched below.
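A sketch of the workaround, assuming Release is the type being forced and that each package has its own build function; function and variable names are illustrative, not the actual script internals:

```bash
# Force a Release build for packages that misbehave under DEBUG,
# regardless of the CMAKE_BUILD_TYPE exported in custom_env.sh.
PARALLEL="${PARALLEL:-4}"   # hypothetical parallelism setting

build_vectorscan() {
    cmake .. -DCMAKE_BUILD_TYPE=Release
    make -j "${PARALLEL}" && make install
}

build_gflags() {
    # With Release, the output is libgflags.a instead of libgflags_debug.a.
    cmake .. -DCMAKE_BUILD_TYPE=Release
    make -j "${PARALLEL}" && make install
}
```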
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Co-authored-by: Adonis Ling <adonis0147@gmail.com>
In #15037, I modified the build script of libgsasl to enable GSSAPI,
but it is still wrong, because PATH does not include `thirdparty/installed/bin`.
So when building libgsasl, it reports:
`WARNING: MIT Kerberos krb5-config not found, disabling GSSAPI`
even though `krb5-config` is in `thirdparty/installed/bin`.
Without GSSAPI, libhdfs3 cannot access HDFS with Kerberos authentication.
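A sketch of the fix, assuming the locally built tools live under `thirdparty/installed/bin`:

```bash
# Make sure configure can find the locally built krb5-config; otherwise
# libgsasl silently disables GSSAPI support.
TP_INSTALL_DIR="$(pwd)/thirdparty/installed"   # assumed install prefix
export PATH="${TP_INSTALL_DIR}/bin:${PATH}"

# Sanity check: should print .../thirdparty/installed/bin/krb5-config.
command -v krb5-config

./configure --prefix="${TP_INSTALL_DIR}"
```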
On macOS, we need some extra libraries to build the codebase,
so two packages were introduced to the project: `binutils` and `gettext`.
It takes a lot of time to build these packages completely. This PR introduces a way to build only the needed libraries
and skip everything else, which saves time when building the third-party libraries on macOS.
When Doris BE calls getFileStatus against an HDFS 2 server, libhdfs3 throws an exception because the permission code returned by the HDFS 2 server is greater than 1<<12.
Bit 12 of the permission code is aclBit, which has been deprecated in Hadoop 3, so we remove the check in libhdfs3, the same as the Hadoop 3 Java project does.
Currently, we may fail to build the third-party libraries if we keep outdated extracted data.
Consider the following scenario: Bob adds patches to some libraries, and Alice updates the codebase and builds
the third-party libraries. If Alice keeps the outdated extracted data, she will fail to build the third-party libraries
because the patches are not applied due to the outdated `patched_marks`.
This PR introduces a way to clean the outdated data before building the third-party libraries, as sketched below.
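A sketch of what such a `--clean` step can do, assuming the extracted sources and the per-package `patched_mark` files live under `thirdparty/src` (directory names are assumptions):

```bash
# Remove outdated extracted sources so that fresh tarballs are re-extracted
# and all patches are applied again on the next build.
TP_SOURCE_DIR="$(pwd)/thirdparty/src"   # assumed location of extracted data

clean_extracted_data() {
    echo "Cleaning extracted data in ${TP_SOURCE_DIR} ..."
    # Keep the downloaded tarballs, drop everything that was extracted,
    # including the per-package patched_mark files inside those directories.
    find "${TP_SOURCE_DIR}" -mindepth 1 -maxdepth 1 -type d -exec rm -rf {} +
}

clean_extracted_data
```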
Upgrade simdjson from 1.0.2 to the latest version, 3.0.1, to avoid the -mlzcnt compiler flag causing BE UT (macOS) failures.
simdjson is currently only used by VJsonScanner and is disabled by default, so the impact of the upgrade is limited.