doris

Author	SHA1	Message	Date
zclllyybb	2759383365	[branch-2.1](timezone) refactor tzdata load to accelerate and unify timezone parsing (#37062 ) (#37269 ) pick https://github.com/apache/doris/pull/37062 1. revert https://github.com/apache/doris/pull/25097. we decide to rely on OS. not maintain independent tzdata anymore to keep result consistency 2. refactor timezone load. removed rwlock. before: ```sql mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates; +-------------------------------------------------------------------------------------+--------------------------------------------------------+ \| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) \| count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) \| +-------------------------------------------------------------------------------------+--------------------------------------------------------+ \| 16000000 \| 16000000 \| +-------------------------------------------------------------------------------------+--------------------------------------------------------+ 1 row in set (6.88 sec) ``` now: ```sql mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates; +-------------------------------------------------------------------------------------+--------------------------------------------------------+ \| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) \| count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) \| +-------------------------------------------------------------------------------------+--------------------------------------------------------+ \| 16000000 \| 16000000 \| +-------------------------------------------------------------------------------------+--------------------------------------------------------+ 1 row in set (2.61 sec) ``` 3. now don't support timezone offset format string like 'UTC+8', like we already said in https://doris.apache.org/docs/dev/query/query-variables/time-zone/#usage 4. support case-insensitive timezone parsing in nereids. 5. a bug when parse timezone using nereids. should check DST by input, but wrongly by now before. now fixed. doc pr: https://github.com/apache/doris-website/pull/810	2024-07-15 10:56:48 +08:00
Yulei-Yang	969f7532d6	[fix](deps) fix NoSuchMethodError: newInstanceFromKeytab when use kerberos (#37322 )	2024-07-08 10:19:02 +08:00
yiguolei	47ded2c6a0	Revert "[fix](compile) fix two compile errors on MacOS (#33834 ) (#34005 )" This reverts commit 743fb62a2c42cc5cc662583c235f7336d5e6ddef.	2024-04-26 00:55:21 +08:00
camby	743fb62a2c	[fix](compile) fix two compile errors on MacOS (#33834 ) (#34005 )	2024-04-25 19:39:35 +08:00
morningman	3e6d0fa35b	[branch-2.1](auditlog) remove auditlog build command in build.sh	2024-03-28 00:07:09 +08:00
Mingyu Chen	c0d7a5660e	[fix](paimon) support paimon with hive2 (#32455 ) In order to support paimon with hive2, we need to modify the origin HiveMetastoreClient.java to let it compatible with both hive2 and hive3. And this modified HiveMetastoreClient should be at the front of the CLASSPATH, so that it can overwrite the HiveMetastoreClient in hadoop jar. This PR mainly changes: 1. Copy HiveMetastoreClient.java in FE to BE's preload jar. 2. Split the origin `preload-extensions-jar-with-dependencies.jar` into 2 jars 1. `preload-extensions-project.jar`, which contains the modified HiveMetastoreClient. 2. `preload-extensions-jar-with-dependencies.jar`, which contains other dependency jars. 3. Modify the `start_be.sh`, to let `preload-extensions-project.jar` be loaded first. 4. Change the way the assemble the jni scanner jar Only need to assemble the project jar, without other dependencies. Because actually we only use classed under `org.apache.doris` package. So remove other unused dependency jars can also reduce the output size of BE. 5. fix bug that the prefix of paimon properties should be `paimon.`, not `paimon` 6. Support paimon with hive2 User can set `hive.version` in paimon catalog properties to specify the hive version.	2024-03-26 15:31:07 +08:00
wudongliang	33e40345b6	[Improve](build-script)Add be-extension-ignore to ignore avro-scanner when building release packages (#32699 )	2024-03-24 08:06:46 +08:00
wangbo	e7919ab1b6	Add DISABLE_BUILD_UI/DISABLE_BUILD_SPARK_DPP/DISABLE_BUILD_HIVE_UDF in build.sh (#32595 )	2024-03-22 08:52:16 +08:00
zy-kkk	3da8e4b04a	[chore](build) delete palo_be soft link (#32353 )	2024-03-21 14:07:22 +08:00
yagagagaga	29b858d8c9	[chore](build) Using multithread to accelerate FE compilation (#31855 )	2024-03-07 16:53:49 +08:00
zclllyybb	b177b26d39	[branch-2.1](tracing) Pick pipeline tracing and relative bugfix (#31367 ) * [Feature](pipeline) Trace pipeline scheduling (part I) (#31027) * [fix](compile) Fix performance compile fail #31305 * [fix](compile) Fix macOS compilation issues for PURE macro and CPU core identification (#31357) * [fix](compile) Correct PURE macro definition to fix compilation on macOS * 2 --------- Co-authored-by: zy-kkk <zhongyk10@gmail.com>	2024-02-29 08:42:35 +08:00
kindred77	3b093cabd1	Fix building issue in be on ubuntu with test enabled. (#31407 ) Co-authored-by: tangye <tangye@bestpay.com.cn>	2024-02-27 10:12:44 +08:00
HowardQin	0d32aeeaf6	[improvement](load) Enable lzo & Remove dependency on Markus F.X.J. Oberhumer's lzo library (#30573 ) Issue Number: close #29406 1. increase lzop version to 0x1040, I set to 0x1040 only for decompressing lzo files compressed by higher version of lzop, no change of decompressing logic, actully, 0x1040 should have "F_H_FILTER" feature, but it mainly for audio and image data, so we do not support it. 2. use orc::lzoDecompress() instead of lzo1x_decompress_safe() to decompress lzo data 3. use crc32c::Extend() instead of lzo_crc32() 4. use olap_adler32() instead of lzo_adler32() 5. thus, remove dependency of Markus F.X.J. Oberhumer's lzo library 6. remove DORIS_WITH_LZO, so lzo file are supported by stream and broker load by default 7. add some regression test	2024-02-05 22:00:24 +08:00
Guangming Lu	74202fe7e2	[Enhancement] support user custom defined privite settings,xml for build fe (#29015 ) Issue Number: close #29014	2023-12-31 13:42:56 +08:00
Gavin Chou	2018ab23f0	[chore](build) Add MVN_OPT env variable to enrich building FE with extra options (#28375 ) e.g. just export or add it to custom_env.sh ``` export MVN_OPT="-o" ``` will build FE with maven option "-o" (offline), which means maven does not need to download meta from maven repo, it is useful for saving time if the internet is unstable or unusable.	2023-12-15 13:20:39 +08:00
zclllyybb	b580ee91ce	[fix](compile) fix macOS compile and format code (#27494 )	2023-11-23 23:24:10 +08:00
Adonis Ling	d2eea9b3ae	[chore](macOS) Reduce the size of executables on macOS arm64 (#26894 ) Like #15641, we should reduce the size of executables on macOS arm64. Otherwise, we can not run doris_be and doris_be_test with ASAN build type on macOS arm64 now.	2023-11-14 12:21:08 +08:00
minghong	97646b098e	[compile](submodule) move submodule update to BE compile (#26109 ) orc and clucene is only used by BE, thus move the update to BE compile part	2023-10-31 16:51:36 +08:00
airborne12	11e04f76fb	[Enhancement](submodule) print current submodule commit id when building (#26017 )	2023-10-27 16:28:27 +08:00
zclllyybb	9a675fcdfc	[chore](be) Add default timezone files (#25097 )	2023-10-20 13:12:24 +08:00
Jerry Hu	b6b8ef3a18	[chore](script) avoid failed while buiding on non-git repository (#23982 ) Co-authored-by: yiguolei <676222867@qq.com>	2023-09-08 10:08:00 +08:00
airborne12	7727535b91	[Enhancement](build) try to download commit specific source code when git submodule failed (#23846 )	2023-09-04 20:12:46 +08:00
Qi Chen	57ca7d66d3	[Fix](multi-catalog) Fix zlib init error by using doris's zlib shared library and `jni.log` does not output. (#23260 )	2023-09-02 21:44:14 +08:00
slothever	5ba505ebf4	[fix](multi-catalog)fix avro and jdbc scanner dependency (#23015 ) add preload-extensions module, put all conflict dependencies to pom.xml in preload-extensions	2023-08-20 19:28:17 +08:00
slothever	919bfd73f1	[improvement](multi-catalog)add scanner isolation class loader (#22247 ) Add scanner isolation class loader to make each plugin non-conflicting. The BE will get scanner classes by JNI call and use JniClassLoader load them. In the last version，we always get canner classes from the system class path by default, so it cannot isolate the classes for each scanner	2023-08-10 10:02:46 +08:00
airborne12	8371171e44	[Feature](inverted index) add inverted index tool (#22207 )	2023-07-27 21:28:34 +08:00
ZhangYu0123	b2be42c31c	[fix](jdbc catalog) fix jdbc catalog like expr query error (#22141 )	2023-07-25 22:30:28 +08:00
LiBinfeng	f84af95ac4	[feature](Nereids) Add minidump replay and refactor user feature of minidump (#20716 ) ### Two main changes: - 1、add minidump replay - 2、change minidump serialization of statistic messages and some interface between main logic of nereids optimizer and minidump ### Use of nereids ut: - 1、save minidump files: Execute command by mysql-client: ``` set enable_nereids_planner=true; set enable_minidump=true; ``` Execute sql in mysql-client - 2、use nereids-ut script to execute directory: ``` cp -r ${DORIS_HOME}/minidump ${DORIS_HOME}/output/fe && cd ${DORIS_HOME}/output/fe ./nereids_ut --d ${directory_of_minidump_files} ``` ### Refactor of minidump - move statistics used serialization to serialization of input and serialize with catalogs - generating minidump file only when enable_minidump flag is set, minidump module interactive with main optimizer only by : serializeInputsToDumpFile(catalog, statistics, query) && serializeOutputsToDumpFile(outputplan).	2023-07-25 15:26:19 +08:00
Xinyi Zou	d180ed418d	[fix](stacktrace) Speed up stack trace (#21755 ) Introduce libunwind get stack trace, cost is negligible and has line numbers. use StackTraceCache, PHDRCache speed up, is customizable and has some optimizations. Other stack trace tools remain: glog, boost, glibc, in case for need. TODO: currently support linux __x86_64__, __arm__, __powerpc__, not supported __FreeBSD__, APPLE Note: __arm__, __powerpc__ not been verified Support signal handle libunwid support unw_backtrace for jemalloc Use of undefined compile option USE_MUSL for later	2023-07-19 15:43:14 +08:00
Qi Chen	fde73b6cc6	[Fix](multi-catalog) Fix hadoop short circuit reading can not enabled in some environments. (#21516 ) Fix hadoop short circuit reading can not enabled in some environments. - Revert #21430 because it will cause performance degradation issue. - Add `$HADOOP_CONF_DIR` to `$CLASSPATH`. - Remove empty `hdfs-site.xml`. Because in some environments it will cause hadoop short circuit reading can not enabled. - Copy the hadoop common native libs(which is copied from https://github.com/apache/doris-thirdparty/pull/98 ) and add it to `LD_LIBRARY_PATH`. Because in some environments `LD_LIBRARY_PATH` doesn't contain hadoop common native libs, which will cause hadoop short circuit reading can not enabled.	2023-07-06 15:00:26 +08:00
DongLiang-0	a6b51ec19a	[Feature](avro) Support Apache Avro file format (#19990 ) support read avro file by hdfs() or s3() . ```sql select * from s3( "uri" = "http://127.0.0.1:9312/test2/person.avro", "ACCESS_KEY" = "ak", "SECRET_KEY" = "sk", "FORMAT" = "avro"); +--------+--------------+-------------+-----------------+ \| name \| boolean_type \| double_type \| long_type \| +--------+--------------+-------------+-----------------+ \| Alyssa \| 1 \| 10.0012 \| 100000000221133 \| \| Ben \| 0 \| 5555.999 \| 4009990000 \| \| lisi \| 0 \| 5992225.999 \| 9099933330 \| +--------+--------------+-------------+-----------------+ select * from hdfs( "uri" = "hdfs://127.0.0.1:9000/input/person2.avro", "fs.defaultFS" = "hdfs://127.0.0.1:9000", "hadoop.username" = "doris", "format" = "avro"); +--------+--------------+-------------+-----------+ \| name \| boolean_type \| double_type \| long_type \| +--------+--------------+-------------+-----------+ \| Alyssa \| 1 \| 8888.99999 \| 89898989 \| +--------+--------------+-------------+-----------+ ``` current avro reader only support common data type, the complex data types will be supported later.	2023-06-28 21:15:35 +08:00
Mingyu Chen	1dec592e91	[improvement](fs_bench) optimize the usage of fs benchmark tool for hdfs (#21154 ) Optimize the usage of fs benchmark tool: 1. Remove `Open` benchmark, it is useless. 2. Remove `Delete` benchmark, it is dangerous. 3. Add `SingleRead` benchmark, user can specify an exist file to test read operation: `sh bin/run-fs-benchmark.sh --conf=conf/hdfs_read.conf --fs_type=hdfs --operation=single_read` 4. Modify the `run-fs-benchmark.sh`, remove `OPTS` section, use options in `fs_benchmark_tool` directly 5. Add some custom counters in the benchmark result, eg: ``` -------------------------------------------------------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... -------------------------------------------------------------------------------------------------------------------------------- HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 6864 ms 2385 ms 1 ReadRate=200.936M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 3919 ms 1828 ms 1 ReadRate=351.96M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1 3839 ms 1819 ms 1 ReadRate=359.265M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_mean 4874 ms 2011 ms 3 ReadRate=304.054M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_median 3919 ms 1828 ms 3 ReadRate=351.96M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_stddev 1724 ms 324 ms 3 ReadRate=89.3768M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_cv 35.37 % 16.11 % 3 ReadRate=29.40% HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_max 6864 ms 2385 ms 3 ReadRate=359.265M/s HdfsReadBenchmark/iterations:1/repeats:3/manual_time/threads:1_min 3839 ms 1819 ms 3 ReadRate=200.936M/s ``` - For `open_read` and `single_read`, add `ReadRate` as `bytes per second`. - For `create_write`, add `WriteRate` as `bytes per second`. - For `exists` and `rename`, add `ExistsCost` and `RenameCost` as `time cost per one operation`.	2023-06-26 11:37:14 +08:00
Mingyu Chen	615778924e	[feature](fs) add fs benchmark tool framework (#20770 ) Add an optional executable binary fs_benchmark_tool, for test the performance of file system such as hdfs, s3. Usage: ./fs_benchmark_tool --conf my.conf --fs_type=s3 --operation=read --iterations=5 in my.conf, you can add any config key value with following format: key1=value1 key2=value2 By default, this binary will not be built. Only build it when setting BUILD_FS_BENCHMARK=ON. The binary will be installed in output/be/lib. For developer, you can add new subclass of BaseBenchmark to add your own benchmark. See be/src/io/fs/benchmark/s3_benchmark.hpp for an example	2023-06-14 17:50:06 +08:00
Mingyu Chen	cd46f459db	[minor](script) fix typo in build.sh (#20757 )	2023-06-14 09:05:01 +08:00
Mingyu Chen	5d2758cb8f	[improvement](build) move add BE extension jars to java_extensions dir (#20740 ) Follow #20185 Move all BE java extension jars to `be/lib/java_extensions/` dir. Also remove `udf` dir, used for BE native udf, which is deprecated since v1.2 The final output is: ``` output ├── be │ ├── bin │ ├── conf │ ├── dict │ ├── lib \| ├── java_extensions │ ├── hudi-scanner-jar-with-dependencies.jar │ ├── java-udf-jar-with-dependencies.jar │ ├── jdbc-scanner-jar-with-dependencies.jar │ ├── max-compute-scanner-jar-with-dependencies.jar │ └── paimon-scanner-jar-with-dependencies.jar │ ├── LICENSE-dist.txt │ ├── licenses │ ├── log │ ├── NOTICE.txt │ ├── storage │ └── www └── fe ├── bin ├── conf ├── doris-meta ├── lib ├── LICENSE-dist.txt ├── licenses ├── log ├── mysql_ssl_default_certificate ├── NOTICE.txt ├── spark-dpp └── webroot ```	2023-06-13 18:55:12 +08:00
lexluo09	57656b2459	[Enhancement](java-udf) java-udf module split to sub modules (#20185 ) The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as : java-commom、java-udf、jdbc-scanner、hudi-scanner、 paimon-scanner. Co-authored-by: lexluo <lexluo@tencent.com>	2023-06-13 09:41:22 +08:00
airborne12	a0c3ebeeab	[Chore](build) fix typo in build.sh (#19846 )	2023-05-19 11:49:20 +08:00
zclllyybb	cbe12cfb69	[chore](build) Support specifying output path #19669 now we could ./build.sh --be --fe --output PATH_TO_BINARY_YOU_LIKE did not modify the default value	2023-05-19 09:22:36 +08:00
Mingyu Chen	3d795de2d5	[chore](build) avoid generating generated code every time (#19813 ) When calling generated-source.sh in build.sh, not to remove the gensrc/build dir.	2023-05-19 08:47:36 +08:00
Mingyu Chen	b3ce4593b1	[deps](libhdfs) update hadoop libhdfs to 3.3.4.1 for doris (#19832 )	2023-05-19 08:44:32 +08:00
airborne12	f32deb18e9	[Update](build) change clucene from thirdparty to git module (#19352 )	2023-05-19 08:25:51 +08:00
yongkang.zhong	3cc8bbb93f	[chore](Java UDF)remove the error code and add the copy jar (#19503 ) * [chore](Java UDF)remove the error code and add the copy jar * [chore](Java UDF)remove the error code and add the copy jar	2023-05-11 16:17:29 +08:00
yongkang.zhong	b72ff93c7a	[chore](java udf)Add Java UDF compilation options (#19468 )	2023-05-10 10:51:11 +08:00
Qi Chen	096aa25ca6	[improvement](orc-reader) Implements ORC lazy materialization (#18615 ) - Implements ORC lazy materialization, integrate with the implementation of https://github.com/apache/doris-thirdparty/pull/56 and https://github.com/apache/doris-thirdparty/pull/62. - Refactor code: Move `execute_conjuncts()` and `execute_conjuncts_and_filter_block()` in `parquet_group_reader `to `VExprContext`, used by parquet reader and orc reader. - Add session variables `enable_parquet_lazy_materialization` and `enable_orc_lazy_materialization` to control whether enable lazy materialization. - Modify `build.sh` to update apache-orc submodule or download package every time.	2023-05-09 23:33:33 +08:00
DeadlineFen	e08de52ee7	[chore](compile) using PCH for compilation acceleration under clang (#19303 )	2023-05-08 19:51:06 +08:00
minghong	c936810e83	[fix](compile) fix bug in build.sh (#19314 ) fix path for $(dirname $0)/generated-source.sh to enable docker build	2023-05-06 10:00:20 +08:00
Mingyu Chen	c9fa10ac10	[fix](doc) avoid generate config doc automatically (#19302 ) After #19246, when compilng FE, it will automatically generate Config and Session Variables doc and overwrite the origin one. Need to avoid it because it is not ready to use yet	2023-05-05 20:39:05 +08:00
Mingyu Chen	9d18be9dd3	[doc](thrift) update doc for thrift 0.16 (#19217 ) * 1 update doc for thrift 0.16	2023-05-02 16:00:10 +08:00
Pxl	ec517a53a8	[Chore](build) upgrade clang-format version to 16 && move thrift to fe-common (#19155 ) upgrade clang-format version to 16 move thrift to fe-common fix core dump on pipeline engine when operator canceled and not prepared	2023-04-28 14:14:51 +08:00
Pxl	c40860aba4	[Chore](thrift) generate thrift java code to make code analysis work well (#18793 ) generate thrift java code to make code analysis work well	2023-04-19 19:33:17 +08:00

1 2 3 4

188 Commits