Add a new BE config `kerberos_ticket_lifetime_seconds`, defaulting to 86400.
It is best to set it to the same value as `ticket_lifetime` in `krb5.conf`.
If an HDFS fs handle in the cache has lived longer than HALF of this time, it will be marked invalid and recreated,
and the kerberos ticket will be renewed along with it.
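A minimal sketch of the invalidation rule, with hypothetical names (`CachedFsHandle` and `should_invalidate` are illustrative, not the actual BE code):

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical names, not the actual BE code; this only illustrates the rule.
struct CachedFsHandle {
    std::chrono::steady_clock::time_point create_time;
    bool invalid = false;
};

// A handle older than half of the configured ticket lifetime is marked
// invalid so it gets recreated (and the kerberos ticket renewed) well
// before the ticket actually expires.
bool should_invalidate(const CachedFsHandle& handle,
                       int64_t kerberos_ticket_lifetime_seconds) {
    auto age_seconds = std::chrono::duration_cast<std::chrono::seconds>(
                               std::chrono::steady_clock::now() - handle.create_time)
                               .count();
    return age_seconds > kerberos_ticket_lifetime_seconds / 2;
}
```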
For a non-master FE, the Backend's status must be set based on the content of the edit log.
There is a bug: if the FE config `max_backend_heartbeat_failure_tolerance_count` is set larger than one,
a non-master FE will not mark a Backend as dead until it has received enough heartbeat edit logs,
which is wrong.
This causes a Backend to be dead on the Master FE but still alive on non-master FEs.
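A minimal sketch of the intended asymmetry, in C++-style pseudocode with hypothetical names (the FE itself is Java; this only illustrates the logic):

```cpp
#include <cstdint>

// Hypothetical names; the real FE code is Java. This only illustrates the
// master/replica asymmetry described above.
struct Backend {
    bool alive = true;
    int64_t heartbeat_failures = 0;
};

// Master FE: apply the tolerance count before declaring a Backend dead.
void on_heartbeat_failure_on_master(Backend& be, int64_t tolerance_count) {
    if (++be.heartbeat_failures >= tolerance_count) {
        be.alive = false;
    }
}

// Non-master FE: the master already applied the tolerance logic, so the
// replica must take the status carried in the edit log as-is instead of
// counting heartbeat failures a second time.
void replay_heartbeat_edit_log(Backend& be, bool alive_in_edit_log) {
    be.alive = alive_in_edit_log;
}
```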
Use spark-bundle instead of hive-bundle to read hudi data.
**Advantages** of using spark-bundle to read hudi data:
1. The performance of spark-bundle is more than twice that of hive-bundle.
2. spark-bundle uses `UnsafeRow`, which reduces data copying and JVM GC time.
3. spark-bundle supports `Time Travel`, `Incremental Read`, and `Schema Change`; these features can be quickly ported to Doris.
**Disadvantages** of using spark-bundle to read hudi data:
1. More dependencies make hudi-dependency.jar very bulky (from 138M to 300M).
2. spark-bundle only provides an `RDD` interface and cannot be used directly.
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
1. Filtering is done at the sending end rather than the receiving end
2. Projection is done at the sending end rather than the receiving end
3. Each sender can use different shuffle policies to send data
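A minimal sketch of this sender-side flow, with hypothetical types (not Doris's actual exchange code):

```cpp
#include <functional>
#include <vector>

// Hypothetical types for illustration; not Doris's actual exchange code.
struct Row { int key; int payload; };
using Batch = std::vector<Row>;

// Sender-side pipeline: filter and project before data crosses the network,
// so the receiver never sees rows or columns it would discard anyway.
Batch prepare_for_send(const Batch& input,
                       const std::function<bool(const Row&)>& filter,
                       const std::function<Row(const Row&)>& project) {
    Batch out;
    for (const Row& row : input) {
        if (filter(row)) {                // filtering at the sending end
            out.push_back(project(row));  // projection at the sending end
        }
    }
    return out;
}

// Each sender can pick its own shuffle policy, e.g. hash on the key.
size_t choose_channel(const Row& row, size_t num_channels) {
    return static_cast<size_t>(row.key) % num_channels;
}
```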
exclude old netty version
upgrade spring-boot version to 2.7.13
replace ojdbc6 with ojdbc8
upgrade jackson version to 2.15.2
upgrade fabric8 version to 6.7.2
For the pipeline engine, the olap table sink close is divided into three stages: try_close() --> pending_finish() --> close().
Only after all node channels are done or canceled does pending_finish() return false and close() start.
This avoids blocking the pipeline on close().
In close(), the index channel's intolerable-failure status is checked after each node channel failure;
if an intolerable failure has occurred, the close is terminated early and all node channels are canceled to avoid meaningless blocking.
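A rough sketch of the three-stage protocol, with hypothetical member names (not the actual sink code):

```cpp
#include <atomic>
#include <vector>

// Hypothetical shapes for illustration; not the actual sink code.
struct NodeChannel {
    std::atomic<bool> done{false};
    std::atomic<bool> canceled{false};
    void try_close() { /* kick off an asynchronous close */ }
    void cancel() { canceled = true; }
};

struct OlapTableSink {
    std::vector<NodeChannel*> channels;

    // Stage 1: request close on every channel without blocking.
    void try_close() {
        for (NodeChannel* ch : channels) ch->try_close();
    }

    // Stage 2: polled by the pipeline; stays true while any channel is
    // still in flight, so no pipeline thread ever blocks waiting here.
    bool pending_finish() const {
        for (NodeChannel* ch : channels) {
            if (!ch->done && !ch->canceled) return true;
        }
        return false;
    }

    // Stage 3: runs only once pending_finish() has returned false; this is
    // where the intolerable-failure status is checked and remaining
    // channels are canceled early if needed.
    void close() { /* finalize, or cancel all channels on failure */ }
};
```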
The syntax for supporting partition updates has not been fully investigated yet, and the current partition syntax has issues. Therefore, the partition syntax is temporarily removed in this version and will be added back after further research.
After running the EliminateNotNull rule, the join conjuncts may be removed from an inner join node (for example, when its only conjunct is an `IS NOT NULL` predicate that the rule proves redundant), leaving the join with no conjuncts.
So the ConvertInnerOrCrossJoin rule needs to run afterwards to convert an inner join with no join conjuncts into a cross join node, as sketched below.
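A minimal sketch of the rewrite, with hypothetical plan types (the real rule operates on Nereids plan nodes):

```cpp
#include <vector>

// Hypothetical, simplified plan types; the real rule works on Nereids plans.
struct Expr {};
enum class JoinType { INNER, CROSS };
struct Join {
    JoinType type;
    std::vector<Expr> conjuncts;
};

// ConvertInnerOrCrossJoin, in spirit: an inner join left with no
// conjuncts is semantically a cross join, so rewrite its type.
Join convert_inner_or_cross_join(Join join) {
    if (join.type == JoinType::INNER && join.conjuncts.empty()) {
        join.type = JoinType::CROSS;
    }
    return join;
}
```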
Refine the tpcds test tools: split the 99 cases into separate files, and refine the 100g schema with a range-partition format.
Co-authored-by: zhongjian.xzj <zhongjian.xzj@zhongjianxzjdeMacBook-Pro.local>
Eliminate virtual function calls when serializing and deserializing aggregate functions.
For example, in the AggregateFunctionUniq::deserialize_and_merge method, calling read_pod_binary(ref, buf) in the for loop generates a large number of virtual function calls:
```cpp
void deserialize_and_merge(AggregateDataPtr __restrict place, BufferReadable& buf,
                           Arena* arena) const override {
    auto& set = this->data(place).set;
    UInt64 size;
    read_var_uint(size, buf);
    set.rehash(size + set.size());
    for (size_t i = 0; i < size; ++i) {
        KeyType ref;
        read_pod_binary(ref, buf); // one virtual buf.read(...) per element
        set.insert(ref);
    }
}
```

```cpp
template <typename Type>
void read_pod_binary(Type& x, BufferReadable& buf) {
    // BufferReadable::read is a virtual call here.
    buf.read(reinterpret_cast<char*>(&x), sizeof(x));
}
```
BufferReadable has only one subclass, VectorBufferReader, so it is better to implement BufferReadable directly as a concrete class; the per-element reads then become non-virtual calls that the compiler can inline.
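A minimal sketch of the devirtualized shape (simplified, with assumed member names, not the exact Doris class):

```cpp
#include <cstddef>
#include <cstring>

// Simplified sketch with assumed member names, not the exact Doris class.
// With a single concrete class, read() is a plain (non-virtual) member
// function, so the per-element call in read_pod_binary() can be inlined.
class BufferReadable {
public:
    explicit BufferReadable(const char* data) : _data(data) {}

    void read(char* to, size_t n) {
        std::memcpy(to, _data, n);
        _data += n;
    }

private:
    const char* _data;
};
```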
The following SQL was tested on the SSB-flat dataset:
```sql
SELECT COUNT(DISTINCT lo_partkey), COUNT(DISTINCT lo_suppkey) FROM lineorder_flat;
```
before: MergeTime: 415.398ms
after opt: MergeTime: 174.660ms
change palo -> doris
Do not check the compiler's version in env.sh, because building the broker does not need a gcc compiler; the version is also checked by CMake.
Issue Number: close #20948
Fix a read error with mixed partition locations (for example, some partition locations are on s3 while others are on hdfs) by calling `getLocationType` at the file split level instead of the table level.
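A hypothetical C++-style helper showing the idea (the actual `getLocationType` logic differs): derive the storage type from each split's own path instead of deciding once from the table-level location.

```cpp
#include <string>

// Hypothetical helper, illustration only; not the actual getLocationType.
enum class LocationType { S3, HDFS, LOCAL };

LocationType location_type_of(const std::string& split_path) {
    if (split_path.rfind("s3://", 0) == 0) return LocationType::S3;
    if (split_path.rfind("hdfs://", 0) == 0) return LocationType::HDFS;
    return LocationType::LOCAL;
}
```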
After we introduced the "PushdownFilterThroughProject" post processor, some plan nodes lost their groupExpression (the withChildren function removes the groupExpression).
This is bad for debugging, since it takes more time to find the owner group of a plan node.
This PR records the missing owner group id in the plan node's mutableState.
Fix a `Wrong data type for column` error that occurs when the column order in the hive table differs from the orc file schema.
The root cause is the handling of the following case:
ORC tables written by Hive 1.x may carry synthetic column names such as `_col0`, `_col1`, `_col2`... in the underlying orc file schema, so the reader has to fall back to the column names in the hive table for the mapping. Applying this fallback to tables whose column order differs from the file schema produces the error.
### Solution
Currently fix this issue by only applying that handling when the hive version is explicitly set to 1.x.x in the hive catalog configuration:
```sql
CREATE CATALOG hive PROPERTIES (
    'hive.version' = '1.x.x'
);
```
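A minimal sketch of the resulting mapping rule, with hypothetical names (the actual reader code differs):

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of the mapping rule described above. If (and only
// if) the catalog declares hive.version 1.x, ORC columns named _col0,
// _col1, ... are mapped positionally to the hive table's column names;
// otherwise columns are matched by name, so a different column order in
// the file no longer trips the type check.
std::vector<std::string> resolve_read_columns(
        const std::vector<std::string>& file_schema_names,
        const std::vector<std::string>& table_column_names,
        bool hive_version_is_1x) {
    if (hive_version_is_1x) {
        return table_column_names; // positional: _colN -> N-th table column
    }
    return file_schema_names;      // name-based: trust the file schema
}
```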
Vectorizing key serialization in the aggregate node brought a significant performance improvement. Applying the same technique to the join node also improves performance.
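A rough sketch of what column-wise (vectorized) key serialization looks like, with simplified types (the actual implementation operates on Doris columns):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified illustration of column-wise (vectorized) key serialization.
// Instead of serializing one row's key at a time through per-row virtual
// calls, each key column is appended for all rows in a tight loop, which
// is cache-friendly and avoids per-row dispatch overhead.
// row_keys must already hold one (possibly empty) buffer per row.
void serialize_fixed_size_key_column(const std::vector<int64_t>& column,
                                     std::vector<std::vector<char>>& row_keys) {
    for (size_t i = 0; i < column.size(); ++i) {
        const char* src = reinterpret_cast<const char*>(&column[i]);
        row_keys[i].insert(row_keys[i].end(), src, src + sizeof(int64_t));
    }
}
```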
1. Fix the storage prefix for the object file cache: oss/cos/obs paths do not need to be converted to the s3 prefix; only convert when creating the split (see the sketch below).
2. dlf iceberg catalog: support dlf iceberg tables, using s3 file io.
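A hypothetical illustration of the rule in item 1 (the helper name is made up): cache keys keep the native oss://, cos://, or obs:// prefix, and the rewrite to s3:// happens only when a split is created.

```cpp
#include <string>

// Hypothetical helper name; illustrates the rule in item 1.
std::string to_s3_prefix_for_split(const std::string& path) {
    for (const char* scheme : {"oss://", "cos://", "obs://"}) {
        const std::string prefix(scheme);
        if (path.rfind(prefix, 0) == 0) {
            return "s3://" + path.substr(prefix.size());
        }
    }
    return path; // already s3:// or another scheme: leave unchanged
}
```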
If the current query has been running for a very long time, its ExecTime may be larger than Integer.MAX_VALUE, and then a NumberFormatException will be thrown when executing "show proc '/current_queries'".
The query's ExecTime is of long type, so it should be parsed with 'Long.parseLong' rather than 'Integer.parseInt'.