Commit Graph

9179 Commits

Author SHA1 Message Date
06dee69174 [Refactor](map) remove using column array in map to reduce offset column (#17330)
1. remove column array in map 
2. add offsets column in map 
Aim to reduce duplicate offset  from key-array and value-array in disk
2023-03-09 11:22:26 +08:00
368e6a4f9c [Bug](array filter) Fix bug due to ColumnArray::filter_generic invalid inplace size_at after set_end_ptr (#17554)
We should make a new PodArray to add items instead of do it inplace
2023-03-09 10:59:29 +08:00
00727e8c11 [fix](in-bitmap) fix result may be wrong if the left side of the in bitmap predicate is a constant (#17570) 2023-03-09 10:59:05 +08:00
Pxl
65b8dfc7ff [Enchancement](function) Inline some aggregate function && remove nullable combinator (#17328)
1. Inline some aggregate function
2. remove nullable combinator
2023-03-09 10:39:04 +08:00
6923bf8d7b [fix](file cache)fix block file cache can't be configured (#17511) 2023-03-09 10:12:08 +08:00
397cc011c4 [fix](function) fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17420)
ECB algorithm, block_encryption_mode does not take effect, it only takes effect when init vector is provided.
Solved: 192/256 supports calculation without init vector

For other algorithms, an error should be reported when there is no init vector

Initialization Vector. The default value for the block_encryption_mode system variable is aes-128-ecb, or ECB mode, which does not require an initialization vector. The alternative permitted block encryption modes CBC, CFB1, CFB8, CFB128, and OFB all require an initialization vector.

Reference: https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-decrypt

Note: This fix does not support smooth upgrades. during upgrade process, query may report error: funciton not found
2023-03-09 09:51:41 +08:00
8a6a4b82aa [typo](docs) Add a hyperlink to facilitate user redirect. (#17563) 2023-03-09 09:47:10 +08:00
b6128f9b65 [dependenct](fe) Replace jackson-mapper-asl with fastxml-jsckson (#17303) 2023-03-09 09:35:58 +08:00
2b6d971c2f [fix](nereids)fix first_value/lead/lag window function bug in nereids (#17315)
* [fix](nereids)fix first_value/lead/lag window function bug in nereids

* add more test

* add order by to fix test case

* fix test cases
2023-03-09 09:35:27 +08:00
4822b9811a [feature](nereids)support bitmap runtime filter on nereids (#16927)
* A in(B) -> bitmap_contains(bitmap_union(B), A)
support bitmap runtime filter on nereids

* GroupPlan -> Plan

* fmt

* fix target cast problem
remove test code
2023-03-09 09:30:24 +08:00
2cf90ddfc5 [fix](scanner) remove useless _src_block_mem_reuse to avoid core dump while loading (#17559)
The _src_block_mem_reuse variable actually not work, since the _src_block is cleared each time when we call get_block.
But current code may cause core dump, see issue #17587. Because we insert some result column generated by expr into dest block, and such a column holds a pointer to some column in original schema. When clearing the data of _src_block, some column's data in dest block is also cleared.

e.g. coalesce will return a result column which holds a pointer to some original column, see issue #17588
2023-03-09 09:26:32 +08:00
ebda7ba5c6 [Fix](FQDN) fix slow when ip changed (#17455) 2023-03-09 09:07:16 +08:00
cb7d0fddfb (# [feature](ui)add profile download button 17547) 2023-03-08 21:54:33 +08:00
f0bd002911 [fix](DOE) Fix esquery not working (#17566)
Function esquery does not work because there is a problem parsing the first parameter type.
The first parameter, which is SlotRef, will be cast to CastExpr. This will cause error while generating ES DSL.
Add more types to adapt esquery function.
2023-03-08 21:51:17 +08:00
bd5ed2b0c2 [enhancement](histogram) optimize the histogram bucketing strategy, etc (#17264)
* optimize the histogram bucketing strategy, etc

* fix p0 regression of histogram
2023-03-08 20:12:05 +08:00
75e4f86c2d [fix](meta) fix catlog parameter when checking privilege of show_create_table stmt (#17445)
the ctl parameter of show_create_table stmt is not set in checkTblPriv, this is not correct for multicatalog
2023-03-08 19:50:31 +08:00
05b04e4c39 [BugFix](PG catalog) fix that pg catalog can not get all schemas that a pg user can access. (#17517)
Describe your changes.
In the past, pg catalog use sql SELECT schema_name FROM information_schema.schemata where schema_owner='<UserName>'; to select schemas of an user. Howerver, this sql can not find all schemas that a user can access, that because:

A user may not be the owner of an schema, but may have read permission on the schema.
A user may inherit the permissions of its user group and thus have read permissions on one schema.
For these reasons, we replace the sql statement with select nspname from pg_namespace where has_schema_privilege('<UserName>', nspname, 'USAGE');
2023-03-08 19:12:47 +08:00
678f34cad3 [fix](planner) insert default value should not change return type of function object in function set (#17536)
function now's return type changed to datetimev2 by mistake.
It can be reproduced in the following way

CREATE TABLE `testdt` (
  `c1` int(11) NULL,
  `c2` datetimev2 NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=OLAP
DUPLICATE KEY(`c1`, `c2`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`c1`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2",
"light_schema_change" = "true",
"disable_auto_compaction" = "false"
);

 insert into testdt2(c1) values(1);

select now();
2023-03-08 17:08:28 +08:00
b1ca87eb9b [FIX](complex-type) fix Is null predict for map/struct (#17497)
Fix is null predicate is not supported in select statement for map and struct column
2023-03-08 17:03:06 +08:00
feacb15e71 [Improvement](datev2) push down datev2 predicates with date literal (#17522) 2023-03-08 16:54:54 +08:00
eea6d770d7 [fix](bitmap) fix wrong result of bitmap_or for null (#17456)
Result of select bitmap_to_string(bitmap_or(to_bitmap(1), null)) should be 1 instead of null.

This PR fix logic of bitmap_or and bitmap_or_count.

Other count related funcitons should also be checked and fix, they will be fixed in another PR.
2023-03-08 16:29:01 +08:00
36b6cea462 [feature-wip](nereids) Support Q-Error to measure the accuracy of derived statistics (#17185)
Collect each estimated output rows and exact output rows for each plan node, and use this to measure the accuracy of derived statistics. The estimated result is managed by ProfileManager. We would get this estimated result in the http request by query id later.
2023-03-08 16:26:24 +08:00
d908d5fe01 [dependency](fe)Dependency Upgrade (#17377)
* Upgrade log4j to 2.X
  - binding log4j version to 2.18.0
  - used log4j-1.2-api complete smooth upgrade
* Upgrade filerupload to 1.5
* Upgrade commons-io to 2.7
* Upgrade commons-compress to 1.22
* Upgrade gson to 2.8.9
* Upgrade guava to 30.0-jre
* Binding jackson version to 2.14.2
* Upgrade netty-all to 4.1.89.final
* Upgrade protobuf to 3.21.12
* Upgrade kafka-clints to 3.4.0
* Upgrade calcite version to 1.33.0
* Upgrade aws-java-sdk to 1.12.302
* Upgrade hadoop to 3.3.4
* Upgrade zookeeper to 3.4.14
* Binding tomcat-embed-core to 8.5.86
* Upgrade apache parent pom to 25
* Use hive-exec-core as a hive dependency, add the missing jar-hive-serde separately
* Basic public dependencies are extracted to parent dependencies
* Use jackson uniformly as the basic json tool
* Remove springloaded, spring-boot-devtools has the same functionality
* Modify the spark-related dependency scope to provide, which should be provided at runtime
2023-03-08 14:28:40 +08:00
aab14922af [Feature](Nereids) support MarkJoin (#16616)
# Proposed changes
1.The new optimizer supports the combination of subquery and disjunction.In the way of MarkJoin, it behaves the same as the old optimizer. For design details see:https://emmymiao87.github.io/jekyll/update/2021/07/25/Mark-Join.html.
2.Implicit type conversion is performed when conjects are generated after subquery parsing
3.Convert the unnesting of scalarSubquery in filter from filter+join to join + Conjuncts.
2023-03-08 14:26:24 +08:00
626fbc34f9 [bugfix](jsonb) Fix create mv using jsonb key cause be crash (#17430) 2023-03-08 14:18:26 +08:00
f3b50b3472 [enhance](cooldown) skip once failed follow cooldown tablet (#16810) 2023-03-08 14:14:13 +08:00
8001d65811 [fix](insert) fix memory leak for insert transaction (#17530) 2023-03-08 14:10:59 +08:00
273d2100ac [enhance](cooldown) turn write cooldown meta async (#16813) 2023-03-08 14:06:21 +08:00
3a877857ae [improvement](inverted index)Remove searcher bitmap timer to improve query speed (#17407)
Timer becomes a bottleneck when the query hit volume is very high.
2023-03-08 14:03:36 +08:00
335c1e5953 [fix](memory) Fix MacOS mem_limit parse error and GC after env Init #17528
Fix MacOS mem_limit parse result is 0.
Fix GC after env Init, otherwise, when the memory is insufficient, BE will start failure.
*** Query id: 0-0 ***
*** Aborted at 1677833773 (unix time) try "date -d @1677833773" if you are using GNU date ***
*** Current BE git commitID: 8ee5f45 ***
*** SIGSEGV address not mapped to object (@0x70) received by PID 24145 (TID 0x7fa53c9fd700) from PID 112; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at be/src/common/signal_handler.h:420
 1# os::Linux::chained_handler(int, siginfo*, void*) in /usr/local/jdk/jre/lib/amd64/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/local/jdk/jre/lib/amd64/server/libjvm.so
 3# signalHandler(int, siginfo*, void*) in /usr/local/jdk/jre/lib/amd64/server/libjvm.so
 4# 0x00007FA56295A400 in /lib64/libc.so.6
 5# doris::MemTrackerLimiter::log_process_usage_str(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) at be/src/runtime/memory/mem_tracker_limiter.cpp:208
 6# doris::MemTrackerLimiter::print_log_process_usage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) at be/src/runtime/memory/mem_tracker_limiter.cpp:226
 7# doris::Daemon::memory_maintenance_thread() at be/src/common/daemon.cpp:245
 8# doris::Thread::supervise_thread(void*) at be/src/util/thread.cpp:455
 9# start_thread in /lib64/libpthread.so.0
10# clone in /lib64/libc.so.6
2023-03-08 14:00:57 +08:00
4ea0d6c5fa [feature](array_function) add support for array_popfront (#17416) 2023-03-08 13:57:38 +08:00
b1d65f855d [Feature](array-function) Support array_concat function (#17436) 2023-03-08 13:57:16 +08:00
Pxl
e2ac06d6d6 [Chore](execution) change PipelineTaskState to enum class && remove some row-based code (#17300)
1. change PipelineTaskState to enum class
2. remove some row-based code on FoldConstantExecutor::_get_result
3. reduce memcpy on minmax runtime filter function(Now we can guarantee that the input data is aligned)
4. add Wunused-template check, and remove some unused function, change some static function to inline function.
2023-03-08 12:41:15 +08:00
2b6133f4d0 [feature](Nereids): pushdown complex project through inner/outer Join. (#17365) 2023-03-08 12:00:56 +08:00
778acb3c5b [opt](string) optimize string equal comparision (#17336)
Optimize string equal and not-equal comparison by using memequal_small_allow_overflow15.
2023-03-08 11:30:00 +08:00
4b743061b4 [feature](function) support type template in SQL function (#17344)
A new way just like c++ template is proposed in this PR. The previous functions can be defined much simpler using template function. 

    # map element extract template function
    [['element_at', '%element_extract%'], 'E', ['ARRAY<E>', 'BIGINT'], 'ALWAYS_NULLABLE', ['E']],

    # map element extract template function
    [['element_at', '%element_extract%'], 'V', ['MAP<K, V>', 'K'], 'ALWAYS_NULLABLE', ['K', 'V']],


BTW, the plain type function is not affected and the legacy ARRAY_X MAP_K_V is still supported for compatability.
2023-03-08 10:51:31 +08:00
9213dd906a [enhancement](exception) add exception structure and using unique ptr in VExplodeBitmapTableFunction (#17531)
add exception class in common.
using unique ptr in VExplodeBitmapTableFunction
support single exception or nested exception, like this:

---SingleException
[E-100] test OS_ERROR bug
@ 0x55e80b93c0d9 doris::Exception::Exception<>()
@ 0x55e80b938df1 doris::ExceptionTest_NestedError_Test::TestBody()
@ 0x55e82e16bafb testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x55e82e15ab3a testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x55e82e1361e3 testing::Test::Run()
@ 0x55e82e136f29 testing::TestInfo::Run()
@ 0x55e82e1376e4 testing::TestSuite::Run()
@ 0x55e82e148042 testing::internal::UnitTestImpl::RunAllTests()
@ 0x55e82e16dcab testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@ 0x55e82e15ce4a testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x55e82e147bab testing::UnitTest::Run()
@ 0x55e80c4b39e3 RUN_ALL_TESTS()
@ 0x55e80c4a99b5 main
@ 0x7f0a619d0493 __libc_start_main
@ 0x55e80b84602a _start
@ (nil) (unknown)
2023-03-08 10:44:14 +08:00
c97422bd3d [enhancement](regression-test) add sleep 3s for schema change and rollup (#17484)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-08 10:43:05 +08:00
4692d6764c [refactor](remove string val) remove string val structure, it is same with string ref (#17461)
remove stringval, decimalv2val, bigintval
2023-03-08 10:42:20 +08:00
f21508baec [chore](macOS) Disable detect_container_overflow at BE startup (#17514)
BE failed to start up due to container-overflow errors reported by address sanitizer.
2023-03-08 10:21:45 +08:00
ae916f7cb3 [docs](doc) Add docs for Apache Kyuubi (#17481)
* add kyuubi doc of zh-CN & en
2023-03-08 09:36:50 +08:00
a767472c56 [fix](DOE)Fix es p0 case error (#17502)
Fix es array parse error, introduced by #16806
2023-03-08 08:06:30 +08:00
69c62b6c6c [Fix](vectorization) fixed that when a column's _fixed_values exceeds the max_pushdown_conditions_per_column limit, the column will not perform predicate pushdown, but if there are subsequent columns that need to be pushed down, the subsequent column pushdown will be misplaced in _scan_keys and it causes query results to be wrong (#17405)
the max_pushdown_conditions_per_column limit, the column will not perform predicate pushdown, but if there are subsequent columns that need to be pushed down, the subsequent column pushdown will be misplaced in _scan_keys and it causes query results to be wrong
Co-authored-by: tongyang.hty <hantongyang@douyu.tv>
2023-03-08 07:23:56 +08:00
6b88df2bdd [enhancement](planner) support case transition of timestamp datatype when create table (#17305) 2023-03-07 21:03:25 +08:00
5334a5899e [fix](remote)fix whole file cache and sub file cache (#17468) 2023-03-07 19:55:18 +08:00
06468ba627 [vectorized](bug) fix array constructor function change origin column from block (#17296) 2023-03-07 16:42:23 +08:00
fd8adb492d [fix](nereids) fix bugs in nereids window function (#17284)
fix two problems:

1. push agg-fun in windowExpression down to AggregateNode
for example, sql:
select sum(sum(a)) over (order by b)
Plan:
windowExpression( sum(y) over (order by b))
+--- Agg(sum(a) as y, b)

2. push other expr to upper proj
for example, sql:
select sum(a+1) over ()
Plan:
windowExpression(sum(y) over ())
+--- Project(a + 1 as y,...)
+--- Agg(a,...)
2023-03-07 16:35:37 +08:00
8ccc805cd0 [Fix](Lightweight schema Change) query error caused by array default type is unsupported (#17331)
We have supportted array type default [], but when using lightweight schema Change to add column array type, query failed as follows:

Fix "array default type is unsupported" error.
Fix the default value filling assignment digit problem.
2023-03-07 16:30:41 +08:00
86252e25bf [regression-test](Nereids) add binary arithmetic regression test cases(#17363)
add all of the valid binary arithmetic expressions test for nereids.
currently, float, double, stringlike(string, char, varchar) doesn't support div, bitand, bitor, bitxor.
some results with float type are incorrect because of inaccurate precision of regression-test framework.
2023-03-07 15:48:22 +08:00
fca567068e [Enhancement](spark load)Support for RM HA (#15000)
Adding RM HA configuration to the spark load.
Spark can accept HA parameters via config, we just need to accept it in the DDL

CREATE EXTERNAL RESOURCE spark_resource_sinan_node_manager_ha
PROPERTIES
(
"type" = "spark",
"spark.master" = "yarn",
"spark.submit.deployMode" = "cluster",
"spark.executor.memory" = "10g",
"spark.yarn.queue" = "XXXX",
"spark.hadoop.yarn.resourcemanager.address" = "XXXX:8032",
"spark.hadoop.yarn.resourcemanager.ha.enabled" = "true",
"spark.hadoop.yarn.resourcemanager.ha.rm-ids" = "rm1,rm2",
"spark.hadoop.yarn.resourcemanager.hostname.rm1" = "XXXX",
"spark.hadoop.yarn.resourcemanager.hostname.rm2" = "XXXX",
"spark.hadoop.fs.defaultFS" = "hdfs://XXXX",
"spark.hadoop.dfs.nameservices" = "hacluster",
"spark.hadoop.dfs.ha.namenodes.hacluster" = "mynamenode1,mynamenode2",
"spark.hadoop.dfs.namenode.rpc-address.hacluster.mynamenode1" = "XXX:8020",
"spark.hadoop.dfs.namenode.rpc-address.hacluster.mynamenode2" = "XXXX:8020",
"spark.hadoop.dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
"working_dir" = "hdfs://XXXX/doris_prd_data/sinan/spark_load/",
"broker" = "broker_personas",
"broker.username" = "hdfs",
"broker.password" = "",
"broker.dfs.nameservices" = "XXX",
"broker.dfs.ha.namenodes.XXX" = "mynamenode1, mynamenode2",
"broker.dfs.namenode.rpc-address.XXXX.mynamenode1" = "XXXX:8020",
"broker.dfs.namenode.rpc-address.XXXX.mynamenode2" = "XXXX:8020",
"broker.dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
);

Co-authored-by: liujh <liujh@t3go.cn>
2023-03-07 15:46:14 +08:00