Commit Graph

13721 Commits

Author SHA1 Message Date
0e7f0abe61 [test] (Nereids) add regression-test of arithmetic expressions of decimalv3 for nereids (#17549)
add regression-test of decimalv3 for nereids and refactor some suites.
too many suites will be changed, so this pr we just add arithmetic test.

1. some tests are disabled because of unfixed results and precision, detailed a big integer mul and div a float will cause the latter and bit-op will cause the former.
2. the disabled tests with tag original planner are caused by unfixed results.
2023-03-22 11:25:49 +08:00
Pxl
40ca250678 [Feature](materialized-view) support where clause on create materialized view (#17534)
support where clause on create materialized view
2023-03-22 11:25:13 +08:00
Pxl
401836f523 [Bug](planner) fix core dump when lateral view above union node and have predicate (#17912)
fix core dump when lateral view above union node and have predicate
2023-03-22 11:24:45 +08:00
17a1ce5ed3 [fix](nereids) add a project node above sort node to eliminate unused order by keys (#17913)
if the order by keys are not simple slot in sort node, the order by exprs have to been added to sort node's output tuple. In that case, we need add a project node above sort node to eliminate the unused order by exprs. for example:

```sql
WITH t0 AS 
    (SELECT DATE_FORMAT(date,
         '%Y%m%d') AS date
    FROM cir_1756_t1 ), t3 AS 
    (SELECT date_format(date,
         '%Y%m%d') AS `date`
    FROM `cir_1756_t2`
    GROUP BY  date_format(date, '%Y%m%d')
    **ORDER BY  date_format(date, '%Y%m%d')** )
SELECT t0.date
FROM t0
LEFT JOIN t3
    ON t0.date = t3.date;
```

before:
```
+--------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                   |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject[159] ( distinct=false, projects=[date#1], excepts=[], canEliminate=true )                                                         |
| +--LogicalJoin[158] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(date#1 = date#3)], otherJoinConjuncts=[] ) |
|    |--LogicalProject[151] ( distinct=false, projects=[date_format(date#0, '%Y%m%d') AS `date`#1], excepts=[], canEliminate=true )                |
|    |  +--LogicalOlapScan ( qualified=default_cluster:bugfix.cir_1756_t1, indexName=cir_1756_t1, selectedIndexId=412339, preAgg=ON )              |
|    +--LogicalSort[157] ( orderKeys=[date_format(cast(date#3 as DATETIME), '%Y%m%d') asc null first] )                                            |
|       +--LogicalAggregate[156] ( groupByExpr=[date#3], outputExpr=[date#3], hasRepeat=false )                                                    |
|          +--LogicalProject[155] ( distinct=false, projects=[date_format(date#2, '%Y%m%d') AS `date`#3], excepts=[], canEliminate=true )          |
|             +--LogicalOlapScan ( qualified=default_cluster:bugfix.cir_1756_t2, indexName=cir_1756_t2, selectedIndexId=412352, preAgg=ON )        |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
```

after:
```
+--------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                   |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject[171] ( distinct=false, projects=[date#2], excepts=[], canEliminate=true )                                                         |
| +--LogicalJoin[170] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(date#2 = date#4)], otherJoinConjuncts=[] ) |
|    |--LogicalProject[162] ( distinct=false, projects=[date_format(date#0, '%Y%m%d') AS `date`#2], excepts=[], canEliminate=true )                |
|    |  +--LogicalOlapScan ( qualified=default_cluster:bugfix.cir_1756_t1, indexName=cir_1756_t1, selectedIndexId=1049812, preAgg=ON )             |
|    +--LogicalProject[169] ( distinct=false, projects=[date#4], excepts=[], canEliminate=false )                                                  |
|       +--LogicalSort[168] ( orderKeys=[date_format(cast(date#4 as DATETIME), '%Y%m%d') asc null first] )                                         |
|          +--LogicalAggregate[167] ( groupByExpr=[date#4], outputExpr=[date#4], hasRepeat=false )                                                 |
|             +--LogicalProject[166] ( distinct=false, projects=[date_format(date#3, '%Y%m%d') AS `date`#4], excepts=[], canEliminate=true )       |
|                +--LogicalOlapScan ( qualified=default_cluster:bugfix.cir_1756_t2, indexName=cir_1756_t2, selectedIndexId=1049825, preAgg=ON )    |
+--------------------------------------------------------------------------------------------------------------------------------------------------+
```
2023-03-22 11:19:32 +08:00
6cbf393665 [enhance](meta action) remove useless pb field and refactor writer cooldown meta code (#17652) 2023-03-22 11:13:13 +08:00
d4ca7cb57a [chore](macOS) Specify the version of LLVM for Homebrew to install it (#17945)
Clang 16 was released last week and we haven't ported the codebase to it. If Homebrew bumped the version of LLVM, our workflows would fail.
2023-03-22 11:09:28 +08:00
f600f70619 [ehancement](fe) Tune for stats framework (#17860) 2023-03-22 11:07:56 +08:00
173d68409c [enhencement](planner) update and delete support use alias for target table (#17914) 2023-03-22 11:07:39 +08:00
b91a3b5a72 [fix](planner) should not bind slot on brother's tuple in subquery (#17813)
consider the query like this:
```sql
SELECT
    k3, k4
FROM
    test
WHERE
    EXISTS( SELECT
            d.*
        FROM
            (SELECT
                k1 AS _1234, SUM(k2)
            FROM
                `test` d
            GROUP BY _1234) d
                LEFT JOIN
            (SELECT
                k1 AS _1234,
                    SUM(k2)
            FROM
                `test`
            GROUP BY _1234) temp ON d._1234 = temp._1234) 
ORDER BY k3, k4
```

when we analyze group by exprs in `temp` inline view. we bind the `_1234` on `d._1234` by mistake.
that because, when we do analyze in a **SUB-QUERY**, we will resolve SlotRef by itself **AND** parent's tuple. in the meanwhile, we register child's tuple to parent's analyzer. So, in a **SUB-QUERY**, the brother's tuple will affect the resolve result of current inlineview's slot.

This PR:

1. add a flag on the function `resolveColumnRef` in `Analyzer`
```java
private TupleDescriptor resolveColumnRef(String colName, boolean requestFromChild);
private TupleDescriptor resolveColumnRef(TableName tblName, String colName, boolean requestByChild);
``` 

2. add a flag to specify whether the tuple is from child.
```java
// alias name -> <from child, tupleDesc>
private final Multimap<String, Pair<Boolean, TupleDescriptor>> tupleByAlias;
```

when `requestByChild == true`, we **SKIP** the tuple from other child to avoid resolve error.
2023-03-22 11:00:55 +08:00
a4b151e469 [fix](planner) should always execute projection plan (#17885)
1. should always execute projection plan, whatever the statement it is.
2. should always execute projection plan, since we only have vectorized engine now
2023-03-22 10:53:15 +08:00
6fa239384d [refactor](Nereids) remove tabletPruned flag in LogicalOlapScan. (#17983) 2023-03-22 10:45:14 +08:00
7fd0ec7d17 [Bug](float) fix wrong value when enable fold constant by BE (#17901) 2023-03-22 09:51:03 +08:00
7ddba7bf54 [fix](multi-catalog) when checkProperties failed,will have dirty data (#17877) 2023-03-22 09:40:07 +08:00
213735b5fb [doc](auth)ranger doc (#17927) 2023-03-22 09:38:54 +08:00
545d3b1c3e [Enhancement](auth)support ranger col priv (#17915)
1.When querying data, it is no longer necessary to verify the permissions of the entire table, but rather to verify the 
permissions of the queried columns. Currently, the 'ranger' already supports column permissions, and the internal 
catalog provides the implementation of dummy column permissions (the actual verified permissions are still table 
permissions)

2.delete roles in userIdentity

3.Change trigger logic of initAccessController
2023-03-22 09:00:17 +08:00
8df4a94826 [fix](MTMV) Tasks leak when dropping job (#17984)
1. Divide MTMV regression tests into 4 suites
2. Try to remove tasks which were killed by dropping job actions in running map.
2023-03-21 23:22:17 +08:00
cb79e42e5c [refactor](file-system)(step-1) refactor file sysmte on BE and remove storage_backend (#17586)
See #17764 for details
I have tested:
- Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp
- Outfile to local/s3/hdfs/broker.
- Load from local/s3/hdfs/broker.
- Query file on local/s3/hdfs/broker file system, with table value function and catalog.
- Backup/Restore with local/s3/hdfs/broker file system

Not test:
- cold & host data separation case.
2023-03-21 21:08:38 +08:00
82716ec99d [fix](Nereids) type coercion for subquery (#17661)
Complete the type coercion of the subquery in the function Binder process.

Expressions generated when subqueries are nested are uniformly converted to implicit types in the analyze stage.
Method: Add a typeCoercionExpr field to the subquery expression to store the generated cast information.

Fix scenario where scalarSubQuery handles arithmetic expressions when implicitly converting types
2023-03-21 20:38:06 +08:00
4193884a32 [feature](array_zip) Support array_zip function (#17696) 2023-03-21 18:44:30 +08:00
861d9c985c [refactor](Nereids): refactor Join Reorder Rule. (#17809) 2023-03-21 16:12:07 +08:00
7754619e2b [fix](quit) be can not quit cleanly due to deadlock (#17971) 2023-03-21 12:52:48 +08:00
61366b21aa [regression-test](merge-on-write) Optimize merge-on-write case execution time (#17956) 2023-03-21 12:49:42 +08:00
ed7c880e18 [fix](Nereids) should turn off parallel scan when do local finalize agg (#17961) 2023-03-21 11:55:35 +08:00
1f569b7a7d [enhancement](topn explain) display explain two phase read more precise (#17946) 2023-03-21 10:53:47 +08:00
4023670f35 [BugFix](DOE) Add http prefix when it's not set in hosts properties. (#17745)
* Add http prefix when it's not set in hosts properties
2023-03-21 10:08:20 +08:00
6c8ed9135d [fix](truncate) fix unable to truncate table due to wrong storage medium (#17917)
When setting FE config default_storage_medium to SSD, and set all BE storage path as SSD.
And table will be stored with storage medium SSD.
But there is a FE config storage_cooldown_second and its default value is 30 days.
So after 30 days, the storage medium of table will be changed to HDD, which is unexpected.

This PR removes the storage_cooldown_second, and use a max value to set the cooldown time of SSD
storage medium when the default_storage_medium is SSD.
2023-03-21 10:04:47 +08:00
656b01d191 [fix](agg) Avoid reusing a non-nullable column that has been converted to nullable within a block (#17944)
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:420
 1# os::Linux::chained_handler(int, siginfo*, void*) in /usr/local/java/jdk1.8.0_202/jre/lib/amd64/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/local/java/jdk1.8.0_202/jre/lib/amd64/server/libjvm.so
 3# signalHandler(int, siginfo*, void*) in /usr/local/java/jdk1.8.0_202/jre/lib/amd64/server/libjvm.so
 4# 0x00007F4051C9F400 in /lib64/libc.so.6
 5# memcpy at /root/doris/be/src/glibc-compatibility/memcpy/memcpy_x86_64.cpp:219
 6# doris::vectorized::ColumnString::deserialize_and_insert_from_arena(char const*) at /root/doris/be/src/vec/columns/column_string.cpp:226
 7# doris::vectorized::ColumnString::deserialize_vec_with_null_map(std::vector<StringRef, std::allocator<StringRef> >&, unsigned long, unsigned char const*) at /root/doris/be/src/vec/columns/column_string.cpp:283
 8# void doris::vectorized::AggregationNode::_serialize_with_serialized_key_result(doris::RuntimeState*, doris::vectorized::Block*, bool*)::{lambda(auto:1&&)#1}::operator()<doris::vectorized::AggregationMethodSerialized<PHHashMap<StringRef, char*, DefaultHash<StringRef, void>, false> >&>(doris::vectorized::AggregationMethodSerialized<PHHashMap<StringRef, char*, DefaultHash<StringRef, void>, false> >&) const at /root/doris/be/src/vec/exec/vaggregation_node.cpp:1232
 9# doris::vectorized::AggregationNode::_serialize_with_serialized_key_result(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/vec/exec/vaggregation_node.cpp:1294
10# std::_Function_handler<doris::Status (doris::RuntimeState*, doris::vectorized::Block*, bool*), std::_Bind_result<doris::Status, doris::Status (doris::vectorized::AggregationNode::*(doris::vectorized::AggregationNode*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>))(doris::RuntimeState*, doris::vectorized::Block*, bool*)> >::_M_invoke(std::_Any_data const&, doris::RuntimeState*&&, doris::vectorized::Block*&&, bool*&&) at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:293
11# doris::vectorized::AggregationNode::get_next(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/vec/exec/vaggregation_node.cpp:508
12# doris::ExecNode::get_next_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/exec/exec_node.cpp:852
13# doris::PlanFragmentExecutor::get_vectorized_internal(doris::vectorized::Block**) at /root/doris/be/src/runtime/plan_fragment_executor.cpp:352
14# doris::PlanFragmentExecutor::open_vectorized_internal() at /root/doris/be/src/runtime/plan_fragment_executor.cpp:300
15# doris::PlanFragmentExecutor::open() at /root/doris/be/src/runtime/plan_fragment_executor.cpp:253
16# doris::FragmentExecState::execute() at /root/doris/be/src/runtime/fragment_mgr.cpp:251
17# doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>) at /root/doris/be/src/runtime/fragment_mgr.cpp:498
18# std::_Function_handler<void (), doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::PlanFragmentExecutor*)>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:291
19# doris::ThreadPool::dispatch_thread() at /root/doris/be/src/util/threadpool.cpp:542
20# doris::Thread::supervise_thread(void*) at /root/doris/be/src/util/thread.cpp:455
21# start_thread in /lib64/libpthread.so.0
22# clone in /lib64/libc.so.6
2023-03-21 09:00:06 +08:00
7b93c17364 [Bug][Fix] regexp function core dump DCHECK failed and error result (#17953)
CREATE TABLE `test` (
`name` varchar(64) NULL,
`age` int(11) NULL
) ENGINE=OLAP
DUPLICATE KEY(`name`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`name`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2",
"disable_auto_compaction" = "false"
);
insert into `test` values ("lemon",1),("tom",2);

select a.name regexp concat('^', a.name) from test a;
2023-03-21 08:56:19 +08:00
a73524af49 [fix](regression-test) print real and expect rows when fail in exception (#17949) 2023-03-21 08:52:04 +08:00
9021d2ae11 [fix](fragment mgr) fix the bug that g_fragmentmgr_prepare_latency update uncorrectly #17909 2023-03-21 08:50:38 +08:00
11a0ae9a87 [fix](ctas) fix show load throw NPE after ctas (#17937)
Missing userinfo

java.lang.NullPointerException: null
        at org.apache.doris.load.loadv2.LoadJob.getShowInfo(LoadJob.java:816) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.load.loadv2.LoadManager.getLoadJobInfosByDb(LoadManager.java:557) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.ShowExecutor.handleShowLoad(ShowExecutor.java:1094) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.ShowExecutor.execute(ShowExecutor.java:280) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.StmtExecutor.handleShow(StmtExecutor.java:1862) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:619) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:435) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:414) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.ConnectProcessor.dispatch(ConnectProcessor.java:558) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.ConnectProcessor.processOnce(ConnectProcessor.java:799) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[doris-fe.jar:1.2-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
2023-03-21 08:49:51 +08:00
bae9d8d7f2 [Feature-Wip](MySQL LOAD)Add trim quotes property for mysql load (#17775)
Add trim quotes property for mysql load to trim double quotes in the load files.
2023-03-21 00:32:58 +08:00
5cac64413a [Feature](ES): Support es get alias field type. (#17783)
Support es get alias field type.
2023-03-21 00:32:24 +08:00
dc284b62d9 [vectorized](function) support array_filter function (#17832) 2023-03-20 23:18:10 +08:00
8ffc85b6ff [fix](planner)project should be done inside inlineview (#17831)
* [fix](planner)project should be done inside inlineview

* add src column for slots in scan node's output tuple
2023-03-20 21:12:45 +08:00
3661ca4510 [Bug](regression-test) Fix the collect_set/list size limit result failed (#17963)
fix regression test failed in fuzzy mode:collect_set
2023-03-20 21:00:41 +08:00
Pxl
a92115f709 [Bug](materialized-view) fix select mv rollback fail on left join (#17850)
fix select mv rollback fail on left join
2023-03-20 19:14:17 +08:00
b4634342aa [bug](txn) fix concurrent txns's status data race when aborting txn (#17893) 2023-03-20 17:55:03 +08:00
bd8e3e6405 [refactor](date) unify DateTimeValue and VecDateTimeValue (#17670) 2023-03-20 16:27:08 +08:00
4ef7839567 [doc](auth)auth doc version (#17943) 2023-03-20 13:24:00 +08:00
Pxl
45232d65a6 [Chore](case) remove load big lateral view from p1 to p2 (#17851)
move load big lateral view from p1 to p2, this case takes a long time to execute
2023-03-20 13:10:12 +08:00
223d7a36eb adjust distribution stats derive, fix bug in join estimation (#17916) 2023-03-20 13:04:29 +08:00
93cfd5cd2b [Enhance](ComputeNode)support k8s watch (#17442)
Describe your changes.

1.Add the watch mechanism to listen for changes in k8s statefulSet and update nodes in time.
2.For broker, there is only one name by default when using deployManager
3.Refactoring code makes it easier to understand and maintain
4.Fix jar package conflicts between okhttp-ws and okhttp

Previously, the logic of k8sDeployManager.getGroupHostInfos was to call the endpoints () interface of k8s,
which would cause if the pod was unexpectedly restarted, k8sDeployManager would delete the pod before the
restart from the fe or be list and add the pod after the restart to the fe or be list, which obviously does not
meet our expectations.
Now, after fqdn is enabled, we call the statefulSets() interface of k8s to listen for the number of copies to
determine whether we need to be online or offline.
In addition, the watch mechanism is added to avoid the possible A-B-A problem caused by timed polling.
For the sake of stability, when the watch mechanism does not receive messages for a period of time,
it will be degraded to the polling mode.

Now several environment variables have been added,ENV_FE_STATEFULSET,ENV_FE_OBSERVER_STATEFULSET,ENV_BE_STATEFULSET,ENV_BROKER_STATEFULSET,ENV_CN_STATEFULSET For statefulsetName,One-to-one correspondence with ENV_FE_SERVICE,ENV_FE_OBSERVER_SERVICE,ENV_BE_SERVICE,ENV_BROKER_SERVICE,ENV_CN_SERVICE,If a serviceName is configured, the corresponding statefulsetName must be configured, otherwise the program cannot be started.
2023-03-20 11:36:32 +08:00
06095054ab [doc](readme)correct some links (#17938) 2023-03-20 09:58:03 +08:00
ff09daaae3 [fix](test) add order by to test_alter_table_column (#17936) 2023-03-20 09:07:43 +08:00
378789ba8a [Fix](parquet-reader) Fix dict_filter crashed caused by VDirectInPredicate checking expr result is not nullable. (#17924)
Be crashed in parquet dict_filter function caused by VDirectInPredicate checking expr result is not nullable.
2023-03-20 00:02:59 +08:00
5c990fb737 [fix](nereids) Analyze failed for SQL that has count distinct with same col (#17928)
This problem is caused by the slots with same hashcodes was put in the hashset results into the wrong rules was selected.Use list instead of set as return type of getDistinctArguments method
2023-03-19 21:31:47 +08:00
74dfdc00dc [nerids](statistics) remove lock in statistics cache loader #17833
remove the redandunt lock in the CacheLoader, since it use the forkjoinpool in default
Add execute time log for collect stats
Avoid submit duplicate task, when there already has a task to load for the same column
2023-03-19 20:30:21 +08:00
bf814411d8 [fix](tests) add order by for test_multiply and test_schema_change_with_delete (#17929) 2023-03-19 17:57:29 +08:00
1d26b4d6c2 [improvement](predicate) Cache the dict code in ComparisonPredicate (#17684) 2023-03-19 17:37:28 +08:00