Commit Graph

5755 Commits

Author SHA1 Message Date
03cb69c0ee [feature](backup-restore) Add local backup/restore not upload/download by broker (#20492) 2023-06-07 21:35:15 +08:00
09344eaab5 [feature](load) introduce single-stream-multi-table load (#20006)
For routine load (Kafka load), users can produce the data for different
tables into a single topic, and Doris will dispatch each record to the corresponding
table.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-06-07 17:55:25 +08:00
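A rough Python sketch of the dispatch idea described in 09344eaab5 above. The message does not state the wire format, so the `table|row` layout, the `dispatch` function, and the delimiter are assumptions made only for illustration, not the Doris implementation.

```python
from collections import defaultdict


def dispatch(messages, delimiter="|"):
    """Group messages from one stream by their target table.

    Assumes each message looks like "<table><delimiter><row>"; this layout
    is hypothetical and chosen only to illustrate the idea.
    """
    tables = defaultdict(list)
    for message in messages:
        # Split on the first delimiter only, so the row may contain it too.
        table, sep, row = message.partition(delimiter)
        if not sep or not table:
            raise ValueError(f"message has no table prefix: {message!r}")
        tables[table].append(row)
    return dict(tables)


topic = ["orders|1,apple", "users|7,alice", "orders|2,pear"]
for table, rows in dispatch(topic).items():
    print(table, rows)
```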
Pxl
fbbf4c420e [Bug](Agg-State) fix agg state function get wrong input argument list (#20546)
fix agg state function get wrong input argument list
2023-06-07 17:32:48 +08:00
c910e9b78b [doc](disk)fix disk capacity doc error (#20506) 2023-06-07 15:20:04 +08:00
6b325a8458 [fix](Nereids): union output can be Alias<Slot> (#20532) 2023-06-07 15:11:07 +08:00
cd70c37402 [fix](nereids) filter and project node should be pushed down through cte (#20508)
1.move PushdownFilterThroughCTEAnchor and PushdownProjectThroughCTEAnchor into PUSH_DOWN_FILTERS rule set
2.move PushdownFilterThroughProject before MergeProjectPostProcessor
2023-06-07 10:36:32 +08:00
b83039de76 [fix](stats) Make alter column stats no forward (#20501)
This is for testing convenience, since queries in the daily regression tests may be sent to any FE rather than to the master only.
2023-06-07 10:14:44 +08:00
b65094c8df [Improvement](multi-catalog) paimon supports projection push down (#20522)
Co-authored-by: hugoluo <hugoluo@tencent.com>
2023-06-07 00:39:08 +08:00
c991249360 [enhancement](cooldown) use cooldown replica first when generating scan node (#20384) 2023-06-06 22:15:49 +08:00
82cf76f92b [fix](Nereids) join condition not extract as conjunctions (#20498) 2023-06-06 20:34:19 +08:00
05bdbce8fc [Feature](Nereids) support update unique table statement (#20313) 2023-06-06 20:32:43 +08:00
0c6292abaa [fix](stats) skip forbid_unknown_col_stats check for invisible column and internal db (#20362)
1. skip forbidUnknownColStats check for invisible columns
2. use columsStatistics.isUnknown to tell whether the stats are unknown
3. skip unknown stats check for internal schema
2023-06-06 19:07:33 +08:00
a569d371b3 [fix](Nereids) give clean error message when there are subquery in the on clause (#20211)
Add a rule for checking the join node in the `analysis/CheckAnalysis.java` file. When we check the join node, we should also check its ON clause. If it contains a subquery expression, we should throw an exception.

Before this PR
```
mysql> select a.k1 from baseall a join test b on b.k2 in (select 49);
ERROR 1105 (HY000): errCode = 2, detailMessage = Unexpected exception: nul
```

After this PR
```
mysql> select a.k1 from baseall a join test b on b.k2 in (select 49);
ERROR 1105 (HY000): errCode = 2, detailMessage = Unexpected exception: Not support OnClause contain Subquery, expr:k2 IN (INSUBQUERY) (LogicalOneRowRelation ( projects=[49 AS `49`#28], buildUnionNode=true ))
```
2023-06-06 16:50:20 +08:00
b1a8bb28f7 [Fix](WorkloadGroup)Fix query queue nereids bug #20484 2023-06-06 16:44:35 +08:00
48021366bf [fix](load) fix unified load redirect status delegate error (#20467) 2023-06-06 15:46:48 +08:00
13f1b90768 [Fix] (tablet) fix tablet queryable set (#20413) (#20414) 2023-06-06 15:38:01 +08:00
fe63a0a3bb [Feature](multi-catalog)support paimon catalog (#19681)
CREATE CATALOG paimon_n2 PROPERTIES (
"dfs.ha.namenodes.HDFS1006531" = "nn2,nn1",
"dfs.namenode.rpc-address.HDFS1006531.nn2" = "172.16.65.xx:4007",
"dfs.namenode.rpc-address.HDFS1006531.nn1" = "172.16.65.xx:4007",
"hive.metastore.uris" = "thrift://172.16.65.xx:7004",
"type" = "paimon",
"dfs.nameservices" = "HDFS1006531",
"hadoop.username" = "hadoop",
"paimon.catalog.type" = "hms",
"warehouse" = "hdfs://HDFS1006531/data/paimon1",
"dfs.client.failover.proxy.provider.HDFS1006531" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
);
2023-06-06 15:08:30 +08:00
ae428c29e2 [feature](planner)(nereids) support user defined variable (#20334)
Support user-defined variables.
After this PR, we can use `set @a = xx` to define a user variable and use it in the query like `select @a`.

The changes in this PR:
1. Support the grammar for `set user variable` in the parser.
2. Add the `userVars` in `VariableMgr` to store the user-defined variables.
3. For the `set @a = xx`, we will store the variable name and its value in the `userVars` in `VariableMgr`.
4. For the `select @a`, we will get the value for the variable name in `userVars`.
2023-06-06 14:35:16 +08:00
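A minimal Python sketch of the store-and-look-up flow listed in ae428c29e2 above. The `VariableStore` class and its method names are hypothetical stand-ins for `userVars` in `VariableMgr`. Case-insensitive names and NULL for unset variables follow MySQL's behaviour and are assumptions as far as Doris is concerned.

```python
class VariableStore:
    """Toy stand-in for the `userVars` map (hypothetical, not Doris code)."""

    def __init__(self):
        self._user_vars = {}

    def set_var(self, name, value):
        # Corresponds to `set @a = xx`: remember the value under the name.
        # Assumption: names are case-insensitive, as in MySQL.
        self._user_vars[name.lower()] = value

    def get_var(self, name):
        # Corresponds to `select @a`: look the name up.
        # Assumption: an unset variable reads as NULL (None here).
        return self._user_vars.get(name.lower())


store = VariableStore()
store.set_var("a", 10)
print(store.get_var("a"))
print(store.get_var("not_set"))
```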
0fce7b9011 [fix](http) Let the sdk find the httpclient package determined (#20205) 2023-06-06 14:20:38 +08:00
1f032a551d [Improve](array-functions) support array first function (#20397)
add array_first(lambda, [1,2,3,null]) function for doris
2023-06-06 12:08:46 +08:00
1b94b6368f [fix](load) in strict mode, return error for insert if datatype convert fails (#20378)
* [fix](load) in strict mode, return error for load and insert if datatype convert fails

Revert "[fix](MySQL) the way Doris handles boolean type is consistent with MySQL (#19416)"

This reverts commit 68eb420cabe5b26b09d6d4a2724ae12699bdee87.

It changed other behaviours: e.g. in strict mode, insert into t_int values ("a")
results in 0 being inserted into the table, but it should return an error instead.

* fix be ut

* fix regression tests
2023-06-06 12:04:03 +08:00
e553615a27 [opt](Nereids) perfer use datev2 / datetimev2 in date related functions (#20224)
1. Update the signature order of all date-related functions.
1.1. If the return value needs to be computed with time info, signatures with datetimev2 args are at the top of the list, followed by datev2, datetime and date.
1.2. If the return value needs to be computed with only date info, signatures with datev2 args are at the top of the list, followed by datetimev2, date and datetime.
2. Prefer datev2 when we must cast date to datev2 or to datetime/datetimev2.
2023-06-06 11:42:29 +08:00
c56eddbfa9 [bug](jdbc) fix trino date/datetime filter (#20443)
When querying Trino's JDBC catalog, if our WHERE filter condition is k1 >= '2022-01-01', this format is incorrect. 
In Trino, the correct format should be k1 >= date '2022-01-01' or k1 >= timestamp '2022-01-01 00:00:00'. 
Therefore, the date string in the WHERE condition needs to be converted to the date or timestamp format supported by Trino.
2023-06-06 11:20:42 +08:00
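A minimal Python sketch of the literal rewrite described in c56eddbfa9 above. The helper names `to_trino_literal` and `build_filter` are hypothetical, and the sketch only recognises the two string shapes quoted in the message. It is not the Doris code.

```python
import re

# Hypothetical helpers for illustration only.
_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
_DATETIME = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$")


def to_trino_literal(value):
    """Return a typed literal for date-like strings, else a plain string literal."""
    if _DATE.match(value):
        return f"date '{value}'"
    if _DATETIME.match(value):
        return f"timestamp '{value}'"
    # Anything else is left as an ordinary quoted string.
    return f"'{value}'"


def build_filter(column, op, value):
    """Assemble a simple `column op literal` predicate."""
    return f"{column} {op} {to_trino_literal(value)}"


print(build_filter("k1", ">=", "2022-01-01"))
print(build_filter("k1", ">=", "2022-01-01 00:00:00"))
```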
d02737a293 [feature](struct-type) support struct_element function (#19045)
This commit adds a function that returns a field column from a named struct column.
Since the function can return any type, this commit also supports ANY_STRUCT_TYPE
and ANY_ELEMENT_TYPE.
2023-06-06 10:44:08 +08:00
f839c90c27 [fix][refactor](backend-policy)(compute) refactor the hierarchy of external scan node and fix compute node bug #20402
There should be 2 kinds of ScanNode:

1. OlapScanNode
2. ExternalScanNode

The Backends used for ExternalScanNode should be controlled by FederationBackendPolicy.
But currently, only FileScanNode is controlled by FederationBackendPolicy; other scan nodes such as MysqlScanNode and
JdbcScanNode use Mix Backends even if we enable and prefer Compute Backends.

In this PR, I modified the hierarchy of ExternalScanNode. The new hierarchy is:

ScanNode
    OlapScanNode
    SchemaScanNode
    ExternalScanNode
        MetadataScanNode
        DataGenScanNode
        EsScanNode
        OdbcScanNode
        MysqlScanNode
        JdbcScanNode
        FileScanNode
            FileLoadScanNode
            FileQueryScanNode
                MaxComputeScanNode
                IcebergScanNode
                TVFScanNode
                HiveScanNode
                    HudiScanNode
Previously, the BackendPolicy was a member of FileScanNode; now it is moved to ExternalScanNode,
so that all subtypes of ExternalScanNode can use BackendPolicy to choose Compute Backends to execute the query.

All ExternalScanNode subclasses should implement the abstract method createScanRangeLocations().

For scan nodes such as the JDBC scan node and the MySQL scan node, the scan range locations are selected randomly from
the compute nodes (if preferred).

As for compute node selection: if all scan nodes are external scan nodes and prefer_compute_node_for_external_table
is set to true, only compute nodes are selected as the BEs for this query.
2023-06-06 10:35:30 +08:00
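The selection rule described in f839c90c27 above can be sketched in Python as follows. `Backend`, `candidate_backends`, and `pick_scan_location` are hypothetical names. Falling back to all backends when no compute node exists is an assumption that the message does not state.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    host: str
    is_compute_node: bool


def candidate_backends(backends, scan_nodes_are_external, prefer_compute_node):
    """Return the backends a query may use.

    Only compute nodes are returned when every scan node is external and the
    preference flag is on. Assumption: if no compute node exists, fall back
    to all backends.
    """
    if prefer_compute_node and all(scan_nodes_are_external):
        compute = [b for b in backends if b.is_compute_node]
        if compute:
            return compute
    return list(backends)


def pick_scan_location(backends, prefer_compute_node, rng=random):
    """Pick one backend at random for a JDBC/MySQL-style external scan node."""
    pool = candidate_backends(backends, [True], prefer_compute_node)
    return rng.choice(pool)


cluster = [Backend("be1", False), Backend("cn1", True), Backend("cn2", True)]
print(pick_scan_location(cluster, prefer_compute_node=True).host)
```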
b7fc17da68 [feature-wip](multi-catalog)(step2)support read max compute data by JNI (#19819)
Issue Number: #19679
2023-06-05 22:10:08 +08:00
fac0b50f56 [Fix](Planner)fix cast date/datev2/datetime to float/double return null. (#20008) 2023-06-05 19:06:50 +08:00
92721c84d3 [improve](nereids)derive analytics node stats (#20340)
1. Derive analytic node stats; add support for rank().
2. Update the stats derivation for filter estimation: update the row count of the filter column.
3. Use ColumnStatistics.orginal to replace ColumnStatistics.orginalNdv, where ColumnStatistics.orginal is the column statistics obtained from TableScan.
TPCDS 70 on tpcds_sf100 improved from 23 sec to 2 sec.
This PR causes no performance regression on other TPC-DS and TPC-H queries.
2023-06-05 18:56:20 +08:00
c7dd7c2eba Fix query hang when using queue (#20434) 2023-06-05 18:12:26 +08:00
7d11db0807 [fix](Nereids) throw NPE when sql cannot be parsed by all planner (#20440) 2023-06-05 17:49:08 +08:00
bc65e9b5fb [fix](MTMV) Support star expressions in select list (#20355) 2023-06-05 17:06:05 +08:00
9d39fd7aae [fix](Nereids): fix filter can't be pushdown unionAll (#20310) 2023-06-05 16:56:25 +08:00
f0b0bda04a [Fix](Nereids) Fix duplicated name in view does not throw exception (#20374)
When using Nereids, if the output of a view contains duplicated names, we need to throw an exception. A check rule was added to the bindExpression rule set.
2023-06-05 16:10:54 +08:00
a66d5a6ae0 [fix](workload-group) fix workload group non-existence error (#20428) 2023-06-05 15:33:26 +08:00
fe942eaf44 [Fix](Nereids) Fix minidump using put all of hashmap (#20268)
The minidump file tries to capture as much information as possible, but when the switch is off, these methods should not be called after the refactoring in PR #20049. Other places that do extra work since the Minidump feature was added were also checked.
2023-06-05 13:05:15 +08:00
0dc6d3a568 [fix](nereids) avg size of column stats always be 0 (#20341)
It takes a lot of effort to compute avgSizeByte for column stats,
so we use schema information to avoid computing the actual average size.
2023-06-05 13:01:58 +08:00
cd0379df4e [fix](nereids) select with specified partition name is not work as expected (#20269)
This PR fixes the issue with selecting a specific partition: some code related to this feature was accidentally deleted.
2023-06-05 12:48:54 +08:00
3c28a71378 [fix](dynamic partition) partition create failed after alter distributed column (#20239)
This PR fixes the following two problems:

Problem 1: Altering a column comment makes adding a dynamic partition fail (issue #10811). Steps to reproduce:

1. create a table with a dynamic partition policy;
2. restart FE;
3. alter the comment of the distribution column;
4. alter dynamic_partition.end to trigger the dynamic partition scheduler to add a new partition.

Then we get the following error log, and creation of the new partition fails:
dynamic add partition failed: errCode = 2, detailMessage =      Cannot assign hash distribution with different distribution cols. default is: [id int(11) NULL COMMENT 'new_comment_of_id'], db: default_cluster:example_db, table: test_2

Problem 2: Renaming a distribution column makes inserts into old partitions fail (#20405).

The key point of the reproduction steps is restarting FE.

All versions seem to be affected, including master, lts-1.1 and so on.
2023-06-05 12:20:50 +08:00
a6d8115cbc [Improvement](planner) expand sql-block-rule to make it can be used on all kinds of sql stmt (#19540)
Currently, sql-block-rule can only be used for query statements, although it is also useful for other statements such as insert / delete / alter / drop. This PR removes the limitation and extends its usage scenarios.
2023-06-05 11:01:43 +08:00
660ab34147 [fix](multicatalog) support read from hive table with HIVE_UNION_SUBDIR in path location (#20329) 2023-06-05 11:01:24 +08:00
12f89b879f [fix](stats) Analysis info lost after checkpoint (#20412)
1. Implement write/read for AnalysisManager.
2. If a database or table had any column of a complex type, the analyze stmt would fail directly. This PR makes it ignore complex-type columns and analyze the rest of them.
2023-06-05 10:51:02 +08:00
c6387847aa [fix](nereids) change defaultConcreteType function's return value for decimal (#20380)
1. add default decimalv2 and decimalv3 for NullType
2. change defaultConcreteType of decimalv3 to this
2023-06-05 10:50:07 +08:00
59a0f80233 [Improve](array-function)Improve array function intersect (#20085)
Currently the array function supports only 2 arrays, but the intersect operator can support more than 2 arrays.
2023-06-05 10:38:48 +08:00
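As a rough Python illustration of extending a two-array intersect to any number of arrays (59a0f80233 above), the pairwise operation can be folded across the inputs. The `array_intersect` name is reused only for readability. Keeping first-seen order and dropping duplicates are assumptions of this sketch, not documented Doris semantics.

```python
from functools import reduce


def array_intersect(*arrays):
    """Intersect two or more arrays by folding a pairwise intersect.

    Assumption: the result keeps the order of the first array and contains
    each element once.
    """
    if len(arrays) < 2:
        raise ValueError("array_intersect needs at least two arrays")

    def intersect_two(left, right):
        right_set = set(right)
        seen = set()
        out = []
        for item in left:
            if item in right_set and item not in seen:
                seen.add(item)
                out.append(item)
        return out

    return reduce(intersect_two, arrays)


print(array_intersect([1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 6]))
```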
Pxl
8e39f0cf6b [Enchancement](Agg State) storage function name and result is nullable in agg state type (#20298)
Store the function name and whether the result is nullable in the agg state type.
2023-06-04 22:44:48 +08:00
ad5e34ab9c [Doc](statistics) supplement stats doc (regression test and automatic collection) (#20071) 2023-06-03 17:25:33 +08:00
77855fcd43 [fix](inverted index) fix transaction id changed when light index change (#20302) 2023-06-03 16:05:02 +08:00
ffadaa4935 [improvement](inverted index) skip write index on load and generate index on compaction (#20325) 2023-06-03 16:03:21 +08:00
6958a8f92f [fix](dynamic_partition) fix dead lock when modify dynamic partition property for olap table (#20390)
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-06-03 08:25:20 +08:00
299c3dc396 [fix](Nereids) should not inherit child's limit and offset when generate exchange node (#20373)
In the legacy planner, a newly created exchange node inherits its child's limit and offset.
In Nereids we should not do this, because if a limit or offset is needed, we set it manually.
This PR uses a new constructor of ExchangeNode to ensure that limit and offset are not set unexpectedly.
2023-06-02 19:55:33 +08:00
a8e0841ef1 [fix](workload-group) fix incorrect memoryLimitPercent value (#20377) 2023-06-02 18:57:57 +08:00