2736 Commits

Author SHA1 Message Date
b98a3ed86c [fix](frontend) fix notify update storage policy agent task null exception #12470 2022-09-13 16:20:11 +08:00
dc80a993bc [feature-wip](new-scan) New load scanner. (#12275)
Related pr:
https://github.com/apache/doris/pull/11582
https://github.com/apache/doris/pull/12048

Use the new file scan node and the new scheduling framework to run load jobs, replacing the old broker scan node.
The load part (BE side) is still work in progress. The query part (FE side) has been tested with the TPC-H benchmark.

Please review only the FE code in this PR; the BE code is disabled by the enable_new_load_scan_node configuration. Another PR will follow soon to fix the BE-side code.
2022-09-13 13:36:34 +08:00
5b4d3616a4 [feature](Nereids): semi join transpose. (#12515)
* [feature](Nereids): semi join transpose.

* fix conditionChecker and check LAsscom
2022-09-13 13:32:47 +08:00
d35a8a24a5 [feature](nereids) push down Project through Limit (#12490)
This rule rewrites project -> limit into limit -> project. The reason is that we can get a tree like project -> limit -> project -> other node. If we do not rewrite it, we cannot merge the two projects into one. And if there is more than one project on a node, the second one overwrites the first during translation. BE then core dumps or returns a "slot cannot be found" error.
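The rewrite described above can be sketched on a toy plan tree. This is only an illustration: `Node`, `rewrite`, and `describe` are hypothetical names rather than Nereids classes, and projection lists are left out for brevity.

```java
class PushProjectThroughLimitSketch {
    /** Hypothetical single-child plan node; kind is "project", "limit", or "scan". */
    static final class Node {
        final String kind;
        final Node child; // null for a leaf

        Node(String kind, Node child) {
            this.kind = kind;
            this.child = child;
        }
    }

    /** Rewrites every project -> limit pair into limit -> project, top-down. */
    static Node rewrite(Node node) {
        if (node == null) {
            return null;
        }
        if (node.kind.equals("project") && node.child != null && node.child.kind.equals("limit")) {
            Node limit = node.child;
            // Swap the pair, then keep rewriting below the new limit.
            return new Node("limit", rewrite(new Node("project", limit.child)));
        }
        return new Node(node.kind, rewrite(node.child));
    }

    /** Renders the chain of node kinds from root to leaf. */
    static String describe(Node node) {
        StringBuilder sb = new StringBuilder();
        for (Node n = node; n != null; n = n.child) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(n.kind);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node plan = new Node("project", new Node("limit", new Node("project", new Node("scan", null))));
        System.out.println(describe(rewrite(plan)));
    }
}
```

After the rewrite the two projects are adjacent, so a separate merge step (not shown here) can combine them into one.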
2022-09-13 13:26:12 +08:00
c3d7d4ce7a [fix](Nereids): fix LAsscom project split. (#12506) 2022-09-13 12:12:39 +08:00
6b52e47805 [fix](agg)the intermediate slots should be materialized as output slots (#12441)
In some cases, the output slots of agg info may be materialized by calling SlotDescriptor's materializeSrcExpr method, while the intermediate slots are not. This PR sets the materialized info of the intermediate slots to keep them consistent with the output slots.
2022-09-13 11:28:27 +08:00
87439e227e [Enhancement](DOE): Doe support object/nested use string (#12401)
* MOD: doe support object/nested use string
2022-09-13 09:59:48 +08:00
b1c2a8343f [Bug](array_type) Forbid adding array key columns #12479
mysql> desc array_test;
+-----------+----------------+------+-------+---------+-------+
| Field     | Type           | Null | Key   | Default | Extra |
+-----------+----------------+------+-------+---------+-------+
| id        | INT            | Yes  | true  | NULL    |       |
| c_array   | ARRAY<INT(11)> | Yes  | false | NULL    | NONE  |
+-----------+----------------+------+-------+---------+-------+

Before:
mysql> ALTER TABLE array_test ADD COLUMN add_arr_key array<int> key NULL DEFAULT NULL;
Query OK, 0 rows affected (0.00 sec)

After:
mysql> ALTER TABLE array_test ADD COLUMN c_array array<int> key NULL DEFAULT NULL;
ERROR 1105 (HY000): errCode = 2, detailMessage = Array can only be used in the non-key column of the duplicate table at present.

mysql> ALTER TABLE array_test MODIFY COLUMN c_array array<int> key NULL DEFAULT NULL;
ERROR 1105 (HY000): errCode = 2, detailMessage = Array can only be used in the non-key column of the duplicate table at present.
2022-09-13 08:48:28 +08:00
503a79e4d8 [Bugfix](load) fix be may core dump when load column mapping has function (#12509)
Fix a possible BE core dump when the load column mapping contains a function.
This bug may have been introduced by #12375.
2022-09-13 08:44:10 +08:00
ecfefae715 [enhancement](load) make default load mem limit configurable (#12348)
* make LoadMemLimit valid for broker load, stream load and routine load

Co-authored-by: wuhangze <wuhangze@jd.com>
2022-09-12 10:25:01 +08:00
0c260152b7 [fix](profile) fix query instance profile may be lost. (#12418) 2022-09-09 22:58:04 +08:00
f80d7bdd5b [enhancement](Nereids) add type coercion between decimal and integral (#12482) 2022-09-09 20:08:03 +08:00
2b62ac2fef [Feature](Nereids) Main framework for selecting rollup index. (#12464)
# Proposed changes
First step of #12303 

## Problem summary

This is the first step for supporting rollup index selection for aggregate/unique key OLAP table.

This PR aims to select a rollup index when the aggregate node is present and the aggregate function matches the value type, so pre-aggregation is turned on by default. Cases in which pre-aggregation should be turned off will be addressed in the next PR.

Main steps for rollup index selection: 

1. filter rollup indexes with all the required columns.
2. filter rollup indexes that match the key prefix most.
3. order the rollup indexes by row count, column count, rollup index id.
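The three steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the Nereids implementation: `Index`, `matchedKeyPrefix`, and `select` are hypothetical names, "matching the key prefix most" is read here as the number of leading key columns that appear in the predicate columns, and all three orderings are assumed to be ascending.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

class RollupSelectionSketch {
    /** Hypothetical stand-in for a rollup index; not a Doris class. */
    static final class Index {
        final long id;
        final long rowCount;
        final int keyCount;         // number of leading columns that are keys
        final List<String> columns; // key columns first, in key order

        Index(long id, long rowCount, int keyCount, String... columns) {
            this.id = id;
            this.rowCount = rowCount;
            this.keyCount = keyCount;
            this.columns = Arrays.asList(columns);
        }
    }

    /** Counts how many leading key columns are used by the predicates (assumed meaning). */
    static int matchedKeyPrefix(Index index, Set<String> predicateColumns) {
        int n = 0;
        while (n < index.keyCount && predicateColumns.contains(index.columns.get(n))) {
            n++;
        }
        return n;
    }

    static Optional<Index> select(List<Index> candidates, Set<String> required, Set<String> predicateColumns) {
        // Step 1: keep only the indexes that contain every required column.
        List<Index> covering = candidates.stream()
                .filter(i -> i.columns.containsAll(required))
                .collect(Collectors.toList());
        // Step 2: keep only the indexes with the longest matched key prefix.
        int best = covering.stream()
                .mapToInt(i -> matchedKeyPrefix(i, predicateColumns))
                .max()
                .orElse(0);
        // Step 3: order by row count, column count, index id, and take the first.
        return covering.stream()
                .filter(i -> matchedKeyPrefix(i, predicateColumns) == best)
                .min(Comparator.comparingLong((Index i) -> i.rowCount)
                        .thenComparingInt(i -> i.columns.size())
                        .thenComparingLong(i -> i.id));
    }

    public static void main(String[] args) {
        List<Index> candidates = Arrays.asList(
                new Index(1L, 1000L, 2, "k1", "k2", "v1"),
                new Index(2L, 100L, 1, "k1", "v1"));
        Set<String> required = new HashSet<>(Arrays.asList("k1", "v1"));
        Set<String> predicates = new HashSet<>(Arrays.asList("k1"));
        select(candidates, required, predicates).ifPresent(i -> System.out.println("selected index " + i.id));
    }
}
```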

TODO remaining:
1. address cases in which pre-aggregation should be turned off. (next PR)
2. add more test cases. 

Refactor
- Add `Project.getSlotToProducer` to extract a map from the project output slot to its producing expression.
- Add `Filter.getConjuncts` to split the filter condition into conjunctive predicates.
- Move the usage of `ExpressionReplacer` to `ExpressionUtils.replace(expr, replaceMap)` to simplify the code.
2022-09-09 18:14:31 +08:00
dc7e5ca039 [fix](nereids) uncorrelated subquery can't get the correct result (#12421)
When an uncorrelated subquery is executed, an error is reported that the corresponding column cannot be found.
The reason is that the tupleID of the child obtained in visitPhysicalNestedLoopJoin is not consistent with the child.

The uncorrelated subquery triggers this bug because it uses crossJoin.
Subquery regression tests for uncorrelated and complex scenarios have also been added.

Co-authored-by: morrySnow <morrysnow@126.com>
2022-09-09 18:08:34 +08:00
77b93ebc09 [enhancement](Nereids) add optionalAnd to simplify code (#12497)
Add optionalAnd to avoid adding a True literal, which may make BE crash. Use Optional to simplify the code.
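A minimal sketch of the idea, assuming conjuncts are plain strings: the real method works on Nereids expressions, and the signature below is hypothetical. Returning an empty `Optional` for an empty input means the caller never has to fabricate a `TRUE` predicate.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class OptionalAndSketch {
    /** Combines conjuncts with AND; yields Optional.empty() when there is nothing to combine. */
    static Optional<String> optionalAnd(List<String> conjuncts) {
        return conjuncts.stream().reduce((left, right) -> "(" + left + " AND " + right + ")");
    }

    public static void main(String[] args) {
        System.out.println(optionalAnd(Collections.<String>emptyList()).isPresent());
        System.out.println(optionalAnd(Arrays.asList("a > 1", "b < 2")).orElse("<no predicate>"));
    }
}
```

A caller can then skip creating a filter altogether when the result is empty, for example with `optionalAnd(conjuncts).ifPresent(...)`.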
2022-09-09 15:54:32 +08:00
6b8a139f2d [feature](Nereids) Support function registry (#12481)
Support function registry.

The classes:
- BuiltinFunctions: contains the built-in functions list
- FunctionRegistry: used to register scalar functions and aggregate functions, it can find the function by name
- FunctionBuilder: used to resolve a BoundFunction class, extract the constructor, and build a BoundFunction from the arguments (`List<Expression>`)

Register example: you can add built-in functions in the list for simplicity

```java
public class BuiltinFunctions implements FunctionHelper {
    public final List<ScalarFunc> scalarFunctions = ImmutableList.of(
            scalar(Substring.class, "substr", "substring"),
            scalar(WeekOfYear.class),
            scalar(Year.class)
    );

    public final ImmutableList<AggregateFunc> aggregateFunctions = ImmutableList.of(
            agg(Avg.class),
            agg(Count.class),
            agg(Max.class),
            agg(Min.class),
            agg(Sum.class)
    );
}
```

Note:
- Currently, only scalar functions and aggregate functions can be registered; registering table functions will be supported later.
- Currently, a function is resolved only by its name and arity; overloads with the same arity cannot be told apart, e.g. `some_function(Expression)` and `some_function(Literal)`
2022-09-09 15:19:45 +08:00
c9a6486f8c [fix](Nereids) subquery predicate's slot appears in having's output by mistake (#12494)
When an uncorrelated subquery appears in the having predicates, having's output mistakenly includes one slot from the subquery. This PR fixes it by always adding a project on top of having.

Co-authored-by: mch_ucchi <organic_chemistry@foxmail.com>
2022-09-09 11:52:56 +08:00
73351917ab [Enhancement](array-type) Add readable information in subquery for array type #12463 2022-09-09 11:17:50 +08:00
a04f9814fe [fix](Nereids) column prune generate empty project list on join's child (#12486)
* [fix](Nereids) column prune generate empty project list on join's child
2022-09-09 10:43:57 +08:00
a468085efe [improvement](error info)improve the s3 path err msg #12438 2022-09-09 09:14:24 +08:00
b45a8379eb [bugfix](odbc) escape identifiers for sqlserver and postgresql (#12487)
The delimited identifier format for SQL Server and PostgreSQL is different from MySQL's.
SQL Server uses brackets ([ ]) and PostgreSQL uses double quotes ("").
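As an illustration of the difference, the sketch below quotes an identifier per dialect. The `Dialect` enum and `quote` method are hypothetical and not the Doris ODBC code; doubling an embedded closing delimiter is the standard escaping rule in each of the three systems.

```java
class IdentifierQuotingSketch {
    enum Dialect { MYSQL, SQLSERVER, POSTGRESQL }

    /** Wraps a name in the dialect's identifier delimiters, doubling any embedded closing delimiter. */
    static String quote(Dialect dialect, String name) {
        switch (dialect) {
            case SQLSERVER:
                return "[" + name.replace("]", "]]") + "]";
            case POSTGRESQL:
                return "\"" + name.replace("\"", "\"\"") + "\"";
            case MYSQL:
            default:
                return "`" + name.replace("`", "``") + "`";
        }
    }

    public static void main(String[] args) {
        for (Dialect dialect : Dialect.values()) {
            System.out.println(dialect + ": " + quote(dialect, "order"));
        }
    }
}
```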
2022-09-09 09:11:03 +08:00
e84272ed43 [improvment](planner) unset common fields to reduce plan thrift size (#12495)
1. For a query with 1656 unions, the plan thrift size is reduced from 400MB+ to 2MB.
This optimization was introduced in #4904 but lost after #9720.

2. Disable ExprSubstitutionMap.verify when debug is disabled,
so that the plan time of the query with 1656 unions is reduced from 20s to 2s.
2022-09-09 09:02:45 +08:00
d2a23a4cf9 [enhancement](Nereids) change aggregate and join stats calc algorithm (#12447)
The original statistics derivation algorithm relies on NDV and other column statistics, but we cannot get these stats in production environments.
This PR changes these operators' stats calculation to use a DEFAULT RATIO variable instead of column statistics.
We should change these algorithms once column stats are available in production environments.
2022-09-09 01:00:07 +08:00
b4f0f39e77 [feature](Nereids) implement uncheckedCast method in VarcharLiteral (#12468)
Implement uncheckedCast on VarcharLiteral as a temporary way to let TimestampArithmetic work.
We should remove this code and do the implicit cast in the TypeCoercion rule in the future.
2022-09-09 00:33:37 +08:00
8478efad44 [improve](Nereids): check same logicalProperty when insert a Group. (#12469) 2022-09-09 00:00:11 +08:00
85bd297777 [feature](function)Support function "current_date" in FE (#11702)
Issue Number: close #11699
2022-09-08 16:00:57 +08:00
d1ab6b1db2 [enhancement](nereids) add syntax support for fractional literal (#12444)
Like the legacy planner, Nereids parses all fractional literals as decimal.
In the future, we will add more syntax for users to control the fractional literal type.
2022-09-08 15:54:20 +08:00
7c7ac86fe8 [feature](Nereids): Left deep tree join order. (#12439)
* [feature](Nereids): Left deep tree join order.
2022-09-08 15:09:22 +08:00
491dd34ba7 [fix](planner) fix orthogonal_bitmap_union_count plan : wrong PREAGGREGATION (#12095)
The execution plan shows the following when the orthogonal_bitmap_union_count function is used:

PREAGGREGATION: OFF

Reason: Invalid Aggregate Operator: orthogonal_bitmap_union_count

The correct plan is: PREAGGREGATION: ON
Co-authored-by: lihuigang <lihuigang@meituan.com>
2022-09-08 15:00:43 +08:00
461a4cc94e [Enhancement](Error Msg) show details of COLUMN and TABLE name regex #11999
Co-authored-by: wuhangze <wuhangze@jd.com>
2022-09-08 14:59:39 +08:00
824a192f8f [enhancement](http) executeSQL rest api support streaming response (#12239) 2022-09-08 14:57:15 +08:00
9225dd16ca [fix](grouping sets) grouping sets cause be core or return wrong results (#12313) 2022-09-08 14:55:50 +08:00
74ffdbeebc [feature](Nereids) Support OneRowRelation and EmptyRelation (#12416)
Support OneRowRelation and EmptyRelation.

OneRowRelation: `select 100, 'abc', substring('abc', 1, 2)`
EmptyRelation: `select * from tbl limit 0`

Note:
PhysicalOneRowRelation will translate to UnionNode(constExpr) for BE execution
2022-09-08 12:21:13 +08:00
a6880ca573 [fix](Nereids) throw IndexOutOfBoundsException in DistributionSpecHash#equalsSatisfy (#12446)
In the earlier PR #11976, we changed DistributionSpecHash#equalsSatisfy and forgot to check whether both sides have the same length. When the required shuffle slot list is longer than the current one, an exception is thrown.
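The missing guard can be sketched as below. This is a simplified illustration, assuming shuffle slots are compared positionally as strings; the method body is hypothetical and not the actual `DistributionSpecHash` code.

```java
import java.util.Arrays;
import java.util.List;

class EqualsSatisfySketch {
    /** Returns true only when both slot lists have the same length and match position by position. */
    static boolean equalsSatisfy(List<String> current, List<String> required) {
        // Without this length check, current.get(i) below can run past the end of the shorter list.
        if (current.size() != required.size()) {
            return false;
        }
        for (int i = 0; i < required.size(); i++) {
            if (!current.get(i).equals(required.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(equalsSatisfy(Arrays.asList("a"), Arrays.asList("a", "b")));
        System.out.println(equalsSatisfy(Arrays.asList("a", "b"), Arrays.asList("a", "b")));
    }
}
```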
2022-09-08 11:41:48 +08:00
dd2f834c79 [feature-wip](parquet-reader) bug fix, create compress codec before parsing dictionary (#12422)
## Fix five bugs:
1. Parquet dictionary data may be compressed, but `ColumnChunkReader` tries to parse the dictionary data before creating the compression codec, causing unexpected data errors.
2. `FE` doesn't resolve the array type
3. `ParquetFileHdfsScanner` doesn't fill partition values when the table is partitioned
4. `ParquetFileHdfsScanner` sets `_scanner_eof = true` when a scan range is empty, which ends the scanner early and results in data loss
5. typographical error in `PageReader`
2022-09-08 09:54:25 +08:00
a536030979 [FOLLOWUP](load) fix nullable and add regression (#12375)
* [FOLLOWUP](load) fix nullable and add regression
2022-09-08 00:05:04 +08:00
bdbce77227 [fix](nereids) cast left child of TimestampArithmetic to wrong type in BindFunction (#12423) 2022-09-07 20:32:47 +08:00
184be8d13c [fix](array-type) ARRAY is not supported in bloomfilter index (#12353)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-09-07 18:00:01 +08:00
941bda5a20 [enhancement](spark-load)support dynamic set env (#12276)
* [enhancement](spark-load)support dynamic set env and display spark appid

* [enhancement](spark-load)support dynamic set env
2022-09-07 16:24:29 +08:00
40f481049a [fix](Nereids)lowest cost plan map do not be merged when do group merge (#12396)
* [fix](Nereids)lowest cost plan map do not be merged when do group merge
2022-09-07 16:13:11 +08:00
f2923f9180 [Refactor](Nereids) Simplify get input and output slots for plan/expression. (#12356)
Simplify the code of getting input/output slots from `Expression` or `Plan`.

**new interfaces added**

`Expression`:
- `getInputSlots`: Get all the input slots of the expression.

`Plan`:
- `getOutputSet`: Get the output slot set of the plan.
- `getInputSlots`: Get the input slot set of the plan.

**changed interface**

`TreeNode`:
- `collect`: return `set` as result instead of `list`.
2022-09-07 14:05:37 +08:00
0bb06a1fa7 [feature](Nereids) let nullable of Year, WeekOfYear and Divide be the same as implementation in BE (#12374)
These functions/expressions should always be nullable, so just return true in the overridden method.
- Year
- WeekOfYear
- Divide
2022-09-07 13:09:08 +08:00
46776af2a3 [fix](Nereids)plan translator lost other conjuncts on hash join node (#12391)
In the earlier PR #11812, we split the join condition into two parts: hash join conjuncts and other conditions. But we forgot to translate the other conditions into other conjuncts in the HashJoinNode of the legacy planner, so we get a wrong result if the query has other conditions on the join node. For example:

SELECT * FROM lineorder INNER JOIN part ON lo_partkey = p_partkey WHERE lo_orderkey > p_size;
2022-09-07 11:32:05 +08:00
42bdde8750 [Feature](Vectorized) support jdbc scan node (#12010) 2022-09-07 10:29:41 +08:00
232d17efea [Enhancement](sparkload) cast the src slot types of bitmap columns to bitmap when FE push tasks in spark load (#12394)
In the current spark load implementation, the types of the source data that BE reads from the Broker are all set to varchar.
However, varchar and bitmap are no longer compatible after version 1.1.0, which causes spark load to fail.

An example of spark load error message:

detailMessage = type not match, originType=VARCHAR(*), targeType=BITMAP

Set the src type of the bitmap columns from varchar to bitmap when FE pushes tasks.
2022-09-07 10:07:38 +08:00
a465549f5e [feature](Nereids)support parse and analyze having clause (#12129)
Implement the having clause for Nereids Planner.

NOTE:

This PR only aims at making Nereids Planner generate the correct logical plan and physical plan. Runtime correctness is not a goal of this PR because GROUP BY is not ready in Nereids Planner.
2022-09-07 09:47:03 +08:00
55fb90d6ae [feature](Nereids)add colocate, shuffle and bucket shuffle join algorithm to Nereids (#11976)
This PR:
1. adds to Nereids the join algorithms below, which are already supported by the legacy planner
- colocate join
- bucket shuffle join
- shuffle join
- broadcast join

2. update all cost enforce derive utils
- ChildOutputPropertyDeriver
- EnforceMissingPropertiesHelper
- RequestPropertyDeriver

3. add a local quick sort plan used in enforce
4. set PhysicalProperties to PhysicalPlan when choosing the best plan from memo
5. rename Job#pushTask to Job#pushJob
2022-09-07 00:31:21 +08:00
4c36e3dfa6 [fix](Nereids)LogicalAggregate's equals and hashCode missing two attributes (#12393)
After applying the NormalizeAggregate rule, the owner groups of all aggregate children are removed.
The root cause is that the new aggregate node is regarded as equal to the old aggregate node, because LogicalAggregate.equals() does not take some attributes ("normalized", "disassembled") into account.
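The fix pattern can be sketched with a stand-in class. `AggregateEqualitySketch` and its fields are hypothetical and far simpler than `LogicalAggregate`; the point is only that every state-bearing attribute, including the two flags, takes part in both `equals` and `hashCode`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class AggregateEqualitySketch {
    final List<String> groupByExpressions;
    final boolean normalized;
    final boolean disassembled;

    AggregateEqualitySketch(List<String> groupByExpressions, boolean normalized, boolean disassembled) {
        this.groupByExpressions = groupByExpressions;
        this.normalized = normalized;
        this.disassembled = disassembled;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof AggregateEqualitySketch)) {
            return false;
        }
        AggregateEqualitySketch that = (AggregateEqualitySketch) o;
        // The two flags are compared as well, so a rewritten node is never mistaken for the original.
        return normalized == that.normalized
                && disassembled == that.disassembled
                && groupByExpressions.equals(that.groupByExpressions);
    }

    @Override
    public int hashCode() {
        // Must use the same fields as equals() to keep the equals/hashCode contract.
        return Objects.hash(groupByExpressions, normalized, disassembled);
    }

    public static void main(String[] args) {
        AggregateEqualitySketch before = new AggregateEqualitySketch(Arrays.asList("k1"), false, false);
        AggregateEqualitySketch after = new AggregateEqualitySketch(Arrays.asList("k1"), true, false);
        System.out.println(before.equals(after));
    }
}
```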
2022-09-07 00:07:26 +08:00
3a0aae1b82 [enhancement](explain)add projections and output id in explain string (#12358)
In the earlier PR #11842, we added the ability to do projection on each ExecNode.
However, the projection expr list is not shown in explain, which makes debugging inconvenient.
This PR adds the projections to the explain string if they exist.
2022-09-06 21:03:02 +08:00
f1507f93ee [enhancement](chore)add single empty line rule to fe check style for Nereids (#12365) 2022-09-06 14:19:59 +08:00