6259 Commits

Author SHA1 Message Date
d35a8a24a5 [feature](nereids) push down Project through Limit (#12490)
This rule rewrites project -> limit to limit -> project. The reason is that we can get a tree like project -> limit -> project -> other node. If we do not rewrite it, we cannot merge the two projects into one. And if there is more than one project on one node, the second one overwrites the first one during translation. BE then core dumps or returns a "slot cannot be found" error.
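A minimal sketch of the rewrite, assuming hypothetical `Plan`, `Scan`, `Limit`, and `Project` classes that stand in for the actual Nereids plan nodes (the real rule and node classes are not shown in this message):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified plan nodes; these are not the actual Nereids classes.
abstract class Plan {
    final Plan child;
    Plan(Plan child) { this.child = child; }
}

class Scan extends Plan {
    final String table;
    Scan(String table) { super(null); this.table = table; }
    @Override public String toString() { return "scan(" + table + ")"; }
}

class Limit extends Plan {
    final long limit;
    Limit(long limit, Plan child) { super(child); this.limit = limit; }
    @Override public String toString() { return "limit(" + limit + ") -> " + child; }
}

class Project extends Plan {
    final List<String> columns;
    Project(List<String> columns, Plan child) { super(child); this.columns = columns; }
    @Override public String toString() { return "project" + columns + " -> " + child; }
}

class PushProjectThroughLimit {
    // Rewrites project -> limit -> X into limit -> project -> X.
    // Any other plan shape is returned unchanged.
    static Plan rewrite(Plan plan) {
        if (plan instanceof Project && plan.child instanceof Limit) {
            Project project = (Project) plan;
            Limit limit = (Limit) plan.child;
            return new Limit(limit.limit, new Project(project.columns, limit.child));
        }
        return plan;
    }

    public static void main(String[] args) {
        Plan before = new Project(Arrays.asList("a"),
                new Limit(10, new Project(Arrays.asList("a", "b"), new Scan("t"))));
        System.out.println(before);
        System.out.println(rewrite(before));
    }
}
```

After the rewrite the two projects are adjacent, so a separate merge step can combine them into one.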
2022-09-13 13:26:12 +08:00
c3d7d4ce7a [fix](Nereids): fix LAsscom project split. (#12506) 2022-09-13 12:12:39 +08:00
6b52e47805 [fix](agg)the intermediate slots should be materialized as output slots (#12441)
In some cases, the output slots of the agg info may be materialized by calling SlotDescriptor's materializeSrcExpr method, while the intermediate slots are not. This PR sets the materialized info of the intermediate slots to keep them consistent with the output slots.
2022-09-13 11:28:27 +08:00
550b1e531b [fix](doc) add the key columns description of the table model document (#12500)
Add a description of the key columns to the table model document.
2022-09-13 11:27:05 +08:00
353f9e3782 [regression](json) add a nullable case for stream load with json format (#12505) 2022-09-13 10:45:01 +08:00
9f25544f2f [feature-wip](parquet-reader) page index bug fix (#12428)
Co-authored-by: jinzhe <jinzhe@selectdb.com>
2022-09-13 10:28:53 +08:00
8a274d7851 [feature-wip](new-scan) refactor some interface about predicate push down in scan node (#12527)
This PR introduces a new enum type `PushDownType`:
```
enum class PushDownType {
    // The predicate cannot be pushed down to the data source.
    UNACCEPTABLE,
    // The predicate can be pushed down to the data source,
    // and the data source can fully evaluate it.
    ACCEPTABLE,
    // The predicate can be pushed down to the data source,
    // but the data source cannot fully evaluate it.
    PARTIAL_ACCEPTABLE
};
```

Derived classes of VScanNode can override the following methods to determine whether to accept
a binary/in/function filter/bloom filter/is null predicate:

```
PushDownType _should_push_down_binary_predicate();
PushDownType _should_push_down_in_predicate();
PushDownType _should_push_down_function_filter();
PushDownType _should_push_down_bloom_filter();
PushDownType _should_push_down_is_null_predicate();
```
2022-09-13 10:25:13 +08:00
87439e227e [Enhancement](DOE): Doe support object/nested use string (#12401)
* MOD: doe support object/nested use string
2022-09-13 09:59:48 +08:00
97cb095010 [test](join)add test join case4 #12508 2022-09-13 09:09:49 +08:00
8be5527be4 [test](join)add some join cases (#12501) 2022-09-13 08:59:32 +08:00
4c73755b40 [test](window-function) add regression test of window function (#12529) 2022-09-13 08:58:19 +08:00
e33f4f90ae [fix](exec) Avoid query thread block on wait_for_start (#12411)
When FE sends a cancel RPC to BE, it does not notify the thread blocked in wait_for_start(), so the fragment stays blocked and keeps occupying the execution thread.
Add a max wait time to wait_for_start() so that the thread does not block forever.
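The idea of a bounded wait can be sketched as follows. This is a Java analogue only: the actual BE code is C++, and `FragmentStartGate`, `start`, and `waitForStart` are hypothetical names introduced for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical Java analogue of a start gate with a maximum wait time.
class FragmentStartGate {
    private final CountDownLatch started = new CountDownLatch(1);

    // Called when the fragment is told to start.
    void start() {
        started.countDown();
    }

    // Returns true if start() was called within maxWaitMillis, false on timeout.
    // The waiting thread is released either way instead of blocking forever.
    boolean waitForStart(long maxWaitMillis) {
        try {
            return started.await(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt flag and treat the wait as unsuccessful.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller that gets `false` can release the execution thread and report the fragment as timed out.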
2022-09-13 08:57:37 +08:00
b1c2a8343f [Bug](array_type) Forbid adding array key columns #12479
mysql> desc array_test;
+-----------+----------------+------+-------+---------+-------+
| Field     | Type           | Null | Key   | Default | Extra |
+-----------+----------------+------+-------+---------+-------+
| id        | INT            | Yes  | true  | NULL    |       |
| c_array   | ARRAY<INT(11)> | Yes  | false | NULL    | NONE  |
+-----------+----------------+------+-------+---------+-------+

Before:
mysql> ALTER TABLE array_test ADD COLUMN add_arr_key array<int> key NULL DEFAULT NULL;
Query OK, 0 rows affected (0.00 sec)

After:
mysql> ALTER TABLE array_test ADD COLUMN c_array array<int> key NULL DEFAULT NULL;
ERROR 1105 (HY000): errCode = 2, detailMessage = Array can only be used in the non-key column of the duplicate table at present.

mysql> ALTER TABLE array_test MODIFY COLUMN c_array array<int> key NULL DEFAULT NULL;
ERROR 1105 (HY000): errCode = 2, detailMessage = Array can only be used in the non-key column of the duplicate table at present.
2022-09-13 08:48:28 +08:00
503a79e4d8 [Bugfix](load) fix be may core dump when load column mapping has function (#12509)
Fix a possible BE core dump when the load column mapping contains a function.
This bug may have been introduced by #12375.
2022-09-13 08:44:10 +08:00
c8e9a32bb2 [Function](cbrt)Add cbrt function for doris (#12523)
Add cbrt function for doris
2022-09-12 19:58:45 +08:00
ecfefae715 [enhancement](load) make default load mem limit configurable (#12348)
* make LoadMemLimit valid for broker load, stream load and routine load

Co-authored-by: wuhangze <wuhangze@jd.com>
2022-09-12 10:25:01 +08:00
fc605779ed [fix](array-type) support to export the array type to hdfs (#12504)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-09-12 10:23:33 +08:00
9b73b45d05 [Doc](Streamload) update streamload default timeout #12499
Co-authored-by: wudi <>
2022-09-12 10:23:18 +08:00
efd2bdb203 [improvement](new-scan) avoid too many scanner context scheduling (#12491)
When selecting a large amount of data from a table, the profile shows:

- ScannerCtxSchedCount: 2.82664M(2826640)

However, ScannerSchedCount is only 8, because the scanners are busy running most of the time.
After this improvement, ScannerCtxSchedCount is reduced to only 10.
2022-09-12 10:22:54 +08:00
e879c26232 [Enhancement](ChunkAllocator) Constructor of singleton class should be private #12516
Co-authored-by: weizuo <weizuo@xiaomi.com>
2022-09-12 10:21:49 +08:00
0c260152b7 [fix](profile) fix query instance profile may be lost. (#12418) 2022-09-09 22:58:04 +08:00
a6a378c9ca [fix](regression-test) remove 2 regression cases for nereids temporarily which blocked the pipeline (#12517)
Removed the following cases from the regression suite nereids_syntax_p0/sub_query_correlated:
1. qt_not_exists_unCorrelated
2. qt_not_exist_uncorr
2022-09-09 22:20:35 +08:00
f80d7bdd5b [enhancement](Nereids) add type coercion between decimal and integral (#12482) 2022-09-09 20:08:03 +08:00
2b62ac2fef [Feature](Nereids) Main framework for selecting rollup index. (#12464)
# Proposed changes
First step of #12303 

## Problem summary

This is the first step for supporting rollup index selection for aggregate/unique key OLAP table.

This PR aims to select a rollup index when the aggregate node is present and the aggregate function matches the value type, so pre-aggregation is turned on by default. Cases where pre-aggregation should be turned off will be addressed in the next PR.

Main steps for rollup index selection: 

1. filter rollup indexes with all the required columns.
2. filter rollup indexes that match the key prefix most.
3. order the rollup indexes by row count, column count, rollup index id.
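
The three steps above can be sketched as follows. `RollupIndex`, `RollupSelector`, and the definition of "key prefix match" (the number of leading index columns that appear in the predicate columns) are assumptions made for illustration, as is the ascending sort order; the actual Nereids implementation may differ.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified description of a rollup index.
class RollupIndex {
    final long id;
    final List<String> columns; // key columns first, in index order
    final long rowCount;

    RollupIndex(long id, List<String> columns, long rowCount) {
        this.id = id;
        this.columns = columns;
        this.rowCount = rowCount;
    }
}

class RollupSelector {
    // Assumed metric: how many leading index columns are used by predicates.
    static int keyPrefixMatch(RollupIndex index, Set<String> predicateColumns) {
        int matched = 0;
        for (String column : index.columns) {
            if (!predicateColumns.contains(column)) {
                break;
            }
            matched++;
        }
        return matched;
    }

    static Optional<RollupIndex> select(List<RollupIndex> indexes,
                                        Set<String> requiredColumns,
                                        Set<String> predicateColumns) {
        // Step 1: keep only indexes that contain all required columns.
        List<RollupIndex> covering = indexes.stream()
                .filter(i -> i.columns.containsAll(requiredColumns))
                .collect(Collectors.toList());
        // Step 2: keep only indexes with the longest key prefix match.
        int best = covering.stream()
                .mapToInt(i -> keyPrefixMatch(i, predicateColumns))
                .max()
                .orElse(0);
        // Step 3: order by row count, column count, index id (ascending is assumed)
        // and take the first one.
        return covering.stream()
                .filter(i -> keyPrefixMatch(i, predicateColumns) == best)
                .min(Comparator.comparingLong((RollupIndex i) -> i.rowCount)
                        .thenComparingInt(i -> i.columns.size())
                        .thenComparingLong(i -> i.id));
    }
}
```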

TODO remaining:
1. address cases where pre-aggregation should be turned off. (next PR)
2. add more test cases. 

Refactor
- Add `Project.getSlotToProducer` to extract a map from the project output slot to its producing expression.
- Add `Filter.getConjuncts` to split the filter condition into conjunctive predicates.
- Move the usage of `ExpressionReplacer` to `ExpressionUtils.replace(expr, replaceMap)` to simplify the code.
2022-09-09 18:14:31 +08:00
dc7e5ca039 [fix](nereids) uncorrelated subquery can't get the correct result (#12421)
Currently, executing an uncorrelated subquery reports an error that the corresponding column cannot be found.
The reason is that the tupleID of the child obtained in visitPhysicalNestedLoopJoin is not consistent with the child.

The uncorrelated subquery triggers this bug because it uses crossJoin.
Subquery regression tests for uncorrelated and complex scenarios have also been added.

Co-authored-by: morrySnow <morrysnow@126.com>
2022-09-09 18:08:34 +08:00
554ba40b13 [feature-wip](unique-key-merge-on-write) update delete bitmap when incremental clone (#12364) 2022-09-09 17:03:27 +08:00
77b93ebc09 [enhancement](Nereids) add optionalAnd to simplify code (#12497)
Add optionalAnd to avoid adding True which may make BE crash. Use optional to simplify code.
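A minimal sketch of the idea, assuming plain strings stand in for expressions. The class name `ExprSketch` and this signature are hypothetical; the actual Nereids signature of optionalAnd is not shown in this message.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in that uses strings instead of Nereids expressions.
class ExprSketch {
    // Combines conjuncts with AND. An empty list yields Optional.empty()
    // instead of a literal TRUE, so callers never have to emit a TRUE predicate.
    static Optional<String> optionalAnd(List<String> conjuncts) {
        return conjuncts.stream().reduce((left, right) -> "(" + left + " AND " + right + ")");
    }
}
```

Callers can then write `optionalAnd(conjuncts).map(...)` or `ifPresent(...)` and skip building a filter entirely when there is nothing to filter on.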
2022-09-09 15:54:32 +08:00
66491ec137 [Improvement](sort) improve partial sort algorithm (#12349)
* [Improvement](sort) improve partial sort algorithm
2022-09-09 15:44:18 +08:00
6b8a139f2d [feature](Nereids) Support function registry (#12481)
Support function registry.

The classes:
- BuiltinFunctions: contains the built-in functions list
- FunctionRegistry: used to register scalar functions and aggregate functions, it can find the function by name
- FunctionBuilder: used to resolve a BoundFunction class, extract the constructor, and build to a BoundFunction by arguments(`List<Expression>`)

Register example: you can add built-in functions in the list for simplicity

```java
public class BuiltinFunctions implements FunctionHelper {
    public final List<ScalarFunc> scalarFunctions = ImmutableList.of(
            scalar(Substring.class, "substr", "substring"),
            scalar(WeekOfYear.class),
            scalar(Year.class)
    );

    public final ImmutableList<AggregateFunc> aggregateFunctions = ImmutableList.of(
            agg(Avg.class),
            agg(Count.class),
            agg(Max.class),
            agg(Min.class),
            agg(Sum.class)
    );
}
```

Note:
- Currently, only registering scalar functions and aggregate functions is supported; registering table functions will be supported later.
- Currently, functions are resolved only by function name and arity; overloads with the same arity cannot be distinguished, e.g. `some_function(Expression)` and `some_function(Literal)`
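
Resolution by name and arity, and the limitation it implies, can be sketched as follows. `FunctionRegistrySketch` is hypothetical and is not the actual `FunctionRegistry`; case-insensitive names and rejecting a second registration with the same arity are assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; not the actual Nereids FunctionRegistry.
class FunctionRegistrySketch {
    // function name -> (arity -> implementation name)
    private final Map<String, Map<Integer, String>> functions = new HashMap<>();

    // Registers one implementation under one or more names for a given arity.
    void register(String implementation, int arity, String... names) {
        for (String name : names) {
            Map<Integer, String> byArity =
                    functions.computeIfAbsent(normalize(name), k -> new HashMap<>());
            // Name plus arity is the whole lookup key, so a second overload
            // with the same arity cannot be told apart and is rejected here.
            if (byArity.putIfAbsent(arity, implementation) != null) {
                throw new IllegalArgumentException(
                        "duplicate function " + name + " with arity " + arity);
            }
        }
    }

    Optional<String> find(String name, int arity) {
        Map<Integer, String> byArity = functions.get(normalize(name));
        if (byArity == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(byArity.get(arity));
    }

    // Assumption: function names are matched case-insensitively.
    private static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT);
    }
}
```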
2022-09-09 15:19:45 +08:00
c9a6486f8c [fix](Nereids) subquery predicate's slot appears in having's output by mistake (#12494)
When an uncorrelated subquery appears in the having predicates, having's output contains one slot from the subquery by mistake. This PR fixes it by always adding a project on top of having.

Co-authored-by: mch_ucchi <organic_chemistry@foxmail.com>
2022-09-09 11:52:56 +08:00
b1db8aef58 [regression](array-type) add some case for array insert (#12474)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-09-09 11:18:06 +08:00
73351917ab [Enhancement](array-type) Add readable information in subquery for array type #12463 2022-09-09 11:17:50 +08:00
a04f9814fe [fix](Nereids) column prune generate empty project list on join's child (#12486)
* [fix](Nereids) column prune generate empty project list on join's child
2022-09-09 10:43:57 +08:00
f98ec06783 [feature-wip](new-scan) Add memtracker and span for new olap scan node (#12281)
Add memtracker and span for new olap scan node
2022-09-09 09:39:08 +08:00
a468085efe [improvement](error info)improve the s3 path err msg #12438 2022-09-09 09:14:24 +08:00
b4663062da [feature-wip](parquet-reader) bug fix, parquet footer buffer is small when containing many columns (#12477)
Reading a parquet file with many columns (>1600) fails:

mysql> select int_col from types_sf100_r100w limit 5;
ERROR 1105 (HY000): errCode = 2, detailMessage = Couldn't deserialize thrift msg:
TProtocolException: Invalid data
parse_thrift_footer uses a fixed-length buffer (=64k) to read the parquet footer, but the metadata of a parquet file with 1600 columns can exceed 5MB.

Therefore, the buffer needs to be allocated according to the actual footer length.
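A Parquet file ends with a 4-byte little-endian footer length followed by the 4-byte magic `PAR1`, so the footer size can be read before the buffer is allocated. The sketch below is a Java illustration over an in-memory byte array; `ParquetFooterSketch` is a hypothetical name, and the actual BE code is C++ and reads from a file reader instead.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; not the actual parse_thrift_footer implementation.
class ParquetFooterSketch {
    static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    // Returns the serialized footer bytes, sized by the length stored in the file.
    static byte[] readFooter(byte[] file) {
        int tailStart = file.length - 8; // 4-byte footer length + 4-byte magic
        if (tailStart < 0
                || !Arrays.equals(Arrays.copyOfRange(file, file.length - 4, file.length), MAGIC)) {
            throw new IllegalArgumentException("not a parquet file");
        }
        // The footer length is stored as a little-endian 32-bit integer.
        int footerLength = ByteBuffer.wrap(file, tailStart, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        int footerStart = tailStart - footerLength;
        if (footerLength < 0 || footerStart < 0) {
            throw new IllegalArgumentException("invalid footer length: " + footerLength);
        }
        // Allocate exactly footerLength bytes rather than a fixed-size buffer.
        return Arrays.copyOfRange(file, footerStart, tailStart);
    }
}
```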
2022-09-09 09:12:34 +08:00
b45a8379eb [bugfix](odbc) escape identifiers for sqlserver and postgresql (#12487)
The delimited identifier format for SQL Server and PostgreSQL is different from MySQL.
SQL Server uses brackets ([ ]) and PostgreSQL uses double quotes ("").
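A minimal sketch of dialect-specific quoting. `IdentifierQuoting` and `Dialect` are hypothetical names, not the classes touched by this PR; doubling an embedded closing delimiter goes beyond what this message states and follows the usual convention of each database.

```java
// Hypothetical sketch; not the actual Doris ODBC table code.
class IdentifierQuoting {
    enum Dialect { MYSQL, SQLSERVER, POSTGRESQL }

    // Wraps an identifier in the delimiters of the target database.
    static String quote(Dialect dialect, String identifier) {
        switch (dialect) {
            case SQLSERVER:
                // SQL Server: [name], an embedded ] is written as ]]
                return "[" + identifier.replace("]", "]]") + "]";
            case POSTGRESQL:
                // PostgreSQL: "name", an embedded " is written as ""
                return "\"" + identifier.replace("\"", "\"\"") + "\"";
            default:
                // MySQL: `name`, an embedded ` is written as ``
                return "`" + identifier.replace("`", "``") + "`";
        }
    }
}
```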
2022-09-09 09:11:03 +08:00
3c4c4b1a87 [feature-wip](parquet-reader) add gzip compression codec (#12488)
Query failed when reading parquet data compressed by GZIP:

mysql> select * from customer limit 1;
ERROR 1105 (HY000): errCode = 2, detailMessage = unknown compression type(GZIP)
2022-09-09 09:10:25 +08:00
3cc06820c4 [doc](performance) performance doc and script update (#12493) 2022-09-09 09:09:49 +08:00
2aad293d8a delete_doc_upd (#12473)
delete_doc_update
2022-09-09 09:08:12 +08:00
22dec46f48 [fix](vectorized load) fix incomplete errmsg when find partition failed (#12485)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2022-09-09 09:03:06 +08:00
e84272ed43 [improvement](planner) unset common fields to reduce plan thrift size (#12495)
1. For a query with 1656 unions, the plan thrift size is reduced from 400MB+ to 2MB.
This optimization was introduced in #4904 but lost after #9720.

2. Disable ExprSubstitutionMap.verify when debug is disabled,
so that the plan time of a query with 1656 unions is reduced from 20s to 2s.
2022-09-09 09:02:45 +08:00
d2a23a4cf9 [enhancement](Nereids) change aggregate and join stats calc algorithm (#12447)
The original statistics derivation algorithm relies on NDV and other column statistics, but we cannot get these stats in a production environment.
This PR changes the stats calculation algorithm of these operators to use a DEFAULT RATIO variable instead of column statistics.
We should change these algorithms once column stats are available in a production environment.
2022-09-09 01:00:07 +08:00
b4f0f39e77 [feature](Nereids) implement uncheckedCast method in VarcharLiteral (#12468)
Implement uncheckedCast on VarcharLiteral as a temporary way to let TimestampArithmetic work.
We should remove this code and do the implicit cast in the TypeCoercion rule in the future.
2022-09-09 00:33:37 +08:00
8478efad44 [improve](Nereids): check same logicalProperty when insert a Group. (#12469) 2022-09-09 00:00:11 +08:00
2ccbbb5392 [fix](stream load) Fix wrong conversion of null value when vstream load json format (#12460) 2022-09-08 16:48:35 +08:00
85bd297777 [feature](function)Support function "current_date" in FE (#11702)
Issue Number: close #11699
2022-09-08 16:00:57 +08:00
d1ab6b1db2 [enhancement](nereids) add syntax support for fractional literal (#12444)
Just like the legacy planner, Nereids parses all fractional literals as decimal.
In the future, we will add more syntax to let users control the fractional literal type.
2022-09-08 15:54:20 +08:00
7c7ac86fe8 [feature](Nereids): Left deep tree join order. (#12439)
* [feature](Nereids): Left deep tree join order.
2022-09-08 15:09:22 +08:00
14221adbbd [fix](agg) crash caused by failure of prepare (#12437) 2022-09-08 15:03:45 +08:00