Commit Graph

2336 Commits

Author SHA1 Message Date
33e9d5b2da [enhance](test): remove some System.out.println in UT. (#10859) 2022-07-15 11:16:24 +08:00
b4927a8f15 [chore][nereids] Bump the version of antlr4 to 4.10.1 (#10780) 2022-07-15 10:43:05 +08:00
505758c76b [BUG] (decimalv3) fix FE UTs (#10834) 2022-07-14 19:24:50 +08:00
13e9cb146f [feature-wip](unique-key-merge-on-write) Add option to enable unique-key-merge-on-write, DSIP-018[5/1] (#10814)
* Add option in FE

* add opt in be

* some fix

* update

* fix code style

* fix typo

* fix typo

* update

* code format
2022-07-14 12:10:58 +08:00
e361eb385e [vectorized][udf] improvement java-udaf with group by clause (#10296)
save for file about udaf
add bool _destory_deserialize
update some code according to reviewer comments
change destroy all data at once
2022-07-14 11:23:42 +08:00
3b46242483 [feature-wip] Optimize Decimal type (#10794)
* [feature-wip](decimalv3) support decimalv3

* [feature-wip] Optimize Decimal type

Co-authored-by: liaoxin <liaoxinbit@126.com>
2022-07-14 10:50:50 +08:00
077ec4b114 [bug](multi-catalog) empty hadoop configuration when reading iceberg table (#10793) 2022-07-14 10:18:59 +08:00
e78cca1009 (Refactor)[Nereids] Combine operator and plan (#10786)
In #9755 we split plan into plan and operator, but in subsequent development we found that rules became complex and counterintuitive:
1. We must create an operator instance, then wrap it in a plan according to the operator type.
2. A relational algebra operator does not contain its children.

e.g.
```java
logicalProject().then(project -> {
    List<NamedExpression> boundSlots =
        bind(project.operator.getProjects(), project.children(), project);
    LogicalProject op = new LogicalProject(flatBoundStar(boundSlots));
    // wrap a plan
    return new LogicalUnaryPlan(op, project.child());
})
```

After combining operator and plan, the code becomes:
```java
logicalProject().then(project -> {
    List<NamedExpression> boundSlots =
        bind(project.getProjects(), project.children(), project);
    return new LogicalProject(flatBoundStar(boundSlots), project.child());
})
```

Originally, we thought splitting plan and operator would be convenient for `Memo.copyIn()`, because Memo does not know how to rebuild a plan (assembling the child plans from the children groups) from the plan type alone. So a plan must provide the abstract `withChildren()` method to assemble its children. The fewer plan types there are, the less code this costs (logical/physical with leaf/unary/binary plans, about 6 plans, with no concrete plans such as LogicalAggregatePlan).

But this convenience has a negative effect: the design is difficult to understand, people must learn the concept before they can develop new rules, and the rules become ugly. So we combine plan and operator to make rules as simple as possible. The drawback is that we must override some withXxx methods for every concrete plan, e.g. LogicalAggregate, PhysicalHashJoin.
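
The `withChildren()` idea mentioned above can be illustrated with a minimal sketch. The class names below (`SketchPlan`, `SketchScan`, `SketchProject`) are hypothetical and much simpler than the real Nereids plan classes; the point is only that each concrete plan rebuilds itself with new children while keeping its own attributes.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified plan node; not the actual Nereids classes.
abstract class SketchPlan {
    private final List<SketchPlan> children;

    SketchPlan(List<SketchPlan> children) {
        this.children = Collections.unmodifiableList(children);
    }

    List<SketchPlan> children() {
        return children;
    }

    // Each concrete plan rebuilds itself with new children, keeping its own attributes.
    abstract SketchPlan withChildren(List<SketchPlan> newChildren);
}

class SketchScan extends SketchPlan {
    final String table;

    SketchScan(String table) {
        super(Collections.<SketchPlan>emptyList());
        this.table = table;
    }

    @Override
    SketchPlan withChildren(List<SketchPlan> newChildren) {
        return new SketchScan(table); // leaf: there are no children to replace
    }
}

class SketchProject extends SketchPlan {
    final List<String> projects;

    SketchProject(List<String> projects, SketchPlan child) {
        super(Collections.singletonList(child));
        this.projects = projects;
    }

    @Override
    SketchPlan withChildren(List<SketchPlan> newChildren) {
        // Same projection list, different child.
        return new SketchProject(projects, newChildren.get(0));
    }
}
```

With this shape, a memo-like structure can swap a child without knowing the concrete plan type, at the price of one override per concrete plan.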
2022-07-13 19:05:15 +08:00
bd982ac815 [Bug] Fix array functions arguments mismatch (#10549)
Currently, we convert array<Int> to array<BigInt>

For example, the input array_sum([1, 2, 3]) can match function array_sum(Array<Int>) as well as array_sum(Array<BigInt>).

But when a function has more than one argument, it may be matched incorrectly.

For example, the input array_contains([1, 2, 3], 2147483648) will match the function array_contains(Array<BigInt>, BigInt), but the correct match should be array_contains(Array<Int>, Int)

The correct match should be:
array_contains([1, 2, 3], 1) match array_contains(Array<Int>, Int)
array_contains([1, 2, 3], 2147483648) match array_contains(Array<Int>, Int)
array_contains([2147483648, 2147483649, 2147483650], 2147483648) match array_contains(Array<BigInt>, BigInt)

The current behavior is:
array_contains([1, 2, 3], 1) match array_contains(Array<Int>, Int)
array_contains([1, 2, 3], 2147483648) match array_contains(Array<BigInt>, BigInt)
array_contains([2147483648, 2147483649, 2147483650], 2147483648) match array_contains(Array<BigInt>, BigInt)

And this will cause some trouble.

Assume that there are two functions being defined:
Int array_functions(Array<Int>, Int)
BigInt array_functions(Array<BigInt>, BigInt)

And array_functions([1,2,3], 2147483648) will match BigInt array_functions(Array<BigInt>, BigInt), but the result type should be Int, not BigInt.
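
One way to reproduce the expected matches listed above is to pick the overload from the array argument's element type alone. This is only an illustrative assumption, not necessarily what the patch does; `ArrayOverloadSketch` and its methods are hypothetical names.

```java
import java.util.List;

// Hypothetical sketch: the overload is chosen from the array argument only.
class ArrayOverloadSketch {
    // Smallest integer type that can hold every element: "Int" or "BigInt".
    static String elementType(List<Long> elements) {
        for (long v : elements) {
            if (v < Integer.MIN_VALUE || v > Integer.MAX_VALUE) {
                return "BigInt";
            }
        }
        return "Int";
    }

    // Assumption: the scalar argument does not widen the array type;
    // the signature follows the array's element type.
    static String matchArrayContains(List<Long> array, long value) {
        String t = elementType(array);
        return "array_contains(Array<" + t + ">, " + t + ")";
    }
}
```

Under this assumption, `[1, 2, 3]` stays `Array<Int>` even when the second argument is 2147483648, while an array that itself holds values above the Int range selects the BigInt overload.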
2022-07-13 14:54:49 +08:00
f9f711cd16 FIX: fix datetimev2 decimal error. (#10736) 2022-07-13 08:32:26 +08:00
Pxl
d6210edcda [bugfix]set IsNullPredicate to ALWAYS_NOT_NULLABLE (#10785) 2022-07-13 08:28:00 +08:00
d278f400d4 [enhancement](show data skew) Support show avg_row_count for data skew of one table (#10790) 2022-07-13 08:27:20 +08:00
486cf0ebd4 [Feature] Lightweight schema change of add/drop column (#10136)
* [Schema Change] support fast add/drop column  (#49)

* [feature](schema-change) support fast schema change. coauthor: yixiutt

* [schema change] Using columns desc from fe to read data. coauthor: Lchangliang

* [feature](schema change) schema change optimize for add/drop columns.

1.add uniqueId field for class column.
2.schema change for add/drop columns directly update schema meta

Co-authored-by: yixiutt <yixiu@selectdb.com>
Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com>

[Feature](schema change) fix write and add regression test (#69)

Co-authored-by: yixiutt <yixiu@selectdb.com>

[schema change] be supports that delete uses the newest schema

add delete regression test

fix regression case (#107)

tmp

[feature](schema change) light schema change exclude rollup and agg/uniq/dup key type.

[feature](schema change) fe olapTable maxUniqueId write in disk.

[feature](schema change) add rpc iface for sc add column.

[feature](schema change) add columnsDesc to TPushReq for light sc.

resolve the deadlock when schema change (#124)

fix columns from fe don't have bitmap_index flag (#134)

add update/delete case

construct MATERIALIZED schema from origin schema when insert

fix not vectorized compaction coredump

use segment cache

choose newest schema by schema version when compaction (#182)

[bugfix](schema change) fix light schema change problem.

[feature](schema change) light schema change add alter job. (#1)

fix be ut

[bug] (schema change) unique drop key column should not light schema
change

[feature](schema change) add schema change regression-test.

fix regression test

[bugfix](schema change) fix multi alter clauses for light schema change. (#2)

[bugfix](schema change) fix multi clauses calculate column unique id (#3)

modify PushTask process (#217)

[Bugfix](schema change) fix jobId replay cause bdbje exception.

[bug](schema change) fix max col unique id repetitive. (#232)

[optimize](schema change) modify pendingMaxColUniqueId generate rule.

fix compaction error
* fix be ut

* fix snapshot load core

fix unique_id error (#278)

[refact](fe) remove redundant code for light schema change. (#4)

[refact](fe) remove redundant code for light schema change. (#4)

format fe core

format be core

fix be ut

modify fe meta version

fix rebase error

flush schema into rowset_meta in old table

[refactor](schema change) refact fe light schema change. (#5)

delete the change of schemahash and support get max version schema

* modify for review

* fix be ut

* fix schema change test
2022-07-12 19:41:06 +08:00
f5036fea63 [enhancement][multi-catalog]Add strong checker for hms table (#10724) 2022-07-11 23:48:15 +08:00
5a54d518dc [Refactor](Nereids) remove generic type from concrete expressions (#10761)
In the past, we used generic types for plans and expressions to support the pattern match framework, which allowed type inference without unsafe type casts. We then observed that expressions are usually traversed or rewritten with the visitor pattern, so generic types are useless for expressions and introduce complexity. So we remove generic types from concrete expressions.
2022-07-11 22:30:42 +08:00
c51badb1ae [feature-wip](datev2) add FE functions and fix some bugs (#10767) 2022-07-11 19:25:31 +08:00
deae728fc6 [refactor](nereids) Refine some code snippets (#10672)
Refine some code snippets:
1. Rename: ExpressionUtils::add -> ExpressionUtils::and
2. Reduce temporary objects when combining expressions.
2022-07-11 16:31:38 +08:00
51855633e4 [feature](Nereids): cost and enforcer job in cascades. (#10657)
Issue Number: close #9640

Add enforcer job for cascades.

Inspired by the *NoisePage enforcer job* and the *ORCA paper*.

During this period, we will derive physical properties for the plan tree, and prune the plan according to the cost.
2022-07-11 15:01:59 +08:00
639f1cd26c [improvement](parquet-reader) Add some profile for parquet reader (#10740) 2022-07-11 12:19:06 +08:00
81101fc1c5 [enhancement](alter) Make alter job more robust by ignoring some task failure (#10719)
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2022-07-11 12:16:48 +08:00
8472ea8324 Revert "[Enhancement] Add column prune support for VOlapScanNode (#10615)" (#10734) 2022-07-11 12:16:08 +08:00
1dccfa3d84 [enhancement](nereids) make SSB works (#10659)
enhancement
- refactor compute output expression on root fragment in nereids planner
- refactor aggregate plan translator
- refactor aggregate disassemble rule
- slightly refactor sort plan translator
- add exchange node on the top of plan node tree if it is needed
- slightly refactor PhysicalPlanTranslator#translatePlan

fix
- slotDescriptor should not be reused between TupleDescriptors
- expression's nullable now works fine
- remove quotes when parse string literal
- set resolvedTupleExprs in SortNode to control output
- remove the extra column in sortTupleSlotExprs in SortInfo

known issues
- aggregate function must be the top expression in output expression (need project in ExecNode in BE)
- first phase aggregate could not convert to stream mode.
- OlapScanNode does not set data partition
- Sort cannot process expressions like 'order by a + 1'; SortInfo is generated in a tricky way and should be refactored when we want to support 'order by a + 1'
- column pruning does not work as expected
2022-07-11 11:33:17 +08:00
46662bfee8 [Bug] CTAS varchar length lost (#10738) 2022-07-10 23:51:36 +08:00
a6e4c88663 [improve](planner): split output expr to multiple line. (#10710)
* [improve](planner): split output expr to multiple line.

+---------------------------------------------------+
| Explain String                                    |
+---------------------------------------------------+
| PLAN FRAGMENT 0                                   |
|   OUTPUT EXPRS:                                   |
|     <slot 9> `user_id`                            |
|     <slot 11> `default_cluster:test`.`tbl`.`date` |
|     <slot 10> `city`                              |
|     <slot 12> `default_cluster:test`.`tbl`.`age`  |
+---------------------------------------------------+

* *: fix UT and regression-test.
2022-07-10 11:35:48 +08:00
1f08f2d144 [Bug][Vectorized] Support array function in where pre in volap_scan_node (#10467)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
Support array function in where pre in volap_scan_node
2022-07-09 16:22:01 +08:00
24d824a783 [improvement](multi-catalog) Impl parallel for file scanner to improve the scanner performance (#10620)
Add multi-thread support in FileScanNode on BE and implement the file split logic in FE.
2022-07-09 15:52:53 +08:00
d5ea677282 [feature](tracing) Support query tracing to improve doris observability by introducing OpenTelemetry. (#10533)
The collection of query traces is implemented in fe and be, and the spans are exported to zipkin.
DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-012%3A+Introduce+opentelemetry
2022-07-09 15:50:40 +08:00
3229730933 [refactor]broker rpc timeout configuration parameterization (#10692) 2022-07-09 06:27:02 +08:00
Pxl
f58a071605 [Bug][Function] pass intermediate argument list to be (#10650) 2022-07-08 20:50:05 +08:00
e6da00bb26 [feature](nereides) support sort translator (#10678)
Physical sort:
     * 1. Build sortInfo
     *    There are two types of slotRef:
     *    one is generated by the previous node, collectively called old.
     *    the other is newly generated by the sort node, collectively called new.
     *    Filling of sortInfo related data structures,
     *    a. ordering use newSlotRef.
     *    b. sortTupleSlotExprs use oldSlotRef.
     * 2. Create sortNode
     * 3. Create mergeFragment

TODO:
1. Currently, columns that do not exist in select but exist in order by cannot be parsed.
e.g.: select key from table order by value;

2. For the combination of Literal and SlotReference in select, there is a problem with parsing,
e.g.: select key, (10-value) from table;
2022-07-08 19:22:48 +08:00
eeee036cba [fix](optimizer) join reorder may cause column non-existence problem (#10670)
for example:
select * from t1 inner join t2 on t1.a = t2.b inner join t3 on t3.c = t2.b;
If t3 is a large table, reorderTable will place it first,
and reanalysis will then fail because t2.b does not exist.
2022-07-08 17:28:32 +08:00
e37d29485f [Enhancement] Add column prune support for VOlapScanNode (#10615) 2022-07-08 13:56:26 +08:00
fe8acdb268 [feature-wip](array-type) add agg function collect_list and collect_set (#10606)
Add code for collect_list and collect_set and update the regression output, since the output format for ARRAY(string) had already changed.

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-07-08 12:48:46 +08:00
331fa50501 [feature](cold-data) move cold data to object storage without losing any feature(BE) (#10280)
This PR supports rowset level data upload on the BE side, so that there can be both cold data and hot data in a tablet,
and there is no need to prohibit loading new data to cooled tablets.

Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without
perceiving the underlying filesystem.

The abstracted `RemoteFileSystem` can try local caching strategies with different granularity,
instead of caching segment files as before.

To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory.
In the future, `FileReader`s and `FileWriter`s should be unified.
2022-07-08 12:18:39 +08:00
6c3a25bf14 [enhancement](nereids) add betweentocompound rewrite rule for ssb (#10630)
add betweentocompound rewrite rule for ssb.
for example:
1. A BETWEEN X AND Y ==> A >= X AND A <= Y
2. A NOT BETWEEN X AND Y ==> A < X OR A > Y
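
The two rewrites above can be sketched as follows. For brevity this hypothetical `BetweenRewriteSketch` works on plain strings; a real rule would operate on expression trees.

```java
// Hypothetical string-based sketch of the BETWEEN rewrite; not the Nereids rule itself.
class BetweenRewriteSketch {
    // A BETWEEN X AND Y  ==>  A >= X AND A <= Y
    static String rewriteBetween(String a, String x, String y) {
        return a + " >= " + x + " AND " + a + " <= " + y;
    }

    // A NOT BETWEEN X AND Y  ==>  A < X OR A > Y
    static String rewriteNotBetween(String a, String x, String y) {
        return a + " < " + x + " OR " + a + " > " + y;
    }
}
```

The NOT form is the De Morgan negation of the first: negating both comparisons turns AND into OR.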
2022-07-08 10:07:04 +08:00
874299f39e [feature-wip](multi-catalog)(fix) federation query failed (#10602)
Fix https://github.com/apache/doris/pull/10521, multi-catalog query failed for two reasons:
1. The `SelectStmt` does not get the correct catalog.
2. External table should have three level aliases.

Disable querying external views.
Support show create table for external table&view.
2022-07-08 08:52:17 +08:00
3ce9e7cfca [enhance](planner): remove redundant field in sort (#10624)
SortInfo is in SortNode, but some of its fields are duplicated in SortNode.

Issue Number: close #10616

Remove the redundant fields in `TSortNode` that already exist in `TSortInfo`.

[API-BREAK] This has changed `Thrift` file.
2022-07-07 22:32:07 +08:00
a2df5beebb [fix](Nereids): fix ut. (#10658)
fix ut.
2022-07-07 12:00:47 +08:00
5dfb59844f [enhancement](Nereids)refactor PlannerContext and JobContext (#10485)
Refactor Context in Cascades:
Use two contexts in the cascades framework.

JobContext is used in each job and contains the following attributes:
- reference to PlannerContext
- current cost upper bound 
- current required physical properties

PlannerContext holds global info for the query planner and contains the following attributes:
- reference to Memo
- reference to connectContext
- reference to ruleset could be used for plan
- job pool to maintain unexecuted jobs
- job scheduler to schedule unexecuted jobs
- current job context for next job to be executed
2022-07-06 18:36:31 +08:00
f758e1166a [fix] Fix RewriteBinaryPredicatesRule which causes wrong query results in some cases. (#10551)
During the query planning phase, the binary predicate rewrite optimization process converting DecimalLiteral to integers may overflow, resulting in false values like "id = 12345678901.0" (see the issue for detailed examples).

This PR fixes a possible overflow and optimizes the case where DecimalLiteral is not in the column type value range.

Issue Number: close #10544
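
The range check described above can be sketched with `java.math.BigDecimal`. This is a minimal illustration for an INT column under stated assumptions, not the actual rule; `DecimalToIntSketch`, `Result`, and `classifyEq` are hypothetical names.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch: classify "intColumn = decimalLiteral" without overflowing.
class DecimalToIntSketch {
    enum Result { REWRITE_TO_INT, OUT_OF_RANGE, KEEP_DECIMAL }

    static final BigInteger INT_MIN = BigInteger.valueOf(Integer.MIN_VALUE);
    static final BigInteger INT_MAX = BigInteger.valueOf(Integer.MAX_VALUE);

    static Result classifyEq(BigDecimal literal) {
        BigInteger integral;
        try {
            // Throws ArithmeticException if the literal has a nonzero fractional part.
            integral = literal.toBigIntegerExact();
        } catch (ArithmeticException e) {
            return Result.KEEP_DECIMAL; // leave the predicate as a decimal comparison
        }
        if (integral.compareTo(INT_MIN) < 0 || integral.compareTo(INT_MAX) > 0) {
            // Converting this to int would wrap around silently; an INT column can
            // never equal it, so a planner could treat the equality as unsatisfiable.
            return Result.OUT_OF_RANGE;
        }
        return Result.REWRITE_TO_INT;
    }
}
```

For the literal in the example, `12345678901.0` has no fractional part but exceeds the INT range, so it is classified as out of range instead of being narrowed.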
2022-07-06 15:39:27 +08:00
0b80457c1f [feature](nereids) support like and regexp predicate (#10411)
support like and regexp predicate for nereids.
for example:
select * from t1 where k1 like 'xxx' and k2 regexp '^sa'
2022-07-06 14:32:06 +08:00
0b9f508379 [fix](nereids) fix ut,check bound should be called recursively on the plan node (#10530)
fix ut,check bound should be called recursively on the plan node
2022-07-06 10:37:05 +08:00
c936abd2a3 [fix](fe) when bdbje adding follower, master write op may failed. (#10376) 2022-07-06 10:29:16 +08:00
5f5e01b285 [feature-wip](multi-catalog) Fix hive partition prune in hive and hudi external table. (#10547)
`ExprBuilder` uses a stack to build the expr.
The input order is col, value, and the output order is value, col, but the `>=` operator is not reversed.
Example:
`col >=  1` => `1 >= col`

In this case, it's better to use a queue to keep the input order.

The `CompoundPredicate(OR)` also has a problem: it should be `alwaysTrue` whenever it is not on a partition key or its op is not supported.
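
The operand-order problem described above can be reproduced with a small sketch. `OperandOrderSketch` is a hypothetical stand-in for the builder, using `java.util.ArrayDeque` as both the stack and the queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of why a stack flips the operands of a binary predicate.
class OperandOrderSketch {
    // Pushing col, value and popping them yields value, col (LIFO).
    static String buildWithStack(String left, String op, String right) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push(left);
        stack.push(right);
        String first = stack.pop();
        String second = stack.pop();
        return first + " " + op + " " + second;
    }

    // A queue hands the operands back in input order (FIFO).
    static String buildWithQueue(String left, String op, String right) {
        Deque<String> queue = new ArrayDeque<>();
        queue.addLast(left);
        queue.addLast(right);
        String first = queue.pollFirst();
        String second = queue.pollFirst();
        return first + " " + op + " " + second;
    }
}
```

With the stack, `col >= 1` comes back as `1 >= col`, which changes the meaning because the operator is not mirrored; the queue keeps `col >= 1`.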
2022-07-06 10:22:16 +08:00
589ab06b5c [enhancement](nereids) make filter node and join node work in Nereids (#10605)
enhancement
- add functions `finalizeForNereids` and `finalizeImplForNereids` in stale expression to generate some attributes used in BE.
- remove unnecessary parameter `Analyzer` in function `getBuiltinFunction`
- swap join condition if its left hand expression is related to the right table
- change join physical implementation to broadcast hash join 
- add push predicate rule into planner

fix
- swap join children visit order to ensure the last fragment is root
- avoid visiting the join's left child twice

known issues
- expression computation generates a wrong answer when the expression includes arithmetic with two literal children.
2022-07-05 18:23:00 +08:00
3b0ddd7ae0 [Enhancement](Nereids)(Step1) prune column for filter/agg/join/sort (#10478)
Column pruning for filter/agg/join/sort.

#### For agg
Pattern : agg()
Transformed:
```
agg
  |
project
  |
child
```
#### For filter()/sort():
Pattern: project(filter()/join()/sort())
Transformed:
```
project
    |
filter/sort
   |
project
   |
child
```
#### For join
Pattern: project(join())
Transformed:
```
        project
           |
         join
       /      \
  project    project
     |          |
   child      child

for example:
```sql
-- table a: k1,v1
-- table b: k1,k2,k3,v1
select a.k1,b.k2 from a join b on a.k1 = b.k1 where a.k1 > 1
```

origin plan tree:
```
          project(a.k1,b.k2)
                  |
      join(a:k1,v1 b:k1,k2,k3,v1)
          /                \
 scan(a:k1,v1)     scan(b:k1,k2,k3,v1)
```

transformed plan tree:

```
            project(a.k1,b.k2)
                    |
            join(a:k1 b:k1,k2)
             /              \
      project(k1)      project(k1,k2)
           |                  |
    scan(a:k1,v1)    scan(b:k1,k2,k3,v1)
```
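
The pruning in the example above can be sketched as a set computation: the columns needed below the join are the project outputs plus the columns used by the join condition, and each side keeps only its own needed columns. `ColumnPruneSketch` and its methods are hypothetical names, and columns are modeled as plain `table.column` strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of column pruning below a join; not the Nereids rule itself.
class ColumnPruneSketch {
    // Columns required below the join = project outputs + join condition columns.
    static Set<String> requiredBelowJoin(List<String> projectOutputs,
                                         List<String> joinConditionColumns) {
        Set<String> required = new LinkedHashSet<>(projectOutputs);
        required.addAll(joinConditionColumns);
        return required;
    }

    // Keep only the columns of one table that are still required,
    // preserving the table's own column order.
    static List<String> prune(String table, List<String> tableColumns, Set<String> required) {
        List<String> kept = new ArrayList<>();
        for (String column : tableColumns) {
            if (required.contains(table + "." + column)) {
                kept.add(column);
            }
        }
        return kept;
    }
}
```

For the query above, the required set is `a.k1`, `b.k2`, `b.k1`, so table a keeps `k1` and table b keeps `k1, k2`, matching the projects inserted above each scan in the transformed tree.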
2022-07-05 17:54:21 +08:00
f40ae7c654 [feature-wip](multi-catalog) support "show proc 'catalogs/'" (#10596) 2022-07-05 13:40:24 +08:00
680118c6b9 [Feature] [nereids] Agg rewrite rule of nereids optmizer (#10412)
Add a rule to disassemble the logical aggregate node. This is necessary because our execution framework is distributed and aggregation always executes in two steps: first aggregate locally, then merge the results.

Add some fields to logical aggregate to determine whether a logical aggregate operator has been disassembled and to mark the aggregate phase it belongs to, and add the logic to map the new aggregate function to its stale definition to get the function's intermediate type.
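
The two-step execution and the need for an intermediate type can be illustrated with AVG. This is a generic sketch of two-phase aggregation, not the Nereids rule; `TwoPhaseAggSketch` and `AvgState` are hypothetical names.

```java
import java.util.List;

// Hypothetical sketch of two-phase aggregation for AVG.
class TwoPhaseAggSketch {
    // Intermediate state: (sum, count) differs from the final result type (double).
    static final class AvgState {
        long sum;
        long count;
    }

    // Phase 1: each node aggregates its local rows into a partial state.
    static AvgState localAggregate(List<Integer> rows) {
        AvgState state = new AvgState();
        for (int value : rows) {
            state.sum += value;
            state.count++;
        }
        return state;
    }

    // Phase 2: merge the partial states, then produce the final value.
    // Empty input returns 0.0 here for simplicity; SQL AVG would return NULL.
    static double mergeAndFinalize(List<AvgState> partials) {
        long sum = 0;
        long count = 0;
        for (AvgState partial : partials) {
            sum += partial.sum;
            count += partial.count;
        }
        return count == 0 ? 0.0 : (double) sum / count;
    }
}
```

Averaging the per-node averages would give a wrong result when nodes hold different row counts, which is why the local phase has to emit an intermediate state instead of a final value.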
2022-07-05 11:57:42 +08:00
e444ac7a87 [format](*): using guava package header (#10325) 2022-07-05 11:05:39 +08:00
73ba806046 [feature-wip](multi-catalog) Add catalog to information_schema table "columns". (#10592) 2022-07-05 09:57:19 +08:00