Commit Graph

5161 Commits

Author SHA1 Message Date
5dfb59844f [enhancement](Nereids)refactor PlannerContext and JobContext (#10485)
Refactor the contexts in Cascades:
use two contexts in the Cascades framework.

JobContext is used in each job and contains the following attributes:
- reference to PlannerContext
- current cost upper bound 
- current required physical properties

PlannerContext holds global information for the query planner and contains the following attributes:
- reference to Memo
- reference to connectContext
- reference to the rule set that can be used for planning
- job pool to maintain unexecuted jobs
- job scheduler to schedule unexecuted jobs
- current job context for next job to be executed
2022-07-06 18:36:31 +08:00
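The split described in #10485 can be pictured with a minimal Python sketch. It is not the Doris Java code: every field and method name below is an illustrative assumption derived from the attribute lists above, and the "scheduler" is reduced to a loop that drains a LIFO job pool.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Deque, List, Optional


@dataclass
class PlannerContext:
    """Global planner state (hypothetical field names)."""
    memo: Any                       # reference to the Memo
    connect_context: Any            # reference to the session/connection context
    rule_set: List[str]             # rules that may be applied while planning
    job_pool: Deque[Callable[[], None]] = field(default_factory=deque)  # unexecuted jobs
    current_job_context: Optional["JobContext"] = None  # context for the next job

    def push_job(self, job: Callable[[], None]) -> None:
        self.job_pool.append(job)

    def run_all(self) -> None:
        """Stand-in for the job scheduler: pop and run jobs until the pool is empty."""
        while self.job_pool:
            self.job_pool.pop()()   # LIFO order is an assumption of this sketch


@dataclass
class JobContext:
    """Per-job state (hypothetical field names)."""
    planner_context: PlannerContext  # back-reference to the global context
    cost_upper_bound: float          # current cost upper bound
    required_properties: Any         # current required physical properties
```

Each job carries only a small `JobContext`, while everything shared across jobs lives once in `PlannerContext`.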
29d4809c80 [BugFix](Array) fix DataTypeArray to_string use after free (#10640)
ColumnArray::convert_to_full_column_if_const overrides the base function,
and ColumnArray::create generates a temporary variable.
2022-07-06 18:18:00 +08:00
416fb73621 docs format fix for explode-json-array table function (#10613)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-07-06 17:57:19 +08:00
Pxl
6d092a6d53 set strleft to always_nullable (#10496) 2022-07-06 17:56:01 +08:00
cff9ffa0e1 fix the inaccurate comments (#10617)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-07-06 17:54:43 +08:00
b4c5dfc28e [Improvement] remove redundant code of VOlapScanner (#10621) 2022-07-06 17:54:10 +08:00
d9ba946118 [enhance](*): git ignore package-lock.json. (#10637) 2022-07-06 17:53:22 +08:00
bff561c0da [feature](script) add --grace option for stop_be.sh (#10626)
The BE ASAN memory leak check needs the app to exit gracefully.
2022-07-06 17:53:01 +08:00
a7df6e3dee rename some files inside vec/sink dir (#10636)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-07-06 17:52:47 +08:00
f758e1166a [fix] Fix RewriteBinaryPredicatesRule which causes wrong query results in some cases. (#10551)
During query planning, the binary predicate rewrite optimization converts a DecimalLiteral to an integer. This conversion may overflow, producing wrong values for predicates like "id = 12345678901.0" (see the issue for detailed examples).

This PR fixes the possible overflow and optimizes the case where the DecimalLiteral is not in the value range of the column type.

Issue Number: close #10544
2022-07-06 15:39:27 +08:00
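The overflow in #10551 can be illustrated with a small Python sketch. It is not the `RewriteBinaryPredicatesRule` implementation: the function names, the 32-bit INT column and the tuple-based result are assumptions made for illustration. The point is that the range check happens before the literal is narrowed to the column type.

```python
from decimal import Decimal

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # value range of a 32-bit INT column (assumed)


def naive_narrow(n: int) -> int:
    """What an unchecked cast to 32 bits would yield: silent wraparound."""
    return (n + 2**31) % 2**32 - 2**31


def rewrite_int_eq(literal: str):
    """Rewrite `int_col = <decimal literal>` for a 32-bit INT column.

    Returns ("eq", n) when the literal is an in-range integer value n,
    or ("false",) when no INT value can equal the literal.
    """
    value = Decimal(literal)
    # A literal with a fractional part can never equal an integer column.
    if value != value.to_integral_value():
        return ("false",)
    n = int(value)
    # Check the range BEFORE narrowing, so nothing can wrap around.
    if n < INT_MIN or n > INT_MAX:
        return ("false",)
    return ("eq", n)
```

With the literal from the commit message, `naive_narrow(12345678901)` wraps to an unrelated in-range value, while `rewrite_int_eq("12345678901.0")` reports that the predicate can never match.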
0b80457c1f [feature](nereids) support like and regexp predicate (#10411)
Support the LIKE and REGEXP predicates in Nereids.
For example:
select * from t1 where k1 like 'xxx' and k2 regexp '^sa'
2022-07-06 14:32:06 +08:00
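As a rough illustration of what the two predicates in #10411 mean, the sketch below evaluates them in Python. It assumes that LIKE must match the whole string (with `%` for any run of characters and `_` for one character) and that REGEXP matches anywhere in the string. Escape characters and Doris-specific regex syntax are ignored, and none of this is Nereids code.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any run of characters
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    # LIKE must cover the whole value, hence fullmatch.
    return like_to_regex(pattern).fullmatch(value) is not None


def regexp(value: str, pattern: str) -> bool:
    # REGEXP is treated as a partial match here (an assumption).
    return re.search(pattern, value) is not None
```

Under these assumptions, the query above keeps a row only if `k1` is exactly `xxx` and `k2` starts with `sa`.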
006283c036 [Fix] select nested type of string within type array should be wrapped with '' in vectorized path (#10498) 2022-07-06 10:47:36 +08:00
0b9f508379 [fix](nereids) fix ut,check bound should be called recursively on the plan node (#10530)
Fix UT: the bound check should be called recursively on the plan node.
2022-07-06 10:37:05 +08:00
c936abd2a3 [fix](fe) when bdbje adding follower, master write op may fail. (#10376) 2022-07-06 10:29:16 +08:00
5f5e01b285 [feature-wip](multi-catalog) Fix hive partition prune in hive and hudi external table. (#10547)
`ExprBuilder` uses a stack to build the expression.
The input order is col, value, but the output order is value, col, and the `>=` operator is not reversed to match.
Example:
`col >=  1` => `1 >= col`

In this case, it is better to use a queue to keep the input order.

The `CompoundPredicate(OR)` also has a problem: it should be `alwaysTrue` whenever it is not a partition key or not a supported op.
2022-07-06 10:22:16 +08:00
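The operand-order problem in #10547 is easy to reproduce in isolation. The Python sketch below is not the `ExprBuilder` code; the function names are hypothetical. It shows why popping operands from a stack swaps them, and why a queue keeps `col >= 1` intact.

```python
from collections import deque
import operator

OPS = {">=": operator.ge, "<=": operator.le}


def build_with_stack(col, value, op):
    """Operands are pushed as col, value and popped in reverse order."""
    stack = [col, value]
    left = stack.pop()    # value comes out first
    right = stack.pop()   # col comes out second
    return (left, op, right)


def build_with_queue(col, value, op):
    """Operands are enqueued as col, value and dequeued in the same order."""
    queue = deque([col, value])
    left = queue.popleft()
    right = queue.popleft()
    return (left, op, right)


def evaluate(expr, col_value):
    """Evaluate a (left, op, right) triple, substituting col_value for 'col'."""
    left, op, right = expr
    left = col_value if left == "col" else left
    right = col_value if right == "col" else right
    return OPS[op](left, right)
```

For a row where `col` is 5, the queue-built expression evaluates `5 >= 1`, while the stack-built one evaluates `1 >= 5`, so the row would be pruned incorrectly unless the operator were flipped as well.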
43015f11a5 [Improvement] remove beHttpAddress in regression test (#10623) 2022-07-06 08:59:29 +08:00
8e364fb848 [fix](load) skip empty orc file (#10593)
Sometimes an upstream system (e.g., Hive) may create an empty ORC file
that has only a header and footer, without a schema.
If we call `_reader->createRowReader()` with selected columns on such a file,
it throws ParserError: Invalid column selected xx.
So we first check the number of rows and skip this kind of file.

This fix applies only to non-vectorized load. Vectorized load uses the Arrow scanner
to read ORC files, which does not have this problem.
2022-07-05 22:18:56 +08:00
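The guard described in #10593 amounts to checking the row count before asking for a row reader. The Python sketch below uses a hypothetical `FakeOrcReader` class as a stand-in. It is not the ORC C++ API, and its method names are assumptions chosen to mirror the commit message.

```python
class FakeOrcReader:
    """Hypothetical stand-in for an ORC file reader (not the real ORC API)."""

    def __init__(self, columns, num_rows):
        self.columns = columns      # an empty file has no schema, so no columns
        self.num_rows = num_rows

    def number_of_rows(self):
        return self.num_rows

    def create_row_reader(self, selected):
        # Mimics the failure mode from the commit message.
        for name in selected:
            if name not in self.columns:
                raise ValueError(f"Invalid column selected {name}")
        return iter(range(self.num_rows))


def open_row_readers(readers, selected):
    """Create row readers, skipping files that contain no rows."""
    row_readers = []
    for reader in readers:
        if reader.number_of_rows() == 0:
            continue                # empty file: nothing to load, and no schema
        row_readers.append(reader.create_row_reader(selected))
    return row_readers
```

Without the `number_of_rows()` check, the first empty file in a batch would abort the whole load with the column-selection error.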
1f57fcc4e9 remove duplicate codes from function_test_util.cpp (#10607)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-07-05 20:43:56 +08:00
89e56ea67f [refactor] remove alpha rowset related code and vectorized row batch related code (#10584) 2022-07-05 20:33:34 +08:00
3e87960202 [bugfix] fix bug of vhash join build (#10614)
* [bugfix] fix bug of vhash join build

* format code
2022-07-05 19:14:42 +08:00
589ab06b5c [enhancement](nereids) make filter node and join node work in Nereids (#10605)
enhancement
- add functions `finalizeForNereids` and `finalizeImplForNereids` in stale expression to generate some attributes used in BE.
- remove unnecessary parameter `Analyzer` in function `getBuiltinFunction`
- swap the join condition if its left-hand expression is related to the right table
- change the join physical implementation to broadcast hash join
- add the push predicate rule to the planner

fix
- swap the join children visit order to ensure the last fragment is the root
- avoid visiting the join left child twice

known issues
- expression computation will generate a wrong answer when the expression includes arithmetic with two literal children.
2022-07-05 18:23:00 +08:00
3b0ddd7ae0 [Enhancement](Nereids)(Step1) prune column for filter/agg/join/sort (#10478)
Column pruning for filter/agg/join/sort.

#### For agg
Pattern : agg()
Transformed:
```
agg
  |
project
  |
child
```
#### For filter()/sort():
Pattern: project(filter()/join()/sort())
Transformed:
```
project
   |
filter/sort
   |
project
   |
child
```
#### For join
Pattern: project(join())
Transformed:
```
        project
           |
         join
        /    \
  project    project
     |          |
   child      child
```

for example:
```sql
-- table a: k1,v1
-- table b: k1,k2,k3,v1
select a.k1, b.k2 from a join b on a.k1 = b.k1 where a.k1 > 1
```

original plan tree:
```
        project(a.k1,b.k2)
                |
   join(a:k1,v1 b:k1,k2,k3,v1)
        /               \
scan(a:k1,v1)    scan(b:k1,k2,k3,v1)
```

transformed plan tree:

```
          project(a.k1,b.k2)
                  |
          join(a:k1 b:k1,k2)
           /              \
    project(k1)      project(k1,k2)
         |                 |
  scan(a:k1,v1)    scan(b:k1,k2,k3,v1)
```
2022-07-05 17:54:21 +08:00
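The join case from #10478 can be reproduced with a short Python sketch. It is a simplified model, not the Nereids rule: the node classes, the qualified column names such as `a.k1`, and the `prune` function are illustrative assumptions, and the filter on `a.k1` is left out.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Scan:
    table: str
    columns: List[str]      # all columns the scan produces


@dataclass
class Project:
    columns: List[str]      # columns kept by this project
    child: object


@dataclass
class Join:
    condition: List[str]    # columns referenced by the join condition
    left: object
    right: object


def prune(plan, required: Set[str]):
    """Push the set of required columns down and add projects above scans."""
    if isinstance(plan, Project):
        # A project only needs its own output columns from its child.
        return Project(plan.columns, prune(plan.child, set(plan.columns)))
    if isinstance(plan, Join):
        # The join also needs the columns used in its condition.
        needed = required | set(plan.condition)
        return Join(plan.condition, prune(plan.left, needed), prune(plan.right, needed))
    # Scan: wrap it in a project if it produces more than is needed.
    keep = [c for c in plan.columns if c in required]
    return plan if keep == plan.columns else Project(keep, plan)
```

Applied to the original plan tree above, `prune` inserts `project(k1)` over the scan of `a` and `project(k1,k2)` over the scan of `b`, which matches the transformed plan tree.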
86502b014d [feature-wip](unique-key-merge-on-write)port IntervalTree from kudu (#10511)
See DSIP-018: https://cwiki.apache.org/confluence/display/DORIS/DSIP-018%3A+Support+Merge-On-Write+implementation+for+UNIQUE+KEY+data+model
This patch is for step 3.1 in scheduling.
2022-07-05 17:43:01 +08:00
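For readers unfamiliar with the data structure named in #10511, the following Python sketch shows a static centered interval tree that answers "which closed intervals contain this point?". It illustrates the general technique only. It is not a port of the Kudu or Doris C++ class, and the class and method names are assumptions.

```python
class IntervalTree:
    """Static centered interval tree over closed intervals (lo, hi)."""

    def __init__(self, intervals):
        self.center = None
        self.left = self.right = None
        self.by_lo = []
        self.by_hi = []
        if not intervals:
            return
        # Use a median endpoint as the center, so at least one interval contains it.
        endpoints = sorted(p for iv in intervals for p in iv)
        self.center = endpoints[len(endpoints) // 2]
        here = [iv for iv in intervals if iv[0] <= self.center <= iv[1]]
        left = [iv for iv in intervals if iv[1] < self.center]
        right = [iv for iv in intervals if iv[0] > self.center]
        self.by_lo = sorted(here, key=lambda iv: iv[0])                # ascending lo
        self.by_hi = sorted(here, key=lambda iv: iv[1], reverse=True)  # descending hi
        self.left = IntervalTree(left) if left else None
        self.right = IntervalTree(right) if right else None

    def query(self, point):
        """Return every stored interval that contains point."""
        if self.center is None:
            return []
        result = []
        if point <= self.center:
            # Intervals here end at or after center, so only lo needs checking.
            for iv in self.by_lo:
                if iv[0] > point:
                    break
                result.append(iv)
            if point < self.center and self.left:
                result += self.left.query(point)
        else:
            # Intervals here start at or before center, so only hi needs checking.
            for iv in self.by_hi:
                if iv[1] < point:
                    break
                result.append(iv)
            if self.right:
                result += self.right.query(point)
        return result
```

A lookup visits one node per level and stops scanning each sorted list at the first interval that cannot contain the point, which is what makes the structure attractive for finding all key ranges that cover a given key.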
575bf18d55 [enhancement] speed up week_of_year by pre_calc table (#10586) 2022-07-05 15:37:02 +08:00
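The idea behind a pre-calculated table, as in #10586, is to trade a small amount of memory for a constant-time lookup. The Python sketch below assumes ISO-8601 week numbering and an arbitrary 1950 to 2050 date range. Neither assumption comes from the commit, and the names are hypothetical.

```python
from datetime import date

FIRST = date(1950, 1, 1)        # start of the precomputed range (assumed)
LAST = date(2050, 12, 31)       # end of the precomputed range (assumed)
_BASE = FIRST.toordinal()

# One byte per day: the ISO week number (1..53), computed once at import time.
WEEK_TABLE = bytes(
    date.fromordinal(o).isocalendar()[1]
    for o in range(_BASE, LAST.toordinal() + 1)
)


def week_of_year(d: date) -> int:
    """Look up the week number, falling back to computing it outside the table."""
    idx = d.toordinal() - _BASE
    if 0 <= idx < len(WEEK_TABLE):
        return WEEK_TABLE[idx]          # fast path: a single array index
    return d.isocalendar()[1]           # slow path: compute on demand
```

The table for this range takes under 40 KB, and every in-range call becomes a subtraction plus an index instead of calendar arithmetic.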
f40ae7c654 [feature-wip](multi-catalog) support "show proc 'catalogs/'" (#10596) 2022-07-05 13:40:24 +08:00
680118c6b9 [Feature] [nereids] Agg rewrite rule of nereids optimizer (#10412)
Add a rule to disassemble the logical aggregate node. This is necessary because our execution framework is distributed and an aggregate always executes in two steps: first aggregate locally, then merge the results.

Add fields to the logical aggregate to record whether the operator has been disassembled and which aggregate phase it belongs to, and add logic to map the new aggregate function to its stale definition to get the function's intermediate type.
2022-07-05 11:57:42 +08:00
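The two steps described in #10412 can be sketched in Python for an AVG aggregate. This is only an illustration of local-then-merge aggregation, not the Nereids rule: the function names are hypothetical, and the `[sum, count]` pair stands in for the function's intermediate type.

```python
from collections import defaultdict


def local_agg(rows):
    """Phase 1, run on each node: partial [sum, count] per group key."""
    partial = defaultdict(lambda: [0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return dict(partial)


def merge_agg(partials):
    """Phase 2, run once: merge the partial states and finalize AVG."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}
```

The intermediate state differs from the final result type: AVG cannot be merged from per-node averages alone, so each node has to ship a sum and a count.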
585d42330c [BUG] fix bug in bloom filter for datev2 (#10579) 2022-07-05 11:10:03 +08:00
a2f74bf260 [Improvement] remove profile with poor readability (#10581) 2022-07-05 11:09:23 +08:00
b7441ed291 [chore] remove default REPOSITORY_URL link (#10599) 2022-07-05 11:07:18 +08:00
302e078e6a [dev env]: add idea provided doc. (#10597) 2022-07-05 11:06:53 +08:00
e444ac7a87 [format](*): using guava package header (#10325) 2022-07-05 11:05:39 +08:00
3c140ae05b [fix] [docs] Fixed Use examples in sequence-column-manual.md file. (#10588)
* [fix] [docs] Fixed Use examples in sequence-column-manual.md file.

Co-authored-by: 杨帅统 <yangshuaitong@gaolvgo.com>
Co-authored-by: spaces-x <weixiao5220@gmail.com>
2022-07-05 10:27:13 +08:00
cc2de23455 [docs] add quick compaction configs (#10559) 2022-07-05 10:03:37 +08:00
73ba806046 [feature-wip](multi-catalog) Add catalog to information_schema table "columns". (#10592) 2022-07-05 09:57:19 +08:00
9c990b073f [regression] modify compaction cases, not depend on beHttpAddress (#10553) 2022-07-04 22:36:12 +08:00
570139e332 [fix][be] Delete uncivilized comments. (#10578) 2022-07-04 22:35:15 +08:00
1f1bdaa9c3 [bugfix] fix coredump of left anti join (#10591) 2022-07-04 22:29:41 +08:00
Pxl
e68ab0084b [bugfix]fix default value get wrong result because no implement read_by_rowids (#10582) 2022-07-04 19:30:49 +08:00
1cee0a7028 [feature-wip](multi-catalog) Modify the persist method about data source (#10523) 2022-07-04 18:24:14 +08:00
46bff6bba0 [fix](multi-catalog) fix the core dump on hms table (#10573)
In the function `TextConverter::write_vec_column`, the statement `nullable_column->get_null_map_data().push_back(0);` should be executed for every row.
Otherwise the null map becomes inconsistent and causes the core dump.
2022-07-04 15:52:05 +08:00
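The invariant behind #10573 is that a nullable column's null map must have exactly one entry per row. The Python sketch below models that invariant with a hypothetical `NullableColumn` class. It is not the BE column implementation, and the placeholder value for NULL rows is an assumption.

```python
class NullableColumn:
    """Toy nullable column: a data array plus a parallel null map."""

    def __init__(self):
        self.data = []
        self.null_map = []      # 1 = NULL, 0 = not NULL, one entry per row

    def append(self, value):
        if value is None:
            self.data.append(0)         # placeholder value for a NULL row
            self.null_map.append(1)
        else:
            self.data.append(value)
            self.null_map.append(0)     # the push that must happen for every row

    def get(self, row):
        return None if self.null_map[row] else self.data[row]
```

If the `null_map.append(0)` branch were skipped for non-null rows, `null_map` would end up shorter than `data`, and reading a later row would index past the end of the map. In C++ that is an out-of-bounds access rather than a clean exception.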
88420deec1 [Bug][docs] Fix wrong links in README.md (#10394)
Fix dead links in README.
2022-07-04 14:44:23 +08:00
e6f090e5bf [enhancement](Nereids)make nereids work (#10550)
Nereids could execute query: `select a from t;`

**enhancement**
- temporarily add a queryable interface for QueryStmt and LogicalPlanAdapter
- refactor GroupId; GroupId now extends doris.common.id
- GroupId is now generated by its memo, not globally yet
- add varchar type
- Nereids enabled only when vectorized engine enabled

**fix**
- set output and column label to logicalPlanAdapter
- set output expression on root fragment
- set select partition and select index id to OlapScanNode
- BatchRulesJob add rule type mismatch
- add all implementation rules to rule set
- SlotReference get catalog column no longer returns null values
- bind star correctly
- implement `isNullable` in expressions

**known issue**
- cannot do expression mapping (e.g. a + 1) on the project node (waiting for the intermediate tuple interface and projection ability in ExecNode in BE)
- aggregate does not work
- sort does not work
- filter does not work
- join does not work
2022-07-04 14:15:33 +08:00
9d4a9b95a4 [Build] fix the compile error with clang (#10570)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-07-04 11:13:17 +08:00
1a173a854e [fix](routine-load) Fix that routine load cannot work with old kafka version (#10554)
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2022-07-04 10:47:50 +08:00
d56d88d391 [improvement]No need to memset flags for vectorization predicates (#10564) 2022-07-04 10:23:08 +08:00
c5f85c9818 [community] modify release doc to remove incubator (#10574) 2022-07-04 10:18:23 +08:00
Pxl
0b251481d5 [Enhancement][Storage] refactor Comparison Predicates (#10380) 2022-07-04 09:22:27 +08:00
d6658f16d2 [chore][community](github)Change 'max-old-space-size' to 8192 (#10557) 2022-07-04 08:59:54 +08:00
7bfe438884 [BUG] fix bug in literal debug_string when literal is null (#10567) 2022-07-04 08:57:55 +08:00
b11e72b76b [chore] turn off java-udf by default when compiling in parallel (#10569) 2022-07-03 23:24:49 +08:00