8276 Commits

Author SHA1 Message Date
e37d29485f [Enhancement] Add column prune support for VOlapScanNode (#10615) 2022-07-08 13:56:26 +08:00
fe8acdb268 [feature-wip](array-type) add agg function collect_list and collect_set (#10606)
Add code for collect_list and collect_set, and update the regression output, since the output format for ARRAY(string) had already changed.

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-07-08 12:48:46 +08:00
331fa50501 [feature](cold-data) move cold data to object storage without losing any feature(BE) (#10280)
This PR supports rowset-level data upload on the BE side, so that there can be both cold data and hot data in a tablet,
and there is no need to prohibit loading new data to cooled tablets.

Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without
perceiving the underlying filesystem.

The abstracted `RemoteFileSystem` can try local caching strategies with different granularity,
instead of caching segment files as before.

To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory.
In the future, `FileReader`s and `FileWriter`s should be unified.
2022-07-08 12:18:39 +08:00
e159e748df [chore](dependency) fix core dump caused by enabling O3 optimization for opentelemetry-cpp. (#10675) 2022-07-08 10:08:07 +08:00
6c3a25bf14 [enhancement](nereids) add betweentocompound rewrite rule for ssb (#10630)
add betweentocompound rewrite rule for ssb.
for example:
1. A BETWEEN X AND Y ==> A >= X AND A <= Y
2. A NOT BETWEEN X AND Y ==> A < X OR A > Y
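
The two rewrites above can be sketched as follows. This is a minimal illustration only, not the Nereids implementation: the `Between` class and the `between_to_compound` function are hypothetical names, and expressions are modeled as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Between:
    """Hypothetical stand-in for a BETWEEN predicate node."""
    expr: str
    low: str
    high: str
    negated: bool = False


def between_to_compound(p: Between) -> str:
    """Rewrite a (NOT) BETWEEN predicate into an equivalent compound predicate."""
    if p.negated:
        # A NOT BETWEEN X AND Y ==> A < X OR A > Y
        return f"{p.expr} < {p.low} OR {p.expr} > {p.high}"
    # A BETWEEN X AND Y ==> A >= X AND A <= Y
    return f"{p.expr} >= {p.low} AND {p.expr} <= {p.high}"
```

For instance, `between_to_compound(Between("A", "X", "Y"))` yields the first form listed above, and passing `negated=True` yields the second.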
2022-07-08 10:07:04 +08:00
874299f39e [feature-wip](multi-catalog)(fix) federation query failed (#10602)
Fix https://github.com/apache/doris/pull/10521, multi-catalog query failed for two reasons:
1. The `SelectStmt` does not get the correct catalog.
2. External tables should have three-level aliases.

Disable querying external views.
Support `SHOW CREATE TABLE` for external tables and views.
2022-07-08 08:52:17 +08:00
03296aedd5 [BUG] fix core dump caused by runtime filter (#10611) 2022-07-08 08:28:39 +08:00
853f85aea4 [enhancement] improve performance of week() and yearweek() (#10633) 2022-07-08 08:26:58 +08:00
3ce9e7cfca [enhance](planner): remove redundant field in sort (#10624)
SortInfo is in SortNode, but some of its fields are duplicated in SortNode.

Issue Number: close #10616

Remove the redundant fields in `TSortNode` that already exist in `TSortInfo`.

[API-BREAK] This changes the `Thrift` file.
2022-07-07 22:32:07 +08:00
c583d3e27c [fix][vectorized] Fix bug of VInPredicate on date type (#10663) 2022-07-07 22:15:33 +08:00
f03335d61d [action](Nereids): add label auto for nereids UT. (#10665) 2022-07-07 18:21:04 +08:00
a2df5beebb [fix](Nereids): fix ut. (#10658)
fix ut.
2022-07-07 12:00:47 +08:00
8012d63ea0 [fix] substr('', 1, 5) return empty string instead of null (#10622) 2022-07-06 22:51:02 +08:00
8de8a9571a [docs] Fixed description about networks in Quick Start (#10639) 2022-07-06 22:49:43 +08:00
3bf8c761a4 [BUG] Fix invalid return type for left and right function (#10643) 2022-07-06 22:49:19 +08:00
5dfb59844f [enhancement](Nereids)refactor PlannerContext and JobContext (#10485)
Refactor Context in Cascades:
Use two contexts in the Cascades framework.

JobContext is used in each job and contains the following attributes:
- reference to PlannerContext
- current cost upper bound
- current required physical properties

PlannerContext holds global info for the query planner and contains the following attributes:
- reference to Memo
- reference to connectContext
- reference to the rule set that can be used for planning
- job pool to maintain unexecuted jobs
- job scheduler to schedule unexecuted jobs
- current job context for the next job to be executed
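
The split between the two contexts can be sketched roughly as below. This is an illustrative model only: the field names follow the lists above, while the types, the `schedule` method, and the representation of jobs as plain callables are assumptions, not the actual Nereids classes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class PlannerContext:
    """Global state shared by the whole planning run."""
    memo: Any
    connect_context: Any
    rule_set: List[Any]
    # Job pool: unexecuted jobs, modeled here as zero-argument callables.
    job_pool: List[Callable[[], None]] = field(default_factory=list)
    current_job_context: Optional["JobContext"] = None

    def schedule(self) -> None:
        # Hypothetical scheduler: run pending jobs until the pool is empty.
        while self.job_pool:
            job = self.job_pool.pop()
            job()


@dataclass
class JobContext:
    """Per-job state that points back to the shared planner context."""
    planner_context: PlannerContext
    cost_upper_bound: float
    required_properties: Any
```

Keeping the per-job fields in a separate small object means each job can carry its own cost bound and required properties while still reaching the shared memo through `planner_context`.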
2022-07-06 18:36:31 +08:00
29d4809c80 [BugFix](Array) fix DataTypeArray to_string use after free (#10640)
ColumnArray::convert_to_full_column_if_const overrides the base function,
and ColumnArray::create generates a temporary variable
2022-07-06 18:18:00 +08:00
416fb73621 docs format fix for explode-json-array table function (#10613)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-07-06 17:57:19 +08:00
Pxl
6d092a6d53 set strleft to always_nullable (#10496) 2022-07-06 17:56:01 +08:00
cff9ffa0e1 fix the inaccurate comments (#10617)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-07-06 17:54:43 +08:00
b4c5dfc28e [Improvement] remove redundant code of VOlapScanner (#10621) 2022-07-06 17:54:10 +08:00
d9ba946118 [enhance](*): git ignore package-lock.json. (#10637) 2022-07-06 17:53:22 +08:00
bff561c0da [feature](script) add --grace option for stop_be.sh (#10626)
The BE ASAN memory leak check needs the app to exit gracefully.
2022-07-06 17:53:01 +08:00
a7df6e3dee rename some files inside vec/sink dir (#10636)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-07-06 17:52:47 +08:00
f758e1166a [fix] Fix RewriteBinaryPredicatesRule which causes wrong query results in some cases. (#10551)
During the query planning phase, the binary predicate rewrite optimization converts a DecimalLiteral to an integer, which may overflow and produce wrong results for predicates like "id = 12345678901.0" (see the issue for detailed examples).

This PR fixes the possible overflow and optimizes the case where the DecimalLiteral is not in the value range of the column type.

Issue Number: close #10544
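
A range check of this kind can be sketched as follows. This is only an illustration under stated assumptions: the function name `rewrite_eq`, the `INT_RANGES` table, and the tagged-tuple return value are hypothetical, and the sketch covers only equality predicates on integer columns.

```python
from decimal import Decimal

# Value ranges of signed integer column types (two's complement).
INT_RANGES = {
    "TINYINT": (-2**7, 2**7 - 1),
    "SMALLINT": (-2**15, 2**15 - 1),
    "INT": (-2**31, 2**31 - 1),
    "BIGINT": (-2**63, 2**63 - 1),
}


def rewrite_eq(column_type: str, literal: Decimal):
    """Rewrite `col = literal` for an integer column without overflowing.

    Returns ("const", False) when no column value can match,
    otherwise ("int_literal", n) with the literal as an exact integer.
    """
    low, high = INT_RANGES[column_type]
    if literal != literal.to_integral_value():
        # A fractional literal can never equal an integer column value.
        return ("const", False)
    if literal < low or literal > high:
        # Out of the column's range: fold to a constant instead of casting.
        return ("const", False)
    return ("int_literal", int(literal))
```

With this check, `rewrite_eq("INT", Decimal("12345678901.0"))` folds to a constant false predicate rather than casting the literal into a 32-bit integer.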
2022-07-06 15:39:27 +08:00
0b80457c1f [feature](nereids) support like and regexp predicate (#10411)
support like and regexp predicate for nereids.
for example:
select * from t1 where k1 like 'xxx' and k2 regexp '^sa'
2022-07-06 14:32:06 +08:00
006283c036 [Fix] select nested type of string within type array should be wrapped with '' in vectorized path (#10498) 2022-07-06 10:47:36 +08:00
0b9f508379 [fix](nereids) fix ut,check bound should be called recursively on the plan node (#10530)
fix ut,check bound should be called recursively on the plan node
2022-07-06 10:37:05 +08:00
c936abd2a3 [fix](fe) when bdbje is adding a follower, master write op may fail. (#10376) 2022-07-06 10:29:16 +08:00
5f5e01b285 [feature-wip](multi-catalog) Fix hive partition prune in hive and hudi external table. (#10547)
`ExprBuilder` uses a stack to build the expr.
The input order is col, value and the output order is value, col, but the `>=` operator is not reversed.
Example:
`col >= 1` => `1 >= col`

In this case, it is better to use a queue to keep the input order.

`CompoundPredicate(OR)` also has a problem: it should be `alwaysTrue` whenever it is not on a partition key or its op is not supported.
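
The ordering problem can be reproduced in a few lines. This is a simplified sketch, not the actual `ExprBuilder`: the function names are hypothetical and operands are plain strings.

```python
from collections import deque


def build_with_stack(column: str, op: str, value: str) -> str:
    """Push operands in input order, then pop them: the order is reversed."""
    stack = [column, value]
    left = stack.pop()   # pops the value first
    right = stack.pop()  # then the column
    return f"{left} {op} {right}"


def build_with_queue(column: str, op: str, value: str) -> str:
    """Enqueue operands in input order, then dequeue them: the order is kept."""
    queue = deque([column, value])
    left = queue.popleft()   # column comes out first
    right = queue.popleft()  # then the value
    return f"{left} {op} {right}"
```

`build_with_stack("col", ">=", "1")` produces `1 >= col`, which changes the meaning because `>=` is not flipped to `<=`, while `build_with_queue` keeps `col >= 1`.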
2022-07-06 10:22:16 +08:00
43015f11a5 [Improvement] remove beHttpAddress in regression test (#10623) 2022-07-06 08:59:29 +08:00
8e364fb848 [fix](load) skip empty orc file (#10593)
Sometimes the upstream system (e.g., Hive) may create an empty ORC file
which only has a header and footer, without a schema.
If we call `_reader->createRowReader()` with selected columns,
it will throw ParserError: Invalid column selected xx.
So here we first check the number of rows and skip this kind of file.

This is only a fix for non-vec load. Vec load uses the arrow scanner
to read ORC files, which does not have this problem.
2022-07-05 22:18:56 +08:00
1f57fcc4e9 remove duplicate codes from function_test_util.cpp (#10607)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-07-05 20:43:56 +08:00
89e56ea67f [refactor] remove alpha rowset related code and vectorized row batch related code (#10584) 2022-07-05 20:33:34 +08:00
3e87960202 [bugfix] fix bug of vhash join build (#10614)
* [bugfix] fix bug of vhash join build

* format code
2022-07-05 19:14:42 +08:00
589ab06b5c [enhancement](nereids) make filter node and join node work in Nereids (#10605)
enhancement
- add functions `finalizeForNereids` and `finalizeImplForNereids` in stale expression to generate some attributes used in BE.
- remove unnecessary parameter `Analyzer` in function `getBuiltinFunction`
- swap join condition if its left-hand expression is related to the right table
- change join physical implementation to broadcast hash join
- add push predicate rule into planner

fix
- swap join children visit order to ensure the last fragment is root
- avoid visiting the join's left child twice

known issues
- expression computation generates a wrong answer when the expression includes arithmetic with two literal children.
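
The join condition swap listed under "enhancement" can be sketched like this. It is a minimal sketch assuming each side of an equality condition is a single qualified column name; the function name `normalize_join_condition` and the set-based inputs are hypothetical.

```python
from typing import Set, Tuple


def normalize_join_condition(
    left_expr: str,
    right_expr: str,
    left_columns: Set[str],
    right_columns: Set[str],
) -> Tuple[str, str]:
    """Return the condition sides ordered as (left table side, right table side)."""
    if left_expr in right_columns and right_expr in left_columns:
        # The left-hand expression refers to the right table: swap the sides.
        return right_expr, left_expr
    return left_expr, right_expr
```

For a join of `a` and `b`, the condition `b.k1 = a.k1` would be normalized to `a.k1 = b.k1`, and a condition already in that order is returned unchanged.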
2022-07-05 18:23:00 +08:00
3b0ddd7ae0 [Enhancement](Nereids)(Step1) prune column for filter/agg/join/sort (#10478)
Column pruning for filter/agg/join/sort.

#### For agg
Pattern : agg()
Transformed:
```
agg
  |
project
  |
child
```
#### For filter()/sort():
Pattern: project(filter()/join()/sort())
Transformed:
```
  project
     |
filter/sort
     |
  project
     |
   child
```
#### For join
Pattern: project(join())
Transformed:
```
     project
        |
      join
     /    \
project    project
   |          |
 child      child
```

for example:
```sql
-- table a: k1,v1
-- table b: k1,k2,k3,v1
select a.k1,b.k2 from a join b on a.k1 = b.k1 where a.k1 > 1
```

original plan tree:
```
           project(a.k1,b.k2)
                   |
       join(a:k1,v1 b:k1,k2,k3,v1)
          /                 \
scan(a:k1,v1)       scan(b:k1,k2,k3,v1)
```

transformed plan tree:

```
         project(a.k1,b.k2)
                 |
         join(a:k1 b:k1,k2)
          /             \
   project(k1)     project(k1,k2)
        |                |
scan(a:k1,v1)   scan(b:k1,k2,k3,v1)
```
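
The pruning for the join example above can be sketched as follows. This is an illustration only, not the Nereids rule: the function name `prune_join_children` and the dictionary-based schemas are hypothetical, and columns are identified by qualified names such as `a.k1`.

```python
from typing import Dict, Iterable, List


def prune_join_children(
    project_cols: Iterable[str],
    join_cols: Iterable[str],
    filter_cols: Iterable[str],
    child_schemas: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Compute, per child table, the columns a project above the scan must keep."""
    # A column is required if the output, the join condition, or a filter uses it.
    required = set(project_cols) | set(join_cols) | set(filter_cols)
    pruned = {}
    for table, columns in child_schemas.items():
        pruned[table] = [c for c in columns if f"{table}.{c}" in required]
    return pruned


pruned = prune_join_children(
    project_cols=["a.k1", "b.k2"],
    join_cols=["a.k1", "b.k1"],
    filter_cols=["a.k1"],
    child_schemas={"a": ["k1", "v1"], "b": ["k1", "k2", "k3", "v1"]},
)
```

Here `pruned` keeps `k1` for table `a` and `k1`, `k2` for table `b`, matching the `project(k1)` and `project(k1,k2)` nodes in the transformed plan tree.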
2022-07-05 17:54:21 +08:00
86502b014d [feature-wip](unique-key-merge-on-write)port IntervalTree from kudu (#10511)
See DSIP-018: https://cwiki.apache.org/confluence/display/DORIS/DSIP-018%3A+Support+Merge-On-Write+implementation+for+UNIQUE+KEY+data+model
This patch is for step 3.1 in scheduling.
2022-07-05 17:43:01 +08:00
575bf18d55 [enhancement] speed up week_of_year by pre_calc table (#10586) 2022-07-05 15:37:02 +08:00
f40ae7c654 [feature-wip](multi-catalog) support "show proc 'catalogs/'" (#10596) 2022-07-05 13:40:24 +08:00
680118c6b9 [Feature] [nereids] Agg rewrite rule of nereids optimizer (#10412)
Add a rule to disassemble the logical aggregate node. This is necessary because our execution framework is distributed and an aggregate always executes in two steps: first aggregate locally, then merge the results.

Add fields to the logical aggregate to record whether a logical aggregate operator has been disassembled and which aggregate phase it belongs to, and add logic to map the new aggregate function to its stale definition to get the function's intermediate type.
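
The two-step execution can be illustrated with an average, whose intermediate state (sum and count) differs from its final result. This is a sketch under stated assumptions, not Doris code: the function names and the `(key, value)` row format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def local_aggregate(rows: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Phase 1: each node builds partial [sum, count] states per group key."""
    partial = defaultdict(lambda: [0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return dict(partial)


def merge_aggregate(partials: Iterable[Dict[str, List[float]]]) -> Dict[str, float]:
    """Phase 2: merge the partial states and finalize avg = sum / count."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}


node1 = local_aggregate([("a", 1), ("b", 4)])
node2 = local_aggregate([("a", 3)])
result = merge_aggregate([node1, node2])
```

Merging the `[sum, count]` states rather than per-node averages is what makes the final result correct when groups are spread unevenly across nodes.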
2022-07-05 11:57:42 +08:00
585d42330c [BUG] fix bug in bloom filter for datev2 (#10579) 2022-07-05 11:10:03 +08:00
a2f74bf260 [Improvement] remove profile with poor readability (#10581) 2022-07-05 11:09:23 +08:00
b7441ed291 [chore] remove default REPOSITORY_URL link (#10599) 2022-07-05 11:07:18 +08:00
302e078e6a [dev env]: add idea provided doc. (#10597) 2022-07-05 11:06:53 +08:00
e444ac7a87 [format](*): using guava package header (#10325) 2022-07-05 11:05:39 +08:00
3c140ae05b [fix] [docs] Fixed Use examples in sequence-column-manual.md file. (#10588)
* [fix] [docs] Fixed Use examples in sequence-column-manual.md file.

Co-authored-by: 杨帅统 <yangshuaitong@gaolvgo.com>
Co-authored-by: spaces-x <weixiao5220@gmail.com>
2022-07-05 10:27:13 +08:00
cc2de23455 [docs] add quick compaction configs (#10559) 2022-07-05 10:03:37 +08:00
73ba806046 [feature-wip](multi-catalog) Add catalog to information_schema table "columns". (#10592) 2022-07-05 09:57:19 +08:00
9c990b073f [regression] modify compaction cases, not depend on beHttpAddress (#10553) 2022-07-04 22:36:12 +08:00