Commit Graph

4687 Commits

Author SHA1 Message Date
f5bef328fe [fix] disable transfer data large than 2GB by brpc (#9770)
Because brpc and protobuf cannot transfer data larger than 2GB (anything larger overflows), add a check before sending.
2022-05-25 18:41:13 +08:00
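A minimal sketch of the kind of pre-send check commit f5bef328fe describes. The actual fix lives in the C++ backend; Java is used here only for illustration, and `PayloadSizeGuard`, `canSend`, and `checkBeforeSend` are hypothetical names. The sketch assumes the limit is `Integer.MAX_VALUE` bytes, that is, that sizes are carried in a signed 32-bit integer.

```java
// Hypothetical sketch: these names are illustrative, not Doris, brpc, or protobuf APIs.
final class PayloadSizeGuard {
    // Assumption: sizes are carried as signed 32-bit ints, so the ceiling is 2^31 - 1 bytes.
    static final long MAX_PAYLOAD_BYTES = Integer.MAX_VALUE;

    // Returns true when the payload fits under the assumed limit.
    static boolean canSend(long payloadBytes) {
        return payloadBytes >= 0 && payloadBytes <= MAX_PAYLOAD_BYTES;
    }

    // Rejects oversized payloads before they reach the transport layer.
    static void checkBeforeSend(long payloadBytes) {
        if (!canSend(payloadBytes)) {
            throw new IllegalArgumentException(
                    "payload of " + payloadBytes + " bytes exceeds the 2GB transfer limit");
        }
    }

    // Shows why the check is needed: narrowing a size of 2^31 to int wraps to a negative value.
    static int narrowedSize(long payloadBytes) {
        return (int) payloadBytes;
    }
}
```

Checking the size as a `long` before narrowing means the oversized case is reported as an error instead of silently wrapping to a negative length.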
be026addde [security] update canal version to fix fastjson security issue (#9763) 2022-05-25 18:22:37 +08:00
2ad691edf7 [doc] Add manual for Array data type and functions (#9700)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-05-25 16:44:20 +08:00
2725127421 [fix] group by with two NULL rows after left join (#9688)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-05-25 16:43:55 +08:00
ca05d1ee01 [fix](memory tracker) Fix lru cache, compaction tracker, add USE_MEM_TRACKER compile (#9661)
1. Fix Lru Cache MemTracker consumption value is negative.
2. Fix compaction Cache MemTracker has no track.
3. Add USE_MEM_TRACKER compile option.
4. Make sure the malloc/free hook is not stopped at any time.
2022-05-25 08:56:17 +08:00
cc9321a09b [Enhancement](Nereids)refactor plan node into plan + operator (#9755)
Close #9623 

Summary:
This PR refactors the plan node into plan + operator.

In the previous version of Nereids, a plan node consisted of children and relational algebra, e.g.
```java
class LogicalJoin extends LogicalBinary {
  private Plan left, right;
}
```
The structure above is easy to understand, but it makes `Memo.copyIn` difficult to optimize: a rule generates a complete sub-plan,
and Memo must compare the complete sub-plan to distinguish GroupExpressions, which hurts performance.

First, we need to change the rule to generate a partial sub-plan and replace some child plans with a placeholder, e.g. LeafOp in the Columbia optimizer. Then we mark some children in the sub-plan as unchanged and bind the related group, so that we don't have to compare and copy a sub-plan if the related group exists.

Second, we need to separate the original `Plan` into `Plan` and `Operator`: Plan contains the children and an Operator, and Operator denotes only the relational algebra (no children/input field). This design keeps the operator and the children from affecting each other. The plan-group binder can therefore generate a placeholder plan (containing the related group) for the sub-query, and does not have to generate the current plan node case by case because the plan is immutable (which means generating a new plan with replaced children). Rule implementers can then reuse the placeholder to generate a partial sub-plan.

Operator and Plan have similar inheritance structures, as shown below. XxxPlan contains XxxOperator, e.g. LogicalBinary contains a LogicalBinaryOperator.
```
          TreeNode
             │
             │
     ┌───────┴────────┐                                                   Operator
     │                │                                                       │
     │                │                                                       │
     │                │                                                       │
     ▼                ▼                                                       ▼
Expression          Plan                                                PlanOperator
                      │                                                       │
                      │                                                       │
          ┌───────────┴─────────┐                                             │
          │                     │                                 ┌───────────┴──────────────────┐
          │                     │                                 │                              │
          │                     │                                 │                              │
          ▼                     ▼                                 ▼                              ▼
     LogicalPlan          PhysicalPlan                   LogicalPlanOperator           PhysicalPlanOperator
          │                     │                                 │                              │
          │                     │                                 │                              │
          │                     │                                 │                              │
          │                     │                                 │                              │
          │                     │                                 │                              │
          │                     │                                 │                              │
          ├───►LogicalLeaf      ├──►PhysicalLeaf                  ├──► LogicalLeafOperator       ├───►PhysicalLeafOperator
          │                     │                                 │                              │
          │                     │                                 │                              │
          │                     │                                 │                              │
          ├───►LogicalUnary     ├──►PhysicalUnary                 ├──► LogicalUnaryOperator      ├───►PhysicalUnaryOperator
          │                     │                                 │                              │
          │                     │                                 │                              │
          │                     │                                 │                              │
          └───►LogicalBinary    └──►PhysicalBinary                └──► LogicalBinaryOperator     └───►PhysicalBinaryOperator
```

Concrete operators extend the corresponding XxxNaryOperator, e.g.
```java
class LogicalJoin extends LogicalBinaryOperator {}
class PhysicalProject extends PhysicalUnaryOperator {}
class LogicalRelation extends LogicalLeafOperator {}
```

So the first example changes to this:
```java
class LogicalBinary extends AbstractLogicalPlan implements BinaryPlan {
  private Plan left, right;
  private LogicalBinaryOperator operator;
}

class LogicalJoin extends LogicalBinaryOperator {}
```

With these changes, a Rule must build both the plan and the operator as needed, not only the plan as before.
For example, the JoinCommutative rule:
```java
public Rule<Plan> build() {
  // the plan override function automatically builds the plan according to the operator's type,
  // so this returns a LogicalBinary(LogicalJoin, Plan, Plan)
  return innerLogicalJoin().then(join -> plan(
    // operator
    new LogicalJoin(join.op.getJoinType().swap(), join.op.getOnClause()),
    // children
    join.right(),
    join.left()
  )).toRule(RuleType.LOGICAL_JOIN_COMMUTATIVE);
}
```
2022-05-24 20:53:24 +08:00
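To make the plan + operator split from commit cc9321a09b concrete, here is a self-contained sketch. Every class in it (`SketchOperator`, `SketchJoin`, `SketchScan`, `SketchPlan`, `JoinCommuteSketch`) is a hypothetical, simplified stand-in and not the real Nereids code; the outer-join handling in `swap()` is also an assumption added for illustration, since the rule in the commit only matches inner joins.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-ins; these are NOT the real Nereids classes.
interface SketchOperator {}

enum SketchJoinType {
    INNER, LEFT_OUTER, RIGHT_OUTER;

    // Swapping sides turns a left outer join into a right outer join and vice versa.
    SketchJoinType swap() {
        switch (this) {
            case LEFT_OUTER: return RIGHT_OUTER;
            case RIGHT_OUTER: return LEFT_OUTER;
            default: return this;
        }
    }
}

// Operators carry only relational algebra: there is no children field.
final class SketchJoin implements SketchOperator {
    final SketchJoinType type;
    final String onClause;

    SketchJoin(SketchJoinType type, String onClause) {
        this.type = type;
        this.onClause = onClause;
    }
}

final class SketchScan implements SketchOperator {
    final String table;

    SketchScan(String table) {
        this.table = table;
    }
}

// An immutable plan node: one operator plus its children.
final class SketchPlan {
    final SketchOperator op;
    final List<SketchPlan> children;

    SketchPlan(SketchOperator op, SketchPlan... children) {
        this.op = op;
        this.children = Collections.unmodifiableList(Arrays.asList(children));
    }
}

final class JoinCommuteSketch {
    // Builds a new operator and a new plan node, but reuses the child plans untouched.
    static SketchPlan commute(SketchPlan join) {
        SketchJoin op = (SketchJoin) join.op;
        return new SketchPlan(
                new SketchJoin(op.type.swap(), op.onClause),
                join.children.get(1),
                join.children.get(0));
    }
}
```

Because the operator holds no children, the rewritten plan shares the original child objects instead of copying them, which is the property the commit message relies on when it talks about reusing placeholders.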
90e8cda5f2 [Enhancement](Vectorized)build hash table with new thread, as non-vec… (#9290)
* [Enhancement][Vectorized]build hash table with new thread, as the non-vectorized path does

edit after comments

* format code with clang format

Co-authored-by: lidongyang <dongyang.li@rateup.com.cn>
Co-authored-by: stephen <hello-stephen@qq.com>
2022-05-24 10:23:15 +08:00
6353539ef7 [bugfix]teach BufferedBlockMgr2 track memory right (#9722)
The problem was introduced by e2d3d0134eee5d50b6619fd9194a2e5f9cb557dc.
2022-05-24 10:18:51 +08:00
8b7bb2d07c [bugfix]fix column reader compress codec unsafe problem (#9741)
by moving codec from shared reader to unshared iterator
2022-05-23 20:25:49 +08:00
5039ec4570 [vec][opt] opt hash join build resize hash table before insert data (#9735)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-05-23 15:13:57 +08:00
fdd5bc07a9 [doc]Add SQL Select usage help documentation (#9729)
Add SQL Select usage help documentation
2022-05-23 13:33:07 +08:00
500c36717d [Bug-Fix][Vectorized] Full join return error result (#9690)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-05-23 13:29:37 +08:00
77297bb7ee Fix some typos in fe/. (#9682) 2022-05-23 12:11:01 +08:00
5b13fa2b15 [typo] Fix typos in comments (#9710) 2022-05-23 12:01:37 +08:00
ddda91c89d [doc] Update dev image (#9721) 2022-05-23 11:59:15 +08:00
d97e2b1eb2 [doc] update docs for FE UT (#9718) 2022-05-22 21:36:45 +08:00
d8f1b77cc1 [improvement](planner) Backfill the original predicate pushdown code (#9703)
Due to the current architecture, predicate derivation at rewrite time cannot cover all cases,
because the rewrite processes the ON clause first and then the WHERE clause, and when there are subqueries, not all cases can be derived.
So the predicate pushdown method is kept here.

e.g.
select * from t1 left join t2 on t1 = t2 where t1 = 1;

InferFiltersRule can't infer t2 = 1, because this is outside its specification.

The expression (t2 = 1) can actually be deduced and pushed down to the scan node.
2022-05-22 21:35:32 +08:00
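The deduction in the example from commit d8f1b77cc1 can be sketched as follows. This is a toy illustration under stated assumptions, not the planner's code: `EqualityInferenceSketch` and `infer` are hypothetical names, predicates are modelled as plain string pairs, and only the one direction used in the example is handled (a constant filter on the left side of an ON equality is propagated to the right side).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the Doris planner or InferFiltersRule.
final class EqualityInferenceSketch {
    // onEqualities: {left, right} pairs taken from the ON clause, e.g. {"t1", "t2"}.
    // whereConstants: {column, literal} pairs taken from the WHERE clause, e.g. {"t1", "1"}.
    // Returns the predicates derived for the right-hand side of each matching equality.
    static List<String> infer(List<String[]> onEqualities, List<String[]> whereConstants) {
        List<String> derived = new ArrayList<>();
        for (String[] equality : onEqualities) {
            for (String[] constant : whereConstants) {
                // left = right AND left = literal  implies  right = literal
                if (equality[0].equals(constant[0])) {
                    derived.add(equality[1] + " = " + constant[1]);
                }
            }
        }
        return derived;
    }
}
```

For the query in the commit message, the ON equality `t1 = t2` together with the WHERE filter `t1 = 1` yields the derived predicate `t2 = 1`, which is the one the message says can be pushed down to the scan node.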
3768fdd3f8 [doc] Add trim_tailing_spaces_for_external_table_query variable to the docs. (#9701) 2022-05-22 21:32:23 +08:00
d270f4f2d4 [config](checksum) Disable consistency checker by default (#9699)
Disabled by default because the current checksum logic has some bugs.
It also adds some overhead.
2022-05-22 21:31:43 +08:00
ad4da4aa8f [doc] Fix typos in documentation (#9692) 2022-05-22 21:30:22 +08:00
c13a6a1d8a [fix] NullPredicate should implement evaluate_vec (#9689)
select column from table where column is null
2022-05-22 21:29:53 +08:00
75b3707a28 [refactor](load) add tablet errors when close_wait return error (#9619) 2022-05-22 21:27:42 +08:00
3391de482b [Refactor] simplify some code in routine load (#9532) 2022-05-22 21:25:39 +08:00
b3a2a92bf5 [deps] libhdfs3 build enable kerberos support (#9524)
Currently, the libhdfs3 library integrated into Doris BE does not support accessing clusters with Kerberos authentication
enabled, and we found that the Kerberos-related dependencies (gsasl and krb5) were not added when building libhdfs3.

This PR therefore enables Kerberos support and rebuilds libhdfs3 with the gsasl and krb5 dependencies:

- gsasl version: 1.8.0
- krb5 version: 1.19
2022-05-22 20:58:19 +08:00
97fad7a2ff [doc]Add insert best practices (#9723)
Add insert best practices
2022-05-22 16:24:20 +08:00
31e40191a8 [Refactor] add vpre_filter_expr for vectorized to improve performance (#9508) 2022-05-22 11:45:57 +08:00
0c4b47756a [enhancement](community): enhance java style (#9693)
Enhance java style.

Now: the checkstyle rule for code order follows this page: Class and Interface Declarations

This PR lets IDEA rearrange code automatically.
2022-05-20 15:24:30 +08:00
61a60d1dcc [code style] minor update for code style (#9695) 2022-05-20 11:47:49 +08:00
8fa677b59c [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner (#9666)
* [Refactor][Bug-Fix][Load Vec] Refactor code of basescanner and vjson/vparquet/vbroker scanner
1. fix vjson scanner not supporting `range_from_file_path`
2. fix vjson/vbroker scanner core dump when src/dest slot nullability differs
3. fix vparquet filter_block when the column reference count is not 1
4. refactor to simplify the code

This only changes vectorized load, not the original row-based load.

Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-05-20 11:43:03 +08:00
6f61af7682 [Vectorized][java-udf] add datetime&&largeint&&decimal type to java-udf (#9440) 2022-05-20 10:26:09 +08:00
5fa6e892be [fix](broker-scan-node) Remove trailing spaces in broker_scanner. Make it consistent with hive and trino behavior. (#9190)
Hive and Trino/Presto automatically trim trailing spaces, but Doris doesn't.
This causes query results to differ from Hive.

Add a new session variable "trim_tailing_spaces_for_external_table_query".
If set to true, the trailing spaces of each column are trimmed when reading CSV from the broker scan node.
2022-05-20 09:55:13 +08:00
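The behaviour described in commit 5fa6e892be can be illustrated with a short sketch. The real change is in the C++ broker scanner; the Java below is illustration only, `TrailingSpaceTrimSketch` and its methods are hypothetical names, and the boolean parameter merely stands in for the session variable. It assumes only the space character is trimmed and does no CSV quoting.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not the Doris broker scanner.
final class TrailingSpaceTrimSketch {
    // Removes spaces at the end of the value; leading spaces are kept.
    static String trimTrailingSpaces(String value) {
        int end = value.length();
        while (end > 0 && value.charAt(end - 1) == ' ') {
            end--;
        }
        return value.substring(0, end);
    }

    // Splits one delimited line into columns, trimming trailing spaces only when asked to.
    static String[] splitLine(String line, String separator, boolean trimTrailing) {
        // A limit of -1 keeps trailing empty columns instead of dropping them.
        String[] columns = line.split(Pattern.quote(separator), -1);
        if (trimTrailing) {
            for (int i = 0; i < columns.length; i++) {
                columns[i] = trimTrailingSpaces(columns[i]);
            }
        }
        return columns;
    }
}
```

With trimming enabled, a column read as `"a  "` becomes `"a"`, while a column such as `"  c"` keeps its leading spaces.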
defdae1e7d [improvement](stream-load) adjust read unit of http to optimize stream load (#9154) 2022-05-20 09:52:36 +08:00
1e940f28b0 [docs] Fix error command of meta tool docs (#9590)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-05-20 09:36:26 +08:00
c2d41c84bf [feature](nereids): add join rules base code (#9598) 2022-05-20 08:18:08 +08:00
2c79d223e4 [refactor][rowset]move rowset writer to a single place (#9368) 2022-05-19 23:57:02 +08:00
c048b1f0f9 [fix](sparkload): fix min_value will be negative number when maxGlobalDictValue exceeds integer range (#9436) 2022-05-19 23:56:24 +08:00
ef65f484df [Enhancement] improve parquet reader via arrow's prefetch and multi thread (#9472)
* add ArrowReaderProperties to parquet::arrow::FileReader

* support prefetch batch
2022-05-19 23:52:01 +08:00
1355bc162b [Enhance] Add host info to heartbeat error msg (#9499) 2022-05-19 23:45:53 +08:00
6951c42d5c [Bug][Vectorized] fix schema change add varchar type column default value get wrong result (#9523) 2022-05-19 23:38:57 +08:00
c09858671d [improvement][performance] improve lru cache resize performance and memory usage (#9521) 2022-05-19 23:37:59 +08:00
939daa07f1 [fix] fix Code Quality Analysis failed (#9685) 2022-05-19 23:13:47 +08:00
0f9ef26576 [Bug] Fix timestamp_diff issue when timeunit is year and month (#9574) 2022-05-19 21:24:43 +08:00
73c4ec7167 Fix some typos in be/. (#9681) 2022-05-19 20:55:39 +08:00
87e3904cc6 Fix some typos for docs. (#9680) 2022-05-19 20:55:21 +08:00
cbc7b167b1 [Feature] cancel load support state (#9537) 2022-05-19 16:37:56 +08:00
119ff2c02d [enhancement] Improve debugging experience. (#9677) 2022-05-19 16:36:37 +08:00
235d586f11 [style](fe) code correct rules and name rules (#9670)
* [style](fe) code correct rules and name rules

* revert some changes according to comments
2022-05-19 16:36:03 +08:00
7c2db79b73 [BUG] fix bug for vectorized compaction and some storage vectorization bug (#9610) 2022-05-19 16:35:15 +08:00
cbf1e20fbc [doc]update streamload 2pc doc (#9651)
Co-authored-by: wudi <>
2022-05-19 14:30:17 +08:00
7a9bf5b23e [FeConfig](Project) Project optimization is enabled by default (#9667) 2022-05-19 14:03:14 +08:00