Files
doris/regression-test
924060929 ff990eb869 [enhancement](Nereids) refactor expression rewriter to pattern match (#32617)
this pr can improve the performance of the nereids planner, in plan stage.

1. refactor expression rewriter to pattern match, so the lots of expression rewrite rules can criss-crossed apply in a big bottom-up iteration, and rewrite until the expression became stable. now we can process more cases because original there has no loop, and sometimes only process the top expression, like `SimplifyArithmeticRule`.
2. replace `Collection.stream()` to `ImmutableXxx.Builder` to avoid useless method call
3. loop unrolling some codes, like `Expression.<init>`, `PlanTreeRewriteBottomUpJob.pushChildrenJobs`
4. use type/arity specified-code, like `OneRangePartitionEvaluator.toNereidsLiterals()`, `PartitionRangeExpander.tryExpandRange()`, `PartitionRangeExpander.enumerableCount()`
5. refactor `ExtractCommonFactorRule`, now we can extract more cases, and I fix the deed loop when use `ExtractCommonFactorRule` and `SimplifyRange` in one iterative, because `SimplifyRange` generate right deep tree, but `ExtractCommonFactorRule` generate left deep tree
6. refactor `FoldConstantRuleOnFE`, support visitor/pattern match mode, in ExpressionNormalization, pattern match can criss-crossed apply with other rules; in PartitionPruner, visitor can evaluate expression faster
7. lazy compute and cache some operation
8. use int field to compare date
9. use BitSet to find disableNereidsRules
10. two level loop usually faster then build Multimap when bind slot in Scope, so I revert the code
11. `PlanTreeRewriteBottomUpJob` don't need to clearStatePhase any more

### test case
100 threads parallel continuous send this sql which query an empty table, test in my mac machine(m2 chip, 8 core), enable sql cache
```sql
select  count(1),date_format(time_col,'%Y%m%d'),varchar_col1
from tbl
where  partition_date>'2024-02-15'  and (varchar_col2 ='73130' or varchar_col3='73130') and time_col>'2024-03-04'
  and  time_col<'2024-03-05'
group by date_format(time_col,'%Y%m%d'),varchar_col1
order by date_format(time_col,'%Y%m%d') desc, varchar_col1 desc,count(1) asc
limit 1000
```

before this pr: 3100 peak QPS, about 2700 avg QPS
after this pr: 4800 peak QPS, about 4400 avg QPS

(cherry picked from commit 7338683fdbdf77711f2ce61e580c19f4ea100723)
2024-04-10 14:59:45 +08:00
..

新加case注意事项

  1. 变量名前要写 def,否则是全局变量,并行跑的 case 的时候可能被其他 case 影响。

    Problematic code:

    ret = ***
    

    Correct code:

    def ret = ***
    
  2. 尽量不要在 case 中 global 的设置 session variable,或者修改集群配置,可能会影响其他 case。

    Problematic code:

    sql """set global enable_pipeline_x_engine=true;"""
    

    Correct code:

    sql """set enable_pipeline_x_engine=true;"""
    
  3. 如果必须要设置 global,或者要改集群配置,可以指定 case 以 nonConcurrent 的方式运行。

    示例

  4. case 中涉及时间相关的,最好固定时间,不要用类似 now() 函数这种动态值,避免过一段时间后 case 就跑不过了。

    Problematic code:

    sql """select count(*) from table where created < now();"""
    

    Correct code:

    sql """select count(*) from table where created < '2023-11-13';"""
    
  5. case 中 streamload 后请加上 sync 一下,避免在多 FE 环境中执行不稳定。

    Problematic code:

    streamLoad { ... }
    sql """select count(*) from table """
    

    Correct code:

    streamLoad { ... }
    sql """sync"""
    sql """select count(*) from table """