doris/regression-test

Notes for adding new cases

  1. Declare variables with def. Without it, a variable is global and may be changed by other cases when cases run in parallel.

    Problematic code:

    ret = ***
    

    Correct code:

    def ret = ***
    
  2. Avoid setting session variables with global, or changing cluster configuration, inside a case; either may affect other cases.

    Problematic code:

    sql """set global enable_pipeline_x_engine=true;"""
    

    Correct code:

    sql """set enable_pipeline_x_engine=true;"""
    
  3. If you must set a global variable or change cluster configuration, mark the case to run in nonConcurrent mode.

    Example

  4. For time-dependent logic in a case, use a fixed time rather than a dynamic value such as now(), so the case does not start failing as time passes.

    Problematic code:

    sql """select count(*) from table where created < now();"""
    

    Correct code:

    sql """select count(*) from table where created < '2023-11-13';"""
    
  5. After a streamLoad in a case, run sync before querying; otherwise the case can be unstable in a multi-FE environment.

    Problematic code:

    streamLoad { ... }
    sql """select count(*) from table """
    

    Correct code:

    streamLoad { ... }
    sql """sync"""
    sql """select count(*) from table """
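
For guideline 3, a nonConcurrent case might look like the following sketch. The suite name is hypothetical, and passing "nonConcurrent" as the second argument to suite is assumed here to be how the group is specified. The restore value false is also an assumption about the cluster's previous setting. The block only runs inside the regression framework, not as a standalone script.

```groovy
// Hypothetical suite name; "nonConcurrent" is assumed to be given as the group argument.
suite("test_global_variable_example", "nonConcurrent") {
    // A global setting is tolerable here only because this case is not
    // scheduled in parallel with other cases.
    sql """set global enable_pipeline_x_engine=true;"""
    try {
        // def keeps ret local to this case (guideline 1).
        def ret = sql """select 1;"""
        assert ret.size() == 1
    } finally {
        // Restore the setting so later cases see the previous value
        // (false is an assumption; use whatever the cluster had before).
        sql """set global enable_pipeline_x_engine=false;"""
    }
}
```

Restoring the variable in a finally block keeps the change from leaking into later cases even when an assertion fails.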