doris/fe
EmmyMiao87 cbc42db010 [JoinReorder] Implement a better join reorder algorithm. (#6226)
The current JoinReorder algorithm mainly sorts according to the star model
and only considers the join relationships between tables.
This causes the following problems:
1. It only applies to user data that follows a star model; data of other models cannot be reordered.
2. It ignores the cost of each table, so it cannot estimate the size of the joined relations,
   and its real query optimization ability is weak.
3. It cannot use reordering to avoid potentially time-consuming joins such as cross joins.

The new JoinReorder algorithm introduces a new sorting algorithm for joins.
This sorting algorithm brings a cost evaluation model to Doris.

The sorting algorithm is mainly based on the following three principles:
1. The order is: largest node, smallest node, ..., second largest node
2. Inner join is better than cross join
3. The right children of Outer join, semi join, and anti join do not move
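
The first principle can be sketched in Java as follows. This is only an illustration under stated assumptions, not the code from this commit: the `Node` class and the `order` method are hypothetical names, "size" is assumed to mean estimated cardinality, and join types (principles 2 and 3) are ignored entirely.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names come from Doris.
final class JoinOrderSketch {
    // Stand-in for a planned node with an estimated row count.
    static final class Node {
        final String name;
        final long cardinality;

        Node(String name, long cardinality) {
            this.name = name;
            this.cardinality = cardinality;
        }
    }

    // Returns the nodes as: largest, smallest, ..., second largest.
    static List<Node> order(List<Node> nodes) {
        List<Node> ascending = new ArrayList<>(nodes);
        ascending.sort(Comparator.comparingLong((Node n) -> n.cardinality));
        List<Node> result = new ArrayList<>();
        if (ascending.isEmpty()) {
            return result;
        }
        // Move the largest node to the front, keep the rest in ascending order.
        result.add(ascending.remove(ascending.size() - 1));
        result.addAll(ascending);
        return result;
    }

    public static void main(String[] args) {
        List<Node> input = List.of(
                new Node("orders", 1_000_000),
                new Node("region", 5),
                new Node("customer", 50_000));
        for (Node n : order(input)) {
            System.out.println(n.name + " " + n.cardinality);
        }
    }
}
```

With the sample input in `main`, the largest table comes first and the remaining tables follow from smallest to largest.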

PlanNode's cost model evaluation mainly relies on two values: cardinality and selectivity.
cardinality: can be simply understood as the number of rows.
selectivity: a value between 0 and 1. Predicates generally have a selectivity.
The cost model generally calculates the final cardinality of a PlanNode from the pre-calculated
cardinality of the PlanNode and the selectivity of the predicates that belong to it.
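
As a rough sketch of how cardinality and selectivity can combine, the Java snippet below multiplies an input cardinality by the selectivity of each predicate. The class and method names are hypothetical, and treating predicates as independent (so their selectivities multiply) is an assumption made for illustration; the commit does not state the exact formula Doris uses.

```java
// Hypothetical illustration only; not the Doris cost model.
final class CardinalitySketch {
    // Estimates the output row count of a node from its input row count
    // and the selectivities of its predicates, assuming independence.
    static long estimate(long inputCardinality, double... selectivities) {
        double combined = 1.0;
        for (double s : selectivities) {
            if (s < 0.0 || s > 1.0) {
                throw new IllegalArgumentException("selectivity must be in [0, 1]: " + s);
            }
            combined *= s;
        }
        return Math.round(inputCardinality * combined);
    }

    public static void main(String[] args) {
        // 1000 input rows, one predicate that keeps half of them.
        System.out.println(estimate(1000, 0.5)); // prints 500
    }
}
```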

Currently, you can set "enable_cost_based_join_reorder" to turn the new JoinReorder on or off.
When it is turned on, the new sorting algorithm takes effect; when it is turned off,
the old sorting algorithm takes effect. It is turned off by default.

The new sorting algorithm currently has no cost-based evaluation for external tables (odbc, es)
and set operations (intersect, except). For queries that use them, it is not recommended to enable cost-based join reorder.

At the code architecture level:
1. The new sorting algorithm runs in the single-node execution planning stage.
2. The init and finalize phases of PlanNode were refactored to ensure that PlanNode planning
   and cost evaluation are complete before the sorting algorithm runs.
2021-07-14 13:08:28 +08:00

# fe-common

This module holds common classes shared by the other modules.

# spark-dpp

This module is the Spark DPP program, used by the Spark Load feature.
Depends: fe-common

# fe-core

This module is the main process module of FE.
Depends: fe-common, spark-dpp