[Performance] Improve performance of hash join in some cases (#3148)
Improve the performance of hash join when the build table has too many duplicated rows. Duplicated rows cause hash table collisions and slow down the probe phase. In this PR, when the join type is semi join or anti join, we build a hash table without duplicated rows.

Benchmark:

Dataset: TPC-DS tables `store_sales` and `catalog_sales`

```
mysql> select count(*) from catalog_sales;
+----------+
| count(*) |
+----------+
| 14401261 |
+----------+
1 row in set (0.44 sec)

mysql> select count(distinct cs_bill_cdemo_sk) from catalog_sales;
+------------------------------------+
| count(DISTINCT `cs_bill_cdemo_sk`) |
+------------------------------------+
|                            1085080 |
+------------------------------------+
1 row in set (2.46 sec)

mysql> select count(*) from store_sales;
+----------+
| count(*) |
+----------+
| 28800991 |
+----------+
1 row in set (0.84 sec)

mysql> select count(distinct ss_addr_sk) from store_sales;
+------------------------------+
| count(DISTINCT `ss_addr_sk`) |
+------------------------------+
|                       249978 |
+------------------------------+
1 row in set (2.57 sec)
```

Test queries:

query1: `select count(*) from (select store_sales.ss_addr_sk from store_sales left semi join catalog_sales on catalog_sales.cs_bill_cdemo_sk = store_sales.ss_addr_sk) a;`

query2: `select count(*) from (select catalog_sales.cs_bill_cdemo_sk from catalog_sales left semi join store_sales on catalog_sales.cs_bill_cdemo_sk = store_sales.ss_addr_sk) a;`

Benchmark result:

||query1|query2|
|:--:|:--:|:--:|
|before|14.76 sec|3 min 16.52 sec|
|after|12.64 sec|10.34 sec|
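To illustrate why deduplicating the build side is safe for a left semi or left anti join, here is a minimal sketch. It is not the Doris hash table: the function names are hypothetical, and `std::unordered_multimap` / `std::unordered_set` stand in for the real build-side structure. The point it shows is that the result depends only on whether a match exists, so one entry per distinct key is enough.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not Doris code): number of probe rows that have at
// least one match on the build side, i.e. the row count of a LEFT SEMI JOIN.
// The build table keeps every duplicated row, as a general-purpose join would.
std::size_t semi_join_count_with_duplicates(const std::vector<int64_t>& build_keys,
                                            const std::vector<int64_t>& probe_keys) {
    std::unordered_multimap<int64_t, std::size_t> table;  // key -> build row index
    for (std::size_t i = 0; i < build_keys.size(); ++i) {
        table.emplace(build_keys[i], i);
    }
    std::size_t matched = 0;
    for (int64_t key : probe_keys) {
        if (table.find(key) != table.end()) {
            ++matched;
        }
    }
    return matched;
}

// Same query, but the build table stores each distinct key only once.
std::size_t semi_join_count_unique(const std::vector<int64_t>& build_keys,
                                   const std::vector<int64_t>& probe_keys) {
    std::unordered_set<int64_t> table(build_keys.begin(), build_keys.end());
    std::size_t matched = 0;
    for (int64_t key : probe_keys) {
        if (table.count(key) != 0) {
            ++matched;
        }
    }
    return matched;
}

// LEFT ANTI JOIN counterpart: probe rows with no match on the build side.
std::size_t anti_join_count_unique(const std::vector<int64_t>& build_keys,
                                   const std::vector<int64_t>& probe_keys) {
    std::unordered_set<int64_t> table(build_keys.begin(), build_keys.end());
    std::size_t unmatched = 0;
    for (int64_t key : probe_keys) {
        if (table.count(key) == 0) {
            ++unmatched;
        }
    }
    return unmatched;
}
```

Both semi join variants return the same count for any input. The deduplicated table holds one entry per distinct build key (249978 instead of 28800991 for `ss_addr_sk` in the dataset above), which is consistent with query2 showing the larger improvement in the benchmark.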
@@ -44,6 +44,9 @@ HashJoinNode::HashJoinNode(
     _match_all_build =
             (_join_op == TJoinOp::RIGHT_OUTER_JOIN || _join_op == TJoinOp::FULL_OUTER_JOIN);
     _is_push_down = tnode.hash_join_node.is_push_down;
+    _build_unique = _join_op == TJoinOp::LEFT_ANTI_JOIN|| _join_op == TJoinOp::RIGHT_ANTI_JOIN
+            || _join_op == TJoinOp::RIGHT_SEMI_JOIN || _join_op == TJoinOp::LEFT_SEMI_JOIN
+            || _join_op == TJoinOp::NULL_AWARE_LEFT_ANTI_JOIN;
 }
 
 HashJoinNode::~HashJoinNode() {
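The condition assigned to `_build_unique` in the hunk above can also be read as a small predicate over the join operator. The sketch below is an illustration only: `JoinOp` is a hypothetical stand-in for the Thrift-generated `TJoinOp`, `can_build_unique` is not a function in the code base, and `INNER_JOIN` is included purely as a contrasting case that is assumed to exist.

```cpp
// Hypothetical stand-in for TJoinOp. Only the operators named in the diff
// are listed, plus INNER_JOIN as an assumed contrasting example.
enum class JoinOp {
    INNER_JOIN,
    RIGHT_OUTER_JOIN,
    FULL_OUTER_JOIN,
    LEFT_SEMI_JOIN,
    RIGHT_SEMI_JOIN,
    LEFT_ANTI_JOIN,
    RIGHT_ANTI_JOIN,
    NULL_AWARE_LEFT_ANTI_JOIN,
};

// Returns true for exactly the join operators that the diff marks as
// eligible for a build-side hash table without duplicated rows.
inline bool can_build_unique(JoinOp op) {
    switch (op) {
    case JoinOp::LEFT_SEMI_JOIN:
    case JoinOp::RIGHT_SEMI_JOIN:
    case JoinOp::LEFT_ANTI_JOIN:
    case JoinOp::RIGHT_ANTI_JOIN:
    case JoinOp::NULL_AWARE_LEFT_ANTI_JOIN:
        return true;
    default:
        return false;
    }
}
```

A `switch` keeps the list of eligible operators in one place, so adding or removing an operator is a one-line change. The boolean expression in the diff is equivalent.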