1. Convert subqueries to Apply nodes.
2. Convert Apply nodes to ordinary joins.

### Detailed design:

There are currently three kinds of subquery expression: scalarSubquery, inSubquery, and exists.
A scalarSubquery is a subquery that returns exactly 1 row and 1 column.

**Subquery replacement**
```
before:
    scalarSubquery: filter(t1.a = scalarSubquery(output b));
    inSubquery: filter(inSubquery);   inSubquery = (t1.a in select ***);
    exists: filter(exists);   exists = (select ***);

after:
    scalarSubquery: filter(t1.a = b);
    inSubquery: filter(True);
    exists: filter(True);
```

**Subquery Transformation Rules**
```
PushApplyUnderFilter
 * before:
 *             Apply
 *          /       \
 * Input(output:b)    Filter(Correlated predicate/UnCorrelated predicate)
 *
 * after:
 *          Filter(Correlated predicate)
 *                      |
 *                  Apply
 *                /      \
 *      Input(output:b)    Filter(UnCorrelated predicate)
```

```
PushApplyUnderProject
 * before:
 *            Apply
 *         /       \
 * Input(output:b)    Project(output:a)
 *
 * after:
 *          Project(b,(if the Subquery is Scalar add 'a' as the output column))
 *          /               \
 *      Input(output:b)      Apply
```

```
ApplyPullFilterOnAgg
 * before:
 *             Apply
 *          /       \
 * Input(output:b)    agg(output:fn,c; group by:null)
 *                              |
 *              Filter(Correlated predicate(Input.e = this.f)/UnCorrelated predicate)
 *
 * after:
 *          Apply(Correlated predicate(Input.e = this.f))
 *          /              \
 * Input(output:b)    agg(output:fn,this.f; group by:this.f)
 *                              |
 *                    Filter(UnCorrelated predicate)
```

```
ApplyPullFilterOnProjectUnderAgg
 * before:
 *              apply
 *         /              \
 * Input(output:b)        agg
 *                          |
 *                  Project(output:a)
 *                          |
 *              Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
 *                          |
 *                         child
 *
 * after:
 *              apply
 *         /              \
 * Input(output:b)        agg
 *                          |
 *              Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
 *                          |
 *              Project(output:a,this.f, Unapply predicate(slots))
 *                          |
 *                         child
```

```
ScalarToJoin
 * UnCorrelated -> CROSS_JOIN
 * Correlated -> LEFT_OUTER_JOIN
```

```
InToJoin
 * Not In -> LEFT_ANTI_JOIN
 * In -> LEFT_SEMI_JOIN
```

```
existsToJoin
 * Exists
 *    Correlated -> LEFT_SEMI_JOIN
 *         correlated                  LEFT_SEMI_JOIN(Correlated Predicate)
 *       /           \         -->       /           \
 *    input        queryPlan           input        queryPlan
 *
 *    UnCorrelated -> CROSS_JOIN(limit(1))
 *          uncorrelated                    CROSS_JOIN
 *          /           \          -->      /       \
 *      input        queryPlan           input    limit(1)
 *                                                    |
 *                                                 queryPlan
 *
 * Not Exists
 *    Correlated -> LEFT_ANTI_JOIN
 *          correlated                  LEFT_ANTI_JOIN(Correlated Predicate)
 *       /           \         -->       /           \
 *    input        queryPlan           input        queryPlan
 *
 *    UnCorrelated -> CROSS_JOIN(Count(*))
 *                                    Filter(count(*) = 0)
 *                                          |
 *         apply                       Cross_Join
 *      /          \         -->       /           \
 *    input        queryPlan         input       agg(output:count(*))
 *                                                     |
 *                                                   limit(1)
 *                                                     |
 *                                                   queryPlan
```
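
The join-selection rules above (ScalarToJoin, InToJoin, existsToJoin) amount to a small decision table. The following is a minimal sketch of that table in Python, not the actual planner code: `SubqueryKind`, `JoinChoice`, and `choose_join` are hypothetical names introduced only for illustration, and the operators are represented as plain strings rather than plan nodes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SubqueryKind(Enum):
    """The three subquery kinds named in the design (hypothetical enum)."""
    SCALAR = "scalar"
    IN = "in"
    EXISTS = "exists"


@dataclass(frozen=True)
class JoinChoice:
    """Illustrative result type; not part of the described design."""
    join_type: str                       # join that replaces the Apply node
    right_wrapper: Optional[str] = None  # operators placed above the subquery plan
    top_filter: Optional[str] = None     # filter placed above the join, if any


def choose_join(kind: SubqueryKind, correlated: bool, negated: bool = False) -> JoinChoice:
    """Pick the join for an Apply node following the rules listed above."""
    if kind is SubqueryKind.SCALAR:
        # ScalarToJoin: negation does not apply to scalar subqueries here.
        return JoinChoice("LEFT_OUTER_JOIN" if correlated else "CROSS_JOIN")

    if kind is SubqueryKind.IN:
        # InToJoin: the rule depends only on IN versus NOT IN.
        return JoinChoice("LEFT_ANTI_JOIN" if negated else "LEFT_SEMI_JOIN")

    if kind is SubqueryKind.EXISTS:
        if correlated:
            return JoinChoice("LEFT_ANTI_JOIN" if negated else "LEFT_SEMI_JOIN")
        if not negated:
            # Uncorrelated EXISTS: cross join against limit(1) over the subquery.
            return JoinChoice("CROSS_JOIN", right_wrapper="limit(1)")
        # Uncorrelated NOT EXISTS: count the limited subquery, keep rows when count is 0.
        return JoinChoice(
            "CROSS_JOIN",
            right_wrapper="agg(count(*)) over limit(1)",
            top_filter="count(*) = 0",
        )

    raise ValueError(f"unsupported subquery kind: {kind}")
```

For example, `choose_join(SubqueryKind.EXISTS, correlated=False, negated=True)` yields a cross join whose right side is the aggregate over `limit(1)`, with the `count(*) = 0` filter placed on top, matching the last diagram.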
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

# fe-common

This module stores common classes used by the other modules.

# spark-dpp

This module is the Spark DPP program, used by the Spark Load function.

Depends: fe-common

# fe-core

This module is the main process module of FE.

Depends: fe-common, spark-dpp