Follow https://github.com/apache/doris/pull/25302, and use the unified jni framework to refactor java udaf.
This PR has removed the old interfaces to run java udf/udaf. Thanks to the ease of use of the new framework, the core code for modifying UDAF does not exceed 100 lines, and the logic is similar to that of UDF.
This function requires one arguments just as ARRAY_AGG(col) and col means the column whose values you want to aggregate.
This function Aggregates the values including NULL in a column into an array and returns a value of the ARRAY data type.
This function can be used to replace bitmap_union(to_bitmap(expr)), because bitmap_union(to_bitmap(expr)) need create many many small bitmaps firstly and then merge them into a single bitmap.
bitmap_agg will convert the column value into a bitmap directly. Its performance is better than bitmap_union(to_bitmap(expr)) . In our test , there is about 30% improvement.
count_by_enum(expr1, expr2, ... , exprN);
Treats the data in a column as an enumeration and counts the number of values in each enumeration. Returns the number of enumerated values for each column, and the number of non-null values versus the number of null values.
The original logic is to first deserialize the ColumnString into a HashSet (insert the deserialized elements into the hashset), and then traverse all the HashSet elements into the target HashSet during the merge phase.
After optimization, when deserializing, elements are directly inserted into the target HashSet, thereby reducing unnecessary hashset insert overhead.
In one of our internal query tests, 30 hashsets were merged in second phase aggregation(the average cardinality is 1,400,000), and the cardinality after merging is 42,000,000. After optimization, the MergeTime dropped from 5s965ms to 3s375ms.
New aggregation function: map_agg.
This function requires two arguments: a key and a value, which are used to build a map.
select map_agg(column1, column2) from t group by column3;
in order to solve agg of sum/count is not compatibility during the upgrade process.
in PR [refactor](agg_state) refactor agg_state type to support fixed length object type #20370 have changed the serialize type and serialize column of sum/count
before is ColumnVector, now sum/count change to use ColumnFixedLengthObject
so during the upgrade process, will be not compatible if exist Old BE and Newer BE