The original Doris bitmap aggregation function has poor performance on the intersection and union set of bitmap cardinality of more than one billion. There are two reasons for this. The first is that when the bitmap cardinality is large, if the data size exceeds 1g, the network / disk IO time consumption will increase; The second point is that all the sink data of the back-end be instance are transferred to the top node for intersection and union calculation, which leads to the pressure on the top single node and becomes the bottleneck.
My solution is to create a fixed schema table based on the Doris fragmentation rule, and hash fragment the ID range based on the bitmap, that is, cut the ID range vertically to form a small cube. Such bitmap blocks will become smaller and evenly distributed on all back-end be instances. Based on the schema table, some new high-performance udaf aggregation functions are developed. All Scan nodes participate in intersection and union calculation, and top nodes only summarize
The design goal is that the base number of bitmap is more than 10 billion, and the response time of cross union set calculation of 100 dimensional granularity is within 5 s.
There are three udaf functions in this commit: orthogonal_bitmap_intersect_count, orthogonal_bitmap_union_count, orthogonal_bitmap_intersect.
This PR is to support grammar like the following: INSTALL PLUGIN FROM [source] [PROPERTIES("KEY"="VALUE", ...)]
user can set md5sum="xxxxxxx", so we don't need to provide a md5 uri.
1. Split /_cluster/state into /_mapping and /_search_shards requests to reduce permissions and make the logic clearer
2. Rename part es related objects to make their representation more accurate
3. Simply support docValue and Fields in alias mode, and take the first one by default
#3311
This plugin is used to output data to Doris for logstash
Use the HTTP protocol to interact with the Doris FE Http interface
Load data through Doris's stream load
Mainly changes:
1. Shade and provide the thrift lib in spark-doris-connector
2. Add a `build.sh` for spark-doris-connector
3. Move the README.md of spark-doris-connector to `docs/`
4. Change the line delimiter of `fe/src/test/java/org/apache/doris/analysis/AggregateTest.java`