This PR implements a new bloom filter index, the NGram bloom filter index, which was proposed in #10733.
The new index can greatly improve LIKE query performance; in some of our test cases it gave an order-of-magnitude improvement.
For usage, see the docs in this PR. The index depends on ```enable_function_pushdown```:
you need to set it to ```true``` for the index to take effect on LIKE queries.
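
To illustrate why an n-gram bloom filter helps LIKE queries, here is a minimal sketch. It is not the implementation in this PR: the class name, the parameters (`n`, `num_bits`, `num_hashes`), and the hashing scheme are all assumptions chosen for illustration. The idea is that every n-gram of the stored values is added to a bloom filter, and a LIKE pattern can only match if every n-gram of its literal parts may be present.

```python
import hashlib


class NGramBloomFilter:
    """Toy n-gram bloom filter (hypothetical; not the Doris implementation)."""

    def __init__(self, n=3, num_bits=1024, num_hashes=3):
        self.n = n
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in a Python int

    def _ngrams(self, text):
        # All substrings of length n; empty if the text is shorter than n.
        return {text[i:i + self.n] for i in range(len(text) - self.n + 1)}

    def _positions(self, gram):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, text):
        for gram in self._ngrams(text):
            for pos in self._positions(gram):
                self.bits |= 1 << pos

    def may_match_like(self, pattern):
        # Split on the LIKE wildcards; escape sequences are ignored here.
        literals = pattern.replace("_", "%").split("%")
        for literal in literals:
            for gram in self._ngrams(literal):
                if not all((self.bits >> pos) & 1 for pos in self._positions(gram)):
                    return False  # a required n-gram is definitely absent
        return True  # may match (false positives are possible)


index = NGramBloomFilter()
for value in ["hello world", "apache doris"]:
    index.add(value)
```

A data block whose filter returns `False` for a pattern can be skipped without scanning it. Literal parts shorter than `n` yield no n-grams, so the filter cannot prune on them and answers "may match".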
Upgrade simdjson from 1.0.2 to the latest version, 3.0.1, to avoid the -mlzcnt compiler flag, which causes BE UT failures on macOS.
simdjson is currently used only by VJsonScanner and is disabled by default, so the impact of the upgrade is limited.
Original: GROUP BY is bound to the outputExpression of the current node.
Problem: When a new reference in outputExpression has the same name as a column in the child's output, the child's output column should be used for GROUP BY. Instead, the new reference from the node's outputExpression is used, which results in an error.
Now: GROUP BY binding gives priority to the child's output. If the child has no corresponding column, the outputExpression of this node is used for binding.
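
The binding priority described above can be sketched as follows. This is only an illustration: the function name, the dictionaries, and the example query are hypothetical and do not reflect the actual Nereids code.

```python
def bind_group_by_key(name, child_output, output_expressions):
    """Bind a GROUP BY key, preferring the child's output columns.

    Hypothetical helper: both arguments map a column name to an expression.
    """
    if name in child_output:
        return ("child", child_output[name])
    if name in output_expressions:
        return ("output", output_expressions[name])
    raise KeyError(f"cannot bind GROUP BY key: {name}")


# Hypothetical query: SELECT a + 1 AS a, b + 1 AS c FROM t GROUP BY a, c
child_output = {"a": "t.a", "b": "t.b"}
output_expressions = {"a": "t.a + 1", "c": "t.b + 1"}

binding_a = bind_group_by_key("a", child_output, output_expressions)
binding_c = bind_group_by_key("c", child_output, output_expressions)
```

Here `a` binds to the child's column `t.a` even though the node also defines an alias named `a`, while `c` exists only as an alias and therefore falls back to the node's outputExpression.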
When light schema change is enabled by default (#15344), regression tests that run SQL by selecting data from the materialized index will fail.
This PR disables the failing queries in the regression tests. They will be added back once the Nereids planner can produce the correct plan with light schema change enabled.
Support returning bitmap data in SELECT statements in vectorized mode.
In scenarios that use bitmaps for audience selection, users need to return the bitmap results to the upper layer, which parses the bitmap contents itself in order to handle high-QPS query scenarios.
SELECT 2 FROM tbl GROUP BY 1
This query should produce 2 when the table is not empty. Before this PR, executing the plan generated by Nereids produced an empty result set.
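
As a quick check of the expected semantics, the same query can be run against SQLite from Python. SQLite is used here only as a stand-in engine, not Doris, and the table contents are made up. Like the query above, SQLite treats the integer in `GROUP BY 1` as a reference to the first output column, which is the constant `2`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (k INTEGER)")

# Empty table: there are no rows, so there are no groups to output.
empty_result = conn.execute("SELECT 2 FROM tbl GROUP BY 1").fetchall()

# Non-empty table: all rows fall into a single group keyed by the constant.
conn.executemany("INSERT INTO tbl VALUES (?)", [(1,), (2,), (3,)])
non_empty_result = conn.execute("SELECT 2 FROM tbl GROUP BY 1").fetchall()

conn.close()
```

With three rows in the table the query yields a single row containing 2, which is the behavior this PR restores for Nereids-generated plans.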
The join node needs a project operation to remove unnecessary columns from its output tuples.
For SetOperationNode, the output tuple and the input tuple are consistent, so no projection is needed.
However, the children of a SetOperationNode may be join nodes, so the children of the SetOperationNode
still need to do the project operation.
1. Fix 1 bug:
A null pointer exception is thrown when reading data after the reader has reached the end of the file, so return directly when `_do_lazy_read` reads no data.
2. Optimize code:
Remove unused parameters.
3. Fix regression test.
Add a new config "jdbc_drivers_dir" for both FE and BE.
Users can put JDBC driver jar files in this directory and specify only the file name in the "driver_url" property
when creating a JDBC resource.
Doris will then look for the jar file in this directory.
Also modify the logic so that when a JDBC resource is modified, the corresponding JDBC table
picks up the latest properties.
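
The lookup of a bare file name in "jdbc_drivers_dir" can be sketched as below. This is an assumption about the behavior, not the code in this PR: the function name, the "://" check for full URLs, the `file://` prefix, and the example paths are all hypothetical.

```python
import posixpath


def resolve_driver_url(driver_url, jdbc_drivers_dir):
    """Resolve a "driver_url" property to a full URL (hypothetical helper)."""
    # Assumption: a value that already carries a scheme is used unchanged.
    if "://" in driver_url:
        return driver_url
    # Assumption: a bare file name is looked up inside jdbc_drivers_dir.
    return "file://" + posixpath.join(jdbc_drivers_dir, driver_url)


# Example values are made up for illustration.
resolved = resolve_driver_url("mysql-connector-java-8.0.25.jar", "/opt/doris/jdbc_drivers")
unchanged = resolve_driver_url("https://example.com/drivers/mysql.jar", "/opt/doris/jdbc_drivers")
```

With this approach, FE and BE can each point "jdbc_drivers_dir" at their own local directory while the resource definition carries only the jar file name.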
In InferPredicates, we need to pull predicates from the project's children and then use sid to replace id1.
In our code, the map is built with the alias name as key and the expression as value. sid has two alias names (id1, id2), so a Duplicate key exception is thrown.