Make the `Need_2_Approval` check required.
After this PR is merged, all PRs will need at least 2 approvals before they can be merged.
One approval must come from a committer; the other can come from anyone.
Add an optional executable binary, fs_benchmark_tool, for testing the performance of file systems such as HDFS and S3.
Usage:
```
./fs_benchmark_tool --conf my.conf --fs_type=s3 --operation=read --iterations=5
```
In my.conf, you can add any config key-value pair in the following format:
```
key1=value1
key2=value2
```
By default, this binary will not be built. Only build it when setting BUILD_FS_BENCHMARK=ON.
The binary will be installed in output/be/lib.
Developers can add a new subclass of BaseBenchmark to add their own benchmark; see be/src/io/fs/benchmark/s3_benchmark.hpp for an example.
Support the match syntax in Nereids.
The match syntax is used like this:
```sql
select * from test where msg match "hello";
select * from test where msg match_any "hello";
select * from test where msg match_all "hello hi";
select * from test where msg match_phrase "hello world";
```
`match` is the same as `match_any`.
The PR that added the match syntax in the original planner: https://github.com/apache/doris/pull/14211
This PR calculates the size of the inverted index files. The changes consist of:
1. Introduction of a new get_inverted_index_size() method in different column writers such as ScalarColumnWriter, StructColumnWriter, ArrayColumnWriter, and MapColumnWriter. This method fetches the size of the inverted index file associated with that column. If the file size cannot be fetched, it defaults to 0.
2. A new method file_size() has been added in the InvertedIndexColumnWriter class which retrieves the size of the file stored on disk. If the file size cannot be fetched, it logs an error and returns -1.
3. A new method get_inverted_index_file_size() is introduced in SegmentWriter which aggregates the inverted index file sizes of all the column writers.
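The column writers themselves are C++ classes in the BE, so purely as an illustration of the aggregation described above (the names mirror the description, not actual signatures), a minimal Java-flavored sketch might look like this:

```java
import java.util.List;

// Illustration only: the real column writers are C++ classes in the BE.
interface ColumnWriterSketch {
    // Size of this column's inverted index file; 0 if it cannot be fetched.
    long getInvertedIndexSize();
}

final class SegmentWriterSketch {
    private final List<ColumnWriterSketch> columnWriters;

    SegmentWriterSketch(List<ColumnWriterSketch> columnWriters) {
        this.columnWriters = columnWriters;
    }

    // Aggregate the inverted index file sizes of all column writers.
    long getInvertedIndexFileSize() {
        long total = 0;
        for (ColumnWriterSketch writer : columnWriters) {
            total += writer.getInvertedIndexSize();
        }
        return total;
    }
}
```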
The Hudi serializer `org.apache.hudi.common.util.SerializationUtils$KryoInstantiator.newKryo` throws an error like `java.lang.IllegalArgumentException: classLoader cannot be null`. Set a default class loader for the scan thread.
```java
public Kryo newKryo() {
    Kryo kryo = new Kryo();
    ...
    // Thread.currentThread().getContextClassLoader() returns null
    kryo.setClassLoader(Thread.currentThread().getContextClassLoader());
    ...
    return kryo;
}
```
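A minimal sketch of the fix idea, assuming the scan task runs on a thread whose context class loader was never set (this is not the exact Doris change, and the class loader actually chosen may differ):

```java
// Give the scan thread a usable context class loader before Hudi's Kryo
// serializer is created, so getContextClassLoader() no longer returns null.
public final class ScanThreadClassLoaderFix {
    public static void ensureContextClassLoader() {
        Thread current = Thread.currentThread();
        if (current.getContextClassLoader() == null) {
            // Fall back to the loader that loaded this class; the loader actually
            // used for the scan thread in Doris may differ.
            current.setContextClassLoader(ScanThreadClassLoaderFix.class.getClassLoader());
        }
    }
}
```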
The changes in this PR:
1. rename BatchRewriteJob to AbstractBatchJobExecutor
2. add a new rewrite job type, CostBasedRewriteJob. It receives a RewriteJob as input, compares the cost of the two candidate plans produced with and without the input RewriteJob, and returns the lower-cost plan as the rewrite result (a rough sketch follows this list)
3. do some small refactoring of NereidsPlanner for better abstraction
4. refactor the directory structure of Nereids
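For intuition, here is a rough sketch of what CostBasedRewriteJob does conceptually; the plan type, cost model, and class below are hypothetical stand-ins rather than the actual Doris classes:

```java
import java.util.function.ToDoubleFunction;
import java.util.function.UnaryOperator;

// Sketch only: run the wrapped rewrite, cost both candidate plans, keep the cheaper one.
final class CostBasedRewriteSketch<P> {
    private final UnaryOperator<P> wrappedRewrite;  // stands in for the input RewriteJob
    private final ToDoubleFunction<P> costModel;    // estimates the cost of a plan

    CostBasedRewriteSketch(UnaryOperator<P> wrappedRewrite, ToDoubleFunction<P> costModel) {
        this.wrappedRewrite = wrappedRewrite;
        this.costModel = costModel;
    }

    // Return the rewritten plan only when it is estimated to be cheaper.
    P execute(P plan) {
        P rewritten = wrappedRewrite.apply(plan);
        return costModel.applyAsDouble(rewritten) < costModel.applyAsDouble(plan)
                ? rewritten
                : plan;
    }
}
```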
Usage of the cost-based rewrite framework:
If you want a rule or a rule list to run in the cost-based rewrite framework, just wrap the rule or rule list with the costBased function of the Rewriter class, for example:
```java
...
costBased(
custom(RuleType.AGG_SCALAR_SUBQUERY_TO_WINDOW_FUNCTION,
AggScalarSubQueryToWindowFunction::new)
),
...
```
* [Bug](topn opt) Fix two-phase read when some rowsets are swept
For a two-phase read query, we need to delay the release of rowsets via row->update_delayed_expired_timestamp() to extend their lifespan. This is necessary to avoid data loss during the second read phase, where some stale rowsets may otherwise be swept and result in missing data.
```groovy
finally {
    sql """ DROP MATERIALIZED VIEW ${testMv} ON ${testTable} """
    sql """ DROP TABLE ${testTable} """
    sql """ DROP DATABASE ${testDb} """
}
```
In this case, an error may occur before the materialized view is created. When that happens, the DROP MATERIALIZED VIEW in the finally block is still executed, but the view does not exist at that point. This raises another exception, and the real failure is hidden by the regression test.
As we know, log4j2 can sometimes become a bottleneck in the Doris FE when many logs are output in sync mode, while asynchronous logging performs better. We also find that capturing the caller location has a similar impact across all logging libraries and slows down asynchronous logging by about 30-100x. So here we provide three log modes for log4j2 to meet the needs of different users.
Refer to https://logging.apache.org/log4j/2.x/performance.html
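To illustrate why capturing the caller location is so costly (a standalone demo, not Doris or log4j2 code): each log event has to walk the stack to find its call site, roughly like this:

```java
// Illustration only: locating the calling line requires creating a stack trace
// and inspecting its frames, which is expensive when done per log event.
public final class CallerLocationDemo {
    public static void main(String[] args) {
        // The first frame is the point where the Throwable was created.
        StackTraceElement caller = new Throwable().getStackTrace()[0];
        System.out.println(caller.getClassName() + "." + caller.getMethodName()
                + "(" + caller.getFileName() + ":" + caller.getLineNumber() + ")");
    }
}
```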
1. casting a string literal to a date-like type should not be an implicit cast
2. the string representation of float-like types should not use scientific notation
3. the data type of the like function's regex expression should be string, even if it is a null literal
4. add -Xss4m in fe.conf to prevent stack overflow in some cases
When the FE is an old version and the BE is a new version, issuing a schema change (add column) and then querying could result in reading a stale schema from the schema cache, because the old FE sends queries without a schema version.
1. Add an HDFS file handle cache for the HDFS file reader
Copied from Impala, `https://github.com/apache/impala/blob/master/be/src/util/lru-multi-cache.h`. (Thanks to the Impala team.)
This is an LRU cache that can store multiple entries with the same key (a sketch of the idea appears after this list).
The key is built from {file name + modification time}.
The value is the hdfsFile pointer that points to a certain HDFS file.
This cache avoids reopening the same HDFS file multiple times, which saves query time.
Add a BE config `max_hdfs_file_handle_cache_num` to limit the maximum number of cached file handles; the default is 20000.
2. Add a file meta cache
The file meta cache is an LRU cache. The key is {file name + modification time},
and the value is the parsed file meta info of that file, which saves the time of
re-parsing the file meta every time.
Currently, it is only used for caching the Parquet file footer.
Tests show that when this cache is hit, `FileOpenTime` and `ParseFooterTime` in the
query profile drop to almost 0, which saves time when there are many files to read.
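The BE implementation is C++, ported from Impala's lru-multi-cache, so the sketch below is only meant to illustrate the idea of an LRU cache that can hold several idle handles under the same {file name + modification time} key; the class and method names are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: an LRU cache that may hold several idle handles per key.
final class FileHandleCacheSketch<V> {
    private final int capacity;   // e.g. the max_hdfs_file_handle_cache_num limit
    private int size = 0;         // number of cached handles across all keys
    // accessOrder=true makes iteration order least-recently-used first.
    private final LinkedHashMap<String, ArrayDeque<V>> idleHandles =
            new LinkedHashMap<>(16, 0.75f, true);

    FileHandleCacheSketch(int capacity) {
        this.capacity = capacity;
    }

    static String key(String fileName, long modificationTime) {
        return fileName + "#" + modificationTime;
    }

    // Borrow an idle handle for this key, or return null so the caller opens a new file.
    synchronized V acquire(String key) {
        ArrayDeque<V> handles = idleHandles.get(key);
        if (handles == null || handles.isEmpty()) {
            return null;
        }
        size--;
        return handles.poll();
    }

    // Give a handle back to the cache; evict least-recently-used handles when over capacity.
    synchronized void release(String key, V handle) {
        idleHandles.computeIfAbsent(key, k -> new ArrayDeque<>()).offer(handle);
        size++;
        while (size > capacity && !idleHandles.isEmpty()) {
            Map.Entry<String, ArrayDeque<V>> eldest = idleHandles.entrySet().iterator().next();
            ArrayDeque<V> queue = eldest.getValue();
            if (!queue.isEmpty()) {
                queue.poll();     // a real cache would also close the evicted handle here
                size--;
            }
            if (queue.isEmpty()) {
                idleHandles.remove(eldest.getKey());
            }
        }
    }
}
```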
This PR addresses the refactoring of common methods that were originally located within the ODBC classes, but were used by the JDBC classes. These methods have now been moved to the JDBC classes to improve code readability and maintainability.
In addition, we have disabled the creation of ODBC external tables by default. However, this will not affect the existing usage of ODBC. You can still enable the ODBC external tables through the enable_odbc_table setting. Please be aware that we plan to completely remove the ODBC external tables in future versions, so we recommend using the JDBC Catalog as a priority.