In previous, the counter in `profile` may be updated when close the file reader.
And the file reader may be closed when the object being deconstruted.
But at that time, the `profile` object may already be deleted, causing NPE and BE will crash.
This PR try to fix this issue:
1. Remove the "profile counter update" logic from all `close()` method.
2. Add a new interface `ProfileCollector`
It has 2 methods:
- `collect_profile_at_runtime()`
It can be called at runtime, eg, in every `get_next_block()` method.
So that the counter in profile can be updated at runtime.
- `collect_profile_before_close()`
Should be called before the object call `close()`. And it will only be called once.
3. Derived from `ProfileCollector`
All classes which may update the profile counter in `close()` method should extends
the `ProfileCollector`. Such as `GenericReader`, etc. And implement `collect_profile_before_close()`
And `collect_profile_before_close()` will be called in `scanner->mark_to_need_to_close()`.
A similar bug compares to #22140 .
When executing a query with hms catalog, the query maybe failed because some hdfs files are not existed. We should just distinguish this kind of errors and skip it.
```
errCode = 2, detailMessage = (xxx.xxx.xxx.xxx)[CANCELLED][INTERNAL_ERROR]failed to init reader for file hdfs://xxx/dwd_tmp.db/check_dam_table_relation_record_day_data/part-00000-c4ee3118-ae94-4bf7-8c40-1f12da07a292-c000.snappy.orc, err: [INTERNAL_ERROR]Init OrcReader failed. reason = Failed to read hdfs://xxx/dwd_tmp.db/check_dam_table_relation_record_day_data/part-00000-c4ee3118-ae94-4bf7-8c40-1f12da07a292-c000.snappy.orc: [INTERNAL_ERROR]Read hdfs file failed. (BE: xxx.xxx.xxx.xxx) namenode:hdfs://xxx/dwd_tmp.db/check_dam_table_relation_record_day_data/part-00000-c4ee3118-ae94-4bf7-8c40-1f12da07a292-c000.snappy.orc, err: (2), No such file or directory), reason: RemoteException: File does not exist: /xxx/dwd_tmp.db/check_dam_table_relation_record_day_data/part-00000-c4ee3118-ae94-4bf7-8c40-1f12da07a292-c000.snappy.orc at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:158) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1927)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:426)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
```
Sometimes, the partitions of a hive table may on different storage, eg, some is on HDFS, others on object storage(cos, etc).
This PR mainly changes:
1. Fix the bug of accessing files via cosn.
2. Add a new field `fs_name` in TFileRangeDesc
This is because, when accessing a file, the BE will get a hdfs client from hdfs client cache, and different file in one query
request may have different fs name, eg, some of are `hdfs://`, some of are `cosn://`, so we need to specify fs name
for each file, otherwise, it may return error:
`reason: IllegalArgumentException: Wrong FS: cosn://doris-build-1308700295/xxxx, expected: hdfs://[172.xxxx:4007](http://172.xxxxx:4007/)`
In some cases, the high load of HDFS may lead to a long time to read the data on HDFS,
thereby slowing down the overall query efficiency. HDFS Client provides Hedged Read.
This function can start another read thread to read the same data when a read request
exceeds a certain threshold and is not returned, and whichever is returned first will use the result.
eg:
create catalog regression properties (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.16.47:7004',
'dfs.client.hedged.read.threadpool.size' = '128',
'dfs.client.hedged.read.threshold.millis' = "500"
);
Refactor the interface of create_file_reader
the file_size and mtime are merged into FileDescription, not in FileReaderOptions anymore.
Now the file handle cache can get correct file's modification time from FileDescription.
Add HdfsIO for hdfs file reader
pick from [Enhancement](multi-catalog) Add hdfs read statistics profile. #21442
1. Add hdfs file handle cache for hdfs file reader
Copied from Impala, `https://github.com/apache/impala/blob/master/be/src/util/lru-multi-cache.h`. (Thanks for the Impala team)
This is a lru cache that can store multi entries with same key.
The key is build with {file name + modification time}
The value is the hdfsFile pointer that point to a certain hdfs file.
This cache is to avoid reopen same hdfs file mutli time, which can save
query time.
Add a BE config `max_hdfs_file_handle_cache_num` to limit the max number
of file handle cache, default is 20000.
2. Add file meta cache
The file meta cache is a lru cache. the key is {file name + modification time},
the value is the parsed file meta info of the certain file, which can save
the time of re-parsing file meta everytime.
Currently, it is only used for caching parquet file footer.
The test show that is cache is hit, the `FileOpenTime` and `ParseFooterTime` is reduce to almost 0
in query profile, which can save time when there are lots of files to read.
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
1. Introduce hadoop libhdfs
2. For Linux-X86 platform, use the hadoop libhdfs
3. For other platform, use libhdfs3, because currently we don't have hadoop libhdfs binary for other platform
Co-authored-by: adonis0147 <adonis0147@gmail.com>
See #17764 for details
I have tested:
- Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp
- Outfile to local/s3/hdfs/broker.
- Load from local/s3/hdfs/broker.
- Query file on local/s3/hdfs/broker file system, with table value function and catalog.
- Backup/Restore with local/s3/hdfs/broker file system
Not test:
- cold & host data separation case.
The main purpose of this pr is to import `fileCache` for lakehouse reading remote files.
Use the local disk as the cache for reading remote file, so the next time this file is read,
the data can be obtained directly from the local disk.
In addition, this pr includes a few other minor changes
Import File Cache:
1. The imported `fileCache` is called `block_file_cache`, which uses lru replacement policy.
2. Implement a new FileRereader `CachedRemoteFilereader`, so that the logic of `file cache` is hidden under `CachedRemoteFilereader`.
Other changes:
1. Add a new interface `fs()` for `FileReader`.
2. `IOContext` adds some statistical information to count the situation of `FileCache`
Co-authored-by: Lightman <31928846+Lchangliang@users.noreply.github.com>
Currently, there are two sets of file readers in Doris, this pr rewrites the old broker reader with the new file reader.
TODO:
1. rewrite stream load pipe and kafka consumer pipe