Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
TabletSink and LoadChannel in BE are M: N relationship,
Every once in a while LoadChannel will randomly return its own runtime profile to a TabletSink, so usually all LoadChannel runtime profiles are saved on each TabletSink, and the timeliness of the same LoadChannel profile saved on different TabletSinks is different, and each TabletSink will periodically send fe reports all the LoadChannel profiles saved by itself, and ensures to update the latest LoadChannel profile according to the timestamp.
The function compoundReader->openInput is called three times, and if any of these calls fail,
an error is logged, and the function returns early. If one or two of the calls succeed, but the others fail,
there might be a situation where the allocated memory for the IndexInput objects is not freed.
To fix this, you could use std::unique_ptr to manage the memory for IndexInput objects.
This would automatically clean up the memory when the function goes out of scope.
Formerly S3FileWriter has to write each buffer with 5MB or more then upload one part, after all these works are done it could then process the incoming data, it's blocking and inefficient. This pr brings one bufferpool where the data could write into memory buffer immediately if has free buffer and then it would be uploaded into the S3.
This pr doesn't provide the ability to elegantly support cases where there is no free buffer, i'll leave it as one future work.
Add `MergeRangeFileReader` to merge small IO to optimize parquet&orc read performance.
`MergeRangeFileReader` is a FileReader that efficiently supports random access in format like parquet and orc.
In order to merge small IO in parquet and orc, the random access ranges should be generated when creating the
reader. The random access ranges is a list of ranges that order by offset.
The range in random access ranges should be reading sequentially, can be skipped, but can't be read repeatedly.
When calling read_at, if the start offset located in random access ranges, the slice size should not span two ranges.
For example, in parquet, the random access ranges is the column offsets in a row group.
When reading at offset, if [offset, offset + 8MB) contains many random access ranges,
the reader will read data in [offset, offset + 8MB) as a whole, and copy the data in random access ranges into small
buffers(name as box, default 1MB, 64MB in total). A box can be occupied by many ranges,
and use a reference counter to record how many ranges are cached in the box. If reference counter equals zero,
the box can be release or reused by other ranges. When there is no empty box for a new read operation,
the read operation will do directly.
## Effects
The runtime of ClickBench reduces from 102s to 77s, and the runtime of Query 24 reduces from 24.74s to 9.45s.
The profile of Query 24:
```
VFILE_SCAN_NODE (id=0):(Active: 8s344ms, % non-child: 83.06%)
- FileReadBytes: 534.46 MB
- FileReadCalls: 1.031K (1031)
- FileReadTime: 28s801ms
- GetNextTime: 8s304ms
- MaxScannerThreadNum: 12
- MergedSmallIO: 0ns
- CopyTime: 157.774ms
- MergedBytes: 549.91 MB
- MergedIO: 94
- ReadTime: 28s642ms
- RequestBytes: 507.96 MB
- RequestIO: 1.001K (1001)
- NumScanners: 18
```
1001 request IOs has been merged into 94 IOs.
## Remaining problems
1. Add p2 regression test in nest PR
2. Profiles are scattered in various codes and will be refactored in the next PR
3. Support ORC reader
* [bugfix](memleak) UserFunctionCache may have memory leak during close
* [bugfix](memleak) UserFunctionCache may have memory leak during close
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
* [bugfix](memoryleak) inlist is memory leak if the type is int
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add unexpected/result support
* Rename result.hpp -> result.h && Add NOLINT in expected.hpp
* Add NOLINT in result.h to avoid clang-tidy checker
* Rename result.h to expected.h
* Add Apache License for be/src/util/expected.hpp
* Disable clang-format in be util/expected.hpp
disallow call new method explicitly
force to use create_shared or create_unique to use shared ptr
placement new is allowed
reference https://abseil.io/tips/42 to add factory method to all class.
I think we should follow this guide because if throw exception in new method, the program will terminate.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
`Export` syntax provides asynchronous export function, but `Export` does not achieve vectorization.
`Outfile` syntax provides synchronous export function`.
So we can reimplement the export syntax with oufile syntax.
Fix decimal v3 precision loss issues in the multi-catalog module.
Now it will use decimal v3 to represent decimal type in the multi-catalog module.
Regression Test: `test_load_with_decimal.groovy`
Optimize instr and locate function for constant arguments.
instr and locate function constant arguments has 58%~200% performance improvement.
refactor locate(substr, str, pos) as standardized arguments processing.
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.