1. Fix the issue with tvf reading empty compressed files.
2. move two test cases (`test_local_tvf_compression` and `test_s3_tvf_compression`) from p2 to p0
Issue Number: close#29406
1. increase lzop version to 0x1040,
I set to 0x1040 only for decompressing lzo files compressed by higher version of lzop,
no change of decompressing logic,
actully, 0x1040 should have "F_H_FILTER" feature,
but it mainly for audio and image data, so we do not support it.
2. use orc::lzoDecompress() instead of lzo1x_decompress_safe() to decompress lzo data
3. use crc32c::Extend() instead of lzo_crc32()
4. use olap_adler32() instead of lzo_adler32()
5. thus, remove dependency of Markus F.X.J. Oberhumer's lzo library
6. remove DORIS_WITH_LZO, so lzo file are supported by stream and broker load by default
7. add some regression test
Fixed the problem of not being able to read parquet lz4 compressed format. By default, it is decompressed according to the Hadoop lz4 format. If it fails, it will fall back to the standard lz4 compression format.
1. do not split compress data file
Some data file in hive is compressed with gzip, deflate, etc.
These kinds of file can not be splitted.
2. Support lz4 block codec
for hive scan node, use lz4 block codec instead of lz4 frame codec
4. Support snappy block codec
For hadoop snappy
5. Optimize the `count(*)` query of csv file
For query like `select count(*) from tbl`, only need to split the line, no need to split the column.
Need to pick to branch-2.0 after this PR: #22304
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
1. replace all boost::shared_ptr to std::shared_ptr
2. replace all boost::scopted_ptr to std::unique_ptr
3. replace all boost::scoped_array to std::unique<T[]>
4. replace all boost:thread to std::thread
* update gcc to gcc 10 and support c++17
update brpc to 0.9.7
update boost to 1.73
remove third-party boost 1.54 for mysql
* update cmake version
* ignore jdk version
* remove unused patch
* avoid use SYS_getrandom call
At present, the application of vlog in the code is quite confusing.
It is inherited from impala VLOG_XX format, and there is also VLOG(number) format.
VLOG(number) format does not have a unified specification, so this pr standardizes the use of VLOG