To support schema evolution, Iceberg adds schema information to Parquet file metadata.
But early Iceberg versions don't write any schema information to Parquet files.
This PR adds support for reading Parquet files without schema information.
Now we reuse the buffer pool for broadcast shuffle on the pipeline engine. This PR ensures that a pipeline with a broadcast shuffle sink will not be scheduled if there are no available buffers in the buffer pool.
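A minimal sketch of the scheduling guard, with hypothetical names (`BufferPool`, `PipelineTask`, `has_broadcast_sink`) standing in for the actual pipeline engine types:

```
#include <atomic>
#include <cstddef>

// Hypothetical stand-ins for the pipeline engine types; the names are
// illustrative, not the actual Doris classes.
struct BufferPool {
    std::atomic<size_t> free_buffers{0};
    bool has_free_buffer() const { return free_buffers.load() > 0; }
};

struct PipelineTask {
    bool has_broadcast_sink = false;
};

// The scheduler only picks a task with a broadcast shuffle sink when the
// shared pool can hand out at least one buffer; otherwise the task stays
// queued instead of waking up just to find the pool empty.
bool can_schedule(const PipelineTask& task, const BufferPool& pool) {
    return !task.has_broadcast_sink || pool.has_free_buffer();
}
```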
Support IPv6 in Apache Doris. The main changes are:
1. enable binding to an IPv6 address if the priority network in the config file contains an IPv6 CIDR string (see the sketch after this list)
2. BRPC and HTTP support binding to an IPv6 address
3. BRPC and HTTP support visiting IPv6 services
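A minimal sketch of the detection step for change 1, assuming the CIDR string is already split out of the priority network config; `inet_pton` is the standard POSIX call:

```
#include <arpa/inet.h>
#include <string>

// Returns true if a CIDR string from the priority network config describes
// an IPv6 range (e.g. "fe80::/64"), in which case the process binds to an
// IPv6 address; IPv4 CIDRs (e.g. "192.168.0.0/16") keep the old behavior.
bool cidr_is_ipv6(const std::string& cidr) {
    std::string addr = cidr.substr(0, cidr.find('/'));
    in6_addr v6 {};
    return inet_pton(AF_INET6, addr.c_str(), &v6) == 1;
}
```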
Hive 1.x may write ORC files with internal column names (_col0, _col1, _col2, ...).
This causes query results to be NULL, because the column names in the ORC file don't match
the column names in the Doris table schema. This PR adds support for querying Hive ORC files with internal column names (see the positional-matching sketch below).
For now, we haven't seen this problem with Parquet files; we will send a new PR to fix Parquet if any problem shows up in the future.
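A minimal sketch of the positional fallback for the ORC case, with hypothetical helper names; the idea is to match file columns to the table schema by ordinal position when every file column name looks auto-generated:

```
#include <cctype>
#include <string>
#include <vector>

// True for Hive 1.x auto-generated names: _col0, _col1, _col2, ...
static bool is_internal_column_name(const std::string& name) {
    if (name.rfind("_col", 0) != 0 || name.size() <= 4) return false;
    for (size_t i = 4; i < name.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(name[i]))) return false;
    }
    return true;
}

// If every column in the ORC file uses an internal name, map _colN to the
// N-th column of the Doris table schema instead of matching by name, so
// the reader no longer returns NULL for every column.
std::vector<std::string> resolve_read_columns(
        const std::vector<std::string>& file_columns,
        const std::vector<std::string>& table_columns) {
    bool all_internal = !file_columns.empty();
    for (const auto& c : file_columns) {
        all_internal = all_internal && is_internal_column_name(c);
    }
    if (all_internal && file_columns.size() == table_columns.size()) {
        return file_columns; // read by position: _colN -> table column N
    }
    return table_columns; // normal case: match by name
}
```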
* [improvement](block exception safe) make the block queue exception safe
This is part of the exception-safety work: #16366.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
In our current logic, the index page will be pre-decoded, but the pre-decode returns OK
because the index page uses BinaryPlainPageBuilder and the first 4 bytes of the page are an offset,
which is highly unlikely to equal EncodingTypePB::DICT_ENCODING (value 5).
Code in bitshuffle_page_pre_decode.h:
```
if constexpr (USED_IN_DICT_ENCODING) {
    // Sniff the first 4 bytes to decide whether this is a dict-encoded page.
    auto type = decode_fixed32_le((const uint8_t*)&data.data[0]);
    if (static_cast<EncodingTypePB>(type) != EncodingTypePB::DICT_ENCODING) {
        return Status::OK();
    }
    size_of_dict_header = BINARY_DICT_PAGE_HEADER_SIZE;
    data.remove_prefix(4);
}
```
But if the offset happens to equal EncodingTypePB::DICT_ENCODING, the code will use
BitShuffle to decode a BinaryPlainPage, which leads to a fatal error.
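To make the failure mode concrete, here is a tiny standalone demo (with `decode_fixed32_le` re-implemented for little-endian hosts) showing how an index-page prefix that happens to hold the offset value 5 is mistaken for DICT_ENCODING:

```
#include <cstdint>
#include <cstdio>
#include <cstring>

// Little-endian fixed32 decode, mirroring what decode_fixed32_le does on
// a little-endian host.
static uint32_t decode_fixed32_le(const uint8_t* buf) {
    uint32_t v;
    memcpy(&v, buf, sizeof(v));
    return v;
}

int main() {
    // An index page built by BinaryPlainPageBuilder starts with an offset.
    // If that offset happens to be 5 (== EncodingTypePB::DICT_ENCODING),
    // the sniffing check above misfires and the page gets BitShuffle-decoded.
    uint8_t page_prefix[4] = {5, 0, 0, 0};
    uint32_t type = decode_fixed32_le(page_prefix);
    printf("sniffed encoding = %u -> %s\n", type,
           type == 5 ? "misclassified as DICT_ENCODING" : "ok");
    return 0;
}
```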
Use the FE cluster token to authenticate stream loads.
This auth is only open to BE; FE auth still supports only HTTP basic auth.
I will use this auth for MySQL load to build a no-auth stream load from FE to BE,
which avoids double auth in MySQL load.
See the design doc for more information.
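A minimal sketch of the BE-side check, assuming a hypothetical `token` request header and handler shape (the actual header name and plumbing live in the stream load code):

```
#include <map>
#include <string>

// Sketch only: if the request carries the cluster token issued by FE,
// accept it without HTTP basic auth; otherwise fall back to basic auth.
bool auth_by_cluster_token(const std::map<std::string, std::string>& headers,
                           const std::string& cluster_token) {
    auto it = headers.find("token"); // header name is an assumption
    if (it == headers.end() || cluster_token.empty()) {
        return false; // no token -> caller tries HTTP basic auth instead
    }
    return it->second == cluster_token;
}
```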
Add a cache for the inverted index query match bitmap to accelerate common query keywords, especially keywords matching many rows (a sketch of the cache shape follows the test results).
Test results:
- large result: matching 99% of 247 million rows shows an 8x speedup.
- small result: matching 0.1% of 247 million rows shows a 2x speedup.
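A sketch of the cache shape, with a placeholder `Bitmap` type standing in for the real roaring bitmap and no eviction policy; it only shows the key and lookup structure, not the production cache:

```
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Placeholder for the real bitmap of matching row ids.
using Bitmap = std::vector<uint32_t>;

// Key: which index, column, and query term produced the bitmap. A real
// implementation would bound the cache (e.g. LRU) and track memory.
struct InvertedIndexQueryCache {
    static std::string make_key(const std::string& index_file,
                                const std::string& column,
                                const std::string& term) {
        return index_file + "|" + column + "|" + term;
    }

    std::shared_ptr<Bitmap> lookup(const std::string& key) const {
        auto it = _cache.find(key);
        return it == _cache.end() ? nullptr : it->second;
    }

    void insert(const std::string& key, std::shared_ptr<Bitmap> bitmap) {
        _cache[key] = std::move(bitmap);
    }

    std::unordered_map<std::string, std::shared_ptr<Bitmap>> _cache;
};
```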
Issue Number: close #16351
A dynamic schema table is a special type of table whose schema changes with the loading procedure. We implemented this feature mainly for semi-structured data such as JSON: since JSON is schema self-described, we can extract schema info from the original documents and infer the final type information. This special table reduces manual schema change operations, and makes it easy to import semi-structured data and extend its schema automatically (a simplified type-widening sketch follows).
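A sketch of the type-widening idea, with a toy three-type lattice (the real implementation covers far more types):

```
#include <algorithm>

// Types are ordered so that the common type of two observations is max().
enum class ColType { UNKNOWN = 0, BIGINT = 1, DOUBLE = 2, STRING = 3 };

// Each JSON scalar observed during load may widen a column's inferred
// type, never narrow it: BIGINT -> DOUBLE -> STRING.
ColType widen(ColType current, ColType observed) {
    return std::max(current, observed);
}
```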
Currently batch_size is not passed on to SegmentIterator, so SegmentIterator uses the hard-coded value 4096 - 32 as the max row count of a block.
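A sketch of the intended plumbing, with hypothetical field names; the point is that the session's batch_size replaces the hard-coded constant as the block row budget:

```
#include <cstddef>

// Before (effectively): constexpr size_t kBlockRows = 4096 - 32;

// After: batch_size flows through the read options into the iterator.
struct ReadOptions {
    size_t block_row_max = 4096 - 32; // default keeps the old behavior
};

struct SegmentIteratorSketch {
    explicit SegmentIteratorSketch(const ReadOptions& opts)
            : _block_row_max(opts.block_row_max) {}
    size_t _block_row_max; // used as the max row count of a block
};
```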
* fix bug
multiget_data runs in a bthread and may block the whole worker pthread of the BRPC framework, affecting other bthreads, so I moved the work into a separate task pool.
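A standalone sketch of the offloading pattern (plain std::thread workers; the real PR uses Doris' own pool utilities, and in BRPC the response would typically be completed from the pool thread rather than by blocking the bthread on the future):

```
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Blocking work submitted from a bthread runs on dedicated pthreads,
// so BRPC worker threads are never parked on slow IO.
class TaskPool {
public:
    explicit TaskPool(size_t n) {
        for (size_t i = 0; i < n; ++i) _workers.emplace_back([this] { run(); });
    }
    ~TaskPool() {
        { std::lock_guard<std::mutex> l(_mu); _stop = true; }
        _cv.notify_all();
        for (auto& t : _workers) t.join();
    }
    std::future<void> submit(std::function<void()> fn) {
        auto task = std::make_shared<std::packaged_task<void()>>(std::move(fn));
        auto fut = task->get_future();
        { std::lock_guard<std::mutex> l(_mu); _queue.push_back([task] { (*task)(); }); }
        _cv.notify_one();
        return fut;
    }
private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> l(_mu);
                _cv.wait(l, [this] { return _stop || !_queue.empty(); });
                if (_stop && _queue.empty()) return;
                job = std::move(_queue.front());
                _queue.pop_front();
            }
            job(); // e.g. the blocking multiget_data body
        }
    }
    std::mutex _mu;
    std::condition_variable _cv;
    std::deque<std::function<void()>> _queue;
    std::vector<std::thread> _workers;
    bool _stop = false;
};
```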
When a PG table has unsupported column types such as point, polygon, jsonb, etc.,
the JDBC catalog converts them to the string type in Doris, but the value fetched from the
result set in Java is an org.postgresql.util.PGobject.
Some tests need this PR: #16442