The SQL `SELECT nationkey FROM regression_test_query_p0_limit.tpch_tiny_nation ORDER BY nationkey DESC LIMIT 5`
may cause a BE core dump because `VCollectIterator::_topn_next` dereferences a nullptr `read_orderby_key_columns`,
triggered by skipping the `_colname_to_value_range` init in #16818.
This PR makes two changes:
1. avoid a nullptr `read_orderby_key_columns` in `TabletReader::_init_orderby_keys_param`
2. return an error if `read_orderby_key_columns` is unexpectedly nullptr in `VCollectIterator::_topn_next` to avoid the core dump (see the sketch below)
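A minimal sketch of the defensive check in point 2, using stand-in types and names rather than the actual Doris classes:
```
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Minimal stand-ins for the BE types involved; this is an illustrative sketch,
// not the actual Doris code.
struct Status {
    bool ok;
    std::string msg;
    static Status OK() { return {true, ""}; }
    static Status InternalError(std::string m) { return {false, std::move(m)}; }
};
using ColumnId = uint32_t;

// Sketch of the defensive check for _topn_next: fail fast with an error status
// instead of dereferencing a null pointer and crashing the BE process.
Status topn_next(const std::vector<ColumnId>* read_orderby_key_columns) {
    if (read_orderby_key_columns == nullptr || read_orderby_key_columns->empty()) {
        return Status::InternalError("read_orderby_key_columns is null or empty");
    }
    // ... the normal top-n path would merge rows ordered by these key columns ...
    return Status::OK();
}

int main() {
    Status s = topn_next(nullptr);
    std::cout << (s.ok ? "OK" : "error: " + s.msg) << std::endl;  // prints the error
}
```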
1. support stream load in JSON and CSV formats for the map type
2. fix the olap convertor used during compaction for map columns that contain nulls
3. support SELECT ... INTO OUTFILE for the map type
4. add some regression tests
The background is described in this issue: #15723,
where users previously used Apache Druid to satisfy such lambda-architecture requirements.
We will not make Doris automatically drop data that does not belong to the current time window as Druid does,
since that is not flexible. Instead, we need the ability to support mutable/immutable partitions. The PR works this way:
1. Support a mutable property for a partition.
2. The mutable property of a partition is passed from FE to BE during a load.
3. If a record's partition is immutable, the row is marked as "unselected" and excluded from the computation of 'max_filter_ratio',
so data written to an immutable partition is ignored and does not cause the load to fail (see the sketch after the example below).
Use example:
1. Add an immutable partition or modify a partition to be immutable:
- alter table test_tbl add [temporary] partition xxx values less than ('xxx') ('mutable' = 'true');
- alter table test_tbl modify partition xx set ('mutable' = 'false');
2. Write 5 records into the table, two of them belonging to an immutable partition.
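A minimal sketch, with hypothetical counter names, of how unselected rows can be excluded from the filter-ratio check described above; the real BE load counters and checks differ in detail:
```
#include <cstdint>
#include <iostream>

// Illustrative counters; the actual Doris load counters have different names.
struct LoadCounters {
    int64_t total_rows = 0;       // rows received by the load
    int64_t filtered_rows = 0;    // rows rejected for data quality errors
    int64_t unselected_rows = 0;  // rows targeting immutable partitions
};

// Rows written to immutable partitions are "unselected": they are dropped,
// but they do not count against max_filter_ratio, so the load still succeeds.
bool exceeds_max_filter_ratio(const LoadCounters& c, double max_filter_ratio) {
    int64_t selected = c.total_rows - c.unselected_rows;
    if (selected <= 0) return false;
    return static_cast<double>(c.filtered_rows) / selected > max_filter_ratio;
}

int main() {
    LoadCounters c{5, 0, 2};  // 5 rows, 2 of them hit an immutable partition
    std::cout << std::boolalpha << exceeds_max_filter_ratio(c, 0.0) << "\n";  // false
}
```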
Introduced a new function `non_nullable` in the BE, which extracts the concrete data column from a nullable column. If the input argument is not a nullable column, an error is raised.
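A minimal sketch of the described semantics using stand-in column types; the real implementation works on Doris's vectorized IColumn/ColumnNullable classes and returns an error status instead of throwing:
```
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// Stand-ins for the BE column classes; illustrative only.
struct IColumn { virtual ~IColumn() = default; };
struct ColumnInt64 : IColumn { std::vector<int64_t> data; };
struct ColumnNullable : IColumn {
    std::shared_ptr<IColumn> nested;  // the concrete data column
    std::vector<uint8_t> null_map;    // 1 = the row is null
};

// non_nullable: unwrap a nullable column to its nested data column, and fail
// if the argument is not a nullable column to begin with.
std::shared_ptr<IColumn> non_nullable(const std::shared_ptr<IColumn>& col) {
    auto nullable = std::dynamic_pointer_cast<ColumnNullable>(col);
    if (nullable == nullptr) {
        throw std::invalid_argument("non_nullable expects a nullable column");
    }
    return nullable->nested;
}
```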
* fix bug, add remote meta for compaction
The logic in function VCollectIterator::build_heap is not robust, which may cause a memory leak:
```
Level1Iterator* cumu_iter = new Level1Iterator(
        cumu_children, _reader, cumu_children.size() > 1, _is_reverse, _skip_same);
RETURN_IF_NOT_EOF_AND_OK(cumu_iter->init());
std::list<LevelIterator*> children;
children.push_back(*base_reader_child);
children.push_back(cumu_iter);
_inner_iter.reset(
        new Level1Iterator(children, _reader, _merge, _is_reverse, _skip_same));
```
`cumu_iter` will be leaked if `cumu_iter->init()` does not succeed. One way to avoid the leak is sketched below.
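A minimal sketch of the RAII idea, using std::unique_ptr so the iterator is released automatically when init() fails; the types are stand-ins, not the actual BE classes, and the real fix may differ:
```
#include <list>
#include <memory>

// Stand-ins for the BE types; illustrative only.
struct Status {
    bool ok = true;
    static Status OK() { return {true}; }
    static Status Error() { return {false}; }
};
struct LevelIterator {
    virtual ~LevelIterator() = default;
    virtual Status init() = 0;
};
struct Level1Iterator : LevelIterator {
    Status init() override { return Status::Error(); }  // simulate init failure
};

Status build_heap(std::unique_ptr<LevelIterator>& inner_iter) {
    // Own cumu_iter with unique_ptr: if init() fails and we return early,
    // the iterator is destroyed instead of being leaked.
    auto cumu_iter = std::make_unique<Level1Iterator>();
    Status st = cumu_iter->init();
    if (!st.ok) {
        return st;
    }
    std::list<std::unique_ptr<LevelIterator>> children;
    children.push_back(std::move(cumu_iter));
    // ... push the base reader child and build the final Level1Iterator ...
    return Status::OK();
}

int main() {
    std::unique_ptr<LevelIterator> inner;
    Status st = build_heap(inner);
    return st.ok ? 0 : 1;  // init failure propagates without leaking cumu_iter
}
```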
Each element in InvertedIndexSearcherCache is an inverted index searcher, which holds a file descriptor of an inverted index file, so InvertedIndexSearcherCache is effectively a cache of inverted index file descriptors.
If the open file descriptor limit of the Linux system is set too small and the config inverted_index_searcher_cache_limit is too big, a high-pressure load may fail with "Too many open files".
So, when inserting an inverted index searcher into InvertedIndexSearcherCache, we also need to check whether the file descriptor limit for inverted index files has been reached; a sketch of such a guard follows.
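A minimal sketch of such a guard, assuming a hypothetical counter of currently open inverted-index file descriptors and a configurable quota; the real cache logic is more involved:
```
#include <atomic>
#include <cstdint>
#include <sys/resource.h>

// Hypothetical bookkeeping: how many inverted-index files are currently open.
static std::atomic<int64_t> g_open_index_fds{0};

// Reserve at most a fraction of the process fd limit for inverted index files.
bool can_cache_another_index_searcher(double index_fd_quota = 0.5) {
    rlimit rl{};
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY) {
        return true;  // no usable limit; fall back to permissive behavior
    }
    int64_t fd_limit = static_cast<int64_t>(rl.rlim_cur * index_fd_quota);
    return g_open_index_fds.load() < fd_limit;
}

// Callers would check can_cache_another_index_searcher() before inserting a new
// searcher into InvertedIndexSearcherCache, and evict or skip caching otherwise.
```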
There is a bug in inverted_index_writer when adding the index for multiple rows of array values.
This problem can produce incorrect results when a schema change adds an index.
* [improve](dynamic table) refine how SegmentWriter generates column writers
```
A dynamic Block consists of two parts: a static part of columns and a dynamic part of columns.
|  static  |  dynamic  |
| -------- | --------- |
The static ones are the original _tablet_schema columns.
The dynamic ones are auto-generated, extended from the file scan.
```
**We should only consider using Block info to generate columns when it is a dynamic table load procedure,**
and separate the static columns from the dynamic ones (see the sketch below).
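A minimal sketch of the static/dynamic split, assuming the static columns come first in the Block and their count equals the tablet schema's column count; the names here are stand-ins:
```
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in for one column of a Block: just its name here.
using BlockColumn = std::string;

// Split a dynamic-table Block's columns: the first num_schema_columns are the
// static columns from _tablet_schema, everything after them was auto-generated
// (extended) during the file scan.
std::pair<std::vector<BlockColumn>, std::vector<BlockColumn>> split_columns(
        const std::vector<BlockColumn>& block_columns, std::size_t num_schema_columns) {
    num_schema_columns = std::min(num_schema_columns, block_columns.size());
    std::vector<BlockColumn> static_cols(block_columns.begin(),
                                         block_columns.begin() + num_schema_columns);
    std::vector<BlockColumn> dynamic_cols(block_columns.begin() + num_schema_columns,
                                          block_columns.end());
    return {static_cols, dynamic_cols};
}
```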
* test
To avoid unrecoverable data loss due to a delete bitmap calculation error, do compaction with merge-on-read. This way, even if the delete bitmap calculation is wrong, the data can be recovered by a full compaction.
Save cached file segments into paths like `cache_path / hash(filepath).substr(0, 3) / hash(filepath) / offset`
to prevent too many directories under `cache_path`.
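A minimal sketch of how such a cache path can be assembled; std::hash is only a placeholder for whatever hash function the file cache actually uses:
```
#include <cstdint>
#include <filesystem>
#include <string>

// Build: cache_path / hash(filepath).substr(0, 3) / hash(filepath) / offset
// std::hash is a placeholder; the real cache uses its own hash of the path.
std::filesystem::path cache_segment_path(const std::filesystem::path& cache_path,
                                         const std::string& filepath, int64_t offset) {
    std::string h = std::to_string(std::hash<std::string>{}(filepath));
    return cache_path / h.substr(0, 3) / h / std::to_string(offset);
}

// Example: cache_segment_path("/data/cache", "s3://bucket/a.dat", 4096) might
// yield "/data/cache/123/123456789/4096", so the fan-out of directories
// directly under cache_path is bounded by the 3-character prefix.
```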
Set column names parsed from the path to lower case for case-insensitive matching.
This is for Iceberg columns that come from the path: Iceberg column names are case sensitive,
which may cause errors for tables with partitions.
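A minimal sketch of the normalization, assuming ASCII column names:
```
#include <algorithm>
#include <cctype>
#include <string>

// Lower-case a partition column name parsed from the file path so it matches
// the case-insensitive table schema.
std::string to_lower_column_name(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return name;
}
```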
Now we use one thrift message per fragment instance. However, many of these messages are identical across the instances of a fragment, so this PR extracts the shared part so the common thrift message only needs to be sent once per fragment (a rough sketch of the split follows).
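A rough sketch of the idea with hypothetical struct names, not the real thrift definitions: fragment-level parameters that are identical across instances are factored out and serialized once, while only the small per-instance parameters are repeated:
```
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical shapes, illustrative only.
struct FragmentSharedParams {
    std::string fragment_plan;             // identical for every instance
    std::vector<std::string> descriptors;  // identical for every instance
};
struct InstanceParams {
    int32_t instance_id;                   // differs per instance
    std::vector<int32_t> scan_ranges;      // differs per instance
};
struct FragmentExecRequest {
    FragmentSharedParams shared;           // serialized once per fragment
    std::vector<InstanceParams> instances; // one small entry per instance
};
```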
To support schema evolution, Iceberg adds schema information to the Parquet file metadata.
But early Iceberg versions did not write any schema information to the Parquet file.
This PR adds support for reading Parquet files without schema information.
Now we reuse a buffer pool for broadcast shuffle in the pipeline engine. This PR ensures that a pipeline with a broadcast shuffle sink will not be scheduled if there is no available buffer in the buffer pool.
Support IPv6 in Apache Doris. The main changes are:
1. enable binding to an IPv6 address if the network priority in the config file contains an IPv6 CIDR string
2. BRPC and HTTP support binding to IPv6 addresses
3. BRPC and HTTP support visiting IPv6 services
Hive 1.x may write ORC files with internal column names (_col0, _col1, _col2, ...).
This causes query results to be NULL because the column names in the ORC file do not match
the column names in the Doris table schema. This PR adds support for querying Hive ORC files with internal column names; a sketch of the positional mapping follows.
For now, we have not seen this problem with Parquet files; we will send a new PR to fix Parquet if the problem shows up in the future.
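A minimal sketch of a positional fallback, assuming the reader maps `_colN` file columns to the table schema by position when every name matches Hive's internal pattern; the real reader logic is more involved:
```
#include <regex>
#include <string>
#include <vector>

// Return true when every ORC column name follows Hive's internal "_colN" pattern.
bool uses_hive_internal_names(const std::vector<std::string>& orc_columns) {
    static const std::regex internal_name("_col[0-9]+");
    for (const auto& name : orc_columns) {
        if (!std::regex_match(name, internal_name)) return false;
    }
    return true;
}

// Map file columns to table columns: by position for internal names,
// otherwise keep the usual name-based lookup (omitted here).
std::vector<std::string> resolve_column_names(
        const std::vector<std::string>& orc_columns,
        const std::vector<std::string>& table_columns) {
    if (uses_hive_internal_names(orc_columns) &&
        orc_columns.size() <= table_columns.size()) {
        return {table_columns.begin(), table_columns.begin() + orc_columns.size()};
    }
    return orc_columns;  // fall back to the names stored in the file
}
```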
* [improvement](block exception safe) make block queue exception safe
This is part of the exception-safety work: #16366.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
In our current logic, the index page will be pre-decoded, but it returns OK
because the index page uses BinaryPlainPageBuilder and the first 4 bytes of the page are an offset,
so with high probability they are not equal to EncodingTypePB::DICT_ENCODING, which is 5.
The code is in bitshuffle_page_pre_decode.h:
```
if constexpr (USED_IN_DICT_ENCODING) {
    auto type = decode_fixed32_le((const uint8_t*)&data.data[0]);
    if (static_cast<EncodingTypePB>(type) != EncodingTypePB::DICT_ENCODING) {
        return Status::OK();
    }
    size_of_dict_header = BINARY_DICT_PAGE_HEADER_SIZE;
    data.remove_prefix(4);
}
```
But if the type happens to equal EncodingTypePB::DICT_ENCODING, BitShuffle will be used
to decode a BinaryPlainPage, which leads to a fatal error.
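A standalone illustration of the failure condition described above: if the first 4 little-endian bytes of a binary-plain index page happen to decode to 5 (DICT_ENCODING), the pre-decode check misfires. The helper below only mimics decode_fixed32_le, and the enum value is taken from the description:
```
#include <cstdint>
#include <cstring>
#include <iostream>

// Little-endian 32-bit read, mimicking decode_fixed32_le.
static uint32_t decode_fixed32_le(const uint8_t* buf) {
    uint32_t v;
    std::memcpy(&v, buf, sizeof(v));
    return v;  // assumes a little-endian host, as the BE code effectively does
}

constexpr uint32_t DICT_ENCODING = 5;  // EncodingTypePB::DICT_ENCODING per the description

int main() {
    // An index page built by BinaryPlainPageBuilder starts with an offset, not an
    // encoding type. If that offset happens to be 5, the page is misidentified as
    // dict-encoded and is then wrongly BitShuffle-decoded.
    uint8_t fake_index_page_prefix[4] = {5, 0, 0, 0};
    uint32_t type = decode_fixed32_le(fake_index_page_prefix);
    std::cout << (type == DICT_ENCODING ? "misidentified as DICT_ENCODING"
                                        : "correctly skipped")
              << std::endl;
}
```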