LSC updates tablet's schema in writing. Be optimized adding columns via linked schema change and
it distinguishes adding by comparing column name. e.g. if new column's name is not found in old schema,
then it is a newly-add column.
When a table is under schema-changing, it adds __doris_shadow_ prefix in name of columns in shadow index.
Then writes during schema-changing would bring schema with __doris_shadow_ to be.
If schema change request arrives at be after writes, then be do it as a add-column schema change due to
__doris_shadow_ is not in base tablet.
Sometimes the dict is not initialized when run comparison predicate here, for example, the full page is null, then the reader will skip read, so that the dictionary is not inited. The cached code is wrong during this case, because the following page maybe not null, and the dict should have items in the future.
This will result the dict string column query return wrong result, if there are many null values in the column.
I also add some regression test for dict column's equal query, larger than query, less than query.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Using both `MergeRangeFileReader` and `BufferedStreamReader` simultaneously would waste a lot of memory,
so turn off prefetch data in `BufferedStreamReader` when using MergeRangeFileReader.
Issue Number: About #19038, we found in this case, l_orderkey has many nulls,
so we can filter it by null count statistics in the row group and page level,
then it can improve a lot of performance in this case.
Dynamic mode used in array type when serialize it to mysql row buffer using dynamic mode, when combine binary row format with dynamic mode,something goes wrong, and lead to invalid binary row format.
* [fix](segment_iter) do not init segment_iterator twice
SegmentIterator::init is called by Segment::new_iterator and
BetaRowsetReader::get_segment_iterators twice.
This reverts commit 296b0c92f702675b92eee3c8af219f3862802fb2.
we can use drop table force stmt to fast drop tablets, no need to check tablet dropped state in every report
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
We found qt_q11 in regression test test_external_catalog_hive is very slow.
The result is only one record, so other data should be filtered out in the parquet lazy read situation.
Then we found currently the parquet reader read many records because we can only skip parquet page. But in order to skip parquet page, currently we need to read page header, then it will caused prefetch data. Therefore, prefetch data in this case may be not good.
So there are two issues:
Skip whole row group in this case.
Prefetching data in this case may be not good, need to improve it.
This PR resolve issues 1.
1. Refactor file cache. Before refactor, the file cache config format is "[{"path":"/path/to/file_cache","normal":21474836480,"persistent":10737418240,"query_limit":10737418240}]" and now change to "[{"path":"/mnt/disk3/selectdb_cloud/file_cache","total_size":21474836480,"query_limit":10737418240}]". It will be simpler than before.
2. Support more strategy. Support file cache priority. The file cache will have three queue, name as 'index'/'normal'/'disposable'. We can avoid that the higher priority data is eliminate by the lower priority data.
int64_t months = _year * 12 + _month - 1 + sign * (12 * interval.year + interval.month);
_year = months / 12;
if (_year > 9999) {
return false;
}
_month = (months % 12) + 1;
if (_day > s_days_in_month[_month]) {
_day = s_days_in_month[_month];
if (_month == 2 && doris::is_leap(_year)) {
_day++;
}
}
The variable "months" may be negative. Taking modulus with it (_month) may also result in a negative value, which can cause an array access overflow.