4377 Commits

Author SHA1 Message Date
c98829c94b [improvement](load) log time consumed by waiting flush (#19226) 2023-05-03 17:48:13 +08:00
145b94531f [Fix](load) fix request_slave_tablet_pull_rowset get wrong url in case of ipv6 address (#19026) 2023-05-02 09:55:09 +08:00
b0c215e694 [enhance](be)add more profile in prefetched buffered reader (#19119) 2023-05-02 09:53:39 +08:00
eac61dc410 [vectorized](function) add some check about result type in array map (#19228) 2023-05-01 16:28:11 +08:00
a978be32a6 [fix](schema_change) remove shadow prefix of schema for tablesink (#18822)
LSC updates a tablet's schema during writes. BE optimizes adding columns via linked schema change and
detects an added column by comparing column names: if a new column's name is not found in the old schema,
it is treated as a newly added column.

While a table is undergoing a schema change, the columns in the shadow index carry a __doris_shadow_ prefix in their names.
Writes during the schema change therefore bring a schema containing __doris_shadow_ names to BE.
If the schema change request arrives at BE after those writes, BE handles it as an add-column schema change, because
the __doris_shadow_ columns are not in the base tablet.
2023-04-30 22:46:36 +08:00
8eab20d3df [bugfix](low cardinality) cached code is wrong will result wrong query result when many null pages (#19221)
Sometimes the dictionary is not initialized when a comparison predicate runs here. For example, when a full page is null the reader skips reading it, so the dictionary is never initialized. The cached code is wrong in this case, because the following pages may not be null and the dictionary may gain items later.
As a result, queries on a dict string column can return wrong results if the column contains many null values.
This change also adds regression tests for equal, greater-than, and less-than queries on dict columns.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-29 21:28:41 +08:00
d383f1f3d7 [optimization](simd) optimize count_zero_num for ColumnNullable #19124 2023-04-29 14:50:39 +08:00
c74c2a4f8e [fix](Metadata tvf) Metadata TVF supports read the specified columns from Fe (#19110) 2023-04-29 00:06:08 +08:00
4a10d146bf [pipeline](exec) fix regression prepare failed cause query core dump (#19208)
fix regression prepare failed cause query core dump
2023-04-28 20:46:39 +08:00
bee3aa3007 be conf action supports specify item (#19159) 2023-04-28 19:12:51 +08:00
a324ee794c [fix](memory) Fix Aggregation null key memory leak due to incorrect aggfunc destroy #19201 2023-04-28 18:41:41 +08:00
1379d7f3e0 [fix](memory) mmap threshold can be modified in conf, Increase to 128M 2023-04-28 18:17:22 +08:00
6626f26506 [optimize](string) optimize char_length function by SIMD (#18925)
Optimize char_length function by SIMD
(1) optimize utf8_len compute
(2) 840% up
2023-04-28 17:22:35 +08:00
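A minimal sketch of the idea behind counting UTF-8 characters, assuming the usual rule that every code point has exactly one byte that is not a continuation byte (a continuation byte has the bit pattern 10xxxxxx). The function name `utf8_char_length` is hypothetical and this is not the code from the commit; it is a scalar, branch-free loop of the kind that compilers can often auto-vectorize, whereas the commit itself uses explicit SIMD.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not the Doris implementation.
// Counts UTF-8 code points by counting bytes that are NOT continuation
// bytes. A continuation byte satisfies (byte & 0xC0) == 0x80.
inline std::size_t utf8_char_length(std::string_view s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        // The comparison yields 0 or 1, so the loop body has no branch.
        n += (c & 0xC0) != 0x80;
    }
    return n;
}
```

For valid UTF-8 input the result equals the number of characters; malformed input is not detected by this sketch.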
aef9355cd3 [feature-wip](partial update) PART1: support basic partial write (#17542) 2023-04-28 17:17:57 +08:00
Pxl
ec517a53a8 [Chore](build) upgrade clang-format version to 16 && move thrift to fe-common (#19155)
upgrade clang-format version to 16
move thrift to fe-common
fix core dump on pipeline engine when operator canceled and not prepared
2023-04-28 14:14:51 +08:00
52b1bd2c81 [clone](download) fix be clone action download tablet content length overflow (#18851) 2023-04-28 11:35:17 +08:00
65a82a0b57 [opt](FileReader) turn off prefetch data in parquet page reader when using MergeRangeFileReader (#19102)
Using both `MergeRangeFileReader` and `BufferedStreamReader` simultaneously would waste a lot of memory,
so turn off prefetch data in `BufferedStreamReader` when using MergeRangeFileReader.
2023-04-28 09:27:56 +08:00
28016c53f0 [profile](rf) refactor profile of runtime filters (#19134)
* [profile](rf) refactor profile of runtime filters


---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-04-28 08:46:42 +08:00
3ed5cf8350 [Optimize] add has_filter template param in get_next_run() to decrease _has_filter condition checking count in the loop. (#19043) 2023-04-27 21:23:36 +08:00
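The general technique named in the commit title above can be sketched as follows: a runtime flag is turned into a `bool` template parameter so that it is tested once at dispatch time rather than on every loop iteration. The names `count_selected` and `count_selected_dispatch` are hypothetical stand-ins and do not reflect the real `get_next_run()` signature; the sketch assumes C++17 for `if constexpr`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a hot loop; not the real get_next_run().
// has_filter is a compile-time constant, so the filter check is compiled
// out entirely in the has_filter == false instantiation.
template <bool has_filter>
std::size_t count_selected(const std::vector<int32_t>& values,
                           const std::vector<uint8_t>& filter) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if constexpr (has_filter) {
            if (!filter[i]) continue; // row rejected by the filter
        }
        ++n;
    }
    return n;
}

// The runtime condition is evaluated once here, outside the loop.
inline std::size_t count_selected_dispatch(const std::vector<int32_t>& values,
                                           const std::vector<uint8_t>* filter) {
    return filter != nullptr
                   ? count_selected<true>(values, *filter)
                   : count_selected<false>(values, std::vector<uint8_t>{});
}
```

The trade-off is one extra template instantiation per flag value in exchange for a loop body without the per-iteration condition.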
e4f7d77c5c [Optimize](parquet-reader) Opt by filtering null count statistics in rowgroup and page level. (#19106)
Issue Number: About #19038. In this case l_orderkey has many nulls,
so it can be filtered using null count statistics at the row group and page level,
which improves performance considerably in this case.
2023-04-27 21:21:30 +08:00
95d91e7010 [bugfix](txn_manager) use write lock to protect txn_tablet_map (#19161) 2023-04-27 20:21:20 +08:00
9e2b118288 [RegressTest](Exec) Add DCHECK null_aware_left_anti_join in mark join (#19149) 2023-04-27 17:52:03 +08:00
f23c93b3c6 [fix](memory) Fix AggFunc memory leak due to incorrect destroy (#19126) 2023-04-27 14:58:32 +08:00
98a975b013 [fix](memory) Fix SchemaChange memory leak due to incorrect aggfunc destroy (#19130) 2023-04-27 14:44:00 +08:00
8412571030 [fix](memleak) avoid memleak due to race condition (#19071) 2023-04-27 14:22:09 +08:00
68d3111629 [bugfix](topn) fix memory leak in topn AcceptNullPredicate (#19060)
fix the memory leak reported by ASAN as follows.
2023-04-27 14:07:57 +08:00
b9855a6e29 [Fix](inverted index) fix memory leak for inverted index (#19008)
Forgot to delete handler->_shared_lock.
2023-04-27 11:53:55 +08:00
e76b3a316f [Bug](mysql proto) fix binary proto with dynamic mode (#19055)
Dynamic mode is used when serializing an array type to the MySQL row buffer. When the binary row format is combined with dynamic mode, something goes wrong and the result is an invalid binary row format.
2023-04-27 11:18:01 +08:00
84d040bdbf [fix](heartbeat) fix update BE last start time (#18962)
Sometimes the LastStartTime info in the show backends result is unchanged even after BE restarts.
This PR fixes it.
2023-04-27 09:59:04 +08:00
20395ce501 [feature](array_function): add support for array_cum_sum function (#18231) 2023-04-27 09:57:13 +08:00
6eb12640a1 [fix](segment_iter) do not init segment_iterator twice (#18337)
* [fix](segment_iter) do not init segment_iterator twice

SegmentIterator::init is called twice: once by Segment::new_iterator and
once by BetaRowsetReader::get_segment_iterators.
2023-04-27 09:51:57 +08:00
a262f42a28 [refactor](exceptionsafe) make scanner and scancontext exception safe (#19057) 2023-04-27 09:23:01 +08:00
d12fe4a7d2 [bug](fix)fix Geo memory leak (#19116) 2023-04-27 09:04:10 +08:00
925efc1902 [bug](map-type)fix some bugs in map and map element function (#18935)
fix some bugs in map and map element function.
2023-04-26 22:10:15 +08:00
aabcab9dbe [Improvement](runtime filter) Improve merge phase (#18828) 2023-04-26 21:01:20 +08:00
1ccbdee757 [FIX](map-type)fix map regress test & create mapTypeInfo without delete #19033 2023-04-26 19:03:55 +08:00
a32fa219ec Revert "[Enhancement](compaction) stop tablet compaction when table dropped (#18702)" (#19086)
This reverts commit 296b0c92f702675b92eee3c8af219f3862802fb2.

We can use the drop table force statement to drop tablets quickly, so there is no need to check the tablet dropped state in every report.

Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-04-26 18:27:46 +08:00
Pxl
60cda12e57 [Bug](pipeline-engine) fix hang on insert into select when enable pipeline engine (#19075) 2023-04-26 16:50:19 +08:00
e1651bfea5 [bugfix](aggregate_function) Fix wrong registration for percentile_approx #19070 2023-04-26 16:17:46 +08:00
1dfc5ea34c [bugfix](jsonb) fix jsonb parser crash on noavx2 host (#18977)
support avx2 and noavx2 for jsonb parser using __AVX2__ macro.
2023-04-26 15:10:12 +08:00
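The pattern of supporting both AVX2 and non-AVX2 hosts through the `__AVX2__` macro can be sketched as below. This is an illustration only, not the jsonb parser code: the function `find_byte` is hypothetical, and the use of `__builtin_ctz` assumes GCC or Clang. When the translation unit is compiled without AVX2 support, the macro is undefined and only the scalar loop is built.

```cpp
#include <cstddef>
#include <cstdint>
#ifdef __AVX2__
#include <immintrin.h>
#endif

// Hypothetical helper, not Doris code. Returns the index of the first
// occurrence of target in data[0, len), or len if it is absent.
inline std::size_t find_byte(const uint8_t* data, std::size_t len, uint8_t target) {
    std::size_t i = 0;
#ifdef __AVX2__
    // AVX2 path: compare 32 bytes at a time.
    const __m256i needle = _mm256_set1_epi8(static_cast<char>(target));
    for (; i + 32 <= len; i += 32) {
        const __m256i chunk =
                _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
        const uint32_t mask = static_cast<uint32_t>(
                _mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, needle)));
        if (mask != 0) {
            // Lowest set bit marks the first matching byte in this chunk.
            return i + static_cast<std::size_t>(__builtin_ctz(mask));
        }
    }
#endif
    // Scalar path: the whole input without AVX2, or the tail with it.
    for (; i < len; ++i) {
        if (data[i] == target) return i;
    }
    return len;
}
```

Because the choice is made at compile time, a binary built with AVX2 enabled still cannot run on a host without AVX2; the macro only guarantees that a build targeting such hosts contains no AVX2 instructions from this code.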
94b11af17c [fixbug](json-reader) fix memory leak of new_json_reader #19067 2023-04-26 12:54:47 +08:00
5bd4a3897e [optimize](multi-catalog) Skip whole row group in lazy_read if data has been filtered. (#19039)
We found that qt_q11 in the regression test test_external_catalog_hive is very slow.
The result is only one record, so the other data should be filtered out in the parquet lazy read situation.
We then found that the parquet reader currently reads many records because it can only skip parquet pages. To skip a parquet page, it currently needs to read the page header, which triggers data prefetching. Prefetching data may therefore be harmful in this case.

So there are two issues:

1. Skip the whole row group in this case.
2. Prefetching data in this case may not be good and needs to be improved.

This PR resolves issue 1.
2023-04-26 12:10:14 +08:00
375789d345 [enhancement](JNI) Provide default environment variables if it is unset (#19041) 2023-04-26 12:06:38 +08:00
5fd6d8ebd4 [fix](function) Support more behaviors of cast time in MySQL 2023-04-26 07:49:54 +08:00
2c836251b2 [Fix](schema scanner) Fixed the problem of overflow when multiplying two INT 2023-04-25 23:58:47 +08:00
1be5dac036 [improve] Refactor file cache and Improve the file cache strategy (#18652)
1. Refactor the file cache. Before the refactor, the file cache config format was "[{"path":"/path/to/file_cache","normal":21474836480,"persistent":10737418240,"query_limit":10737418240}]"; it is now "[{"path":"/mnt/disk3/selectdb_cloud/file_cache","total_size":21474836480,"query_limit":10737418240}]", which is simpler than before.
2. Support more strategies. The file cache now supports priorities and has three queues, named 'index', 'normal', and 'disposable'. This prevents higher-priority data from being evicted by lower-priority data.
2023-04-25 23:14:28 +08:00
17b59df8dd [fix](function) Array_map compared offset rows one by one (#18406)
When Array_map compares multiple columns, not only must the nested data rows be equal, but the offsets data must also equal each other.
2023-04-25 19:12:19 +08:00
fa0f3a2859 [fix](planner) vdatetime_value.cpp:1585 Array access may overflow. (#18872)
int64_t months = _year * 12 + _month - 1 + sign * (12 * interval.year + interval.month);
    _year = months / 12;
    if (_year > 9999) {
        return false;
    }
    _month = (months % 12) + 1;
    if (_day > s_days_in_month[_month]) {
        _day = s_days_in_month[_month];
        if (_month == 2 && doris::is_leap(_year)) {
            _day++;
        }
    }
The variable "months" may be negative. Taking it modulo 12 can then also produce a negative value, so _month may be zero or negative, which can cause an out-of-bounds array access on s_days_in_month.
2023-04-25 17:57:21 +08:00
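The problem described in the commit above comes from the fact that C++ integer division truncates toward zero, so `months % 12` is negative whenever `months` is negative. One common way to keep the month in range is floored division, sketched below. The type `YearMonth` and the function `split_months` are hypothetical and are not the fix applied in the commit, which is not shown in this log.

```cpp
#include <cstdint>

// Hypothetical helper, not Doris code.
struct YearMonth {
    int64_t year;
    int64_t month; // always in [1, 12]
};

// Splits a zero-based month count (year * 12 + month - 1), which may be
// negative, into a year and a 1-based month using floored division.
inline YearMonth split_months(int64_t months) {
    int64_t year = months / 12;
    int64_t rem = months % 12; // in [-11, 11]; negative when months < 0
    if (rem < 0) {
        rem += 12; // move the remainder into [0, 11] ...
        year -= 1; // ... and borrow one year to compensate
    }
    return {year, rem + 1};
}
```

With this adjustment the month is always a valid index into a 13-entry days-per-month table, and a negative year can be rejected by a separate range check, in the same way the quoted code rejects years above 9999.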
8d21f20753 [enhancement](javaudf) not depend on parent will cause deconstructor core (#18948)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-25 15:26:54 +08:00
339d804ec4 [Refactor](exceptionsafe) add factory creator to some class (#19000) 2023-04-25 14:33:47 +08:00