Commit Graph

17652 Commits

Author SHA1 Message Date
ea8d4f2d0b [fix][regression]update ccr test project (#32445) 2024-03-21 14:07:24 +08:00
50c247e08c [fix](snapshot-loader) Fix be crash caused by deref end() iterator (#32489)
The standard says that the `pos` argument of std::vector::erase must be
valid and dereferenceable, so the `end()` iterator cannot be passed as
`pos`. In my tests, the crash only occurs when the vector is empty.
Fortunately, `local_files` is usually not empty.
2024-03-21 14:07:24 +08:00
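The `std::vector::erase` precondition described in 50c247e08c can be illustrated with a minimal sketch. It is not the code from the patch: `remove_file` and its parameters are hypothetical names, and the only point is that the iterator is compared against `end()` before it is passed to `erase`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper, not taken from the Doris patch.
// Erases the first element equal to `name` and reports whether anything was erased.
bool remove_file(std::vector<std::string>& files, const std::string& name) {
    auto it = std::find(files.begin(), files.end(), name);
    if (it == files.end()) {
        // Nothing matched (this includes the empty-vector case).
        // Calling files.erase(files.end()) here would be undefined behavior.
        return false;
    }
    files.erase(it);  // `it` is valid and dereferenceable at this point
    return true;
}
```

On an empty vector, `begin() == end()`, so `std::find` returns `end()` and the guard returns early without touching `erase`.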
612d3595e4 [improvement](spill) optimize the spilling logic of hash join operator (#32202) 2024-03-21 14:07:24 +08:00
e892774c9a [improvement](agg) streaming agg should not take too much memory when spilling enabled (#32426) 2024-03-21 14:07:24 +08:00
7484a7ba5f [fix](broker load) improve the checking of overlapping partitions of same table (#32254) 2024-03-21 14:07:24 +08:00
2196c534e8 [fix](group commit) Fix compatibility issues on serializing and deserializing wal file (#32299) 2024-03-21 14:07:24 +08:00
2057886d30 [fix](group commit) Fix invalid function problem on p2 regression-test (#32481) 2024-03-21 14:07:24 +08:00
3c377a8957 [fix](group commit) Fix group commit connect to observer fe (#32222) 2024-03-21 14:07:24 +08:00
14c9537679 [fix](decimal) fix Arithmetic Overflow error of converting string to decimal (#32246) 2024-03-21 14:07:24 +08:00
ab512f935c [pipelineX](api) Add api for long-running tasks (#32459) 2024-03-21 14:07:24 +08:00
66fe61b591 [fix](nereids)support topn-filter for non pipeline engine #32397 2024-03-21 14:07:24 +08:00
f99db38998 [fix](ParquetReader) Fix Parquet Reader to read int96 parquet type problem (#32394)
`hi - JULIAN_EPOCH_OFFSET_DAYS` can be negative, so the computation cannot use unsigned integers throughout.
2024-03-21 14:07:24 +08:00
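To see why signedness matters for f99db38998, here is a hedged sketch of an INT96 timestamp conversion. The struct layout and the names `Int96` and `int96_to_unix_micros` are assumptions for illustration, not the Doris definitions. The sketch relies on the usual INT96 convention that the high 32 bits hold a Julian day number and the low 64 bits hold nanoseconds within that day, and on 2440588 being the Julian day number of 1970-01-01.

```cpp
#include <cstdint>

// Julian day number of 1970-01-01 (the Unix epoch).
constexpr int64_t kJulianEpochOffsetDays = 2440588;
constexpr int64_t kMicrosPerDay = 86400LL * 1000000LL;

// Hypothetical layout, assumed for this sketch only.
struct Int96 {
    uint64_t lo;  // nanoseconds within the day
    uint32_t hi;  // Julian day number
};

// Converts to microseconds since the Unix epoch.
// The day difference is computed in a signed 64-bit type, so dates
// before 1970-01-01 produce a negative result instead of wrapping around.
int64_t int96_to_unix_micros(const Int96& v) {
    int64_t days = static_cast<int64_t>(v.hi) - kJulianEpochOffsetDays;
    return days * kMicrosPerDay + static_cast<int64_t>(v.lo / 1000);
}
```

If the subtraction were done in `uint32_t`, a Julian day of 2440587 (1969-12-31) would wrap to a very large positive day count rather than -1.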
e541ca9f11 [Fix](Job)When jobname is the do keyword, parsing errors will occur when executing SQL. (#32379) 2024-03-21 14:07:24 +08:00
725f86a27b [fix](group commit) Fix p2 regression-test (#32270) 2024-03-21 14:07:24 +08:00
0635a8716c [improve](group commit) Group commit support chunked stream load in flink (#32135) 2024-03-21 14:07:24 +08:00
d640c54b80 [fix](regression) prepare_insert failed when connect to observer fe (#32223) 2024-03-21 14:07:24 +08:00
7422f185da [Fix](smooth-upgrade) Fix incompatibility when upgrade from 2.0 to 2.1 (#32444) 2024-03-21 14:07:24 +08:00
715eed0748 [opt](like) opt LIKE and REGEXP clause with concat(col, pattern_str) (#32333)
2024-03-21 14:07:24 +08:00
c45e2f3e6f fix routine load regression test fail (#32406) 2024-03-21 14:07:24 +08:00
6ea8e51261 [Performance](join) speed up the colocate and bucket shuffle join by change rf size (#32421) 2024-03-21 14:07:24 +08:00
73de61ed84 [opt](hive) skip hidden file and dir (#32412)
When querying a Hive table, we should skip all hidden directories and files, such as:
```
/visible/.hidden/path
/visible/.hidden.txt
```
2024-03-21 14:07:24 +08:00
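The hidden-path rule from 73de61ed84 could be checked roughly as follows. This is a sketch under an assumption: "hidden" is taken to mean that some `/`-separated path component starts with a dot, which matches both examples in the commit message. The function name `has_hidden_component` is hypothetical, and the actual check in Doris may differ.

```cpp
#include <string>

// Hypothetical helper: returns true if any '/'-separated component of
// `path` begins with '.', e.g. "/visible/.hidden/path".
bool has_hidden_component(const std::string& path) {
    bool at_component_start = true;
    for (char c : path) {
        if (c == '/') {
            at_component_start = true;  // next character starts a new component
            continue;
        }
        if (at_component_start && c == '.') {
            return true;
        }
        at_component_start = false;
    }
    return false;
}
```

Note that this simple version also treats the special components `.` and `..` as hidden, so paths would need to be normalized first if those can appear.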
a5f3611b88 [Fix](Regression) DCHECK failed in runtime filter wrapper (#32446) 2024-03-21 14:07:23 +08:00
7a0b591b8f [FIX](array_agg) fix array agg with other agg function (#32387)
2024-03-21 14:07:23 +08:00
a0a3a2a2ce [Fix](Variant) fix variant with not null (#32248)
Ignore the null bitmap for NOT NULL columns and make subcolumn access slots always nullable.
2024-03-21 14:07:23 +08:00
6aec479013 [fix](planner)decimalv3 literal's precision and scale is not correctly set (#32288) 2024-03-21 14:07:23 +08:00
353add74db [chore](ci) fix script (#32420)
Co-authored-by: stephen <hello-stephen@qq.com>
2024-03-21 14:07:23 +08:00
590e1d52ec [pipelineX](streaming agg) Fix wrong columns produced by streaming agg (#32411)
* [pipelineX](streaming agg) Fix wrong columns produced by streaming agg

* update
2024-03-21 14:07:23 +08:00
6c8b5bb26f [fix](feut) comment out doc gen execution (#32413)
Follow-up to #32384.
After the docs were removed, the doc generator must be skipped; otherwise the FE unit tests cannot run.
2024-03-21 14:07:23 +08:00
99b8db5f9d [Chore](tools) update ssb tools (#32308) 2024-03-21 14:07:23 +08:00
4bf5a21ba3 [pipelineX](cancel) Remove lock for mapping query ctx to fragment (#32346) 2024-03-21 14:07:23 +08:00
32f7f0b50c [enhancement](test)unique model by modify a value type from SMALLINT to other type (#32348)
* [enhancement](test)unique model by modify a key type from SMALLINT to other type

* [enhancement](test)unique model by modify a value type from SMALLINT to other type
2024-03-21 14:07:23 +08:00
b66840efd7 [Fix](regression test) Fix <=> rf cause regression test failed (#32377) 2024-03-21 14:07:23 +08:00
e8475a527b [regression-test]( fix case ) fix case that using same table in one db with another case (#32380) 2024-03-21 14:07:23 +08:00
a4151e022e [bug](fold) fix fold constant rule can't handle variable expr (#32313) 2024-03-21 14:07:23 +08:00
74445065ab [docs](MoveRepo) Update .asf.yaml (#32391) 2024-03-21 14:07:23 +08:00
fdcf5b7d34 [enhancement](dict) check valid of offset in page (#32349) 2024-03-21 14:07:23 +08:00
26ed4b69b1 [opt](jdbc catalog) filter jdbc datasource internal database (#32294) 2024-03-21 14:07:23 +08:00
e952b5ef5b [opt](jdbc catalog) Refine the jdbc_connector close logic and actively clear the jvm occupied by jdbcexecutor (#32300) 2024-03-21 14:07:23 +08:00
f132c9b2c6 [Improve](spark-load)update spark version for spark load to resolve cve problem (#30368) 2024-03-21 14:07:23 +08:00
4d4cd43458 [Fix](Nereids) fix leading syntax problems and data mismatched problem (#32286)
- Fix syntax problems when only one table is used in leading or when braces are misused
  example: leading(t1),leading(t1 {t2})
- Fix the case where a CTE is used in a subquery that itself has a leading hint
  example: with cte as (select c1 from t1) select count(*) from t1 join (select /*+ leading(cte t2) */ c2 from t2 join cte on c2 = cte.c1) as alias on t1.c1 = alias.c2;
  Here the CTE is used in a subquery, and the subquery also has a leading hint.
- Fix results that differ from the original plan because an ON predicate was pushed to the nullable side
  example: select count(*) from t1 left join t2 on c1 > 500 and c2 > 500 cannot be changed to select count(*) from t1 left join t2 on c2 > 500 where c1 > 500
2024-03-21 14:07:23 +08:00
fab48f54b1 [enhancement](nereids)simplify OneRowRelation scalar subquery (#32276)
select count() from t where dt > (select '2024-02-02 00:00:00');
-->
select count() from t where dt > '2024-02-02 00:00:00';
2024-03-21 14:07:23 +08:00
163007a665 [fix](grouping sets) fix grouping sets have multiple empty sets (#32317)
#32112 addressed the handling of empty sets (empty expression cases). However, multiple empty sets in grouping sets have different grouping IDs.
2024-03-21 14:07:22 +08:00
403820599d [bug](inverted index) fix npe of InvertedIndexStorageFormat in table property (#32357)
Fix a problem where, after the FE is upgraded from an older version, an error like the following occurs:

```
MySQL [test]> show full tables;
ERROR 1105 (HY000): NullPointerException, msg: java.lang.NullPointerException: Cannot invoke "org.apache.doris.thrift.TInvertedIndexStorageFormat.toString()" because the return value of "org.apache.doris.catalog.OlapTable.getInvertedIndexStorageFormat()" is null
```
2024-03-21 14:07:22 +08:00
e99b33c274 [opt](file-meta-cache) reduce file meta cache size and disable cache for some cases (#32340)
The file meta cache on BE caches metadata for external table files, such as Parquet footers.
The cache is bounded by entry count, not by memory consumption.
So if the cached objects are big (e.g., a large Parquet footer), the total memory consumption of this cache
can become large and cause OOM.

This PR mainly changes:

1. Add a new method `exceed_prune_limit()` for `CachePolicy`
    For `ObjLRUCache`, it always returns true so that the minor or full GC on BE prunes the cache each time.

2. Reduce the default capacity of the file meta cache from 20000 to 1000

    Also change the default capacity of the HDFS file handle cache from 20000 to 1000

3. Change how a query decides whether to enable the file meta cache

    If the number of files to be read is larger than 1/3 of the file meta cache's capacity, the file meta cache
    is disabled for this query, because the cache is useless when there are too many files.
2024-03-21 14:07:22 +08:00
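The per-query rule in e99b33c274 (disable the file meta cache when the file count exceeds one third of the cache capacity) can be written as a one-line predicate. This is only a sketch: `should_use_file_meta_cache` and its parameter names are hypothetical, and the real check in Doris may be placed and spelled differently.

```cpp
#include <cstddef>

// Hypothetical helper: returns true if the file meta cache should be used
// for a query that reads `num_files` files.
// For integer counts, "num_files > capacity / 3" under integer division is
// equivalent to the real-valued comparison against one third of the capacity.
bool should_use_file_meta_cache(std::size_t num_files, std::size_t cache_capacity) {
    return num_files <= cache_capacity / 3;
}
```

With the new default capacity of 1000, a query reading 333 files would still use the cache, while one reading 334 files would bypass it.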
279ea2f366 [feature](proxy-protocol) Support proxy protocol v1 (#32338)
Enable proxy protocol to support IP transparency.
See: `IP Transparency` in f57387b502/docs/en/docs/admin-manual/cluster-management/load-balancing.md
for details
2024-03-21 14:07:22 +08:00
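For context on 279ea2f366, PROXY protocol v1 prefixes a connection with a single human-readable line such as `PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n`, which carries the original client address. The sketch below parses such a line. It is not Doris code: `ProxyHeader` and `parse_proxy_v1` are hypothetical names, the parser requires C++17, it is more lenient about whitespace than the specification, and it does not handle the `PROXY UNKNOWN` form.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical result type for this sketch.
struct ProxyHeader {
    std::string family;    // "TCP4" or "TCP6"
    std::string src_addr;  // original client address
    std::string dst_addr;  // address the client connected to
    int src_port = 0;
    int dst_port = 0;
};

// Parses one PROXY protocol v1 line, including its trailing CRLF.
// Returns std::nullopt if the line is not a well-formed TCP4/TCP6 header.
std::optional<ProxyHeader> parse_proxy_v1(const std::string& line) {
    if (line.size() < 2 || line.compare(line.size() - 2, 2, "\r\n") != 0) {
        return std::nullopt;  // the header must be terminated by CRLF
    }
    std::istringstream in(line.substr(0, line.size() - 2));
    std::string magic;
    ProxyHeader h;
    if (!(in >> magic >> h.family >> h.src_addr >> h.dst_addr >> h.src_port >> h.dst_port)) {
        return std::nullopt;  // too few fields or non-numeric ports
    }
    if (magic != "PROXY") return std::nullopt;
    if (h.family != "TCP4" && h.family != "TCP6") return std::nullopt;
    if (h.src_port < 0 || h.src_port > 65535) return std::nullopt;
    if (h.dst_port < 0 || h.dst_port > 65535) return std::nullopt;
    return h;
}
```

A server that enables this would read the header line first and use `src_addr` as the client IP instead of the load balancer's address.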
3da8e4b04a [chore](build) delete palo_be soft link (#32353) 2024-03-21 14:07:22 +08:00
2e564036ef [fix](profile) avoid update profile in deconstructor (#32131)
Previously, the counters in `profile` could be updated when the file reader was closed,
and the file reader could be closed while its owning object was being destructed.
By that time, the `profile` object may already have been deleted, causing an NPE and a BE crash.

This PR tries to fix this issue:

1. Remove the "profile counter update" logic from all `close()` methods.

2. Add a new interface `ProfileCollector`

	It has 2 methods:
	
	- `collect_profile_at_runtime()`

		It can be called at runtime, e.g., in every `get_next_block()` method,
		so that the counters in the profile are updated at runtime.
		
	- `collect_profile_before_close()`

		Must be called before the object calls `close()`. It is called only once.
		
3. Derive from `ProfileCollector`

	All classes that may update the profile counters in their `close()` method, such as `GenericReader`,
	should extend `ProfileCollector` and implement `collect_profile_before_close()`.
	
	`collect_profile_before_close()` is called in `scanner->mark_to_need_to_close()`.
2024-03-21 14:07:22 +08:00
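The `ProfileCollector` design described in 2e564036ef can be sketched as below. Only the class name and the two public method names come from the commit message. The protected hooks, the once-only flag, and the `CountingReader` example class are assumptions made for illustration, and the sketch is not thread-safe.

```cpp
// Sketch of the interface; the internals are assumed, not copied from Doris.
class ProfileCollector {
public:
    virtual ~ProfileCollector() = default;

    // May be called repeatedly while the object is in use.
    void collect_profile_at_runtime() { _collect_profile_at_runtime(); }

    // Must be called before close(); later calls are ignored so that
    // the final counters are collected only once.
    void collect_profile_before_close() {
        if (_collected_before_close) {
            return;
        }
        _collected_before_close = true;
        _collect_profile_before_close();
    }

protected:
    // Hypothetical hooks for subclasses; the defaults do nothing.
    virtual void _collect_profile_at_runtime() {}
    virtual void _collect_profile_before_close() {}

private:
    bool _collected_before_close = false;
};

// Hypothetical reader that just counts how often each hook runs.
class CountingReader : public ProfileCollector {
public:
    int runtime_updates = 0;
    int close_updates = 0;

protected:
    void _collect_profile_at_runtime() override { ++runtime_updates; }
    void _collect_profile_before_close() override { ++close_updates; }
};
```

Because the counters are collected while the profile is still alive, `close()` and the destructor no longer need to touch the profile at all.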
8bd101129a [behavior change](output) change float output format (#32049) 2024-03-21 14:07:22 +08:00
7874edf992 [doc](ranger)change path of access_controller.class (#32138) 2024-03-21 14:07:19 +08:00
724bc82362 [refactor](chore) replace HashMapWithStackMemory with std::unordered_map (#32309) 2024-03-21 14:07:19 +08:00