Commit Graph

7088 Commits

Author SHA1 Message Date
50c247e08c [fix](snapshot-loader) Fix be crash caused by deref end() iterator (#32489)
The standard says that the `pos` parameter of std::vector::erase
must be valid and dereferenceable, so the `end()` iterator cannot be used
as a value of `pos`. In my tests, the crash only occurs when the
vector is empty. Fortunately, `local_files` is usually not empty.
2024-03-21 14:07:24 +08:00
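The hazard described in 50c247e08c can be sketched as follows. This is a minimal illustration, not the actual snapshot-loader code: the helper name `erase_if_present` and its parameters are hypothetical. The point is only that the iterator returned by a search has to be compared against `end()` before it is handed to `erase`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not from the Doris code base): remove the first
// element equal to `name`, and report whether anything was removed.
bool erase_if_present(std::vector<std::string>& files, const std::string& name) {
    auto it = std::find(files.begin(), files.end(), name);
    if (it == files.end()) {
        // Passing end() to the single-iterator erase() is undefined behavior,
        // so bail out instead of erasing.
        return false;
    }
    files.erase(it);
    return true;
}
```

With this guard, an empty vector or a missing name leaves the vector untouched.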
612d3595e4 [improvement](spill) optimize the spilling logic of hash join operator (#32202) 2024-03-21 14:07:24 +08:00
e892774c9a [improvement](agg) streaming agg should not take too much memory when spilling enabled (#32426) 2024-03-21 14:07:24 +08:00
2196c534e8 [fix](group commit) Fix compatibility issues on serializing and deserializing wal file (#32299) 2024-03-21 14:07:24 +08:00
14c9537679 [fix](decimal) fix Arithmetic Overflow error of converting string to decimal (#32246) 2024-03-21 14:07:24 +08:00
ab512f935c [pipelineX](api) Add api for long-running tasks (#32459) 2024-03-21 14:07:24 +08:00
f99db38998 [fix](ParquetReader) Fix Parquet Reader to read int96 parquet type problem (#32394)
`hi - JULIAN_EPOCH_OFFSET_DAYS` can be negative, so we can't use unsigned int everywhere.
2024-03-21 14:07:24 +08:00
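The signedness issue noted in f99db38998 can be illustrated with a small sketch. The struct layout, the function name `to_unix_micros`, and the microsecond output unit are assumptions made for the example, not the reader's actual code. The constant 2440588 is the Julian day number of 1970-01-01.

```cpp
#include <cstdint>

// Julian day number of the Unix epoch (1970-01-01).
constexpr int64_t JULIAN_EPOCH_OFFSET_DAYS = 2440588;
constexpr int64_t MICROS_PER_DAY = 86400LL * 1000000LL;

// Assumed layout for illustration: an INT96 timestamp split into
// nanoseconds within the day (lo) and a Julian day number (hi).
struct Int96Timestamp {
    uint64_t lo;  // nanoseconds since midnight
    uint32_t hi;  // Julian day number
};

int64_t to_unix_micros(const Int96Timestamp& ts) {
    // Widen to a signed type BEFORE subtracting. Doing the subtraction in an
    // unsigned type would wrap around for dates earlier than 1970-01-01.
    int64_t days = static_cast<int64_t>(ts.hi) - JULIAN_EPOCH_OFFSET_DAYS;
    return days * MICROS_PER_DAY + static_cast<int64_t>(ts.lo / 1000);
}
```

A timestamp one day before the epoch then yields a negative microsecond count rather than a huge positive one.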
0635a8716c [improve](group commit) Group commit support chunked stream load in flink (#32135) 2024-03-21 14:07:24 +08:00
7422f185da [Fix](smooth-upgrade) Fix incompatibility when upgrade from 2.0 to 2.1 (#32444) 2024-03-21 14:07:24 +08:00
715eed0748 [opt](like) opt LIKE and REGEXP clause with concat(col, pattern_str) (#32333)
opt LIKE and REGEXP clause with concat(col, pattern_str)
2024-03-21 14:07:24 +08:00
6ea8e51261 [Performance](join) speed up the colocate and bucket shuffle join by change rf size (#32421) 2024-03-21 14:07:24 +08:00
a5f3611b88 [Fix](Regression) DCHECK failed in runtime filter wrapper (#32446) 2024-03-21 14:07:23 +08:00
7a0b591b8f [FIX](array_agg) fix array agg with other agg function (#32387)
fix array agg with other agg function
2024-03-21 14:07:23 +08:00
a0a3a2a2ce [Fix](Variant) fix variant with not null (#32248)
ignore null bitmap for not null and make subcolumn access slots always nullable
2024-03-21 14:07:23 +08:00
590e1d52ec [pipelineX](streaming agg) Fix wrong columns produced by streaming agg (#32411)
* [pipelineX](streaming agg) Fix wrong columns produced by streaming agg

* update
2024-03-21 14:07:23 +08:00
4bf5a21ba3 [pipelineX](cancel) Remove lock for mapping query ctx to fragment (#32346) 2024-03-21 14:07:23 +08:00
b66840efd7 [Fix](regression test) Fix <=> rf cause regresion test failed (#32377) 2024-03-21 14:07:23 +08:00
fdcf5b7d34 [enhancement](dict) check valid of offset in page (#32349) 2024-03-21 14:07:23 +08:00
e952b5ef5b [opt](jdbc catalog) Refine the jdbc_connector close logic and actively clear the jvm occupied by jdbcexecutor (#32300) 2024-03-21 14:07:23 +08:00
163007a665 [fix](grouping sets) fix grouping sets have multiple empty sets (#32317)
In #32112, handling of empty sets (empty expression cases) was addressed. However, multiple empty sets in grouping sets have different grouping IDs.
2024-03-21 14:07:22 +08:00
e99b33c274 [opt](file-meta-cache) reduce file meta cache size and disable cache for some cases (#32340)
The file meta cache on BE caches metadata for external table files, such as Parquet footers.
This cache is bounded by entry count, not by memory consumption.
So if the cached objects are big (e.g., a large Parquet footer), the total memory consumption of this cache
can be large and cause OOM.

This PR mainly changes:

1. Add a new method `exceed_prune_limit()` for `CachePolicy`
    For `ObjLRUCache`, it always returns true so that the minor or full GC on BE will prune the cache each time.

2. Reduce the default capacity of file meta cache, from 20000 to 1000

    Also change the default capacity of hdfs file handle cache, from 20000 to 1000

3. Change the judgement of whether to enable file meta cache when querying

    If the number of files to be read is larger than 1/3 of the file meta cache's capacity, file meta cache
    will be disabled for this query, because the cache is useless if there are too many files.
2024-03-21 14:07:22 +08:00
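The per-query rule in e99b33c274 amounts to a single comparison. The sketch below is an illustration only: the function name `should_enable_file_meta_cache` and its parameters are hypothetical and do not come from the PR. It multiplies instead of dividing so that integer truncation cannot shift the 1/3 threshold.

```cpp
#include <cstddef>

// Hypothetical helper: decide whether a query should use the file meta cache.
// The cache is disabled when the number of files to read is larger than
// one third of the cache capacity (counted in entries, not bytes).
bool should_enable_file_meta_cache(std::size_t num_files, std::size_t cache_capacity) {
    // num_files > capacity / 3  is the same as  num_files * 3 > capacity.
    return num_files * 3 <= cache_capacity;
}
```

With the new default capacity of 1000, a query over 333 files would still use the cache under this rule, while one over 334 files would bypass it.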
2e564036ef [fix](profile) avoid update profile in deconstructor (#32131)
Previously, the counter in `profile` could be updated when closing the file reader.
And the file reader may be closed when the object is being destructed.
But at that time, the `profile` object may already have been deleted, causing an NPE that crashes BE.

This PR tries to fix this issue:

1. Remove the "profile counter update" logic from all `close()` methods.

2. Add a new interface `ProfileCollector`

	It has 2 methods:
	
	- `collect_profile_at_runtime()`

		It can be called at runtime, eg, in every `get_next_block()` method.
		So that the counter in profile can be updated at runtime.
		
	- `collect_profile_before_close()`

		Should be called before the object calls `close()`. It will only be called once.
		
3. Derive from `ProfileCollector`

	All classes which may update the profile counter in their `close()` method, such as `GenericReader`, should extend
	`ProfileCollector` and implement `collect_profile_before_close()`.
	
	`collect_profile_before_close()` will be called in `scanner->mark_to_need_to_close()`.
2024-03-21 14:07:22 +08:00
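The interface described in 2e564036ef could look roughly like the sketch below. Only the names `ProfileCollector`, `collect_profile_at_runtime()` and `collect_profile_before_close()` come from the commit message. The protected hook names, the once-only flag, `DemoProfile` and `DemoReader` are assumptions added for illustration, and the real classes may be structured differently.

```cpp
#include <cstdint>

// Stand-in for a runtime profile; the real profile type is not shown above.
struct DemoProfile {
    int64_t bytes_read = 0;
    int close_hook_calls = 0;
};

class ProfileCollector {
public:
    virtual ~ProfileCollector() = default;

    // May be called many times, e.g. from every get_next_block().
    void collect_profile_at_runtime() { _collect_profile_at_runtime(); }

    // Must be called before close(); the hook runs at most once.
    void collect_profile_before_close() {
        if (_collected_before_close) return;
        _collected_before_close = true;
        _collect_profile_before_close();
    }

protected:
    // Hypothetical hooks for subclasses to override.
    virtual void _collect_profile_at_runtime() {}
    virtual void _collect_profile_before_close() {}

private:
    bool _collected_before_close = false;
};

// Hypothetical reader that buffers its statistics locally.
class DemoReader : public ProfileCollector {
public:
    explicit DemoReader(DemoProfile* profile) : _profile(profile) {}
    void read(int64_t bytes) { _pending_bytes += bytes; }
    void close() {}  // deliberately does not touch _profile

protected:
    void _collect_profile_at_runtime() override { flush(); }
    void _collect_profile_before_close() override {
        flush();
        ++_profile->close_hook_calls;
    }

private:
    // Move locally buffered counters into the profile.
    void flush() {
        _profile->bytes_read += _pending_bytes;
        _pending_bytes = 0;
    }

    DemoProfile* _profile;
    int64_t _pending_bytes = 0;
};
```

Because `close()` never dereferences the profile pointer, a reader that is closed from a destructor after the profile has been freed no longer touches dangling memory, provided the owner called `collect_profile_before_close()` while the profile was still alive.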
8bd101129a [behavior change](output) change float output format (#32049) 2024-03-21 14:07:22 +08:00
724bc82362 [refactor](chore) replace HashMapWithStackMemory with std::unordered_map (#32309) 2024-03-21 14:07:19 +08:00
fd1345bef0 fix load channel may memory leak (#32277) 2024-03-21 14:07:19 +08:00
0990014e94 [fix](datetime) fix datetime rounding on BE (#32075) 2024-03-21 14:07:19 +08:00
b5ab1159bb [Enhancement](inverted index) make compiler happy (#32332) 2024-03-21 14:07:19 +08:00
85b2c42f76 [Enhancement](jdbc catalog) Add a property to test the connection when creating a Jdbc catalog (#32125) (#32531) 2024-03-21 14:05:59 +08:00
27973b6999 [fix](schema-change) fix the bug of handling empty blocks in schema change (#32396)
* [fix](schema-change) fix the bug of handling empty blocks in schema change

* add case
2024-03-19 22:12:26 +08:00
9eb2f90e27 [Optimize](inverted index) optimize inverted index bitmap copy (#32279) (#32469) 2024-03-19 17:28:59 +08:00
ecadb60bcd [Pick 2.1](inverted index) support inverted index format v2 (#30145) (#32418) 2024-03-19 08:11:33 +08:00
ef2151ae66 [Feature-WIP](multi-catalog) Add Hive sink on BE side. (#32306) (#32364)
bp #32306
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
2024-03-18 11:23:01 +08:00
2add3bc13a [fix](partial update) compaction may cause update failue (#31551) (#32361) 2024-03-18 10:58:51 +08:00
4bf202db04 [pipelineX](exchange) Make exchange buffer size configurable (#32201) 2024-03-16 20:58:20 +08:00
5ceccb5ba5 [fix](compatibility) should enable windown funnel mode from 2.0 (#32284) 2024-03-16 20:56:16 +08:00
c5ffeff833 [fix](s3 client)add default ca cert list for s3 client to avoid problem:'curlCode:77' (#32285)
Co-authored-by: ryanzryu <ryanzryu@tencent.com>
2024-03-16 20:55:28 +08:00
83ab61ad22 Add QUEUE_START_TIME/QUEUE_END_TIME/QUERY_STATUS column for active_queries (#32259) 2024-03-16 20:53:46 +08:00
258dcfca97 [Refactor](executor)Add information_schema.workload_groups (#32195) (#32314) 2024-03-15 20:46:54 +08:00
720aaf9dd6 fix compile 2024-03-15 18:13:41 +08:00
e3bb499cc6 [fix](function)revert function REPEAT nullable mode #32226 2024-03-15 18:06:28 +08:00
97b35d6830 [fix](nereids)AssertNumRow node's output should be nullable (#32136)
Co-authored-by: Co-Author Jerry Hu <mrhhsg@gmail.com>
2024-03-15 18:06:28 +08:00
9c1888e7ec [RuntimeFilter](exec) support min max runtime filter and do refactor (#32210) 2024-03-15 18:06:20 +08:00
8d988930bd [Fix](segment write) handle variant bloom filter in segment writer (#32011) 2024-03-15 18:06:20 +08:00
04a59d6071 [improve](distinct agg) add check of hash table to decide whether emplace value (#32063)
* [improve](distinct agg) add check of hash table to emplace value
2024-03-15 18:06:15 +08:00
Pxl
5e4da61df9 [Bug](top-n) do not get runtime predicate when predicate not initialized (#32208) 2024-03-15 18:06:15 +08:00
687ab1a3e1 [fix](ip) conversion of ipv4 or ipv6 to arrow type #32240 2024-03-15 18:05:35 +08:00
c8f3643890 [exec](runtimefilter) support null aware in runtime filter (#32152)
null aware in runtime filter
2024-03-15 18:05:13 +08:00
aca7328109 [bugfix]json_length() BE crash fix (#32145)
Co-authored-by: Rohit Satardekar <rohitrs1983@gmail.com>
2024-03-15 18:04:49 +08:00
62023d705d [refactor](rename) rename task group to workload group in be (#32204)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-03-15 18:04:02 +08:00
0578b28d54 [fix](function) fixed the get_json_string function (#32150) 2024-03-15 18:04:02 +08:00