Commit Graph

3143 Commits

Author SHA1 Message Date
Pxl
f17ac173b4 [Improvementation](join) empty_block shall be set true when build block only one row (#33721)
2024-04-18 19:05:17 +08:00
ea19224d14 [exec](table_fun) opt numbers table func performance (#33804) 2024-04-18 19:04:03 +08:00
b518b9dd15 [shuffle](minor) Log error status if exchange is shutdown early (#33748) 2024-04-17 23:42:14 +08:00
89c4fa5a75 [fix](move-memtable) close wait on all sinks (#33710) 2024-04-17 23:42:13 +08:00
6976d019a3 [opt](inverted index) topn opt reads only limit number of records (#33665) 2024-04-17 23:42:13 +08:00
22a6b1d3f5 [feature](function) support hll functions hll_from_base64, hll_to_base64 (#32089)
Issue Number: #31320 

Support two hll functions:

- hll_from_base64
  Converts a base64 string (the output of hll_to_base64) into an HLL.
- hll_to_base64
  Converts an input HLL into a base64 string.
2024-04-17 23:42:13 +08:00
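hll_to_base64 and hll_from_base64 presumably wrap a standard base64 codec around the HLL's serialized bytes. The following is a minimal C++ sketch of such a codec (RFC 4648 alphabet with `=` padding) under that assumption. The HLL serialization format itself is not shown, and `base64_encode`/`base64_decode` are hypothetical helper names, not Doris APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Standard base64 alphabet (RFC 4648).
static const char kB64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode raw bytes (e.g. a serialized HLL) as base64 text with '=' padding.
std::string base64_encode(const std::vector<uint8_t>& in) {
    std::string out;
    size_t i = 0;
    for (; i + 2 < in.size(); i += 3) {  // full 3-byte groups -> 4 chars
        uint32_t n = (static_cast<uint32_t>(in[i]) << 16) |
                     (static_cast<uint32_t>(in[i + 1]) << 8) | in[i + 2];
        out += kB64[(n >> 18) & 63];
        out += kB64[(n >> 12) & 63];
        out += kB64[(n >> 6) & 63];
        out += kB64[n & 63];
    }
    size_t rest = in.size() - i;
    if (rest == 1) {  // 1 leftover byte -> 2 chars + "=="
        uint32_t n = static_cast<uint32_t>(in[i]) << 16;
        out += kB64[(n >> 18) & 63];
        out += kB64[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {  // 2 leftover bytes -> 3 chars + "="
        uint32_t n = (static_cast<uint32_t>(in[i]) << 16) |
                     (static_cast<uint32_t>(in[i + 1]) << 8);
        out += kB64[(n >> 18) & 63];
        out += kB64[(n >> 12) & 63];
        out += kB64[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Decode base64 text back into bytes; returns false on malformed input.
bool base64_decode(const std::string& in, std::vector<uint8_t>* out) {
    if (in.size() % 4 != 0) return false;
    out->clear();
    auto value = [](char c) -> int {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '+') return 62;
        if (c == '/') return 63;
        return -1;
    };
    for (size_t i = 0; i < in.size(); i += 4) {
        int pad = 0;
        uint32_t n = 0;
        for (size_t j = 0; j < 4; ++j) {
            char c = in[i + j];
            int v = 0;
            if (c == '=') {
                // Padding may only fill the last two slots of the final group.
                if (i + 4 != in.size() || j < 2) return false;
                ++pad;
            } else {
                if (pad > 0) return false;  // data after padding
                v = value(c);
                if (v < 0) return false;
            }
            n = (n << 6) | static_cast<uint32_t>(v);
        }
        out->push_back(static_cast<uint8_t>(n >> 16));
        if (pad < 2) out->push_back(static_cast<uint8_t>((n >> 8) & 0xFF));
        if (pad < 1) out->push_back(static_cast<uint8_t>(n & 0xFF));
    }
    return true;
}
```

Decoding the output of `base64_encode` returns the original bytes, which mirrors the intended round trip hll_from_base64(hll_to_base64(x)).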
3096150d1b [feature](agg) support aggregate function group_array_intersect (#33265) 2024-04-17 23:42:13 +08:00
07a8f44443 [improvement](spill) improve config and fix spill bugs (#33519) 2024-04-17 23:42:13 +08:00
b07e0a2f06 [FIX](cast)fix full/right out join for cast array (#33475)
In some cases we run the following code:
```cpp
if (_join_op == TJoinOp::RIGHT_OUTER_JOIN || _join_op == TJoinOp::FULL_OUTER_JOIN) {
    _probe_column_convert_to_null = _convert_block_to_null(*input_block);
}
```
and then call later functions such as cast. However, cast assumes that the block column has the same type as from_type; once the probe columns have been converted to nullable they no longer match, which produces an error status.
2024-04-17 23:42:13 +08:00
775022c204 [refactor](pipelineX) Reduce prepare overhead (PART II) (#33681) 2024-04-17 23:42:13 +08:00
59de97be5e [improvement](mow) Add profile for delete_bitmap get_agg function (#33576) 2024-04-17 23:42:13 +08:00
4863167f90 [refactor](pipelineX) Reduce prepare overhead (PART I) (#33550) 2024-04-17 23:42:12 +08:00
Pxl
341cb40693 [Chore](log) adjust output order on PrintInstanceStandardInfo and reduce warning log when rpc finished (#33652)
2024-04-17 23:42:12 +08:00
5b616da543 [refine](Operator) When _stop_emplace_flag is not set to true, perform batch processing on the block. (#33173) 2024-04-17 23:42:12 +08:00
3df8f0cad8 [improve](move-memtable) add more info in LoadStreamStub errors (#33618) 2024-04-17 23:42:12 +08:00
06a155abb0 [branch-2.1](cherry-pick) Pick some partial-update PR from master (#33639)
* [Fix](partial-update) Fix partial update fail when the datetime default value is 'current_time' (#32926)

* Problem: When a partial column update imports data into a datetime column whose default value is the current time, the import fails.
Reason: Partial column updates do not handle the logic for datetime default values.
Solution: During partial column updates, when the default value is set to current time, read the current time from the runtime state and write it into the data.

* [Enhancement](partial update)Add timezone case for partial update timestamp #33177

* [fix](partial update) Support partial update when the date default value is 'current_date'. This PR is an extension of PR #32926. (#33394)
2024-04-17 23:42:12 +08:00
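The solution described in #32926 and #33394 (take the current time from the runtime state instead of treating the default as a literal) can be sketched as follows. `QueryClock`, `DefaultKind`, `ColumnDefault` and `fill_missing_value` are hypothetical names; the session time zone is simplified to a fixed UTC offset, and because a single clock value is read, every row of the load gets the same timestamp.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <string>

// Hypothetical stand-in for the runtime state: the load's start time
// (epoch seconds) and the session time zone as a fixed UTC offset.
struct QueryClock {
    int64_t query_start_epoch_sec;
    int32_t utc_offset_sec;
};

// Hypothetical description of a column's default value.
enum class DefaultKind { kLiteral, kCurrentTime, kCurrentDate };

struct ColumnDefault {
    DefaultKind kind;
    std::string literal;  // only used when kind == kLiteral
};

// Format the clock in the session time zone.
static std::string format_clock(const QueryClock& clock, const char* fmt) {
    std::time_t t = static_cast<std::time_t>(clock.query_start_epoch_sec +
                                             clock.utc_offset_sec);
    std::tm tm = *std::gmtime(&t);  // offset already applied, so read as UTC
    char buf[32];
    std::strftime(buf, sizeof(buf), fmt, &tm);
    return buf;
}

// Value written for a column that is missing from a partial-update load.
std::string fill_missing_value(const ColumnDefault& def, const QueryClock& clock) {
    switch (def.kind) {
        case DefaultKind::kCurrentTime:
            return format_clock(clock, "%Y-%m-%d %H:%M:%S");
        case DefaultKind::kCurrentDate:
            return format_clock(clock, "%Y-%m-%d");
        case DefaultKind::kLiteral:
        default:
            return def.literal;
    }
}
```

Applying the offset before formatting is what makes the time zone matter: 2023-12-31 23:00 UTC becomes the date 2024-01-01 in a UTC+8 session.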
2cd4012541 [opt](scan) read scan ranges in the order of partitions (#33515) (#33657)
backport: #33515
2024-04-17 23:42:12 +08:00
e7209d9a85 [fix](merge-iterator) Fix mem leak when get next batch failed (#33627) 2024-04-17 23:42:12 +08:00
4740b22481 [fix](test) fix some p2 external table test cases (#33624)
bp #33621
Also fix a merge bug from #33245
2024-04-17 23:42:12 +08:00
8ee8de7857 [Fix](executor)reset remote scan thread num #33579 2024-04-17 23:42:11 +08:00
48880c3e1a [Fix](timezone) fix miss of expected rounding of Date type with timezone #33553 2024-04-17 23:42:11 +08:00
d1a68b8c42 [enhancement](merge-iterator) catch exception to avoid coredump when copy_rows (#33567) 2024-04-17 23:42:00 +08:00
ae68cca07d [fix](schema change) CastStringConverter is compiled failed in g++ (#33546)
Follow-up to #32873: CastStringConverter fails to compile with g++ because of an uninitialized value, although it compiles with clang.
2024-04-17 23:42:00 +08:00
249a9c9875 [Feature](Variant) support aggregation model for Variant type (#33493)
Refactor: use `insert_from` instead of `replace_column_data` for variable-length columns.
2024-04-17 23:42:00 +08:00
50b64a111d [refactor](heap sort) Simplify sorted block view (#33477) 2024-04-17 23:42:00 +08:00
6bcf24b1f6 [bug](not in) if not in (null) could eos early (#33482)
2024-04-17 23:41:59 +08:00
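The title appears to rely on standard SQL semantics: `x NOT IN (..., NULL)` can never evaluate to TRUE, so no row can pass such a filter and the operator may report end of stream (eos) early. The sketch below models three-valued logic with `std::optional<bool>`; `not_in`, `passes_filter` and `can_eos_early` are hypothetical helpers, not Doris functions.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// SQL three-valued logic: true, false, or std::nullopt for UNKNOWN (NULL).
using Tri = std::optional<bool>;

// `x NOT IN (list)`; a NULL operand or list element is std::nullopt.
Tri not_in(std::optional<int> x, const std::vector<std::optional<int>>& list) {
    if (!x) return std::nullopt;  // NULL NOT IN (...) is UNKNOWN
    bool saw_null = false;
    for (const auto& v : list) {
        if (!v) {
            saw_null = true;  // comparing with NULL is UNKNOWN
        } else if (*v == *x) {
            return false;  // x is in the list
        }
    }
    if (saw_null) return std::nullopt;  // no match, but NULL might have matched
    return true;
}

// WHERE keeps a row only when the predicate is TRUE (not FALSE, not UNKNOWN).
bool passes_filter(Tri t) { return t.has_value() && *t; }

// If the list contains NULL, not_in() is never TRUE for any x, so a scan
// guarded by this predicate can report end of stream without reading rows.
bool can_eos_early(const std::vector<std::optional<int>>& list) {
    for (const auto& v : list) {
        if (!v) return true;
    }
    return false;
}
```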
272269f9c1 [Fix](inverted index) fix fast execute problem when need read data opt enabled (#33526) 2024-04-17 23:41:59 +08:00
fefbde8927 [log](move-memtable) improve logs in vtablet_writer_v2 and load_stream (#33103) 2024-04-12 15:09:25 +08:00
9b7af4c0cf [feature](schema change) unified schema change for parquet and orc reader (#32873)
Following #25138, this unifies the schema change interface of the parquet and orc readers; the same interface can be applied to other format readers as well.
The unified schema change interface for all format readers:
- First, read the data into a source column using the column type stored in the file;
- Second, convert the source column into the destination column, whose type is planned by FE.
2024-04-12 15:09:25 +08:00
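The two-step design can be sketched as follows, assuming a table column that was changed from INT to BIGINT while existing files still store int32 values. `FakeParquetReader`, `FakeOrcReader`, `convert_column` and `read_as_bigint` are hypothetical stand-ins: only the first step depends on the file format, while the conversion step is shared by every reader.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Step 2 (format independent): convert a source column, typed as stored in
// the file, into the destination type planned by FE, one value at a time.
template <typename Src, typename Dst, typename Fn>
std::vector<Dst> convert_column(const std::vector<Src>& src, Fn convert_one) {
    std::vector<Dst> dst;
    dst.reserve(src.size());
    for (const Src& v : src) dst.push_back(convert_one(v));
    return dst;
}

// Step 1 (format specific): each hypothetical reader only decodes values
// using the file's own column type and knows nothing about the table schema.
struct FakeParquetReader {
    std::vector<int32_t> stored;
    std::vector<int32_t> read_source_column() const { return stored; }
};

struct FakeOrcReader {
    std::vector<int32_t> stored;
    std::vector<int32_t> read_source_column() const { return stored; }
};

// Both readers reuse the same converter to produce the BIGINT column.
template <typename Reader>
std::vector<int64_t> read_as_bigint(const Reader& reader) {
    return convert_column<int32_t, int64_t>(
        reader.read_source_column(),
        [](int32_t v) { return static_cast<int64_t>(v); });  // widening, lossless
}
```

Adding a new file format then only requires a new step-1 reader; the converters for each (source type, destination type) pair are written once.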
Pxl
5f30463bb3 [Chore](descriptors) remove unused codes for descriptors (#33408)
2024-04-12 15:09:25 +08:00
26d9082b9a [Feature](function) Add function strcmp (#33272) 2024-04-12 15:09:25 +08:00
31984bb4f0 [feature](function) support quote string function #33055 2024-04-12 15:09:25 +08:00
f0463a9034 [Feature][Enhancement](hive-writer) Add hive-writer runtime profiles, change output file names (#33245)
Issue Number: #31442

- Add hive-writer runtime profiles.
- Change output file names to `${query_id}_${uuid}-${index}.${compression}.${format}`, e.g. `"d8735c6fa444a6d-acd392981e510c2b_34fbdcbb-b2e1-4f2c-b68c-a384238954a9-0.snappy.parquet"`. Within a single partition writer, once the current file exceeds `hive_sink_max_file_size`, it is closed and a new file is opened; the new file name increments ${index} and keeps the rest of the name unchanged.
2024-04-12 10:43:16 +08:00
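The rolling naming scheme can be sketched as a small per-partition helper. `RollingFileNamer` is hypothetical, `max_file_size` stands in for `hive_sink_max_file_size`, and query_id and uuid are joined with `_` as in the example file name; the check runs after each write, so a file may slightly exceed the limit before it is closed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical helper that tracks the output file of one partition writer.
class RollingFileNamer {
public:
    RollingFileNamer(std::string query_id, std::string uuid, std::string compression,
                     std::string format, uint64_t max_file_size)
        : query_id_(std::move(query_id)), uuid_(std::move(uuid)),
          compression_(std::move(compression)), format_(std::move(format)),
          max_file_size_(max_file_size) {}

    // ${query_id}_${uuid}-${index}.${compression}.${format}
    std::string current_file_name() const {
        return query_id_ + "_" + uuid_ + "-" + std::to_string(index_) + "." +
               compression_ + "." + format_;
    }

    // Account for bytes written to the current file. Once it grows past the
    // limit, the file is treated as closed and the next one gets index + 1.
    // Returns true when such a rollover happens.
    bool on_bytes_written(uint64_t bytes) {
        written_ += bytes;
        if (written_ <= max_file_size_) return false;
        ++index_;
        written_ = 0;
        return true;
    }

private:
    std::string query_id_;
    std::string uuid_;
    std::string compression_;
    std::string format_;
    uint64_t max_file_size_;
    uint64_t written_ = 0;
    uint64_t index_ = 0;
};
```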
3c9c6c18a8 [Enhancement](hive-writer) Write only regular fields to file in the hive-writer. (#33000) 2024-04-12 10:29:08 +08:00
3d66723214 [branch-2.1](auto-partition) pick auto partition and some more prs (#33523) 2024-04-11 17:12:17 +08:00
f7d52b5b1c [feature](expr) add type check when expr prepare (#33330) 2024-04-11 09:31:50 +08:00
ef26479282 [improve](serde) support complex type in write/read pb serde (#33124)
Support complex types and the IP/JSONB types in the DataTypeSerDe::write_column_to_pb/read_column_from_pb functions.
2024-04-11 09:31:50 +08:00
ea1e542e31 [fix](partial-update) remove unnecessary DECHEK on IndexChannel::num_rows_filtered (#33160) 2024-04-11 09:31:50 +08:00
Pxl
3081fc584d [Improvement](runtime-filter) support sync join node build side's size to init bloom runtime filter (#32180)
2024-04-11 09:31:50 +08:00
e2ad7149c3 [feature](debug point) Add handler to debug point (#33350) 2024-04-10 16:24:13 +08:00
0e262ba0e4 [improvement](spill) improve cancel of spill and improve log printing (#33229)
2024-04-10 16:23:20 +08:00
28acfaed2b [fix](pipeline)group by and output is empty (#33192) 2024-04-10 16:23:20 +08:00
d667df2d06 [improvement](spill) avoid unnecessary spilling in hash join build phase (#33277) 2024-04-10 16:21:50 +08:00
Pxl
8fd6d4c41b [Chore](build) add -Wconversion and remove some unused code (#33127)
2024-04-10 15:26:08 +08:00
cc363f26c2 [fix](Nereids) fix group concat (#33091)
Fixes a failure in regression_test/suites/query_p0/group_concat/test_group_concat.groovy:

select group_concat(distinct b1, '?'), group_concat(distinct b3, '?')
from table_group_concat
group by b2

Exception:

lowestCostPlans with physicalProperties(GATHER) doesn't exist in root group

The root cause is that NormalizeAggregate pushes the literal '?' down into a slot. AggregateStrategies then treats that slot as a distinct argument and generates an invalid PhysicalHashAggregate, which ChildOutputPropertyDeriver rejects.

I fixed this bug by no longer pushing literals down into slots in NormalizeAggregate, and by forbidding the generation of a stream aggregate node when the group-by slot list is empty.
2024-04-10 14:59:46 +08:00
6c5dd820c0 [improvement](spill) improve spill timers (#33156) 2024-04-10 14:55:11 +08:00
c61d6ad1e2 [Feature] support function uuid_to_int and int_to_uuid #33005 2024-04-10 14:53:56 +08:00
bf022f9d8d [enhancement](function truncate) truncate can use column as scale argument (#32746)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-04-10 14:53:56 +08:00
8b1d174b13 [Optimize] Move strings_pool from individual tree nodes to the tree itself (#33089)
Previously, each tree node allocated its own strings_pool. Because Arena rounds every allocated chunk up to at least 4K, that is far more than a single node needs, so a SubcolumnTree with many nodes wasted a significant amount of memory. Moving strings_pool from the nodes to the tree itself lets all nodes share one pool and reduces this waste.
2024-04-10 14:53:56 +08:00
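A minimal sketch of why sharing one pool helps, assuming (as the message states) that every Arena chunk is at least 4K. `Arena` here is a toy stand-in rather than Doris's own Arena, and `SubcolumnTreeSketch` is a hypothetical simplification of SubcolumnTree in which each node only stores its name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>
#include <vector>

// Toy arena: copies strings into chunks of at least kMinChunk bytes,
// mimicking an allocator that rounds every chunk up to 4K.
class Arena {
public:
    static constexpr size_t kMinChunk = 4096;

    std::string_view copy(std::string_view s) {
        if (chunks_.empty() || used_ + s.size() > capacity_) {
            capacity_ = s.size() > kMinChunk ? s.size() : kMinChunk;
            chunks_.push_back(std::make_unique<char[]>(capacity_));
            reserved_ += capacity_;
            used_ = 0;
        }
        char* dst = chunks_.back().get() + used_;
        if (!s.empty()) std::memcpy(dst, s.data(), s.size());
        used_ += s.size();
        return std::string_view(dst, s.size());  // stays valid: chunks never move
    }

    size_t reserved_bytes() const { return reserved_; }

private:
    std::vector<std::unique_ptr<char[]>> chunks_;
    size_t capacity_ = 0;  // size of the newest chunk
    size_t used_ = 0;      // bytes used in the newest chunk
    size_t reserved_ = 0;  // total bytes reserved across all chunks
};

// After the change: one strings_pool per tree; node names are views into it.
struct Node {
    std::string_view name;
};

class SubcolumnTreeSketch {
public:
    size_t add_node(std::string_view name) {
        nodes_.push_back(Node{strings_pool_.copy(name)});
        return nodes_.size() - 1;
    }
    const Node& node(size_t i) const { return nodes_[i]; }
    size_t pool_bytes() const { return strings_pool_.reserved_bytes(); }

private:
    Arena strings_pool_;  // shared by all nodes
    std::vector<Node> nodes_;
};
```

With 100 short node names, the shared pool reserves a single 4096-byte chunk, whereas 100 per-node pools would reserve one chunk each (409,600 bytes in total).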
3b42dc73af [improvement](spill) avoid spill if memory is enough (#33075) 2024-04-10 14:53:27 +08:00