Commit Graph

3135 Commits

Author SHA1 Message Date
b07e0a2f06 [FIX](cast)fix full/right out join for cast array (#33475)
In some cases, we have code like this:
```
        if (_join_op == TJoinOp::RIGHT_OUTER_JOIN || _join_op == TJoinOp::FULL_OUTER_JOIN) {
            _probe_column_convert_to_null = _convert_block_to_null(*input_block);
        }
```
and then run a subsequent function such as cast. However, the cast function assumes that the block column has the same type as from_type, so the call returns an error status.
2024-04-17 23:42:13 +08:00
775022c204 [refactor](pipelineX) Reduce prepare overhead (PART II) (#33681) 2024-04-17 23:42:13 +08:00
59de97be5e [improvement](mow) Add profile for delete_bitmap get_agg function (#33576) 2024-04-17 23:42:13 +08:00
4863167f90 [refactor](pipelineX) Reduce prepare overhead (PART I) (#33550) 2024-04-17 23:42:12 +08:00
Pxl
341cb40693 [Chore](log) adjust output order on PrintInstanceStandardInfo and reduce warning log when rpc finished (#33652)
adjust output order on PrintInstanceStandardInfo and reduce warning log when rpc finished
2024-04-17 23:42:12 +08:00
5b616da543 [refine](Operator) When _stop_emplace_flag is not set to true, perform batch processing on the block. (#33173) 2024-04-17 23:42:12 +08:00
3df8f0cad8 [improve](move-memtable) add more info in LoadStreamStub errors (#33618) 2024-04-17 23:42:12 +08:00
06a155abb0 [branch-2.1](cherry-pick) Pick some partial-update PR from master (#33639)
* [Fix](partial-update) Fix partial update fail when the datetime default value is 'current_time' (#32926)

* Problem: When importing data that includes datetime with a default value of current time for partial column updates, the import fails.
Reason: Partial column updates do not handle the logic for datetime default values.
Solution: During partial column updates, when the default value is set to current time, read the current time from the runtime state and write it into the data.

* [Enhancement](partial update)Add timezone case for partial update timestamp #33177

* [fix](partial update) Support partial update when the date default value is 'current_date'. This PR is an extension of PR #32926. (#33394)
2024-04-17 23:42:12 +08:00
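The solution described for #32926 can be pictured with a minimal sketch. It is not the code from the PR: `RuntimeState`, `Row`, and `fill_default_current_time` are hypothetical names, and the only assumption carried over from the message is that the current time is read once from the runtime state and written into rows that omit the column.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the runtime state; it holds one timestamp
// captured for the whole load.
struct RuntimeState {
    std::int64_t timestamp_ms;
};

// Hypothetical row of a partial update: the datetime column may be absent.
struct Row {
    int key;
    std::optional<std::int64_t> updated_at;
};

// Fill the missing datetime values with the timestamp from the runtime state,
// so every row in the same load gets the same "current time".
void fill_default_current_time(std::vector<Row>& rows, const RuntimeState& state) {
    for (Row& row : rows) {
        if (!row.updated_at.has_value()) {
            row.updated_at = state.timestamp_ms;
        }
    }
}
```

Rows that already carry a value are left untouched; only the omitted column is filled.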
2cd4012541 [opt](scan) read scan ranges in the order of partitions (#33515) (#33657)
backport: #33515
2024-04-17 23:42:12 +08:00
e7209d9a85 [fix](merge-iterator) Fix mem leak when get next batch failed (#33627) 2024-04-17 23:42:12 +08:00
4740b22481 [fix](test) fix some p2 external table test cases (#33624)
bp #33621
Also fix a merge bug from #33245
2024-04-17 23:42:12 +08:00
8ee8de7857 [Fix](executor)reset remote scan thread num #33579 2024-04-17 23:42:11 +08:00
48880c3e1a [Fix](timezone) fix miss of expected rounding of Date type with timezone #33553 2024-04-17 23:42:11 +08:00
d1a68b8c42 [enhancement](merge-iterator) catch exception to avoid coredump when copy_rows (#33567) 2024-04-17 23:42:00 +08:00
ae68cca07d [fix](schema change) CastStringConverter is compiled failed in g++ (#33546)
Following #32873, CastStringConverter fails to compile with g++ because of an uninitialized value, although it compiles with clang.
2024-04-17 23:42:00 +08:00
249a9c9875 [Feature](Variant) support aggregation model for Variant type (#33493)
Refactor: use `insert_from` instead of `replace_column_data` for variable-length columns.
2024-04-17 23:42:00 +08:00
50b64a111d [refactor](heap sort) Simplify sorted block view (#33477) 2024-04-17 23:42:00 +08:00
6bcf24b1f6 [bug](not in) if not in (null) could eos early (#33482)
* [bug](not in) if not in (null) could eos early
2024-04-17 23:41:59 +08:00
272269f9c1 [Fix](inverted index) fix fast execute problem when need read data opt enabled (#33526) 2024-04-17 23:41:59 +08:00
fefbde8927 [log](move-memtable) improve logs in vtablet_writer_v2 and load_stream (#33103) 2024-04-12 15:09:25 +08:00
9b7af4c0cf [feature](schema change) unified schema change for parquet and orc reader (#32873)
Following #25138, this unifies the schema change interface for the parquet and orc readers; it can be applied to other format readers as well.
Unified schema change interface for all format readers:
- First, read the data into a source column, using the column type stored in the file.
- Second, convert the source column to the destination column, whose type is planned by FE.
2024-04-12 15:09:25 +08:00
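The two steps listed for #32873 can be sketched roughly as follows. This is an illustration only, not the reader interface from the PR: `read_source_column` and `convert_column` are hypothetical names, and plain `std::vector` stands in for the real column classes.

```cpp
#include <cstdint>
#include <vector>

// Step 1 (hypothetical): read the values using the column type stored in the
// file. Here the file is assumed to hold 32-bit integers.
std::vector<std::int32_t> read_source_column() {
    return {1, 2, 3};
}

// Step 2 (hypothetical): convert the source column to the destination type
// planned by FE, element by element.
template <typename Dst, typename Src>
std::vector<Dst> convert_column(const std::vector<Src>& src) {
    std::vector<Dst> dst;
    dst.reserve(src.size());
    for (const Src& value : src) {
        dst.push_back(static_cast<Dst>(value));
    }
    return dst;
}
```

For example, `convert_column<std::int64_t>(read_source_column())` widens the file's 32-bit values to a 64-bit destination column. Keeping the read and the conversion separate is what lets the same conversion step serve several file formats.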
Pxl
5f30463bb3 [Chore](descriptors) remove unused codes for descriptors (#33408)
remove unused codes for descriptors
2024-04-12 15:09:25 +08:00
26d9082b9a [Feature](function) Add function strcmp (#33272) 2024-04-12 15:09:25 +08:00
31984bb4f0 [feature](function) support quote string function #33055 2024-04-12 15:09:25 +08:00
f0463a9034 [Feature][Enhancement](hive-writer) Add hive-writer runtime profiles, change output file names (#33245)
Issue Number: #31442

- Add hive-writer runtime profiles.
- Change output file names to `${query_id}${uuid}-${index}.${compression}.${format}`, e.g. `"d8735c6fa444a6d-acd392981e510c2b_34fbdcbb-b2e1-4f2c-b68c-a384238954a9-0.snappy.parquet"`. For the same partition writer, when the file size exceeds `hive_sink_max_file_size`, the file currently being written is closed and a new file is created; `${index}` in the new file name is incremented, while the rest of the name stays the same.
2024-04-12 10:43:16 +08:00
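The naming and rollover rule from #33245 can be sketched as below. `RollingFileNamer` is a hypothetical type, not the hive-writer's class; the query id and uuid part is passed in as one opaque prefix, and `max_file_size` stands in for `hive_sink_max_file_size`.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper that tracks the current output file of one partition writer.
struct RollingFileNamer {
    std::string prefix;          // query id and uuid part, treated as opaque
    std::string compression;     // e.g. "snappy"
    std::string format;          // e.g. "parquet"
    std::int64_t max_file_size;  // stands in for hive_sink_max_file_size
    int index = 0;               // ${index} in the file name
    std::int64_t written = 0;    // bytes written to the current file

    // Name of the file currently being written.
    std::string current_name() const {
        return prefix + "-" + std::to_string(index) + "." + compression + "." + format;
    }

    // Record `bytes` written to the current file. Returns true when the size
    // limit was exceeded and the namer moved on to the next index.
    bool write(std::int64_t bytes) {
        written += bytes;
        if (written > max_file_size) {
            ++index;
            written = 0;
            return true;
        }
        return false;
    }
};
```

Only the index changes on rollover, so all files produced by one partition writer share the same prefix, compression, and format.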
3c9c6c18a8 [Enhancement](hive-writer) Write only regular fields to file in the hive-writer. (#33000) 2024-04-12 10:29:08 +08:00
3d66723214 [branch-2.1](auto-partition) pick auto partition and some more prs (#33523) 2024-04-11 17:12:17 +08:00
f7d52b5b1c [feature](expr) add type check when expr prepare (#33330) 2024-04-11 09:31:50 +08:00
ef26479282 [improve](serde) support complex type in write/read pb serde (#33124)
Support complex types and ip/jsonb in the DataTypeSerDe::write_column_to_pb and read_column_from_pb functions.
2024-04-11 09:31:50 +08:00
ea1e542e31 [fix](partial-update) remove unnecessary DECHEK on IndexChannel::num_rows_filtered (#33160) 2024-04-11 09:31:50 +08:00
Pxl
3081fc584d [Improvement](runtime-filter) support sync join node build side's size to init bloom runtime filter (#32180)
support sync join node build side's size to init bloom runtime filter
2024-04-11 09:31:50 +08:00
e2ad7149c3 [feature](debug point) Add handler to debug point (#33350) 2024-04-10 16:24:13 +08:00
0e262ba0e4 [improvement](spill) improve cancel of spill and improve log printing (#33229)
* [improvement](spill) improve cancel of spill and improve log printing

* fix
2024-04-10 16:23:20 +08:00
28acfaed2b [fix](pipeline)group by and output is empty (#33192) 2024-04-10 16:23:20 +08:00
d667df2d06 [improvement](spill) avoid unnecessary spilling in hash join build phase (#33277) 2024-04-10 16:21:50 +08:00
Pxl
8fd6d4c41b [Chore](build) add -Wconversion and remove some unused code (#33127)
add -Wconversion and remove some unused code
2024-04-10 15:26:08 +08:00
cc363f26c2 [fix](Nereids) fix group concat (#33091)
Fix the failure in regression_test/suites/query_p0/group_concat/test_group_concat.groovy

select
group_concat( distinct b1, '?'), group_concat( distinct b3, '?')
from
table_group_concat
group by
b2

exception:

lowestCostPlans with physicalProperties(GATHER) doesn't exist in root group

The root cause is that '?' is pushed down to a slot by NormalizeAggregate. AggregateStrategies then treats the slot as a distinct parameter and generates an invalid PhysicalHashAggregate, which is rejected by ChildOutputPropertyDeriver.

I fixed this bug by not pushing literals down to slots in NormalizeAggregate, and by forbidding generation of a stream aggregate node when the group-by slots are empty.
2024-04-10 14:59:46 +08:00
6c5dd820c0 [improvement](spill) improve spill timers (#33156) 2024-04-10 14:55:11 +08:00
c61d6ad1e2 [Feature] support function uuid_to_int and int_to_uuid #33005 2024-04-10 14:53:56 +08:00
bf022f9d8d [enhancement](function truncate) truncate can use column as scale argument (#32746)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-04-10 14:53:56 +08:00
8b1d174b13 [Optimize] Move strings_pool from individual tree nodes to the tree itself (#33089)
Previously, strings_pool was allocated within each tree node. However, due to the Arena's alignment of allocated chunks to at least 4K, this allocation size was excessively large for a single tree node. Consequently, when there are numerous nodes within the SubcolumnTree, a significant portion of memory was wasted. Moving strings_pool to the tree itself optimizes memory usage and reduces wastage, improving overall efficiency.
2024-04-10 14:53:56 +08:00
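The memory argument in #33089 can be checked with a rough calculation. This is a simplified model, not the Arena's real growth policy: it assumes each pool occupies whole chunks of exactly 4096 bytes (the "at least 4K" from the message), and the function names are hypothetical.

```cpp
#include <cstddef>

// Minimum chunk size assumed from the commit message ("at least 4K").
constexpr std::size_t kMinChunkBytes = 4096;

// One pool per tree node: every node pays for at least one chunk,
// no matter how few string bytes it stores.
constexpr std::size_t pool_bytes_per_node(std::size_t node_count) {
    return node_count * kMinChunkBytes;
}

// One pool for the whole tree: all string bytes share chunks, rounded up
// to a whole number of chunks (at least one).
constexpr std::size_t pool_bytes_per_tree(std::size_t total_string_bytes) {
    const std::size_t chunks =
        total_string_bytes == 0
            ? 1
            : (total_string_bytes + kMinChunkBytes - 1) / kMinChunkBytes;
    return chunks * kMinChunkBytes;
}
```

Under this model, 1000 nodes holding 16 bytes of strings each reserve 4,096,000 bytes with per-node pools, while a single tree-level pool needs 16,384 bytes for the same 16,000 bytes of data.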
3b42dc73af [improvement](spill) avoid spill if memory is enough (#33075) 2024-04-10 14:53:27 +08:00
517c12478f [improvement](spill) spill trigger improvement (#32641) 2024-04-10 14:52:46 +08:00
b0b5f84e40 [feature](load) support compressed JSON format data for broker load (#30809) 2024-04-10 14:20:53 +08:00
Pxl
e4993a19e5 [Chore](column) remove ColumnVectorHelper (#33036)
remove ColumnVectorHelper
2024-04-10 11:56:41 +08:00
8e19cdd745 [featrue](expr) support common subexpression elimination be part (#32673) 2024-04-10 11:56:21 +08:00
5116724494 [Fix](hive-writer) Fix the issue of block was not copied to do filtering when hive partition writer write block to file. (#32775) (#33447)
backport #32775
2024-04-10 11:42:23 +08:00
4963d60a07 [Fix](multi-catalog)Fix the issue of not initializing the writer caused by refactoring and add hive writing regression test. (#32721) (#33446)
backport #32721.
2024-04-10 11:42:22 +08:00
cf7595d423 [opt](memory) Optimize mem tracker accuracy (#32039) (#33140) 2024-04-10 11:42:19 +08:00
285e2fcb5a [fix] (vectorization) regexp all_pass string (#32515) 2024-04-10 11:34:30 +08:00