Add code for `collect_list` and `collect_set` and update the regression test output, since the output format for ARRAY(string) had already changed.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
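For reference, the difference between the two aggregates (`collect_list` keeps duplicates and input order, `collect_set` de-duplicates) can be illustrated with plain Java collections; this is only a sketch of the semantics, not the Doris implementation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CollectSemantics {
    public static void main(String[] args) {
        List<String> groupValues = Arrays.asList("a", "b", "a", "c");
        // collect_list keeps every value of the group, duplicates included
        List<String> asList = new ArrayList<>(groupValues);
        // collect_set keeps only the distinct values
        Set<String> asSet = new LinkedHashSet<>(groupValues);
        System.out.println(asList); // [a, b, a, c]
        System.out.println(asSet);  // [a, b, c]
    }
}
```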
This PR supports rowset-level data upload on the BE side, so that a tablet can contain both cold data and hot data,
and there is no need to prohibit loading new data into cooled tablets.
Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without
being aware of the underlying filesystem.
The abstracted `RemoteFileSystem` can try local caching strategies at different granularities,
instead of caching whole segment files as before.
To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory.
In the future, `FileReader`s and `FileWriter`s should be unified.
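A minimal Java sketch of the abstraction described above (the actual code is C++ under be/src/io/fs; all names here are illustrative, not the real interfaces):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// The abstraction each rowset is bound to; implementations may be local or remote.
interface FileSystem {
    InputStream openForRead(String path) throws IOException;
    OutputStream openForWrite(String path) throws IOException;
}

// A remote implementation is free to put a local cache in front of remote reads,
// at whatever granularity it chooses, instead of caching whole segment files.
abstract class RemoteFileSystem implements FileSystem {
}

// Each rowset carries the file system it was written with, so the storage layer
// can read its segments without knowing where the bytes actually live.
final class Rowset {
    private final FileSystem fs;
    private final String segmentPathPrefix;

    Rowset(FileSystem fs, String segmentPathPrefix) {
        this.fs = fs;
        this.segmentPathPrefix = segmentPathPrefix;
    }

    InputStream openSegment(int segmentId) throws IOException {
        return fs.openForRead(segmentPathPrefix + "_" + segmentId + ".dat");
    }
}
```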
Fix https://github.com/apache/doris/pull/10521: multi-catalog queries failed for two reasons:
1. `SelectStmt` does not get the correct catalog.
2. External tables should have three-level aliases.
Disable querying external views.
Support `SHOW CREATE TABLE` for external tables and views.
`SortInfo` is contained in `SortNode`, but some of its fields are duplicated in `SortNode`.
Issue Number: close #10616
Remove the fields in `TSortNode` that already exist in `TSortInfo`.
[API-BREAK] This changes the Thrift file.
Refactor the contexts in Cascades:
use two contexts in the Cascades framework.
`JobContext` is used in each job and contains attributes such as:
- a reference to the `PlannerContext`
- the current cost upper bound
- the current required physical properties
`PlannerContext` holds global info for the query planner and contains attributes such as:
- a reference to the `Memo`
- a reference to the `ConnectContext`
- a reference to the rule set that can be used for planning
- a job pool to maintain unexecuted jobs
- a job scheduler to schedule unexecuted jobs
- the current job context for the next job to be executed
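A minimal Java sketch of the two contexts, with stub types standing in for the real Nereids classes (which are much richer):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Stub types so the sketch compiles on its own.
class Memo {}
class ConnectContext {}
class RuleSet {}
class Job {}
class JobScheduler {}
class PhysicalProperties {}

// Global, per-query state shared by every job.
class PlannerContext {
    Memo memo;                               // reference to the memo
    ConnectContext connectContext;           // reference to the connect context
    RuleSet ruleSet;                         // rules available for planning
    Deque<Job> jobPool = new ArrayDeque<>(); // unexecuted jobs
    JobScheduler jobScheduler;               // schedules unexecuted jobs
    JobContext currentJobContext;            // context for the next job to execute
}

// Per-job state, created for each job that is executed.
class JobContext {
    PlannerContext plannerContext;           // back-reference to the global context
    double costUpperBound;                   // current cost upper bound
    PhysicalProperties requiredProperties;   // current required physical properties
}
```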
During query planning, the binary predicate rewrite optimization that converts a `DecimalLiteral` to an integer may overflow, producing incorrect results for predicates like `id = 12345678901.0` (see the issue for detailed examples).
This PR fixes the possible overflow and optimizes the case where the `DecimalLiteral` is outside the value range of the column type.
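A minimal, self-contained Java illustration of the overflow and of the range check such a rewrite needs (it is not the FE code itself):

```java
import java.math.BigDecimal;

public class DecimalCastCheck {
    public static void main(String[] args) {
        BigDecimal literal = new BigDecimal("12345678901.0");
        // Naively narrowing to the column type wraps around silently.
        System.out.println(literal.intValue()); // prints a negative value, not 12345678901

        // A safe rewrite first checks that the literal fits the column type's range
        // and otherwise leaves the predicate alone (or folds it to a constant).
        BigDecimal min = BigDecimal.valueOf(Integer.MIN_VALUE);
        BigDecimal max = BigDecimal.valueOf(Integer.MAX_VALUE);
        boolean fitsInt = literal.compareTo(min) >= 0 && literal.compareTo(max) <= 0;
        System.out.println(fitsInt ? "rewrite to an INT literal" : "out of INT range: do not narrow");
    }
}
```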
Issue Number: close #10544
`ExprBuilder` uses a stack to build the expression.
The input order is col, value, but the output order is value, col, and the `>=` operator is not reversed.
Example:
`col >= 1` => `1 >= col`
In this case, it is better to use a queue to keep the input order (a minimal illustration follows below).
The `CompoundPredicate(OR)` also has a problem: it should be `alwaysTrue` whenever the operand is not a partition key or the op is not supported.
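A minimal Java illustration of why LIFO order breaks non-commutative operators while FIFO order preserves the input order (the real `ExprBuilder` code differs):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class OperandOrder {
    public static void main(String[] args) {
        // Build "col >= 1" with a stack: push the column first, then the value.
        Deque<String> stack = new ArrayDeque<>();
        stack.push("col");
        stack.push("1");
        // Popping reverses the operands, so the predicate becomes "1 >= col".
        System.out.println(stack.pop() + " >= " + stack.pop());

        // A queue (FIFO) keeps the original operand order: "col >= 1".
        Deque<String> queue = new ArrayDeque<>();
        queue.offer("col");
        queue.offer("1");
        System.out.println(queue.poll() + " >= " + queue.poll());
    }
}
```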
Sometimes the upstream system (e.g., Hive) may create an empty ORC file
which only has a header and footer, without a schema.
If we call `_reader->createRowReader()` with selected columns on such a file,
it throws `ParserError: Invalid column selected xx`.
So we first check the file's number of rows and skip these kinds of files.
This only fixes the non-vectorized load; the vectorized load uses the Arrow scanner
to read ORC files, which does not have this problem.
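The guard can be sketched with the ORC Java API for illustration; the actual fix lives in the BE's C++ ORC reader:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;

public class SkipEmptyOrc {
    // Returns true if the ORC file reports zero rows (header + footer only) and
    // should therefore be skipped before any columns are selected.
    public static boolean shouldSkip(String file) throws Exception {
        Configuration conf = new Configuration();
        Reader reader = OrcFile.createReader(new Path(file), OrcFile.readerOptions(conf));
        return reader.getNumberOfRows() == 0;
    }
}
```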
enhancement
- add functions `finalizeForNereids` and `finalizeImplForNereids` to the stale expressions to generate some attributes used by the BE.
- remove the unnecessary parameter `Analyzer` from `getBuiltinFunction`
- swap the join condition if its left-hand expression refers to the right table (see the sketch after this list)
- change the join physical implementation to broadcast hash join
- add the push-down predicate rule to the planner
fix
- swap the visit order of the join children to ensure the last fragment is the root
- avoid visiting the join's left child twice
known issues
- expression computation produces a wrong answer when the expression includes arithmetic with two literal children.
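A hypothetical sketch of the join-condition swap mentioned above; `EqualTo` and the slot set are illustrative stand-ins, not the real planner classes:

```java
import java.util.Set;

final class EqualTo {
    final String left;
    final String right;

    EqualTo(String left, String right) {
        this.left = left;
        this.right = right;
    }

    // If the left operand does not refer to a slot of the join's left child,
    // swap the operands so that the left operand always matches the left child.
    EqualTo normalize(Set<String> leftChildSlots) {
        return leftChildSlots.contains(left) ? this : new EqualTo(right, left);
    }

    @Override
    public String toString() {
        return left + " = " + right;
    }

    public static void main(String[] args) {
        EqualTo condition = new EqualTo("t2.id", "t1.id");
        System.out.println(condition.normalize(Set.of("t1.id", "t1.name"))); // t1.id = t2.id
    }
}
```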
Add a rule for disassembling the logical aggregate node. This is necessary since our execution framework is distributed and aggregation always executes in two steps: first aggregate locally, then merge the partial results (a minimal illustration follows below).
Add some fields to the logical aggregate to record whether a logical aggregate operator has been disassembled and which aggregate phase it belongs to, and add logic to map the new aggregate function to its stale definition to get the function's intermediate type.
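A minimal, self-contained Java illustration of the two-step execution the rule targets (local pre-aggregation followed by a global merge); it is not the rule's actual code:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoPhaseCount {
    // Phase 1: each node counts its own rows per group key (partial aggregation).
    static Map<String, Long> localAggregate(List<String> rows) {
        Map<String, Long> partial = new HashMap<>();
        for (String key : rows) {
            partial.merge(key, 1L, Long::sum);
        }
        return partial;
    }

    // Phase 2: the partial results are merged into the final aggregation.
    static Map<String, Long> globalMerge(List<Map<String, Long>> partials) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, cnt) -> merged.merge(key, cnt, Long::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Long> p1 = localAggregate(List.of("a", "b", "a"));
        Map<String, Long> p2 = localAggregate(List.of("b", "c"));
        System.out.println(globalMerge(List.of(p1, p2))); // {a=2, b=2, c=1} (order may vary)
    }
}
```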