Physical sort:
* 1. Build sortInfo
* There are two types of slotRef:
* one is generated by the previous node, collectively called old.
* the other is newly generated by the sort node, collectively called new.
* The sortInfo-related data structures are filled as follows:
* a. ordering uses newSlotRef.
* b. sortTupleSlotExprs uses oldSlotRef.
* 2. Create sortNode
* 3. Create mergeFragment
TODO:
1. Currently, columns that appear in order by but not in select cannot be parsed,
e.g.: select key from table order by value;
2. A select item that combines a Literal and a SlotReference is not parsed correctly,
e.g.: select key, (10-value) from table;
For example:
select * from t1 inner join t2 on t1.a = t2.b inner join t3 on t3.c = t2.b;
If t3 is a large table, it will be placed first after reorderTable,
and reanalyzing will then fail because t2.b does not exist.
Two issues are fixed in this PR:
**The first issue** violates the C++ rule `do not call virtual functions in constructors or destructors`.
The destructor of `ArrowReaderWrap` calls the virtual function `close()`.
During destruction, this call never reaches `ParquetReaderWrap::close()`; it only calls `ArrowReaderWrap::close()`.
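The dispatch rule behind the first issue can be reproduced in a few lines. This is a minimal sketch, not the Doris code: `BaseReader`, `BuggyReader`, `FixedReader`, and `calls_on_destroy` are hypothetical stand-ins for the `ArrowReaderWrap` / `ParquetReaderWrap` relationship, and the fix shown (an explicit, qualified call in the derived destructor) is only one possible remedy, not necessarily the one this PR uses.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical base class: its destructor calls a virtual function.
struct BaseReader {
    explicit BaseReader(std::vector<std::string>* log) : log_(log) {}
    virtual ~BaseReader() {
        // The derived part is already destroyed when this body runs,
        // so this resolves to BaseReader::close(), never an override.
        close();
    }
    virtual void close() { log_->push_back("BaseReader::close"); }

protected:
    std::vector<std::string>* log_;  // records which close() ran
};

// Relies on the base destructor to call its override: the override is skipped.
struct BuggyReader : BaseReader {
    using BaseReader::BaseReader;
    void close() override { log_->push_back("BuggyReader::close"); }
};

// One possible fix (an assumption): the derived destructor releases its
// own resources with an explicit, non-virtual call.
struct FixedReader : BaseReader {
    using BaseReader::BaseReader;
    ~FixedReader() override { FixedReader::close(); }
    void close() override { log_->push_back("FixedReader::close"); }
};

// Constructs and destroys a Reader, returning the close() calls it made.
template <typename Reader>
std::vector<std::string> calls_on_destroy() {
    std::vector<std::string> log;
    { Reader reader(&log); }
    return log;
}
```

Destroying a `BuggyReader` records only the base-class `close()`, while `FixedReader` runs its own cleanup first and then the base-class one.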
**The second issue** is a race between the destruction of `ParquetReaderWrap` and `prefetch_batch`.
`prefetch_batch` uses `thread.detach()` to run independently of `ParquetReaderWrap`, but it still relies on member variables of `ParquetReaderWrap` such as `_closed`, `_total_groups`, and `_reader`.
In this case, the destructor of `ParquetReaderWrap` may run before `prefetch_batch` finishes, which leads to a core dump.
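A common way to avoid this kind of lifetime race is for the owner to keep the thread handle, signal shutdown, and join in its destructor instead of detaching. The sketch below illustrates that pattern under stated assumptions: `PrefetchingReader`, `prefetch_loop`, `batches_prefetched`, and `run_once` are hypothetical names, and the source does not say whether the PR uses this exact approach.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical reader that owns a background prefetch loop.
class PrefetchingReader {
public:
    PrefetchingReader() : _worker([this] { prefetch_loop(); }) {}

    ~PrefetchingReader() {
        // Signal the loop to stop, then wait for it. The members the
        // loop touches therefore outlive every access made by the thread.
        _closed.store(true);
        if (_worker.joinable()) {
            _worker.join();
        }
    }

    int batches_prefetched() const { return _batches.load(); }

private:
    void prefetch_loop() {
        while (!_closed.load()) {
            _batches.fetch_add(1);
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::atomic<bool> _closed{false};
    std::atomic<int> _batches{0};
    std::thread _worker;  // declared last so the flags exist before the thread starts
};

// Creates a reader, waits for at least one batch, then lets the
// destructor join the worker when the function returns.
int run_once() {
    PrefetchingReader reader;
    while (reader.batches_prefetched() == 0) {
        std::this_thread::yield();
    }
    return reader.batches_prefetched();
}
```

With `detach()` in place of the join, the loop could still be reading `_closed` after the object is gone, which is the failure mode described above.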
Add code for collect_list and collect_set and update the regression output, because the output format for ARRAY(string) was changed earlier.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
This PR supports rowset-level data upload on the BE side, so that a tablet can contain both cold data and hot data,
and there is no need to prohibit loading new data into cooled tablets.
Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without
perceiving the underlying filesystem.
The abstracted `RemoteFileSystem` can try local caching strategies with different granularity,
instead of caching segment files as before.
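The binding between a rowset and its filesystem can be pictured with a small interface. This is only a sketch: the `FileSystem` interface here is reduced to two hypothetical methods (`write_file`, `read_file`), `InMemoryFileSystem` stands in for both local and remote storage so the example runs anywhere, and `Rowset` with its `segment_0` path is illustrative rather than the actual code in be/src/io/fs.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal interface; the real one is richer.
class FileSystem {
public:
    virtual ~FileSystem() = default;
    virtual void write_file(const std::string& path, const std::string& data) = 0;
    virtual std::string read_file(const std::string& path) const = 0;
};

// In-memory stand-in so the sketch needs no disk or object store.
class InMemoryFileSystem : public FileSystem {
public:
    void write_file(const std::string& path, const std::string& data) override {
        _files[path] = data;
    }
    std::string read_file(const std::string& path) const override {
        auto it = _files.find(path);
        return it == _files.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> _files;
};

// A rowset only talks to the FileSystem it was bound to at creation,
// so it does not need to know whether the data is local or remote.
class Rowset {
public:
    Rowset(std::string id, std::shared_ptr<FileSystem> fs)
            : _id(std::move(id)), _fs(std::move(fs)) {}

    void write_segment(const std::string& data) { _fs->write_file(segment_path(), data); }
    std::string read_segment() const { return _fs->read_file(segment_path()); }

private:
    std::string segment_path() const { return _id + "/segment_0"; }

    std::string _id;
    std::shared_ptr<FileSystem> _fs;
};
```

Two rowsets of the same tablet can then be bound to different filesystems, which is what allows hot and cold data to coexist.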
To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory.
In the future, `FileReader`s and `FileWriter`s should be unified.
Fix https://github.com/apache/doris/pull/10521. Multi-catalog queries failed for two reasons:
1. The `SelectStmt` does not get the correct catalog.
2. External tables should have three-level aliases.
Disable querying external views.
Support show create table for external tables and views.
SortInfo is in SortNode, but SortNode also contains some duplicated fields.
Issue Number: close#10616
Remove the redundant fields in `TSortNode` that already exist in `TSortInfo`.
[API-BREAK] This changes the `Thrift` file.
Refactor Context in Cascades:
Use two contexts in the Cascades framework.
JobContext is used in each job and contains the following attributes:
- reference to PlannerContext
- current cost upper bound
- current required physical properties
PlannerContext holds global information for the query planner and contains the following attributes:
- reference to Memo
- reference to connectContext
- reference to the rule set that can be used for planning
- job pool to maintain unexecuted jobs
- job scheduler to schedule unexecuted jobs
- current job context for next job to be executed
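The split between the two contexts can be laid out as plain data structures. The sketch below mirrors the attribute lists above in C++ for illustration only: the placeholder types (`Memo`, `ConnectContext`, `RuleSet`, `PhysicalProperties`), the `Job` struct, and the LIFO `run_all` scheduler are all assumptions, not the actual FE classes.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <limits>
#include <memory>
#include <string>
#include <utility>

// Empty placeholders for types the description only names.
struct Memo {};
struct ConnectContext {};
struct RuleSet {};
struct PhysicalProperties { std::string description = "any"; };

struct PlannerContext;

// Per-job state: each job carries its own bound and required properties.
struct JobContext {
    PlannerContext* planner_context = nullptr;
    double cost_upper_bound = std::numeric_limits<double>::max();
    PhysicalProperties required_properties;
};

// Hypothetical job: a callable plus the context it runs under.
struct Job {
    std::shared_ptr<JobContext> context;
    std::function<void(JobContext&)> body;
};

// Query-wide state shared by all jobs.
struct PlannerContext {
    Memo* memo = nullptr;
    ConnectContext* connect_context = nullptr;
    RuleSet* rule_set = nullptr;
    std::deque<Job> job_pool;                         // unexecuted jobs
    std::shared_ptr<JobContext> current_job_context;  // used by the next pushed job

    // Requires current_job_context to be set before jobs are pushed.
    void push_job(std::function<void(JobContext&)> body) {
        job_pool.push_back(Job{current_job_context, std::move(body)});
    }

    // Stand-in scheduler: runs jobs in LIFO order until the pool is empty.
    int run_all() {
        int executed = 0;
        while (!job_pool.empty()) {
            Job job = std::move(job_pool.back());
            job_pool.pop_back();
            job.body(*job.context);
            ++executed;
        }
        return executed;
    }
};
```

Keeping the cost bound and required properties on `JobContext` lets different jobs run under different constraints while sharing one Memo and one job pool.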
During the query planning phase, the binary predicate rewrite optimization that converts a DecimalLiteral to an integer may overflow, producing incorrect values for predicates like "id = 12345678901.0" (see the issue for detailed examples).
This PR fixes the possible overflow and optimizes the case where the DecimalLiteral is outside the value range of the column type.
Issue Number: close#10544
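The overflow can be avoided by checking the literal against the column type's range before narrowing it. The following is a sketch under several assumptions: the column is taken to be a 32-bit integer, a `double` stands in for DecimalLiteral, and the names `Rewrite`, `RewriteResult`, and `rewrite_eq_literal_for_int32` are hypothetical. Treating an out-of-range equality as never matching is one plausible handling, not necessarily what the PR does.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Possible outcomes when rewriting `int_col = <decimal literal>`.
enum class Rewrite { kUseInteger, kAlwaysFalse, kKeepDecimal };

struct RewriteResult {
    Rewrite kind;
    int32_t value;  // meaningful only when kind == kUseInteger
};

RewriteResult rewrite_eq_literal_for_int32(double literal) {
    constexpr double kMin = std::numeric_limits<int32_t>::min();
    constexpr double kMax = std::numeric_limits<int32_t>::max();
    // Range check first: narrowing an out-of-range value is exactly the
    // overflow to avoid. No int32 can equal such a literal.
    if (literal < kMin || literal > kMax) {
        return {Rewrite::kAlwaysFalse, 0};
    }
    // A fractional literal cannot be narrowed without changing its value,
    // so this sketch leaves the predicate untouched.
    if (std::trunc(literal) != literal) {
        return {Rewrite::kKeepDecimal, 0};
    }
    return {Rewrite::kUseInteger, static_cast<int32_t>(literal)};
}
```

For the literal from the description, 12345678901.0 exceeds the 32-bit maximum of 2147483647, so the sketch reports that the equality can never hold instead of wrapping the value.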