enhancement
- refactor compute output expression on root fragment in nereids planner
- refactor aggregate plan translator
- refactor aggregate disassemble rule
- slightly refactor sort plan translator
- add exchange node on the top of plan node tree if it is needed
- slightly refactor PhysicalPlanTranslator#translatePlan
fix
- slotDescriptor should not reuse between TupleDescriptors
- expression's nullable now works fine
- remove quotes when parse string literal
- set resolvedTupleExprs in SortNode to control output
- remove the extra column in sortTupleSlotExprs in SortInfo
known issues
- aggregate function must be the top expression in output expression (need project in ExecNode in BE)
- first phase aggregate could not convert to stream mode.
- OlapScanNode do not set data partition
- Sort could not process expression like 'order by a + 1' and SortInfo generated in a trick way and should be refactor when we want to support 'order by a + 1'
- column prune do not work as expected
Physical sort:
* 1. Build sortInfo
* There are two types of slotRef:
* one is generated by the previous node, collectively called old.
* the other is newly generated by the sort node, collectively called new.
* Filling of sortInfo related data structures,
* a. ordering use newSlotRef.
* b. sortTupleSlotExprs use oldSlotRef.
* 2. Create sortNode
* 3. Create mergeFragment
TODO:
1.Currently, columns that do not exist in select but exist in order by cannot be parsed.
eg: select key from table order by value;
2.For the combination of Literal and slotRefrance in select, there is a problem with parsing,
eg: select key ,(10-value) from table;
for example:
select * from t1 inner join t2 on t1.a = t2.b inner join t3 on t3.c = t2.b;
If t3 is a large table, it will be placed first after the reorderTable,
and the problem that t2.b does not exist will occur in reanalyzing.
There is two issue fixed in this pr:
**The first issue** is the C++ code rule of `do not call virtual function in constructor or deconstructor`.
The deconstructor function of `ArrowReaderWrap` call the virtual function named `close()`.
When deconstructing, it will never call `ParquetReaderWrap::close()` just call the `ArrowReaderWrap::close()`
**The second issue** is parallelism deconstructing for `ParquetReaderWrap` and `prefetch_batch`.
`prefetch_batch` use `thread.detach()` to separate the control from `ParquetReaderWrap`, but it rely on some local vars from `ParquetReaderWrap` such as **`_closed ` /`_total_groups ` and `_reader`**
In this case, `ParquetReaderWrap` may call deconstructor before `prefetch_batch` and then get the core dump.
add codes for collect_list and collect_set and update regression output, before output format for ARRAY(string) already changed.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
This PR supports rowset level data upload on the BE side, so that there can be both cold data and hot data in a tablet,
and there is no necessary to prohibit loading new data to cooled tablets.
Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without
perceiving the underlying filesystem.
The abstracted `RemoteFileSystem` can try local caching strategies with different granularity,
instead of caching segment files as before.
To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory.
In the future, `FileReader`s and `FileWriter`s should be unified.
Fix https://github.com/apache/doris/pull/10521, multi-catalog query failed for two reasons:
1. The `SelectStmt` does not get the correct catalog.
2. External table should have three level aliases.
Disable querying external views.
Support show create table for external table&view.
SortInfo is in SortNode. But there are some replicated field in SortNode
Issue Number: close#10616
Remove the redundant field in `TSortNode` which exist in `TSortInfo`.
[API-BREAK] This has changed `Thrift` file.
Refactor Context in Cascades:
use two context in cascades framework.
JobContext is used in each job, contains such attributes:
- reference to PlannerContext
- current cost upper bound
- current required physical properties
PlannerContext is used to hold global info for query planner, contains such attributes:
- reference to Memo
- reference to connectContext
- reference to ruleset could be used for plan
- job pool to maintain unexecuted jobs
- job scheduler to schedule unexecuted jobs
- current job context for next job to be executed