It is a painful work to read profile, especially there are multi-parallel instances.
This tool helps us to grasp the main information of profile in a graphical view.
The profile is represented by a tree.
Sql operation nodes contains operation type(join, scan...), its node id, its fragment id. The number on the arrow edge means how many rows output by child node. This tool will sum the output rows of the same node in multi-parallel instances, that is if there are 4 parallel instance, and each ScanNode on lineitem table output 10 rows, the label on the arrow beginning with ScanNode(lineitem) is 40.
Here is a demo for tpch Q2
tpch q2 profile viewer
Issue Number: close #xxx
1. add a post processor: runtime filter pruner
Doris generates RFs (runtime filter) on Join node to reduce the probe table at scan stage. But some RFs have no effect, because its selectivity is 100%. This pr will remove them.
A RF is effective if
a. the build column value range covers part of that of probe column, OR
b. the build column ndv is less than that of probe column, OR
c. the build column's ColumnStats.selectivity < 1, OR
d. the build column is reduced by another RF, which satisfies above criterions.
2. explain graph
a. add RF info in Join and Scan node
b. add predicate count in Scan node
3. Rename session variable
rename `enable_remove_no_conjuncts_runtime_filter_policy` to `enable_runtime_filter_prune`
4. fix min/max column stats derive bug
`select max(A) as X from T group by B`
X.min is A.min, not A.max
enhancement
- refactor compute output expression on root fragment in nereids planner
- refactor aggregate plan translator
- refactor aggregate disassemble rule
- slightly refactor sort plan translator
- add exchange node on the top of plan node tree if it is needed
- slightly refactor PhysicalPlanTranslator#translatePlan
fix
- slotDescriptor should not reuse between TupleDescriptors
- expression's nullable now works fine
- remove quotes when parse string literal
- set resolvedTupleExprs in SortNode to control output
- remove the extra column in sortTupleSlotExprs in SortInfo
known issues
- aggregate function must be the top expression in output expression (need project in ExecNode in BE)
- first phase aggregate could not convert to stream mode.
- OlapScanNode do not set data partition
- Sort could not process expression like 'order by a + 1' and SortInfo generated in a trick way and should be refactor when we want to support 'order by a + 1'
- column prune do not work as expected
This feature is propsoed in [DSIP-1](https://cwiki.apache.org/confluence/display/DORIS/DSIP-001%3A+Java+UDF).
This PR support fixed-length input and output Java UDF. Phase I in DIP-1 is done after this PR.
To support Java UDF effeciently, I use no data copy in JNI call and all compute operations are off-heap in Java.
To achieve that, I use a UdfExecutor instead.
For users, a UDF class must have a public evaluate method.
This CL mainly changes:
1. Add star schema benchmark tools in `tools/ssb-tools`, for user to easy load and test with SSB data set.
2. Disable the segment cache for some read scenario such as compaction and alter operation.(Fix#6924 )
3. Fix a bug that `max_segment_num_per_rowset` won't work(Fix#6926)
4. Enable `enable_batch_delete_by_default` by default.
It will crash when there are External engine tables in doris after executing "python show_segment_status.py"
The main reason is external engine tables don't have any index and partitions. So it should be ignored.
1. Disable the MySQL client and LZO library by default when building the Doris.
MySQL client library is used for MySQL external table feature.
This feature will be replaced by the new ODBC external table soon.
LZO library is used to compress/decompress data of some old data format of Doris,
which is no longer used anymore.
2. Add missing license to some files.
3. For all non-Apache-License code, all are explained in NOTICE file and the corresponding license is declared.
4. Remove the js source code from webroot, it will be downloaded as thirdparty
This CL mainly made the following modifications:
1. Reorganized SegmentV2 upgrade document.
2. When the variable `use_v2_rollup` is set to true, the base rollup in v2 format is forcibly queried for verifying the data.
3. Fix a problem that there is no persistent storage format information in the schema change operation that performs v2 conversion.
4. Allow users to directly create v2 format tables.