using weak ptr as a lock between fragment execute thread and scanner thread, to solve the core problem in scanner's dctor to access scannode's profile.
Add 3 counters for ExecNode:
ExecTime - Total execution time(excluding the execution time of children).
OutputBytes - The total number of bytes output to parent.
BlockCount - The total count of blocks output to parent.
For comparison predicate, two arguments must be cast to datetime and push down to storage if either one is date type. This PR disables predicate push-down for this case.
this pr is aim to update for filter_by_select logic and change delete limit
only support scala type in delete statement where condition
only support column nullable and predict column support filter_by_select logic, because we can not push down non-scala type to storage layer to pack in predict column but do filter logic
Iceberg has its own metadata information, which includes count statistics for table data. If the table does not contain equli'ty delete, we can get the count data of the current table directly from the count statistics.
For load request, there are 2 tuples on scan node, input tuple and output tuple.
The input tuple is for reading file, and it will be converted to output tuple based on user specified column mappings.
And the broker load support different column mapping in different data description to same table(or partition).
So for each scanner, the output tuples are same but the input tuple can be different.
The previous implements save the input tuple in scan node level, causing different scanner using same input tuple,
which is incorrect.
This PR remove the input tuple from scan node and save them in each scanners.
Optimization "select count(*) from table" stmtement , push down "count" type to BE.
support file type : parquet ,orc in hive .
1. 4kfiles , 60kwline num
before: 1 min 37.70 sec
after: 50.18 sec
2. 50files , 60kwline num
before: 1.12 sec
after: 0.82 sec
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
1. Filtering is done at the sending end rather than the receiving end
2. Projection is done at the sending end rather than the receiving end
3. Each sender can use different shuffle policies to send data
For pipeline, olap table sink close is divided into three stages, try_close() --> pending_finish() --> close()
only after all node channels are done or canceled, pending_finish() will return false, close() will start.
this will avoid block pipeline on close().
In close, check the index channel intolerable failure status after each node channel failure,
if intolerable failure is true, the close will be terminated in advance, and all node channels will be canceled to avoid meaningless blocking.
Currently, there are many profiles using add child profile to orgnanize profile into blocks. But it is wrong. Child profile will have a total time counter. Actually, what we should use is just a label.
- MemoryUsage:
- HashTable: 23.98 KB
- SerializeKeyArena: 446.75 KB
Add a new macro ADD_LABEL_COUNTER to add just a label in the profile.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>