Fix bugs:
1. Fe need to send file format (e.g. parquet, orc ...) to be while processing load jobs using new scanner.
2. Try to get parquet file column type from SchemaElement.type before getting from Logical type and Converted type.
1. Remove single load channel mem limit, only use load channel mgr mem limit
2. Default load channel mgr mem limit from 50% to 80%
3. load channel mgr add soft mem limit. When the soft limit is exceeded, other threads will not hang, only current thread triggers flush
4. When exceed load channel mgr mem limit, find a load channel with the largest mem usage, continue to find a tablet channel with the largest mem usage, and try to flush 1/3 of the mem usage of this tablet channel.
1. HttpServer threads allocate bytebuffer and put them into streamload pipe, but scanner thread release them with query tracker.
2. We can assume brpc allocate memory in doris thread.
Above problems leads to wrong result of memtracker.
In the past, with legacy planner, we could only do bucket shuffle join on the join node belonging to the fragment with at least one scan node.
But, bucket shuffle join should do on each join node that left child's data distribution satisfy join's demand.
In nereids, we have data distribution info on each node. So we could enable bucket shuffle join on fragment without scan node.
This PR mainly improves some functions of the statistics module(#6370):
1. when collecting partition statistics, filter empty partitions in advance and do not generate statistical tasks.
2. the old statistical update method may have problems when updating statistics in parallel, which has been solved.
3. optimize internal-query.
4. add test cases related to statistics.
5. modify some comments as prompted by CheckStyle.
This pull request includes some implementations of the statistics(#6370), it adds statistics module related syntax. The current syntax for collecting statistics will not collect statistics (It will collect statistics until test is stable).
- `ANALYZE` syntax(collect statistics)
```SQL
ANALYZE [[ db_name.tb_name ] [( column_name [, ...] )], ...] [PARTITIONS(...)] [ PROPERTIES(...) ]
```
> db_name.tb_name: collect table and column statistics from tb_name.
> column_name: collect column statistics from column_name.
> properties: properties of statistics jobs.
example:
```SQL
ANALYZE; -- collect statistics for all tables in the current database
ANALYZE table1(pv, citycode); -- collect pv and citycode statistics for table1
ANALYZE test.table2 PARTITIONS(partition1); -- collect statistics for partition1 of table2
```
- `SHOW ANALYZE` syntax(show statistics job info)
```SQL
SHOW ANALYZE
[TABLE | ID]
[
WHERE
[STATE = ["PENDING"|"SCHEDULING"|"RUNNING"|"FINISHED"|"FAILED"|"CANCELLED"]]
]
[ORDER BY ...]
[LIMIT limit][OFFSET offset];
```
- `SHOW TABLE STATS`syntax(show table statistics)
```SQL
SHOW TABLE STATS [ db_name.tb_name ]
```
- `SHOW COLUMN STATS` syntax(show column statistics)
```SQL
SHOW COLUMN STATS [ db_name.tb_name ]
```
This pull request includes some implementations of the statistics(#6370), it Implements sql-task to collect statistics based on internal-query(#9983).
After the ANALYZE statement is parsed, statistical tasks will be generated. The statistical tasks includes mata-task(get statistics from metadata) and sql-task(get statistics by sql query). For sql-task, it will get statistics such as the row_count, the number of null values, and the maximum value by SQL query.
For statistical tasks, also include sampling sql-task, which will be implemented in the next pr.
The like predicate process data in block perform better than in row. Currently, only not null column is optimized, nullable column will be handled later.
SELECT COUNT(*) FROM hits WHERE URL LIKE '%google%';
before: ~680ms
after: ~570ms
when enable light schema change, run test_materialized_view_hll case throw NullPointerException.
java.lang.NullPointerException: null
at org.apache.doris.analysis.SlotDescriptor.setColumn(SlotDescriptor.java:153)
at org.apache.doris.planner.OlapScanNode.updateSlotUniqueId(OlapScanNode.java:399)
Get schema from parquet reader.
The new VFileScanner need to get file schema (column name to type map) from parquet file while processing load job,
this pr is to set the type information for parquet columns.
refactor some arguments for parquet reader
1. Add new parquet context to wrap reader arguments
2. Reduced some arguments for function call
Co-authored-by: jinzhe <jinzhe@selectdb.com>
The new scanner (VFileScanner) need a counter to record two values in load job.
1. The number of rows unselected by pre-filter, and
2. The number of rows filtered by unmatched schema or other error. This pr is to implement the counter.
The mem hook record tracker cannot guarantee that the final consumption is 0, nor can it guarantee that the memory alloc and free are recorded in a one-to-one correspondence.
In the life cycle of a memtable from insert to flush, the memory free of hook is more than that of alloc, resulting in tracker consumption less than 0.
In order to avoid the cumulative error of the upper load channel tracker, the memtable tracker consumption is reset to zero on destructor.
The mem hook consumes the orphan tracker by default. If the thread does not attach other trackers, by default all consumption will be passed to the process tracker through the orphan tracker.
In real time, consumption of all other trackers + orphan tracker consumption = process tracker consumption.
Ideally, all threads are expected to attach to the specified tracker, so that "all memory has its own ownership", and the consumption of the orphan mem tracker is close to 0, but greater than 0.
Following the iteration order of the hash table will result in out-of-order access to aggregate states, which is very inefficient.
Traversing aggregate states in memory write order can significantly improve memory read efficiency.
Test
hash table items count: 3.35M
Before this optimization: insert keys into column takes 500ms
With this optimization only takes 80ms
In current policy, if mem-limit exceeded, load channel will pick tablets that consume most memory, but mem_consumption contains memory in flush, if some delta writer flushing a full memtable(default 200MB), the current memtable might be very small, we should avoid flush such memtable, which can generate a very small segment.
Refactor of scanners. Support broker load.
This pr is part of the refactor scanner tasks. It provide support for borker load using new VFileScanner.
Work still in progress.
In an earlier PR #11976 , we add shuffle join and bucket shuffle support. But if join's right child's distribution spec satisfied join's require, we do not add distribute on right child. Instead of, do it in plan translator.
It is hard to calculate accurate cost in this way, since we some distribute cost do not calculated.
In this PR, we introduce a new shuffle type BUCKET, and change the way of add enforce to ensure all necessary distribute will be added in cost and enforcer job.