MetricRegistry::trigger_all_hooks holds the metrics lock and can get stuck in get_je_metrics, while to_prometheus waits for trigger_all_hooks to release that lock, so the two block each other. To break this deadlock, get_je_metrics is no longer called inside MetricRegistry::trigger_all_hooks.
Add DATA_TYPE in the information schema for the types datev2, datetimev2, decimal, and jsonb. It was previously 'unknown' for these types, which caused problems for tools such as BI software that rely on the information schema.
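A quick way to verify this is to query the information schema directly; a minimal sketch, assuming a table `testDb.testTbl` that contains columns of the affected types:

```sql
-- Check the DATA_TYPE reported for each column; before this fix,
-- datev2, datetimev2, decimal, and jsonb columns reported 'unknown'.
SELECT COLUMN_NAME, DATA_TYPE
FROM information_schema.columns
WHERE TABLE_SCHEMA = 'testDb' AND TABLE_NAME = 'testTbl';
```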
In the disk balancer, if a tablet is under highly concurrent load,
the creation time of a new rowset (which uses the current time) may be
the same as that of the newest existing rowset. When the tablet is
added, a creation time check requires the new time to be strictly
greater than the old time, so the disk balancer can fail many times,
and because migration blocks writes, the tablet loses many versions.
convert_nullable_flags does not contain nullable info for the RowID column, but valid_column_ids does contain the RowID column, so the nullable flag for the RowID column is undefined.
1. When mapping columns from an external datasource, use date/datetimev2 as the default type.
2. Check `is_cancelled` when reading data, to avoid an endless loop after the query is cancelled.
The performance of ClickBench Q30 is affected by batch_size:
| batch_size | 1024 | 4096 | 20480 |
| -- | -- | -- | -- |
| Q30 query time (s) | 2.27 | 1.08 | 0.62 |
The aggregation operator creates a new result block for each batch block, and Q30 has 90 columns, so creating these blocks is time-consuming. A larger batch_size decreases the number of aggregation blocks, which is why it improves performance.
The Doris internal reader reads at least 4064 rows even when batch_size < 4064, so this PR keeps the reading process for external tables the same as for internal tables.
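For reference, batch_size is a session variable and can be tuned per session; a minimal sketch, assuming the standard Doris session-variable syntax:

```sql
-- Inspect the current batch size.
SHOW VARIABLES LIKE 'batch_size';
-- Increase it for this session to reduce the number of result blocks
-- the aggregation operator creates on wide queries such as Q30.
SET batch_size = 4096;
```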
Fix the following error when reading a parquet file:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]Read parquet file xxx failed, reason = [CORRUPTION]The number of rows are not equal among parquet columns
```
This error may be thrown when reading non-predicate columns in lazy read, for example:
A row group with 1000 rows has two non-predicate columns.
Column A has one page; column B has two pages with 500 rows each.
The read range of `ParquetColumnReader` is [0, 400), and the rows in [0, 450) are all filtered out by the predicate columns.
So column A can skip its first page and reach EOF, while column B also skips its first page but does not reach EOF.
Support setting the number of lines to skip for stream load when loading a CSV file.
Usage is `-H skip_lines:number`:
```
curl --location-trusted -u root: -T test.csv -H skip_lines:5 -XPUT http://127.0.0.1:8030/api/testDb/testTbl/_stream_load
```
The number of lines to skip can also be specified in MySQL load, as below:
```sql
LOAD DATA
LOCAL
INFILE '${mysql_load_skip_lines}'
INTO TABLE ${tableName}
COLUMNS TERMINATED BY ','
IGNORE 2 LINES
PROPERTIES ("auth" = "root:");
```
Temporarily forbid to_quantile_state to avoid a core dump, pending the implementation of [Feature] support QuantileState in vectorized engine #15868.
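For illustration, a query like the following now fails with an error instead of crashing the BE; the column and table names are placeholders, and the second argument (a compression factor) is an assumption about to_quantile_state's usage:

```sql
-- Hypothetical call: `cost` and `testTbl` are placeholder names, and
-- the compression argument is assumed. The function is temporarily
-- rejected with an error rather than causing a core dump.
SELECT to_quantile_state(cost, 2048) FROM testTbl;
```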