convert_nullable_flags does not contain nullable info for RowID column, but valid_column_ids contain RowID column, nullable falg will be undefined for RowID column
1. When mapping column from external datasource, use date/datetimev2 as default type
2. check `is_cancelled` when read data, to avoid endless loop after query is cancelled
The performance of ClickBench Q30 is affected by batch_size:
| batch_size | 1024 | 4096 | 20480 |
| -- | -- | -- | -- |
| Q30 query time | 2.27 | 1.08 | 0.62 |
Because aggregation operator will create a new result block for each batch block, and Q30 has 90 columns, which is time-consuming. Larger batch_size will decrease the number of aggregation blocks, so the larger batch_size will improve performance.
Doris internal reader will read at least 4064 rows even if batch_size < 4064, so this PR keep the process of reading external table the same as internal table.
Read parquet file failed:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]Read parquet file xxx failed, reason = [CORRUPTION]The number of rows are not equal among parquet columns
```
This error may be thrown when reading non-predicate columns in lazy-read, for example:
A row group with 1000 rows has tow non-predicate columns.
Column A has one page, Column B has two pages with 500 rows for each page.
The read range of `ParquetColumnReader` is [0, 400), and the rows between [0, 450) are all filtered by predicate columns.
So column A can skip the first page, and reach the EOF, while column B can also skip the first page, but doesn't read the EOF.
Support set skip line number for stream load to load csv file.
Usage `-H skip_lines:number`:
```
curl --location-trusted -u root: -T test.csv -H skip_lines:5 -XPUT http://127.0.0.1:8030/api/testDb/testTbl/_stream_load
```
Skip line number also can be used in mysql load as below:
```sql
LOAD DATA
LOCAL
INFILE '${mysql_load_skip_lines}'
INTO TABLE ${tableName}
COLUMNS TERMINATED BY ','
IGNORE 2 LINES
PROPERTIES ("auth" = "root:");
```
forbidden to_quantile_state temporary to avoid core dump. waiting for [Feature] support QuantileState in vectorized engine #15868 get the ball rolling on implementation.
1. Filter out and remain predicates that do not support applying on inverted index,
like `BF` predicate, `IS_NULL` predicate, `IS_NOT_NULL` predicate.
2. Add inverted index regression case that based on tpcds_sf1 data set.
BufferControlBlock may block all fragment handle threads leads to be out of work
modify include:
BufferControlBlock cancel after max timeout
StmtExcutor notify be to cancel the fragment when unexcepted occur
more details see issue #16203
As AVX512 is available in most modern processors, it is good to use them if have performance boost.
In latest bitshuffle, AVX512 have been added. We could make it integrated in doris for AVX512 case.
Tested with master branch, queries(SSB query q1.1.sql~q4.3.sql total 13 queries) can be boost from 1.4%~3.2%. (use run-ssb-queries.sh 5 times, each time with 100 iterations.)
Signed-off-by: Wu, Kaiqiang <kaiqiang.wu@intel.com>
Co-authored-by: vesslanjin <jun.i.jin@intel.com>