Commit Graph

1212 Commits

Author SHA1 Message Date
646ba2cc88 [bugfix](scannode) 1. make rows_read correct 2. use single scanner if has limit clause (#16473)
Make rows_read correct so that the scheduler can use it correctly.
Use a single scanner if the query has a limit clause; move this logic from the fragment context to the scan node (see the sketch after this entry).
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-02-09 14:12:18 +08:00
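
A minimal sketch (not the Doris code) of the "single scanner under LIMIT" idea from the commit above: when the query carries a LIMIT, one scanner is enough, since it can stop as soon as the limit is reached, and spawning many parallel scanners would only read rows that get discarded. The function and parameter names are illustrative assumptions.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>

// Pick how many parallel scanners a scan node should launch.
int64_t choose_scanner_count(int64_t limit, int64_t max_parallel_scanners) {
    if (limit > 0) {          // the plan has a LIMIT clause
        return 1;             // a single scanner reads just enough rows
    }
    return std::max<int64_t>(1, max_parallel_scanners);
}

int main() {
    std::cout << choose_scanner_count(100, 16) << "\n";  // 1: limited query
    std::cout << choose_scanner_count(-1, 16) << "\n";   // 16: unlimited scan
    return 0;
}
```
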
0142ef8b95 [improvement](scanner) Supports bthread scanner (#16031) 2023-02-09 10:24:56 +08:00
9f8753ffd2 [bugfix](vertical_compaction) fix base_compaction delete_sign handler (#16469)
In vertical base compaction, identical rows are filtered out in vertical_merge_iterator,
so we should skip these filtered rows when setting the agg flag of the delete sign.
For example, the schema is a,b,delete_sign, and the data is
1,1,1
1,1,0
1,1,0
2,2,1
2,2
and the Block we get in VerticalBlockReader is
1,1,1
2,2,1
and we should set agg flag index 0,4 to true when handling the delete sign, so
we add a function continuous_agg_count to skip the identical rows filtered in
VerticalMergeIterator (see the sketch after this entry).
2023-02-09 10:13:41 +08:00
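
A hypothetical sketch of the "count consecutive identical rows" idea behind continuous_agg_count in the commit above: given rows sorted on the key columns, return how many rows starting at a position share the same key, so a caller walking the merged output can jump over the duplicates that the vertical merge iterator filtered out. The names and types are illustrative, not Doris internals.

```cpp
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

using Key = std::pair<int, int>;  // stands in for the (a, b) key columns

// Number of consecutive rows, starting at `pos`, that share the same key.
std::size_t continuous_same_key_count(const std::vector<Key>& keys, std::size_t pos) {
    std::size_t count = 1;
    while (pos + count < keys.size() && keys[pos + count] == keys[pos]) {
        ++count;
    }
    return count;
}

int main() {
    // Keys of the example data: three rows of (1,1) followed by two rows of (2,2).
    std::vector<Key> keys = {{1, 1}, {1, 1}, {1, 1}, {2, 2}, {2, 2}};
    for (std::size_t pos = 0; pos < keys.size();) {
        std::size_t n = continuous_same_key_count(keys, pos);
        std::cout << "group starts at source index " << pos << ", spans " << n << " rows\n";
        pos += n;  // skip the filtered duplicates in one step
    }
    return 0;
}
```
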
d1c6b81140 [Bug](log) add some logs to help find a bug (#16518) 2023-02-08 21:23:02 +08:00
f71fc3291f [Bug](fix) wrong result for right anti join when batch size is low (#16510) 2023-02-08 17:26:19 +08:00
f6a20f844b [fix](hashjoin) join produce blocks with rows larger than batch size: handle join with other conjuncts (#16402) 2023-02-08 14:26:35 +08:00
41947c73eb [Feature](array-function) Support array functions for nested type datev2 and datetimev2 (#16382) 2023-02-08 12:51:07 +08:00
91325e5ca3 [fix](pipeline) incorrect result when disabling sharing hash table (#16476) 2023-02-07 21:25:32 +08:00
9114896178 [DecimalV3](opt) opt the function of decimalv3 to_string logic (#16427) 2023-02-07 13:28:07 +08:00
d390e63a03 [enhancement](stream receiver) make stream receiver exception safe (#16412)
Make the stream receiver exception safe.
Change get_block(block**) to get_block(block*, bool* eos) to unify the stream semantics (see the sketch after this entry).
2023-02-07 12:44:20 +08:00
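
A rough sketch, with placeholder types, of the interface change described in the commit above: instead of get_block(Block**), where a null output doubles as end-of-stream, the receiver fills a caller-provided Block and reports end-of-stream through an explicit eos flag. Block, Status, and the class name are stand-ins, not the exact Doris API.

```cpp
struct Block {};

struct Status {
    static Status OK() { return {}; }
};

class StreamReceiver {
public:
    // Old form for contrast: Status get_block(Block** out);
    Status get_block(Block* out, bool* eos) {
        // Placeholder body: fill `out` from the receive queue; here the queue
        // is always empty, so we only signal end-of-stream.
        (void)out;
        *eos = true;
        return Status::OK();
    }
};

int main() {
    StreamReceiver receiver;
    Block block;
    bool eos = false;
    receiver.get_block(&block, &eos);
    return eos ? 0 : 1;
}
```
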
27216dc7e0 [improvement](multi-catalog) push down all predicates into rowgroup/page filtering for ParquetReader (#16388)
Two improvements:
1. Refactor rowgroup & page filtering in `ParquetReader`, and use the operator overloading of native Doris C++ types to perform comparisons (see the sketch after this entry).
2. Support decimal/decimal v3/date/datev2/datetime/datetimev2
2023-02-07 11:32:57 +08:00
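
A minimal sketch of the row-group/page filtering idea in the commit above: decode the column statistics into a native C++ value type and let that type's overloaded comparison operators do the work, so the same template covers int, decimal, date/datev2, datetime/datetimev2, and so on. The ColumnStats struct and function names are illustrative, not the ParquetReader API.

```cpp
#include <cstdint>
#include <iostream>

template <typename T>
struct ColumnStats {
    T min_value;
    T max_value;
};

// Keep a row group only if the predicate range [low, high] can overlap
// the [min, max] recorded in its statistics.
template <typename T>
bool row_group_may_match(const ColumnStats<T>& stats, const T& low, const T& high) {
    return !(stats.max_value < low || high < stats.min_value);
}

int main() {
    ColumnStats<int64_t> stats{20230101, 20230131};  // e.g. a date-like column
    std::cout << row_group_may_match(stats, int64_t(20230110), int64_t(20230120)) << "\n";  // 1: overlaps
    std::cout << row_group_may_match(stats, int64_t(20230201), int64_t(20230228)) << "\n";  // 0: prunable
    return 0;
}
```
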
91229bb87d [Bug](mark join) Fix mark join with other conjuncts (#16435) 2023-02-07 09:31:41 +08:00
36a5e0a2a9 [bugfix](array) fix element revert on error in DataTypeArray::from_string (#16434)
* fix array from_string element revert on error

* add testcase
2023-02-06 18:27:36 +08:00
b21fdace37 [bugfix](RemoteUDF) fix remote udf return rpc env init error (#16325) 2023-02-06 15:47:10 +08:00
737c73dcf0 [Improvement](topn) order by key topn query optimization (#15663) 2023-02-06 15:36:05 +08:00
f2fd47f238 [Improve](row-store) support row cache (#16263) 2023-02-06 11:16:39 +08:00
b1b2697cc7 [fix](iceberg) fix iceberg catalog (#16372)
1. Fix iceberg catalog access s3
2. Fix iceberg catalog partition table query
3. Fix persistence
2023-02-05 13:15:28 +08:00
09870098af [fix](func) fix core dump when the pattern of the regexp_extract_all function does not contain subpatterns (#16408) 2023-02-05 01:16:54 +08:00
059cf58151 [fix](vertical compaction) fix uint32_t init value (#16377) 2023-02-05 00:05:35 +08:00
dd63897757 [fix](be)the set operation node should accept both nullable and non-nullable data from child node (#16126) 2023-02-04 23:08:59 +08:00
d2b5015d3f [enhancement](profile) add the profile counter RawRowsRead to record the rows read from the parquet file (#16328) 2023-02-04 22:59:34 +08:00
c3a6eb4f9a [Refactor](function) remove useless function get to create column (#16333)
Remove the useless create_column call to reduce unnecessary new operations.
2023-02-04 22:54:14 +08:00
458adf6c91 [improvement](jdbc) refactor jdbc to copy result set by batch (#16337)
Tested reads on a jdbc external table: 10%+ performance improvement after this optimization.
2023-02-04 22:51:55 +08:00
918004c016 [Bug](date) Fix BE crash caused by function datediff (#16397)
* [Bug](date) Fix BE crash caused by function `datediff`

* update
2023-02-04 18:43:23 +08:00
87fbb8341a [Bug](datev2) Fix bug when cast datev2 to date (#16394) 2023-02-03 20:50:16 +08:00
f94a78ab4a [Fix](topn) fix wrong nullable cast for RowId column and use heapsorter for two phase read (#16399)
convert_nullable_flags does not contain nullable info for the RowID column, but valid_column_ids contains the RowID column, so the nullable flag will be undefined for the RowID column (see the sketch after this entry).
2023-02-03 20:49:45 +08:00
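
A hedged sketch of the bug pattern described above, with made-up names: the nullable-conversion flags are built only for the user columns, but the list of column ids to convert also contains the appended RowID column, so indexing the flags with the RowID position reads an element that was never set. Sizing the flag vector to cover every valid column id removes the undefined read.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    std::vector<uint8_t> convert_nullable_flags = {1, 0, 1};  // flags for 3 user columns
    std::vector<std::size_t> valid_column_ids = {0, 1, 2, 3}; // 3 user columns + RowID

    // Fix: make sure every id in valid_column_ids has a defined flag
    // (the extra RowID entry defaults to "not nullable").
    convert_nullable_flags.resize(valid_column_ids.size(), 0);

    for (std::size_t id : valid_column_ids) {
        std::cout << "column " << id << " convert to nullable: "
                  << int(convert_nullable_flags[id]) << "\n";
    }
    return 0;
}
```
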
Pxl
5e4bb98900 [Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290)
enable -Wpedantic and update lowest gcc version to 11.1
2023-02-03 11:28:48 +08:00
7d5a10e1af [bug](function) fix mask_first_n function can't handle const value (#16308) 2023-02-03 10:32:42 +08:00
7a800bd3c6 [fix](scan) coredump caused by null of _scanner_ctx (#16361) 2023-02-03 09:24:15 +08:00
1d8265c5a3 [refactor](row-store) make row store column a hidden column in meta (#16251)
This could simplify the storage engine logic and make the code more readable, and we could analyze
the length of the hidden `__DORIS_ROW_STORE_COL__` column, etc.
2023-02-02 20:56:13 +08:00
Pxl
0d5b115993 [Feature](Materialized-View) support duplicate base column for different aggregate functions (#15837)
Support duplicate base columns for different aggregate functions.
2023-02-02 18:57:39 +08:00
cb6875b5a4 [improvement](multi-catalog) use date/datetimev2 as default col type for catalog table (#16304)
1. When mapping columns from an external data source, use date/datetimev2 as the default type
2. Check `is_cancelled` when reading data, to avoid an endless loop after the query is cancelled (see the sketch after this entry)
2023-02-02 17:35:48 +08:00
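
A hedged sketch of the cancellation check mentioned above: a reader loop that re-checks an is_cancelled flag on every batch, so a cancelled query cannot spin forever waiting for or re-reading data. The types and names are placeholders, not the actual multi-catalog reader code.

```cpp
#include <atomic>

struct ScanContext {
    std::atomic<bool> is_cancelled{false};
};

bool read_next_batch(/* ... */) { return false; }  // stub: no more data

void read_loop(ScanContext& ctx) {
    while (!ctx.is_cancelled.load(std::memory_order_relaxed)) {
        if (!read_next_batch()) {
            break;  // end of data
        }
        // ... hand the batch to the consumer ...
    }
    // if cancelled, fall through and release resources instead of looping
}

int main() {
    ScanContext ctx;
    ctx.is_cancelled.store(true);  // simulate a cancelled query
    read_loop(ctx);                // returns immediately instead of spinning
    return 0;
}
```
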
bb179b77f7 [Feature-WIP](inverted index) support array type for inverted index reader (#16355) 2023-02-02 16:14:14 +08:00
9618427020 [improvement](multi-catalog) increase default batch_size to 4064 (#16326)
The performance of ClickBench Q30 is affected by batch_size:
| batch_size | 1024 | 4096 | 20480 |
| -- | -- | -- | -- |
| Q30 query time (s) | 2.27 | 1.08 | 0.62 |

Because the aggregation operator creates a new result block for each input batch, and Q30 has 90 columns, this is time-consuming. A larger batch_size reduces the number of aggregation blocks, so it improves performance.

The Doris internal reader reads at least 4064 rows even if batch_size < 4064, so this PR keeps the process of reading external tables consistent with internal tables (see the sketch after this entry).
2023-02-02 11:51:09 +08:00
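
A back-of-the-envelope illustration of the reasoning in the commit above: one result block (90 columns here) is created per input batch, so the number of block creations scales with rows / batch_size. The total row count below is a made-up figure purely to show the ratio between the settings.

```cpp
#include <cstdint>
#include <iostream>

int main() {
    const int64_t total_rows = 100'000'000;  // assumed input size, illustration only
    const int64_t columns = 90;              // ClickBench Q30 column count from the note
    for (int64_t batch_size : {1024, 4096, 20480}) {
        int64_t blocks = (total_rows + batch_size - 1) / batch_size;
        std::cout << "batch_size=" << batch_size
                  << " -> ~" << blocks << " result blocks, ~"
                  << blocks * columns << " column allocations\n";
    }
    return 0;
}
```
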
eba70f972e [improvement](global context) remove some unused method from runtime state (#16329)
This is part of #16296.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-02 10:24:55 +08:00
696c6ffcc5 [fix](join) crash caused by canceling query (#16311)
If the query was cancelled, the status in the shared context may be `OK` while other fields are not set (see the sketch after this entry).
2023-02-02 09:55:37 +08:00
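
A hedged sketch of the failure mode described above, with made-up types: after cancellation, the shared context can report an OK status while the fields a consumer expects (here, a shared hash table) were never filled in. Guarding on the cancelled flag, or on the fields themselves, before dereferencing avoids the crash. None of these names are the Doris ones.

```cpp
#include <atomic>
#include <memory>

struct HashTable {};

struct SharedHashTableContext {
    std::atomic<bool> cancelled{false};
    bool status_ok = true;                  // may stay OK even when cancelled
    std::shared_ptr<HashTable> hash_table;  // may stay null when cancelled
};

bool try_use_shared_table(const SharedHashTableContext& ctx) {
    if (ctx.cancelled.load() || !ctx.status_ok || ctx.hash_table == nullptr) {
        return false;  // do not touch half-initialized state
    }
    // ... safe to probe *ctx.hash_table here ...
    return true;
}

int main() {
    SharedHashTableContext ctx;
    ctx.cancelled.store(true);  // simulate cancellation: status stays OK, table stays null
    return try_use_shared_table(ctx) ? 1 : 0;
}
```
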
1c5279d26e [fix](multi-catalog) remove the eof check among parquet columns (#16302)
Read parquet file failed:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]Read parquet file xxx failed, reason = [CORRUPTION]The number of rows are not equal among parquet columns
```
This error may be thrown when reading non-predicate columns in lazy-read. For example:
A row group with 1000 rows has two non-predicate columns.
Column A has one page; Column B has two pages with 500 rows per page.
The read range of `ParquetColumnReader` is [0, 400), and the rows in [0, 450) are all filtered by the predicate columns.
So column A can skip the first page and reach EOF, while column B can also skip the first page but does not reach EOF.
2023-02-02 09:22:09 +08:00
82faa965f5 [Bug](followup) fix datev2 functions (#16330) 2023-02-01 22:38:34 +08:00
b878a7e61e [feature](Load)Support skipping a specific number of lines for csv stream load (#16055)
Support setting the number of lines to skip when stream loading a csv file.

Usage `-H skip_lines:number`:
```
curl --location-trusted -u root: -T test.csv -H skip_lines:5  -XPUT http://127.0.0.1:8030/api/testDb/testTbl/_stream_load
```

The skip line number can also be used in mysql load, as below:
```sql
LOAD DATA
LOCAL
INFILE '${mysql_load_skip_lines}'
INTO TABLE ${tableName}
COLUMNS TERMINATED BY ','
IGNORE 2 LINES
PROPERTIES ("auth" = "root:");
```
2023-02-01 20:42:43 +08:00
bb0d4ba787 [BugFix](sort) use correct agg function when using 2 phase sort for agg table (#16185) 2023-02-01 20:07:43 +08:00
d224624bbe [improvement](session variable)Add enable_file_cache session variable (#16268)
Add an enable_file_cache session variable, so that we can disable the file cache without restarting the BE.
2023-02-01 18:15:03 +08:00
bf16228851 [fix](hashjoin) join produce blocks with rows larger than batch size (#16166)
* [fix](hashjoin) join produce blocks with rows larger than batch size

* fix
2023-02-01 16:02:31 +08:00
aaae1497cd [Refactor](function) opt the exec of function with null column (#16256) 2023-02-01 15:56:31 +08:00
Pxl
ca73c60442 [Chore](build) enable ignored-qualifiers check (#16196)
enable ignored-qualifiers check
2023-02-01 15:15:59 +08:00
Pxl
1b99746355 [Bug](function) enhance esquery error msg && forbid to_quantile_state #16274
Forbid to_quantile_state temporarily to avoid a core dump, while waiting for [Feature] support QuantileState in vectorized engine #15868 to get the ball rolling on the implementation.
2023-02-01 14:06:09 +08:00
ba026b6e99 [datev2](function) make function nullable DEPEND_ON_ARGUMENT (#16159) 2023-02-01 13:57:43 +08:00
dbd1dfb64c [Bug](date) fix BE crash if month_floor's argument is null (#16281) 2023-02-01 12:25:57 +08:00
95d7c2de26 [Refactor](function) Rewrite the function elt (#16287) 2023-02-01 11:17:06 +08:00
934f2de8da [fix](inverted index) fix some bug about fulltext match query with compound conditions (#16226) 2023-01-31 21:34:30 +08:00
ca7eb94f23 [improvement](agg-function) Increase the limit on the maximum number of agg function parameters (#15924) 2023-01-31 21:03:50 +08:00