Commit Graph

3694 Commits

Author SHA1 Message Date
87fbb8341a [Bug](datev2) Fix bug when cast datev2 to date (#16394) 2023-02-03 20:50:16 +08:00
f94a78ab4a [Fix](topn) fix wrong nullable cast for RowId column and use heapsorter for two phase read (#16399)
convert_nullable_flags does not contain nullable info for the RowID column, but valid_column_ids does contain the RowID column, so the nullable flag is undefined for the RowID column.
2023-02-03 20:49:45 +08:00
4df70becb9 [refactor](reader) refactor broker_file_reader to get _client in the constructor (#16021) 2023-02-03 16:51:19 +08:00
Pxl
5e4bb98900 [Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290)
enable -Wpedantic and update lowest gcc version to 11.1
2023-02-03 11:28:48 +08:00
7d5a10e1af [bug](function) fix mask_first_n function can't handle const value (#16308) 2023-02-03 10:32:42 +08:00
545b91f8f7 [bug](jdbc) fix jdbc insert decimalv3 be core dump (#16353) 2023-02-03 10:00:06 +08:00
7a800bd3c6 [fix](scan) coredump caused by null of _scanner_ctx (#16361) 2023-02-03 09:24:15 +08:00
1d8265c5a3 [refactor](row-store) make row store column a hidden column in meta (#16251)
This could simplify the storage engine logic and make the code more readable, and we could analyze
the length of the hidden `__DORIS_ROW_STORE_COL__` column, etc.
2023-02-02 20:56:13 +08:00
6ee0dbfb23 [fix](cooldown) Fix bugs in cooldown single replica files (#16299) 2023-02-02 19:31:26 +08:00
Pxl
0d5b115993 [Feature](Materialized-View) support duplicate base column for different aggregate function (#15837)
support duplicate base column for different aggregate function
2023-02-02 18:57:39 +08:00
cb6875b5a4 [improvement](multi-catalog) use date/datetimev2 as default col type for catalog table (#16304)
1. When mapping columns from an external data source, use date/datetimev2 as the default type
2. Check `is_cancelled` when reading data, to avoid an endless loop after the query is cancelled
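
The second point can be illustrated with a minimal sketch of a read loop. This is not the Doris reader; `read_until_done`, `make_stalled_source`, and the `threading.Event` flag are hypothetical names used only to show why a cancellation check is needed when a source keeps returning without ever signalling end-of-file.

```python
import threading


def read_until_done(next_batch, is_cancelled):
    """Count rows until the source reports EOF (None) or the query is cancelled.

    Hypothetical sketch: without the is_cancelled check, a source that
    never returns None would keep this loop spinning forever.
    """
    rows_read = 0
    while not is_cancelled.is_set():
        batch = next_batch()
        if batch is None:  # EOF
            break
        rows_read += len(batch)
    return rows_read


def make_stalled_source(cancel_flag, cancel_after):
    """Build a source that returns empty batches forever (assumed failure mode).

    It sets cancel_flag on its cancel_after-th call to simulate a user
    cancelling the query while the read is in progress.
    """
    calls = 0

    def next_batch():
        nonlocal calls
        calls += 1
        if calls == cancel_after:
            cancel_flag.set()
        return []  # never signals EOF

    return next_batch


flag = threading.Event()
stalled_rows = read_until_done(make_stalled_source(flag, 3), flag)

batches = iter([[1, 2], [3], None])
finite_rows = read_until_done(lambda: next(batches), threading.Event())
```

The stalled source only terminates because the loop condition consults the flag on every iteration; the finite source terminates through the normal EOF path.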
2023-02-02 17:35:48 +08:00
557159d3ce [feature](JdbcExternalCatalog) support insert data in JdbcExternalCatalog (#16271) 2023-02-02 17:31:33 +08:00
bb179b77f7 [Feature-WIP](inverted index) support array type for inverted index reader (#16355) 2023-02-02 16:14:14 +08:00
9618427020 [improvement](multi-catalog) increase default batch_size to 4064 (#16326)
The performance of ClickBench Q30 is affected by batch_size:
| batch_size | 1024 | 4096 | 20480 |
| -- | -- | -- | -- |
| Q30 query time | 2.27 | 1.08 | 0.62 |

Because the aggregation operator creates a new result block for each batch block, and Q30 has 90 columns, this is time-consuming. A larger batch_size reduces the number of aggregation blocks and therefore improves performance.

Doris's internal reader reads at least 4064 rows even if batch_size < 4064, so this PR keeps the process of reading external tables the same as for internal tables.
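
The relationship between batch_size and the number of result blocks can be sketched as simple arithmetic. The row count of 1,000,000 is an assumed figure for illustration only and is not taken from the benchmark; the column count of 90 and the batch sizes come from the text above. The function names are hypothetical.

```python
def num_blocks(total_rows, batch_size):
    """Number of batch blocks needed to cover total_rows (ceiling division)."""
    return -(-total_rows // batch_size)


def result_column_allocations(total_rows, batch_size, num_columns):
    """Rough proxy for per-block overhead: one column set-up per result block."""
    return num_blocks(total_rows, batch_size) * num_columns


ASSUMED_ROWS = 1_000_000  # illustrative only, not a measured value
Q30_COLUMNS = 90

for batch_size in (1024, 4096, 20480):
    blocks = num_blocks(ASSUMED_ROWS, batch_size)
    allocations = result_column_allocations(ASSUMED_ROWS, batch_size, Q30_COLUMNS)
    print(batch_size, blocks, allocations)
```

Under this assumed row count, going from 1024 to 4096 cuts the number of result blocks from 977 to 245, which is the kind of reduction the commit message credits for the faster query time.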
2023-02-02 11:51:09 +08:00
69f34cd1c3 [fix](load) sequence column do not compare correctly in memtable (#16211) 2023-02-02 11:00:23 +08:00
eba70f972e [improvement](global context) remove some unused method from runtime state (#16329)
This is part of #16296.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-02 10:24:55 +08:00
696c6ffcc5 [fix](join) crash caused by canceling query (#16311)
If the query was canceled,
the status in shared context may be `OK` with other fields not set.
2023-02-02 09:55:37 +08:00
63042a38bd [fix](memtracker) Fix high frequency load slow lock in memtracker (#16244)
The global lock in memtracker gets stuck when bthreads are created frequently.
2023-02-02 09:53:44 +08:00
1c5279d26e [fix](multi-catalog) remove the eof check among parquet columns (#16302)
Read parquet file failed:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]Read parquet file xxx failed, reason = [CORRUPTION]The number of rows are not equal among parquet columns
```
This error may be thrown when reading non-predicate columns in lazy-read mode, for example:
A row group with 1000 rows has two non-predicate columns.
Column A has one page; Column B has two pages with 500 rows in each page.
The read range of `ParquetColumnReader` is [0, 400), and the rows in [0, 450) are all filtered by predicate columns.
So column A can skip the first page and reach EOF, while column B can also skip the first page but does not reach EOF.
2023-02-02 09:22:09 +08:00
aa0837f198 [bugfix](topn) fix topn runtime predicate getting value bug for decimal type (#16331)
* fix topn runtime predicate getting value bug for decimal type

* fix cast_to_string bug for TYPE_DECIMALV2
2023-02-02 09:13:32 +08:00
7c145faa80 [Enhance] use fast_float::from_chars to do str cast to float/double to avoid lose precision (#16190) 2023-02-01 23:53:34 +08:00
82faa965f5 [Bug](followup) fix datev2 functions (#16330) 2023-02-01 22:38:34 +08:00
b878a7e61e [feature](Load)Support skip specific lines number for csv stream load (#16055)
Support setting the number of lines to skip when loading a CSV file via stream load.

Usage `-H skip_lines:number`:
```
curl --location-trusted -u root: -T test.csv -H skip_lines:5  -XPUT http://127.0.0.1:8030/api/testDb/testTbl/_stream_load
```

The number of lines to skip can also be set in MySQL load, as below:
```sql
LOAD DATA
LOCAL
INFILE '${mysql_load_skip_lines}'
INTO TABLE ${tableName}
COLUMNS TERMINATED BY ','
IGNORE 2 LINES
PROPERTIES ("auth" = "root:");
```
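
The effect of skipping leading lines can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the Doris implementation; `read_csv_skipping` is a hypothetical helper, and it counts physical lines, so a quoted field containing a newline in the skipped region would be miscounted.

```python
import csv
import io
from itertools import islice


def read_csv_skipping(text, skip_lines=0, delimiter=","):
    """Parse CSV text after dropping the first skip_lines physical lines.

    Hypothetical helper for illustration; it mirrors the idea behind
    `skip_lines:N` and `IGNORE N LINES`, not the actual loader code.
    """
    lines = io.StringIO(text)
    remaining = islice(lines, skip_lines, None)
    return list(csv.reader(remaining, delimiter=delimiter))


sample = "id,name\n# exported by some tool\n1,a\n2,b\n"
rows = read_csv_skipping(sample, skip_lines=2)
```

With `skip_lines=2` the header and the comment line are dropped and only the two data rows are parsed.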
2023-02-01 20:42:43 +08:00
bb0d4ba787 [BugFix](sort) use correct agg function when using 2 phase sort for agg table (#16185) 2023-02-01 20:07:43 +08:00
d224624bbe [improvement](session variable)Add enable_file_cache session variable (#16268)
Add the enable_file_cache session variable, so that the file cache can be disabled without restarting BE.
2023-02-01 18:15:03 +08:00
bf16228851 [fix](hashjoin) join produce blocks with rows larger than batch size (#16166)
* [fix](hashjoin) join produce blocks with rows larger than batch size

* fix
2023-02-01 16:02:31 +08:00
aaae1497cd [Refactor](function) opt the exec of function with null column (#16256) 2023-02-01 15:56:31 +08:00
Pxl
ca73c60442 [Chore](build) enable ignored-qualifiers check (#16196)
enable ignored-qualifiers check
2023-02-01 15:15:59 +08:00
Pxl
1b99746355 [Bug](function) enhance esquery error msg && forbid to_quantile_state #16274
Temporarily forbid to_quantile_state to avoid a core dump, pending the implementation of [Feature] support QuantileState in vectorized engine #15868.
2023-02-01 14:06:09 +08:00
1c7c6b2f44 [improve](file cache) rename the var QueryContext to QueryFileCacheContext (#16272) 2023-02-01 14:05:00 +08:00
ba026b6e99 [datev2](function) make function nullable DEPEND_ON_ARGUMENT (#16159) 2023-02-01 13:57:43 +08:00
dbd1dfb64c [Bug](date) fix BE crash if month_floor 's argument is null (#16281) 2023-02-01 12:25:57 +08:00
95d7c2de26 [Refactor](function) Rewrite the function elt (#16287) 2023-02-01 11:17:06 +08:00
6470ae58ea [enhancement](config) remove config load_process_max_memory_limit_bytes (#15686) 2023-01-31 21:36:34 +08:00
934f2de8da [fix](inverted index) fix some bug about fulltext match query with compound conditions (#16226) 2023-01-31 21:34:30 +08:00
ca7eb94f23 [improvement](agg-function) Increase the limit maximum number of agg function parameters (#15924) 2023-01-31 21:03:50 +08:00
00a598a839 [feature](cooldown) Decouple storage policy and resource (#15873) 2023-01-31 14:13:47 +08:00
471db80f69 [Bug](date) Fix invalid date (#16205)
Issue Number: close #15777
2023-01-31 10:08:44 +08:00
a7b030778a [fix](sort) fix heap-use-after-free error if sort with limit and is spilled (#16267) 2023-01-31 09:59:03 +08:00
1020fe165b [Bug](function) positive function coredump in decimal (#16230) 2023-01-30 22:17:50 +08:00
75c8670286 [Feature-WIP](inverted index) Filter out and retain predicates that do not support apply by inverted index, and add inverted index regression case (#16167)
1. Filter out and retain predicates that cannot be applied on an inverted index,
    like the `BF` predicate, `IS_NULL` predicate, and `IS_NOT_NULL` predicate.
2. Add an inverted index regression case based on the tpcds_sf1 data set.
2023-01-30 22:16:08 +08:00
9aa0d86fec [fix](olap) Incorrect reserving size for PredicateColumn converted from ColumnDictionary (#16249) 2023-01-30 20:28:22 +08:00
Pxl
322dc2a104 [Bug](function) fix now(int) use_default_implementation_for_nulls && fix dround signature (#16238) 2023-01-30 18:01:26 +08:00
c59a8cb15d [refactor](remove unused code) remove log error hub (#16183)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-30 16:53:56 +08:00
fdc042bb39 [fix](vresultsink) BufferControlBlock may block all fragment handle threads (#16231)
BufferControlBlock may block all fragment handle threads, which leaves the BE unable to work.

Modifications include:

* BufferControlBlock cancels after the max timeout
* StmtExecutor notifies BE to cancel the fragment when something unexpected occurs

For more details, see issue #16203
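
The idea of cancelling after a maximum timeout, rather than blocking forever, can be sketched with a bounded buffer. This is a simplified model under stated assumptions, not the real BufferControlBlock: the class, its methods, and the assumption that producers block while the buffer is full are all hypothetical.

```python
import threading
import time
from collections import deque


class BufferCancelled(Exception):
    """Raised when the buffer has been cancelled (hypothetical error type)."""


class BoundedBuffer:
    """Toy bounded buffer whose producers give up after max_wait_seconds."""

    def __init__(self, capacity, max_wait_seconds):
        self._cond = threading.Condition()
        self._batches = deque()
        self._capacity = capacity
        self._max_wait = max_wait_seconds
        self.cancelled = False

    def add_batch(self, batch):
        deadline = time.monotonic() + self._max_wait
        with self._cond:
            # Wait for free space, but never longer than the deadline.
            while len(self._batches) >= self._capacity and not self.cancelled:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    # Nobody consumed in time: cancel so the thread is released.
                    self.cancelled = True
                    self._cond.notify_all()
                    break
                self._cond.wait(timeout=remaining)
            if self.cancelled:
                raise BufferCancelled("buffer cancelled")
            self._batches.append(batch)
            self._cond.notify_all()

    def get_batch(self):
        with self._cond:
            if not self._batches:
                return None
            batch = self._batches.popleft()
            self._cond.notify_all()  # wake producers waiting for space
            return batch


buf = BoundedBuffer(capacity=1, max_wait_seconds=0.05)
buf.add_batch("first")
try:
    buf.add_batch("second")  # no consumer, so this times out
    timed_out = False
except BufferCancelled:
    timed_out = True
```

In this model a producer thread is returned to its pool once the deadline passes, instead of being held indefinitely by a consumer that never fetches.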
2023-01-30 16:53:21 +08:00
90b12143a3 [refactor](remove unused code) remove runtime tuple structure and useless utils class (#16237) 2023-01-30 16:45:14 +08:00
a9671b6dfd [feature](agg)support two level-hash map in aggregation node (#15967) 2023-01-30 16:43:33 +08:00
4b6a4b3cf7 [refactor](remove unused code) Remove unused mempool declare or function params (#16222)
* Remove unused mempool declare or function params

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-30 13:03:18 +08:00
07d58e531a [improvement](filecache) add profile for file cache (#16223) 2023-01-30 10:46:31 +08:00
28fcc093a8 [improvement](bitshuffle)Enable avx512 support in bitshuffle for performance boost (#15972)
As AVX512 is available in most modern processors, it is worth using when it brings a performance boost.
The latest bitshuffle has added AVX512 support, so we can integrate it into Doris for the AVX512 case.

Tested with the master branch, queries (SSB queries q1.1.sql~q4.3.sql, 13 queries in total) are boosted by 1.4%~3.2% (using run-ssb-queries.sh 5 times, each time with 100 iterations).

Signed-off-by: Wu, Kaiqiang <kaiqiang.wu@intel.com>
Co-authored-by: vesslanjin <jun.i.jin@intel.com>
2023-01-30 10:33:01 +08:00