8501 Commits

Author SHA1 Message Date
5e232a30d8 [fix](planner) Doris returns empty sets when select from a inline view (#16370)
Doris delays the execution of expressions for as long as it can, and the same applies to the expansion of constant expressions. Given the SQL below:

```sql
select i from (select 'abc' as i, sum(birth) as j from  subquerytest2) as tmp
```

The aggregation would be eliminated, since its output is not required by the outer block. However, the expansion of the constant expression would be done in the final result expr, and since the aggregate output has been eliminated, the expansion would actually do nothing, finally producing an empty result.

To fix this, we materialize the result exprs in the inner block for such SQL. This may affect performance, but it is better than letting the system produce a wrong result.
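As a point of reference for the expected behavior, the sketch below runs the same query against an in-memory SQLite database, not against Doris. The table schema (a single integer `birth` column) and the sample values are assumptions made for illustration. An aggregate without `GROUP BY` yields exactly one row, so the outer query should return one `'abc'` row even when the table is empty.

```python
import sqlite3


def select_constant_from_inline_view(rows):
    """Run the query from the commit message on an in-memory SQLite table.

    SQLite is only a stand-in here; the table schema is assumed.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subquerytest2 (birth INTEGER)")
    conn.executemany(
        "INSERT INTO subquerytest2 (birth) VALUES (?)",
        [(value,) for value in rows],
    )
    query = (
        "SELECT i FROM "
        "(SELECT 'abc' AS i, SUM(birth) AS j FROM subquerytest2) AS tmp"
    )
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # One row is expected in both cases, never an empty set.
    print(select_constant_from_inline_view([1990, 2001]))
    print(select_constant_from_inline_view([]))
```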
2023-02-03 21:23:52 +08:00
a5d9aca7ba [test](Nereids) enable G-K and L-Q scalar function regression test cases (#16169)
1. delete invalid signature of nvl function 
2. fix some test cases that failed because of malformed function name
2023-02-03 21:18:43 +08:00
87fbb8341a [Bug](datev2) Fix bug when cast datev2 to date (#16394) 2023-02-03 20:50:16 +08:00
f94a78ab4a [Fix](topn) fix wrong nullable cast for RowId column and use heapsorter for two phase read (#16399)
convert_nullable_flags does not contain nullable info for the RowID column, but valid_column_ids contains the RowID column, so the nullable flag will be undefined for the RowID column.
2023-02-03 20:49:45 +08:00
929b31bd3c [Feature](Nereids) Support CaseWhen with subquery (#16385)
Co-authored-by: jianghaochen <jianghaochen@meituan.com>
2023-02-03 18:20:47 +08:00
3891083474 [fix](Nereids): fix some bugs in DpHyper (#16282) 2023-02-03 18:19:48 +08:00
3f4ca3da32 [Bug](CURRENT_TIMESTAMP) Fix wrong default value after schema change (#16364)
* [Bug](CURRENT_TIMESTAMP) Fix wrong default value after schema change

* update

* update
2023-02-03 17:06:24 +08:00
4df70becb9 [refactor](reader) refactor broker_file_reader to get _client in the constructor (#16021) 2023-02-03 16:51:19 +08:00
13cb81a724 [fix](broker) Fix bug that heavy broker load may fail due to BrokerException which indicates the fd is not owned by client (#16350)
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-02-03 15:06:45 +08:00
6294b29f0a [chore](regression-test) Remove array config in regression test (#16376)
The fe config "enable_array_type" is not used, this commit removes it from regression test.
2023-02-03 14:44:03 +08:00
b1fd124f02 [feature](struct-type/map-type) Add switch for struct and map type for creating table (#16379)
Add switches to forbid users from creating tables with struct or map columns.
2023-02-03 13:46:52 +08:00
dfb610d7ec [fix](nereids) the order exprs in sort node should be slotRef in its tupleDesc (#16363) 2023-02-03 13:28:08 +08:00
a9177569c6 [refactor](Nereids) remove trick datatype code in Expression (#16365)
Since we already do typeCoercion bottom-up in the binding step,
the trick code for dataType in Expression is useless.
This PR tries to remove it.
2023-02-03 13:02:34 +08:00
Pxl
5e4bb98900 [Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290)
enable -Wpedantic and update lowest gcc version to 11.1
2023-02-03 11:28:48 +08:00
7d5a10e1af [bug](function) fix mask_first_n function can't handle const value (#16308) 2023-02-03 10:32:42 +08:00
4fc0715156 [fix](auth)fix external catalog cannot use db (#16269) 2023-02-03 10:10:33 +08:00
545b91f8f7 [bug](jdbc) fix jdbc insert decimalv3 be core dump (#16353) 2023-02-03 10:00:06 +08:00
7a800bd3c6 [fix](scan) coredump caused by null of _scanner_ctx (#16361) 2023-02-03 09:24:15 +08:00
13f74088fa [Improve](row-store) check light schema change enabled (#16358) 2023-02-02 20:57:18 +08:00
1d8265c5a3 [refactor](row-store) make row store column a hidden column in meta (#16251)
This could simplify the storage engine logic and make the code more readable, and we could analyze
the length of the hidden `__DORIS_ROW_STORE_COL__` column, etc.
2023-02-02 20:56:13 +08:00
6ee0dbfb23 [fix](cooldown) Fix bugs in cooldown single replica files (#16299) 2023-02-02 19:31:26 +08:00
Pxl
0d5b115993 [Feature](Materialized-View) support duplicate base column for different aggregate function (#15837)
support duplicate base column for different aggregate function
2023-02-02 18:57:39 +08:00
e31913faca [Feature](Nereids) Support order and limit in subquery (#15971)
1. For compatibility with the old optimizer, the sort and limit in a subquery do not take effect and are simply removed.
```
select * from sub_query_correlated_subquery1 where sub_query_correlated_subquery1.k1 > (select sum(sub_query_correlated_subquery3.k3) a from sub_query_correlated_subquery3 where sub_query_correlated_subquery3.v2 = sub_query_correlated_subquery1.k2 order by a limit 1);
```

2. Adjust the position at which the subquery is unnested, so that the conjuncts in the filter have been optimized before unnesting.

Supported:
```
SELECT DISTINCT k1 FROM sub_query_correlated_subquery1 i1 WHERE ((SELECT count(*) FROM sub_query_correlated_subquery1 WHERE ((k1 = i1.k1) AND (k2 = 2)) or ((k1 = i1.k1) AND (k2 = 1)) )  > 0);
```
The query above can be supported because the shared conjunct `k1 = i1.k1` is extracted from both sides of the `or`, which converts it into the following:
```
SELECT DISTINCT k1 FROM sub_query_correlated_subquery1 i1 WHERE ((SELECT count(*) FROM sub_query_correlated_subquery1 WHERE ((k1 = i1.k1) AND (k2 = 2 or k2 = 1)) )  > 0);
```

Not supported:
```
SELECT DISTINCT k1 FROM sub_query_correlated_subquery1 i1 WHERE ((SELECT count(*) FROM sub_query_correlated_subquery1 WHERE ((k1 = i1.k1) AND (k2 = 2)) or ((k2 = i1.k1) AND (k2 = 1)) )  > 0);
```
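The rewrite behind the supported case is the identity `(A AND B) OR (A AND C) = A AND (B OR C)`. The sketch below shows that extraction on predicates represented as plain strings. The function name and the string representation are hypothetical and are not taken from the Nereids code.

```python
def factor_common_conjuncts(disjuncts):
    """Split OR-ed conjunctions into shared and per-branch predicates.

    `disjuncts` is a list of branches; each branch is a list of predicate
    strings that are AND-ed together. Returns (common, remainders), where
    `common` holds the predicates present in every branch and `remainders`
    holds what is left of each branch after removing them.
    """
    first, rest = disjuncts[0], disjuncts[1:]
    common = [p for p in first if all(p in branch for branch in rest)]
    remainders = [[p for p in branch if p not in common] for branch in disjuncts]
    return common, remainders


if __name__ == "__main__":
    supported = [["k1 = i1.k1", "k2 = 2"], ["k1 = i1.k1", "k2 = 1"]]
    unsupported = [["k1 = i1.k1", "k2 = 2"], ["k2 = i1.k1", "k2 = 1"]]
    print(factor_common_conjuncts(supported))
    print(factor_common_conjuncts(unsupported))
```

In the unsupported query the two branches share no predicate, so nothing can be pulled out of the `or` and the correlated predicates stay inside it.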
2023-02-02 18:17:30 +08:00
cb6875b5a4 [improvement](multi-catalog) use date/datetimev2 as default col type for catalog table (#16304)
1. When mapping columns from an external datasource, use date/datetimev2 as the default type
2. check `is_cancelled` when reading data, to avoid an endless loop after the query is cancelled
2023-02-02 17:35:48 +08:00
557159d3ce [feature](JdbcExternalCatalog) support insert data in JdbcExternalCatalog (#16271) 2023-02-02 17:31:33 +08:00
09abd32957 [fix](test) result order in group-by-costant case is not stable (#16323) 2023-02-02 16:54:01 +08:00
398da44e46 [fix](Nereids) fix bugs in test join5 (#16312)
make bucket-shuffle-join in PhysicalPlanTranslator when property of left child is not enforced
2023-02-02 16:51:45 +08:00
68d2067f51 [improvement](testcase) change order by sql in test_dup_mv_bitmap_hash.groovy to make result stable
change order by sql in test_dup_mv_bitmap_hash.groovy to make result stable
2023-02-02 16:42:58 +08:00
bb179b77f7 [Feature-WIP](inverted index) support array type for inverted index reader (#16355) 2023-02-02 16:14:14 +08:00
a69c0f28ca [typo](doc) revise zh-CN document markdown format in ALTER-SYSTEM-DECOMMISSION-BACKEND (#16221) 2023-02-02 15:42:27 +08:00
a6c1eaf1d8 [refactor] bind slot and function in one rule (#16288)
1. use one rule to bind slot and function and do type coercion to fix type and nullable errors
  a. SUM(a1 + AVG(a2)) when a1 and a2 are TINYINT. Before, the return type was SMALLINT; after this PR, it returns the right type, DOUBLE.
2. fix runtime filter generator bugs: runtime filters were bound to the wrong join conjuncts.
2023-02-02 15:02:32 +08:00
42960ffd08 [typo](docs)fix docs format (#16279) 2023-02-02 14:13:17 +08:00
3b8182ee7e [nereids](nvl) Fix function signature (#16345) 2023-02-02 14:05:51 +08:00
9618427020 [improvement](multi-catalog) increase default batch_size to 4064 (#16326)
The performance of ClickBench Q30 is affected by batch_size:
| batch_size | 1024 | 4096 | 20480 |
| -- | -- | -- | -- |
| Q30 query time | 2.27 | 1.08 | 0.62 |

The aggregation operator creates a new result block for each batch block, and Q30 has 90 columns, so creating each block is time-consuming. A larger batch_size decreases the number of aggregation blocks and therefore improves performance.

The Doris internal reader reads at least 4064 rows even if batch_size < 4064, so this PR keeps the process of reading an external table the same as for an internal table.
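The effect of batch_size on the number of blocks is simple arithmetic, sketched below. The 4064-row minimum comes from the message above; the helper names and the one-million-row total are assumptions chosen for illustration, not ClickBench figures.

```python
import math

# From the commit message: the internal reader reads at least 4064 rows.
INTERNAL_READER_MIN_ROWS = 4064


def effective_batch_size(batch_size, min_rows=INTERNAL_READER_MIN_ROWS):
    """Rows actually read per batch when the reader enforces a minimum."""
    return max(batch_size, min_rows)


def num_batches(total_rows, batch_size):
    """Number of batch blocks needed to cover total_rows."""
    return math.ceil(total_rows / batch_size)


if __name__ == "__main__":
    total_rows = 1_000_000  # hypothetical row count
    for batch_size in (1024, 4096, 20480):
        print(batch_size, num_batches(total_rows, batch_size))
```

Each batch produces one aggregation result block, so fewer batches means fewer allocations of a 90-column block.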
2023-02-02 11:51:09 +08:00
69f34cd1c3 [fix](load) sequence column do not compare correctly in memtable (#16211) 2023-02-02 11:00:23 +08:00
eba70f972e [improvement](global context) remove some unused method from runtime state (#16329)
This is part of #16296.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-02 10:24:55 +08:00
1973b3a86f [test](regression) add tvf regression to test the remove of eof check (#16342)
Add a regression test for #16302. This regression test will fail if an EOF check is added for non-predicate columns.
2023-02-02 10:06:36 +08:00
941e192019 [enhancement](test) add function case date_sub(datetime,INTERVAL dayofmonth(datetime)-1 DAY) (#16306) 2023-02-02 09:56:01 +08:00
696c6ffcc5 [fix](join) crash caused by canceling query (#16311)
If the query was canceled,
the status in shared context may be `OK` with other fields not set.
2023-02-02 09:55:37 +08:00
63042a38bd [fix](memtracker) Fix high frequency load slow lock in memtracker (#16244)
The global lock in memtracker gets stuck when bthreads are created frequently.
2023-02-02 09:53:44 +08:00
06db0c6a91 [fix](iceberg) fix meta persist bug of iceberg catalog (#16344)
PR #16082 forgot to update the GsonUtil for Iceberg Catalog/Database/Table
2023-02-02 09:30:25 +08:00
1c5279d26e [fix](multi-catalog) remove the eof check among parquet columns (#16302)
Read parquet file failed:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]Read parquet file xxx failed, reason = [CORRUPTION]The number of rows are not equal among parquet columns
```
This error may be thrown when reading non-predicate columns in lazy-read mode, for example:
A row group with 1000 rows has two non-predicate columns.
Column A has one page, Column B has two pages with 500 rows for each page.
The read range of `ParquetColumnReader` is [0, 400), and the rows between [0, 450) are all filtered by predicate columns.
So column A can skip the first page and reach the EOF, while column B can also skip the first page but doesn't reach the EOF.
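The mismatch can be reproduced with a toy model of the two columns. This is a minimal sketch that takes the page layout and the "each column skips its first page" step directly from the message above; the function name is hypothetical and nothing here reflects the actual `ParquetColumnReader` code.

```python
def at_eof_after_skipping(page_row_counts, pages_skipped):
    """Return True if skipping whole pages leaves no page left to read."""
    return pages_skipped >= len(page_row_counts)


# Page layouts from the example: one 1000-row page vs. two 500-row pages.
column_a_pages = [1000]
column_b_pages = [500, 500]

# Both columns skip their first page because its rows were filtered out.
a_at_eof = at_eof_after_skipping(column_a_pages, pages_skipped=1)
b_at_eof = at_eof_after_skipping(column_b_pages, pages_skipped=1)

# A check that requires all columns to agree on EOF would fail here,
# even though the file is not corrupt.
eof_states_agree = a_at_eof == b_at_eof
```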
2023-02-02 09:22:09 +08:00
aa0837f198 [bugfix](topn) fix topn runtime predicate getting value bug for decimal type (#16331)
* fix topn runtime predicate getting value bug for decimal type

* fix cast_to_string bug for TYPE_DECIMALV2
2023-02-02 09:13:32 +08:00
c4e1c5c15a [Docs](pipeline) Add doc of pipeline execution engine and remove vectorized-execution-engine (#16310)
Add doc of pipeline execution engine and remove vectorized-execution-engine
2023-02-01 23:57:18 +08:00
7c145faa80 [Enhance] use fast_float::from_chars to do str cast to float/double to avoid losing precision (#16190) 2023-02-01 23:53:34 +08:00
40d9e19e1d [feature-wip](multi-catalog) support iceberg union catalog, and add h… (#16082)
support iceberg unified catalog framework, and add hms and rest catalog for the framework
2023-02-01 22:59:42 +08:00
82faa965f5 [Bug](followup) fix datev2 functions (#16330) 2023-02-01 22:38:34 +08:00
b878a7e61e [feature](Load) Support skipping a specified number of lines for csv stream load (#16055)
Support setting the number of lines to skip when loading a csv file through stream load.

Usage: `-H skip_lines:number`:
```
curl --location-trusted -u root: -T test.csv -H skip_lines:5  -XPUT http://127.0.0.1:8030/api/testDb/testTbl/_stream_load
```

The number of lines to skip can also be set in MySQL load, as below:
```sql
LOAD DATA
LOCAL
INFILE '${mysql_load_skip_lines}'
INTO TABLE ${tableName}
COLUMNS TERMINATED BY ','
IGNORE 2 LINES
PROPERTIES ("auth" = "root:");
```
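The intended semantics of skipping leading lines can be sketched in a few lines of Python. This illustrates the behavior only and is not how Doris implements `skip_lines`; the function name and sample data are hypothetical.

```python
import csv
import io
from itertools import islice


def read_csv_skipping(text, skip_lines):
    """Parse CSV text after dropping the first `skip_lines` physical lines."""
    remaining_lines = islice(io.StringIO(text), skip_lines, None)
    return list(csv.reader(remaining_lines))


if __name__ == "__main__":
    sample = "id,name\n1,alice\n2,bob\n"
    # Skip the header line and keep the data rows.
    print(read_csv_skipping(sample, 1))
```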
2023-02-01 20:42:43 +08:00
bb0d4ba787 [BugFix](sort) use correct agg function when using 2 phase sort for agg table (#16185) 2023-02-01 20:07:43 +08:00
0842aa2947 [Fix](MTMV) Support master and follower change in multi-FE for MTMV (#16149)
Support master and follower change in multi-FE for MTMV

This PR fixes the following issues:

1. Start the MTMV scheduler only on the master node; if the master changes to a follower, the scheduler is stopped.
2. Fix a double meta write.
3. Rename some edit log functions and variables.
4. If an MV has both a PeriodicalJob and an immediate job, and the PeriodicalJob is about to be triggered, the scheduler ignores the immediate job.
5. Fix expired-time bugs, and make sure cleanup happens on all the FEs.
6. Change the cleanerScheduler interval from 1 day to 1 minute.
2023-02-01 20:02:46 +08:00