Commit Graph

8522 Commits

Author SHA1 Message Date
c488e67bd3 [Bug](vectorized)Fix reading date and datetime types conversion error (#16252)
From PR #15612: fixes a type conversion error when reading date and datetime types.



---------

Co-authored-by: wudi <>
2023-02-04 23:05:00 +08:00
d2b5015d3f [enhancement](profile) add the profile counter RawRowsRead to record the rows read from the parquet file (#16328) 2023-02-04 22:59:34 +08:00
1deefd7f72 [typo](docs)Remove doris manager doc (#16278)
* [typo](docs)remove doris manager docs

* fix
2023-02-04 22:54:53 +08:00
c3a6eb4f9a [Refactor](function) remove useless function get to create column (#16333)
Remove the useless create_column function to reduce unnecessary uses of the new operator.
2023-02-04 22:54:14 +08:00
e5d624ce9c [Enhancement](profile) lazy load profileContent string (#16354)
Sometimes the profileContent of a ProfileElement is very large (more than 30MB), and such huge string objects may cause performance problems for GC. They are only used when profile-related RESTful APIs are invoked (such as /profile/{format}/{query_id}, /api/profile and so on), so they should be loaded lazily.
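
The lazy-loading idea can be sketched as follows. This is a minimal Python illustration and not the FE's Java code: the shape of `ProfileElement`, the `loader` callable, and the `profile_content` property are all assumptions made for the example.

```python
class ProfileElement:
    """Hypothetical holder that defers building a large profile string."""

    def __init__(self, query_id, loader):
        self.query_id = query_id
        self._loader = loader      # callable that produces the large string
        self._content = None       # nothing is materialized up front

    @property
    def profile_content(self):
        # Build the string on first access only, then reuse it.
        if self._content is None:
            self._content = self._loader()
        return self._content
```

With this shape, elements whose profile is never requested through an API never pay for the large string. The sketch is not thread-safe; concurrent first accesses would need a lock.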
2023-02-04 22:53:44 +08:00
458adf6c91 [improvement](jdbc) refactor jdbc of copy result set by batch (#16337)
Tested reading from a JDBC external table: 10%+ performance improvement after the optimization.
2023-02-04 22:51:55 +08:00
1069d4f91e [Enhancement](Stmt)ShowPartitionsStmt support forward to master #16359
Co-authored-by: duanxujian <duanxujian@jd.com>
2023-02-04 22:51:19 +08:00
63d57b83f3 [fix](memory) Fix request jemalloc metrics wait lock je_malloc_mutex_lock_slow #16381
MetricRegistry::trigger_all_hooks holds the metrics lock and gets stuck in get_je_metrics, while to_prometheus waits for MetricRegistry::trigger_all_hooks to release the lock. To avoid this, get_je_metrics is no longer called in MetricRegistry::trigger_all_hooks.
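
The general pattern, keeping a potentially blocking collector out of the section guarded by the registry lock, might look roughly like this. It is a Python sketch under assumptions, not the BE's C++ code: the `MetricRegistry` class here, `refresh_slow_metric`, and the `collect` callable are hypothetical names.

```python
import threading


class MetricRegistry:
    """Hypothetical registry; only cheap hooks run while the lock is held."""

    def __init__(self):
        self._lock = threading.Lock()
        self._hooks = []
        self._values = {}

    def register_hook(self, hook):
        with self._lock:
            self._hooks.append(hook)

    def set_value(self, name, value):
        with self._lock:
            self._values[name] = value

    def trigger_all_hooks(self):
        # Hooks update the values in place; the lock is held only briefly.
        with self._lock:
            for hook in self._hooks:
                hook(self._values)

    def to_prometheus(self):
        with self._lock:
            return "\n".join(f"{k} {v}" for k, v in sorted(self._values.items()))


def refresh_slow_metric(registry, collect):
    # The potentially blocking collection runs without the registry lock.
    value = collect()
    # The lock is taken only to publish the result.
    registry.set_value("slow_metric", value)
```

Because the slow collector never runs under the lock, a stalled collection cannot block `to_prometheus`.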
2023-02-04 22:49:22 +08:00
bd8ef4edeb [fix](cooldown) Fix core in remove_all_remote_rowsets (#16374) 2023-02-04 22:31:38 +08:00
1473a9716b [fix](cooldown) Fix bug in report tablet (#16414) 2023-02-04 22:30:57 +08:00
1146bde695 [feature-wip](MTMV) Support refresh mtmv (#16218)
Support using this SQL to refresh an MTMV manually. It generates an MTMV task immediately.

```
REFRESH MATERIALIZED VIEW test_mv_view [complete];
```

You can use `show mtmv task` to show the latest task.

This PR also clears the MTMV tasks when the MTMV is dropped, so that the test suite behaves correctly.
2023-02-04 20:17:45 +08:00
60386a46a6 fix ADMIN-CHECK-TABLET typo (#16389) 2023-02-04 18:44:08 +08:00
918004c016 [Bug](date) Fix BE crash caused by function datediff (#16397)
* [Bug](date) Fix BE crash caused by function `datediff`

* update
2023-02-04 18:43:23 +08:00
712fa8c538 [typo](docs) Fixed some display errors caused by MD syntax errors (#16395) 2023-02-04 18:12:05 +08:00
ad78f313be [Improvement](statistics) show analysis job info (#16305)
Supports querying analysis job info.

syntax:

```SQL
SHOW ANALYZE
    [TABLE | ID]
    [
        WHERE
        [STATE = ["PENDING"|"RUNNING"|"FINISHED"|"FAILED"]]
    ]
    [ORDER BY ...]
    [LIMIT limit];
```
example:

```SQL
SHOW ANALYZE test_table1 WHERE state = 'FINISHED' ORDER BY col_name LIMIT 1;
```

result:

| job_id  | catalog_name | db_name              | tbl_name    | col_name | job_type | analysis_type | message | last_exec_time_in_ms | state    | schedule_type |
| ------ |  ------------ | -------------------- | ----------- | -------- | -------- | ------------- | ------- | -------------------- | -------- | ------------- |
| 10086   | internal     | default_cluster:test | test_table1 | pv       | MANUAL   | FULL          |         | 2023-02-01 09:36:41          | FINISHED | ONCE          |
2023-02-03 23:21:47 +08:00
f443ebfd9a [Improvement](statistics) optimise histogram keyword (#16369) 2023-02-03 23:02:41 +08:00
125b60b4b9 [improvement](compatibility) add DATA_TYPE in information schema for new types #16391
Add DATA_TYPE in information schema for the types datev2, datetimev2, decimal, and jsonb. It was 'unknown' for these types, which caused problems for tools such as BI tools that use the information schema.
2023-02-03 22:28:42 +08:00
b621d1d68a [docs](docs) update en docs (#16257) 2023-02-03 22:01:43 +08:00
4f778c38a1 [feature](nereids) support explore 4 phase aggregation (#16298)
Support 4-phase aggregation.
Example:
`select count(distinct k1), sum(k2) from t`
Suppose t.k0 is the distribution key.

We get the plan:
```
Agg(DISTINCT_GLOBAL)
   |
Exchange(Gather)
  |
Agg(DISTINCT_LOCAL)
  |
Agg(GLOBAL)
  |
Exchange(hash distribute by k1)
 |
Agg(LOCAL) 
 |
scan
```

Limitations:
1. Only SQL with a single distinct aggregate is supported.
Not supported: `select count(distinct k1), count(distinct k2) from t`
2. The distinct must be over a single column.
Not supported: `select count(distinct k1, k2) from t`
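
The four phases of the plan can be simulated for `select count(distinct k1), sum(k2) from t`. This is a rough Python sketch, not Nereids code: the function names are hypothetical, rows are `(k1, k2)` tuples, each inner list stands for the data scanned on one node, and NULL handling is ignored.

```python
from collections import defaultdict


def agg_by_k1(pairs):
    """Group (k1, value) pairs by k1 and sum the values (Agg LOCAL and GLOBAL)."""
    sums = defaultdict(int)
    for k1, value in pairs:
        sums[k1] += value
    return dict(sums)


def hash_exchange(partials, num_nodes):
    """Exchange(hash distribute by k1): route each k1 group to exactly one node."""
    buckets = [[] for _ in range(num_nodes)]
    for partial in partials:
        for k1, partial_sum in partial.items():
            buckets[hash(k1) % num_nodes].append((k1, partial_sum))
    return buckets


def four_phase(rows_per_node):
    num_nodes = len(rows_per_node)
    # Phase 1, Agg(LOCAL): partial sum(k2) per k1 on each node.
    local = [agg_by_k1(rows) for rows in rows_per_node]
    # Phase 2, Agg(GLOBAL): merge partial sums after the hash exchange.
    merged = [agg_by_k1(bucket) for bucket in hash_exchange(local, num_nodes)]
    # Phase 3, Agg(DISTINCT_LOCAL): count the k1 groups and add up their sums.
    distinct_local = [(len(groups), sum(groups.values())) for groups in merged]
    # Phase 4, Exchange(Gather) + Agg(DISTINCT_GLOBAL): combine per-node results.
    return (sum(c for c, _ in distinct_local), sum(s for _, s in distinct_local))
```

After the hash exchange every k1 value lives on exactly one node, which is why counting groups per node in phase 3 and adding the counts in phase 4 does not double-count.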
2023-02-03 21:51:10 +08:00
56be2e5a1a [bugfix](disk balance) fix new rowset time check when add tablet (#16261)
In the disk balancer, if a tablet is under highly concurrent load,
the creation time of the new rowset (which uses the current time) may be
the same as that of the newest rowset. When adding the tablet, a creation
time check requires that the new time be greater than the old time, so the
disk balancer fails many times and the tablet loses many versions, because
migration blocks writes.
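
The failure mode can be illustrated with a small Python sketch. It assumes creation times are stored at whole-second granularity, which the commit message does not state; `can_add_rowset` is a hypothetical stand-in for the check, and the sketch does not show how the PR resolves it.

```python
def can_add_rowset(newest_creation_time, new_creation_time):
    """Strict check as described: the new time must be greater than the old one."""
    return new_creation_time > newest_creation_time


# Two rowsets created within the same second truncate to the same timestamp
# (assumed second granularity), so the strict comparison rejects the new one.
newest = int(1675430977.2)
same_second = int(1675430977.9)
next_second = int(1675430978.1)
rejected = not can_add_rowset(newest, same_second)
accepted = can_add_rowset(newest, next_second)
```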
2023-02-03 21:49:37 +08:00
54c85e36ad [Fix](point query) OlapScanNode result could leak memory since it's cached (#16406)
Each call to `addScanRangeLocations` on a cached OlapScanNode adds TScanRangeLocations to `result`.
So `result` could grow too large and turn the loop in `getReplicaNumPerHost` into a CPU hot spot.
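
The growth pattern is easy to reproduce in a small Python sketch. `CachedScanNode` and its `reset` flag are hypothetical; clearing the list before refilling is one plausible remedy and is not necessarily what the PR does.

```python
class CachedScanNode:
    """Hypothetical cached node that accumulates scan range locations."""

    def __init__(self):
        self.result = []

    def add_scan_range_locations(self, locations, reset=False):
        # Without a reset, every reuse of the cached node appends again,
        # so the list grows with the number of executions.
        if reset:
            self.result.clear()
        self.result.extend(locations)


leaky = CachedScanNode()
bounded = CachedScanNode()
for _ in range(3):
    leaky.add_scan_range_locations(["loc_a", "loc_b"])
    bounded.add_scan_range_locations(["loc_a", "loc_b"], reset=True)
```

After three executions `leaky.result` holds six entries while `bounded.result` still holds two, so any loop over the list stays proportional to one execution's locations.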
2023-02-03 21:42:53 +08:00
5e232a30d8 [fix](planner) Doris returns empty sets when select from an inline view (#16370)
Doris always delays the execution of expressions for as long as it can, including the expansion of constant expressions. Given the SQL below:

```sql
select i from (select 'abc' as i, sum(birth) as j from  subquerytest2) as tmp
```

The aggregation would be eliminated, since its output is not required by the outer block, but the expansion of the constant expression would be done in the final result expr. Since the aggregate output has been eliminated, the expansion would actually do nothing, which finally causes an empty result.

To fix this, we materialize the result exprs in the inner block for such SQL. This may affect performance, but it is better than letting the system produce a wrong result.
2023-02-03 21:23:52 +08:00
a5d9aca7ba [test](Nereids) enable G-K and L-Q scalar function regression test cases (#16169)
1. delete invalid signature of nvl function 
2. fix some test cases that failed because of malformed function name
2023-02-03 21:18:43 +08:00
87fbb8341a [Bug](datev2) Fix bug when cast datev2 to date (#16394) 2023-02-03 20:50:16 +08:00
f94a78ab4a [Fix](topn) fix wrong nullable cast for RowId column and use heapsorter for two phase read (#16399)
convert_nullable_flags does not contain nullable info for the RowID column, but valid_column_ids contains the RowID column, so the nullable flag is undefined for the RowID column.
2023-02-03 20:49:45 +08:00
929b31bd3c [Feature](Nereids) Support CaseWhen with subquery (#16385)
Co-authored-by: jianghaochen <jianghaochen@meituan.com>
2023-02-03 18:20:47 +08:00
3891083474 [fix](Nereids): fix some bugs in DpHyper (#16282) 2023-02-03 18:19:48 +08:00
3f4ca3da32 [Bug](CURRENT_TIMESTAMP) Fix wrong default value after schema change (#16364)
* [Bug](CURRENT_TIMESTAMP) Fix wrong default value after schema change

* update

* update
2023-02-03 17:06:24 +08:00
4df70becb9 [refactor](reader) refactor broker_file_reader to get _client in the constructor (#16021) 2023-02-03 16:51:19 +08:00
13cb81a724 [fix](broker) Fix bug that heavy broker load may fail due to a BrokerException which indicates the fd is not owned by client (#16350)
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-02-03 15:06:45 +08:00
6294b29f0a [chore](regression-test) Remove array config in regression test (#16376)
The FE config "enable_array_type" is not used, so this commit removes it from the regression test.
2023-02-03 14:44:03 +08:00
b1fd124f02 [feature](struct-type/map-type) Add switch for struct and map type for creating table (#16379)
Add switches to forbid users from creating tables with struct or map columns.
2023-02-03 13:46:52 +08:00
dfb610d7ec [fix](nereids) the order exprs in sort node should be slotRef in its tupleDesc (#16363) 2023-02-03 13:28:08 +08:00
a9177569c6 [refactor](Nereids) remove trick datatype code in Expression (#16365)
Since we already do type coercion bottom-up in the binding step,
the trick code for dataType in Expression is useless.
This PR removes it.
2023-02-03 13:02:34 +08:00
Pxl
5e4bb98900 [Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290)
enable -Wpedantic and update lowest gcc version to 11.1
2023-02-03 11:28:48 +08:00
7d5a10e1af [bug](function) fix mask_first_n function can't handle const value (#16308) 2023-02-03 10:32:42 +08:00
4fc0715156 [fix](auth)fix external catalog cannot use db (#16269) 2023-02-03 10:10:33 +08:00
545b91f8f7 [bug](jdbc) fix jdbc insert decimalv3 be core dump (#16353) 2023-02-03 10:00:06 +08:00
7a800bd3c6 [fix](scan) coredump caused by null of _scanner_ctx (#16361) 2023-02-03 09:24:15 +08:00
13f74088fa [Improve](row-store) check light schema change enabled (#16358) 2023-02-02 20:57:18 +08:00
1d8265c5a3 [refactor](row-store) make row store column a hidden column in meta (#16251)
This could simplify the storage engine logic and make the code more readable, and we could analyze
the length of the hidden `__DORIS_ROW_STORE_COL__` column, etc.
2023-02-02 20:56:13 +08:00
6ee0dbfb23 [fix](cooldown) Fix bugs in cooldown single replica files (#16299) 2023-02-02 19:31:26 +08:00
Pxl
0d5b115993 [Feature](Materialized-View) support duplicate base column for different aggregate function (#15837)
support duplicate base column for different aggregate function
2023-02-02 18:57:39 +08:00
e31913faca [Feature](Nereids) Support order and limit in subquery (#15971)
1. For compatibility with the old optimizer, the sort and limit in the subquery do not take effect; they are simply removed.
```
select * from sub_query_correlated_subquery1 where sub_query_correlated_subquery1.k1 > (select sum(sub_query_correlated_subquery3.k3) a from sub_query_correlated_subquery3 where sub_query_correlated_subquery3.v2 = sub_query_correlated_subquery1.k2 order by a limit 1);
```

2. Adjust the unnesting position of the subquery so that the conjuncts in the filter have already been optimized before unnesting.

Supported:
```
SELECT DISTINCT k1 FROM sub_query_correlated_subquery1 i1 WHERE ((SELECT count(*) FROM sub_query_correlated_subquery1 WHERE ((k1 = i1.k1) AND (k2 = 2)) or ((k1 = i1.k1) AND (k2 = 1)) )  > 0);
```
The above is supported because the predicate is simplified by extracting the common conjunct, which converts it into the following:
```
SELECT DISTINCT k1 FROM sub_query_correlated_subquery1 i1 WHERE ((SELECT count(*) FROM sub_query_correlated_subquery1 WHERE ((k1 = i1.k1) AND (k2 = 2 or k2 = 1)) )  > 0);
```

Not supported:
```
SELECT DISTINCT k1 FROM sub_query_correlated_subquery1 i1 WHERE ((SELECT count(*) FROM sub_query_correlated_subquery1 WHERE ((k1 = i1.k1) AND (k2 = 2)) or ((k2 = i1.k1) AND (k2 = 1)) )  > 0);
```
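
The difference between the supported and unsupported predicates comes down to whether the disjuncts share a common conjunct. A minimal Python sketch of that factoring step, with conjuncts modeled as plain strings and hypothetical function names (this is not the Nereids rewrite rule itself):

```python
def factor_common(disjuncts):
    """Split a non-empty list of conjunct lists into (common, remainders)."""
    common = [c for c in disjuncts[0] if all(c in d for d in disjuncts[1:])]
    remainders = [[c for c in d if c not in common] for d in disjuncts]
    return common, remainders


def render(disjuncts):
    """Rewrite (A AND B) OR (A AND C) as A AND ((B) OR (C)) when possible."""
    common, remainders = factor_common(disjuncts)
    if not common:
        # Nothing shared across the disjuncts: the predicate stays as it is.
        return " OR ".join("(" + " AND ".join(d) + ")" for d in disjuncts)
    if any(not r for r in remainders):
        # One disjunct is exactly the common part, so the OR adds nothing.
        return " AND ".join(common)
    rest = " OR ".join("(" + " AND ".join(r) + ")" for r in remainders)
    return " AND ".join(common) + " AND (" + rest + ")"
```

For the supported query, `k1 = i1.k1` appears in both disjuncts and is pulled out, leaving a correlated equality at the top level. In the unsupported query the two disjuncts have no conjunct in common, so the disjunction is left unchanged.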
2023-02-02 18:17:30 +08:00
cb6875b5a4 [improvement](multi-catalog) use date/datetimev2 as default col type for catalog table (#16304)
1. When mapping columns from an external data source, use date/datetimev2 as the default type
2. Check `is_cancelled` when reading data, to avoid an endless loop after the query is cancelled
2023-02-02 17:35:48 +08:00
557159d3ce [feature](JdbcExternalCatalog) support insert data in JdbcExternalCatalog (#16271) 2023-02-02 17:31:33 +08:00
09abd32957 [fix](test) result order in group-by-constant case is not stable (#16323) 2023-02-02 16:54:01 +08:00
398da44e46 [fix](Nereids) fix bugs in test join5 (#16312)
Generate a bucket shuffle join in PhysicalPlanTranslator when the property of the left child is not enforced
2023-02-02 16:51:45 +08:00
68d2067f51 [improvement](testcase) change order by sql in test_dup_mv_bitmap_hash.groovy to make result stable
change order by sql in test_dup_mv_bitmap_hash.groovy to make result stable
2023-02-02 16:42:58 +08:00
bb179b77f7 [Feature-WIP](inverted index) support array type for inverted index reader (#16355) 2023-02-02 16:14:14 +08:00