3725 Commits

Author SHA1 Message Date
ab4c718478 [fix](iceberg) remove s3 default temporary credentials #16543
remove TemporaryAWSCredentialsProvider in global s3 source

Co-authored-by: jinzhe <jinzhe@selectdb.com>
2023-02-09 15:36:35 +08:00
ba4b6aa0c0 [hot-fix](cooldown) Fix unknown module cooldownJob when load fe image #16545 2023-02-09 15:36:01 +08:00
4b093d1ef6 [Bug](point query) when prepared statement used lazyEvaluateRangeLocations should clear bucketSeq2locations to avoid memleak (#16531)
When a JDBC client enables server-side prepared statements, the OlapScanNode is cached and reused for performance. However, each
call to `addScanRangeLocations` adds new items to `bucketSeq2locations`, so `bucketSeq2locations` keeps growing and leaks memory
for as long as the OlapScanNode stays cached in memory.
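
The leak pattern described above can be modeled with a small sketch. The names `CachedScanNode`, `add_scan_range_locations` and `run_prepared_statement` are hypothetical stand-ins, not the Doris FE code; the sketch only assumes that a cached node appends to a per-bucket map on every execution.

```python
from collections import defaultdict


class CachedScanNode:
    """Hypothetical stand-in for a scan node that is cached across executions."""

    def __init__(self, clear_before_reuse: bool):
        self.clear_before_reuse = clear_before_reuse
        # bucket sequence -> list of locations (models bucketSeq2locations)
        self.bucket_seq_to_locations = defaultdict(list)

    def add_scan_range_locations(self, buckets):
        if self.clear_before_reuse:
            # The fix: drop state left over from the previous execution.
            self.bucket_seq_to_locations.clear()
        for seq, location in buckets:
            self.bucket_seq_to_locations[seq].append(location)

    def location_count(self) -> int:
        return sum(len(v) for v in self.bucket_seq_to_locations.values())


def run_prepared_statement(node: CachedScanNode, executions: int) -> int:
    """Reuse one cached node for several executions and report its size."""
    for _ in range(executions):
        node.add_scan_range_locations([(0, "be1"), (1, "be2")])
    return node.location_count()


leaky = run_prepared_statement(CachedScanNode(clear_before_reuse=False), 1000)
fixed = run_prepared_statement(CachedScanNode(clear_before_reuse=True), 1000)
```

Without clearing, the map holds one entry per bucket per execution; with clearing, its size stays bounded by a single execution.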
2023-02-09 14:41:07 +08:00
531616b8ee [Fix](bucket)fix partition with no history data && AutoBucketUtilsTest (#16516)
fix partition with no history data && AutoBucketUtilsTest (#16515)
2023-02-09 10:17:25 +08:00
e1f1386395 [fix](cooldown) Rewrite update cooldown conf (#16488)
Remove the error-prone CooldownJob and use CooldownConfHandler to update each tablet's cooldown conf.
Also fix some cooldown bugs.
2023-02-09 09:12:55 +08:00
d1c6b81140 [Bug](log) add some log to find out bug (#16518) 2023-02-08 21:23:02 +08:00
f0b0eedbc5 [fix](planner)group_concat lost order by info in second phase merge agg (#16479) 2023-02-08 20:48:52 +08:00
a512469537 [fix](planner) cannot process more than one subquery in disjunct (#16506)
Before this PR, Doris could not process SQL like the following:
```sql
CREATE TABLE `test_sq_dj1` (
    `c1` int(11) NULL,
    `c2` int(11) NULL,
    `c3` int(11) NULL
) ENGINE=OLAP
DUPLICATE KEY(`c1`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`c1`) BUCKETS 3
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "in_memory" = "false",
    "storage_format" = "V2",
    "disable_auto_compaction" = "false"
);

CREATE TABLE `test_sq_dj2` (
    `c1` int(11) NULL,
    `c2` int(11) NULL,
    `c3` int(11) NULL
) ENGINE=OLAP
DUPLICATE KEY(`c1`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`c1`) BUCKETS 3
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "in_memory" = "false",
    "storage_format" = "V2",
    "disable_auto_compaction" = "false"
);

insert into test_sq_dj1 values(1, 2, 3), (10, 20, 30), (100, 200, 300);
insert into test_sq_dj2 values(10, 20, 30);

-- core
SELECT * FROM test_sq_dj1 WHERE c1 IN (SELECT c1 FROM test_sq_dj2) OR c1 IN (SELECT c1 FROM test_sq_dj2) OR c1 < 10;

-- invalid slot
SELECT * FROM test_sq_dj1 WHERE c1 IN (SELECT c1 FROM test_sq_dj2) OR c1 IN (SELECT c2 FROM test_sq_dj2) OR c1 < 10;
```

There are two problems:
1. Redundant sub-queries within one conjunct should be removed to avoid generating useless join nodes.
2. When one disjunct contains more than one sub-query, the conjunct that contains the disjunct should be placed at the top node of the set of mark join nodes, and the mark slot should be popped up to the top node.
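
As a worked check of what the two queries should return on the sample rows, here is a minimal sketch that evaluates the predicates directly. The results are derived from plain SQL semantics (the sample data contains no NULLs), not from running Doris, and `dedupe_disjuncts` is a hypothetical helper that only illustrates the idea of dropping a repeated sub-query.

```python
# Rows inserted by the statements above.
test_sq_dj1 = [(1, 2, 3), (10, 20, 30), (100, 200, 300)]
test_sq_dj2 = [(10, 20, 30)]

# Materialize each sub-query once as a set of values.
dj2_c1 = {row[0] for row in test_sq_dj2}
dj2_c2 = {row[1] for row in test_sq_dj2}


def query_same_subquery(rows):
    # c1 IN (SELECT c1 ...) OR c1 IN (SELECT c1 ...) OR c1 < 10
    return [r for r in rows if r[0] in dj2_c1 or r[0] in dj2_c1 or r[0] < 10]


def query_different_subqueries(rows):
    # c1 IN (SELECT c1 ...) OR c1 IN (SELECT c2 ...) OR c1 < 10
    return [r for r in rows if r[0] in dj2_c1 or r[0] in dj2_c2 or r[0] < 10]


def dedupe_disjuncts(disjuncts):
    """Drop repeated disjuncts while keeping their order."""
    return list(dict.fromkeys(disjuncts))
```

Both queries should return the rows `(1, 2, 3)` and `(10, 20, 30)` on this data.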
2023-02-08 18:46:06 +08:00
bb334de00f [enhancement](load) Change transaction limit from global level to db level (#15830)
Add transaction size quota for database

Co-authored-by: wuhangze <wuhangze@jd.com>
2023-02-08 18:04:26 +08:00
666f7096f2 [Fix](multi catalog)(planner) Fix external table statistic collection bug (#16486)
Add index id to column statistic id. Refresh statistic cache after analyze.
2023-02-08 16:51:30 +08:00
b06e6b25c9 [improvement](fuzzy) print fuzzy session variable in FE audit log (#16493)
* [improvement](fuzzy) print fuzzy session variable in FE audit log
2023-02-08 16:38:04 +08:00
e11437d1fe [fix](planner) npe in RewriteBinaryPredicatesRule (#16401)
RewriteBinaryPredicatesRule rewrites expressions like
`cast(A decimal) > decimal` to `A > some_other_bigint`
in order to:
1. push down the rewritten predicate
2. avoid converting column A to decimal

We get the data type of `A` from `expr0.getSrcSlotRef().getColumn().getType()`.
However, when A is the result of a function in a sub-query, this rule is not applicable.
For example:
```sql
select *
from (
    select TIMESTAMPDIFF(MINUTE, startTime, endTime) AS timediff
    from CNC_SliceSate
) T
where timediff > 5.0;
```
In this case, we cannot push the predicate down to OlapScan(CNC_SliceSate) to save effort.
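
A minimal sketch of the guarded rewrite, for the `>` case only. The `Slot` type and `rewrite_greater_than` function are hypothetical and do not mirror the FE classes; using `floor` for the integer bound is an assumption based on the fact that, for an integer `A`, `A > d` holds exactly when `A > floor(d)`.

```python
import math
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Slot:
    """Hypothetical slot: `column_type` is None when the slot comes from a
    function result in a sub-query rather than from a table column."""
    name: str
    column_type: Optional[str]


def rewrite_greater_than(slot: Slot, literal: Decimal) -> Optional[str]:
    """Rewrite `cast(slot as decimal) > literal` to an integer comparison.

    Returns None when the rule does not apply, instead of dereferencing a
    missing source column.
    """
    if slot.column_type not in ("INT", "BIGINT"):
        return None
    # For an integer A, A > d holds exactly when A > floor(d).
    return f"{slot.name} > {math.floor(literal)}"
```

With these definitions, a slot backed by a BIGINT column is rewritten, while a slot such as `timediff` with no source column is left untouched.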
2023-02-08 15:57:35 +08:00
2883f67042 [fix](iceberg) update iceberg docs and add credential properties (#16429)
Update iceberg docs
Add new s3 credential and properties
2023-02-08 13:53:01 +08:00
41947c73eb [Feature](array-function) Support array functions for nested type datev2 and datetimev2 (#16382) 2023-02-08 12:51:07 +08:00
98c741d664 [fix](Nereids): FilterOrSelf shouldn't And all predicates.. (#16491) 2023-02-08 12:42:22 +08:00
583001bd92 [Bug](share hash table) Support shared hash table on Nereids (#16474) 2023-02-08 11:51:27 +08:00
254790c564 [fix](nereids) FE nereids use DateV2Literal instead of 'cast datev2' (#16386)
BE already supports DateV2Literal, so remove the FE code that converts DateV2Literal to a cast to datev2.
2023-02-08 10:51:35 +08:00
81dbed70c2 [fix](Nereids) back off on tpch p1 (#16478)
Adjusting nullable on empty sets should be applied after sub-queries are unnested.
Some functions should propagate nullable when their arguments are datev2 or datetimev2.
Add back the tpch sf0.1 Nereids regression test.
2023-02-08 10:43:13 +08:00
a4c28e6efa [Fix](Nereids) runtime filter cannot generate when expression is cast. (#16120) 2023-02-07 20:28:07 +08:00
1d0fdff98a [Bug](sort) disable 2phase read for sort by expressions exclude slotref (#16460)
```
create table tbl1 (k1 varchar(100), k2 string) distributed by hash(k1) buckets 1 properties("replication_num" = "1");

insert into tbl1 values(1, "alice");

select cast(k1 as INT) as id from tbl1 order by id limit 2;
```

The above query could pass `checkEnableTwoPhaseRead` because the order by element is a SlotRef, but it actually refers to a function call expression.
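
The check can be sketched as follows. This is a simplified, hypothetical model (the `SlotRef` and `FunctionCall` types and the Python function are illustrative only); it assumes the fix amounts to resolving each ORDER BY alias to its underlying expression before testing whether it is a plain column reference.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SlotRef:
    column: str


@dataclass(frozen=True)
class FunctionCall:
    name: str
    argument: str


def check_enable_two_phase_read(order_by: List[str], select_list: Dict[str, object]) -> bool:
    """Allow two-phase read only if every ORDER BY element resolves to a
    plain column reference once aliases are looked up in the select list.
    Names missing from the select list are conservatively rejected."""
    return all(isinstance(select_list.get(name), SlotRef) for name in order_by)
```

For the query above, `id` resolves to a cast over `k1`, so the check fails and two-phase read stays disabled.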
2023-02-07 19:42:54 +08:00
796d51ae2e [enhance](fuzzy)set rewriteOrToInPredicateThreshold=2/10000 in fuzzy mode (#16456)
* set rewriteOrToInPredicateThreshold=2/10000 in fuzzy mode

* fmt
2023-02-07 12:45:27 +08:00
6fdd35a6f2 [enhancement](mpp process) remove unused method and make report process more clear (#16441)
Both update status and open_vectorized_internal call send_report and stop the report thread. Move the update_status code into the open method and remove the unnecessary send_report and stop_report_thread calls.


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-07 12:28:55 +08:00
bed1ab7c19 [Feature](Nereids) Add hint to enable pre-aggregation when scan OLAP table. (#15614)
This PR adds support for the pre-aggregation hint. Users can use /*+PREAGGOPEN*/ to enable pre-aggregation for OLAP tables.
For example:
Suppose we have an aggregate-keys table t (k1 int, k2 int, v1 int sum, v2 int sum). Pre-aggregation can be enabled by a query with the hint: select k1, v1 from t /*+PREAGGOPEN*/.
2023-02-07 11:59:10 +08:00
0b8c6315fb [fix](broker load) Fix hll_hash(null) in broker load report incorrect Exception (#16293)
Co-authored-by: wuhangze <wuhangze@jd.com>
2023-02-07 11:32:20 +08:00
a13beca0de [Fix](load)Use lower case for load column names. #16422
Column names in stream load and broker load were case sensitive; make them case insensitive. This is consistent with queries, because column names in query SQL are case insensitive.
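
A minimal sketch of case-insensitive column matching. The function `map_load_columns` is hypothetical and only illustrates normalizing names to lower case before lookup; it is not the loader's actual code.

```python
def map_load_columns(load_columns, table_columns):
    """Map load column names to table columns, ignoring case.

    Raises KeyError for a load column that matches no table column.
    """
    by_lower = {name.lower(): name for name in table_columns}
    return {col: by_lower[col.lower()] for col in load_columns}
```

For example, load columns `K1` and `City` would match table columns `k1` and `city`.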
2023-02-07 09:18:37 +08:00
dcbcec0775 [regression](fuzzy)fuzzy enable_fold_constant_by_be (#16448)
* [fuzzy](test) fuzzy some session variables stably according to pull_request_id

* fuzzy enable_fold_constant_by_be

---------

Co-authored-by: stephen <hello_stephen@@qq.com>
2023-02-07 09:17:50 +08:00
3334e3f393 [fix](restore) do not set default replication_allocation when restore with property reserve_replica = true (#15562)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-02-06 22:38:03 +08:00
737c73dcf0 [Improvement](topn) order by key topn query optimization (#15663) 2023-02-06 15:36:05 +08:00
719b8ca340 [enhance](Nereids): polish code (#16368) 2023-02-06 12:06:55 +08:00
a17fbe2b4c [fix](MTMV) Use current db to identify the MTMV tasks and jobs (#16419)
SHOW MTMV JOB/TASK listed all jobs and tasks across databases, regardless of the current database.

Now the current database is used to identify the MTMV tasks and jobs. Only a user who has not selected a database can list jobs and tasks across all databases.
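
The visibility rule can be sketched as a simple filter. The function `visible_jobs` and the dictionary layout of a job are hypothetical, chosen only to illustrate the rule stated above.

```python
def visible_jobs(jobs, current_db=None):
    """Return jobs of `current_db`, or every job when no database is selected."""
    if current_db is None:
        return list(jobs)
    return [job for job in jobs if job["db"] == current_db]
```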
2023-02-06 12:03:29 +08:00
dccd04a3ba [fix](fe)predicate is wrongly pushed through CUBE function (#15831) 2023-02-06 11:29:15 +08:00
a390252893 [fix](keywork) add TIME to keyword (#16277) 2023-02-06 11:07:11 +08:00
f940cf4cf6 [fix](multi-catalog) fix recursive get schema cache bug (#16415) 2023-02-06 09:23:07 +08:00
b1b2697cc7 [fix](iceberg) fix iceberg catalog (#16372)
1. Fix iceberg catalog access s3
2. Fix iceberg catalog partition table query
3. Fix persistence
2023-02-05 13:15:28 +08:00
df3a6e2412 [fix](fe)only set column info for slots in sortTupleDesc (#16407) 2023-02-04 23:14:25 +08:00
e5d624ce9c [Enhancement](profile) lazy load profileContent string (#16354)
Sometimes the profileContent of a ProfileElement is very large (more than 30MB), and such a huge string object may cause performance problems for GC. However, it is only used when profile-related RESTful APIs are invoked (such as /profile/{format}/{query_id}, /api/profile and so on), so it should be loaded lazily.
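
The lazy-loading idea can be sketched as follows. The class below is a hypothetical Python model, not the FE implementation; it assumes the expensive string is produced by a `loader` callable and built only on first access.

```python
from functools import cached_property


class ProfileElement:
    """Hypothetical element that builds its large profile text on first use."""

    def __init__(self, query_id, loader):
        self.query_id = query_id
        self._loader = loader  # callable returning the profile text

    @cached_property
    def profile_content(self) -> str:
        # Runs once, on the first access; the result is then cached.
        return self._loader(self.query_id)
```

Elements that are never requested through a profile API never pay the cost of materializing the string.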
2023-02-04 22:53:44 +08:00
458adf6c91 [improvement](jdbc) refator jdbc of copy result set by batch (#16337)
Tested reading from a JDBC external table: 10%+ performance improvement after optimization.
2023-02-04 22:51:55 +08:00
1069d4f91e [Enhancement](Stmt)ShowPartitionsStmt support forward to master #16359
Co-authored-by: duanxujian <duanxujian@jd.com>
2023-02-04 22:51:19 +08:00
1146bde695 [feature-wip](MTMV) Support refresh mtmv (#16218)
Support using this SQL to refresh an MTMV manually. It generates an MTMV task immediately.

```
REFRESH MATERIALIZED VIEW test_mv_view [complete];
```

You can use `show mtmv task` to show the latest task.

In this PR, I also try to clear the MTMV tasks when the MTMV is dropped, so that the test suite behaves correctly.
2023-02-04 20:17:45 +08:00
ad78f313be [Improvement](statistics) show analysis job info (#16305)
Supports querying analysis job info.

syntax:

```SQL
SHOW ANALYZE
    [TABLE | ID]
    [
        WHERE
        [STATE = ["PENDING"|"RUNNING"|"FINISHED"|"FAILED"]]
    ]
    [ORDER BY ...]
    [LIMIT limit];
```
example:

```SQL
SHOW ANALYZE test_table1 WHERE state = 'FINISHED' ORDER BY col_name LIMIT 1;
```

result:

| job_id  | catalog_name | db_name              | tbl_name    | col_name | job_type | analysis_type | message | last_exec_time_in_ms | state    | schedule_type |
| ------ |  ------------ | -------------------- | ----------- | -------- | -------- | ------------- | ------- | -------------------- | -------- | ------------- |
| 10086   | internal     | default_cluster:test | test_table1 | pv       | MANUAL   | FULL          |         | 2023-02-01 09:36:41          | FINISHED | ONCE          |
2023-02-03 23:21:47 +08:00
f443ebfd9a [Improvement](statistics) optimise histogram keyword (#16369) 2023-02-03 23:02:41 +08:00
4f778c38a1 [feature](nereids) support explore 4 phase aggregation (#16298)
Support 4-phase aggregation.
Example:
`select count(distinct k1), sum(k2) from t`
Suppose t.k0 is the distribution key.

We get the plan:
```
Agg(DISTINCT_GLOBAL)
  |
Exchange(Gather)
  |
Agg(DISTINCT_LOCAL)
  |
Agg(GLOBAL)
  |
Exchange(hash distribute by k1)
  |
Agg(LOCAL)
  |
scan
```

Limitations:
1. Only SQL with one distinct aggregate is supported.
   Not supported: `select count(distinct k1), count(distinct k2) from t`
2. Only distinct on a single column is supported.
   Not supported: `select count(distinct k1, k2) from t`
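
The plan above can be simulated on in-memory partitions to see why the four phases produce `count(distinct k1), sum(k2)`. This is a simplified sketch with hypothetical helper names; rows are `(k1, k2)` tuples, and each inner list stands for the data held by one node.

```python
from collections import defaultdict


def agg_by_k1(rows):
    """Group (k1, k2) rows by k1 and sum k2; used for LOCAL and GLOBAL."""
    sums = defaultdict(int)
    for k1, k2 in rows:
        sums[k1] += k2
    return list(sums.items())


def hash_exchange(partitions, n):
    """Redistribute rows so that equal k1 values land on the same node."""
    out = [[] for _ in range(n)]
    for part in partitions:
        for k1, k2 in part:
            out[hash(k1) % n].append((k1, k2))
    return out


def four_phase(partitions, n=2):
    local = [agg_by_k1(p) for p in partitions]      # Agg(LOCAL)
    shuffled = hash_exchange(local, n)              # Exchange(hash distribute by k1)
    merged = [agg_by_k1(p) for p in shuffled]       # Agg(GLOBAL)
    # Agg(DISTINCT_LOCAL): each node counts its k1 groups and sums their k2.
    distinct_local = [(len(p), sum(s for _, s in p)) for p in merged]
    gathered = distinct_local                       # Exchange(Gather)
    # Agg(DISTINCT_GLOBAL): merge the per-node partial results.
    return (sum(c for c, _ in gathered), sum(s for _, s in gathered))
```

Because the hash exchange sends every occurrence of a given `k1` to one node, each node can count its own distinct keys and the final phase only needs to add the partial counts.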
2023-02-03 21:51:10 +08:00
54c85e36ad [Fix](point query) OlapScanNode reuslt could be memleak since it's cached (#16406)
Each time a cached OlapScanNode calls `addScanRangeLocations`, it adds TScanRangeLocations to `result`.
So `result` could grow too large and make `getReplicaNumPerHost` a CPU hot spot in its loop.
2023-02-03 21:42:53 +08:00
5e232a30d8 [fix](planner) Doris returns empty sets when select from a inline view (#16370)
Doris always delays the execution of expressions as long as it can, including the expansion of constant expressions. Given the SQL below:

```sql
select i from (select 'abc' as i, sum(birth) as j from  subquerytest2) as tmp
```

The aggregation would be eliminated, since its output is not required by the outer block, but the expansion of the constant expression would be done in the final result expr. Since the aggregate output has been eliminated, the expansion would actually do nothing, finally causing an empty result.

To fix this, we materialize the result exprs in the inner block for such SQL. This may affect performance, but it is better than letting the system produce a wrong result.
2023-02-03 21:23:52 +08:00
929b31bd3c [Feature](Nereids) Support CaseWhen with subquery (#16385)
Co-authored-by: jianghaochen <jianghaochen@meituan.com>
2023-02-03 18:20:47 +08:00
3891083474 [fix](Nereids): fix some bugs in DpHyper (#16282) 2023-02-03 18:19:48 +08:00
3f4ca3da32 [Bug](CURRENT_TIMESTAMP) Fix wrong default value after schema change (#16364)
* [Bug](CURRENT_TIMESTAMP) Fix wrong default value after schema change

* update

* update
2023-02-03 17:06:24 +08:00
b1fd124f02 [feature](struct-type/map-type) Add switch for struct and map type for creating table (#16379)
Add switches to forbid users from creating tables with struct or map columns.
2023-02-03 13:46:52 +08:00
dfb610d7ec [fix](nereids) the order exprs in sort node should be slotRef in its tupleDesc (#16363) 2023-02-03 13:28:08 +08:00
a9177569c6 [refactor](Nereids) remove trick datatype code in Expression (#16365)
Since we already do type coercion bottom-up in the binding step,
the trick code for dataType in Expression is useless.
This PR tries to remove it.
2023-02-03 13:02:34 +08:00