Commit Graph

8289 Commits

Author SHA1 Message Date
54e68fe250 [feature](cooldown)add ut for CooldownConfHandler (#17007)
* add ut for CooldownConfHandler

* add ut for CooldownConfHandler

* add ut for CooldownConfHandler
2023-02-24 17:06:55 +08:00
Pxl
0691586eb7 [Chore](regression-test) add createMV action && add some mv case from fe ut MaterializedViewFunctionTest (#16825)
1. add createMV action
2. add some mv case from fe ut MaterializedViewFunctionTest
3. reduce mv scheduler interval time from 10s to 0.3s
2023-02-24 16:35:37 +08:00
cf5bc9594b [fix](planner) conjuncts of the outer query block didn't work when it's on the results expr of inline view (#17036)
Here is a cases:

select id, name
from (select '123' as id, '1234' as name, age from test_insert ) a
where name != '1234';
2023-02-24 15:27:34 +08:00
c39914c0a0 [feature](partition)add default list partition (#15509)
This pr implements the list default partition referred in related #15507.
It's similar as GreenPlum's default's partition which would store all data not satisfying prior partition key's
constraints and optimizer wouldn't filter default partition which means default partition would be scanned
each time you try to select data from one table with default partition.

User could either create one table with default partition or alter add one default partition.

```sql
PARTITION LIST(key) {
PARTITION p1 values in (xx,xx),
PARTITION DEFAULT
}

ALTER TABLE XXX ADD PARTITION DEFAULT
```

We don't support automatically migrate data inside default partition which meets newly added partition key's
constraint to newly add partition when alter add new partition. User should select default partition using new 
constraints as predicate and insert them to new partition.

```sql
insert into tbl select * from tbl partition default where partition_key=xx;
```
2023-02-24 15:24:59 +08:00
479d57df88 [fix](planner) the project expr should be calculated in join node in some case (#17035)
Consider the sql bellow:

select sum(cc.qlnm) as qlnm
FROM
  outerjoin_A
  left join (SELECT
      outerjoin_B.b,
      coalesce(outerjoin_C.c, 0) AS qlnm
    FROM
      outerjoin_B
      inner JOIN outerjoin_C ON outerjoin_B.b = outerjoin_C.c
  ) cc on outerjoin_A.a = cc.b
group by outerjoin_A.a;

The coalesce(outerjoin_C.c, 0) was calculated in the agg node, which is wrong.
This pr correct this, and the expr is calculated in the inner join node now.
2023-02-24 15:20:05 +08:00
d562428b1d [enhancement](memory) reduce memory usage for failed broker loads (#16974)
Reduce more memory usage for failed broker load msg in fe after pr  #15895
2023-02-24 12:07:02 +08:00
c3538ca804 [Enhancement](HttpServer) Add http interface authentication (#16571)
1. Organize http documents
2. Add http interface authentication for FE
3. Support https interface for FE
4. Provide authentication interface
5. Add http interface authentication for BE
6. Support https interface for BE
2023-02-24 10:59:33 +08:00
a12b3c3f0c [fix](alter inverted index) fix incorrect CreateTime of 'show alter' query result after fe restart (#17043)
For add or drop inverted index, when replay the logModifyTableAddOrDropInvertedIndices will new a schema change job, that has a new CreateTime, here should new a schema change job when not replay log.
2023-02-24 10:25:48 +08:00
7229751bd9 [Improve](map-type) Add contains_null for map (#16948)
Add contains_null for map type.
2023-02-23 20:47:26 +08:00
92ecd16573 (feature)[DOE]Support array for Doris on ES (#16941)
* (feature)[DOE]Support array for Doris on ES
2023-02-23 19:31:18 +08:00
48fd528a2b [feature](Nereids) Add hint NTH_OPTIMIZED_PLAN to let the optimzier select n-th optimized plan (#16992)
Add hint NTH_OPTIMIZED_PLAN to let the optimzier can select n-th optimized plan. For example, you could use,

select /*+SET_VAR("nth_optimized_plan"=2) */ * from table;

to select the second-best plan in the optimizer.
2023-02-23 18:56:51 +08:00
edead494cb [Enhancement](storage) add a new hidden column __DORIS_VERSION_COL__ for unique key table (#16509) 2023-02-23 15:47:17 +08:00
3ea6478ba8 [feature](multi-catalog) parquet reader support nested array column (#16961)
Support to decode nested array column in parquet reader:
1. FE should generate the right nested column type. FE doesn't check the nesting depth and legality, like map\<array\<int\>, int\>.
2. `ParquetColumnReader` has removed the filtering of page index to support nested array type.
    It's too difficult to skip values in nested complex  types. Maybe we should support the filtering of page index and lazy read in later PR.
3. `ExternalFileScanNode` has a bug in creating default value expression.
4. Maybe it's slow to read repetition levels in a while loop. I'll optimize this in next PR.
5. Array column has temporary `SchemaElement` in its thrift definition,
we have removed them and keep its parent in former implementation.
The remaining parent should inherit the repetition and definition level of its child.
2023-02-23 14:54:58 +08:00
c2cc75d741 [BugFix](Jdbc Catalog) Fix null pointer exception in JdbcExecutor (#16958)
This pr do two things:
1. fix: 
    It use `column[0]` to judge class type in JdbcExecutor, but column[0] may be null !

2. Enhencement
    In the original logic, all fields in jdbc catalog table will be set Nullable.
    However, it is inefficient for nullable fields. Actually, we can know if the fields in data source table
    is nullable through jdbc. So we can set the corresponding fields in Doris jdbc catalog to nullable or not.
2023-02-23 14:04:54 +08:00
51bbae27b8 [feature-wip](iceberg) add dlf and glue catalog impl for iceberg catalog (#16602)
iceberg catalog supports
DLF on Alibaba Cloud and AWS Glue Catalog
2023-02-23 14:02:41 +08:00
bc619ce5be [Fix](load)Pass hidden column to load columns (#17004)
The LoadScanProvider doesn't get Hidden Columns from stream load parameter.
This may cause stream load delete operation fail. This pr is to pass the hidden columns to LoadScanProvider.
2023-02-23 13:54:36 +08:00
a9fb47a80a [fix](planner) create view init bug (#16890)
the body of create view stmt is parsed twice.
in the second parse, we get sql string from CreateViewStmt.viewDefStmt.toSql() function, which missed selectlist.
2023-02-22 20:40:08 +08:00
df2f248712 [feature](planner) add dayofweek for FEFunctions to support fold constant (#16993)
add dayofweek for FEFunctions to support fold constant. use Zellar algorithm
2023-02-22 20:27:49 +08:00
7aa063c1f3 [fix](planner) bucket shuffle join is not recognized if the first table is a subquery (#16985)
consider sql select *
from
(select * from test_1) a
inner join
(select * from test_2) b
on a.id = b.id
inner join
(select * from test_3) c
on a.id = c.id

Because a.id is from a subquery, to find its source table, need use function getSrcSlotRef().
2023-02-22 20:23:00 +08:00
4c92730c3a [fix](planner)fix multi partition support datetime column #16759 2023-02-22 19:38:42 +08:00
dc3dab5a23 [vectorized](jdbc) fix jdbc connect sql server error (#16929) 2023-02-22 19:36:27 +08:00
12b6786522 [fix](hive) fix unable to specify user to access hdfs (#16999)
In version 1.2.1, user can set `"hadoop.username" = "xxx"` to specify a remote user to access hdfs
when creating hive catalog.
But in version 1.2.2, we upgrade the hadoop version from 2.8 to 3.3, some behavior changed and the
user specified remote user is useless.

This PR try to fix this by using `UserGroupInformation` to delegate.
2023-02-22 19:35:40 +08:00
7956800df7 [refactor](Nereids) let type coercion same with legacy planner (#16844)
- change for Nereids
1. add a variable length parameter to the ctor of Count for a good error reporting of Count(a, b)
2. refactor StringRegexPredicate, let it inherit from ScalarFunction
3. remove useless class TypeCollection
4. use catalog.Type.Collection to check expression arguments type
5. change type coercion for TimestampArithmetic, divide, integral divide, comparison predicate, case when and in predicate. Let them same as legacy planner.

- change for legacy planner
1. change the common type of floating and Decimal from Decimal to Double
2023-02-22 17:29:37 +08:00
66ceab540a [fix](replica) Fix inconsistent replica id between BE and FE in corner case of tablet rebalance (#16889) 2023-02-22 16:21:11 +08:00
76ef4af29d [fix](alter inverted index) fix write edit log in replaymodifyTableAddOrDropInvertedIndices function (#16977)
Actually, when modifyTableAddOrDropInvertedIndices, no need write logAlterJob edit log, because write logModifyTableAddOrDropInvertedIndices is enough
2023-02-21 22:36:56 +08:00
0de8f90a83 [enhancement](nereids) add a session variable to control join reorder algorithm (#16783)
1. disable join reorder in nereids if session variable disable_join_reorder is true.
2. add a session variable max_table_count_use_cascades_join_reorder to control join reorder algorithm in nereids. if dp hyper is used only when enable_dphyp_optimizer is true and the joined table count more than max_table_count_use_cascades_join_reorder, which default value is 10.
2023-02-21 21:08:39 +08:00
54bf40b6e7 [feature](Nereids): Eliminate duplicate join condition. (#16910) 2023-02-21 19:40:44 +08:00
a95f47ac0a [ehancement](planner) Support filter the output of set operation node (#16666) 2023-02-21 19:22:09 +08:00
ed05f3b480 [regression-test](fuzzy) fuzzy session variable batch_size (#16384) 2023-02-21 17:53:19 +08:00
cc839aead7 [fix](Nereids) fix signatures of some window functions (#16871)
change signatures of lead(), lag(), first_value(), last_value() to be equal with legacy optimizer;
these four functions only support Type.trivialTypes as returnType and input column type
2023-02-21 15:55:29 +08:00
44fed0e99b [Fix](multi catalog)(nereids)Enable runtime filter for external table (#16855)
Enable runtime filter for external table.
2023-02-21 10:35:58 +08:00
57519fcf50 [fix](information_schema) catch and skip exception when getting schema from FE catalog (#16647)
When querying information_schema database, BE will call FE RPC
to get schema info such as db name list, table name list, etc.
But some external catalog when failed to get these info because of wrong connection info.
We should catch these kind of exception and skip it, so that it can continue to
get schema info of other catalogs.
Otherwise, the whole query on information_schema will fail, even if user just want to get
info of internal catalog.

And set jdbc connection timeout to 5s, to avoid thrift rpc timeout from BE to FE(default is 30s)
2023-02-21 08:43:09 +08:00
Pxl
ce3afe7f13 [Enchancement](Materialized-View) forbiden some case in create mv with group by and fix select fail on g… (#16820)
1. forbiden some case in create mv with group by

select k1+1,sum(abs(k2+2)+k3+3) from d_table group by k1; 

   2. fix select fail on grouping column have diffrent expr with select list

create materialized view k1p2ap3psg as select k1+1,sum(abs(k2+2)+k3+3) from d_table group by k1+1;

mysql [test]>explain select k1+1,sum(abs(k2+2)+k3+3) from d_table group by k1;
ERROR 1105 (HY000): errCode = 2, detailMessage = select list expression not produced by aggregation output (missing from GROUP BY clause?): `k1` + 1
2023-02-20 13:04:50 +08:00
1011422e6d [feature](Nereids): infer isNotNull from Inner/Semi/Anti Join (#16821) 2023-02-20 12:14:15 +08:00
0b96ddc090 [improvement](help-doc) Add help doc format unit test (#16904)
[improvement](help-doc) Add help doc format unit test #16904
2023-02-20 12:02:51 +08:00
a17a32ebd4 [improve](show alter) add more infos to 'show alter' result for schema change job (#16843) 2023-02-20 11:59:06 +08:00
97230a54fb [Refactor](auth)(step-2) Add AccessController to support customized authorization (#16802)
Support specifying AccessControllerFactory when creating catalog

create catalog hive properties(
...
"access_controller.class" = "org.apache.doris.mysql.privilege.RangerAccessControllerFactory",
"access_controller.properties.prop1" = "xxx",
"access_controller.properties.prop2" = "yyy",
...
)
So that user can specified their own access controller, such as RangerAccessController

Add interface to check column level privilege

A new method of CatalogAccessController: checkColsPriv(),
for checking column level privileges.

TODO:
Support grant column level privileges statements in Doris

Add TestExternalCatalog/Database/Table/ScanNode

These classes are used for FE unit test. In unit test you can

create catalog test1 properties(
    "type" = "test"
    "catalog_provider.class" = "org.apache.doris.datasource.ColumnPrivTest$MockedCatalogProvider"
    "access_controller.class" = "org.apache.doris.mysql.privilege.TestAccessControllerFactory",
    "access_controller.properties.key1" = "val1",
    "access_controller.properties.key2" = "val2"
);
To create a test catalog, and specify catalog_provider to mock database/table/schema metadata

Set roles in current user identity in connection context

The roles can be used for authorization in access controller.
2023-02-20 10:32:48 +08:00
5291f14aff [vectorized](udf) java udf support array type (#16841) 2023-02-20 10:00:25 +08:00
1c6c28b8fb [Enhance](ComputeNode) K8sDeployManager support domain (#16897)
Describe your changes.
1.DeployManager adds the ability to obtain domain names from third-party systems

2.When the DeployManager determines whether the node exists, add the domain name judgment logic

3.rename Backend.getHost() to getIp() 

4.Delete the logic for handling UnknownHostException in FQDNManager, because there are two cases of UnknownHostException. If it occurs temporarily, it can wait for the next detection. If the node is deleted, the logic can be handed over to DeployManager for processing.
2023-02-19 21:30:18 +08:00
73f7979b73 [fix](struct-type) forbid struct-type to be distributed key/aggregation key and add more tests (#16626)
This commits forbid struct and map type to be distributed key/aggregation key.

The sql such as:

select distinct stuct_col from struct_table

will report an error.
2023-02-19 15:16:36 +08:00
96a3c60d3b [feature-wip](MTMV) Support alter statement (#16817)
Steps:
1. drop the old MTMV jobs
2. clear the old task records and clean the running and pending tasks
3. set the new scheduler info in MTMV and replay it in followers.
4. create a job in the master node.

Note that if you change the refresh info of MTMV, the old MTMV tasks will be cleaned.
2023-02-19 12:15:17 +08:00
d4cebb39ba [fix](Nereids): fix SemiJoinLogicalJoinTransposeProject. (#16883) 2023-02-18 23:12:34 +08:00
e2e6a0dd83 [Feature](load) Support mutable property for partition (#16036)
The background is described in this issue: #15723,
where users used Apache Druid to satisfy such lambada requirements before.
We will not make Doris dropping data not belonged to current time window automatically like Druid,
which is not flexible. We demand a ability to support mutable/immutable partition, the PR works this way:

1. Support mutable property for a partition.
2. The mutable property of a partition is passed from FE to BE in a load procedure
3. If a record's partition is immutable, we mark this row as "un selected" which will not be included in computation of 'max_filter_ratio',
   so that data write to immutable partition will be neglected and not cause load failure.

Use Example:

1. Add immutable partition or modify an partition to be immutable:
- alter table test_tbl add [temporary] partition xxx values less than ('xxx') ('mutable' = 'true');
- alter table test_tbl modify partition xx set ('mutable' = 'false');

2. Write 5 records into table, two of then belongs to immutable partition
2023-02-18 23:09:34 +08:00
45427b86be [regression](struct-type) add more regression tests for struct and map type (#16790)
This commit forbid struct and map column in Materialized view and add more regression tests.
2023-02-18 20:42:17 +08:00
861e4bc64a [fix](planner) Nullable of slot descriptor is mistaken and cause BE crash #16862 2023-02-18 20:39:56 +08:00
070f42c463 [Enhancement](Es): Support config like whether push down to es (#16800)
Support config like whether push down to es and refactor some code
Like transform to wildcard query and push down to es, this increases the cpu consumption of the es,
I add a switch control it.
2023-02-17 21:56:11 +08:00
fd5d7d6097 [refactor](Nereids) remove local sort (#16819)
After adding phase in sort, the locatSort is no longer needed
change the order of sortPhase in constructor
2023-02-17 18:52:41 +08:00
6a1e3d3435 [fix](cooldown)Fix bug for single cooldown compaction, add remote meta (#16812)
* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction

* fix bug, add remote meta for compaction
2023-02-17 15:13:06 +08:00
6acee1ce88 [Fix](topn opt) double check plan From OriginalPlanner to make sure optimized SQL is a general topn query (#16848)
From the original logic, query like `select * from a where exists (select * from b order by 1) order by 1 limit 1` is a query contains subquery,
but the top query will pass `checkEnableTwoPhaseRead` and set `isTwoPhaseOptEnabled=true`.So check the double plan is a general topn query plan is needed, and rollback the needMaterialize flag setted by the previous `analyze`.
2023-02-17 10:59:35 +08:00
1fc5023d97 [Enhance](ComputeNode) K8sDeployManager support computeNode (#16789)
1.allow have no ELECTABLE or BACKEND
2.add cn NodeType
3.delete deprecated code
2023-02-17 09:08:14 +08:00