Commit Graph

3870 Commits

Author SHA1 Message Date
76e539dbda [Improvement](multi catalog)(nereids)Support JDBC external table for new planner. (#17063)
Support JDBC external tables in the Nereids planner. A JDBC table is another table type, like OLAP tables, HMS tables and so on.
2023-02-28 09:43:04 +08:00
bf9997ae3d [fix](Nereids) date/datetime floor and ceil should always nullable (#17188) 2023-02-28 09:37:10 +08:00
b51ce415e7 [Feature](load) Add submitter and comments to load job (#16878)
* [Feature](load) Add submitter and comments to load job
2023-02-28 09:06:19 +08:00
dd1bd6d8f1 [Fix](multi catalog)Support hive default partition. (#17179)
Hive stores all data that has no partition column values in a default partition named __HIVE_DEFAULT_PARTITION__.
Doris fails to read this partition when the partition column type is INT or another type that
__HIVE_DEFAULT_PARTITION__ cannot be converted to.
This PR supports the Hive default partition by setting the column value to NULL for the missing partition columns.
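
A minimal sketch of the conversion rule described above, assuming the partition value arrives as a string taken from the partition path. The function name `partition_value_to_int` is hypothetical and is not the Doris implementation.

```python
from typing import Optional

HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def partition_value_to_int(raw: str) -> Optional[int]:
    """Convert a partition value to INT, mapping the Hive default partition to NULL."""
    if raw == HIVE_DEFAULT_PARTITION:
        # The marker string cannot be parsed as INT, so it becomes NULL (None).
        return None
    return int(raw)


print(partition_value_to_int("2023"))
print(partition_value_to_int(HIVE_DEFAULT_PARTITION))
```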
2023-02-28 00:08:29 +08:00
d3a6cab716 [Fix](MySQLLoad) Fix load a big local file bug since bytebuffer from mysql packet using the same byte array (#16901)
Loading a big local file causes an `[INTERNAL_ERROR]too many filtered rows` error because the ByteBuffer from the MySQL client always uses the same byte array.

Later bytes overwrite the earlier ones, which corrupts the byte order sent over the network.

The fix copies the byte array before filling it into the network.
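
The aliasing problem can be reproduced in a few lines. This is only an illustration in Python, assuming a single backing array that is reused for every packet; `read_packets` is a hypothetical name, not code from the PR.

```python
def read_packets(chunks, copy):
    """Queue each packet for sending, optionally copying it out of the shared buffer."""
    shared = bytearray(4)  # one backing array reused for every packet
    queued = []
    for chunk in chunks:
        shared[:] = chunk  # in-place overwrite, the object stays the same
        queued.append(bytes(shared) if copy else shared)
    return [bytes(buf) for buf in queued]


# Without the copy, every queued entry aliases the same array and shows the last packet.
print(read_packets([b"aaaa", b"bbbb"], copy=False))
# With the copy, each queued entry keeps its own bytes.
print(read_packets([b"aaaa", b"bbbb"], copy=True))
```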
2023-02-28 00:06:44 +08:00
84413f33b8 [enhancement](merge-on-write) add skip_delete_bitmap session variable for debug purpose (#17127) 2023-02-27 23:31:28 +08:00
e8de07a6a5 [feature](cooldown) Forbid storage policy for MoW tables (#17148)
* disable setting storage policy on MoW table

* fix error in regression test

* make the name of test table unique

* use Strings.isNullOrEmpty to replace equals

* fix error in if statement
2023-02-27 18:42:31 +08:00
0db58800d3 [fix](stmt-forward) fix result missing (#17173) 2023-02-27 18:01:43 +08:00
95837b7958 [Enhancement](ES): Support mapping es date format and replace simple json with jackson (#16806)
* Support mapping es date format, default/yyyy-MM-dd HH:mm:ss/yyyy-MM-dd/epoch_millis

* Replace simple json with jackson, resolve the random column order problem

* Add es array doc version
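
A rough sketch of mapping the listed ES date formats to a datetime value. The function name is hypothetical, the `default` format is left unhandled because its definition is not given here, and treating `epoch_millis` as UTC is an assumption.

```python
from datetime import datetime, timedelta

# ES format name -> strptime pattern (only the two textual formats listed above)
PATTERNS = {
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
    "yyyy-MM-dd": "%Y-%m-%d",
}


def parse_es_date(value: str, fmt: str) -> datetime:
    """Parse an ES date value according to its mapping format."""
    if fmt == "epoch_millis":
        # Assumption: milliseconds since 1970-01-01 00:00:00 UTC.
        return datetime(1970, 1, 1) + timedelta(milliseconds=int(value))
    if fmt in PATTERNS:
        return datetime.strptime(value, PATTERNS[fmt])
    raise ValueError(f"unsupported format: {fmt}")


print(parse_es_date("2023-02-27 14:47:21", "yyyy-MM-dd HH:mm:ss"))
```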
2023-02-27 14:47:21 +08:00
c0360f80bb [enhancement](aggregate-function) enhance aggregate function collect and add group_array aliases (#15339)
Enhance aggregate functions `collect_set` and `collect_list` to support an optional `max_size` param,
which limits the number of elements in the result array.
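
The intended semantics can be modeled roughly as below. This is a sketch only: which elements survive once the cap is reached is an assumption (first seen wins), and the Python functions merely borrow the SQL function names.

```python
def collect_list(values, max_size=None):
    """Collect values into a list, stopping once max_size elements are kept."""
    out = []
    for v in values:
        if max_size is not None and len(out) >= max_size:
            break
        out.append(v)
    return out


def collect_set(values, max_size=None):
    """Collect distinct values, stopping once max_size distinct elements are kept."""
    seen = []  # a list keeps first-seen order, which makes the result easy to read
    for v in values:
        if v in seen:
            continue
        if max_size is not None and len(seen) >= max_size:
            break
        seen.append(v)
    return seen


print(collect_list([1, 1, 2, 3], max_size=2))
print(collect_set([1, 1, 2, 3], max_size=2))
```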
2023-02-27 14:22:30 +08:00
26ccb6ba5a [feature-wip](MTMV) Add some metrics for MTMV (#16913)
Demo:

```
# HELP doris_fe_mtmv_job Total job number of mtmv.
# TYPE doris_fe_mtmv_job gauge
doris_fe_mtmv_job{type="TOTAL-JOB"} 1
doris_fe_mtmv_job{type="ACTIVE-JOB"} 1
# HELP doris_fe_mtmv_task Running task number of mtmv.
# TYPE doris_fe_mtmv_task gauge
doris_fe_mtmv_task{type="RUNNING-TASK"} 0
doris_fe_mtmv_task{type="PENDING-TASK"} 0
doris_fe_mtmv_task{type="FAILED-TASK"} 0
doris_fe_mtmv_task{type="TOTAL-TASK"} 1
```
2023-02-27 11:27:23 +08:00
2d5f32caf1 [fix](nereids) dphyper join reorder may lost join condition in some case (#16995)
In emitCsgCmp, we should check whether any missed edges should be used as connection edges. If there is a missed edge that can't be used as a connection edge, emitCsgCmp should return and seek another plan.
2023-02-27 10:36:11 +08:00
f228cfdd00 [enhancement](session-variable)add a use_fix_replica session variable to fix query replica (#17101)
Add the use_fix_replica session variable to make replica inconsistency problems easier to debug.
The default value of use_fix_replica is -1, which means the replica is not fixed;
otherwise we choose the {use_fix_replica}-th smallest replica.
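
A small sketch of the selection rule, assuming replicas are ordered by id and that the variable is a zero-based index into that order. Both points are assumptions, and `pick_replica` is a hypothetical helper. Out-of-range values are not handled here.

```python
def pick_replica(replica_ids, use_fix_replica=-1):
    """Return the fixed replica id, or None when no replica is fixed."""
    if use_fix_replica == -1:
        return None  # not fixed: the normal replica selection applies
    ordered = sorted(replica_ids)
    return ordered[use_fix_replica]  # assumption: zero-based position


print(pick_replica([10030, 10010, 10020], use_fix_replica=1))
```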
2023-02-27 10:20:23 +08:00
857d38e24b [fix](scan) Default enable function(Like) pushdown #17154
function pushdown: #10355
NGram BloomFilter Index apply like pushdown: #11579

Enabled by default, make sure it stays active.

If NGram BloomFilter Index is not used, this like pushdown can be replaced by #15917, which can push down all expressions including like.
2023-02-27 09:58:37 +08:00
4d1f3b8abf [fix](Nereids) mow unique table's preagg should work like duplicate table (#17028) 2023-02-26 22:52:50 +08:00
469b6b8466 [enhancement](Nereids) datetime v2 type precision derive (#17079) 2023-02-26 22:33:55 +08:00
710529b060 [enhance](Nereids): refactor LogicalJoin. (#17099) 2023-02-26 22:28:54 +08:00
e9619368e9 [fix](s3) fix SdkClientException: Multiple HTTP implementations were found on the classpath (#17136) 2023-02-26 15:32:43 +08:00
Pxl
6bb721d86b [Chore](build) fix some warning on code generate and webui #17078
[WARNING:gensrc/thrift/parquet.thrift:22] Uncaptured doctext at on line 18.

[WARNING:gensrc/thrift/parquet.thrift:23] Uncaptured doctext at on line 22.

[WARNING:gensrc/thrift/parquet.thrift:436] Uncaptured doctext at on line 428.

WARNING in asset size limit: The following asset(s) exceed the recommended size limit (244 KiB). This can impact web performance


WARNING in entrypoint size limit: The following entrypoint(s) combined asset size exceeds the recommended limit


Warning : Macro "NonTerminator" has been declared but never used.
2023-02-26 13:01:19 +08:00
0251cb8941 [fix](cooldown) Handle re-add replica with cooldowned data #17047
Modify the rule for choosing the cooldown replica: only an alive replica can be the cooldown replica.
Handle re-adding a replica with cooldowned data.
2023-02-26 12:36:55 +08:00
5018223176 [Enhancement](stmt-forward) better error msg for follower fe #17132
The error log message for a failed forward from an FE follower to the master is ambiguous, so we should clarify it.
2023-02-26 12:28:33 +08:00
605d840231 [improvement](log)enhance log msg of finding be policy failure (#17134) 2023-02-26 11:52:25 +08:00
d3a7cb8bde [fix](stream_load) can abort 2pc stream load when table dropped #17088
When a stream load uses 2PC and the table is dropped before commit, both commit and abort return an error, so the transaction cannot finish.
Commit or abort returns this error:
{
"status": "ANALYSIS_ERROR",
"msg": "errCode = 7, detailMessage = unknown table, tableId=52579"
}
After this PR, the abort succeeds.
2023-02-26 11:20:41 +08:00
3a9aa03aab [BugFix](oracle-catalog) Modify the doris data type mapping of oracle NUMBER(p,s) type (#17051)
The Oracle data type `NUMBER(p,s)` differs semantically from the Doris decimal type.
For the Oracle `NUMBER(p,s)` type:
1. if s<0, the value is an integer. This `NUMBER(p,s)` has (p+|s|) significant digits,
and rounding is performed at position s.
e.g. if we insert 1234567 into a `NUMBER(5,-2)` column, Oracle stores 1234600. In this case,
Doris will use an int type (`TINYINT/SMALLINT/INT/.../LARGEINT`).

2. if s>=0 && s<p, it behaves just like Doris Decimal(p,s).

3. if s>=0 && s>p, the value is a pure fraction (like 0.xxxxx).
p is the number of significant digits allowed after the decimal point,
and digits beyond position s are rounded. e.g. we cannot insert 0.0123456 into a `NUMBER(5,7)` column,
because there must be two zeros immediately after the decimal point,
but we can insert 0.0012345 into `NUMBER(5,7)`. In this case, Doris will use `DECIMAL(s,s)`.

4. if p and s are not specified, as in plain `NUMBER`,
the p and s of `NUMBER` are uncertain. In this case, Doris cannot determine p and s,
so Doris cannot determine the data type.
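
The four rules can be sketched as a single mapping function. This is an illustration, not the Doris code: `map_oracle_number` is a hypothetical name, the digit thresholds for the integer types are assumptions based on what each type can always hold, and the case s == p, which the rules above do not mention, is treated as `DECIMAL(p,s)` here.

```python
from typing import Optional

# (max decimal digits that always fit, Doris integer type); thresholds are assumptions
INT_TYPES = ((2, "TINYINT"), (4, "SMALLINT"), (9, "INT"), (18, "BIGINT"), (38, "LARGEINT"))


def map_oracle_number(p: Optional[int], s: Optional[int]) -> str:
    """Map Oracle NUMBER(p,s) to a Doris type name following the rules above."""
    if p is None or s is None:
        return "UNDETERMINED"  # rule 4: plain NUMBER has no fixed p and s
    if s < 0:
        digits = p - s  # rule 1: p + |s| significant digits
        for limit, name in INT_TYPES:
            if digits <= limit:
                return name
        return "UNDETERMINED"  # assumption: too wide for any integer type
    if s > p:
        return f"DECIMAL({s},{s})"  # rule 3: pure fraction
    return f"DECIMAL({p},{s})"  # rule 2 (and, by assumption, s == p)


print(map_oracle_number(5, -2))
print(map_oracle_number(5, 7))
```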
2023-02-26 09:05:41 +08:00
f6ce072297 [Enhencement](csv-reader) Optimize csv_reader _split_value and fix json_reader case sensitive (#17093)
1. Enhancement:
    For a single-character column separator, csv_reader uses another method to split values.
2. BugFix
    Make `json` file format loading case sensitive.
2023-02-26 09:03:04 +08:00
c43e521d29 [feature](multi-catalog) support map&struct type in parquet&orc reader (#17087)
Support parsing map&struct type in parquet&orc reader.

## Remaining Problems
1. Doris uses the array type to build the key and value columns of a `map`, but doesn't fill the offsets in the value column, so the offsets in the value column are wasted.
2. Parquet supports reading only the key or value column of a `map`; this PR doesn't support that yet.
3. Parquet supports reading partial columns of a `struct`; this PR doesn't support that yet.
2023-02-26 08:55:39 +08:00
50b423e09b [improvement](mysql) merge connect context and mysql channel and reduce send buffer memory (#17125) 2023-02-25 21:07:23 +08:00
4093ef9e4b [fix](auth) fix losing global priv bug and refactor default role name (#16966)
This PR mainly changes:

When upgrading from an old version to master, the ADMIN_PRIV of a normal user may be lost.
This may only happen if:

1. Create a user with the ADMIN_PRIV privilege.
2. Upgrade Doris to v1.2.x or master before the meta image which contains the edit log from step 1 is generated.
3. The ADMIN_PRIV is then lost in Global Privileges.

This PR fixes this bug and sets ADMIN_PRIV in the right place.

Refactor the user's implicit role name

In [feature](auth)Implementing privilege management with rbac model #16091, we refactored the Doris auth model by introducing RBAC. Each user has an implicit role,
named with the prefix default_role_rbac_. But the name has a wrong format, like:
default_role_rbac_'default_cluster:user1'@'%'

This PR changes the role name's format, like:

default_role_rbac_user1@%
default_role_rbac_user2@[domain]
NOTICE: this change may cause incompatible metadata, but since [feature](auth)Implementing privilege management with rbac model #16091 is not released, we should fix it soon.

Add a new session variable show_user_default_role

When set to true, the implicit role of each user is shown in the result of the show roles stmt. Default is false.
2023-02-24 23:36:53 +08:00
83e5ecdecc [fix](Nereids) use a threshold to check the equal double values in n-th rank (#17118)
The cost is inaccurate, so we use a threshold to check whether two double values are equal.
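
A threshold comparison of this kind could look like the sketch below. The tolerance value and the relative scaling are assumptions, not the values used in the PR, and `costs_equal` is a hypothetical name.

```python
def costs_equal(a: float, b: float, eps: float = 1e-6) -> bool:
    """Treat two plan costs as equal when they differ by less than a small threshold."""
    # Scale the threshold with the magnitude so large costs are compared relatively.
    return abs(a - b) <= eps * max(1.0, abs(a), abs(b))


print(costs_equal(0.1 + 0.2, 0.3))
print(costs_equal(1.0, 1.1))
```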
2023-02-24 22:12:47 +08:00
Pxl
2db4a981b3 [Feature](Materialized-View) forbidden rename column on materialized view (#17030)
Forbid renaming columns on materialized views.
2023-02-24 21:28:31 +08:00
c53b6a9532 [fix](Nereids) fix nullable() of lead/lag (#17014)
Fix a bug when NULL is used as the default value for the window functions lead() and lag().
2023-02-24 21:27:44 +08:00
54e68fe250 [feature](cooldown)add ut for CooldownConfHandler (#17007)
* add ut for CooldownConfHandler

2023-02-24 17:06:55 +08:00
Pxl
0691586eb7 [Chore](regression-test) add createMV action && add some mv case from fe ut MaterializedViewFunctionTest (#16825)
1. add createMV action
2. add some mv case from fe ut MaterializedViewFunctionTest
3. reduce mv scheduler interval time from 10s to 0.3s
2023-02-24 16:35:37 +08:00
cf5bc9594b [fix](planner) conjuncts of the outer query block didn't work when it's on the results expr of inline view (#17036)
Here is a case:

select id, name
from (select '123' as id, '1234' as name, age from test_insert ) a
where name != '1234';
2023-02-24 15:27:34 +08:00
c39914c0a0 [feature](partition)add default list partition (#15509)
This PR implements the default list partition described in the related #15507.
It is similar to Greenplum's default partition: it stores all data that does not satisfy the constraints of the other partition keys.
The optimizer does not filter out the default partition, so it is scanned
every time you select data from a table that has a default partition.

Users can either create a table with a default partition or add one with ALTER.

```sql
PARTITION LIST(key) {
PARTITION p1 values in (xx,xx),
PARTITION DEFAULT
}

ALTER TABLE XXX ADD PARTITION DEFAULT
```

Data in the default partition that meets the key constraint of a newly added partition is not migrated
automatically when the partition is added. Users should select from the default partition using the new
constraint as the predicate and insert the rows into the new partition.

```sql
insert into tbl select * from tbl partition default where partition_key=xx;
```
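
The routing and scanning behavior of a default list partition can be modeled as follows. This is a simplified sketch with hypothetical helper names and an in-memory dictionary in place of real partition metadata.

```python
def route_row(key, partitions, has_default=True):
    """Return the partition that stores a row with the given partition key."""
    for name, values in partitions.items():
        if key in values:
            return name
    if has_default:
        return "DEFAULT"  # catches every key no listed partition accepts
    raise ValueError(f"no partition for key {key!r}")


def partitions_to_scan(key, partitions, has_default=True):
    """Partitions a point query on `key` reads; the default partition is never pruned."""
    hit = [name for name, values in partitions.items() if key in values]
    return hit + (["DEFAULT"] if has_default else [])


parts = {"p1": {1, 2}, "p2": {3}}
print(route_row(9, parts))
print(partitions_to_scan(3, parts))
```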
2023-02-24 15:24:59 +08:00
479d57df88 [fix](planner) the project expr should be calculated in join node in some case (#17035)
Consider the SQL below:

select sum(cc.qlnm) as qlnm
FROM
  outerjoin_A
  left join (SELECT
      outerjoin_B.b,
      coalesce(outerjoin_C.c, 0) AS qlnm
    FROM
      outerjoin_B
      inner JOIN outerjoin_C ON outerjoin_B.b = outerjoin_C.c
  ) cc on outerjoin_A.a = cc.b
group by outerjoin_A.a;

The coalesce(outerjoin_C.c, 0) was calculated in the agg node, which is wrong.
This PR corrects this, and the expr is now calculated in the inner join node.
2023-02-24 15:20:05 +08:00
d562428b1d [enhancement](memory) reduce memory usage for failed broker loads (#16974)
Further reduce the memory used by failed broker load messages in FE, following PR #15895.
2023-02-24 12:07:02 +08:00
c3538ca804 [Enhancement](HttpServer) Add http interface authentication (#16571)
1. Organize http documents
2. Add http interface authentication for FE
3. Support https interface for FE
4. Provide authentication interface
5. Add http interface authentication for BE
6. Support https interface for BE
2023-02-24 10:59:33 +08:00
a12b3c3f0c [fix](alter inverted index) fix incorrect CreateTime of 'show alter' query result after fe restart (#17043)
When adding or dropping an inverted index, replaying logModifyTableAddOrDropInvertedIndices created a new schema change job, which has a new CreateTime. A new schema change job should be created only when not replaying the log.
2023-02-24 10:25:48 +08:00
7229751bd9 [Improve](map-type) Add contains_null for map (#16948)
Add contains_null for map type.
2023-02-23 20:47:26 +08:00
92ecd16573 (feature)[DOE]Support array for Doris on ES (#16941)
* (feature)[DOE]Support array for Doris on ES
2023-02-23 19:31:18 +08:00
48fd528a2b [feature](Nereids) Add hint NTH_OPTIMIZED_PLAN to let the optimizer select n-th optimized plan (#16992)
Add the hint NTH_OPTIMIZED_PLAN so that the optimizer can select the n-th optimized plan. For example, you could use,

select /*+SET_VAR("nth_optimized_plan"=2) */ * from table;

to select the second-best plan in the optimizer.
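
Conceptually, the hint picks the n-th entry of the candidate plans ranked by cost. A toy sketch, assuming plans are (name, cost) pairs and n is one-based as in the example above. `nth_best_plan` is a hypothetical helper, not the optimizer's API.

```python
def nth_best_plan(plans, n):
    """Return the n-th cheapest plan (n = 1 is the best plan)."""
    ranked = sorted(plans, key=lambda plan: plan[1])  # ascending cost
    return ranked[n - 1]


candidates = [("hash_join", 12.5), ("nested_loop", 90.0), ("merge_join", 20.0)]
print(nth_best_plan(candidates, 2))
```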
2023-02-23 18:56:51 +08:00
edead494cb [Enhancement](storage) add a new hidden column __DORIS_VERSION_COL__ for unique key table (#16509) 2023-02-23 15:47:17 +08:00
3ea6478ba8 [feature](multi-catalog) parquet reader support nested array column (#16961)
Support decoding nested array columns in the parquet reader:
1. FE should generate the right nested column type. FE doesn't check the nesting depth and legality, like map\<array\<int\>, int\>.
2. `ParquetColumnReader` has removed the filtering of page index to support nested array types.
    It's too difficult to skip values in nested complex types. Maybe we should support the filtering of page index and lazy read in a later PR.
3. `ExternalFileScanNode` has a bug in creating default value expression.
4. Maybe it's slow to read repetition levels in a while loop. I'll optimize this in the next PR.
5. Array columns have temporary `SchemaElement`s in their thrift definition.
We removed them and kept the parent from the former implementation.
The remaining parent should inherit the repetition and definition level of its child.
2023-02-23 14:54:58 +08:00
c2cc75d741 [BugFix](Jdbc Catalog) Fix null pointer exception in JdbcExecutor (#16958)
This PR does two things:
1. Fix:
    JdbcExecutor uses `column[0]` to judge the class type, but `column[0]` may be null.

2. Enhancement
    In the original logic, all fields in a jdbc catalog table are set to Nullable.
    However, nullable fields are inefficient. We can actually learn through jdbc whether the fields in the data source table
    are nullable, so we can set the corresponding fields in the Doris jdbc catalog to nullable or not.
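
One way to avoid relying on the first element is to look for the first non-null value before inspecting its type. The sketch below shows the idea in Python with hypothetical helper names. It is not necessarily how the PR fixes JdbcExecutor.

```python
def first_non_null(column):
    """Return the first value that is not None, or None if every value is null."""
    return next((v for v in column if v is not None), None)


def infer_class(column):
    """Infer the value class of a column without assuming column[0] is set."""
    value = first_non_null(column)
    return None if value is None else type(value)


print(infer_class([None, None, 7]))
```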
2023-02-23 14:04:54 +08:00
51bbae27b8 [feature-wip](iceberg) add dlf and glue catalog impl for iceberg catalog (#16602)
The iceberg catalog supports DLF on Alibaba Cloud and AWS Glue Catalog.
2023-02-23 14:02:41 +08:00
bc619ce5be [Fix](load)Pass hidden column to load columns (#17004)
The LoadScanProvider doesn't get hidden columns from the stream load parameter.
This may cause the stream load delete operation to fail. This PR passes the hidden columns to LoadScanProvider.
2023-02-23 13:54:36 +08:00
a9fb47a80a [fix](planner) create view init bug (#16890)
The body of the create view stmt is parsed twice.
In the second parse, we get the SQL string from the CreateViewStmt.viewDefStmt.toSql() function, which misses the select list.
2023-02-22 20:40:08 +08:00
df2f248712 [feature](planner) add dayofweek for FEFunctions to support fold constant (#16993)
Add dayofweek to FEFunctions to support constant folding, using Zeller's algorithm.
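
For reference, Zeller's congruence for the Gregorian calendar can be written as below. This is a sketch and not the FE code: `day_of_week` is a hypothetical name, and the result numbering (1 = Sunday through 7 = Saturday, MySQL style) is an assumption about what dayofweek returns.

```python
def day_of_week(year: int, month: int, day: int) -> int:
    """Day of week via Zeller's congruence, returned as 1 = Sunday ... 7 = Saturday."""
    if month < 3:
        # January and February count as months 13 and 14 of the previous year.
        month += 12
        year -= 1
    k = year % 100   # year within the century
    j = year // 100  # zero-based century
    h = (day + 13 * (month + 1) // 5 + k + k // 4 + j // 4 + 5 * j) % 7
    # Zeller's h is 0 = Saturday, 1 = Sunday, ..., 6 = Friday.
    return (h + 6) % 7 + 1


print(day_of_week(2023, 2, 28))  # 3, a Tuesday
```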
2023-02-22 20:27:49 +08:00
7aa063c1f3 [fix](planner) bucket shuffle join is not recognized if the first table is a subquery (#16985)
Consider the SQL:

select *
from
(select * from test_1) a
inner join
(select * from test_2) b
on a.id = b.id
inner join
(select * from test_3) c
on a.id = c.id

Because a.id comes from a subquery, we need to use the function getSrcSlotRef() to find its source table.
2023-02-22 20:23:00 +08:00