Commit Graph

2007 Commits

Author SHA1 Message Date
6af1c52e13 [Feature] add support for tencent chdfs (#8963)
Co-authored-by: chengwu <chengwu@tencent.com>
2022-04-12 16:02:42 +08:00
51269efbb7 [improvement]Disable mini load (#8955)
Disable miniload by default
2022-04-12 16:01:03 +08:00
0d761f9909 [feature-wip][UDF][DIP-1] Support variable-size input and output for Java UDF (#8678)
This feature is proposed in DSIP-1. This PR support variable-length input and output Java UDF.
2022-04-11 09:36:16 +08:00
936b942e3a [fix](error-code) replace invalid format specifier (#8940)
change %lu and %ld to %d
2022-04-10 20:37:32 +08:00
1ee8633e5e [fix](account) use LOG.info instead of LOG.debug (#8911)
This complements (#8849)
2022-04-09 19:18:13 +08:00
ce6b5169c2 [fix](join) Fix error bucket num get in bucket shuffle join in dynamic partition (#8891) 2022-04-09 19:11:44 +08:00
a290104966 [fix](routine load) Routine load task doesn't reallocate when previous BE is down. (#8824)
if previous be is not alive, should assigned another available BE instead.
2022-04-09 19:02:55 +08:00
ddf7ef9327 [improvement](join) update broadcast join cost algorithm (#8695)
broadcast join cost is used compressed data size currently.
The amount of memory used may be significantly more than estimated.
This patch:
1. add a compressed ratio to broadcast join cost and set to 5 according to the experience.
2. add a new session variable `auto_broadcast_join_threshold` to limit memory used by broadcast in bytes, the default value is 1073741824(1GB)
2022-04-09 19:00:27 +08:00
Pxl
453485abfb [Bug] Fix some bugs(rewrite rule/symbol transport) of like predicate (#8770) 2022-04-08 14:32:09 +08:00
c5718928df [feature-wip](array-type) support explode and explode_outer table function (#8766)
explode(ArrayColumn) desc:
> Create a row for each element in the array column. 

explode_outer(ArrayColumn) desc:
> Create a row for each element in the array column. Unlike explode, if the array is null or empty, it returns null.

Usage example:
1. create a table with array column, and insert some data;
2. open enable_lateral_view and enable_vectorized_engine;
```
set enable_lateral_view = true;
set enable_vectorized_engine=true;
```
3. use explode_outer
```
> select * from array_test;
+------+------+--------+
| k1   | k2   | k3     |
+------+------+--------+
|    3 | NULL | NULL   |
|    1 |    2 | [1, 2] |
|    2 |    3 | NULL   |
|    4 | NULL | []     |
+------+------+--------+

> select k1,explode_column from array_test LATERAL VIEW explode_outer(k3) TempExplodeView as explode_column;
+------+----------------+
| k1   | explode_column |
+------+----------------+
|    1 |              1 |
|    1 |              2 |
|    2 |           NULL |
|    4 |           NULL |
|    3 |           NULL |
+------+----------------+
```
4. explode usage example. explode return empty rows while the ARRAY is null or empty
```
> select k1,explode_column from array_test LATERAL VIEW explode(k3) TempExplodeView as explode_column;
+------+----------------+
| k1   | explode_column |
+------+----------------+
|    1 |              1 |
|    1 |              2 |
+------+----------------+
```
2022-04-08 12:11:04 +08:00
fa8e4ec2f0 [fix] Disable cast operation of object type (#8882)
Disable cast between string and object type(bitmap, hll, quantile_state)
2022-04-08 09:13:56 +08:00
e3daa9580a [Fix](Lateral View) The Error expr type when exploding a function result of inline view (#8851)
Fixed #8850

The column in inline view maybe a function instead of slotRef.
So when this column is used as the input of explode function,
it can't be converted to slotRef.

The correct way is to treat it as an Expr and extract the required slotRef for materialization.
For example:
```
with d as (select k1+k1 as k1_plus from table)
select k1_plus from d explode_split(k1_plus, ",")
```
FnExp: SlorRef<k1_plus>
SubstituteFnExpr: functionCallExpr<k1+k1>
originSlotRefList: SlotRef<k1>
2022-04-08 09:08:55 +08:00
318feb01f3 [improvement](account) support to account management sql (#8849)
Add [IF EXISTS] support to following statements:
- CREATE [IF NOT EXISTS] USER
- CREATE [IF NOT EXISTS] ROLE
- DROP [IF EXISTS] USER
- DROP [IF EXISTS] ROLE
2022-04-08 09:08:08 +08:00
32bba15e34 [refactor][fix] remove useless import in Config.java (#8878) 2022-04-07 11:40:05 +08:00
64d18364db [improvement](restore) set table property 'dynamic_partition.enable' to false after restore (#8852)
when restore table with dynamic partition properties, 'dynamic_partition.enable' is set to the backup time value.
but Doris could not turn on dynamic partition automatically when restore.
So we cloud see table never do dynamic partition with dynamic_partition.enable is set to 'true'.
2022-04-07 11:34:01 +08:00
ce50c4d826 [feature](diagnose) support "ADMIN DIAGNOSE TABLET" stmt (#8839)
`ADMIN DIAGNOSE TABLET tablet_id`

This statement makes it easier to quickly diagnose the status of a tablet.
See "ADMIN-DIAGNOSE-TABLET.md" for details

```
mysql> admin diagnose tablet 10196;
+----------------------------------+------------------------------+------------+
| Item                             | Info                         | Suggestion |
+----------------------------------+------------------------------+------------+
| TabletExist                      | Yes                          |            |
| TabletId                         | 10196                        |            |
| Database                         | default_cluster:db1: 10192   |            |
| Table                            | tbl1: 10194                  |            |
| Partition                        | tbl1: 10193                  |            |
| MaterializedIndex                | tbl1: 10195                  |            |
| Replicas(ReplicaId -> BackendId) | {"10197":10002}              |            |
| ReplicasNum                      | OK                           |            |
| ReplicaBackendStatus             | Backend 10002 is not alive.  |            |
| ReplicaVersionStatus             | OK                           |            |
| ReplicaStatus                    | OK                           |            |
| ReplicaCompactionStatus          | OK                           |            |
+----------------------------------+------------------------------+------------+
```
2022-04-07 11:30:03 +08:00
e72ccfd80c [Refactor][httpv2]remove http v1 code (#8848)
http v2 has been actually tested in production, and it is completely replaceable to have http code. In order to simplify code maintenance, remove the previous http part of the code
2022-04-07 08:38:29 +08:00
98cab78320 [refactor](schema_hash) remove schema_hash since every tablet id in be is unique (#8574) 2022-04-07 08:37:45 +08:00
319f1f634a [fix](ut) fix fe run CreateTableAsSelectStmtTest ,UserPropertyTest, ProjectPlannerFunctionTest and AggregateTest failed (#8838) 2022-04-06 15:23:49 +08:00
33736e45fa [fix](table-function) Fixed unreasonable nullable conversion (#8818) 2022-04-03 11:02:35 +08:00
a8417e6c8b [fix](restore) fix restore issue when meta version is too low (#8816)
When restore snapshot from 0.13 to master, the restore job is pending for long time.
However, we get error "Could not set meta version to 93 since it is lower than minimum required version 100" in log.
We should cancel restore job once get that error.
2022-04-03 10:56:23 +08:00
eed4908790 [chore](deps) upgrade spring to 2.6.2 to 2.6.6 (#8802) 2022-04-03 10:52:31 +08:00
0e3b15f2d7 [fix](colocate) Fix the error colocate plan when query is (rollup + instance >1) (#8779)
The Repeat Node will change the fragment data partition.

So the output partition of child fragment is different from the data partition of current fragment.
When judging whether colocate can be enabled,
the current data partition of fragment should be used directly instead of the child's output partition.

Before this PR fix, queries with '''rollup + concurrency greater than 1''' may have incorrect results.
For example: 
```
select t1.tc1,t1.tc2,sum(t1.tc3) as total from t1 join[shuffle] t1 t2 on t1.tc1=t2.tc1
group by rollup(tc1,tc2) order by t1.tc1,t1.tc2,total;
```

Fixed #8778
2022-04-03 10:19:39 +08:00
a75e4a1469 Window funnel (#8485)
Add new feature window funnel
2022-04-02 22:08:50 +08:00
13f1f94f86 [chore] upgrade log4j version to 2.17.2 (#8774)
upgrade log4j version to 2.17.2
2022-04-02 21:29:25 +08:00
4d516bece8 [feature-wip](array-type)Add element_at and subscript functions (#8597)
Describe the overview of changes.
1. add function element_at;
2. support element_subscript([]) to get element of array, col_array[N] <==> element_at(col_array, N);
3. return error message instead of BE crash while array function execute failed;

element_at(array, index) desc:
>   Returns element of array at given **(1-based)** index. 
  If **index < 0**, accesses elements from the last to the first. 
  Returns NULL if the index exceeds the length of the array or the array is NULL.

Usage example:
1. create table with ARRAY type column and insert some data:
```
+------+------+--------+
| k1   | k2   | k3     |
+------+------+--------+
|    1 |    2 | [1, 2] |
|    2 |    3 | NULL   |
|    4 | NULL | []     |
|    3 | NULL | NULL   |
+------+------+--------+
```
2. enable vectorized:
```
set enable_vectorized_engine=true;
```
3. element_subscript([]) usage example:
```
> select k1,k3,k3[1] from array_test;
+------+--------+----------------------------+
| k1   | k3     | %element_extract%(`k3`, 1) |
+------+--------+----------------------------+
|    3 | NULL   |                       NULL |
|    1 | [1, 2] |                          1 |
|    2 | NULL   |                       NULL |
|    4 | []     |                       NULL |
+------+--------+----------------------------+
```
4. element_at function usage example:
```
> select k1,k3 from array_test where element_at(k3, -1) = 2;
+------+--------+
| k1   | k3     |
+------+--------+
|    1 | [1, 2] |
+------+--------+
```
2022-04-02 12:03:56 +08:00
6c5bbc6e4c fix agg functions check failed from empty table (#8785)
fix agg functions check failed from empty table
2022-04-02 10:44:55 +08:00
9f80f6cf5e [Improvement](Planner)Enable hash join project (#8618) 2022-04-01 15:42:25 +08:00
6729d41c93 [improvement] add switch of quantile_state column (#8706)
Add switch for quantile_state column, default false.
2022-03-31 22:59:27 +08:00
d17ee7b476 [fix](sql-block-rule)Fix sql block rule bug (#8738)
1. Check properties' effectiveness of sql_block_rule, can't set limitations of sql_block_rule to be negative.
2. Optimize the judgment conditions when a query hits a sql_block_rule
3. Check if sql_block_rule has already exist when exec set property for "user" "sql_block_rule" = "xxx"
4. Add UT ad SqlBlockRuleMgrTest.java
2022-03-31 13:51:13 +08:00
b98da02611 [chore][fix](httpv2) Use mariadb-java-client for http query api (#8716)
In #8319, I remove mysql-connector-java dependency because of license incompatibility.
But we need a mysql compatible driver for http query api. So I choose mariadb-java-client,
which is under LGPL.
2022-03-30 09:59:45 +08:00
22cf6ea17c [chore] Modify build.sh and refactor dependency of FE submodules (#8732)
This PR fixes the #8731 and refactor the `build.sh` script.

The build.sh script is currently responsible for the compilation of the following Doris components.
1. FE
    - fe-common
    - fe-core
    - spark-dpp
    - hive-udf
    - java-udf
    - ui
2. BE
    - palo_be
    - meta_tool
3. broker

In the FE module.
- The 4 submodules `fe-common, fe-core, spark-dpp and ui` together form Frontend.
- `spark-dpp, hive-udf and java-udf` can be compiled separately to produce jar packages for individual use.

In the BE module.
- `palo_be` can start the BE process separately.
- `meta_tool` can be compiled separately to produce binaries.

The modified build.sh script has the following changes:

1. there is no longer an option to compile `ui` separately, build together with `--fe`.
2. `fe/be/spark-dpp/hive-udf/java-udf/palo_be/meta_tool` can be compiled separately.
3. all components except `java-udf` will be compiled by default (`java-udf` is in development)

Remaining issues:

Several submodules of FE have messy dependencies.
For example, `java-udf` depends on `fe-core`, and `fe-core` depends on `spark-dpp`,
resulting in a large binary jar of `java-udf`.
It needs to be reorganized afterwards.
2022-03-30 00:13:24 +08:00
3f5bc5206d [Improvement] broker load with hdfs support wildcard (#8718)
broker load with hdfs support wildcard
2022-03-29 18:21:41 +08:00
66a3c574df [Vectorized][Bug] fix percentile_approx function to return always nullable (#8572) 2022-03-29 14:47:39 +08:00
f3659c87c1 [fix][chore](repository)(fe) check reponame when creating repository and modify build.sh (#8671)
1. We need to check repo name when creating repository
2. modify build.sh to not install spark-dpp when spark-dpp is not compiled
2022-03-29 11:32:52 +08:00
d82c138a60 [fix](user-property) Fix bug that can not set exec_mem_limit at user level (#8710) 2022-03-29 10:03:33 +08:00
7cfce63a13 [fix](mini-load) Remove mini load in LOADING and PENDING state (#8649)
1. Remove some unused code.
2. handle mini load with wrong state
    1. For some historical reasons, some mini load jobs in LOADING state have not been cleared.
        As a result, new load jobs cannot be committed.
    2. If a mini load job is created right before FE restart, the mini load job will be in PENDING state forever.
        But it should be removed finally.
2022-03-28 10:22:17 +08:00
HB
39717a85a2 [fix](load) Fix null column bug in load's mapping column setting (#8625) 2022-03-28 10:08:00 +08:00
f96bc62573 [feature](balance) Support balance between disks on a single BE (#8553)
Current situation of Doris is that the cluster is balanced, but the disks of a backend may be unbalanced.
for example, backend A have two disks: disk1 and disk2, disk1's usage is 98%, but disk2's usage is only 40%.
disk1 is unable to take more data, therefore only one disk of backend A can take new data,
the available write throughput of backend A is only half of its ability, and we can not resolve this through load or 
partition rebalance now.

So we introduce disk rebalancer, disk rebalancer is different from other rebalancer(load or partition)
which take care of cluster-wide data balancing. it takes care about backend-wide data balancing.

[For more details see #8550](https://github.com/apache/incubator-doris/issues/8550)
2022-03-28 10:03:21 +08:00
b2861f36c4 [chore] optimize aws thirdparty package download. (#8637) 2022-03-28 09:35:51 +08:00
cfb57be731 [api-change] add soft limit of String type length (#8567)
1. add a config string_type_soft_limit to soft limit max length of string type
2. disable using String type in Key column, partition column and
   distribution column
3. remove String type alias BLOB for futrue use
2022-03-25 09:28:41 +08:00
5f606c9d57 [fix] Fix coredump of stddev function (#8543)
This is only a temporary fix its performance is not ideal. Finally,
we need to reconstruct the functions of `stddev` and delete the interface of `insert_to_null_default ()`.
2022-03-24 11:39:29 +08:00
a58e56f0b4 [fix](load) fix another bug that BE may crash when calling mark_as_failed (#8607)
Same as #8501
2022-03-24 09:13:54 +08:00
bea9a7ba4f [feature] Support pre-aggregation for quantile type (#8234)
Add a new column-type to speed up the approximation of quantiles.
1. The  new column-type is named `quantile_state` with fixed aggregation function `quantile_union`, which stores the intermediate results of pre-aggregated approximation calculations for quantiles.
2. support pre-aggregation of new column-type and quantile_state related functions.
2022-03-24 09:11:34 +08:00
72dfdb9a6c [fix] Fix Check_time return wrong value when exec show table status (#8578) 2022-03-23 10:34:23 +08:00
b89e4c7bba [feature-wip](java-udf) support java UDF with fixed-length input and output (#8516)
This feature is propsoed in [DSIP-1](https://cwiki.apache.org/confluence/display/DORIS/DSIP-001%3A+Java+UDF). 
This PR support fixed-length input and output Java UDF. Phase I in DIP-1 is done after this PR.

To support Java UDF effeciently, I use no data copy in JNI call and all compute operations are off-heap in Java.
To achieve that, I use a UdfExecutor instead. 

For users, a UDF class must have a public evaluate method.
2022-03-23 10:32:50 +08:00
71ce3c4a6e [feature-wip](array-type) Add codes and UT for array_contains and array_position functions (#8401) (#8589)
array_contains function Usage example:
1. create table with ARRAY column, and insert some data:
```
> select * from array_test;
+------+------+--------+
| k1   | k2   | k3     |
+------+------+--------+
|    1 |    2 | [1, 2] |
|    2 |    3 | NULL   |
|    4 | NULL | []     |
|    3 | NULL | NULL   |
+------+------+--------+
```
2. enable vectorized:
```
> set enable_vectorized_engine=true;
```
3. select with array_contains:
```
> select k1,array_contains(k3,1) from array_test;
+------+-------------------------+
| k1   | array_contains(`k3`, 1) |
+------+-------------------------+
|    3 |                    NULL |
|    1 |                       1 |
|    2 |                    NULL |
|    4 |                       0 |
+------+-------------------------+
```
4. also we can use array_contains in where condition
```
> select * from array_test where array_contains(k3,1);
+------+------+--------+
| k1   | k2   | k3     |
+------+------+--------+
|    1 |    2 | [1, 2] |
+------+------+--------+
```
5. array_position usage example
```
> select k1,k3,array_position(k3,2) from array_test;
+------+--------+-------------------------+
| k1   | k3     | array_position(`k3`, 2) |
+------+--------+-------------------------+
|    3 | NULL   |                    NULL |
|    1 | [1, 2] |                       2 |
|    2 | NULL   |                    NULL |
|    4 | []     |                       0 |
+------+--------+-------------------------+
```
2022-03-22 15:42:40 +08:00
b638c07533 [feature-wip](array-type) Support nested array insertion. (#8305) (#8586)
Please refer to #8304 .
2022-03-22 15:28:26 +08:00
e44038caf3 [feature-wip](array-type) Array data can be loaded in stream load. (#8368) (#8585)
Please refer to #8367 .
2022-03-22 15:25:40 +08:00
38ec3cbbdf [feature-wip](array-type) Support ArrayLiteral in SQL. (#8089) (#8582)
Please refer to #8074
2022-03-22 15:07:06 +08:00