Commit Graph

80 Commits

Author SHA1 Message Date
9b047d2c94 Feat: Add byte size to TTypedesc in TExpr, which will be used to carry scalarType information. (#17757)
Co-authored-by: libinfeng <libinfeng@selectdb.com>
2023-03-15 08:24:32 +08:00
02220560c5 [Improvement](multi catalog) Hive splitter: get HDFS/S3 splits by using the FileSystem API (#17706)
Use the FileSystem API to get splits for files in HDFS/S3 instead of calling InputFormat.getSplits.
The splits are based on blocks in HDFS/S3.
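A minimal sketch of block-based splitting, assuming one split per block, clipped to the file length. In Hadoop the block list would come from `FileSystem.getFileBlockLocations`. Here it is modeled by a hypothetical `BlockLocation` dataclass, and `splits_from_blocks` is an illustrative name, not Doris code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BlockLocation:
    """Hypothetical stand-in for one block reported by the file system."""
    offset: int
    length: int
    hosts: Tuple[str, ...]


@dataclass(frozen=True)
class FileSplit:
    path: str
    start: int
    length: int
    hosts: Tuple[str, ...]


def splits_from_blocks(path: str, file_length: int,
                       blocks: List[BlockLocation]) -> List[FileSplit]:
    """Emit one split per block, so each split can be scheduled near its data."""
    splits = []
    for block in sorted(blocks, key=lambda b: b.offset):
        if block.offset >= file_length:
            break
        # Clip defensively in case a block extends past the end of the file.
        length = min(block.length, file_length - block.offset)
        splits.append(FileSplit(path, block.offset, length, block.hosts))
    return splits
```

For a 300-byte file with 128-byte blocks, this yields splits of 128, 128 and 44 bytes. Each split keeps the hosts of its block for locality-aware scheduling.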
2023-03-15 00:25:00 +08:00
5b39fa9843 [Feature](vec)(quantile_state): support quantile state in vectorized engine (#16562)
* [Feature](vectorized)(quantile_state): support vectorized quantile state functions
1. quantile columns currently only support NOT NULL
2. add some regression test cases
3. set enable_quantile_state_type = true by default
---------

Co-authored-by: spaces-x <weixiang06@meituan.com>
2023-03-14 10:54:04 +08:00
55c42da511 [Feature](array) Support array<decimalv3> data type (#16640) 2023-03-13 10:48:13 +08:00
4ddd303cfc [Feature-wip](MySQL Load)Support cancel query for mysql load (#17233)
Notable changes:
1. Support cancel query for mysql load
2. Change the thread pool for mysql load manager.
3. Fix secure path check logic
4. Fix some doc errors
2023-03-09 22:08:26 +08:00
6c894be007 [enhancement](Nereids) support decimalv3 and precision derive (#17393) 2023-03-09 14:12:10 +08:00
b1ca87eb9b [FIX](complex-type) fix IS NULL predicate for map/struct (#17497)
Fix: the IS NULL predicate was not supported in SELECT statements for map and struct columns.
2023-03-08 17:03:06 +08:00
feacb15e71 [Improvement](datev2) push down datev2 predicates with date literal (#17522) 2023-03-08 16:54:54 +08:00
626fbc34f9 [bugfix](jsonb) Fix BE crash when creating an MV using a jsonb key (#17430) 2023-03-08 14:18:26 +08:00
4b743061b4 [feature](function) support type template in SQL function (#17344)
This PR proposes a new way to define functions, similar to C++ templates. Existing functions can be defined much more simply as template functions.

    # array element extract template function
    [['element_at', '%element_extract%'], 'E', ['ARRAY<E>', 'BIGINT'], 'ALWAYS_NULLABLE', ['E']],

    # map element extract template function
    [['element_at', '%element_extract%'], 'V', ['MAP<K, V>', 'K'], 'ALWAYS_NULLABLE', ['K', 'V']],


BTW, plain-type functions are not affected, and the legacy ARRAY_X and MAP_K_V definitions are still supported for compatibility.
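The two entries above read as Python list literals. This sketch assumes their fields are [function names, return type, argument types, nullable mode, template parameters]. That is our reading of the examples, not a documented format. Under that assumption, binding template parameters to the actual argument types yields a concrete return type. The sketch does exact matching only; the real planner also has to consider implicit casts. All helper names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (type name, child types); e.g. MAP<K, V> -> ("MAP", (("K", ()), ("V", ())))
TypeNode = Tuple[str, tuple]

ARRAY_ELEMENT_AT = [['element_at', '%element_extract%'], 'E', ['ARRAY<E>', 'BIGINT'], 'ALWAYS_NULLABLE', ['E']]
MAP_ELEMENT_AT = [['element_at', '%element_extract%'], 'V', ['MAP<K, V>', 'K'], 'ALWAYS_NULLABLE', ['K', 'V']]


def parse_type(text: str) -> TypeNode:
    """Parse strings like 'MAP<K, ARRAY<V>>' into nested (name, children) tuples."""
    text = text.strip()
    if "<" not in text:
        return (text, ())
    name, inner = text.split("<", 1)
    inner = inner[:-1]  # drop the closing '>'
    children, depth, start = [], 0, 0
    for i, ch in enumerate(inner):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            children.append(parse_type(inner[start:i]))
            start = i + 1
    children.append(parse_type(inner[start:]))
    return (name.strip(), tuple(children))


def bind(pattern: TypeNode, actual: TypeNode, params: List[str],
         env: Dict[str, TypeNode]) -> bool:
    """Match an argument type against a pattern, recording template parameters."""
    name, kids = pattern
    if name in params and not kids:
        if name in env:
            return env[name] == actual  # e.g. K must be the same type everywhere
        env[name] = actual
        return True
    actual_name, actual_kids = actual
    if name != actual_name or len(kids) != len(actual_kids):
        return False
    return all(bind(p, a, params, env) for p, a in zip(kids, actual_kids))


def substitute(node: TypeNode, env: Dict[str, TypeNode]) -> TypeNode:
    name, kids = node
    if not kids and name in env:
        return env[name]
    return (name, tuple(substitute(k, env) for k in kids))


def to_str(node: TypeNode) -> str:
    name, kids = node
    return name if not kids else f"{name}<{', '.join(to_str(k) for k in kids)}>"


def resolve(signature: list, arg_types: List[str]) -> Optional[str]:
    """Return the concrete return type, or None if the arguments do not match exactly."""
    _names, ret_type, param_types, _nullable, type_params = signature
    if len(param_types) != len(arg_types):
        return None
    env: Dict[str, TypeNode] = {}
    for pattern, actual in zip(param_types, arg_types):
        if not bind(parse_type(pattern), parse_type(actual), type_params, env):
            return None
    return to_str(substitute(parse_type(ret_type), env))
```

For example, `resolve(MAP_ELEMENT_AT, ['MAP<VARCHAR, ARRAY<INT>>', 'VARCHAR'])` binds K to VARCHAR and V to ARRAY<INT>, so it returns `'ARRAY<INT>'`. A single template entry covers every concrete element type.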
2023-03-08 10:51:31 +08:00
627b5ee302 [enhancement](k8s) Support fqdn mode for fe in k8s environment (#17329) 2023-03-05 10:18:56 +08:00
82df2ae9d8 [feature](mysql) Support secure MySQL connection to FE (#17138)
Background:
Doris currently does not support SSL connections from MySQL clients, which is not secure enough in some cases, especially when Doris is accessed over the public internet.

Solution:
- Use TLS1.2 protocol to encrypt information.
- Implementation details
  * server <--- connect <--- client
  * if enable SSL: {
  * server <--- SSL connection request packet <--- client
  * server <--- SSL Exchange ---> client } (this `if` logic is added in this PR)
  * server ---> handshake request packet ---> client
  * server <--- encrypted data ---> client (implemented in this PR)
- reference1 https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_connection_phase.html#sect_protocol_connection_phase_initial_handshake_ssl_handshake
- reference2 https://www.rfc-editor.org/rfc/rfc5246
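The SSL branch in the flow above depends on recognizing the client's SSLRequest packet. Per the MySQL connection-phase documentation (reference1), this is a 32-byte truncated handshake response whose capability flags include CLIENT_SSL (0x800).

Below is a minimal Python sketch of that check. The function names are hypothetical, and this is not Doris's FE (Java) code. After detecting the request, a server would wrap the socket in TLS before reading the real handshake response; in Python, for example, `ssl.SSLContext.wrap_socket(sock, server_side=True)`.

```python
import struct

CLIENT_PROTOCOL_41 = 0x00000200
CLIENT_SSL = 0x00000800
# SSLRequest payload: capability flags (4) + max packet size (4) + charset (1) + filler (23)
SSL_REQUEST_FORMAT = "<IIB23x"
SSL_REQUEST_LEN = struct.calcsize(SSL_REQUEST_FORMAT)  # 32 bytes


def split_packet(buf: bytes):
    """Split one MySQL packet into (sequence_id, payload).

    The header is a 3-byte little-endian payload length followed by a 1-byte sequence id.
    """
    length = int.from_bytes(buf[0:3], "little")
    return buf[3], buf[4:4 + length]


def is_ssl_request(payload: bytes) -> bool:
    """A 32-byte handshake response with CLIENT_SSL set asks the server to start TLS."""
    if len(payload) != SSL_REQUEST_LEN:
        return False
    capabilities, _max_packet, _charset = struct.unpack(SSL_REQUEST_FORMAT, payload)
    return bool(capabilities & CLIENT_SSL)


def build_ssl_request(seq: int = 1, charset: int = 33) -> bytes:
    """Build the packet a TLS-capable client would send (33 = utf8_general_ci)."""
    payload = struct.pack(SSL_REQUEST_FORMAT, CLIENT_PROTOCOL_41 | CLIENT_SSL,
                          16 * 1024 * 1024, charset)
    return len(payload).to_bytes(3, "little") + bytes([seq]) + payload
```

A full handshake response also carries the user name and auth data, so it is longer than 32 bytes. That length difference is why the check looks at both the payload size and the flag.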

close #16313

Signed-off-by: Yukang Lian <yukang.lian2022@gmail.com>
Co-authored-by: Gavin Chou <gavineaglechou@gmail.com>
Co-authored-by: morningman <morningman@163.com>
2023-03-04 12:14:48 +08:00
b5b595519a [fix](log) use logger to replace printStackTrace() (#17382)
Use Logger to replace printStackTrace to better locate problems.
2023-03-03 14:51:30 +08:00
30df268c1f [fix](hdfs)(catalog) fix BE crash when hdfs-site.xml does not exist in be/conf and fix compute node logic (#17244)
We set the LIBHDFS3_CONF env in start_be.sh, so libhdfs3 tries to read this hdfs-site.xml.
If the file does not exist, libhdfs3 throws an error. Doris does not handle this error, which causes the BE to crash.
This CL mainly makes the following changes:

Modify start_be.sh to set LIBHDFS3_CONF only if hdfs-site.xml exists.
Refactor the HDFSCommonBuilder so that it can return errors correctly.
Add the BE IP to the status, so that the IP can be read from error messages like:
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]failed to init reader for file  000.snappy.orc, err: 
[INTERNAL_ERROR][172.21.0.101]failed to init HDFSCommonBuilder, please check check be/conf/hdfs-site.xml
The logic for preferring compute nodes was wrong, which caused external table queries to be assigned to at most 3 backends.
This CL refactors this logic and also changes some FE configs:

prefer_compute_node_for_external_table

If set to true, queries on external tables will preferably be assigned to compute nodes.
The maximum number of compute nodes is controlled by min_backend_num_for_external_table.
If set to false, queries on external tables will be assigned to any node.

min_backend_num_for_external_table

Only takes effect when prefer_compute_node_for_external_table is true.
If the number of compute nodes is less than this value, queries on external tables will also use some mixed nodes,
so that the total number of nodes reaches this value.
If the number of compute nodes is larger than this value, queries on external tables will be assigned to compute nodes only.
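The selection rules described by the two configs can be sketched as follows. This is an illustration only: `select_backends` is a hypothetical name, not the FE's code. The text does not say what happens when the compute-node count equals the threshold, so the sketch assumes compute nodes only in that case.

```python
from typing import List


def select_backends(compute_nodes: List[str], mix_nodes: List[str],
                    prefer_compute_node: bool, min_backend_num: int) -> List[str]:
    """Pick backends for an external-table scan (illustrative only)."""
    if not prefer_compute_node:
        # prefer_compute_node_for_external_table = false: any node may be used.
        return compute_nodes + mix_nodes
    if len(compute_nodes) >= min_backend_num:
        # Enough compute nodes (equal case is an assumption): compute nodes only.
        return list(compute_nodes)
    # Top up with mixed nodes until the total reaches min_backend_num.
    shortfall = min_backend_num - len(compute_nodes)
    return compute_nodes + mix_nodes[:shortfall]
```

With one compute node, three mixed nodes and a threshold of 3, the query runs on the compute node plus two mixed nodes.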
2023-03-02 11:09:55 +08:00
201cf9c8df Revert "[enhancement](k8s) Support fqdn mode for fe in k8s environment (#16315)" (#17278)
This reverts commit 48afd77e37d63e2989cd85ab12b39a273fcd284e.
It causes a metadata problem.
2023-03-02 00:44:54 +08:00
722755efe9 [fix](planner) change back legacy planner type coercion (#17070)
Revert the legacy planner change in #16844.
2023-03-01 20:55:56 +08:00
48afd77e37 [enhancement](k8s) Support fqdn mode for fe in k8s environment (#16315) 2023-03-01 10:54:39 +08:00
b51ce415e7 [Feature](load) Add submitter and comments to load job (#16878)
* [Feature](load) Add submitter and comments to load job
2023-02-28 09:06:19 +08:00
d3a6cab716 [Fix](MySQLLoad) Fix a bug when loading a big local file, because the bytebuffer from the mysql packet reuses the same byte array (#16901)
Loading a big local file causes a `[INTERNAL_ERROR]too many filtered rows` error, because the bytebuffer from the mysql client always reuses the same byte array.

Later bytes overwrite earlier ones, so the bytes sent over the network end up in the wrong order.

The fix copies the byte array before writing it to the network.
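The failure mode is ordinary buffer aliasing. The actual fix lives in Doris's Java FE; this Python sketch only models the effect. A `bytearray` plays the reused byte array and `memoryview` plays a buffer that wraps it without copying. All class and function names here are hypothetical.

```python
from typing import Iterator, List


class ReusingPacketReader:
    """Hypothetical reader that fills one shared buffer for every packet."""

    def __init__(self, packets: List[bytes], buf_size: int = 64):
        self._packets = packets
        self._buf = bytearray(buf_size)

    def read(self) -> Iterator[memoryview]:
        for packet in self._packets:
            self._buf[:len(packet)] = packet  # overwrite the shared buffer in place
            yield memoryview(self._buf)[:len(packet)]  # a view, not a copy


def collect_buggy(reader: ReusingPacketReader) -> bytes:
    # Keeps views into the shared buffer; later packets overwrite earlier ones.
    views = list(reader.read())
    return b"".join(views)


def collect_fixed(reader: ReusingPacketReader) -> bytes:
    chunks = []
    for view in reader.read():
        chunks.append(bytes(view))  # copy before the buffer is reused
    return b"".join(chunks)
```

With packets `b"abc"` and `b"def"`, the buggy path produces `b"defdef"` and the copying path produces `b"abcdef"`.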
2023-02-28 00:06:44 +08:00
c3538ca804 [Enhancement](HttpServer) Add http interface authentication (#16571)
1. Organize http documents
2. Add http interface authentication for FE
3. Support https interface for FE
4. Provide authentication interface
5. Add http interface authentication for BE
6. Support https interface for BE
2023-02-24 10:59:33 +08:00
7229751bd9 [Improve](map-type) Add contains_null for map (#16948)
Add contains_null for map type.
2023-02-23 20:47:26 +08:00
edead494cb [Enhancement](storage) add a new hidden column __DORIS_VERSION_COL__ for unique key table (#16509) 2023-02-23 15:47:17 +08:00
7956800df7 [refactor](Nereids) make type coercion the same as the legacy planner (#16844)
- changes for Nereids
1. add a variable-length parameter to the constructor of Count to report a clear error for Count(a, b)
2. refactor StringRegexPredicate so that it inherits from ScalarFunction
3. remove useless class TypeCollection
4. use catalog.Type.Collection to check expression arguments type
5. change type coercion for TimestampArithmetic, divide, integral divide, comparison predicate, case when and in predicate to match the legacy planner.

- changes for the legacy planner
1. change the common type of floating-point and Decimal from Decimal to Double
2023-02-22 17:29:37 +08:00
ed05f3b480 [regression-test](fuzzy) fuzzy session variable batch_size (#16384) 2023-02-21 17:53:19 +08:00
5291f14aff [vectorized](udf) java udf support array type (#16841) 2023-02-20 10:00:25 +08:00
73f7979b73 [fix](struct-type) forbid struct-type to be distributed key/aggregation key and add more tests (#16626)
This commit forbids struct and map types from being used as distribution keys or aggregation keys.

SQL such as:

select distinct struct_col from struct_table

will report an error.
2023-02-19 15:16:36 +08:00
45427b86be [regression](struct-type) add more regression tests for struct and map type (#16790)
This commit forbids struct and map columns in materialized views and adds more regression tests.
2023-02-18 20:42:17 +08:00
0c56a4622c [Feature](struct-type) Add implicit cast for struct-type (#16613)
Inserting {1, 'a'} into struct<f1:tinyint, f2:varchar(20)> is currently not supported.
This commit supports implicitly casting the char type in the struct to varchar.
Add implicit cast for struct-type.
2023-02-15 16:55:00 +08:00
de85c57715 [Improve](point query) support retry different backends in PointQueryExecutor (#16380) 2023-02-14 07:31:31 +08:00
77be0d13c3 [BugFix](Load) Add a secure path for MySql Load to load local file from fe node (#16653)
MySQL load can load local files on the FE server node, but this causes a security issue: users can use it to probe local files on the FE node.

For this reason, add a configuration named mysql_load_server_secure_path to set a secure path from which data can be loaded.

By default, loading local files from the FE node is disabled by this configuration.
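A minimal sketch of a containment check for `mysql_load_server_secure_path`, assuming an empty value means "disabled". This is not Doris's Java implementation, and `is_load_path_allowed` is a hypothetical name. Resolving both paths before comparing is the important part. A plain string-prefix test would wrongly accept `/secure-x/...` for `/secure`, and would miss `..` escapes.

```python
import os


def is_load_path_allowed(requested: str, secure_path: str) -> bool:
    """Allow an FE-local file only if it resolves to a location inside secure_path."""
    if not secure_path:
        # No secure path configured: loading FE-local files stays disabled.
        return False
    secure = os.path.realpath(secure_path)
    # realpath resolves '..' and symlinks, so 'secure/../etc/passwd' cannot escape.
    target = os.path.realpath(requested)
    return os.path.commonpath([secure, target]) == secure
```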
2023-02-13 14:39:51 +08:00
f41a2055d3 [feature](Load)Remove user/password in properties for mysql load to avoid double auth. (#16073)
Use the FE cluster token to authenticate stream load.
This authentication is only open to BE; FE authentication still supports only HTTP basic auth.

I will use this authentication in mysql load to build a no-auth stream load from FE to BE.
This avoids double authentication in mysql load.
See the design doc for more information.
2023-02-13 10:00:08 +08:00
37d1519316 [WIP](dynamic-table) support dynamic schema table (#16335)
Issue Number: close #16351

Dynamic schema table is a special type of table whose schema changes along with the loading procedure. We currently implement this feature mainly for semi-structured data such as JSON: since JSON is self-describing, we can extract schema information from the original documents and infer the final type information. This special table reduces manual schema change operations, makes it easy to import semi-structured data, and extends its schema automatically.
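The inference idea can be sketched over flat JSON documents. This is a toy model, not Doris's implementation. The type names and the widening order BOOLEAN < BIGINT < DOUBLE < STRING are assumptions, and nested objects and arrays are simply treated as STRING. The point shown is that each new document can only add columns or widen existing ones.

```python
import json
from typing import Dict, Iterable

# Hypothetical widening order: a column only ever moves to a wider type.
_RANK = {"BOOLEAN": 0, "BIGINT": 1, "DOUBLE": 2, "STRING": 3}


def infer_type(value) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    return "STRING"  # strings, and (in this sketch) nested objects/arrays


def extend_schema(schema: Dict[str, str], json_lines: Iterable[str]) -> Dict[str, str]:
    """Return a new schema that covers every top-level key seen in the documents."""
    result = dict(schema)
    for line in json_lines:
        for key, value in json.loads(line).items():
            if value is None:
                continue  # null says nothing about the type
            new_type = infer_type(value)
            old_type = result.get(key)
            if old_type is None or _RANK[new_type] > _RANK[old_type]:
                result[key] = new_type
    return result
```

Loading `{"a": 1, "b": "x"}` and then `{"a": 2.5, "c": true}` widens `a` to DOUBLE and adds `c` as BOOLEAN. No manual schema change is needed.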
2023-02-11 13:37:50 +08:00
d9924c9b8e [Improvement](topn) add limit threshold session variable and fuzzy for topn optimizations (#16514)
1. add limit threshold for topn runtime pushdown and key topn optimization
2. use unified session variable topn_opt_limit_threshold for all topn optimizations
3. add fuzzy support for topn_opt_limit_threshold
2023-02-10 12:56:33 +08:00
8758cd412f [feature](auth)Implementing privilege management with rbac model (#16091)
Change the implementation of auth to RBAC.

Each user has one default role, which cannot be dropped.

When you grant a privilege to a user, it is granted to the user's default role.

In the current PR, a user can still have only one role other than the default role; in the future, users and roles will be many-to-many.

Rename PaloRole, PaloAuth, PaloPrivilege to Role, Auth, Privilege.
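A toy model of the rules above, for illustration only. The `Auth` class here is a hypothetical sketch, not the renamed Doris `Auth` class, and the `default_role_<user>` naming is an assumption. It encodes three rules: privileges reach users only through roles, a direct grant lands on the user's default role, and a user may hold at most one extra role.

```python
from typing import Dict, Set


class Auth:
    """Toy RBAC model (hypothetical): users get privileges only through roles."""

    def __init__(self) -> None:
        self.role_privs: Dict[str, Set[str]] = {}
        self.user_roles: Dict[str, Set[str]] = {}
        self.default_roles: Dict[str, str] = {}  # user -> its default role

    def create_user(self, user: str) -> None:
        role = f"default_role_{user}"  # naming scheme is an assumption
        self.role_privs[role] = set()
        self.default_roles[user] = role
        self.user_roles[user] = {role}

    def create_role(self, role: str) -> None:
        self.role_privs.setdefault(role, set())

    def grant_to_user(self, user: str, priv: str) -> None:
        # Granting directly to a user really grants to the user's default role.
        self.role_privs[self.default_roles[user]].add(priv)

    def grant_to_role(self, role: str, priv: str) -> None:
        self.role_privs[role].add(priv)

    def grant_role(self, user: str, role: str) -> None:
        extra = self.user_roles[user] - {self.default_roles[user], role}
        if extra:
            # Current restriction: at most one role besides the default role.
            raise ValueError(f"{user} already has role {sorted(extra)[0]}")
        self.user_roles[user].add(role)

    def drop_role(self, role: str) -> None:
        if role in self.default_roles.values():
            raise ValueError("a user's default role cannot be dropped")
        self.role_privs.pop(role, None)
        for roles in self.user_roles.values():
            roles.discard(role)

    def privileges(self, user: str) -> Set[str]:
        privs: Set[str] = set()
        for role in self.user_roles[user]:
            privs |= self.role_privs.get(role, set())
        return privs
```

Treating the default role as an ordinary role keeps privilege checks uniform: `privileges` only ever walks a user's roles.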
2023-02-10 12:30:49 +08:00
1b3902baa2 [Feature](Complex-type) Add struct and map type to Doris (#16444)
This commit supports:
1. Insert + select for struct/map types
2. Json stream load for struct type
3. m[key] function for map type

How to use:
Set the following FE configs to enable creating tables with struct and map types:
1. admin set frontend config("enable_struct_type" = "true");
2. admin set frontend config("enable_map_type" = "true");

#16547

Co-authored-by: xy720 <xuyang25@baidu.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2023-02-10 11:00:33 +08:00
885fe1516f [refactor](datev2) refine logics of auto conversion (#16552)
* [refactor](datev2) refine logics of auto conversion

* uodate

* update

* Revert "uodate"

This reverts commit 2609a13b4022b4a603bf992fad64c133def266e0.
2023-02-10 10:06:47 +08:00
e1f1386395 [fix](cooldown) Rewrite update cooldown conf (#16488)
Remove error-prone CooldownJob, and use CooldownConfHandler to update Tablet's cooldown conf.
Some bug fixes for cooldown.
2023-02-09 09:12:55 +08:00
bb334de00f [enhancement](load) Change transaction limit from global level to db level (#15830)
Add transaction size quota for database

Co-authored-by: wuhangze <wuhangze@jd.com>
2023-02-08 18:04:26 +08:00
dcbcec0775 [regression](fuzzy)fuzzy enable_fold_constant_by_be (#16448)
* [fuzzy](test) fuzzy some session variables stably according to pull_request_id

* fuzzy enable_fold_constant_by_be

---------

Co-authored-by: stephen <hello_stephen@@qq.com>
2023-02-07 09:17:50 +08:00
1146bde695 [feature-wip](MTMV) Support refresh mtmv (#16218)
Support refreshing an mtmv manually with the following SQL, which generates an mtmv task immediately:

```
REFRESH MATERIALIZED VIEW test_mv_view [complete];
```

You can use `show mtmv task` to show the latest task.

In this PR, I also clear the mtmv tasks when the mtmv is dropped, so that the test suite behaves correctly.
2023-02-04 20:17:45 +08:00
b1fd124f02 [feature](struct-type/map-type) Add switch for struct and map type for creating table (#16379)
Add switches to forbid users from creating tables with struct or map columns.
2023-02-03 13:46:52 +08:00
bb179b77f7 [Feature-WIP](inverted index) support array type for inverted index reader (#16355) 2023-02-02 16:14:14 +08:00
17bec356a3 [Bug](decimalv3) always use decimalv3 for show create table (#16295) 2023-02-01 09:54:42 +08:00
ec4a56922f [enhancement](memory) reduce memory usage for failed broker loads (#15895)
* [enhancement](memory) reduce memory usage for failed broker loads
2023-01-30 10:22:31 +08:00
c6bc0a03a4 [feature](Load) Support MySQL Load Data (#15511)
Main subtask of [DSIP-28](https://cwiki.apache.org/confluence/display/DORIS/DSIP-028%3A+Suppot+MySQL+Load+Data)

## Problem summary
Support MySQL load syntax as below:
```sql
LOAD DATA
    [LOCAL]
    INFILE 'file_name'
    INTO TABLE tbl_name
    [PARTITION (partition_name [, partition_name] ...)]
    [COLUMNS TERMINATED BY 'string']
    [LINES TERMINATED BY 'string']
    [IGNORE number {LINES | ROWS}]
    [(col_name_or_user_var [, col_name_or_user_var] ...)]
    [SET (col_name={expr | DEFAULT} [, col_name={expr | DEFAULT}] ...)]
    [PROPERTIES (key1 = value1 [, key2=value2]) ]
```

For example, 
```sql
LOAD DATA
    LOCAL
    INFILE 'local_test.file'
    INTO TABLE db1.table1
    PARTITION (partition_a, partition_b, partition_c, partition_d)
    COLUMNS TERMINATED BY '\t'
    (k1, k2, v2, v10, v11)
    set (c1=k1,c2=k2,c3=v10,c4=v11)
    PROPERTIES ("auth" = "root:", "strict_mode"="true")
```

Note that in this PR the property named `auth` must be set, since stream load needs authentication. I will optimize this later.
2023-01-29 14:44:59 +08:00
7e7fd5d049 [cleanup](fe) cleanup useless code. (#16129)
* [cleanup](Nereids): cleanup useless code.

* revert ErrorCode.java
2023-01-28 18:44:43 +08:00
2daa5f3fef [fix](statistics) Fix statistics-related threads spawning continuously during checkpoint #16088 2023-01-21 07:58:33 +08:00
726427b795 [refactor](fe) refactor and upgrade dependency tree of FE and support AWS glue catalog (#16046)
1. Spark dpp
 
	Move `DppResult` and `EtlJobConfig` to sparkdpp package in `fe-common` module.
	So that `fe-core` no longer depends on the `spark-dpp` module, and `spark-dpp.jar`
	will not be moved into `fe/lib`, which reduces the size of the FE output.
	
2. Modify start_fe.sh

	Modify the CLASSPATH to make sure that doris-fe.jar comes first, so that
	when classes with the same fully qualified name exist, they are loaded from doris-fe.jar first.
	
3. Upgrade hadoop and hive version

	hadoop: 2.10.2 -> 3.3.3
	hive: 2.3.7 -> 3.1.3
	
4. Override the IHiveMetastoreClient implementations from dependency

	`ProxyMetaStoreClient.java` for Aliyun DLF.
	`HiveMetaStoreClient.java` for origin Apache Hive metastore.

	I needed to modify some of their methods to make them compatible with
	different versions of Hive.
	
5. Exclude some unused dependencies to reduce the size of FE output

	The output is now only 370MB (previously 600MB).
	
6. Upgrade aws-java-sdk version to 1.12.31

7. Support AWS Glue Data Catalog

8. Remove HudiScanNode (no longer supported)
2023-01-20 14:42:16 +08:00
74c0677d62 [fix](planner) fix bugs in uncheckedCastChild (#15905)
1. `uncheckedCastChild` may generate redundant `CastExpr` like `cast( cast(XXX as Date) as Date)`
2. Generate a DateLiteral to replace cast(IntLiteral as Date)
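The two fixes above can be pictured on a tiny hypothetical expression model. The real code is the Java planner's `uncheckedCastChild`; these classes are illustrations only. The sketch also assumes the integer is an 8-digit YYYYMMDD value and falls back to a plain cast otherwise.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Union


@dataclass(frozen=True)
class IntLiteral:
    value: int
    type: str = "BIGINT"


@dataclass(frozen=True)
class DateLiteral:
    value: date
    type: str = "DATE"


@dataclass(frozen=True)
class CastExpr:
    child: "Expr"
    type: str


Expr = Union[IntLiteral, DateLiteral, CastExpr]


def _int_to_date(v: int) -> Optional[date]:
    """Interpret an 8-digit integer as YYYYMMDD; return None if that fails."""
    s = str(v)
    if len(s) != 8 or not s.isdigit():
        return None
    try:
        return date(int(s[:4]), int(s[4:6]), int(s[6:]))
    except ValueError:
        return None


def unchecked_cast(expr: Expr, target: str) -> Expr:
    if expr.type == target:
        # Already the target type: avoid cast(cast(x as DATE) as DATE).
        return expr
    if isinstance(expr, IntLiteral) and target == "DATE":
        folded = _int_to_date(expr.value)
        if folded is not None:
            return DateLiteral(folded)  # fold the cast into a literal
    return CastExpr(expr, target)
```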
2023-01-19 15:51:08 +08:00
3894de49d2 [Enhancement](topn) support two phase read for topn query (#15642)
This PR optimizes TopN queries like `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`.

A TopN query is composed of a SortNode and a ScanNode. When the user table is wide (e.g. 100+ columns) but the ORDER BY clause references only a few columns, the ScanNode still needs to scan all columns from the storage engine, even if the limit is very small. This can cause heavy read amplification. So in this PR I divide the TopN query into two phases:
1. In the first phase, only `columnA`'s data is read from the storage engine, along with an extra row ID column called `__DORIS_ROWID_COL__`. The other columns are pruned from the ScanNode.
2. The second phase is placed in the ExchangeNode, because it is the central node for TopN nodes in the cluster. The ExchangeNode sends RPCs to other nodes with the row IDs (sorted and limited by the SortNode) read in the first phase, and reads the full rows from the storage engine row by row.

After the second-phase read, the Block contains all the data needed for the query.
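The two phases can be simulated in memory. This sketch is not the Doris BE/FE code. Storage is modeled as per-backend lists of dict rows, and a row id is assumed to be `(backend, position)`, standing in for `__DORIS_ROWID_COL__`. Only ascending order is shown. Phase one touches just the ORDER BY value and the row id. Phase two fetches full rows in one batched lookup per backend and keeps the sorted order.

```python
import heapq
from typing import Dict, List, Tuple

Row = Dict[str, object]
RowId = Tuple[str, int]  # (backend, position in that backend's storage)


def phase_one(storage: Dict[str, List[Row]], order_col: str,
              limit: int) -> List[Tuple[object, RowId]]:
    """Read only the ORDER BY column plus a row id, then sort and apply LIMIT."""
    keys = []
    for backend, rows in storage.items():
        for pos, row in enumerate(rows):
            keys.append((row[order_col], (backend, pos)))  # narrow read
    return heapq.nsmallest(limit, keys)  # ascending order


def phase_two(storage: Dict[str, List[Row]],
              top: List[Tuple[object, RowId]]) -> List[Row]:
    """Fetch full rows by row id, one batched request per backend."""
    wanted: Dict[str, List[int]] = {}
    for _, (backend, pos) in top:
        wanted.setdefault(backend, []).append(pos)
    fetched: Dict[RowId, Row] = {}
    for backend, positions in wanted.items():  # stands in for one RPC per backend
        for pos in positions:
            fetched[(backend, pos)] = storage[backend][pos]
    return [fetched[row_id] for _, row_id in top]  # keep phase-one order
```

With a small LIMIT N, phase two reads at most N full rows, no matter how wide the table is.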
2023-01-19 10:01:33 +08:00