Commit Graph

156 Commits

Author SHA1 Message Date
8b9813663d [test](executor)add crud regression test for resource group (#19659)
dd crud regression test for resource group (#19659)
2023-05-20 13:49:02 +08:00
a81db3e984 [improvement](FQDN) broker support fqdn (#19821)
1.broker support fqdn
2.change 'master_only' attr of 'enable_fqdn_mode'
2023-05-20 11:25:58 +08:00
67dc68630b [Improve](complex-type)improve array/map/struct creating and function with decimalv3 (#19830) 2023-05-19 17:43:36 +08:00
9d54545bac [Fix](inverted index) add datev2/datetimev2 for inverted index column type (#19845)
When we try to query array of datetimev2 column by inverted index, it returns an error like this:

CREATE TABLE `nested` (
 `qid` bigint(20) NULL,
 `tag` array<text> NULL,
 `creationDate` datetime NULL,
 `title` text NULL,
 `user` text NULL,
 `answers.user` array<text> NULL,
 `answers.date` array<datetimev2(0)> NULL,
 INDEX tag_idx (`tag`) USING INVERTED PROPERTIES("parser" = "english") COMMENT '',
 INDEX creation_date_idx (`creationDate`) USING INVERTED COMMENT '',
 INDEX title_idx (`title`) USING INVERTED COMMENT '',
 INDEX user_idx (`user`) USING INVERTED COMMENT '',
 INDEX answers_user_idx (`answers.user`) USING INVERTED COMMENT '',
 INDEX answers_date_idx (`answers.date`) USING INVERTED COMMENT ''
) ENGINE=OLAP
DUPLICATE KEY(`qid`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`qid`) BUCKETS 18
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"storage_format" = "V2",
"compression" = "ZSTD",
"light_schema_change" = "true",
"dynamic_schema" = "true",
"disable_auto_compaction" = "false"
); 

mysql> select * from nested.nested where tag match 'java' and `answers.date` element_le '2012-04-08T21:15:33.873Z' limit 10;
ERROR 1105 (HY000): errCode = 2, detailMessage = no function found for MATCH_ELEMENT_LE,`answers.date` MA
2023-05-19 14:57:01 +08:00
14620a6766 [minor](log) add details for unqueryable replicas (#19792)
Add a new FE config: show_details_for_unaccessible_tablet.
Default is false, when set to true, if a query is unable to select a healthy replica,
the detailed information of all the replicas of the tablet including the specific reason why they are unqueryable,
will be printed out.
2023-05-19 08:53:57 +08:00
294599ee45 [feature](jsonb) rename JSONB type name and function name to JSON (#19774)
To be more compatible with MySQL, rename JSONB type name and function name to JSON.

The old JSONB type name and jsonb_xx function can still be used for backward compatibility.

There is a function jsonb_extract remained since json_extract is used by json string function and more work need to change it. It will be changed further.
2023-05-18 16:16:52 +08:00
1d05feea1b [Feature](Nereids) add executable function to support fold constant for functions (#18209)
1. Add date-time functions for fold constant for Nereids.
This is the list of executable date-time function nereids supports up to now:
- now()
- now(int)
- current_timestamp()
- current_timestamp(int)
- localtime()
- localtimestamp()
- curdate()
- current_date()
- curtime()
- current_time()
- date_{add/sub}(),{years/months/days/hours/minutes/seconds}_{add/sub}()
- datediff()
- {date/datev2}()
- {year/quarter/month/day/hour/minute/second}()
- dayof{year/month/week}()
- date_format()
- date_trunc()
- from_days()
- last_day()
- to_monday()
- from_unixtime()
- unix_timestamp()
- utc_timestamp()
- to_date()
- to_days()
- str_to_date()
- makedate()

2. solved problem:
- enable datev2/datetimev2 default.
- refactor Nereids foldConstantOnFE and support fold nested expression.
- separate the executable into multi-files for easily-reading and adding new functions
2023-05-17 21:26:31 +08:00
30c4f25cb3 [fix](multi-catalog) verify the precision of datetime types for each data source (#19544)
Fix threes bugs of timestampv2 precision:
1. Hive catalog doesn't set the precision of timestampv2, and can't get the precision from hive metastore, so set the largest precision for timestampv2;
2. Jdbc catalog use datetimev1 to parse timestamp, and convert to timestampv2, so the precision is lost.
3. TVF doesn't use the precision from meta data of file format.
2023-05-17 20:50:15 +08:00
Pxl
4eb2604789 [Bug](function) fix function define of Retention inconsist and change some static_cast to assert cast (#19455)
1. fix function define of `Retention` inconsist, this function return tinyint on `FE` and return uint8 on `BE`
2. make assert_cast support cast to derived
3. change some static cast to assert cast
4. support sum(bool)/avg(bool)
2023-05-15 11:50:02 +08:00
6748ae4a57 [Feature] Collect the information statistics of the query hit (#18805)
1. Show the query hit statistics for `baseall`

   ```sql
    MySQL [test_query_db]> show query stats from baseall;
    +-------+------------+-------------+
    | Field | QueryCount | FilterCount |
    +-------+------------+-------------+
    | k0    | 0          | 0           |
    | k1    | 0          | 0           |
    | k2    | 0          | 0           |
    | k3    | 0          | 0           |
    | k4    | 0          | 0           |
    | k5    | 0          | 0           |
    | k6    | 0          | 0           |
    | k10   | 0          | 0           |
    | k11   | 0          | 0           |
    | k7    | 0          | 0           |
    | k8    | 0          | 0           |
    | k9    | 0          | 0           |
    | k12   | 0          | 0           |
    | k13   | 0          | 0           |
    +-------+------------+-------------+
    14 rows in set (0.002 sec)

    MySQL [test_query_db]> select k0, k1,k2, sum(k3) from baseall  where k9 > 1 group by k0,k1,k2;
    +------+------+--------+-------------+
    | k0   | k1   | k2     | sum(`k3`)   |
    +------+------+--------+-------------+
    |    0 |    6 |  32767 |        3021 |
    |    1 |   12 |  32767 | -2147483647 |
    |    0 |    3 |   1989 |        1002 |
    |    0 |    7 | -32767 |        1002 |
    |    1 |    8 |    255 |  2147483647 |
    |    1 |    9 |   1991 | -2147483647 |
    |    1 |   11 |   1989 |       25699 |
    |    1 |   13 | -32767 |  2147483647 |
    |    1 |   14 |    255 |         103 |
    |    0 |    1 |   1989 |        1001 |
    |    0 |    2 |   1986 |        1001 |
    |    1 |   15 |   1992 |        3021 |
    +------+------+--------+-------------+
    12 rows in set (0.050 sec)

    MySQL [test_query_db]> show query stats from baseall;
    +-------+------------+-------------+
    | Field | QueryCount | FilterCount |
    +-------+------------+-------------+
    | k0    | 1          | 0           |
    | k1    | 1          | 0           |
    | k2    | 1          | 0           |
    | k3    | 1          | 0           |
    | k4    | 0          | 0           |
    | k5    | 0          | 0           |
    | k6    | 0          | 0           |
    | k10   | 0          | 0           |
    | k11   | 0          | 0           |
    | k7    | 0          | 0           |
    | k8    | 0          | 0           |
    | k9    | 1          | 1           |
    | k12   | 0          | 0           |
    | k13   | 0          | 0           |
    +-------+------------+-------------+
    14 rows in set (0.001 sec)
   ```

2. Show the query hit statistics summary for all the mv in a table

   ```sql
   MySQL [test_query_db]> show query stats from baseall all;
    +-----------+------------+
    | IndexName | QueryCount |
    +-----------+------------+
    | baseall   | 1          |
    +-----------+------------+
    1 row in set (0.005 sec)
   ```

3. Show the query hit statistics detail info for all the mv in a table

   ```sql
    MySQL [test_query_db]> show query stats from baseall all verbose;
    +-----------+-------+------------+-------------+
    | IndexName | Field | QueryCount | FilterCount |
    +-----------+-------+------------+-------------+
    | baseall   | k0    | 1          | 0           |
    |           | k1    | 1          | 0           |
    |           | k2    | 1          | 0           |
    |           | k3    | 1          | 0           |
    |           | k4    | 0          | 0           |
    |           | k5    | 0          | 0           |
    |           | k6    | 0          | 0           |
    |           | k10   | 0          | 0           |
    |           | k11   | 0          | 0           |
    |           | k7    | 0          | 0           |
    |           | k8    | 0          | 0           |
    |           | k9    | 1          | 1           |
    |           | k12   | 0          | 0           |
    |           | k13   | 0          | 0           |
    +-----------+-------+------------+-------------+
    14 rows in set (0.017 sec)
   ```

4. Show the query hit for a database

   ```sql
    MySQL [test_query_db]> show query stats for test_query_db;
    +----------------------------+------------+
    | TableName                  | QueryCount |
    +----------------------------+------------+
    | compaction_tbl             | 0          |
    | bigtable                   | 0          |
    | empty                      | 0          |
    | tempbaseall                | 0          |
    | test                       | 0          |
    | test_data_type             | 0          |
    | test_string_function_field | 0          |
    | baseall                    | 1          |
    | nullable                   | 0          |
    +----------------------------+------------+
    9 rows in set (0.005 sec)
   ```

5. Show query hit statistics for all the databases

   ```sql
    MySQL [(none)]> show query stats;
    +-----------------+------------+
    | Database        | QueryCount |
    +-----------------+------------+
    | test_query_db   | 1          |
    +-----------------+------------+
    1 rows in set (0.005 sec)
   ```
2023-05-15 10:56:34 +08:00
26e930eed1 [Fix](multi-catalog) Make BE selection policy works fine when enable prefer_compute_node_for_external_table (#19346) 2023-05-12 15:32:50 +08:00
a041f8eabe [fix](fe) Fx SimpleDateFormatter thread unsafe issue by replacing to DateTimeFormatter. (#19265)
DateTimeFormatter replace SimpleDateFormat in fe module because SimpleDateFormat is not thread-safe.
2023-05-11 22:50:24 +08:00
6d2070c59d [enhancement](stats) Make stats cache item size configurable (#19205) 2023-05-11 13:59:37 +08:00
d20b5f90d8 [feature](executor) Automatically set the instance_num using the info from be. (#19345)
1. fixed some error regressions (results error with big nstance_num due to incorrect order by).
2. if set parallel_fragment_exec_instance_num to 0, the concurrency in the Pipeline execution engine will automatically be set to half of the number of CPU cores.
3. add limit to parallel_fragment_exec_instance_num that it cannot be set to more than fe.conf::max_instance_num(Default: 128)
```
mysql [(none)]>set parallel_fragment_exec_instance_num = 514;
ERROR 1231 (42000): errCode = 2, detailMessage = Variable 'parallel_fragment_exec_instance_num' can't be set to the value of '514(Should not be set to more than 128)'
```
2023-05-10 17:07:41 +08:00
78435823b6 [Fix](multi catalog)Return all partition values while reading hive table. (#19434)
Return all partition values while reading hive table.
Add a config item for the max value of hive table to partition list cache.
Default value is 100.
2023-05-10 10:55:33 +08:00
4302ceaee8 [Improvement](data types) enhance show data types stmt (#18831) 2023-05-09 09:42:44 +08:00
e78149cb65 [Enhencement](Export) add property for outfile/export and add test (#18997)
This pr does three things:
1. add `delete_existing_files` property for outfile/export. If `delete_existing_files = true`, export/outfile will delete all files under file_path first.
2. add p2 test for export
3. modify docs
2023-05-08 14:02:20 +08:00
abc73ac1eb [refactor](cluster)(step-1) remove cluster related stmt (#19355)
* [refactor](cluster)(step-1) remove cluster stmt
2023-05-07 18:44:42 +08:00
3f6e5118e6 [enchancement](statistics) support periodic collection of statistics (#19247)
This PR enables periodic collection of statistics and is a precursor to automatic statistics collection. It mainly includes the following contents:

support periodic collection of statistics.
Change the type of Date in statistics p0 to DateV2(see [Enhancement](data-type) add FE config to prohibit create date and decimalv2 type #19077) for test locally. complement cases(remove Chinese characters, optimize code, etc) , improve stability.
Supports setting whether to keep records of statistics synchronization job info, convenient for use in p0 testing.
The statistics job table was modified, and some auxiliary judgments were added to avoid the user perceiving the modification. This function was removed when the table schema is stable.
2023-05-06 14:53:06 +08:00
3287f350de [feature](table) implement the round robin selection be when create tablet (#19167) 2023-05-06 14:46:48 +08:00
70236adc1f [Refactor](doc)(config)(variable) use script to generate doc for FE config and session variables (#19246)
The document of configs(FE and BE) and session variables is hard to maintain.
Because developer need to modify both code and document.
And you can see that some of config's document is missing.

So I plan to write the document of config or variables directly in code, and using
script to generate document automatically.

How To
This CL mainly changes:

Add field in Config and Session Variables' annaotion

description: The description of the config or variable item. It is a String array. And first element is in Chinese, second is in English
options: the valid options if the config or variable is enum.
Add a scripts docs/generate-config-and-variable-doc.sh

Simple run sh docs/generate-config-and-variable-doc.sh and it will generate docs of FE config and variables,
And save it under docs/admin-manual/config/fe-config.md and docs/advanced/variables.md,
both in Chinese and in English.

And there are template markdowns for this script to read and replace with real doc content.

TODO
Too many description need to be filled. I will finish them in next PR. And now the origin doc remain unchanged.
Find a way to check the description field of config and variables, to make sure we won't missing it.
Generate doc for BE config.
2023-05-05 14:42:43 +08:00
5459cd9c30 [Improve](fe)Upgrade dependencies and optimize jar package management (#18882)
bind netty-version to 4.1.89-final
bind jettison to 1.5.4
upgrade hadoop version to 3.3.5
upgrade range-plugins-common to 2.4.0
bind bcprov-jdk15on to 2.4.0
upgrade and bind woodstox to 6.5.1
upgrade and bind kerby to 2.0.3
upgrade hudi to 0.13.0
upgrade parquet to 1.13.0
upgrade maven-source-plugin to 3.2.1
upgrade maven-assembly-plugin to 3.3.0
upgrade maven-javadoc-plugin to 3.3.2
upgrade maven-shade-plugin to 3.3.4
upgrade maven-clean-plugin to 3.1.0
Remove meaningless plugins
Optimize doris maven path
Unify the Java modules for management in fe
2023-05-04 10:07:37 +08:00
43e70ab252 [chore](recover) add a config to recover remaining data in emergency (#18986) 2023-04-28 17:42:00 +08:00
Pxl
ec517a53a8 [Chore](build) upgrade clang-format version to 16 && move thrift to fe-common (#19155)
upgrade clang-format version to 16
move thrift to fe-common
fix core dump on pipeline engine when operator canceled and not prepared
2023-04-28 14:14:51 +08:00
5e9c0c3500 [Enhancement](data-type) add FE config to prohibit create date and decimalv2 type (#19077)
* prohibits date and decimal type

* add config in test
2023-04-28 11:31:51 +08:00
f9f5bbde6f [feature-wip](duplicate_no_keys) add create duplicate table without keys (#18758) 2023-04-27 09:59:56 +08:00
270be55c4c [feat](stats) Add option to config file to enable or disable analyze function (#19062)
Add this option in conf:

    /**
     * If set false, user couldn't submit analyze SQL and FE won't allocate any related resources.
     */
    @ConfField
    public static boolean enable_stats = true;

It will be checked during analyze of analyze related stmt and init analyze manager
2023-04-26 13:37:08 +08:00
61b7a52444 [Enhancement](multi-catalogs) Use decimal V3 type in multi-catalogs module. (#18926)
1. Use decimal V3 type in JDBC and Iceberg tables.
2. Fix hdfs TVF decimal V3 type and regression test.
2023-04-25 14:49:40 +08:00
fd4576e420 [Fix](auth) fix some problem of skip_localhost_auth_check in FE config #18996 2023-04-25 09:10:01 +08:00
2d7903e2bd [Feature](multi-catalog) support query hive views. (#18815)
A very simple implementation to query hive views, it is an EXPERIMENTAL feature.
We can try to parse the ddl of hive views and try to execute the query relies on the fact that HiveQL
is very similar to Doris SQL. But if the ddl of hive views use some complicated or incompatible grammar,
the query might fail.
2023-04-24 08:49:26 +08:00
166bed11d4 [Enchancement](auth) Forbid to login doris from 127.0.0.1 without password (#18816)
* forbid to login from 127.0.0.1 without password

* add localhost limit

* rename
2023-04-23 13:56:31 +08:00
3007cd49f2 [enhancement](mysql) enable two-way ssl authentication (#18530)
According to the mysql-ssl, enable two-way SSL authentication.
2023-04-21 14:39:14 +08:00
8e2146f48c [Enhencement](Export) support export with outfile syntax (#18325)
`Export` syntax provides asynchronous export function, but `Export` does not achieve vectorization.
`Outfile` syntax provides synchronous export function`.
So we can reimplement the export syntax with oufile syntax.
2023-04-20 17:27:04 +08:00
fb377a9da9 [Improvement](functions)Optimized some datetime function's return value (#18369) 2023-04-19 15:51:11 +08:00
5c076b738b [improvement](resource-group) add test for resource group (#18575)
Co-authored-by: wangbo <youseebiggirl_t_t@qq.com>
2023-04-18 20:20:50 +08:00
5300b21db7 [Bug](DECIMALV3) report failure if a decimal value is overflow (#18336) 2023-04-17 13:18:14 +08:00
1cbbc60822 [feature](config) support "experimental" prefix for FE config (#18699)
For each release of Doris, there are some experimental features.
These feature may not stable or qualified enough, and user need to use it by setting config or session variables,
eg, set enable_mtmv = true, otherwise, these feature is disable by default.

We should explicitly tell user which features are experimental, so that user will notice that and decide whether to
use it.

Changes
In this PR, I support the experimental_ prefix for FE config and session variables.

Session Variable

Given enable_nereids_planner as an example.

The Nereids planner is an experimental feature in Doris, so there is an EXPERIMENTAL annotation for it:

@VariableMgr.VarAttr(..., expType = ExperimentalType.EXPERIMENTAL)
private boolean enableNereidsPlanner = false;
And for compatibility, user can set it by:

set enable_nereids_planner = true;
set experimental_enable_nereids_planner = true;
And for show variables, it will only show experimental_enable_nereids_planner entry.

And you can also see all experimental session variables by:

show variables like "%experimental%"
Config

Same as session variable, give enable_mtmv as an example.

@ConfField(..., expType = ExperimentalType.EXPERIMENTAL)
public static boolean enable_mtmv = false;
User can set it in fe.conf or ADMIN SET FRONTEND CONFIG stmt with both names:

enable_mtmv
experimental_enable_mtmv
And user can see all experimental FE configs by:

ADMIN SHOW FRONTEND CONFIG LIKE "%experimental%";
TODO
Support this feature for BE config

Only add experimental for:

enable_pipeline_engine
enable_nereids_planner
enable_single_replica_insert
and FE config:

enable_mtmv
enabel_ssl
enable_fqdn_mode
Should modify other config and session vars
2023-04-16 18:32:10 +08:00
5d1abe4507 [Bugfix](Mtmv)Fix mtmv meta load failed (#18605)
MTMV meta load fail since meta was public to the CI System
2023-04-14 16:29:18 +08:00
9634d21a28 [fix](info_db) avoid infodb query timeout when external catalog info is too large or is not reachable (#18662)
When query tables in information_schema databases, it may timeout due to:

There are external catalog with too many tables.
The external catalog is unreachable
So I add a new FE config infodb_support_ext_catalog.
The default is false, which means that when select from tables in information_schema database,
the result will not contain the information of the table in external catalog.

Describe your changes.
2023-04-14 14:40:31 +08:00
db5ec6f6b0 [FIX](thrift)Fix with 1.2 version for thrift #18658 2023-04-14 14:07:42 +08:00
33eec9096f [Enhancement](FE) use customized grpc threadpool to get better metric for grpc from FE to BE (#13983)
Previously in Doris FE, there is no specific thread pool for grpc-client-channel,
by default the underlying netty logic would use one dynamic unbounded cache threadpool.
The workload for this grpc threadpool is unseen.
Use ThreadpoolMgr to create one customized threadpool to get Prometheus-compatible metric data.
2023-04-13 20:09:26 +08:00
75fd4b70fa [improve](fe)Optimize fe binary package packaging (#18554) 2023-04-12 12:58:45 +08:00
cb644d5bc3 [feature](function) support any type in SQL function (#18392)
Add AnyType to Doris.
Support Inference function in fe SQL function.
2023-04-11 19:45:02 +08:00
c13f806e53 [Refactor](multi catalog)Split ExternalFileScanNode into FileQueryScanNode and FileLoadScanNode (#18342)
Split ExternalFileScanNode into FileQueryScanNode and FileLoadScanNode.
Remove some useless code in FileLoadScanNode.
Remove unused config item: enable_vectorized_load and enable_new_load_scan_node
2023-04-11 10:30:38 +08:00
512718f629 [enhancement](Nereids)(planner) fix some problem in Nereids and legacy planner (#18280)
1. remove TypeCoercion and CharacterLiteralTypeCoercion
2. Nereids Cast do not relay on legacy planner's analyze()
3. fix below problem in legacy planner, after this PR
    a. BOOLEAN can cast to DECIMALV2 explicitly
    b. compare between BOOLEAN and DATE will cast both side to DOUBLE
    c. HLL cannot be implicitly cast to any other type
2023-04-10 18:25:33 +08:00
9700721982 [feature-wip](resource-group) Support create and show resource groups (#18184) 2023-04-10 15:18:48 +08:00
09d98c1663 [BugFix](MTMV)Set enable_mtmv_scheduler_framework master only to avoid regression fail (#18473)
Set enable_mtmv_scheduler_framework master only to avoid regression fail
2023-04-09 08:47:18 +08:00
ea60d65384 [Improvement](multi catalog)Move split size config to session variable (#18355)
Move split size config to session variable. Before, it was in Config class, user need to restart FE after change it.
2023-04-05 01:02:47 +08:00
7c36bef6bc [Feature-Wip](MySQL Load)Show load warning for my sql load (#18224)
1. Support the show load warnings for mysql load to get the detail error message.
2. Fix fillByteBufferAsync not mark the load as finished in same data load
3. Fix drain data only in client mode.
2023-04-04 22:44:48 +08:00
66bfd18601 [opt](file_reader) add prefetch buffer to read csv&json file (#18301)
Co-authored-by: ByteYue <[yj976240184@gmail.com](mailto:yj976240184@gmail.com)>
This PR is an optimization for https://github.com/apache/doris/pull/17478:
1. Change the buffer size of `LineReader` to 4MB to align with the size of prefetch buffer.
2. Lazily prefetch data in the first read to prevent wasted reading.
3. S3 block size is 32MB only, which is too small for a file split. Set 128MB as default file split size.
4. Add `_end_offset` for prefetch buffer to prevent wasted reading.

The query performance of reading data on object storage is improved by more than 3x+.
2023-04-04 19:05:22 +08:00