* [Enhance] Add MetaUrl and CompactionUrl for "show tablet" stmt
Add MetaUrl and CompactionUrl to the result of the following stmt:
`show tablet 10010;`
* fix ut
* add doc
Co-authored-by: chenmingyu <chenmingyu@baidu.com>
Fix: #3946
CL:
1. Add a prepare phase for the `from_unixtime()`, `date_format()` and `convert_tz()` functions, so the format string is parsed once instead of once per row (illustrative calls are shown after this list).
2. Look up the cctz time zone when initializing the `runtime state`, so we don't need to look it up for every row.
3. Add a constant rewrite rule for `utc_timestamp()`.
4. Add a doc for `to_date()`.
5. Comment out the `push_handler_test`; it cannot run in DEBUG mode and will be fixed later.
6. Remove `timezone_db.h/cpp` and add `timezone_utils.h/cpp`.
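A minimal sketch of the affected functions; the literal values are illustrative only:
```sql
-- Illustrative calls to the functions touched by this CL.
-- from_unixtime() with a format string: the format is now parsed once, not per row.
SELECT from_unixtime(1592238000);
SELECT from_unixtime(1592238000, '%Y-%m-%d %H:%i:%s');
-- date_format() and convert_tz() get the same prepare-phase treatment.
SELECT date_format('2020-06-15 10:00:00', '%Y-%m-%d');
SELECT convert_tz('2020-06-15 10:00:00', 'UTC', 'Asia/Shanghai');
-- utc_timestamp() is now rewritten to a constant during planning.
SELECT utc_timestamp();
```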
Performance results (11,000,000 rows):

| SQL | Before | After |
| --- | --- | --- |
| `select count(from_unixtime(k1)) from tbl1;` | 8.85s | 2.85s |
| `select count(from_unixtime(k1, '%Y-%m-%d %H:%i:%s')) from tbl1 limit 1;` | 10.73s | 4.85s |

Date string formatting still seems slow; a further enhancement may be needed.
Currently we choose BEs randomly without checking whether their disks are available,
so a create table operation does not fail until the create tablet task has been sent
to the BE and the BE checks whether it has enough capacity to create the tablet.
Checking backend disk availability by storage medium up front will avoid these unnecessary RPC calls.
1. Split the /_cluster/state request into /_mapping and /_search_shards requests, to reduce the required permissions and make the logic clearer
2. Rename some ES-related objects to make their naming more accurate
3. Add basic support for docValue and fields in alias mode, taking the first one by default
#3311
Fix: #3920
CL:
1. Parse the TCP metrics header in `/proc/net/snmp` to get the right position of the metrics.
2. Add 2 new metrics: `tcp_in_segs` and `tcp_out_segs`
When starting an FE with the `start_fe.sh --helper xxx` command, do not allow the
helper to point to the FE itself, because this is meaningless and may cause some
confusing problems.
This configuration is specifically used to limit the timeout setting for stream load.
It prevents failed stream load transactions from remaining uncancelable for a
long time because of an overly large user-specified timeout.
This CL mainly supports time zones in dynamic partitioning:
1. Use the new Java Time API to replace Calendar.
2. Support setting the time zone in the dynamic partition parameters, as sketched below.
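A minimal sketch, assuming the time zone is set via a `dynamic_partition.time_zone` property; the table schema and property values are illustrative:
```sql
-- Illustrative: a dynamically partitioned table with an explicit time zone.
CREATE TABLE example_db.tbl1
(
    k1 DATE,
    v1 INT
)
PARTITION BY RANGE (k1) ()
DISTRIBUTED BY HASH (k1) BUCKETS 8
PROPERTIES
(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.time_zone" = "Asia/Shanghai",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "8"
);
```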
This is a Logstash output plugin for Doris.
It uses the HTTP protocol to interact with the Doris FE HTTP interface
and loads data through Doris's stream load.
This CL mainly changes:
1. Add a new BE config `max_pushdown_conditions_per_column` to limit the number of conditions on a single column that can be pushed down to the storage engine.
2. Add 2 new session variables, `max_scan_key_num` and `doris_max_scan_key_num`, which can be set at the session level to override the config values in the BE (see the usage sketch below).
The problem is described in ISSUE #3678.
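A minimal usage sketch; the value 48 is only an example:
```sql
-- Illustrative: override the BE config at the session level
-- with the new session variables described above.
SET max_scan_key_num = 48;
SET doris_max_scan_key_num = 48;
```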
This CL mainly changes the rules for creating dynamic partitions.
1. If the time unit is DAY, the logic remains unchanged.
2. If the time unit is WEEK, the logic changes as follows:
    1. Allow setting the start day of each week. The default is Monday; any day from Monday to Sunday can be chosen.
    2. Assuming the start day is Tuesday, a partition's range is from Tuesday of one week to Monday of the next week.
3. If the time unit is MONTH, the logic changes as follows:
    1. Allow setting the start date of each month. The default is the 1st; any date from the 1st to the 28th can be chosen.
    2. Assuming the start date is the 2nd, a partition's range is from the 2nd of one month to the 1st of the next month.
4. The `SHOW DYNAMIC PARTITION TABLES` statement adds a `StartOf` column to show the start day of the week or month.
It is recommended to refer to the example in `dynamic-partition.md`; a brief sketch is also shown below.
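A minimal sketch of the WEEK case, assuming the start day is set via a `dynamic_partition.start_day_of_week` property (1 = Monday ... 7 = Sunday); the schema and values are illustrative:
```sql
-- Illustrative: WEEK time unit starting on Tuesday, matching rule 2 above.
CREATE TABLE example_db.tbl_week
(
    k1 DATE,
    v1 INT
)
PARTITION BY RANGE (k1) ()
DISTRIBUTED BY HASH (k1) BUCKETS 8
PROPERTIES
(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "WEEK",
    "dynamic_partition.start_day_of_week" = "2",
    "dynamic_partition.end" = "2",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "8"
);
```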
TODO:
It would be better to also support the HOUR and YEAR time units, maybe in the next PR.
FIX: #3678
1. User interface:
1.1 Spark resource management
Spark is used as an external computing resource in Doris to do ETL work. In the future, other external resources may also be used in Doris, for example, MapReduce for ETL, Spark/GPU for queries, or HDFS/S3 for external storage. We introduce resource management to manage these external resources used by Doris.
```sql
-- create spark resource
CREATE EXTERNAL RESOURCE resource_name
PROPERTIES
(
type = spark,
spark_conf_key = spark_conf_value,
working_dir = path,
broker = broker_name,
broker.property_key = property_value
)
-- drop spark resource
DROP RESOURCE resource_name
-- show resources
SHOW RESOURCES
SHOW PROC "/resources"
-- privileges
GRANT USAGE_PRIV ON RESOURCE resource_name TO user_identity
GRANT USAGE_PRIV ON RESOURCE resource_name TO ROLE role_name
REVOKE USAGE_PRIV ON RESOURCE resource_name FROM user_identity
REVOKE USAGE_PRIV ON RESOURCE resource_name FROM ROLE role_name
```
- CREATE EXTERNAL RESOURCE:
`FOR user_name` is optional. If present, the external resource belongs to that user. If not, the external resource belongs to the system and is available to all users.
PROPERTIES:
1. type: resource type. Only spark is supported for now.
2. spark configuration: follows the standard Spark configuration format; refer to https://spark.apache.org/docs/latest/configuration.html.
3. working_dir: optional, used to store ETL intermediate results in Spark ETL.
4. broker: optional, used in Spark ETL. The ETL intermediate results need to be read through the broker when they are pushed to the BE.
Example:
```sql
CREATE EXTERNAL RESOURCE "spark0"
PROPERTIES
(
"type" = "spark",
"spark.master" = "yarn",
"spark.submit.deployMode" = "cluster",
"spark.jars" = "xxx.jar,yyy.jar",
"spark.files" = "/tmp/aaa,/tmp/bbb",
"spark.yarn.queue" = "queue0",
"spark.executor.memory" = "1g",
"spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999",
"spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000",
"working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
"broker" = "broker0",
"broker.username" = "user0",
"broker.password" = "password0"
)
```
- SHOW RESOURCES:
Regular users can only see their own resources.
Admin and root users can see all resources.
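A concrete privilege example; the resource and user names are illustrative:
```sql
-- Illustrative: grant usage of the resource "spark0" to a user, then revoke it.
GRANT USAGE_PRIV ON RESOURCE "spark0" TO "user0"@"%";
REVOKE USAGE_PRIV ON RESOURCE "spark0" FROM "user0"@"%";
```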
1.2 Create spark load job
```sql
LOAD LABEL db_name.label_name
(
DATA INFILE ("/tmp/file1") INTO TABLE table_name, ...
)
WITH RESOURCE resource_name
[(key1 = value1, ...)]
[PROPERTIES (key2 = value2, ... )]
```
Example:
```sql
LOAD LABEL example_db.test_label
(
DATA INFILE ("hdfs://127.0.0.1:10000/tmp/file1") INTO TABLE example_table
)
WITH RESOURCE "spark0"
(
"spark.executor.memory" = "1g",
"spark.files" = "/tmp/aaa,/tmp/bbb"
)
PROPERTIES ("timeout" = "3600")
```
The Spark configurations in the load stmt can override the existing configurations in the resource for temporary use.
#3010