This CL mainly supports time zone in dynamic partition:
1. Use the new Java Time API to replace Calendar.
2. Support setting the time zone in dynamic partition parameters.
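A minimal usage sketch, assuming the parameter is exposed as the table property `dynamic_partition.time_zone` (property name inferred from this CL; check the dynamic partition docs for the exact key):
```sql
-- Assumed property name; sets the time zone used when computing dynamic partition ranges.
ALTER TABLE example_db.example_tbl
SET ("dynamic_partition.time_zone" = "Asia/Shanghai");
```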
* 1. Add enable spilling to the query options and support spilling to disk in Analytic_Eval_Node. FE can enable spilling with
set enable_spilling = true;
Now both Sort Node and Analytic_Eval_Node can spill to disk (see the sketch after this list).
2. Delete the merge_sorter code that is no longer used.
3. Replace buffered_tuple_stream with buffered_tuple_stream2 in Analytic_Eval_Node and support spilling to disk. Delete the unused buffered_block_mgr and buffered_tuple_stream code.
4. Add a DataStreamRecvr profile. Move the counters belonging to DataStreamRecvr from the fragment profile to the DataStreamRecvr profile to make the runtime profile clearer.
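A minimal sketch of how a user could exercise the new spilling path (table and column names are illustrative):
```sql
-- Enable spilling for the current session, then run a window-function query
-- that goes through Analytic_Eval_Node and may now spill to disk under memory pressure.
SET enable_spilling = true;
SELECT k1, SUM(v1) OVER (PARTITION BY k1 ORDER BY k2) AS running_sum
FROM example_table;
```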
* Change some hints in the code
* Replace disable_spill with enable_spill, which is more compatible with FE
1. The field `partitionIndexMap` is missing in SchemaChangeJobV2
2. The Pair in field `indexSchemaVersionAndHashMap` cannot be persisted by GSON
3. Exit the FE process when replaying the edit log fails.
Fix: #3802
Bitmap and HLL types cannot be used with incorrect aggregate functions, which will cause the BE to crash.
Add some logical checks in FE's ColumnDef#analyze to avoid creating tables or changing schemas incorrectly:
Key columns can never be of bitmap or hll type
Value columns of bitmap or hll type must be associated with bitmap_union or hll_union
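A hedged sketch of what these checks allow and reject (schema is illustrative):
```sql
-- Accepted: bitmap/hll columns only as values, aggregated with bitmap_union/hll_union.
CREATE TABLE example_db.tbl_ok
(
    k1 INT,
    v1 BITMAP BITMAP_UNION,
    v2 HLL HLL_UNION
)
AGGREGATE KEY (k1)
DISTRIBUTED BY HASH (k1) BUCKETS 8;

-- Rejected by ColumnDef#analyze: a bitmap/hll key column, or a bitmap/hll value
-- column paired with another aggregate function such as SUM or REPLACE.
-- CREATE TABLE example_db.tbl_bad (k1 BITMAP, v1 HLL SUM) ...
```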
The unique table should also be compensated with the candidate index.
The reason is the same as for the agg table type.
Fixed #3771.
Change-Id: Ic04b0360a0b178cb0b6ee635e56f48852092ec09
If a load task encounters an error, it will be cancelled.
At this time, FE will clear the Txn according to the DbName.
In FE, the DbName should be prefixed with the cluster name.
If the cluster name is missing, it will hit a NullPointerException.
As a result, the Txn will still exist until it times out.
This CL mainly changes:
1. Add a new BE config `max_pushdown_conditions_per_column` to limit the number of conditions of a single column that can be pushed down to the storage engine.
2. Add 2 new session variables, `max_scan_key_num` and `doris_max_scan_key_num`, which can be set at the session level and override the config values in BE.
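A minimal sketch of the session-level usage, assuming the variables are set with the standard `SET` statement (the value is illustrative):
```sql
-- Override the BE-side default for this session only.
SET max_scan_key_num = 48;
SHOW VARIABLES LIKE 'max_scan_key_num';
```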
BinaryPredicate's equals function compares by opcode,
but the opcode may not be initialized yet,
so it returns true as long as the children are the same; for example, `a>1` and `a<1` are considered equal.
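An illustration of the symptom (table and column names are hypothetical): with the buggy equals(), the two different range predicates below could be treated as duplicates.
```sql
-- Hypothetical query: `a > 1` and `a < 1` are distinct predicates,
-- but the opcode-less equals() considered them equal because the children match.
SELECT * FROM t WHERE a > 1 AND a < 1;
```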
Major work:
1. Correct the values of `numNodes` and `cardinality` when `OlapTableScan` computes stats, so that the broadcast cost and partition join cost can be calculated correctly.
2. Find an input fragment with higher parallelism for the shuffle fragment when assigning backends.
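A hedged way to observe the effect (table names are hypothetical): the join node in the `EXPLAIN` output shows whether the planner chose a broadcast or a partitioned (shuffle) join, which depends on the corrected `cardinality` and `numNodes` estimates.
```sql
-- Hypothetical tables; compare the join distribution chosen in the plan
-- (broadcast vs. partitioned) before and after this change.
EXPLAIN SELECT *
FROM big_table b JOIN small_table s ON b.k1 = s.k1;
```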
Add spark ETL job config, which includes:
1. Schema of the load tables, including columns, partitions and rollups
2. Information about the source files, including split rules, corresponding columns, and conversion rules
3. ETL output directory and file name format
4. Job properties
5. Version for further extension
* Serialize origin stmt in Rollup Job and MV Meta
In materialized view 2.0, the define expr is serialized in the column.
The method is that Doris serializes the origin stmt of the Create Materialized View Stmt in RollupJobV2 and MVMeta.
The define expr will be extracted from the origin stmt after the meta is deserialized.
The define expr is necessary for bitmap and hll materialized view.
For example:
MV meta: __doris_mv_bitmap_k1, bitmap_union, to_bitmap(k1)
Origin stmt: select bitmap_union(to_bitmap(k1)) from table
Deserialized meta: __doris_mv_bitmap_k1, bitmap_union, null
After extraction, the define expr `to_bitmap(k1)` is recovered from the origin stmt:
__doris_mv_bitmap_k1, bitmap_union, to_bitmap(k1) (which comes from the origin stmt)
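A minimal sketch of a materialized view whose define expr needs this treatment (table and column names are illustrative):
```sql
-- The bitmap_union(to_bitmap(k1)) expression is the define expr that gets
-- re-extracted from the serialized origin stmt after deserialization.
CREATE MATERIALIZED VIEW mv_bitmap_k1 AS
SELECT k2, bitmap_union(to_bitmap(k1))
FROM example_table
GROUP BY k2;
```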
Change-Id: Ic2da093188d8985f5e97be5bd094e5d60d82c9a7
* Add a comment for the read method
Change-Id: I4e1e0f4ad0f6e76cdc43e49938de768ec3b0a0e8
* Fix ut
Change-Id: I2be257d512bf541f00912a374a2e07a039fc42b4
* Change code style
Change-Id: I3ab23f5c94ae781167f498fefde2d96e42e05bf9
The problem is described in issue #3678.
This CL mainly changes the rules for creating dynamic partitions.
1. If time unit is DAY, the logic remains unchanged.
2. If time unit is WEEK, the logical changes are as follows:
1. Allow setting the start day of each week. The default is Monday; options are Monday to Sunday.
2. Assuming the start day is Tuesday, the range of a partition is from Tuesday of the current week to Monday of the next week.
3. If time unit is MONTH, the logical changes are as follows:
1. Allow setting the start date of each month. The default is the 1st, and it can be any day from the 1st to the 28th.
2. Assuming that the starting date is the 2nd, the range of the partition is from the 2nd of this month to the 1st of the next month.
4. The `SHOW DYNAMIC PARTITION TABLES` statement adds a `StartOf` column to show the start day of week or month.
It is recommended to refer to the examples in `dynamic-partition.md` for details; a brief sketch follows.
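A hedged sketch of a WEEK-based dynamic partition table using the new start-day setting (property names as understood from this CL; see `dynamic-partition.md` for the authoritative keys and a full example):
```sql
-- Illustrative schema; "dynamic_partition.start_day_of_week" = "2" assumes
-- Monday = 1, so partitions would run from Tuesday to the next Monday.
CREATE TABLE example_db.tbl_week
(
    k1 DATE,
    v1 INT
)
DUPLICATE KEY (k1)
PARTITION BY RANGE (k1) ()
DISTRIBUTED BY HASH (k1) BUCKETS 8
PROPERTIES
(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "WEEK",
    "dynamic_partition.start_day_of_week" = "2",
    "dynamic_partition.start" = "-2",
    "dynamic_partition.end" = "2",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "8"
);
```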
TODO:
It would be better to also support HOUR and YEAR time units, maybe in the next PR.
FIX: #3678
Add a JSON format for existing metrics like this.
```
{
    "tags":
    {
        "metric": "thread_pool",
        "name": "thrift-server-pool",
        "type": "active_thread_num"
    },
    "unit": "number",
    "value": 3
}
```
I add a new JsonMetricVisitor to handle the transformation,
so the existing PrometheusMetricVisitor and SimpleCoreMetricVisitor are not modified.
I also add:
1. A unit item to describe the metric better
2. Cloning of tablet statistics, divided by database
3. Replacing newlines with whitespace in audit.log
Change the load label of the audit plugin to:
`audit_yyyyMMdd_HHmmss_feIdentity`.
The `feIdentity` is obtained from the FE that runs this plugin; currently it is just the FE's IP_editlog_port.
1. User interface:
1.1 Spark resource management
Spark is used as an external computing resource in Doris to do ETL work. In the future, there may be other external resources used in Doris, for example, MapReduce for ETL, Spark/GPU for queries, HDFS/S3 for external storage. We introduce resource management to manage these external resources used by Doris.
```sql
-- create spark resource
CREATE EXTERNAL RESOURCE resource_name
PROPERTIES
(
type = spark,
spark_conf_key = spark_conf_value,
working_dir = path,
broker = broker_name,
broker.property_key = property_value
)
-- drop spark resource
DROP RESOURCE resource_name
-- show resources
SHOW RESOURCES
SHOW PROC "/resources"
-- privileges
GRANT USAGE_PRIV ON RESOURCE resource_name TO user_identity
GRANT USAGE_PRIV ON RESOURCE resource_name TO ROLE role_name
REVOKE USAGE_PRIV ON RESOURCE resource_name FROM user_identity
REVOKE USAGE_PRIV ON RESOURCE resource_name FROM ROLE role_name
```
- CREATE EXTERNAL RESOURCE:
The `FOR user_name` clause is optional. If present, the external resource belongs to that user. If not, the external resource belongs to the system and is available to all users.
PROPERTIES:
1. type: resource type. Only spark is supported now.
2. spark configuration: follows the standard Spark configuration format; refer to https://spark.apache.org/docs/latest/configuration.html.
3. working_dir: optional, used to store ETL intermediate results in spark ETL.
4. broker: optional, used in spark ETL. The ETL intermediate results need to be read with the broker when pushed into BE.
Example:
```sql
CREATE EXTERNAL RESOURCE "spark0"
PROPERTIES
(
"type" = "spark",
"spark.master" = "yarn",
"spark.submit.deployMode" = "cluster",
"spark.jars" = "xxx.jar,yyy.jar",
"spark.files" = "/tmp/aaa,/tmp/bbb",
"spark.yarn.queue" = "queue0",
"spark.executor.memory" = "1g",
"spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999",
"spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000",
"working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
"broker" = "broker0",
"broker.username" = "user0",
"broker.password" = "password0"
)
```
- SHOW RESOURCES:
General users can only see their own resources.
Admin and root users can see all resources.
1.2 Create spark load job
```sql
LOAD LABEL db_name.label_name
(
DATA INFILE ("/tmp/file1") INTO TABLE table_name, ...
)
WITH RESOURCE resource_name
[(key1 = value1, ...)]
[PROPERTIES (key2 = value2, ... )]
```
Example:
```sql
LOAD LABEL example_db.test_label
(
DATA INFILE ("hdfs:/127.0.0.1:10000/tmp/file1") INTO TABLE example_table
)
WITH RESOURCE "spark0"
(
"spark.executor.memory" = "1g",
"spark.files" = "/tmp/aaa,/tmp/bbb"
)
PROPERTIES ("timeout" = "3600")
```
The Spark configurations in the load stmt can override the existing configurations in the resource for temporary use.
#3010