In some scenarios, users of dynamic partitions want to use Doris' tiered storage feature at the same time.
For example, with a dynamic partition rule that partitions by day, they want the partitions of the last 3 days
to be stored on the SSD storage medium and automatically migrated to the HDD storage medium after expiration.
This CL adds a new dynamic partition property: "hot_partition_num".
This parameter specifies how many of the most recent partitions should be stored on the SSD storage medium.
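A minimal sketch of how the property might be used, assuming a day-partitioned table and BEs configured with both SSD and HDD media (the table name, columns, and other property values here are illustrative, not taken from the change itself):
```
CREATE TABLE example_tbl
(
    dt DATE,
    id BIGINT
)
DUPLICATE KEY(dt, id)
PARTITION BY RANGE(dt) ()
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES
(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "10",
    -- keep the newest 3 partitions on SSD; older partitions migrate to HDD
    "dynamic_partition.hot_partition_num" = "3"
);
```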
In BE, when a problem happens, we can find the database id, table id, and partition id in the log,
but not the database name, table name, or partition name.
In FE, there is also no way to find the database name/table name/partition name according to
the database id/table id/partition id. Therefore, this patch adds 3 new commands:
1. `show database <db_id>`
```
mysql> show database 10002;
+----------------------+
| DbName               |
+----------------------+
| default_cluster:test |
+----------------------+
```
2. `show table <table_id>`
```
mysql> show table 11100;
+----------------------+-----------+-------+
| DbName               | TableName | DbId  |
+----------------------+-----------+-------+
| default_cluster:test | table2    | 10002 |
+----------------------+-----------+-------+
```
3. `show partition <partition_id>`
```
mysql> show partition 11099;
+----------------------+-----------+---------------+-------+---------+
| DbName               | TableName | PartitionName | DbId  | TableId |
+----------------------+-----------+---------------+-------+---------+
| default_cluster:test | table2    | p201708       | 10002 | 11100   |
+----------------------+-----------+---------------+-------+---------+
```
1. When an OOM error occurs while writing to BDBJE, catch the error and exit the process.
2. Increase the timeout for the BDBJE replica ack and make it a configuration option.
The buffered reader's `_cur_offset` should be initialized to the same value as the inner file reader's,
to make sure that the reader starts reading at the right position.
Support starting consumption from a specified point in time, instead of a specific offset, when creating a Kafka routine load, e.g.:
```
FROM KAFKA
(
    "kafka_broker_list" = "broker1:9092,broker2:9092",
    "kafka_topic" = "my_topic",
    "property.kafka_default_offsets" = "2021-10-10 11:00:00"
);
-- or
FROM KAFKA
(
    "kafka_broker_list" = "broker1:9092,broker2:9092",
    "kafka_topic" = "my_topic",
    "kafka_partitions" = "0,1,2",
    "kafka_offsets" = "2021-10-10 11:00:00, 2021-10-10 11:00:00, 2021-10-10 12:00:00"
);
```
This PR also refactors how properties are analyzed when creating or altering
routine load jobs, unifying the analysis logic in the `RoutineLoadDataSourceProperties` class.
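For illustration, a hedged sketch of altering a Kafka data source property of an existing job, which now goes through the same unified analysis path (the database and job names are made up, and the job is assumed to be paused first):
```
ALTER ROUTINE LOAD FOR example_db.example_job
FROM KAFKA
(
    "property.kafka_default_offsets" = "2021-10-10 11:00:00"
);
```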
The old colocate aggregation could only cover the case where the child is a scan node.
In fact, as long as the child's data distribution meets the requirements,
a colocate aggregation can be performed no matter what plan node the child is.
This PR also fixes the data partition attribute of the fragment:
the data partition of a fragment that contains a scan node is Hash Partition rather than Random.
This modification makes it possible to determine whether colocation is feasible
from the correct distribution of child fragments.
The `@Expose` annotation is used in the persistence logic of the old backup/restore code.
By itself, this annotation is meant to make Gson ignore some fields when serializing and deserializing.
However, it was used incorrectly, and Gson did not ignore the fields that should have been ignored.
This results in duplicate initialization when FE is restarted.
This PR uses the Doris-wrapped Gson directly and eliminates the use of the `@Expose` annotation.
Fixed `sortedTabletInfoList` being repeatedly initialized, resulting in incorrect numbers.
Fixes #5852
Modify the Spark and Flink Doris connectors' requests to FE to fix a problem with the POST method:
the method used should be the same as the method of the original request.
1. Give some MemTrackers a reasonable parent MemTracker rather than the root tracker.
2. Make each MemTracker easy to trace.
3. Add a show level to MemTracker, reducing the number of trackers shown on the web page and providing a way to control how many trackers are displayed there.
Currently, `show data` does not support sorting, which becomes inconvenient to manage as the number of tables increases. This adds sorting support,
like:
```
mysql> show data order by ReplicaCount desc,Size asc;
+-----------+-------------+--------------+
| TableName | Size        | ReplicaCount |
+-----------+-------------+--------------+
| table_c   | 3.102 KB    | 40           |
| table_d   | .000        | 20           |
| table_b   | 324.000 B   | 20           |
| table_a   | 1.266 KB    | 10           |
| Total     | 4.684 KB    | 90           |
| Quota     | 1024.000 GB | 1073741824   |
| Left      | 1024.000 GB | 1073741734   |
+-----------+-------------+--------------+
```
The cause of the problem is that after a query is cancelled, `OlapScanNode::transfer_thread` still continues to schedule
`OlapScanNode::scanner_thread` until all tasks have been scheduled.
Although each task does not scan data and exits quickly, it still consumes a lot of resources.
This may be the cause of the bug (#5767) that fills up the I/O (a guess).
So after a query is cancelled, the scheduling loop in `transfer_thread` now exits immediately, and after waiting for
all scanner threads to finish, `transfer_thread` also exits.
* Fix the issue that the hardware information on the Web UI home page cannot be loaded
* Add the Flink Doris Connector design document
* Add the English version of the Flink Doris Connector design document
Co-authored-by: zhangjf@shuhaisc.com <zhangfeng800729>
When a query is retried, the FE log cannot quickly associate the new and old queries by query id.
This increases the complexity of troubleshooting.
This modifies FE's log printing logic to associate the new and old query ids; the printed log looks like this:
`Query {old_query_id} {retry_times} times with new query id: {new_query_id}`
Fix the following build and test issues:
1. relocation R_X86_64_32 against `__gxx_personality_v0' can not be used when making a shared object; recompile with -fPIC
2. warning: the use of `tmpnam' is dangerous, better use `mkstemp'
3. Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test couldn't detect the number of threads.
* Fix the issue that the hardware information on the Web UI home page cannot be loaded
* Enable HTTP v2 by default
Co-authored-by: zhangjf@shuhaisc.com <zhangfeng800729>
* [Bug] Fix a bug where the database is not found when replaying the batch transaction removal log:
```
[GlobalTransactionMgr.replayBatchRemoveTransactions():353] replay batch remove transactions failed. db 0
org.apache.doris.common.AnalysisException: errCode = 2, detailMessage = databaseTransactionMgr[0] does not exist
    at org.apache.doris.transaction.GlobalTransactionMgr.getDatabaseTransactionMgr(GlobalTransactionMgr.java:84) ~[palo-fe.jar:3.4.0]
    at org.apache.doris.transaction.GlobalTransactionMgr.replayBatchRemoveTransactions(GlobalTransactionMgr.java:350) [palo-fe.jar:3.4.0]
    at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:601) [palo-fe.jar:3.4.0]
    at org.apache.doris.catalog.Catalog.replayJournal(Catalog.java:2452) [palo-fe.jar:3.4.0]
    at org.apache.doris.master.Checkpoint.runAfterCatalogReady(Checkpoint.java:101) [palo-fe.jar:3.4.0]
    at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:3.4.0]
    at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:3.4.0]
```
The id of the information_schema database is 0, and it has no transactions at all.
1. Reduce lock contention in BE's RuntimeProfile.
2. Allow viewing a query's profile while the query is executing.
3. Reduce the wait time for `show proc '/current_queries'`.
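For reference, item 3 refers to the proc interface, queried like this (a minimal example; the output columns are omitted here since they depend on the Doris version):
```
SHOW PROC '/current_queries';
```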