We create multiple rowset readers to read the data of one tablet;
after a rowset reader has reached EOF, it can be released to
reduce resource (typically memory) consumption.
Similarly, we can release a segment reader when it reaches EOF.
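A minimal sketch of the release pattern, assuming hypothetical `RowsetReader`/`next_batch` names rather than the actual Doris classes:
```
#include <memory>
#include <vector>

// Hypothetical stand-in for a per-rowset reader; not the real Doris class.
struct RowsetReader {
    // Returns false once this reader has reached EOF.
    bool next_batch(/* RowBatch* out */) { return false; }
};

// Holding readers via unique_ptr lets us free each one as soon as it
// hits EOF, instead of keeping all of them alive until the read ends.
void read_tablet(std::vector<std::unique_ptr<RowsetReader>>& readers) {
    for (auto& reader : readers) {
        while (reader->next_batch()) {
            // ... hand rows to the caller ...
        }
        reader.reset();  // EOF: release this reader's memory immediately
    }
}
```
The same idea applies one level down: a rowset reader can reset each of its segment readers once that segment is exhausted.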
`SELECT ... INTO OUTFILE` currently only supports exporting data in CSV format.
This patch extends the feature to support the Parquet format.
Usage:
Local file:
```
SELECT citycode FROM table1 INTO OUTFILE "file:///root/doris/" FORMAT AS PARQUET PROPERTIES
("schema"="required,int32,siteid;", "parquet.compression"="snappy");
```
Broker file:
```
SELECT siteid FROM table1 INTO OUTFILE "hdfs://host/test_sql_prc_2019_02_19/" FORMAT AS PARQUET
PROPERTIES (
"broker.name" = "hdfs_broker",
"broker.hadoop.security.authentication" = "kerberos",
"broker.kerberos_principal" = "test",
"broker.kerberos_keytab_content" = "base64" ,
"schema"="required,int32,siteid;"
);
```
The `schema` property is required; it defines the schema of the Parquet file.
Properties with the `parquet.` prefix are Parquet file properties, such as compression, version, and enable_dictionary.
The version information of a tablet is stored in memory
in an adjacency-graph data structure.
As new versions are written and old versions are deleted,
the graph accumulates vertices with no associated edges (orphan vertices).
These orphan vertices should be removed.
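A minimal sketch of the pruning step, with a simplified graph layout (the real Doris version graph differs):
```
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Simplified version graph: vertices are version boundaries, and an edge
// from -> to represents a rowset covering that version range.
struct VersionGraph {
    std::unordered_map<int64_t, std::unordered_set<int64_t>> edges;
    std::unordered_map<int64_t, int> degree;  // in-degree + out-degree

    void add_edge(int64_t from, int64_t to) {
        edges[from].insert(to);
        ++degree[from];
        ++degree[to];
    }

    void remove_edge(int64_t from, int64_t to) {
        edges[from].erase(to);
        --degree[from];
        --degree[to];
    }

    // Drop every vertex that no longer touches any edge; these orphan
    // vertices accumulate as new versions are added and old ones deleted.
    void prune_orphans() {
        for (auto it = degree.begin(); it != degree.end();) {
            if (it->second == 0) {
                edges.erase(it->first);
                it = degree.erase(it);
            } else {
                ++it;
            }
        }
    }
};
```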
1. Partitions set by the admin repair command are prioritized,
to ensure that the tablets of these partitions can be repaired as soon as possible.
2. Add an FE metric "query_begin" to monitor the number of queries submitted to Doris.
For PR #5792. This patch adds a new parameter `cache type` to distinguish the SQL cache from the partition cache.
When updating the SQL cache, we ensure that one SQL key has only one cached version.
When parsing memory parameters in `ParseUtil::parse_mem_spec`, convert the percentage to `double` instead of `int`.
The currently affected parameters include `mem_limit` and `storage_page_cache_limit`.
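A minimal sketch of the difference, simplified relative to the real `ParseUtil::parse_mem_spec`:
```
#include <cstdint>
#include <string>

// Parse a spec such as "80%" or "1.5%" against the physical memory size.
// Keeping the percentage as a double preserves fractional percentages
// that an int conversion would truncate to zero.
int64_t parse_mem_percent(const std::string& spec, int64_t physical_mem) {
    double percent = std::stod(spec.substr(0, spec.size() - 1));  // strip '%'
    return static_cast<int64_t>(physical_mem * (percent / 100.0));
}
```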
According to LRU priority, the `lru list` is split into an `lru normal list` and an `lru durable list`,
and the two lists are traversed in sequence during LRU eviction, avoiding invalid cycles.
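A minimal sketch of the two-list eviction order (list handling is simplified; assume the back of each list is the least recently used entry):
```
#include <list>
#include <string>

struct Entry {
    std::string key;
};

// Evict from the normal list first and only then from the durable list,
// so eviction never cycles past durable entries hunting for a victim.
bool evict_one(std::list<Entry>& lru_normal, std::list<Entry>& lru_durable) {
    for (auto* lru : {&lru_normal, &lru_durable}) {
        if (!lru->empty()) {
            lru->pop_back();  // back = least recently used
            return true;
        }
    }
    return false;  // nothing left to evict
}
```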
If a query exceeds its memory limit, detailed information about where the limit was exceeded is required.
However, it is not necessary to return the entire query stack to the end user;
the query stack only needs to be printed in the BE log.
The buffered reader's `_cur_offset` should be initialized to the same value as the inner file reader's,
to make sure that the reader starts reading at the right position.
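A minimal sketch of the fix, with simplified stand-ins for the real reader classes:
```
#include <cstdint>

struct FileReader {
    int64_t tell() const { return _offset; }
    int64_t _offset = 0;
};

struct BufferedReader {
    explicit BufferedReader(FileReader* inner)
            : _inner(inner),
              // Start at the inner reader's current position rather than 0,
              // so the first buffered read happens at the right offset.
              _cur_offset(inner->tell()) {}

    FileReader* _inner;
    int64_t _cur_offset;
};
```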
Support starting consumption from a specified point in time, instead of a specific offset, when creating a Kafka routine load.
e.g.:
```
FROM KAFKA
(
"kafka_broker_list" = "broker1:9092,broker2:9092",
"kafka_topic" = "my_topic",
"property.kafka_default_offsets" = "2021-10-10 11:00:00"
);
or
FROM KAFKA
(
"kafka_broker_list" = "broker1:9092,broker2:9092",
"kafka_topic" = "my_topic",
"kafka_partitions" = "0,1,2",
"kafka_offsets" = "2021-10-10 11:00:00, 2021-10-10 11:00:00, 2021-10-10 12:00:00"
);
```
This PR also refactors the analysis of properties when creating or altering
routine load jobs, unifying the analysis process in the `RoutineLoadDataSourceProperties` class.
1. Give some MemTrackers a reasonable parent MemTracker instead of the root tracker (a sketch of the hierarchy follows this list).
2. Make each MemTracker easy to trace.
3. Add a show level to MemTracker, to control how many trackers are shown on the web page.
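A minimal sketch of the hierarchy idea (a simplified tracker, not the actual Doris MemTracker):
```
#include <atomic>
#include <cstdint>
#include <string>
#include <utility>

// Simplified hierarchical tracker: consumption propagates up to the
// parent, and a level field lets the web page limit which trackers show.
struct MemTracker {
    MemTracker(std::string label, int level, MemTracker* parent)
            : label(std::move(label)), level(level), parent(parent) {}

    void consume(int64_t bytes) {
        for (MemTracker* t = this; t != nullptr; t = t->parent) {
            t->consumption += bytes;
        }
    }

    std::string label;
    int level;           // only trackers up to a chosen level are displayed
    MemTracker* parent;  // a reasonable parent instead of the root tracker
    std::atomic<int64_t> consumption{0};
};
```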
The cause of the problem is that after a query is cancelled, `OlapScanNode::transfer_thread` still continues to schedule
`OlapScanNode::scanner_thread` until all tasks are scheduled.
Although each task does not scan data and exits quickly, it still consumes a lot of resources.
(Guess) This may be the cause of the bug (#5767) that causes the I/O to be full.
So after a query is cancelled, immediately exit the scheduling loop in `transfer_thread`; after waiting for
all `scanner_thread`s to finish, `transfer_thread` also exits.
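A minimal sketch of the intended control flow (the cancel flag and task queue are illustrative, not the actual Doris state):
```
#include <atomic>
#include <deque>

std::atomic<bool> query_cancelled{false};

// Once the query is cancelled, stop scheduling scanner tasks instead of
// draining the whole queue, then wait for in-flight scanners to finish.
void transfer_thread_loop(std::deque<int>& pending_scanner_tasks) {
    while (!pending_scanner_tasks.empty()) {
        if (query_cancelled.load()) {
            break;  // exit the scheduling loop immediately on cancel
        }
        int task = pending_scanner_tasks.front();
        pending_scanner_tasks.pop_front();
        (void)task;  // ... submit the task to a scanner_thread ...
    }
    // ... join all running scanner_threads; transfer_thread then exits ...
}
```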
* [Bug] Fix bug that the database is not found when replaying a batch transaction remove log
[GlobalTransactionMgr.replayBatchRemoveTransactions():353] replay batch remove transactions failed. db 0
org.apache.doris.common.AnalysisException: errCode = 2, detailMessage = databaseTransactionMgr[0] does not exist
at org.apache.doris.transaction.GlobalTransactionMgr.getDatabaseTransactionMgr(GlobalTransactionMgr.java:84) ~[palo-fe.jar:3.4.0]
at org.apache.doris.transaction.GlobalTransactionMgr.replayBatchRemoveTransactions(GlobalTransactionMgr.java:350) [palo-fe.jar:3.4.0]
at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:601) [palo-fe.jar:3.4.0]
at org.apache.doris.catalog.Catalog.replayJournal(Catalog.java:2452) [palo-fe.jar:3.4.0]
at org.apache.doris.master.Checkpoint.runAfterCatalogReady(Checkpoint.java:101) [palo-fe.jar:3.4.0]
at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:3.4.0]
at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:3.4.0]
The id of the information_schema database is 0, and it has no txns at all.
1. Reduce lock conflicts in the BE's RuntimeProfile;
2. Allow viewing the query profile while the query is executing;
3. Reduce the wait time for `show proc /current_queries`.
1. Fix an NPE in ReplicasProcNode when the backend does not exist.
2. Forbid the `create table like` statement from specifying a view.
3. Check the self IP when starting the FE to see if it uses the original IP.
4. Modify the error msg of the tablet sink to show more detailed errors.
1. Add `/api/compaction/run_status` to show the running compaction tasks.
2. Support doing base and cumulative compaction for one tablet at the same time.
3. Modify some log levels.
4. Add a feedback document.
The previous replay logic did not record the size of the map, which eventually resulted in EOF when reading the log.
This PR replaces the replay logic directly with JSON.
At the same time, the replay logic of the image is supplemented:
the PR ensures that the `lastStreamLoadTime` attribute of a backend can be correctly recorded in the image.
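A minimal sketch of why the old format failed, in generic C++ rather than the FE's Java code: if entries are written with no leading count, the reader has no way to know where the map ends and eventually hits EOF; writing the size first (or switching to a self-describing format such as JSON, as this patch does) fixes that.
```
#include <cstddef>
#include <cstdint>
#include <istream>
#include <map>
#include <ostream>
#include <string>

void write_map(std::ostream& out, const std::map<std::string, int64_t>& m) {
    out << m.size() << '\n';  // the piece the old replay logic omitted
    for (const auto& [key, value] : m) {
        out << key << ' ' << value << '\n';
    }
}

std::map<std::string, int64_t> read_map(std::istream& in) {
    size_t n = 0;
    in >> n;  // read exactly n entries; no guessing where the map ends
    std::map<std::string, int64_t> m;
    for (size_t i = 0; i < n; ++i) {
        std::string key;
        int64_t value;
        in >> key >> value;
        m[key] = value;
    }
    return m;
}
```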
Doris version 0.12 (V1 segment) only supports creating a zone map index for key columns.
Doris version 0.13 supports creating a zone map index for key columns and duplicate-model columns.
The latest version supports creating zone maps for key columns in AGG_KEYS, or for all columns in UNIQUE_KEYS or DUP_KEYS.
In a V1 segment file, the tablet meta contains the zone map index; in a V2 segment file, it does not.
When upgrading Doris from version 0.12 to version 0.13, we found that memory usage obviously increases for tables
with V1 segment files, because many value columns create zone map indexes.
Therefore, it is better to stay compatible with version 0.12: the config `enable_value_column_zonemap` defaults to false.
If you need to add zone map indexes for value columns, set it to true.