`random_shuffle` generates the same random sequence each time it is called.
Although we shuffle twice, when the size relationship between adjacent numbers
does not change, the result of the second shuffle does not change either.
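A minimal sketch of the fix, assuming we switch to `std::shuffle` with an explicitly seeded engine (the function and variable names here are illustrative):
```
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

void shuffle_candidates(std::vector<int64_t>& tablet_ids) {
    // std::random_shuffle draws from an implementation-defined source
    // (often rand()) and, if never re-seeded, repeats the same permutation.
    // std::shuffle with a random_device-seeded engine gives a different
    // permutation on each call.
    std::random_device rd;
    std::mt19937 gen(rd());
    std::shuffle(tablet_ids.begin(), tablet_ids.end(), gen);
}
```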
To avoid showing too many MemTrackers on the BE web page,
MemTracker now has three levels: OVERVIEW, TASK, and VERBOSE.
OVERVIEW is mainly used for the main memory-consuming modules such as Query/Load/Metadata.
TASK is mainly used to record the memory overhead of a single task, such as a single query, load, or compaction task.
VERBOSE is used for other, more detailed MemTrackers.
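Purely illustrative (the actual names live in the BE code), the levels and the display filter might look like:
```
// Hypothetical sketch of the three reporting levels.
enum class MemTrackerLevel {
    OVERVIEW = 0,  // top-level consumers: Query / Load / Metadata
    TASK = 1,      // a single query, load, or compaction task
    VERBOSE = 2    // all other fine-grained trackers
};

// Only trackers at or below the configured level appear on the web page.
bool should_display(MemTrackerLevel tracker, MemTrackerLevel configured) {
    return tracker <= configured;
}
```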
We create multiple rowset readers to read the data of one tablet.
After a rowset reader reaches EOF, it can be released to
reduce resource (typically memory) consumption.
Likewise, a segment reader can be released when it reaches EOF.
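A minimal sketch of the idea, with a hypothetical reader interface standing in for the real rowset/segment readers:
```
#include <algorithm>
#include <memory>
#include <vector>

// Hypothetical reader interface; the real readers carry much more state.
struct RowsetReader {
    virtual bool eof() const = 0;
    virtual ~RowsetReader() = default;
};

void release_finished_readers(std::vector<std::unique_ptr<RowsetReader>>& readers) {
    // Destroying a reader as soon as it hits EOF frees its buffers early,
    // instead of holding them until the whole tablet read completes.
    readers.erase(
        std::remove_if(readers.begin(), readers.end(),
                       [](const std::unique_ptr<RowsetReader>& r) { return r->eof(); }),
        readers.end());
}
```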
`SELECT INTO OUTFILE` currently only supports exporting data in CSV format.
This patch extends the feature to support the Parquet format.
Usage:
LocalFile:
```
SELECT citycode FROM table1 INTO OUTFILE "file:///root/doris/" FORMAT AS PARQUET PROPERTIES
("schema"="required,int32,siteid;", "parquet.compression"="snappy");
```
BrokerFile:
```
SELECT siteid FROM table1 INTO OUTFILE "hdfs://host/test_sql_prc_2019_02_19/" FORMAT AS PARQUET
PROPERTIES (
"broker.name" = "hdfs_broker",
"broker.hadoop.security.authentication" = "kerberos",
"broker.kerberos_principal" = "test",
"broker.kerberos_keytab_content" = "base64" ,
"schema"="required,int32,siteid;"
);
```
The `schema` property is required; it defines the schema of the Parquet file.
Properties with the `parquet.` prefix set Parquet file options such as compression, version, and enable_dictionary.
The version information of a tablet is stored in memory
in an adjacency-graph data structure.
As new versions are written and old versions are deleted,
the graph accumulates vertices with no associated edges (orphan vertices).
These orphan vertices should be removed.
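A minimal sketch of the cleanup pass, over a simplified adjacency representation (the real version graph stores more per-vertex state):
```
#include <cstdint>
#include <map>
#include <set>

using Version = int64_t;

// Simplified version graph: vertex -> set of adjacent vertices.
using VersionGraph = std::map<Version, std::set<Version>>;

// Remove every vertex that no longer has any edge (an orphan vertex).
void prune_orphan_vertices(VersionGraph& graph) {
    for (auto it = graph.begin(); it != graph.end();) {
        if (it->second.empty()) {
            it = graph.erase(it);  // orphan: no edges left
        } else {
            ++it;
        }
    }
}
```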
1. The partitions set by the admin repair command are prioritized
to ensure that the tablets of these partitions can be repaired as soon as possible.
2. Add an FE metric "query_begin" to monitor the number of queries submitted to Doris.
For PR #5792. This patch adds a new param `cache type` to distinguish the SQL cache from the partition cache.
When updating the SQL cache, we ensure that one SQL key has only one cached version.
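An illustrative sketch of the two points (the actual names in the patch may differ):
```
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical cache-type tag carried with each cache request.
enum class CacheType {
    SQL_CACHE,        // keyed by the whole SQL statement
    PARTITION_CACHE,  // keyed per partition
};

// One cached version per SQL key: writing a new version replaces the old one.
std::unordered_map<std::string, int64_t> sql_cache_version;

void update_sql_cache(const std::string& sql_key, int64_t version) {
    sql_cache_version[sql_key] = version;  // overwrite keeps a single version
}
```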
When parsing memory parameters in `ParseUtil::parse_mem_spec`, convert the percentage to `double` instead of `int`.
The currently affected parameters include `mem_limit` and `storage_page_cache_limit`.
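A minimal sketch of why the type matters, with a simplified signature (the real `ParseUtil::parse_mem_spec` handles more suffixes):
```
#include <cstdint>
#include <string>

// Parse a spec like "80%" against the physical memory size. Converting to
// double keeps fractional percentages such as "0.5%" from truncating to 0.
int64_t parse_percent_spec(const std::string& spec, int64_t physical_mem) {
    double percent = std::stod(spec.substr(0, spec.size() - 1));  // drop '%'
    return static_cast<int64_t>(physical_mem * (percent / 100.0));
}
```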
According to LRU priority, the `lru list` is split into an `lru normal list` and an `lru durable list`,
and the two lists are traversed in sequence during LRU eviction, avoiding invalid cycles.
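A minimal sketch of the eviction order, with hypothetical types:
```
#include <list>

struct Entry { bool durable = false; /* key, value, ... */ };

// Hypothetical cache shard: normal entries are evicted before durable ones,
// so the evictor never cycles past durable entries looking for a victim.
struct LruShard {
    std::list<Entry*> lru_normal_list;   // lower priority, scanned first
    std::list<Entry*> lru_durable_list;  // higher priority, scanned second

    Entry* pick_victim() {
        if (!lru_normal_list.empty()) {
            Entry* victim = lru_normal_list.front();
            lru_normal_list.pop_front();
            return victim;
        }
        if (!lru_durable_list.empty()) {
            Entry* victim = lru_durable_list.front();
            lru_durable_list.pop_front();
            return victim;
        }
        return nullptr;  // nothing to evict
    }
};
```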
If a query exceeds its memory limit, detailed information about where the limit was exceeded is required.
However, it is not necessary to return the entire query stack to the end user;
the query stack only needs to be printed in the BE log.
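A sketch of the split, assuming glog-style logging in the BE (the actual status type and helpers differ):
```
#include <string>

#include <glog/logging.h>

// Hypothetical helper: the full detail goes to be.log, while the client
// receives only the short message.
std::string report_mem_exceed(const std::string& short_msg,
                              const std::string& query_stack) {
    LOG(WARNING) << short_msg << "\n" << query_stack;  // full stack in the BE log
    return short_msg;                                  // concise error for the user
}
```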
The buffered reader's `_cur_offset` should be initialized to the same value as the inner file reader's,
to make sure that the reader starts reading at the right position.
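A minimal sketch of the fix, with a simplified reader pair:
```
#include <cstdint>

struct FileReader {
    int64_t tell() const { return _offset; }
    int64_t _offset = 0;
};

struct BufferedReader {
    // Start at the inner reader's current position instead of 0,
    // so buffered reads resume at the right offset.
    explicit BufferedReader(FileReader* inner)
            : _inner(inner), _cur_offset(inner->tell()) {}

    FileReader* _inner;
    int64_t _cur_offset;
};
```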
Support starting consumption from a specified point in time, instead of a specific offset, when creating a Kafka routine load.
For example:
```
FROM KAFKA
(
"kafka_broker_list" = "broker1:9092,broker2:9092",
"kafka_topic" = "my_topic",
"property.kafka_default_offsets" = "2021-10-10 11:00:00"
);
or
FROM KAFKA
(
"kafka_broker_list" = "broker1:9092,broker2:9092",
"kafka_topic" = "my_topic",
"kafka_partitions" = "0,1,2",
"kafka_offsets" = "2021-10-10 11:00:00, 2021-10-10 11:00:00, 2021-10-10 12:00:00"
);
```
This PR also refactors how properties are analyzed when creating or altering
routine load jobs, unifying the analysis process in the `RoutineLoadDataSourceProperties` class.
1. Give some MemTrackers a reasonable parent MemTracker instead of the root tracker.
2. Make each MemTracker easy to trace.
3. Add a show level to MemTracker, reducing the number of trackers shown on the web page and providing a way to control how many are displayed.
The cause of the problem is that after a query is cancelled, OlapScanNode::transfer_thread still continues to schedule
OlapScanNode::scanner_thread until all tasks are scheduled.
Although each task does not scan data and exits quickly, it still consumes a lot of resources.
(Guess) This may be the cause of the bug (#5767) that causes the I/O to be full.
So after the query is cancelled, immediately exit the scheduling loop in transfer_thread; after
all scanner_threads have finished, transfer_thread also exits.
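A minimal sketch of the early exit, with hypothetical names standing in for the OlapScanNode members:
```
#include <atomic>

// Stand-ins for the real OlapScanNode state and helpers.
std::atomic<bool> g_query_cancelled{false};
bool has_pending_scanner_tasks();
void schedule_next_scanner_task();
void wait_for_running_scanners();

void transfer_loop() {
    while (has_pending_scanner_tasks()) {
        if (g_query_cancelled.load()) {
            break;  // stop scheduling the remaining scanner tasks immediately
        }
        schedule_next_scanner_task();
    }
    wait_for_running_scanners();  // let running scanners drain, then exit
}
```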
1. relocation R_X86_64_32 against `__gxx_personality_v0' can not be used when making a shared object; recompile with -fPIC
2. warning: the use of `tmpnam' is dangerous, better use `mkstemp'
3. Death tests use fork(), which is unsafe particularly in a threaded context. For this test, Google Test couldn't detect the number of threads.
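Item 2 can be addressed by replacing `tmpnam` with `mkstemp`; a minimal sketch:
```
#include <cstdlib>
#include <unistd.h>

// mkstemp creates and opens the file atomically, avoiding the race
// between name generation and creation that makes tmpnam dangerous.
int open_temp_file() {
    char tmpl[] = "/tmp/doris_test_XXXXXX";  // XXXXXX is replaced in place
    int fd = mkstemp(tmpl);
    if (fd >= 0) {
        unlink(tmpl);  // the file disappears once the descriptor is closed
    }
    return fd;
}
```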
* [Bug] Fix bug that database not found when replaying batch transaction remove log
[GlobalTransactionMgr.replayBatchRemoveTransactions():353] replay batch remove transactions failed. db 0
org.apache.doris.common.AnalysisException: errCode = 2, detailMessage = databaseTransactionMgr[0] does not exist
at org.apache.doris.transaction.GlobalTransactionMgr.getDatabaseTransactionMgr(GlobalTransactionMgr.java:84) ~[palo-fe.jar:3.4.0]
at org.apache.doris.transaction.GlobalTransactionMgr.replayBatchRemoveTransactions(GlobalTransactionMgr.java:350) [palo-fe.jar:3.4.0]
at org.apache.doris.persist.EditLog.loadJournal(EditLog.java:601) [palo-fe.jar:3.4.0]
at org.apache.doris.catalog.Catalog.replayJournal(Catalog.java:2452) [palo-fe.jar:3.4.0]
at org.apache.doris.master.Checkpoint.runAfterCatalogReady(Checkpoint.java:101) [palo-fe.jar:3.4.0]
at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:3.4.0]
at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:3.4.0]
The id of the information_schema database is 0, and it has no transactions at all.
1. Reduce lock conflicts in the BE's RuntimeProfile;
2. allow viewing a query's profile while the query is executing;
3. reduce the wait time for 'show proc /current_queries'.