* [Bug] Filter out unavailable backends when getting scan range locations
In the previous implementation, non-surviving BEs were eliminated in the Coordinator phase.
But the Spark and Flink Connectors have no such logic, so when a BE node is down,
queries issued through the Connector fail (a sketch of the filtering step follows below).
* fix ut
* fix compile
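To make the fix above concrete, here is a hypothetical sketch of the filtering step, assuming scan-range locations carry a backend id and the set of alive backends is known; the struct and function names are illustrative, not Doris's actual types:
```
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Illustrative stand-in for a scan range location entry.
struct ScanRangeLocation {
    int64_t backend_id;
    std::string host;
};

// Drop replicas that live on backends which are currently down, so a
// Connector never schedules a scan against an unreachable node.
std::vector<ScanRangeLocation> filter_alive(
        const std::vector<ScanRangeLocation>& locations,
        const std::unordered_set<int64_t>& alive_backends) {
    std::vector<ScanRangeLocation> out;
    for (const auto& loc : locations) {
        if (alive_backends.count(loc.backend_id)) out.push_back(loc);
    }
    return out;
}
```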
refactor runtime filter bloomfilter and eliminate some virtual function calls, which yields a performance improvement of about 5%
introduce a block Bloom filter; the AVX version yields about a 40% performance improvement (see the sketch after the list below)
before: BloomFilter size: default, ~20 million items cost about 1s400ms
after: BloomFilter size: 524288, ~20 million items cost about 400ms
1. support IN/BloomFilter/MinMax runtime filters
2. support broadcast/shuffle/bucket shuffle/colocate join
3. optimize memory use and CPU cache misses while building the runtime filter
4. optimize memory use in left semi join (works well on TPC-DS query 95)
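To illustrate the block Bloom filter idea above, here is a minimal sketch of the register-blocked design (following the classic cache-efficient Bloom filter layout also used by Impala); the class and salts below are illustrative, not Doris's actual code. Each key maps to one 32-byte block, so a lookup touches a single cache line, and with AVX2 the eight lanes can be probed in one instruction:
```
#include <cstddef>
#include <cstdint>
#include <vector>

class BlockBloomFilter {
public:
    explicit BlockBloomFilter(size_t num_blocks) : _blocks(num_blocks) {}

    void insert(uint32_t hash) {
        Block& b = _blocks[hash % _blocks.size()];
        // Set one bit per 32-bit lane, all within the same 32-byte block.
        for (int i = 0; i < 8; ++i) {
            b.lanes[i] |= 1u << ((hash * SALTS[i]) >> 27);
        }
    }

    bool find(uint32_t hash) const {
        const Block& b = _blocks[hash % _blocks.size()];
        for (int i = 0; i < 8; ++i) {
            if ((b.lanes[i] & (1u << ((hash * SALTS[i]) >> 27))) == 0) {
                return false;
            }
        }
        return true;
    }

private:
    // A block spans one 32-byte region: eight 32-bit lanes, each of which
    // gets exactly one bit per key, so a probe never crosses a cache line.
    struct alignas(32) Block {
        uint32_t lanes[8] = {0};
    };

    // Odd multiplicative salts from the classic block Bloom filter design.
    static constexpr uint32_t SALTS[8] = {
            0x47b6137bU, 0x44974d91U, 0x8824ad5bU, 0xa2b7289dU,
            0x705495c7U, 0x2df1424bU, 0x9efc4947U, 0x5c6bfb31U};

    std::vector<Block> _blocks;
};
```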
`random_shuffle` will generate the same random sequence when called multiple times;
although we shuffle twice, when the relative order of adjacent numbers does not change,
the result of the second shuffle will not change either.
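A minimal sketch of the usual remedy, assuming we control the call site: `std::random_shuffle` draws from a shared deterministic source (typically `rand()`), so repeated calls can replay the same permutation, while `std::shuffle` with a properly seeded engine does not. The function name here is hypothetical:
```
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Shuffle a list of ids differently on every call by using a dedicated,
// properly seeded per-thread engine instead of std::random_shuffle.
void shuffle_ids(std::vector<int64_t>& ids) {
    static thread_local std::mt19937 gen{std::random_device{}()};
    std::shuffle(ids.begin(), ids.end(), gen);
}
```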
To avoid showing too many MemTrackers on the BE web pages,
the MemTracker hierarchy now has three levels: OVERVIEW, TASK, and VERBOSE.
OVERVIEW is mainly used for the main memory-consuming modules such as Query/Load/Metadata.
TASK is mainly used to record the memory overhead of a single task, such as a single query, load, or compaction task.
VERBOSE is used for other, more detailed MemTrackers.
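A hypothetical sketch of how level-based filtering could work, assuming trackers carry an ordered level enum; the names are illustrative, not Doris's actual MemTracker API:
```
#include <cstdint>
#include <string>
#include <vector>

enum class MemTrackerLevel { OVERVIEW, TASK, VERBOSE };

struct MemTrackerInfo {
    std::string label;
    MemTrackerLevel level;
    int64_t consumption;
};

// Render only trackers at or below the requested verbosity, so the BE web
// page can default to OVERVIEW and stay readable.
std::vector<MemTrackerInfo> visible_trackers(
        const std::vector<MemTrackerInfo>& all, MemTrackerLevel max_level) {
    std::vector<MemTrackerInfo> out;
    for (const auto& t : all) {
        if (t.level <= max_level) out.push_back(t);
    }
    return out;
}
```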
We create multiple rowset readers to read the data of one tablet;
after a rowset reader has reached EOF, it can be released to
reduce resource (typically memory) consumption.
Similarly, we can release a segment reader when it reaches EOF.
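A minimal sketch of the eager-release idea, assuming a reader interface with a `next_batch()` that returns false at EOF; the real Doris reader API differs:
```
#include <memory>
#include <vector>

struct RowsetReader {
    virtual ~RowsetReader() = default;
    virtual bool next_batch() = 0;  // returns false at EOF
};

void read_all(std::vector<std::unique_ptr<RowsetReader>>& readers) {
    for (auto& reader : readers) {
        while (reader->next_batch()) {
            // ... process the batch ...
        }
        // Release the reader as soon as it hits EOF so its buffers (and the
        // memory they pin) are freed before the whole scan finishes.
        reader.reset();
    }
}
```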
`SELECT ... INTO OUTFILE` currently only supports exporting data in CSV format.
This patch extends the feature to support the Parquet format.
Usage:
LocalFile:
```
SELECT citycode FROM table1 INTO OUTFILE "file:///root/doris/" FORMAT AS PARQUET PROPERTIES
("schema"="required,int32,siteid;", "parquet.compression"="snappy");
```
BrokerFile:
```
SELECT siteid FROM table1 INTO OUTFILE "hdfs://host/test_sql_prc_2019_02_19/" FORMAT AS PARQUET
PROPERTIES (
"broker.name" = "hdfs_broker",
"broker.hadoop.security.authentication" = "kerberos",
"broker.kerberos_principal" = "test",
"broker.kerberos_keytab_content" = "base64" ,
"schema"="required,int32,siteid;"
);
```
The `schema` property is required; it defines the schema of the Parquet file.
Properties with the `parquet.` prefix are Parquet file options, such as compression, version, and enable_dictionary.
The version information of a tablet is stored in memory
in an adjacency-graph data structure.
As new versions are written and old versions are deleted,
the graph begins to accumulate vertices with no associated edges (orphan vertices).
These orphan vertices should be removed.
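A minimal sketch of the pruning step, assuming the version graph is kept as a map from version number to its edge lists; Doris's actual structures differ:
```
#include <cstdint>
#include <map>
#include <vector>

struct Vertex {
    std::vector<int64_t> in_edges;
    std::vector<int64_t> out_edges;
};

// A vertex that no edge touches can never lie on a version path, so it is
// safe to drop it and reclaim the memory.
void prune_orphans(std::map<int64_t, Vertex>& graph) {
    for (auto it = graph.begin(); it != graph.end();) {
        if (it->second.in_edges.empty() && it->second.out_edges.empty()) {
            it = graph.erase(it);
        } else {
            ++it;
        }
    }
}
```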
1. The partitions set by the admin repair command are prioritized
to ensure that the tablets of these partitions can be repaired as soon as possible.
2. Add an FE metric `query_begin` to monitor the number of queries submitted to Doris.
Follow-up to PR #5792. This patch adds a new parameter `cache type` to distinguish the SQL cache from the partition cache.
When updating the SQL cache, we ensure that one SQL key has only one cached version.
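A hypothetical sketch of the single-version guarantee, assuming the SQL cache is keyed by the SQL text; the real cache key and value types are richer:
```
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

struct CacheValue {
    int64_t version;
    std::string result;
};

// Overwriting via operator[] means one SQL key keeps exactly one cached
// version instead of accumulating stale ones.
void update_sql_cache(std::unordered_map<std::string, CacheValue>& cache,
                      const std::string& sql_key, CacheValue value) {
    cache[sql_key] = std::move(value);
}
```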
When parsing memory parameters in `ParseUtil::parse_mem_spec`, convert the percentage to `double` instead of `int`.
The currently affected parameters include `mem_limit` and `storage_page_cache_limit`.
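A minimal sketch of why the type matters, assuming a spec like `"1.5%"` scales the physical memory total; the real `parse_mem_spec` handles more formats:
```
#include <cstdint>
#include <string>

// Parsing into double keeps fractional percentages such as "1.5%" from
// being truncated before scaling; an int would round 1.5 down to 1.
int64_t parse_percent_spec(const std::string& spec, int64_t physical_mem) {
    double percent = std::stod(spec.substr(0, spec.size() - 1));  // drop '%'
    return static_cast<int64_t>(physical_mem * (percent / 100.0));
}
```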
According to LRU priority, the `lru list` is split into an `lru normal list` and an `lru durable list`,
and the two lists are traversed in sequence during LRU eviction, avoiding invalid scan cycles.
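A hypothetical sketch of the two-list eviction order, assuming entries are tagged normal or durable at insert time; the real cache tracks more state per entry:
```
#include <cstdint>
#include <list>

struct Entry {
    int64_t charge;
};

// Drain the normal list first; durable entries are touched only when
// evicting normal entries alone cannot free enough space, so the evictor
// no longer cycles uselessly past durable entries.
int64_t evict(std::list<Entry>& normal, std::list<Entry>& durable,
              int64_t bytes_needed) {
    int64_t freed = 0;
    for (auto* lru : {&normal, &durable}) {
        while (freed < bytes_needed && !lru->empty()) {
            freed += lru->front().charge;
            lru->pop_front();
        }
    }
    return freed;
}
```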
If a query exceeds its memory limit, detailed information about where the limit was exceeded is required.
However, it is not necessary to return the entire query stack to the end user;
the query stack only needs to be printed in the BE log.
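A hypothetical sketch of the reporting split, with `std::cerr` standing in for the BE log; the real code would use the logging framework:
```
#include <iostream>
#include <string>

// Log the full stack for operators, but hand the end user only a concise
// message about where the memory limit was exceeded.
std::string report_mem_exceeded(const std::string& detail,
                                const std::string& stack) {
    std::cerr << "query memory exceeded: " << detail << "\n" << stack << "\n";
    return "Memory limit exceeded: " + detail;
}
```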