#5902
This CL mainly changes:
1. Support setting tags for BE nodes:
```
alter system add backend "1272:9050, 1212:9050" properties("tag.location": "zoneA");
alter system modify backend "1272:9050, 1212:9050" set ("tag.location": "zoneB");
```
For compatibility, all BE nodes are assigned the "default" tag when upgrading: `"tag.location": "default"`.
2. Create a new class `ReplicaAllocation` to replace the previous `replication_num`.
`ReplicaAllocation` represents how the replicas of a tablet are allocated. It contains a map from
Tag to number of replicas.
For example, if a user sets a table's replication num to 3, it is converted to a `ReplicaAllocation`
like `"tag.location.default" : "3"`, which means the tablet has 3 replicas and all of them are
allocated on BE nodes with the tag "default".
3. Support creating tables with a replica allocation:
```
CREATE TABLE example_db.table_hash
(
k1 TINYINT
)
DISTRIBUTED BY HASH(k1) BUCKETS 32
PROPERTIES (
"replication_allocation"="tag.location.zone1:1, tag.location.zone2:2"
);
```
Setting the replica allocation for dynamic tables and modifying it at runtime are also supported.
For compatibility, users can still set `"replication_num" = "3"`, which is automatically converted to:
`"replication_allocation"="tag.location.default:3"`
4. Support tablet repair and balance based on Tag
1. For tablets of non-colocate tables, most of the logic is the same as before,
but the tag of a node is now considered when selecting the destination node for a clone.
If no node with the required tag exists, the tablet cannot be repaired.
Similarly, once all replicas are complete, the tablet is
reallocated according to the tag, or its replicas are balanced.
Balancing is performed separately within each resource group.
2. For tablets of colocate tables, the backend sequence of the buckets is split by tag.
For example, if the replica allocation is "tag.location.zone1:1, tag.location.zone2:2",
zone1 has 2 BEs (A, B), and zone2 has 3 BEs (C, D, F),
there will be 2 backend sequences: one for zone1 and one for zone2.
One possible pair of sequences is:
zone1: [A] [B] [A] [B]
zone2: [C, D][D, F][F, C][C, D]
5. Support setting tags for user and restrict execution node with tags:
```
set property for 'cmy' 'resource_tags.location' : 'zone1, zone2';
```
After this is set, the user 'cmy' can only query data stored on backends with tag zone1 or zone2,
and queries can only be executed on backends with tag zone1 or zone2.
For compatibility, the property `resource_tags.location` is empty after upgrading,
so users can still query data stored on any backend.
6. Modify the unit test framework of FE so that multiple backends with different mocked IPs can be created in unit tests.
This helps us easily test distributed cases such as query, tablet repair, and balance.
The document will be added in another PR.
Also fix a bug described in #6194
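The replica allocation property (items 2 and 3) and the per-tag backend sequences for colocate tables (item 4) can be illustrated with a small Python sketch. This is not the FE implementation: the function names are hypothetical, and the round-robin placement is only an assumption that happens to reproduce the "one possible" sequence given in the example.

```python
def parse_replica_allocation(prop):
    """Parse 'tag.location.zone1:1, tag.location.zone2:2' into {'zone1': 1, 'zone2': 2}."""
    prefix = "tag.location."
    alloc = {}
    for part in prop.split(","):
        key, _, num = part.strip().rpartition(":")
        if not key.startswith(prefix):
            raise ValueError("unexpected allocation key: %r" % key)
        alloc[key[len(prefix):]] = int(num)
    return alloc


def from_replication_num(num):
    """Compatibility path: replication_num = N becomes N replicas on the 'default' tag."""
    return {"default": num}


def backend_sequences(alloc, backends_by_tag, buckets):
    """Build one backend sequence per tag (hypothetical round-robin placement)."""
    seqs = {}
    for tag, replicas in alloc.items():
        backends = backends_by_tag.get(tag, [])
        if len(backends) < replicas:
            # Mirrors the rule that a missing tag (or too few nodes) cannot be satisfied.
            raise ValueError("not enough backends with tag %r" % tag)
        seqs[tag] = [
            [backends[(bucket + r) % len(backends)] for r in range(replicas)]
            for bucket in range(buckets)
        ]
    return seqs


if __name__ == "__main__":
    alloc = parse_replica_allocation("tag.location.zone1:1, tag.location.zone2:2")
    print(backend_sequences(alloc, {"zone1": ["A", "B"], "zone2": ["C", "D", "F"]}, 4))
```

With 4 buckets and the backends from the example, the sketch yields `[A] [B] [A] [B]` for zone1 and `[C, D] [D, F] [F, C] [C, D]` for zone2.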
Fix #5378, #5391, #5688, #5973, #6155 and all replay NPEs. All replay methods can now throw MetaNotFoundException, which is caught and logged as a warning for potentially inconsistent metadata.
This also tries to give future developers a clear reminder to check for null.
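The pattern described above, where a replay step raises a dedicated exception instead of dereferencing a missing object and the caller logs a warning and continues, might look roughly like the following Python sketch. Only the name `MetaNotFoundException` comes from the description; the catalog layout and the helper functions are hypothetical.

```python
import logging

log = logging.getLogger("replay")


class MetaNotFoundException(Exception):
    """Raised when a replayed entry refers to metadata that does not exist."""


def get_table_or_raise(catalog, db, table):
    # Fail with a typed exception instead of returning None to the caller.
    try:
        return catalog[db][table]
    except KeyError:
        raise MetaNotFoundException("unknown table %s.%s" % (db, table)) from None


def replay_rename(catalog, db, table, new_name):
    get_table_or_raise(catalog, db, table)
    catalog[db][new_name] = catalog[db].pop(table)


def replay_all(catalog, entries):
    """Replay every entry; log and skip the ones whose metadata is missing."""
    skipped = 0
    for entry in entries:
        try:
            replay_rename(catalog, *entry)
        except MetaNotFoundException as e:
            log.warning("skip replay of %s: %s", entry, e)
            skipped += 1
    return skipped
```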
* fix(sparkload): bitmap deep copy in `or` operator
Fix multiple rollups holding the same reference to a bitmap value, which may then be updated repeatedly.
Co-authored-by: weixiang <weixiang06@meituan.com>
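The aliasing problem behind the bitmap fix can be reproduced in a few lines of Python. This is only an illustration: Python sets stand in for bitmap values, and the `aggregate` and `make_rows` helpers are hypothetical, not part of the Spark load code.

```python
import copy


def make_rows():
    # (k1, k2, bitmap) rows; each rollup groups by a different key column.
    return [(1, 1, {10}), (1, 2, {20}), (2, 1, {30})]


def aggregate(rows, key_index, deep_copy):
    """OR the bitmaps of all rows that share the same key."""
    groups = {}
    for row in rows:
        key, bitmap = row[key_index], row[2]
        if key not in groups:
            # Without the copy, the group holds a reference to the row's own bitmap.
            groups[key] = copy.deepcopy(bitmap) if deep_copy else bitmap
        else:
            groups[key] |= bitmap  # in-place OR mutates whatever object is stored
    return groups


if __name__ == "__main__":
    rows = make_rows()
    print(aggregate(rows, 0, True), aggregate(rows, 1, True))
    rows = make_rows()
    print(aggregate(rows, 0, False), aggregate(rows, 1, False))
```

Without the deep copy, both rollups end up sharing the first row's bitmap object, so each in-place OR from one rollup leaks into the other.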
* fix bugs with string type
1. not support string with agg type min/max
2. agg_update with large string may coredump
3. stringval with large string may coredump
4. not support string as partition key
The original description of the stream load column order transformation was unclear, and a user struggled with this part for a long time, so I reworded some expressions to make it clearer.
for issue #6474
```sql
create table test.table1 like test.table with rollup r1,r2 -- copy some rollup
create table test.table1 like test.table with rollup all -- copy all rollup
create table test.table1 like test.table -- only copy base table
```
Fix #6512
If a tablet has a missing replica, a clone task is executed to restore it from a healthy replica. The source replica selector randomly chooses a healthy replica as the source.
It is better to choose the healthy replica with the minimum version count as the source, which avoids repeated compaction tasks. In addition, a replica with a lower version count is good for query performance.
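A minimal sketch of this selection rule, assuming a hypothetical `Replica` record with `healthy` and `version_count` fields (the real replica class has more state):

```python
from dataclasses import dataclass


@dataclass
class Replica:
    backend_id: int
    healthy: bool
    version_count: int


def choose_src_replica(replicas):
    """Return the healthy replica with the fewest versions, or None if none is healthy."""
    healthy = [r for r in replicas if r.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda r: r.version_count)
```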
Fix #6447
1. The FE master periodically triggers the remove operation.
2. After the master finishes removing a DeleteInfo, the removal is synchronized to the Followers through the edit log.
3. A DeleteInfo is cleaned up once the time since its creation exceeds the value of the `delete_info_keep_max_second` configuration.
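The expiry check in step 3 can be sketched as follows. This assumes that "expired" means the age of the DeleteInfo exceeds `delete_info_keep_max_second`; the function name and the dictionary layout are hypothetical.

```python
import time


def expired_delete_infos(delete_infos, keep_max_second, now=None):
    """Return the ids of DeleteInfo entries older than keep_max_second.

    delete_infos maps an id to its creation time in seconds since the epoch.
    """
    now = time.time() if now is None else now
    return [
        info_id
        for info_id, create_time in delete_infos.items()
        if now - create_time > keep_max_second
    ]
```

In this model the master would run the check periodically, remove the returned ids, and write the removal to the edit log so that followers replay it.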
Implement the `lower_case_table_names` variable of MySQL. The values have the following meanings:
0: the table names are case-sensitive.
1: table names are stored in lowercase and comparisons are not case sensitive.
2: table names are stored as given but compared case-insensitively.
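The three modes can be summarized with two small helper functions. These are hypothetical illustrations of the semantics listed above, not the FE code:

```python
def stored_name(name, mode):
    """Name as it would be stored: lowercased in mode 1, unchanged in modes 0 and 2."""
    return name.lower() if mode == 1 else name


def same_table(a, b, mode):
    """Name comparison: case-sensitive in mode 0, case-insensitive in modes 1 and 2."""
    if mode == 0:
        return a == b
    return a.lower() == b.lower()
```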
1. Add license/total lines/release badges.
2. Add monthly active contributor and contributor growth graph
3. fix a pom.xml bug
4. Modify some routine load log on BE side
This CL mainly changes:
1. the `storage_page_cache_limit` is based on config `mem_limit`
the default is 20% of `mem_limit`.
2. the `buffer_pool_limit` is based on config `mem_limit`
the default is 20% of `mem_limit`.
3. the `buffer_pool_clean_pages_limit` is based on config `buffer_pool_limit`
the default is 50% of `buffer_pool_limit`
4. Fix some display bugs in the LRU cache hit ratio and usage ratio.
5. Fix a create view bug: `notEvalNondeterministicFunction` should be reset after analysis.
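Items 1 to 3 derive three limits from `mem_limit` by fixed default percentages. The arithmetic can be checked with a short sketch; the function name and the use of integer byte counts are assumptions, and only the percentages come from the list above.

```python
def derive_limits(mem_limit_bytes, page_cache_pct=20, buffer_pool_pct=20, clean_pages_pct=50):
    """Compute the default limits from mem_limit (all values in bytes)."""
    storage_page_cache_limit = mem_limit_bytes * page_cache_pct // 100
    buffer_pool_limit = mem_limit_bytes * buffer_pool_pct // 100
    # The clean pages limit is a share of buffer_pool_limit, not of mem_limit.
    buffer_pool_clean_pages_limit = buffer_pool_limit * clean_pages_pct // 100
    return {
        "storage_page_cache_limit": storage_page_cache_limit,
        "buffer_pool_limit": buffer_pool_limit,
        "buffer_pool_clean_pages_limit": buffer_pool_clean_pages_limit,
    }
```

For a `mem_limit` of 10 GiB, this gives 2 GiB for the page cache, 2 GiB for the buffer pool, and 1 GiB for clean pages.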
fix#6269
The goal of these changes is to reduce memory usage in BE to avoid OOM and to speed up the calculation.
1. We do not need to do aggregation in load, since it has already been done in the ETL Spark job.
2. Based on 1, we do not need to serialize/deserialize bitmap/HLL objects.
* Add statistics struct and support manually injecting statistics
This PR mainly adds the data structures used for statistics
and the ability to manually modify statistics.
We use a separate statistics package to store statistics,
and use the 'statistics manager' as the unified entry point for statistics.
For the detailed data structures and explanations, please refer to the comments on the classes.
Manual modification covers table statistics and column statistics.
The syntax is explained in issue #6370.
* Show table and column statistics
'SHOW TABLE STATS' is used to show the statistics of a table.
'SHOW COLUMN STATS' is used to show the statistics of columns.
Currently, only the tables and columns whose statistics have been set
are displayed in the results.