1. not collect partition stats anymore
2. merge insert of stats
3. delete period collector since it is useless
4. remove enable_auto_sample
5. move some config related to stats to global session variable
Before this PR, when analyze a table, the insert count equals column count times 2
After this PR, insert count of analyze table would reduce to column count / insert_merge_item_count.
According to my test, when analyzing tpch lineitem, the insert sql count is 1
Use field datatype such as decimal(10, 0) to create table like. Because the scale is 0, the precision and scale will lost when create table like done. this will fix the bug.
**Before fix, create table with following SQL**:
CREATE TABLE IF NOT EXISTS db_test.table_test
(
`name` varchar COMMENT "1m size",
`id` SMALLINT COMMENT "[-32768, 32767]",
`timestamp0` decimal null comment "c0",
`timestamp1` decimal(38, 0) null comment "c1"
)
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES ('replication_num' = '1');
**and Then run**
CREATE TABLE db_test.table_test_like LIKE db_test.table_test
SHOW CREATE TABLE db_test.table_test_like;
the field `timestamp1` will be decimal(9, 0), it's wrong. this will fix it.
The image file of our cluster reaches 2.3G. After the checkpoint, Followers synchronize the image timeout, resulting in the continuous increase of the bdb directory.
related pr: #25768
notice: this PR have change the fe meta version.
the nullable mode of udf-function is important,
if not write to info, it's will be loss after restart.
Add new FE config `ignore_unknown_metadata_module`. Default is false.
If set to true, when reading metadata image file, and there are unknown modules, these modules
will be ignored and skipped.
This is mainly used in downgrade operation, old version can be compatible with new version Image file.
If user has database with same name mysql, will introduce problem when doing checkpoint.
Solution:
Add check for this situation, if duplicate, exit and print log info to prevent damage of metadata;
Add fe config field: mysqldb_replace_name to make things correct if user already has mysql db.
Related pr: #23087#22868
1. Fix data size calculation of auto sample, before this pr, the data size is include all the replicas
2. Move some auto analyze related options to global session variable
3. Add some logs
In the old code, when using desc command to view the table schema
It will display as follows
```
ARRAY<TINYINT(4)>
ARRAY<SMALLINT(6)>
ARRAY<INT(11)>
ARRAY<BIGINT(20)>
ARRAY<LARGEINT(40)>
```
However, for normal integer type displays, the width is not displayed
So, I changed it to the following
```
ARRAY<TINYINT>
ARRAY<SMALLINT>
ARRAY<INT>
ARRAY<BIGINT>
ARRAY<LARGEINT>
```
### Support dispaly of auto analyze jobs
After this PR, users and DBA could use such grammar to check the execution status of auto analyze jobs:
```sql
SHOW AUTO ANALYZE [tbl_name] [WHERE STATE='SOME STATE']
```
Record count of history auto analyze job could be configured by setting FE option: auto_analyze_job_record_count, default value is 2000
### Enhance auto analyze
After this PR, auto jobs those created automatically will no longer execute beyond a specific time frame.
Add variant type for metadata Add persistent information for variant, including the path of variant sub-columns, persisting them to the segment footer and tablet schema of the rowset.