1. Fix the data size calculation for auto sample; before this PR, the data size included all replicas
2. Move some auto-analyze-related options to global session variables (see the sketch after this list)
3. Add some logs
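A minimal sketch of setting one such option globally; which exact options moved is not listed in this PR, so the variable below (taken from the sample-analyze entry later in these notes) is only an illustration:
```sql
-- Hypothetical example: apply an auto-analyze option to all new sessions.
SET GLOBAL huge_table_lower_bound_size_in_bytes = 5368709120; -- 5 GB
-- Verify the current value:
SHOW VARIABLES LIKE 'huge_table_lower_bound_size_in_bytes';
```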
In the old code, when using the `desc` command to view the table schema,
array types displayed as follows:
```
ARRAY<TINYINT(4)>
ARRAY<SMALLINT(6)>
ARRAY<INT(11)>
ARRAY<BIGINT(20)>
ARRAY<LARGEINT(40)>
```
However, for normal integer types the width is not displayed,
so I changed it to the following:
```
ARRAY<TINYINT>
ARRAY<SMALLINT>
ARRAY<INT>
ARRAY<BIGINT>
ARRAY<LARGEINT>
```
### Support display of auto analyze jobs
After this PR, users and DBAs can use the following syntax to check the execution status of auto analyze jobs:
```sql
SHOW AUTO ANALYZE [tbl_name] [WHERE STATE='SOME STATE']
```
The number of historical auto analyze job records to keep can be configured via the FE option `auto_analyze_job_record_count`; the default value is 2000.
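For example (table name and state value are illustrative):
```sql
-- List auto analyze jobs for one table, filtered by state:
SHOW AUTO ANALYZE test_tbl WHERE STATE = 'FINISHED';

-- Keep more history records (runtime FE config change):
ADMIN SET FRONTEND CONFIG ("auto_analyze_job_record_count" = "5000");
```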
### Enhance auto analyze
After this PR, automatically created analyze jobs will no longer execute outside a specific time frame.
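A sketch of constraining the execution window, assuming the time frame is exposed as session variables; the variable names below are assumptions, not confirmed by this PR:
```sql
-- Hypothetical names; run SHOW VARIABLES LIKE '%auto_analyze%' to find the real ones.
SET GLOBAL full_auto_analyze_start_time = '00:00:00';
SET GLOBAL full_auto_analyze_end_time = '02:00:00';
```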
Add variant type metadata: add persistent information for variant, including the paths of variant sub-columns, persisting them to the segment footer and the tablet schema of the rowset.
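For context, a variant column as seen from the user side (a minimal sketch; the persistence work in this PR is internal to the segment footer and tablet schema):
```sql
CREATE TABLE variant_demo (
    k BIGINT,
    v VARIANT
) DUPLICATE KEY(k)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ("replication_num" = "1");

-- Paths of sub-columns such as v['a'] are what gets persisted with the rowset:
INSERT INTO variant_demo VALUES (1, '{"a": 1, "b": "x"}');
```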
1. Analyze with sample automatically when the table size is greater than huge_table_lower_bound_size_in_bytes (5 GB by default). Users can disable this feature with the FE option enable_auto_sample
2. Support grammar like `ANALYZE TABLE test WITH FULL` to force a full analyze regardless of table size (see the example after this list)
3. Fix bugs where table stats were not updated properly when stats were dropped or only a few columns were analyzed
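Example usage (the table name is illustrative; whether `enable_auto_sample` is mutable at runtime is an assumption):
```sql
-- Force a full analyze even when the table exceeds huge_table_lower_bound_size_in_bytes:
ANALYZE TABLE test WITH FULL;

-- Disable automatic sampling cluster-wide via the FE option from this PR:
ADMIN SET FRONTEND CONFIG ("enable_auto_sample" = "false");
```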
The original virtual node number is `Math.max(Math.min(512 / backends.size(), 32), 2)`, which is too small,
causing uneven cache distribution when the file cache is enabled.
1. support struct data type
2. add array / map / struct literal syntax (see the sketch after this list)
3. fix array union / intersect / except type coercion
4. fix explicit cast data type check for array
5. fix bound function type coercion
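A sketch of the added literal syntax (the exact literal forms here are assumptions based on common Doris syntax):
```sql
SELECT
    [1, 2, 3]               AS arr, -- array literal
    MAP('k1', 10, 'k2', 20) AS m,   -- map literal
    STRUCT(1, 'a', 3.14)    AS st;  -- struct literal
```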
- add config `enable_col_auth` to temporarily disable column permissions (the old/new planner has a bug when selecting from a view); see the example after this list
- Restore the old optimizer to its previous authentication method
- Support authentication in the new optimizer (legacy issue: when querying a view, the base table's permissions are checked; the view's own permissions should be checked instead, to be handled after the new optimizer is improved)
- fix: show grants for non-existent users
- fix: role `admin` cannot grant/revoke to/from a user
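Toggling the new config (a sketch; whether `enable_col_auth` can be changed at runtime, rather than only in fe.conf, is an assumption):
```sql
-- Temporarily disable column-level permission checks while the view bug exists:
ADMIN SET FRONTEND CONFIG ("enable_col_auth" = "false");
```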
FEATURE:
1. enable array type in Nereids
2. support generics in function signatures
3. support array and map types in type coercion and type checking
4. add element_at and element_slice syntax in the Nereids parser (see the sketch below)
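A sketch of the element access syntax (assuming `element_at` keeps its existing 1-based semantics):
```sql
SELECT element_at([10, 20, 30], 2); -- returns 20
SELECT [10, 20, 30][2];             -- same result via the subscript form
```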
REFACTOR:
1. remove AbstractDataType
BUG FIX:
1. remove FROM from nonReserved keyword list
TODO:
1. support lambda expressions
2. use Nereids' way to do function type coercion
3. use castIfnotSame when doing implicit casts on BoundFunction
4. make AnyDataType type coercion do the same thing as function type coercion
5. add the array functions below:
- array_apply
- array_concat
- array_filter
- array_sortby
- array_exists
- array_first_index
- array_last_index
- array_count
- array_shuffle / shuffle
- array_pushfront
- array_pushback
- array_repeat
- array_zip
- reverse
- concat_ws
- split_by_string
- explode
- bitmap_from_array
- bitmap_to_array
- multi_search_all_positions
- multi_match_any
- tokenize
If we write SQL like `select cast(array() as array<varchar(10)>)`,
the CastExpr in the FE will call analyze() with `Type.matchExactType(childType, type, true);`.
Here the array type only checks `contains_null`, but it should also check the inner type to make matchExactType correct for arrays.
Update the capacity coefficient used for calculating the backend load score:
1. Add the FE config entry `backend_load_capacity_coeficient` to allow setting the capacity coefficient manually;
2. Adjust the calculation of the capacity coefficient as below.
We emphasize disk usage when calculating the load score:
if a BE has a high used-capacity percent, we should increase its load score,
so we increase the capacity coefficient along with a BE's used-capacity percent.
But this is not enough. For example, suppose the tablets have a big difference in data size.
Then for the two BEs below, their load scores may be the same:
BE A: disk usage = 60%, replica number = 2000 (it contains the big tablets)
BE B: disk usage = 30%, replica number = 4000 (it contains the small tablets)
But what we want is: first move some big tablets from A to B; after their disk usages are close,
then move some small tablets from B to A, so that finally both their disk usages and replica numbers
are close.
To achieve this, when the max difference between all BEs' disk usages is >= 30%, we set the capacity coefficient to 1.0 to avoid the effect of replica number. After the disk usage difference decreases, we decrease the capacity coefficient to make replica number effective again.
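One way to write the intended behavior down (an illustrative formulation, not the exact code): let $u_i$ be BE $i$'s used-capacity percent, $r_i$ its normalized replica-number score, and $c$ the capacity coefficient weighting the two parts of the load score:

$$
\mathrm{score}_i = c\,u_i + (1 - c)\,r_i,
\qquad
c =
\begin{cases}
1.0, & \text{if } \max_j u_j - \min_j u_j \ge 0.3 \\
f(u_i)\ \text{increasing in } u_i, & \text{otherwise}
\end{cases}
$$

With $c = 1.0$ the replica term vanishes, so in the example above BE A (disk usage 60%) scores strictly higher than BE B and its big tablets are moved first.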
The FE log is large for a busy Doris cluster; if you want to preserve some historical logs, they cost too much disk space.
Enabling compression is a good way to save space,
and a gzip-compressed text file can be viewed without decompression.
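A sketch of what enabling this could look like in fe.conf; the option names below are assumptions, so check the FE config documentation for the actual keys:
```
# fe.conf -- hypothetical option names for gzip-compressing rolled FE log files
sys_log_enable_compress = true
audit_log_enable_compress = true
```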
Truncate char or varchar columns if their size is smaller than that of the file's columns, or if they are not found in the file's column schema, controlled by the session variable `truncate_char_or_varchar_columns`.
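Example usage of the session variable introduced here:
```sql
-- Truncate char/varchar values from the file to the table column's defined length:
SET truncate_char_or_varchar_columns = true;
```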