doris/fe
ElvinWei 1a6401d682 [enchancement](statistics) support sampling collection of statistics (#18880)
1. Supports sampling to collect statistics
2. Improved syntax for collecting statistics
3. Support specifying the number of buckets for histograms
4. Tweaked some code structure

---

The syntax supports both `WITH` clauses and `PROPERTIES`, using the same syntax as before.

Column statistics collection syntax:
```SQL
ANALYZE [ SYNC ] TABLE table_name
     [ (column_name [, ...]) ]
     [ [WITH SYNC] | [WITH INCREMENTAL] | [WITH SAMPLE PERCENT | ROWS ] ]
     [ PROPERTIES ('key' = 'value', ...) ];
```

Column histogram collection syntax:
```SQL
ANALYZE [ SYNC ] TABLE table_name
     [ (column_name [, ...]) ]
     UPDATE HISTOGRAM
     [ [ WITH SYNC ][ WITH INCREMENTAL ][ WITH SAMPLE PERCENT | ROWS ][ WITH BUCKETS ] ]
     [ PROPERTIES ('key' = 'value', ...) ];
```

Explanation:
- sync: Collect statistics synchronously; the statement returns after collection finishes.
- incremental: Collect statistics incrementally. Incremental collection of histogram statistics is not supported.
- sample percent | rows: Collect statistics by sampling, specified either as a percentage or as a number of rows.
- buckets: Specifies the maximum number of buckets generated when collecting histogram statistics.
- table_name: The target table for collecting statistics. Can be of the form `db_name.table_name`.
- column_name: The target column. It must exist in `table_name`; multiple column names are separated by commas.
- properties: Properties used to configure the statistics task. Currently only the following keys are supported (each equivalent to the corresponding WITH clause):
   - 'sync' = 'true'
   - 'incremental' = 'true'
   - 'sample.percent' = '50'
   - 'sample.rows' = '1000'
   - 'num.buckets' = '10'
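
For illustration, a few statements that follow the syntax above might look like the sketch below. The names `example_db.example_tbl`, `col_a`, and `col_b` are hypothetical. The `PROPERTIES` keys are the ones listed above. The position of the numeric value after `WITH SAMPLE PERCENT | ROWS` and `WITH BUCKETS` is an assumption, since the grammar above does not spell it out.

```SQL
-- Column statistics: sample 50 percent of the rows of a hypothetical table,
-- configured through PROPERTIES.
ANALYZE TABLE example_db.example_tbl (col_a, col_b)
     PROPERTIES ('sample.percent' = '50');

-- Histogram: collect synchronously from a 1000-row sample with at most
-- 10 buckets, configured through PROPERTIES.
ANALYZE TABLE example_db.example_tbl (col_a)
     UPDATE HISTOGRAM
     PROPERTIES ('sync' = 'true', 'sample.rows' = '1000', 'num.buckets' = '10');

-- Assumed WITH form of the histogram statement above: the value is taken
-- to follow the keyword directly, which the grammar does not confirm.
ANALYZE TABLE example_db.example_tbl (col_a)
     UPDATE HISTOGRAM
     WITH SYNC WITH SAMPLE ROWS 1000 WITH BUCKETS 10;
```

The `PROPERTIES` form relies only on keys named in this description, so it is the safer of the two to copy until the `WITH` form is checked against the parser.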

--- 

TODO:
- Add complete p0 tests
- `Incremental` statistics: see #18653
2023-04-21 13:11:43 +08:00

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

# fe-common

This module holds common classes shared by the other modules.

# spark-dpp

This module is the Spark DPP program, used by the Spark Load feature.
Depends: fe-common

# fe-core

This module contains the main FE process.
Depends: fe-common, spark-dpp