[Improvement](docs) Update EN doc (#9228)

This commit is contained in:
Gabriel
2022-04-27 23:22:38 +08:00
committed by GitHub
parent 2ec0b98787
commit 5cbb4a2317
38 changed files with 696 additions and 696 deletions

View File

@ -38,13 +38,13 @@ Support SQL block rule by user level:
SQL block rule CRUD
- create SQL block rule
- sqlRegex patternSpecial characters need to be translated, "NULL" by default
- sql: Regex pattern. Special characters need to be escaped; "NULL" by default
- sqlHash: SQL hash value, used for exact matching. We print it in fe.audit.log. Only one of sql and sqlHash can be set; "NULL" by default
- partition_num: Max number of partitions a scan node may scan, 0L by default
- tablet_num: Max number of tablets a scan node may scan, 0L by default
- cardinality: An approximate number of rows scanned by a scan node, 0L by default
- global: Whether the rule takes effect globally (all users), false by default
- enableWhether to enable block ruletrue by default
- enable: Whether to enable the block rule, true by default
```sql
CREATE SQL_BLOCK_RULE test_rule
PROPERTIES(
@ -70,7 +70,7 @@ CREATE SQL_BLOCK_RULE test_rule2 PROPERTIES("partition_num" = "30", "cardinality
```sql
SHOW SQL_BLOCK_RULE [FOR RULE_NAME]
```
- alter SQL block ruleAllows changes sql/sqlHash/global/enable/partition_num/tablet_num/cardinality anyone
- alter SQL block rule: allows changing any of sql/sqlHash/global/enable/partition_num/tablet_num/cardinality
- sql and sqlHash cannot both be set. This means that if sql or sqlHash is set in a rule, the other property can never be altered
- sql/sqlHash and partition_num/tablet_num/cardinality cannot be set together. For example, if partition_num is set in a rule, then sql or sqlHash can never be altered.
```sql
@ -81,7 +81,7 @@ ALTER SQL_BLOCK_RULE test_rule PROPERTIES("sql"="select \\* from test_table","en
ALTER SQL_BLOCK_RULE test_rule2 PROPERTIES("partition_num" = "10","tablet_num"="300","enable"="true")
```
- drop SQL block ruleSupport multiple rules, separated by `,`
- drop SQL block rule: supports multiple rules, separated by `,`
```sql
DROP SQL_BLOCK_RULE test_rule1,test_rule2
```
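For instance, a regex-based rule can be created for all users and then inspected; a minimal sketch reusing the regex from the ALTER example above (the rule name is hypothetical):

```sql
CREATE SQL_BLOCK_RULE block_select_all
PROPERTIES(
  "sql"="select \\* from test_table",
  "global"="true",
  "enable"="true"
);
SHOW SQL_BLOCK_RULE FOR block_select_all;
```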

View File

@ -28,7 +28,7 @@ under the License.
Bucket Shuffle Join is a new function officially added in Doris 0.14. Its purpose is to provide local optimization for some join queries, reducing the time spent transferring data between nodes and speeding up queries.
It's design, implementation can be referred to [ISSUE 4394](https://github.com/apache/incubator-doris/issues/4394)
Its design and implementation can be found in [ISSUE 4394](https://github.com/apache/incubator-doris/issues/4394).
## Noun Interpretation
@ -40,7 +40,7 @@ It's design, implementation can be referred to [ISSUE 4394](https://github.com/a
## Principle
The conventional distributed join methods supported by Doris are `Shuffle Join` and `Broadcast Join`. Both of these joins lead to some network overhead.
For example, there are join queries for table A and table B. the join method is hashjoin. The cost of different join types is as follows
For example, consider a join query between table A and table B using hash join. The cost of the different join types is as follows:
* **Broadcast Join**: If, according to the data distribution, table A has three executing HashJoinNodes, table B needs to be sent to all three HashJoinNodes. Its network overhead is `3B`, and its memory overhead is `3B`.
* **Shuffle Join**: Shuffle Join distributes the data of tables A and B to the nodes of the cluster according to hash calculation, so its network overhead is `A + B` and its memory overhead is `B`.
@ -50,9 +50,9 @@ The data distribution information of each Doris table is saved in FE. If the joi
The picture above shows how Bucket Shuffle Join works. The SQL query joins table A with table B, and the equality join expression hits the data distribution column of A. According to the data distribution information of table A, Bucket Shuffle Join sends the data of table B to the corresponding storage and compute nodes of table A. The cost of Bucket Shuffle Join is as follows:
* network cost ``` B < min(3B, A + B) ```
* network cost: ``` B < min(3B, A + B) ```
* memory cost ``` B <= min(3B, B) ```
* memory cost: ``` B <= min(3B, B) ```
Therefore, compared with Broadcast Join and Shuffle Join, Bucket Shuffle Join has obvious performance advantages. It reduces the time spent transferring data between nodes and the memory cost of the join. Compared with Doris's original join methods, it has the following advantages:
@ -91,7 +91,7 @@ You can use the `explain` command to check whether the join is a Bucket Shuffle
| | equal join conjunct: `test`.`k1` = `baseall`.`k1`
```
The join type indicates that the join method to be used is`BUCKET_SHUFFLE`
The join type indicates that the join method to be used is `BUCKET_SHUFFLE`.
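For instance, assuming two tables both bucketed by `HASH(k1)` (the `test` and `baseall` tables from the plan above), a quick check might look like the following sketch (the session variable name is an assumption based on recent Doris versions and may differ):

```sql
-- enable the optimization (usually on by default) and inspect the plan
SET enable_bucket_shuffle_join = true;
EXPLAIN SELECT * FROM test JOIN baseall ON test.k1 = baseall.k1;
-- look for BUCKET_SHUFFLE in the join node of the output
```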
## Planning rules of Bucket Shuffle Join

View File

@ -101,25 +101,25 @@ There are two ways to configure BE configuration items:
### `alter_tablet_worker_count`
Default3
Default: 3
The number of threads making schema changes
### `base_compaction_check_interval_seconds`
Default60 (s)
Default: 60 (s)
BaseCompaction thread polling interval
### `base_compaction_interval_seconds_since_last_operation`
Default86400
Default: 86400
One of the triggering conditions of BaseCompaction: the interval since the last BaseCompaction
### `base_compaction_num_cumulative_deltas`
Default5
Default: 5
One of the triggering conditions of BaseCompaction: The limit of the number of Cumulative files to be reached. After reaching this limit, BaseCompaction will be triggered
@ -150,13 +150,13 @@ Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"in
### `base_compaction_write_mbytes_per_sec`
Default5(MB)
Default: 5(MB)
Maximum disk write speed per second of BaseCompaction task
### `base_cumulative_delta_ratio`
Default0.3 (30%)
Default: 0.3 (30%)
One of the trigger conditions of BaseCompaction: Cumulative file size reaches the proportion of Base file
@ -206,7 +206,7 @@ User can set this configuration to a larger value to get better QPS performance.
### `buffer_pool_clean_pages_limit`
默认值20G
Default: 20G
Clean up pages that may be saved by the buffer pool
@ -226,25 +226,25 @@ The maximum amount of memory available in the BE buffer pool. The buffer pool is
### `check_consistency_worker_count`
Default1
Default: 1
The number of worker threads to calculate the checksum of the tablet
### `chunk_reserved_bytes_limit`
Default2147483648
Default: 2147483648
The reserved bytes limit of Chunk Allocator, 2GB by default. Increasing this variable can improve performance, but it will hold more free memory that other modules cannot use.
### `clear_transaction_task_worker_count`
Default1
Default: 1
Number of threads used to clean up transactions
### `clone_worker_count`
Default3
Default: 3
Number of threads used to perform cloning tasks
@ -258,13 +258,13 @@ This value is usually delivered by the FE to the BE by the heartbeat, no need to
### `column_dictionary_key_ratio_threshold`
Default0
Default: 0
The value ratio of string type, less than this ratio, using dictionary compression algorithm
### `column_dictionary_key_size_threshold`
Default0
Default: 0
Dictionary compression column size, less than this value using dictionary compression algorithm
@ -305,7 +305,7 @@ tablet_score = compaction_tablet_scan_frequency_factor * tablet_scan_frequency +
### `create_tablet_worker_count`
Default3
Default: 3
Number of worker threads for BE to create a tablet
@ -325,19 +325,19 @@ Generally it needs to be turned off. When you want to manually operate the compa
### `cumulative_compaction_budgeted_bytes`
Default104857600
Default: 104857600
One of the trigger conditions of BaseCompaction: Singleton file size limit, 100MB
### `cumulative_compaction_check_interval_seconds`
Default10 (s)
Default: 10 (s)
CumulativeCompaction thread polling interval
### `cumulative_compaction_skip_window_seconds`
Default30(s)
Default: 30(s)
CumulativeCompaction skips the most recently released increments to prevent compacting versions that may still be queried (in case the query planning phase takes some time). Changing this parameter sets the size of the skipped time window
@ -419,13 +419,13 @@ In some deployment environments, the `conf/` directory may be overwritten due to
### `delete_worker_count`
Default3
Default: 3
Number of threads performing data deletion tasks
### `disable_mem_pools`
Defaultfalse
Default: false
Whether to disable the memory cache pool, it is not disabled by default
@ -437,13 +437,13 @@ Whether to disable the memory cache pool, it is not disabled by default
### `disk_stat_monitor_interval`
Default5(s)
Default: 5(s)
Disk status check interval
### `doris_cgroups`
Defaultempty
Default: empty
Cgroups assigned to doris
@ -475,7 +475,7 @@ When the concurrency cannot be improved in high concurrency scenarios, try to re
### `doris_scanner_row_num`
Default16384
Default: 16384
The maximum number of data rows returned by each scanning thread in a single execution
@ -493,31 +493,31 @@ The maximum number of data rows returned by each scanning thread in a single exe
### `download_low_speed_limit_kbps`
Default50 (KB/s)
Default: 50 (KB/s)
Minimum download speed
### `download_low_speed_time`
Default300(s)
Default: 300(s)
Download time limit, 300 seconds by default
### `download_worker_count`
Default1
Default: 1
The number of download threads, the default is 1
### `drop_tablet_worker_count`
Default3
Default: 3
Number of threads to delete tablet
### `enable_metric_calculator`
Defaulttrue
Default: true
If set to true, the metric calculator will run to collect BE-related indicator information, if set to false, it will not run
@ -540,31 +540,31 @@ If set to true, the metric calculator will run to collect BE-related indicator i
### `enable_system_metrics`
Defaulttrue
Default: true
User control to turn on and off system indicators.
### `enable_token_check`
Defaulttrue
Default: true
Used for forward compatibility, will be removed later.
### `es_http_timeout_ms`
Default5000 (ms)
Default: 5000 (ms)
The timeout period for connecting to ES via http, the default is 5 seconds.
### `es_scroll_keepalive`
Default5m
Default: 5m
es scroll keepalive hold time, the default is 5 minutes
### `etl_thread_pool_queue_size`
Default256
Default: 256
The size of the ETL thread pool
@ -578,20 +578,20 @@ The size of the ETL thread pool
### `file_descriptor_cache_capacity`
Default32768
Default: 32768
File handle cache capacity, 32768 file handles are cached by default.
### `cache_clean_interval`
Default1800(s)
Default: 1800(s)
File handle cache cleaning interval, used to clean up file handles that have not been used for a long time.
Also the clean interval of Segment Cache.
### `flush_thread_num_per_store`
Default2
Default: 2
The number of threads used to refresh the memory table per store
@ -599,17 +599,17 @@ The number of threads used to refresh the memory table per store
### `fragment_pool_queue_size`
Default2048
Default: 2048
The upper limit of query requests that can be processed on a single node
### `fragment_pool_thread_num_min`
Default64
Default: 64
### `fragment_pool_thread_num_max`
Default256
Default: 256
The above two parameters are to set the number of query threads. By default, a minimum of 64 threads will be started, subsequent query requests will dynamically create threads, and a maximum of 256 threads will be created.
@ -626,7 +626,7 @@ The above two parameters are to set the number of query threads. By default, a m
### `ignore_broken_disk`
Defaultfalse
Default: false
When BE starts, if there is a broken disk, the BE process will exit by default. Otherwise, the broken disk will be ignored
@ -662,37 +662,37 @@ When configured as true, the program will run normally and ignore this error. In
### inc_rowset_expired_sec
Default1800 (s)
Default: 1800 (s)
Import activated data, storage engine retention time, used for incremental cloning
### `index_stream_cache_capacity`
Default10737418240
Default: 10737418240
BloomFilter/Min/Max and other statistical information cache capacity
### `kafka_broker_version_fallback`
Default0.10.0
Default: 0.10.0
If the dependent Kafka version is lower than the Kafka client version that routine load depends on, the value set by the fallback version kafka_broker_version_fallback will be used, and the valid values are: 0.9.0, 0.8.2, 0.8.1, 0.8.0.
### `load_data_reserve_hours`
Default4(hour)
Default: 4(hour)
Used for mini load. The mini load data file will be deleted after this time
### `load_error_log_reserve_hours`
Default48 (hour)
Default: 48 (hour)
The load error log will be deleted after this time
### `load_process_max_memory_limit_bytes`
Default107374182400
Default: 107374182400
The upper limit of memory occupied by all imported threads on a single node, default value: 100G
@ -700,7 +700,7 @@ Set these default values very large, because we don't want to affect load perfor
### `load_process_max_memory_limit_percent`
Default80 (%)
Default: 80 (%)
The percentage of the upper memory limit occupied by all imported threads on a single node, the default is 80%
@ -708,25 +708,25 @@ Set these default values very large, because we don't want to affect load perfor
### `log_buffer_level`
Defaultempty
Default: empty
The log flushing strategy is kept in memory by default
### `madvise_huge_pages`
Defaultfalse
Default: false
Whether to use linux memory huge pages, not enabled by default
### `make_snapshot_worker_count`
Default5
Default: 5
Number of threads making snapshots
### `max_client_cache_size_per_host`
Default10
Default: 10
The maximum number of client caches per host. There are multiple client caches in BE, but currently we use the same cache size configuration. If necessary, use different configurations to set up different client-side caches
@ -738,43 +738,43 @@ The maximum number of client caches per host. There are multiple client caches i
### `max_consumer_num_per_group`
Default3
Default: 3
The maximum number of consumers in a data consumer group, used for routine load
### `min_cumulative_compaction_num_singleton_deltas`
Default5
Default: 5
Cumulative compaction strategy: the minimum number of incremental files
### `max_cumulative_compaction_num_singleton_deltas`
Default1000
Default: 1000
Cumulative compaction strategy: the maximum number of incremental files
### `max_download_speed_kbps`
Default50000 (KB/s)
Default: 50000 (KB/s)
Maximum download speed limit
### `max_free_io_buffers`
Default128
Default: 128
For each io buffer size, the maximum number of buffers that IoMgr will reserve ranges from 1024B to 8MB buffers, up to about 2GB buffers.
### `max_garbage_sweep_interval`
Default3600
Default: 3600
The maximum interval for disk garbage cleaning, the default is one hour
### `max_memory_sink_batch_count`
Default20
Default: 20
The maximum external scan cache batch count, which means that the cache max_memory_cache_batch_count * batch_size row, the default is 20, and the default value of batch_size is 1024, which means that 20 * 1024 rows will be cached
@ -800,7 +800,7 @@ The maximum external scan cache batch count, which means that the cache max_memo
### `max_runnings_transactions_per_txn_map`
Default100
Default: 100
Max number of txns for every txn_partition_map in txn manager, this is a self protection to avoid too many txns saving in manager
@ -812,7 +812,7 @@ Max number of txns for every txn_partition_map in txn manager, this is a self pr
### `max_tablet_num_per_shard`
Default1024
Default: 1024
The number of sliced tablets, plan the layout of the tablet, and avoid too many tablet subdirectories in the repeated directory
@ -830,31 +830,31 @@ The number of sliced tablets, plan the layout of the tablet, and avoid too many
### `memory_limitation_per_thread_for_schema_change`
Default2 (G)
Default: 2 (G)
Maximum memory allowed for a single schema change task
### `memory_maintenance_sleep_time_s`
Default10
Default: 10
Sleep time (in seconds) between memory maintenance iterations
### `memory_max_alignment`
Default16
Default: 16
Maximum alignment memory
### `read_size`
Default8388608
Default: 8388608
The read size is the read size sent to the os. There is a trade-off between latency and the whole process, getting to keep the disk busy but not introducing seeks. For 8 MB reads, random io and sequential io have similar performance
### `min_buffer_size`
Default1024
Default: 1024
Minimum read buffer size (in bytes)
@ -873,19 +873,19 @@ Minimum read buffer size (in bytes)
### `min_file_descriptor_number`
Default60000
Default: 60000
The lower limit required by the file handle limit of the BE process
### `min_garbage_sweep_interval`
Default180
Default: 180
The minimum interval between disk garbage cleaning, time seconds
### `mmap_buffers`
Defaultfalse
Default: false
Whether to use mmap to allocate memory, not used by default
@ -897,67 +897,67 @@ Whether to use mmap to allocate memory, not used by default
### `num_disks`
Defalut0
Default: 0
Control the number of disks on the machine. If it is 0, it comes from the system settings
### `num_threads_per_core`
Default3
Default: 3
Control the number of threads that each core runs. Usually choose 2 times or 3 times the number of cores. This keeps the core busy without causing excessive jitter
### `num_threads_per_disk`
Default0
Default: 0
The maximum number of threads per disk is also the maximum queue depth of each disk
### `number_tablet_writer_threads`
Default16
Default: 16
Number of tablet write threads
### `path_gc_check`
Defaulttrue
Default: true
Whether to enable the recycle scan data thread check, it is enabled by default
### `path_gc_check_interval_second`
Default86400
Default: 86400
Recycle scan data thread check interval, in seconds
### `path_gc_check_step`
Default1000
Default: 1000
### `path_gc_check_step_interval_ms`
Default10 (ms)
Default: 10 (ms)
### `path_scan_interval_second`
Default86400
Default: 86400
### `pending_data_expire_time_sec`
Default1800
Default: 1800
The maximum duration of unvalidated data retained by the storage engine, the default unit: seconds
### `periodic_counter_update_period_ms`
Default500
Default: 500
Update rate counter and sampling counter cycle, default unit: milliseconds
### `plugin_path`
Default${DORIS_HOME}/plugin
Default: ${DORIS_HOME}/plugin
Plugin path
@ -969,43 +969,43 @@ pliugin path
### `pprof_profile_dir`
Default ${DORIS_HOME}/log
Default: ${DORIS_HOME}/log
pprof profile save directory
### `priority_networks`
Defaultempty
Default: empty
Declare a selection strategy for those servers with many IPs. Note that at most one ip should match this list. This is a semicolon-separated list in CIDR notation, such as 10.10.10.0/24. If there is no IP matching this rule, one will be randomly selected
### `priority_queue_remaining_tasks_increased_frequency`
Default512
Default: 512
the increased frequency of priority for remaining tasks in BlockingPriorityQueue
### `publish_version_worker_count`
Default8
Default: 8
the count of thread to publish version
### `pull_load_task_dir`
Default${DORIS_HOME}/var/pull_load
Default: ${DORIS_HOME}/var/pull_load
The directory of the pull load task
### `push_worker_count_high_priority`
Default3
Default: 3
Import the number of threads for processing HIGH priority tasks
### `push_worker_count_normal_priority`
Default3
Default: 3
Import the number of threads for processing NORMAL priority tasks
@ -1024,43 +1024,43 @@ Import the number of threads for processing NORMAL priority tasks
### `release_snapshot_worker_count`
Default5
Default: 5
Number of threads releasing snapshots
### `report_disk_state_interval_seconds`
Default60
Default: 60
The interval time for the agent to report the disk status to FE, unit (seconds)
### `report_tablet_interval_seconds`
Default60
Default: 60
The interval time for the agent to report the olap table to the FE, in seconds
### `report_task_interval_seconds`
Default10
Default: 10
The interval time for the agent to report the task signature to FE, unit (seconds)
### `result_buffer_cancelled_interval_time`
Default300
Default: 300
Result buffer cancellation time (unit: second)
### `routine_load_thread_pool_size`
Default10
Default: 10
The thread pool size of the routine load task. This should be greater than the FE configuration 'max_concurrent_task_num_per_be' (default 5)
### `row_nums_check`
Defaulttrue
Default: true
Check row nums for BE/CE and schema change. true is open, false is closed
@ -1073,7 +1073,7 @@ Check row nums for BE/CE and schema change. true is open, false is closed
### `scan_context_gc_interval_min`
Default5
Default: 5
This configuration is used for the context gc thread scheduling cycle. Note: The unit is minutes, and the default is 5 minutes
@ -1096,43 +1096,43 @@ This configuration is used for the context gc thread scheduling cycle. Note: The
### `small_file_dir`
Default${DORIS_HOME}/lib/small_file/
Default: ${DORIS_HOME}/lib/small_file/
Directory for saving files downloaded by SmallFileMgr
### `snapshot_expire_time_sec`
Default172800
Default: 172800
Snapshot file cleaning interval, default value: 48 hours
### `status_report_interval`
Default5
Default: 5
Interval between profile reports; unit: seconds
### `storage_flood_stage_left_capacity_bytes`
Default1073741824
Default: 1073741824
The min bytes that should be left of a data dirdefault value:1G
The minimum bytes that should be left free in a data dir, default value: 1G
### `storage_flood_stage_usage_percent`
Default95 (95%)
Default: 95 (95%)
The storage_flood_stage_usage_percent and storage_flood_stage_left_capacity_bytes configurations limit the maximum usage of the capacity of the data directory.
### `storage_medium_migrate_count`
Default1
Default: 1
the count of thread to clone
### `storage_page_cache_limit`
Default20%
Default: 20%
Cache for storage page size
@ -1155,8 +1155,8 @@ Cache for storage page size
eg.2: `storage_root_path=/home/disk1/doris,medium:hdd,capacity:50;/home/disk2/doris,medium:ssd,capacity:50`
* 1./home/disk1/doris,medium:hdd,capacity:10capacity limit is 10GB, HDD;
* 2./home/disk2/doris,medium:ssd,capacity:50capacity limit is 50GB, SSD;
* 1./home/disk1/doris,medium:hdd,capacity:10,capacity limit is 10GB, HDD;
* 2./home/disk2/doris,medium:ssd,capacity:50,capacity limit is 50GB, SSD;
* Default: ${DORIS_HOME}
@ -1189,13 +1189,13 @@ Some data formats, such as JSON, cannot be split. Doris must read all the data i
### `streaming_load_rpc_max_alive_time_sec`
Default1200
Default: 1200
The lifetime of TabletsChannel. If the channel does not receive any data at this time, the channel will be deleted, unit: second
### `sync_tablet_meta`
Defaultfalse
Default: false
Whether the storage engine opens sync and keeps it to the disk
@ -1213,37 +1213,37 @@ Log Level: INFO < WARNING < ERROR < FATAL
### `sys_log_roll_mode`
DefaultSIZE-MB-1024
Default: SIZE-MB-1024
The size of the log split, one log file is split every 1G
### `sys_log_roll_num`
Default10
Default: 10
Number of log files kept
### `sys_log_verbose_level`
Defaultl10
Default: 10
Log display level, used to control the log output at the beginning of VLOG in the code
### `sys_log_verbose_modules`
Defaultempty
Default: empty
Log printing module, writing olap will only print the log under the olap module
### `tablet_map_shard_size`
Default1
Default: 1
tablet_map_lock fragment size, the value is 2^n, n=0,1,2,3,4, this is for better tablet management
### `tablet_meta_checkpoint_min_interval_secs`
Default600s
Default: 600s
The polling interval of the TabletMeta Checkpoint thread
@ -1257,7 +1257,7 @@ The polling interval of the TabletMeta Checkpoint thread
### `tablet_stat_cache_update_interval_second`
默认值10
Default: 10
The minimum number of Rowsets for TabletMeta Checkpoint
@ -1271,7 +1271,7 @@ When writing is too frequent and the disk time is insufficient, you can configur
### `tablet_writer_open_rpc_timeout_sec`
Default300
Default: 300
Update interval of tablet state cache, unit: second
@ -1285,7 +1285,7 @@ When meet '[E1011]The server is overcrowded' error, you can tune the configurati
### `tc_free_memory_rate`
Default20 (%)
Default: 20 (%)
Available memory, value range: [0-100]
@ -1299,7 +1299,7 @@ If the system is found to be in a high-stress scenario and a large number of thr
### `tc_use_memory_min`
Default10737418240
Default: 10737418240
The minimum memory of TCmalloc, when the memory used is less than this, it is not returned to the operating system
@ -1311,13 +1311,13 @@ The minimum memory of TCmalloc, when the memory used is less than this, it is no
### `thrift_connect_timeout_seconds`
Default3
Default: 3
The default thrift client connection timeout time (unit: seconds)
### `thrift_rpc_timeout_ms`
Default5000
Default: 5000
thrift default timeout time, default: 5 seconds
@ -1338,43 +1338,43 @@ If the parameter is `THREAD_POOL`, the model is a blocking I/O model.
### `trash_file_expire_time_sec`
Default259200
Default: 259200
The interval for cleaning the recycle bin is 72 hours. When the disk space is insufficient, the file retention period under trash may not comply with this parameter
### `txn_commit_rpc_timeout_ms`
Default10000
Default: 10000
txn submit rpc timeout, the default is 10 seconds
### `txn_map_shard_size`
Default128
Default: 128
txn_map_lock fragment size, the value is 2^n, n=0,1,2,3,4. This is an enhancement to improve the performance of managing txn
### `txn_shard_size`
Default1024
Default: 1024
txn_lock shard size, the value is 2^n, n=0,1,2,3,4, this is an enhancement function that can improve the performance of submitting and publishing txn
### `unused_rowset_monitor_interval`
Default30
Default: 30
Time interval for clearing expired Rowset, unit: second
### `upload_worker_count`
Default1
Default: 1
Maximum number of threads for uploading files
### `use_mmap_allocate_chunk`
Defaultfalse
Default: false
Whether to use mmap to allocate blocks. If you enable this feature, it is best to increase the value of vm.max_map_count, whose default value is 65530. You can run "sysctl -w vm.max_map_count=262144" or "echo 262144 > /proc/sys/vm/max_map_count" as root to set max_map_count. When this setting is true, you must set chunk_reserved_bytes_limit to a relatively large number, otherwise the performance will be very poor
@ -1386,7 +1386,7 @@ udf function directory
### `webserver_num_workers`
Default48
Default: 48
Webserver default number of worker threads
@ -1398,7 +1398,7 @@ Webserver default number of worker threads
### `write_buffer_size`
Default104857600
Default: 104857600
The size of the buffer before flushing
@ -1486,7 +1486,7 @@ The default value is currently only an empirical value, and may need to be modif
### `auto_refresh_brpc_channel`
* Type: bool
* Description: When obtaining a brpc connection, judge the availability of the connection through hand_shake rpc, and re-establish the connection if it is not available
* Description: When obtaining a brpc connection, judge the availability of the connection through hand_shake rpc, and re-establish the connection if it is not available.
* Default value: false
### `high_priority_flush_thread_num_per_store`

File diff suppressed because it is too large Load Diff

View File

@ -159,7 +159,7 @@ The rules of dynamic partition are prefixed with `dynamic_partition.`:
The range of reserved history periods. It should be in the form of `[yyyy-MM-dd,yyyy-MM-dd],[...,...]` while the `dynamic_partition.time_unit` is "DAY, WEEK, and MONTH". And it should be in the form of `[yyyy-MM-dd HH:mm:ss,yyyy-MM-dd HH:mm:ss],[...,...]` while the `dynamic_partition.time_unit` is "HOUR". And no more spaces expected. The default value is `"NULL"`, which means it is not set.
Let us give an example. Suppose today is 2021-09-06partitioned by day, and the properties of dynamic partition are set to:
Let us give an example. Suppose today is 2021-09-06, the table is partitioned by day, and the dynamic partition properties are set to:
```time_unit="DAY/WEEK/MONTH", end=3, start=-3, reserved_history_periods="[2020-06-01,2020-06-20],[2020-10-31,2020-11-15]"```.
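For an existing table, these properties can typically be adjusted with `ALTER TABLE ... SET`; a minimal sketch with a hypothetical table name and the values from the example above:

```sql
ALTER TABLE example_db.my_table SET (
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-3",
    "dynamic_partition.end" = "3",
    "dynamic_partition.reserved_history_periods" = "[2020-06-01,2020-06-20],[2020-10-31,2020-11-15]"
);
```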

View File

@ -43,7 +43,7 @@ LDAP group authorization, is to map the group in LDAP to the Role in Doris, if t
You need to configure the LDAP basic information in the fe/conf/ldap.conf file, and the LDAP administrator password needs to be set using sql statements.
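For example, the administrator password might be set with a statement along these lines (a sketch; the password value is a placeholder and the exact syntax should be checked against your Doris version):

```sql
SET LDAP_ADMIN_PASSWORD = PASSWORD('your_ldap_admin_password');
```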
#### Configure the fe/conf/ldap.conf file
#### Configure the fe/conf/ldap.conf file:
* ldap_authentication_enabled = false
Set the value to "true" to enable LDAP authentication; when the value is "false", LDAP authentication is not enabled and all other configuration items of this profile are invalid.
@ -66,7 +66,7 @@ You need to configure the LDAP basic information in the fe/conf/ldap.conf file,
For example, if you use the LDAP user node uid attribute as the username to log into Doris, you can configure it as:
ldap_user_filter = (&(uid={login}));
This item can also be configured to use the LDAP user's mailbox prefix as the user name:
ldap_user_filter = (&(mail={login}@baidu.com))
ldap_user_filter = (&(mail={login}@baidu.com)).
* ldap_group_basedn = ou=group,dc=domain,dc=com
Base dn used when Doris searches for group information in LDAP. If this item is not configured, LDAP group authorization will not be enabled.

View File

@ -497,7 +497,7 @@ The following configuration belongs to the system level configuration of SyncJob
* `max_bytes_sync_commit`
The maximum size of the data when the transaction is committed. If the data size received by Fe is larger than it, it will immediately commit the transaction and send the accumulated data. The default value is 64MB. If you want to modify this configuration, please ensure that this value is greater than the product of `canal.instance.memory.buffer.size` and `canal.instance.memory.buffer.mmemunit` on the canal side (16MB by default) and `min_bytes_sync_commit`
The maximum size of the data when the transaction is committed. If the data size received by FE is larger than this, it will immediately commit the transaction and send the accumulated data. The default value is 64MB. If you want to modify this configuration, please ensure that this value is greater than the product of `canal.instance.memory.buffer.size` and `canal.instance.memory.buffer.memunit` on the canal side (16MB by default) and `min_bytes_sync_commit`.
* `max_sync_task_threads_num`

View File

@ -301,7 +301,7 @@ The user can control the stop, pause and restart of the job by the three command
7. The difference between STOP and PAUSE
the FE will automatically clean up stopped ROUTINE LOADwhile paused ROUTINE LOAD can be resumed
The FE will automatically clean up stopped ROUTINE LOAD jobs, while paused ROUTINE LOAD jobs can be resumed.
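A minimal sketch of the three commands with a hypothetical job name:

```sql
PAUSE ROUTINE LOAD FOR example_db.test_job;   -- can be resumed later
RESUME ROUTINE LOAD FOR example_db.test_job;
STOP ROUTINE LOAD FOR example_db.test_job;    -- cleaned up by the FE, cannot be resumed
```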
## Related parameters

View File

@ -171,10 +171,10 @@ The number of rows in the original file = `dpp.abnorm.ALL + dpp.norm.ALL`
+ two\_phase\_commit
Stream load supports the two-phase commit modeThe mode could be enabled by declaring ```two_phase_commit=true``` in http header. This mode is disabled by default.
the two-phase commit mode meansDuring Stream load, after data is written, the message will be returned to the client, the data is invisible at this point and the transaction status is PRECOMMITTED. The data will be visible only after COMMIT is triggered by client
Stream load supports the two-phase commit mode. The mode can be enabled by declaring ```two_phase_commit=true``` in the http header. This mode is disabled by default.
The two-phase commit mode means: during Stream load, after the data is written, a message is returned to the client; at this point the data is invisible and the transaction status is PRECOMMITTED. The data becomes visible only after COMMIT is triggered by the client.
1. User can invoke the following interface to trigger commit operations for transaction
1. The user can invoke the following interface to trigger a commit operation for the transaction:
```
curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" http://fe_host:http_port/api/{db}/_stream_load_2pc
```
@ -183,7 +183,7 @@ The number of rows in the original file = `dpp.abnorm.ALL + dpp.norm.ALL`
curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" http://be_host:webserver_port/api/{db}/_stream_load_2pc
```
2. User can invoke the following interface to trigger abort operations for transaction
2. The user can invoke the following interface to trigger an abort operation for the transaction:
```
curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:abort" http://fe_host:http_port/api/{db}/_stream_load_2pc
```
@ -360,7 +360,7 @@ Cluster situation: The concurrency of Stream load is not affected by cluster siz
In community version 0.14.0 and earlier, a connection reset exception could occur after Http V2 was enabled, because the built-in web container was Tomcat, and Tomcat's implementation of 307 (Temporary Redirect) is problematic. When Stream load was used to import a large amount of data, a connection reset exception would occur because Tomcat started data transmission before the 307 redirect, so the BE received the data request without authentication information. Changing the built-in container to Jetty later solved this problem. If you encounter this problem, please upgrade Doris or disable Http V2 (`enable_http_server_v2=false`).
After the upgrade, also upgrade the http client version of your program to `4.5.13`Introduce the following dependencies in your pom.xml file
After the upgrade, also upgrade the http client version of your program to `4.5.13` and introduce the following dependency in your pom.xml file:
```xml
<dependency>

View File

@ -32,9 +32,9 @@ If Doris' data disk capacity is not controlled, the process will hang because th
## Glossary
* FEDoris Frontend Node. Responsible for metadata management and request access.
* BEDoris Backend Node. Responsible for query execution and data storage.
* Data DirData directory, each data directory specified in the `storage_root_path` of the BE configuration file `be.conf`. Usually a data directory corresponds to a disk, so the following **disk** also refers to a data directory.
* FE: Doris Frontend Node. Responsible for metadata management and request access.
* BE: Doris Backend Node. Responsible for query execution and data storage.
* Data Dir: Data directory, each data directory specified in the `storage_root_path` of the BE configuration file `be.conf`. Usually a data directory corresponds to a disk, so the following **disk** also refers to a data directory.
## Basic Principles
@ -125,7 +125,7 @@ When the disk capacity is higher than High Watermark or even Flood Stage, many o
When the BE has crashed because the disk is full and cannot be started (this phenomenon may occur due to untimely detection of FE or BE), you need to delete some temporary files in the data directory to ensure that the BE process can start.
Files in the following directories can be deleted directly:
* log/Log files in the log directory.
* log/: Log files in the log directory.
* snapshot/: Snapshot files in the snapshot directory.
* trash/: Trash files in the trash directory.

View File

@ -124,9 +124,9 @@ There are many statistical information collected at BE. so we list the correspo
- BytesReceived: Size of bytes received by network
- DataArrivalWaitTime: Total waiting time of sender to push data
- MergeGetNext: When there is a sort in the lower level node, exchange node will perform a unified merge sort and output an ordered result. This indicator records the total time consumption of merge sorting, including the time consumption of MergeGetNextBatch.
- MergeGetNextBatchIt takes time for merge node to get data. If it is single-layer merge sort, the object to get data is network queue. For multi-level merge sorting, the data object is child merger.
- MergeGetNextBatch: Time taken by the merge node to get data. For single-layer merge sort, the data source is the network queue; for multi-level merge sort, the data source is the child merger.
- ChildMergeGetNext: When there are too many senders in the lower layer to send data, single thread merge will become a performance bottleneck. Doris will start multiple child merge threads to do merge sort in parallel. The sorting time of child merge is recorded, which is the cumulative value of multiple threads.
- ChildMergeGetNextBatch: It takes time for child merge to get dataIf the time consumption is too large, the bottleneck may be the lower level data sending node.
- ChildMergeGetNextBatch: Time taken by a child merge to get data. If the time consumption is too large, the bottleneck may be the lower-level data sending node.
- FirstBatchArrivalWaitTime: The time spent waiting for the first batch to come from the sender
- DeserializeRowBatchTimer: Time spent deserializing received data
- SendersBlockedTotalTimer(*): When the DataStreamRecv's queue buffer is full, the time the sender spends waiting

View File

@ -53,7 +53,7 @@ Commit is divided into ‘ title ’ and ‘ content ’ , the title should be l
* deps: Modification of third-party dependency Library
* community: Such as modification of Github issue template.
Some tips
Some tips:
1. If there are multiple types in one commit, multiple types need to be added
2. If code refactoring brings performance improvement, [refactor][optimize] can be added at the same time
@ -80,7 +80,7 @@ Commit is divided into ‘ title ’ and ‘ content ’ , the title should be l
* config
* docs
Some tips
Some tips:
1. Try to use options that already exist in the list. If you need to add, please update this document in time
@ -93,7 +93,7 @@ Commit is divided into ‘ title ’ and ‘ content ’ , the title should be l
commit message should follow the following format:
```
issue#7777
issue: #7777
your message
```

View File

@ -44,10 +44,10 @@ https://dist.apache.org/repos/dist/release/incubator/doris/
For the first release, you need to copy the KEYS file as well. Then add it to the svn release.
```
add 成功后就可以在下面网址上看到你发布的文件
After the add succeeds, you can see the files you published at the following URL:
https://dist.apache.org/repos/dist/release/incubator/doris/0.xx.0-incubating/
稍等一段时间后,能在 apache 官网看到:
After a while, you can see it on the Apache official website:
http://www.apache.org/dist/incubator/doris/0.9.0-incubating/
```
@ -150,7 +150,7 @@ Title:
[ANNOUNCE] Apache Doris (incubating) 0.9.0 Release
```
To mail
To mail:
```
dev@doris.apache.org

View File

@ -32,7 +32,7 @@ under the License.
1. Download the doris source code
URL[apache/incubator-doris: Apache Doris (Incubating) (github.com)](https://github.com/apache/incubator-doris)
URL: [apache/incubator-doris: Apache Doris (Incubating) (github.com)](https://github.com/apache/incubator-doris)
2. Install GCC 8.3.1+, Oracle JDK 1.8+, Python 2.7+, confirm that the gcc, java, python commands point to the correct version, and set the JAVA_HOME environment variable
@ -132,7 +132,7 @@ Need to create this folder, this is where the be data is stored
mkdir -p /soft/be/storage
```
3. Open vscode, and open the directory where the be source code is located. In this case, open the directory as **/home/workspace/incubator-doris/**For details on how to vscode, refer to the online tutorial
3. Open vscode and open the directory where the BE source code is located. In this case, open the directory **/home/workspace/incubator-doris/**. For details on how to use vscode, refer to the online tutorials
4. Install the vscode ms c++ debugging plug-in, the plug-in identified by the red box in the figure below

View File

@ -33,7 +33,7 @@ It can be used to test the performance of some parts of the BE storage layer (fo
## Compilation
1. To ensure that the environment has been able to successfully compile the Doris ontology, you can refer to [Installation and deployment] (https://doris.apache.org/master/en/installing/compilation.html)
1. Make sure the environment can successfully compile Doris itself; refer to [Installation and deployment](https://doris.apache.org/master/en/installing/compilation.html).
2. Execute`run-be-ut.sh`
@ -53,9 +53,9 @@ The data set is generated according to the following rules.
>int: Random in [1,1000000].
The data character set of string type is uppercase and lowercase English letters, and the length varies according to the type.
> char: Length random in [1,8]
> varchar: Length random in [1,128]
> string: Length random in [1,100000]
> char: Length random in [1,8].
> varchar: Length random in [1,128].
> string: Length random in [1,100000].
`rows_number` indicates the number of rows of data, the default value is `10000`.

View File

@ -26,7 +26,7 @@ under the License.
# C++ Code Diagnostic
Doris support to use [Clangd](https://clangd.llvm.org/) and [Clang-Tidy](https://clang.llvm.org/extra/clang-tidy/) to diagnostic code. Clangd and Clang-Tidy already has in [LDB-toolchain](https://doris.apache.org/zh-CN/installing/compilation-with-ldb-toolchain)also can install by self.
Doris supports using [Clangd](https://clangd.llvm.org/) and [Clang-Tidy](https://clang.llvm.org/extra/clang-tidy/) to diagnose code. Clangd and Clang-Tidy are already included in [LDB-toolchain](https://doris.apache.org/zh-CN/installing/compilation-with-ldb-toolchain), and can also be installed by yourself.
### Clang-Tidy
Clang-Tidy diagnostics can be configured; the config file `.clang-tidy` is in the Doris root path. Compared with vscode-cpptools, clangd provides more powerful and accurate code navigation for vscode, and integrates the analysis and quick-fix functions of clang-tidy.

View File

@ -46,16 +46,16 @@ under the License.
Doris builds against `thrift` 0.13.0 (note: `Doris` 0.15 and later versions build against `thrift` 0.13.0; earlier versions still use `thrift` 0.9.3)
Windows:
1. Download`http://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.exe`
2. Copycopy the file to `./thirdparty/installed/bin`
1. Download: `http://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.exe`
2. Copy: copy the file to `./thirdparty/installed/bin`
MacOS:
1. Download`brew install thrift@0.13.0`
2. Establish soft connection
1. Download: `brew install thrift@0.13.0`
2. Establish soft connection:
`mkdir -p ./thirdparty/installed/bin`
`ln -s /opt/homebrew/Cellar/thrift@0.13.0/0.13.0/bin/thrift ./thirdparty/installed/bin/thrift`
NoteThe error that the version cannot be found may be reported when MacOS execute `brew install thrift@0.13.0`. The solution is execute at the terminal as follows:
Note: MacOS may report an error that the version cannot be found when executing `brew install thrift@0.13.0`. The solution is to execute the following in the terminal:
1. `brew tap-new $USER/local-tap`
2. `brew extract --version='0.13.0' thrift $USER/local-tap`
3. `brew install thrift@0.13.0`

View File

@ -47,7 +47,7 @@ Create `settings.json` in `.vscode/` , and set settings:
* `"java.configuration.runtimes"`
* `"java.jdt.ls.java.home"` -- must set it to the directory of JDK11+, used for vscode-java plugin
* `"maven.executable.path"` -- maven pathfor maven-language-server plugin
* `"maven.executable.path"` -- maven path,for maven-language-server plugin
example:

View File

@ -349,7 +349,7 @@ PROPERTIES (
);
```
Parameter Description
Parameter Description:
Parameter | Description
---|---
@ -378,7 +378,7 @@ PROPERTIES (
);
```
Parameter Description
Parameter Description:
Parameter | Description
---|---

View File

@ -81,14 +81,14 @@ Note: Executing `brew install thrift@0.13.0` on MacOS may report an error that t
Reference link: `https://gist.github.com/tonydeng/02e571f273d6cce4230dc8d5f394493c`
Linux:
1.Download source package`wget https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz`
2.Install dependencies`yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++`
1.Download source package: `wget https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz`
2.Install dependencies: `yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++`
3.`tar zxvf thrift-0.13.0.tar.gz`
4.`cd thrift-0.13.0`
5.`./configure --without-tests`
6.`make`
7.`make install`
Check the version after installation is completethrift --version
Check the version after installation is complete: thrift --version
Note: If you have compiled Doris, you do not need to install thrift, you can directly use $DORIS_HOME/thirdparty/installed/bin/thrift
```

View File

@ -49,7 +49,7 @@ CREATE TABLE IF NOT EXISTS `hive_bitmap_table`(
```
### Hive Bitmap UDF Usage
### Hive Bitmap UDF Usage:
Hive Bitmap UDFs are used in Hive/Spark

View File

@ -77,14 +77,14 @@ Note: Executing `brew install thrift@0.13.0` on MacOS may report an error that t
Reference link: `https://gist.github.com/tonydeng/02e571f273d6cce4230dc8d5f394493c`
Linux:
1.Download source package`wget https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz`
2.Install dependencies`yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++`
1.Download source package: `wget https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz`
2.Install dependencies: `yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++`
3.`tar zxvf thrift-0.13.0.tar.gz`
4.`cd thrift-0.13.0`
5.`./configure --without-tests`
6.`make`
7.`make install`
Check the version after installation is completethrift --version
Check the version after installation is complete: thrift --version
Note: If you have compiled Doris, you do not need to install thrift, you can directly use $DORIS_HOME/thirdparty/installed/bin/thrift
```

View File

@ -61,7 +61,7 @@ Instructions:
3. The UDF call type represented by `type` in properties is native by default. When using a Java UDF, set it to `Java_UDF`.
4. `name`: A function belongs to a DB and the name is of the form `dbName`.`funcName`. When `dbName` is not explicitly specified, the db of the current session is used.
Sample
Sample:
```sql
CREATE FUNCTION java_udf_add_one(int) RETURNS int PROPERTIES (
"file"="file:///path/to/java-udf-demo-jar-with-dependencies.jar",

View File

@ -46,17 +46,17 @@ Copy gensrc/proto/function_service.proto and gensrc/proto/types.proto to Rpc ser
- function_service.proto
- PFunctionCallRequest
- function_nameThe function name, corresponding to the symbol specified when the function was created
- argsThe parameters passed by the method
- contextQuerying context Information
- function_name: The function name, corresponding to the symbol specified when the function was created
- args: The parameters passed by the method
- context: Query context information
- PFunctionCallResponse
- resultReturn result
- statusReturn Status, 0 indicates normal
- result: Return result
- status: Return status, 0 indicates normal
- PCheckFunctionRequest
- functionFunction related information
- match_typeMatching type
- function: Function related information
- match_type: Matching type
- PCheckFunctionResponse
- statusReturn status, 0 indicates normal
- status: Return status, 0 indicates normal
### Generated interface
@ -65,9 +65,9 @@ Use protoc generate code, and specific parameters are viewed using protoc -h
### Implementing an interface
The following three methods need to be implemented
- fnCallUsed to write computational logic
- checkFnUsed to verify function names, parameters, and return values when creating UDFs
- handShakeUsed for interface probe
- fnCall: Used to write computational logic
- checkFn: Used to verify function names, parameters, and return values when creating UDFs
- handShake: Used for interface probe
## Create UDF
@ -81,10 +81,10 @@ PROPERTIES (["key"="value"][,...])
```
Instructions:
1. PROPERTIES中`symbol`Represents the name of the method passed by the RPC call, which must be set
2. PROPERTIES中`object_file`Represents the RPC service address. Currently, a single address and a cluster address in BRPC-compatible format are supported. Refer to the cluster connection mode[Format specification](https://github.com/apache/incubator-brpc/blob/master/docs/cn/client.md#%E8%BF%9E%E6%8E%A5%E6%9C%8D%E5%8A%A1%E9%9B%86%E7%BE%A4)
3. PROPERTIES中`type`Indicates the UDF call type, which is Native by default. Rpc is transmitted when Rpc UDF is used
4. name: A function belongs to a DB and name is of the form`dbName`.`funcName`. When `dbName` is not explicitly specified, the db of the current session is used`dbName`
1. `symbol` in PROPERTIES represents the name of the method passed by the RPC call, and must be set.
2. `object_file` in PROPERTIES represents the RPC service address. Currently, a single address and a cluster address in BRPC-compatible format are supported. For the cluster connection mode, refer to the [Format specification](https://github.com/apache/incubator-brpc/blob/master/docs/cn/client.md#%E8%BF%9E%E6%8E%A5%E6%9C%8D%E5%8A%A1%E9%9B%86%E7%BE%A4).
3. `type` in PROPERTIES indicates the UDF call type, which is Native by default. Set it to RPC when using an RPC UDF.
4. name: A function belongs to a DB and the name is of the form `dbName`.`funcName`. When `dbName` is not explicitly specified, the db of the current session is used.
Sample:
```sql

View File

@ -215,8 +215,8 @@ See the section on `lower_case_table_names` variables in [Variables](../administ
**instructions**
* 1./home/disk1/doris,medium:hdd,capacity:10capacity limit is 10GB, HDD;
* 2./home/disk2/doris,medium:ssd,capacity:50capacity limit is 50GB, SSD;
* 1./home/disk1/doris,medium:hdd,capacity:10,capacity limit is 10GB, HDD;
* 2./home/disk2/doris,medium:ssd,capacity:50,capacity limit is 50GB, SSD;
* BE webserver_port configuration

View File

@ -33,8 +33,8 @@ under the License.
`BITMAP BITMAP_SUBSET_LIMIT(BITMAP src, BIGINT range_start, BIGINT cardinality_limit)`
Create a subset of the BITMAP, starting from range_start, with at most cardinality_limit elements
range_startstart value for the range
cardinality_limitsubset upper limit
range_start: start value for the range
cardinality_limit: subset upper limit
## example
@ -50,7 +50,7 @@ mysql> select bitmap_to_string(bitmap_subset_limit(bitmap_from_string('1,2,3,4,5
+-------+
| value |
+-------+
| 45 |
| 4,5 |
+-------+
```
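Assuming the same semantics as the example above, a sketch of another call that starts the subset at value 3 and keeps at most 2 elements:

```sql
select bitmap_to_string(bitmap_subset_limit(bitmap_from_string('1,2,3,4,5'), 3, 2)) value;
-- expected value: 3,4
```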

View File

@ -31,7 +31,7 @@ under the License.
`INT bit_length (VARCHAR str)`
Return length of argument in bits
Return length of argument in bits.
## example

View File

@ -101,7 +101,7 @@ This section introduces the methods that can be used as analysis functions in Do
### AVG()
grammar
grammar:
```sql
AVG([DISTINCT | ALL] *expression*) [OVER (*analytic_clause*)]
@ -136,7 +136,7 @@ from int_t where property in ('odd','even');
### COUNT()
grammar
grammar:
```sql
COUNT([DISTINCT | ALL] expression) [OVER (analytic_clause)]
@ -173,7 +173,7 @@ from int_t where property in ('odd','even');
The DENSE_RANK() function is used to indicate the ranking. Unlike RANK(), DENSE_RANK() does not have vacant numbers. For example, if there are two parallel ones, the third number of DENSE_RANK() is still 2, and the third number of RANK() is 3.
grammar
grammar:
```sql
DENSE_RANK() OVER(partition_by_clause order_by_clause)
@ -202,7 +202,7 @@ The following example shows the ranking of the x column grouped by the property
FIRST_VALUE() returns the first value in the window range.
grammar
grammar:
```sql
FIRST_VALUE(expr) OVER(partition_by_clause order_by_clause [window_clause])
@ -224,7 +224,7 @@ We have the following data
| Mats | Sweden | Tja |
```
Use FIRST_VALUE() to group by country and return the value of the first greeting in each group
Use FIRST_VALUE() to group by country and return the value of the first greeting in each group:
```sql
select country, name,
@ -244,7 +244,7 @@ over (partition by country order by name, greeting) as greeting from mail_merge;
The LAG() method is used to calculate the value of the row several rows before the current row.
grammar
grammar:
```sql
LAG (expr, offset, default) OVER (partition_by_clause order_by_clause)
@ -274,7 +274,7 @@ order by closing_date;
LAST_VALUE() returns the last value in the window range. Contrary to FIRST_VALUE().
grammar
grammar:
```sql
LAST_VALUE(expr) OVER(partition_by_clause order_by_clause [window_clause])
@ -301,7 +301,7 @@ from mail_merge;
The LEAD() method is used to calculate the value of the row several rows after the current row.
grammar
grammar:
```sql
LEAD (expr, offset, default]) OVER (partition_by_clause order_by_clause)
@ -334,7 +334,7 @@ order by closing_date;
### MAX()
grammar
grammar:
```sql
MAX([DISTINCT | ALL] expression) [OVER (analytic_clause)]
@ -365,7 +365,7 @@ from int_t where property in ('prime','square');
### MIN()
grammar
grammar:
```sql
MIN([DISTINCT | ALL] expression) [OVER (analytic_clause)]
@ -398,7 +398,7 @@ from int_t where property in ('prime','square');
The RANK() function is used to indicate ranking. Unlike DENSE_RANK(), RANK() will have vacant numbers. For example, if there are two parallel 1s, the third number in RANK() is 3, not 2.
grammar
grammar:
```sql
RANK() OVER(partition_by_clause order_by_clause)
@ -427,7 +427,7 @@ select x, y, rank() over(partition by x order by y) as rank from int_t;
For each row of each Partition, an integer that starts from 1 and increases continuously is returned. Unlike RANK() and DENSE_RANK(), the value returned by ROW_NUMBER() will not be repeated or vacant, and is continuously increasing.
grammar
grammar:
```sql
ROW_NUMBER() OVER(partition_by_clause order_by_clause)
@ -452,7 +452,7 @@ select x, y, row_number() over(partition by x order by y) as rank from int_t;
### SUM()
grammar
grammar:
```sql
SUM([DISTINCT | ALL] expression) [OVER (analytic_clause)]

View File

@ -40,7 +40,7 @@ key:
Super user rights:
max_user_connections: Maximum number of connections.
max_query_instances: Maximum number of query instance user can use when query.
sql_block_rules: set sql block rulesAfter setting, if the query user execute match the rules, it will be rejected.
sql_block_rules: Set sql block rules. After setting, if a query executed by the user matches one of the rules, it will be rejected (see the sketch after this list).
cpu_resource_limit: limit the cpu resource usage of a query. See session variable `cpu_resource_limit`.
exec_mem_limit: Limit the memory usage of the query. See the description of the session variable `exec_mem_limit` for details. -1 means not set.
load_mem_limit: Limit memory usage for imports. See the introduction of the session variable `load_mem_limit` for details. -1 means not set.
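A minimal sketch of binding block rules at the user level (the user name is hypothetical; the rule names reuse those from the SQL block rule document):

```sql
SET PROPERTY FOR 'jack' 'sql_block_rules' = 'test_rule1,test_rule2';
```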

View File

@ -78,7 +78,7 @@ under the License.
Other properties: Other information necessary to access remote storage, such as authentication information.
7) Modifying BE node attributes currently supports the following attributes:
1. tag.locationResource tag
1. tag.location: Resource tag (see the sketch after this list)
2. disable_query: Query disabled attribute
3. disable_load: Load disabled attribute
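A minimal sketch of modifying these attributes (host, heartbeat port, and tag value are hypothetical):

```sql
ALTER SYSTEM MODIFY BACKEND "192.168.0.1:9050" SET ("tag.location" = "group_a");
ALTER SYSTEM MODIFY BACKEND "192.168.0.1:9050" SET ("disable_query" = "true");
```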

View File

@ -199,7 +199,7 @@ under the License.
9. Modify default buckets number of partition
grammer:
MODIFY DISTRIBUTION DISTRIBUTED BY HASH (k1[,k2 ...]) BUCKETS num
note
note:
1) Only supports non-colocate tables with RANGE partition and HASH distribution
10. Modify table comment

View File

@ -51,12 +51,12 @@ Grammar:
## example
[CANCEL ALTER TABLE COLUMN]
1. 撤销针对 my_table 的 ALTER COLUMN 操作。
1. Cancel ALTER COLUMN operation for my_table.
CANCEL ALTER TABLE COLUMN
FROM example_db.my_table;
[CANCEL ALTER TABLE ROLLUP]
1. 撤销 my_table 下的 ADD ROLLUP 操作。
1. Cancel ADD ROLLUP operation for my_table.
CANCEL ALTER TABLE ROLLUP
FROM example_db.my_table;

View File

@ -79,7 +79,7 @@ CREATE [AGGREGATE] [ALIAS] FUNCTION function_name
> "prepare_fn": Function signature of the prepare function for finding the entry from the dynamic library. This option is optional for custom functions
>
> "close_fn": Function signature of the close function for finding the entry from the dynamic library. This option is optional for custom functions
> "type" Function type, RPC for remote udf, NATIVE for c++ native udf
> "type": Function type, RPC for remote udf, NATIVE for c++ native udf

View File

@ -36,7 +36,7 @@ under the License.
2. Baidu AFS: afs for Baidu. Only be used inside Baidu.
3. Baidu Object Storage(BOS): BOS on Baidu Cloud.
4. Apache HDFS.
5. Amazon S3Amazon S3
5. Amazon S3: Amazon S3.
### Syntax:
@ -137,14 +137,14 @@ under the License.
read_properties:
Used to specify some special parameters.
Syntax
Syntax:
[PROPERTIES ("key"="value", ...)]
You can specify the following parameters:
line_delimiter Used to specify the line delimiter in the load file. The default is `\n`. You can use a combination of multiple characters as the column separator.
line_delimiter: Used to specify the line delimiter in the load file. The default is `\n`. You can use a combination of multiple characters as the line delimiter.
fuzzy_parse Boolean type, true to indicate that parse json schema as the first line, this can make import more faster,but need all key keep the order of first line, default value is false. Only use for json format.
fuzzy_parse: Boolean type. True indicates that the json schema is parsed from the first line; this can make the import faster, but requires all keys to keep the order of the first line. The default value is false. Only used for json format.
jsonpaths: There are two ways to import json: simple mode and matched mode.
simple mode: it is simple mode without setting the jsonpaths parameter. In this mode, the json data is required to be the object type. For example:
@ -152,7 +152,7 @@ under the License.
matched mode: the json data is relatively complex, and the corresponding value needs to be matched through the jsonpaths parameter.
strip_outer_array: Boolean type, true to indicate that json data starts with an array object and flattens objects in the array object, default value is false. For example
strip_outer_array: Boolean type, true to indicate that json data starts with an array object and flattens objects in the array object, default value is false. For example:
[
{"k1" : 1, "v1" : 2},
{"k1" : 3, "v1" : 4}
@ -207,9 +207,9 @@ under the License.
dfs.client.failover.proxy.provider: Specify the provider that the client connects to the namenode by default: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.
4.4. Amazon S3
fs.s3a.access.keyAmazonS3的access key
fs.s3a.secret.keyAmazonS3的secret key
fs.s3a.endpointAmazonS3的endpoint
fs.s3a.access.key: Amazon S3 access key
fs.s3a.secret.key: Amazon S3 secret key
fs.s3a.endpoint: Amazon S3 endpoint
4.5. If using the S3 protocol to directly connect to the remote storage, you need to specify the following attributes
(
@ -230,7 +230,7 @@ under the License.
)
fs.defaultFS: defaultFS
hdfs_user: hdfs user
namenode HA
namenode HA:
By configuring namenode HA, new namenode can be automatically identified when the namenode is switched
dfs.nameservices: hdfs service name, customize, eg: "dfs.nameservices" = "my_ha"
dfs.ha.namenodes.xxx: Customize the names of the namenodes, separated by commas, where xxx is the custom name in dfs.nameservices, such as "dfs.ha.namenodes.my_ha" = "my_nn"

View File

@ -76,12 +76,12 @@ under the License.
7. hdfs
Specify to use libhdfs export to hdfs
Grammar
Grammar:
WITH HDFS ("key"="value"[,...])
The following parameters can be specified:
fs.defaultFS: Set the fs such ashdfs://ip:port
hdfs_userSpecify hdfs user name
fs.defaultFS: Set the fs, such as hdfs://ip:port
hdfs_user: Specify hdfs user name
## example
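As a sketch under the grammar above (the table name, HDFS address, and export path are hypothetical):

```sql
EXPORT TABLE example_db.my_table
TO "hdfs://hdfs_host:port/a/b/c/"
WITH HDFS (
    "fs.defaultFS" = "hdfs://hdfs_host:port",
    "hdfs_user" = "work"
);
```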

View File

@ -162,10 +162,10 @@ Date class (DATE/DATETIME): 2017-10-03, 2017-06-13 12:34:03.
NULL value: N
6. S3 Storage
fs.s3a.access.key user AKrequired
fs.s3a.secret.key user SKrequired
fs.s3a.endpoint user endpointrequired
fs.s3a.impl.disable.cache whether disable cachedefault trueoptional
fs.s3a.access.key: user AK, required
fs.s3a.secret.key: user SK, required
fs.s3a.endpoint: user endpoint, required
fs.s3a.impl.disable.cache: whether to disable the cache, default true, optional
## example

View File

@ -29,7 +29,7 @@ under the License.
The `SELECT INTO OUTFILE` statement can export the query results to a file. Currently supports export to remote storage through Broker process, or directly through S3, HDFS protocol such as HDFS, S3, BOS and COS(Tencent Cloud) through the Broker process. The syntax is as follows:
Grammar
Grammar:
query_stmt
INTO OUTFILE "file_path"
[format_as]
@ -50,7 +50,7 @@ under the License.
3. properties
Specify the relevant attributes. Currently it supports exporting through the Broker process, or through the S3, HDFS protocol.
Grammar
Grammar:
[PROPERTIES ("key"="value", ...)]
The following parameters can be specified:
column_separator: Specifies the exported column separator, defaulting to `\t`. Supports invisible characters, such as '\x07'.
@ -173,7 +173,7 @@ under the License.
"AWS_SECRET_KEY" = "xxx",
"AWS_REGION" = "bd"
)
The final generated file prefix is `my_file_{fragment_instance_id}_`
The final generated file prefix is `my_file_{fragment_instance_id}_`.
7. Use the s3 protocol to export to bos, and enable the concurrent export session variable.
set enable_parallel_outfile = true;

View File

@ -30,11 +30,11 @@ under the License.
The kafka partition and offset in the result show the currently consumed partition and the corresponding offset to be consumed.
grammar
grammar:
SHOW [ALL] CREATE ROUTINE LOAD for load_name;
Description
`ALL`: optionalIs for getting all jobs, including history jobs
Description:
`ALL`: optional, used to get all jobs, including history jobs
`load_name`: routine load name
## example
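A minimal sketch with a hypothetical job name:

```sql
SHOW ALL CREATE ROUTINE LOAD for test_job;
```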