[Improvement](docs) Update EN doc (#9228)
@@ -38,13 +38,13 @@ Support SQL block rule by user level:

SQL block rule CRUD
- create SQL block rule
  - sql: Regex pattern, special characters need to be escaped, "NULL" by default
  - sqlHash: SQL hash value, used for exact matching. We print it in fe.audit.log. Only one of sql and sqlHash can be set in a rule, "NULL" by default
  - partition_num: Max number of partitions a scan node will scan, 0L by default
  - tablet_num: Max number of tablets a scan node will scan, 0L by default
  - cardinality: An approximate number of rows scanned by a scan node, 0L by default
  - global: Whether the rule is global (applies to all users), false by default
  - enable: Whether to enable the block rule, true by default

```sql
CREATE SQL_BLOCK_RULE test_rule
PROPERTIES(

@@ -70,7 +70,7 @@ CREATE SQL_BLOCK_RULE test_rule2 PROPERTIES("partition_num" = "30", "cardinality

```sql
SHOW SQL_BLOCK_RULE [FOR RULE_NAME]
```
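For instance, to list all rules or inspect just the one created above (a minimal usage sketch based on the syntax shown; `test_rule` is the rule from the CREATE example):

```sql
-- List every SQL block rule, or only the rule named test_rule
SHOW SQL_BLOCK_RULE;
SHOW SQL_BLOCK_RULE FOR test_rule;
```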
- alter SQL block rule, allows changing any of sql/sqlHash/global/enable/partition_num/tablet_num/cardinality
  - sql and sqlHash cannot both be set. This means that if sql or sqlHash is set in a rule, the other property can never be altered
  - sql/sqlHash and partition_num/tablet_num/cardinality cannot be set together. For example, if partition_num is set in a rule, then sql or sqlHash will never be allowed to be altered.

```sql

@@ -81,7 +81,7 @@ ALTER SQL_BLOCK_RULE test_rule PROPERTIES("sql"="select \\* from test_table","en

ALTER SQL_BLOCK_RULE test_rule2 PROPERTIES("partition_num" = "10","tablet_num"="300","enable"="true")
```

- drop SQL block rule, supports multiple rules, separated by `,`

```sql
DROP SQL_BLOCK_RULE test_rule1,test_rule2
```
@@ -28,7 +28,7 @@ under the License.

Bucket Shuffle Join is a new function officially added in Doris 0.14. Its purpose is to provide local optimization for some join queries, reducing the time spent transferring data between nodes and speeding up the query.

Its design and implementation can be found in [ISSUE 4394](https://github.com/apache/incubator-doris/issues/4394).

## Glossary

@@ -40,7 +40,7 @@ It's design, implementation can be referred to [ISSUE 4394](https://github.com/a

## Principle

The conventional distributed join methods supported by Doris are `Shuffle Join` and `Broadcast Join`. Both of these joins lead to some network overhead.

For example, consider a join query between table A and table B where the join method is hash join. The cost of the different join types is as follows:

* **Broadcast Join**: If, according to the data distribution, table A has three executing HashJoinNodes, table B needs to be sent to all three HashJoinNodes. Its network overhead is `3B`, and its memory overhead is `3B`.
* **Shuffle Join**: Shuffle Join will distribute the data of tables A and B to the nodes of the cluster according to hash calculation, so its network overhead is `A + B` and memory overhead is `B`.

@@ -50,9 +50,9 @@ The data distribution information of each Doris table is saved in FE. If the joi

The picture above shows how Bucket Shuffle Join works. The SQL query joins table A with table B, and the equality expression of the join hits the data distribution column of A. According to the data distribution information of table A, Bucket Shuffle Join sends the data of table B to the storage and compute nodes that hold the corresponding data of table A. The cost of Bucket Shuffle Join is as follows:

* network cost: ``` B < min(3B, A + B) ```

* memory cost: ``` B <= min(3B, B) ```

Therefore, compared with Broadcast Join and Shuffle Join, Bucket Shuffle Join has obvious performance advantages. It reduces the time spent transferring data between nodes and the memory cost of the join. Compared with Doris's original join methods, it has the following advantages:

@@ -91,7 +91,7 @@ You can use the `explain` command to check whether the join is a Bucket Shuffle

| | equal join conjunct: `test`.`k1` = `baseall`.`k1`
```

The join type indicates that the join method to be used is `BUCKET_SHUFFLE`.
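For example, a quick way to verify the plan (a sketch reusing the `test` and `baseall` tables from the plan fragment above) is to run `EXPLAIN` on the join and look for `BUCKET_SHUFFLE` in the join node:

```sql
-- The plan output should contain a join node marked BUCKET_SHUFFLE
EXPLAIN SELECT * FROM test JOIN baseall ON test.k1 = baseall.k1;
```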
## Planning rules of Bucket Shuffle Join

@@ -101,25 +101,25 @@ There are two ways to configure BE configuration items:

### `alter_tablet_worker_count`

Default: 3

The number of threads making schema changes

### `base_compaction_check_interval_seconds`

Default: 60 (s)

BaseCompaction thread polling interval

### `base_compaction_interval_seconds_since_last_operation`

Default: 86400

One of the triggering conditions of BaseCompaction: the interval since the last BaseCompaction

### `base_compaction_num_cumulative_deltas`

Default: 5

One of the triggering conditions of BaseCompaction: The limit of the number of Cumulative files to be reached. After reaching this limit, BaseCompaction will be triggered
@@ -150,13 +150,13 @@ Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"in

### `base_compaction_write_mbytes_per_sec`

Default: 5 (MB)

Maximum disk write speed per second of BaseCompaction task

### `base_cumulative_delta_ratio`

Default: 0.3 (30%)

One of the trigger conditions of BaseCompaction: Cumulative file size reaches the proportion of Base file

@@ -206,7 +206,7 @@ User can set this configuration to a larger value to get better QPS performance.

### `buffer_pool_clean_pages_limit`

Default: 20G

Clean up pages that may be saved by the buffer pool
@@ -226,25 +226,25 @@ The maximum amount of memory available in the BE buffer pool. The buffer pool is

### `check_consistency_worker_count`

Default: 1

The number of worker threads to calculate the checksum of the tablet

### `chunk_reserved_bytes_limit`

Default: 2147483648

The reserved bytes limit of Chunk Allocator is 2GB by default. Increasing this variable can improve performance, but it will get more free memory that other modules cannot use.

### `clear_transaction_task_worker_count`

Default: 1

Number of threads used to clean up transactions

### `clone_worker_count`

Default: 3

Number of threads used to perform cloning tasks
@@ -258,13 +258,13 @@ This value is usually delivered by the FE to the BE by the heartbeat, no need to

### `column_dictionary_key_ratio_threshold`

Default: 0

The value ratio of string type columns; below this ratio, the dictionary compression algorithm is used

### `column_dictionary_key_size_threshold`

Default: 0

Dictionary compression column size; below this value, the dictionary compression algorithm is used

@@ -305,7 +305,7 @@ tablet_score = compaction_tablet_scan_frequency_factor * tablet_scan_frequency +

### `create_tablet_worker_count`

Default: 3

Number of worker threads for BE to create a tablet
@@ -325,19 +325,19 @@ Generally it needs to be turned off. When you want to manually operate the compa

### `cumulative_compaction_budgeted_bytes`

Default: 104857600

One of the trigger conditions of BaseCompaction: Singleton file size limit, 100MB

### `cumulative_compaction_check_interval_seconds`

Default: 10 (s)

CumulativeCompaction thread polling interval

### `cumulative_compaction_skip_window_seconds`

Default: 30 (s)

CumulativeCompaction skips the most recently released increments to prevent compacting versions that may be queried (in case the query planning phase takes some time). Changing this parameter sets the size of the skipped time window
@@ -419,13 +419,13 @@ In some deployment environments, the `conf/` directory may be overwritten due to

### `delete_worker_count`

Default: 3

Number of threads performing data deletion tasks

### `disable_mem_pools`

Default: false

Whether to disable the memory cache pool, it is not disabled by default

@@ -437,13 +437,13 @@ Whether to disable the memory cache pool, it is not disabled by default

### `disk_stat_monitor_interval`

Default: 5 (s)

Disk status check interval

### `doris_cgroups`

Default: empty

Cgroups assigned to doris
@@ -475,7 +475,7 @@ When the concurrency cannot be improved in high concurrency scenarios, try to re

### `doris_scanner_row_num`

Default: 16384

The maximum number of data rows returned by each scanning thread in a single execution

@@ -493,31 +493,31 @@ The maximum number of data rows returned by each scanning thread in a single exe

### `download_low_speed_limit_kbps`

Default: 50 (KB/s)

Minimum download speed

### `download_low_speed_time`

Default: 300 (s)

Download time limit, 300 seconds by default

### `download_worker_count`

Default: 1

The number of download threads, the default is 1

### `drop_tablet_worker_count`

Default: 3

Number of threads to delete tablet

### `enable_metric_calculator`

Default: true

If set to true, the metric calculator will run to collect BE-related indicator information, if set to false, it will not run
@@ -540,31 +540,31 @@ If set to true, the metric calculator will run to collect BE-related indicator i

### `enable_system_metrics`

Default: true

User control to turn system metrics on and off.

### `enable_token_check`

Default: true

Used for forward compatibility, will be removed later.

### `es_http_timeout_ms`

Default: 5000 (ms)

The timeout period for connecting to ES via HTTP, the default is 5 seconds.

### `es_scroll_keepalive`

Default: 5m

ES scroll keepalive hold time, the default is 5 minutes

### `etl_thread_pool_queue_size`

Default: 256

The size of the ETL thread pool
@@ -578,20 +578,20 @@ The size of the ETL thread pool

### `file_descriptor_cache_capacity`

Default: 32768

File handle cache capacity, 32768 file handles are cached by default.

### `cache_clean_interval`

Default: 1800 (s)

File handle cache cleaning interval, used to clean up file handles that have not been used for a long time.
Also the clean interval of Segment Cache.

### `flush_thread_num_per_store`

Default: 2

The number of threads used to flush the memtable per store

@@ -599,17 +599,17 @@ The number of threads used to refresh the memory table per store

### `fragment_pool_queue_size`

Default: 2048

The upper limit of query requests that can be processed on a single node

### `fragment_pool_thread_num_min`

Default: 64

### `fragment_pool_thread_num_max`

Default: 256

The above two parameters set the number of query threads. By default, a minimum of 64 threads will be started; subsequent query requests will dynamically create threads, up to a maximum of 256 threads.
@@ -626,7 +626,7 @@ The above two parameters are to set the number of query threads. By default, a m

### `ignore_broken_disk`

Default: false

When BE starts, if there is a broken disk, the BE process will exit by default. Otherwise, the broken disk will be ignored

@@ -662,37 +662,37 @@ When configured as true, the program will run normally and ignore this error. In

### inc_rowset_expired_sec

Default: 1800 (s)

Retention time in the storage engine for imported and activated data, used for incremental cloning

### `index_stream_cache_capacity`

Default: 10737418240

BloomFilter/Min/Max and other statistical information cache capacity

### `kafka_broker_version_fallback`

Default: 0.10.0

If the dependent Kafka version is lower than the Kafka client version that routine load depends on, the value set by the fallback version kafka_broker_version_fallback will be used, and the valid values are: 0.9.0, 0.8.2, 0.8.1, 0.8.0.

### `load_data_reserve_hours`

Default: 4 (hour)

Used for mini load. The mini load data file will be deleted after this time

### `load_error_log_reserve_hours`

Default: 48 (hour)

The load error log will be deleted after this time

### `load_process_max_memory_limit_bytes`

Default: 107374182400

The upper limit of memory occupied by all import threads on a single node, default value: 100G
@@ -700,7 +700,7 @@ Set these default values very large, because we don't want to affect load perfor

### `load_process_max_memory_limit_percent`

Default: 80 (%)

The percentage of the upper memory limit occupied by all import threads on a single node, the default is 80%

@@ -708,25 +708,25 @@ Set these default values very large, because we don't want to affect load perfor

### `log_buffer_level`

Default: empty

The log flushing strategy; logs are kept in memory by default

### `madvise_huge_pages`

Default: false

Whether to use Linux memory huge pages, not enabled by default

### `make_snapshot_worker_count`

Default: 5

Number of threads making snapshots

### `max_client_cache_size_per_host`

Default: 10

The maximum number of client caches per host. There are multiple client caches in BE, but currently we use the same cache size configuration. If necessary, use different configurations to set up different client-side caches
@@ -738,43 +738,43 @@ The maximum number of client caches per host. There are multiple client caches i

### `max_consumer_num_per_group`

Default: 3

The maximum number of consumers in a data consumer group, used for routine load

### `min_cumulative_compaction_num_singleton_deltas`

Default: 5

Cumulative compaction strategy: the minimum number of incremental files

### `max_cumulative_compaction_num_singleton_deltas`

Default: 1000

Cumulative compaction strategy: the maximum number of incremental files

### `max_download_speed_kbps`

Default: 50000 (KB/s)

Maximum download speed limit

### `max_free_io_buffers`

Default: 128

For each io buffer size, the maximum number of buffers that IoMgr will reserve ranges from 1024B to 8MB buffers, up to about 2GB buffers.

### `max_garbage_sweep_interval`

Default: 3600

The maximum interval for disk garbage cleaning, the default is one hour

### `max_memory_sink_batch_count`

Default: 20

The maximum external scan cache batch count, which means that max_memory_cache_batch_count * batch_size rows are cached; the default is 20, and the default value of batch_size is 1024, which means that 20 * 1024 rows will be cached
@@ -800,7 +800,7 @@ The maximum external scan cache batch count, which means that the cache max_memo

### `max_runnings_transactions_per_txn_map`

Default: 100

Max number of txns for every txn_partition_map in txn manager; this is a self-protection to avoid too many txns being saved in the manager

@@ -812,7 +812,7 @@ Max number of txns for every txn_partition_map in txn manager, this is a self pr

### `max_tablet_num_per_shard`

Default: 1024

The number of tablets per shard, used to plan the tablet layout and avoid too many tablet subdirectories in a single directory

@@ -830,31 +830,31 @@ The number of sliced tablets, plan the layout of the tablet, and avoid too many

### `memory_limitation_per_thread_for_schema_change`

Default: 2 (G)

Maximum memory allowed for a single schema change task

### `memory_maintenance_sleep_time_s`

Default: 10

Sleep time (in seconds) between memory maintenance iterations

### `memory_max_alignment`

Default: 16

Maximum alignment memory

### `read_size`

Default: 8388608

The read size is the read size sent to the OS. There is a trade-off between latency and throughput: the goal is to keep the disk busy without introducing seeks. For 8 MB reads, random IO and sequential IO have similar performance

### `min_buffer_size`

Default: 1024

Minimum read buffer size (in bytes)
@@ -873,19 +873,19 @@ Minimum read buffer size (in bytes)

### `min_file_descriptor_number`

Default: 60000

The lower limit required by the file handle limit of the BE process

### `min_garbage_sweep_interval`

Default: 180

The minimum interval for disk garbage cleaning, in seconds

### `mmap_buffers`

Default: false

Whether to use mmap to allocate memory, not used by default
@@ -897,67 +897,67 @@ Whether to use mmap to allocate memory, not used by default

### `num_disks`

Default: 0

Control the number of disks on the machine. If it is 0, it comes from the system settings

### `num_threads_per_core`

Default: 3

Control the number of threads that each core runs. Usually choose 2 times or 3 times the number of cores. This keeps the core busy without causing excessive jitter

### `num_threads_per_disk`

Default: 0

The maximum number of threads per disk, which is also the maximum queue depth of each disk

### `number_tablet_writer_threads`

Default: 16

Number of tablet write threads

### `path_gc_check`

Default: true

Whether to enable the recycle scan data thread check, it is enabled by default

### `path_gc_check_interval_second`

Default: 86400

Recycle scan data thread check interval, in seconds

### `path_gc_check_step`

Default: 1000

### `path_gc_check_step_interval_ms`

Default: 10 (ms)

### `path_scan_interval_second`

Default: 86400

### `pending_data_expire_time_sec`

Default: 1800

The maximum duration of unvalidated data retained by the storage engine, the default unit: seconds

### `periodic_counter_update_period_ms`

Default: 500

Update rate counter and sampling counter cycle, default unit: milliseconds

### `plugin_path`

Default: ${DORIS_HOME}/plugin

Plugin path
@@ -969,43 +969,43 @@ pliugin path

### `pprof_profile_dir`

Default: ${DORIS_HOME}/log

pprof profile save directory

### `priority_networks`

Default: empty

Declare a selection strategy for those servers with many IPs. Note that at most one IP should match this list. This is a semicolon-separated list in CIDR notation, such as 10.10.10.0/24. If there is no IP matching this rule, one will be randomly selected

### `priority_queue_remaining_tasks_increased_frequency`

Default: 512

The increased frequency of priority for remaining tasks in BlockingPriorityQueue

### `publish_version_worker_count`

Default: 8

The number of threads used to publish versions

### `pull_load_task_dir`

Default: ${DORIS_HOME}/var/pull_load

Directory of pull load tasks

### `push_worker_count_high_priority`

Default: 3

Number of import threads for processing HIGH priority tasks

### `push_worker_count_normal_priority`

Default: 3

Number of import threads for processing NORMAL priority tasks
@@ -1024,43 +1024,43 @@ Import the number of threads for processing NORMAL priority tasks

### `release_snapshot_worker_count`

Default: 5

Number of threads releasing snapshots

### `report_disk_state_interval_seconds`

Default: 60

The interval time for the agent to report the disk status to FE, unit (seconds)

### `report_tablet_interval_seconds`

Default: 60

The interval time for the agent to report the olap table to the FE, in seconds

### `report_task_interval_seconds`

Default: 10

The interval time for the agent to report the task signature to FE, unit (seconds)

### `result_buffer_cancelled_interval_time`

Default: 300

Result buffer cancellation time (unit: second)

### `routine_load_thread_pool_size`

Default: 10

The thread pool size of the routine load task. This should be greater than the FE configuration `max_concurrent_task_num_per_be` (default 5)

### `row_nums_check`

Default: true

Check row numbers for BE/CE and schema change. true means enabled, false means disabled
@@ -1073,7 +1073,7 @@ Check row nums for BE/CE and schema change. true is open, false is closed

### `scan_context_gc_interval_min`

Default: 5

This configuration is used for the context gc thread scheduling cycle. Note: The unit is minutes, and the default is 5 minutes

@@ -1096,43 +1096,43 @@ This configuration is used for the context gc thread scheduling cycle. Note: The

### `small_file_dir`

Default: ${DORIS_HOME}/lib/small_file/

Directory for saving files downloaded by SmallFileMgr

### `snapshot_expire_time_sec`

Default: 172800

Snapshot file cleaning interval, default value: 48 hours

### `status_report_interval`

Default: 5

Interval between profile reports; unit: seconds

### `storage_flood_stage_left_capacity_bytes`

Default: 1073741824

The minimum bytes that should be left free in a data dir, default value: 1G

### `storage_flood_stage_usage_percent`

Default: 95 (95%)

The storage_flood_stage_usage_percent and storage_flood_stage_left_capacity_bytes configurations limit the maximum usage of the capacity of the data directory.

### `storage_medium_migrate_count`

Default: 1

The count of threads to clone

### `storage_page_cache_limit`

Default: 20%

Cache for storage page size
@@ -1155,8 +1155,8 @@ Cache for storage page size

eg.2: `storage_root_path=/home/disk1/doris,medium:hdd,capacity:50;/home/disk2/doris,medium:ssd,capacity:50`

* 1./home/disk1/doris,medium:hdd,capacity:10,capacity limit is 10GB, HDD;
* 2./home/disk2/doris,medium:ssd,capacity:50,capacity limit is 50GB, SSD;

* Default: ${DORIS_HOME}

@@ -1189,13 +1189,13 @@ Some data formats, such as JSON, cannot be split. Doris must read all the data i

### `streaming_load_rpc_max_alive_time_sec`

Default: 1200

The lifetime of TabletsChannel. If the channel does not receive any data within this time, the channel will be deleted, unit: second

### `sync_tablet_meta`

Default: false

Whether the storage engine enables sync and persists data to disk
@@ -1213,37 +1213,37 @@ Log Level: INFO < WARNING < ERROR < FATAL

### `sys_log_roll_mode`

Default: SIZE-MB-1024

The size at which logs are split; a new log file is created every 1 GB

### `sys_log_roll_num`

Default: 10

Number of log files kept

### `sys_log_verbose_level`

Default: 10

Log display level, used to control the VLOG output in the code

### `sys_log_verbose_modules`

Default: empty

Log printing module; setting it to olap will only print logs under the olap module

### `tablet_map_shard_size`

Default: 1

tablet_map_lock fragment size, the value is 2^n, n=0,1,2,3,4; this is for better tablet management

### `tablet_meta_checkpoint_min_interval_secs`

Default: 600 (s)

The polling interval of the TabletMeta Checkpoint thread
@@ -1257,7 +1257,7 @@ The polling interval of the TabletMeta Checkpoint thread

### `tablet_stat_cache_update_interval_second`

Default: 10

The minimum number of Rowsets for TabletMeta Checkpoint

@@ -1271,7 +1271,7 @@ When writing is too frequent and the disk time is insufficient, you can configur

### `tablet_writer_open_rpc_timeout_sec`

Default: 300

Update interval of tablet state cache, unit: second

@@ -1285,7 +1285,7 @@ When meet '[E1011]The server is overcrowded' error, you can tune the configurati

### `tc_free_memory_rate`

Default: 20 (%)

Available memory, value range: [0-100]

@@ -1299,7 +1299,7 @@ If the system is found to be in a high-stress scenario and a large number of thr

### `tc_use_memory_min`

Default: 10737418240

The minimum memory of TCMalloc; when the memory used is less than this, it is not returned to the operating system

@@ -1311,13 +1311,13 @@ The minimum memory of TCmalloc, when the memory used is less than this, it is no

### `thrift_connect_timeout_seconds`

Default: 3

The default thrift client connection timeout time (unit: seconds)

### `thrift_rpc_timeout_ms`

Default: 5000

Thrift default timeout, default: 5 seconds
@@ -1338,43 +1338,43 @@ If the parameter is `THREAD_POOL`, the model is a blocking I/O model.

### `trash_file_expire_time_sec`

Default: 259200

The interval for cleaning the recycle bin is 72 hours. When the disk space is insufficient, the file retention period under trash may not comply with this parameter

### `txn_commit_rpc_timeout_ms`

Default: 10000

txn commit rpc timeout, the default is 10 seconds

### `txn_map_shard_size`

Default: 128

txn_map_lock fragment size, the value is 2^n, n=0,1,2,3,4. This is an enhancement to improve the performance of managing txn

### `txn_shard_size`

Default: 1024

txn_lock shard size, the value is 2^n, n=0,1,2,3,4; this is an enhancement that can improve the performance of committing and publishing txn

### `unused_rowset_monitor_interval`

Default: 30

Time interval for clearing expired Rowsets, unit: second

### `upload_worker_count`

Default: 1

Maximum number of threads for uploading files

### `use_mmap_allocate_chunk`

Default: false

Whether to use mmap to allocate blocks. If you enable this feature, it is best to increase the value of vm.max_map_count, whose default value is 65530. You can operate max_map_count as root via "sysctl -w vm.max_map_count=262144" or "echo 262144 > /proc/sys/vm/max_map_count". When this setting is true, you must set chunk_reserved_bytes_limit to a relatively large number, otherwise the performance will be very bad

@@ -1386,7 +1386,7 @@ udf function directory

### `webserver_num_workers`

Default: 48

Webserver default number of worker threads

@@ -1398,7 +1398,7 @@ Webserver default number of worker threads

### `write_buffer_size`

Default: 104857600

The size of the buffer before flushing
@@ -1486,7 +1486,7 @@ The default value is currently only an empirical value, and may need to be modif

### `auto_refresh_brpc_channel`

* Type: bool
* Description: When obtaining a brpc connection, judge the availability of the connection through hand_shake rpc, and re-establish the connection if it is not available.
* Default value: false

### `high_priority_flush_thread_num_per_store`
@@ -159,7 +159,7 @@ The rules of dynamic partition are prefixed with `dynamic_partition.`:

The range of reserved history periods. It should be in the form of `[yyyy-MM-dd,yyyy-MM-dd],[...,...]` when `dynamic_partition.time_unit` is "DAY", "WEEK", or "MONTH", and in the form of `[yyyy-MM-dd HH:mm:ss,yyyy-MM-dd HH:mm:ss],[...,...]` when `dynamic_partition.time_unit` is "HOUR". No extra spaces are expected. The default value is `"NULL"`, which means it is not set.

Let us give an example. Suppose today is 2021-09-06, the table is partitioned by day, and the properties of dynamic partition are set to:

```time_unit="DAY/WEEK/MONTH", end=3, start=-3, reserved_history_periods="[2020-06-01,2020-06-20],[2020-10-31,2020-11-15]"```.
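A minimal sketch of how such properties might be attached to a table (the table and column names here are hypothetical; the `dynamic_partition.*` keys follow the rules described above):

```sql
CREATE TABLE example_db.dp_tbl (
    dt DATE,
    k1 INT
)
DUPLICATE KEY(dt, k1)
-- partitions themselves are created and dropped by the dynamic partition scheduler
PARTITION BY RANGE(dt) ()
DISTRIBUTED BY HASH(k1) BUCKETS 8
PROPERTIES (
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-3",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "8",
    "dynamic_partition.reserved_history_periods" = "[2020-06-01,2020-06-20],[2020-10-31,2020-11-15]"
);
```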
@@ -43,7 +43,7 @@ LDAP group authorization, is to map the group in LDAP to the Role in Doris, if t

You need to configure the LDAP basic information in the fe/conf/ldap.conf file, and the LDAP administrator password needs to be set using SQL statements.
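For example, the admin password can be set from a MySQL client connected to the FE (a sketch; the password value is a placeholder to replace):

```sql
-- Run as a user with admin privileges
SET LDAP_ADMIN_PASSWORD = PASSWORD('your_ldap_admin_password');
```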
#### Configure the fe/conf/ldap.conf file:

* ldap_authentication_enabled = false

Set the value to "true" to enable LDAP authentication; when the value is "false", LDAP authentication is not enabled and all other configuration items of this profile are invalid.

@@ -66,7 +66,7 @@ You need to configure the LDAP basic information in the fe/conf/ldap.conf file,

For example, if you use the LDAP user node uid attribute as the username to log into Doris, you can configure it as:
ldap_user_filter = (&(uid={login}));
This item can be configured using the LDAP user mailbox prefix as the user name:
ldap_user_filter = (&(mail={login}@baidu.com)).

* ldap_group_basedn = ou=group,dc=domain,dc=com
base dn when Doris searches for group information in LDAP. If this item is not configured, LDAP group authorization will not be enabled.
@@ -497,7 +497,7 @@ The following configuration belongs to the system level configuration of SyncJob

* `max_bytes_sync_commit`

The maximum size of the data when the transaction is committed. If the data size received by the FE is larger than it, it will immediately commit the transaction and send the accumulated data. The default value is 64MB. If you want to modify this configuration, please ensure that this value is greater than the product of `canal.instance.memory.buffer.size` and `canal.instance.memory.buffer.memunit` on the canal side (16MB by default) and `min_bytes_sync_commit`.

* `max_sync_task_threads_num`

@@ -301,7 +301,7 @@ The user can control the stop, pause and restart of the job by the three command

7. The difference between STOP and PAUSE

The FE will automatically clean up stopped ROUTINE LOAD jobs, while paused ROUTINE LOAD jobs can be resumed
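The corresponding statements look like the following (a sketch; `example_job` is a hypothetical job name):

```sql
PAUSE ROUTINE LOAD FOR example_job;   -- paused jobs can be resumed later
RESUME ROUTINE LOAD FOR example_job;
STOP ROUTINE LOAD FOR example_job;    -- stopped jobs are cleaned up by the FE and cannot be resumed
```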
## Related parameters

@@ -171,10 +171,10 @@ The number of rows in the original file = `dpp.abnorm.ALL + dpp.norm.ALL`

+ two\_phase\_commit

Stream load supports the two-phase commit mode. The mode can be enabled by declaring ```two_phase_commit=true``` in the http header. This mode is disabled by default.
The two-phase commit mode means: during Stream load, after data is written, a message is returned to the client; the data is invisible at this point and the transaction status is PRECOMMITTED. The data will be visible only after COMMIT is triggered by the client.

1. Users can invoke the following interface to trigger a commit operation for the transaction:
```
curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" http://fe_host:http_port/api/{db}/_stream_load_2pc
```

@@ -183,7 +183,7 @@ The number of rows in the original file = `dpp.abnorm.ALL + dpp.norm.ALL`

curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:commit" http://be_host:webserver_port/api/{db}/_stream_load_2pc
```

2. Users can invoke the following interface to trigger an abort operation for the transaction:
```
curl -X PUT --location-trusted -u user:passwd -H "txn_id:txnId" -H "txn_operation:abort" http://fe_host:http_port/api/{db}/_stream_load_2pc
```

@@ -360,7 +360,7 @@ Cluster situation: The concurrency of Stream load is not affected by cluster siz

In the community version 0.14.0 and earlier versions, a connection reset exception occurred after Http V2 was enabled, because the built-in web container was Tomcat, and Tomcat has issues with 307 (Temporary Redirect): there is a problem with its implementation of this protocol, so when Stream load is used to import a large amount of data, a connect reset exception will occur. This is because Tomcat started data transmission before the 307 redirect, which resulted in the lack of authentication information when the BE received the data request. Later, changing the built-in container to Jetty solved this problem. If you encounter this problem, please upgrade your Doris or disable Http V2 (`enable_http_server_v2=false`).

After the upgrade, also upgrade the http client version of your program to `4.5.13`, and introduce the following dependency in your pom.xml file

```xml
<dependency>
@@ -32,9 +32,9 @@ If Doris' data disk capacity is not controlled, the process will hang because th

## Glossary

* FE: Doris Frontend Node. Responsible for metadata management and request access.
* BE: Doris Backend Node. Responsible for query execution and data storage.
* Data Dir: Data directory. Each data directory is specified in the `storage_root_path` of the BE configuration file `be.conf`. Usually a data directory corresponds to a disk, so in the following, **disk** also refers to a data directory.
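One way to see how full each BE's data directories are (a sketch; assumes a MySQL client connected to the FE) is the backends proc interface:

```sql
-- Reports per-BE capacity information such as used, available and total capacity
SHOW PROC '/backends';
```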
## Basic Principles

@@ -125,7 +125,7 @@ When the disk capacity is higher than High Watermark or even Flood Stage, many o

When the BE has crashed because the disk is full and cannot be started (this phenomenon may occur due to untimely detection by FE or BE), you need to delete some temporary files in the data directory to ensure that the BE process can start.
Files in the following directories can be deleted directly:

* log/: Log files in the log directory.
* snapshot/: Snapshot files in the snapshot directory.
* trash/: Trash files in the trash directory.
@@ -124,9 +124,9 @@ There are many statistical information collected at BE. so we list the correspo

- BytesReceived: Size of bytes received by network
- DataArrivalWaitTime: Total waiting time of sender to push data
- MergeGetNext: When there is a sort in the lower level node, exchange node will perform a unified merge sort and output an ordered result. This indicator records the total time consumption of merge sorting, including the time consumption of MergeGetNextBatch.
- MergeGetNextBatch: Time taken by the merge node to fetch data. For single-layer merge sort, the data is fetched from the network queue; for multi-level merge sorting, the data is fetched from the child mergers.
- ChildMergeGetNext: When there are too many senders in the lower layer to send data, single-thread merge will become a performance bottleneck. Doris will start multiple child merge threads to do merge sort in parallel. The sorting time of child merge is recorded, which is the cumulative value of multiple threads.
- ChildMergeGetNextBatch: Time taken by child merge to fetch data. If the time consumption is too large, the bottleneck may be the lower-level data sending node.
- FirstBatchArrivalWaitTime: The time spent waiting for the first batch to come from the sender
- DeserializeRowBatchTimer: Time spent deserializing received data
- SendersBlockedTotalTimer(*): Wait time of the sender when the DataStreamRecv's queue buffer is full
@@ -53,7 +53,7 @@ Commit is divided into ‘ title ’ and ‘ content ’ , the title should be l

* deps: Modification of third-party dependency library
* community: Such as modification of the Github issue template.

Some tips:

1. If there are multiple types in one commit, multiple types need to be added
2. If code refactoring brings performance improvement, [refactor][optimize] can be added at the same time

@@ -80,7 +80,7 @@ Commit is divided into ‘ title ’ and ‘ content ’ , the title should be l

* config
* docs

Some tips:

1. Try to use options that already exist in the list. If you need to add one, please update this document in time

@@ -93,7 +93,7 @@ Commit is divided into ‘ title ’ and ‘ content ’ , the title should be l

The commit message should follow this format:

```
issue: #7777

your message
```
@@ -44,10 +44,10 @@ https://dist.apache.org/repos/dist/release/incubator/doris/

For the first release, you need to copy the KEYS file as well. Then add it to the svn release.

```
After the add succeeds, you can see the files you published on the following website:
https://dist.apache.org/repos/dist/release/incubator/doris/0.xx.0-incubating/

After a while, you can see them on the official website of Apache:
http://www.apache.org/dist/incubator/doris/0.9.0-incubating/
```

@@ -150,7 +150,7 @@ Title:

[ANNOUNCE] Apache Doris (incubating) 0.9.0 Release
```

To mail:

```
dev@doris.apache.org
@@ -32,7 +32,7 @@ under the License.

1. Download the doris source code

URL: [apache/incubator-doris: Apache Doris (Incubating) (github.com)](https://github.com/apache/incubator-doris)

2. Install GCC 8.3.1+, Oracle JDK 1.8+, Python 2.7+, confirm that the gcc, java, python commands point to the correct version, and set the JAVA_HOME environment variable

@@ -132,7 +132,7 @@ Need to create this folder, this is where the be data is stored

mkdir -p /soft/be/storage
```

3. Open vscode, and open the directory where the be source code is located. In this case, open the directory **/home/workspace/incubator-doris/**. For details on how to use vscode, refer to the online tutorial

4. Install the vscode ms c++ debugging plug-in, the plug-in identified by the red box in the figure below
@@ -33,7 +33,7 @@ It can be used to test the performance of some parts of the BE storage layer (fo

## Compilation

1. To ensure that the environment is able to successfully compile Doris itself, you can refer to [Installation and deployment](https://doris.apache.org/master/en/installing/compilation.html).

2. Execute `run-be-ut.sh`

@@ -53,9 +53,9 @@ The data set is generated according to the following rules.

> int: Random in [1,1000000].

The data character set of string types is uppercase and lowercase English letters, and the length varies according to the type.
> char: Length random in [1,8].
> varchar: Length random in [1,128].
> string: Length random in [1,100000].

`rows_number` indicates the number of rows of data, the default value is `10000`.
@@ -26,7 +26,7 @@ under the License.

# C++ Code Diagnostic

Doris supports using [Clangd](https://clangd.llvm.org/) and [Clang-Tidy](https://clang.llvm.org/extra/clang-tidy/) to diagnose code. Clangd and Clang-Tidy are already included in [LDB-toolchain](https://doris.apache.org/zh-CN/installing/compilation-with-ldb-toolchain), and can also be installed by yourself.

### Clang-Tidy
Clang-Tidy diagnostics can be configured; the config file `.clang-tidy` is in the Doris root path. Compared with vscode-cpptools, clangd provides more powerful and accurate code navigation for vscode, and integrates the analysis and quick-fix functions of clang-tidy.
@@ -46,16 +46,16 @@ under the License.

Doris builds against `thrift` 0.13.0 (note: `Doris` 0.15 and later versions build against `thrift` 0.13.0, earlier versions still use `thrift` 0.9.3)

Windows:
1. Download: `http://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.exe`
2. Copy: copy the file to `./thirdparty/installed/bin`

MacOS:
1. Download: `brew install thrift@0.13.0`
2. Establish soft connection:
`mkdir -p ./thirdparty/installed/bin`
`ln -s /opt/homebrew/Cellar/thrift@0.13.0/0.13.0/bin/thrift ./thirdparty/installed/bin/thrift`

Note: MacOS may report that the version cannot be found when executing `brew install thrift@0.13.0`. The solution is to execute the following in the terminal:
1. `brew tap-new $USER/local-tap`
2. `brew extract --version='0.13.0' thrift $USER/local-tap`
3. `brew install thrift@0.13.0`
@@ -47,7 +47,7 @@ Create `settings.json` in `.vscode/` , and set settings:

* `"java.configuration.runtimes"`
* `"java.jdt.ls.java.home"` -- must set it to the directory of JDK11+, used for vscode-java plugin
* `"maven.executable.path"` -- maven path, for maven-language-server plugin

example:
@@ -349,7 +349,7 @@ PROPERTIES (

);
```

Parameter Description:

Parameter | Description
---|---

@@ -378,7 +378,7 @@ PROPERTIES (

);
```

Parameter Description:

Parameter | Description
---|---
@@ -81,14 +81,14 @@ Note: Executing `brew install thrift@0.13.0` on MacOS may report an error that t

Reference link: `https://gist.github.com/tonydeng/02e571f273d6cce4230dc8d5f394493c`

Linux:
1. Download source package: `wget https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz`
2. Install dependencies: `yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++`
3. `tar zxvf thrift-0.13.0.tar.gz`
4. `cd thrift-0.13.0`
5. `./configure --without-tests`
6. `make`
7. `make install`
Check the version after installation is complete: thrift --version
Note: If you have compiled Doris, you do not need to install thrift, you can directly use $DORIS_HOME/thirdparty/installed/bin/thrift
```
@@ -49,7 +49,7 @@ CREATE TABLE IF NOT EXISTS `hive_bitmap_table`(

```

### Hive Bitmap UDF Usage:

Hive Bitmap UDF is used in Hive/Spark
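A hedged usage sketch: it assumes the Hive Bitmap UDFs such as `bitmap_union` have already been registered as described in this document, and that `hive_bitmap_table` has a hypothetical key column `k1` and bitmap column `uuid`:

```sql
-- Merge the bitmap column per key in Hive/Spark
SELECT k1, bitmap_union(uuid) AS merged_bitmap
FROM hive_bitmap_table
GROUP BY k1;
```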
@@ -77,14 +77,14 @@ Note: Executing `brew install thrift@0.13.0` on MacOS may report an error that t

Reference link: `https://gist.github.com/tonydeng/02e571f273d6cce4230dc8d5f394493c`

Linux:
1. Download source package: `wget https://archive.apache.org/dist/thrift/0.13.0/thrift-0.13.0.tar.gz`
2. Install dependencies: `yum install -y autoconf automake libtool cmake ncurses-devel openssl-devel lzo-devel zlib-devel gcc gcc-c++`
3. `tar zxvf thrift-0.13.0.tar.gz`
4. `cd thrift-0.13.0`
5. `./configure --without-tests`
6. `make`
7. `make install`
Check the version after installation is complete: thrift --version
Note: If you have compiled Doris, you do not need to install thrift, you can directly use $DORIS_HOME/thirdparty/installed/bin/thrift
```

@@ -61,7 +61,7 @@ Instructions:

3. The UDF call type represented by `type` in properties is native by default. When using a Java UDF, it is set to `Java_UDF`.
4. `name`: A function belongs to a DB and the name is of the form `dbName`.`funcName`. When `dbName` is not explicitly specified, the db of the current session is used.

Sample:
```sql
CREATE FUNCTION java_udf_add_one(int) RETURNS int PROPERTIES (
"file"="file:///path/to/java-udf-demo-jar-with-dependencies.jar",
@@ -46,17 +46,17 @@ Copy gensrc/proto/function_service.proto and gensrc/proto/types.proto to Rpc ser

- function_service.proto
  - PFunctionCallRequest
    - function_name: The function name, corresponding to the symbol specified when the function was created
    - args: The parameters passed by the method
    - context: Query context information
  - PFunctionCallResponse
    - result: Return result
    - status: Return status, 0 indicates normal
  - PCheckFunctionRequest
    - function: Function related information
    - match_type: Matching type
  - PCheckFunctionResponse
    - status: Return status, 0 indicates normal

### Generated interface

@@ -65,9 +65,9 @@ Use protoc generate code, and specific parameters are viewed using protoc -h

### Implementing an interface

The following three methods need to be implemented
- fnCall: Used to write the computational logic
- checkFn: Used to verify function names, parameters, and return values when creating UDFs
- handShake: Used for interface probing

## Create UDF
@ -81,10 +81,10 @@ PROPERTIES (["key"="value"][,...])
```
Instructions:

1. PROPERTIES中`symbol`Represents the name of the method passed by the RPC call, which must be set。
2. PROPERTIES中`object_file`Represents the RPC service address. Currently, a single address and a cluster address in BRPC-compatible format are supported. Refer to the cluster connection mode[Format specification](https://github.com/apache/incubator-brpc/blob/master/docs/cn/client.md#%E8%BF%9E%E6%8E%A5%E6%9C%8D%E5%8A%A1%E9%9B%86%E7%BE%A4)。
3. PROPERTIES中`type`Indicates the UDF call type, which is Native by default. Rpc is transmitted when Rpc UDF is used。
4. name: A function belongs to a DB and name is of the form`dbName`.`funcName`. When `dbName` is not explicitly specified, the db of the current session is used`dbName`。
1. `symbol` in PROPERTIES represents the name of the method passed by the RPC call, and must be set.
2. `object_file` in PROPERTIES represents the RPC service address. Currently, a single address and a cluster address in BRPC-compatible format are supported. For the cluster connection mode, refer to the [Format specification](https://github.com/apache/incubator-brpc/blob/master/docs/cn/client.md#%E8%BF%9E%E6%8E%A5%E6%9C%8D%E5%8A%A1%E9%9B%86%E7%BE%A4).
3. `type` in PROPERTIES indicates the UDF call type, which is Native by default. Pass RPC when using an RPC UDF.
4. `name`: A function belongs to a DB and its name is of the form `dbName`.`funcName`. When `dbName` is not explicitly specified, the db of the current session is used as `dbName`.

Sample:
```sql
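-- The sample body falls outside this hunk; a minimal sketch under assumed values
-- (the RPC address, symbol name and signature are illustrative, not from the source):
CREATE FUNCTION rpc_add(INT, INT) RETURNS INT PROPERTIES (
    "symbol"="add_int",
    "object_file"="127.0.0.1:9090",
    "type"="RPC"
);
```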
@ -215,8 +215,8 @@ See the section on `lower_case_table_names` variables in [Variables](../administ

**instructions**

* 1./home/disk1/doris,medium:hdd,capacity:10,capacity limit is 10GB, HDD;
* 2./home/disk2/doris,medium:ssd,capacity:50,capacity limit is 50GB, SSD;

* BE webserver_port configuration

@ -33,8 +33,8 @@ under the License.
`BITMAP BITMAP_SUBSET_LIMIT(BITMAP src, BIGINT range_start, BIGINT cardinality_limit)`

Creates a subset of the BITMAP, starting from range_start and limited to cardinality_limit elements
range_start:start value for the range
cardinality_limit:subset upper limit
range_start: start value for the range
cardinality_limit: subset upper limit

## example

@ -50,7 +50,7 @@ mysql> select bitmap_to_string(bitmap_subset_limit(bitmap_from_string('1,2,3,4,5
+-------+
| value |
+-------+
| 4,5   |
+-------+
```
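
Another illustrative call, following the parameter description above (the expected result is given as a comment):

```sql
-- start from value 0 and keep at most 3 elements: returns 1,2,3
select bitmap_to_string(bitmap_subset_limit(bitmap_from_string('1,2,3,4,5'), 0, 3)) value;
```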


@ -31,7 +31,7 @@ under the License.
`INT bit_length (VARCHAR str)`


Return length of argument in bits。
Return length of argument in bits.

## example
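
The example body itself falls outside this hunk; a minimal illustrative query, assuming the behavior described above:

```sql
select bit_length('abc'); -- 3 bytes * 8 = 24
```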


@ -101,7 +101,7 @@ This section introduces the methods that can be used as analysis functions in Do

### AVG()

grammar:

```sql
AVG([DISTINCT | ALL] *expression*) [OVER (*analytic_clause*)]
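
-- Illustrative only; `int_t(x, property)` is the sample table used elsewhere on this page.
-- Moving average of x over a three-row window within each property group:
select x, property,
       avg(x) over (partition by property order by x
                    rows between 1 preceding and 1 following) as moving_avg
from int_t where property in ('odd','even');
```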

@ -136,7 +136,7 @@ from int_t where property in ('odd','even');

### COUNT()

grammar:

```sql
COUNT([DISTINCT | ALL] expression) [OVER (analytic_clause)]
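
-- Illustrative only; running count of rows from the start of each group to the current row:
select x, property,
       count(x) over (partition by property order by x
                      rows between unbounded preceding and current row) as cnt
from int_t where property in ('odd','even');
```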

@ -173,7 +173,7 @@ from int_t where property in ('odd','even');

The DENSE_RANK() function is used to indicate the ranking. Unlike RANK(), DENSE_RANK() does not have vacant numbers. For example, if there are two parallel ones, the third number of DENSE_RANK() is still 2, and the third number of RANK() is 3.

grammar:

```sql
DENSE_RANK() OVER(partition_by_clause order_by_clause)
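
-- Illustrative only; ties share a rank and no numbers are skipped (1, 1, 2, ...):
select x, y, dense_rank() over (partition by x order by y) as rank from int_t;
```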

@ -202,7 +202,7 @@ The following example shows the ranking of the x column grouped by the property

FIRST_VALUE() returns the first value in the window range.

grammar:

```sql
FIRST_VALUE(expr) OVER(partition_by_clause order_by_clause [window_clause])
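
-- Illustrative only; `mail_merge(name, country, greeting)` is the sample table used below.
-- First greeting per country when ordered by name:
select country, name,
       first_value(greeting) over (partition by country order by name) as greeting
from mail_merge;
```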

@ -224,7 +224,7 @@ We have the following data
| Mats | Sweden | Tja |
```

Use FIRST_VALUE() to group by country and return the value of the first greeting in each group:

```sql
select country, name,

@ -244,7 +244,7 @@ over (partition by country order by name, greeting) as greeting from mail_merge;

The LAG() method is used to calculate the value of the row several rows before the current row.

grammar:

```sql
LAG (expr, offset, default) OVER (partition_by_clause order_by_clause)
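
-- Illustrative only; the stock_ticker table and its columns are assumptions taken from the
-- surrounding examples. Previous day's closing price, defaulting to 0 when there is no previous row:
select stock_symbol, closing_date, closing_price,
       lag(closing_price, 1, 0) over (partition by stock_symbol order by closing_date) as prev_close
from stock_ticker;
```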

@ -274,7 +274,7 @@ order by closing_date;

LAST_VALUE() returns the last value in the window range. Contrary to FIRST_VALUE().

grammar:

```sql
LAST_VALUE(expr) OVER(partition_by_clause order_by_clause [window_clause])
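
-- Illustrative only; last greeting per country when ordered by name (the explicit frame
-- makes the window cover the whole partition instead of stopping at the current row):
select country, name,
       last_value(greeting) over (partition by country order by name
                                  rows between unbounded preceding and unbounded following) as last_greeting
from mail_merge;
```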

@ -301,7 +301,7 @@ from mail_merge;

The LEAD() method is used to calculate the value of the row several rows after the current row.

grammar:

```sql
LEAD (expr, offset, default) OVER (partition_by_clause order_by_clause)
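
-- Illustrative only; next day's closing price, defaulting to 0 when there is no following row:
select stock_symbol, closing_date, closing_price,
       lead(closing_price, 1, 0) over (partition by stock_symbol order by closing_date) as next_close
from stock_ticker;
```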

@ -334,7 +334,7 @@ order by closing_date;

### MAX()

grammar:

```sql
MAX([DISTINCT | ALL] expression) [OVER (analytic_clause)]
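
-- Illustrative only; running maximum of x from the start of the ordered input to the current row:
select x, property,
       max(x) over (order by x rows between unbounded preceding and current row) as running_max
from int_t where property in ('prime','square');
```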

@ -365,7 +365,7 @@ from int_t where property in ('prime','square');

### MIN()

grammar:

```sql
MIN([DISTINCT | ALL] expression) [OVER (analytic_clause)]
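
-- Illustrative only; minimum within a frame that extends one row past the current row:
select x, property,
       min(x) over (order by property, x desc
                    rows between unbounded preceding and 1 following) as local_min
from int_t where property in ('prime','square');
```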

@ -398,7 +398,7 @@ from int_t where property in ('prime','square');

The RANK() function is used to indicate ranking. Unlike DENSE_RANK(), RANK() will have vacant numbers. For example, if there are two parallel 1s, the third number in RANK() is 3, not 2.

grammar:

```sql
RANK() OVER(partition_by_clause order_by_clause)
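
-- Illustrative only; ties share a rank and leave gaps afterwards (1, 1, 3, ...):
select x, y, rank() over (partition by x order by y) as rank from int_t;
```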

@ -427,7 +427,7 @@ select x, y, rank() over(partition by x order by y) as rank from int_t;

For each row of each Partition, an integer that starts from 1 and increases continuously is returned. Unlike RANK() and DENSE_RANK(), the value returned by ROW_NUMBER() will not be repeated or vacant, and is continuously increasing.

grammar:

```sql
ROW_NUMBER() OVER(partition_by_clause order_by_clause)
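
-- Illustrative only; a unique, gap-free sequence per partition:
select x, y, row_number() over (partition by x order by y) as rank from int_t;
```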

@ -452,7 +452,7 @@ select x, y, row_number() over(partition by x order by y) as rank from int_t;

### SUM()

grammar:

```sql
SUM([DISTINCT | ALL] expression) [OVER (analytic_clause)]
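
-- Illustrative only; cumulative sum of x within each property group:
select x, property,
       sum(x) over (partition by property order by x
                    rows between unbounded preceding and current row) as running_total
from int_t where property in ('odd','even');
```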

@ -40,7 +40,7 @@ key:
Super user rights:
max_user_connections: Maximum number of connections.
max_query_instances: Maximum number of query instances a user can use when querying.
sql_block_rules: set sql block rules。After setting, if the query user execute match the rules, it will be rejected.
sql_block_rules: set sql block rules. After setting, if the queries the user executes match the rules, they will be rejected.
cpu_resource_limit: limit the cpu resource usage of a query. See session variable `cpu_resource_limit`.
exec_mem_limit: Limit the memory usage of the query. See the description of the session variable `exec_mem_limit` for details. -1 means not set.
load_mem_limit: Limit memory usage for imports. See the introduction of the session variable `load_mem_limit` for details. -1 means not set.
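
For instance, such a property is bound to a user with `SET PROPERTY` (the user and rule names below are placeholders):

```sql
SET PROPERTY FOR 'jack' 'sql_block_rules' = 'test_rule1,test_rule2';
```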

@ -78,7 +78,7 @@ under the License.
Other properties: Other information necessary to access remote storage, such as authentication information.

7) Modify BE node attributes currently supports the following attributes:
1. tag.location:Resource tag
1. tag.location: Resource tag
2. disable_query: Query disabled attribute
3. disable_load: Load disabled attribute
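
As a sketch, these attributes are set through `ALTER SYSTEM MODIFY BACKEND` (host and heartbeat port below are placeholders):

```sql
ALTER SYSTEM MODIFY BACKEND "host1:9050" SET ("tag.location" = "group_a");
ALTER SYSTEM MODIFY BACKEND "host1:9050" SET ("disable_query" = "true");
```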

@ -199,7 +199,7 @@ under the License.
9. Modify default buckets number of partition
grammar:
MODIFY DISTRIBUTION DISTRIBUTED BY HASH (k1[,k2 ...]) BUCKETS num
note:
1) Only supports non-colocate tables with RANGE partition and HASH distribution
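
A hedged example of the full statement (database, table and column names are placeholders):

```sql
ALTER TABLE example_db.my_table MODIFY DISTRIBUTION DISTRIBUTED BY HASH(k1) BUCKETS 32;
```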

10. Modify table comment

@ -51,12 +51,12 @@ Grammar:

## example
[CANCEL ALTER TABLE COLUMN]
1. 撤销针对 my_table 的 ALTER COLUMN 操作。
1. Cancel ALTER COLUMN operation for my_table.
CANCEL ALTER TABLE COLUMN
FROM example_db.my_table;

[CANCEL ALTER TABLE ROLLUP]
1. 撤销 my_table 下的 ADD ROLLUP 操作。
1. Cancel ADD ROLLUP operation for my_table.
CANCEL ALTER TABLE ROLLUP
FROM example_db.my_table;

@ -79,7 +79,7 @@ CREATE [AGGREGATE] [ALIAS] FUNCTION function_name
> "prepare_fn": Function signature of the prepare function for finding the entry from the dynamic library. This option is optional for custom functions
>
> "close_fn": Function signature of the close function for finding the entry from the dynamic library. This option is optional for custom functions
> "type": Function type, RPC for remote udf, NATIVE for c++ native udf

@ -36,7 +36,7 @@ under the License.
2. Baidu AFS: afs for Baidu. Can only be used inside Baidu.
3. Baidu Object Storage(BOS): BOS on Baidu Cloud.
4. Apache HDFS.
5. Amazon S3:Amazon S3。
5. Amazon S3: Amazon S3.

### Syntax:

@ -137,14 +137,14 @@ under the License.
read_properties:

Used to specify some special parameters.
Syntax:
[PROPERTIES ("key"="value", ...)]

You can specify the following parameters:

line_delimiter: Used to specify the line delimiter in the load file. The default is `\n`. You can use a combination of multiple characters as the line delimiter.

fuzzy_parse: Boolean type. true means to parse the json schema from the first line; this can make the import faster, but requires all keys to keep the order of the first line. The default value is false. Only used for the json format.

jsonpaths: There are two ways to import json: simple mode and matched mode.
simple mode: it is simple mode without setting the jsonpaths parameter. In this mode, the json data is required to be the object type. For example:
@ -152,7 +152,7 @@ under the License.

matched mode: the json data is relatively complex, and the corresponding value needs to be matched through the jsonpaths parameter.

strip_outer_array: Boolean type, true to indicate that json data starts with an array object and flattens objects in the array object, default value is false. For example:
[
{"k1" : 1, "v1" : 2},
{"k1" : 3, "v1" : 4}
@ -207,9 +207,9 @@ under the License.
dfs.client.failover.proxy.provider: Specify the provider that the client uses to connect to the namenode, by default: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.
4.4. Amazon S3

fs.s3a.access.key:AmazonS3的access key
fs.s3a.secret.key:AmazonS3的secret key
fs.s3a.endpoint:AmazonS3的endpoint
fs.s3a.access.key: Amazon S3 access key
fs.s3a.secret.key: Amazon S3 secret key
fs.s3a.endpoint: Amazon S3 endpoint
4.5. If using the S3 protocol to directly connect to the remote storage, you need to specify the following attributes

(
@ -230,7 +230,7 @@ under the License.
)
fs.defaultFS: defaultFS
hdfs_user: hdfs user
namenode HA:
By configuring namenode HA, the new namenode can be automatically identified when the namenode is switched
dfs.nameservices: hdfs service name, customize, eg: "dfs.nameservices" = "my_ha"
dfs.ha.namenodes.xxx: Customize the names of the namenodes, separated by commas. XXX is the custom name in dfs.nameservices, such as "dfs.ha.namenodes.my_ha" = "my_nn"

@ -76,12 +76,12 @@ under the License.

7. hdfs
Specify to use libhdfs to export to hdfs
Grammar:
WITH HDFS ("key"="value"[,...])

The following parameters can be specified:
fs.defaultFS: Set the fs, such as: hdfs://ip:port
hdfs_user: Specify the hdfs user name
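
For instance, the clause might look like this (host, port and user are placeholders):

```sql
WITH HDFS (
    "fs.defaultFS" = "hdfs://namenode_host:port",
    "hdfs_user" = "hadoop"
)
```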

## example

@ -162,10 +162,10 @@ Date class (DATE/DATETIME): 2017-10-03, 2017-06-13 12:34:03.
NULL value: \N

6. S3 Storage
fs.s3a.access.key: user AK, required
fs.s3a.secret.key: user SK, required
fs.s3a.endpoint: user endpoint, required
fs.s3a.impl.disable.cache: whether to disable the cache, default true, optional

## example

@ -29,7 +29,7 @@ under the License.

The `SELECT INTO OUTFILE` statement can export the query results to a file. Currently it supports exporting to remote storage such as HDFS, S3, BOS and COS (Tencent Cloud) through the Broker process, or directly through the S3 or HDFS protocol. The syntax is as follows:

Grammar:
query_stmt
INTO OUTFILE "file_path"
[format_as]
@ -50,7 +50,7 @@ under the License.
3. properties
Specify the relevant attributes. Currently it supports exporting through the Broker process, or through the S3, HDFS protocol.

Grammar:
[PROPERTIES ("key"="value", ...)]
The following parameters can be specified:
column_separator: Specifies the exported column separator, defaulting to \t. Supports invisible characters, such as '\x07'.
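
Putting the pieces together, a hedged end-to-end sketch (paths, broker name and table are placeholders):

```sql
SELECT * FROM example_tbl
INTO OUTFILE "hdfs://host/path/to/result_"
FORMAT AS CSV
PROPERTIES (
    "broker.name" = "my_broker",
    "column_separator" = ",",
    "line_delimiter" = "\n"
);
```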

@ -173,7 +173,7 @@ under the License.
"AWS_SECRET_KEY" = "xxx",
"AWS_REGION" = "bd"
)
The final generated file prefix is `my_file_{fragment_instance_id}_`。
The final generated file prefix is `my_file_{fragment_instance_id}_`.

7. Use the s3 protocol to export to bos, and enable the concurrent-export session variable.
set enable_parallel_outfile = true;

@ -30,11 +30,11 @@ under the License.

The kafka partition and offset in the result show the currently consumed partition and the corresponding offset to be consumed.

grammar:
SHOW [ALL] CREATE ROUTINE LOAD for load_name;

Description:
`ALL`: optional; gets all jobs, including history jobs
`load_name`: routine load name

## example
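
The original example body falls outside this hunk; an illustrative call following the grammar above (the job name is a placeholder):

```sql
SHOW ALL CREATE ROUTINE LOAD FOR test_db.kafka_job;
```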