| title | language |
|---|---|
| Variable | en |
Variable
This document focuses on currently supported variables.
Variables in Doris are modeled on MySQL variable settings. However, some variables exist only for compatibility with certain MySQL client protocols and have no actual effect in Doris.
Variable setting and viewing
View
You can view all variables, or only those matching a pattern, with the SHOW VARIABLES [LIKE 'xxx']; statement. For example:
SHOW VARIABLES;
SHOW VARIABLES LIKE '%time_zone%';
Settings
Some variables can be set either at the global level or for the current session only. A global-level setting is used by subsequent new session connections; a session-only setting takes effect only in the current session.
For a session-only setting, use the SET var_name=xxx; statement. For example:
SET exec_mem_limit = 137438953472;
SET forward_to_master = true;
SET time_zone = "Asia/Shanghai";
For a global-level setting, use the SET GLOBAL var_name=xxx; statement. For example:
SET GLOBAL exec_mem_limit = 137438953472;
Note 1: Only ADMIN users can set variables at the global level.
Note 2: A global-level setting does not affect the variable's value in the current session, only in new sessions.
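The following sketch illustrates Note 2. It assumes an ADMIN session and that your Doris version accepts the GLOBAL modifier in SHOW VARIABLES; the value 600 is arbitrary.

```sql
-- Change the default for sessions opened from now on (ADMIN only).
SET GLOBAL query_timeout = 600;

-- The current session keeps the value it already had.
SHOW VARIABLES LIKE 'query_timeout';

-- The global value is what new sessions will start with.
SHOW GLOBAL VARIABLES LIKE 'query_timeout';
```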
Variables that support both session-level and global-level setting include:
time_zone, wait_timeout, sql_mode, enable_profile, query_timeout, exec_mem_limit, batch_size, parallel_fragment_exec_instance_num, parallel_exchange_instance_num, allow_partition_column_nullable, insert_visible_timeout_ms, enable_fold_constant_by_be
Variables that support only global-level setting include:
default_rowset_type
Variable settings also support constant expressions. For example:
SET exec_mem_limit = 10 * 1024 * 1024 * 1024;
SET forward_to_master = concat('tr', 'u', 'e');
Set variables in the query statement
In some scenarios, we may need to set variables specifically for certain queries. The SET_VAR hint sets the session value of a system variable temporarily (for the duration of a single statement). Examples:
SELECT /*+ SET_VAR(exec_mem_limit = 8589934592) */ name FROM people ORDER BY name;
SELECT /*+ SET_VAR(query_timeout = 1, enable_partition_cache=true) */ sleep(3);
Note that the hint comment must start with /*+ and can only immediately follow SELECT.
Supported variables
-
SQL_AUTO_IS_NULL
Used for compatibility with the JDBC connection pool C3P0. No practical effect.
-
auto_increment_increment
Used for compatibility with MySQL clients. No practical effect.
-
autocommit
Used for compatibility with MySQL clients. No practical effect.
-
batch_size
Used to specify the number of rows in a single packet transmitted by each node during query execution. By default, a packet contains 1024 rows; that is, after the source node generates 1024 rows of data, they are packaged and sent to the destination node.
A larger number of rows increases query throughput when scanning large data volumes, but may increase query latency in small-query scenarios. It also increases the memory overhead of the query. The recommended range is 1024 to 4096.
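A sketch of the trade-off above: raise batch_size for one large scan with a SET_VAR hint, leaving the session default for small queries. The table example_tbl and its columns are hypothetical.

```sql
-- Large scan/aggregation: bigger packets for throughput (this statement only).
SELECT /*+ SET_VAR(batch_size = 4096) */ k1, COUNT(*)
FROM example_tbl
GROUP BY k1;

-- Alternatively, change it for the whole session.
SET batch_size = 4096;
```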
-
character_set_client
Used for compatibility with MySQL clients. No practical effect.
-
character_set_connection
Used for compatibility with MySQL clients. No practical effect.
-
character_set_results
Used for compatibility with MySQL clients. No practical effect.
-
character_set_server
Used for compatibility with MySQL clients. No practical effect.
-
codegen_level
Used to set the level of LLVM codegen. (Not currently in effect.)
-
collation_connection
Used for compatibility with MySQL clients. No practical effect.
-
collation_database
Used for compatibility with MySQL clients. No practical effect.
-
collation_server
Used for compatibility with MySQL clients. No practical effect.
-
delete_without_partition
When set to true, no partition needs to be specified when using the DELETE command to delete data from a partitioned table; the delete operation is automatically applied to all partitions.
Note, however, that applying the delete to all partitions may cause the DELETE command to trigger a large number of subtasks and take a long time to complete. Unless necessary, enabling it is not recommended.
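A minimal sketch of the behavior described above, assuming a hypothetical partitioned table example_tbl with a column k1:

```sql
-- Allow DELETE on a partitioned table without naming a partition.
SET delete_without_partition = true;

-- The condition is applied to every partition of the table,
-- which may spawn many subtasks on large tables.
DELETE FROM example_tbl WHERE k1 = 3;

-- Turn it back off when done.
SET delete_without_partition = false;
```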
-
disable_colocate_join
Controls whether the Colocation Join feature is disabled. The default is false, which means the feature is enabled; true disables it. When this feature is disabled, the query planner does not attempt to perform a Colocation Join.
-
enable_bucket_shuffle_join
Controls whether the Bucket Shuffle Join feature is enabled. The default is true, which means the feature is enabled; false disables it. When this feature is disabled, the query planner does not attempt to perform a Bucket Shuffle Join.
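One way to see the effect of disable_colocate_join and enable_bucket_shuffle_join is to compare EXPLAIN output before and after toggling them in a session. This sketch assumes hypothetical tables t1 and t2 that are eligible for these join strategies on column k1; if they are not, the plans will not differ.

```sql
-- Plan with the default settings.
EXPLAIN SELECT t1.k1, t2.v1 FROM t1 JOIN t2 ON t1.k1 = t2.k1;

-- Disable Colocation Join and Bucket Shuffle Join for this session,
-- then inspect how the join distribution in the plan changes.
SET disable_colocate_join = true;
SET enable_bucket_shuffle_join = false;
EXPLAIN SELECT t1.k1, t2.v1 FROM t1 JOIN t2 ON t1.k1 = t2.k1;
```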
-
disable_streaming_preaggregations
Controls whether streaming pre-aggregation is disabled. The default is false, which means it is enabled. This variable is currently not configurable, and streaming pre-aggregation is enabled by default.
-
enable_insert_strict
Used to set strict mode when loading data via the INSERT statement. The default is false, which means strict mode is not enabled. For an introduction to this mode, see the load strict mode documentation.
-
enable_spilling
Used to set whether to enable external sorting. The default is false, which disables the feature. The feature takes effect only when the ORDER BY clause has no LIMIT and enable_spilling is set to true. When enabled, temporary data is stored in the doris-scratch/ directory under the BE data directory and is cleared after the query completes.
This feature is mainly used to sort large amounts of data with limited memory.
Note that this feature is experimental and its stability is not guaranteed. Enable it with caution.
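A sketch of when external sorting applies, assuming a hypothetical table example_tbl:

```sql
SET enable_spilling = true;

-- ORDER BY without LIMIT: external sorting may spill to doris-scratch/.
SELECT * FROM example_tbl ORDER BY k1;

-- ORDER BY with LIMIT: this feature does not apply.
SELECT * FROM example_tbl ORDER BY k1 LIMIT 100;
```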
-
exec_mem_limit
Used to set the memory limit for a single query. The default is 2GB. The value can be given with a unit of B/K/KB/M/MB/G/GB/T/TB/P/PB; if no unit is given, B is used.
This parameter limits the memory that can be used by a single instance of a query fragment in a query plan. A query plan may have multiple instances, and a BE node may execute one or more instances. Therefore, this parameter neither accurately limits the memory usage of a query across the cluster nor accurately limits its memory usage on a single BE node. The appropriate value has to be judged from the generated query plan.
Usually, only some blocking nodes (such as sort, aggregation, and join nodes) consume a lot of memory, while in other nodes (such as scan nodes) data is streamed and does not occupy much memory.
When a Memory Exceed Limit error occurs, you can try increasing the parameter exponentially, such as 4G, 8G, 16G, and so on.
-
forward_to_master
Sets whether to forward some commands to the Master FE node for execution. The default is true, which means these commands are forwarded. There are multiple FE nodes in Doris, one of which is the Master node. Users can usually connect to any FE node for full-featured operation. However, some detailed information can only be obtained from the Master FE node.
For example, if the SHOW BACKENDS; command is not forwarded to the Master FE node, it shows only basic information such as whether the node is alive; when forwarded to the Master FE, it also returns more detailed information, including the node startup time and the last heartbeat time.
The commands currently affected by this parameter are as follows:
-
SHOW FRONTENDS;
Forward to Master to view the last heartbeat information.
-
SHOW BACKENDS;
Forward to Master to view the startup time, last heartbeat information, and disk capacity information.
-
SHOW BROKERS;
Forward to Master to view the startup time and last heartbeat information.
-
SHOW TABLET; / ADMIN SHOW REPLICA DISTRIBUTION; / ADMIN SHOW REPLICA STATUS;
Forward to Master to view the tablet information stored in the Master FE metadata. Under normal circumstances, the tablet information in the metadata of different FEs should be consistent. When a problem occurs, this method can be used to compare the metadata of the current FE with that of the Master FE.
-
SHOW PROC;
Forward to Master to view information about the relevant PROC stored in the Master FE metadata. Mainly used for metadata comparison.
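A minimal sketch of comparing forwarded and non-forwarded output for one of the commands listed above within a single session:

```sql
-- Forward the affected SHOW commands to the Master FE.
SET forward_to_master = true;
SHOW BACKENDS;   -- includes startup time and last heartbeat information

-- Show only what the currently connected FE reports on its own.
SET forward_to_master = false;
SHOW BACKENDS;
```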
-
init_connect
Used for compatibility with MySQL clients. No practical effect.
-
interactive_timeout
Used for compatibility with MySQL clients. No practical effect.
-
enable_profile
Used to set whether to generate a profile for queries so that it can be viewed. The default is false, which means no profile is generated.
By default, the BE sends a profile to the FE only when a query fails, so that the error can be inspected. A successful query does not send a profile. Sending a profile incurs some network overhead, which is detrimental in high-concurrency query scenarios.
When you want to analyze the profile of a query, set this variable to true before sending the query. After the query finishes, you can view the profile on the web page of the currently connected FE:
fe_host:fe_http_port/query
The page displays the 100 most recent queries that were run with enable_profile set to true.
-
language
Used for compatibility with MySQL clients. No practical effect.
-
license
Shows Doris's license. No other effect.
-
load_mem_limit
Used to specify the memory limit of a load operation. The default is 0, which means this variable is not used and exec_mem_limit is used as the memory limit of the load operation.
This variable is mainly used for INSERT operations, because an INSERT operation has both a query part and a load part. If the user does not set this variable, the memory limits of both the query part and the load part are exec_mem_limit. Otherwise, the memory of the query part of the INSERT is limited to exec_mem_limit, and the load part is limited to load_mem_limit.
For other load methods, such as BROKER LOAD and STREAM LOAD, the memory limit is still exec_mem_limit.
-
lower_case_table_names
Used to control whether user table names are case-sensitive.
A value of 0 makes table names case-sensitive. The default is 0.
When the value is 1, table names are case-insensitive. Doris converts table names to lowercase when storing and querying them.
The advantage is that a table name can be written in any case within a statement. The following SQL is correct:
mysql> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| cost             |
+------------------+
mysql> select * from COST where COst.id < 100 order by cost.id;
The disadvantage is that the table name as written in the table creation statement cannot be retrieved after the table is created; the table name shown by 'show tables' is the lowercase form of the specified name.
When the value is 2, table names are case-insensitive. Doris stores the table name as written in the table creation statement and converts it to lowercase for comparison during queries.
The advantage is that the table name shown by 'show tables' is the name specified in the table creation statement.
The disadvantage is that only one case of a table name can be used within the same statement. For example, the table 'cost' can be queried with the name 'COST', as long as that case is used throughout the statement:
mysql> select * from COST where COST.id < 100 order by COST.id;
This variable is compatible with MySQL and must be configured at cluster initialization by specifying lower_case_table_names= in fe.conf. After cluster initialization is complete, it cannot be modified with the SET statement, nor by restarting or upgrading the cluster.
The names of system views in information_schema are case-insensitive and behave as if the value were 2 when lower_case_table_names is 0.
-
max_allowed_packet
Used for compatibility with the JDBC connection pool C3P0. No practical effect.
-
max_pushdown_conditions_per_column
For the specific meaning of this variable, see the description of max_pushdown_conditions_per_column in the BE configuration. This variable is set to -1 by default, which means the value configured in be.conf is used. If it is set to a value greater than 0, queries in the current session use the variable's value and ignore the value in be.conf.
-
max_scan_key_num
For the specific meaning of this variable, see the description of doris_max_scan_key_num in the BE configuration. This variable is set to -1 by default, which means the value configured in be.conf is used. If it is set to a value greater than 0, queries in the current session use the variable's value and ignore the value in be.conf.
-
net_buffer_length
Used for compatibility with MySQL clients. No practical effect.
-
net_read_timeout
Used for compatibility with MySQL clients. No practical effect.
-
net_write_timeout
Used for compatibility with MySQL clients. No practical effect.
-
parallel_exchange_instance_num
Used to set the number of exchange nodes that an upper node uses to receive data from a lower node in the execution plan. The default is -1, which means the number of exchange nodes equals the number of execution instances of the lower node (the default behavior). When the value is greater than 0 and less than the number of execution instances of the lower node, the number of exchange nodes equals the set value.
In a distributed query execution plan, an upper node usually has one or more exchange nodes for receiving data from the execution instances of lower nodes on different BEs. Usually the number of exchange nodes equals the number of execution instances of the lower nodes.
In some aggregation query scenarios, where a large amount of data is scanned at the bottom but little data remains after aggregation, you can try setting this variable to a smaller value to reduce the resource overhead of such queries; for example, aggregation queries on the DUPLICATE KEY data model.
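A sketch for the aggregation scenario above, assuming a hypothetical DUPLICATE KEY table dup_tbl that is large before aggregation but has few distinct k1 values. The value 4 is an arbitrary starting point, not a recommendation.

```sql
-- Takes effect only if 4 is less than the lower node's instance count;
-- applied to this statement only via the SET_VAR hint.
SELECT /*+ SET_VAR(parallel_exchange_instance_num = 4) */ k1, SUM(v1)
FROM dup_tbl
GROUP BY k1;
```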
-
parallel_fragment_exec_instance_num
For scan nodes, sets the number of instances executed on each BE node. The default is 1.
A query plan typically produces a set of scan ranges, that is, the ranges of data that need to be scanned. This data is distributed across multiple BE nodes, and each BE node has one or more scan ranges. By default, the set of scan ranges on each BE node is processed by only one execution instance. When machine resources are abundant, you can increase this variable so that more execution instances process the scan ranges concurrently, improving query efficiency.
The number of scan instances determines the number of other execution nodes in upper layers, such as aggregation and join nodes, so increasing it raises the concurrency of the entire query plan. Modifying this parameter helps improve the efficiency of large queries, but larger values consume more machine resources, such as CPU, memory, and disk IO.
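A minimal sketch of raising scan concurrency for a large query in the current session, assuming a hypothetical table example_tbl and a cluster with spare CPU, memory, and disk IO. The value 8 is illustrative only.

```sql
-- Request 8 execution instances per BE for scan nodes (default is 1).
SET parallel_fragment_exec_instance_num = 8;
SELECT k1, COUNT(*) FROM example_tbl GROUP BY k1;

-- Restore the default afterwards.
SET parallel_fragment_exec_instance_num = 1;
```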
-
query_cache_size
Used for compatibility with MySQL clients. No practical effect.
-
query_cache_type
Used for compatibility with the JDBC connection pool C3P0. No practical effect.
-
query_timeout
Used to set the query timeout. This variable applies to all query statements in the current connection, as well as INSERT statements. The value is in seconds, and the default is 300 (5 minutes).
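A sketch of setting the timeout for the session or for a single statement, assuming a hypothetical table example_tbl:

```sql
-- Allow up to 10 minutes for queries and INSERTs in this session.
SET query_timeout = 600;

-- Or tighten the limit for one statement with a hint.
SELECT /*+ SET_VAR(query_timeout = 30) */ COUNT(*) FROM example_tbl;
```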
-
resource_group
Not used.
-
send_batch_parallelism
Used to set the default parallelism for sending batches when executing an INSERT statement (InsertStmt). If the value exceeds max_send_batch_parallelism_per_job in the BE configuration, the coordinator BE uses the value of max_send_batch_parallelism_per_job instead.
-
sql_mode
Used to specify the SQL mode to accommodate certain SQL dialects. For details, see the SQL Mode documentation.
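A sketch of switching SQL mode for the session. It assumes the PIPES_AS_CONCAT mode described in the Doris SQL Mode documentation is available in your version; in that mode, || is treated as string concatenation rather than logical OR.

```sql
SET sql_mode = 'PIPES_AS_CONCAT';
SELECT 'a' || 'b';

-- Inspect the current mode.
SHOW VARIABLES LIKE 'sql_mode';
```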
-
sql_safe_updates
Used for compatibility with MySQL clients. No practical effect.
-
sql_select_limit
Used for compatibility with MySQL clients. No practical effect.
-
system_time_zone
Displays the current system time zone. Cannot be changed.
-
time_zone
Used to set the time zone of the current session. The time zone affects the results of certain time functions. For details, see the time zone documentation.
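A sketch of how the session time zone changes the result of a time function. It assumes your Doris version accepts both region names and UTC offsets as time zone values.

```sql
SET time_zone = 'Asia/Shanghai';
SELECT NOW();   -- evaluated in Asia/Shanghai

SET time_zone = '+00:00';
SELECT NOW();   -- evaluated at UTC

-- Compare with the fixed system time zone.
SHOW VARIABLES LIKE '%time_zone%';
```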
-
tx_isolation
Used for compatibility with MySQL clients. No practical effect.
-
tx_read_only
Used for compatibility with MySQL clients. No practical effect.
-
transaction_read_only
Used for compatibility with MySQL clients. No practical effect.
-
transaction_isolation
Used for compatibility with MySQL clients. No practical effect.
-
version
Used for compatibility with MySQL clients. No practical effect.
-
performance_schema
Used for compatibility with MySQL JDBC 8.0.16 or later. No practical effect.
-
version_comment
Used to display the version of Doris. Cannot be changed.
-
wait_timeout
Used to set the idle timeout of a connection. When an idle connection has not interacted with Doris for this length of time, Doris actively closes it. The value is in seconds, and the default is 8 hours (28800 seconds).
-
default_rowset_type
Used to set the default storage format of the BE storage engine. Valid options: alpha/beta.
-
use_v2_rollup
Controls whether queries use the segment v2 rollup index to read data. This variable is only used for verification when upgrading to the segment v2 feature; in other cases, its use is not recommended.
-
rewrite_count_distinct_to_bitmap_hll
Whether to rewrite COUNT DISTINCT queries on BITMAP and HLL columns as bitmap_union_count and hll_union_agg.
-
prefer_join_method
When choosing the join method (broadcast join or shuffle join), specifies which method to prefer if the broadcast join cost and the shuffle join cost are equal.
Currently, the valid values for this variable are "broadcast" and "shuffle".
-
allow_partition_column_nullable
Whether to allow the partition column to be NULL when creating a table. The default is true, which means NULL is allowed; false means the partition column must be defined as NOT NULL.
-
insert_visible_timeout_ms
When executing an INSERT statement, Doris waits for the transaction to be committed and become visible after the import completes. This parameter controls the timeout, in milliseconds, for waiting for the transaction to become visible. The default value is 10000, and the minimum value is 1000.
-
enable_exchange_node_parallel_merge
In a sort query, when an upper-level node receives ordered data from lower-level nodes, it merges that data on the exchange node to ensure the final result is ordered. However, when a single thread merges many data streams and the data volume is large, the exchange node merge becomes a single-point bottleneck.
Doris optimizes this case when there are many lower-level data nodes: the exchange node starts multiple threads to merge in parallel and speed up sorting. This parameter is false by default, which means the exchange node does not use parallel merge sort, avoiding the extra CPU and memory consumption.
-
extract_wide_range_expr
Controls whether the 'Wide Common Factors' rule is enabled. The value is true or false; it is enabled by default.
-
enable_fold_constant_by_be
Used to control how constant folding is computed. The default is false, meaning it is computed in the FE; if set to true, it is computed by the BE through an RPC request.
-
cpu_resource_limit
Used to limit the resource overhead of a query. This is an experimental feature. The current implementation limits the number of scan threads a query can use on a single node. With fewer scan threads, data is returned from the bottom layer more slowly, which limits the overall computational resource overhead of the query. For example, if it is set to 2, a query can use up to 2 scan threads on a single node.
This parameter overrides the effect of parallel_fragment_exec_instance_num. That is, if parallel_fragment_exec_instance_num is set to 4 and this parameter is set to 2, the 4 execution instances on a single node share at most 2 scan threads.
This parameter is in turn overridden by the cpu_resource_limit configuration in the user property.
The default is -1, which means no limit.
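The interaction described above can be sketched as follows, assuming a hypothetical table example_tbl and no cpu_resource_limit set in the user property (which would take precedence):

```sql
-- 4 execution instances per BE for the scan node...
SET parallel_fragment_exec_instance_num = 4;
-- ...sharing at most 2 scan threads per BE.
SET cpu_resource_limit = 2;
SELECT k1, COUNT(*) FROM example_tbl GROUP BY k1;

-- Remove the limit again.
SET cpu_resource_limit = -1;
```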
-
disable_join_reorder
Used to turn off all automatic join reorder algorithms in the system. The value is true or false. The default is false, meaning the system's automatic join reorder algorithms are used. When set to true, the system disables all automatic reordering and executes joins in the table order of the original SQL.
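A sketch of pinning the join order to the order written in the query, assuming hypothetical tables t1, t2, and t3 joined on k1. EXPLAIN can be used to confirm the resulting order.

```sql
SET disable_join_reorder = true;

-- Joins follow the written order: t1 with t2, then the result with t3.
EXPLAIN SELECT t1.k1
FROM t1
JOIN t2 ON t1.k1 = t2.k1
JOIN t3 ON t2.k1 = t3.k1;
```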
-
enable_infer_predicate
Used to control whether predicate derivation is performed. The value is true or false. The default is false, meaning the system does not derive predicates and uses only the original predicates. When set to true, predicate expansion is performed.
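A sketch of a query where predicate derivation could apply, assuming hypothetical tables t1 and t2 with an id column. Whether a derived predicate such as t2.id = 1 actually appears depends on the planner, so check the EXPLAIN output rather than assuming it.

```sql
SET enable_infer_predicate = true;

-- From t1.id = t2.id and t1.id = 1, the planner may derive t2.id = 1
-- and push it down to the scan of t2.
EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t1.id = 1;
```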
-
return_object_data_as_binary
Used to control whether bitmap/HLL results are returned in the SELECT result. In a SELECT INTO OUTFILE statement, if the export file format is CSV, bitmap/HLL data is Base64-encoded; if the format is Parquet, the data is stored as a byte array.
-
block_encryption_mode
Controls the block encryption mode. The default is empty; in that case, AES functions use AES_128_ECB and SM4 functions use SM4_128_ECB. Available values: AES_128_ECB, AES_192_ECB, AES_256_ECB, AES_128_CBC, AES_192_CBC, AES_256_CBC, AES_128_CFB, AES_192_CFB, AES_256_CFB, AES_128_CFB1, AES_192_CFB1, AES_256_CFB1, AES_128_CFB8, AES_192_CFB8, AES_256_CFB8, AES_128_CFB128, AES_192_CFB128, AES_256_CFB128, AES_128_CTR, AES_192_CTR, AES_256_CTR, AES_128_OFB, AES_192_OFB, AES_256_OFB, SM4_128_ECB, SM4_128_CBC, SM4_128_CFB128, SM4_128_OFB, SM4_128_CTR.