fix docs typo (#4725)

This commit is contained in:
Zhengguo Yang
2020-10-14 09:27:50 +08:00
committed by GitHub
parent dec91a3d43
commit 751aa05cc0
36 changed files with 95 additions and 95 deletions

View File

@ -28,7 +28,7 @@ under the License.
Meta Info Action is used to obtain metadata information in the cluster, such as the database list, table structures, etc.
## List Datbase
## List Database
### Request

View File

@ -38,7 +38,7 @@ Used to obtain the table structure information of the specified table. This inte
* `<db>`
Sepcify database
Specify database
* `<table>`

View File

@ -1,6 +1,6 @@
---
{
"title": "RPOFILE",
"title": "PROFILE",
"language": "en"
}
---

View File

@ -151,7 +151,7 @@ The detailed syntax for creating a routine load task can be connected to Doris a
* data\_source\_properties
The specific Kakfa partition can be specified in `data_source_properties`. If not specified, all partitions of the subscribed topic are consumed by default.
The specific Kafka partition can be specified in `data_source_properties`. If not specified, all partitions of the subscribed topic are consumed by default.
Note that when partitions are explicitly specified, the load job will no longer dynamically detect changes to Kafka partitions. If not specified, the partitions that need to be consumed are dynamically adjusted based on changes in the Kafka partitions.
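A hedged sketch of pinning specific partitions in `data_source_properties` (the job name, table, broker addresses, topic, and partition numbers below are placeholders, not values from this document):
```
CREATE ROUTINE LOAD example_db.example_job ON example_tbl
FROM KAFKA
(
    "kafka_broker_list" = "broker1:9092,broker2:9092",
    "kafka_topic" = "my_topic",
    "kafka_partitions" = "0,1,2",
    "kafka_offsets" = "OFFSET_BEGINNING,OFFSET_BEGINNING,OFFSET_END"
);
```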

View File

@ -307,6 +307,6 @@ Cluster situation: The concurrency of Stream load is not affected by cluster siz
Since Stream load creates an import task via an HTTP submission, HTTP Clients in various languages usually have their own request retry logic. After receiving the first request, the Doris system has already started to process the Stream load, but because the result is not returned to the Client side in time, the Client side retries and creates the request again. At this point, the Doris system is already processing the first request, so the second request is reported as Label Already Exists.
To sort out the possible methods mentioned above: Search FE Master's log with Label to see if there are two ``redirect load action to destination = ``redirect load action to destination'cases in the same Label. If so, the request is submitted repeatedly by the Client side.
To sort out the possible causes mentioned above: search the FE Master's log with the Label to see whether there are two `redirect load action to destination=` entries for the same Label. If so, the request was submitted repeatedly by the Client side.
It is suggested that the user estimate the approximate import time based on the amount of data in the current request, and increase the request timeout on the Client side accordingly, so as to avoid the request being submitted by the Client side multiple times.

View File

@ -334,7 +334,7 @@ MySQL [test]> desc advertiser_view_record;
In Doris, the result of `count(distinct)` aggregation is exactly the same as the result of `bitmap_union_count` aggregation, and `bitmap_union_count` is equal to counting the result of `bitmap_union`. So if the query **involves `count(distinct)`, you can speed up the query by creating a materialized view with `bitmap_union` aggregation.**
For this case, you can create a materialized view that accurately deduplicates `user_id` based on advertising and channel grouping.
For this case, you can create a materialized view that accurately deduplicate `user_id` based on advertising and channel grouping.
```
MySQL [test]> create materialized view advertiser_uv as select advertiser, channel, bitmap_union(to_bitmap(user_id)) from advertiser_view_record group by advertiser, channel;
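-- A hedged sketch (assuming the advertiser_view_record schema used above): once advertiser_uv
-- exists, the exact-deduplication query below can be answered from the materialized view,
-- because count(distinct user_id) equals bitmap_union_count(to_bitmap(user_id)).
MySQL [test]> select advertiser, channel, count(distinct user_id) from advertiser_view_record group by advertiser, channel;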

View File

@ -32,7 +32,7 @@ For the time being, read the [Doris metadata design document](../../internal/met
## Important tips
* Current metadata design is not backward compatible. That is, if the new version has a new metadata structure change (you can see whether there is a new VERSION in the `FeMetaVersion. java'file in the FE code), it is usually impossible to roll back to the old version after upgrading to the new version. Therefore, before upgrading FE, be sure to test metadata compatibility according to the operations in the [Upgrade Document](../../installing/upgrade_EN.md).
* Current metadata design is not backward compatible. That is, if the new version has a new metadata structure change (you can see whether there is a new VERSION in the `FeMetaVersion.java` file in the FE code), it is usually impossible to roll back to the old version after upgrading to the new version. Therefore, before upgrading FE, be sure to test metadata compatibility according to the operations in the [Upgrade Document](../../installing/upgrade_EN.md).
## Metadata catalog structure

View File

@ -28,7 +28,7 @@ under the License.
This document mainly introduces Doris's monitoring items and how to collect and display them, as well as how to configure alarms (TODO)
[Dashborad template click download](https://grafana.com/dashboards/9734/revisions)
[Dashboard template click download](https://grafana.com/dashboards/9734/revisions)
> Note: Before 0.9.0 (excluding), please use revision 1. For version 0.9.x, use revision 2. For version 0.10.x, use revision 3.
@ -102,7 +102,7 @@ Users will see the following monitoring item results (for example, FE partial mo
...
```
This is a monitoring data presented in [Promethus Format] (https://prometheus.io/docs/practices/naming/). We take one of these monitoring items as an example to illustrate:
This is monitoring data presented in [Prometheus Format](https://prometheus.io/docs/practices/naming/). We take one of these monitoring items as an example to illustrate:
```
# HELP jvm_heap_size_bytes jvm heap stat
@ -133,9 +133,9 @@ Please start building the monitoring system after you have completed the deploym
Prometheus
1. Download the latest version of Proetheus on the [Prometheus Website] (https://prometheus.io/download/). Here we take version 2.3.2-linux-amd64 as an example.
1. Download the latest version of Prometheus from the [Prometheus Website](https://prometheus.io/download/). Here we take version 2.3.2-linux-amd64 as an example.
2. Unzip the downloaded tar file on the machine that is ready to run the monitoring service.
3. Open the configuration file promethues.yml. Here we provide an example configuration and explain it (the configuration file is in YML format, pay attention to uniform indentation and spaces):
3. Open the configuration file prometheus.yml. Here we provide an example configuration and explain it (the configuration file is in YML format, pay attention to uniform indentation and spaces):
Here we use the simplest static-file approach for the monitoring configuration. Prometheus supports a variety of [service discovery](https://prometheus.io/docs/prometheus/latest/configuration/configuration/) mechanisms, which can dynamically sense the addition and deletion of nodes.
@ -180,9 +180,9 @@ Prometheus
```
4. start Promethues
4. start Prometheus
Start Promethues with the following command:
Start Prometheus with the following command:
`nohup ./prometheus --web.listen-address="0.0.0.0:8181" &`
@ -241,7 +241,7 @@ Prometheus
7. Configure Grafana
For the first landing, you need to set up the data source according to the prompt. Our data source here is Proetheus, which was configured in the previous step.
When logging in for the first time, you need to set up the data source according to the prompt. Our data source here is Prometheus, which was configured in the previous step.
The Setting page of the data source configuration is described as follows:

View File

@ -1,6 +1,6 @@
---
{
"title": "Multi-tenancy(Exprimental)",
"title": "Multi-tenancy(Experimental)",
"language": "en"
}
---
@ -24,7 +24,7 @@ specific language governing permissions and limitations
under the License.
-->
# Multi-tenancy(Exprimental)
# Multi-tenancy(Experimental)
This function is experimental and is not recommended for use in production environment.
@ -179,7 +179,7 @@ The concrete structure is as follows:
Supports selecting multiple instances on the same machine. The general principle of instance selection is to choose BE instances on different machines as much as possible and to make the number of BE instances used on all machines as uniform as possible.
For use, each user and DB belongs to a cluster (except root). To create user and db, you first need to enter a cluster. When a cluster is created, the system defaults to the manager of the cluster, the superuser account. Supuser has the right to create db, user, and view the number of be nodes in the cluster to which it belongs. All non-root user logins must specify a cluster, namely `user_name@cluster_name`.
For use, each user and DB belongs to a cluster (except root). To create a user or db, you first need to enter a cluster. When a cluster is created, the system creates a superuser account as the default manager of the cluster. The superuser has the right to create db and user, and to view the number of BE nodes in the cluster to which it belongs. All non-root user logins must specify a cluster, namely `user_name@cluster_name`.
Only the root user can view all clusters in the system through `SHOW CLUSTER`, and can enter different clusters by using @ with different cluster names. Clusters are invisible to all users except root.
@ -191,11 +191,11 @@ The concrete structure is as follows:
The process of cluster expansion is the same as that of cluster creation. BE instances on hosts not already used by the cluster are preferred. The selection principles are the same as when creating a cluster.
5. 集群缩容、CLUSTER DECOMMISSION
5. Cluster shrinkage, CLUSTER DECOMMISSION
Users can scale clusters by setting instance num of clusters.
Cluster shrinkage takes precedence over downlining instances on hosts with the largest number of BE instances.
Cluster shrinkage preferentially takes offline the instances on hosts with the largest number of BE instances.
Users can also directly use `ALTER CLUSTER DECOMMISSION BACKEND` to specify BE for cluster scaling.

View File

@ -314,7 +314,7 @@ Duplicate status view mainly looks at the status of the duplicate, as well as th
The figure above shows some additional information, including replica size, number of rows, number of versions, and where the data path is located.
> Note: The contents of the `State'column shown here do not represent the health status of the replica, but the status of the replica under certain tasks, such as CLONE, SCHEMA CHANGE, ROLLUP, etc.
> Note: The contents of the `State` column shown here do not represent the health status of the replica, but the status of the replica under certain tasks, such as CLONE, SCHEMA CHANGE, ROLLUP, etc.
In addition, users can check the distribution of replicas in a specified table or partition by following commands.
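A hedged sketch of such a check (the database, table, and partition names are placeholders; the assumption is that `ADMIN SHOW REPLICA DISTRIBUTION` is the kind of command referred to here):
```
ADMIN SHOW REPLICA DISTRIBUTION FROM example_db.example_tbl PARTITION (p1);
```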

View File

@ -142,10 +142,10 @@ PROPERTIES
`port`: The port of the external table, required.
`odbc_type`: Indicates the type of external table. Currently, Doris supports `MySQL` and `Oracle`. In the future, it may support more databases. The ODBC exteranl table referring to the resource is required. The old MySQL exteranl table referring to the resource is optional.
`odbc_type`: Indicates the type of external table. Currently, Doris supports `MySQL` and `Oracle`. In the future, it may support more databases. The ODBC external table referring to the resource is required. The old MySQL external table referring to the resource is optional.
`driver`: Indicates the driver dynamic library used by the ODBC external table.
The ODBC exteranl table referring to the resource is required. The old MySQL exteranl table referring to the resource is optional.
The ODBC external table referring to the resource is required. The old MySQL external table referring to the resource is optional.
For the usage of ODBC resource, please refer to [ODBC of Doris](../extending-doris/odbc-of-doris.html)

View File

@ -26,7 +26,7 @@ under the License.
# Statistics of query execution
This document focuses on introducing the **RuningProfle** which recorded runtime status of Doris in query execution. Using these statistical information, we can understand the execution of frgment to become a expert of Doris's **debugging and tuning**.
This document focuses on introducing the **Running Profile**, which records the runtime status of Doris during query execution. Using this statistical information, we can understand the execution of fragments and become experts at Doris **debugging and tuning**.
## Noun Interpretation

View File

@ -159,7 +159,7 @@ Note that the comment must start with /*+ and can only follow the SELECT.
* `enable_insert_strict`
Used to set the `strict` mode when loadingdata via INSERT statement. The default is false, which means that the `strict` mode is not turned on. For an introduction to this mode, see [here] (./load-data/insert-into-manual.md).
Used to set the `strict` mode when loading data via the INSERT statement. The default is false, which means that the `strict` mode is not turned on. For an introduction to this mode, see [here](./load-data/insert-into-manual.md).
* `enable_spilling`
@ -181,7 +181,7 @@ Note that the comment must start with /*+ and can only follow the SELECT.
* `forward_to_master`
The user sets whether to forward some commands to the Master FE node for execution. The default is false, which means no forwarding. There are multiple FE nodes in Doris, one of which is the Master node. Usually users can connect to any FE node for full-featured operation. However, some of detail informationcan only be obtained from the Master FE node.
The user sets whether to forward some commands to the Master FE node for execution. The default is false, which means no forwarding. There are multiple FE nodes in Doris, one of which is the Master node. Usually users can connect to any FE node for full-featured operation. However, some detailed information can only be obtained from the Master FE node.
For example, if the `SHOW BACKENDS;` command is not forwarded to the Master FE node, only basic information such as whether the node is alive can be seen; forwarding it to the Master FE yields more detailed information, including the node startup time and the last heartbeat time, as sketched below.
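A minimal sketch, assuming a MySQL client session and the session variables described above (default values as stated in this document):
```
-- session-level settings, as described above
SET enable_insert_strict = true;
SET forward_to_master = true;
-- with forward_to_master enabled, this returns the more detailed Master-only fields
SHOW BACKENDS;
```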

View File

@ -26,7 +26,7 @@ under the License.
# Publish of Apache Doris
Apache publishing must be at least an IPMC member, a commiter with Apache mailboxes, a role called release manager.
Apache publishing must be done by at least an IPMC member, a committer with an Apache mailbox, in a role called release manager.
The general process of publication is as follows:
@ -170,7 +170,7 @@ Email address is apache's mailbox.
##### View and Output
The first line shows the name of the public key file (pubring. gpg), the second line shows the public key characteristics (4096 bits, Hash string and generation time), the third line shows the "user ID", and the fourth line shows the private key characteristics.
The first line shows the name of the public key file (pubring.gpg), the second line shows the public key characteristics (4096 bits, Hash string and generation time), the third line shows the "user ID", and the fourth line shows the private key characteristics.
```
$ gpg --list-keys

View File

@ -63,7 +63,7 @@ sha512sum --check apache-doris-a.b.c-incubating-src.tar.gz.sha512
## 3. Verify license header
Apache RAT is recommended to verify license headder, which can dowload as following command.
Apache RAT is recommended for verifying the license headers; it can be downloaded with the following command.
``` shell
wget http://mirrors.tuna.tsinghua.edu.cn/apache/creadur/apache-rat-0.13/apache-rat-0.13-bin.tar.gz

View File

@ -46,7 +46,7 @@ You will get logstash-output-doris-{version}.gem file in the same directory
### 3. Plug-in installation
copy logstash-output-doris-{version}.gem to the logstash installation directory
Excuting an order
Execute the command
`./bin/logstash-plugin install logstash-output-doris-{version}.gem`

View File

@ -33,7 +33,7 @@ Doris plugin framework supports install/uninstall custom plugins at runtime with
For example, an audit plugin runs after a request is executed; it can obtain information related to the request (access user, request IP, SQL, etc...) and write the information into the specified table.
Differences from UDF:
* UDF is a function used for data calculation when SQL is executed. Plugin is additional function that is used to extend Doris with customized function, such as support different storage engines and different import ways, and plugin does't participate in data calculation when executing SQL.
* UDF is a function used for data calculation when SQL is executed. A plugin is an additional function used to extend Doris with customized functionality, such as supporting different storage engines and different import methods, and a plugin doesn't participate in data calculation when executing SQL.
* The execution cycle of UDF is limited to a SQL execution. The execution cycle of plugin may be the same as the Doris process.
* The usage scenarios are different. If you need to support special data algorithms when executing SQL, then UDF is recommended; if you need to run custom functions on Doris, or start a background thread to do tasks, then a plugin is recommended.

View File

@ -238,7 +238,7 @@ After the compilation is completed, the UDF dynamic link library is successfully
After following the above steps, you can get the UDF dynamic library (that is, the `.so` file in the compilation result). You need to put this dynamic library in a location that can be accessed through the HTTP protocol.
Then log in to the Doris system and create a UDF function in the mysql-client through the `CREATE FUNCTION` syntax. You need to have AMDIN authority to complete this operation. At this time, there will be a UDF created in the Doris system.
Then log in to the Doris system and create a UDF function in the mysql-client through the `CREATE FUNCTION` syntax. You need to have ADMIN authority to complete this operation. At this time, there will be a UDF created in the Doris system.
```
CREATE [AGGREGATE] FUNCTION
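-- A hedged sketch of a concrete statement (the function name, signature, mangled symbol,
-- and object_file URL below are placeholders, not values from this document):
CREATE FUNCTION my_add_one(INT) RETURNS INT PROPERTIES (
    "symbol" = "_ZN9doris_udf8AddOneFnEPNS_15FunctionContextERKNS_6IntValE",
    "object_file" = "http://hosting.example.com/libmyudf.so"
);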

View File

@ -43,4 +43,4 @@ MySQL > select stddev_samp(scan_rows) from log_statis group by datetime;
+--------------------------+
```
## keyword
STDDEVu SAMP,STDDEV,SAMP
STDDEV_SAMP,STDDEV,SAMP

View File

@ -30,7 +30,7 @@ under the License.
`BITMAP TO_BITMAP(expr)`
Convert an unsigned bigint (ranging from 0 to 18446744073709551615) to a bitmap containing that value. Mainly be used to load interger value into bitmap column, e.g.,
Convert an unsigned bigint (ranging from 0 to 18446744073709551615) to a bitmap containing that value. Mainly used to load an integer value into a bitmap column, e.g.,
```
cat data | curl --location-trusted -u user:passwd -T - -H "columns: dt,page,user_id, user_id=to_bitmap(user_id)" http://host:8410/api/test/testDb/_stream_load

View File

@ -28,7 +28,7 @@ under the License.
## Description
### Syntax
'VARCHAR ST'u AsText (GEOMETRY geo)'
'VARCHAR ST_AsText (GEOMETRY geo)'
Converting a geometric figure into a WKT (Well Known Text) representation
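A hedged usage sketch (the coordinates are placeholders, and the companion `ST_Point` function is assumed to be available):
```
SELECT ST_AsText(ST_Point(24.7, 56.7));
-- e.g. POINT (24.7 56.7)
```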

View File

@ -28,7 +28,7 @@ under the License.
## Description
### Syntax
'GEOMETRY ST'u GeometryFromText (VARCHAR wkt)'
'GEOMETRY ST_GeometryFromText (VARCHAR wkt)'
Converting a WKT (Well Known Text) into a corresponding memory geometry

View File

@ -28,7 +28,7 @@ under the License.
## Description
### Syntax
'GEOMETRY ST'u Polygon (VARCHAR wkt)'
'GEOMETRY ST_Polygon (VARCHAR wkt)'
Converting a WKT (Well Known Text) into a corresponding polygon memory form

View File

@ -28,7 +28,7 @@ under the License.
## Description
### Syntax
'INT LOCATION (WARCHAR substrate, WARCHAR str [, INT pos]]'
'INT LOCATE(VARCHAR substr, VARCHAR str[, INT pos])'
Returns the position at which substr first appears in str (counting from 1). If the third parameter pos is specified, the search for substr starts from position pos in str. If substr is not found, 0 is returned.
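A hedged sketch of calls and their expected results (the literal strings are placeholders chosen for illustration):
```
SELECT locate('bar', 'foobarbar');       -- 4
SELECT locate('bar', 'foobarbar', 5);    -- 7
```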

View File

@ -28,7 +28,7 @@ under the License.
## Description
### Syntax
'INT lower (WARCHAR str)'
'VARCHAR lower (VARCHAR str)'
Convert all strings in parameters to lowercase
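A hedged usage sketch (the literal is a placeholder):
```
SELECT lower('AbC123');   -- returns 'abc123'
```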

View File

@ -34,7 +34,7 @@ SET PASSWORD [FOR user_identity] =
The SET PASSWORD command can be used to modify a user's login password. If the [FOR user_identity] field does not exist, modify the password of the current user.
Note that the user_identity here must match exactly the user_identity specified when creating a user using CREATE USER, otherwise the user will be reported as non-existent. If user_identity is not specified, the current user is'username'@'ip', which may not match any user_identity. The current user can be viewed through SHOW GRANTS.
Note that the user_identity here must match exactly the user_identity specified when creating a user using CREATE USER, otherwise the user will be reported as non-existent. If user_identity is not specified, the current user is 'username'@'ip', which may not match any user_identity. The current user can be viewed through SHOW GRANTS.
PASSWORD() takes a plaintext password as input; if a string is used directly instead, the password passed in must already be encrypted.
If you change the password of other users, you need to have administrator privileges.
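A minimal hedged sketch (the user_identity and password are placeholders):
```
SET PASSWORD FOR 'jack'@'192.%' = PASSWORD('new_password');
```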

View File

@ -31,7 +31,7 @@ Syntax:
SET PROPERTY [FOR 'user'] 'key' = 'value' [, 'key' = 'value']
Set user attributes, including resources allocated to users, import cluster, etc. The user attributes set here are for user, not user_identity. That is to say, if two users'jack'@'%' and'jack'@'192%'are created through the CREATE USER statement, the SET PROPERTY statement can only be used for the jack user, not'jack'@'%' or'jack'@'192%'
Set user attributes, including resources allocated to users, import cluster, etc. The user attributes set here are for user, not user_identity. That is to say, if two users 'jack'@'%' and 'jack'@'192%' are created through the CREATE USER statement, the SET PROPERTY statement can only be used for the user jack, not 'jack'@'%' or 'jack'@'192%'
Importing cluster is only applicable to Baidu internal users.
@ -49,7 +49,7 @@ Quota.low: Resource allocation at low level.
Load_cluster.{cluster_name}.hadoop_palo_path: the Hadoop directory used by Palo to store the ETL programs and the intermediate data generated by ETL for Palo to import. After the import is completed, the intermediate data will be automatically cleaned up, and the ETL program will be automatically kept for the next use.
Load_cluster.{cluster_name}.hadoop_configs: the configuration of Hadoop, where fs.default.name, mapred.job.tracker, and hadoop.job.ugi must be filled in.
Load ucluster. {cluster name}. hadoop port: Hadoop HDFS name node http}
Load_cluster.{cluster_name}.hadoop_port: the HTTP port of the Hadoop HDFS NameNode
Default_load_cluster: The default import cluster.
## example
@ -61,11 +61,11 @@ SET PROPERTY FOR 'jack' 'max_user_connections' = '1000';
SET PROPERTY FOR 'jack' 'resource.cpu_share' = '1000';
3. Modify the weight of the normal group of Jack users
Set property for'jack''quota. normal' = 400';
Set property for 'jack' 'quota.normal' = '400';
4. Add import cluster for user jack
SET PROPERTY FOR 'jack'
'load 'cluster.{cluster name}.hadoop'u palo path' ='/user /palo /palo path',
'load_cluster.{cluster_name}.hadoop_palo_path' = '/user/palo/palo_path',
'load_cluster.{cluster_name}.hadoop_configs' = 'fs.default.name=hdfs://dpp.cluster.com:port;mapred.job.tracker=dpp.cluster.com:port;hadoop.job.ugi=user,password;mapred.job.queue.name=job_queue_name_in_hadoop;mapred.job.priority=HIGH;';
5. Delete the import cluster under user jack.

View File

@ -26,7 +26,7 @@ under the License.
# RESTORE
## Description
1. RESTOR
1. RESTORE
This statement is used to restore the data previously backed up by the BACKUP command to the specified database. This command is an asynchronous operation. After successful submission, you need to check progress through the SHOW RESTORE command. Restoring tables of OLAP type is supported only.
Grammar:
SNAPSHOT RESTORE [db_name].{snapshot_name}
@ -47,7 +47,7 @@ Explain:
"Backup_timestamp" = "2018-05-04-16-45-08": specifies which version of the time to restore the corresponding backup must be filled in. This information can be obtained through the `SHOW SNAPSHOT ON repo;'statement.
"Replication_num" = "3": Specifies the number of replicas of the restored table or partition. The default is 3. If an existing table or partition is restored, the number of copies must be the same as the number of copies of an existing table or partition. At the same time, there must be enough hosts to accommodate multiple copies.
"Timeout" = "3600": Task timeout, default to one day. Unit seconds.
"Meta_version" = 40: Use the specified meta_version to read the previously backed up metadata. Note that as a temporary solution, this parameter is only used to restore the data backed up by the older version of Doris. The latest version of the backup data already contains metaversion, no need to specify.
"Meta_version" = 40: Use the specified meta_version to read the previously backed up metadata. Note that as a temporary solution, this parameter is only used to restore the data backed up by the older version of Doris. The latest version of the backup data already contains meta version, no need to specify.
## example
1. Restore backup table backup_tbl in snapshot_1 from example_repo to database example_db1 with the time version of "2018-05-04-16-45-08". Restore to one copy:
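A hedged sketch of what such a statement could look like, assembled from the syntax and properties described above (an illustration, not necessarily the exact statement from the original document):
```
RESTORE SNAPSHOT example_db1.`snapshot_1`
FROM `example_repo`
ON ( `backup_tbl` )
PROPERTIES
(
    "backup_timestamp" = "2018-05-04-16-45-08",
    "replication_num" = "1"
);
```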

View File

@ -41,7 +41,7 @@ SHOW [FULL] [BUILTIN] FUNCTIONS [IN|FROM db] [LIKE 'function_pattern']
View all the custom (builtin) functions under a database. If the user specifies a database, that database is queried; otherwise, the database of the current session is used.
You need `SHOW'privileges for this database
You need `SHOW` privileges for this database
## example

View File

@ -33,7 +33,7 @@ Subsequent imports of new features will only be supported in STEAM LOAD, MINI LO
MINI LOAD imports data through the HTTP protocol. Users can import without relying on Hadoop or a MySQL client.
The user describes the import through the HTTP protocol, and the data is streamed into Doris while the HTTP request is being received. **After the import job is completed**, the import result is returned to the user.
* Note: In order to be compatible with the old version of mini load usage habits, users can still view the import results through the'SHOW LOAD'command.
* Note: In order to be compatible with the old version of mini load usage habits, users can still view the import results through the 'SHOW LOAD' command.
Grammar:
Import:
@ -49,13 +49,13 @@ HTTP Protocol Specification
Privilege Authentication Currently Doris uses the Basic mode of HTTP for privilege authentication. So you need to specify a username and password when importing
This way is to pass the password in plaintext, and does not support encrypted transmission for the time being.
Expect Doris needs to send an HTTP request with the'Expect'header information,'100-continue'.
Expect Doris needs the HTTP request to be sent with the 'Expect' header information, '100-continue'.
Why? Because we need to redirect the request, it has to happen before the data content is transferred.
This avoids transmitting the data multiple times, thereby improving efficiency.
Content-Length Doris needs to send a request with the header'Content-Length'. If the content ratio is sent
'Content-Length'is less, so Doris believes that if there is a transmission problem, the submission task fails.
NOTE: If you send more data than'Content-Length', Doris reads only'Content-Length'.
Content-Length Doris needs the request to be sent with the 'Content-Length' header. If the content sent is less than
'Content-Length', Doris considers it a transmission problem, and the submission of the task fails.
NOTE: If you send more data than 'Content-Length', Doris reads only 'Content-Length'
length of content and imports it.
@ -72,9 +72,9 @@ The specified method is comma-separated, such as columns = k1, k2, k3, K4
Column_separator: Used to specify the separator between columns, default is '\t'
NOTE: Url encoding is required, for example
If you need to specify' t'as a separator, you should pass in'column_separator=% 09'
If you need to specify'x01'as a delimiter, you should pass in'column_separator=% 01'
If you need to specify','as a separator, you should pass in'column_separator=% 2c'
If you need to specify '\t' as a separator, you should pass in 'column_separator=%09'
If you need to specify '\x01' as a delimiter, you should pass in 'column_separator=%01'
If you need to specify ',' as a separator, you should pass in 'column_separator=%2c'
Max_filter_ratio: Used to specify the maximum ratio of irregular data allowed to be filtered out; the default is 0, meaning filtering is not allowed
@ -101,22 +101,22 @@ Although the information of mini load can be found in subsequent queries, it can
## example
1. Import the data from the local file'testData'into the table of'testTbl' in the database'testDb'(the user is in defalut_cluster)
1. Import the data from the local file 'testData' into the table 'testTbl' in the database 'testDb' (the user is in default_cluster)
curl --location-trusted -u root -T testData http://host:port/api/testDb/testTbl/_load?label=123
2. Import the data from the local file'testData'into the table of'testTbl' in the database'testDb'(the user is in test_cluster). The timeout time is 3600 seconds.
2. Import the data from the local file 'testData' into the table 'testTbl' in the database 'testDb' (the user is in test_cluster). The timeout time is 3600 seconds.
curl --location-trusted -u root@test_cluster:root -T testData http://fe.host:port/api/testDb/testTbl/_load?label=123&timeout=3600
3. Import data from the local file'testData'into the'testTbl' table in the database'testDb', allowing a 20% error rate (the user is in defalut_cluster)
3. Import data from the local file 'testData' into the 'testTbl' table in the database 'testDb', allowing a 20% error rate (the user is in default_cluster)
curl --location-trusted -u root -T testData http://host:port/api/testDb/testTbl/_load?label=123\&max_filter_ratio=0.2
4. Import the data from the local file'testData'into the table'testTbl' in the database'testDb', allowing a 20% error rate, and specify the column name of the file (the user is in defalut_cluster)
4. Import the data from the local file 'testData' into the table 'testTbl' in the database 'testDb', allowing a 20% error rate, and specify the column name of the file (the user is in default_cluster)
curl --location-trusted -u root -T testData http://host:port/api/testDb/testTbl/_load?label=123\&max_filter_ratio=0.2\&columns=k1,k2,k3
5. Import in streaming mode (user is in defalut_cluster)
5. Import in streaming mode (user is in default_cluster)
seq 1 10 | awk '{OFS="\t"}{print $1, $1 * 10}' | curl --location-trusted -u root -T - http://host:port/api/testDb/testTbl/_load?label=123
6. Import tables containing HLL columns, which can be columns in tables or columns in data to generate HLL columns (users are in defalut_cluster)
6. Import into tables containing HLL columns; the HLL columns can be generated from columns in the table or from columns in the data (the user is in default_cluster)
curl --location-trusted -u root -T testData http://host:port/api/testDb/testTbl/_load?label=123\&max_filter_ratio=0.2\&hll=hll_column1,k1:hll_column2,k2
\&columns=k1,k2,k3

View File

@ -35,23 +35,23 @@ curl --location-trusted -u user:passwd -XPOST http://host:port/api/{db}/_multi_c
curl --location-trusted -u user:passwd -XPOST http://host:port/api/{db}/_multi_desc?label=xxx
'MULTI LOAD' can support users to import multiple tables at the same time on the basis of 'MINI LOAD'. The specific commands are shown above.
'/api/{db}/_multi_start'starts a multi-table import task
'/api/{db}/{table}/_load'adds a table to be imported to an import task. The main difference from'MINI LOAD' is that the'sub_label'parameter needs to be passed in.
'/api/{db}/_multi_commit'submits the entire multi-table import task and the background begins processing
'/api/{db}/_multi_abort'Abandons a multi-table import task
'/api/{db}/_multi_desc'shows the number of jobs submitted by a multi-table import task
'/api/{db}/_multi_start' starts a multi-table import task
'/api/{db}/{table}/_load' adds a table to be imported to an import task. The main difference from 'MINI LOAD' is that the 'sub_label' parameter needs to be passed in.
'/api/{db}/_multi_commit' submits the entire multi-table import task and the background begins processing
'/api/{db}/_multi_abort' Abandons a multi-table import task
'/api/{db}/_multi_desc' shows the number of jobs submitted by a multi-table import task
HTTP Protocol Specification
Privilege Authentication Currently Doris uses the Basic mode of HTTP for privilege authentication. So you need to specify a username and password when importing
This way is to pass passwords in plaintext, since we are all in the Intranet environment at present...
Expect Doris needs to send an HTTP request, and needs the'Expect'header information with the content of'100-continue'.
Expect Doris needs to send an HTTP request, and needs the 'Expect' header information with the content of '100-continue'.
Why? Because we need to redirect the request, it has to happen before the data content is transferred.
This avoids transmitting the data multiple times, thereby improving efficiency.
Content-Length Doris needs to send a request with the header'Content-Length'. If the content ratio is sent
Content-Length Doris needs the request to be sent with the 'Content-Length' header. If the content sent is less than
'Content-Length', Palo considers it a transmission problem, and the submission of the task fails.
NOTE: If you send more data than'Content-Length', Doris reads only'Content-Length'.
NOTE: If you send more data than 'Content-Length', Doris reads only 'Content-Length'
length of content and imports it.
Description of parameters:
@ -67,8 +67,8 @@ If it is not passed in, the column order in the file is considered to be the sam
The specified method is comma-separated, such as columns = k1, k2, k3, K4
Column_separator: Used to specify the separator between columns, default is '\t'
NOTE: Url encoding is required, such as specifying't'as a delimiter.
Then you should pass in'column_separator=% 09'
NOTE: Url encoding is required, such as specifying '\t' as a delimiter.
Then you should pass in 'column_separator=%09'
Max_filter_ratio: Used to specify the maximum ratio of irregular data allowed to be filtered out; the default is 0, meaning filtering is not allowed
Custom specification should be as follows: 'max_filter_ratio = 0.2', meaning that a 20% error rate is allowed.
@ -86,19 +86,19 @@ Real import behavior will occur, and the amount of data in this way can not be t
## example
1. Import the data from the local file'testData1'into the table of'testTbl1' in the database'testDb', and
Import the data from'testData2'into the table'testTbl2' in'testDb'(the user is in defalut_cluster)
1. Import the data from the local file 'testData1' into the table 'testTbl1' in the database 'testDb', and
Import the data from 'testData2' into the table 'testTbl2' in 'testDb' (the user is in default_cluster)
curl --location-trusted -u root -XPOST http://host:port/api/testDb/_multi_start?label=123
curl --location-trusted -u root -T testData1 http://host:port/api/testDb/testTbl1/_load?label=123\&sub_label=1
curl --location-trusted -u root -T testData2 http://host:port/api/testDb/testTbl2/_load?label=123\&sub_label=2
curl --location-trusted -u root -XPOST http://host:port/api/testDb/_multi_commit?label=123
2. Multi-table Import Midway Abandon (User in defalut_cluster)
2. Multi-table Import Midway Abandon (User in default_cluster)
curl --location-trusted -u root -XPOST http://host:port/api/testDb/_multi_start?label=123
curl --location-trusted -u root -T testData1 http://host:port/api/testDb/testTbl1/_load?label=123\&sub_label=1
curl --location-trusted -u root -XPOST http://host:port/api/testDb/_multi_abort?label=123
3. Multi-table import to see how much content has been submitted (user is in defalut_cluster)
3. Multi-table import to see how much content has been submitted (user is in default_cluster)
curl --location-trusted -u root -XPOST http://host:port/api/testDb/_multi_start?label=123
curl --location-trusted -u root -T testData1 http://host:port/api/testDb/testTbl1/_load?label=123\&sub_label=1
curl --location-trusted -u root -XPOST http://host:port/api/testDb/_multi_desc?label=123

View File

@ -27,7 +27,7 @@ under the License.
# ROUTINE LOAD
## description
Routine Load function allows users to submit a resident load task, and continuously load data into Doris by continuously reading data from the specified data source. Currently, only text data format (CSV) data is loaded from Kakfa by means of no authentication or SSL authentication.
Routine Load function allows users to submit a resident load task, and continuously load data into Doris by continuously reading data from the specified data source. Currently, only text data format (CSV) data is loaded from Kafka by means of no authentication or SSL authentication.
Syntax:
@ -214,7 +214,7 @@ FROM data_source
`Kafka_broker_list`
Kafka's broker connection information. The format is ip:host. Multiple brokare separated by commas.
Kafka's broker connection information. The format is ip:port. Multiple brokers are separated by commas.
Example:
@ -234,9 +234,9 @@ FROM data_source
Offset can specify a specific offset from 0 or greater, or:
1) OFFSET_BEGINNING: Subscribe from the location where the data is avaie.
1) OFFSET_BEGINNING: Subscribe from the location where the data is available.
2) OFFSET_END: ​​Subscribe from the end.
2) OFFSET_END: Subscribe from the end.
If not specified, all partitions under the topic are subscribed by default from OFFSET_END.
@ -253,7 +253,7 @@ FROM data_source
The function is equivalent to the "--property" parameter in the kafka shell.
When the value of the parameter is a file, you need to add the keyword: "FILbefore the value.
When the value of the parameter is a file, you need to add the keyword: "FILE" before the value.
For information on how to create a file, see "HELP CREATE FILE;"
@ -266,7 +266,7 @@ FROM data_source
"property.ssl.ca.location" = "FILE:ca.pem"
```
1. When connecting to Kafka using SSL, you need to specify the follg parameters:
1. When connecting to Kafka using SSL, you need to specify the following parameters:
```
"property.security.protocol" = "ssl",
@ -278,9 +278,9 @@ FROM data_source
among them:
"property.security.protocol" and "property.ssl.ca.location" are requ to indicate the connection method is SSL and the location of the CA certate.
"property.security.protocol" and "property.ssl.ca.location" are required to indicate the connection method is SSL and the location of the CA certificate.
If the client authentication is enabled on the Kafka server, you alsod to set:
If the client authentication is enabled on the Kafka server, you also need to set:
```
"property.ssl.certificate.location"
@ -292,11 +292,11 @@ FROM data_source
2. Specify the default starting offset for kafka partition
If kafka_partitions/kafka_offsets is not specified, all partitions are umed by default, and you can specify kafka_default_offsets to specify the star offset. The default is OFFSET_END, which starts at the end of the substion.
If kafka_partitions/kafka_offsets is not specified, all partitions are consumed by default, and you can specify kafka_default_offsets to specify the start offset. The default is OFFSET_END, which starts the subscription from the end.
Values:
1) OFFSET_BEGINNING: Subscribe from the location where the data is avaie.
1) OFFSET_BEGINNING: Subscribe from the location where the data is available.
2) OFFSET_END: Subscribe from the end.
@ -309,8 +309,8 @@ FROM data_source
Integer class (TINYINT/SMALLINT/INT/BIGINT/LARGEINT): 1, 1000, 1234
Floating point class (FLOAT/DOUBLE/DECIMAL): 1.1, 0.23, .356
 
  Date class (DATE/DATETIME): 2017-10-03, 2017-06-13 12:34:03.
Date class (DATE/DATETIME): 2017-10-03, 2017-06-13 12:34:03.
String class (CHAR/VARCHAR) (without quotes): I am a student, a
@ -505,7 +505,7 @@ FROM data_source
]
}
7. Create a Kafka routine load task named test1 for the example_tbl of example_db. delete all data key colunms match v3 >100 key columns.
7. Create a Kafka routine load task named test1 for the example_tbl of example_db, deleting all data whose key columns match v3 > 100.
CREATE ROUTINE LOAD example_db.test1 ON example_tbl
WITH MERGE

View File

@ -38,7 +38,7 @@ SnapshotName: The name of the backup
DbName: Subordinate database
State: Current phase
PENDING: The initial state after submitting a job
SNAPSHOTING: In the execution snapshot
SNAPSHOTTING: In the execution snapshot
UPLOAD_SNAPSHOT: Snapshot completed, ready for upload
UPLOADING: Snapshot uploading
SAVE_META: Save job meta-information as a local file
@ -50,7 +50,7 @@ CreateTime: Task submission time
Snapshot Finished Time: Snapshot completion time
Upload Finished Time: Snapshot Upload Completion Time
FinishedTime: Job End Time
Unfinished Tasks: The unfinished sub-task ID is displayed in the SNAP HOTING and UPLOADING phases
Unfinished Tasks: The unfinished sub-task ID is displayed in the SNAPSHOTTING and UPLOADING phases
Status: Display failure information if the job fails
Timeout: Job timeout, in seconds

View File

@ -39,11 +39,11 @@ Timestamp: Time version of backup to be restored
DbName: Subordinate database
State: Current phase
PENDING: The initial state after submitting a job
SNAPSHOTING: In the execution snapshot
SNAPSHOTTING: In the execution snapshot
DOWNLOAD: The snapshot is complete, ready to download the snapshot from the repository
DOWNLOADING: Snapshot Download
COMMIT: Snapshot download completed, ready to take effect
COMMITING: In force
COMMITTING: Taking effect
FINISHED: Operation Successful
CANCELLED: Job Failure
AllowLoad: Is import allowed on recovery (currently not supported)
@ -54,7 +54,7 @@ MetaPreparedTime: Metadata Readiness Completion Time
Snapshot Finished Time: Snapshot completion time
Download Finished Time: Snapshot download completion time
FinishedTime: Job End Time
Unfinished Tasks: The unfinished sub-task ID is displayed in the SNAP HOTING, DOWNLOADING, and COMMITING phases
Unfinished Tasks: The unfinished sub-task ID is displayed in the SNAPSHOTTING, DOWNLOADING, and COMMITTING phases
Status: Display failure information if the job fails
Timeout: Job timeout, in seconds

View File

@ -175,11 +175,11 @@ Where url is the url given by ErrorURL.
        
```Curl --location-trusted -u root -H "label:123" -H "where: k1=20180601" -T testData http://host:port/api/testDb/testTbl/_stream_load```
3. load the data from the local file 'testData' into the 'testTbl' table in the database 'testDb', allowing a 20% error rate (user is in defalut_cluster)
3. load the data from the local file 'testData' into the 'testTbl' table in the database 'testDb', allowing a 20% error rate (user is in default_cluster)
```Curl --location-trusted -u root -H "label:123" -H "max_filter_ratio:0.2" -T testData http://host:port/api/testDb/testTbl/_stream_load```
4. load the data from the local file 'testData' into the 'testTbl' table in the database 'testDb', allow a 20% error rate, and specify the column name of the file (user is in defalut_cluster)
4. load the data from the local file 'testData' into the 'testTbl' table in the database 'testDb', allow a 20% error rate, and specify the column name of the file (user is in default_cluster)
```Curl --location-trusted -u root -H "label:123" -H "max_filter_ratio:0.2" -H "columns: k2, k1, v1" -T testData http://host:port/api/testDb/testTbl/_stream_load```
@ -187,7 +187,7 @@ Where url is the url given by ErrorURL.
```Curl --location-trusted -u root -H "label:123" -H "max_filter_ratio:0.2" -H "partitions: p1, p2" -T testData http://host:port/api/testDb/testTbl/_stream_load```
6. load using streaming mode (user is in defalut_cluster)
6. load using streaming mode (user is in default_cluster)
```Seq 1 10 | awk '{OFS="\t"}{print $1, $1 * 10}' | curl --location-trusted -u root -T - http://host:port/api/testDb/testTbl/_stream_load```