doris/docs
wyb 4978bd6c81 [Spark load] Add resource manager (#3418)
1. User interface:

1.1 Spark resource management

Spark is used as an external computing resource in Doris for ETL work. Other external resources may be used in the future: for example, MapReduce for ETL, Spark/GPU for queries, and HDFS/S3 for external storage. Resource management is introduced to manage these external resources used by Doris.

```sql
-- create spark resource
CREATE EXTERNAL RESOURCE resource_name
PROPERTIES 
(                 
  type = spark,
  spark_conf_key = spark_conf_value,
  working_dir = path,
  broker = broker_name,
  broker.property_key = property_value
)

-- drop spark resource
DROP RESOURCE resource_name

-- show resources
SHOW RESOURCES
SHOW PROC "/resources"

-- privileges
GRANT USAGE_PRIV ON RESOURCE resource_name TO user_identity
GRANT USAGE_PRIV ON RESOURCE resource_name TO ROLE role_name

REVOKE USAGE_PRIV ON RESOURCE resource_name FROM user_identity
REVOKE USAGE_PRIV ON RESOURCE resource_name FROM ROLE role_name
```



- CREATE EXTERNAL RESOURCE:

FOR user_name is optional. If it is specified, the external resource belongs to that user. If it is omitted, the external resource belongs to the system and is available to all users.

PROPERTIES:
1. type: the resource type. Only spark is supported for now.
2. spark configuration: follows the standard Spark configuration format; see https://spark.apache.org/docs/latest/configuration.html.
3. working_dir: optional; the directory that stores intermediate ETL results in Spark ETL.
4. broker: optional; used in Spark ETL. The intermediate ETL results are read through the broker when they are pushed into BE.

Example: 

```sql
CREATE EXTERNAL RESOURCE "spark0"
PROPERTIES 
(                                                                             
  "type" = "spark",                   
  "spark.master" = "yarn",
  "spark.submit.deployMode" = "cluster",
  "spark.jars" = "xxx.jar,yyy.jar",
  "spark.files" = "/tmp/aaa,/tmp/bbb",
  "spark.yarn.queue" = "queue0",
  "spark.executor.memory" = "1g",
  "spark.hadoop.yarn.resourcemanager.address" = "127.0.0.1:9999",
  "spark.hadoop.fs.defaultFS" = "hdfs://127.0.0.1:10000",
  "working_dir" = "hdfs://127.0.0.1:10000/tmp/doris",
  "broker" = "broker0",
  "broker.username" = "user0",
  "broker.password" = "password0"
)
```



- SHOW RESOURCES:
General users can only see their own resources.
Admin and root users can see all resources.




1.2 Create spark load job

```sql
LOAD LABEL db_name.label_name 
(
  DATA INFILE ("/tmp/file1") INTO TABLE table_name, ...
)
WITH RESOURCE resource_name
[(key1 = value1, ...)]
[PROPERTIES (key2 = value2, ... )]
```

Example:

```sql
LOAD LABEL example_db.test_label 
(
  DATA INFILE ("hdfs://127.0.0.1:10000/tmp/file1") INTO TABLE example_table
)
WITH RESOURCE "spark0"
(
  "spark.executor.memory" = "1g",
  "spark.files" = "/tmp/aaa,/tmp/bbb"
)
PROPERTIES ("timeout" = "3600")
```

The Spark configurations given in the load statement temporarily override the corresponding configurations stored in the resource, for this load job only.
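The override rule can be pictured as a key-by-key merge. The sketch below is only an illustration in JavaScript, not Doris code: `effectiveSparkConf` and the sample values are hypothetical, and it assumes that a key given in the load statement simply replaces the same key from the resource.

```javascript
// Hypothetical helper: illustrates the override rule, not the Doris implementation.
function effectiveSparkConf(resourceConf, loadConf) {
  // Later spreads win, so keys from the load statement replace the resource's values.
  return { ...resourceConf, ...loadConf };
}

// Configuration stored in the resource (sample values).
const resourceConf = {
  "spark.master": "yarn",
  "spark.executor.memory": "1g",
  "spark.yarn.queue": "queue0",
};

// Configuration given in the load statement (sample values).
const loadConf = { "spark.executor.memory": "2g" };

const merged = effectiveSparkConf(resourceConf, loadConf);
console.log(merged["spark.executor.memory"]); // taken from the load statement
console.log(resourceConf["spark.executor.memory"]); // the stored resource is not modified
```

Building a new object instead of mutating `resourceConf` mirrors the "temporary use" wording: the resource keeps its original settings for later jobs.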

#3010
2020-05-26 18:21:21 +08:00

Doris Document

VuePress is used as our documentation site generator. Its configuration lives in the ./docs/.vuepress folder.

Getting Started

Download and install nodejs

npm config set registry https://registry.npm.taobao.org # Only if you are in Mainland China.
cd docs && npm install
npm run dev

Open your browser and navigate to localhost:8080/en/ or localhost:8080/zh-CN/.

Docs' Directories

  .
  ├─ docs/
  │  ├─ .vuepress
  │  │  ├─ dist // Built site files.
  │  │  ├─ public // Assets
  │  │  ├─ sidebar // Side bar configurations.
  │  │  │  ├─ en.js
  │  │  │  └─ zh-CN.js
  │  ├─ theme // Global styles and customizations.
  │  └─ config.js // Vuepress configurations.
  ├─ zh-CN/
  │  ├─ xxxx.md
  │  └─ README.md // Will be rendered as entry page.
  └─ en/
     ├─ one.md
     └─ README.md // Will be rendered as entry page.

Start Writing

  1. Write the markdown files in each language and put them in separate folders, ./en/ and ./zh-CN/. Corresponding files must have the same name in both folders.

    .
    ├─ en/
    │  ├─ one.md
    │  └─ two.md
    └─ zh-CN/
       ├─ one.md
       └─ two.md
    
  2. Front matter like the following must always be at the top of each file:

    ---
    {
        "title": "Backup and Recovery", // sidebar title
        "language": "en" // writing language
    }
    ---
    
  3. Assets are in .vuepress/public/.

    For example, a PNG at .vuepress/public/images/image_x.png can be referenced like this:

    ![alt text](/images/image_x.png)
    
  4. Remember to update the sidebar configurations in .vuepress/sidebar/ after adding a new file or a folder.

    Assuming that the directories are:

    .
    ├─ en/
    │  ├─ subfolder
    │  │  ├─ one.md
    │  │  └─ two.md
    │  └─ three.md
    └─ zh-CN/
       ├─ subfolder
       │  ├─ one.md
       │  └─ two.md
       └─ three.md
    

    Then the sidebar configurations would look like this:

    // .vuepress/sidebar/en.js
    module.exports = [
      {
        title: "subfolder name",
        directoryPath: "subfolder/",
        children: ["one", "two"]
      },
      "three"
    ]
    
    // .vuepress/sidebar/zh-CN.js
    module.exports = [
      {
        title: "文件夹名称",
        directoryPath: "subfolder/",
        children: ["one", "two"]
      },
      "three"
    ]
    
  5. Run npm run lint before starting a PR.

If the markdown files do not follow the rules, many errors will be reported, and all of them are printed to the console:


en/administrator-guide/alter-table/alter-table-bitmap-index.md:92 MD040/fenced-code-language Fenced code blocks should have a language specified [Context: "    ```"]
en/administrator-guide/alter-table/alter-table-rollup.md:45 MD040/fenced-code-language Fenced code blocks should have a language specified [Context: "```"]
en/administrator-guide/alter-table/alter-table-rollup.md:77 MD040/fenced-code-language Fenced code blocks should have a language specified [Context: "```"]
en/administrator-guide/alter-table/alter-table-rollup.md:178 MD046/code-block-style Code block style [Expected: fenced; Actual: indented]
en/administrator-guide/alter-table/alter-table-schema-change.md:50 MD040/fenced-code-language Fenced code blocks should have a language specified [Context: "```"]
en/administrator-guide/alter-table/alter-table-schema-change.md:82 MD040/fenced-code-language Fenced code blocks should have a language specified [Context: "```"]
en/administrator-guide/alter-table/alter-table-schema-change.md:127 MD040/fenced-code-language Fenced code blocks should have a language specified [Context: "```"]
en/administrator-guide/alter-table/alter-table-schema-change.md:144 MD040/fenced-code-language Fenced code blocks should have a language specified [Context: "```"]
en/administrator-guide/alter-table/alter-table-schema-change.md:153 MD040/fenced-code-language Fenced code blocks should have a language specified [Context: "```"]
en/administrator-guide/alter-table/alter-table-schema-change.md:199 MD046/code-block-style Code block style [Expected: fenced; Actual: indented]
en/administrator-guide/backup-restore.md:45:1 MD029/ol-prefix Ordered list item prefix [Expected: 1; Actual: 2; Style: 1/1/1]
en/administrator-guide/backup-restore.md:57:1 MD029/ol-prefix Ordered list item prefix [Expected: 1; Actual: 2; Style: 1/1/1]
en/administrator-guide/backup-restore.md:61:1 MD029/ol-prefix Ordered list item prefix [Expected: 1; Actual: 3; Style: 1/1/1]
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! docs@ lint: `markdownlint '**/*.md' -f`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the docs@ lint script.
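The front matter rule from step 2 can also be checked before running the linter. The following is a minimal sketch, not part of the docs tooling: `parseFrontMatter` is a hypothetical helper, and it assumes that real files contain plain JSON between the two `---` lines (the `//` annotations in the sample above are explanatory only) and that both `title` and `language` are required.

```javascript
// Hypothetical checker: the function name and error messages are illustrative only.
function parseFrontMatter(markdown) {
  const lines = markdown.split(/\r?\n/);
  // The opening delimiter must be the very first line of the file.
  if (lines[0].trim() !== "---") {
    throw new Error("front matter must start on the first line");
  }
  // Find the closing delimiter after the first line.
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) {
    throw new Error("front matter is not closed with ---");
  }
  // Assumption: the body is plain JSON without comments.
  const meta = JSON.parse(lines.slice(1, end).join("\n"));
  for (const key of ["title", "language"]) {
    if (typeof meta[key] !== "string" || meta[key] === "") {
      throw new Error(`front matter is missing "${key}"`);
    }
  }
  return meta;
}

const sample = [
  "---",
  "{",
  '    "title": "Backup and Recovery",',
  '    "language": "en"',
  "}",
  "---",
  "",
  "# Backup and Recovery",
].join("\n");

console.log(parseFrontMatter(sample));
```

A file that starts with anything other than `---` makes the helper throw, which matches the requirement that the front matter sits at the top of each file.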

We use Algolia DocSearch as our full-text search engine.

The DocSearch Config.json must be updated whenever a new language or branch is created.

For more details on the DocSearch configuration, please refer to Configuration of DocSearch.

Deployment

Just open a PR, and everything else is done automatically.

What Travis Does

Once a PR is accepted, Travis CI is triggered to build and deploy the whole website under its own branch. Here is what .travis.yml does:

  1. Prepare the nodejs and vuepress environment.

  2. Use the current branch's name as the relative URL path in .vuepress/config.js (the base property).

  3. Build the documents into a website with vuepress.

  4. Fetch the asf-site repo to a local directory, and copy .vuepress/dist/ into {BRANCH}/.

  5. Push the new site to the asf-site repo with a GitHub token (preset in the Travis console as a variable used in .travis.yml).
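Step 2 above can be sketched as follows. This is an assumption about how the branch name could be turned into the VuePress `base` value, not a copy of the actual build script: `baseForBranch` is a hypothetical helper, `TRAVIS_BRANCH` is the environment variable that Travis CI sets to the branch being built, and the fallback to "master" for local builds is also an assumption.

```javascript
// Hypothetical helper for .vuepress/config.js; the real build script may differ.
function baseForBranch(branch) {
  // Strip leading and trailing slashes so the result is normalized.
  const name = String(branch || "").replace(/^\/+|\/+$/g, "");
  // VuePress expects `base` to start and end with a slash.
  return name === "" ? "/" : `/${name}/`;
}

// Assumption: fall back to "master" when not running on Travis CI.
const base = baseForBranch(process.env.TRAVIS_BRANCH || "master");
console.log(base);
```

With a base such as `/master/`, the built site can be copied into the `{BRANCH}/` folder of the asf-site repository without rewriting links.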

asf-site repository

In the end, the asf-site repository looks like this:

.
├─ master/
│  ├─ en/
│  │  ├─ subfolder
│  │  │  ├─ one.md
│  │  └─ three.md
│  └─ zh-CN/
│      ├─ subfolder
│      │  ├─ one.md
│      └─ three.md
├─ incubating-0.11/
│  ├─ en/
│  │  ├─ subfolder
│  │  │  ├─ one.md
│  │  └─ three.md
│  └─ zh-CN/
│      ├─ subfolder
│      │  ├─ one.md
│      └─ three.md
├─ index.html // user entry, automatically redirects to the master folder
└─ versions.json // all versions that can be selected on the website are defined here

And versions.json looks like this:

{
  "en": [
    {
      "text": "Versions", // dropdown label
      "items": [
        {
          "text": "master", // dropdown-item label
          "link": "/../master/en/installing/compilation.html", // entry page for this version
          "target": "_blank"
        },
        {
          "text": "branch-0.11",
          "link": "/../branch-0.11/en/installing/compilation.html",
          "target": "_blank"
        }
      ]
    }
  ],
  "zh-CN": [
    {
      "text": "版本",
      "items": [
        {
          "text": "master",
          "link": "/../master/zh-CN/installing/compilation.html",
          "target": "_blank"
        },
        {
          "text": "branch-0.11",
          "link": "/../branch-0.11/zh-CN/installing/compilation.html",
          "target": "_blank"
        }
      ]
    }
  ]
}
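Because every entry in versions.json follows the same pattern, the file could be generated from a list of branch names. The sketch below is hypothetical and not part of the repository: `buildVersions`, `LABELS`, and `ENTRY_PAGE` are illustrative names, and the labels and entry page are copied from the sample above. The generated JSON contains no `//` comments, which are annotations only.

```javascript
// Hypothetical generator: labels and entry page are taken from the sample above.
const LABELS = { en: "Versions", "zh-CN": "版本" };
const ENTRY_PAGE = "installing/compilation.html";

function buildVersions(branches) {
  const versions = {};
  for (const [lang, label] of Object.entries(LABELS)) {
    versions[lang] = [
      {
        text: label, // dropdown label
        items: branches.map((branch) => ({
          text: branch, // dropdown-item label
          link: `/../${branch}/${lang}/${ENTRY_PAGE}`, // entry page for this version
          target: "_blank",
        })),
      },
    ];
  }
  return versions;
}

const generated = buildVersions(["master", "branch-0.11"]);
console.log(JSON.stringify(generated, null, 2));
```

Adding a new branch then only requires appending its name to the list passed to `buildVersions`.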