diff --git a/docs/en/docs/install/k8s-deploy/debug-crash.md b/docs/en/docs/install/k8s-deploy/debug-crash.md
new file mode 100644
index 0000000000..911a65603c
--- /dev/null
+++ b/docs/en/docs/install/k8s-deploy/debug-crash.md
@@ -0,0 +1,73 @@
+---
+{
+    "title": "How to enter the container when the service crashes",
+    "language": "en"
+}
+---
+
+In a k8s environment, a service can enter the `CrashLoopBackOff` state for unexpected reasons. Run `kubectl get pod --namespace ${namespace}` to view the status and `pod_name` of the pods in the specified namespace.
+
+In this state, `kubectl describe` and `kubectl logs` alone are often not enough to determine why the service fails. A mechanism is therefore needed that lets the pod running the service reach the `Running` state, so that users can enter the container with `kubectl exec` and debug it.
+
+doris-operator provides a `debug` run mode for this purpose. In essence, a debug process occupies the probe port of the corresponding node. This bypasses the k8s probe mechanism and keeps the container running stably, so that users can enter it and locate the problem.
+
+The following describes how to enter `debug` mode for manual debugging when the service is in `CrashLoopBackOff`, and how to return to normal startup once the problem is solved.
+
+## Start Debug mode
+
+When a pod of the service enters `CrashLoopBackOff`, or can no longer start normally during regular operation, take the following steps to put the service into `debug` mode and start it manually to find the problem.
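When several pods are restarting, the `${pod_name}` used in the steps below can be picked out of the `kubectl get pod` output by filtering on its `STATUS` column. The following is a minimal sketch, assuming Bash, `awk`, and the default `kubectl get pod` table layout (`NAME READY STATUS RESTARTS AGE`). The function name `list_crashloop_pods` is hypothetical and is not part of Doris-Operator.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Doris-Operator): print the names of pods in
# the given namespace whose STATUS column currently reads CrashLoopBackOff.
# Assumes the default table layout: NAME READY STATUS RESTARTS AGE.
list_crashloop_pods() {
  local namespace="$1"
  kubectl get pod --namespace "$namespace" --no-headers \
    | awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Example usage (namespace is an example):
#   list_crashloop_pods doris
```

While a container keeps restarting, its `STATUS` alternates between values such as `Error` and `CrashLoopBackOff`, so a crashing pod can be missed if the command happens to run between states; rerunning it covers that case.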
+
+**1. Add the `debug` annotation to the problematic pod:**
+
+```shell
+$ kubectl annotate pod ${pod_name} --namespace ${namespace} selectdb.com.doris/runmode=debug
+```
+
+The next time the service restarts, it detects the annotation that marks a `debug`-mode startup and starts in `debug` mode; the pod status then becomes `Running`.
+
+**2. Once the service is in `debug` mode, its pod is shown as healthy. Enter the pod with the following command:**
+
+```shell
+$ kubectl --namespace ${namespace} exec -ti ${pod_name} -- bash
+```
+
+**3. Start the service manually in `debug` mode. After entering the pod, change the relevant ports in the corresponding configuration file, then run the `start_xx.sh` script manually. The script is located under `/opt/apache-doris/xx/bin`, where `xx` is `fe` or `be`.**
+
+For FE, change `query_port`; for BE, change `heartbeat_service_port`.
+This mainly prevents traffic from being routed through the Service to the crashed node while it runs in `debug` mode.
+
+## Exit Debug mode
+
+Once the problem has been located, exit `debug` mode by deleting the corresponding pod with the following command. The recreated pod no longer carries the `debug` annotation, so the service starts in normal mode.
+
+```shell
+$ kubectl delete pod ${pod_name} --namespace ${namespace}
+```
+
+## Precautions
+
+**After entering the pod, you must change the port settings in the configuration file before you can manually start the corresponding Doris component.**
+
+- For FE, change the `http_port=8030` setting in the default configuration file `/opt/apache-doris/fe/conf/fe.conf`.
+- For BE, change the `webserver_port=8040` setting in the default configuration file `/opt/apache-doris/be/conf/be.conf`.
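The port changes described above can be scripted inside the pod before running the start script. The following is a minimal sketch, assuming Bash and GNU `sed` are available in the container and that the settings are plain `key = value` lines. The function name `set_conf_port` and the replacement port numbers are hypothetical examples, not values defined by Doris or Doris-Operator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "key = port" in a Doris conf file.
# Rewrites an existing uncommented assignment of the key (with or without
# spaces around "="), or appends a new line if the key is absent.
# Assumes GNU sed, where `sed -i` needs no backup suffix.
set_conf_port() {
  local conf_file="$1" key="$2" new_port="$3"
  if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf_file"; then
    sed -i -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${new_port}|" "$conf_file"
  else
    printf '%s = %s\n' "$key" "$new_port" >> "$conf_file"
  fi
}

# Example usage inside the pod (the new port numbers are arbitrary examples):
#   set_conf_port /opt/apache-doris/fe/conf/fe.conf http_port 18030
#   set_conf_port /opt/apache-doris/fe/conf/fe.conf query_port 19030
#   set_conf_port /opt/apache-doris/be/conf/be.conf webserver_port 18040
#   set_conf_port /opt/apache-doris/be/conf/be.conf heartbeat_service_port 19050
```

Anchoring the match at the start of the line keeps commented-out entries and keys that merely end with the same name untouched.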
+
diff --git a/docs/en/docs/install/k8s-deploy/expansion-and-contraction.md b/docs/en/docs/install/k8s-deploy/expansion-and-contraction.md
new file mode 100644
index 0000000000..46b54f2235
--- /dev/null
+++ b/docs/en/docs/install/k8s-deploy/expansion-and-contraction.md
@@ -0,0 +1,100 @@
+---
+{
+    "title": "Service scale-out and scale-in",
+    "language": "en"
+}
+---
+
+Doris on K8S is scaled out or in by modifying the `replicas` field of the corresponding component in the DorisCluster resource. You can either edit the resource directly or make the change with a command.
+
+## Get DorisCluster resources
+
+Use `kubectl --namespace {namespace} get doriscluster` to get the name of the deployed DorisCluster resource (dcr for short). This document uses `doris` as the namespace.
+
+```shell
+$ kubectl --namespace doris get doriscluster
+NAME                  FESTATUS    BESTATUS    CNSTATUS   BROKERSTATUS
+doriscluster-sample   available   available
+```
+
+## Scale out and scale in
+
+On K8S, all operations are performed by declaring the desired final state of a resource; the Operator then carries them out automatically. To scale a component, run `kubectl --namespace {namespace} edit doriscluster {dcr_name}` to open the resource in edit mode and change the `replicas` value of the corresponding spec. After you save and exit, Doris-Operator applies the change. Alternatively, use the following commands to scale each component.
+
+### FE scale-out
+
+**1. Check the current number of FE pods**
+
+```shell
+$ kubectl --namespace doris get pods -l "app.kubernetes.io/component=fe"
+NAME                       READY   STATUS    RESTARTS   AGE
+doriscluster-sample-fe-0   1/1     Running   0          10d
+```
+
+**2. Scale out FE**
+
+```shell
+$ kubectl --namespace doris patch doriscluster doriscluster-sample --type merge --patch '{"spec":{"feSpec":{"replicas":3}}}'
+```
+
+**3. Check the scale-out result**
+
+```shell
+$ kubectl --namespace doris get pods -l "app.kubernetes.io/component=fe"
+NAME                       READY   STATUS    RESTARTS   AGE
+doriscluster-sample-fe-2   1/1     Running   0          9m37s
+doriscluster-sample-fe-1   1/1     Running   0          9m37s
+doriscluster-sample-fe-0   1/1     Running   0          8m49s
+```
+
+### BE scale-out
+
+**1. Check the current number of BE pods**
+
+```shell
+$ kubectl --namespace doris get pods -l "app.kubernetes.io/component=be"
+NAME                       READY   STATUS    RESTARTS   AGE
+doriscluster-sample-be-0   1/1     Running   0          3d2h
+```
+
+**2. Scale out BE**
+
+```shell
+$ kubectl --namespace doris patch doriscluster doriscluster-sample --type merge --patch '{"spec":{"beSpec":{"replicas":3}}}'
+```
+
+**3. Check the scale-out result**
+
+```shell
+$ kubectl --namespace doris get pods -l "app.kubernetes.io/component=be"
+NAME                       READY   STATUS    RESTARTS   AGE
+doriscluster-sample-be-0   1/1     Running   0          3d2h
+doriscluster-sample-be-2   1/1     Running   0          12m
+doriscluster-sample-be-1   1/1     Running   0          12m
+```
+
+### Scale in
+
+Doris-Operator does not currently support taking nodes offline safely. You can still reduce the number of FE or BE nodes by lowering the `replicas` value of the cluster component, but the removed nodes are simply stopped: the current version of Doris-Operator does not [decommission](../../sql-manual/sql-reference/Cluster-Management-Statements/ALTER-SYSTEM-DECOMMISSION-BACKEND) nodes, which would safely migrate their replicas before taking them offline. This can lead to the following problems and precautions:
+
+- If a table has only a single replica, stopping a BE node loses any data whose only replica is on that node. Avoid this operation whenever possible.
+- Avoid taking FE Follower nodes offline casually; doing so can corrupt metadata and affect the service.
+- FE Observer nodes can be taken offline at any time without risk.
+- CN nodes do not hold data replicas and can be taken offline at any time. However, the remote data cache held on the CN node is lost, which causes a temporary drop in query performance.
diff --git a/docs/en/docs/install/k8s-deploy/root-user-use.md b/docs/en/docs/install/k8s-deploy/root-user-use.md
new file mode 100644
index 0000000000..38bd73ed65
--- /dev/null
+++ b/docs/en/docs/install/k8s-deploy/root-user-use.md
@@ -0,0 +1,63 @@
+---
+{
+    "title": "Using the root user",
+    "language": "en"
+}
+---
+
+Doris-Operator uses the `root` account without a password when deploying and managing service nodes. The username and password can only be changed after deployment.
+
+## Modify the root account and password
+
+1. Following the [Privilege Management](../../admin-manual/privilege-ldap/user-privilege) document, change the password or create the account in Doris, and grant that account the privileges required to manage nodes.
+
+2. Add `spec.adminUser.*` to the configuration in the DorisCluster CRD file, for example:
+
+```yaml
+apiVersion: doris.selectdb.com/v1
+kind: DorisCluster
+metadata:
+  annotations:
+    selectdb/doriscluster: doriscluster-sample
+  labels:
+    app.kubernetes.io/instance: doris-sample
+  name: doris-sample
+  namespace: doris
+spec:
+  adminUser:
+    name: root
+    password: root_pwd
+```
+
+3. Apply the new account and password to the deployed DorisCluster. Doris-Operator then distributes them to every node so that the change takes effect. Reference command:
+
+```shell
+kubectl apply --namespace ${your_namespace} -f ${your_crd_yaml_file}
+```
+
+### Precautions
+
+- The cluster management account is `root`, with no password by default.
+- The username and password can only be changed after a successful deployment. Adding `adminUser` during the initial deployment may cause startup failures.
+- Changing the username and password is optional. You only need to distribute the change through Doris-Operator after you have modified the current cluster management user (`root` by default) or its password inside Doris.
+- If you change the username `spec.adminUser.name`, you must grant the new user the privileges required to manage Doris nodes.
+- This operation restarts all nodes one by one.
diff --git a/docs/sidebars.json b/docs/sidebars.json
index 9a16dab759..48b3c3eea5 100644
--- a/docs/sidebars.json
+++ b/docs/sidebars.json
@@ -28,7 +28,10 @@
                 "install/k8s-deploy/operator-deploy",
                 "install/k8s-deploy/helm-chart-deploy",
                 "install/k8s-deploy/network",
-                "install/k8s-deploy/persistent-volume"
+                "install/k8s-deploy/persistent-volume",
+                "install/k8s-deploy/root-user-use",
+                "install/k8s-deploy/expansion-and-contraction",
+                "install/k8s-deploy/debug-crash"
             ]
         },
         {
diff --git a/docs/zh-CN/docs/install/k8s-deploy/debug-crash.md b/docs/zh-CN/docs/install/k8s-deploy/debug-crash.md
new file mode 100644
index 0000000000..f135f761ab
--- /dev/null
+++ b/docs/zh-CN/docs/install/k8s-deploy/debug-crash.md
@@ -0,0 +1,73 @@
+---
+{
+    "title": "How to enter the container when the service crashes",
+    "language": "zh-CN"
+}
+---
+
+In a k8s environment, a service can enter the `CrashLoopBackOff` state for unexpected reasons. Run `kubectl get pod --namespace ${namespace}` to view the status and `pod_name` of the pods in the specified namespace.
+
+In this state, `kubectl describe` and `kubectl logs` alone cannot determine why the service fails. When the service is in `CrashLoopBackOff`, a mechanism is needed that lets the pod running the service reach the `Running` state, so that users can enter the container with `kubectl exec` and debug it.
+
+doris-operator provides a `debug` run mode. In essence, a debug process occupies the probe port of the corresponding node, which bypasses the k8s probe mechanism and creates a stably running container environment in which users can locate the problem.
+
+The following describes how to enter `debug` mode for manual debugging when the service is in `CrashLoopBackOff`, and how to return to normal startup once the problem is solved.
+
+## Start Debug mode
+
+When a pod of the service enters `CrashLoopBackOff`, or can no longer start normally during regular operation, take the following steps to put the service into `debug` mode and start it manually to find the problem.
+
+**1. Add the annotation to the problematic pod with the following command:**
+
+```shell
+$ kubectl annotate pod ${pod_name} --namespace ${namespace} selectdb.com.doris/runmode=debug
+```
+
+The next time the service restarts, it detects the annotation that marks a `debug`-mode startup and starts in `debug` mode; the pod status becomes `Running`.
+
+**2. Once the service is in `debug` mode, its pod is shown as healthy. Enter the pod with the following command:**
+
+```shell
+$ kubectl --namespace ${namespace} exec -ti ${pod_name} -- bash
+```
+
+**3. Start the service manually in `debug` mode. After entering the pod, change the relevant ports in the corresponding configuration file, then run the `start_xx.sh` script manually. The script is located under `/opt/apache-doris/xx/bin`, where `xx` is `fe` or `be`.**
+
+For FE, change `query_port`; for BE, change `heartbeat_service_port`.
+This mainly prevents traffic from being routed through the Service to the crashed node while it runs in `debug` mode.
+
+## Exit Debug mode
+
+Once the problem has been located, exit `debug` mode by deleting the corresponding pod with the following command; the service then starts in normal mode.
+
+```shell
+$ kubectl delete pod ${pod_name} --namespace ${namespace}
+```
+
+## Precautions
+
+**After entering the pod, you must change the port settings in the configuration file before you can manually start the corresponding Doris component.**
+
+- For FE, change the `http_port=8030` setting in the default configuration file `/opt/apache-doris/fe/conf/fe.conf`.
+- For BE, change the `webserver_port=8040` setting in the default configuration file `/opt/apache-doris/be/conf/be.conf`.
+
diff --git a/docs/zh-CN/docs/install/k8s-deploy/expansion-and-contraction.md b/docs/zh-CN/docs/install/k8s-deploy/expansion-and-contraction.md
new file mode 100644
index 0000000000..9659e0b478
--- /dev/null
+++ b/docs/zh-CN/docs/install/k8s-deploy/expansion-and-contraction.md
@@ -0,0 +1,100 @@
+---
+{
+    "title": "Service scale-out and scale-in",
+    "language": "zh-CN"
+}
+---
+
+Doris on K8S is scaled out or in by modifying the `replicas` field of the corresponding component in the DorisCluster resource. You can either edit the resource directly or make the change with a command.
+
+## Get DorisCluster resources
+
+Use `kubectl --namespace {namespace} get doriscluster` to get the name of the deployed DorisCluster resource (dcr for short). This document uses `doris` as the namespace.
+
+```shell
+$ kubectl --namespace doris get doriscluster
+NAME                  FESTATUS    BESTATUS    CNSTATUS   BROKERSTATUS
+doriscluster-sample   available   available
+```
+
+## Scale resources out and in
+
+On K8S, all operations are performed by declaring the desired final state of a resource; the Operator then carries them out automatically. To scale a component, run `kubectl --namespace {namespace} edit doriscluster {dcr_name}` to open the resource in edit mode and change the `replicas` value of the corresponding spec. After you save and exit, Doris-Operator applies the change. Alternatively, use the following commands to scale each component.
+
+### FE scale-out
+
+**1. Check the current number of FE pods**
+
+```shell
+$ kubectl --namespace doris get pods -l "app.kubernetes.io/component=fe"
+NAME                       READY   STATUS    RESTARTS   AGE
+doriscluster-sample-fe-0   1/1     Running   0          10d
+```
+
+**2. Scale out FE**
+
+```shell
+$ kubectl --namespace doris patch doriscluster doriscluster-sample --type merge --patch '{"spec":{"feSpec":{"replicas":3}}}'
+```
+
+**3. Check the scale-out result**
+
+```shell
+$ kubectl --namespace doris get pods -l "app.kubernetes.io/component=fe"
+NAME                       READY   STATUS    RESTARTS   AGE
+doriscluster-sample-fe-2   1/1     Running   0          9m37s
+doriscluster-sample-fe-1   1/1     Running   0          9m37s
+doriscluster-sample-fe-0   1/1     Running   0          8m49s
+```
+
+### BE scale-out
+
+**1. Check the current number of BE pods**
+
+```shell
+$ kubectl --namespace doris get pods -l "app.kubernetes.io/component=be"
+NAME                       READY   STATUS    RESTARTS   AGE
+doriscluster-sample-be-0   1/1     Running   0          3d2h
+```
+
+**2. Scale out BE**
+
+```shell
+$ kubectl --namespace doris patch doriscluster doriscluster-sample --type merge --patch '{"spec":{"beSpec":{"replicas":3}}}'
+```
+
+**3. Check the scale-out result**
+
+```shell
+$ kubectl --namespace doris get pods -l "app.kubernetes.io/component=be"
+NAME                       READY   STATUS    RESTARTS   AGE
+doriscluster-sample-be-0   1/1     Running   0          3d2h
+doriscluster-sample-be-2   1/1     Running   0          12m
+doriscluster-sample-be-1   1/1     Running   0          12m
+```
+
+### Node scale-in
+
+Doris-Operator does not currently support taking nodes offline safely. You can still reduce the number of FE or BE nodes by lowering the `replicas` value of the cluster component, but the removed nodes are simply stopped: the current version of Doris-Operator does not [decommission](../../sql-manual/sql-reference/Cluster-Management-Statements/ALTER-SYSTEM-DECOMMISSION-BACKEND) nodes, which would safely migrate their replicas before taking them offline. This can lead to the following problems and precautions:
+
+- If a table has only a single replica, stopping a BE node loses any data whose only replica is on that node. Avoid this operation whenever possible.
+- Avoid taking FE Follower nodes offline casually; doing so can corrupt metadata and affect the service.
+- FE Observer nodes can be taken offline at any time without risk.
+- CN nodes do not hold data replicas and can be taken offline at any time. However, the remote data cache held on the CN node is lost, which causes a temporary drop in query performance.
diff --git a/docs/zh-CN/docs/install/k8s-deploy/root-user-use.md b/docs/zh-CN/docs/install/k8s-deploy/root-user-use.md
new file mode 100644
index 0000000000..b07ed2b2f7
--- /dev/null
+++ b/docs/zh-CN/docs/install/k8s-deploy/root-user-use.md
@@ -0,0 +1,63 @@
+---
+{
+    "title": "Using the root user",
+    "language": "zh-CN"
+}
+---
+
+Doris-Operator uses the `root` account without a password when deploying and managing service nodes. The username and password can only be changed after deployment.
+
+## Modify the root account and its password
+
+1. Following the [Privilege Management](../../admin-manual/privilege-ldap/user-privilege) document, change the password or create the account in Doris, and grant that account the privileges required to manage nodes.
+
+2. Add `spec.adminUser.*` to the configuration in the DorisCluster CRD file, for example:
+
+```yaml
+apiVersion: doris.selectdb.com/v1
+kind: DorisCluster
+metadata:
+  annotations:
+    selectdb/doriscluster: doriscluster-sample
+  labels:
+    app.kubernetes.io/instance: doris-sample
+  name: doris-sample
+  namespace: doris
+spec:
+  adminUser:
+    name: root
+    password: root_pwd
+```
+
+3. Apply the new account and password to the deployed DorisCluster. Doris-Operator then distributes them to every node so that the change takes effect. Reference command:
+
+```shell
+kubectl apply --namespace ${your_namespace} -f ${your_crd_yaml_file}
+```
+
+### Precautions
+
+- The cluster management account is `root`, with no password by default.
+- The username and password can only be changed after a successful deployment. Adding `adminUser` during the initial deployment may cause startup failures.
+- Changing the username and password is optional. You only need to distribute the change through Doris-Operator after you have modified the current cluster management user (`root` by default) or its password inside Doris.
+- If you change the username `spec.adminUser.name`, you must grant the new user the privileges required to manage Doris nodes.
+- This operation restarts all nodes one by one.