[docs] (DebugPoints) Update docs about Debug Points (#28347)

---------

Co-authored-by: qinhao <qinhao@newland.com.cn>
HowardQin
2023-12-25 09:33:47 +08:00
committed by GitHub
parent b7ae7a07c7
commit ff365ca130
2 changed files with 376 additions and 82 deletions


# Debug Point
A debug point is a piece of code inserted into the FE or BE source. When the program reaches this code, it can change the program's variables or behavior.
It is mainly used in unit tests or regression tests to trigger exceptions that cannot be produced through normal means.
Each debug point has a name, which can be anything you want. There are switches to enable and disable debug points, and you can also pass data to debug points.
Both FE and BE support debug points. After inserting debug point code, FE or BE must be recompiled.
## Code Example
FE example
```java
private Status foo() {
    // dbug_fe_foo_do_nothing is the debug point name.
    // When it is active, DebugPointUtil.isEnable("dbug_fe_foo_do_nothing") returns true.
    if (DebugPointUtil.isEnable("dbug_fe_foo_do_nothing")) {
        return Status.Nothing;
    }
    do_foo_action();
    return Status.Ok;
}
```
BE example
```c++
Status foo() {
    // dbug_be_foo_do_nothing is the debug point name.
    // When it is active, DBUG_EXECUTE_IF executes the code block.
    DBUG_EXECUTE_IF("dbug_be_foo_do_nothing", { return Status.Nothing; });
    do_foo_action();
    return Status.Ok;
}
```
## Global Config
To enable debug points globally, set `enable_debug_points` to true. `enable_debug_points` is located in FE's fe.conf and BE's be.conf.
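For example, the relevant setting could look like this in the two files. This assumes the usual `key = value` syntax of these conf files, and all other settings are omitted:

```
# fe.conf
enable_debug_points = true
# be.conf
enable_debug_points = true
```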
## Activate A Specified Debug Point
After debug points are enabled globally, an http request carrying the debug point name must be sent to the FE or BE node. <br/>
Only after that will the related code be executed when the program reaches the specified debug point.
### API
```
POST /api/debug_point/add/{debug_point_name}[?timeout=<int>&execute=<int>]
```
### Query Parameters
* `debug_point_name`
Debug point name. Required.
* `timeout`
Timeout in seconds. When the timeout expires, the debug point is deactivated. Default is -1, meaning it never times out. Optional.
* `execute`
After activation, the maximum number of times the debug point can be executed. Default is -1, meaning unlimited. Optional.
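The following minimal Java sketch shows how `timeout` and `execute` interact by tracking both limits on an activated debug point. The class `DebugPointState` and its methods are hypothetical. They do not claim to mirror the real FE or BE implementation and only assume the semantics listed above, where -1 means unlimited for both parameters.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of one activated debug point; not the real FE/BE class.
final class DebugPointState {
    private final long expireAtMillis;     // -1 means the debug point never times out
    private final AtomicInteger remaining; // -1 means unlimited executions

    DebugPointState(long nowMillis, long timeoutSeconds, int execute) {
        this.expireAtMillis = timeoutSeconds < 0 ? -1 : nowMillis + timeoutSeconds * 1000;
        this.remaining = new AtomicInteger(execute);
    }

    // Returns true if the debug point should fire at nowMillis, consuming one execution if limited.
    boolean tryExecute(long nowMillis) {
        if (expireAtMillis >= 0 && nowMillis >= expireAtMillis) {
            return false; // timed out, so the debug point is deactivated
        }
        while (true) {
            int left = remaining.get();
            if (left < 0) {
                return true; // execute=-1: no limit
            }
            if (left == 0) {
                return false; // all allowed executions are used up
            }
            if (remaining.compareAndSet(left, left - 1)) {
                return true;
            }
        }
    }

    public static void main(String[] args) {
        // Equivalent of ?execute=5 with the default timeout of -1.
        DebugPointState dp = new DebugPointState(0, -1, 5);
        int fired = 0;
        for (int t = 0; t < 10; t++) {
            if (dp.tryExecute(t)) {
                fired++;
            }
        }
        System.out.println(fired); // 5
    }
}
```

The compare-and-set loop keeps the execution budget correct even if several threads reach the debug point at once.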
### Request body
None
### Response
```
{
msg: "OK",
code: 0
}
```
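A caller can treat `code` 0 as success. The body above is written with unquoted keys, so the following hypothetical Java helper uses a lenient regular expression instead of a JSON parser. It accepts both `code: 0` and `"code": 0`. This is a sketch, not a documented client.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for checking a debug point API response body.
final class DebugPointResponse {
    // Matches `code: 0` as well as `"code": 0`; \b keeps keys such as "errcode" from matching.
    private static final Pattern CODE = Pattern.compile("\"?\\bcode\"?\\s*:\\s*(-?\\d+)");

    // Returns true when the body reports code 0, i.e. the request succeeded.
    static boolean isOk(String body) {
        Matcher m = CODE.matcher(body);
        return m.find() && Integer.parseInt(m.group(1)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(isOk("{ msg: \"OK\", code: 0 }")); // true
    }
}
```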
### Examples
Activate debug point `foo` so that it is executed no more than five times.
```
curl -X POST "http://127.0.0.1:8030/api/debug_point/add/foo?execute=5"
```
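The curl call above can also be issued from Java. The following sketch uses the JDK's `java.net.http` classes (Java 11+) to build the equivalent request without sending it. The class name is illustrative. The basic-auth credentials (`root` with an empty password) mirror `curl -u root:` used elsewhere in this document and are an assumption.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; builds the request that corresponds to the curl example.
final class AddDebugPointRequest {
    static HttpRequest build(String host, int port, String name, int execute, String user, String password) {
        URI uri = URI.create("http://" + host + ":" + port + "/api/debug_point/add/" + name + "?execute=" + execute);
        String basic = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Basic " + basic) // same as curl -u user:password
                .POST(HttpRequest.BodyPublishers.noBody()) // the API takes no request body
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = build("127.0.0.1", 8030, "foo", 5, "root", "");
        System.out.println(req.method() + " " + req.uri());
        // To send it: HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())
    }
}
```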
## Pass Custom Parameters
When activating a debug point, besides `timeout` and `execute` mentioned above, you can also pass custom parameters.<br/>
A parameter is a key-value pair of the form `key=value` in the query string of the url, which follows the debug point name and starts with the character '?'.<br/>
See examples below.
### API
```
POST /api/debug_point/add/{debug_point_name}[?k1=v1&k2=v2&k3=v3...]
```
* `k1=v1` <br/>
k1 is parameter name <br/>
v1 is parameter value <br/>
multiple key-value pairs are concatenated by `&` <br/>
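The following Java sketch assembles such a url from an ordered map. It URL-encodes keys and values with `URLEncoder`. The document does not say how FE or BE decode them, so the encoding is a defensive assumption rather than a documented requirement. `DebugPointUrl` is a hypothetical name.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds "/api/debug_point/add/{name}?k1=v1&k2=v2...".
final class DebugPointUrl {
    static String addPath(String name, Map<String, String> params) {
        StringBuilder sb = new StringBuilder("/api/debug_point/add/").append(name);
        if (params.isEmpty()) {
            return sb.toString();
        }
        StringJoiner query = new StringJoiner("&", "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Encode both sides so characters such as '&', '=' or spaces cannot break the query string.
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.append(query).toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("percent", "0.5");
        params.put("duration", "3");
        System.out.println(addPath("foo", params)); // /api/debug_point/add/foo?percent=0.5&duration=3
    }
}
```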
### Request body
None
### Response
```
{
msg: "OK",
code: 0
}
```
### Examples
Assuming an FE node is configured with http_port=8030 in fe.conf, <br/>
the following http request activates a debug point named `foo` in the FE node and passes parameters `percent` and `duration`:
>NOTE: User name and password may be needed.
```
curl -u root: -X POST "http://127.0.0.1:8030/api/debug_point/add/foo?percent=0.5&duration=3"
```
```
NOTE:
1. Inside FE and BE code, names and values of parameters are taken as strings.
2. Parameter names and values are case sensitive in http request and FE/BE code.
3. FE and BE share the same REST API url paths; only their IPs and ports differ.
```
### Use parameters in FE and BE code
The following request activates debug point `OlapTableSink.write_random_choose_sink` in FE and passes parameters `needCatchUp` and `sinkNum`:
```
curl -u root: -X POST "http://127.0.0.1:8030/api/debug_point/add/OlapTableSink.write_random_choose_sink?needCatchUp=true&sinkNum=3"
```
The code in FE checks debug point `OlapTableSink.write_random_choose_sink` and gets parameter values:
```java
private void debugWriteRandomChooseSink(Tablet tablet, long version, Multimap<Long, Long> bePathsMap) {
    DebugPoint debugPoint = DebugPointUtil.getDebugPoint("OlapTableSink.write_random_choose_sink");
    if (debugPoint == null) {
        return;
    }
    boolean needCatchup = debugPoint.param("needCatchUp", false);
    int sinkNum = debugPoint.param("sinkNum", 0);
    ...
}
```
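`debugPoint.param(name, defaultValue)` in the FE code above returns a typed value, even though parameters arrive as strings. The following hypothetical Java sketch shows how such a method could use the type of the default value to parse the raw string. It is not the actual `DebugPoint` class.

```java
import java.util.Map;

// Hypothetical stand-in for a debug point's parameter lookup.
final class DebugPointParams {
    private final Map<String, String> params; // raw values exactly as received in the url

    DebugPointParams(Map<String, String> params) {
        this.params = params;
    }

    // The default value's type decides how the raw string is parsed; lookup is case-sensitive.
    @SuppressWarnings("unchecked")
    <E> E param(String name, E defaultValue) {
        String raw = params.get(name);
        if (raw == null) {
            return defaultValue; // parameter not passed
        }
        Object value;
        if (defaultValue instanceof Boolean) {
            value = Boolean.parseBoolean(raw);
        } else if (defaultValue instanceof Integer) {
            value = Integer.parseInt(raw);
        } else if (defaultValue instanceof Long) {
            value = Long.parseLong(raw);
        } else if (defaultValue instanceof Double) {
            value = Double.parseDouble(raw);
        } else if (defaultValue instanceof String) {
            value = raw;
        } else {
            throw new IllegalArgumentException("unsupported parameter type: " + defaultValue.getClass());
        }
        return (E) value;
    }

    public static void main(String[] args) {
        DebugPointParams dp = new DebugPointParams(Map.of("needCatchUp", "true", "sinkNum", "3"));
        boolean needCatchUp = dp.param("needCatchUp", false);
        int sinkNum = dp.param("sinkNum", 0);
        System.out.println(needCatchUp + " " + sinkNum); // true 3
    }
}
```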
The following request activates debug point `TxnManager.prepare_txn.random_failed` in BE and passes parameter `percent`:
```
curl -X POST "http://127.0.0.1:8040/api/debug_point/add/TxnManager.prepare_txn.random_failed?percent=0.7"
```
The code in BE checks debug point `TxnManager.prepare_txn.random_failed` and gets parameter value:
```c++
DBUG_EXECUTE_IF("TxnManager.prepare_txn.random_failed", {
    if (rand() % 100 < (100 * dp->param("percent", 0.5))) {
        LOG_WARNING("TxnManager.prepare_txn.random_failed random failed");
        return Status::InternalError("debug prepare txn random failed");
    }
});
```
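The BE condition makes the call fail for roughly `percent` of the executions, with 0.5 as the default when `percent` is not passed. The Java sketch below simulates the same check so the resulting failure rate can be observed. The class name is hypothetical, and `Random.nextInt(100)` stands in for `rand() % 100`. Note that `rand() % 100` in C++ carries a slight modulo bias that `nextInt(100)` avoids.

```java
import java.util.Random;

// Hypothetical simulation of the BE check `rand() % 100 < 100 * percent`.
final class RandomFailureSketch {
    // Draw an integer in [0, 100) and fail when it is below 100 * percent.
    static boolean shouldFail(Random rnd, double percent) {
        return rnd.nextInt(100) < 100 * percent;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed so the run is reproducible
        int trials = 100_000;
        int failures = 0;
        for (int i = 0; i < trials; i++) {
            if (shouldFail(rnd, 0.7)) {
                failures++;
            }
        }
        System.out.printf("failure rate ~ %.3f%n", (double) failures / trials);
    }
}
```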
## Disable Debug Point
### API
### Response
```
{
msg: "OK",
code: 0
}
```
### Examples
Disable debug point `foo`.
```
curl -X POST "http://127.0.0.1:8030/api/debug_point/remove/foo"
```
## Clear Debug Points
### API
```
POST /api/debug_point/clear
```
### Request body
None
### Response
```
{
msg: "OK",
code: 0
}
```
### Examples
```
curl -X POST "http://127.0.0.1:8030/api/debug_point/clear"
```
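The remove and clear endpoints take no request body, so they are easy to build programmatically. Below is a minimal Java sketch using `java.net.http.HttpRequest` (Java 11+). `DebugPointAdmin` is a hypothetical name. The base urls follow the ports used in the examples above: 8030 for FE and 8040 for BE.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper for the remove and clear endpoints of one FE or BE node.
final class DebugPointAdmin {
    private final String base; // e.g. "http://127.0.0.1:8030" for FE or "http://127.0.0.1:8040" for BE

    DebugPointAdmin(String base) {
        this.base = base;
    }

    // POST /api/debug_point/remove/{debug_point_name}
    HttpRequest remove(String name) {
        return post("/api/debug_point/remove/" + name);
    }

    // POST /api/debug_point/clear
    HttpRequest clear() {
        return post("/api/debug_point/clear");
    }

    private HttpRequest post(String path) {
        return HttpRequest.newBuilder(URI.create(base + path))
                .POST(HttpRequest.BodyPublishers.noBody()) // neither endpoint takes a body
                .build();
    }

    public static void main(String[] args) {
        DebugPointAdmin fe = new DebugPointAdmin("http://127.0.0.1:8030");
        System.out.println(fe.remove("foo").uri());
        System.out.println(fe.clear().uri());
    }
}
```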
## Debug Points in Regression Test
> In the community's CI system, the `enable_debug_points` configuration of FE and BE is true by default.
The regression test framework also provides methods to activate and deactivate a specified debug point. <br/>
They are declared as follows:
```groovy
// "name" is the debug point to activate; "params" is a map of key-value pairs passed to the debug point
def enableDebugPointForAllFEs(String name, Map<String, String> params = null);
def enableDebugPointForAllBEs(String name, Map<String, String> params = null);
// "name" is the debug point to deactivate
def disableDebugPointForAllFEs(String name);
def disableDebugPointForAllBEs(String name);
```
`enableDebugPointForAllFEs()` or `enableDebugPointForAllBEs()` must be called before the test actions that are expected to trigger the error, <br/>
and `disableDebugPointForAllFEs()` or `disableDebugPointForAllBEs()` must be called afterward.
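Judging by their names, the `...ForAll...` helpers presumably send the same REST request to every FE or BE node. The following Java sketch illustrates that idea by producing one url per node. The class, the node addresses and the unencoded query string are all assumptions, not the framework's actual code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: one REST url per node for the same debug point.
final class ForAllNodes {
    static List<URI> enableForAll(List<String> hostPorts, String name, Map<String, String> params) {
        // Parameters are appended without URL encoding to keep the sketch short.
        String query = (params == null || params.isEmpty()) ? ""
                : params.entrySet().stream()
                        .map(e -> e.getKey() + "=" + e.getValue())
                        .collect(Collectors.joining("&", "?", ""));
        return toUris(hostPorts, "/api/debug_point/add/" + name + query);
    }

    static List<URI> disableForAll(List<String> hostPorts, String name) {
        return toUris(hostPorts, "/api/debug_point/remove/" + name);
    }

    private static List<URI> toUris(List<String> hostPorts, String path) {
        List<URI> uris = new ArrayList<>();
        for (String hostPort : hostPorts) {
            uris.add(URI.create("http://" + hostPort + path));
        }
        return uris;
    }

    public static void main(String[] args) {
        // Hypothetical BE addresses; a real test would obtain them from the cluster.
        List<String> bes = List.of("10.0.0.1:8040", "10.0.0.2:8040");
        enableForAll(bes, "Tablet.build_tablet_report_info.version_miss", Map.of("timeout", "1"))
                .forEach(System.out::println);
        disableForAll(bes, "Tablet.build_tablet_report_info.version_miss").forEach(System.out::println);
    }
}
```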
### Concurrency Issue
An enabled debug point affects FE or BE globally, which could cause other tests running concurrently in your pull request to fail unexpectedly. <br/>
To avoid this, by convention, regression tests that use debug points must be placed in the directory regression-test/suites/fault_injection_p0, <br/>
and their group name must be "nonConcurrent", because the pull request workflow executes these regression tests serially.
### Examples
```groovy
// The .groovy file of the test case must be in regression-test/suites/fault_injection_p0
// and the group name must be 'nonConcurrent'
suite('debugpoint_action', 'nonConcurrent') {
    try {
        // Activate the debug point named "PublishVersionDaemon.stop_publish" in all FEs
        // and pass parameter "timeout".
        // "execute" and "timeout" are built-in parameters whose usage is described above
        GetDebugPoint().enableDebugPointForAllFEs('PublishVersionDaemon.stop_publish', [timeout:1])
        // Activate the debug point named "Tablet.build_tablet_report_info.version_miss" in all BEs
        // and pass parameters "tablet_id", "version_miss" and "timeout"
        GetDebugPoint().enableDebugPointForAllBEs('Tablet.build_tablet_report_info.version_miss',
                [tablet_id:'12345', version_miss:true, timeout:1])
        // Test actions that will run into the debug points and generate errors
        sql """CREATE TABLE tbl_1 (k1 INT, k2 INT)
            DUPLICATE KEY (k1)
            DISTRIBUTED BY HASH(k1)
            BUCKETS 3
            PROPERTIES ("replication_allocation" = "tag.location.default: 1");
            """
        sql "INSERT INTO tbl_1 VALUES (1, 10)"
        sql "INSERT INTO tbl_1 VALUES (2, 20)"
        order_qt_select_1_1 'SELECT * FROM tbl_1'
    } finally {
        // Deactivate the debug points
        GetDebugPoint().disableDebugPointForAllFEs('PublishVersionDaemon.stop_publish')
        GetDebugPoint().disableDebugPointForAllBEs('Tablet.build_tablet_report_info.version_miss')
    }
}
```
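The `try`/`finally` in the example guarantees that the debug points are deactivated even when a test action fails. In Java, try-with-resources can express the same guarantee. The hypothetical `EnabledDebugPoint` below takes the enable and disable actions as callbacks, so it can be exercised without a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical handle: closing it disables the debug point, even if the test body throws.
final class EnabledDebugPoint implements AutoCloseable {
    private final String name;
    private final Consumer<String> disable;

    private EnabledDebugPoint(String name, Consumer<String> disable) {
        this.name = name;
        this.disable = disable;
    }

    static EnabledDebugPoint enable(String name, Consumer<String> enable, Consumer<String> disable) {
        enable.accept(name);
        return new EnabledDebugPoint(name, disable);
    }

    @Override
    public void close() {
        disable.accept(name);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        try (EnabledDebugPoint dp = EnabledDebugPoint.enable("foo",
                n -> log.add("enable " + n), n -> log.add("disable " + n))) {
            log.add("run test actions");
            throw new IllegalStateException("simulated test failure");
        } catch (IllegalStateException e) {
            log.add("caught " + e.getMessage());
        }
        System.out.println(log); // [enable foo, run test actions, disable foo, caught simulated test failure]
    }
}
```

The resource is closed before the `catch` block runs, so the debug point is already disabled when the failure is handled.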

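# Debug Point
Code stubbing means inserting a piece of code into the FE or BE source. When the program reaches it, the code can change the program's variables or behavior. Such a piece of code is called a debug point.
Debug points are mainly used in unit tests or regression tests to construct exceptions that cannot be produced by normal means.
Each debug point has a name, which can be chosen freely. Mechanisms are provided to switch a debug point on and off, and parameters can also be passed to it.
Both FE and BE support debug points. After inserting debug point code, BE or FE must be recompiled.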
## Code Example
BE example
```c++
Status foo() {
    // dbug_be_foo_do_nothing is the debug point name.
    // After this debug point is turned on, DBUG_EXECUTE_IF executes the code block in the macro argument.
    DBUG_EXECUTE_IF("dbug_be_foo_do_nothing", { return Status.Nothing; });
    do_foo_action();
    return Status.Ok;
}
```
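## Activate A Specified Debug Point
After the global switch is turned on, you also need to send an http request to FE or BE to turn a debug point with a specified name on or off. Only then will the related code be executed when the program reaches that debug point.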
### API
```
POST /api/debug_point/add/{debug_point_name}[?timeout=<int>&execute=<int>]
```
### Query Parameters
* `debug_point_name`
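Debug point name. Required.
* `timeout`
Timeout in seconds. After the timeout, the debug point is deactivated. The default value -1 means it never times out. Optional.
* `execute`
Maximum number of times the debug point can be executed. The default value -1 means unlimited. Optional.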
### Request body
### Response
```
{
msg: "OK",
code: 0
}
```
### Examples
Activate debug point `foo` so that it is executed at most 5 times.
```
curl -X POST "http://127.0.0.1:8030/api/debug_point/add/foo?execute=5"
```
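## Pass Custom Parameters
When activating a debug point, besides `timeout` and `execute` described above, you can also pass other custom parameters.<br/>
A parameter is a key-value pair of the form key=value, placed in the query string of the url right after the debug point name and starting with the character '?'.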
### API
```
POST /api/debug_point/add/{debug_point_name}[?k1=v1&k2=v2&k3=v3...]
```
* `k1=v1`
k1 is the parameter name and v1 is the parameter value; multiple parameters are separated by `&`.
### Request body
### Response
```
{
msg: "OK",
code: 0
}
```
### Examples
Assuming an FE node is configured with http_port=8030 in fe.conf, the following request activates debug point `foo` in FE and passes two parameters, `percent` and `duration`:
```
curl -u root: -X POST "http://127.0.0.1:8030/api/debug_point/add/foo?percent=0.5&duration=3"
```
```
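NOTE:
1. In FE or BE code, parameter names and values are both strings.
2. Parameter names and values are case sensitive, both in FE or BE code and in http requests.
3. Http requests sent to FE or BE use the same path format; only the IP address and port differ.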
```
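The following hypothetical Java sketch illustrates notes 1 and 2. It splits a query string into a map of strings. A plain `Map` lookup is case-sensitive, so `percent` and `Percent` are different names. This document does not describe how FE and BE actually parse the query string.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser: every name and value stays a (case-sensitive) string.
final class QueryParams {
    static Map<String, String> parse(String query) {
        Map<String, String> out = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return out;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            out.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                    URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> p = parse("percent=0.5&duration=3");
        System.out.println(p.get("percent")); // 0.5
        System.out.println(p.get("Percent")); // null: names are case-sensitive
    }
}
```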
### Use Parameters in FE and BE Code
Activate debug point `OlapTableSink.write_random_choose_sink` in FE and pass parameters `needCatchUp` and `sinkNum`:
>NOTE: User name and password may be needed.
```
curl -u root: -X POST "http://127.0.0.1:8030/api/debug_point/add/OlapTableSink.write_random_choose_sink?needCatchUp=true&sinkNum=3"
```
Use parameters `needCatchUp` and `sinkNum` of debug point `OlapTableSink.write_random_choose_sink` in FE code:
```java
private void debugWriteRandomChooseSink(Tablet tablet, long version, Multimap<Long, Long> bePathsMap) {
    DebugPoint debugPoint = DebugPointUtil.getDebugPoint("OlapTableSink.write_random_choose_sink");
    if (debugPoint == null) {
        return;
    }
    boolean needCatchup = debugPoint.param("needCatchUp", false);
    int sinkNum = debugPoint.param("sinkNum", 0);
    ...
}
```
Activate debug point `TxnManager.prepare_txn.random_failed` in BE and pass parameter `percent`:
```
curl -X POST "http://127.0.0.1:8040/api/debug_point/add/TxnManager.prepare_txn.random_failed?percent=0.7"
```
Use parameter `percent` of debug point `TxnManager.prepare_txn.random_failed` in BE code:
```c++
DBUG_EXECUTE_IF("TxnManager.prepare_txn.random_failed", {
    if (rand() % 100 < (100 * dp->param("percent", 0.5))) {
        LOG_WARNING("TxnManager.prepare_txn.random_failed random failed");
        return Status::InternalError("debug prepare txn random failed");
    }
});
```
## Disable Debug Point
### API
```
POST /api/debug_point/remove/{debug_point_name}
```
### Response
```
{
msg: "OK",
code: 0
}
```
### Examples
Disable debug point `foo`.
```
curl -X POST "http://127.0.0.1:8030/api/debug_point/remove/foo"
```
## Clear Debug Points
### API
```
POST /api/debug_point/clear
```
### Request body
### Response
```
{
msg: "OK",
code: 0
}
```
### Examples
Clear all debug points.
```
curl -X POST "http://127.0.0.1:8030/api/debug_point/clear"
```
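## Debug Points in Regression Test
> When a PR is submitted, the community CI system enables the `enable_debug_points` configuration of FE and BE by default.
The regression test framework provides methods to turn a specified debug point on and off. They are declared as follows: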
```groovy
// Turn on a debug point: name is the debug point name, params is a key-value map of parameters passed to it
def enableDebugPointForAllFEs(String name, Map<String, String> params = null);
def enableDebugPointForAllBEs(String name, Map<String, String> params = null);
// Turn off a debug point: name is the debug point name
def disableDebugPointForAllFEs(String name);
def disableDebugPointForAllBEs(String name);
```
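Call `enableDebugPointForAllFEs()` or `enableDebugPointForAllBEs()` before the test actions to turn on the debug point, <br/>
so that the related code is executed when the program reaches the debug point, <br/>
and then call `disableDebugPointForAllFEs()` or `disableDebugPointForAllBEs()` after the test actions to turn it off.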
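### Concurrency Issue
A debug point turned on in FE or BE takes effect globally, so other tests running concurrently in the same pull request may be affected and fail unexpectedly.
To avoid this, regression tests that use debug points must be placed in the regression-test/suites/fault_injection_p0 directory,
and their group name must be set to `nonConcurrent`; the community CI system runs these test cases serially.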
### Examples
```groovy
// The .groovy file of the test case must be placed in regression-test/suites/fault_injection_p0,
// and its group name must be set to 'nonConcurrent'
suite('debugpoint_action', 'nonConcurrent') {
    try {
        // Turn on the debug point named "PublishVersionDaemon.stop_publish" in all FEs
        // and pass parameter timeout.
        // As with the curl calls above, execute is the number of executions and timeout is the timeout in seconds
        GetDebugPoint().enableDebugPointForAllFEs('PublishVersionDaemon.stop_publish', [timeout:1])
        // Turn on the debug point named "Tablet.build_tablet_report_info.version_miss" in all BEs
        // and pass parameters tablet_id, version_miss and timeout
        GetDebugPoint().enableDebugPointForAllBEs('Tablet.build_tablet_report_info.version_miss',
                [tablet_id:'12345', version_miss:true, timeout:1])
        // Test actions that trigger execution of the debug point code
        sql """CREATE TABLE tbl_1 (k1 INT, k2 INT)
            DUPLICATE KEY (k1)
            DISTRIBUTED BY HASH(k1)
            BUCKETS 3
            PROPERTIES ("replication_allocation" = "tag.location.default: 1");
            """
        sql "INSERT INTO tbl_1 VALUES (1, 10)"
        sql "INSERT INTO tbl_1 VALUES (2, 20)"
        order_qt_select_1_1 'SELECT * FROM tbl_1'
    } finally {
        GetDebugPoint().disableDebugPointForAllFEs('PublishVersionDaemon.stop_publish')
        GetDebugPoint().disableDebugPointForAllBEs('Tablet.build_tablet_report_info.version_miss')
    }
}
```