### What problem does this PR solve?
Problem Summary:
HiveMetaStoreCache calls the String overload of FileInputFormat.setInputPaths
to set input paths. That overload treats its argument as a comma-separated
list and splits it on commas, which is not the behavior we want here. As a
result, when a partition value contains a comma, a single partition path is
parsed as several paths, which can lead to errors.
```java
public static void setInputPaths(JobConf conf, String commaSeparatedPaths) {
  setInputPaths(conf, StringUtils.stringToPath(
      getPathStrings(commaSeparatedPaths)));
}
```
To prevent FileInputFormat.setInputPaths from splitting paths on commas,
we use another overload of the method. Instead of passing a
comma-separated string, we explicitly pass a Path object. This overload
escapes each path with StringUtils.escapeString before storing it, so
partition values containing commas are handled correctly.
```java
public static void setInputPaths(JobConf conf, Path... inputPaths) {
  Path path = new Path(conf.getWorkingDirectory(), inputPaths[0]);
  StringBuffer str = new StringBuffer(StringUtils.escapeString(path.toString()));
  for(int i = 1; i < inputPaths.length;i++) {
    str.append(StringUtils.COMMA_STR);
    path = new Path(conf.getWorkingDirectory(), inputPaths[i]);
    str.append(StringUtils.escapeString(path.toString()));
  }
  conf.set(org.apache.hadoop.mapreduce.lib.input.
    FileInputFormat.INPUT_DIR, str.toString());
}
```
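The difference between the two overloads can be illustrated without Hadoop on the classpath. The following is a minimal sketch in plain Java, not Hadoop's actual code: the class `CommaPathDemo`, its three helper methods, and the example path are all hypothetical, and the backslash escaping is assumed to mirror what `StringUtils.escapeString` does for commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CommaPathDemo {
    // Hypothetical stand-in for the String overload: every comma is a separator.
    public static List<String> splitOnEveryComma(String commaSeparatedPaths) {
        return Arrays.asList(commaSeparatedPaths.split(","));
    }

    // Hypothetical stand-in for the Path... overload: escape backslashes and
    // commas inside each path, then join the escaped paths with plain commas.
    public static String escapeAndJoin(String... paths) {
        StringBuilder joined = new StringBuilder();
        for (int i = 0; i < paths.length; i++) {
            if (i > 0) {
                joined.append(',');
            }
            joined.append(paths[i].replace("\\", "\\\\").replace(",", "\\,"));
        }
        return joined.toString();
    }

    // Reads the joined value back: split only on unescaped commas and
    // drop the escape character in front of escaped ones.
    public static List<String> splitUnescaped(String joined) {
        List<String> paths = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < joined.length(); i++) {
            char c = joined.charAt(i);
            if (c == '\\' && i + 1 < joined.length()) {
                current.append(joined.charAt(++i)); // escaped char, keep literally
            } else if (c == ',') {
                paths.add(current.toString());      // unescaped comma ends a path
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        paths.add(current.toString());
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical partition location whose value contains a comma.
        String location = "hdfs://ns/warehouse/t/city=a,b";
        System.out.println(splitOnEveryComma(location).size());
        System.out.println(splitUnescaped(escapeAndJoin(location)).size());
    }
}
```

In this sketch the naive split turns the single partition location into two entries, while the escape-then-split round trip keeps it as one, which is the property the fix relies on.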
### Release note
None
### Guide for test cases

#### General case

- Write `def` before variable names; otherwise they become global variables and may be affected by other cases running in parallel.
  - Problematic code: `ret = ***`
  - Correct code: `def ret = ***`
- Avoid setting global session variables or modifying cluster configurations in cases, as this may affect other cases.
  - Problematic code: `sql """set global enable_pipeline_x_engine=true;"""`
  - Correct code: `sql """set enable_pipeline_x_engine=true;"""`
- If it is necessary to set global variables or modify cluster configurations, specify that the case runs in a nonConcurrent manner.
- For cases involving time-related operations, use fixed time values instead of dynamic values such as the `now()` function, so that cases do not start failing after some time.
  - Problematic code: `sql """select count(*) from table where created < now();"""`
  - Correct code: `sql """select count(*) from table where created < '2023-11-13';"""`
- After a stream load in a case, add a sync to ensure stability when executing in a multi-FE environment.
  - Problematic code: `streamLoad { ... } sql """select count(*) from table """`
  - Correct code: `streamLoad { ... } sql """sync""" sql """select count(*) from table """`
- For UDF cases, make sure to copy the corresponding JAR file to all BE machines.
- Do not create the same table in different cases under the same directory, to avoid conflicts.
- Cases that use injection should be marked as nonConcurrent and must remove the injection after the case has run.

#### Compatibility case

A compatibility case covers resources or rules created on the initial cluster during FE testing or upgrade testing that must still work normally after the cluster is restarted or upgraded, such as permissions and UDFs.
These cases need to be split into two files, load.groovy and xxxx.groovy, placed in one folder, and tagged with the restart_fe group label.