### What problem does this PR solve?
Problem Summary:
In `HiveMetaStoreCache`, `FileInputFormat.setInputPaths` is used to set
input paths. However, the `String` overload of this method splits its
argument on commas, which is not the expected behavior here. As a
result, when a partition value contains a comma, the path is parsed
incorrectly and errors can occur.
```java
public static void setInputPaths(JobConf conf, String commaSeparatedPaths) {
    setInputPaths(conf, StringUtils.stringToPath(
            getPathStrings(commaSeparatedPaths)));
}
```
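For illustration (not code from this PR), a minimal sketch of the symptom, assuming a hypothetical partition path `/warehouse/tbl/p=a,b`:
```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class CommaSplitDemo {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // The String overload splits on the comma inside the partition value...
        FileInputFormat.setInputPaths(conf, "/warehouse/tbl/p=a,b");
        // ...so two input paths come back instead of one.
        for (Path p : FileInputFormat.getInputPaths(conf)) {
            System.out.println(p);
        }
    }
}
```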
To prevent `FileInputFormat.setInputPaths` from splitting paths on
commas, we use another overload of the method: instead of passing a
comma-separated string, we explicitly pass `Path` objects, so partition
values containing commas are handled correctly (each path is escaped
before being stored).
```java
public static void setInputPaths(JobConf conf, Path... inputPaths) {
    Path path = new Path(conf.getWorkingDirectory(), inputPaths[0]);
    StringBuffer str = new StringBuffer(StringUtils.escapeString(path.toString()));
    for (int i = 1; i < inputPaths.length; i++) {
        str.append(StringUtils.COMMA_STR);
        path = new Path(conf.getWorkingDirectory(), inputPaths[i]);
        str.append(StringUtils.escapeString(path.toString()));
    }
    conf.set(org.apache.hadoop.mapreduce.lib.input.FileInputFormat.INPUT_DIR,
            str.toString());
}
```
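A minimal sketch of the fix as seen from caller code (the path is again hypothetical): passing a `Path` keeps the comma intact, because this overload escapes each path before storing it in `INPUT_DIR`:
```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class PathOverloadDemo {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // The Path... overload escapes the embedded comma, so the value
        // survives the round trip as a single input path.
        FileInputFormat.setInputPaths(conf, new Path("/warehouse/tbl/p=a,b"));
        for (Path p : FileInputFormat.getInputPaths(conf)) {
            System.out.println(p); // one path: /warehouse/tbl/p=a,b
        }
    }
}
```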
### Release note
None
### What problem does this PR solve?
bp https://github.com/apache/doris/pull/47697
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
- [ ] Regression test
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [x] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [x] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [x] No.
- [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
- [x] No.
- [ ] Yes. <!-- Add document PR link here. eg:
https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
Problem Summary:
The [External hive
CI](http://43.132.222.7:8111/buildConfiguration/Doris_External_Regression/612304?buildTab=log&linesState=3650&logView=flowAware)
failed because of a `namenode` error (port 50070 was already in use).
Docker logs:
```txt
2025-01-21T04:22:37.955682469Z java.net.BindException: Port in use: 0.0.0.0:50070
2025-01-21T04:22:37.955686106Z at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:940)
2025-01-21T04:22:37.955689402Z at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:876)
2025-01-21T04:22:37.955692708Z at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:142)
2025-01-21T04:22:37.955697828Z at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:760)
2025-01-21T04:22:37.955701444Z at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:639)
2025-01-21T04:22:37.955704831Z at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:819)
2025-01-21T04:22:37.955708237Z at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:803)
2025-01-21T04:22:37.955711674Z at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1500)
2025-01-21T04:22:37.955715090Z at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1566)
2025-01-21T04:22:37.955718446Z Caused by: java.net.BindException: Address already in use
2025-01-21T04:22:37.955722013Z at sun.nio.ch.Net.bind0(Native Method)
2025-01-21T04:22:37.955725460Z at sun.nio.ch.Net.bind(Net.java:433)
2025-01-21T04:22:37.955729227Z at sun.nio.ch.Net.bind(Net.java:425)
2025-01-21T04:22:37.955733074Z at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
2025-01-21T04:22:37.955736600Z at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
2025-01-21T04:22:37.955740197Z at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
2025-01-21T04:22:37.955743884Z at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:934)
2025-01-21T04:22:37.955747391Z ... 8 more
2025-01-21T04:22:37.961686454Z 25/01/21 04:22:37 INFO util.ExitUtil: Exiting with status 1
```
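For illustration only (not code from this PR): the failure is the generic address-already-in-use race, which a minimal Java sketch reproduces by binding the same port twice (the port number is arbitrary):
```java
import java.net.BindException;
import java.net.ServerSocket;

public class BindConflictDemo {
    public static void main(String[] args) throws Exception {
        // First listener takes the port, standing in for whichever process
        // grabbed 50070 before the namenode started.
        try (ServerSocket first = new ServerSocket(50070)) {
            try (ServerSocket second = new ServerSocket(50070)) {
                System.out.println("unreachable: the second bind always fails");
            } catch (BindException e) {
                System.out.println("Port in use: " + e.getMessage());
            }
        }
    }
}
```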
The best fix would be to keep services off ports in the ephemeral range
`/proc/sys/net/ipv4/ip_local_port_range` (32768-60999), so the kernel
never hands a server's port out for an outgoing connection. But since
the namenode [hardcodes port `50070` in the docker
image](https://hub.docker.com/layers/bde2020/hadoop-datanode/2.0.0-hadoop2.7.4-java8/images/sha256-5623fca5e36d890983cdc6cfd29744d1d65476528117975b3af6a80d99b3c62f),
we instead add that port to `net.ipv4.ip_local_reserved_ports` and
introduce a new flag `--reserve-ports` to control it (default false,
because not everyone wants to modify the system's reserved ports).
Change-Id: I03a81e9931cb555695199436b6f0517cccf83588
### What problem does this PR solve?
Problem Summary:
The Oceanbase container sometimes fails to start.
<img width="653" alt="image"
src="https://github.com/user-attachments/assets/d95c66cf-7e04-4179-a565-9b9dd8b87128"
/>
We do two things:
1. Print the last 100 lines of docker logs for any unhealthy container, to aid debugging.
2. Upgrade the Oceanbase docker image to the newest `4.2.1-lts`, which is 7 months newer than `4.2.1` and more stable.
Problem Summary:
Put the `tpch1.db`, `paimon1` and `tvf_data` hive data in parallel,
reducing the time cost from 22m to 16m on a 16-core machine.
Change-Id: Ib75c57d397ce1f96d5108d4b570bcb215f31d421
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Master PR: #45308
Problem Summary:
Adjust the indentation of the `init_be` and `entry_point` scripts, as
well as their loop durations. All indentation now uses a single tab
character as the smallest unit, and the loop that checks the BE startup
status in both scripts was shortened from 300 seconds to 30 seconds to
speed up the overall Docker startup time.
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #45267
Master PR: #45269
Problem Summary:
To support rapid Docker startup, I adjusted two scripts in the Docker
startup flow. First, I added an environment variable
`SKIP_CHECK_ULIMIT` to the `start_be.sh` script, which skips the size
checks for `swap`, `ulimit`, and `max_map_count`. I also start the
process with `--console` so logs print to the foreground; I did not use
`--daemon`, because running in the foreground with log output is the
correct and reliable approach inside a Docker container.
In addition, I added check logic for a `be.conf` configuration item in
the `init_be.sh` script: on first startup, it appends
`export SKIP_CHECK_ULIMIT=true` so the BE process skips the `ulimit`
check. Together, these adjustments meet the basic requirements for
rapid Docker startup.
### What problem does this PR solve?
bp #44001, but without the hive4 ACID table changes.
Problem Summary:
1. Fixed the issue that reading insert-only transactional tables
performed no ACID check, causing data to be read multiple times (i.e.,
data from a previous base_n was read again); see the sketch below.
2. Forbid creating, inserting data into, and deleting ACID tables.
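A hypothetical sketch (not Doris's actual code) of what the ACID check in point 1 must guarantee: only the newest valid `base_N` directory at or below the high-watermark write ID is scanned, otherwise rows from an older base get read a second time. All names here are illustrative.
```java
import java.util.List;

public class AcidBaseChooser {
    // Hypothetical helper: pick the base_N with the highest N that is
    // still <= the high-watermark write ID. Directory names are illustrative.
    static String chooseBase(List<String> dirs, long highWatermark) {
        String best = null;
        long bestN = -1;
        for (String d : dirs) {
            if (!d.startsWith("base_")) {
                continue; // delta directories are handled separately
            }
            long n = Long.parseLong(d.substring("base_".length()));
            if (n <= highWatermark && n > bestN) {
                bestN = n;
                best = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Without the check, base_5 and base_10 might both be scanned and
        // their rows double-counted; with it, only base_10 is read.
        System.out.println(chooseBase(List.of("base_5", "base_10", "delta_11_11"), 12));
    }
}
```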