### What problem does this PR solve?
Problem Summary:
In HiveMetaStoreCache, the function FileInputFormat.setInputPaths is
used to set input paths. However, this function splits paths using
commas, which is not the expected behavior. As a result, when partition
values contain commas, it leads to incorrect path parsing and potential
errors.
```java
public static void setInputPaths(JobConf conf, String commaSeparatedPaths) {
    setInputPaths(conf, StringUtils.stringToPath(
            getPathStrings(commaSeparatedPaths)));
}
```
To prevent FileInputFormat.setInputPaths from splitting paths by commas,
we use another overloaded version of the method. Instead of passing a
comma-separated string, we explicitly pass a Path object, ensuring that
partition values containing commas are handled correctly.
```java
public static void setInputPaths(JobConf conf, Path... inputPaths) {
    Path path = new Path(conf.getWorkingDirectory(), inputPaths[0]);
    StringBuffer str = new StringBuffer(StringUtils.escapeString(path.toString()));
    for (int i = 1; i < inputPaths.length; i++) {
        str.append(StringUtils.COMMA_STR);
        path = new Path(conf.getWorkingDirectory(), inputPaths[i]);
        str.append(StringUtils.escapeString(path.toString()));
    }
    conf.set(org.apache.hadoop.mapreduce.lib.input.FileInputFormat.INPUT_DIR, str.toString());
}
```
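For illustration, here is a minimal sketch (not the actual HiveMetaStoreCache code; the class name and partition location are made up) showing why the `Path...` overload keeps a comma inside a partition value intact:
```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class SetInputPathsExample {
    public static void main(String[] args) {
        JobConf jobConf = new JobConf();
        // Hypothetical partition location whose partition value contains a comma.
        String partitionLocation = "hdfs://ns1/warehouse/tbl/p1=a,b";

        // String overload: "a,b" would be split into two separate (invalid) input paths.
        // FileInputFormat.setInputPaths(jobConf, partitionLocation);

        // Path overload: the location is escaped as a whole, so the comma survives.
        FileInputFormat.setInputPaths(jobConf, new Path(partitionLocation));

        // INPUT_DIR now holds the single escaped path instead of two fragments.
        System.out.println(jobConf.get(
                org.apache.hadoop.mapreduce.lib.input.FileInputFormat.INPUT_DIR));
    }
}
```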
### Release note
None
### What problem does this PR solve?
bp https://github.com/apache/doris/pull/47697
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
- [ ] Regression test
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [x] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [x] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [x] No.
- [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
- [x] No.
- [ ] Yes. <!-- Add document PR link here. eg:
https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
Problem Summary:
The [External hive
CI](http://43.132.222.7:8111/buildConfiguration/Doris_External_Regression/612304?buildTab=log&linesState=3650&logView=flowAware)
failed because of a `namenode` error (port 50070 already in use). Docker logs:
```txt
2025-01-21T04:22:37.955682469Z java.net.BindException: Port in use: 0.0.0.0:50070
2025-01-21T04:22:37.955686106Z at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:940)
2025-01-21T04:22:37.955689402Z at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:876)
2025-01-21T04:22:37.955692708Z at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:142)
2025-01-21T04:22:37.955697828Z at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:760)
2025-01-21T04:22:37.955701444Z at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:639)
2025-01-21T04:22:37.955704831Z at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:819)
2025-01-21T04:22:37.955708237Z at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:803)
2025-01-21T04:22:37.955711674Z at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1500)
2025-01-21T04:22:37.955715090Z at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1566)
2025-01-21T04:22:37.955718446Z Caused by: java.net.BindException: Address already in use
2025-01-21T04:22:37.955722013Z at sun.nio.ch.Net.bind0(Native Method)
2025-01-21T04:22:37.955725460Z at sun.nio.ch.Net.bind(Net.java:433)
2025-01-21T04:22:37.955729227Z at sun.nio.ch.Net.bind(Net.java:425)
2025-01-21T04:22:37.955733074Z at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
2025-01-21T04:22:37.955736600Z at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
2025-01-21T04:22:37.955740197Z at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
2025-01-21T04:22:37.955743884Z at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:934)
2025-01-21T04:22:37.955747391Z ... 8 more
2025-01-21T04:22:37.961686454Z 25/01/21 04:22:37 INFO util.ExitUtil: Exiting with status 1
```
The best option would be to avoid services listening on ports within the ephemeral range
`/proc/sys/net/ipv4/ip_local_port_range` (32768-60999). But since the
namenode [hardcodes port `50070` in its docker
image](https://hub.docker.com/layers/bde2020/hadoop-datanode/2.0.0-hadoop2.7.4-java8/images/sha256-5623fca5e36d890983cdc6cfd29744d1d65476528117975b3af6a80d99b3c62f),
we instead add the port to `net.ipv4.ip_local_reserved_ports` and introduce a
new flag `--reserve-ports` to control it (default false, because not
everyone wants to modify the system's reserved ports).
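For context, a small diagnostic sketch (my own illustration, assuming a Linux host; it is not part of the CI scripts) that shows why the conflict can happen: it checks whether port 50070 falls inside the kernel's ephemeral range and whether it is already reserved:
```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class PortRangeCheck {
    public static void main(String[] args) throws Exception {
        int nameNodeHttpPort = 50070;

        // Ephemeral ports the kernel may hand out to outgoing connections, e.g. "32768 60999".
        String[] range = Files.readString(Paths.get("/proc/sys/net/ipv4/ip_local_port_range"))
                .trim().split("\\s+");
        int low = Integer.parseInt(range[0]);
        int high = Integer.parseInt(range[1]);

        // Ports excluded from ephemeral allocation (what --reserve-ports would populate).
        String reserved = Files.readString(Paths.get("/proc/sys/net/ipv4/ip_local_reserved_ports")).trim();

        System.out.printf("ephemeral range: %d-%d, reserved: %s%n", low, high,
                reserved.isEmpty() ? "(none)" : reserved);
        // Crude containment check; a real parser would expand entries like "50070-50100".
        boolean isReserved = reserved.contains(String.valueOf(nameNodeHttpPort));
        if (nameNodeHttpPort >= low && nameNodeHttpPort <= high && !isReserved) {
            System.out.println("50070 can be handed out as an ephemeral port, so the namenode may fail to bind");
        }
    }
}
```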
Change-Id: I03a81e9931cb555695199436b6f0517cccf83588
### What problem does this PR solve?
Problem Summary:
The Oceanbase container sometimes fails to start.
<img width="653" alt="image"
src="https://github.com/user-attachments/assets/d95c66cf-7e04-4179-a565-9b9dd8b87128"
/>
We do two things:
1. Print the last 100 lines of the unhealthy container's docker logs for debugging.
2. Upgrade the Oceanbase docker image to the newest `4.2.1-lts`, since it is
7 months newer than `4.2.1` and more stable.
Problem Summary:
Put the `tpch1.db`, `paimon1`, and `tvf_data` hive data in place in parallel,
reducing the time cost from 22m to 16m on a 16-core machine.
Change-Id: Ib75c57d397ce1f96d5108d4b570bcb215f31d421
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Master PR: #45308
Problem Summary:
Adjust the indentation of the `init_be` and `entry_point` scripts, as well as
the duration of the startup-check loop.
The smallest indentation unit is now a single tab character, and the loop that
checks BE startup status is shortened from 300 seconds to 30 seconds in both
scripts to speed up overall Docker startup time.
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #45267
Master PR: #45269
Problem Summary:
To support rapid Docker startup, I adjusted two related scripts in the Docker
startup process. First, I added an environment variable `SKIP_CHECK_ULIMIT` to
the `start_be.sh` script, which skips the size checks for `swap`, `ulimit`, and
`max_map_count`. I also start the process with `--console` so that logs are
printed to the foreground; I did not use the `--daemon` command because running
in the foreground and printing logs is the correct and reliable approach inside
a Docker container.
In addition, I added check logic for a `be.conf` configuration item in the
`init_be.sh` script: on the first startup, it appends `export
SKIP_CHECK_ULIMIT=true` to skip the `ulimit` check in the BE process. In
summary, these adjustments meet the basic requirements for rapid Docker
startup.
### What problem does this PR solve?
bp #44001, but without the hive4 acid table part.
Problem Summary:
1. Fixed the issue that, when reading insert-only transactional tables, there
was no acid check, which caused data to be read multiple times (i.e., data
from the previous base_n was read again).
2. Forbid creating, inserting into, and deleting acid tables.
pick (#42102)
Add a variable `enable_jdbc_cast_predicate_push_down`, whose default value
is false, which prohibits pushing down non-constant predicates with
type conversion and all predicates with implicit conversion. This change
prevents wrong predicates from being pushed down to the JDBC data
source and causing incorrect query results, because predicates with cast
were not pushed down to the data source correctly before. If you find
that the data is read correctly and performance was better before
this change, you can manually set this variable to true.
```
| Expression | Can Push Down |
|-----------------------------------------------------|---------------|
| column type equals const type | Yes |
| column type equals cast const type | Yes |
| cast column type equals const type | No |
| cast column type equals cast const type | No |
| column type not equals column type | No |
| column type not equals cast const type | No |
| cast column type not equals const type | No |
| cast column type not equals cast const type | No |
```