112 Commits

Author SHA1 Message Date
0883d47832 [Enhance](broker) add inputstream expire scheduled checker to avoid memory leak for broker scan (#28589)
This PR introduces two broker configs:

1. enable_input_stream_expire_check: whether to enable the inputStream expiration check.
2. input_stream_expire_seconds: the timeout, in seconds, for an inputStream since its last update.
2023-12-19 19:24:29 +08:00
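The expire checker described in #28589 can be pictured roughly as follows. This is a minimal sketch, not the broker's actual code: the class `InputStreamExpireChecker`, its methods, and the check period are hypothetical, and only the two config names come from the commit message. It assumes each open stream carries a last-update timestamp and that a scheduled task closes streams idle longer than `input_stream_expire_seconds`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; names are not taken from the Doris broker source.
class InputStreamExpireChecker {
    // One open stream plus the time it was last used.
    static final class Tracked {
        final InputStream stream;
        volatile long lastUpdateMillis;

        Tracked(InputStream stream, long nowMillis) {
            this.stream = stream;
            this.lastUpdateMillis = nowMillis;
        }
    }

    private final Map<String, Tracked> streams = new ConcurrentHashMap<>();
    private final long expireMillis;

    // expireSeconds plays the role of input_stream_expire_seconds.
    InputStreamExpireChecker(long expireSeconds) {
        this.expireMillis = TimeUnit.SECONDS.toMillis(expireSeconds);
    }

    void register(String fd, InputStream in, long nowMillis) {
        streams.put(fd, new Tracked(in, nowMillis));
    }

    // Called on every read so that active streams are never evicted.
    void touch(String fd, long nowMillis) {
        Tracked t = streams.get(fd);
        if (t != null) {
            t.lastUpdateMillis = nowMillis;
        }
    }

    // Closes and removes every stream idle longer than the timeout; returns the number evicted.
    int evictExpired(long nowMillis) {
        int evicted = 0;
        for (Map.Entry<String, Tracked> it : streams.entrySet()) {
            Tracked t = it.getValue();
            if (nowMillis - t.lastUpdateMillis > expireMillis && streams.remove(it.getKey(), t)) {
                try {
                    t.stream.close();
                } catch (IOException ignored) {
                    // Best effort: the stream is dropped either way.
                }
                evicted++;
            }
        }
        return evicted;
    }

    int openCount() {
        return streams.size();
    }

    // enableCheck plays the role of enable_input_stream_expire_check; the period is an assumption.
    ScheduledExecutorService start(boolean enableCheck, long periodSeconds) {
        if (!enableCheck) {
            return null;
        }
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleWithFixedDelay(() -> evictExpired(System.currentTimeMillis()),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

Passing the clock value in as a parameter keeps the eviction rule easy to exercise without waiting for real time to pass.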
bd2e8239f8 [fix](broker) do not print broker request detail by default (#27126)
If the broker request detail contains ak/sk or other sensitive information, do not print it in the log by default.
---------
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-11-20 16:41:47 +08:00
aa0b74d63a [improvement](fe and broker) support specify broker to getSplits, check isSplitable, file scan for HMS Multi-catalog (#24830)
I want to use Doris Multi-catalog to accelerate HMS queries. My organization has a custom distributed file system, and we think wrapping the fs access differences into the broker (listLocatedFiles, openReader..) would be an elegant approach.

This PR introduces the HMS catalog conf `bind.broker.name`. If this conf is set, file split and query scan operations are sent to the broker.

Usage:
Create an HMS catalog that uses a broker:
```
CREATE CATALOG hive_catalog_broker PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://xxx',
    'broker.name' = 'hdfs_broker'
);
```
When we query this catalog, file split and query scan requests are sent to broker `hdfs_broker`.

More details about this PR:
1. Introduce the HMS catalog property `bind.broker.name` to specify the broker that handles remote path work. When `broker.name` is set, `enable.self.splitter` must be `true` so that file splitting is executed in FE.
2. Introduce two more interfaces to the broker service:
- `TBrokerIsSplittableResponse isSplittable(1: TBrokerIsSplittableRequest request)`, which invokes the input format's `isSplitable` interface.
- `TBrokerListResponse listLocatedFiles(1: TBrokerListPathRequest request)`, which performs `listFiles` or `listLocatedStatus` on the remote file system.
3. Three parts of the overall processing are executed in the broker:
- Check whether the path with the specified input format name `isSplittable`.
- `listLocatedFiles` of table / partition locations.
- `OpenReader` for the specified file splits.

Co-authored-by: chenlinzhong <490103404@qq.com>
2023-10-13 11:04:38 +08:00
8b1e5897c8 [fix](security): Use SecureRandom instead of Random, because it provides better security #24483 2023-09-21 08:22:16 +08:00
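As an illustration of the change in #24483, the sketch below draws random bytes from `java.security.SecureRandom` rather than `java.util.Random`. The class `SecureTokenSketch` and the token format are hypothetical; the commit does not say where the broker uses random values.

```java
import java.security.SecureRandom;

// Hypothetical sketch; only the Random -> SecureRandom swap comes from the commit.
class SecureTokenSketch {
    // SecureRandom is thread-safe and can be shared.
    private static final SecureRandom RNG = new SecureRandom();

    // Returns a hex string built from numBytes cryptographically strong random bytes.
    static String newToken(int numBytes) {
        byte[] buf = new byte[numBytes];
        RNG.nextBytes(buf);
        StringBuilder sb = new StringBuilder(numBytes * 2);
        for (byte b : buf) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```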
b7ca4fcc8d [fix](io): use try with resource make io stream close automatically to avoid resource leak (#24605) 2023-09-20 11:39:03 +08:00
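The pattern named in #24605 looks roughly like this. It is a generic sketch of try-with-resources, not code from the commit; `TryWithResourcesSketch` and `firstLine` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical sketch of the try-with-resources pattern.
class TryWithResourcesSketch {
    // Reads the first line; the reader (and the wrapped source) is closed on every exit path.
    static String firstLine(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the resource is declared in the `try` header, it is closed even when `readLine()` throws, which is what removes the leak.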
c41cadb64d [fix](broker) fix broker read issue (#24635)
The "length" passed to the broker's pread() method is the buffer length, not the number of bytes required from the file,
so it may be larger than the file length.
We should therefore return all the data read so far, instead of returning EOF, when the `read()` method returns -1.

I will add a regression test case later, once the framework supports the broker process.
2023-09-20 10:43:16 +08:00
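The read behavior described in #24635 can be sketched as below, assuming a plain `InputStream`; `PreadSketch.readUpTo` is a hypothetical helper, not the broker's pread(). When `read()` returns -1 before the buffer is full, the bytes gathered so far are returned, and only a later call that reads nothing yields an empty result.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical sketch; the real broker method has a different signature.
class PreadSketch {
    // length is the caller's buffer size, so it may exceed what is left in the file.
    static byte[] readUpTo(InputStream in, int length) {
        byte[] buf = new byte[length];
        int total = 0;
        try {
            while (total < length) {
                int n = in.read(buf, total, length - total);
                if (n < 0) {
                    break; // end of file: keep what was read so far
                }
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Arrays.copyOf(buf, total);
    }
}
```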
6fe207eb4b [fix](broker) do not close filesystem (#24357)
Same as #24128.
This avoids the "Filesystem closed" error.
2023-09-14 18:36:09 +08:00
f8692bef4b [fix](io): use try with resource make io stream close automatically to avoid resource leak (#24297) 2023-09-14 11:51:30 +08:00
9df72a96f3 [Feature](multi-catalog) Support hadoop viewfs. (#24168)
### Feature

Support hadoop viewfs.

### Test

- Regression tests: 
  - hive viewfs test.
  - tvf viewfs test.

- Broker load with broker and with hdfs tests manually.
2023-09-13 00:20:12 +08:00
a8ed1d87d7 [enhancement](config): Change root log level to info in broker log (#24023) 2023-09-09 17:56:50 +08:00
698fe55662 remove unused configs in be and broker (#24021) 2023-09-09 08:24:50 +08:00
97eb2b9172 [Fix](multi-catalog) Fix broker load reader and hdfs reader issue. (#23529)
Broker load through a broker sometimes throws 'Invalid orc post script length'.
HDFS queries sometimes throw 'Invalid orc post script length'.
2023-08-29 13:45:48 +08:00
aa5e56c73d [fix](broker) fix export job failed for that currentStreamOffset may be different with request offset (#23133)
Co-authored-by: caiconghui1 <caiconghui1@jd.com>

When an export job is under heavy pressure, the failed export job may show the following message:
current outputstream offset is 423597 not equal to request 421590, cause by: null,
This happens because the broker pwrite operation may be retried after a timeout, so we just skip it instead of throwing a broker exception.
2023-08-18 14:32:36 +08:00
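One way to read the fix in #23133 is sketched below. The commit only says the mismatched write is skipped; the exact condition is an assumption here: a request whose offset is behind the current stream offset is treated as a retry of data already written. `PwriteRetrySketch` and its in-memory stream are hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch; the skip condition is an assumption, not taken from the commit.
class PwriteRetrySketch {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private long currentOffset = 0;

    // Returns true if data was appended, false if the request was skipped as a duplicate retry.
    boolean pwrite(long requestOffset, byte[] data) {
        if (requestOffset < currentOffset) {
            return false; // assumed: an earlier attempt already wrote this range
        }
        if (requestOffset > currentOffset) {
            throw new IllegalStateException(
                    "gap: stream at " + currentOffset + ", request at " + requestOffset);
        }
        out.write(data, 0, data.length);
        currentOffset += data.length;
        return true;
    }

    long offset() {
        return currentOffset;
    }
}
```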
d1a2473944 [Feature](broker)Support GCS (#20904) 2023-08-07 19:37:18 +08:00
d4a1549003 [minor](broker) fix name in broker's pom.xml (#20840)
change palo -> doris
do not check compiler's version in env.sh, because building broker does not need gcc compiler. And the version is also checked in CMakefile
2023-07-03 16:46:47 +08:00
3ba3b6c66f [opt](FileCache) use modification time to determine whether the file is changed (#18906)
Get the last modification time from file status, and use the combination of path and modification time to generate cache identifier.
When a file is changed, the modification time will be changed, so the former cache path will be invalid.
2023-05-11 07:50:39 +08:00
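The cache identifier idea in #18906 can be sketched as follows. The commit states only that path and modification time are combined; the SHA-256 hash, the separator, and the class `CacheIdSketch` are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; the hash function and key layout are assumptions.
class CacheIdSketch {
    // Derives an identifier that changes whenever the file's modification time changes.
    static String cacheId(String path, long modificationTimeMillis) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(
                    (path + "|" + modificationTimeMillis).getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }
}
```

A rewritten file keeps its path but gets a new modification time, so its old cache entry is simply never looked up again.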
Pxl
ec517a53a8 [Chore](build) upgrade clang-format version to 16 && move thrift to fe-common (#19155)
upgrade clang-format version to 16
move thrift to fe-common
fix core dump on pipeline engine when operator canceled and not prepared
2023-04-28 14:14:51 +08:00
918a244068 [chore](pom) update apache pom to 29 (#18843) 2023-04-20 16:57:05 +08:00
c12646382d [feature](multicatalog) enable doris hive/iceberg catalog to read data on tencent GooseFS (#18685) 2023-04-16 18:11:57 +08:00
75fd4b70fa [improve](fe)Optimize fe binary package packaging (#18554) 2023-04-12 12:58:45 +08:00
05db6e9b55 [refactor](file-system)(step-2) remove env, file_utils and filesystem_utils (#18009)
Follow #17586.
This PR mainly changes:

1. Remove env/
2. Remove FileUtils/FilesystemUtils
   Some methods are moved to LocalFileSystem
3. Remove olap/file_cache
4. Add s3 client cache for s3 file system
   In my test, the time to open an s3 file is reduced significantly
5. Fix cold/hot separation bug for s3 fs.

This is the last PR of #17764.
After this, all IO operations should be in io/fs.

Besides the tests in #17586, I also tested some cases related to fs io:

1. clone
2. concurrent queries on local/s3/hdfs
3. load error log create and clean
4. disk metrics
2023-03-29 09:00:52 +08:00
Pxl
1a549edac2 [Chore](third-party) upgrade thrift from 0.13 to 0.16 (#17202)
upgrade thrift from 0.13 to 0.16
Thrift's release notes: https://github.com/apache/thrift/blob/master/CHANGES.md
2023-03-10 11:33:16 +08:00
13cb81a724 [fix](broker) Fix bug that heavy broker load may fail due to BrokerException which indicates the fd is not owned by client (#16350)
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-02-03 15:06:45 +08:00
c7a72436e6 [Feature](multi-catalog)Add support for JuiceFS (#15969)
The broker implements the interface to JuiceFS. It supports loading data from JuiceFS into Doris through the broker.
It also enables multi-catalog to read Hive data stored in JuiceFS.
2023-01-19 08:54:16 +08:00
5c5b7a5c6f [Broker](bos) support baidu bos object storage for broker (#15448) 2022-12-30 12:39:10 +08:00
d48abd91df [deps](fe)upgrade deps version (#15262)
upgrade hadoop version to 2.10.2
jackson-databind to 2.14.1
2022-12-24 22:18:10 +08:00
2eca51f3ba [enhancement](broker) broker load support tencent cos (#12801) 2022-11-22 21:51:15 +08:00
4976021bf7 [Enhancement] Doris broker support aliyun-oss #13665 (#14305) 2022-11-21 10:29:14 +08:00
3c8524b9d8 [security](fe jar) upgrade commons-codec:commons-codec to 1.13 #13951 2022-11-07 13:50:07 +08:00
2cf89c55c2 [chore](macOS) Fix issues found on macOS x86_64 (#13583)
1. Use `brew --prefix` instead of `brew --repo` in scripts.
2. `sprintf` is marked as a deprecated function in MacOSX sdk (13.0).
2022-10-24 20:59:20 +08:00
125def5102 [enhancement](macOS M1) Support building from source on macOS (M1) (#13195)
# Proposed changes

This PR fixed lots of issues when building from source on macOS with Apple M1 chip.

## ATTENTION

The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...

Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.

This PR kicks off the job. Guys who are interested in this job can continue to fix these runtime issues.

## Use case

```shell
./build.sh -j 8 --be --clean

cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```

## Something else

It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the development experience on macOS greatly when we finish the adaptation job.
2022-10-18 13:10:13 +08:00
7147c77f22 [Enhancement](broker)Doris support obs broker load (#12781)
1. Upgrade fs_broker module hadoop2.7.3->hadoop2.8.3
2. Support obs broker load

org.apache.doris.broker.hdfs.FileSystemManager add getOBSFileSystem method
2022-10-13 09:44:13 +08:00
HB
00dda79735 [fix](broker-load) Correction of kerberos authentication time determination rule (#11793)
Every time a new broker load comes in, Doris updates the start time of Kerberos authentication,
but this logic is wrong,
because the Kerberos authentication duration is calculated from the moment the ticket is obtained.

This PR changes the logic:
1. If it is Kerberos, check fs expiration by create time.
2. Otherwise, check fs expiration by access time.
2022-09-18 17:46:13 +08:00
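The two rules in #11793 can be expressed as a small predicate. This is a sketch under assumptions: `FsExpirationSketch`, its parameter names, and the single TTL value are hypothetical, and only the choice between create time and access time comes from the commit message.

```java
// Hypothetical sketch; names and the ttl parameter are not from the Doris broker source.
class FsExpirationSketch {
    // Kerberos tickets age from the moment they are obtained, so the create time is the reference;
    // other file systems expire after a period without access.
    static boolean isExpired(boolean kerberos, long createTimeMillis, long lastAccessTimeMillis,
                             long nowMillis, long ttlMillis) {
        long reference = kerberos ? createTimeMillis : lastAccessTimeMillis;
        return nowMillis - reference > ttlMillis;
    }
}
```

With this rule, a Kerberos file system that is accessed constantly still expires once its ticket lifetime has passed, which is the case the old logic missed.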
d7e032bc38 Modify the startup script and print the log without using the --daemon parameter. (#12218) 2022-08-31 14:36:14 +08:00
4fa53b4cdb [chore](workflow) Add shellcheck to check shell scripts (#11744) 2022-08-18 16:07:28 +08:00
HB
583b44dfa8 [enhancement](broker) Improve the availability of broker load (#10699) 2022-08-09 17:00:48 +08:00
388db05ef9 [bugfix](log4j) Upgrade log4j to 2.18.0 (#11368) 2022-07-31 22:21:33 +08:00
67f341f44e [TLP](step-1) Remove incubator prefix (#10230)
Remove some `incubator-` prefix in source code.
The document is not modified, will be done in next PR.
2022-06-19 19:34:52 +08:00
4ccaa0dfc5 [Bug] (load) Broker load kerberos auth fail (#9494) 2022-05-12 15:43:29 +08:00
419ec3b96c [Fix Bug] Fix ehco command not found (#9021) 2022-04-15 13:43:47 +08:00
6af1c52e13 [Feature] add support for tencent chdfs (#8963)
Co-authored-by: chengwu <chengwu@tencent.com>
2022-04-12 16:02:42 +08:00
13f1f94f86 [chore] upgrade log4j version to 2.17.2 (#8774)
upgrade log4j version to 2.17.2
2022-04-02 21:29:25 +08:00
50a59f3f86 [license] Organize third-party dependent licenses for binary releases (#8350) 2022-03-07 23:18:58 +08:00
c0e59e59aa [fix][refactor] fix bugs and refactor some code by lint (#7871)
1. Fix some `passedByValue` issues.
2. Fix some `dereferenceBeforeCheck` issues.
3. Fix some `uninitMemberVar` issues.
4. Fix some iterator `eraseDereference` issues.
5. Fix compile issue introduced from #7923 #7905 #7848
2022-02-01 14:31:14 +08:00
4bdeef3b64 [chore][fix][doc](fe-plugin)(mysqldump) fix build auditlog plugin error (#7804)
1. fix problems when build fe_plugins
2. format
3. add docs about dump data using mysql dump
2022-01-26 09:11:23 +08:00
946fa2960d [improvement](broker) add some properties that can be set in the broker conf file (#7499) 2022-01-18 10:24:54 +08:00
ad35067a2a [chore][docs] add deploy spark/flink connectors to maven release repo docs (#7616) 2022-01-06 23:23:33 +08:00
738d2d2e07 [refactor] update parent pom version and optimize build scripts (#7548) 2022-01-05 10:45:11 +08:00
a60d86c1e1 [improvement](broker) add disable cache config for broker (#7506) 2021-12-31 16:48:55 +08:00
85c30fc720 [deps] Upgrade Log4j to 2.17.1 to solve the CVE-2021-44832 security vulnerability (#7536)
Upgrade Log4j to 2.17.1 to solve the CVE-2021-44832 security vulnerability

Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
2021-12-30 10:21:37 +08:00