doris

Author	SHA1	Message	Date
plat1ko	b0cac0014d	[enhance](FS) Improve FS error code (#29432 )	2024-01-06 21:17:22 +08:00
zclllyybb	f374beaa4e	[fix](log) regularise some BE error type and fix a load task check #28729	2023-12-25 10:45:19 +08:00
plat1ko	d767804815	[feature](merge-cloud) Decouple rowset id generator and local rowsets gc implementation (#25921 )	2023-11-10 10:07:02 +08:00
Jack Drogon	2cc68381ec	[feature](binlog) Add ingest_binlog/http_get_snapshot limit download speed && Add async ingest_binlog (#26323 )	2023-11-06 11:14:44 +08:00
DeadlineFen	6502da8917	[bugfix](restore) add partition id into convert_rowset_ids() (#24834 )	2023-09-25 20:07:24 +08:00
plat1ko	25b6e4deb2	[fix](daemon) Fix incorrect initialization order of daemon services (#23578 ) Current initialization dependency: Daemon and BackendService ──► StorageEngine ──► ExecEnv ──► Disk/Mem/CpuInfo. However, the original code incorrectly initialized Daemon before StorageEngine. This PR also stops and joins the threads of daemon services in their dtor, to ensure that Daemon services release resources in reverse order of initialization via RAII.	2023-08-31 19:46:38 +08:00
slothever	f66f161017	[fix](multi-catalog)fix hive table with cosn location issue (#23409 ) Sometimes, the partitions of a hive table may be on different storage, e.g., some are on HDFS, others on object storage (cos, etc.). This PR mainly changes: 1. Fix the bug of accessing files via cosn. 2. Add a new field `fs_name` in TFileRangeDesc. This is because, when accessing a file, the BE gets an hdfs client from the hdfs client cache, and different files in one query request may have different fs names, e.g., some are `hdfs://`, some are `cosn://`, so we need to specify the fs name for each file; otherwise, it may return the error: `reason: IllegalArgumentException: Wrong FS: cosn://doris-build-1308700295/xxxx, expected: hdfs://172.xxxx:4007`	2023-08-26 00:16:00 +08:00
plat1ko	d4694167a8	[Enhancement](chore) Some Status relevant enhancement (#23072 )	2023-08-21 14:14:38 +08:00
Mingyu Chen	2678afd2db	[fix][improvement](fs) add HdfsIO profile and modification time (#21638 ) Refactor the interface of create_file_reader: file_size and mtime are merged into FileDescription and are no longer in FileReaderOptions. Now the file handle cache can get the file's correct modification time from FileDescription. Add HdfsIO for the hdfs file reader, picked from [Enhancement](multi-catalog) Add hdfs read statistics profile. #21442	2023-07-08 14:49:44 +08:00
Mingyu Chen	b471cf2045	Revert "[Enhancement](multi-catalog) Add hdfs read statistics profile. (#21442 )" (#21618 ) This reverts commit 57729bad6841ea9728e6b2cf0bd484133e7b9ead. To fix compile error	2023-07-07 17:45:31 +08:00
Qi Chen	57729bad68	[Enhancement](multi-catalog) Add hdfs read statistics profile. (#21442 ) Add hdfs read statistics profile. ``` - HdfsIO: 0ns - TotalBytesRead: 133.47 MB - TotalLocalBytesRead: 133.47 MB - TotalShortCircuitBytesRead: 133.47 MB - TotalZeroCopyBytesRead: 0.00 ```	2023-07-07 14:52:14 +08:00
Jack Drogon	03cb69c0ee	[feature](backup-restore) Add local backup/restore not upload/download by broker (#20492 )	2023-06-07 21:35:15 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
Mingyu Chen	05db6e9b55	[refactor](file-system)(step-2) remove env, file_utils and filesystem_utils (#18009 ) Follow #17586. This PR mainly changes: 1. Remove env/. 2. Remove FileUtils/FilesystemUtils. Some methods are moved to LocalFileSystem. 3. Remove olap/file_cache. 4. Add s3 client cache for s3 file system. In my test, the time to open an s3 file can be reduced significantly. 5. Fix cold/hot separation bug for s3 fs. This is the last PR of #17764. After this, all IO operations should be in io/fs. In addition to the tests in #17586, I also tested some cases related to fs io: 1. clone. 2. concurrency query on local/s3/hdfs. 3. load error log create and clean. 4. disk metrics.	2023-03-29 09:00:52 +08:00
Mingyu Chen	cb79e42e5c	[refactor](file-system)(step-1) refactor file system on BE and remove storage_backend (#17586 ) See #17764 for details. I have tested: - Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp - Outfile to local/s3/hdfs/broker. - Load from local/s3/hdfs/broker. - Query file on local/s3/hdfs/broker file system, with table value function and catalog. - Backup/Restore with local/s3/hdfs/broker file system. Not tested: - cold & hot data separation case.	2023-03-21 21:08:38 +08:00
plat1ko	f3aea7f0f0	[Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744 )	2022-12-11 23:33:18 +08:00
Zhengguo Yang	805c13aaa1	[fix](backup) fix backup restore raise `Storage backend not initialized.` error (#11736 ) fix backup restore raise Storage backend not initialized. error	2022-08-15 13:24:38 +08:00
plat1ko	331fa50501	[feature](cold-data) move cold data to object storage without losing any feature(BE) (#10280 ) This PR supports rowset level data upload on the BE side, so that there can be both cold data and hot data in a tablet, and there is no need to prohibit loading new data to cooled tablets. Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without perceiving the underlying filesystem. The abstracted `RemoteFileSystem` can try local caching strategies with different granularity, instead of caching segment files as before. To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory. In the future, `FileReader`s and `FileWriter`s should be unified.	2022-07-08 12:18:39 +08:00
Tiewei Fang	c9f86bc7e2	[refactor] Refactoring Status static methods to format message using fmt(#9533 )	2022-07-02 18:58:23 +08:00
xiepengcheng01	1d3496c6ab	[feature] support backup/restore connect to HDFS (#10081 )	2022-06-19 10:26:20 +08:00
yinzhijian	bc431f2806	[typo] Fix typos in comments (#10142 )	2022-06-16 10:13:59 +08:00
plat1ko	f4e2f78a1a	[fix] Fix the bug that data balance causes tablet loss (#9971 ) 1. Provide a FE conf to test the reliability in the single replica case when tablet scheduling is frequent. 2. According to #6063, apply almost the same fix to the current code.	2022-06-15 09:52:56 +08:00
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305 ) - sh build-support/clang-format.sh to clang-format all c++ code	2022-04-29 16:14:22 +08:00
hongbin	c71ffc01de	[Refactor] Cleanup some unused include (#9063 )	2022-04-18 09:52:31 +08:00
yiguolei	e5e0dc421d	[refactor] Change ALL OLAPStatus to Status (#8855 ) Currently, there are two status codes in BE: one is common/Status.h, and the other is OLAPStatus in olap/olap_define.h. OLAPStatus is just an enum type; it is very simple and cannot carry much information, so I will unify these codes into common/Status.	2022-04-14 11:43:49 +08:00
caiconghui	98cab78320	[refactor](schema_hash) remove schema_hash since every tablet id in be is unique (#8574 )	2022-04-07 08:37:45 +08:00
yiguolei	aea3e4e59b	[refactor] Remove version hash from BE and related test in BE (#8027 )	2022-02-14 09:29:27 +08:00
pengxiangyu	20ef8a6e21	[feature-wip](remote storage)(step1) use a struct instead of string for parameter path, add basic remote method (#7098 ) First, we need a parameter to describe whether the data is local or remote. Then, we need some basic functions to support operations on remote storage.	2021-12-22 22:58:23 +08:00
HappenLee	da99749e7f	[Bug] Fix bug that BE will crash when backup using S3 (#6855 )	2021-10-17 22:54:42 +08:00
Zhengguo Yang	d641a26490	[Refactor] Remove boost filesystem (#5579 ) * use std::filesystem instead of boost Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>	2021-04-08 09:11:59 +08:00
stdpain	ad67dd34a0	update gcc to gcc 10 and support c++17 (#5394 ) * update gcc to gcc 10 and support c++17 update brpc to 0.9.7 update boost to 1.73 remove third-party boost 1.54 for mysql * update cmake version * ignore jdk version * remove unused patch * avoid use SYS_getrandom call	2021-03-25 09:30:38 +08:00
Zhengguo Yang	6ede4c6ec1	[Feature] Support backup,restore,load,export directly connect to s3 (#5399 ) * [doris-1008] support backup and restore directly to cloud storage via aws s3 protocol * [Internal][S3DirectAccess] Support backup,restore,load,export directly connect to s3 1. Support load and export data from/to s3 directly. 2. Add a config to auto convert broker access to s3 access when available Change-Id: Iac96d4b3670776708bc96a119ff491db8cb4cde7 (cherry picked from commit 2f03832ca52221cc7436069b96c45c48c4bc7201) * [Internal][S3DirectAccess] File path glob compatible with broker Change-Id: Ie55e07a547aa22c6fa8d432ca926216c10384e68 (cherry picked from commit d4fb25544c0dc06d23e1ada571ec3f8edd4ba56f) * [internal] [doris-1008] fix log4j class not found Change-Id: I468176aca0d821383c74ee658d461aba9e7d5be3 (cherry picked from commit 029adaa9d6ded8503acbd6644c1519456f3db232) * add poms Co-authored-by: yangzhengguo01 <yangzhengguo01@baidu.com>	2021-02-22 16:07:56 +08:00
Zhengguo Yang	93a4c7efc1	[LOG] Standardize the use of VLOG in code (#5264 ) At present, the application of vlog in the code is quite confusing. It is inherited from impala VLOG_XX format, and there is also VLOG(number) format. VLOG(number) format does not have a unified specification, so this pr standardizes the use of VLOG	2021-01-21 12:09:09 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
Zhengguo Yang	75e0ba32a1	Fixes some be typo (#4714 )	2020-10-13 09:37:15 +08:00
ZHAO Chun	e98bbb5bc5	Refactor clone task (#2285 ) In the previous implementation, the clone task would continue downloading files even if an error happened. This may cause unexpected problems. This change list refactors it so that when an error happens, the clone task fails entirely and tries to clone from another remote source. Besides the above change, I call FileUtils::remove_all and create_dir instead of the boost ones, which may throw exceptions. What's more, AgentMasterClient is replaced with ThriftRpcHelper; with this change, connections can be reused.	2019-11-24 22:36:10 +08:00
kangpinghuang	6b4ef34162	fix AlphaRowsetTest by remove StorageEngine #2078 (#2091 )	2019-10-30 19:39:41 +08:00
ZHAO Chun	f130bd3e7b	Use Env function to operate directory (#1980 ) Env now unifies all environment operations, such as file operations. However, some of our old functions don't leverage it. This change makes FileUtils::scan_dir use Env's function.	2019-10-15 09:25:12 +08:00
Mingyu Chen	e67b398916	Fix bug that backup may create an empty file on remote storage. (#1869 ) Sometimes the broker writer fails to close, but we do not handle this failure. This may create an empty file on remote storage that is treated as normal. Also enhance some usability: 1. Get the latest 2000 transactions instead of the earliest. 2. Show the backend on which download and upload tasks are being executed.	2019-09-28 00:11:43 +08:00
yiguolei	2f0808137a	Refactor FrontendHelper (#1888 )	2019-09-27 13:21:14 +08:00
Mingyu Chen	7e981b2b14	Limit the disk usage to avoid running out of disk capacity (#1702 ) Set high watermark and flood stage of disk used capacity. And forbid some operations if disk usage is too high.	2019-08-27 22:18:17 +08:00
lichaoyong	0d48a3961c	Refactor Storage Engine (#1478 ) NOTE: This patch modifies all of the Backend's data, which makes restarting BE take a very long time. So to avoid disrupting your production environment, you should upgrade backends one by one. 1. The BE refactoring clarifies the structure of the code. 2. Use a unique id to identify a rowset. Naming a rowset with tablet_id and version leads to many conflicts among compaction, clone, and restore. 3. Extract a rowset interface to encapsulate rowsets with different formats.	2019-07-15 21:18:22 +08:00
ZHAO Chun	9d03ba236b	Uniform Status (#1317 )	2019-06-14 23:38:31 +08:00
Mingyu Chen	488e3825f7	Fix bug that restore process in BE causes BE crash (#1193 ) When calling SnapshotLoader.move(), all files should be revoked if they are in the GC queue, or the files may be deleted after move() succeeds.	2019-05-23 19:22:29 +08:00
Mingyu Chen	0820a29b8d	Implement the routine load process of Kafka on Backend (#671 )	2019-04-28 10:33:50 +08:00
Mingyu Chen	5b1e3d3f40	Optimize backup & restore process (#460 ) 1. Print broker address for debug. 2. Do not let a backup job be cancelled if it is already in state UPLOAD_INFO. 3. Cancel task on Backends when job is cancelled. 4. Show detail progress of backup and restore job. 5. Make 'show snapshot' result more readable. 6. Change upload and download thread num of backup and restore in Backend to 1.	2018-12-24 16:49:16 +08:00
chenhao7253886	37b4cafe87	Change variable and namespace name in BE (#268 ) Change 'palo' to 'doris'	2018-11-02 10:22:32 +08:00
morningman	2868793b6b	Change license to Apache License 2.0 (#262 )	2018-11-01 09:06:01 +08:00
morningman	051aced48d	Missing many files in last commit In the last commit, a lot of files were missed	2018-10-31 16:19:21 +08:00
morningman	cc74efb3c5	merge to ddb65b69f9c788e359e191889cb31f15279c41ec (#224 ) 1. Apache HDFS broker support HDFS HA and Hadoop kerberos authentication. 2. New Backup and Restore function. Use Fs Broker to backup your data to HDFS or restore them from HDFS. 3. Table-Level Privileges. Grant fine-grained privileges on table-level to specified user. 4. A lot of bugs fixed. 5. Performance improvement.	2018-08-24 17:12:26 +08:00

50 Commits