Modify the implementation of MemTracker:
1. Simplify the implementation by removing a lot of useless logic;
2. Add MemTrackerTaskPool as the ancestor of all query and import trackers; it is used to track the local memory usage of all executing tasks;
3. Add a consume/release cache: an actual consume/release is triggered only when the accumulated memory exceeds the parameter mem_tracker_consume_min_size_bytes (see the sketch after this list);
4. Add a new memory leak detection mode (experimental feature): when a MemTracker is destructed and its remaining tracked value is outside the specified range, an exception is thrown, and the exact tracked values are printed via HTTP; controlled by the parameter memory_leak_detection;
5. Add a virtual MemTracker whose consume/release is not synced to the parent. It will be used later, when the TCMalloc hook for recording memory is introduced, to track specified memory independently;
6. Modify the GC logic: register the buffers cached in DiskIoMgr as a GC function, with more GC functions to be added later;
7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker from exec_env;
8. Modify the macro that checks whether memory has reached its limit, modify the parameters and default behavior of creating a MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric from MemTracker, etc.
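A minimal sketch of how such a consume/release cache might work; `ParentTracker`, `CachingTracker`, and the constructor parameters below are illustrative names rather than the actual MemTracker API, and the flush threshold plays the role of mem_tracker_consume_min_size_bytes.

```
#include <atomic>
#include <cstdint>
#include <cstdlib>

// Illustrative stand-in for the real tracker that aggregates usage.
struct ParentTracker {
    std::atomic<int64_t> consumption{0};
    void consume(int64_t bytes) { consumption.fetch_add(bytes); }
};

// Hypothetical caching wrapper: small consume/release calls are accumulated
// locally and only flushed to the parent once the untracked delta exceeds a
// threshold (the commit calls the parameter mem_tracker_consume_min_size_bytes).
class CachingTracker {
public:
    CachingTracker(ParentTracker* parent, int64_t flush_threshold)
            : _parent(parent), _flush_threshold(flush_threshold) {}

    void consume(int64_t bytes) { _maybe_flush(_untracked.fetch_add(bytes) + bytes); }
    void release(int64_t bytes) { _maybe_flush(_untracked.fetch_sub(bytes) - bytes); }

    ~CachingTracker() { _flush(); }  // make sure nothing stays cached

private:
    void _maybe_flush(int64_t untracked_now) {
        if (std::abs(untracked_now) >= _flush_threshold) _flush();
    }
    void _flush() {
        // Hand the accumulated delta to the parent in a single call.
        int64_t delta = _untracked.exchange(0);
        if (delta != 0) _parent->consume(delta);
    }

    ParentTracker* _parent;
    int64_t _flush_threshold;
    std::atomic<int64_t> _untracked{0};
};
```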
Modify where MemTracker is used:
1. MemPool adds a constructor that creates a temporary tracker, avoiding a lot of redundant code;
2. Add trackers for global objects such as ChunkAllocator and StorageEngine;
3. Add more fine-grained trackers, such as for ExprContext;
4. RuntimeState removes FragmentMemTracker (i.e. the PlanFragmentExecutor mem_tracker), which was previously used to track the scan process memory independently; it is replaced by _scanner_mem_tracker in OlapScanNode;
5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later.
First, we need to add a parameter that describes whether the data is local or remote.
Then, we need to support some basic functions for operating on remote storage.
The broker scan node has two tuple descriptors: the dest tuple and the src tuple.
The src tuple is used to read the lines of the original file,
and the dest tuple is used to save the converted lines.
The preceding filter is executed on the src tuple, so the src tuple descriptor should be used
to initialize the filter expression (a simplified sketch follows).
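A much-simplified sketch of that evaluation order; `SrcRow`, `DestRow`, `accept_row`, and the function types are hypothetical stand-ins for the broker scan node's real tuples and expression contexts.

```
#include <functional>
#include <string>
#include <vector>

// Hypothetical simplified rows: SrcRow holds the raw columns read from the
// file, DestRow holds the converted columns defined by the target table.
using SrcRow  = std::vector<std::string>;
using DestRow = std::vector<std::string>;

using PrecedingFilter = std::function<bool(const SrcRow&)>;   // bound to the src tuple
using WherePredicate  = std::function<bool(const DestRow&)>;  // bound to the dest tuple

// Order of evaluation inside the scan: the preceding filter sees the original
// line (src tuple); only rows that pass it are converted and checked by WHERE.
bool accept_row(const SrcRow& src,
                const std::function<DestRow(const SrcRow&)>& convert,
                const PrecedingFilter& preceding_filter,
                const WherePredicate& where_predicate,
                DestRow* out) {
    if (!preceding_filter(src)) return false;  // filtered on raw data
    *out = convert(src);                       // SET expressions applied here
    return where_predicate(*out);              // filtered on converted data
}
```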
1. Replace all boost::shared_ptr with std::shared_ptr
2. Replace all boost::scoped_ptr with std::unique_ptr
3. Replace all boost::scoped_array with std::unique_ptr<T[]>
4. Replace all boost::thread with std::thread
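Illustrative before/after snippet for the replacements listed above (`Foo` and the buffer size are arbitrary):

```
#include <memory>
#include <thread>

struct Foo { int value = 0; };

int main() {
    std::shared_ptr<Foo> shared = std::make_shared<Foo>();  // was boost::shared_ptr
    std::unique_ptr<Foo> owned(new Foo());                  // was boost::scoped_ptr
    std::unique_ptr<char[]> buffer(new char[1024]);         // was boost::scoped_array
    std::thread worker([] { /* ... */ });                   // was boost::thread
    worker.join();
    return 0;
}
```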
1. Inserting a very large string value may coredump
2. Some analytic functions and aggregate functions may return incorrect results
3. String comparison may coredump when the string is too large
4. String types in delete conditions cannot be processed correctly
5. Add text/blob as aliases of string, for compatibility with MySQL
6. Fix string type min/max aggregation that may be processed incorrectly
Fix #6269
The outline of our changes is to reduce memory usage to avoid OOM in BE and to speed up the calculation.
1. We do not need to do aggregation in load, since it has already been done in the ETL Spark job.
2. Based on 1, we do not need to serialize/deserialize bitmap/HLL objects.
The problem I want to solve is described in #6355.
This CL mainly changes:
1. Support compacting tablets under alter operations
On the BE side, the compaction logic will select tablets whose state is "TABLET_NOTREADY" to do cumulative compaction.
2. Remove the "alter_task" field from the tablet's meta on the BE side.
The "alter_task" field has not been used for a long time.
3. Support doing delete operations while a table is under an alter operation.
Previously, when a table was under an alter operation, executing a delete would return the error: Table's state is not NORMAL.
Now, a delete can be executed successfully as long as none of its condition columns are under schema change,
and the delete condition will be applied to all materialized indexes (see the sketch below).
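A hedged sketch of the relaxed rule; `DeleteCondition` and `can_execute_delete` are simplified stand-ins, not the BE's real delete-handler types.

```
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified condition type; the real BE types differ.
struct DeleteCondition {
    std::string column_name;
    std::string op;
    std::string value;
};

// Sketch: a delete under an alter operation is rejected only when one of its
// condition columns is currently being rewritten by the schema change;
// otherwise it proceeds and is applied to all materialized indexes.
bool can_execute_delete(const std::vector<DeleteCondition>& conditions,
                        const std::set<std::string>& columns_under_schema_change) {
    for (const auto& cond : conditions) {
        if (columns_under_schema_change.count(cond.column_name) > 0) {
            return false;  // previously: any non-NORMAL table state rejected the delete
        }
    }
    return true;
}
```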
* [Enhance] Make MemTracker more accurate (#5515)
This PR is mainly about:
1. Improve the readability of MemTrackers' names
2. Add MemTrackers for:
* Load
* Compaction
* SchemaChange
* StoragePageCache
* TabletManager
3. Change SchemaChange to a Singleton
* revise some code for code review
* change the name of mem_tracker
* keep reader_context with the same lifetime as rowset_reader in schema change
* change VLOG notice to LOG(WARNING) in schema change
Support conditional filtering of original data in broker load and routine load
e.g.:
```
LOAD LABEL `label1`
(
DATA INFILE ('bos://cmy-repo/1.csv')
INTO TABLE tbl2
COLUMNS TERMINATED BY '\t'
(event_day, product_id, ocpc_stage, user_id)
SET (
ocpc_stage = ocpc_stage + 100
)
PRECEDING FILTER user_id = 1381035
WHERE ocpc_stage > 30
)
...
```
At present, the use of VLOG in the code is quite confusing.
It inherits the VLOG_XX format from Impala, and there is also the VLOG(number) format.
The VLOG(number) format has no unified specification, so this PR standardizes the use of VLOG.
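One possible way to standardize it, sketched with glog's real VLOG(level) macro but with hypothetical macro names and level assignments that are not necessarily the ones this PR chose:

```
#include <glog/logging.h>

// Illustrative standardization: give each verbose level a named macro and use
// the names everywhere instead of bare VLOG(1)/VLOG(3)/VLOG(10).
#define VLOG_CRITICAL VLOG(1)
#define VLOG_NOTICE   VLOG(3)
#define VLOG_DEBUG    VLOG(7)
#define VLOG_ROW      VLOG(10)

void example() {
    VLOG_NOTICE << "tablet report finished";  // instead of VLOG(3) << ...
    VLOG_ROW << "row content: ...";           // instead of VLOG(10) << ...
}
```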
This CL mainly changes:
1. Avoid repeatedly sending common components of Fragments
In the previous implementation, a query may generate multiple Fragments,
and these Fragments contain some common information, such as the DescriptorTable.
Fragments are sent to BE in a certain order, so this common information was sent repeatedly
and regenerated repeatedly on the BE side.
In some complex SQL, this common information may be very large,
thereby increasing the execution time of the Fragments.
So I improved this: for multiple Fragments sent to the same BE, only the first Fragment carries
this common information; it is cached on the BE side, and subsequent Fragments
no longer need to carry it (a sketch of the BE-side cache follows this list).
In a local test, the execution time of some complex SQL dropped from 3 seconds to 1 second.
2. Add the time spent in FE logic to the Profile
Including SQL analysis, planning, Fragment scheduling and sending on the FE side, and the time to fetch data.
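A rough sketch of the BE-side idea, under the assumption that the shared information can be keyed by query id; `QueryId`, `SharedQueryInfo`, and `SharedQueryInfoCache` are illustrative names, not the actual implementation.

```
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins: QueryId models the query id, SharedQueryInfo the
// common components (e.g. the descriptor table) carried by the first fragment.
using QueryId = std::string;
struct SharedQueryInfo { std::string serialized_desc_tbl; /* ... */ };

// Sketch of the BE-side cache: only the first fragment of a query carries the
// shared info and registers it; later fragments of the same query look it up.
class SharedQueryInfoCache {
public:
    std::shared_ptr<SharedQueryInfo> get_or_register(
            const QueryId& query_id, std::shared_ptr<SharedQueryInfo> from_first_fragment) {
        std::lock_guard<std::mutex> lock(_mutex);
        auto it = _cache.find(query_id);
        if (it != _cache.end()) return it->second;  // subsequent fragment: reuse
        if (from_first_fragment) _cache[query_id] = from_first_fragment;  // first fragment
        return from_first_fragment;
    }

    void erase(const QueryId& query_id) {  // called when the query finishes
        std::lock_guard<std::mutex> lock(_mutex);
        _cache.erase(query_id);
    }

private:
    std::mutex _mutex;
    std::unordered_map<QueryId, std::shared_ptr<SharedQueryInfo>> _cache;
};
```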
For [Spark Load]:
1. Support decimal and largeint
2. Add validation logic for char/varchar/decimal
3. Check data loaded from Hive in strict mode
4. Support decimal/date/datetime aggregators
Since Segment V2 has been released for a long time, we should make it the default storage format for newly created tables.
This CL mainly changes:
1. For all newly created tables, the default storage format is Segment V2.
2. For all already existing tablets, the storage format remains unchanged.
3. Fix the bugs described in #4384 and #4385
After PR #4135, if a mem tracker has a parent, it should be created by 'CreateTracker'.
So I removed the other, now unused constructors.
This also fixes the bug described in #4344.
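A hypothetical, heavily trimmed illustration of why funneling construction through a single factory helps; `SimpleMemTracker` and `create_tracker` are not the real classes, they just mirror the "parented trackers must come from CreateTracker" rule.

```
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

class SimpleMemTracker {
public:
    // The only way to build a tracker with a parent, so parent/child
    // registration can never be skipped by a stray constructor call.
    static std::shared_ptr<SimpleMemTracker> create_tracker(
            int64_t limit, const std::string& label,
            const std::shared_ptr<SimpleMemTracker>& parent) {
        std::shared_ptr<SimpleMemTracker> t(
                new SimpleMemTracker(limit, label, parent.get()));
        if (parent) parent->_children.push_back(t);  // registration happens exactly once
        return t;
    }

private:
    SimpleMemTracker(int64_t limit, std::string label, SimpleMemTracker* parent)
            : _limit(limit), _label(std::move(label)), _parent(parent) {}

    int64_t _limit;
    std::string _label;
    SimpleMemTracker* _parent;
    std::vector<std::shared_ptr<SimpleMemTracker>> _children;
};
```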
1. Add a PushBrokerReader in push_handle.cpp.
2. PushBrokerReader wraps the ParquetScanner to support reading data from Parquet format files through the broker.
Support compaction operations that compact only one rowset.
After this modification, the last rowset of a tablet will
also be compacted.
At the same time, we added a `segments_overlap_pb` field to
the rowset meta, used to describe whether the segment data
in the rowset overlaps. This field is set by `rowset_writer`
and is initially UNKNOWN for compatibility with existing data (see the sketch below).
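An illustrative C++ mirror of what such a flag could look like; the real field lives in the rowset meta protobuf, and both the enum value names and the writer's decision logic shown here are assumptions.

```
#include <cstddef>

// Illustrative enum; the real values are defined in the rowset meta protobuf.
enum class SegmentsOverlap {
    UNKNOWN,         // default for rowsets written before this change
    OVERLAPPING,     // segment key ranges may overlap (e.g. freshly loaded data)
    NONOVERLAPPING,  // segments are sorted and disjoint (e.g. output of compaction)
};

// Sketch of how a rowset writer might decide the flag when it closes a rowset.
SegmentsOverlap decide_overlap(bool produced_by_compaction, std::size_t num_segments) {
    if (num_segments <= 1 || produced_by_compaction) {
        return SegmentsOverlap::NONOVERLAPPING;
    }
    return SegmentsOverlap::OVERLAPPING;
}
```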
In addition, the version hash of the rowset generated after
compaction is directly set to the version hash of the last rowset
participating in the compaction, to ensure that the tablet's
version hash remains unchanged after compaction.
BetaRowsetWriter is used to write rowsets in the V2 segment format.
This PR contains several interface changes:
1. Rowset.make_snapshot() is renamed to `link_files_to`, because hard links are also useful in copy tasks, linked schema change, etc.
2. Rowset.copy_files_to_path() is renamed to `copy_files_to` to be consistent with other names
3. RowsetWriter.mem_pool() is removed because not all rowset writers use MemPool
4. RowsetWriter.garbage_collection() is removed because it is not used by clients
5. SegmentGroup's make_snapshot() is removed because link_segments_to_path() provides similar functionality
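A trimmed-down sketch of the renamed interface; the argument lists are illustrative and the return type is reduced to bool for brevity.

```
#include <cstdint>
#include <string>

// Hypothetical simplified view of the Rowset interface after the renames.
class RowsetLike {
public:
    virtual ~RowsetLike() = default;

    // was: make_snapshot() -- hard links are also used by copy tasks and
    // linked schema change, so the name now says what the method does.
    virtual bool link_files_to(const std::string& dir, int64_t new_rowset_id) = 0;

    // was: copy_files_to_path() -- renamed for consistency with link_files_to.
    virtual bool copy_files_to(const std::string& dir) = 0;
};
```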
Currently, we have Field and ColumnSchema to access column data in a
row. These two classes are mostly the same, so we should unify them into
one class. Field currently has offset information, which is a row attribute,
so we remove the offset from Field.
RowCursor has some logic that belongs to Schema, so in this patch I
add a Schema attribute to RowCursor to keep RowCursor simple. After this
change, only Schema will handle Field/ColumnSchema.
I extracted some logic from RowCursor into be/src/olap/row.h, so we can
use the same logic to handle different types of rows. Each row type has
the same function to get a Cell of the row. A cell represents one column's
content together with a null indicator (a simplified sketch follows).
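A hypothetical, simplified version of the cell idea; the real types in be/src/olap/row.h and the Field's copy routine differ in detail, and `copy_content` here is an assumed placeholder for that type-aware copy.

```
#include <cstddef>

// Simplified cell: a pointer to one column's content plus its null indicator,
// so generic row code never needs to know the row's concrete layout.
struct Cell {
    bool* null_flag;  // null indicator for this column
    void* data;       // column content (layout decided by the column's Field)

    bool is_null() const { return *null_flag; }
    void set_null(bool v) const { *null_flag = v; }
};

// Any row type that can hand out a Cell per column can be handled by the same
// generic helper; fields[cid].copy_content(...) stands for the type-aware copy
// done by a Field-like object.
template <typename DstRow, typename SrcRow, typename FieldVec>
void copy_row(DstRow* dst, const SrcRow& src, const FieldVec& fields) {
    for (std::size_t cid = 0; cid < fields.size(); ++cid) {
        Cell d = dst->cell(cid);
        Cell s = src.cell(cid);
        d.set_null(s.is_null());
        if (!s.is_null()) {
            fields[cid].copy_content(d.data, s.data);  // type-aware content copy
        }
    }
}
```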
NOTE: This patch modifies all Backends' data,
which will cause BE to take a very long time to restart.
So if you do not want to disrupt your production environment,
you should upgrade the Backends one by one.
1. Refactor BE to clarify the structure of the code.
2. Use a unique id to identify a rowset (see the sketch after this list).
Naming a rowset with tablet_id and version would lead to
many conflicts among compaction, clone, and restore.
3. Extract a rowset interface to encapsulate rowsets
with different formats.
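Illustrative only: one way a BE could hand out rowset ids that never collide across compaction, clone, and restore is a monotonically increasing counter seeded from the last persisted id; the actual id scheme used by the refactoring may differ.

```
#include <atomic>
#include <cstdint>

// Hypothetical generator: ids are unique within this generator's lifetime as
// long as last_persisted_id is restored from storage at startup.
class RowsetIdGenerator {
public:
    explicit RowsetIdGenerator(int64_t last_persisted_id)
            : _next_id(last_persisted_id + 1) {}

    int64_t next_id() { return _next_id.fetch_add(1); }  // never reuses an id

private:
    std::atomic<int64_t> _next_id;
};
```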
1. No one can set the root password except the root user itself.
2. NODE_PRIV cannot be granted.
3. ADMIN_PRIV and GRANT_PRIV can only be granted or revoked on *.*
4. No one can modify the privileges of the default roles 'operator' and 'admin'.
5. No user can be granted the role 'operator'.
Fixed: the running load limit should not be applied to replay logic; it would cause replay or image loading to fail.
Changed: optimize the problem of too many directories under the mini load directory.
Fixed: missing password and auth check when handling mini load requests in Frontend.
Fixed: DomainResolver should start after Frontends transfer to a certain ROLE, not in the Catalog constructor.
Fixed: a stupid bug that no one could set a password for the root user; fix it so that only the root user can set the password for root.
Fixed: reading null data twice.
When reading data with null values, in some cases the same data is read twice by the storage engine,
resulting in a wrong result. The reason is that when splitting,
if the start key is the minimum value, the data with null is read again.
Fixed: add a flag to prevent the DomainResolver thread from starting twice.
Fixed: a mem leak when using ByteBuf while parsing auth info of an http request.
Fixed: add a new config 'disable_hadoop_load', default false; set it to true to disable hadoop load.
Changed: add detailed error msgs for submitting hadoop load jobs in the show load result.
Fixed: the Backend process should crash if it fails to save the header.
Added: expose backend info to the user when an error occurs on the Backend, to make debugging more convenient.
Fixed: remove the fd from the map when an inputstream or outputstream is closed in the Broker process.
Fixed: change all files' line endings to unix (LF) format.
Internal commit id: merge from dfcd0aca18eed9ff99d188eb3d01c60d419be1b8