doris

Author	SHA1	Message	Date
Xin Liao	1e6a3610a7	[feature-wip](unique-key-merge-on-write) optimize rowid conversion and add ut (#11541)	2022-08-08 10:41:44 +08:00
yiguolei	321107cb40	[refactor](schema change) Using tablet schema shared ptr instead of raw ptr (#11475) * Using tabletschema shared ptr instead of raw ptrs Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-08-05 11:04:38 +08:00
Xin Liao	2783267599	[feature-wip](unique-key-merge-on-write) update bitmap after compaction, DSIP-018 (#11289)	2022-07-30 19:50:47 +08:00
Xin Liao	d4fb27125a	[feature-wip](unique-key-merge-on-write) row id conversion for compaction (#11149)	2022-07-27 16:32:13 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823)	2022-07-21 17:11:28 +08:00
Lightman	486cf0ebd4	[Feature] Lightweight schema change of add/drop column (#10136) * [Schema Change] support fast add/drop column (#49) * [feature](schema-change) support fast schema change. coauthor: yixiutt * [schema change] Using columns desc from fe to read data. coauthor: Lchangliang * [feature](schema change) schema change optimize for add/drop columns. 1.add uniqueId field for class column. 2.schema change for add/drop columns directly update schema meta Co-authored-by: yixiutt <yixiu@selectdb.com> Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com> [Feature](schema change) fix write and add regression test (#69) Co-authored-by: yixiutt <yixiu@selectdb.com> [schema change] be ssupport that delete use newest schema add delete regression test fix regression case (#107) tmp [feature](schema change) light schema change exclude rollup and agg/uniq/dup key type. [feature](schema change) fe olapTable maxUniqueId write in disk. [feature](schema change) add rpc iface for sc add column. [feature](schema change) add columnsDesc to TPushReq for ligtht sc. resolve the deadlock when schema change (#124) fix columns from fe don't has bitmap_index flag (#134) add update/delete case construct MATERIALIZED schema from origin schema when insert fix not vectorized compaction coredump use segment cache choose newest schema by schema version when compaction (#182) [bugfix](schema change) fix ligth schema change problem. [feature](schema change) light schema change add alter job. (#1) fix be ut [bug] (schema change) unique drop key column should not light schema change [feature](schema change) add schema change regression-test. fix regression test [bugfix](schema change) fix multi alter clauses for light schema change. (#2) [bugfix](schema change) fix multi clauses calculate column unique id (#3) modify PushTask process (#217) [Bugfix](schema change) fix jobId replay cause bdbje exception. [bug](schema change) fix max col unique id repeatitive. (#232) [optimize](schema change) modify pendingMaxColUniqueId generate rule. fix compaction error * fix be ut * fix snapshot load core fix unique_id error (#278) [refact](fe) remove redundant code for light schema change. (#4) [refact](fe) remove redundant code for light schema change. (#4) format fe core format be core fix be ut modify fe meta version fix rebase error flush schema into rowset_meta in old table [refactor](schema change) refact fe light schema change. (#5) delete the change of schemahash and support get max version schema * modify for review * fix be ut * fix schema change test	2022-07-12 19:41:06 +08:00
Adonis Ling	718a51a388	[refactor][style] Use clang-format to sort includes (#9483)	2022-05-10 21:25:35 +08:00
Gabriel	49a0cd1925	[fix](compaction) fix bug for vectorized compaction (#9344) 1. add a BE config to switch vectorized compaction 2. Fix vectorized compaction bug that row statistic is not right.	2022-05-03 17:31:40 +08:00
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305) - sh build-support/clang-format.sh to clang-format all c++ code	2022-04-29 16:14:22 +08:00
HappenLee	d330bc3806	[Vectorized](stream-load-vec) Support stream load in vectorized engine (#8709) (#9280) Implement vectorized stream load. Added fe configuration option `enable_vectorized_load` to enable vectorized stream load. Co-authored-by: tengjp@outlook.com Co-authored-by: mrhhsg@gmail.com Co-authored-by: minghong.zhou@163.com Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>	2022-04-29 09:50:51 +08:00
yiguolei	e5e0dc421d	[refactor] Change ALL OLAPStatus to Status (#8855) Currently, there are 2 status codes in BE: one is common/Status.h, and the other, in olap/olap_define.h, is called OLAPStatus. OLAPStatus is just an enum type; it is very simple and cannot carry much information, so I will unify this code into common/Status.	2022-04-14 11:43:49 +08:00
Xinyi Zou	e17aef9467	[refactor] refactor the implement of MemTracker, and related usage (#8322) Modify the implementation of MemTracker: 1. Simplify a lot of useless logic; 2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers; this is used to track the local memory usage of all tasks executing; 3. Add consume/release cache, trigger a consume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes; 4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection; 5. Added Virtual MemTracker, consume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently; 6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later; 7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env; 8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.; Modify where MemTracker is used: 1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code; 2. Added trackers for global objects such as ChunkAllocator and StorageEngine; 3. Added more fine-grained trackers such as ExprContext; 4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode; 5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;	2022-03-11 22:04:23 +08:00
thinker	2a2f12ca51	[refactor & fix](exce & olap) refactor reader: rename Reader to TabletReader (#7544) 1. Considering the responsibility of Reader, rename Reader to TabletReader; the new name TabletReader represents its function exactly and is more suitable and meaningful 2. add the virtual keyword to the destructor of OlapScanner, because VOlapScanner is derived from it 3. refactor struct ReaderParams and KeysParam as TabletReader's inner structs, guarded by the TabletReader name scope, which is also more reasonable 4. reduce the number of OlapScanner's data members; just using _parent->member_data is simpler 5. bugfix: TupleReader has the same data member _collect_iter as its parent class Reader; this usage is dangerous and the writer may make mistakes, so I delete TupleReader::_collect_iter to fix it. 6. call set_tablet_reader() in OlapScanner::prepare() to set up _tablet_reader; VOlapScanner should override set_tablet_reader to new a BlockReader instead, which avoids newing a Reader twice by resetting the unique_ptr _tablet_reader 7. if a data member is an inseparable part of a class, I suggest using a normal variable rather than a pointer variable, because a pointer brings a layer of indirection and its copying and destruction must be handled carefully, which is not necessary 8. some other small changes for readability or design	2022-01-06 00:00:32 +08:00
HappenLee	c3b133bdb3	[Refactor] Refactor the reader code (#6866) 1. Removed useless redundant code logic 2. Change reader to interface, add tuple reader to simplify the structure of reader	2021-10-30 18:15:28 +08:00
Yingchun Lai	6d6c3d9703	[Enhancement] Reduce memory consumption by releasing readers earier (#5811) We create multiple rowset readers to read the data of one tablet; after a rowset reader has reached EOF, it can be released to reduce resource (typically memory) consumption. Likewise, we can release a segment reader when it reaches EOF.	2021-06-16 09:37:50 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
weizuo93	4c63dc0027	[Metric] Add metrics for compaction permits and log for compaction merge (#4893) 1. Add metrics to `used permits` and `waitting permits` for compaction. It would be useful to monitor `permits` held by all executing compaction tasks and waiting compaction tasks. 2. Add a log, which can be enabled by config, for merging rowsets. It would be helpful to track the process of rowset merging for a compaction task that lasts for a long time.	2020-11-28 10:00:08 +08:00
HuangWei	10f822eb43	[MemTracker] make all MemTrackers shared (#4135) We make all MemTrackers shared, in order to show MemTracker real-time consumption on the web. As follows: 1. nearly all MemTracker raw ptr -> shared_ptr 2. Use CreateTracker() to create a new MemTracker (in order to add itself to its parent) 3. RowBatch & MemPool still use raw ptrs of MemTracker, because it's easy to ensure the RowBatch & MemPool destructors run before MemTracker's destructor, so we don't change this code. 4. MemTracker can use RuntimeProfile's counter to calc consumption, so RuntimeProfile's counters need to be shared too. We add a shared counter pool to store the shared counters and don't change other counters of RuntimeProfile. Note that this PR doesn't change the MemTracker tree structure, so there are still some orphan trackers, e.g. RowBlockV2's MemTracker. If you find that some shared MemTrackers track little memory and cost too much time, you can make them orphans; then it's fine to use a raw ptr.	2020-07-31 21:57:21 +08:00
Yingchun Lai	3c09e1e1d8	[trace] Adapt trace util to compaction module (#3814) Trace util is helpful for diagnosing compaction performance problems, we can get trace log for base compaction like: ``` W0610 11:26:33.804431 56452 storage_engine.cpp:552] Trace: 0610 11:23:03.727535 (+ 0us) storage_engine.cpp:554] start to perform base compaction 0610 11:23:03.728961 (+ 1426us) storage_engine.cpp:560] found best tablet 546859 0610 11:23:03.728963 (+ 2us) base_compaction.cpp:40] got base compaction lock 0610 11:23:03.729029 (+ 66us) base_compaction.cpp:44] rowsets picked 0610 11:24:51.784439 (+108055410us) compaction.cpp:46] got concurrency lock and start to do compaction 0610 11:24:51.784818 (+ 379us) compaction.cpp:74] prepare finished 0610 11:26:33.359265 (+101574447us) compaction.cpp:87] merge rowsets finished 0610 11:26:33.484481 (+125216us) compaction.cpp:102] output rowset built 0610 11:26:33.484482 (+ 1us) compaction.cpp:106] check correctness finished 0610 11:26:33.513197 (+ 28715us) compaction.cpp:110] modify rowsets finished 0610 11:26:33.513300 (+ 103us) base_compaction.cpp:49] compaction finished 0610 11:26:33.513441 (+ 141us) base_compaction.cpp:56] unused rowsets have been moved to GC queue Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"input_rowsets_data_size":1256413170,"input_segments_num":44,"merge_rowsets_latency_us":101574444,"merged_rows":0,"output_row_num":3346807,"output_rowset_data_size":1228439659,"output_segments_num":6} ``` for cumulative compaction like: ``` W0610 11:14:18.714366 56468 storage_engine.cpp:518] Trace: 0610 11:14:08.068484 (+ 0us) storage_engine.cpp:520] start to perform cumulative compaction 0610 11:14:08.069844 (+ 1360us) storage_engine.cpp:526] found best tablet 547083 0610 11:14:08.069846 (+ 2us) cumulative_compaction.cpp:42] got cumulative compaction lock 0610 11:14:08.069947 (+ 101us) cumulative_compaction.cpp:46] calculated cumulative point 0610 11:14:08.070141 (+ 194us) cumulative_compaction.cpp:50] rowsets picked 0610 11:14:08.070143 (+ 2us) compaction.cpp:46] got concurrency lock and start to do compaction 0610 11:14:08.070518 (+ 375us) compaction.cpp:74] prepare finished 0610 11:14:15.389893 (+7319375us) compaction.cpp:87] merge rowsets finished 0610 11:14:15.390916 (+ 1023us) compaction.cpp:102] output rowset built 0610 11:14:15.390917 (+ 1us) compaction.cpp:106] check correctness finished 0610 11:14:15.409460 (+ 18543us) compaction.cpp:110] modify rowsets finished 0610 11:14:15.409496 (+ 36us) cumulative_compaction.cpp:55] compaction finished 0610 11:14:15.410138 (+ 642us) cumulative_compaction.cpp:65] unused rowsets have been moved to GC queue Metrics: {"filtered_rows":0,"input_row_num":136707,"input_rowsets_count":302,"input_rowsets_data_size":76617836,"input_segments_num":302,"merge_rowsets_latency_us":7319372,"merged_rows":0,"output_row_num":136707,"output_rowset_data_size":53893280,"output_segments_num":1} ```	2020-06-13 19:31:51 +08:00
kangkaisen	cafb9f1e62	Replace Arena with MemPool first step (#1899)	2019-09-28 01:12:22 +08:00
kangkaisen	b246d93128	Avoid SerDe for aggregation query with object pool (#1854)	2019-09-26 13:51:13 +08:00
Dayue Gao	a63989cc61	Use RowsetFactory to create and init RowsetWriter (#1740)	2019-09-04 17:02:43 +08:00
kangkaisen	1e4dd77d2a	Add bitmap agg type and udaf (#1610)	2019-08-26 14:24:42 +08:00
lichaoyong	dcb75729db	Change cumulative compaction for decoupling storage from compution (#1576) 1. Calculate cumulative point when loading tablet first time. 2. Simplify pick rowsets logic upon delete predicate. 3. Saving meta and modify rowsets only once after cumulative compaction.	2019-08-13 18:25:56 +08:00
Dayue Gao	af8256be2a	Implement BetaRowsetWriter (#1590) BetaRowsetWriter is used to write rowset in V2 segment format. This PR contains several interface changes 1. Rowset.make_snapshot() is renamed to `link_files_to` because hard links are also useful in copy task, linked schema change, etc 2. Rowset.copy_files_to_path() is renamed to `copy_files_to` to be consistent with other names 3. RowsetWriter.mem_pool() is removed because not all rowset writers use MemPool 4. RowsetWriter.garbage_collection() is removed because it's not used by clients 5. SegmentGroup's make_snapshot() is removed because link_segments_to_path() provides similar functionality	2019-08-12 16:41:47 +08:00
ZHAO Chun	c5edf9dae0	Unify Field and ColumnSchema in Storage (#1561) Currently, we have Field and ColumnSchema to access column data in a row. These two classes are mostly the same, so we should unify them into one class. Field currently holds offset information, which is a row attribute, so we remove offset from Field. RowCursor currently has some logic which belongs to Schema, so in this patch I add a Schema attribute to RowCursor to make RowCursor simpler. After this change, only Schema will handle Field/ColumnSchema. I extract some logic from RowCursor to be/src/olap/row.h, so that we can use the same logic to handle different types of row. Each type of row has the same function to get a Cell of the row. A cell represents a column's content with a null indicator.	2019-07-30 14:01:57 +08:00
lichaoyong	0d48a3961c	Refactor Storage Engine (#1478) NOTE: This patch modifies all of the Backend's data, which makes restarting BE take a very long time. To limit the impact on a production environment, upgrade backends one by one. 1. The BE refactoring clarifies the structure of the code. 2. Use a unique id to identify a rowset. Naming a rowset with tablet_id and version leads to many conflicts among compaction, clone, and restore. 3. Extract a rowset interface to encapsulate rowsets with different formats.	2019-07-15 21:18:22 +08:00
lichaoyong	b037466d56	Get rid of choosing one tablet by compaction (#433) 1. Get rid of choosing one tablet by compaction. 2. Change PREFER_READER to PREFER_WRITING from _tablet_map_lock. 3. Change license of murmur_hash	2018-12-24 16:55:39 +08:00
lichaoyong	842e943f56	Fix compaction and ingestion core (#417) An error can occur when compaction or ingestion reads data. In that case, both operations should stop and return an error.	2018-12-12 11:30:06 +08:00
李超勇	ff95f23615	Remove OLAP_LOG_DEBUG AND OLAP_LOG_TRACE log format (#378) Use VLOG(3) and VLOG(10) instead	2018-12-03 10:08:21 +08:00
kangpinghuang	85d0996b35	Rename Rowset to SegmentGroup (#364) * Rename Rowset to SegmentGroup * Modify protobuf related rowset to SegmentGroup	2018-11-29 17:30:41 +08:00
李超勇	1ba8a4ee4e	Transform row-oriented table to columnar-oriented table (#311)	2018-11-16 16:03:56 +08:00
chenhao7253886	37b4cafe87	Change variable and namespace name in BE (#268) Change 'palo' to 'doris'	2018-11-02 10:22:32 +08:00
morningman	2868793b6b	Change license to Apache License 2.0 (#262)	2018-11-01 09:06:01 +08:00
morningman	051aced48d	Missing many files in last commit In the last commit, a lot of files were missed	2018-10-31 16:19:21 +08:00
morningman	2419384e8a	push 3.3.19 to github (#193) * push 3.3.19 to github * merge to 20ed420122a8283200aa37b0a6179b6a571d2837	2018-05-15 20:38:22 +08:00
李超勇	6486be64c3	fix license statement (#29) * change picture to word * change picture to word * SHOW FULL TABLES WHERE Table_type != VIEW sql can not execute * change license description	2017-08-18 19:16:23 +08:00
cyongli	e2311f656e	baidu palo	2017-08-11 17:51:21 +08:00

38 Commits