doris

Author	SHA1	Message	Date
wuwenchi	102abff071	[Fix](spark-load) ignore column name case in spark load (#23947 ) Doris is not case sensitive to field names, so when doing spark load, we can convert all fields to lowercase for matching and loading.	2023-09-10 19:45:01 +08:00
wuwenchi	e5d1248c72	[Fix] spark load not found file #23502 Use the thrift interface to assign values to variables. Otherwise, __isset will returns false.	2023-09-02 01:16:38 +08:00
lihangyu	527293aa41	[refactor](dynamic table) remove dynamic table (#23298 )	2023-08-23 14:15:14 +08:00
plat1ko	d4694167a8	[Enhancement](chore) Some Status relevant enhancement (#23072 )	2023-08-21 14:14:38 +08:00
AlexYue	f036cdfde6	[feature](compaction) support delete in cumulative compaction (#19609 )	2023-08-07 15:22:21 +08:00
Mingyu Chen	5fc0a84735	[improvement](catalog) reduce the size thrift params for external table query (#21771 ) ### 1 In previous implementation, for each FileSplit, there will be a `TFileScanRange`, and each `TFileScanRange` contains a list of `TFileRangeDesc` and a `TFileScanRangeParams`. So if there are thousands of FileSplit, there will be thousands of `TFileScanRange`, which cause the thrift data send to BE too large, resulting in: 1. the rpc of sending fragment may fail due to timeout 2. FE will OOM For a certain query request, the `TFileScanRangeParams` is the common part and is same of all `TFileScanRange`. So I move this to the `TExecPlanFragmentParams`. After that, for each FileSplit, there is only a list of `TFileRangeDesc`. In my test, to query a hive table with 100000 partitions, the size of thrift data reduced from 151MB to 15MB, and the above 2 issues are gone. ### 2 Support when setting `max_external_file_meta_cache_num` <=0, the file meta cache for parquet footer will not be used. Because I found that for some wide table, the footer is too large(1MB after compact, and much more after deserialized to thrift), it will consuming too much memory of BE when there are many files. This will be optimized later, here I just support to disable this cache.	2023-07-17 13:37:02 +08:00
Pxl	ca71048f7f	[Chore](status) avoid empty error msg on status (#21454 ) avoid empty error msg on status	2023-07-11 13:48:16 +08:00
yiguolei	31a4f96f01	[refactor](exprcontext) move close to expr context's dector method (#20747 ) The close method does nothing. But I am not sure we could remove it. So that I add it to dector method and remove many many calls.	2023-06-14 18:01:07 +08:00
Xinyi Zou	0f21166110	[fix](memory) Fix runtime state default mem tracker (#20615 ) start time: Wed 07 Jun 2023 06:50:14 PM CST * Query id: e9000000e9-eb00000073 * * Aborted at 1686136356 (unix time) try "date -d @1686136356" if you are using GNU date * * Current BE git commitID: 5c33dd7a2c * * SIGSEGV address not mapped to object (@0x23000000235) received by PID 2131238 (TID 2132258 OR 0x7f708eff7700) from PID 565; stack trace: * 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t, void) at /mnt/hdd01/repo_center/doris_branch-2.0-beta/doris/be/src/common/signal_handler.h:413 1# 0x00007F727BBE3090 in /lib/x86_64-linux-gnu/libc.so.6 2# doris::AttachTask::AttachTask(doris::RuntimeState) at /mnt/hdd01/repo_center/doris_branch-2.0-beta/doris/be/src/runtime/thread_context.cpp:43 3# std::_Function_handler<void (doris::PTabletWriterAddBlockResult const&, bool), doris::stream_load::VNodeChannel::open_wait()::$_1>::_M_invoke(std::_Any_data const&, doris::PTabletWriterAddBlockResult const&, bool&&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291 4# doris::stream_load::ReusableClosure<doris::PTabletWriterAddBlockResult>::Run() at /mnt/hdd01/repo_center/doris_branch-2.0-beta/doris/be/src/vec/sink/vtablet_sink.h:176 5# brpc::Controller::EndRPC(brpc::Controller::CompletionInfo const&) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be 6# brpc::Controller::OnVersionedRPCReturned(brpc::Controller::CompletionInfo const&, bool, int) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be 7# brpc::policy::ProcessRpcResponse(brpc::InputMessageBase) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be 8# brpc::InputMessenger::InputMessageClosure::~InputMessageClosure() in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be 9# brpc::InputMessenger::OnNewMessages(brpc::Socket) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be 10# brpc::Socket::ProcessEvent(void) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be 11# bthread::TaskGroup::task_runner(long) in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be 12# bthread_make_fcontext in /root/20230607171843-doris-branch-2.0-beta-5c33dd7a/be/lib/doris_be	2023-06-09 21:09:07 +08:00
Jerry Hu	9f8de89659	[refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758 ) Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity. By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed. This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.	2023-05-29 11:47:31 +08:00
WenYao	481e9aebdb	[Refactor](spark load) remove parquet scanner (#19251 )	2023-05-18 19:19:13 +08:00
DeadlineFen	e08de52ee7	[chore](compile) using PCH for compilation acceleration under clang (#19303 )	2023-05-08 19:51:06 +08:00
Pxl	dff669899a	[Feature](generic-aggregation) add some type define for generic aggregate functions support (#19252 ) add some type define for generic aggregate functions support	2023-05-06 11:30:13 +08:00
yiguolei	8d7a9fd21b	[refactor](exceptionsafe) add factory creator to some class (#18978 ) make vexprecontext,vexpr,function,query context,runtimestate thread safe. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-04-24 10:32:11 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
AlexYue	1f631c388d	[enhance](cooldown)accelerate cooldown task produce efficiency (#16089 )	2023-02-10 16:58:27 +08:00
yiguolei	4b6a4b3cf7	[refactor](remove unused code) Remove unused mempool declare or function params (#16222 ) * Remove unused mempool declare or function params --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-30 13:03:18 +08:00
yiguolei	5eaa995704	[refactor](some mempool) not memset 0 in default value iterator (#16194 ) --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-29 22:50:39 +08:00
yiguolei	0b5e71d3b4	[refactor](refactor field) remove unused method (#16068 )	2023-01-19 10:16:09 +08:00
yiguolei	d257059e6b	[refactor](remove hadoop dpp) remove hadoop dpp code since it is not used (#16009 )	2023-01-18 15:01:04 +08:00
Tiewei Fang	bbdf40b6bd	[Enhencement](Push Handle) use VParquetScanner in PushHandle (#15980 ) * use VParquetScanner in PushHadnle * delete ParquetScanner	2023-01-17 16:21:04 +08:00
Xinyi Zou	97fcad76f8	[enhancement](memtracker) Improve readability (#15716 )	2023-01-16 16:30:35 +08:00
pengxiangyu	58c520dbfd	[Feature](remote) Cooldown cold data to object storage only one replica (#15832 )	2023-01-14 23:58:00 +08:00
HappenLee	9468711f9f	[Bug](join) fix bug null aware left anti join not correct result (#15841 )	2023-01-13 10:18:05 +08:00
zbtzbtzbt	fe5e5d2bf4	[refactor] separate agg and flush in memtable (#15713 )	2023-01-11 10:07:34 +08:00
spaces-x	1018657d9d	[Enhancement](SparkLoad): avoid BE OOM in push task, fix #15572 (#15620 ) Release memory pool held by the parquet reader when the data has been flushed by rowset writter. Co-authored-by: spaces-x <weixiang06@meituan.com>	2023-01-05 10:20:32 +08:00
plat1ko	ad68764977	[enhancement](tablet) Unify redundant `create_rowset_writer` methods (#15519 ) * Remove redundant create_rowset_writer methods * Set resource id when setting FS in rowset meta * fix * fix ut	2022-12-30 22:57:12 +08:00
Gabriel	b085ff49f0	[refactor](non-vec) delete non-vec data sink (#15283 ) * [refactor](non-vec) delete non-vec data sink Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-12-23 14:10:47 +08:00
Gabriel	e9a201e0ec	[refactor](non-vec) delete some non-vec exec node (#15239 ) * [refactor](non-vec) delete some non-vec exec node	2022-12-22 14:05:51 +08:00
AlexYue	821c12a456	[chore](BE) remove all useless segment group related code #15193 The segment group is useless in current codebase, remove all the related code inside Doris. As for the related protobuf code, use reserved flag to prevent any future user from using that field.	2022-12-20 17:11:47 +08:00
plat1ko	f3aea7f0f0	[Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744 )	2022-12-11 23:33:18 +08:00
AlexYue	58bc254529	[enhancement](BE)add metric for too many version (#14735 ) * add one funciton to get if exceeds version limit add bvar to indicate version exceed * resolve * remove unnecessary header file	2022-12-05 11:37:14 +08:00
jiafeng.zhang	915d8989c5	[feature](spark-load)Spark load supports string type data import (#11927 )	2022-08-22 08:56:59 +08:00
yiguolei	321107cb40	[refactor](schema change) Using tablet schema shared ptr instead of raw ptr (#11475 ) * Using tabletschema shared ptr instead of raw ptrs Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-08-05 11:04:38 +08:00
yiguolei	003335c1c5	[refactor](schema change) spark dpp need not call convert rowset during load process (#11397 ) * remove unused schema change logic in push handler Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-08-02 10:18:00 +08:00
Lightman	b35daf0a04	[improvement](light-schema-change) Support tablet schema cache (#11131 )	2022-08-01 12:18:00 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
yiguolei	82251a6bab	[refactor] some refactor of delete predicates (#10816 )	2022-07-15 14:13:34 +08:00
Lightman	486cf0ebd4	[Feature] Lightweight schema change of add/drop column (#10136 ) * [Schema Change] support fast add/drop column (#49) * [feature](schema-change) support fast schema change. coauthor: yixiutt * [schema change] Using columns desc from fe to read data. coauthor: Lchangliang * [feature](schema change) schema change optimize for add/drop columns. 1.add uniqueId field for class column. 2.schema change for add/drop columns directly update schema meta Co-authored-by: yixiutt <yixiu@selectdb.com> Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com> [Feature](schema change) fix write and add regression test (#69) Co-authored-by: yixiutt <yixiu@selectdb.com> [schema change] be ssupport that delete use newest schema add delete regression test fix regression case (#107) tmp [feature](schema change) light schema change exclude rollup and agg/uniq/dup key type. [feature](schema change) fe olapTable maxUniqueId write in disk. [feature](schema change) add rpc iface for sc add column. [feature](schema change) add columnsDesc to TPushReq for ligtht sc. resolve the deadlock when schema change (#124) fix columns from fe don't has bitmap_index flag (#134) add update/delete case construct MATERIALIZED schema from origin schema when insert fix not vectorized compaction coredump use segment cache choose newest schema by schema version when compaction (#182) [bugfix](schema change) fix ligth schema change problem. [feature](schema change) light schema change add alter job. (#1) fix be ut [bug] (schema change) unique drop key column should not light schema change [feature](schema change) add schema change regression-test. fix regression test [bugfix](schema change) fix multi alter clauses for light schema change. (#2) [bugfix](schema change) fix multi clauses calculate column unique id (#3) modify PushTask process (#217) [Bugfix](schema change) fix jobId replay cause bdbje exception. [bug](schema change) fix max col unique id repeatitive. (#232) [optimize](schema change) modify pendingMaxColUniqueId generate rule. fix compaction error * fix be ut * fix snapshot load core fix unique_id error (#278) [refact](fe) remove redundant code for light schema change. (#4) [refact](fe) remove redundant code for light schema change. (#4) format fe core format be core fix be ut modify fe meta version fix rebase error flush schema into rowset_meta in old table [refactor](schema change) refact fe light schema change. (#5) delete the change of schemahash and support get max version schema * modify for review * fix be ut * fix schema change test	2022-07-12 19:41:06 +08:00
yiguolei	89e56ea67f	[refactor] remove alpha rowset related code and vectorized row batch related code (#10584 )	2022-07-05 20:33:34 +08:00
Pxl	5805f8077f	[Feature] [Vectorized] Some pre-refactorings or interface additions for schema change part2 (#10003 )	2022-06-16 10:50:08 +08:00
yiguolei	2c79d223e4	[refactor][rowset]move rowset writer to a single place (#9368 )	2022-05-19 23:57:02 +08:00
plat1ko	4cd579b155	[refactor] Check status precise_code instead of construct OLAPInternalError (#9514 ) * check status precise_code instead of construct OLAPInternalError * move is_io_error to Status	2022-05-12 15:39:29 +08:00
Adonis Ling	bd126f0679	[improvement] Refactor type info for further optimizations. (#8786 ) ## Design: For now, there are two categories of types in Doris, one is for scalar types (such as int, char and etc.) and the other is for composite types (array and etc.). For the sake of performance, we can cache type info of scalar types globally (unique objects) due to the limited number of scalar types. When we consider the composite types, normally, the type info is generated in runtime (we can also use some cache strategy to speed up). The memory thereby should be reclaimed when we create type info for composite types. There are a lots of interfaces to get the type info of a specific type. I reorganized those as the following describes. 1. `const TypeInfo* get_scalar_type_info(FieldType field_type)` The function is used to get the type info of scalar types. Due to the cache, the caller uses the result WITHOUT considering the problems about memory reclaim. 2. `const TypeInfo* get_collection_type_info(FieldType sub_type)` The function is used to get the type info of array types with just ONE depth. Due to the cache, the caller uses the result WITHOUT considering the problems about memory reclaim. 3. `TypeInfoPtr get_type_info(segment_v2::ColumnMetaPB* column_meta_pb)` 4. `TypeInfoPtr get_type_info(const TabletColumn* col)` These functions are used to get the type info of BOTH scalar types and composite types. The caller should be responsible to manage the resources returned. #### About the new type `TypeInfoPtr` `TypeInfoPtr` is an alias type to `unique_ptr` with a custom deleter. 1. For scalar types, the deleter does nothing. 2. For composite types, the deleter reclaim the memory. By analyzing the callers of `get_type_info`, these classes should hold TypeInfoPtr: 1. `Field` 2. `ColumnReader` 3. `DefaultValueColumnIterator` Other classes are either constructed by the foregoing classes or hold those, so they can just use the raw pointer of `TypeInfo` directly for the sake of performance. 1. `ScalarColumnWriter` - holds `Field` 1. `ZoneMapIndexWriter` - created by `ScalarColumnWriter`, use `type_info` from the field in `ScalarColumnWriter` 1. `IndexedColumnWriter` - created by `ZoneMapIndexWriter`, only uses scalar types. 2. `BitmapIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter` 1. `IndexedColumnWriter` - created by `BitmapIndexWriter`, uses `type_info` in `BitmapIndexWriter` and `BitmapIndexWriter` doesn't support `ArrayType`. 3. `BloomFilterIndexWriter` - created by `ScalarColumnWriter`, uses `type_info` from the field in `ScalarColumnWriter` 1. `IndexedColumnWriter` - created by `BloomFilterIndexWriter`, only uses scalar types. 2. `IndexedColumnReader` initializes `type_info` by the field type in meta (only scalar types). 3. `ColumnVectorBatch` 1. `ZoneMapIndexReader` creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `IndexedColumnReader` 2. `BitmapIndexReader` supports scalar types only and it creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `BitmapIndexReader` 3. `BloomFilterIndexWriter` supports scalar types only and it creates `ColumnVectorBatch`, `ColumnVectorBatch` uses `type_info` in `BloomFilterIndexWriter`	2022-04-20 14:47:29 +08:00
yiguolei	e5e0dc421d	[refactor] Change ALL OLAPStatus to Status (#8855 ) Currently, there are 2 status code in BE, one is common/Status.h, and the other is olap/olap_define.h called OLAPStatus. OLAPStatus is just an enum type, it is very simple and could not save many informations, I will unify these code to common/Status.	2022-04-14 11:43:49 +08:00
Zhengguo Yang	290366787c	[refactor] refactor code, replace some file with stl libs (#8759 ) 1. replace ConditionVariables with std::condition_variable 2. repalace Mutex with std::mutex 3. repalce MonoTime with std::chrono	2022-04-13 09:55:29 +08:00
Mingyu Chen	bd0a3369b7	[fix] check disk capacity before writing data (#8887 ) 1. We forgot to check disk capacity when writing data. 2. TODO: the user specified disk capacity is not used now. We need to find a way to use it. 3. Avoid print too much compaction log when there is not suitable version for compaction.	2022-04-08 11:29:49 +08:00
caiconghui	4076c5466b	[refactor][improvement](type_info) use template and single instance to refactor get type info logic (#8680 ) 1. use const pointer instead of shared_ptr 2. Restrict array types to support only primitive types and nest up to 9 levels.	2022-04-03 10:10:36 +08:00
caiconghui	c69dd54116	[refactor](mutex) Use std::mutex to replace Mutex and refactor some lock logic (#8452 )	2022-03-24 14:50:02 +08:00
Xinyi Zou	eeae516e37	[Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476 ) Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G Implement a new way of memory statistics based on TCMalloc New/Delete Hook, MemTracker and TLS, and it is expected that all memory new/delete/malloc/free of the BE process can be counted.	2022-03-20 23:06:54 +08:00

1 2 3

102 Commits