doris

Author	SHA1	Message	Date
Gabriel	588634ddf6	[feature] support runtime filter on vectorized engine (#10103 )	2022-06-20 09:46:38 +08:00
Mingyu Chen	67f341f44e	[TLP](step-1) Remove incubator prefix (#10230 ) Remove some `incubator-` prefix in source code. The document is not modified, will be done in next PR.	2022-06-19 19:34:52 +08:00
Xinyi Zou	6ad024a2bf	[fix] (mem tracker) Refactor memtable mem tracker, fix flush memtable DCHECK failed (#10156 ) 1. Add memory leak detection for the `DeltaWriter` and `MemTable` mem trackers. 2. Make the memtable mem tracker virtual to avoid frequent recursive consumption of the parent tracker. 3. Stop the memtable flush thread from attaching the memtable tracker, so that the memtable mem tracker is completely accurate. 4. Set `memory_verbose_track=false`. At present, frequently switching the thread mem tracker has a performance problem: the mem tracker is held as a shared_ptr in thread-local storage, so each switch decrements the atomic use_count of the current tracker's shared_ptr and increments that of the replacement tracker, and frequent multi-threaded changes to the same tracker shared_ptr are slow. - TODO: 1. Reduce unnecessary thread mem tracker switches. 2. Consider using raw pointers for the mem tracker in thread-local storage.	2022-06-19 16:48:42 +08:00
yinzhijian	70450d04ba	[typo] Fix typos in comments (#10172 )	2022-06-19 10:30:17 +08:00
xiepengcheng01	1d3496c6ab	[feature] support backup/restore connect to HDFS (#10081 )	2022-06-19 10:26:20 +08:00
camby	0e404edf54	[improvement] Change array offset type from UInt32 to UInt64 (#10070 ) Now column `Array<T>` contains column `offsets` and `data`, and type of column `offsets` is UInt32 now. If we call array_union to merge arrays repeatedly, the size of array may overflow. So we need to extend it before `Array Data Type` release.	2022-06-19 10:24:08 +08:00
zhangstar333	44e979e43b	[Vectorized][Function] add orthogonal bitmap agg functions (#10126 ) * [Vectorized][Function] add orthogonal bitmap agg functions save some file about orthogonal bitmap function add some file to rebase update functions file * refactor union_count function refactor orthogonal union count functions * remove bool is_variadic	2022-06-17 08:48:41 +08:00
Xinyi Zou	c784fb3ddd	[fix] (mem tracker) Fix core dump during transmit_block (#10133 ) In some cases, query mem tracker does not exist in BE when transmit block. This will result in a null pointer for get query mem tracker in brpc transmit_block	2022-06-17 00:01:30 +08:00
yinzhijian	75a7e72402	[Refactor] Use iequal to replace boost::iequals (#10146 ) * [Refactor] Use iequal to replace boost::iequals * remove unused include	2022-06-16 18:18:38 +08:00
Pxl	5805f8077f	[Feature] [Vectorized] Some pre-refactorings or interface additions for schema change part2 (#10003 )	2022-06-16 10:50:08 +08:00
yinzhijian	bc431f2806	[typo] Fix typos in comments (#10142 )	2022-06-16 10:13:59 +08:00
plat1ko	f4e2f78a1a	[fix] Fix the bug that data balance causes tablet loss (#9971 ) 1. Provide an FE conf to test reliability in the single-replica case when tablet scheduling is frequent. 2. Apply nearly the same fix as #6063 to the current code.	2022-06-15 09:52:56 +08:00
Xinyi Zou	85362a907e	[fix](mem tracker) Fix some memory leaks, inaccurate statistics, core dump, deadlock bugs (#10072 ) 1. Fix the memory leak. When the load task is canceled, the `IndexChannel` and `NodeChannel` mem trackers cannot be destructed in time. 2. Fix Load task being frequently canceled by oom and inaccurate `LoadChannel` mem tracker limit, and rewrite the variable name of `mem limit` in `LoadChannel`. 3. Fix core dump, when logout task mem tracker, phmap erase fails, resulting in repeated logout of the same tracker. 4. Fix the deadlock, when add_child_tracker mem limit exceeds, calling log_usage causes `_child_trackers_lock` deadlock. 5. Fix frequent log printing when thread mem tracker limit exceeds, which will affect readability and performance. 6. Optimize some details of mem tracker display.	2022-06-14 21:38:37 +08:00
yinzhijian	2a96d7ffde	[spell] Fix spell error in row_batch.h (#10109 )	2022-06-14 15:28:29 +08:00
yinzhijian	622143f87c	[typo] Fix typos in comments (#10111 )	2022-06-14 15:28:11 +08:00
yinzhijian	9203a235e0	[typo] Fix typos in runtime_state.cpp (#10112 )	2022-06-14 15:27:40 +08:00
jacktengg	ce730293c0	[improvement] send merged runtime filter asynchronously (#10080 )	2022-06-14 08:16:25 +08:00
Xinyi Zou	d58e00c49c	[fix](brpc) Embed serialized request into the attachment and transmit it through http brpc (#9803 ) When the length of `Tuple/Block data` is greater than 2G, serialize the protoBuf request and embed the `Tuple/Block data` into the controller attachment and transmit it through http brpc. This is to avoid errors when the length of the protoBuf request exceeds 2G: `Bad request, error_text=[E1003]Fail to compress request`. In #7164, `Tuple/Block data` was put into attachment and sent via default `baidu_std brpc`, but when the attachment exceeds 2G, it will be truncated. There is no 2G limit for sending via `http brpc`. Also, in #7921, consider putting `Tuple/Block data` into attachment transport by default, as this theoretically reduces one serialization and improves performance. However, the test found that the performance did not improve, but the memory peak increased due to the addition of a memory copy.	2022-06-13 20:41:48 +08:00
Adonis Ling	415b6b8086	[feature-wip](array-type) Support array type which doesn't contain null (#9809 )	2022-06-12 23:35:28 +08:00
HappenLee	94089b9192	[Refactor] Use file factory to replace create file reader/writer (#9505 ) 1. Simplify code logic and improve abstraction 2. Fix the mem leak of raw pointer Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-06-08 15:07:39 +08:00
Adonis Ling	fc9afda97a	[enhancement][diagnostics] Add a diagnostic: detect unused includes (#9117 )	2022-06-08 11:52:48 +08:00
Xinyi Zou	d588e99b8b	[fix][mem tracker] Fix logout load task mem tracker dcheck fail (#9943 ) * fix tracker 0602 * fix format	2022-06-07 11:31:49 +08:00
Mingyu Chen	c996334ad1	[improvement] Optimize send fragment logic to reduce send fragment timeout error (#9720 ) This CL mainly changes: 1. Reducing the rpc timeout problem caused by rpc waiting for the worker thread of brpc. 1. Merge multiple fragment instances on the same BE to send requests to reduce the number of send fragment rpcs 2. If fragments size >= 3, use 2 phase RPC: one is to send all fragments, two is to start these fragments. So that there will be at most 2 RPC for each query on one BE. 3. Set the timeout of send fragment rpc to the query timeout to ensure the consistency of users' expectation of query timeout period. 4. Do not close the connection anymore when rpc timeout occurs. 5. Change some log level from info to debug to simplify the fe.log content. NOTICE: 1. Change the definition of execPlanFragment rpc, must first upgrade BE. 3. Remove FE config `remote_fragment_exec_timeout_ms`	2022-06-03 15:47:40 +08:00
Xinyi Zou	0376ca17f3	[Enhancement] Remove minidump (#9894 )	2022-06-01 08:04:24 +08:00
Xinyi Zou	c8d303a82c	[bugfix] Fix BE core about vectorized join build thread memtracker switch, and FileStat duplicate	2022-05-31 19:12:42 +08:00
Adonis Ling	f377c26bf7	[refactor][be] Optimize headers (#9708 )	2022-05-30 16:12:10 +08:00
Dayue Gao	4d1e926b6c	[feature][config] introduce a new BE config storage_page_cache_shard_size (#9821 ) Co-authored-by: gaodayue <gaodayue@bytedance.com>	2022-05-28 10:17:09 +08:00
Adonis Ling	2a11a4ab99	[feature-wip][array-type] Support more sub types. (#9466 ) Please refer to #9465	2022-05-26 08:41:34 +08:00
Zhengguo Yang	f5bef328fe	[fix] disable transferring data larger than 2GB by brpc (#9770 ) Because brpc and protobuf cannot transfer data larger than 2GB (anything larger overflows), add a check before sending.	2022-05-25 18:41:13 +08:00
Xinyi Zou	ca05d1ee01	[fix](memory tracker) Fix lru cache, compaction tracker, add USE_MEM_TRACKER compile (#9661 ) 1. Fix Lru Cache MemTracker consumption value is negative. 2. Fix compaction Cache MemTracker has no track. 3. Add USE_MEM_TRACKER compile option. 4. Make sure the malloc/free hook is not stopped at any time.	2022-05-25 08:56:17 +08:00
Yongqiang YANG	6353539ef7	[bugfix] teach BufferedBlockMgr2 to track memory correctly (#9722 ) The problem was introduced by e2d3d0134eee5d50b6619fd9194a2e5f9cb557dc.	2022-05-24 10:18:51 +08:00
pengxiangyu	75b3707a28	[refactor](load) add tablet errors when close_wait return error (#9619 )	2022-05-22 21:27:42 +08:00
Jibing-Li	5fa6e892be	[fix](broker-scan-node) Remove trailing spaces in broker_scanner. Make it consistent with hive and trino behavior. (#9190 ) Hive and Trino/Presto automatically trim trailing spaces, but Doris does not, which causes query results that differ from Hive. Add a new session variable "trim_tailing_spaces_for_external_table_query". If set to true, the broker scan node trims the trailing spaces of each column when reading CSV.	2022-05-20 09:55:13 +08:00
Yongqiang YANG	defdae1e7d	[improvement](stream-load) adjust read unit of http to optimize stream load (#9154 )	2022-05-20 09:52:36 +08:00
huangzhaowei	0f9ef26576	[Bug] Fix timestamp_diff issue when timeunit is year and month (#9574 )	2022-05-19 21:24:43 +08:00
Shuangchi He	73c4ec7167	Fix some typos in be/. (#9681 )	2022-05-19 20:55:39 +08:00
jacktengg	908f9cb7b9	[Improvement][ASAN] make BE can exit normally and ASAN memory leak checking work (#9620 )	2022-05-18 07:40:57 +08:00
yiguolei	cd105bee0a	[refactor](es) Clean es tcp scannode and related thrift definitions (#9553 ) PaloExternalSourcesService was designed for es_scan_node using the tcp protocol. But the es tcp protocol requires deploying a tcp jar into the es code. Both the es version and the lucene version have been upgraded, and the tcp jar is no longer maintained, so all the related code and thrift definitions are removed.	2022-05-14 10:03:55 +08:00
xueweizhang	375c1bf5c0	[feature](mysql-table) support utf8mb4 for mysql external table (#9402 ) This patch supports utf8mb4 for mysql external tables. Some users need a mysql external table with the utf8mb4 charset, but only utf8 is supported right now. When creating a mysql external table, you can add an optional property "charset" that sets the character set for the mysql connection; the default value is "utf8". Set "utf8mb4" instead of "utf8" when you need it.	2022-05-11 09:39:23 +08:00
Adonis Ling	718a51a388	[refactor][style] Use clang-format to sort includes (#9483 )	2022-05-10 21:25:35 +08:00
Xinyi Zou	b34ed43ec9	[feature-wip] (memory tracker) (step6, End) Fix some details (#9301 ) 1. Fix LoadTask, ChunkAllocator, TabletMeta, Brpc, the accuracy of memory track. 2. Modified some MemTracker names, deleted some unnecessary trackers, and improved readability. 3. More powerful MemTracker debugging capabilities. 4. Avoid creating TabletColumn temporary objects and improve BE startup time by 8%. 5. Fix some other details.	2022-05-10 18:17:09 +08:00
hongbin	e61d296486	[Refactor] Replace '#ifndef' with '#pragma once' (#9456 ) * Replace '#ifndef' with '#pragma once'	2022-05-10 09:25:59 +08:00
Zhengguo Yang	6834fb23ca	[fix](s3) fix s3 temp file write failure when there is no space on disk (#9421 )	2022-05-09 09:28:43 +08:00
pengxiangyu	7234c964ae	[Bug] Missing error tablet list when close_wait return error (#9418 )	2022-05-08 06:45:28 +08:00
chenlinzhong	53574ce0ea	[Bug] (fix) DeltaWriter::mem_consumption() coredump (#9245 )	2022-05-07 19:13:08 +08:00
yiguolei	e3b90de2d5	remove file result writer from result sink (#9378 )	2022-05-06 02:37:20 +08:00
HappenLee	a33191e222	[fix](memtracker) DCHECK failed in vectorized exec engine fold constant execute (#9354 )	2022-05-05 09:55:38 +08:00
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305 ) - sh build-support/clang-format.sh to clang-format all c++ code	2022-04-29 16:14:22 +08:00
HappenLee	d330bc3806	[Vectorized](stream-load-vec) Support stream load in vectorized engine (#8709 ) (#9280 ) Implement vectorized stream load. Added fe configuration option `enable_vectorized_load` to enable vectorized stream load. Co-authored-by: tengjp@outlook.com Co-authored-by: mrhhsg@gmail.com Co-authored-by: minghong.zhou@163.com Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>	2022-04-29 09:50:51 +08:00
xy720	2ec0b98787	[fix](routine-load) Fix bug that new coming routine load tasks are rejected all the time and report TOO_MANY_TASK error (#9164 ) ``` CREATE ROUTINE LOAD iaas.dws_nat ON dws_nat WITH APPEND PROPERTIES ( "desired_concurrent_number"="2", "max_batch_interval" = "20", "max_batch_rows" = "400000", "max_batch_size" = "314572800", "format" = "json", "max_error_number" = "0" ) FROM KAFKA ( "kafka_broker_list" = "xxxx:xxxx", "kafka_topic" = "nat_nsq", "property.kafka_default_offsets" = "2022-04-19 13:20:00" ); ``` In the create statement example above, the user didn't specify custom partitions, so 1. FE will get all kafka partitions from the server in the routine load's scheduler. The user set the default offset by datetime, so 2. FE will get the kafka offsets by time from the server in the routine load's scheduler. When 1 succeeds but 2 fails, the progress of this routine load may not contain any partitions or offsets. Moreover, since newCurrentKafkaPartition, which is fetched from the kafka server, may always be equal to currentKafkaPartitions, the wrong progress will never be updated.	2022-04-27 23:21:17 +08:00

572 Commits