doris

Author	SHA1	Message	Date
TengJianPing	2ec1d282c5	[fix](sink) fix END_OF_FILE error for pipeline caused by VDataStreamSender eof (#20007 ) * [fix](sink) fix END_OF_FILE error for pipeline caused by VDataStreamSender eof	2023-05-25 10:29:35 +08:00
TengJianPing	c730033595	[improvement](exchange) data stream sender stop sending data to receiver if it returns eos early (#19847 ) For broadcast join, only one build fragment instance will build hash table, other fragment instances just receive and throw away build side data, this is waste of memory and cpu. This PR improve this condition, data stream receiver tells sender that it does not need data from sender, and sender stops sending anydata to it.	2023-05-24 15:11:32 +08:00
HHoflittlefish777	fe111207a9	[Fix](lazy_open) Fix lazy open null point (#19829 )	2023-05-23 09:17:46 +08:00
luozenglin	3e010bbee7	[improvement](profile) add profile counter 'BytesSent' for VDataBufferSender (#19826 )	2023-05-19 08:46:50 +08:00
HappenLee	fe42e52851	[pipeline](CTE) Support multi stream data sink in pipeline (#19519 )	2023-05-18 10:34:37 +08:00
HappenLee	5fa956b0d6	[Bug](pipeline) RegressionTest failed release resouce cause DCHECK failed #19773	2023-05-18 08:35:57 +08:00
Yongqiang YANG	49c6bbce84	[improvement](load) do not create pthread in tablet_sink (#19465 ) add bvar stat for streamload.	2023-05-17 22:05:54 +08:00
luozenglin	272a7565b8	[improvement](tracing) Remove useless span levels from be side tracing (#19665 ) 1. Remove an exec node method corresponding to a span and replace it with an exec node corresponding to a span; 2. Fix some problems with tracing in pipeline.	2023-05-17 19:04:52 +08:00
lihangyu	e22f5891d2	[WIP](row store) two phase opt read row store (#18654 )	2023-05-16 13:21:58 +08:00
HHoflittlefish777	f8ef25bb10	[enhancement](load) lazy-open necessary partitions when load (#18874 )	2023-05-14 16:09:55 +08:00
xy720	39ec8aa64c	[refactor](complex-type) refactor array/map/struct literal to not invoke execute() function in prepare state (#19068 )	2023-05-11 18:44:37 +08:00
zclllyybb	28e088aee1	[optimization](be) optimization for ColumnConst when writing mysql result (#19122 ) * opt for result * fix	2023-05-11 01:04:18 +08:00
Pxl	dfad7b6b38	[Feature](generic-aggregation) some prowork of generic aggregation (#19343 ) some prowork of generic aggregation	2023-05-09 21:42:21 +08:00
Tiewei Fang	e78149cb65	[Enhencement](Export) add property for outfile/export and add test (#18997 ) This pr does three things: 1. add `delete_existing_files` property for outfile/export. If `delete_existing_files = true`, export/outfile will delete all files under file_path first. 2. add p2 test for export 3. modify docs	2023-05-08 14:02:20 +08:00
yiguolei	153f42a873	[enhancement](exprcontext) modify get_output_block_after_execute_expr method more clear to avoid mis usage (#19310 ) The original method signature is Block VExprContext::get_output_block_after_execute_exprs( const std::vectorvectorized::VExprContext*& output_vexpr_ctxs, const Block& input_block, Status& status) It return error status as a out parameter and the block as return value. It has to check the block.rows == 0 and then check error status. It is not conforming to the convention. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-05-06 09:03:22 +08:00
Xinyi Zou	8e4710079d	[improvement](profile) Insert into add LoadChannel runtime profile (#18908 ) TabletSink and LoadChannel in BE are M: N relationship, Every once in a while LoadChannel will randomly return its own runtime profile to a TabletSink, so usually all LoadChannel runtime profiles are saved on each TabletSink, and the timeliness of the same LoadChannel profile saved on different TabletSinks is different, and each TabletSink will periodically send fe reports all the LoadChannel profiles saved by itself, and ensures to update the latest LoadChannel profile according to the timestamp.	2023-04-24 09:41:57 +08:00
yiguolei	63a76ed115	[refactor](exceptionsafe) disallow call new method explicitly (#18830 ) disallow call new method explicitly force to use create_shared or create_unique to use shared ptr placement new is allowed reference https://abseil.io/tips/42 to add factory method to all class. I think we should follow this guide because if throw exception in new method, the program will terminate. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-04-21 09:13:24 +08:00
HappenLee	eb93afc614	[MemLeak](pipeline) fix mem leak by exchange node in pipeline (#18864 )	2023-04-21 09:06:55 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
Xinyi Zou	79c446c89f	[enhancement](exception) Column filter/replicate supports exception safety (#18503 )	2023-04-18 19:23:09 +08:00
Gabriel	5300b21db7	[Bug](DECIMALV3) report failure if a decimal value is overflow (#18336 )	2023-04-17 13:18:14 +08:00
Xinyi Zou	c704351273	[enhancement](memory) Refactor memory limit exceeded behavior (#18590 ) No check mem tracker limit and no cancel task in mem hook, only in Allocator. This helps in clearer analysis of memory issues and reduces performance loss. PODArray/hash table/arena memory allocation will use Allocator. Optimize mem limit exceeded log printing Optimize compilation time	2023-04-14 10:42:35 +08:00
HappenLee	1d2dbe7898	[Bug][Pipeline] Run clickbench dead lock in pipeline exec engine (#18211 ) In pipeline exec engine run clickbench may dead lock in some query	2023-03-30 21:41:57 +08:00
Pxl	a8753faeb1	[Bug](function) fix column complex not resize after filter (#18043 )	2023-03-25 21:48:13 +08:00
Pxl	40ca250678	[Feature](materialized-view) support where clause on create materialized view (#17534 ) support where clause on create materialized view	2023-03-22 11:25:13 +08:00
Mellorsssss	4193884a32	[feature](array_zip) Support array_zip function (#17696 )	2023-03-21 18:44:30 +08:00
spaces-x	5b39fa9843	[Feature](vec)(quantile_state): support quantile state in vectorized engine (#16562 ) * [Feature](vectorized)(quantile_state): support vectorized quantile state functions 1. now quantile column only support not nullable 2. add up some regression test cases 3. set default enable_quantile_state_type = true --------- Co-authored-by: spaces-x <weixiang06@meituan.com>	2023-03-14 10:54:04 +08:00
lihangyu	9b7596f1c6	[Feature](Dynamic schema table) step1 support schema change expression (#17494 ) 1. introduce a new type `VARIANT` to encapsulate dynamic generated columns for hidding the detail of types and names of newly generated columns 2. introduce a new expression `SchemaChangeExpr` for doing schema change for extensibility	2023-03-13 15:12:42 +08:00
Johnny_Sc	47cfc81925	[fix docs] (#17634 ) Co-authored-by: shenshoucheng <shenshoucheng@jd.com>	2023-03-13 08:06:33 +08:00
lihangyu	fcd25b53bf	[Optimize](Random distribution) Improve the performance of tablet sin… (#17389 ) The current distribution model for Doris is as follows: OlapTableSink seperate the original Block into serveral subblocks of each node(BE) by tablets distribution and distributes subblocks to storage engine of backends, then the storage engine will seperate the subblock into multiple tablets channel and each delta writer will handle partial of the block. This model causes blocks to be split according to tablets, and the splitting process can be a relatively heavy operation. After splitting, the blocks are distributed to different DeltaWriters (Memtables) through RPCs to TabletChannels. The distribution operation on TabletChannels is also a relatively heavy operation. If the distribution property of the table is RANDOM distribution, then we have the opportunity to distribute the blocks according to the complete block during distribution. The advantage of doing so is to reduce memory copying and improve write locality, similar to appending the entire block to the memtable. This optimze could save 10% ~ 20% CPU cost of RANDOM distribution table load when enable load_to_single_tablet	2023-03-10 10:52:40 +08:00
Jerry Hu	5c265d8183	[fix](vec)crashing caused by parallel output file (#17384 )	2023-03-03 19:03:53 +08:00
ZhaoChangle	e82b827bc8	[optimize](vectorization)Optimize to_string's performance. (#17076 )	2023-03-03 10:35:59 +08:00
Mingyu Chen	39f59f554a	[improvement](dry-run)(tvf) support csv schema in tvf and add "dry_run_query" variable (#16983 ) This CL mainly changes: Support specifying csv schema manually in s3/hdfs table valued function s3 ( 'URI' = 'https://bucket1/inventory.dat', 'ACCESS_KEY'= 'ak', 'SECRET_KEY' = 'sk', 'FORMAT' = 'csv', 'column_separator' = '\|', 'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)', 'use_path_style'='true' ) Add new session variable dry_run_query If set to true, the real query result will not be returned, instead, it will only return the number of returned rows. mysql> select * from bigtable; +--------------+ \| ReturnedRows \| +--------------+ \| 10000000 \| +--------------+ This can avoid large result set transmission time and focus on real execution time of query engine. For debug and analysis purpose.	2023-03-02 16:51:27 +08:00
Gabriel	633f2d52a4	[minor](log) add some logs (#17287 )	2023-03-01 22:41:50 +08:00
camby	4d8b310de0	[fix](struct-type) fix struct subtype support (#17081 ) 1. Make sure all sub types which STRUCT supported work correctly; 2. remove unused variable `_need_validate_data`; 3. lazy init min or max decimal to support nested DecimalV2 column validate; Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2023-02-28 11:37:07 +08:00
amory	8b70bfdc31	[Feature](map-type) Support stream load and fix some bugs for map type (#16776 ) 1、support stream load with json, csv format for map 2、fix olap convertor when compaction action in map column which has null 3、support select outToFile for map 4、add some regression-test	2023-02-19 15:11:54 +08:00
zhengshengjun	e2e6a0dd83	[Feature](load) Support mutable property for partition (#16036 ) The background is described in this issue: #15723, where users used Apache Druid to satisfy such lambada requirements before. We will not make Doris dropping data not belonged to current time window automatically like Druid, which is not flexible. We demand a ability to support mutable/immutable partition, the PR works this way: 1. Support mutable property for a partition. 2. The mutable property of a partition is passed from FE to BE in a load procedure 3. If a record's partition is immutable, we mark this row as "un selected" which will not be included in computation of 'max_filter_ratio', so that data write to immutable partition will be neglected and not cause load failure. Use Example: 1. Add immutable partition or modify an partition to be immutable: - alter table test_tbl add [temporary] partition xxx values less than ('xxx') ('mutable' = 'true'); - alter table test_tbl modify partition xx set ('mutable' = 'false'); 2. Write 5 records into table, two of then belongs to immutable partition	2023-02-18 23:09:34 +08:00
Gabriel	dd06cc7609	[pipeline](shuffle) Improve broadcast shuffle (#16779 ) Now we reuse buffer pool for broadcast shuffle on pipeline engine. This PR ensures that a pipeline with a broadcast shuffle sink will not be scheduled if there are no available buffer in the buffer pool	2023-02-15 22:03:27 +08:00
Pxl	f50edff59d	[Chore](build) enable fallthrough check annd fix some fallthrough bug (#16748 ) * enable fallthrough check annd fix some fallthrough bug * fix * fix	2023-02-15 15:58:43 +08:00
zhengshengjun	d013d529c8	[Feature](ipv6)Support IPV6 (#14063 ) Support IPV6 in Apache Doris, the main changes are: 1. enable binding to IPV6 address if network priority in config file contains an IPV6 CIDR string 2. BRPC and HTTP support binding to IPV6 address 3. BRPC and HTTP support visiting IPV6 Services	2023-02-14 21:43:10 +08:00
Gabriel	784c27deeb	[Bug](shuffle) fix mem leak in data stream sender (#16685 )	2023-02-14 16:40:13 +08:00
YueW	ed3420000e	[fix](bthread) fix bthread hang (#16594 )	2023-02-14 00:08:57 +08:00
奕冷	cf739e7496	[Enhancement](Stmt) Set insert_into timeout session variable separately (#16343 )	2023-02-12 16:56:10 +08:00
lihangyu	37d1519316	[WIP](dynamic-table) support dynamic schema table (#16335 ) Issue Number: close #16351 Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.	2023-02-11 13:37:50 +08:00
xy720	1b3902baa2	[Feature](Complex-type) Add struct and map type to Doris (#16444 ) This commit support: 1、Insert + select for struct/map type 2、Json stream load for struct type 3、m[key] function for map type How to use: Set the fe config to create table for struct and map type 1、admin set frontend config("enable_struct_type" = "true"); 2、admin set frontend config("enable_map_type" = "true"); #16547 Co-authored-by: xy720 <xuyang25@baidu.com> Co-authored-by: amory <wangqiannan@selectdb.com> Co-authored-by: cambyzju <zhuxiaoli01@baidu.com> Co-authored-by: hucheng01 <hucheng01@baidu.com>	2023-02-10 11:00:33 +08:00
Gabriel	a038fdaec6	[Bug](pipeline) Fix bug in non-local exchange on pipeline engine (#16463 ) Currently, for broadcast shuffle, we serialize a block once and then send it by RPC through multiple channel. After this, we will serialize next block in the same memory for consideration of memory reuse. However, since the RPC is asynchronized, maybe the next block serialization will happen before sending the previous block. So, in this PR, I use a ref count to identify if the serialized block can be reuse in broadcast shuffle.	2023-02-09 19:22:40 +08:00
Jerry Hu	91325e5ca3	[fix](pipeline) incorrect result when disabling sharing hash table (#16476 )	2023-02-07 21:25:32 +08:00
Pxl	5e4bb98900	[Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290 ) enable -Wpedantic and update lowest gcc version to 11.1	2023-02-03 11:28:48 +08:00
chenlinzhong	fdc042bb39	[fix](vresultsink) BufferControlBlock may block all fragment handle threads (#16231 ) BufferControlBlock may block all fragment handle threads leads to be out of work modify include: BufferControlBlock cancel after max timeout StmtExcutor notify be to cancel the fragment when unexcepted occur more details see issue #16203	2023-01-30 16:53:21 +08:00
abmdocrt	eb7da1c0ee	[fix](datatype) fix some bugs about data type array datetimev2 and decimalv3 (#16132 )	2023-01-29 14:26:08 +08:00

1 2 3 4

184 Commits