Commit Graph

184 Commits

Author SHA1 Message Date
2ec1d282c5 [fix](sink) fix END_OF_FILE error for pipeline caused by VDataStreamSender eof (#20007)
* [fix](sink) fix END_OF_FILE error for pipeline caused by VDataStreamSender eof
2023-05-25 10:29:35 +08:00
c730033595 [improvement](exchange) data stream sender stop sending data to receiver if it returns eos early (#19847)
For broadcast join, only one build fragment instance will build hash table, other fragment instances just receive and throw away build side data, this is waste of memory and cpu.

This PR improve this condition, data stream receiver tells sender that it does not need data from sender, and sender stops sending anydata to it.
2023-05-24 15:11:32 +08:00
fe111207a9 [Fix](lazy_open) Fix lazy open null point (#19829) 2023-05-23 09:17:46 +08:00
3e010bbee7 [improvement](profile) add profile counter 'BytesSent' for VDataBufferSender (#19826) 2023-05-19 08:46:50 +08:00
fe42e52851 [pipeline](CTE) Support multi stream data sink in pipeline (#19519) 2023-05-18 10:34:37 +08:00
5fa956b0d6 [Bug](pipeline) RegressionTest failed release resouce cause DCHECK failed #19773 2023-05-18 08:35:57 +08:00
49c6bbce84 [improvement](load) do not create pthread in tablet_sink (#19465)
add bvar stat for streamload.
2023-05-17 22:05:54 +08:00
272a7565b8 [improvement](tracing) Remove useless span levels from be side tracing (#19665)
1. Remove an exec node method corresponding to a span and replace it with an exec node corresponding to a span;
2. Fix some problems with tracing in pipeline.
2023-05-17 19:04:52 +08:00
e22f5891d2 [WIP](row store) two phase opt read row store (#18654) 2023-05-16 13:21:58 +08:00
f8ef25bb10 [enhancement](load) lazy-open necessary partitions when load (#18874) 2023-05-14 16:09:55 +08:00
39ec8aa64c [refactor](complex-type) refactor array/map/struct literal to not invoke execute() function in prepare state (#19068) 2023-05-11 18:44:37 +08:00
28e088aee1 [optimization](be) optimization for ColumnConst when writing mysql result (#19122)
* opt for result

* fix
2023-05-11 01:04:18 +08:00
Pxl
dfad7b6b38 [Feature](generic-aggregation) some prowork of generic aggregation (#19343)
some prowork of generic aggregation
2023-05-09 21:42:21 +08:00
e78149cb65 [Enhencement](Export) add property for outfile/export and add test (#18997)
This pr does three things:
1. add `delete_existing_files` property for outfile/export. If `delete_existing_files = true`, export/outfile will delete all files under file_path first.
2. add p2 test for export
3. modify docs
2023-05-08 14:02:20 +08:00
153f42a873 [enhancement](exprcontext) modify get_output_block_after_execute_expr method more clear to avoid mis usage (#19310)
The original method signature is Block VExprContext::get_output_block_after_execute_exprs(
const std::vectorvectorized::VExprContext*& output_vexpr_ctxs, const Block& input_block,
Status& status)
It return error status as a out parameter and the block as return value. It has to check the block.rows == 0 and then check error status.
It is not conforming to the convention.


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-05-06 09:03:22 +08:00
8e4710079d [improvement](profile) Insert into add LoadChannel runtime profile (#18908)
TabletSink and LoadChannel in BE are M: N relationship,
Every once in a while LoadChannel will randomly return its own runtime profile to a TabletSink, so usually all LoadChannel runtime profiles are saved on each TabletSink, and the timeliness of the same LoadChannel profile saved on different TabletSinks is different, and each TabletSink will periodically send fe reports all the LoadChannel profiles saved by itself, and ensures to update the latest LoadChannel profile according to the timestamp.
2023-04-24 09:41:57 +08:00
63a76ed115 [refactor](exceptionsafe) disallow call new method explicitly (#18830)
disallow call new method explicitly
force to use create_shared or create_unique to use shared ptr
placement new is allowed
reference https://abseil.io/tips/42 to add factory method to all class.
I think we should follow this guide because if throw exception in new method, the program will terminate.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-21 09:13:24 +08:00
eb93afc614 [MemLeak](pipeline) fix mem leak by exchange node in pipeline (#18864) 2023-04-21 09:06:55 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
79c446c89f [enhancement](exception) Column filter/replicate supports exception safety (#18503) 2023-04-18 19:23:09 +08:00
5300b21db7 [Bug](DECIMALV3) report failure if a decimal value is overflow (#18336) 2023-04-17 13:18:14 +08:00
c704351273 [enhancement](memory) Refactor memory limit exceeded behavior (#18590)
No check mem tracker limit and no cancel task in mem hook, only in Allocator. This helps in clearer analysis of memory issues and reduces performance loss.
PODArray/hash table/arena memory allocation will use Allocator.

Optimize mem limit exceeded log printing

Optimize compilation time
2023-04-14 10:42:35 +08:00
1d2dbe7898 [Bug][Pipeline] Run clickbench dead lock in pipeline exec engine (#18211)
In pipeline exec engine run clickbench may dead lock in some query
2023-03-30 21:41:57 +08:00
Pxl
a8753faeb1 [Bug](function) fix column complex not resize after filter (#18043) 2023-03-25 21:48:13 +08:00
Pxl
40ca250678 [Feature](materialized-view) support where clause on create materialized view (#17534)
support where clause on create materialized view
2023-03-22 11:25:13 +08:00
4193884a32 [feature](array_zip) Support array_zip function (#17696) 2023-03-21 18:44:30 +08:00
5b39fa9843 [Feature](vec)(quantile_state): support quantile state in vectorized engine (#16562)
* [Feature](vectorized)(quantile_state): support vectorized quantile state functions
1. now quantile column only support not nullable
2. add up some regression test cases
3. set default enable_quantile_state_type = true
---------

Co-authored-by: spaces-x <weixiang06@meituan.com>
2023-03-14 10:54:04 +08:00
9b7596f1c6 [Feature](Dynamic schema table) step1 support schema change expression (#17494)
1. introduce a new type `VARIANT` to encapsulate dynamic generated columns for hidding the detail of types and names of newly generated columns
2. introduce a new expression `SchemaChangeExpr` for doing schema change for extensibility
2023-03-13 15:12:42 +08:00
47cfc81925 [fix docs] (#17634)
Co-authored-by: shenshoucheng <shenshoucheng@jd.com>
2023-03-13 08:06:33 +08:00
fcd25b53bf [Optimize](Random distribution) Improve the performance of tablet sin… (#17389)
The current distribution model for Doris is as follows:

OlapTableSink seperate the original Block into serveral subblocks of each node(BE) by tablets distribution and distributes subblocks to storage engine of backends, then the storage engine will seperate the subblock into multiple tablets channel and each delta writer will handle partial of the block.

This model causes blocks to be split according to tablets, and the splitting process can be a relatively heavy operation. After splitting, the blocks are distributed to different DeltaWriters (Memtables) through RPCs to TabletChannels. The distribution operation on TabletChannels is also a relatively heavy operation. If the distribution property of the table is RANDOM distribution, then we have the opportunity to distribute the blocks according to the complete block during distribution. The advantage of doing so is to reduce memory copying and improve write locality, similar to appending the entire block to the memtable.

This optimze could save 10% ~ 20% CPU cost of RANDOM distribution table load when enable load_to_single_tablet
2023-03-10 10:52:40 +08:00
5c265d8183 [fix](vec)crashing caused by parallel output file (#17384) 2023-03-03 19:03:53 +08:00
e82b827bc8 [optimize](vectorization)Optimize to_string's performance. (#17076) 2023-03-03 10:35:59 +08:00
39f59f554a [improvement](dry-run)(tvf) support csv schema in tvf and add "dry_run_query" variable (#16983)
This CL mainly changes:

Support specifying csv schema manually in s3/hdfs table valued function

s3 (
'URI' = 'https://bucket1/inventory.dat',
'ACCESS_KEY'= 'ak',
'SECRET_KEY' = 'sk',
'FORMAT' = 'csv',
'column_separator' = '|',
'csv_schema' = 'k1:int;k2:int;k3:int;k4:decimal(38,10)',
'use_path_style'='true'
)
Add new session variable dry_run_query

If set to true, the real query result will not be returned, instead, it will only return the number of returned rows.

mysql> select * from bigtable;
+--------------+
| ReturnedRows |
+--------------+
| 10000000     |
+--------------+
This can avoid large result set transmission time and focus on real execution time of query engine.
For debug and analysis purpose.
2023-03-02 16:51:27 +08:00
633f2d52a4 [minor](log) add some logs (#17287) 2023-03-01 22:41:50 +08:00
4d8b310de0 [fix](struct-type) fix struct subtype support (#17081)
1. Make sure all sub types which STRUCT supported work correctly;
2. remove unused variable `_need_validate_data`;
3. lazy init min or max decimal to support nested DecimalV2 column validate;

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2023-02-28 11:37:07 +08:00
8b70bfdc31 [Feature](map-type) Support stream load and fix some bugs for map type (#16776)
1、support stream load with json, csv format for map
2、fix olap convertor when compaction action in map column which has null
3、support select outToFile for map
4、add some regression-test
2023-02-19 15:11:54 +08:00
e2e6a0dd83 [Feature](load) Support mutable property for partition (#16036)
The background is described in this issue: #15723,
where users used Apache Druid to satisfy such lambada requirements before.
We will not make Doris dropping data not belonged to current time window automatically like Druid,
which is not flexible. We demand a ability to support mutable/immutable partition, the PR works this way:

1. Support mutable property for a partition.
2. The mutable property of a partition is passed from FE to BE in a load procedure
3. If a record's partition is immutable, we mark this row as "un selected" which will not be included in computation of 'max_filter_ratio',
   so that data write to immutable partition will be neglected and not cause load failure.

Use Example:

1. Add immutable partition or modify an partition to be immutable:
- alter table test_tbl add [temporary] partition xxx values less than ('xxx') ('mutable' = 'true');
- alter table test_tbl modify partition xx set ('mutable' = 'false');

2. Write 5 records into table, two of then belongs to immutable partition
2023-02-18 23:09:34 +08:00
dd06cc7609 [pipeline](shuffle) Improve broadcast shuffle (#16779)
Now we reuse buffer pool for broadcast shuffle on pipeline engine. This PR ensures that a pipeline with a broadcast shuffle sink will not be scheduled if there are no available buffer in the buffer pool
2023-02-15 22:03:27 +08:00
Pxl
f50edff59d [Chore](build) enable fallthrough check annd fix some fallthrough bug (#16748)
* enable fallthrough check annd fix some fallthrough bug

* fix

* fix
2023-02-15 15:58:43 +08:00
d013d529c8 [Feature](ipv6)Support IPV6 (#14063)
Support IPV6 in Apache Doris, the main changes are:
1. enable binding to IPV6 address if network priority in config file contains an IPV6 CIDR string
2. BRPC and HTTP support binding to IPV6 address
3. BRPC and HTTP support visiting IPV6 Services
2023-02-14 21:43:10 +08:00
784c27deeb [Bug](shuffle) fix mem leak in data stream sender (#16685) 2023-02-14 16:40:13 +08:00
ed3420000e [fix](bthread) fix bthread hang (#16594) 2023-02-14 00:08:57 +08:00
cf739e7496 [Enhancement](Stmt) Set insert_into timeout session variable separately (#16343) 2023-02-12 16:56:10 +08:00
37d1519316 [WIP](dynamic-table) support dynamic schema table (#16335)
Issue Number: close #16351

Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.
2023-02-11 13:37:50 +08:00
1b3902baa2 [Feature](Complex-type) Add struct and map type to Doris (#16444)
This commit support:
1、Insert + select for struct/map type
2、Json stream load for struct type
3、m[key] function for map type

How to use:
Set the fe config to create table for struct and map type
1、admin set frontend config("enable_struct_type" = "true");
2、admin set frontend config("enable_map_type" = "true");

#16547

Co-authored-by: xy720 <xuyang25@baidu.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2023-02-10 11:00:33 +08:00
a038fdaec6 [Bug](pipeline) Fix bug in non-local exchange on pipeline engine (#16463)
Currently, for broadcast shuffle, we serialize a block once and then send it by RPC through multiple channel. After this, we will serialize next block in the same memory for consideration of memory reuse. However, since the RPC is asynchronized, maybe the next block serialization will happen before sending the previous block.

So, in this PR, I use a ref count to identify if the serialized block can be reuse in broadcast shuffle.
2023-02-09 19:22:40 +08:00
91325e5ca3 [fix](pipeline) incorrect result when disabling sharing hash table (#16476) 2023-02-07 21:25:32 +08:00
Pxl
5e4bb98900 [Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290)
enable -Wpedantic and update lowest gcc version to 11.1
2023-02-03 11:28:48 +08:00
fdc042bb39 [fix](vresultsink) BufferControlBlock may block all fragment handle threads (#16231)
BufferControlBlock may block all fragment handle threads leads to be out of work

modify include:

BufferControlBlock cancel after max timeout
StmtExcutor notify be to cancel the fragment when unexcepted occur
more details see issue #16203
2023-01-30 16:53:21 +08:00
eb7da1c0ee [fix](datatype) fix some bugs about data type array datetimev2 and decimalv3 (#16132) 2023-01-29 14:26:08 +08:00