doris

Author	SHA1	Message	Date
Pxl	8e39f0cf6b	[Enchancement](Agg State) storage function name and result is nullable in agg state type (#20298 ) storage function name and result is nullable in agg state type	2023-06-04 22:44:48 +08:00
Kaijie Chen	b0bbff0fd1	[performance](load) improve memtable sort performance (#20392 )	2023-06-04 20:33:15 +08:00
HHoflittlefish777	34a1b7599f	[Fix](lazy_open) fix lazy open commit info lose (#20404 )	2023-06-04 19:08:36 +08:00
Ashin Gau	3e186a8821	[opt](MergedIO) optimize merge small IO, prevent amplified read (#20305 ) Optimize the strategy of merging small IO to prevent severe read amplification, and turn off merged IO when file cache enabled. Adjustable parameters: ``` // the max amplified read ratio when merging small IO max_amplified_read_ratio=0.8 // the min segment size file_cache_min_file_segment_size = 1048576 ```	2023-06-03 10:51:24 +08:00
Pxl	90d710e83d	[Enchancement](function) optimize for padding function && add string length check on string op (#20363 )	2023-06-02 21:24:41 +08:00
YueW	b62c5a70c7	[fix](match query) fix array column match query failed without inverted index (#20344 )	2023-06-02 21:10:12 +08:00
ZhangYu0123	78c37b5244	[Optimize](Function) Add fast path for col like '%%' or col like '%' or regexp '\\.' (#20143 ) Add fast path for col like '%%' or col like '%' or regexp '\\.' (1) like about 34% speed up when use count() test support col like '%%' , col like '%', col not like '%%' , col not like '%' (2) regexp about 37% speed up when use count() test support col regexp '\\.', col not regexp '\\.' Q1: select count() From hits where url like '%'; Q2: select count() From hits where url regexp '\\.*';	2023-06-02 16:26:56 +08:00
amory	06e7c14320	[Improve](json-array) Support json array with nereids bool (#20248 ) Support json array with nereids bool now : ``` set enable_nereids_planner=true; mysql> SELECT json_array(1, "abc", NULL, TRUE, '10:00:00'); +----------------------------------------------+ \| json_array(1, 'abc', NULL, TRUE, '10:00:00') \| +----------------------------------------------+ \| [1,"abc",null,false,"10:00:00"] \| +----------------------------------------------+ 1 row in set (0.02 sec) ``` nereids boolean is "true"/"false" is not '0' /'1' , so we always get false	2023-06-02 14:47:24 +08:00
amory	d68f3f3b3d	[Feature](array-functions)improve array functions for array_last_index (#20294 ) Now we just support array_first_index for lambda input , but no array_last_index	2023-06-02 13:54:03 +08:00
HappenLee	8bec2b41db	[pipeline](rpc) support closure reuse in pipeline exec engine (#20278 )	2023-06-02 09:50:21 +08:00
lihangyu	f0513a861d	[Improve](Scan) add a session variable to make scan run serial (#20220 ) Parallel scanning can result in some read amplification, for example, select * from xx where limit 1 actually requires only one row of data. However, due to parallel scanning of multiple tablets, read amplification occurs, leading to performance bottlenecks in high-concurrency scenarios. This PR Adding a SessionVariable to enforce serial scanning can help mitigate this issue.	2023-06-01 15:06:35 +08:00
Mryange	519f01133a	[feature](decimal)support cast rounding half up and div precision increment in decimalv3. (#19811 )	2023-06-01 13:09:58 +08:00
lihangyu	9e21318834	[refactor](dynamic table) Make segment_writer unaware of dynamic schema, and ensure parsing is exception-safe. (#19594 ) 1. make ColumnObject exception safe 2. introduce FlushContext and construct schema at memtable flush stage to make segment independent from dynamic schema 3. add more test cases	2023-06-01 10:25:04 +08:00
Yongqiang YANG	6ee99c4138	[fix](load_profile) fix rows stat and add close_wait in sink (#20181 )	2023-05-31 18:23:30 +08:00
YueW	6adb3fdf11	[fix](match_phrase) Fix the inconsistent query result for 'match_phrase' after creating index without support_phrase property (#20258 ) if create inverted index without support_phrase property, remaining the match_phrase condition to filter by match function.	2023-05-31 18:09:50 +08:00
Jerry Hu	c03a19ea23	[improvement](bitmap) Using set to store a small number of elements to improve performance (#19973 ) Test on SSB 100g: select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey; exec time: 4.388s create materialized view: create materialized view customer_uv as select lo_suppkey, bitmap_union(to_bitmap(lo_linenumber)) from lineorder group by lo_suppkey; select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey; exec time: 12.908s test with the patch, exec time: 5.790s	2023-05-31 16:13:42 +08:00
Lijia Liu	f9dfcb923d	[Enhancement] Change Create Resource Group Grammar (#20249 )	2023-05-31 15:23:24 +08:00
zy-kkk	56fa38de1d	[Enhencement](JDBC Catalog) refactor jdbc catalog insert logic (#19950 ) This PR refactors the old way of writing data to JDBC External Table & JDBC Catalog, mainly including the following tasks 1. Continuing the work of @BePPPower 's PR #18594, changing the logic of splicing Inster sql to operating off-heap memory and using preparedStatement.set to write data logic to complete 2. Supplement the support written by largeint type, mainly to adapt to Java.Math.BigInteger, which uses binary operations 3. Delete the splicing SQL logic in the JDBC External Table & JDBC Catalog related written code ToDo: Binary type，like bit,binary, blob... Finally, special thanks to @BePPPower , @AshinGau for his work Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>	2023-05-30 22:03:39 +08:00
Pxl	7415135ad4	[Enchancement](execute) make assert_cast can output derived class name (#20212 ) before: F0530 11:02:41.989699 1154607 assert_cast.h:54] Bad cast from type:doris::vectorized::IDataType const* to doris::vectorized::DataTypeAggState const* after: F0530 11:24:28.390286 1292475 assert_cast.h:46] Bad cast from type:doris::vectorized::DataTypeNullable* to doris::vectorized::DataTypeAggState const*	2023-05-30 20:23:04 +08:00
Mingyu Chen	0c98355fff	[fix](catalog) fix create catalog with resource replay issue and kerberos auth issue (#20137 ) 1. Fix create catalog with resource replay bug. If user create catalog using `create catalog hive with resource xxx`, when replaying edit log, there is a bug that resource may be dropped, causing NPE and FE will fail to start. In this PR, I add a new FE config `disallow_create_catalog_with_resource`, default is true. So that `with resource` will not be allowed, and it will be deprecated later. And also fix the replay bug to avoid NPE. 2. Fix issue when creating 2 hive catalogs to connect with and without kerberos authentication. When user create 2 hive catalogs, one use simple auth, the other use kerberos auth. The query may fail with error like: `Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.` So I add a default property for hive catalog: `"ipc.client.fallback-to-simple-auth-allowed" = "true"`. Which means this property will be added automatically when user creating hive catalog, to avoid such problem. 3. Fix calling `hdfsExists()` issue When calling `hdfsExists()` with non-zero return code, should check if it encounters error or is file not found. 3. Some code refactor Avoid import `org.apache.parquet.Strings`	2023-05-30 16:57:39 +08:00
YueW	de08c4a57b	[enhance](match) Support match query without inverted index (#19936 )	2023-05-30 15:02:57 +08:00
bobhan1	bb12a1cb49	[Enhance](array function) add support for DecimalV3 for array_enumerate_uniq() (#17724 )	2023-05-30 13:09:19 +08:00
Qi Chen	9b32d42ee4	[Fix](multi-catalog) fix all nested type test which introduced by #19518(support insert-only transactional table). (#20194 ) Fix `qt_nested_types_orc` in `test_tvf_p2` which introduced by #19518(support insert-only transactional table). ### Test case error `qt_nested_types_orc` in `test_tvf_p2` ``` select count(array0), count(array1), count(array2), count(array3), count(struct0), count(struct1), count(map0) from hdfs( "uri" = "hdfs://172.21.16.47:4007/catalog/tvf/orc/all_nested_types.orc", "format" = "orc", "fs.defaultFS" = "hdfs://172.21.16.47:4007") ``` Error Message： errCode = 2, detailMessage = (172.21.0.101)[INTERNAL_ERROR]Wrong data type for colum 'struct1'	2023-05-30 09:55:40 +08:00
Qi Chen	2abbc9f921	[Fix](multi-catalog) Fix parquet bugs of #19758 'replace the single pointer with an array of 'conjuncts' in ExecNode'. (#20191 ) Fix some parquet reader bugs which introduced by #19758 'replace the single pointer with an array of 'conjuncts' in ExecNode'.	2023-05-30 09:55:12 +08:00
Pxl	d1d0d9e5e8	[Chore](build) adjust some compile diagnostic (#20162 )	2023-05-29 19:19:01 +08:00
Xinyi Zou	f9478dbd9a	[fix](function) Fix VcompoundPred execute const column #20158 recurrent: ./run-regression-test.sh --run -suiteParallel 1 -actionParallel 1 -parallel 1 -d query_p0/sql_functions/window_functions select /+ SET_VAR(query_timeout = 600) / subq_0.`c1` as c0 from (select ref_1.`s_name` as c0, ref_1.`s_suppkey` as c1, ref_1.`s_address` as c2, ref_1.`s_address` as c3 from regression_test_query_p0_sql_functions_window_functions.tpch_tiny_supplier as ref_1 where (ref_1.`s_name` is NULL) or (ref_1.`s_acctbal` is not NULL)) as subq_0 where (subq_0.`c3` is NULL) or (subq_0.`c2` is not NULL) reason: FunctionIsNull and FunctionIsNotNull execute returns a const column, but their VectorizedFnCall::is_constant returns false, which causes problems with const handling when VCompoundPred::execute. This pr converts const column to full column in VCompoundPred execute. In the future, there will be a more thorough solution to such problems.	2023-05-29 18:16:58 +08:00
lihangyu	ab8125d56f	[Improve](performance) introduce SchemaCache to cache TabletSchame & Schema (#20037 ) * [Improve](performance) introduce SchemaCache to cache TabletSchame & Schema 1. When the system is under high-concurrency load with wide table point queries, the frequent memory allocation and deallocation of Schema become evident system bottlenecks. Additionally, the initialization of TabletSchema and Schema also becomes a CPU hotspot.Therefore, the introduction of a SchemaCache is implemented to cache these resources for reuse. 2. Make some variables wrapped with std::unique<unique_ptr> Performance: \| 状态 \| QPS \| 平均响应时间 (avg) \| P99 响应时间 \| \|------------------\|-----\|------------------\|-------------\| \| 开启 SchemaCache \| 501 \| 20ms \| 34ms \| \| 关闭 SchemaCache \| 321 \| 31ms \| 61ms \| * handle schema change with schema version * remove useless header * rebase	2023-05-29 17:34:53 +08:00
amory	91dae8a5b6	[FIX](mysql_writer) fix mysql output binary object works (#20154 ) * fix struct_export out data * fix mysql writer output with binary true	2023-05-29 16:53:33 +08:00
Gabriel	55ccddb62c	[Conf](decimalv3) enable decimalv3 by default	2023-05-29 15:38:31 +08:00
Pxl	8376e5eefb	[Chore](build) add non-virtual-dtor, remove no-embedded-directive/no-zero-length-array (#20118 ) add non-virtual-dtor, remove no-embedded-directive/no-zero-length-array	2023-05-29 14:42:47 +08:00
Pxl	bbb3af6ce6	[Feature](agg_state) support agg_state combinators (#19969 ) support agg_state combinators state/merge/union	2023-05-29 13:07:29 +08:00
Mryange	a86134cb39	[fix](executor) Fixed an error with cast as time. #20144 before mysql [(none)]>select cast("10:10:10" as time); +-------------------------------+ \| CAST('10:10:10' AS TIMEV2(0)) \| +-------------------------------+ \| 00:00:00 \| +-------------------------------+ after mysql [(none)]>select cast("10:10:10" as time); +-------------------------------+ \| CAST('10:10:10' AS TIMEV2(0)) \| +-------------------------------+ \| 10:10:10 \| +-------------------------------+ In the past, we supported this syntax. mysql [(none)]>select cast("2023:05:01 13:14:15" as time); +------------------------------------------+ \| CAST('2023:05:01 13:14:15' AS TIMEV2(0)) \| +------------------------------------------+ \| 13:14:15 \| +------------------------------------------+ However, "10:10:10" is also a valid datetime. mysql [(none)]>select cast("10:10:10" as datetime); +-----------------------------------+ \| CAST('10:10:10' AS DATETIMEV2(0)) \| +-----------------------------------+ \| 2010-10-10 00:00:00 \| +-----------------------------------+ So here, the order of parsing has been adjusted.	2023-05-29 12:17:21 +08:00
Jerry Hu	9f8de89659	[refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758 ) Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity. By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed. This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.	2023-05-29 11:47:31 +08:00
Kang	859b03dfdf	[Improvement](topn) prevent memory usage of key topn increasing unlimited (#19978 )	2023-05-29 10:16:15 +08:00
Yongqiang YANG	e0d9f7f955	[enhancement](load) add some profile items for load (#20141 )	2023-05-29 09:54:03 +08:00
amory	9d44918036	[Improve](data-type) Clean datatype uselesscode (#20145 ) * fix struct_export out data * delete useless code with data type	2023-05-28 20:48:29 +08:00
zhangstar333	509689491f	[improvement](exec) Refactor the partition sort node to send data in pipeline mode (#20128 ) before: the node will wait to retrieve all data from child, then send data to parent. now: for data from child that does not require sorting, it can be sent to parent immediately.	2023-05-27 22:42:10 +08:00
HappenLee	e5b0d7a5cd	[CTE](eof) Support cte reuse reduce counter by eof status and pipeline task mem can release (#20056 )	2023-05-26 22:03:29 +08:00
Qi Chen	cb4a57f44f	[Opt](orc-reader) Support merge small IO facility in orc reader. (#20092 ) #18976 introduced merge small IO facility to optimize performance, and used by parquet reader. This PR support this facility in orc reader. Current ORC reader implementation need to reposition parent present stream when reading lazy columns in lazy materialization facility. So let it works by removing `DCHECK_GE(offset, cached_data.end_offset)`.	2023-05-26 21:06:12 +08:00
zclllyybb	346c51faa2	[fix](expr) Make VExprContext exit gracefully (#19984 )	2023-05-26 20:21:53 +08:00
Xinyi Zou	a928b21434	[improvement](exception-safe) sort node is completely exception safe #20041	2023-05-26 18:29:02 +08:00
TengJianPing	488c9ba7c2	[improvement](exchange) test: data stream sender stop sending data to receiver if it returns eos early (#20081 )	2023-05-26 16:05:38 +08:00
Pxl	43aa062fb1	[Chore](hash-join) remove useless conditions and add some case (#20050 )	2023-05-26 14:45:24 +08:00
amory	ee34b6de2d	[Refact] (serde) refact mysql serde with data type (#19543 ) refact mysql output (de)serialize with data type serde , avoid accoriding switch case Primitive type writed in mysqlWriter	2023-05-26 14:11:17 +08:00
Pxl	15a7420661	[Chore](ub) fix some undefined behaviors (#19986 ) /home/zcp/repo_center/doris_master/doris/be/src/olap/rowset/segment_v2/column_reader.cpp:895:21: runtime error: load of value 423208544, which is not a valid value for type 'doris::ReaderType' /home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_decimal.cpp:260:33: runtime error: load of misaligned address 0x7fa3348b301c for type 'int64_t' (aka 'long'), which requires 8 byte alignment /home/zcp/repo_center/doris_master/doris/be/src/olap/block_column_predicate.cpp:82:24: runtime error: variable length array bound evaluates to non-positive value 0 /home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_string.h:225:26: runtime error: null pointer passed as argument 2, which is declared to never be null	2023-05-26 14:08:40 +08:00
Mryange	92a6122f74	[feature](profile)Add the filtering information of the Bloom filter in profile. (#19789 )	2023-05-26 10:56:58 +08:00
TengJianPing	3598518e59	[fix](revert) data stream sender stop sending data to receiver if it returns eos early (#19847 )" (#20040 ) * Revert "[fix](sink) fix END_OF_FILE error for pipeline caused by VDataStreamSender eof (#20007)" This reverts commit 2ec1d282c5e27b25d37baf91cacde082cca4ec31. * [fix](revert) data stream sender stop sending data to receiver if it returns eos early (#19847)" This reverts commit c73003359567067ea7d44e4a06c1670c9ec37902.	2023-05-25 16:50:17 +08:00
zhangstar333	002c76e06f	[vectorized](udaf) support udaf function work with window function (#19962 )	2023-05-25 14:38:47 +08:00
zhangstar333	53ae24912f	[vectorized](feature) support partition sort node (#19708 )	2023-05-25 11:22:02 +08:00
TengJianPing	2ec1d282c5	[fix](sink) fix END_OF_FILE error for pipeline caused by VDataStreamSender eof (#20007 ) * [fix](sink) fix END_OF_FILE error for pipeline caused by VDataStreamSender eof	2023-05-25 10:29:35 +08:00

1 2 3 4 5 ...

1727 Commits