doris

Author	SHA1	Message	Date
Kang	ffadaa4935	[improvement](inverted index) skip write index on load and generate index on compaction (#20325 )	2023-06-03 16:03:21 +08:00
Ashin Gau	3e186a8821	[opt](MergedIO) optimize merge small IO, prevent amplified read (#20305 ) Optimize the strategy of merging small IO to prevent severe read amplification, and turn off merged IO when file cache enabled. Adjustable parameters: ``` // the max amplified read ratio when merging small IO max_amplified_read_ratio=0.8 // the min segment size file_cache_min_file_segment_size = 1048576 ```	2023-06-03 10:51:24 +08:00
Pxl	90d710e83d	[Enchancement](function) optimize for padding function && add string length check on string op (#20363 )	2023-06-02 21:24:41 +08:00
YueW	b62c5a70c7	[fix](match query) fix array column match query failed without inverted index (#20344 )	2023-06-02 21:10:12 +08:00
YueW	adc3acb283	[fix](match) fix match query with compound predicates return -6003 (#20361 )	2023-06-02 18:25:37 +08:00
ZhangYu0123	78c37b5244	[Optimize](Function) Add fast path for col like '%%' or col like '%' or regexp '\\.' (#20143 ) Add fast path for col like '%%' or col like '%' or regexp '\\.' (1) like about 34% speed up when use count() test support col like '%%' , col like '%', col not like '%%' , col not like '%' (2) regexp about 37% speed up when use count() test support col regexp '\\.', col not regexp '\\.' Q1: select count() From hits where url like '%'; Q2: select count() From hits where url regexp '\\.*';	2023-06-02 16:26:56 +08:00
amory	06e7c14320	[Improve](json-array) Support json array with nereids bool (#20248 ) Support json array with nereids bool now : ``` set enable_nereids_planner=true; mysql> SELECT json_array(1, "abc", NULL, TRUE, '10:00:00'); +----------------------------------------------+ \| json_array(1, 'abc', NULL, TRUE, '10:00:00') \| +----------------------------------------------+ \| [1,"abc",null,false,"10:00:00"] \| +----------------------------------------------+ 1 row in set (0.02 sec) ``` Nereids booleans are "true"/"false", not '0'/'1', so we always get false	2023-06-02 14:47:24 +08:00
amory	d68f3f3b3d	[Feature](array-functions)improve array functions for array_last_index (#20294 ) Now we just support array_first_index for lambda input , but no array_last_index	2023-06-02 13:54:03 +08:00
Jerry Hu	8ff8705b3f	[fix](olap) deletion statement with space conditions did not take effect (#20349 ) Deletion statement like this: delete from tb where k1 = ' '; The rows whose k1's value is ' ' will not be deleted.	2023-06-02 13:52:57 +08:00
Kaijie Chen	a869056567	[performance](load) support parallel memtable flush for unique key tables (#20308 )	2023-06-02 13:49:53 +08:00
Gabriel	dc43e65d06	[Bug](pipeline) Fix memory leak if query is canceled caused by memory limit (#20316 )	2023-06-02 11:42:52 +08:00
HappenLee	576288cc89	[Profile](exec) Remove unless profile in pipeline exec engine (#20337 )	2023-06-02 11:39:11 +08:00
Mingyu Chen	86d77084a4	[Fix](multi-catalog) fix oss access issue with aws s3 sdk (#20287 )	2023-06-02 10:40:07 +08:00
HappenLee	8bec2b41db	[pipeline](rpc) support closure reuse in pipeline exec engine (#20278 )	2023-06-02 09:50:21 +08:00
HappenLee	608d2a3eca	[Bug](exec) push down no group by agg min cause error result (#20289 ) sql """ CREATE TABLE t1_int ( num int(11) NULL, dgs_jkrq bigint(20) NULL ) ENGINE=OLAP DUPLICATE KEY(num) COMMENT 'OLAP' DISTRIBUTED BY HASH(num) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "storage_format" = "V2", "light_schema_change" = "true", "disable_auto_compaction" = "false", "enable_single_replica_compaction" = "false" ); """ sql """insert into t1_int values(1,1),(1,2),(1,3),(1,4),(1,null);""" qt_sql """ select min(dgs_jkrq) from t1_int; """ Before the change we get the wrong result: 4; after the change we get the right result: 1	2023-06-01 17:29:46 +08:00
lihangyu	f0513a861d	[Improve](Scan) add a session variable to make scan run serial (#20220 ) Parallel scanning can result in some read amplification, for example, select * from xx where limit 1 actually requires only one row of data. However, due to parallel scanning of multiple tablets, read amplification occurs, leading to performance bottlenecks in high-concurrency scenarios. This PR adds a session variable to enforce serial scanning, which can help mitigate this issue.	2023-06-01 15:06:35 +08:00
Mryange	519f01133a	[feature](decimal)support cast rounding half up and div precision increment in decimalv3. (#19811 )	2023-06-01 13:09:58 +08:00
Gabriel	4387f47fb5	[pipeline](load) support pipeline load (#20217 )	2023-06-01 11:42:43 +08:00
lihangyu	9e21318834	[refactor](dynamic table) Make segment_writer unaware of dynamic schema, and ensure parsing is exception-safe. (#19594 ) 1. make ColumnObject exception safe 2. introduce FlushContext and construct schema at memtable flush stage to make segment independent from dynamic schema 3. add more test cases	2023-06-01 10:25:04 +08:00
xiongjx751	5b6b1b38a6	[Enhancement](merge-on-write) Performance optimization of calculations of delete bitmap between segments (#20153 ) 1. Use heap sort to find duplicated keys between segments and update the delete-bitmap. The old implementation traversed all keys in all segments, used each key to search for duplicates in earlier segments, and then marked them for deletion. 2. Trick: Each time the heap top is popped as a key1, the new heap top is key2, allowing for jumping directly from key1 to key2 instead of advancing iteratively. 3. Effect: This technique works well when there are many segments within the same rowset and the imported data is relatively ordered.	2023-06-01 10:12:59 +08:00
Xin Liao	09e6b6580f	[fix](checksum) delete predicates might be inconsistent with rowset readers in checksum task (#20251 ) The BlockReader captures rowsets and initializes delete_handler in different places. If there is a base compaction, it may result in obtaining inconsistent delete handlers. Therefore, place these two operations under the same lock.	2023-06-01 09:06:51 +08:00
Yongqiang YANG	6ee99c4138	[fix](load_profile) fix rows stat and add close_wait in sink (#20181 )	2023-05-31 18:23:30 +08:00
yangshijie	1aefc26ca0	[Bug](memtable) fix a bug occurred when we were inserting data into duplicate table without keys (#20233 )	2023-05-31 18:21:36 +08:00
YueW	6adb3fdf11	[fix](match_phrase) Fix the inconsistent query result for 'match_phrase' after creating index without support_phrase property (#20258 ) If an inverted index is created without the support_phrase property, keep the match_phrase condition so that it is filtered by the match function.	2023-05-31 18:09:50 +08:00
Jerry Hu	c03a19ea23	[improvement](bitmap) Using set to store a small number of elements to improve performance (#19973 ) Test on SSB 100g: select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey; exec time: 4.388s create materialized view: create materialized view customer_uv as select lo_suppkey, bitmap_union(to_bitmap(lo_linenumber)) from lineorder group by lo_suppkey; select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey; exec time: 12.908s test with the patch, exec time: 5.790s	2023-05-31 16:13:42 +08:00
Lijia Liu	f9dfcb923d	[Enhancement] Change Create Resource Group Grammar (#20249 )	2023-05-31 15:23:24 +08:00
Mingyu Chen	6eb99d1219	[chore](arm) support build with hadoop libhdfs on arm (#20256 ) hadoop-3.3.4.3-for-doris already support build on arm	2023-05-31 13:57:48 +08:00
Xin Liao	ca88425bee	[Enhancement](merge-on-write) optimize bloom filter for primary key index (#20182 )	2023-05-31 09:49:15 +08:00
Jack Drogon	aae04d9680	[Chore](log) Remove some verbose log && Change log level (#20236 )	2023-05-31 09:15:01 +08:00
zy-kkk	56fa38de1d	[Enhencement](JDBC Catalog) refactor jdbc catalog insert logic (#19950 ) This PR refactors the old way of writing data to JDBC External Table & JDBC Catalog, mainly including the following tasks: 1. Continuing the work of @BePPPower's PR #18594, replace the logic of splicing INSERT SQL with operating on off-heap memory and writing data via preparedStatement.set. 2. Add write support for the largeint type, mainly by adapting to java.math.BigInteger, which uses binary operations. 3. Delete the SQL-splicing logic in the write path of JDBC External Table & JDBC Catalog. ToDo: binary types, like bit, binary, blob... Finally, special thanks to @BePPPower and @AshinGau for their work. Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>	2023-05-30 22:03:39 +08:00
Chenyang Sun	accaff1026	[Feature](compaction) wip: single replica compaction (#19237 ) Currently, compaction is executed separately for each backend, and the reconstruction of the index during compaction leads to high CPU usage. To address this, we are introducing single replica compaction, where a specific primary replica is selected to perform compaction, and the remaining replicas fetch the compaction results from the primary replica. The Backend (BE) requests replica information for all peers corresponding to a tablet from the Frontend (FE). This information includes the host where the replica is located and the replica_id. By calculating hash(replica_id), the replica with the smallest hash value is responsible for executing compaction, while the remaining replicas are responsible for fetching the compaction results from this replica. The compaction task producer thread, before submitting a compaction task, checks whether the local replica should fetch from its peer. If it should, the task is then submitted to the single replica compaction thread pool. When performing single replica compaction, the process begins by requesting rowset versions from the target replica. These rowset_versions are then compared with the local rowset versions. The first version that can be fetched is selected.	2023-05-30 21:12:48 +08:00
Pxl	7415135ad4	[Enchancement](execute) make assert_cast can output derived class name (#20212 ) before: F0530 11:02:41.989699 1154607 assert_cast.h:54] Bad cast from type:doris::vectorized::IDataType const* to doris::vectorized::DataTypeAggState const* after: F0530 11:24:28.390286 1292475 assert_cast.h:46] Bad cast from type:doris::vectorized::DataTypeNullable* to doris::vectorized::DataTypeAggState const*	2023-05-30 20:23:04 +08:00
zzzxl	1919355c04	[Feature](Inverted index) add MATCH_ PHRASE query (#20156 )	2023-05-30 19:28:57 +08:00
airborne12	3d8440a1b7	[Feature-WIP](inverted index) support phrase for inverted index writer (#20193 )	2023-05-30 17:07:45 +08:00
Mingyu Chen	0c98355fff	[fix](catalog) fix create catalog with resource replay issue and kerberos auth issue (#20137 ) 1. Fix create catalog with resource replay bug. If a user creates a catalog using `create catalog hive with resource xxx`, there is a bug when replaying the edit log: the resource may be dropped, causing an NPE, and FE will fail to start. This PR adds a new FE config `disallow_create_catalog_with_resource`, which defaults to true, so that `with resource` is not allowed; it will be deprecated later. It also fixes the replay bug to avoid the NPE. 2. Fix issue when creating 2 hive catalogs to connect with and without kerberos authentication. When a user creates 2 hive catalogs, one using simple auth and the other using kerberos auth, the query may fail with an error like: `Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.` So this PR adds a default property for hive catalog: `"ipc.client.fallback-to-simple-auth-allowed" = "true"`. This property is added automatically when a user creates a hive catalog, to avoid this problem. 3. Fix calling `hdfsExists()` issue. When `hdfsExists()` returns a non-zero code, check whether it encountered an error or the file was not found. 4. Some code refactor. Avoid import `org.apache.parquet.Strings`	2023-05-30 16:57:39 +08:00
Qi Chen	4475a69c57	[Fix](multi-catalog) Fix q03 in `text_external_brown` regression test by handling correctly when text converter parsing error. (#20190 ) Issue Number: close #20189 Fix `q03` in `text_external_brown` regression test by handling correctly when text converter parsing error.	2023-05-30 15:08:28 +08:00
YueW	de08c4a57b	[enhance](match) Support match query without inverted index (#19936 )	2023-05-30 15:02:57 +08:00
bobhan1	bb12a1cb49	[Enhance](array function) add support for DecimalV3 for array_enumerate_uniq() (#17724 )	2023-05-30 13:09:19 +08:00
Gabriel	c7b8c83a7f	[Improvement](runtimefilter) Build bloom filter according to the exact build size for IN_OR_BLOOM_FILTER (#20166 ) * [Improvement](runtimefilter) Build bloom filter according to the exact build size for IN_OR_BLOOM_FILTER	2023-05-30 12:55:30 +08:00
lihangyu	945cb56fb6	[Bug](segment iterator) remove DCHECK for block row count (#20199 ) CHECK rows count of block at segment iterator is not ready when `enable_common_expr_pushdown`	2023-05-30 11:34:25 +08:00
Qi Chen	9b32d42ee4	[Fix](multi-catalog) fix all nested type test which introduced by #19518(support insert-only transactional table). (#20194 ) Fix `qt_nested_types_orc` in `test_tvf_p2` which introduced by #19518(support insert-only transactional table). ### Test case error `qt_nested_types_orc` in `test_tvf_p2` ``` select count(array0), count(array1), count(array2), count(array3), count(struct0), count(struct1), count(map0) from hdfs( "uri" = "hdfs://172.21.16.47:4007/catalog/tvf/orc/all_nested_types.orc", "format" = "orc", "fs.defaultFS" = "hdfs://172.21.16.47:4007") ``` Error Message： errCode = 2, detailMessage = (172.21.0.101)[INTERNAL_ERROR]Wrong data type for colum 'struct1'	2023-05-30 09:55:40 +08:00
Qi Chen	2abbc9f921	[Fix](multi-catalog) Fix parquet bugs of #19758 'replace the single pointer with an array of 'conjuncts' in ExecNode'. (#20191 ) Fix some parquet reader bugs which introduced by #19758 'replace the single pointer with an array of 'conjuncts' in ExecNode'.	2023-05-30 09:55:12 +08:00
airborne12	90b4e127e3	[Feature](inverted index) add parser_mode properties for inverted index parser (#20116 ) We add parser mode for inverted index, usage like this: ``` CREATE TABLE `inverted` ( `FIELD0` text NULL, `FIELD1` text NULL, `FIELD2` text NULL, `FIELD3` text NULL, INDEX idx_name1 (`FIELD0`) USING INVERTED PROPERTIES("parser" = "chinese", "parser_mode" = "fine_grained") COMMENT '', INDEX idx_name2 (`FIELD1`) USING INVERTED PROPERTIES("parser" = "chinese", "parser_mode" = "coarse_grained") COMMENT '' ) ENGINE=OLAP ); ```	2023-05-29 23:21:52 +08:00
Pxl	d1d0d9e5e8	[Chore](build) adjust some compile diagnostic (#20162 )	2023-05-29 19:19:01 +08:00
Xinyi Zou	f9478dbd9a	[fix](function) Fix VcompoundPred execute const column #20158 recurrent: ./run-regression-test.sh --run -suiteParallel 1 -actionParallel 1 -parallel 1 -d query_p0/sql_functions/window_functions select /*+ SET_VAR(query_timeout = 600) */ subq_0.`c1` as c0 from (select ref_1.`s_name` as c0, ref_1.`s_suppkey` as c1, ref_1.`s_address` as c2, ref_1.`s_address` as c3 from regression_test_query_p0_sql_functions_window_functions.tpch_tiny_supplier as ref_1 where (ref_1.`s_name` is NULL) or (ref_1.`s_acctbal` is not NULL)) as subq_0 where (subq_0.`c3` is NULL) or (subq_0.`c2` is not NULL) reason: FunctionIsNull and FunctionIsNotNull execute returns a const column, but their VectorizedFnCall::is_constant returns false, which causes problems with const handling when VCompoundPred::execute. This pr converts const column to full column in VCompoundPred execute. In the future, there will be a more thorough solution to such problems.	2023-05-29 18:16:58 +08:00
lihangyu	ab8125d56f	[Improve](performance) introduce SchemaCache to cache TabletSchame & Schema (#20037 ) * [Improve](performance) introduce SchemaCache to cache TabletSchame & Schema 1. When the system is under high-concurrency load with wide table point queries, the frequent memory allocation and deallocation of Schema become evident system bottlenecks. Additionally, the initialization of TabletSchema and Schema also becomes a CPU hotspot. Therefore, a SchemaCache is introduced to cache these resources for reuse. 2. Make some variables wrapped with std::unique<unique_ptr> Performance: \| Status \| QPS \| Average response time (avg) \| P99 response time \| \|------------------\|-----\|------------------\|-------------\| \| SchemaCache enabled \| 501 \| 20ms \| 34ms \| \| SchemaCache disabled \| 321 \| 31ms \| 61ms \| * handle schema change with schema version * remove useless header * rebase	2023-05-29 17:34:53 +08:00
zhannngchen	cc20c430f6	[fix](partial update) use correct tablet schema for rowset writer in publish task (#20117 )	2023-05-29 16:57:18 +08:00
amory	91dae8a5b6	[FIX](mysql_writer) fix mysql output binary object works (#20154 ) * fix struct_export out data * fix mysql writer output with binary true	2023-05-29 16:53:33 +08:00
Gabriel	55ccddb62c	[Conf](decimalv3) enable decimalv3 by default	2023-05-29 15:38:31 +08:00
Pxl	8376e5eefb	[Chore](build) add non-virtual-dtor, remove no-embedded-directive/no-zero-length-array (#20118 ) add non-virtual-dtor, remove no-embedded-directive/no-zero-length-array	2023-05-29 14:42:47 +08:00
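
The heap-based duplicate search described in commit 5b6b1b38a6 (#20153) above can be illustrated with a small sketch. This is not the Doris implementation: it assumes each segment is a sorted Python list of keys that are unique within the segment, that a later segment supersedes an earlier one, and the function name `mark_duplicates` is hypothetical. After popping key1, it looks at the new heap top key2 and uses a binary search to jump the popped segment straight to its first key >= key2.

```python
import heapq
from bisect import bisect_left


def mark_duplicates(segments):
    """Return {(segment_id, row_index)} for rows superseded by a later segment.

    Assumption: each segment is a sorted list of keys, unique within the segment.
    """
    # One heap entry per segment: (current key, segment id, row index).
    heap = [(seg[0], sid, 0) for sid, seg in enumerate(segments) if seg]
    heapq.heapify(heap)
    deleted = set()
    prev = None
    while heap:
        key, sid, row = heapq.heappop(heap)
        # Equal keys pop in ascending segment order, so prev is the older row.
        if prev is not None and prev[0] == key:
            deleted.add((prev[1], prev[2]))
        prev = (key, sid, row)
        if not heap:
            break  # only this segment has keys left, so no duplicates remain
        seg = segments[sid]
        # Jump from key1 to the first key >= key2 (the new heap top).
        nxt = bisect_left(seg, heap[0][0], row + 1)
        if nxt < len(seg):
            heapq.heappush(heap, (seg[nxt], sid, nxt))
    return deleted
```

For example, `mark_duplicates([[1, 3, 5], [2, 3, 6], [3, 5, 7]])` marks rows `(0, 1)`, `(1, 1)` and `(0, 2)`. The jump helps most when the data is relatively ordered, as the commit message notes, because long runs of keys in one segment are skipped with a single binary search.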


4639 Commits