doris

Author	SHA1	Message	Date
Xin Liao	ca88425bee	[Enhancement](merge-on-write) optimize bloom filter for primary key index (#20182 )	2023-05-31 09:49:15 +08:00
Jack Drogon	aae04d9680	[Chore](log) Remove some verbose log && Change log level (#20236 )	2023-05-31 09:15:01 +08:00
zy-kkk	56fa38de1d	[Enhancement](JDBC Catalog) refactor jdbc catalog insert logic (#19950 ) This PR refactors how data is written to JDBC External Table and JDBC Catalog. It mainly includes the following: 1. Continuing the work of @BePPPower's PR #18594, replacing the logic that splices INSERT SQL with logic that operates on off-heap memory and writes data through preparedStatement.set. 2. Adding write support for the largeint type, mainly by adapting to java.math.BigInteger, which uses binary operations. 3. Deleting the SQL-splicing logic from the write path of JDBC External Table and JDBC Catalog. ToDo: binary types, such as bit, binary, blob... Finally, special thanks to @BePPPower and @AshinGau for their work. Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>	2023-05-30 22:03:39 +08:00
Chenyang Sun	accaff1026	[Feature](compaction) wip: single replica compaction (#19237 ) Currently, compaction is executed separately for each backend, and the reconstruction of the index during compaction leads to high CPU usage. To address this, we are introducing single replica compaction, where a specific primary replica is selected to perform compaction, and the remaining replicas fetch the compaction results from the primary replica. The Backend (BE) requests replica information for all peers corresponding to a tablet from the Frontend (FE). This information includes the host where the replica is located and the replica_id. By calculating hash(replica_id), the replica with the smallest hash value is responsible for executing compaction, while the remaining replicas are responsible for fetching the compaction results from this replica. The compaction task producer thread, before submitting a compaction task, checks whether the local replica should fetch from its peer. If it should, the task is then submitted to the single replica compaction thread pool. When performing single replica compaction, the process begins by requesting rowset versions from the target replica. These rowset_versions are then compared with the local rowset versions. The first version that can be fetched is selected.	2023-05-30 21:12:48 +08:00
Pxl	7415135ad4	[Enhancement](execute) make assert_cast output the derived class name (#20212 ) before: F0530 11:02:41.989699 1154607 assert_cast.h:54] Bad cast from type:doris::vectorized::IDataType const* to doris::vectorized::DataTypeAggState const* after: F0530 11:24:28.390286 1292475 assert_cast.h:46] Bad cast from type:doris::vectorized::DataTypeNullable* to doris::vectorized::DataTypeAggState const*	2023-05-30 20:23:04 +08:00
zzzxl	1919355c04	[Feature](Inverted index) add MATCH_PHRASE query (#20156 )	2023-05-30 19:28:57 +08:00
airborne12	3d8440a1b7	[Feature-WIP](inverted index) support phrase for inverted index writer (#20193 )	2023-05-30 17:07:45 +08:00
Mingyu Chen	0c98355fff	[fix](catalog) fix create catalog with resource replay issue and kerberos auth issue (#20137 ) 1. Fix the replay bug for catalogs created with a resource. If a user creates a catalog using `create catalog hive with resource xxx`, the resource may be dropped when the edit log is replayed, causing an NPE, and the FE will fail to start. This PR adds a new FE config `disallow_create_catalog_with_resource`, which defaults to true, so `with resource` is no longer allowed and will be deprecated later. It also fixes the replay bug to avoid the NPE. 2. Fix an issue when creating 2 hive catalogs that connect with and without kerberos authentication. When a user creates 2 hive catalogs, one using simple auth and the other using kerberos auth, queries may fail with an error like: `Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.` This PR therefore adds a default property for hive catalogs: `"ipc.client.fallback-to-simple-auth-allowed" = "true"`. The property is added automatically when a user creates a hive catalog, which avoids this problem. 3. Fix an issue when calling `hdfsExists()`. When `hdfsExists()` returns a non-zero code, the caller should check whether it encountered an error or the file was not found. 4. Some code refactoring: avoid importing `org.apache.parquet.Strings`.	2023-05-30 16:57:39 +08:00
Qi Chen	4475a69c57	[Fix](multi-catalog) Fix q03 in `text_external_brown` regression test by correctly handling text converter parsing errors. (#20190 ) Issue Number: close #20189 Fix `q03` in the `text_external_brown` regression test by correctly handling text converter parsing errors.	2023-05-30 15:08:28 +08:00
YueW	de08c4a57b	[enhance](match) Support match query without inverted index (#19936 )	2023-05-30 15:02:57 +08:00
bobhan1	bb12a1cb49	[Enhance](array function) add support for DecimalV3 for array_enumerate_uniq() (#17724 )	2023-05-30 13:09:19 +08:00
Gabriel	c7b8c83a7f	[Improvement](runtimefilter) Build bloom filter according to the exact build size for IN_OR_BLOOM_FILTER (#20166 ) * [Improvement](runtimefilter) Build bloom filter according to the exact build size for IN_OR_BLOOM_FILTER	2023-05-30 12:55:30 +08:00
lihangyu	945cb56fb6	[Bug](segment iterator) remove DCHECK for block row count (#20199 ) CHECK rows count of block at segment iterator is not ready when `enable_common_expr_pushdown`	2023-05-30 11:34:25 +08:00
Qi Chen	9b32d42ee4	[Fix](multi-catalog) fix the all nested types test failure introduced by #19518 (support insert-only transactional table). (#20194 ) Fix `qt_nested_types_orc` in `test_tvf_p2`, which was broken by #19518 (support insert-only transactional table). ### Test case error `qt_nested_types_orc` in `test_tvf_p2` ``` select count(array0), count(array1), count(array2), count(array3), count(struct0), count(struct1), count(map0) from hdfs( "uri" = "hdfs://172.21.16.47:4007/catalog/tvf/orc/all_nested_types.orc", "format" = "orc", "fs.defaultFS" = "hdfs://172.21.16.47:4007") ``` Error Message: errCode = 2, detailMessage = (172.21.0.101)[INTERNAL_ERROR]Wrong data type for colum 'struct1'	2023-05-30 09:55:40 +08:00
Qi Chen	2abbc9f921	[Fix](multi-catalog) Fix parquet bugs of #19758 'replace the single pointer with an array of 'conjuncts' in ExecNode'. (#20191 ) Fix some parquet reader bugs introduced by #19758 'replace the single pointer with an array of 'conjuncts' in ExecNode'.	2023-05-30 09:55:12 +08:00
airborne12	90b4e127e3	[Feature](inverted index) add parser_mode properties for inverted index parser (#20116 ) This adds a parser mode for inverted indexes. Usage: ``` CREATE TABLE `inverted` ( `FIELD0` text NULL, `FIELD1` text NULL, `FIELD2` text NULL, `FIELD3` text NULL, INDEX idx_name1 (`FIELD0`) USING INVERTED PROPERTIES("parser" = "chinese", "parser_mode" = "fine_grained") COMMENT '', INDEX idx_name2 (`FIELD1`) USING INVERTED PROPERTIES("parser" = "chinese", "parser_mode" = "coarse_grained") COMMENT '' ) ENGINE=OLAP; ```	2023-05-29 23:21:52 +08:00
Pxl	d1d0d9e5e8	[Chore](build) adjust some compile diagnostic (#20162 )	2023-05-29 19:19:01 +08:00
Xinyi Zou	f9478dbd9a	[fix](function) Fix VcompoundPred execute const column #20158 reproduce: ./run-regression-test.sh --run -suiteParallel 1 -actionParallel 1 -parallel 1 -d query_p0/sql_functions/window_functions select /*+ SET_VAR(query_timeout = 600) */ subq_0.`c1` as c0 from (select ref_1.`s_name` as c0, ref_1.`s_suppkey` as c1, ref_1.`s_address` as c2, ref_1.`s_address` as c3 from regression_test_query_p0_sql_functions_window_functions.tpch_tiny_supplier as ref_1 where (ref_1.`s_name` is NULL) or (ref_1.`s_acctbal` is not NULL)) as subq_0 where (subq_0.`c3` is NULL) or (subq_0.`c2` is not NULL) reason: FunctionIsNull and FunctionIsNotNull return a const column from execute, but their VectorizedFnCall::is_constant returns false, which causes problems with const handling in VCompoundPred::execute. This PR converts the const column to a full column in VCompoundPred::execute. In the future, there will be a more thorough solution to such problems.	2023-05-29 18:16:58 +08:00
lihangyu	ab8125d56f	[Improve](performance) introduce SchemaCache to cache TabletSchema & Schema (#20037 ) * [Improve](performance) introduce SchemaCache to cache TabletSchema & Schema 1. When the system is under high-concurrency load with wide-table point queries, the frequent memory allocation and deallocation of Schema becomes an evident system bottleneck. Additionally, the initialization of TabletSchema and Schema becomes a CPU hotspot. Therefore, a SchemaCache is introduced to cache these resources for reuse. 2. Wrap some variables with std::unique_ptr Performance: \| Status \| QPS \| Average response time (avg) \| P99 response time \| \|------------------\|-----\|------------------\|-------------\| \| SchemaCache enabled \| 501 \| 20ms \| 34ms \| \| SchemaCache disabled \| 321 \| 31ms \| 61ms \| * handle schema change with schema version * remove useless header * rebase	2023-05-29 17:34:53 +08:00
zhannngchen	cc20c430f6	[fix](partial update) use correct tablet schema for rowset writer in publish task (#20117 )	2023-05-29 16:57:18 +08:00
amory	91dae8a5b6	[FIX](mysql_writer) fix mysql output binary object works (#20154 ) * fix struct_export out data * fix mysql writer output with binary true	2023-05-29 16:53:33 +08:00
Gabriel	55ccddb62c	[Conf](decimalv3) enable decimalv3 by default	2023-05-29 15:38:31 +08:00
Pxl	8376e5eefb	[Chore](build) add non-virtual-dtor, remove no-embedded-directive/no-zero-length-array (#20118 ) add non-virtual-dtor, remove no-embedded-directive/no-zero-length-array	2023-05-29 14:42:47 +08:00
Pxl	bbb3af6ce6	[Feature](agg_state) support agg_state combinators (#19969 ) support agg_state combinators state/merge/union	2023-05-29 13:07:29 +08:00
airborne12	8378ab5e41	[Fix](inverted index) fix memory leak when inverted index writer does not finish correctly (#20028 ) * [Fix](inverted index) fix memory leak when inverted index writer does not finish correctly * [Update](inverted index) use smart pointer to avoid memory leak * [Chore](format) code format --------- Co-authored-by: airborne12 <airborne12@gmail.com>	2023-05-29 12:18:14 +08:00
Mryange	a86134cb39	[fix](executor) Fixed an error with cast as time. #20144 before mysql [(none)]>select cast("10:10:10" as time); +-------------------------------+ \| CAST('10:10:10' AS TIMEV2(0)) \| +-------------------------------+ \| 00:00:00 \| +-------------------------------+ after mysql [(none)]>select cast("10:10:10" as time); +-------------------------------+ \| CAST('10:10:10' AS TIMEV2(0)) \| +-------------------------------+ \| 10:10:10 \| +-------------------------------+ In the past, we supported this syntax. mysql [(none)]>select cast("2023:05:01 13:14:15" as time); +------------------------------------------+ \| CAST('2023:05:01 13:14:15' AS TIMEV2(0)) \| +------------------------------------------+ \| 13:14:15 \| +------------------------------------------+ However, "10:10:10" is also a valid datetime. mysql [(none)]>select cast("10:10:10" as datetime); +-----------------------------------+ \| CAST('10:10:10' AS DATETIMEV2(0)) \| +-----------------------------------+ \| 2010-10-10 00:00:00 \| +-----------------------------------+ So here, the order of parsing has been adjusted.	2023-05-29 12:17:21 +08:00
Jerry Hu	9f8de89659	[refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758 ) Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity. By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed. This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.	2023-05-29 11:47:31 +08:00
Kang	859b03dfdf	[Improvement](topn) prevent memory usage of key topn from growing without limit (#19978 )	2023-05-29 10:16:15 +08:00
Yongqiang YANG	e0d9f7f955	[enhancement](load) add some profile items for load (#20141 )	2023-05-29 09:54:03 +08:00
yujun	42239d635a	[fix](tablet_manager_lock) fix create tablet timeout #20067 (#20069 )	2023-05-28 23:05:13 +08:00
AlexYue	4573ee9a49	[enhance](PrefetchReader) abort load task when data size returned by S3 is smaller than requested (#19947 ) We encountered a confusing situation where the buffered reader was trapped in an endless loop when calling readat. We then found that this happened because the returned data size was smaller than requested. In the observed case, the actual data size was about 2MB, but the call to readat retrieved only about 1MB.	2023-05-28 21:48:17 +08:00
amory	9d44918036	[Improve](data-type) Clean datatype useless code (#20145 ) * fix struct_export out data * delete useless code with data type	2023-05-28 20:48:29 +08:00
bobhan1	c45da40ed7	[refactor-WIP](TaskWorkerPool) add specific classes for ALTER_TABLE, CLONE, STORAGE_MEDIUM_MIGRATE task (#20140 )	2023-05-28 19:27:08 +08:00
YueW	ae352997b4	[Enhancement](alter inverted index) Improve alter inverted index performance with light weight add or drop inverted index (#19063 )	2023-05-28 11:23:07 +08:00
AlexYue	da17c45c0b	[enhance](FileWriter)enhance s3 file writer bvar to avoid adding abort bytes (#20138 ) * don't add each time upload or it would add aborted bytes * alloca memory	2023-05-28 10:52:37 +08:00
bobhan1	0434c6a738	[refactor-WIP](TaskWorkerPool) add specific classes for PUSH, PUBLIC_VERION, CLEAR_TRANSACTION tasks (#19822 )	2023-05-27 22:47:45 +08:00
zhangstar333	509689491f	[improvement](exec) Refactor the partition sort node to send data in pipeline mode (#20128 ) before: the node will wait to retrieve all data from child, then send data to parent. now: for data from child that does not require sorting, it can be sent to parent immediately.	2023-05-27 22:42:10 +08:00
airborne12	ac8599fedb	[Fix](single replica load) fix indices_size key not found core (#20047 )	2023-05-27 13:28:07 +08:00
lihangyu	f3d8af330a	[Bug](point query) check point query before check two phase read (#20055 ) * [Bug](point query) checkAndSetPointQuery before checkEnableTwoPhaseRead 1. checkEnableTwoPhaseRead relies on the short circuit flag 2. add more metrics to display lookup profile * fix rebase	2023-05-27 12:38:58 +08:00
Adonis Ling	16c46974c5	[chore](build) Fix compilation errors reported by GCC-13 (#20114 )	2023-05-27 08:25:52 +08:00
Jack Drogon	93933308e6	[Feature-WIP](CCR): Add ccr doris interface (WIP) (#17881 )	2023-05-26 23:40:49 +08:00
HappenLee	e5b0d7a5cd	[CTE](eof) Support cte reuse reduce counter by eof status and pipeline task mem can release (#20056 )	2023-05-26 22:03:29 +08:00
plat1ko	3c6227a900	[fix](filesystem) Fix core caused by using moved variable in batch_delete_impl #20033	2023-05-26 21:39:27 +08:00
Gabriel	23ad72e734	[Bug](runtime filter) Fix min/max filter for decimalv3 (#20005 )	2023-05-26 21:35:21 +08:00
Qi Chen	cb4a57f44f	[Opt](orc-reader) Support merge small IO facility in orc reader. (#20092 ) #18976 introduced the merge small IO facility to optimize performance, and it is used by the parquet reader. This PR supports this facility in the orc reader. The current ORC reader implementation needs to reposition the parent present stream when reading lazy columns in the lazy materialization facility, so this PR makes that work by removing `DCHECK_GE(offset, cached_data.end_offset)`.	2023-05-26 21:06:12 +08:00
zclllyybb	346c51faa2	[fix](expr) Make VExprContext exit gracefully (#19984 )	2023-05-26 20:21:53 +08:00
Ashin Gau	9458a24cd7	[fix](multi-catalog) values in sqlserver should be enclosed by single quotes (#19971 ) Fix errors when inserting string/date/datetime values into SQLServer: ERROR 1105 (HY000): errCode = 2, detailMessage = (172.21.0.101)[INTERNAL_ERROR]UdfRuntimeException: JDBC executor sql has error: CAUSED BY: SQLServerException: Invalid column name '2021-10-30'. When string values are enclosed in double quotes, they are parsed as column names, so string values should be enclosed in single quotes.	2023-05-26 20:04:45 +08:00
Xinyi Zou	a928b21434	[improvement](exception-safe) sort node is completely exception safe #20041	2023-05-26 18:29:02 +08:00
qiye	9e70a9ef84	[opt](compaction) add pick rowset to compact interval config (#19868 )	2023-05-26 17:39:02 +08:00
lihangyu	317338913c	[Bug](topn) Fix topn fetch set real default value (#20074 ) 1. Before this PR, if a rowset did not contain a column that should be read for the related SlotDescriptor, `insert_default` was called on the column, but that is not the real default value. The real default value information should be provided by the frontend side. 2. Support fetch when light schema change is not enabled, but disable it for the AGG or UNIQUE MOR model	2023-05-26 16:06:55 +08:00

4612 Commits