Commit Graph

766 Commits

Author SHA1 Message Date
3e186a8821 [opt](MergedIO) optimize merge small IO, prevent amplified read (#20305)
Optimize the strategy for merging small IOs to prevent severe read amplification, and turn off merged IO when the file cache is enabled.
Adjustable parameters:
```
// the max amplified read ratio when merging small IO
max_amplified_read_ratio=0.8
// the min segment size
file_cache_min_file_segment_size = 1048576
```
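As a rough illustration of the merge decision (hypothetical names, not the actual Doris code): merging two ranges also fetches the gap between them, so the merge is rejected when the wasted fraction exceeds the configured ratio.

```cpp
#include <cstddef>

// Hedged sketch: decide whether two adjacent small reads may be merged.
// Merging [a_start, a_end) and [b_start, b_end) also fetches the gap between
// them, so reject the merge when the wasted fraction of the merged read
// exceeds max_amplified_read_ratio.
bool should_merge(size_t a_start, size_t a_end, size_t b_start, size_t b_end,
                  double max_amplified_read_ratio) {
    size_t useful = (a_end - a_start) + (b_end - b_start);
    size_t merged = b_end - a_start; // assumes range a precedes range b
    size_t wasted = merged - useful;
    return static_cast<double>(wasted) / static_cast<double>(merged)
           <= max_amplified_read_ratio;
}
```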
2023-06-03 10:51:24 +08:00
f0513a861d [Improve](Scan) add a session variable to make scan run serial (#20220)
Parallel scanning can result in read amplification. For example, `select * from xx limit 1` actually requires only one row of data, but because multiple tablets are scanned in parallel, read amplification occurs, leading to performance bottlenecks in high-concurrency scenarios. This PR adds a session variable to enforce serial scanning, which helps mitigate the issue.
2023-06-01 15:06:35 +08:00
9e21318834 [refactor](dynamic table) Make segment_writer unaware of dynamic schema, and ensure parsing is exception-safe. (#19594)
1. Make ColumnObject exception-safe.
2. Introduce FlushContext and construct the schema at the memtable flush stage, making segments independent of the dynamic schema.
3. Add more test cases.
2023-06-01 10:25:04 +08:00
f9dfcb923d [Enhancement] Change Create Resource Group Grammar (#20249) 2023-05-31 15:23:24 +08:00
56fa38de1d [Enhancement](JDBC Catalog) refactor jdbc catalog insert logic (#19950)
This PR refactors the old way of writing data to the JDBC External Table & JDBC Catalog, mainly covering the following tasks:
1. Continuing the work of @BePPPower's PR #18594: replace the logic that splices INSERT SQL strings with logic that writes data through off-heap memory and preparedStatement.set calls.
2. Add write support for the largeint type, mainly adapting to java.math.BigInteger, which uses binary operations.
3. Delete the SQL-splicing logic from the write-related code of the JDBC External Table & JDBC Catalog.

ToDo: binary types, like bit, binary, blob...

Finally, special thanks to @BePPPower and @AshinGau for their work.

Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>
2023-05-30 22:03:39 +08:00
0c98355fff [fix](catalog) fix create catalog with resource replay issue and kerberos auth issue (#20137)
1. Fix the create-catalog-with-resource replay bug.
	If a user creates a catalog using `create catalog hive with resource xxx`, there is a bug when replaying the edit log:
	the resource may already have been dropped, causing an NPE, and FE will fail to start.

	In this PR, I add a new FE config `disallow_create_catalog_with_resource`, default true,
	so that `with resource` is no longer allowed; it will be deprecated later.

	This PR also fixes the replay bug to avoid the NPE.

2. Fix an issue when creating 2 hive catalogs, one with and one without kerberos authentication.

	When a user creates 2 hive catalogs, one using simple auth and the other using kerberos auth,
	queries may fail with an error like: `Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.`

	So I add a default property for hive catalogs: `"ipc.client.fallback-to-simple-auth-allowed" = "true"`.
	This property is added automatically when a user creates a hive catalog, to avoid the problem.

3. Fix a `hdfsExists()` calling issue.

	When `hdfsExists()` returns a non-zero code, the caller should check whether a real error occurred or the file simply was not found (see the sketch after this list).

4. Some code refactoring.

	Avoid importing `org.apache.parquet.Strings`.
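A minimal sketch of the `hdfsExists()` check from item 3, assuming the libhdfs convention that failing calls set `errno` (the helper name is illustrative):

```cpp
#include <cerrno>
#include <hdfs/hdfs.h> // libhdfs C API

enum class ExistsResult { kExists, kNotFound, kError };

// hdfsExists() returns 0 when the path exists and non-zero otherwise; errno
// distinguishes a genuinely missing file from a real I/O or auth error.
ExistsResult check_exists(hdfsFS fs, const char* path) {
    errno = 0;
    if (hdfsExists(fs, path) == 0) {
        return ExistsResult::kExists;
    }
    return errno == ENOENT ? ExistsResult::kNotFound : ExistsResult::kError;
}
```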
2023-05-30 16:57:39 +08:00
de08c4a57b [enhance](match) Support match query without inverted index (#19936) 2023-05-30 15:02:57 +08:00
9b32d42ee4 [Fix](multi-catalog) fix all nested type tests introduced by #19518 (support insert-only transactional table). (#20194)
Fix `qt_nested_types_orc` in `test_tvf_p2`, which was introduced by #19518 (support insert-only transactional table).

### Test case error
`qt_nested_types_orc` in `test_tvf_p2`
```
select count(array0), count(array1), count(array2), count(array3), count(struct0), count(struct1), count(map0)
            from hdfs(
            "uri" = "hdfs://172.21.16.47:4007/catalog/tvf/orc/all_nested_types.orc",
            "format" = "orc",
            "fs.defaultFS" = "hdfs://172.21.16.47:4007")
```

**Error Message:**
errCode = 2, detailMessage = (172.21.0.101)[INTERNAL_ERROR]Wrong data type for colum 'struct1'
2023-05-30 09:55:40 +08:00
2abbc9f921 [Fix](multi-catalog) Fix parquet bugs of #19758 'replace the single pointer with an array of 'conjuncts' in ExecNode'. (#20191)
Fix some parquet reader bugs introduced by #19758 ('replace the single pointer with an array of 'conjuncts' in ExecNode').
2023-05-30 09:55:12 +08:00
ab8125d56f [Improve](performance) introduce SchemaCache to cache TabletSchema & Schema (#20037)
* [Improve](performance) introduce SchemaCache to cache TabletSchema & Schema

1. When the system is under a high-concurrency load of wide-table point queries, the frequent memory allocation and deallocation of Schema becomes an evident system bottleneck, and the initialization of TabletSchema and Schema also becomes a CPU hotspot. Therefore, a SchemaCache is introduced to cache these resources for reuse.

2. Wrap some variables in std::unique_ptr.

Performance:
| State           | QPS | Avg latency | P99 latency |
|-----------------|-----|-------------|-------------|
| SchemaCache on  | 501 | 20ms        | 34ms        |
| SchemaCache off | 321 | 31ms        | 61ms        |
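A minimal sketch of the caching idea, with illustrative names rather than Doris's actual SchemaCache API: build one immutable schema per (tablet id, schema version) and share it across queries.

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

struct TabletSchema; // opaque here; the real type holds column metadata

class SchemaCache {
public:
    // Return the cached schema for (tablet_id, schema_version), building and
    // inserting it once if absent; later point queries reuse the same object
    // instead of re-allocating and re-initializing it.
    std::shared_ptr<TabletSchema> get_or_create(
            int64_t tablet_id, int32_t schema_version,
            const std::function<std::shared_ptr<TabletSchema>()>& builder) {
        std::string key =
                std::to_string(tablet_id) + '-' + std::to_string(schema_version);
        std::lock_guard<std::mutex> lock(_mutex);
        auto it = _cache.find(key);
        if (it != _cache.end()) return it->second;
        auto schema = builder();
        _cache.emplace(std::move(key), schema);
        return schema;
    }

private:
    std::mutex _mutex;
    std::unordered_map<std::string, std::shared_ptr<TabletSchema>> _cache;
};
```

Keying on the schema version means a schema change naturally produces a new cache entry instead of serving a stale schema, which matches the "handle schema change with schema version" follow-up below.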

* handle schema change with schema version

* remove useless header

* rebase
2023-05-29 17:34:53 +08:00
55ccddb62c [Conf](decimalv3) enable decimalv3 by default 2023-05-29 15:38:31 +08:00
Pxl
8376e5eefb [Chore](build) add non-virtual-dtor, remove no-embedded-directive/no-zero-length-array (#20118)
add non-virtual-dtor, remove no-embedded-directive/no-zero-length-array
2023-05-29 14:42:47 +08:00
Pxl
bbb3af6ce6 [Feature](agg_state) support agg_state combinators (#19969)
support agg_state combinators state/merge/union
2023-05-29 13:07:29 +08:00
9f8de89659 [refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758)
Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity.

By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed.

This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.
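To make the contrast concrete, here is a hedged sketch (illustrative types, not Doris's actual ExecNode interface) of evaluating and extending an array of conjuncts:

```cpp
#include <functional>
#include <vector>

struct Row; // placeholder for a row of input data

using Conjunct = std::function<bool(const Row&)>;

// Evaluate the conjunct array with short-circuiting: a row survives only if
// every condition accepts it.
bool eval_conjuncts(const std::vector<Conjunct>& conjuncts, const Row& row) {
    for (const auto& c : conjuncts) {
        if (!c(row)) return false;
    }
    return true;
}

// A runtime filter arriving mid-query is just one more array element; no
// tree merging or frontend-side expression combination is needed.
void add_runtime_filter(std::vector<Conjunct>& conjuncts, Conjunct filter) {
    conjuncts.push_back(std::move(filter));
}
```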
2023-05-29 11:47:31 +08:00
509689491f [improvement](exec) Refactor the partition sort node to send data in pipeline mode (#20128)
Before: the node waited until all data was retrieved from its child, then sent it to the parent.
Now: data from the child that does not require sorting can be sent to the parent immediately.
2023-05-27 22:42:10 +08:00
cb4a57f44f [Opt](orc-reader) Support merge small IO facility in orc reader. (#20092)
#18976 introduced the merge-small-IO facility to optimize performance; it is used by the parquet reader.
This PR supports this facility in the ORC reader. The current ORC reader implementation needs to reposition the parent present stream when reading lazy columns during lazy materialization, so make it work by removing `DCHECK_GE(offset, cached_data.end_offset)`.
2023-05-26 21:06:12 +08:00
a928b21434 [improvement](exception-safe) sort node is completely exception safe #20041 2023-05-26 18:29:02 +08:00
Pxl
43aa062fb1 [Chore](hash-join) remove useless conditions and add some case (#20050) 2023-05-26 14:45:24 +08:00
Pxl
15a7420661 [Chore](ub) fix some undefined behaviors (#19986)
/home/zcp/repo_center/doris_master/doris/be/src/olap/rowset/segment_v2/column_reader.cpp:895:21: runtime error: load of value 423208544, which is not a valid value for type 'doris::ReaderType'

/home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_decimal.cpp:260:33: runtime error: load of misaligned address 0x7fa3348b301c for type 'int64_t' (aka 'long'), which requires 8 byte alignment

/home/zcp/repo_center/doris_master/doris/be/src/olap/block_column_predicate.cpp:82:24: runtime error: variable length array bound evaluates to non-positive value 0

/home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_string.h:225:26: runtime error: null pointer passed as argument 2, which is declared to never be null
2023-05-26 14:08:40 +08:00
92a6122f74 [feature](profile)Add the filtering information of the Bloom filter in profile. (#19789) 2023-05-26 10:56:58 +08:00
53ae24912f [vectorized](feature) support partition sort node (#19708) 2023-05-25 11:22:02 +08:00
14b4c7abf9 [fix](hashtable) Check query cancel status during build hash table #19970
The hash table build stage should check the cancellation status and stop early if the query has been cancelled.
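A minimal sketch of such a check (illustrative names, not the actual hash join code): the build loop polls a shared cancellation flag every batch of rows so a cancelled query stops doing work.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Returns false when the build was aborted because the query was cancelled.
bool build_hash_table(const std::vector<int64_t>& keys,
                      const std::atomic<bool>& is_cancelled) {
    constexpr size_t kCheckInterval = 4096; // illustrative polling interval
    for (size_t i = 0; i < keys.size(); ++i) {
        if (i % kCheckInterval == 0 &&
            is_cancelled.load(std::memory_order_relaxed)) {
            return false; // stop early instead of finishing a dead query
        }
        // ... insert keys[i] into the hash table ...
    }
    return true;
}
```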
2023-05-24 14:24:03 +08:00
6efe6ef6e8 [Enhancement](scanner) allocate blocks in scanner_context on demand and free them on close (#19389)
Firstly, to reduce memory usage, we do not pre-allocate blocks; instead, we lazily allocate a block when the upper layer calls get_free_block. When the upper layer calls return_free_block to hand a block back, we add it to a queue for memory reuse, and we free the queued blocks when the scanner_context is closed rather than when it is destructed.
Secondly, to limit the memory usage of the scanner, we introduce a variable _free_blocks_capacity to indicate the current number of free blocks available to the scanners. The number of scanners that can be scheduled is calculated based on this value.
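A hedged sketch of the pooling scheme described above (names are illustrative, not the actual scanner_context API):

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

struct Block { /* columnar data */ };

class BlockPool {
public:
    explicit BlockPool(size_t free_blocks_capacity)
            : _free_blocks_capacity(free_blocks_capacity) {}

    // Lazy allocation: reuse a returned block if one is queued, otherwise
    // create a new one on demand; nothing is pre-allocated.
    std::unique_ptr<Block> get_free_block() {
        std::lock_guard<std::mutex> lock(_mutex);
        if (!_free.empty()) {
            auto block = std::move(_free.back());
            _free.pop_back();
            return block;
        }
        return std::make_unique<Block>();
    }

    // Queue the block for reuse, up to the configured capacity.
    void return_free_block(std::unique_ptr<Block> block) {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_free.size() < _free_blocks_capacity) {
            _free.push_back(std::move(block));
        } // otherwise drop it, keeping memory bounded
    }

    // Blocks are freed when the context is closed, not when it is destructed.
    void close() {
        std::lock_guard<std::mutex> lock(_mutex);
        _free.clear();
    }

private:
    std::mutex _mutex;
    size_t _free_blocks_capacity;
    std::vector<std::unique_ptr<Block>> _free;
};
```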

ssb flat test
previous
lineorder 1.2G:
load time: 3s, query time: 0.355s
lineorder 5.8G:
load time: 330s, query time: 0.970s
load time: 349s, query time: 0.949s
load time: 349s, query time: 0.955s
load time: 360s, query time: 0.889s (pipeline enabled)
after
lineorder 1.2G:
load time: 3s, query time: 0.349s
lineorder 5.8G:
load time: 342s, query time: 0.929s
load time: 337s, query time: 0.913s
load time: 345s, query time: 0.946s
load time: 346s, query time: 0.865s (pipeline enabled)
2023-05-23 18:17:21 +08:00
53ba46e404 [Fix][Refactor] Fix 'member call on null pointer of type doris::TextConverter' error in ubsan env and refactor text converter. (#19849)
Fix the 'member call on null pointer of type doris::TextConverter' error in the UBSan environment and refactor the text converter.
2023-05-22 21:00:19 +08:00
1d01136b1b [Fix](parquet-reader) Fix partition field conjuncts not work. (#19837)
Fix partition field conjuncts not working.
Add predicate_partition_columns in `_slot_id_to_filter_conjuncts` (single-slot conjuncts) to `_filter_conjuncts`; the others should already have been added from `not_single_slot_filter_conjuncts`.
2023-05-19 08:44:02 +08:00
481e9aebdb [Refactor](spark load) remove parquet scanner (#19251) 2023-05-18 19:19:13 +08:00
30c4f25cb3 [fix](multi-catalog) verify the precision of datetime types for each data source (#19544)
Fix three bugs of timestampv2 precision:
1. The Hive catalog doesn't set the precision of timestampv2 and can't get the precision from the hive metastore, so set the largest precision for timestampv2;
2. The Jdbc catalog uses datetimev1 to parse timestamps and converts them to timestampv2, so the precision is lost.
3. TVF doesn't use the precision from the metadata of the file format.
2023-05-17 20:50:15 +08:00
272a7565b8 [improvement](tracing) Remove useless span levels from be side tracing (#19665)
1. Replace one span per exec node method with one span per exec node;
2. Fix some problems with tracing in pipeline mode.
2023-05-17 19:04:52 +08:00
Pxl
7f73749b88 [Bug](pipeline) fix distributionColumnIds not updated correct when outputColumnUnique… (#19704)
Fix distributionColumnIds not being updated correctly when outputColumnUnique…
2023-05-17 00:13:10 +08:00
e22f5891d2 [WIP](row store) two phase opt read row store (#18654) 2023-05-16 13:21:58 +08:00
Pxl
b927f8cd37 [Chore](asan) change asan_suppr from interceptor_via_lib to interceptor_via_fun (#19636)
change asan_suppr from interceptor_via_lib to interceptor_via_fun
2023-05-16 10:51:43 +08:00
Pxl
2a02561863 [Bug](ubsan) fix some wrong downcasts found by ubsan (#19591)
Fix some wrong downcasts found by UBSan.
```cpp
doris/be/src/olap/bloom_filter_predicate.h:43:32: runtime error: downcast of address 0x7f8ec2b691a0 which does not point to an object of type 'doris::BloomFilterColumnPredicate<doris::TYPE_DATE>::SpecificFilter' (aka 'BloomFilterFunc<(doris::PrimitiveType)11U>')
0x7f8ec2b691a0: note: object is of type 'doris::BloomFilterFunc<(doris::PrimitiveType)12>'
 e5 55 00 00  10 74 58 42 e5 55 00 00  00 00 10 00 8e 7f 00 00  20 07 6f cc 8e 7f 00 00  80 fe 68 cc
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'doris::BloomFilterFunc<(doris::PrimitiveType)12>'  
```
1. TYPE_DATE/TYPE_DATETIME have the same data format, so I change the bloom filter cast to a reinterpret_cast.
```cpp
doris/be/src/vec/exec/format/orc/vorc_reader.h:281:17: runtime error: downcast of address 0x7f562f4c3180 which does not point to an object of type 'ColumnVector<int>'
0x7f562f4c3180: note: object is of type 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >'
 74 65 00 00  20 91 70 f5 ca 55 00 00  02 00 00 00 00 00 00 00  f0 d4 4c 2f 56 7f 00 00  f0 d4 4c 2f
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >'
```
2. Doris uses ColumnDecimal to store decimal elements.
2023-05-15 14:27:48 +08:00
Pxl
4eb2604789 [Bug](function) fix inconsistent function definition of Retention and change some static_cast to assert_cast (#19455)
1. Fix the inconsistent function definition of `Retention`: this function returns tinyint on `FE` but uint8 on `BE`.
2. Make assert_cast support casting to derived types (see the sketch after this list).
3. Change some static_casts to assert_casts.
4. Support sum(bool)/avg(bool).
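Item 2 can be illustrated with a minimal assert_cast sketch (the idea only, not Doris's exact implementation): behave like static_cast in release builds, but verify the downcast with dynamic_cast in debug builds, catching wrong downcasts like those UBSan reported.

```cpp
#include <cassert>

// Checked downcast for pointer types: a debug build asserts that the object
// really has the target dynamic type; a release build is a plain static_cast.
template <typename To, typename From>
To assert_cast(From* from) {
#ifndef NDEBUG
    assert(from == nullptr || dynamic_cast<To>(from) != nullptr);
#endif
    return static_cast<To>(from);
}
```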
2023-05-15 11:50:02 +08:00
92bf485abd [Bug] Fix doris pipeline shared scan and top n opt (#19599) 2023-05-15 10:00:44 +08:00
8ef9212ddc [enhancement](exceptionsafe) force check exec node method's return value (#19538) 2023-05-12 10:21:00 +08:00
e9392780a9 [fix](nereids) fix some nereids planner bugs (#19509)
1. Some encrypt and decrypt functions had the wrong blockEncryptionMode.
2. The topN node should compare tuples from intermediate_row_desc with first_sort_slot.tuple_id.
3. Must keep the limit if it's an uncorrelated in-subquery with a limit on sort, like `select a from t1 where a in (select b from t2 order by xx limit yy)`.
2023-05-12 09:06:16 +08:00
0b25376cf8 [feature](torc) support insert only transactional hive table on be side (#19518) 2023-05-11 14:15:09 +08:00
1d421a26d9 [bugfix](memory) merge block may fail to allocate (#19507) 2023-05-11 10:42:47 +08:00
d7ad299154 [fix](NestedType) throw error when reading complex nested type in orc&parquet (#19489)
Doris blocks do not support complex nested types yet, but the orc and parquet readers have generated complex nested columns,
which made the output of the mysql client wrong and confused users.
2023-05-11 07:51:02 +08:00
3ba3b6c66f [opt](FileCache) use modification time to determine whether the file is changed (#18906)
Get the last modification time from the file status, and use the combination of path and modification time to generate the cache identifier.
When a file changes, its modification time changes too, so the former cache path becomes invalid.
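A minimal sketch of the identifier scheme (hypothetical helper, not the actual file cache code):

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Combine path and last-modification time into the cache key, so rewriting a
// file yields a new key and the stale cache entry can never be hit again.
size_t file_cache_key(const std::string& path, int64_t modification_time) {
    return std::hash<std::string>{}(path + '#' + std::to_string(modification_time));
}
```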
2023-05-11 07:50:39 +08:00
95833426e8 [BugFix](table-value-function) Fix backends() tvf (#19452)
Change the `Alive/SystemDecommissioned/ClusterDecommissioned` field types of the `backends()` tvf to bool.
2023-05-11 07:49:27 +08:00
9ffdbae442 [bugfix](jdbcconnector) jdbc connector cast string to array core (#19494)
introduced by https://github.com/apache/doris/pull/18328/files
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-05-10 21:46:20 +08:00
4483e3a6e1 [Improvement](scan) add a config for scan queue memory limit (#19439) 2023-05-10 13:14:23 +08:00
Pxl
5473795a51 [Bug](scan) forbid push down of in predicate when in_state->use_set is false (#19471)
Forbid push down of the in predicate when in_state->use_set is false.
2023-05-10 11:12:20 +08:00
cf8ceb8586 [fix](scan) fix scanner mem tracker (#19354) 2023-05-10 09:56:41 +08:00
096aa25ca6 [improvement](orc-reader) Implements ORC lazy materialization (#18615)
- Implements ORC lazy materialization (see the sketch after this list), integrating with the implementations in https://github.com/apache/doris-thirdparty/pull/56 and https://github.com/apache/doris-thirdparty/pull/62.
- Refactor code: move `execute_conjuncts()` and `execute_conjuncts_and_filter_block()` from `parquet_group_reader` to `VExprContext`, so they are used by both the parquet reader and the orc reader.
- Add session variables `enable_parquet_lazy_materialization` and `enable_orc_lazy_materialization` to control whether lazy materialization is enabled.
- Modify `build.sh` to update the apache-orc submodule or download the package every time.
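A conceptual sketch of lazy materialization, as referenced in the first item above (illustrative interfaces, not the ORC reader's actual API): predicate columns are read and filtered first, and the remaining columns are materialized only for surviving rows.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

using RowIds = std::vector<uint32_t>;

// Read one stripe lazily: evaluate conjuncts on the cheap predicate columns,
// then fetch the expensive "lazy" columns only for rows that passed.
RowIds read_stripe_lazily(const RowIds& all_rows,
                          const std::function<bool(uint32_t)>& passes_conjuncts,
                          const std::function<void(const RowIds&)>& read_lazy_columns) {
    RowIds selected;
    for (uint32_t row : all_rows) {
        if (passes_conjuncts(row)) {
            selected.push_back(row);
        }
    }
    read_lazy_columns(selected); // rows filtered out are never decoded
    return selected;
}
```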
2023-05-09 23:33:33 +08:00
1bc405c06f [fix](catalog) fix doris jdbc catalog largeint select error (#19407)
When I used mysql-jdbc 5.1.47 to create a doris jdbc catalog, largeint columns could not be selected.
When mysql-jdbc reads a largeint, it converts the value to a string because it is too long.

mysql> select `largeint` from type3;
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Fail to convert jdbc type of java.lang.String to doris type LARGEINT on column: largeint. You need to check this column type between external table and doris table.
2023-05-09 17:34:48 +08:00
aeb3450151 [feature](graph)Support querying data from the Nebula graph database (#19209)
Support querying data from the Nebula graph database.
This feature comes from the needs of commercial customers who use both Doris and Nebula and hope to connect the two databases.

The changes mainly include:

* Add a new graph database JDBC type
* Adapt the types, mapping graph types to Doris types
2023-05-09 15:30:11 +08:00
673cbe3317 [chore](build) Porting to GCC-13 (#19293)
Support using GCC-13 to build the codebase.
2023-05-08 10:42:06 +08:00
b50e2a8c08 [Fix](parquet-reader) Fix dict cols not be converted back to string type in some cases. (#19348)
Fix dict cols not being converted back to string type in some cases, including cases introduced by #19039.
For dict cols, we first convert them to int32 type, then convert back to string type after reading the block.
The block will be reused, so it is necessary to convert it back.
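A small sketch of that round-trip (illustrative, not the reader's real code): dictionary codes are processed as int32 internally and must be materialized back to strings before the reused block is handed to the caller.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Map dictionary codes back to their string values; the int32 representation
// is an internal optimization and must not leak out of the reader.
std::vector<std::string> decode_dict_column(const std::vector<int32_t>& codes,
                                            const std::vector<std::string>& dict) {
    std::vector<std::string> out;
    out.reserve(codes.size());
    for (int32_t code : codes) {
        out.push_back(dict.at(code)); // at() guards against corrupt codes
    }
    return out;
}
```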
2023-05-07 10:05:23 +08:00