1. Some encrypt and decrypt functions use the wrong blockEncryptionMode.
2. The topN node should compare tuples from `intermediate_row_desc` with `first_sort_slot.tuple_id`.
3. The limit must be kept if it is an uncorrelated IN-subquery with a limit on the sort, e.g. `select a from t1 where a in (select b from t2 order by xx limit yy)`.
Doris blocks do not support complex nested types yet, but the ORC and Parquet readers have generated complex nested columns, which makes the MySQL client output wrong and confuses users.
Get the last modification time from the file status, and use the combination of path and modification time to generate the cache identifier.
When a file is changed, its modification time changes, so the former cache path becomes invalid.
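Below is a minimal sketch of this idea; `make_cache_key` and the hashing scheme are illustrative assumptions, not the actual Doris implementation.
```
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical helper: build a cache identifier from the file path and its
// last modification time, so a changed file never maps to a stale cache entry.
std::string make_cache_key(const std::string& path, int64_t mtime_s) {
    // Combine path and mtime, then hash the combination into a fixed-size key.
    std::string combined = path + ":" + std::to_string(mtime_s);
    uint64_t h = std::hash<std::string>{}(combined);
    return std::to_string(h);
}
```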
- Implement ORC lazy materialization, integrating with the implementations in https://github.com/apache/doris-thirdparty/pull/56 and https://github.com/apache/doris-thirdparty/pull/62 (a conceptual sketch follows after this list).
- Refactor code: move `execute_conjuncts()` and `execute_conjuncts_and_filter_block()` from `parquet_group_reader` to `VExprContext`, so they can be used by both the Parquet reader and the ORC reader.
- Add session variables `enable_parquet_lazy_materialization` and `enable_orc_lazy_materialization` to control whether lazy materialization is enabled.
- Modify `build.sh` to update the apache-orc submodule or download the package every time.
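Below is a self-contained conceptual sketch of the lazy-materialization idea referenced above; plain `std::vector`s stand in for columns and the filter, and this is not the actual reader code. The predicate column is read and the conjuncts are evaluated first, and the remaining columns are materialized only for the rows that survive.
```
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Stand-in for "read a column for the given row ids" (in the real reader this
// is the expensive decode step that lazy materialization tries to avoid).
std::vector<int64_t> read_rows(const std::vector<int64_t>& column,
                               const std::vector<std::size_t>& row_ids) {
    std::vector<int64_t> out;
    out.reserve(row_ids.size());
    for (std::size_t id : row_ids) out.push_back(column[id]);
    return out;
}

int main() {
    // Two columns of the same stripe: `a` is referenced by the predicate,
    // `b` is only needed in the output.
    std::vector<int64_t> a = {1, 5, 3, 9, 2};
    std::vector<int64_t> b = {10, 50, 30, 90, 20};

    // 1. Read the predicate column and evaluate the conjuncts (a > 2).
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] > 2) selected.push_back(i);
    }

    // 2. Lazily materialize the non-predicate column only for selected rows.
    std::vector<int64_t> b_filtered = read_rows(b, selected);

    for (int64_t v : b_filtered) std::cout << v << ' ';  // prints: 50 30 90
    std::cout << '\n';
    return 0;
}
```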
When using mysql-jdbc 5.1.47 to create a Doris JDBC catalog, LARGEINT columns cannot be selected.
When mysql-jdbc reads a LARGEINT, it converts the value to a string because the value is too long:
mysql> select `largeint` from type3;
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Fail to convert jdbc type of java.lang.String to doris type LARGEINT on column: largeint. You need to check this column type between external table and doris table.
Support querying data from the Nebula graph database
This feature comes from the needs of commercial customers who use both Doris and Nebula and hope to connect the two databases.
The changes mainly include:
* Add a new graph database JDBC type.
* Adapt the types and map Nebula graph types to Doris types.
This PR does three things:
1. Add the `delete_existing_files` property for outfile/export. If `delete_existing_files = true`, export/outfile will first delete all files under `file_path`.
2. Add a p2 test for export.
3. Update the docs.
Fix dict columns not being converted back to string type in some cases, including a case introduced by #19039.
For dict columns, we first convert them to int32 type, then convert them back to string type after reading the block.
Because the block is reused, it is necessary to convert it back.
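Below is a minimal sketch of the decode-back step, with simplified stand-ins instead of the real column classes: the int32 codes read from the file are mapped through the dictionary back into strings before the block is returned and reused.
```
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical decode step: dict codes (int32) -> original string values.
std::vector<std::string> decode_dict_column(const std::vector<int32_t>& codes,
                                            const std::vector<std::string>& dict) {
    std::vector<std::string> out;
    out.reserve(codes.size());
    for (int32_t code : codes) out.push_back(dict[code]);
    return out;
}
```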
This work is at an early stage: the current progress is not accurate because the scan range granularity is too coarse for gathering information, and only the file scan node and import jobs support the new progress manager.
## How it works
For example, when we use the following load query:
```
LOAD LABEL test_broker_load
(
DATA INFILE("XXX")
INTO TABLE `XXX`
......
)
```
Initial progress: the query calls `BrokerLoadJob` to create the job, and then the `coordinator` is called to calculate the scan ranges and their locations.
Progress update: the BE reports the `runtime_state` to the FE, and the FE updates the progress status according to the jobID and fragmentID.
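Below is a minimal, hypothetical sketch of this bookkeeping (the real Progress Manager lives in the FE and its class and method names differ; this is only a conceptual illustration): the total number of scan ranges is registered when the job is planned, and each BE report advances the finished counter.
```
#include <cstdio>
#include <map>

// Hypothetical progress bookkeeping: total scan ranges are registered when the
// job is planned; BE reports move the finished counter forward.
struct JobProgress {
    int total_scan_ranges = 0;
    int finished_scan_ranges = 0;
};

std::map<long, JobProgress> progress_by_job;  // keyed by jobID

void register_job(long job_id, int total_ranges) {
    progress_by_job[job_id] = {total_ranges, 0};
}

void on_be_report(long job_id, int newly_finished_ranges) {
    progress_by_job[job_id].finished_scan_ranges += newly_finished_ranges;
}

// Formats the value shown by `show load`, e.g. "14.29% (1/7)".
void print_progress(long job_id) {
    const JobProgress& p = progress_by_job[job_id];
    double pct = p.total_scan_ranges == 0
                         ? 0.0
                         : 100.0 * p.finished_scan_ranges / p.total_scan_ranges;
    std::printf("%.2f%% (%d/%d)\n", pct, p.finished_scan_ranges, p.total_scan_ranges);
}

int main() {
    register_job(25052, 7);  // jobID and scan-range count from the example below
    on_be_report(25052, 1);
    print_progress(25052);   // prints "14.29% (1/7)"
    return 0;
}
```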
We can use `show load` to see the progress.
PENDING:
```
State: PENDING
Progress: 0.00%
```
LOADING:
```
State: LOADING
Progress: 14.29% (1/7)
```
FINISH:
```
State: FINISHED
Progress: 100.00% (7/7)
```
At the moment, the full output of `show load\G` looks like this:
```
*************************** 1. row ***************************
JobId: 25052
Label: test_broker
State: LOADING
Progress: 0.00% (0/7)
Type: BROKER
EtlInfo: NULL
TaskInfo: cluster:N/A; timeout(s):250000; max_filter_ratio:0.0
ErrorMsg: NULL
CreateTime: 2023-05-03 20:53:13
EtlStartTime: 2023-05-03 20:53:15
EtlFinishTime: 2023-05-03 20:53:15
LoadStartTime: 2023-05-03 20:53:15
LoadFinishTime: NULL
URL: NULL
JobDetails: {"Unfinished backends":{"5a9a3ecd203049bc-85e39a765c043228":[10080]},"ScannedRows":39611808,"TaskNumber":1,"LoadBytes":7398908902,"All backends":{"5a9a3ecd203049bc-85e39a765c043228":[10080]},"FileNumber":1,"FileSize":7895697364}
TransactionId: 14015
ErrorTablets: {}
User: root
Comment:
```
## TODO:
1. The current partition granularity of the scan range is too large, resulting in an uneven loading process for progress.
2. Only broker load supports the new Progress Manager; progress support should be added for other queries.
The original method signature is `Block VExprContext::get_output_block_after_execute_exprs(const std::vector<vectorized::VExprContext*>& output_vexpr_ctxs, const Block& input_block, Status& status)`.
It returns the error status as an out parameter and the block as the return value, so callers have to check whether `block.rows() == 0` and then check the error status.
This does not conform to the convention.
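Below is a self-contained sketch of the convention the refactor moves toward, presumably something like `Status get_output_block_after_execute_exprs(ctxs, input_block, &output_block)`; the stand-in `Status` type and function here are illustrative only, not the actual Doris API.
```
#include <iostream>
#include <string>
#include <vector>

// A tiny stand-in Status type, just to illustrate the convention.
struct Status {
    bool ok = true;
    std::string msg;
    static Status OK() { return {true, ""}; }
    static Status Error(std::string m) { return {false, std::move(m)}; }
};

// Status as the return value, result as an out parameter: callers check one thing.
Status compute_block(const std::vector<int>& input, std::vector<int>* output) {
    if (input.empty()) return Status::Error("empty input block");
    output->assign(input.begin(), input.end());
    return Status::OK();
}

int main() {
    std::vector<int> out;
    Status st = compute_block({1, 2, 3}, &out);
    if (!st.ok) {
        std::cerr << st.msg << '\n';
        return 1;
    }
    std::cout << out.size() << " rows\n";
    return 0;
}
```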
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Add file cache metrics and management.
1. Get file cache metrics
> If the file cache does not perform well, there are currently no metrics to investigate the cause. In practice, the hit ratio, disk usage, and removed-segments status are very important information.
API: `http://be_host:be_webserver_port/metrics`
File cache metrics for each base path start with the `doris_be_file_cache_` prefix. `hits_ratio` is the hit ratio of the cache since BE startup; `removed_elements` is the number of segment files removed since BE startup. Every cache path has three queues: index, normal, and disposable, with a capacity ratio of 1:17:2.
```
doris_be_file_cache_hits_ratio{path="/mnt/datadisk1/gaoxin/file_cache"} 0.500000
doris_be_file_cache_hits_ratio{path="/mnt/datadisk1/gaoxin/small_file_cache"} 0.500000
doris_be_file_cache_removed_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 0
doris_be_file_cache_removed_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 0
doris_be_file_cache_normal_queue_max_size{path="/mnt/datadisk1/gaoxin/file_cache"} 912680550400
doris_be_file_cache_normal_queue_max_size{path="/mnt/datadisk1/gaoxin/small_file_cache"} 8500000000
doris_be_file_cache_normal_queue_max_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 217600
doris_be_file_cache_normal_queue_max_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 102400
doris_be_file_cache_normal_queue_curr_size{path="/mnt/datadisk1/gaoxin/file_cache"} 14129846
doris_be_file_cache_normal_queue_curr_size{path="/mnt/datadisk1/gaoxin/small_file_cache"} 14874904
doris_be_file_cache_normal_queue_curr_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 18
doris_be_file_cache_normal_queue_curr_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 22
...
```
2. Release file cache
> Frequent swapping of segment files can seriously affect file cache performance. Adding a deletion interface helps users clean up the file cache.
API: `http://be_host:be_webserver_port/api/file_cache?op=release&base_path=${file_cache_base_path}`
Returns the number of released segment files. If `base_path` is not provided in the URL, all cache paths will be released.
Calling this API is thread-safe: only segment files that are not currently being read will be released.
```
{"released_elements":22}
```
3. Specify the base path to store cache data
> Currently, regression testing lacks test cases for the file cache, so its stability cannot be guaranteed. This interface is generally used in regression-testing scenarios, where different queries use different paths to verify different use cases and performance.
Users can set the session variable `file_cache_base_path` to specify the base path for storing cache data. The default is `file_cache_base_path="random"`, which means choosing a random path from the cache paths to store cache data. If `file_cache_base_path` is not one of the base paths in the BE configuration, a random path is used.
Co-authored-by: yiguolei <yiguolei@gmail.com>
Currently, the exec node stores the expr context as a double pointer (`exprcontext**`), but the object lives in an object pool, so the code is very unclear. We could just use a plain pointer (`exprcontext*`).
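Below is a minimal sketch of the simplification with hypothetical `ExprContext`/`ObjectPool` stand-ins: since the pool owns the object, the node can hold a plain borrowed pointer instead of a pointer-to-pointer.
```
#include <memory>
#include <vector>

// Hypothetical stand-ins for the real classes.
struct ExprContext { /* expression state */ };

// The pool owns every context; callers only ever borrow raw pointers.
struct ObjectPool {
    std::vector<std::unique_ptr<ExprContext>> owned;
    ExprContext* add(std::unique_ptr<ExprContext> ctx) {
        owned.push_back(std::move(ctx));
        return owned.back().get();
    }
};

struct ExecNode {
    // Before: ExprContext** _conjunct_ctx;  // extra indirection, unclear ownership
    ExprContext* _conjunct_ctx = nullptr;     // After: a plain borrowed pointer
};

int main() {
    ObjectPool pool;
    ExecNode node;
    node._conjunct_ctx = pool.add(std::make_unique<ExprContext>());
    return 0;
}
```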
Sometimes the dictionary is not initialized when a comparison predicate runs here. For example, if an entire page is null, the reader skips reading it, so the dictionary is not initialized. The cached code is wrong in this case, because a following page may not be null, and the dictionary may gain items later.
This causes queries on dict-encoded string columns to return wrong results when the column contains many null values.
I also added some regression tests for equality, greater-than, and less-than queries on dict columns.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Using both `MergeRangeFileReader` and `BufferedStreamReader` simultaneously would waste a lot of memory,
so data prefetching in `BufferedStreamReader` is turned off when `MergeRangeFileReader` is used.
Issue Number: about #19038. We found that in this case `l_orderkey` has many nulls,
so we can filter it using null-count statistics at the row-group and page level,
which improves performance a lot in this case.
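Below is a minimal sketch of the pruning rule with a hypothetical statistics struct (the real check lives in the reader's row-group and page filtering): if every value in a unit is null and the predicate cannot match null, e.g. `l_orderkey = 1`, the whole unit can be skipped without decoding.
```
#include <cstdint>
#include <iostream>

// Hypothetical per-row-group / per-page statistics.
struct ColumnStats {
    int64_t num_values = 0;
    int64_t null_count = 0;
};

// A null-intolerant predicate such as `col = constant` can never be satisfied
// by a unit whose values are all null, so the unit is prunable.
bool can_skip_by_null_count(const ColumnStats& stats, bool predicate_rejects_null) {
    return predicate_rejects_null && stats.null_count == stats.num_values;
}

int main() {
    ColumnStats all_null{1000, 1000};
    ColumnStats mixed{1000, 600};
    std::cout << can_skip_by_null_count(all_null, /*predicate_rejects_null=*/true)   // 1
              << can_skip_by_null_count(mixed, /*predicate_rejects_null=*/true)      // 0
              << '\n';
    return 0;
}
```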