This PR does three things:
1. Add a `delete_existing_files` property for outfile/export. If `delete_existing_files = true`, export/outfile will first delete all files under `file_path` (see the sketch after this list).
2. Add a p2 test for export.
3. Update the docs.
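A minimal sketch of the intended semantics, assuming a local filesystem directory for illustration (real export targets are usually remote storage, and `export_to_path` and its arguments are hypothetical names):

```
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper illustrating `delete_existing_files`: when the flag is
// true, everything under file_path is removed before the export/outfile
// writes its result files.
void export_to_path(const std::string& file_path, bool delete_existing_files) {
    if (delete_existing_files && fs::exists(file_path)) {
        // Remove each entry under the target directory, keeping the directory
        // itself so the export can write into it.
        for (const auto& entry : fs::directory_iterator(file_path)) {
            fs::remove_all(entry.path());
        }
    }
    // ... write the exported result files into file_path ...
}
```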
Fix dictionary columns not being converted back to string type in some cases, including a case introduced by #19039.
For dictionary columns, we first convert them to int32 type, then convert them back to string type after the block is read (see the sketch below).
The block will be reused, so converting it back is necessary.
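A minimal, self-contained sketch of that round trip, using a toy dictionary representation rather than the real Doris column types; `encode_dict`/`decode_dict` are illustrative names:

```
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Toy dictionary encoding: strings are replaced by int32 codes for the scan,
// and must be decoded back to strings after the block is read, because the
// same block object is reused later.
struct DictColumn {
    std::vector<std::string> dict;  // code -> string
    std::vector<int32_t> codes;     // encoded values
};

void encode_dict(const std::vector<std::string>& values, DictColumn& col) {
    std::unordered_map<std::string, int32_t> index;
    for (const auto& v : values) {
        auto it = index.find(v);
        int32_t code;
        if (it == index.end()) {
            code = static_cast<int32_t>(col.dict.size());
            col.dict.push_back(v);
            index.emplace(v, code);
        } else {
            code = it->second;
        }
        col.codes.push_back(code);
    }
}

// Convert the int32 codes back to strings once the block has been read.
std::vector<std::string> decode_dict(const DictColumn& col) {
    std::vector<std::string> out;
    out.reserve(col.codes.size());
    for (int32_t code : col.codes) {
        out.push_back(col.dict[code]);
    }
    return out;
}
```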
This work is at an early stage: the current progress is not accurate because the scan range granularity is too coarse for gathering information, and only the file scan node and import jobs support the new progress manager.
## How it works
For example, when we use the following load query:
```
LOAD LABEL test_broker_load
(
DATA INFILE("XXX")
INTO TABLE `XXX`
......
)
```
Initial progress: the query calls `BrokerLoadJob` to create the job, then the `coordinator` is called to calculate the scan ranges and their locations.
Progress update: the BE reports its runtime_state to the FE, and the FE updates the progress status according to the jobID and fragmentID.
We can use `show load` to see the progress.
PENDING:
```
State: PENDING
Progress: 0.00%
```
LOADING:
```
State: LOADING
Progress: 14.29% (1/7)
```
FINISHED:
```
State: FINISHED
Progress: 100.00% (7/7)
```
At the current time, the full output of `show load\G` looks like this:
```
*************************** 1. row ***************************
JobId: 25052
Label: test_broker
State: LOADING
Progress: 0.00% (0/7)
Type: BROKER
EtlInfo: NULL
TaskInfo: cluster:N/A; timeout(s):250000; max_filter_ratio:0.0
ErrorMsg: NULL
CreateTime: 2023-05-03 20:53:13
EtlStartTime: 2023-05-03 20:53:15
EtlFinishTime: 2023-05-03 20:53:15
LoadStartTime: 2023-05-03 20:53:15
LoadFinishTime: NULL
URL: NULL
JobDetails: {"Unfinished backends":{"5a9a3ecd203049bc-85e39a765c043228":[10080]},"ScannedRows":39611808,"TaskNumber":1,"LoadBytes":7398908902,"All backends":{"5a9a3ecd203049bc-85e39a765c043228":[10080]},"FileNumber":1,"FileSize":7895697364}
TransactionId: 14015
ErrorTablets: {}
User: root
Comment:
```
## TODO:
1. The current partition granularity of the scan range is too large, resulting in uneven progress during loading.
2. Only broker load supports the new Progress Manager; add progress support for other kinds of queries.
This PR enables periodic collection of statistics and is a precursor to automatic statistics collection. It mainly includes the following:
1. Support periodic collection of statistics.
2. Change the Date type in the statistics p0 tests to DateV2 (see #19077, "[Enhancement](data-type) add FE config to prohibit create date and decimalv2 type") for local testing; supplement the test cases (remove Chinese characters, optimize code, etc.) and improve stability.
3. Support configuring whether to keep records of statistics synchronization job info, which is convenient for p0 testing.
4. The statistics job table was modified, and some auxiliary checks were added so that users do not perceive the modification; these checks will be removed once the table schema is stable.
For files smaller than 5 MB, we don't need to use multipart upload, which takes at least 3 network IOs.
Instead, we can just call PutObject, which takes only one.
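A sketch of the size check, assuming a hypothetical uploader interface (the real code goes through the object-storage client); the 5 MB threshold is the one described above:

```
#include <cstddef>
#include <string>

// Hypothetical uploader interface; the real implementation calls the storage
// SDK (a single PutObject for the one-shot path, and the create/upload-part/
// complete sequence for the multipart path).
class Uploader {
public:
    virtual ~Uploader() = default;
    virtual void put_object(const std::string& key, const std::string& data) = 0;
    virtual void multipart_upload(const std::string& key, const std::string& data) = 0;
};

constexpr size_t kMultipartThreshold = 5 * 1024 * 1024; // 5 MB

// Files below the threshold go through a single PutObject call (one network
// IO) instead of a multipart upload (at least three).
void upload(Uploader& uploader, const std::string& key, const std::string& data) {
    if (data.size() < kMultipartThreshold) {
        uploader.put_object(key, data);
    } else {
        uploader.multipart_upload(key, data);
    }
}
```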
When querying from a follower (slave) node, the stream load result may not be selectable yet even though the publish version is visible, so a `sync` SQL statement is issued to make the result visible on the follower node.
The original method signature is:
`Block VExprContext::get_output_block_after_execute_exprs(const std::vector<vectorized::VExprContext*>& output_vexpr_ctxs, const Block& input_block, Status& status)`
It returns the error status as an out parameter and the block as the return value, so callers have to check `block.rows() == 0` and then check the error status.
This does not conform to the convention.
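For illustration, a sketch of the conventional form with stub types (the exact signature chosen in this PR may differ): the Status becomes the return value and the output block an out parameter, so callers only check the returned Status.

```
#include <cstddef>
#include <vector>

// Stub types standing in for the real Doris classes.
struct Block { size_t rows() const { return 0; } };
struct Status {
    static Status OK() { return {}; }
    bool ok() const { return true; }
};
namespace vectorized { struct VExprContext {}; }

// Old style: the block is the return value and the status an out parameter,
// so callers must check block.rows() == 0 and then inspect `status`.
Block get_output_block_old(const std::vector<vectorized::VExprContext*>& ctxs,
                           const Block& input_block, Status& status) {
    status = Status::OK();
    return Block{};
}

// Conventional style: return Status, pass the output block as an out parameter.
Status get_output_block(const std::vector<vectorized::VExprContext*>& ctxs,
                        const Block& input_block, Block* output_block) {
    *output_block = Block{};
    return Status::OK();
}
```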
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Previously, we used the RuntimeProfile class directly, and because a profile has multiple levels, there may be several RuntimeProfile instances to maintain.
I created several new classes for the profile:
class Profile:
    The root profile of an execution task (query or load).
class SummaryProfile:
    The profile that contains summary info of an execution task,
    such as start time, end time, query id, etc.
class ExecutionProfile:
    The profile for a single Coordinator. Each Coordinator will
    have an ExecutionProfile.
The profile structure is as follows:
Profile:
    SummaryProfile:
    ExecutionProfile 1:
        Fragment 0:
            Instance 0:
            Instance 1:
            ...
        Fragment 1:
            ...
    ExecutionProfile 2:
        ...
As you can see, each Profile has a SummaryProfile and one or more ExecutionProfiles.
For most kinds of jobs, such as query/insert, there is only one ExecutionProfile. But a broker load job may have more than one ExecutionProfile, one for each sub-task of the load job.
How to use
For query/insert, etc:
Each StmtExecutor has a Profile instance.
Each Coordinator has an ExecutionProfile instance.
StmtExecutor is responsible for the SummaryProfile and updates it during execution.
Coordinator is responsible for the ExecutionProfile; it first adds the ExecutionProfile as a child of the Profile, then updates it periodically during execution.
For Load/Export, etc:
Each job has a Profile instance.
For each Coordinator of the job, its ExecutionProfile is added to the children of the job's Profile.
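A structural sketch of the ownership described above; the real classes live in the FE (Java), so this only illustrates the composition, and the member names are illustrative:

```
#include <memory>
#include <string>
#include <vector>

// Illustrative composition only; the real Profile classes are Java FE code.
struct SummaryProfile {
    std::string query_id;
    std::string start_time;
    std::string end_time;
};

struct ExecutionProfile {
    // One per Coordinator; holds the per-fragment / per-instance profiles.
};

struct Profile {
    SummaryProfile summary;                                     // exactly one
    std::vector<std::shared_ptr<ExecutionProfile>> executions;  // one or more

    // A Coordinator registers its ExecutionProfile as a child of the Profile.
    void add_execution_profile(std::shared_ptr<ExecutionProfile> ep) {
        executions.push_back(std::move(ep));
    }
};
```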
Behavior Change
The columns of `show load profile`/`show query profile` and the QueryProfile Web UI have changed to:
| Profile ID | Task Type | Start Time | End Time | Total | Task State | User | Default Db| Sql Statement | Is Cached | Total Instances Num | Instances Num Per BE | Parallel Fragment Exec Instance Num | Trace ID |
The Query Id and Job Id columns are removed and replaced by Profile ID.
For a load job, the profile ID is the job ID; for query/insert, it is the query ID.
Move getSplits function to ScanNode, remove Splitter interface.
For each kind of data source, create a specific ScanNode and implement the getSplits interface. For example, HiveScanNode.
Remove FileScanProviderIf and move its code into each ScanNode.
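A sketch of the resulting shape, again for illustration only (the scan nodes are Java FE code); `Split` and the member names here are assumptions:

```
#include <string>
#include <vector>

// Illustrative only: each data source gets its own ScanNode subclass that
// implements getSplits directly, replacing the removed Splitter /
// FileScanProviderIf indirection.
struct Split {
    std::string path;
    long start = 0;
    long length = 0;
};

class ScanNode {
public:
    virtual ~ScanNode() = default;
    virtual std::vector<Split> getSplits() = 0;
};

class HiveScanNode : public ScanNode {
public:
    std::vector<Split> getSplits() override {
        // List the table's files/partitions and convert them into splits.
        return {};
    }
};
```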
After #19246, compiling the FE will automatically generate the Config and Session Variables docs and overwrite the original ones.
This needs to be avoided because the feature is not ready to use yet.