doris

Author	SHA1	Message	Date
Ashin Gau	3929e8214d	[improvement](filecache) Use consistent hash to assign the same scan range into the same backend among different queries (#16574 ) When the file cache is enabled, running the same query a second time may still be slow, because `FE` may assign the same scan range to different backends across queries, and data previously cached in `BE` is useless once the scan range lands on a different backend. This PR therefore introduces consistent hashing to assign the same scan range to the same backend across queries.	2023-02-10 19:49:33 +08:00
FreeOnePlus	1cc735f20b	[feature](docker)Refactor Image build script (#16528 ) Co-authored-by: Yijia Su <suyijia@selectdb.com>	2023-02-10 18:30:54 +08:00
YueW	ad141747b4	[fix](inverted index) fix array type inverted index query error (#16582 )	2023-02-10 17:57:15 +08:00
YueW	43eca4f209	[Feature-WIP](inverted index) Implementation for alter inverted index. (#16371 ) implementation for add/drop inverted index.	2023-02-10 17:56:17 +08:00
Xin Liao	6a5277b391	[fix](sequence-column) MergeIterator does not use the correct seq column for comparison (#16494 )	2023-02-10 17:51:15 +08:00
Jerry Hu	861f31205a	[fix](window function) invalid order_by_start in VAnalyticEvalNode (#16589 )	2023-02-10 17:40:40 +08:00
lihangyu	32188855ef	[improve](topn) separate multiget rpc to ThreadPool (#16598 ) multiget_data runs in a bthread and may block the whole worker pthread of the BRPC framework, affecting other bthreads, so this change moves the work into a separate task pool.	2023-02-10 17:39:31 +08:00
FreeOnePlus	05103d88b2	[feature](docker)Add Doris Docker Build Script (#16522 ) Add 3FE & 3BE Build Script	2023-02-10 17:18:26 +08:00
AlexYue	1f631c388d	[enhance](cooldown)accelerate cooldown task produce efficiency (#16089 )	2023-02-10 16:58:27 +08:00
morrySnow	c08c643ca0	[fix](test) disable failed ut 'SelectRollupIndexTest#testPreAggHint' temporarily (#16593 ) UT 'SelectRollupIndexTest#testPreAggHint' fails because of #16286. Disable it temporarily to avoid blocking CI/CD.	2023-02-10 16:36:15 +08:00
zhangstar333	b99e2dc727	[bug](jdbc) fix jdbc can't get object of PGobject (#16496 ) When a PG table has unsupported column types such as point, polygon, or jsonb, the JDBC catalog converts them to the string type in Doris, but the object returned from the result set in Java is an org.postgresql.util.PGobject. Some tests need this PR: #16442	2023-02-10 16:19:02 +08:00
Gabriel	06788bc2d0	[Bug](pipeline) Fix projection on streaming operator (#16592 )	2023-02-10 15:57:26 +08:00
lsy3993	da753d6e26	[typo](docs)delete char and varchar in java udf when create (#16566 )	2023-02-10 14:25:28 +08:00
谢健	ae325f546a	[refactor](Nereids): mv AggregateStrategies to implementation rules (#16551 )	2023-02-10 14:10:59 +08:00
FreeOnePlus	a06baad7d7	[docs](docker) Add Run Docker cluster docs (#16520 )	2023-02-10 14:07:07 +08:00
Kang	d9924c9b8e	[Improvement](topn) add limit threshold session variable and fuzzy for topn optimizations (#16514 ) 1. add limit threshold for topn runtime pushdown and key topn optimization 2. use unified session variable topn_opt_limit_threshold for all topn optimizations 3. add fuzzy support for topn_opt_limit_threshold	2023-02-10 12:56:33 +08:00
zhangdong	8758cd412f	[feature](auth)Implementing privilege management with rbac model (#16091 ) Change the auth implementation to RBAC. Each user has one default role, which cannot be dropped; granting a privilege to a user grants it to that user's default role. In the current PR a user can still have only one role other than the default role, but in the future users and roles will be many-to-many. Rename PaloRole, PaloAuth, PaloPrivilege to Role, Auth, Privilege.	2023-02-10 12:30:49 +08:00
xueweizhang	379bef598d	[fix-core](block) clear block row_same_bit when block reuse (#16172 )	2023-02-10 12:21:27 +08:00
Jibing-Li	e9cd1d64ed	(fix)[multi-catalog][nereids] Reset ExternalFileScanNode required slots after Nereids planner does projection. #16549 The new Nereids planner does column projection after creating the scan node. For ExternalFileScanNode, this may cause the columns in required_slots to mismatch the slots after projection. This PR resets required_slots after projection.	2023-02-10 11:28:01 +08:00
xy720	1b3902baa2	[Feature](Complex-type) Add struct and map type to Doris (#16444 ) This commit supports: 1. Insert + select for struct/map type 2. Json stream load for struct type 3. m[key] function for map type How to use: set the fe config to create tables with struct and map types: 1. admin set frontend config("enable_struct_type" = "true"); 2. admin set frontend config("enable_map_type" = "true"); #16547 Co-authored-by: xy720 <xuyang25@baidu.com> Co-authored-by: amory <wangqiannan@selectdb.com> Co-authored-by: cambyzju <zhuxiaoli01@baidu.com> Co-authored-by: hucheng01 <hucheng01@baidu.com>	2023-02-10 11:00:33 +08:00
AKIRA	0c20c607b2	fix stats (#16556 )	2023-02-10 11:00:01 +08:00
Gabriel	885fe1516f	[refactor](datev2) refine logics of auto conversion (#16552 ) * [refactor](datev2) refine logics of auto conversion * uodate * update * Revert "uodate" This reverts commit 2609a13b4022b4a603bf992fad64c133def266e0.	2023-02-10 10:06:47 +08:00
FreeOnePlus	ab9eb53049	[style](profile)Change Code-Checks Add Docker Dir (#16581 ) --------- Co-authored-by: Yijia Su <suyijia@selectdb.com>	2023-02-10 09:19:52 +08:00
YueW	e68299113e	[fix](regression test) fix test_array_index.groovy without 'order by' lead to result mismatch (#16575 )	2023-02-10 08:53:22 +08:00
Pxl	266bb971a6	[Enhancement](function) display elements number on check_chars_length #16570	2023-02-10 08:52:41 +08:00
Xinyi Zou	c1a1275870	[fix](memory) Fix parquet load stack overflow (#16537 )	2023-02-10 08:48:12 +08:00
AlexYue	48780dcea0	[BugFix](cooldown) push correct cooldownttl to be (#16553 ) StoragePolicy had both cooldownttl and cooldownttlms, which is error-prone because they serve nearly the same purpose. For example, the init function would assign the ttl timestamp only to cooldownttl, which would end up pushing cooldownttl 0 to BE.	2023-02-10 08:45:04 +08:00
yiguolei	4fcd6cd236	[refactor](remove unused code) remove load stream mgr (#16580 ) remove old stream load pipe remove old stream load manager --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-10 07:46:18 +08:00
Zhengguo Yang	438daaaf1c	[enhancement](mv) forbid creating useless mv in fe (#16286 )	2023-02-09 23:00:09 +08:00
yiguolei	ab34f418c3	[bugfix](information schema) sometimes fe throw thrift_rpc_error (#16555 ) mysql> SELECT TABLE_NAME, CHECK_OPTION, IS_UPDATABLE, SECURITY_TYPE, DEFINER FROM INFORMATION_SCHEMA.VIEWS WHERE TABLE_SCHEMA = 'test' ORDER BY TABLE_NAME ASC; ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... Connection id: 0 Current database: * NONE * ERROR 1105 (HY000): RpcException, msg: org.apache.doris.rpc.RpcException: failed to call frontend service/n @ 0x563a2b11b6ea doris::Status::ConstructErrorStatus() @ 0x563a2bcd638f doris::ThriftRpcHelper::rpc<>() @ 0x563a2b78b777 doris::SchemaHelper::list_table_status() @ 0x563a2b7a0972 doris::SchemaViewsScanner::get_new_table() @ 0x563a2b7a0b00 doris::SchemaViewsScanner::get_next_row() @ 0x563a2ccd0c93 doris::vectorized::VSchemaScanNode::get_next() @ 0x563a2b7450d6 --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-09 21:49:33 +08:00
morrySnow	05ed1f751b	[fix](planner)(Nereids) add date and datev2 signature to greatest and least function (#16565 )	2023-02-09 21:36:53 +08:00
yongkang.zhong	9f0fab8823	[typo](docs)Change the lowercase letters of the disk type example to uppercase (#16557 )	2023-02-09 20:33:20 +08:00
wudi	77b7b84c34	fix (#16322 ) Co-authored-by: wudi <>	2023-02-09 19:55:12 +08:00
Kang	130b3599bc	[Improvement](writer) make DeltaWriter close idempotent to be more robust (#16558 ) return `Status::OK()` instead of `Status::Error<ALREADY_CLOSED>()` for close() in `DeltaWriter` if it's already closed.	2023-02-09 19:48:23 +08:00
Gabriel	a038fdaec6	[Bug](pipeline) Fix bug in non-local exchange on pipeline engine (#16463 ) Currently, for broadcast shuffle, we serialize a block once and then send it by RPC through multiple channels. After this, we serialize the next block into the same memory to reuse it. However, since the RPC is asynchronous, the next block may be serialized before the previous block has been sent. So, in this PR, I use a ref count to identify whether the serialized block can be reused in broadcast shuffle.	2023-02-09 19:22:40 +08:00
Ashin Gau	539fd684e9	[improvement](filecache) use dynamic segment size to cache remote file block (#16485 ) `CachedRemoteFileReader` has used a fixed segment size (file_cache_max_file_segment_size=4M) to cache remote file blocks. However, the column size in a row group/stripe may be smaller than 10K if a parquet/orc file has many columns, resulting in particularly serious read amplification. For example: Q1 in clickbench: select count() from hits ``` - FileCache: 0ns - IOHitCacheNum: 552 - IOTotalNum: 835 - ReadFromFileCacheBytes: 19.98 MB - ReadFromWriteCacheBytes: 0.00 - ReadTotalBytes: 29.52 MB - SkipCacheBytes: 0.00 - WriteInFileCacheBytes: 915.77 MB - WriteInFileCacheNum: 283 ``` Only 30MB of data is needed, but 900MB+ of data is read from hdfs. The query time of Q1 (single scan thread) increased from 5.17s to 24.45s when file cache is enabled. Therefore, this PR introduces a dynamic segment size based on the `read_size` of the data. To prevent too-small or too-large IO, the segment size is limited to [4096, file_cache_max_file_segment_size]. With file cache enabled, Q1 in clickbench now takes 5.66s. The performance is almost the same as with the cache disabled, and the data size read from hdfs is reduced to 45MB. ``` - FileCache: 0ns - IOHitCacheNum: 297 - IOTotalNum: 835 - ReadFromFileCacheBytes: 8.73 MB - ReadFromWriteCacheBytes: 0.00 - ReadTotalBytes: 29.52 MB - SkipCacheBytes: 0.00 - WriteInFileCacheBytes: 45.66 MB - WriteInFileCacheNum: 544 ``` ## Remaining Problems Small queries may result in a large number of small files (4KB at least), and the `BE` saves too much meta information of cached segments. ## Fix bug `FileCachePolicy` in `FileReaderOptions` is a constant reference, but the parameter passed in `FileFactory::create_file_reader` is a temporary variable, resulting in a segmentation fault.	2023-02-09 16:39:10 +08:00
Xinyi Zou	9090c5e4e5	[fix](docs) Fix memory & rowset count metrics (#16550 )	2023-02-09 15:55:35 +08:00
chunping	851a3575ae	[fix](regression case) exclude test_broker_load suite, reopen after bug fix (#16554 ) There is something wrong with the `test_broker_load` suite(s3 auth problem). So I ignore this case temporarily. cc @wsjz , please help to solve it and add it back	2023-02-09 15:51:32 +08:00
slothever	ab4c718478	[fix](iceberg) remove s3 default temporary credentials #16543 remove TemporaryAWSCredentialsProvider in global s3 source Co-authored-by: jinzhe <jinzhe@selectdb.com>	2023-02-09 15:36:35 +08:00
plat1ko	ba4b6aa0c0	[hot-fix](cooldown) Fix unknown module cooldownJob when load fe image #16545	2023-02-09 15:36:01 +08:00
Gabriel	e48a033338	[Bug](pipeline) Support projection in UnionSourceOperator (#16525 )	2023-02-09 14:43:44 +08:00
lihangyu	4b093d1ef6	[Bug](point query) when prepared statement used lazyEvaluateRangeLocations should clear bucketSeq2locations to avoid memleak (#16531 ) When a JDBC client enables server-side prepared statements, the OlapScanNode is cached and reused for performance, but each call to `addScanRangeLocations` adds a new item to `bucketSeq2locations`, so `bucketSeq2locations` leaks memory while the OlapScanNode stays cached.	2023-02-09 14:41:07 +08:00
HappenLee	7d035486ad	[Opt](vec) opt the fast execute logic to remove useless function call (#16532 )	2023-02-09 14:12:40 +08:00
yiguolei	646ba2cc88	[bugfix](scannode) 1. make rows_read correct 2. use single scanner if has limit clause (#16473 ) Make rows_read correct so that the scheduler can use it correctly. Use a single scanner if the query has a limit clause. Move it from fragment context to scannode. --------- Co-authored-by: yiguolei <yiguolei@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2023-02-09 14:12:18 +08:00
superche	21cdbec982	[fix](docs) fix some errors in docs (#16546 ) Co-authored-by: hechao <hechao@selectdb.com>	2023-02-09 13:50:42 +08:00
Yun Tang	338277b748	[doc](flink-connector) Update the flink connector docs to the latest (#14856 )	2023-02-09 12:48:59 +08:00
Jenson97	d52fab6316	[typo](docs)modified some text errors (#16544 ) Co-authored-by: wangtao <wangtao01@tianyancha.com>	2023-02-09 11:59:49 +08:00
Xiaocc	0142ef8b95	[improvement](scanner) Supports bthread scanner (#16031 )	2023-02-09 10:24:56 +08:00
Drogon	531616b8ee	[Fix](bucket)fix partition with no history data && AutoBucketUtilsTest (#16516 ) fix partition with no history data && AutoBucketUtilsTest (#16515)	2023-02-09 10:17:25 +08:00
yixiutt	9f8753ffd2	[bugfix](vertical_compaction) fix base_compaction delete_sign handler (#16469 ) In vertical base compaction, identical rows are filtered in vertical_merge_iterator, so we should skip these filtered rows when setting the agg flag of the delete sign. For example, if the schema is a,b,delete_sign, the data is 1,1,1 1,1,0 1,1,0 2,2,1 2,2 and the Block we get in VerticalBlockReader is 1,1,1 2,2,1, then we should set the agg flag at index 0,4 to true when handling the delete sign. So we add a function continuous_agg_count to skip the same rows filtered in VerticalMergeIterator.	2023-02-09 10:13:41 +08:00
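
The consistent-hash assignment described in #16574 can be illustrated with a minimal sketch. This is not the `FE` implementation: the class name `ConsistentHashRing`, the backend names, the virtual-node count, and the `path:offset:length` key format are all assumptions made for illustration. It only shows why the same scan range keeps mapping to the same backend across queries.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash. Python's built-in hash() is salted per process,
    # so it would not give the same placement across runs.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical ring mapping scan ranges to backends (illustration only)."""

    def __init__(self, backends, virtual_nodes=64):
        if not backends:
            raise ValueError("at least one backend is required")
        # Each backend owns several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(virtual_nodes)
        )
        self._points = [point for point, _ in self._ring]

    def backend_for(self, path, offset, length):
        # The key format is an assumption: any stable identifier of the
        # scan range would work.
        key = f"{path}:{offset}:{length}"
        # Pick the first ring point clockwise from the key's hash.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Because placement depends only on the key and the ring points, a scan range keeps its backend when an unrelated backend leaves the cluster, so data already cached on that backend stays useful.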

8637 Commits
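
The segment-size bound described in #16485 amounts to clamping the read size into [4096, file_cache_max_file_segment_size]. The sketch below shows only that clamp, assuming the 4M maximum quoted in the commit message. The function name `segment_size_for` is hypothetical, and the real `CachedRemoteFileReader` may apply additional alignment or rounding that is not shown here.

```python
MIN_SEGMENT_SIZE = 4096  # lower bound quoted in the commit message
FILE_CACHE_MAX_FILE_SEGMENT_SIZE = 4 * 1024 * 1024  # 4M, the former fixed size


def segment_size_for(read_size, max_segment_size=FILE_CACHE_MAX_FILE_SEGMENT_SIZE):
    """Clamp read_size into [MIN_SEGMENT_SIZE, max_segment_size] (sketch only)."""
    return max(MIN_SEGMENT_SIZE, min(read_size, max_segment_size))
```

Under this assumption a 10K column read is cached as a 10K segment rather than a 4M one, which is the read amplification the commit sets out to reduce.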