doris

Author	SHA1	Message	Date
Mingyu Chen	4dad7c94da	[fix](orc) fix the count() pushdown issue in orc format (#24446 ) Previously, when querying a hive table in orc format whose file is split, the result of `select count()` could be a multiple of the real row count. The number of rows must be obtained after orc stripe pruning; otherwise, it may return a wrong result.	2023-09-16 09:57:39 +08:00
plat1ko	b9ddcbf729	[feature](merge-cloud) Rewrite code related to IOContext (#24269 )	2023-09-15 19:57:58 +08:00
Gabriel	d24f3efd4a	[pipelineX](profile) Phase 1: refactor pipelineX detailed profile (#24322 )	2023-09-15 16:14:05 +08:00
yiguolei	9c681692bd	Revert "[fix] fix http_stream retry mechanism (#23969 )" (#24407 ) This reverts commit 05e365ea137eb8c92b8e7eedc7d1435e83f065ae.	2023-09-15 10:07:53 +08:00
zzzzzzzs	05e365ea13	[fix] fix http_stream retry mechanism (#23969 ) Co-authored-by: yiguolei <676222867@qq.com>	2023-09-14 21:41:11 +08:00
Pxl	35c5d71549	[Improvement](join) some improvement of hash join (#23972 )	2023-09-14 17:55:35 +08:00
Mryange	8e7f7c9566	[fix](profile) move probe time to pull and add LoopGenerateJoin time #24302	2023-09-14 16:41:01 +08:00
神技圈子	d8feca2530	[Enhancement]The page cache can be parameterized by the session variable of fe. (#23981 )	2023-09-14 14:28:19 +08:00
zhiqqqq	c7ae2a7d22	[Refactor & Bugfix](static variables) move some static variables to exec_env (#24029 )	2023-09-13 09:27:03 +08:00
plat1ko	d8ef9dda59	[feature](merge-cloud) Rewrite FS interface (#23953 )	2023-09-12 19:20:25 +08:00
HappenLee	dbf509edc0	[Debug](scan) Add debug log for find p0 scan coredump in pipeline (#24202 )	2023-09-12 12:17:44 +08:00
Ashin Gau	6e28d878b5	[fix](hudi) compatible with hudi spark configuration and support skip merge (#24067 ) Fix three bugs: 1. A Hudi slice may contain log files only, so `new Path(filePath)` will throw errors. 2. Hive column names are lowercase only, so match column names case-insensitively. 3. Be compatible with [Spark Datasource Configs](https://hudi.apache.org/docs/configurations/#Read-Options), so users can add `hoodie.datasource.merge.type=skip_merge` in catalog properties to skip merging log files.	2023-09-11 19:54:59 +08:00
zhangdong	dbb9365556	[Enhance](ip) optimize priority_network matching logic for be (#23795 ) Issue Number: close #xxx If the user has configured a wrong priority_network, fail startup directly so that users do not mistakenly assume the configuration is correct. If the user has not configured priority_network, select only the first IP from the IPv4 list, rather than selecting from all IPs, to avoid users' servers not supporting IPv4. Extends #23784	2023-09-11 18:32:31 +08:00
Jerry Hu	c94e47583c	[fix](join) avoid DCHECK failed in '_filter_data_and_build_output' (#24162 )	2023-09-11 11:54:44 +08:00
Xiangyu Wang	9b3be0ba7a	[Fix](multi-catalog) Do not throw exceptions when file not exists for external hive tables. (#23799 ) A bug similar to #22140 . When executing a query with an hms catalog, the query may fail because some hdfs files do not exist. We should distinguish this kind of error and skip it. ``` errCode = 2, detailMessage = (xxx.xxx.xxx.xxx)[CANCELLED][INTERNAL_ERROR]failed to init reader for file hdfs://xxx/dwd_tmp.db/check_dam_table_relation_record_day_data/part-00000-c4ee3118-ae94-4bf7-8c40-1f12da07a292-c000.snappy.orc, err: [INTERNAL_ERROR]Init OrcReader failed. reason = Failed to read hdfs://xxx/dwd_tmp.db/check_dam_table_relation_record_day_data/part-00000-c4ee3118-ae94-4bf7-8c40-1f12da07a292-c000.snappy.orc: [INTERNAL_ERROR]Read hdfs file failed. (BE: xxx.xxx.xxx.xxx) namenode:hdfs://xxx/dwd_tmp.db/check_dam_table_relation_record_day_data/part-00000-c4ee3118-ae94-4bf7-8c40-1f12da07a292-c000.snappy.orc, err: (2), No such file or directory), reason: RemoteException: File does not exist: /xxx/dwd_tmp.db/check_dam_table_relation_record_day_data/part-00000-c4ee3118-ae94-4bf7-8c40-1f12da07a292-c000.snappy.orc at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86) at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76) at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:158) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1927) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:738) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:426) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682) ```	2023-09-10 21:55:09 +08:00
Mingyu Chen	f85da7d942	[improvement](jdbc) add profile for jdbc read and convert phase (#23962 ) Add 2 metrics in jdbc scan node profile: - `CallJniNextTime`: call get next from jdbc result set - `ConvertBatchTime`: call convert jobject to column block Also fix a potential concurrency issue when initializing the jdbc connection cache pool	2023-09-10 21:42:06 +08:00
zy-kkk	262c669918	[fix](jdbc catalog) fix jdbc catalog creating json columns when reading json data (#24122 )	2023-09-10 12:00:53 +08:00
Jerry Hu	93c1151f1a	[fix](join) incorrect result of mark join (#24112 )	2023-09-10 11:30:45 +08:00
daidai	f9a75b5c4f	[feature](csv_serde)1.append csv serde for serialize to csv and deserialize from csv. 2.let csvReader use csv serde not text_converter. (#23352 )	2023-09-10 00:16:21 +08:00
zhangstar333	03757d0672	[bug](explode) fix table node not implement alloc_resource function (#24031 )	2023-09-09 08:25:28 +08:00
GoGoWen	0f0ffa3482	[Fix](Parquet Reader) fix parquet read issue (#24092 )	2023-09-09 00:35:18 +08:00
zhangstar333	76ca57cf21	[bug](join) fix outer join not add tuple is null column when build rows is 0 (#23974 )	2023-09-08 17:55:03 +08:00
Pxl	69868f18d6	[Bug](join) fix nested loop join some problems (#24034 )	2023-09-08 17:40:41 +08:00
meiyi	82dc970916	[feature](insert) Support group commit insert (#22829 )	2023-09-08 15:51:03 +08:00
TengJianPing	b73f345479	[fix](intersect) fix wrong result of intersect node (#24044 ) Issue Number: close #24046	2023-09-08 10:27:37 +08:00
Jerry Hu	68acb8597b	[fix](nested_loop_join) null value should be output in semi-anti join (#23971 ) create table t1 (k1 bigint, k2 bigint) ENGINE=OLAP DUPLICATE KEY(k1, k2) COMMENT 'OLAP' DISTRIBUTED BY HASH(k2) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "is_being_synced" = "false", "storage_format" = "V2", "light_schema_change" = "true", "disable_auto_compaction" = "false", "enable_single_replica_compaction" = "false" ); create table t3 (k1 bigint, k2 bigint) ENGINE=OLAP DUPLICATE KEY(k1, k2) COMMENT 'OLAP' DISTRIBUTED BY HASH(k2) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "is_being_synced" = "false", "storage_format" = "V2", "light_schema_change" = "true", "disable_auto_compaction" = "false", "enable_single_replica_compaction" = "false" ); Data: insert into t1 values (1,null),(null,1),(1,2), (null,2),(1,3), (2,4), (2,5), (3,3), (3,4), (20,2), (22,3), (24,4),(null,null); insert into t3 values (1,null),(null,1),(1,4), (1,2), (null,3), (2,4), (3,7), (3,9),(null,null),(5,1); Query: select t1.* from t1 where not exists ( select k1 from t3 where t1.k2 < t3.k2 ); Result: Empty set Expected result: +------+------+ | k1 | k2 | +------+------+ | NULL | NULL | | 1 | NULL | +------+------+	2023-09-08 09:28:55 +08:00
Kang	9bc7010639	fix topn be inoperative because Field == Null always return true (#23830 ) ```if (!new_top.is_null() && new_top != old_top)``` is always false, since old_top is Null when initialized and Field == Null always returns true. We add an old_top.is_null() check first to avoid the problem, and will then open a more careful discussion about Field == Null semantics.	2023-09-04 16:02:07 +08:00
Pxl	bb3fadc5d3	[Bug](materialized-view) fix mv not match because cast and alias name (#23580 )	2023-09-04 12:46:33 +08:00
Gabriel	3317909141	[pipelineX](join) support nested loop join operator (#23756 )	2023-09-04 10:08:22 +08:00
zhangstar333	9da9409bd4	[refactor](join) improve join node output when build table rows is 0 (#23713 )	2023-09-04 09:48:38 +08:00
airborne12	347cceb530	[Feature](inverted index) push count on index down to scan node (#22687 ) Co-authored-by: airborne12 <airborne12@gmail.com>	2023-09-02 22:24:43 +08:00
airborne12	95488c4d93	[Fix](vscanner) remove TEMP column in block after filter (#23778 )	2023-09-02 21:54:27 +08:00
lihangyu	6b56896a01	[chore](json reader) add original data to error message for tracing (#22803 )	2023-09-02 20:15:18 +08:00
GoGoWen	228f0ac5bb	[Feature](Multi-Catalog) support query doris bitmap column in external jdbc catalog (#23021 )	2023-09-02 12:46:33 +08:00
daidai	657e927d50	[fix](json) Fix out-of-bounds access when reading json file (#23411 )	2023-09-02 01:11:37 +08:00
Ashin Gau	eaf2a6a80e	[fix](date) return right date value even if out of the range of date dictionary(#23664 ) PR(https://github.com/apache/doris/pull/22360) and PR(https://github.com/apache/doris/pull/22384) optimized the performance of the date type. However, hive supports dates outside 1970~2038, leading to wrong date values in the tpcds benchmark. How to fix: 1. Increase the dictionary range to 1900 ~ 2038. 2. Dates outside 1900 ~ 2038 are regenerated.	2023-09-01 14:40:20 +08:00
Gabriel	65f41f71c1	[pipelineX](refactor) refine codes (#23726 )	2023-09-01 07:57:35 +08:00
Mingyu Chen	3a2c0d16f7	[fix](parquet) fix potential heap-use-after-free issue and cache issue (#23638 ) 1. When the file meta cache is disabled (by setting `max_external_file_meta_cache_num=0` in be.conf), the parquet's meta info is owned by the parquet reader and will be released when calling `reader->close()`. But the underlying file reader of this parquet reader will be released after `reader->close()`, which may cause a `heap-use-after-free` bug because some part of the meta info may be referenced by the file reader. This PR fixes it by making sure that the meta info is released after the file reader is released. 2. Add modification time for file meta cache in BE, to avoid parquet read errors like: `Failed to deserialize parquet page header`	2023-08-31 18:23:05 +08:00
Gabriel	d22290e548	[pipelineX](join) support hash join (#23689 )	2023-08-31 13:01:26 +08:00
Pxl	f35ab37e1e	[Bug](materialized-view) fix load db use analyzer to analyze different metaindex (#23673 )	2023-08-31 12:35:38 +08:00
Jerry Hu	f7caae08d5	[fix](union) should open/alloc_resource in sink operator instead of source (#23637 )	2023-08-30 18:58:59 +08:00
zhangstar333	94a8fa6bc9	[bug](function) fix explode_number function return wrong rows (#23603 ) Previously, the result of the explode_number function was random with a const value: because _cur_size was reset, values could not be inserted into the column.	2023-08-29 19:02:49 +08:00
TengJianPing	962221cb18	[test](log) add log for debug case failure (#23506 )	2023-08-28 10:45:25 +08:00
Mingyu Chen	40be6a0b05	[fix](hive) do not split compress data file and support lz4/snappy block codec (#23245 ) 1. Do not split compressed data files. Some data files in hive are compressed with gzip, deflate, etc. These kinds of files cannot be split. 2. Support lz4 block codec: for the hive scan node, use lz4 block codec instead of lz4 frame codec. 4. Support snappy block codec for hadoop snappy. 5. Optimize the `count()` query of csv files: for a query like `select count() from tbl`, only the lines need to be split, not the columns. Need to pick to branch-2.0 after this PR: #22304	2023-08-26 12:59:05 +08:00
slothever	f66f161017	[fix](multi-catalog)fix hive table with cosn location issue (#23409 ) Sometimes, the partitions of a hive table may be on different storage, e.g., some on HDFS, others on object storage (cos, etc). This PR mainly changes: 1. Fix the bug of accessing files via cosn. 2. Add a new field `fs_name` in TFileRangeDesc. This is because, when accessing a file, the BE gets an hdfs client from the hdfs client cache, and different files in one query request may have different fs names, e.g., some are `hdfs://`, some are `cosn://`, so we need to specify the fs name for each file; otherwise, it may return an error: `reason: IllegalArgumentException: Wrong FS: cosn://doris-build-1308700295/xxxx, expected: hdfs://172.xxxx:4007`	2023-08-26 00:16:00 +08:00
Qi Chen	8af1e7f27f	[Fix](orc-reader) Fix incorrect result if null partition fields in orc file. (#23369 ) ### Root Cause Theoretically, the underlying file of a hive partition table should not contain partition fields. But we found that in some user scenarios, the partition fields exist in the underlying orc/parquet file and are null values. As a result, the pushed-down partition fields, which are null values, filter incorrectly. ### Solution We handle this case by only reading non-partition fields. The parquet reader already handles it this way; this PR handles the orc reader.	2023-08-26 00:13:11 +08:00
Qi Chen	a3a951c71d	[Fix](multi-catalog) Fix load string dict issue for transactional hive tables. (#23306 ) The column name needs to be passed as 'row.column_name'. apache/doris-thirdparty#112	2023-08-26 00:09:12 +08:00
Qi Chen	29273771f7	[Fix](multi-catalog) Fix hive incorrect result by disable string dict filter if exprs contain null expr. (#23361 ) Issue Number: close #21960	2023-08-25 21:16:43 +08:00
HappenLee	d331bfc513	[Performance](pipeline) support shared scan segment in mow (#23305 )	2023-08-25 10:43:02 +08:00
Pxl	d9db3f5431	[Improvement](scan) Remove redundant predicates on scan node (#23374 ) * Remove redundant predicates on scan node * update * fix	2023-08-25 10:41:37 +08:00
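
The guard described in commit 9bc7010639 (#23830 ) can be illustrated with a minimal sketch. `ToyField`, `should_update_buggy`, `should_update_fixed`, and `run_example` are hypothetical names made up for this illustration; they are not Doris's actual `Field` class or its topn code. The sketch assumes only what the commit message states: `==` against Null always returns true, and `old_top` starts out as Null. The exact form of the fix is also an assumption.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a nullable value; NOT Doris's Field class.
struct ToyField {
    std::optional<long> value;  // empty optional models Null
    bool is_null() const { return !value.has_value(); }
};

// Models the semantics described in the commit message:
// a comparison with Null using == always yields true.
bool operator==(const ToyField& a, const ToyField& b) {
    if (a.is_null() || b.is_null()) return true;
    return *a.value == *b.value;
}
bool operator!=(const ToyField& a, const ToyField& b) { return !(a == b); }

// The condition quoted in the commit message. While old_top is Null,
// new_top != old_top is false, so this never reports an update.
bool should_update_buggy(const ToyField& new_top, const ToyField& old_top) {
    return !new_top.is_null() && new_top != old_top;
}

// One possible shape of the fix (an assumption): test old_top.is_null()
// before relying on the != comparison.
bool should_update_fixed(const ToyField& new_top, const ToyField& old_top) {
    return !new_top.is_null() && (old_top.is_null() || new_top != old_top);
}

// Small self-check; call it from main() to exercise the sketch.
void run_example() {
    ToyField null_top{std::nullopt};
    ToyField five{5L};
    ToyField seven{7L};
    assert(!should_update_buggy(five, null_top));  // stuck while old_top is Null
    assert(should_update_fixed(five, null_top));   // first value is accepted
    assert(!should_update_fixed(five, five));      // unchanged top: no update
    assert(should_update_fixed(seven, five));      // changed top: update
    assert(!should_update_fixed(null_top, five));  // Null new_top is ignored
}
```

Checking `old_top.is_null()` first avoids depending on the `==` semantics for Null, which the commit message leaves open for later discussion.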


994 Commits