doris

Author	SHA1	Message	Date
Jerry Hu	e15b6cfc68	[fix](be) return correct canceled status from scanner (#36392) (#39111) ## Proposed changes pick #36392	2024-08-09 04:02:42 +08:00
Kang	44cb7978a9	[opt](index) add more inverted index profile metrics #36696 (#38858)	2024-08-08 14:16:55 +08:00
Mingyu Chen	bc644cb253	[opt](catalog) merge scan range to avoid too many splits (#38311) (#38964) bp #38311	2024-08-06 21:57:02 +08:00
daidai	607c0b82a9	[opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. (#37377) (#38245) (#38810) ## Proposed changes pick PR #38575 and fix the bug in this PR: #38245	2024-08-05 09:13:08 +08:00
daidai	5d02c48715	[feature](hive) Support reading renamed Parquet Hive and ORC Hive tables. (#38432) (#38809) bp #38432 ## Proposed changes Add the `hive_parquet_use_column_names` and `hive_orc_use_column_names` session variables so that a table can still be read after a column has been renamed in Hive. These two session variables are modeled on `parquet_use_column_names` and `orc_use_column_names` in the Trino Hive connector. Both default to true. When they are set to false, ORC/Parquet files are read by mapping columns according to their ordinal position in the Hive table definition. For example: ```mysql In Hive: hive> create table tmp (a int , b string) stored as parquet; hive> insert into table tmp values(1,"2"); hive> alter table tmp change column a new_a int; hive> insert into table tmp values(2,"4"); In Doris: mysql> set hive_parquet_use_column_names=true; Query OK, 0 rows affected (0.00 sec) mysql> select * from tmp; +-------+------+ | new_a | b | +-------+------+ | NULL | 2 | | 2 | 4 | +-------+------+ 2 rows in set (0.02 sec) mysql> set hive_parquet_use_column_names=false; Query OK, 0 rows affected (0.00 sec) mysql> select * from tmp; +-------+------+ | new_a | b | +-------+------+ | 1 | 2 | | 2 | 4 | +-------+------+ 2 rows in set (0.02 sec) ``` In Hive 3, you can use `set parquet.column.index.access/orc.force.positional.evolution = true/false` to control how the table is read, with the same effect as these two session variables. However, for Parquet tables in which a field inside a struct column has been renamed, Hive and Doris behave differently.	2024-08-05 09:06:49 +08:00
wuwenchi	41fa7bc9fd	[bugfix](paimon) Fix reading of timestamp with time zone data for 2.1 (#37716) (#38592) bp: #37716	2024-08-01 10:23:06 +08:00
zzzxl	e2bb86e7f8	[fix](inverted index) fix in_list condition not using the inverted index on pipelineX (#38178) ## Proposed changes https://github.com/apache/doris/pull/36565 https://github.com/apache/doris/pull/37842 https://github.com/apache/doris/pull/37921 https://github.com/apache/doris/pull/37386	2024-07-25 14:42:34 +08:00
Mingyu Chen	3ea26a8c95	[fix](external) record the number of not-found files (#38253) (#38285) bp #38253	2024-07-25 11:03:19 +08:00
wangbo	7b141ffde7	[pick] add min scan thread num for workload group's scan threads (#38123) ## Proposed changes pick #38096	2024-07-19 18:43:05 +08:00
Mingyu Chen	3d5043817a	Revert "[opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. (#37377)" (#38007) Reverts apache/doris#37530. Needs more testing; revert it temporarily.	2024-07-17 21:44:25 +08:00
daidai	6932eef65e	[opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. (#37377) (#37530) bp #37377	2024-07-16 10:56:13 +08:00
zhangdong	e7a001c420	[enhance](mtmv) support partition tvf (#37795) pick from: https://github.com/apache/doris/pull/36479 and https://github.com/apache/doris/pull/37201	2024-07-16 09:27:44 +08:00
Mingyu Chen	a4d37d96ca	[opt](file-scanner) add the number of not-found files to the profile (#37042) (#37764) bp #37042	2024-07-15 17:11:06 +08:00
Ashin Gau	20758576b2	[fix](split) remove retry when fetching a split batch fails (#37637) bp: #37636	2024-07-12 22:46:03 +08:00
Jerry Hu	87912de93f	[fix](scan) catch exceptions thrown in scanner (#36101) (#37408) ## Proposed changes pick #36101 Uncaught exceptions thrown in the scanner cause the BE to crash.	2024-07-12 08:49:39 +08:00
lihangyu	217eac790b	[pick](Variant) pick some refactor and fix #34925 #36317 #36201 #36793 (#37526)	2024-07-11 21:25:34 +08:00
wangbo	b272247a57	[pick] log thread num (#37258) ## Proposed changes pick #37159	2024-07-04 15:27:52 +08:00
Ashin Gau	e686e85f27	[opt](split) add max wait time of getting splits (#36842) bp: #36843	2024-07-01 22:05:25 +08:00
zhangstar333	695d58f354	[cherry-pick](scan) scanner can reach eos early once the limit is reached (#36535) (#36736) ## Proposed changes cherry-pick from master #36535	2024-06-25 17:22:43 +08:00
Ashin Gau	f59dc4fb37	[opt](split) generate and get split batch concurrently (#36044) bp #36045, and turn on batch split, which was turned off in #36109. Generate and get split batches concurrently: remove the synchronization from `SplitSource.getNextBatch` so that each caller gets its splits concurrently, and have `SplitAssignment` generate splits asynchronously.	2024-06-19 16:16:02 +08:00
Jerry Hu	4a277affdc	[fix](scan) In-predicate should not be pushed down for non-key columns (#35913) (#35968) pick #35913	2024-06-11 11:13:34 +08:00
amory	fe1a4c4136	[Feature](IP) support ipv4/ipv6 with inverted index and conjuncts for query (#35734) Support inverted indexes on the ipv4/ipv6 data types, so that conjunct expressions on IP columns such as `>`, `<`, `>=`, `<=`, and `in`/`not in` can be accelerated by the inverted index.	2024-06-03 23:24:03 +08:00
zclllyybb	680be6d19f	[fix](ub) fix uninitialized accesses in BE (#35370) UBSan reports: ```c++ /root/doris/be/src/olap/hll.h:93:29: runtime error: load of value 3078029312, which is not a valid value for type 'HllDataType' /root/doris/be/src/olap/hll.h:94:23: runtime error: load of value 3078029312, which is not a valid value for type 'HllDataType' /root/doris/be/src/runtime/descriptors.h:439:38: runtime error: load of value 118, which is not a valid value for type 'bool' /root/doris/be/src/vec/exec/vjdbc_connector.cpp:61:50: runtime error: load of value 35, which is not a valid value for type 'bool' ```	2024-05-29 20:31:07 +08:00
Mingyu Chen	86c7092f21	[opt](external) ignore not-found files (#35319) The file list is obtained from the external meta cache, and a file may already have been removed from storage. We should ignore files that are not found and let the query continue.	2024-05-28 18:51:56 +08:00
HappenLee	d97788dec8	[Refactor](Status) Refactor the scanner scheduler code so that the returned error message is meaningful (#35286) ## Proposed changes Error message before: ``` Failed to submit scanner to scanner pool ``` Error message after: ``` Failed to submit scanner to scanner pool reason:Scan thread pool had shutdown|type 1 ```	2024-05-28 18:49:55 +08:00
wangbo	c44affb43f	Add downgrading of scan thread num by column num (#35351)	2024-05-27 15:27:12 +08:00
Ashin Gau	eb49cd839b	[refactor](datalake) return the error status instead of static_cast<void> (#34873) Follow-up to #34797. `static_cast<void>` silently ignores error statuses, but some of them should make the query finish with an error status, so replace `static_cast<void>` with `RETURN_IF_ERROR`. The following three scenarios must be handled separately and cannot simply be replaced: 1. the outer function returns void; 2. a status-returning function is called inside a constructor or destructor; 3. a status-returning function is called on a best-effort basis, and its error status should be ignored.	2024-05-23 19:06:21 +08:00
Mingyu Chen	adc364a6fd	[feature](Paimon) support deletion vector for Paimon native reader (#34743) (#35241) bp #34743 Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>	2024-05-23 00:01:30 +08:00
Mingyu Chen	903ff32021	[opt](fe) exit FE when transferring to (non-)master fails (#34809) (#35158) bp #34809	2024-05-21 22:31:47 +08:00
Ashin Gau	98f8eb5c43	[opt](split) get file splits in batch mode (#34032) (#35107) bp #34032	2024-05-21 22:27:07 +08:00
Ashin Gau	1f0c45204b	[fix](iceberg) read the primary key columns when equality deletes are present (#34884) backport: #34835	2024-05-15 11:37:25 +08:00
zzzxl	cc00666be6	[opt](inverted index) add in-list condition handling to compound (#34134) Previously, compound conditions did not support the in-list condition, which could hurt performance when an inverted index had been created.	2024-05-10 14:35:47 +08:00
Mingyu Chen	e085f75a43	[opt](file-scanner) print current path when encountering error (#34365) (#34523) bp #34365	2024-05-08 14:49:03 +08:00
daidai	1bfe0f0393	[feature](iceberg) support reading Iceberg complex types, the iceberg.orc format, and position deletes. (#33935) (#34256) master #33935	2024-04-29 14:40:12 +08:00
Mryange	9f0a5690a6	[profile](scan) add projection time in scanner #34120	2024-04-26 07:43:40 +08:00
wangbo	47b54d4bd5	Fix remote scan pool (#33976)	2024-04-25 15:04:43 +08:00
Gabriel	4d7ac82305	[profile](scanner) Fix wrong metrics (#33965)	2024-04-22 22:33:24 +08:00
huanghaibin	59de97be5e	[improvement](mow) Add profile for delete_bitmap get_agg function (#33576)	2024-04-17 23:42:13 +08:00
Ashin Gau	2cd4012541	[opt](scan) read scan ranges in the order of partitions (#33515) (#33657) backport: #33515	2024-04-17 23:42:12 +08:00
wangbo	8ee8de7857	[Fix](executor)reset remote scan thread num #33579	2024-04-17 23:42:11 +08:00
lihangyu	249a9c9875	[Feature](Variant) support aggregation model for Variant type (#33493) Refactor: use `insert_from` instead of `replace_column_data` for variable-length columns.	2024-04-17 23:42:00 +08:00
zhangstar333	6bcf24b1f6	[bug](not in) if not in (null), could eos early (#33482)	2024-04-17 23:41:59 +08:00
Pxl	5f30463bb3	[Chore](descriptors) remove unused code for descriptors (#33408)	2024-04-12 15:09:25 +08:00
Mryange	f7d52b5b1c	[feature](expr) add type check when expr prepare (#33330)	2024-04-11 09:31:50 +08:00
Pxl	8fd6d4c41b	[Chore](build) add -Wconversion and remove some unused code (#33127)	2024-04-10 15:26:08 +08:00
Mryange	8e19cdd745	[feature](expr) support common subexpression elimination (BE part) (#32673)	2024-04-10 11:56:21 +08:00
Xinyi Zou	cf7595d423	[opt](memory) Optimize mem tracker accuracy (#32039) (#33140)	2024-04-10 11:42:19 +08:00
yiguolei	3c4ccb3981	Revert "[opt](scan) read scan ranges in the order of partitions (#31630)" This reverts commit 5d99dffe6f1a3fcb107ce56181aeff96ef222def.	2024-04-09 12:37:31 +08:00
Gabriel	0234976ab7	[refactor](meta scan) Remove RPC from execute threads (#33378)	2024-04-08 20:28:02 +08:00
Gabriel	a8232c67f9	[pipelineX](runtime filter) Fix task timeout caused by runtime filter (#33332) (#33369)	2024-04-08 16:30:32 +08:00

541 Commits