doris

Author	SHA1	Message	Date
lihangyu	3103bb08dc	[pick](Variant) casting to decimal type may lost precision (#39843 ) #39650	2024-08-23 22:47:32 +08:00
wangbo	6ceb574aa0	[branch-2.1]Pick IO limit/workload group usage table (#39839 )	2024-08-23 18:51:47 +08:00
Xinyi Zou	8ce8887b75	[branch-2.1](memory) Refactor refresh workload groups weighted memory ratio and record refresh interval memory growth (#39760 ) pick #38168 overwrites changes in #37221 on workload_group_manager.cpp. If need to pick 37221, ignore it.	2024-08-22 17:33:11 +08:00
Mingyu Chen	a3fd13fee6	[fix](catalog) set timeout for split fetch (#39346 ) (#39624 ) bp #39346	2024-08-20 21:59:55 +08:00
Sun Chenyang	12ed2951c4	[fix] (inverted index) remove tmp columns in block (#39369 ) (#39533 )	2024-08-20 20:53:23 +08:00
Xin Liao	5fcd6e6270	[Fix](load) Fix the incorrect src value printed in the error log when strict mode is true #39447 (#39587 ) cherry pick from #39447	2024-08-20 12:02:13 +08:00
Qi Chen	a44a274563	[Fix](parquet-reader) Fix and optimize parquet min-max filtering. (#39375 ) Backport #38277.	2024-08-15 14:12:54 +08:00
lihangyu	677435cef8	[Pick](Branch-2.1) pick json reader fix and support specify $. as column (#39271 ) #39206 #38213	2024-08-13 17:44:45 +08:00
zhangstar333	7e7729c4b0	[cherry-pick](branch-21) fix partition-topn calculate partition input rows have error (#39100 ) (#39281 ) ## Proposed changes cherry-pick from master: #39100	2024-08-13 17:42:29 +08:00
daidai	3da2d1c9d6	[bug](parquet)Fix the problem that the parquet reader reads the missing sub-columns of the struct and fails. (#38718 ) (#39192 ) bp #38718	2024-08-11 20:37:40 +08:00
Jerry Hu	e15b6cfc68	[fix](be) return correct canceled status from scanner (#36392 ) (#39111 ) ## Proposed changes pick #36392	2024-08-09 04:02:42 +08:00
Kang	44cb7978a9	[opt](index) add more inverted index profile metrics #36696 (#38858 )	2024-08-08 14:16:55 +08:00
Mingyu Chen	bc644cb253	[opt](catalog) merge scan range to avoid too many splits (#38311 ) (#38964 ) bp #38311	2024-08-06 21:57:02 +08:00
daidai	607c0b82a9	[opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. (#37377 ) (#38245 ) (#38810 ) ## Proposed changes pick pr: #38575 and fix this pr bug : #38245	2024-08-05 09:13:08 +08:00
daidai	5d02c48715	[feature](hive)Support reading renamed Parquet Hive and Orc Hive tables. (#38432 ) (#38809 ) bp #38432 ## Proposed changes Add `hive_parquet_use_column_names` and `hive_orc_use_column_names` session variables to read the table after rename column in `Hive`. These two session variables are referenced from `parquet_use_column_names` and `orc_use_column_names` of `Trino` hive connector. By default, these two session variables are true. When they are set to false, reading orc/parquet will access the columns according to the ordinal position in the Hive table definition. For example: ```mysql in Hive : hive> create table tmp (a int , b string) stored as parquet; hive> insert into table tmp values(1,"2"); hive> alter table tmp change column a new_a int; hive> insert into table tmp values(2,"4"); in Doris : mysql> set hive_parquet_use_column_names=true; Query OK, 0 rows affected (0.00 sec) mysql> select * from tmp; +-------+------+ \| new_a \| b \| +-------+------+ \| NULL \| 2 \| \| 2 \| 4 \| +-------+------+ 2 rows in set (0.02 sec) mysql> set hive_parquet_use_column_names=false; Query OK, 0 rows affected (0.00 sec) mysql> select * from tmp; +-------+------+ \| new_a \| b \| +-------+------+ \| 1 \| 2 \| \| 2 \| 4 \| +-------+------+ 2 rows in set (0.02 sec) ``` You can use `set parquet.column.index.access/orc.force.positional.evolution = true/false` in hive 3 to control the results of reading the table like these two session variables. However, for the rename struct inside column parquet table, the effects of hive and doris are different.	2024-08-05 09:06:49 +08:00
HappenLee	7bdc508ac7	[Bug](fix) fix coredump case in (not null, null) except (not null, not null) case (#38756 ) ## Proposed changes Issue Number: close #38612	2024-08-04 10:44:10 +08:00
Gabriel	9d23ccf1f2	[Improvement](schema scan) Use async scanner for schema scanners (#38403) (#38666 )	2024-08-01 16:05:24 +08:00
amory	338fa32303	[pick](simdjson) fix simdjson with object array when jsonroot is not empty (#38633 ) ## Proposed changes backport: https://github.com/apache/doris/pull/38490	2024-08-01 11:04:54 +08:00
wuwenchi	41fa7bc9fd	[bugfix](paimon)Fixed the reading of timestamp with time zone type data for 2.1 (#37716 ) (#38592 ) bp: #37716	2024-08-01 10:23:06 +08:00
hui lai	17d351af80	[fix](csv reader) fix csv parser incorrect if enclosing line_delimiter (#38347 ) (#38445 ) Csv reader parse data incorrect when data enclosing line_delimiter, for example, line_delimiter is \n and enclose is ', data as follows: ``` 'aaaaaaaaaaaa bbbb' ``` it will be parsed as two columns: `'aaaaaaaaaaaa` and `bbbb',` rather than one column ``` 'aaaaaaaaaaaa bbbb' ``` The reason why this happened is csv reader will not reset result when not match enclose in this `output_buf_read`, causing incorrect truncation was made. Co-authored-by: Xin Liao <liaoxinbit@126.com>	2024-07-29 14:55:45 +08:00
zzzxl	e2bb86e7f8	[fix](inverted index) fixed in_list condition not indexed on pipelinex (#38178 ) ## Proposed changes https://github.com/apache/doris/pull/36565 https://github.com/apache/doris/pull/37842 https://github.com/apache/doris/pull/37921 https://github.com/apache/doris/pull/37386	2024-07-25 14:42:34 +08:00
Qi Chen	a751372e76	[Feature](multi-catalog) Add memory tracker for orc reader/writer and arrow parquet writer。 (#37257 ) ## Proposed changes backport #37234	2024-07-25 13:51:59 +08:00
zhangstar333	57864e8554	[cherry-pick](branch-21) fix collect_set function core dump without arena pool (#38234 ) (#38307 ) ## Proposed changes cherry-pick from master #38234	2024-07-25 12:05:52 +08:00
Mingyu Chen	3ea26a8c95	[fix](external) record not found file number (#38253 ) (#38285 ) bp #38253	2024-07-25 11:03:19 +08:00
Qi Chen	ef00dad680	[Fix](multi-catalog) Fix some undefined behaviors. (#38274 ) ## Proposed changes backport #37845	2024-07-24 16:14:34 +08:00
daidai	193be20c86	[feature](csv)Supports reading CSV data using LF and CRLF as line separators. (#37687 ) (#38099 ) bp #37687	2024-07-22 22:53:04 +08:00
lihangyu	d9fd419e47	[Fix](JsonReader) fix json with duplicate key entry may result out of bound exception (#38147 ) #38146	2024-07-19 22:53:02 +08:00
wangbo	7b141ffde7	[pick]add min scan thread num for workload group's scan thread (#38123 ) ## Proposed changes pick #38096	2024-07-19 18:43:05 +08:00
camby	de2272ce48	[fix](round) fix round decimal128 overflow (#37733 ) (#37963 ) cherry-pick #37733 to branch-2.1	2024-07-18 23:50:23 +08:00
Gabriel	88d771d360	[pipeline](fix) Avoid to use a freed dependency when cancelled (#34584 ) (#38046 ) ## Proposed changes pick #34584	2024-07-18 15:27:10 +08:00
Mingyu Chen	3d5043817a	Revert "[opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. (#37377 )" (#38007 ) Reverts apache/doris#37530 Need more test, revert it temporarily	2024-07-17 21:44:25 +08:00
daidai	6932eef65e	[opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. (#37377 ) (#37530 ) bp #37377	2024-07-16 10:56:13 +08:00
zhangdong	e7a001c420	[enhance](mtmv)support partition tvf (#37795 ) pick from: https://github.com/apache/doris/pull/36479 and https://github.com/apache/doris/pull/37201	2024-07-16 09:27:44 +08:00
zhangstar333	1d49d386aa	[cherry-pick](branch-21) remove the useless code in column vector (#34432 ) (#37827 ) cherry-pick from master https://github.com/apache/doris/pull/34432 Co-authored-by: HappenLee <happenlee@hotmail.com>	2024-07-15 22:10:58 +08:00
zhangstar333	967173d7d0	[cherry-pick-2.1](table-function) pick some table functions exec performance (#34090 ) (#37778 ) ## Proposed changes pick from master: https://github.com/apache/doris/pull/33904 https://github.com/apache/doris/pull/34090 Co-authored-by: HappenLee <happenlee@hotmail.com>	2024-07-15 17:15:56 +08:00
Mingyu Chen	a4d37d96ca	[opt](file-scanner) add not found file number in profile (#37042 ) (#37764 ) bp #37042	2024-07-15 17:11:06 +08:00
Ashin Gau	20758576b2	[fix](split) remove retry when fetch split batch failed (#37637 ) bp: #37636	2024-07-12 22:46:03 +08:00
Jerry Hu	87912de93f	[fix](scan) catch exceptions thrown in scanner (#36101 ) (#37408 ) ## Proposed changes pick #36101 The uncaught exceptions thrown in the scanner will cause the BE to crash.	2024-07-12 08:49:39 +08:00
lihangyu	217eac790b	[pick](Variant) pick some refactor and fix #34925 #36317 #36201 #36793 (#37526 )	2024-07-11 21:25:34 +08:00
wangbo	b272247a57	[pick]log thread num (#37258 ) ## Proposed changes pick #37159	2024-07-04 15:27:52 +08:00
Tiewei Fang	bd24a8bdd9	[Fix](csv_reader) Add a session variable to control whether empty rows in CSV files are read as NULL values (#37153 ) bp: #36668	2024-07-02 22:12:17 +08:00
Mingyu Chen	e25717458e	[opt](catalog) add some profile for parquet reader and change meta cache config (#37040 ) (#37146 ) bp #37040	2024-07-02 20:58:43 +08:00
Ashin Gau	d0eea3886d	[fix](multi-catalog) Revert #36575 and check nullptr of data column (#37086 ) Revert #36575, because `VScanner::get_block` will check `DCHECK(block->rows() == 0)`, so block should be cleared when `eof = true`.	2024-07-02 15:32:52 +08:00
Ashin Gau	e686e85f27	[opt](split) add max wait time of getting splits (#36842 ) bp: #36843	2024-07-01 22:05:25 +08:00
TengJianPing	25fb30c723	[fix](intersect) fix coredump caused by intersect of nullable and not nullable children #36401 (#36441 ) ## Proposed changes Pick #36765	2024-06-26 17:45:21 +08:00
zhangstar333	695d58f354	[cherry-pick](scan)scanner could eos early when reached limit (#36535 ) (#36736 ) ## Proposed changes cherry-pick from master #36535	2024-06-25 17:22:43 +08:00
Ashin Gau	e4b6dac0c1	[fix](ubsan) reinterpret_cast fix length types to int8 is not safe (#36725 ) ## Proposed changes Fix type check of ubsan. ``` /root/doris/be/src/vec/exec/format/parquet/fix_length_plain_decoder.h:75:78: runtime error: member call on address 0x5582f35db5c0 which does not point to an object of type 'doris::vectorized::ColumnVector<signed char>' 0x5582f35db5c0: note: object is of type 'doris::vectorized::ColumnVector<int>' 83 55 00 00 78 c0 b0 5a 82 55 00 00 02 00 00 00 00 00 00 00 10 a0 00 d7 83 55 00 00 10 a0 00 d7 ^~~~~~~~~~~~~~~~~~~~~~~ vptr for 'doris::vectorized::ColumnVector<int>' doris::Status doris::vectorized::FixLengthPlainDecoder::_decode_values<false>(COW<doris::vectorized::IColumn>::mutable_ptr<doris::vectorized::IColumn>&, std::shared_ptr<doris::vectorized::IDataType const>&, doris::vectorized::ColumnSelectVector&, bool) at fix_length_plain_decoder.h:75:78 ```	2024-06-24 14:03:41 +08:00
Qi Chen	17cf34b244	[Fix](multi-catalog) Fix core in orc and parquet reader sometimes after low mem exception. (#36575 ) ## Proposed changes Backport #36574.	2024-06-22 11:28:21 +08:00
Qi Chen	f7f7b2b738	[Enhancement](multi-catalog) Add more error msgs for wrong data types in orc and parquet reader. (#36580 ) Backport #36417	2024-06-20 18:10:25 +08:00
Ashin Gau	f59dc4fb37	[opt](split) generate and get split batch concurrently (#36044 ) bp #36045, and turn on batch split, which is turn off in #36109 Generate and get split batch concurrently. `SplitSource.getNextBatch` remove the synchronization, and make each get their splits concurrently, and `SplitAssignment` generates splits asynchronously.	2024-06-19 16:16:02 +08:00

1479 Commits