ddf72ce09a
[Fix](branch-2.1) fix manual pick with mistake when handling decimal type ( #40008 )
...
introduced by #39843
2024-08-28 10:15:52 +08:00
8256c6f0ba
[Fix](parquet-reader) Fix definition level rle decode dead loop in parquet-reader. ( #39523 ) ( #39945 )
...
bp #39523
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
2024-08-27 08:54:43 +08:00
263746b04b
[fix](paimon) fix crash when enable cache with paimon deletion vector( #39877 ) ( #39875 )
...
bp #39877
2024-08-24 17:58:20 +08:00
14a2a66106
[fix](paimon) fix not able to read paimon data from hdfs with HA ( #39806 ) ( #39876 )
...
bp #39806
2024-08-24 17:51:15 +08:00
3103bb08dc
[pick](Variant) casting to decimal type may lose precision ( #39843 )
...
#39650
2024-08-23 22:47:32 +08:00
6ceb574aa0
[branch-2.1]Pick IO limit/workload group usage table ( #39839 )
2024-08-23 18:51:47 +08:00
8ce8887b75
[branch-2.1](memory) Refactor refresh workload groups weighted memory ratio and record refresh interval memory growth ( #39760 )
...
pick #38168
This overwrites the changes from #37221 in workload_group_manager.cpp. If #37221
needs to be picked, ignore it.
2024-08-22 17:33:11 +08:00
a3fd13fee6
[fix](catalog) set timeout for split fetch ( #39346 ) ( #39624 )
...
bp #39346
2024-08-20 21:59:55 +08:00
12ed2951c4
[fix] (inverted index) remove tmp columns in block ( #39369 ) ( #39533 )
2024-08-20 20:53:23 +08:00
5fcd6e6270
[Fix](load) Fix the incorrect src value printed in the error log when strict mode is true #39447 ( #39587 )
...
cherry pick from #39447
2024-08-20 12:02:13 +08:00
a44a274563
[Fix](parquet-reader) Fix and optimize parquet min-max filtering. ( #39375 )
...
Backport #38277.
2024-08-15 14:12:54 +08:00
677435cef8
[Pick](Branch-2.1) pick json reader fix and support specify $. as column ( #39271 )
...
#39206
#38213
2024-08-13 17:44:45 +08:00
7e7729c4b0
[cherry-pick](branch-21) fix partition-topn calculate partition input rows have error ( #39100 ) ( #39281 )
...
## Proposed changes
cherry-pick from master: #39100
2024-08-13 17:42:29 +08:00
3da2d1c9d6
[bug](parquet)Fix the problem that the parquet reader reads the missing sub-columns of the struct and fails. ( #38718 ) ( #39192 )
...
bp #38718
2024-08-11 20:37:40 +08:00
e15b6cfc68
[fix](be) return correct canceled status from scanner ( #36392 ) ( #39111 )
...
## Proposed changes
pick #36392
2024-08-09 04:02:42 +08:00
44cb7978a9
[opt](index) add more inverted index profile metrics #36696 ( #38858 )
2024-08-08 14:16:55 +08:00
bc644cb253
[opt](catalog) merge scan range to avoid too many splits ( #38311 ) ( #38964 )
...
bp #38311
2024-08-06 21:57:02 +08:00
607c0b82a9
[opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. ( #37377 ) ( #38245 ) ( #38810 )
...
## Proposed changes
pick PR: #38575 and fix this PR's bug: #38245
2024-08-05 09:13:08 +08:00
5d02c48715
[feature](hive)Support reading renamed Parquet Hive and Orc Hive tables. ( #38432 ) ( #38809 )
...
bp #38432
## Proposed changes
Add the `hive_parquet_use_column_names` and `hive_orc_use_column_names`
session variables so that a table can still be read after a column has been
renamed in `Hive`.
These two session variables are modeled on
`parquet_use_column_names` and `orc_use_column_names` of the `Trino` hive
connector.
Both session variables default to true, in which case columns are matched by
name. When they are set to false, ORC/Parquet columns are accessed by their
ordinal position in the Hive table definition.
For example:
```mysql
-- In Hive:
hive> create table tmp (a int , b string) stored as parquet;
hive> insert into table tmp values(1,"2");
hive> alter table tmp change column a new_a int;
hive> insert into table tmp values(2,"4");
-- In Doris:
mysql> set hive_parquet_use_column_names=true;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from tmp;
+-------+------+
| new_a | b |
+-------+------+
| NULL | 2 |
| 2 | 4 |
+-------+------+
2 rows in set (0.02 sec)
mysql> set hive_parquet_use_column_names=false;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from tmp;
+-------+------+
| new_a | b |
+-------+------+
| 1 | 2 |
| 2 | 4 |
+-------+------+
2 rows in set (0.02 sec)
```
In Hive 3, `set parquet.column.index.access = true/false` and
`set orc.force.positional.evolution = true/false` control how the table is
read, much like these two session variables. However, for a Parquet table in
which a field inside a struct column has been renamed, Hive and Doris behave
differently.
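The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of name-based versus position-based column resolution, not the Doris reader itself; `read_row` and its parameters are hypothetical names introduced here, and the data mirrors the `tmp` table from the example above.

```python
from typing import Any, List


def read_row(file_columns: List[str], file_values: List[Any],
             table_columns: List[str], use_column_names: bool) -> List[Any]:
    """Resolve the table's columns against one row stored in a data file.

    Hypothetical helper: file_columns are the names written in the file,
    table_columns are the names in the current table definition.
    """
    if use_column_names:
        # Match by name; a table column missing from the file reads as NULL.
        by_name = dict(zip(file_columns, file_values))
        return [by_name.get(name) for name in table_columns]
    # Match by ordinal position, ignoring the names stored in the file.
    return [file_values[i] if i < len(file_values) else None
            for i in range(len(table_columns))]


table_columns = ["new_a", "b"]            # definition after the rename
old_file = (["a", "b"], [1, "2"])         # written before the rename
new_file = (["new_a", "b"], [2, "4"])     # written after the rename

for flag in (True, False):
    rows = [read_row(cols, vals, table_columns, flag)
            for cols, vals in (old_file, new_file)]
    print(flag, rows)
```

With name matching, the row written before the rename has no `new_a` column and reads as NULL; with positional matching, the first file column is used regardless of its name.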
2024-08-05 09:06:49 +08:00
7bdc508ac7
[Bug](fix) fix coredump case in (not null, null) except (not null, not null) case ( #38756 )
...
## Proposed changes
Issue Number: close #38612
2024-08-04 10:44:10 +08:00
9d23ccf1f2
[Improvement](schema scan) Use async scanner for schema scanners (#38403) ( #38666 )
2024-08-01 16:05:24 +08:00
338fa32303
[pick](simdjson) fix simdjson with object array when jsonroot is not empty ( #38633 )
...
## Proposed changes
backport: https://github.com/apache/doris/pull/38490
2024-08-01 11:04:54 +08:00
41fa7bc9fd
[bugfix](paimon)Fixed the reading of timestamp with time zone type data for 2.1 ( #37716 ) ( #38592 )
...
bp: #37716
2024-08-01 10:23:06 +08:00
17d351af80
[fix](csv reader) fix csv parser incorrect if enclosing line_delimiter ( #38347 ) ( #38445 )
...
The CSV reader parses data incorrectly when an enclosed value contains the
line_delimiter. For example, with line_delimiter set to \n and enclose set to ',
given the following data:
```
'aaaaaaaaaaaa
bbbb'
```
it is parsed as two columns, `'aaaaaaaaaaaa` and `bbbb'`, rather than as
one column:
```
'aaaaaaaaaaaa
bbbb'
```
This happens because the CSV reader does not reset the result when the
enclose character is not matched in this `output_buf_read`, which leads to
incorrect truncation.
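The intended behavior can be sketched as a small Python splitter that only treats the line delimiter as a boundary when it is outside an enclosure. This is a simplified illustration and not the Doris implementation: `split_records` is a hypothetical name, the enclose character is assumed to be a single character, and escaped enclose characters are not handled.

```python
from typing import List


def split_records(data: str, line_delimiter: str = "\n",
                  enclose: str = "'") -> List[str]:
    """Split data on line_delimiter, ignoring delimiters inside an enclosure."""
    records: List[str] = []
    current: List[str] = []
    in_enclosure = False
    i = 0
    while i < len(data):
        ch = data[i]
        if ch == enclose:
            # Toggle the enclosure state and keep the enclose character.
            in_enclosure = not in_enclosure
            current.append(ch)
            i += 1
        elif not in_enclosure and data.startswith(line_delimiter, i):
            # A delimiter outside any enclosure ends the current record.
            records.append("".join(current))
            current = []
            i += len(line_delimiter)
        else:
            current.append(ch)
            i += 1
    if current:
        records.append("".join(current))
    return records


print(split_records("'aaaaaaaaaaaa\nbbbb'\nccc\n"))
```

Here the enclosed value keeps its embedded `\n` and stays in one piece, while the delimiter after the closing `'` still ends the record.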
Co-authored-by: Xin Liao <liaoxinbit@126.com>
2024-07-29 14:55:45 +08:00
e2bb86e7f8
[fix](inverted index) fixed in_list condition not indexed on pipelinex ( #38178 )
...
## Proposed changes
https://github.com/apache/doris/pull/36565
https://github.com/apache/doris/pull/37842
https://github.com/apache/doris/pull/37921
https://github.com/apache/doris/pull/37386
2024-07-25 14:42:34 +08:00
a751372e76
[Feature](multi-catalog) Add memory tracker for orc reader/writer and arrow parquet writer. ( #37257 )
...
## Proposed changes
backport #37234
2024-07-25 13:51:59 +08:00
57864e8554
[cherry-pick](branch-21) fix collect_set function core dump without arena pool ( #38234 ) ( #38307 )
...
## Proposed changes
cherry-pick from master #38234
2024-07-25 12:05:52 +08:00
3ea26a8c95
[fix](external) record not found file number ( #38253 ) ( #38285 )
...
bp #38253
2024-07-25 11:03:19 +08:00
ef00dad680
[Fix](multi-catalog) Fix some undefined behaviors. ( #38274 )
...
## Proposed changes
backport #37845
2024-07-24 16:14:34 +08:00
193be20c86
[feature](csv)Supports reading CSV data using LF and CRLF as line separators. ( #37687 ) ( #38099 )
...
bp #37687
2024-07-22 22:53:04 +08:00
d9fd419e47
[Fix](JsonReader) fix json with duplicate key entry may result out of bound exception ( #38147 )
...
#38146
2024-07-19 22:53:02 +08:00
7b141ffde7
[pick]add min scan thread num for workload group's scan thread ( #38123 )
...
## Proposed changes
pick #38096
2024-07-19 18:43:05 +08:00
de2272ce48
[fix](round) fix round decimal128 overflow ( #37733 ) ( #37963 )
...
cherry-pick #37733 to branch-2.1
2024-07-18 23:50:23 +08:00
88d771d360
[pipeline](fix) Avoid using a freed dependency when cancelled ( #34584 ) ( #38046 )
...
## Proposed changes
pick #34584
2024-07-18 15:27:10 +08:00
3d5043817a
Revert "[opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. ( #37377 )" ( #38007 )
...
Reverts apache/doris#37530
Needs more testing, so revert it temporarily.
2024-07-17 21:44:25 +08:00
6932eef65e
[opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. ( #37377 ) ( #37530 )
...
bp #37377
2024-07-16 10:56:13 +08:00
e7a001c420
[enhance](mtmv)support partition tvf ( #37795 )
...
pick from: https://github.com/apache/doris/pull/36479 and
https://github.com/apache/doris/pull/37201
2024-07-16 09:27:44 +08:00
1d49d386aa
[cherry-pick](branch-21) remove the useless code in column vector ( #34432 ) ( #37827 )
...
cherry-pick from master https://github.com/apache/doris/pull/34432
Co-authored-by: HappenLee <happenlee@hotmail.com>
2024-07-15 22:10:58 +08:00
967173d7d0
[cherry-pick-2.1](table-function) pick some table functions exec performance ( #34090 ) ( #37778 )
...
## Proposed changes
pick from master:
https://github.com/apache/doris/pull/33904
https://github.com/apache/doris/pull/34090
Co-authored-by: HappenLee <happenlee@hotmail.com>
2024-07-15 17:15:56 +08:00
a4d37d96ca
[opt](file-scanner) add not found file number in profile ( #37042 ) ( #37764 )
...
bp #37042
2024-07-15 17:11:06 +08:00
20758576b2
[fix](split) remove retry when fetch split batch failed ( #37637 )
...
bp: #37636
2024-07-12 22:46:03 +08:00
87912de93f
[fix](scan) catch exceptions thrown in scanner ( #36101 ) ( #37408 )
...
## Proposed changes
pick #36101
Uncaught exceptions thrown in the scanner cause the BE to crash.
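The general pattern is to catch the exception at the scanner boundary and turn it into an error status that the caller can handle. The Python sketch below only illustrates that idea; `Status` and `run_scanner` are hypothetical names, and the actual change is in the BE's C++ code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Status:
    """Hypothetical result type: ok, or an error with a message."""
    ok: bool
    message: str = ""


def run_scanner(scan_fn: Callable[[], None]) -> Status:
    """Run scan_fn and convert any exception into an error Status."""
    try:
        scan_fn()
        return Status(ok=True)
    except Exception as exc:
        # Report the failure instead of letting it escape to the caller.
        return Status(ok=False, message=f"{type(exc).__name__}: {exc}")


def failing_scan() -> None:
    raise ValueError("bad block")


print(run_scanner(failing_scan))
print(run_scanner(lambda: None))
```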
2024-07-12 08:49:39 +08:00
217eac790b
[pick](Variant) pick some refactor and fix #34925 #36317 #36201 #36793 ( #37526 )
2024-07-11 21:25:34 +08:00
b272247a57
[pick]log thread num ( #37258 )
...
## Proposed changes
pick #37159
2024-07-04 15:27:52 +08:00
bd24a8bdd9
[Fix](csv_reader) Add a session variable to control whether empty rows in CSV files are read as NULL values ( #37153 )
...
bp: #36668
2024-07-02 22:12:17 +08:00
e25717458e
[opt](catalog) add some profile for parquet reader and change meta cache config ( #37040 ) ( #37146 )
...
bp #37040
2024-07-02 20:58:43 +08:00
d0eea3886d
[fix](multi-catalog) Revert #36575 and check nullptr of data column ( #37086 )
...
Revert #36575: `VScanner::get_block` checks
`DCHECK(block->rows() == 0)`, so the block should be cleared when `eof =
true`.
2024-07-02 15:32:52 +08:00
e686e85f27
[opt](split) add max wait time of getting splits ( #36842 )
...
bp: #36843
2024-07-01 22:05:25 +08:00
25fb30c723
[fix](intersect) fix coredump caused by intersect of nullable and not nullable children #36401 ( #36441 )
...
## Proposed changes
Pick #36765
2024-06-26 17:45:21 +08:00
695d58f354
[cherry-pick](scan)scanner could eos early when reached limit ( #36535 ) ( #36736 )
...
## Proposed changes
cherry-pick from master #36535
2024-06-25 17:22:43 +08:00