Commit Graph

13721 Commits

Author SHA1 Message Date
a6c1eaf1d8 [refactor] bind slot and function in one rule (#16288)
1. Use one rule to bind slots and functions and to apply type coercion, fixing type and nullability errors.
  a. For SUM(a1 + AVG(a2)) where a1 and a2 are TINYINT, the return type was previously SMALLINT; after this PR it is the correct type, DOUBLE.
2. Fix runtime filter generator bugs: runtime filters were bound to the wrong join conjuncts.
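A minimal sketch of bottom-up return-type derivation for the SUM(a1 + AVG(a2)) example. It assumes simplified rules: AVG over an integer yields DOUBLE, anything + DOUBLE yields DOUBLE, integer + integer widens one step, and SUM over DOUBLE stays DOUBLE. The enum and helper names are hypothetical and do not mirror the Nereids classes.

```python
from enum import IntEnum


class SqlType(IntEnum):
    # Hypothetical, simplified type lattice ordered by width.
    TINYINT = 1
    SMALLINT = 2
    INT = 3
    BIGINT = 4
    DOUBLE = 5


def avg_type(arg: SqlType) -> SqlType:
    # Assumption: AVG over any numeric argument returns DOUBLE.
    return SqlType.DOUBLE


def add_type(left: SqlType, right: SqlType) -> SqlType:
    # Assumption: anything + DOUBLE is DOUBLE.
    if SqlType.DOUBLE in (left, right):
        return SqlType.DOUBLE
    # Assumption: integer + integer widens one step (capped at BIGINT) to avoid overflow.
    return SqlType(min(max(left, right) + 1, SqlType.BIGINT))


def sum_type(arg: SqlType) -> SqlType:
    # Assumption: SUM over DOUBLE stays DOUBLE; SUM over integers is BIGINT.
    return SqlType.DOUBLE if arg == SqlType.DOUBLE else SqlType.BIGINT


# Derive the type of SUM(a1 + AVG(a2)) bottom-up, once AVG(a2) is bound and typed.
a1 = a2 = SqlType.TINYINT
result = sum_type(add_type(a1, avg_type(a2)))
print(result.name)  # DOUBLE
```

In this sketch the outer SUM only comes out as DOUBLE because AVG(a2) is typed before the addition that contains it.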
2023-02-02 15:02:32 +08:00
42960ffd08 [typo](docs)fix docs format (#16279) 2023-02-02 14:13:17 +08:00
3b8182ee7e [nereids](nvl) Fix function signature (#16345) 2023-02-02 14:05:51 +08:00
9618427020 [improvement](multi-catalog) increase default batch_size to 4064 (#16326)
The performance of ClickBench Q30 is affected by batch_size:
| batch_size | 1024 | 4096 | 20480 |
| -- | -- | -- | -- |
| Q30 query time | 2.27 | 1.08 | 0.62 |

The aggregation operator creates a new result block for each input batch block, and because Q30 has 90 columns, building these blocks is time-consuming. A larger batch_size reduces the number of aggregation blocks and therefore improves performance.

The Doris internal reader reads at least 4064 rows even if batch_size < 4064, so this PR makes reading external tables behave the same as reading internal tables.
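The block-count argument can be checked with simple arithmetic. This sketch assumes the aggregation operator emits exactly one result block per input batch and models the reader's 4064-row floor as `max(batch_size, 4064)`; the row count and function names are illustrative, not taken from ClickBench or the Doris code.

```python
import math

MIN_INTERNAL_BATCH = 4064  # floor on rows per read, per the commit message


def effective_batch_size(batch_size: int) -> int:
    # The reader reads at least MIN_INTERNAL_BATCH rows even if batch_size is smaller.
    return max(batch_size, MIN_INTERNAL_BATCH)


def result_blocks(total_rows: int, batch_size: int) -> int:
    # Assumption: one aggregation result block is created per input batch.
    return math.ceil(total_rows / batch_size)


rows = 1_000_000  # hypothetical input size, not a ClickBench figure
for bs in (1024, 4096, 20480):
    # Columns: configured batch_size, blocks at that size, blocks with the 4064 floor applied.
    print(bs, result_blocks(rows, bs), result_blocks(rows, effective_batch_size(bs)))
```

Each of those blocks carries all 90 of Q30's columns, so the per-block overhead is multiplied by the block count.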
2023-02-02 11:51:09 +08:00
69f34cd1c3 [fix](load) sequence column does not compare correctly in memtable (#16211) 2023-02-02 11:00:23 +08:00
eba70f972e [improvement](global context) remove some unused method from runtime state (#16329)
This is part of #16296.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-02 10:24:55 +08:00
1973b3a86f [test](regression) add tvf regression to test the remove of eof check (#16342)
Add a regression test for #16302. This regression test will fail if an EOF check is added for non-predicate columns.
2023-02-02 10:06:36 +08:00
941e192019 [enhancement](test) add function case date_sub(datetime,INTERVAL dayofmonth(datetime)-1 DAY) (#16306) 2023-02-02 09:56:01 +08:00
696c6ffcc5 [fix](join) crash caused by canceling query (#16311)
If the query was canceled, the status in the shared context may be `OK` while other fields are not set.
2023-02-02 09:55:37 +08:00
63042a38bd [fix](memtracker) Fix high frequency load slow lock in memtracker (#16244)
The global lock in memtracker gets stuck when bthreads are created frequently.
2023-02-02 09:53:44 +08:00
06db0c6a91 [fix](iceberg) fix meta persist bug of iceberg catalog (#16344)
PR #16082 forgot to update GsonUtil for the Iceberg Catalog/Database/Table.
2023-02-02 09:30:25 +08:00
1c5279d26e [fix](multi-catalog) remove the eof check among parquet columns (#16302)
Read parquet file failed:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]Read parquet file xxx failed, reason = [CORRUPTION]The number of rows are not equal among parquet columns
```
This error may be thrown when reading non-predicate columns during lazy read, for example:
A row group with 1000 rows has two non-predicate columns.
Column A has one page; column B has two pages with 500 rows each.
The read range of `ParquetColumnReader` is [0, 400), and the rows in [0, 450) are all filtered by predicate columns.
So column A can skip its first page and reach EOF, while column B can also skip its first page but does not reach EOF.
2023-02-02 09:22:09 +08:00
aa0837f198 [bugfix](topn) fix topn runtime predicate getting value bug for decimal type (#16331)
* fix topn runtime predicate getting value bug for decimal type

* fix cast_to_string bug for TYPE_DECIMALV2
2023-02-02 09:13:32 +08:00
c4e1c5c15a [Docs](pipeline) Add doc of pipeline execution engine and remove vectorized-execution-engine (#16310)
Add doc of pipeline execution engine and remove vectorized-execution-engine
2023-02-01 23:57:18 +08:00
7c145faa80 [Enhance] use fast_float::from_chars to do str cast to float/double to avoid losing precision (#16190) 2023-02-01 23:53:34 +08:00
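A minimal illustration of how string-to-double conversion in #16190 can lose precision. The `naive_parse` helper is hypothetical (it is not the previous Doris implementation): it accumulates fractional digits by repeatedly multiplying by 0.1, which rounds at every step. Python's built-in `float()` is correctly rounded, the same property `fast_float::from_chars` provides.

```python
def naive_parse(text: str) -> float:
    """Hypothetical digit-by-digit parser; rounds at each step, so it is NOT exact."""
    sign = -1.0 if text.startswith("-") else 1.0
    int_part, _, frac_part = text.lstrip("+-").partition(".")
    value = float(int(int_part)) if int_part else 0.0
    scale = 1.0
    for digit in frac_part:
        scale *= 0.1                  # 0.1 is not exactly representable in binary
        value += int(digit) * scale   # rounding error accumulates with every digit
    return sign * value


print(naive_parse("0.3"))  # 0.30000000000000004
print(float("0.3"))        # 0.3 (correctly rounded)
```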
40d9e19e1d [feature-wip](multi-catalog) support iceberg union catalog, and add h… (#16082)
Support the Iceberg unified catalog framework, and add HMS and REST catalogs for the framework.
2023-02-01 22:59:42 +08:00
82faa965f5 [Bug](followup) fix datev2 functions (#16330) 2023-02-01 22:38:34 +08:00
b878a7e61e [feature](Load)Support skipping a specific number of lines for csv stream load (#16055)
Support setting the number of lines to skip when loading a CSV file with stream load.

Usage `-H skip_lines:number`:
```
curl --location-trusted -u root: -T test.csv -H skip_lines:5  -XPUT http://127.0.0.1:8030/api/testDb/testTbl/_stream_load
```

Skipping lines can also be used with MySQL load, as below:
```sql
LOAD DATA
LOCAL
INFILE '${mysql_load_skip_lines}'
INTO TABLE ${tableName}
COLUMNS TERMINATED BY ','
IGNORE 2 LINES
PROPERTIES ("auth" = "root:");
```
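The effect of `skip_lines` can be sketched as discarding the first N physical lines of the CSV before parsing, which is how MySQL documents `IGNORE n LINES`. This Python sketch assumes `skip_lines` has the same semantics; the helper name is hypothetical and does not reflect the BE's CSV reader.

```python
import csv
import io
from itertools import islice


def read_csv_skipping(data: str, skip_lines: int, sep: str = ","):
    """Hypothetical helper: drop the first `skip_lines` lines, then parse the rest as CSV."""
    lines = io.StringIO(data)
    remaining = islice(lines, skip_lines, None)  # skip header/preamble lines
    return list(csv.reader(remaining, delimiter=sep))


sample = "id,name\n# exported 2023-02-01\n1,alice\n2,bob\n"
rows = read_csv_skipping(sample, skip_lines=2)
print(rows)  # [['1', 'alice'], ['2', 'bob']]
```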
2023-02-01 20:42:43 +08:00
bb0d4ba787 [BugFix](sort) use correct agg function when using 2 phase sort for agg table (#16185) 2023-02-01 20:07:43 +08:00
0842aa2947 [Fix](MTMV)Support master and follower change in multi fe for mtmv (#16149)
Support master and follower change in multi FE for MTMV.

This PR fixes the following issues:

1. Start the MTMV scheduler only on the master node; if the master changes to a follower, the scheduler is stopped.
2. Fix a double meta write.
3. Rename some edit log functions and variables.
4. If an MV has both a PeriodicalJob and an immediate job, and the PeriodicalJob will be triggered right away, the scheduler ignores the immediate job.
5. Fix expiration-time bugs, and make sure the cleanup happens on all FEs.
6. Change the cleanerScheduler interval from 1 day to 1 minute.
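Items 1 and 4 describe two small scheduling rules. A minimal sketch under stated assumptions: the class and method names and the one-second "right now" window are all hypothetical and are not the FE's MTMV job manager API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MvJobs:
    # Hypothetical model of the jobs registered for one materialized view.
    periodic_next_fire: Optional[float] = None  # epoch seconds; None means no PeriodicalJob
    immediate_pending: bool = False


@dataclass
class MtmvScheduler:
    # Hypothetical scheduler, for illustration only.
    running: bool = False
    jobs: Dict[str, MvJobs] = field(default_factory=dict)

    def on_role_change(self, is_master: bool) -> None:
        # Item 1: run only on the master; stop when demoted to follower.
        self.running = is_master

    def jobs_to_run(self, now: float, window: float = 1.0) -> List[str]:
        if not self.running:
            return []
        due = []
        for mv_name, job in self.jobs.items():
            periodic_due = (job.periodic_next_fire is not None
                            and job.periodic_next_fire <= now + window)
            if periodic_due or job.immediate_pending:
                due.append(mv_name)  # the MV is refreshed at most once per tick
            # Item 4: if the PeriodicalJob fires anyway, the pending immediate
            # job is dropped instead of triggering a second refresh.
            job.immediate_pending = False
        return due


scheduler = MtmvScheduler()
scheduler.on_role_change(is_master=True)
scheduler.jobs["mv1"] = MvJobs(periodic_next_fire=100.0, immediate_pending=True)
print(scheduler.jobs_to_run(now=100.0))  # ['mv1']
scheduler.on_role_change(is_master=False)
print(scheduler.jobs_to_run(now=200.0))  # []
```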
2023-02-01 20:02:46 +08:00
f14c62b274 [enhance](Nereids): polish code. (#16309) 2023-02-01 19:41:10 +08:00
f8513dee2a [fix](Nereids): fix regression test in A-F.out (#16260) 2023-02-01 18:19:40 +08:00
d224624bbe [improvement](session variable)Add enable_file_cache session variable (#16268)
Add the enable_file_cache session variable so that the file cache can be disabled without restarting BE.
2023-02-01 18:15:03 +08:00
4e92f63d7b [Fix](Load) Disable for the developer to import fast json in fe (#16235) 2023-02-01 16:32:11 +08:00
bf16228851 [fix](hashjoin) join produce blocks with rows larger than batch size (#16166)
* [fix](hashjoin) join produce blocks with rows larger than batch size

* fix
2023-02-01 16:02:31 +08:00
aaae1497cd [Refactor](function) opt the exec of function with null column (#16256) 2023-02-01 15:56:31 +08:00
Pxl
ca73c60442 [Chore](build) enable ignored-qualifiers check (#16196)
enable ignored-qualifiers check
2023-02-01 15:15:59 +08:00
e3c8fffd99 [function](round) fix decimal scale for scale not specified (#15541) 2023-02-01 14:58:48 +08:00
72a05a4358 [Bug](date) remove MinuteFloor/MinuteCeil for datev2 (#16247) 2023-02-01 14:57:51 +08:00
Pxl
1b99746355 [Bug](function) enhance esquery error msg && forbid to_quantile_state #16274
Temporarily forbid to_quantile_state to avoid a core dump, pending the implementation in [Feature] support QuantileState in vectorized engine #15868.
2023-02-01 14:06:09 +08:00
1c7c6b2f44 [improve](file cache) rename the var QueryContext to QueryFileCacheContext (#16272) 2023-02-01 14:05:00 +08:00
ba026b6e99 [datev2](function) make function nullable DEPEND_ON_ARGUMENT (#16159) 2023-02-01 13:57:43 +08:00
dbd1dfb64c [Bug](date) fix BE crash if month_floor 's argument is null (#16281) 2023-02-01 12:25:57 +08:00
95d7c2de26 [Refactor](function) Rewrite the function elt (#16287) 2023-02-01 11:17:06 +08:00
17bec356a3 [Bug](decimalv3) always use decimalv3 for show create table (#16295) 2023-02-01 09:54:42 +08:00
0b1202051e [fix](Nereids): remove error regression case in nereids_p0 #16280
Signed-off-by: xiejiann <jianxie0@gmail.com>
2023-02-01 08:52:41 +08:00
cd457312e4 [Enhancement](grouping) Add a switch for users to force using alias name in group by and having clause (#15748) 2023-01-31 23:46:31 +08:00
c63a960df6 [fix](planner) create view generate wrong sql when sql contains multi count distinct (#16092)
If the SQL in CREATE VIEW contains more than one COUNT DISTINCT and column names are written explicitly, the generated SQL contains the function multi_count_distinct.
That SQL cannot be analyzed, so every query that uses the view fails.
2023-01-31 23:42:53 +08:00
6470ae58ea [enhancement](config) remove config load_process_max_memory_limit_bytes (#15686) 2023-01-31 21:36:34 +08:00
934f2de8da [fix](inverted index) fix some bug about fulltext match query with compound conditions (#16226) 2023-01-31 21:34:30 +08:00
ca7eb94f23 [improvement](agg-function) Increase the maximum number of agg function parameters (#15924) 2023-01-31 21:03:50 +08:00
f798f60afd [Fix](Nereids) fix incorrect pushdown of cast expressions in hash join conjuncts (#16216) 2023-01-31 20:02:31 +08:00
644efb6437 [enhancement](lock) print table lock owner when try-lock fails (#16186) 2023-01-31 18:21:18 +08:00
30915c8626 [Bug](regression-framework) fix regression framework throw strange exception (#16273)
fix regression framework throw strange exception
2023-01-31 16:52:19 +08:00
e7cd85f147 [feature](Nereids): generate physical plan in DPhyp (#15264) 2023-01-31 14:16:25 +08:00
00a598a839 [feature](cooldown) Decouple storage policy and resource (#15873) 2023-01-31 14:13:47 +08:00
a8a29427f6 [fix](multi catalog)Collect decimal and date type min max statistic value (#16262)
The min and max values of decimal and date columns in Hive external tables were incorrect;
this PR parses the min/max values from HMS correctly.
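A sketch of decoding such statistics, assuming the Hive Metastore Thrift representation: decimal stats carry an `unscaled` big-endian two's-complement byte array plus a `scale`, and date stats carry `daysSinceEpoch`. The helper names are hypothetical; this is not the FE's parsing code.

```python
import datetime
from decimal import Decimal


def decode_hms_decimal(unscaled: bytes, scale: int) -> Decimal:
    # unscaled: big-endian two's-complement integer (as produced by Java BigInteger.toByteArray).
    value = int.from_bytes(unscaled, byteorder="big", signed=True)
    return Decimal(value).scaleb(-scale)


def decode_hms_date(days_since_epoch: int) -> datetime.date:
    # Dates are stored as a day count relative to 1970-01-01.
    return datetime.date(1970, 1, 1) + datetime.timedelta(days=days_since_epoch)


print(decode_hms_decimal(b"\x30\x39", 2))  # 123.45
print(decode_hms_decimal(b"\xcf\xc7", 2))  # -123.45
print(decode_hms_date(19388))              # 2023-01-31
```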
2023-01-31 11:58:56 +08:00
cb43010d60 [docs](kms) Add faq doc for accessing kms hdfs jce invalid key size issue. (#16264) 2023-01-31 11:30:58 +08:00
471db80f69 [Bug](date) Fix invalid date (#16205)
Issue Number: close #15777
2023-01-31 10:08:44 +08:00
a7b030778a [fix](sort) fix heap-use-after-free error if sort with limit and is spilled (#16267) 2023-01-31 09:59:03 +08:00