Commit Graph

1584 Commits

Author SHA1 Message Date
52d25f41a4 [feature](multi-catalog) Rename multi-catalog config 'specified_database_list' to 'include_database_list', and introduce new multi-catalog config 'exclude_database_list' (#18834)
In our scenario, we need to specify databases that should be excluded from synchronization to Doris,
such as databases that store temporary tables.
Since #17803 introduced `specified_database_list` to specify the databases to include,
this PR introduces a new config, `exclude_database_list`, to specify the databases to exclude,
and renames `specified_database_list` to `include_database_list` for naming symmetry.

When `include_database_list` and `exclude_database_list` specify overlapping databases, `exclude_database_list` takes precedence over `include_database_list`.
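
The precedence rule can be illustrated with a minimal Python sketch. This is not the catalog code from the PR: the function name `filter_databases` is hypothetical, and treating an empty include list as "all databases" is an assumption made for the example.

```python
def filter_databases(all_dbs, include_database_list=(), exclude_database_list=()):
    """Return the databases to synchronize, preserving input order.

    Assumption: an empty include list means "include everything".
    """
    include = set(include_database_list)
    exclude = set(exclude_database_list)
    selected = []
    for db in all_dbs:
        # Exclusion is checked first, so it wins when both lists name a database.
        if db in exclude:
            continue
        if include and db not in include:
            continue
        selected.append(db)
    return selected


dbs = ["sales", "tmp_stage", "hr"]
print(filter_databases(dbs,
                       include_database_list=["sales", "tmp_stage"],
                       exclude_database_list=["tmp_stage"]))
```

Here `tmp_stage` appears in both lists and is dropped, because the exclude check runs before the include check.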
2023-05-04 09:30:02 +08:00
7652d8649b [regression](nereids) check tpc-h 1G/500G/1T plan if backend_num == 1 #18848
The cases in nereids_tpch_shape_sf1_p0, nereids_tpch_shape_sf500_p0 and nereids_tpch_shape_sf1000_p0 are only valid for a single-BE environment.
2023-05-04 08:55:06 +08:00
9d18be9dd3 [doc](thrift) update doc for thrift 0.16 (#19217)
* 1

update doc for thrift 0.16
2023-05-02 16:00:10 +08:00
a978be32a6 [fix](schema_change) remove shadow prefix of schema for tablesink (#18822)
LSC updates a tablet's schema on write. BE optimizes adding columns via linked schema change and
detects added columns by comparing column names, e.g. if a new column's name is not found in the old schema,
then it is a newly added column.

When a table is undergoing a schema change, the __doris_shadow_ prefix is added to the names of the columns in the shadow index.
Writes during the schema change then carry a schema with __doris_shadow_ names to BE.
If the schema change request arrives at BE after those writes, BE handles it as an add-column schema change because
the __doris_shadow_ names are not in the base tablet.
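
The misdetection can be reproduced with a small Python sketch of name-based column comparison. The helpers `added_columns` and `strip_shadow_prefix` are hypothetical illustrations, not BE functions; only the `__doris_shadow_` prefix comes from the commit message.

```python
SHADOW_PREFIX = "__doris_shadow_"


def added_columns(old_schema, new_schema):
    """Name-based detection: a column absent from the old schema counts as newly added."""
    old = set(old_schema)
    return [col for col in new_schema if col not in old]


def strip_shadow_prefix(schema):
    """Remove the shadow prefix so names match the base tablet again."""
    return [col[len(SHADOW_PREFIX):] if col.startswith(SHADOW_PREFIX) else col
            for col in schema]


base = ["k1", "v1"]
shadow = [SHADOW_PREFIX + "k1", SHADOW_PREFIX + "v1"]

# With the prefix, every shadow column looks like a newly added column.
print(added_columns(base, shadow))
# After stripping the prefix, no column is misdetected as added.
print(added_columns(base, strip_shadow_prefix(shadow)))
```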
2023-04-30 22:46:36 +08:00
da4de37dec [feature-wip](mv lifecycle) separate life cycle of base table and its materialized views (#19210)
Support the related syntax and add regression test cases.

---------

Co-authored-by: yzy <yzy@nanfeng_yzy@163.com>
2023-04-30 17:42:02 +08:00
8eab20d3df [bugfix](low cardinality) wrong cached code leads to wrong query results when there are many null pages (#19221)
Sometimes the dict is not initialized when a comparison predicate runs here. For example, when a full page is null, the reader skips reading it, so the dictionary is not initialized. The cached code is wrong in this case, because the following pages may not be null and the dict will have items later.
As a result, queries on a dict string column return wrong results if there are many null values in the column.
I also added some regression tests for equality, greater-than and less-than queries on dict columns.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-29 21:28:41 +08:00
f2b15c03ca [fix]disable enable_resource_group for regression test (#19206)
When a regression test sets enable_resource_group = true, the setting is shared by other test cases and may cause them to fail.
So we should not set it to true until it has been fully tested.
2023-04-29 14:47:50 +08:00
8c6ccc092a [fix](test) fix 2 unstable test (#19220) 2023-04-29 14:42:47 +08:00
c74c2a4f8e [fix](Metadata tvf) Metadata TVF supports reading the specified columns from FE (#19110) 2023-04-29 00:06:08 +08:00
fd3c132d91 [enhancement](test) split large data of p2 cases (#19186) 2023-04-28 18:18:25 +08:00
aef9355cd3 [feature-wip](partial update) PART1: support basic partial write (#17542) 2023-04-28 17:17:57 +08:00
718297d3c1 [test](statistics) add p0 test of sampling statistics (#19176)
1. Added test p0 for sampling collection statistics
2. Modify the uniqueKeys of table analysis_jobs for deletion based on relevant conditions
3. Solve the problem that incremental statistics p0 is less stable
2023-04-28 15:50:05 +08:00
5e9c0c3500 [Enhancement](data-type) add FE config to prohibit create date and decimalv2 type (#19077)
* prohibits date and decimal type

* add config in test
2023-04-28 11:31:51 +08:00
8288494e8e [fix](planner) AnalyticEvalNode should call child's getOutputTupleIds method to get the correct output tuple id (#19163) 2023-04-27 20:04:51 +08:00
484612a0af [opt](statistics) optimize Incremental statistics collection and statistics cleaning (#18971)
This pr mainly optimizes the following items:
- the collection of statistics: clear up invalid historical statistics before collecting them, so as not to affect the final table statistics.
- the incremental collection of statistics: in the case of incremental collection, only the corresponding partition statistics need to be collected.

TODO: Supports incremental collection of materialized view statistics.
2023-04-27 11:51:47 +08:00
708c1850d9 [test](hll) add test case for hll_raw_agg (#19127) 2023-04-27 11:33:49 +08:00
bab34e9e7c [fix](test) change to strong pwd to avoid test failure (#19042)
Some other test cases may set the password policy to "strong",
which may cause this case to fail.
2023-04-27 11:19:07 +08:00
e76b3a316f [Bug](mysql proto) fix binary proto with dynamic mode (#19055)
Dynamic mode is used when serializing an array type to the MySQL row buffer. When the binary row format is combined with dynamic mode, serialization goes wrong and produces an invalid binary row format.
2023-04-27 11:18:01 +08:00
20395ce501 [feature](array_function): add support for array_cum_sum function (#18231) 2023-04-27 09:57:13 +08:00
1afa7c786f [test](regression) add test case for bucket shuffle of datetime column (#19088) 2023-04-27 09:05:32 +08:00
925efc1902 [bug](map-type)fix some bugs in map and map element function (#18935)
fix some bugs in map and map element function.
2023-04-26 22:10:15 +08:00
aacc075f09 [fix](planner) SetOperationNode's slots' nullability calculation is wrong (#19108)
SetOperationNode's slots' nullability should consider slot info from all children, even if some children are EmptyResultSet.
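
As a rough illustration in Python, and not the planner's actual code, the output nullability can be derived slot by slot from every child. The rule that an output slot is nullable when the matching slot of any child is nullable is an assumption of this sketch, as is the hypothetical name `output_nullability`.

```python
def output_nullability(children_slots):
    """children_slots holds one list of per-slot nullable flags for each child.

    Assumption: an output slot is nullable if the matching slot of ANY child
    is nullable, including children that produce an empty result set.
    """
    if not children_slots:
        return []
    width = len(children_slots[0])
    if any(len(child) != width for child in children_slots):
        raise ValueError("all children must expose the same number of slots")
    return [any(child[i] for child in children_slots) for i in range(width)]


# The second child stands for an empty result set whose first slot is nullable.
print(output_nullability([[False, False], [True, False]]))
```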
2023-04-26 21:18:37 +08:00
e83d0d9b6a [opt](Nereids) forbid some bad pattern aggregate in AggregateStrategy (#18877)
Since stats derivation and cost estimation for aggregation are not yet accurate,
this PR removes some aggregate patterns that usually perform poorly:
1. one-stage agg after exchange. This pattern is good only when processing very few rows.
2. three-stage distinct agg with gather middle merge.
2023-04-26 20:01:35 +08:00
1ccbdee757 [FIX](map-type)fix map regress test & create mapTypeInfo without delete #19033 2023-04-26 19:03:55 +08:00
59d8aa5a6f [Fix](multi catalog)Fix Hive partition path doesn't contain partition value case bug (#19053)
Hive supports creating a partition with a specific location. In this case, the file path of the created partition may not contain the partition name and value, which causes Doris to fail to query the Hive partition.
This PR fixes the bug.
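
One way to picture the problem is the Python sketch below. It is an assumption-laden illustration, not the fix made in the PR: the helper names are hypothetical, and falling back to values recorded in the metastore is an assumed strategy.

```python
def partition_values_from_path(path, partition_keys):
    """Parse key=value path segments; return None if any partition key is missing."""
    found = {}
    for segment in path.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        if sep and key in partition_keys:
            found[key] = value
    if all(key in found for key in partition_keys):
        return [found[key] for key in partition_keys]
    return None


def resolve_partition_values(path, partition_keys, metastore_values):
    """Prefer values parsed from the path; otherwise use the metastore values (assumed fallback)."""
    parsed = partition_values_from_path(path, partition_keys)
    return parsed if parsed is not None else list(metastore_values)


# Default layout: the path carries the partition value.
print(resolve_partition_values("/warehouse/t/dt=2023-04-26", ["dt"], ["2023-04-26"]))
# Custom location: the path has no dt=... segment, so the fallback is used.
print(resolve_partition_values("/custom/location", ["dt"], ["2023-04-26"]))
```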
2023-04-26 17:18:51 +08:00
0c9fb7297e [fix](regression) mv segcompaction_p1 to segcompaction_p2 (#18806)
segcompaction_p1 contains fairly large load jobs, which exceed the
memory limit or time out in the pipeline under such heavy load.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-04-26 15:34:46 +08:00
6356146274 [Fix](Nereids) fix nereids fold failed by be return null exception (#19013)
```sql
select if(
    date_format(CONCAT_WS('', '9999-07', '-26'), '%Y-%m') = DATE_FORMAT(curdate(), '%Y-%m'),
    curdate(),
    DATE_FORMAT(DATE_SUB(month_ceil(CONCAT_WS('', '9999-07', '-26')), 1), '%Y-%m-%d')
) 
```
This query returns null. When constructing the new children of if(), we found that entries with an index greater than 0 in the result map do not replace those in the const map, which is caused by an incorrect value assignment in the code.
2023-04-26 14:57:45 +08:00
d3a0b94602 [feature](stats) Support to kill analyze #18901
1. Report an error if analyze jobs are submitted when the stats table is not available
2. Support kill analyze
3. Support cancel sync analyze
2023-04-26 14:23:44 +08:00
50d9f35f63 [fix](planner) NPE when use ctas to create table (#18973)
This is caused by the exprs in orderbyelements not being analyzed.
2023-04-26 14:12:28 +08:00
7a786c3b09 [fix](Nereids) fix bucket shuffle plan and cost model bugs and add new function add_months (#18836)
fix
1. fix varchar(1) compare to varchar(2) bug
2. fix bucket shuffle join's cost model bug

feature:
1. support add_months function
2023-04-26 13:52:44 +08:00
d037938a4c [vectorized](function) fix incorrect result of year_floor (#19006) 2023-04-26 11:39:22 +08:00
5fd6d8ebd4 [fix](function) Support more behaviors of cast time in MySQL 2023-04-26 07:49:54 +08:00
c993964a88 [Bug](delete) fix the delete ignore char case (#18714) 2023-04-26 07:30:44 +08:00
9c25b514f5 [fix](doc) fix jsonb_extract doc (#19059)
The malformed doc causes FE to fail to start.

1. docs under sql-manual need strict format.
2. Change the rule of github checks, to run FE ut if docs under sql-manual is changed
2023-04-25 20:01:51 +08:00
17b59df8dd [fix](function) Array_map compared offset rows one by one (#18406)
When array_map takes multiple columns, it is not enough for the nested data row counts to be equal; the offsets of the columns must also be equal to each other.
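
A small Python sketch shows why the nested row count alone is not sufficient. The representation of an array column as an `(offsets, nested_data)` pair, where `offsets[i]` marks the end of row `i`, and the function name are assumptions for illustration, not the BE data structures.

```python
def array_columns_aligned(columns):
    """columns: list of (offsets, nested_data) pairs.

    Returns True only if all columns share the same nested row count
    AND the same per-row offsets.
    """
    first_offsets, first_data = columns[0]
    for offsets, data in columns[1:]:
        if len(data) != len(first_data):
            return False
        if list(offsets) != list(first_offsets):
            return False
    return True


a = ([2, 3], [1, 2, 3])  # rows: [1, 2], [3]
b = ([1, 3], [4, 5, 6])  # rows: [4], [5, 6]
c = ([2, 3], [7, 8, 9])  # rows: [7, 8], [9]

print(array_columns_aligned([a, b]))  # same nested count, different offsets
print(array_columns_aligned([a, c]))  # same nested count and same offsets
```

Columns `a` and `b` both hold three nested values, yet their rows have different lengths, so a check on the nested count alone would wrongly accept them.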
2023-04-25 19:12:19 +08:00
8ea69ca11c [refactor](nereids) do not use in_filter in pipeline mode (#19028)
1. in pipeline mode, the in_or_bloom filter is replaced by a bloom filter
2. do not set broadcast row limit
2023-04-25 19:02:12 +08:00
d5c82b2ea0 [optimize](regression case) Optimizing some regression case of inverted index (#19032) 2023-04-25 15:35:56 +08:00
61b7a52444 [Enhancement](multi-catalogs) Use decimal V3 type in multi-catalogs module. (#18926)
1. Use decimal V3 type in JDBC and Iceberg tables.
2. Fix hdfs TVF decimal V3 type and regression test.
2023-04-25 14:49:40 +08:00
a4a85f2476 [feat](stats) Return job id for async analyze stmt (#18800)
1. Return job id from async analysis
2. Sync analysis jobs don't save to analysis_jobs anymore
2023-04-25 14:43:54 +08:00
39d66ca2c6 [fix](parquet) hasn't initialize select vector when number of nested values equals zero (#18953)
Fix bug when reading array type in parquet file:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]Read parquet file xxx failed,
reason = [IO_ERROR]Decode too many values in current page
```
When reading normal columns, `ScalarColumnReader::_read_values` calls `ColumnSelectVector::set_run_length_null_map` to initialize the select vector, but `ScalarColumnReader::_read_nested_column` does not, which makes the number of values wrong.
The situation in which this error occurs is an extreme case: the column pages have remaining values to be read,
but all of them are null at the ancestor level, so there is no actual read operation, only skipping of null values at the ancestor level.
2023-04-25 14:21:33 +08:00
d555bae290 [Bug](serde) fix serialize column to jsonb when meet boolean and decimal_v3 (#19011)
* [Bug](serde) fix serialize column to jsonb when meet boolean and decimal_v3

* add comment to explain why use uint8
2023-04-25 10:48:13 +08:00
171a194070 [minor](regression) fix unstable test case (#19018)
* [minor](regression) fix unstable test case

* update
2023-04-25 09:09:24 +08:00
93c48f2bb0 [fix](regression) fix show create table in_memory = false test result error #19022 2023-04-25 09:04:59 +08:00
72632b1e32 [improvement](regression-test) add max_failure_num to skip tests when too much failure #19003 2023-04-25 09:03:36 +08:00
207c827cdb [fix](test) fix result of CHARACTER_OCTET_LENGTH in . (#18896) 2023-04-25 08:42:54 +08:00
efebb3d21e [fix](schema) fix show create table get wrong random distribution info (#18895)
---------

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-04-24 23:33:42 +08:00
6bf51150f3 [fix](nereids) remove unnecessary project above scan node (#18920)
1. remove unnecessary project node above scan node.
2. fix a bug where an IN subquery may be recognized as a scalar subquery
3. fix some Quantile related functions' return type bug
2023-04-24 13:58:57 +08:00
Pxl
1f9450e0f7 [Chore](case) add some regression-test case about materialized-view #18946 2023-04-24 11:36:56 +08:00
ab2a6864bc [function](json) Json unquote (#18037) 2023-04-24 10:33:29 +08:00
45d0f53529 [Regression-test](Export) add regression test for export #18897 2023-04-23 19:43:22 +08:00