Commit Graph

1736 Commits

Author SHA1 Message Date
4b5cea1ef8 [enhancement](fix) change ordinary type null value to \N, complex type null value to null (#24207) 2023-09-16 21:46:42 +08:00
88adab3114 [fix](Nereids): fix be core when array_map is not nullable (#24488)
fix be core when array_map is not nullable
2023-09-16 20:39:15 +08:00
ed8db3727c [feature](partial update) support MOW partial update for insert statement (#21597) 2023-09-16 17:11:59 +08:00
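For context on the partial-update insert above, a minimal hedged sketch of how it might be used, assuming the session variable is `enable_unique_key_partial_update` (table and column names here are hypothetical):
```
-- assumed session variable for partial update on a merge-on-write unique key table
SET enable_unique_key_partial_update = true;
-- insert only a subset of columns; unspecified columns of existing keys keep their old values
INSERT INTO user_profile (user_id, last_login_time)
VALUES (1001, '2023-09-16 17:00:00');
```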
4dad7c94da [fix](orc) fix the count(*) pushdown issue in orc format (#24446)
Previously, when querying a Hive table in ORC format whose file is split,
the result of select count(*) could be a multiple of the real row count.

This is because the number of rows should be obtained after ORC stripe pruning;
otherwise the result may be wrong.
2023-09-16 09:57:39 +08:00
1c142309a6 [refactor](jdbc catalog) refactor JdbcFunctionPushDownRule (#23826)
1. Switch from string-based function matching to Expr-based matching
2. Replace the `nvl` function with `ifnull` when pushing down to MySQL
3. Adapt ClickHouse's `from_unixtime` function for pushdown
4. Non-function filters can still be pushed down when `enable_func_pushdown` is set to false
2023-09-15 22:16:07 +08:00
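For illustration of the pushdown rule above, a hedged example of the kind of predicate affected (catalog, database, table, and column names are hypothetical):
```
-- on a MySQL table behind a JDBC catalog, a filter like this is expected to be
-- pushed down with nvl rewritten to ifnull after this refactor
SELECT id, name
FROM mysql_jdbc_catalog.demo_db.users
WHERE nvl(age, 0) > 18;
```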
ba4c738ac7 [Feature](Nereids) support values table (#23121)
Support `insert into table values(...)` for Nereids.
SQL like:
insert into t values(1, 2, 3)
insert into t values(1 + 1, dayofweek(now()), 4), (4, 5, 6)
insert into t values('1', '6.5', cast(1.5 as int))
2023-09-15 21:46:37 +08:00
07d4769134 [fix](bitmap) fix coredump of bitmap_from_array caused by null array literal (#24404) 2023-09-15 18:36:33 +08:00
dc0c39f1d8 [Enhance](external) change hive docker to host network and add hive case (#24401)
1. Change the external Hive docker network mode from bridge to host to support external tests against a multi-node Doris cluster
2. Add more Hive test data in various formats
3. Add a test case for Hive
2023-09-15 17:46:24 +08:00
699023069d [regression](lateral view) add test case for explode_bitmap (#24421) 2023-09-15 17:30:26 +08:00
7fd72351f9 [fix](agg) window_funnel compatibility issue with multi backends (#24385) 2023-09-15 17:22:47 +08:00
6dbe07bd3b [Enhancement](inverted index) use conjunction query to accelerate fulltext equal query (#24373) 2023-09-15 15:34:57 +08:00
08740b47cd [FIX](decimalv3) fix decimalv3 value with leading zeros (#24416)
Currently we hit an error when handling leading zeros in a decimal value: with type_precision >= precision the value can overflow and the DCHECK fails. So when leading zeros are present we should only require type_precision > precision to get the right value.
2023-09-15 13:35:20 +08:00
29fe87982f [improve](outfile) add file_suffix options for outfile (#24334) 2023-09-15 12:58:41 +08:00
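A hedged sketch of the new outfile option above (path, S3 property names, and values are placeholders, not the exact syntax of this commit):
```
SELECT * FROM sales
INTO OUTFILE "s3://my-bucket/export/result_"
FORMAT AS CSV
PROPERTIES (
    "s3.endpoint" = "http://s3.example.com",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk",
    "s3.region" = "my-region",
    "file_suffix" = ".csv"
);
```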
c5e7f55b63 [performance](executor) optimize time_round function (#23058)
optimize time_round function
2023-09-15 10:49:22 +08:00
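For context, a hedged example of the time_round function family this optimizes (literal values and results are illustrative):
```
-- round a datetime down or up to a unit boundary
SELECT hour_floor('2023-09-15 10:49:22');   -- 2023-09-15 10:00:00
SELECT minute_ceil('2023-09-15 10:49:22');  -- 2023-09-15 10:50:00
-- round down to the nearest 5-day period
SELECT day_floor('2023-09-15 10:49:22', 5);
```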
dbd7733e02 [feature](regression) Add p2 level test for schema change (#20243) 2023-09-15 10:39:07 +08:00
00bb32cfc0 [opt](nereids) enable two phase partition topn opt #23870
Enable the two-phase partition topn optimization instead of the original full sort at the second phase.
E.g., a partial plan of TPC-DS q67 is shown below; the full sort after the exchange hurts performance, especially when the window partition column's NDV is very high and the number of windows is huge.

------PhysicalTopN
--------filter((rk <= 100))
----------PhysicalWindow
------------PhysicalQuickSort
--------------PhysicalDistribute
----------------PhysicalPartitionTopN
------------------PhysicalProject

In this scenario, the second-phase full sort can be transformed into a global PhysicalPartitionTopN, reducing the cost of the full sort. The plan is optimized to the following:

------PhysicalTopN
--------filter((rk <= 100))
----------PhysicalWindow
------------PhysicalPartitionTopN
--------------PhysicalDistribute
----------------PhysicalPartitionTopN
------------------PhysicalProject
2023-09-15 10:30:34 +08:00
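A hedged sketch of the query shape this optimization targets (table and column names are hypothetical):
```
-- top 100 rows per category; the rk <= 100 filter above the window is the
-- pattern that PhysicalPartitionTopN is derived from
SELECT *
FROM (
    SELECT item_sk, sum_sales,
           RANK() OVER (PARTITION BY category ORDER BY sum_sales DESC) AS rk
    FROM aggregated_sales
) t
WHERE rk <= 100;
```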
c5ef6cfea2 [fix](Table-Valued Function) fix be core when user specified empty column_separator using hdfs tvf (#24369) 2023-09-14 23:19:48 +08:00
5ba1f62da8 [enhancement](Nereids) make stats unchanged (#23737)
Keep stats unchanged when exploring plans.
2023-09-14 22:18:54 +08:00
d4756d3118 [feature](Nereids): fold Cast(s as date/datetime) on FE (#24353)
cast("20210101" as Date) -> DateLiteral(2021, 1, 1)
2023-09-14 22:08:26 +08:00
f61e6483bf [enhancement](broker-load) support compress type for old broker load, and split compress type from file format (#23882) 2023-09-14 21:42:28 +08:00
eb65cc6954 [Fix](nereids) eliminate_outer_join regression case fix #24262 2023-09-14 18:22:17 +08:00
3ee89aea35 [Feature](merge-on-write)Support ignore mode for merge-on-write unique table (#21773) 2023-09-14 18:03:51 +08:00
d035a58374 [feature](nereids) support unnest subquery in LogicalOneRowRelation (#24355)
select (select 1);
before : 
ERROR 1105 (HY000): errCode = 2, detailMessage = Subquery is not supported in the select list.
after:
mysql> select (select 1);
+---------------------------------------------------------------------+
|  (SCALARSUBQUERY) (LogicalOneRowRelation ( projects=[1 AS `1`#0] )) |
+---------------------------------------------------------------------+
|                                                                   1 |
+---------------------------------------------------------------------+
1 row in set (0.61 sec)
2023-09-14 17:22:08 +08:00
4fbb25bc55 [Enhancement](function) Support date_trunc(date) and use it in auto partition (#24341)
Support date_trunc(date) and use it in auto partition
2023-09-14 16:53:09 +08:00
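A hedged example of the extended function above, assuming Doris's `date_trunc(expr, unit)` signature:
```
-- previously only DATETIME input was supported; a DATE argument now works as well
SELECT date_trunc(CAST('2023-09-14' AS DATE), 'month');  -- expected: 2023-09-01
```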
b6d7116dea [fix](datetime) fix compare of DatetimeLiteral (#24343)
fix compare of DatetimeLiteral
2023-09-14 16:51:50 +08:00
ccba5a729a [fix](planner)cast string to float like type should return NULL literal if it fails (#24222) 2023-09-14 15:59:20 +08:00
268c867679 [Improve](serde)replace function_cast from_string to serde (#24087)
Currently stream load cannot handle columns whose type is map/array nested in map/array.
Serde can do this now, so we can replace the cast with it.
Note: if an item inside a complex-type value is empty we simply return an error instead of making up a default value, because we cannot yet define a correct default for complex types.
2023-09-14 13:53:16 +08:00
d23d1870a2 [fix](Nereids): fix regression-test (#24329) 2023-09-14 13:21:50 +08:00
ed108d48fa [fix](inverted index) fix query using char filter (#24268) 2023-09-14 11:42:47 +08:00
46f5988245 [fix](Nereids) set operation children output order not same (#24060)
We generate a project for each child of a set operation to ensure the output
order of the children is not changed. However, some rules, such as
PushDownProjectThroughLimit, could remove these projects unintentionally.
When that happens, the column order is wrong and leads to a BE core dump.
This PR uses a new variable in SetOperation to save the output order of the
children, so the children's output order can change without affecting the
SetOperation at all.
2023-09-14 11:09:58 +08:00
9b7f041bea [Bug](function) fix explode_json_array_int can't handle min/max values (#24284)
The value parsed from the JSON string may be beyond the max/min of Int64,
so add a check to clamp it and return the max/min of Int64.
2023-09-14 09:20:59 +08:00
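A hedged illustration of the boundary case above (the table name is hypothetical; clamping behavior is as described in the commit):
```
-- out-of-range elements are clamped to Int64 max/min instead of crashing the BE
SELECT e
FROM example_tbl
LATERAL VIEW explode_json_array_int('[9223372036854775808, -9223372036854775809]') tmp AS e;
```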
93a9f1007c [fix](Nereids): fix regression test (#24336)
Fix the regression test broken by #23842.
2023-09-14 01:55:09 +08:00
11afd321cb [fix](es catalog) fix issue with select and insert from es catalog core (#24318)
Issue Number: close #24315

The root cause of this issue is that Elasticsearch's long type allows inserting floats and strings. Doris did not handle these cases when doing type conversion. The current strategy is to take the integer before the decimal point if a float or string is found.
2023-09-13 23:07:31 +08:00
d5b490b2e7 [test](regression) add file cache regression test (#24192)
Add a file cache regression test for TPC-H 1G on ORC & Parquet formats.
TPC-H will run 3 times:
1. running without file cache
2. running with file cache for the first time
3. running with file cache for the second time

The file cache configuration is already added in `be/conf/be.conf` in the regression test environment, and the available capacity is 100MB. After running the TPC-H 1G test, the metrics introduced by https://github.com/apache/doris/pull/19177 look like:
```
doris_be_file_cache_normal_queue_curr_size{path="/mnt/datadisk1/gaoxin/file_cache"} 92808933
doris_be_file_cache_normal_queue_curr_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 59
doris_be_file_cache_normal_queue_max_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 102400
doris_be_file_cache_normal_queue_max_size{path="/mnt/datadisk1/gaoxin/file_cache"} 89128960
doris_be_file_cache_removed_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 2132
doris_be_file_cache_segment_reader_cache_size{path="/mnt/datadisk1/gaoxin/file_cache"} 54
```
2023-09-13 22:59:01 +08:00
9847f7789f [Feature](Export) Export sql supports exporting data of views and external tables (#24070)
Previously, EXPORT only supported exporting OLAP tables;
this PR adds support for exporting views and external tables.
2023-09-13 22:55:19 +08:00
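A hedged sketch of exporting a view, following the general EXPORT syntax (view name, path, broker name, and properties are hypothetical):
```
EXPORT TABLE my_view
TO "hdfs://nameservice1/export/my_view/"
PROPERTIES (
    "format" = "parquet"
)
WITH BROKER "my_broker" (
    "username" = "user",
    "password" = "passwd"
);
```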
d7e5f97b74 [feature](Nereids): eliminate AssertNumRows (#23842) 2023-09-13 22:24:02 +08:00
07dd6830e8 [pipelineX](refactor) add union node in pipelineX (#24286) 2023-09-13 20:39:58 +08:00
231038f050 [fix](planner)allow infer predicate for external table (#24227)
CREATE EXTERNAL TABLE `dim_server` (
    `col1` varchar(50) NOT NULL,
    `col2` varchar(50) NOT NULL
    )
create view ads_oreo_sid_report
    (
    `col1` ,
        `col2`
    )
    AS
    select
    tmp.col1,tmp.col2
    from (
    select 'abc' as col1,'def' as col2
    ) tmp
    inner join dim_server ds on tmp.col1 = ds.col1  and tmp.col2 = ds.col2;

select * from ads_oreo_sid_report where col1='abc' and col2='def';

Before this PR, col1='abc' and col2='def' could not be pushed down to dim_server. Now the two predicates can be pushed down to the external (ODBC) table.
2023-09-13 17:22:39 +08:00
335064f897 [feature](Nereids) add lambda argument and array_map function (#23598)
add array_map function

SELECT ARRAY_MAP(x->x+1, ARRAY(87, 33, -49))
+------------------------------------------------------+
| array_map([x] -> (x + 1), x#1 of array(87, 33, -49)) |
+------------------------------------------------------+
| [88, 34, -48]                                        |
+------------------------------------------------------+
2023-09-13 14:24:16 +08:00
e30c3f3a65 [fix](csv_reader) fix bug that reading garbled files caused BE crash (#24164)
Fix a bug where reading garbled files caused a BE crash.
2023-09-13 14:12:55 +08:00
ebe3749996 [fix](tvf) support s3/local compress_type and append regression test (#24055)
Support compress_type for the s3 and local TVFs and append regression tests.
2023-09-13 00:32:59 +08:00
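A hedged example of the option on the s3 TVF above (URI, credentials, and exact property names are placeholders and may differ by version):
```
SELECT * FROM s3(
    "uri" = "https://s3.example.com/my-bucket/data/part-0.csv.gz",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk",
    "format" = "csv",
    "compress_type" = "gz"
);
```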
9df72a96f3 [Feature](multi-catalog) Support hadoop viewfs. (#24168)
### Feature

Support hadoop viewfs.

### Test

- Regression tests: 
  - hive viewfs test.
  - tvf viewfs test.

- Broker load tested manually with broker and with HDFS.
2023-09-13 00:20:12 +08:00
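A hedged sketch of a catalog pointing at viewfs (metastore URI, cluster name, and mount-table entries are hypothetical; the mount-table keys follow standard Hadoop viewfs configuration):
```
CREATE CATALOG hive_viewfs PROPERTIES (
    "type" = "hms",
    "hive.metastore.uris" = "thrift://metastore-host:9083",
    "fs.defaultFS" = "viewfs://my-cluster",
    "fs.viewfs.mounttable.my-cluster.link./data" = "hdfs://namenode1:8020/data"
);
```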
c402d48f97 [fix](query-cache) fix query cache with empty set (#24147)
If the query result set is empty, the query cache will not cache the result.
This PR fixes it.
2023-09-12 20:11:20 +08:00
d3f1388717 [Feature](partitions) Support auto-partition (#24153)
Co-authored-by: zhangstar333 <2561612514@qq.com>
2023-09-12 15:23:15 +08:00
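A hedged sketch of the auto-partition feature above; the AUTO PARTITION syntax shown here is an assumption based on later Doris documentation, and column names are hypothetical:
```
CREATE TABLE metrics (
    event_time DATETIME NOT NULL,
    metric_value BIGINT
)
DUPLICATE KEY (event_time)
AUTO PARTITION BY RANGE (date_trunc(event_time, 'month')) ()
DISTRIBUTED BY HASH (event_time) BUCKETS 10
PROPERTIES ("replication_num" = "1");
```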
4bb9a12038 [function](bitmap) support bitmap_remove (#24190) 2023-09-12 14:52:04 +08:00
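A hedged usage example of bitmap_remove, built on the existing bitmap_from_string / bitmap_to_string helpers:
```
-- remove a single element from a bitmap
SELECT bitmap_to_string(bitmap_remove(bitmap_from_string('1,2,3'), 2));
-- expected result: '1,3'
```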
6913d68ba0 [Enhancement](merge-on-write) use delete bitmap to mark delete for rows with delete sign when sequence column doesn't exist (#24011) 2023-09-12 08:56:46 +08:00
6e28d878b5 [fix](hudi) compatible with hudi spark configuration and support skip merge (#24067)
Fix three bugs:
1. A Hudi slice may have log files only, so `new Path(filePath)` will throw errors.
2. Hive column names are lowercase only, so match column names case-insensitively.
3. Be compatible with [Spark Datasource Configs](https://hudi.apache.org/docs/configurations/#Read-Options), so users can add `hoodie.datasource.merge.type=skip_merge` in catalog properties to skip merging log files.
2023-09-11 19:54:59 +08:00
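A hedged sketch of enabling skip-merge via catalog properties (catalog name and metastore URI are hypothetical; the `hoodie.datasource.merge.type` key comes from the description above):
```
CREATE CATALOG hudi_catalog PROPERTIES (
    "type" = "hms",
    "hive.metastore.uris" = "thrift://metastore-host:9083",
    "hoodie.datasource.merge.type" = "skip_merge"
);
```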
115969c3fb [opt](nereids) improve eliminate outerjoin in cascades (#24120)
* eliminate outer join cascading
2023-09-11 19:42:05 +08:00
9c441a4a16 [feature](Nereids) support create table and ctas (#24150)
Co-authored-by: sohardforaname <organic_chemistry@foxmail.com>
2023-09-11 12:37:58 +08:00
cd13f9e8c6 [BUG](view) fix can't create view with lambda function (#23942)
Previously, the lambda function Expr did not implement the toSqlImpl() function,
so it called the parent implementation, which is not suitable for lambda functions
and caused an error when creating a view.
2023-09-11 10:04:00 +08:00