Commit Graph

1563 Commits

Author SHA1 Message Date
97fa840324 [feature](multi-catalog)support iceberg hadoop catalog external table query (#22949)
2023-08-20 19:29:25 +08:00
7c4870c371 [fix](catalog) fix hive partition prune bug on nereids (#23026) 2023-08-18 18:31:01 +08:00
419e922a69 [fix](json)Fix the bug that does not stop when reading json files (#23062)
2023-08-18 18:23:19 +08:00
1c3cc77a54 [fix](function) to_bitmap parameter parsing failure returns null instead of bitmap_empty (#21236)
* [fix](function) to_bitmap parameter parsing failure returns null instead of bitmap_empty

* add ut

* fix nereids

* fix regression-test
2023-08-18 14:37:49 +08:00
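The behavior change in 1c3cc77a54 above can be illustrated with a toy model. This is only a sketch, not Doris code: a Python `set` stands in for a bitmap, `None` stands in for SQL NULL, and the unsigned 64-bit range check is an assumption added for illustration.

```python
MAX_UINT64 = 2**64 - 1  # assumed upper bound for a bitmap element


def to_bitmap(text):
    """Toy to_bitmap: a one-element set for a valid input, None (NULL) otherwise."""
    # A parse failure maps to NULL rather than to an empty bitmap.
    if not isinstance(text, str) or not (text.isascii() and text.isdigit()):
        return None
    value = int(text)
    if value > MAX_UINT64:
        return None
    return {value}
```

With this model, `to_bitmap("42")` yields a one-element bitmap, while `to_bitmap("abc")` yields `None`, so a caller can tell a failed parse apart from a genuinely empty bitmap.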
a7771ea507 [fix](planner) fix current_timestamp param type mismatch when doing stream load (#23092)
FileLoadScanNode did not analyze the default value expr, so the target param type int32 became int8, the type of the original IntLiteral.
2023-08-18 14:28:45 +08:00
795006ea3d [fix](multi-catalog) conversion of compatible numerical types (#23113)
Hive supports schema change but doesn't rewrite the parquet file, so the physical type in the parquet file may not equal the logical type in the table schema.
2023-08-18 14:05:33 +08:00
2d96d19030 [FIX](array-func) fix array() with decimal type (#23117)
If we write SQL such as: select array(1.0,2.0,null, null,2.0)
the argument type is passed to the BE as uint8, which does not match the array() function signature for decimal and makes the BE core dump. So the cast should be done in the BE, with the null tag cast to the decimal type.
2023-08-18 12:12:50 +08:00
Pxl
59c6139aa5 [Chore](parser) fix create view failed when view contained cast as varchar (#23043)
2023-08-18 11:50:18 +08:00
d018ac8fb7 fix show grants throw NullPointerException (#22943) 2023-08-18 10:48:56 +08:00
314f5a5143 [Fix](orc-reader) Fix filling partition or missing column used incorrect row count. (#23096)
`_row_reader->nextBatch` returns the number of rows read. When ORC lazy materialization is turned on, that number includes filtered rows, so the caller must look at `numElements` in the row batch to determine how many rows were not filtered and will be filled into the block.

In this case, filling partition or missing columns used an incorrect row count, which caused the BE to crash with `filter.size() != offsets.size()` in the filter column step.

When orc lazy materialization is turned off, add `_convert_dict_cols_to_string_cols(block, nullptr)` if `(block->rows() == 0)`.
2023-08-17 23:26:11 +08:00
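The row-count mismatch described in 314f5a5143 above can be sketched with a toy reader. This is a Python model under stated assumptions, not the ORC or Doris API: `RowBatch`, `next_batch`, and `fill_block` are hypothetical names that only mirror the roles of `nextBatch` and `numElements`.

```python
from dataclasses import dataclass, field


@dataclass
class RowBatch:
    values: list = field(default_factory=list)
    num_elements: int = 0  # rows that survived the filter


def next_batch(rows, predicate, batch):
    """Toy lazy-materializing read: returns rows READ, not rows kept."""
    batch.values = [r for r in rows if predicate(r)]
    batch.num_elements = len(batch.values)
    return len(rows)


def fill_block(rows, predicate, partition_value):
    """Fill a data column plus a constant partition column for one batch."""
    batch = RowBatch()
    rows_read = next_batch(rows, predicate, batch)
    # Size the partition column from num_elements; using rows_read here
    # would make the columns of the block disagree in length.
    partition_col = [partition_value] * batch.num_elements
    return {"data": batch.values, "part": partition_col, "rows_read": rows_read}
```

For example, `fill_block([1, 2, 3, 4, 5], lambda v: v % 2 == 1, "p0")` reads five rows but produces two columns of three rows each.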
11d76d0ebe [fix](Nereids) non-inner join should not merge dist info (#22979)
1. left join should use left dist info.
2. right join should use right dist info.
3. full outer join should return ANY dist info.
2023-08-17 17:48:50 +08:00
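The three rules in 11d76d0ebe above can be written as a small decision function. This is a minimal sketch, not the Nereids implementation: distribution info is modeled as a set of column names, and the assumption that an inner join merges both sides by set union is made only for illustration.

```python
ANY = frozenset()  # stand-in for "no distribution guarantee"


def join_output_dist(join_type, left_cols, right_cols):
    """Pick the output distribution info for a join in this toy model."""
    left, right = frozenset(left_cols), frozenset(right_cols)
    if join_type == "inner":
        return left | right  # assumed merge: both sides' keys stay valid
    if join_type == "left_outer":
        return left  # unmatched rows carry NULLs on the right side
    if join_type == "right_outer":
        return right  # unmatched rows carry NULLs on the left side
    if join_type == "full_outer":
        return ANY  # either side may be NULL-extended
    raise ValueError(f"join type not covered by this sketch: {join_type}")
```

The outer-join branches reflect that the NULL-extended side no longer describes where a row lives, so its keys cannot be merged into the result.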
d7a6b64a65 [Fix](Planner) fix case function with null cast to array null (#22947) 2023-08-17 16:37:07 +08:00
e289e03a1a [fix](executor)fix no return with old type in time_round 2023-08-17 15:34:26 +08:00
a288377118 [fix](regresstion) Fix sql server external case (#23031) 2023-08-17 10:54:54 +08:00
343a6dc29d [improvement](hash join) Return result early if probe side has no data (#23044) 2023-08-17 09:17:09 +08:00
390c52f73a [Improve](complex-type) update for array/map element_at with nested complex type with local tvf (#22927) 2023-08-16 20:47:36 +08:00
4510e16845 [improvement](delete) support delete predicate on value column for merge-on-write unique table (#21933)
Previously, delete statements with conditions on value columns were only supported on duplicate tables. After we introduced the delete sign mechanism to do batch delete, a delete statement with conditions on value columns on a unique table is transformed into the corresponding insert into ..., __DELETE_SIGN__ select ... statement. However, for unique tables with merge-on-write enabled, the overhead of inserting this data can be eliminated. So this PR adds the ability to use delete predicates on value columns for merge-on-write unique tables.
2023-08-16 12:18:05 +08:00
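The delete sign mechanism mentioned in 4510e16845 above can be modeled in a few lines. This is a toy sketch, not Doris code: the table is a Python dict keyed by the unique key, the column names `k` and `v` are hypothetical, and only the insert-based path is modeled (not the merge-on-write optimization the PR adds).

```python
DELETE_SIGN = "__DELETE_SIGN__"


def upsert(table, row):
    """Unique-key model: the newest row for a key replaces the old one."""
    table[row["k"]] = row


def delete_where(table, predicate):
    """Model a DELETE on a value column as an insert of marked rows."""
    matching = [r for r in table.values() if not r[DELETE_SIGN] and predicate(r)]
    for r in matching:
        upsert(table, {**r, DELETE_SIGN: 1})


def visible_rows(table):
    """Rows a query would see: everything not marked as deleted."""
    return sorted((r["k"], r["v"]) for r in table.values() if not r[DELETE_SIGN])
```

The cost the commit message refers to is visible here: every deleted row is first selected and then written again with the sign set.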
3efa06e63e [Fix](View)varchar type conversion error (#22987) 2023-08-16 11:49:04 +08:00
221e7bdd17 [test](jdbc external) fix mysql and pg external regression test (#22998) 2023-08-16 10:44:47 +08:00
c8c46e042d [Improve](regress-test)add regress test for map_agg with nested type and insert to doris inner table #23006 2023-08-16 09:21:02 +08:00
423002b20a [fix](nereids) partitionTopN & Window estimation (#22953)
* partitionTopN & winExpr estimation

* tpcds 44/47/57
2023-08-15 20:19:03 +08:00
80566f7fed [stats](nereids)support partition stats (#22606) 2023-08-15 17:52:25 +08:00
7de362f646 [fix](Nereids): expand other join which has or condition (#22809) 2023-08-15 16:49:19 +08:00
f1864d9fcf [fix](function) fix str_to_date with specific format #22981 2023-08-15 15:30:48 +08:00
9b42093742 [feature](agg) Make 'map_agg' support array type as value (#22945) 2023-08-15 14:44:50 +08:00
707a527775 [FIX](map)insert into doris table with array/map type by local tvf (#22955) 2023-08-15 13:11:23 +08:00
b49dc8042d [feature](load) refactor CSV reading process during scanning, and support enclose and escape for stream load (#22539)
## Proposed changes

Refactor thoughts: close #22383
Descriptions about `enclose` and `escape`: #22385

## Further comments

2023-08-09: 
Unfortunately, experiments show that the original way of parsing plain CSV is faster. Therefore, the refactor is only applied to enclose-related code. The plain CSV parser uses the original logic.

Some performance regression is unavoidable anyway. From the `CSV reader`'s perspective, the real weak point may be the column-writing behavior, as shown by the flame graph.
 
Trimming escape will be enabled after fix #22411 is merged.

Cases that should be discussed:

1. When an incomplete enclose appears at the beginning of large-scale data, the line delimiter will be unreachable until EOF. Will the buffer become extremely large?
2. What if an infinitely long line occurs? Essentially, `1.` is equivalent to this.

Only stream load is supported in this PR, as a trial, to avoid too many unrelated changes. Docs will be added when `enclose` and `escape` are available for all kinds of load.
2023-08-15 09:23:53 +08:00
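The open question raised in b49dc8042d above, where an unterminated enclose hides every line delimiter, can be reproduced with a small state machine. This is a sketch under stated assumptions, not the Doris CSV reader: `split_records` and `line_delim` are hypothetical names, and the escape character is assumed to make the following character literal.

```python
def split_records(data, line_delim="\n", enclose='"', escape="\\"):
    """Split data into records, ignoring line delimiters inside an enclose.

    Returns (complete_records, pending), where pending is the buffered tail
    that has not yet reached an unenclosed line delimiter.
    """
    records, current, in_enclose, i = [], [], False, 0
    while i < len(data):
        ch = data[i]
        if ch == escape and i + 1 < len(data):
            current.append(data[i:i + 2])  # keep the escaped pair verbatim
            i += 2
            continue
        if ch == enclose:
            in_enclose = not in_enclose
        elif ch == line_delim and not in_enclose:
            records.append("".join(current))
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    return records, "".join(current)
```

In this model, an input that opens an enclose and never closes it produces no records at all, and `pending` grows to the size of the whole input, which is the buffer growth the first discussion point asks about.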
fa6110accd [fix](catalog)paimon support more data type (#22899) 2023-08-14 13:48:33 +08:00
bddab94121 [Enhancement](partial update) Support including delete sign column in partial update stream load (#22874) 2023-08-13 10:32:21 +08:00
41ff48f838 [regresstion][external]fix case test_show_where and es_query 0811 (#22898) 2023-08-12 19:41:55 +08:00
5e2748d2b4 [Improve](complex-type)update orc reader for complex type and add regress tests (#22856) 2023-08-12 07:06:12 +08:00
44475b64ef [fix](pg test) fix postgresql jdbc catalog test case (#22875) 2023-08-11 20:50:47 +08:00
28561f77e9 [fix](regression)fix test_hdfs_tvf regression_test out file : decimalv3 -> decimal (#22852) 2023-08-11 20:44:18 +08:00
045843991a [Fix](Nereids) fix insert into table of random distribution for nereids (#22831)
Currently, inserting into a table with random distribution is not supported; we fix it by setting the physical properties to Any.
2023-08-11 19:26:39 +08:00
72e264dd59 [fix](executor)fix error when FixedContainer with null (#22850) 2023-08-11 17:20:50 +08:00
3e169511e3 [test](jdbc_mysql)update test_jdbc_query_mysql regression test result #22866 2023-08-11 17:15:14 +08:00
548226acfc [fix](planner)shouldn't change the child type to assignmentCompatibleType if it's INVALID_TYPE (#22841)
If the child type is changed to INVALID_TYPE, the later getBuiltinFunction call will fail.
2023-08-11 17:14:49 +08:00
f2075d0a81 [Fix](multi-catalog) Fix decimal precision issue in regression test result. (#22819)
2023-08-11 13:49:30 +08:00
0aa00026bb [fix](autoinc) ignore column property isAutoInc() for create table as select ... statement(#22827) 2023-08-10 23:25:54 +08:00
Pxl
56392e21ae [Bug](decimalv3) fix decimalv3 keyrange set wrong number #22818 2023-08-10 18:15:40 +08:00
f2658dc7bd [Feature](multi-catalog) Truncate char or varchar columns if size is smaller than file columns or not found in the file column schema. (#22318)
Truncate char or varchar columns if their size is smaller than that of the file columns or they are not found in the file column schema, controlled by the session var `truncate_char_or_varchar_columns`.
2023-08-10 14:37:20 +08:00
f1db6bd8c1 [feature](hive)append support for struct and map column type on textfile format of hive table (#22347)
1. Add support for struct and map column types on the textfile format of Hive tables.
2. Optimize the code for the array column type.

```mysql
+------+------------------------------------+
| id   | perf                               |
+------+------------------------------------+
| 1    | {"key1":"value1", "key2":"value2"} |
| 1    | {"key1":"value1", "key2":"value2"} |
| 2    | {"name":"John", "age":"30"}        |
+------+------------------------------------+
```

```mysql
+---------+------------------+
| column1 | column2          |
+---------+------------------+
|       1 | {10, "data1", 1} |
|       2 | {20, "data2", 0} |
|       3 | {30, "data3", 1} |
+---------+------------------+
```
Summary of supported complex types (a delimiter can be assigned):

1. array< primitive_type > and array< array< ... > >
2. map< primitive_type , primitive_type >
3. Struct< primitive_type , primitive_type ... >
2023-08-10 13:47:58 +08:00
57fb9799b5 [feature](agg) add aggregation function 'bitmap_agg' (#22768)
This function can be used to replace bitmap_union(to_bitmap(expr)), because bitmap_union(to_bitmap(expr)) needs to create many small bitmaps first and then merge them into a single bitmap.
bitmap_agg converts the column values into a bitmap directly, so it performs better than bitmap_union(to_bitmap(expr)). In our test, there is about a 30% improvement.
2023-08-10 12:18:25 +08:00
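The difference between the two aggregation paths in 57fb9799b5 above can be shown with Python sets standing in for bitmaps. This is a toy model, not the Doris implementation; it only illustrates that both paths give the same result while one of them allocates an intermediate bitmap per row.

```python
from functools import reduce


def to_bitmap(value):
    """Toy to_bitmap: one small bitmap per input value."""
    return {value}


def bitmap_union(bitmaps):
    """Merge many bitmaps into one."""
    return reduce(set.union, bitmaps, set())


def bitmap_agg(values):
    """Add each value straight into a single result bitmap."""
    result = set()
    for v in values:
        result.add(v)
    return result
```

For the values `[3, 1, 3, 7]`, `bitmap_union(to_bitmap(v) for v in values)` builds four one-element bitmaps before merging, while `bitmap_agg(values)` builds only the final one.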
df1f67d835 [improve](insert) Support server side prepare insert stmt (#22353) 2023-08-10 09:59:17 +08:00
768088c95e [refactor](udaf) refactor call udaf function and support map type in return (#22508) 2023-08-09 22:44:07 +08:00
Pxl
89dc1f73b2 [Bug](materialized-view) make mv matched when preagg have value column predicate contained in mv'where clause (#22779)
1. Make the mv match when preagg has a value column predicate contained in the mv's where clause.
2. Fix `org.apache.doris.common.AnalysisException: errCode = 2, detailMessage = BITMAP_UNION need input a bitmap column, but input INVALID_TYPE`.
3. Make the error message more detailed when parsing a create mv stmt fails.
2023-08-09 19:17:55 +08:00
2019bb3870 [fix](bitmap) fix wrong result of bitmap intersect functions (#22735)
* [fix](bitmap) fix wrong result of bitmap intersect functions

* fix test case
2023-08-09 18:31:24 +08:00
4608dcb2d9 [fix](agg) fix coredump caused by push down count aggregation (#22699)
2023-08-09 10:21:20 +08:00
66784cef71 [Enhancement](Load) Stream Load using SQL (#22509)
This PR was originally #16940, which has not been updated by the original author @Cai-Yao for a long time. For now, we will merge some of the code into master first.

thanks @Cai-Yao @yiguolei
2023-08-08 13:49:04 +08:00
1617368ee1 [fix](planner) fix bug of push constant conjuncts through set operation node (#22695)
When pushing down a constant conjunct into a set operation node, we should assign the conjunct to the agg node if there is one. This is consistent with pushing a constant conjunct into an inline view.
2023-08-08 12:25:42 +08:00