Commit Graph

1533 Commits

Author SHA1 Message Date
5e2748d2b4 [Improve](complex-type)update orc reader for complex type and add regress tests (#22856) 2023-08-12 07:06:12 +08:00
44475b64ef [fix](pg test) fix postgresql jdbc catalog test case (#22875) 2023-08-11 20:50:47 +08:00
28561f77e9 [fix](regression)fix test_hdfs_tvf regression_test out file : decimalv3 -> decimal (#22852) 2023-08-11 20:44:18 +08:00
045843991a [Fix](Nereids) fix insert into table of random distribution for nereids (#22831)
currently insert into a table of random distribution info is not supported, we fix it by set physical properties to Any.
2023-08-11 19:26:39 +08:00
72e264dd59 [fix](executor)fix error when FixedContainer with null (#22850) 2023-08-11 17:20:50 +08:00
3e169511e3 [test](jdbc_mysql)update test_jdbc_query_mysql regression test result #22866 2023-08-11 17:15:14 +08:00
548226acfc [fix](planner)shouldn't change the child type to assignmentCompatibleType if it's INVALID_TYPE (#22841)
if changing the child type to INVALID_TYPE, the later getBuiltinFunction call will fail
2023-08-11 17:14:49 +08:00
f2075d0a81 [Fix](multi-catalog) Fix decimal precision issue in regression test result. (#22819)
Fix decimal precision issue in regression test result.
2023-08-11 13:49:30 +08:00
0aa00026bb [fix](autoinc) ignore column property isAutoInc() for create table as select ... statement(#22827) 2023-08-10 23:25:54 +08:00
Pxl
56392e21ae [Bug](decimalv3) fix decimalv3 keyrange set wrong number #22818 2023-08-10 18:15:40 +08:00
f2658dc7bd [Feature](multi-catalog) Truncate char or varchar columns if size is smaller than file columns or not found in the file column schema. (#22318)
Truncate char or varchar columns if size is smaller than file columns or not found in the file column schema by session var `truncate_char_or_varchar_columns`.
2023-08-10 14:37:20 +08:00
f1db6bd8c1 [feature](hive)append support for struct and map column type on textfile format of hive table (#22347)
1. append support for struct and map column type on textfile format  of hive table.
2. optimizer code that array column type.

```mysql
+------+------------------------------------+
| id   | perf                               |
+------+------------------------------------+
| 1    | {"key1":"value1", "key2":"value2"} |
| 1    | {"key1":"value1", "key2":"value2"} |
| 2    | {"name":"John", "age":"30"}        |
+------+------------------------------------+
```

```mysql
+---------+------------------+
| column1 | column2          |
+---------+------------------+
|       1 | {10, "data1", 1} |
|       2 | {20, "data2", 0} |
|       3 | {30, "data3", 1} |
+---------+------------------+
```
Summarizes support for complex types(support assign delimiter) :

1. array< primitive_type > and array< array< ... > >
2. map< primitive_type , primitive_type >
3. Struct< primitive_type , primitive_type ... >
2023-08-10 13:47:58 +08:00
57fb9799b5 [feature](agg) add aggregation function 'bitmap_agg' (#22768)
This function can be used to replace bitmap_union(to_bitmap(expr)), because bitmap_union(to_bitmap(expr)) need create many many small bitmaps firstly and then merge them into a single bitmap.
bitmap_agg will convert the column value into a bitmap directly. Its performance is better than bitmap_union(to_bitmap(expr)) . In our test , there is about 30% improvement.
2023-08-10 12:18:25 +08:00
df1f67d835 [improve](insert) Support server side prepare insert stmt (#22353) 2023-08-10 09:59:17 +08:00
768088c95e [refactor](udaf) refactor call udaf function and support map type in return (#22508) 2023-08-09 22:44:07 +08:00
Pxl
89dc1f73b2 [Bug](materialized-view) make mv matched when preagg have value column predicate contained in mv'where clause (#22779)
1. make mv matched when preagg have value column predicate contained in mv
'where clause
2. fix `org.apache.doris.common.AnalysisException: errCode = 2, detailMessage = BITMAP_UNION need input a bitmap column, but input INVALID_TYPE`
3. make the error message more detailed when create mv stmt parse failed
2023-08-09 19:17:55 +08:00
2019bb3870 [fix](bitmap) fix wrong result of bitmap intersect functions (#22735)
* [fix](bitmap) fix wrong result of bitmap intersect functions

* fix test case
2023-08-09 18:31:24 +08:00
4608dcb2d9 [fix](agg) fix coredump caused by push down count aggregation (#22699)
fix coredump caused by push down count aggregation
2023-08-09 10:21:20 +08:00
66784cef71 [Enhancement](Load) Stream Load using SQL (#22509)
This PR was originally #16940 , but it has not been updated for a long time due to the original author @Cai-Yao . At present, we will merge some of the code into the master first.

thanks @Cai-Yao @yiguolei
2023-08-08 13:49:04 +08:00
1617368ee1 [fix](planner) fix bug of push constant conjuncts through set operation node (#22695)
when pushing down constant conjunct into set operation node, we should assign the conjunct to agg node if there is one. This is consistant with pushing constant conjunct into inlineview.
2023-08-08 12:25:42 +08:00
91b15183e7 [enhance][external]enhance and fix external cases 0807 (#22689)
enhance and fix external cases 0807
2023-08-08 10:53:08 +08:00
c9dc715c5d [fix](broker-load) fix error when using multi data description for same table in load stmt (#22666)
For load request, there are 2 tuples on scan node, input tuple and output tuple.
The input tuple is for reading file, and it will be converted to output tuple based on user specified column mappings.

And the broker load support different column mapping in different data description to same table(or partition).
So for each scanner, the output tuples are same but the input tuple can be different.

The previous implements save the input tuple in scan node level, causing different scanner using same input tuple,
which is incorrect.
This PR remove the input tuple from scan node and save them in each scanners.
2023-08-07 20:03:03 +08:00
bc697ca9d6 [fix](time) fix error in time_to_sec 2023-08-07 17:33:24 +08:00
f036cdfde6 [feature](compaction) support delete in cumulative compaction (#19609) 2023-08-07 15:22:21 +08:00
c31226b144 [refractor](regression-test) sort out test cases of external tables (#22640)
sort out the test cases of external table.
After modify, there are 2 directories:

1. `external_table_p0`: all p0 cases of external tables: hive, es, jdbc and tvf
2. `external_table_p2`: all p2 cases of external tables: hive, es, mysql, pg, iceberg and tvf

So that we can run it with one line command like:

```
sh run-regression-test.sh --run -d external_table_p0,external_table_p2
```
2023-08-07 11:12:30 +08:00
1a8a1e5b16 [Feature](count_by_enum) support count_by_enum function (#22071)
count_by_enum(expr1, expr2, ... , exprN);

Treats the data in a column as an enumeration and counts the number of values in each enumeration. Returns the number of enumerated values for each column, and the number of non-null values versus the number of null values.
2023-08-06 16:05:14 +08:00
d3b50e3b2a [BUG](date_trunc) fix date_trunc function only handle lower string (#22602)
fix date_trunc function only handle lower string
2023-08-05 12:53:13 +08:00
fe6bae2924 [fix](invert index) supports utf8 and non-utf8 strings (#22570)
supports utf8 and non-utf8 strings: [fix] compatible with utf8 and invalid utf8 doris-thirdparty#110
2023-08-05 12:52:53 +08:00
3024b82918 [fix](load)Fix wrong default value for char and varchar of reading json data (#22626)
If a column is defined as: col VARCHAR/CHAR NULL and no default value. Then we load json data which misses column col, the result queried is not correct:
+------+
| col |
+------+
| 1 |
+------+
But expect:
+------+
| col |
+------+
| NULL |
+------+

---------

Co-authored-by: duanxujian <duanxujian@jd.com>
2023-08-05 12:47:27 +08:00
7fe08c74fe [fix](inverted index) return empty result instead of error for empty match query (#22592)
return empty result instead of error for empty match query as follows:

`SELECT * FROM t WHERE msg MATCH ''`

`SELECT * FROM t WHERE msg MATCH 'stop_word'`
2023-08-04 17:36:32 +08:00
ef53a27887 [fix](nereids) allow in or exits subquery in binary operator (#22391)
support subquery in binary operator like if( xx  in ( subquery ), 1, 0 )
2023-08-04 15:35:19 +08:00
658d75c816 [feature](Nereids): normalize join condition after expanding or condition NLJ (#22555) 2023-08-04 13:37:37 +08:00
f828a3d826 [shape](nereids) ssb sf100 plan shape check (#22596) 2023-08-04 13:12:21 +08:00
62b1a7bcf3 [tpcds](nereids) add rule to eliminate empty relation #22203
1. eliminate emptyrelation,
2. const fold after filter pushdown
2023-08-04 12:49:53 +08:00
0e9fad4fe9 [stats](nereids) improve Anti join stats estimation #22444
No impact on TPC-H
impact on TPC-DS 16/69/94  improved
2023-08-04 12:48:39 +08:00
3447a70b25 [Fix](planner)fix delete stmt contains where but delete all data. (#22563) 2023-08-03 23:44:05 +08:00
469886eb4e [FIX](array)fix if function for array() #22553
[FIX](array)fix if function for array() #22553
2023-08-03 19:40:45 +08:00
4322fdc96d [feature](Nereids): add or expansion in CBO(#22465) 2023-08-03 13:29:33 +08:00
938f768aba [fix](parquet) resolve offset check failed in parquet map type (#22510)
Fix error when reading empty map values in parquet. The `offsets.back()` doesn't not equal the number of elements in map's key column.

### How does this happen
Map in parquet is stored as repeated group, and `repeated_parent_def_level` is set incorrectly when parsing map node in parquet schema.
```
the map definition in parquet:
 optional group <name> (MAP) {
   repeated group map (MAP_KEY_VALUE) {
     required <type> key;
     optional <type> value;
   }
}
```

### How to fix
Set the `repeated_parent_def_level` of key/value node as the definition level of map node.

`repeated_parent_def_level` is the definition level of the first ancestor node whose `repetition_type` equals `REPEATED`.  Empty array/map values are not stored in doris column, so have to use `repeated_parent_def_level` to skip the empty or null values in ancestor node.

For instance, considering an array of strings with 3 rows like the following:
`null, [], [a, b, c]`
We can store four elements in data column: `null, a, b, c`
and the offsets column is: `1, 1, 4`
and the null map is: `1, 0, 0`
For the `i-th` row in array column: range from `offsets[i - 1]` until `offsets[i]` represents the elements in this row, so we can't store empty array/map values in doris data column. As a comparison, spark does not require `repeated_parent_def_level`, because the spark column stores empty array/map values , and use anther length column to indicate empty values. Please reference: https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java

Furthermore, we can also avoid store null array/map values in doris data column. The same three rows as above, We can only store three elements in data column: `a, b, c`
and the offsets column is: `0, 0, 3`
and the null map is: `1, 0, 0`
2023-08-02 22:33:10 +08:00
0cd5183556 [Refactor](inverted index) refact tokenize function for inverted index (#22313) 2023-08-02 19:12:22 +08:00
ddd90855a9 [vectorized](udaf) java udaf support with map type (#22397)
[vectorized](udaf) java udaf support with map type (#22397)
* test
* remove some unused
* update
* add case
2023-08-02 15:03:44 +08:00
bf50f9fa7f [fix](decimal) fix cast rounding half up with negative number (#22450) 2023-08-01 21:47:42 +08:00
b8399148ef [fix](DOE) es catalog not working with pipeline,datetimev2, array and esquery (#22046) 2023-08-01 21:45:16 +08:00
d5d82b7c31 [stats](nereids) fix bug for avg-size (#22421) 2023-08-01 17:13:00 +08:00
Pxl
8d16f1bb09 [Chore](materialized-view) update documentation about materialized-view and update test (#22350)
update documentation about materialized-view and update test
2023-08-01 15:13:34 +08:00
7a2ff56863 [regression](fix) fix test_round case (#22441) 2023-08-01 11:35:44 +08:00
c1f36639fd [fix](sort) VSortedRunMerger does not return any rows with a large offset value (#22191) 2023-07-31 22:28:13 +08:00
450e0b1078 [fix](nereids) recompute logical properties in plan post process (#22356)
join commute rule will swap the left and right child. This cause the change of logical properties. So we need recompute the logical properties in plan post process to get the correct result
2023-07-31 21:04:39 +08:00
3a1d678ca9 [Fix](Planner) fix parse error of view with group_concat order by (#22196)
Problem:
    When create view with projection group_concat(xxx, xxx order by orderkey). It will failed during second parse of inline view

For example:
    it works when doing 
    "SELECT id, group_concat(`name`, "," ORDER BY id) AS test_group_column FROM  test GROUP BY id"
    but when create view it does not work
    "create view test_view as SELECT id, group_concat(`name`, "," ORDER BY id) AS test_group_column FROM  test GROUP BY id"

Reason:
    when creating view, we will doing parse again of view.toSql() to check whether it has some syntax error. And when doing toSql() to group_concat with order by, it add seperate ', ' between second parameter and order by. So when parsing again, it
would failed because it is different semantic with original statement.
    group_concat(`name`, "," ORDER BY id)  ==> group_concat(`name`, "," , ORDER BY id)

Solved:
    Change toSql of group_concat and add order by statement analyze() of group_concat in Planner cause it would work if we get order by from view statement and do not analyze and binding slot reference to it
2023-07-31 17:20:23 +08:00
7261845b3d [FIX](complex-type)fix complex type nested col_const (#22375)
for array/map/struct in mysql_writer unpack_if_const only unpack self column not nested , so col_const should not used in nested column.
2023-07-31 14:53:18 +08:00