Commit Graph

1469 Commits

Author SHA1 Message Date
8caa5a9ba4 [Fix](mutli-catalog) Fix null partitions error in iceberg tables. (#22185)
### Issue
when partition has null partitions, it throws error
`Failed to fill partition column: t_int=null`

### Resolution
- Fix the following null partitions error in iceberg tables by replacing null partition to '\N'.
- Add regression test for hive null partition.
2023-07-27 23:57:35 +08:00
b5fa29e138 [fix](bitmap) incorrect result of function 'bitmap_from_array' (#22305) 2023-07-27 22:44:06 +08:00
716d58f5ff [fix](Nereids) decimal divide should not return null if numerator is zero (#22309) 2023-07-27 20:23:04 +08:00
a87d34b19b [Fix](multi catalog statistics)Improve external table statistics collection (#22224)
Improve external table statistics collection, including log, observability and fix some bugs.
1. Add Running state for statistics job.
2. Add progress for show analyze job. (n/m tasks finished, n/m task failed and so on)
3. Add analyze time cost for show analyze task.
4. Make task failure message more clear.
5. Synchronize the job status updating code in updateTaskStatus.
6. Fix NPE in HMSAnalyzeTask. (Avoid refreshing statistics cache if the collection sql failed)
7. Return error message for with sync collection while timeout. 
8. Log level improvement
9. Fix misuse of logCreateAnalysisJob for tasks.
2023-07-27 20:01:14 +08:00
ae5e39ad26 [opt](Nereids) add double signature back for round like function (#22284)
add double signature back for round like function
2023-07-27 19:10:43 +08:00
6f1c03c766 [fix](jdbc_catalog) fix int and bigint in mysql view when use doris catalog (#22251) 2023-07-27 16:50:42 +08:00
0512e0b168 [test](regression) add cases for partial update with sequence_type (#22215) 2023-07-27 15:51:01 +08:00
4f6a3c5bf0 [feature](catalog) support clob type in oracle jdbc catalog (#21532) 2023-07-27 15:49:15 +08:00
ddfdf62993 [opt](planner) support to parse scientific notation(aEb) (#22248) 2023-07-27 13:31:37 +08:00
41a230b721 [fix] iceberg catalog to specify the version and time (#22209)
problem:
1. create a iceberg_type catalog:
2. use iceberg catalog to specify verison
```
mysql> show catalog iceberg;
+----------------------+--------------------------+
| Key                  | Value                    |
+----------------------+--------------------------+
| type                 | iceberg                  |
| iceberg.catalog.type | hms                      |
| hive.metastore.uris  | thrift://127.0.0.1:9083 |
| hadoop.username      | hadoop                   |
| create_time          | 2023-07-25 16:51:00.522  |
+----------------------+--------------------------+
5 rows in set (0.02 sec)

mysql> select * from iceberg.iceberg_db.tb1 FOR VERSION AS OF 8783036402036752909;
ERROR 5090 (42000): errCode = 2, detailMessage = Only iceberg/hudi external table supports time travel in current version
```

change:
Add `ICEBERG_EXTERNAL_TABLE` type for specify the version and time
2023-07-27 12:04:41 +08:00
619a2857e1 [improvement](jdbc catalog) improve mysql jdbc catalog read bytea`s types & else improve (#22233) 2023-07-27 10:18:37 +08:00
341c45974c [round](decimalv2) round precise decimalv2 value (#22258) 2023-07-27 10:00:36 +08:00
163a38a527 [opt](Nereids) support sql cache (#22144)
1. let Nereids support sql cache
2. let legacy planner's sql cache supports union all
2023-07-27 09:57:31 +08:00
8fb28ecc9e [test](partial-update) add some cases for partial-update (#22210) 2023-07-27 09:52:40 +08:00
dcd6844ea5 [improvement](regression-test) add partial update with schema change case (#22213) 2023-07-27 09:51:42 +08:00
fb41265c27 [opt](Nereids) add boolean type signature for sum aggregate function (#21959) 2023-07-27 09:41:19 +08:00
8ff487cc4b [fix](cast) fix invalid value error when casting null date value to string then casting to date value (#22223) 2023-07-26 17:59:01 +08:00
14dcc53135 [fix](Nereids) cast time should turn nullable on all valid types (#22242)
valid types to cast to time/timev2:
- TINYINT
- SMALLINT
- INT
- BIGINT
- LARGEINT
- FLOAT
- DOUBLE
- CHAR
- VARCHAR
- STRING
2023-07-26 17:56:19 +08:00
be69025878 [opt](Nereids) add partial update support for delete stmt (#22184)
Currently, the new optimizer don't consider anything about partial update.
This PR add the ability to convert a delete statement to a partial update insert statement
for merge-on-write unique table
2023-07-26 17:34:31 +08:00
bb67a1467a [fix](Nereids): mergeGroup should merge target Group into existed Group (#22123) 2023-07-26 13:13:25 +08:00
21a3593a9a [fix](Nereids) translate failed when enable topn two phase opt (#22197)
1. should not add rowid slot to reslovedTupleExprs
2. should set notMaterialize to sort's tuple when do two phase opt
2023-07-26 11:38:50 +08:00
cf677b327b [fix](jdbc catalog) Fixed mappings with type errors for bool and tinyint(1) (#22089)
First of all, mysql does not have a boolean type, its boolean type is actually tinyint(1), in the previous logic, We force tinyint(1) to be a boolean by passing tinyInt1isBit=true, which causes an error if tinyint(1) is not a 0 or 1, Therefore, we need to match tinyint(1) according to tinyint instead of boolean, and this change will not affect the correctness of where k = 1 or where k = true queries
2023-07-25 22:45:22 +08:00
5c8eda8685 [enhencement](regression) add UPDATE & DELETE tests for MOW partial update (#22212) 2023-07-25 22:03:38 +08:00
fc2b9db0ad [Feature](inverted index) add tokenize function for inverted index (#21813)
In this PR, we introduce TOKENIZE function for inverted index, it is used as following:
```
SELECT TOKENIZE('I love my country', 'english');
```
It has two arguments, first is text which has to be tokenized, the second is parser type which can be **english**, **chinese** or **unicode**.
It also can be used with existing table, like this:
```
mysql> SELECT TOKENIZE(c,"chinese") FROM chinese_analyzer_test;
+---------------------------------------+
| tokenize(`c`, 'chinese')              |
+---------------------------------------+
| ["来到", "北京", "清华大学"]          |
| ["我爱你", "中国"]                    |
| ["人民", "得到", "更", "实惠"]        |
+---------------------------------------+
```
2023-07-25 15:05:35 +08:00
d96e31c4d7 [opt](Nereids) not push down global limit to avoid early gather (#21891)
the global limit will create a gather action, and all the data will be calculated in one instance. If we push down the global limit, the node run after the limit node will run slowly.
We fix it by push down only local limit.

a join plan tree before fixing:

```
LogicalLimit(global)
    LogicalLimit(local)
        Plan()
            LogicalLimit(global)
                LogicalLimit(local)
                    LogicalJoin
                        LogicalLimit(global)
                            LogicalLimit(local)
                                Plan()
                        LogicalLimit(global)
                            LogicalLimit(local)
                                Plan()    

after fixing:
LogicalLimit(global)
    LogicalLimit(local)      
        Plan()
            LogicalLimit(local)
                LogicalJoin
                    LogicalLimit(local)
                        Plan()
                    LogicalLimit(local)
                        Plan()
```
2023-07-25 14:45:20 +08:00
2b4bfe5be7 [fix](autoinc) fix _fill_auto_inc_cols when the input column is ColumnConst (#22175) 2023-07-25 14:41:36 +08:00
c01230f99a [fix](match) Optimize the logic for match_phrase function filter (#21622) 2023-07-25 14:22:37 +08:00
0f439bb1ca [vectorized](udf) java udf support map type (#22059) 2023-07-25 11:56:20 +08:00
b41fcbb783 [feature](agg) add the aggregation function 'mag_agg' (#22043)
New aggregation function: map_agg.

This function requires two arguments: a key and a value, which are used to build a map.

select map_agg(column1, column2) from t group by column3;
2023-07-25 11:21:03 +08:00
a0463ea047 [round](decimalv2) round decimalv2 to precision value (#22138)
* [round](decimalv2) round decimalv2 to precision value

* update

* update`
2023-07-25 03:29:48 +08:00
752cec9e19 [Fix](multi-catalog) Fix not single slot filter conjuncts with dict filter issue. (#22052)
### Issue
Dictionary filtering is a mechanism that directly reads the dictionary encoding of a single string column filter condition for filter comparison. But dictionary filtered single string columns may be included in other multi-column filter conditions. This can cause problems.

For example:
`select * from multi_catalog.lineitem_string_date_orc where l_commitdate < l_receiptdate and l_receiptdate = '1995-01-01'  order by l_orderkey, l_partkey, l_suppkey, l_linenumber limit 10;`

`l_receiptdate` is string filter column,it is included by multi-column filter condition `l_commitdate < l_receiptdate`.

### Solution
Resolve it by separating the multi-column filter conditions and executing it after the dictionary filter column is converted to string.
2023-07-24 22:31:18 +08:00
21deb57a4d [fix](Nereids) remove double sigature of ceil, floor and round (#22134)
we convert input parameters to double for function ceil, floor and round,
because DecimalV2 could not do these operation. Since we intro DecimalV3,
we should convert all parameters to DecimalV3 to get correct result.
For example, when we use double as parameters, we get wrong result:
```sql
select round(341/20000,4),341/20000,round(0.01705,4);
+-------------------------+---------------+-------------------+
| round((341 / 20000), 4) | (341 / 20000) | round(0.01705, 4) |
+-------------------------+---------------+-------------------+
| 0.017                   | 0.01705       | 0.0171            |
+-------------------------+---------------+-------------------+
```
DecimalV3 could get correct result
```sql
select round(341/20000,4),341/20000,round(0.01705,4);
+-------------------------+---------------+-------------------+
| round((341 / 20000), 4) | (341 / 20000) | round(0.01705, 4) |
+-------------------------+---------------+-------------------+
| 0.0171                  | 0.01705       | 0.0171            |
+-------------------------+---------------+-------------------+
```
2023-07-24 16:08:00 +08:00
ac9480123c [refactor](Nereids) push down all non-slot order key in sort and prune them upper sort (#22034)
According the implementation in execution engine, all order keys
in SortNode will be output. We must normalize LogicalSort follow
by it.
We push down all non-slot order key in sort to materialize them
behind sort. So, all order key will be slot and do not need do
projection by SortNode itself.
This will simplify translation of SortNode by avoid to generate
resolvedTupleExprs and sortTupleDesc.
2023-07-24 15:36:33 +08:00
b5f27b5349 [enhance](nereids) enable wf partition topn by default (#21860) 2023-07-24 14:21:45 +08:00
138e6c2f01 [stats](nereids)keep min/max expr in colstats (#22064)
columnStatistics.minExpr and maxExpr is useful when we derive stats for cast function.
This pr
1. maintains the min/max expr during stats derive in filter condition: col<literal, col>literal and col=literal
2. adjust column stats range for cast function (now only support cast from string to other types)

ds9 is changed, but no performance issue: on tpcds_sf100_rf exe time is 1.5~1.6sec, the same as master
2023-07-24 10:28:36 +08:00
4f0158c458 [fix](partial-update) fix update core for merge-on-write table (#22090) 2023-07-23 13:35:08 +08:00
Pxl
ae809fbeba [Bug](storage )fix dead lock when create_tablet need lock two tablet && update mv_p0… (#21969)
fix dead lock when create_tablet need lock two tablet && update mv_p0/ssb case
2023-07-22 15:27:05 +08:00
50c8563f35 [fix](partial update) fix some bugs of sequence column (#21896) 2023-07-22 15:26:48 +08:00
355ac18363 [Fix](jdbc catalog) Pass conjuncts to JdbcScanNode and FileScanNode before doing finalize. (#21998)
JdbcScanNode need to use the conjuncts to generate sql in finalize function. But the conjuncts have not passed to JdbcScanNode yet while calling finalize. This pr is to pass the conjuncts to scan node before using it to avoid scan the whole table.
2023-07-22 14:08:44 +08:00
f7e3cc1553 [FIX](map)fix map proto contains_null #22107
when we select map in order by and limit; be node will coredump
2023-07-22 10:41:55 +08:00
40299d280d [Fix](json reader) fix rapidjson array->PushBack may take ownership… (#21988)
With bellow json path
`["$.data","$.data.datatimestamp"]`

After `array_obj->PushBack` the `data` field owner will be taken from array_obj, and lead to null values for json path `$.data.datatimestamp`

Rapidjson doc:
```
//! Append a GenericValue at the end of the array.
  \note The ownership of \c value will be transferred to this array on success.
 */
GenericValue& PushBack(GenericValue& value, Allocator& allocator);
```
2023-07-21 17:02:01 +08:00
2b2ac10e93 [feature](partial update) add failure tolerance for strict mode partial update stream load 2023-07-21 16:46:44 +08:00
732e0d14ff [Enhancement](window-funnel)add different modes for window_funnel() function (#20563) 2023-07-21 13:57:27 +08:00
74313c7d54 [feature-wip](autoinc)(step-3) add auto increment support for unique table (#22036) 2023-07-21 13:24:41 +08:00
eabd5d386b [Fix](multi catalog)Fix nereids context table always use internal catalog bug (#21953)
The getTable function in CascadesContext only handles the internal catalog case (try to find table only in internal 
catalog and dbs). However, it should take all the external catalogs into consideration, otherwise, it will failed to find a 
table or get the wrong table while querying external table. This pr is to fix this bug.
2023-07-20 20:32:01 +08:00
ee65e0a6b1 [fix](Nereids) should not remove any limit from uncorrelated subquery (#21976)
We should not remove any limit from uncorrelated subquery. For Example
```sql
-- should return nothing, but return all tuple of t if we remove limit from exists
SELECT * FROM t WHERE EXISTS (SELECT * FROM t limit 0);

-- should return the tuple with smallest c1 in t,
-- but report error if we remove limit from scalar subquery
SELECT * FROM t WHERE c1 = (SELECT * FROM t ORDER BY c1 LIMIT 1);
```
2023-07-20 18:37:04 +08:00
367ad9164a [feature-wip](auto-inc)(step-2) support auto-increment column for duplicate table (#19917) 2023-07-20 18:03:39 +08:00
86d7233b06 [fix](nereids) ExtractAndNormalizeWindowExpression rule should push down correct exprs to child (#21827)
consider the window function:
```sql
substr(
ref_1.cp_type,
sum(CASE WHEN ref_1.cp_type = 0 THEN 3 ELSE 2 END) OVER (),
1)
```
Before the pr, only "CASE WHEN ref_1.cp_type  = 0 THEN 3 ELSE 2 END" is pushed down.
But both "ref_1.cp_type" and "CASE WHEN ref_1.cp_type  = 0 THEN 3 ELSE 2 END"
should be pushed down.
This pr fix it
2023-07-20 11:47:55 +08:00
20242d9a0e [Improve](simdjson) put unescaped string value after parsed (#21866)
In some cases, it is necessary to unescape the original value, such as when converting a string to JSONB.
If not unescape, then later jsonb parse will be failed
2023-07-20 10:33:17 +08:00
2daad2151d [enhancement](jdbc catalog) Add mysql jdbc catalog function to filter push-down identification (#21745) 2023-07-19 23:48:23 +08:00