Commit Graph

881 Commits

Author SHA1 Message Date
05f6e4c48a [fix](predicate) fix be core dump caused by pushing down the double column predicate (#15693) 2023-01-09 19:31:04 +08:00
67ceb83294 [enhance](Nereids): polish test format, add more comment. (#15662) 2023-01-09 15:40:27 +08:00
5ceb5441f4 [feature](nereids) let set operation syntax compatible with legacy planner (#15664)
Though this syntax is not supported in many other systems, since the ORDER BY clause here is almost redundant and useless, we have to keep it consistent with the legacy Doris syntax.

Here is an example:
SELECT * FROM (SELECT k1, k3 FROM tbl1 ORDER BY k3 UNION ALL SELECT k1, k5 FROM tbl2) t;
2023-01-09 15:31:29 +08:00
7543d677fa [fix](nereids) Fix the bugs of data distribution calculation on OlapScan (#15699)
When we need to scan more than one OLAP table partition and the table is not a colocate table, or its colocate group is unstable, we need to treat its distribution as ANY even if its distribution type is Hash.
2023-01-09 15:25:54 +08:00
e2492cf7fc [Bug](DECIMALV3) Fix binary predicate between decimalv3 and float (#15696) 2023-01-09 15:16:59 +08:00
4c50c4906b [fix](Nereids) add implicit casting for arithmetic expression (#15630)
Add implicit casting for arithmetic expressions to support `select "1" + "2"` (see the example below).
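A minimal sketch of the behavior; the literal values are just for illustration:

```sql
-- Without implicit casting, string operands in arithmetic would be rejected;
-- with this change, both string literals are cast to a numeric type first.
SELECT "1" + "2";   -- expected result: 3
```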
2023-01-09 15:10:35 +08:00
699bf972e2 [Bug](bitmap) Fix bitmap_from_string for null constant (#15698) 2023-01-09 10:21:08 +08:00
211cc66d02 [fix](multi-catalog) fix image loading failure when creating catalog with resource (#15692)
Bug fix
Fix image loading failure when creating a catalog with a resource.
When creating a JDBC catalog with a resource, the metadata image fails to be loaded:
when loading the JDBC catalog image, Doris tries to get the resource from ResourceMgr,
but ResourceMgr has not been loaded yet, so an NPE is thrown.

This PR fixes this bug and refactors some logic about catalog and resource.

When loading the JDBC catalog image, it no longer gets the resource from ResourceMgr.
And now users can create a catalog with both a resource and properties, like:

create catalog jdbc_catalog with resource jdbc_resource
properties("user" = "user1");
The properties in the "properties" clause will overwrite the properties in "jdbc_resource".

Force adding tinyInt1isBit=false to the JDBC URL
The default value of tinyInt1isBit is true, which causes TINYINT columns in MySQL to be treated as BIT type.
Forcing tinyInt1isBit=false in the JDBC URL makes TINYINT in MySQL map to TINYINT in Doris.
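For illustration, a JDBC resource whose URL carries this flag might look like the sketch below; the connection details, driver jar, and names are hypothetical, and per this PR Doris forces the flag onto the URL even if it is omitted:

```sql
CREATE RESOURCE "jdbc_resource" PROPERTIES (
    "type" = "jdbc",
    "user" = "user1",
    "password" = "****",
    -- tinyInt1isBit=false makes MySQL TINYINT map to Doris TINYINT instead of BIT
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3306/db1?tinyInt1isBit=false",
    "driver_url" = "mysql-connector-java-8.0.25.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver"
);
```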

Avoid calculating the checksum of the JDBC driver jar multiple times
Refactor
Refactor the notification logic when updating properties in a resource.
Previously, when updating properties in a resource, it would notify the corresponding catalog to update its own properties.
This PR changes this logic: after updating properties in a resource, it only uninitializes the catalog's internal
objects such as the "jdbc client" or "hms client", and these objects will be re-initialized lazily.

All properties are now fetched from the Resource at runtime, so the catalog always sees the latest properties.

Regression test cases
Because we add tinyInt1isBit=false to the JDBC URL, some of the cases need to be changed.
2023-01-09 09:56:26 +08:00
Pxl
1514b5ab5c [Feature](Materialized-View) support advanced Materialized-View (#15212) 2023-01-09 09:53:11 +08:00
36590da24b [fix](regression p0) add the alias function hist to histogram and fix p0 (#15708)
add the alias function hist to histogram and fix p0
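For example (table and column names are hypothetical), the two calls below should be interchangeable after this change:

```sql
SELECT histogram(c_int) FROM t;
SELECT hist(c_int) FROM t;   -- hist is an alias of histogram
```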
2023-01-08 11:31:23 +08:00
90be1a22a9 [bugfix](vertical compaction) fix dcheck failed in MOW tablet (#15638)
Fix a dcheck error for vertical compaction on a Merge-On-Write table.
When merging rowsets with an empty segment, VerticalHeapMergeIterator::init
returns OK directly and _record_rowids is not set, so the dcheck fails when
_unique_key_next_block calls current_block_row_locations.
2023-01-08 10:39:52 +08:00
500c7fb702 [improvement](multi-catalog) support unsupported column type (#15660)
When creating an external catalog, Doris will automatically sync the table schema from the external catalog.
But some column types, such as struct and map, are not supported by Doris yet.

Previously, when meeting these unsupported columns, Doris would throw an exception, and the corresponding
table could not be synced. But the user may just want to query the other, supported columns.

In this PR, I add a new column type: UNSUPPORTED. For now it is only used for external table schema sync.
When meeting an unsupported column, it will be synced as a column with UNSUPPORTED type.

When querying this table, there are several situations, as shown below:

select * from table: throws error Unsupported type 'UNSUPPORTED_TYPE' xxx
select k1 from table: k1 is of a supported type. query OK.
select * except(k2): k2 is of an unsupported type. query OK
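A small sketch of the three cases, assuming a hypothetical external table ext_tbl where k1 has a supported type and k2 was synced as UNSUPPORTED:

```sql
SELECT * FROM ext_tbl;              -- error: Unsupported type 'UNSUPPORTED_TYPE' ...
SELECT k1 FROM ext_tbl;             -- OK: only supported columns are read
SELECT * EXCEPT (k2) FROM ext_tbl;  -- OK: the unsupported column is excluded
```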
2023-01-08 10:07:10 +08:00
a1d8177e33 [fix](test) remove unstable regression test (#15689)
remove `regression-test/suites/nereids_performance_p0/sub_query_join_where_pushdown.groovy`
2023-01-07 22:05:22 +08:00
5dfdacd278 [enhancement](histogram) add histogram syntax and persist histogram statistics (#15490)
Histogram statistics are more expensive to collect, so we collect and persist them separately.

This PR does the following work:
1. Add histogram syntax and add keyword `TABLE`
2. Add the task of collecting histogram statistics
3. Persist histogram statistics
4. Replace fastjson with gson
5. Add unit tests...

Relevant syntax examples:
> Following some databases such as MySQL, the keyword `TABLE` is added.

```SQL
-- collect column statistics
ANALYZE TABLE statistics_test;

-- collect histogram statistics
ANALYZE TABLE statistics_test UPDATE HISTOGRAM ON col1,col2;
```

Based on #15317
2023-01-07 00:55:42 +08:00
76ad599fd7 [enhancement](histogram) optimise aggregate function histogram (#15317)
This PR mainly optimizes the histogram (👉🏻 https://github.com/apache/doris/pull/14910) aggregation function, including the following:
1. Support input parameters `sample_rate` and `max_bucket_num`
2. Add UT and regression test
3. Add documentation
4. Optimize function implementation logic
 
Parameter description:
- `sample_rate`: Optional. The proportion of sampled data used to generate the histogram. The default is 0.2.
- `max_bucket_num`: Optional. Limits the number of histogram buckets. The default value is 128.

---

Example:

```
MySQL [test]> SELECT histogram(c_float) FROM histogram_test;
+-------------------------------------------------------------------------------------------------------------------------------------+
| histogram(`c_float`)                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------+
| {"sample_rate":0.2,"max_bucket_num":128,"bucket_num":3,"buckets":[{"lower":"0.1","upper":"0.1","count":1,"pre_sum":0,"ndv":1},...]} |
+-------------------------------------------------------------------------------------------------------------------------------------+

MySQL [test]> SELECT histogram(c_string, 0.5, 2) FROM histogram_test;
+-------------------------------------------------------------------------------------------------------------------------------------+
| histogram(`c_string`)                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------+
| {"sample_rate":0.5,"max_bucket_num":2,"bucket_num":2,"buckets":[{"lower":"str1","upper":"str7","count":4,"pre_sum":0,"ndv":3},...]} |
+-------------------------------------------------------------------------------------------------------------------------------------+
```

Query result description:

```
{
    "sample_rate": 0.2, 
    "max_bucket_num": 128, 
    "bucket_num": 3, 
    "buckets": [
        {
            "lower": "0.1", 
            "upper": "0.2", 
            "count": 2, 
            "pre_sum": 0, 
            "ndv": 2
        }, 
        {
            "lower": "0.8", 
            "upper": "0.9", 
            "count": 2, 
            "pre_sum": 2, 
            "ndv": 2
        }, 
        {
            "lower": "1.0", 
            "upper": "1.0", 
            "count": 2, 
            "pre_sum": 4, 
            "ndv": 1
        }
    ]
}
```

Field description:
- sample_rate: the sampling rate
- max_bucket_num: the maximum allowed number of buckets
- bucket_num: the actual number of buckets
- buckets: all buckets
    - lower: lower bound of the bucket
    - upper: upper bound of the bucket
    - count: the number of elements contained in the bucket
    - pre_sum: the total number of elements in the preceding buckets
    - ndv: the number of distinct values in the bucket

> Total number of histogram elements = number of elements in the last bucket (count) + total number of elements in the preceding buckets (pre_sum).
2023-01-07 00:50:32 +08:00
9c8fcd805c [feature](Nereids) support variable type expression (#15659) 2023-01-07 00:32:57 +08:00
08d439cde7 [feature](Nereids) add keyword rlike (#15647) 2023-01-07 00:28:21 +08:00
c18bfdc93e [test][regression cases][external]add external table p2 regression cases according to doris 1.2 docs 20230105 (#15651) 2023-01-06 20:19:39 +08:00
cad47dd9d9 [test](Nereids) add two regression test cases for Nereids (#15598)
1. test predicates infer could work well with push down predicates through join
2. test count with subquery containing constant literal
2023-01-06 16:29:50 +08:00
53559e2bdc [fix](decimalv2) fix loss of precision when cast to decimalv2 literal (#15629) 2023-01-06 16:02:46 +08:00
7f84db310a [fix](nereids) Convert to datetime when binary expr's left is date and right is int type (#15615)
In the case below, the expression `date > 20200101` should implicitly cast both sides to datetime instead of bigint.

```sql
        CREATE TABLE `part_by_date`
        (
            `date`                  date   NOT NULL COMMENT '',
            `id`                      int(11) NOT NULL COMMENT ''
        ) ENGINE=OLAP
        UNIQUE KEY(`date`, `id`)
        PARTITION BY RANGE(`date`) 
        (PARTITION p201912 VALUES [('0000-01-01'), ('2020-01-01')),
        PARTITION p202001 VALUES [('2020-01-01'), ('2020-02-01')))
        DISTRIBUTED BY HASH(`id`) BUCKETS 3
        PROPERTIES (
        "replication_allocation" = "tag.location.default: 1"
        );

        INSERT INTO  part_by_date VALUES('0001-02-01', 1),('2020-01-15', 2);

        SELECT
            id
        FROM
           part_by_date
        WHERE date > 20200101;
```
2023-01-06 14:08:05 +08:00
ae77b582f0 [fix](Nereids) add information function and fix bugs in schemaScan (#15608)
1. Add information functions (see the example below)
- Database()
- User()
- Current_User()
- Connection_id()

2. Fix bugs in schemaScan
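A minimal example of the newly supported session information functions (results depend on the current session):

```sql
SELECT DATABASE(), USER(), CURRENT_USER(), CONNECTION_ID();
```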
2023-01-06 13:37:27 +08:00
ef72b8d859 [Feature](Nereids): add logical operator || && (#15643) 2023-01-06 12:18:21 +08:00
df2da89b89 [feature](multi-catalog) support postgresql jdbc catalog (#15570)
support postgresql jdbc catalog
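A minimal sketch of creating such a catalog, assuming the standard Doris CREATE CATALOG syntax; the connection details, driver jar, and names are hypothetical:

```sql
CREATE CATALOG pg_jdbc PROPERTIES (
    "type" = "jdbc",
    "user" = "pg_user",
    "password" = "****",
    "jdbc_url" = "jdbc:postgresql://127.0.0.1:5432/demo",
    "driver_url" = "postgresql-42.5.1.jar",
    "driver_class" = "org.postgresql.Driver"
);
```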
2023-01-06 11:00:59 +08:00
05d72e8919 [fix](join) fix anti join incorrectly outputs null values (#15567) 2023-01-06 09:55:48 +08:00
5460c873e8 [Feature] (Nereids) support un equals conjuncts in un scalar sub query (#15591)
Support non-equal conjuncts in non-scalar subqueries.
[fix] wrong result in correlated subqueries
2023-01-05 16:56:14 +08:00
59f34be41f [fix](having-clause) having clause do not works correct with same alias name (#15143) 2023-01-05 10:15:15 +08:00
5ff5b8fc98 [feature](mark join) Support mark join for hash join node (#15569)
* [feature](mark join) Support mark join for hash join node
2023-01-05 09:32:26 +08:00
61d538c713 [improvement](storage-policy) Add check validity when create storage policy. (#14405) 2023-01-04 22:24:49 +08:00
a4af1fbf90 [fix](inbitmap) forbid having clause to include in bitmap. (#15494) 2023-01-04 14:33:18 +08:00
f2f06c1acc [feature](nereids) Support select temp partition (#15579)
Support such grammar:
    select * from t_p temporary partition(tp1);
    select * from t_p temporary partitions(tp1);
    select * from t_p temporary partition tp1;
2023-01-04 11:04:36 +08:00
eef1f432dd [Bug](datetimev2/decimalv3) Fix wrong predicate infer rule (#15574) 2023-01-04 10:03:43 +08:00
a97f582b93 [fix](nereids) use DAYS as default unit for DATE_ADD and DATE_SUB function (#15559) 2023-01-04 01:55:15 +08:00
18bc354c06 [fix](Nereids) use correct column unique id when read data from non-base index (#15534)
When light schema change is enabled by default, a column in OLAP scan is retrieved by column unique id instead of the column name. Columns with the same name would use different unique IDs among materialized indexes.
This PR ensures that the column in the OLAP scan node could use the correct column unique id.
2023-01-04 01:41:25 +08:00
8d0c06c897 [fix](nereids) binding priority in agg-sort, having, group_by_key (#15240)
This PR defines order_key and having_key binding priority.

1. order key priority
 ```
                select
                        col1 * -1 as col1    # inner_col1 * -1 as alias_col1
                from
                        t
                order by col1;     # order by order_col1
```
to bind `order_col1`, `alias_col1` has higher priority than `inner_col1`

2. having key priority
```
       select (a-1) as a  # inner_a - 1 as alias_a
       from bind_priority_tbl 
       group by a 
       having a=1;
```
to bind having key, `inner_a` has higher priority than `alias_a`

3. group by key binding priority
```
SELECT date_format(b.k10,
         '%Y%m%d') AS k10
FROM test a
LEFT JOIN 
    (SELECT k10
    FROM baseall) b
    ON a.k10 = b.k10
GROUP BY  k10;
```
group_by_key (k10) binding priority:

- agg.child.output
- agg.output
If binding with agg.child.output fails (the slot is not found, or more than one candidate slot is found in agg.child.output), Nereids tries to bind group_by_key with agg.output.
In the above example, Nereids found 2 candidate slots (a.k10, b.k10) in agg.child.output for group_by_key (k10), so binding with agg.child.output failed. Nereids then tried to bind group_by_key with agg.output, that is `date_format(b.k10, '%Y%m%d') AS k10`. Finally, group_by_key is bound to the alias `k10`.
2023-01-03 22:09:28 +08:00
55dc541c90 [Fix](Nereids) aggregate function except COUNT should be nullable without group by expr (#15547)
Co-authored-by: mch_ucchi
2023-01-03 21:28:07 +08:00
Pxl
85fe9d2496 [Bug](filter) fix not in(null) return true (#15466)
Fix `not in (null)` incorrectly returning true.
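A minimal illustration of the expected behavior under standard SQL three-valued logic:

```sql
SELECT 1 NOT IN (NULL);     -- expected: NULL, not TRUE
SELECT 1 NOT IN (2, NULL);  -- expected: NULL, since the comparison with NULL is unknown
```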
2023-01-03 21:14:50 +08:00
1dabcb0111 [Fix](Nereids) fix except and intersect error for statsCalculator (#15557)
When calculating statistics for except and intersect in the StatsCalculator, the slotId of the corresponding column was not replaced with the slotId of the output, resulting in an NPE.
2023-01-03 17:06:57 +08:00
b50448d5c4 [vectorized](udaf) fix udaf result is null when has multiple aggs (#15554) 2023-01-03 16:03:43 +08:00
8748f65a1b [fix](nereids)support nulls first/last in order by clause (#15530) 2023-01-03 14:56:00 +08:00
ada72b055f [feature](Nereids): Support any_value/any function. (#15450) 2023-01-03 12:21:13 +08:00
31548cfe2a [fix](nereids) check failed that exchange node under agg must from PhysicalDistribute (#15473)
When Nereids translates a PhysicalHashAggregate node to the original plan, if the input fragment root is an exchange node, Nereids assumes that this exchange node is generated from a PhysicalDistribute node.
But this assumption is not true. For example, a sort node could be translated to exchange (merge phase) + sort (local phase).
2023-01-03 11:19:25 +08:00
5d145cf86f [fix](regression-test) fix duplicate columns in yandex_metrica_p2 case (#15489) 2023-01-02 20:31:46 +08:00
ad9a67a76a [Bug](decimalv3) Fix wrong decimalv3 value after insertion (#15505) 2023-01-01 11:08:59 +08:00
487d159a3d [improvement](test) add one case for hll (#15543) 2023-01-01 11:02:34 +08:00
781fa17993 [fix](Nereids) round function return type should be double (#15502) 2022-12-30 23:36:15 +08:00
100834df8b [fix](nereids) fix some aggregate bugs in Nereids (#15326)
1. the agg function without the distinct keyword should be a "merge" function in threePhaseAggregateWithDistinct
2. use aggregateParam.aggMode.consumeAggregateBuffer instead of aggregateParam.aggPhase.isGlobal() to indicate whether an agg function is a "merge" function
3. add an AvgDistinctToSumDivCount rule to support avg(distinct xxx) in some cases (see the sketch after this list)
4. AggregateExpression's nullable method should call the inner function's nullable method
5. add a bind slot rule to bind the pattern "logicalSort(logicalHaving(logicalProject()))"
6. don't remove the project node in PhysicalPlanTranslator
7. add a cast to bigint expr when count(distinct datelike type)
8. fall back to the old optimizer if bitmap runtime filter is enabled
9. fix exchange node mem leak
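A conceptual sketch of item 3, the AvgDistinctToSumDivCount rewrite (table and column names are hypothetical):

```sql
SELECT avg(DISTINCT k1) FROM t;
-- is conceptually rewritten to
SELECT sum(DISTINCT k1) / count(DISTINCT k1) FROM t;
```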
2022-12-30 23:07:37 +08:00
9c3c9db49b [enhancement](fuzzy test) support fuzzy test of RewriteOrToInPredicateThreshold #15469
Co-authored-by: wuhangze <wuhangze@jd.com>
2022-12-30 22:59:59 +08:00
93a25e1af5 [fix](nereids) the project node is lost when creating PhysicalStorageLayerAggregate node (#15467) 2022-12-30 16:33:24 +08:00
2704651fde [fix](nereids) hll and bitmap type can't be used as order by and group by exprs (#15471)
HLL, bitmap, array, and quantile state types can't be used in ORDER BY, GROUP BY, and some aggregate exprs.
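For illustration, queries like the following are now rejected; the table t, its BITMAP column bm, and the ordinary column k1 are hypothetical:

```sql
SELECT bm FROM t GROUP BY bm;   -- rejected: bitmap type can't be used in GROUP BY
SELECT k1 FROM t ORDER BY bm;   -- rejected: bitmap type can't be used in ORDER BY
```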
2022-12-30 14:26:21 +08:00