Commit Graph

157 Commits

Author SHA1 Message Date
feeb69438b [opt](Nereids) optimize DistributeSpec generator of OlapScan (#15965)
use the size of selected partitions instead of olap table partition size to decide whether generate hashDistributeSpec
2023-01-18 20:18:11 +08:00
e2d145cf5d [fix](fe)fix anti join bug (#15955)
* [fix](fe)fix anti join bug

* fix fe ut
2023-01-17 20:25:00 +08:00
6609eb804d [fix](regression) result of withUnionAll in query_p0/select_no_from is unstable (#15958) 2023-01-17 11:34:41 +08:00
Pxl
81bab55d43 [Bug](function) catch function calculation error on aggregate node to avoid core dump (#15903) 2023-01-16 11:21:28 +08:00
a788623ee2 doris largeint type execute where query, the result is incorrect (#15034) 2023-01-13 23:12:02 +08:00
514de605b6 [Bug](predicate) add double predicate creator (#15762)
Add one double predicator the same as integer predicate creator.
2023-01-13 18:34:09 +08:00
b1fb1277dd [fix](bitmap) fix bitmap iterator comparison error (#15779)
Fix the bug that bitmap.begin() == bitmap.end() is always true when the bitmap contains a single value.
2023-01-13 11:37:07 +08:00
9468711f9f [Bug](join) fix bug null aware left anti join not correct result (#15841) 2023-01-13 10:18:05 +08:00
7441b4dc96 [Feature](function) Support width_bucket function (#14396) 2023-01-12 13:59:21 +08:00
cfb110c905 [fix](nereids) fix some nereids bugs (#15714)
1. remove forcing nullable for slot on EmptySetNode.
2. order by xxx desc should use nulls last as default order.
3. don't create runtime filter if runtime filter mode is OFF.
4. group by constant value need check the corresponding expr shouldn't have any aggregation functions.
5. fix two left outer join reorder bug( A left join B left join C).
6. fix semi join and left outer join reorder bug.( A left join B semi join C ).
7. fix group by NULL bug.
8. change ceil and floor function to correct signature.
9. add literal comparasion for string and date type.
10. fix the getOnClauseUsedSlots method may not return valid value.
11. the tightness common type of string and date should be date.
12. the nullability of set operation node's result exprs is not set correctly.
13. Sort node should remove redundent ordering exprs.
2023-01-11 17:18:44 +08:00
05f6e4c48a [fix](predicate) fix be core dump caused by pushing down the double column predicate (#15693) 2023-01-09 19:31:04 +08:00
211cc66d02 [fix](multi-catalog) fix image loading failture when create catalog with resource (#15692)
Bug fix
fix image loading failture when create catalog with resource
When creating jdbc catalog with resource, the metadata image will failed to be loaded.
Because when loading jdbc catalog image, it will try to get resource from ResourceMgr,
but ResourceMgr has not been loaded, so NPE will be thrown.

This PR fix this bug, and refactor some logic about catalog and resource.

When loading jdbc catalog image, it will not get resource from ResourceMgr.
And now user can create catalog with resource and properties, like:

create catalog jdbc_catalog with resource jdbc_resource
properites("user" = "user1");
The properties in "properties" clause will overwrite the properties in "jdbc_resource".

force adding tinyInt1isBit=false to jdbc url
The default value of tinyInt1isBit is true, and it will cause tinyint in mysql to be bit type.
force adding tinyInt1isBit=false to jdbc url so that the tinyint in mysql will be tinyint in Doris.

Avoid calculate checksum of jdbc driver jar multiple times
Refactor
Refactor the notification logic when updating properties in resource.
When updating properties in resource, it will notify the corresponding catalog to update its own properties.
This PR change this logic. After updating properties in resource, it will only uninitialize the catalog's internal
objects such "jdbc client" or "hms client". And this objects will be re-initialized lazily.

And all properties will be got from Resource at runtime, so that it will always get the latest properties

Regression test cases
Because we add tinyInt1isBit=false to jdbc url, some of cases need to be changed.
2023-01-09 09:56:26 +08:00
76ad599fd7 [enhancement](histogram) optimise aggregate function histogram (#15317)
This pr mainly to optimize the histogram(👉🏻 https://github.com/apache/doris/pull/14910)  aggregation function. Including the following:
1. Support input parameters `sample_rate` and `max_bucket_num`
2. Add UT and regression test
3. Add documentation
4. Optimize function implementation logic
 
Parameter description:
- `sample_rate`:Optional. The proportion of sample data used to generate the histogram. The default is 0.2.
- `max_bucket_num`:Optional. Limit the number of histogram buckets. The default value is 128.

---

Example:

```
MySQL [test]> SELECT histogram(c_float) FROM histogram_test;
+-------------------------------------------------------------------------------------------------------------------------------------+
| histogram(`c_float`)                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------+
| {"sample_rate":0.2,"max_bucket_num":128,"bucket_num":3,"buckets":[{"lower":"0.1","upper":"0.1","count":1,"pre_sum":0,"ndv":1},...]} |
+-------------------------------------------------------------------------------------------------------------------------------------+

MySQL [test]> SELECT histogram(c_string, 0.5, 2) FROM histogram_test;
+-------------------------------------------------------------------------------------------------------------------------------------+
| histogram(`c_string`)                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------+
| {"sample_rate":0.5,"max_bucket_num":2,"bucket_num":2,"buckets":[{"lower":"str1","upper":"str7","count":4,"pre_sum":0,"ndv":3},...]} |
+-------------------------------------------------------------------------------------------------------------------------------------+
```

Query result description:

```
{
    "sample_rate": 0.2, 
    "max_bucket_num": 128, 
    "bucket_num": 3, 
    "buckets": [
        {
            "lower": "0.1", 
            "upper": "0.2", 
            "count": 2, 
            "pre_sum": 0, 
            "ndv": 2
        }, 
        {
            "lower": "0.8", 
            "upper": "0.9", 
            "count": 2, 
            "pre_sum": 2, 
            "ndv": 2
        }, 
        {
            "lower": "1.0", 
            "upper": "1.0", 
            "count": 2, 
            "pre_sum": 4, 
            "ndv": 1
        }
    ]
}
```

Field description:
- sample_rate:Rate of sampling
- max_bucket_num:Limit the maximum number of buckets
- bucket_num:The actual number of buckets
- buckets:All buckets
    - lower:Upper bound of the bucket
    - upper:Lower bound of the bucket
    - count:The number of elements contained in the bucket
    - pre_sum:The total number of elements in the front bucket
    - ndv:The number of different values in the bucket

> Total number of histogram elements = number of elements in the last bucket(count) + total number of elements in the previous bucket(pre_sum).
2023-01-07 00:50:32 +08:00
53559e2bdc [fix](decimalv2) fix loss of precision when cast to decimalv2 literal (#15629) 2023-01-06 16:02:46 +08:00
05d72e8919 [fix](join) fix anti join incorrectly outputs null values (#15567) 2023-01-06 09:55:48 +08:00
a97f582b93 [fix](nereids) use DAYS as default unit for DATE_ADD and DATE_SUB function (#15559) 2023-01-04 01:55:15 +08:00
Pxl
85fe9d2496 [Bug](filter) fix not in(null) return true (#15466)
fix not in(null) return true
2023-01-03 21:14:50 +08:00
8748f65a1b [fix](nereids)support nulls first/last in order by clause (#15530) 2023-01-03 14:56:00 +08:00
781fa17993 [fix](Nereids) round function return type should be double (#15502) 2022-12-30 23:36:15 +08:00
100834df8b [fix](nereids) fix some arrgregate bugs in Nereids (#15326)
1. the agg function without distinct keyword should be a "merge" funcion in threePhaseAggregateWithDistinct
2. use aggregateParam.aggMode.consumeAggregateBuffer instead of aggregateParam.aggPhase.isGlobal() to indicate if a agg function is a "merge" function
3. add an AvgDistinctToSumDivCount rule to support avg(distinct xxx) in some case
4. AggregateExpression's nullable method should call inner function's nullable method.
5. add a bind slot rule to bind pattern "logicalSort(logicalHaving(logicalProject()))"
6. don't remove project node in PhysicalPlanTranslator
7. add a cast to bigint expr when count( distinct datelike type )
8. fallback to old optimizer if bitmap runtime filter is enabled.
9. fix exchange node mem leak
2022-12-30 23:07:37 +08:00
93a25e1af5 [fix](nereids) the project node is lost when creating PhysicalStorageLayerAggregate node (#15467) 2022-12-30 16:33:24 +08:00
edb9a3b58d [Bug](timediff) Fix wrong result for function timediff (#15312) 2022-12-30 00:28:51 +08:00
28bb13a026 [feature](light-schema-change) enable light schema change by default (#15344) 2022-12-28 09:29:26 +08:00
03aef7a8ac [fix](nereids) sender in union's child fragment has no destination (#15402)
1. always create an exchange node for set operation node's children
2. fix cast expr's nullability bug.
2022-12-27 22:36:54 +08:00
51b14c06d3 [enhancement](nereids) support approx_count_distinct function (#15406) 2022-12-27 22:25:21 +08:00
a1c6ea876f [fix](inbitmap) fix core dump caused by bitmap filter with union (#15333)
The join node need project operation to remove unnecessary columns from the output tuples.
For SetOperationNode output tuple and input tuple is consistent and do not need project,
but the children of SetOperationNode may be join nodes, so the children of the SetOperationNode
need to do the project operation.
2022-12-26 23:14:32 +08:00
6bec1ffc47 [feature](planner) remove restrict of offset without order by (#15218)
Support SELECT * FROM tbl LIMIT 5, 3;
2022-12-26 09:37:41 +08:00
8a810cd554 [fix](bitmapfilter) fix core dump caused by bitmap filter (#15296)
Do not push down the bitmap filter to a non-integer column
2022-12-23 16:42:45 +08:00
09a22813e4 [feature](Nereids) support syntax SELECT DISTINCT (#15197)
Add a new rule 'ProjectWithDistinctToAggregate' to support "select distinct xx from table".
This rule check's the logicalProject node's isDisinct property and replace the logicalProject node with a LogicalAggregate node.
So any rule before this, if createing a new logicalProject node, should make sure isDisinct property is correctly passed around.
please see rule BindSlotReference or BindFunction for example.
2022-12-22 23:54:08 +08:00
df5969ab58 [Feature] Support function roundBankers (#15154) 2022-12-22 22:53:09 +08:00
1fdd4172bd [fix](Inbitmap) fix in bitmap result error when left expr is constant (#15271)
* [fix](Inbitmap) fix in bitmap result error when left expr is constant

1. When left expr of the in predicate is a constant, instead of generating a bitmap filter, rewrite sql to use `bitmap_contains`.
  For example,"select k1, k2 from (select 2 k1, 11 k2) t where k1 in (select bitmap_col from bitmap_tbl)"
  => "select k1, k2 from (select 2 k1, 11 k2) t left semi join bitmap_tbl b on bitmap_contains(b.bitmap_col, t.k1)"

* add regression test
2022-12-22 19:25:09 +08:00
1ca1417824 [feature](multi-catalog) support show tables/table status from catalog.db (#15180)
support 'show tables from catalog.db' and 'show table status from catalog.db'
2022-12-22 09:22:40 +08:00
6712f1fc1d [fix](Nereids) encryption function with 4 params should auto-complate last param with config (#15038) 2022-12-20 13:55:54 +08:00
1597afcd67 [fix](mutil-catalog) fix get many same name db/table when show where (#15076)
when show databases/tables/table status where xxx, it will change a selectStmt to select result from 
information_schema, it need catalog info to scan schema table, otherwise may get many
database or table info from multi catalog.

for example
mysql> show databases where schema_name='test';
+----------+
| Database |
+----------+
| test |
| test |
+----------+

MySQL [internal.test]> show tables from test where table_name='test_dc';
+----------------+
| Tables_in_test |
+----------------+
| test_dc |
| test_dc |
+----------------+
2022-12-19 14:27:48 +08:00
6aba948df0 [fix](multi-catalog) hidden password for show create jdbc catalog (#15145)
when show create catalog of jdbc, it will show 'jdbc.password' plain text. fix it like other code that hidden password.
2022-12-17 17:20:17 +08:00
33abe11dea [regression-test](query) Add regression case of error could not be changed to nullabl when exe… (#15123)
* Add regression case of error could not be changed to nullabl when exeing sql

* add out file

Co-authored-by: smallhibiscus <844981280>
2022-12-16 21:57:36 +08:00
4dbe30d37b [regression](vectorized) delete vectorized config in regression tests (#15126) 2022-12-16 17:08:29 +08:00
5e0d44ff25 [fix](nereids) fix bug of expr rewrite and column prune rule of group by exprs (#15097) 2022-12-16 03:22:36 +08:00
21c2e485ae [improvment](function) add new function substring_index (#15024) 2022-12-15 09:54:34 +08:00
1200b22fd2 [function](round) compute accurate round value by decimal (#14946) 2022-12-13 09:53:43 +08:00
b5c0d4870d [fix](nereids)fix bug of elt and sub_replace function (#14971) 2022-12-12 17:37:36 +08:00
38570312dd [feature](split_by_string)support split by string function (#13741) 2022-12-12 15:22:30 +08:00
33349c3419 [feature](function)Support negative index for function split_part (#13914) 2022-12-12 09:56:09 +08:00
3286fb48ab [fix](if) fix coredump of if const (#14858) 2022-12-07 09:43:10 +08:00
6c70d794f6 [fix](bitmapfilter) fix core dump caused by bitmap filter (#14702) 2022-12-01 09:56:22 +08:00
a60490651f [improvement](function) add timezone cache for convert_tz (#14616) 2022-11-29 17:00:54 +08:00
7513c82431 [NLJoin](conjuncts) separate join conjuncts and general conjuncts (#14608) 2022-11-29 08:55:54 +08:00
529bdfb153 [Fix](function) Fix retention function return wrong value type (#14552)
MySQL [db]> SELECT SUM(a.r[1]) as active_user_num, SUM(a.r[2]) as active_user_num_1day, SUM(a.r[3]) as active_user_num_3day, SUM(a.r[4]) as active_user_num_7day FROM ( SELECT user_id, retention( day = '2022-11-01', day = '2022-11-02', day = '2022-11-04', day = '2022-11-07') as r FROM login_event WHERE (day >= '2022-11-01') AND (day <= '2022-11-21') GROUP BY user_id ) a;
ERROR 1105 (HY000): errCode = 2, detailMessage = sum requires a numeric parameter: sum(%element_extract%(a.r, 1))
2022-11-28 15:56:18 +08:00
38b4cbe253 [Bug](regression) regression fail random in fuzzy mode (#14614) 2022-11-27 09:23:36 +08:00
4728e75079 [feature](bitmap) Support in bitmap syntax and bitmap runtime filter (#14340)
1.Support in bitmap syntax, like 'where k1 in (select bitmap_column from tbl)';
2.Support bitmap runtime filter. Generate a bitmap filter using the right table bitmap and push it down to the left table storage layer for filtering.
2022-11-25 15:22:44 +08:00