Commit Graph

3123 Commits

Author SHA1 Message Date
f42db08ccc [fix](Nereids) Fixed a problem with completing ClusterName (#18366) 2023-04-07 13:35:03 +08:00
b1956b42fb [enhancement](Nereids) disable heavy operator penalty in cost model v1 (#18422) 2023-04-07 13:16:59 +08:00
a7b708263d [fix](nereids) move validate data types before EliminateUnnecessaryProject rule (#18393)
The supported-data-type validation checks whether a project node's output contains any data types that Nereids does not support, such as array or map. This validation must therefore run before the EliminateUnnecessaryProject rule.
2023-04-07 13:10:18 +08:00
2783b27788 [fix](Nereids): fix LogicalProject withXXX(). (#18441) 2023-04-07 12:38:53 +08:00
Pxl
e77da1519a [Enhancement](materialized-view) adjust desc table all display fields (#18357)
adjust desc table all display fields
2023-04-07 11:14:17 +08:00
Pxl
7631a8fb39 [Bug](materialized-view) fix mv define expr persistence replay incorrect after schema change (#18418)
fix mv define expr persistence replay incorrect after schema change
2023-04-07 11:11:55 +08:00
Pxl
267b690dad [Bug](materialized-view) fix materialized-view query match not consider with order by elements (#18384)
fix materialized-view query match not consider with order by elements
2023-04-07 11:11:18 +08:00
505f25c580 [fix](planner)use base index if the where clause is a constant value (#18367)
sql: select bitmap_empty() from d_table where true;
This query should always use the base index instead of any mv, because the conjunct is a constant (true) and uses no column from any mv.
2023-04-07 09:15:00 +08:00
759f1da32e [Enhancement](Backends) add HostName field in backends table and delete backends table in information_schema (#18156)
1. Add the `HostName` field for the `show backends` statement and the `backends()` tvf.
2. Delete the `backends` table in the `information_schema` database.
2023-04-07 08:30:42 +08:00
22deeecbe1 [Improvement](multi catalog)Cache File for Hive Table, instead of cache file splits. (#18419)
Currently, the session variable for split size does not take effect once the file splits are cached.
1. This PR caches the files of a Hive table instead of the file splits, and splits each file every time using the current split size.
2. Use the self splitter by default.
2023-04-07 00:07:23 +08:00
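The idea in 22deeecbe1 (cache the file, then split it again on every query) can be sketched as follows. This is a minimal illustration in Python, not Doris code: `split_file`, the cached length, and the byte sizes are all hypothetical.

```python
def split_file(file_length, split_size):
    """Return (offset, length) pairs that cover a file of file_length bytes."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = []
    offset = 0
    while offset < file_length:
        # The last split may be shorter than split_size.
        length = min(split_size, file_length - offset)
        splits.append((offset, length))
        offset += length
    return splits


# Only file-level information is cached (here just a hypothetical length);
# the splits are recomputed per query, so a new split size applies at once.
cached_file_length = 250
print(split_file(cached_file_length, 100))
print(split_file(cached_file_length, 128))
```

Because the cache holds no split boundaries, changing the split size between the two calls changes the result without invalidating anything.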
981ead9032 [feature](Nereids) support binary arithmetic function (#18213)
support binary arithmetic functions like:

add(op1, op2) -> op1 + op2
subtract(op1, op2) -> op1 - op2
multiply(op1, op2) -> op1 * op2
divide(op1, op2) -> op1 / op2
mod(op1, op2) -> op1 % op2
2023-04-06 16:57:04 +08:00
33ae4524ce [fix](multi-catalog) Fix properties check in S3Storage and add hive socket timeout config (#18420)
Co-authored-by: jinzhe <jinzhe@selectdb.com>
2023-04-06 16:35:24 +08:00
27576ef8dc [fix](stats) Fix analyze table failed (#18386) 2023-04-06 15:45:09 +08:00
591f76a6a4 [fix](alter inverted index) Temporary deal with add or drop inverted index by directly schema change (#18378)
The current implementation of dynamically adding and dropping inverted indexes has a problem: the inverted index information of historical data becomes out of date after compaction on the base tablet.

Follow-up PRs will solve this problem. For now, add or drop inverted indexes through the direct schema change logic.
2023-04-06 15:07:37 +08:00
a474be6d03 [Bug](ES): Es object mapping error (#18382)
Issue Number: close #18379
2023-04-06 14:11:09 +08:00
db766bb073 [fix](planner) decimalv2 castTo decimalv2 should change type directly (#18297) 2023-04-06 13:51:50 +08:00
8b61709ec8 [feature](multi-catalog) support select current_catalog(); (#18163) 2023-04-06 12:06:10 +08:00
4ec6aa1691 [fix](planner) trying register constant slotRef to table cause NPE (#18356)
Can be reproduced by:

CREATE TABLE t (
name varchar(128)
) ENGINE=OLAP
UNIQUE KEY(name)
DISTRIBUTED BY HASH(name) BUCKETS 1;

insert into t values('abc');

SELECT cd
FROM
(SELECT cast(now() as string) cd FROM t) t1
JOIN
(select cast(now() as string) td from t GROUP BY now()) t2
ON t1.cd = t2.td;

ERROR 1105 (HY000): errCode = 2, detailMessage = Unexpected exception: null
2023-04-06 11:50:12 +08:00
9a916cffe4 [Optimization](String) Optimize the injection of statistics. #18401
1. Remove useless partition statistics injection.
2. Add a check to avoid exceptions during numeric conversion.
2023-04-06 11:42:11 +08:00
f73189860f [tpch](nereids) trustable join condition (#18272)
(affects tpch q14/7/9)
1. equation estimation confidence level
For an equality condition, if either side is almost unique, its estimation confidence is high; we call it a trustable condition.
If a join contains more than one un-trustable condition, we use only the one whose selectivity is biggest, in order to avoid error propagation.

2. like expression estimation factor: 0.2
Give the like operator its own default shrink ratio, which is 0.2.

3. disable fat-child-penalty
set HEAVY_OPERATOR_PUNISH_FACTOR=1
This change affects tpch q15. This factor should be adaptive to the implementation of BE.
2023-04-06 11:20:47 +08:00
Pxl
76d76f672c [Chore](build) enhancement for backend build time usage (#18344) 2023-04-06 11:13:21 +08:00
d0219180a9 [feature-wip](multi-catalog)add properties converter (#18005)
Refactor the properties of each cloud, and use a property converter to convert the properties for accessing FE metadata and BE data.
User docs: #18287
2023-04-06 09:55:30 +08:00
60bad33e7e [fix](nereids) explain shape refactor #18399
The previous PR #18296 has a bug when parsing SHAPE_PLAN.
2023-04-06 08:55:05 +08:00
1ec400c786 [fix](SSL) fix ssl connection buffer overflow (#18359) 2023-04-05 08:42:41 +08:00
ea60d65384 [Improvement](multi catalog)Move split size config to session variable (#18355)
Move the split size config to a session variable. Previously it was in the Config class, so users needed to restart FE after changing it.
2023-04-05 01:02:47 +08:00
7f8d92656e [fix](streamload) fix stream load failed when enable profile (#18364)
#18015 enables the stream load profile log; however, BE encounters RPC failures when loading tpch data (see #18291). When `is_report_success` is true, BE calls reportExecStatus on FE, but FE cannot find the QueryInfo in `coordinatorMap` and therefore returns an error to BE.
2023-04-05 01:01:46 +08:00
d8b293de07 [fix](multi-catalog) add catalog info for show proc (#18276)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-04-04 22:49:22 +08:00
7c36bef6bc [Feature-Wip](MySQL Load)Show load warning for my sql load (#18224)
1. Support `show load warnings` for mysql load to get the detailed error message.
2. Fix fillByteBufferAsync not marking the load as finished in the same data load
3. Fix drain data only in client mode.
2023-04-04 22:44:48 +08:00
3fc8c19735 [improve](nereids)compute statsRange.length() according to the column datatype (#18331)
We map date, datetime, and their V2 types to double. This mapping preserves date order, but it does not preserve range length.
For example, from 1990-01-01 to 1991-01-01 there are 12 months. For the filter `A < 1990-02-01`, the selectivity
should be about `1/12`.

If we compute this filter from the corresponding double values,
`sel = (19900201 - 19900101) / (19910101 - 19900101) = 100/10000 = 1/100`

The estimate is off by roughly a factor of 10.
This PR aims to fix this error.

Solution:
Convert the double back to its corresponding data type (date/datev2), then compute the range length with respect to that data type.
2023-04-04 14:20:34 +08:00
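The arithmetic in 3fc8c19735 can be reproduced with a short Python sketch. It is an illustration only, not the Nereids code: the function names are hypothetical, and the yyyymmdd numeric encoding is inferred from the numbers quoted in the commit message.

```python
from datetime import date


def as_number(d):
    """Encode a date as the number yyyymmdd, as in the commit's example."""
    return d.year * 10000 + d.month * 100 + d.day


def selectivity_by_number(low, high, bound):
    """Naive estimate: range lengths taken on the numeric encoding."""
    return (as_number(bound) - as_number(low)) / (as_number(high) - as_number(low))


def selectivity_by_date(low, high, bound):
    """Type-aware estimate: range lengths measured in days."""
    return (bound - low).days / (high - low).days


low, high, bound = date(1990, 1, 1), date(1991, 1, 1), date(1990, 2, 1)
print(selectivity_by_number(low, high, bound))  # 100 / 10000
print(selectivity_by_date(low, high, bound))    # 31 / 365, close to 1/12
```

Measuring the range in days is one way to compute the length "with respect to the data type"; the actual unit used by the optimizer is not stated in the commit message.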
6231ca80f7 [improve](clickhouse catalog) Add " wrap select column for the sql query clickhouse jdbc (#18352) 2023-04-04 10:19:24 +08:00
3e7a9424e4 [feature](nereids) explain shape plan (#18296)
`explain shape plan select ...`
prints only plan-shape-related information, including
- node name
- join type, join condition
- filter condition
- agg phase

Regression cases that use explain are painful to maintain because the output contains a lot of mutable information, such as slot ids.
With this PR, regression cases can use explain shape plan instead.

For example, this is the plan for tpch q2:
+-----------------------------------------------------------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------------------------------------------------------+
| PhysicalTopN |
| --PhysicalDistribute |
| ----PhysicalTopN |
| ------PhysicalProject |
| --------filter((cast(ps_supplycost as DECIMAL(27, 9)) = min(ps_supplycost) OVER(PARTITION BY p_partkey))) |
| ----------PhysicalWindow |
| ------------PhysicalQuickSort |
| --------------PhysicalProject |
| ----------------hashJoin[INNER_JOIN](supplier.s_suppkey = partsupp.ps_suppkey) |
| ------------------PhysicalProject |
| --------------------hashJoin[INNER_JOIN](part.p_partkey = partsupp.ps_partkey) |
| ----------------------PhysicalProject |
| ------------------------PhysicalOlapScan[partsupp] |
| ----------------------PhysicalProject |
| ------------------------filter((part.p_size = 15)(p_type like '%BRASS')) |
| --------------------------PhysicalOlapScan[part] |
| ------------------PhysicalDistribute |
| --------------------hashJoin[INNER_JOIN](supplier.s_nationkey = nation.n_nationkey) |
| ----------------------PhysicalOlapScan[supplier] |
| ----------------------PhysicalDistribute |
| ------------------------hashJoin[INNER_JOIN](nation.n_regionkey = region.r_regionkey) |
| --------------------------PhysicalProject |
| ----------------------------PhysicalOlapScan[nation] |
| --------------------------PhysicalDistribute |
| ----------------------------PhysicalProject |
| ------------------------------filter((region.r_name = 'EUROPE')) |
| --------------------------------PhysicalOlapScan[region] |
+-----------------------------------------------------------------------------------------------------------+
2023-04-04 09:44:15 +08:00
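The indentation scheme visible in the tpch q2 output of 3e7a9424e4 (two dashes per tree level) can be mimicked with a small Python sketch. `PlanNode` and `shape_lines` are hypothetical names for illustration; this is not how Nereids builds the string.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    label: str                                   # e.g. "PhysicalOlapScan[region]"
    children: list = field(default_factory=list)


def shape_lines(node, depth=0):
    """Render a plan tree in pre-order, prefixing each node with 2 * depth dashes."""
    lines = ["--" * depth + node.label]
    for child in node.children:
        lines.extend(shape_lines(child, depth + 1))
    return lines


plan = PlanNode("PhysicalProject", [
    PlanNode("filter((region.r_name = 'EUROPE'))", [
        PlanNode("PhysicalOlapScan[region]"),
    ]),
])
print("\n".join(shape_lines(plan)))
```

Since the rendered text contains only node labels and nesting, it stays stable across runs, which is the property the commit wants for regression cases.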
798d2e5160 [fix](catalog) all properties should be checked when create unpartitioned table (#18149)
All properties should be checked when creating an unpartitioned table, as is done for a partitioned table.

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-04-04 08:53:45 +08:00
8b85c55117 [vectorized](function) Support array_shuffle and shuffle function. (#18116)
Co-authored-by: zhangyu209 <zhangyu209@meituan.com>
2023-04-04 08:53:13 +08:00
88c5e64c4a [fix](nereids) fix bug of SelectMaterializedIndexWithAggregate rule (#18265)
1. Create a project node to adjust the output column position when a mv is selected in the olap scan node
2. Pass SlotReference's column info when calling Alias's toSlot() method
3. Compare the plans' logical properties when comparing two plans after rewrite
2023-04-03 22:32:43 +08:00
96a64dc9e8 [Improvement](pipeline) Use bloom runtime filter by default for pipeline engine (#18177) 2023-04-03 15:31:48 +08:00
aff260c06f [Enhancement](HttpServer) Support https interface (#16834)
1. Organize http documents
2. Add http interface authentication for FE
3. **Support https interface for FE**
4. Provide authentication interface
5. Add http interface authentication for BE
6. Support https interface for BE
2023-04-03 14:18:17 +08:00
ecd3fd07f6 [feature](colocate) support cross database colocate join (#18152) 2023-04-03 14:03:42 +08:00
e260dca7a1 [Improvement](multi catalog)Change hive metastore cache split value type to Doris defined Split. Fix split file length -1 bug (#18319)
The HiveMetastoreCache value type for file splits was Hadoop InputSplit. This PR changes it to the Doris-defined Split,
which avoids converting it every time.
Also fix the explain verbose result returning -1 for split file length.
2023-04-03 13:54:28 +08:00
961f5d1bb7 [feature](function)Add St_Angle/St_Azimuth function (#18293)
Add St_Angle/St_Azimuth functions:
St_Angle:
Takes three points, which represent two intersecting lines, and returns the angle between these lines. Point 2 and point 1 represent the first line, and point 2 and point 3 represent the second line. The angle is in radians, in the range [0, 2pi), and is measured clockwise from the first line to the second line.

```
mysql> SELECT ST_Angle(ST_Point(1, 0),ST_Point(0, 0),ST_Point(0, 1));
+----------------------------------------------------------------------+
| st_angle(st_point(1.0, 0.0), st_point(0.0, 0.0), st_point(0.0, 1.0)) |
+----------------------------------------------------------------------+
| 4.71238898038469 |
+----------------------------------------------------------------------+
1 row in set (0.04 sec)
```

St_Azimuth:
Takes two points and returns the azimuth of the line segment formed by points 1 and 2. The azimuth is the angle in radians measured between the line from point 1 facing true North to the line segment from point 1 to point 2.

```
mysql> SELECT st_azimuth(ST_Point(0, 0),ST_Point(1, 0));
+----------------------------------------------------+
| st_azimuth(st_point(0.0, 0.0), st_point(1.0, 0.0)) |
+----------------------------------------------------+
| 1.5707963267948966 |
+----------------------------------------------------+
1 row in set (0.04 sec)
```
2023-04-03 13:01:59 +08:00
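The two example results in 961f5d1bb7 can be checked with a flat-plane approximation in Python. This is a sketch under a stated assumption: it treats the points as planar (x, y) coordinates with north along +y, which is not necessarily how Doris computes these functions on geographic points. The function names are hypothetical.

```python
import math


def st_angle_planar(p1, p2, p3):
    """Clockwise angle at p2 from line p2->p1 to line p2->p3, in [0, 2*pi)."""
    a1 = math.atan2(p1[1] - p2[1], p1[0] - p2[0])  # counterclockwise from +x
    a3 = math.atan2(p3[1] - p2[1], p3[0] - p2[0])
    # atan2 angles grow counterclockwise, so the clockwise sweep is a1 - a3.
    return (a1 - a3) % (2 * math.pi)


def st_azimuth_planar(p1, p2):
    """Clockwise angle from north (+y) to the segment p1->p2, in [0, 2*pi)."""
    return math.atan2(p2[0] - p1[0], p2[1] - p1[1]) % (2 * math.pi)


print(st_angle_planar((1, 0), (0, 0), (0, 1)))  # about 3*pi/2
print(st_azimuth_planar((0, 0), (1, 0)))        # about pi/2
```

For these inputs the planar values agree with the outputs shown in the commit message (3pi/2 and pi/2); for points far apart on the globe a planar model would diverge from a spherical one.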
Pxl
e77833bfa1 [Bug](materialized-view) fix where clause persistence replay incorrect (#18228)
fix where clause persistence replay incorrect
2023-04-03 12:49:01 +08:00
ce4dc681be [test](stats) Test framework for stats estimation on TPCH-1G dataset (#18267)
Implement a test framework for stats estimation on TPCH-1G dataset to ensure accuracy
2023-04-03 11:01:57 +08:00
2bce4db81a [Enhancement](mysql-compatable) add regression-test for MySQLdump #18208
Add a regression test for commands like this:
mysqldump -h127.0.0.1 -P9030 -uroot --no-tablespaces --databases > /backup/mysqldump/test.db

To prevent the error "Unknown table 'column_statistics' in information_schema (1109)", the table information_schema.column_statistics was added.
2023-04-03 09:49:07 +08:00
b9381570d6 [feature](nereids) semi and anti join estimation (#18129)
This PR adds a new algorithm to estimate the semi/anti join row count.
The original algorithm derives the row count from the cross join, which usually gives a poor estimate.
For example, take L left semi join R on L.a=R.a.
Suppose L is larger than R, and ndv(L.a) < ndv(R.a).
The estimated row count is rowcount(R) * rowcount(L) / ndv(R.a). In most cases, this estimate is larger than rowcount(L).

The new algorithm uses ndv(R.a)/originalNdv(R.a) to estimate the result row count. The basic idea is as follows:
1. Suppose ndv(R.a) is reduced from m to n.
2. Assume that the value space of L.a is the same as that of R.a if R.a is not filtered (this assumption also holds in the original algorithm).
Regard `L left semi join R` as a filter applied on L: if L.a is in R.a, the tuple stays in the result.
R.a shrinks to n/m of its original value space, so L.a also shrinks to n/m.
2023-04-03 09:11:10 +08:00
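The two estimates described in b9381570d6 can be compared with a short Python sketch. The function names and the sample statistics are hypothetical, and scaling rowcount(L) by ndv(R.a)/originalNdv(R.a) is one reading of the commit message, not a copy of the Nereids implementation.

```python
def old_semi_join_rows(rows_l, rows_r, ndv_r):
    """Original estimate: cross-join row count reduced by ndv(R.a)."""
    return rows_r * rows_l / ndv_r


def new_semi_join_rows(rows_l, ndv_r, original_ndv_r):
    """New estimate: keep the fraction of L that still has a match in R."""
    return rows_l * ndv_r / original_ndv_r


# Hypothetical statistics: filters on R reduced ndv(R.a) from 200 to 50.
rows_l, rows_r = 1000, 100
ndv_r, original_ndv_r = 50, 200

print(old_semi_join_rows(rows_l, rows_r, ndv_r))           # exceeds rowcount(L)
print(new_semi_join_rows(rows_l, ndv_r, original_ndv_r))   # a quarter of L
```

A left semi join can never return more rows than L, so the old estimate of 2000 rows for a 1000-row L is visibly too large, while the new one stays within that bound.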
7131c60e05 [fix](audit-log) fix slow query missing in audit log (#18317)
#17738 changed the column name in the audit log, so "slow_query" was no longer recorded in fe.audit.log
2023-04-03 08:52:14 +08:00
4fcd93ac00 [Enhancement](Nereids)add datelikev2 type support for fold constant. #18275
add datelikev2 type support for fold constant.
date_add / years_add / months_add / days_add / hours_add / minutes_add / seconds_add and xxx_sub.
2023-04-03 08:47:47 +08:00
7d49d9cf99 [improvement](dynamic partition) Fix dynamic partition no bucket (#18300)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-04-02 15:51:21 +08:00
97aab138aa [fix](parquet-reader) reset value idx in bool rle decoder and support iceberg datetime(3) (#18245)
1. Fix the value idx in the bool rle decoder
2. Support datetimev2(3) for Iceberg tables. In the previous version, hive timestamp was converted to datetimev2(0) by default.
2023-04-01 21:00:01 +08:00
9e087622ab [fix](Nereids): fix JoinReorderContext in withXXX() of LogicalJoin. (#18299) 2023-04-01 16:51:27 +08:00
7e61a85331 [refactor](libhdfs) introduce hadoop libhdfs (#18204)
1. Introduce hadoop libhdfs
2. For the Linux-X86 platform, use the hadoop libhdfs
3. For other platforms, use libhdfs3, because there is currently no hadoop libhdfs binary for them

Co-authored-by: adonis0147 <adonis0147@gmail.com>
2023-03-31 18:41:39 +08:00
3ea98b65df [Fix](Nereids) fix nereids failed to parse set operation with query in parenthesis (#18062)
SQL of the following form (where q1, q2, and q3 are queries):

``` sql
(q1) 
UNION ALL (q2)
UNION ALL (q3)
ORDER BY keys
```
cannot be parsed by Nereids, because `order` is recognized as an alias of the query. We add queryOrganization to avoid this.
2023-03-31 15:55:52 +08:00