Commit Graph

8289 Commits

Author SHA1 Message Date
Pxl
09db427eed [Feature](materialized-view) support ignore not slot is null when count(slot) not has key in mv (#32912)
support ignore not slot is null when count(slot) not has key in mv
2024-04-10 11:59:36 +08:00
61e214c327 [Fix](Hive-Metastore) fix that if JDBC reads the NULL value, it will cause NPE (#32831) 2024-04-10 11:55:17 +08:00
caea45586f fix compile 2024-04-10 11:42:22 +08:00
fb910e5304 [fix](planner) retain groupingSlotIds as materialized for aggregate (#33060) 2024-04-10 11:34:30 +08:00
c5ab7ca573 [fix](planner) remove and retain input slot for aggregate slot which is not materialized (#33033)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2024-04-10 11:34:30 +08:00
Pxl
5b162a80f2 [Improvement](materialized-view) The materialized view cannot involve auto increment columns (#32885)
The materialized view cannot involve auto increment columns
2024-04-10 11:34:30 +08:00
d7c1c7dcd4 [fix](mtmv)partition limit #32978 2024-04-10 11:34:30 +08:00
59aa923bce [bug](function) fix milliseconds_diff function return wrong result (#32897)
* [bug](function) fix milliseconds_diff function return wrong result
2024-04-10 11:34:30 +08:00
fdb9500023 [fix](nereids) null-safe-eq runtime filter denies outer join #32927 2024-04-10 11:34:30 +08:00
1f1932c6b7 [enhancement](nereids)add some date functions for constant fold (#32772) 2024-04-10 11:34:30 +08:00
814e4ed3ec [fix](nereids)partition prune should consider <=> operator (#32965) 2024-04-10 11:34:30 +08:00
2ee6f28cec [fix](nereids)column name should be case insensitive when selecting mv (#33002) 2024-04-10 11:34:30 +08:00
a7be070021 [chore](session_variable) change parallel_scan_min_rows_per_scanner's default value to 16384 (#32939) 2024-04-10 11:34:30 +08:00
53309e32a9 [Improvement](execution) Use single phase execution commit if only 1 BE is used (#32937) 2024-04-10 11:34:30 +08:00
97a2977f2a [improvement](executor)Add tag property for workload group #32874 2024-04-10 11:34:29 +08:00
dcddd88e01 Limit the max string length to 1024 while collecting column stats to control BE memory usage. (#32470) 2024-04-10 11:34:29 +08:00
0499d4013e Support identical column name in different index. (#32792) 2024-04-10 11:34:29 +08:00
407f8642da [Enhancement](data skew) extends show data skew (#32732) 2024-04-10 11:34:29 +08:00
ed0949f6c5 [fix](compile) fe cannot compile in idea (#32955) 2024-04-10 11:34:29 +08:00
e980cd3e7f [feature](Nereids): add ColumnPruningPostProcessor. (#32800) 2024-04-10 11:34:29 +08:00
26e86d53a4 [enhance](mtmv)support olap table partition column is null (#32698) 2024-04-10 11:34:29 +08:00
22a7fc3c55 [improvement](mtmv) Support to get tables in materialized view when collecting table in plan (#32797)
Support to get tables in materialized view when collecting table in plan

The table schema is as follows:

create materialized view mv1
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
DISTRIBUTED BY RANDOM BUCKETS 1 
PROPERTIES ('replication_num' = '1')
 as 
select 
  t1.c1, 
  t3.c2 
from 
  table1 t1 
  inner join table3 t3 on t1.c1 = t3.c2

If we collect tables from the following plan, we get [table1, table3, table2]; mv1 is expanded into its base tables:

SELECT 
  mv1.*, 
  uuid() 
FROM 
  mv1 LEFT SEMI 
  JOIN table2 ON mv1.c1 = table2.c1 
WHERE 
  mv1.c1 IN (
    SELECT 
      c1 
    FROM 
      table2
  ) 
  OR mv1.c1 < 10
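
The expansion described above can be sketched as follows. This is a minimal illustration, not the FE implementation: `MV_DEFINITIONS` and `collect_tables` are hypothetical names, and the plan is reduced to a flat list of relation names in the order they are visited.

```python
# Hypothetical catalog: maps a materialized view name to the relations
# referenced by its defining query (mv1 joins table1 and table3).
MV_DEFINITIONS = {"mv1": ["table1", "table3"]}


def collect_tables(plan_relations, mv_definitions):
    """Return base tables in first-seen order, expanding materialized views."""
    seen = []

    def visit(name):
        if name in mv_definitions:
            # A materialized view is replaced by the tables it is built from.
            for base in mv_definitions[name]:
                visit(base)
        elif name not in seen:
            seen.append(name)

    for name in plan_relations:
        visit(name)
    return seen


# The query above references mv1, then table2 in the join and in the subquery.
print(collect_tables(["mv1", "table2", "table2"], MV_DEFINITIONS))
```

The sketch assumes materialized view definitions do not reference each other cyclically.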
2024-04-10 11:34:29 +08:00
bb8bc75af4 [feature](agg) add aggregate function sum0 (#32541) 2024-04-10 11:34:29 +08:00
80cdc74908 [fix](arrow-flight) Fix reach limit of connections error (#32911)
Fix the "Reach limit of connections" error.
In fe.conf, arrow_flight_token_cache_size must be less than qe_max_connection/2. Arrow Flight SQL is a stateless protocol, so connections are usually not actively disconnected; when a bearer token is evicted from the cache, its ConnectContext is unregistered.

Fix ConnectContext.command not being reset to COM_SLEEP in time, which resulted in connections being killed frequently after query timeout.

Fix the bearer token eviction log and exception.

TODO: use arrow flight session: https://mail.google.com/mail/u/0/#inbox/FMfcgzGxRdxBLQLTcvvtRpqsvmhrHpdH
2024-04-10 11:34:29 +08:00
d959291c98 [improvement](decommission be) decommission check replica num (#32748) 2024-04-10 11:34:28 +08:00
f5340039fc [fix](multicatalog) fix no data error when read hive table on cosn (#32815)
Currently, when reading a Hive table on cosn, Doris returns an empty result even though the table has data.
Iceberg on cosn works.
The cause is misuse of cosn's file system: according to cosn's documentation, fs.cosn.impl should be org.apache.hadoop.fs.CosFileSystem.
2024-04-10 11:34:28 +08:00
66536c2976 [fix](Nereids) NPE when create table with implicit index type (#32893) 2024-04-10 11:34:28 +08:00
dcfdbf0629 [chore](show) support statement to show views from table (#32358)
MySQL [test]> show views;
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
| t2_view        |
+----------------+
2 rows in set (0.00 sec)

MySQL [test]> show views like '%t1%';
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
+----------------+
1 row in set (0.01 sec)

MySQL [test]> show views where create_time > '2024-03-18';
+----------------+
| Tables_in_test |
+----------------+
| t2_view        |
+----------------+
1 row in set (0.02 sec)
2024-04-10 11:34:28 +08:00
96b995504c [enhancement](statistics) excluded delta rows num for rollup&mv tablets (#32568)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
Co-authored-by: tsy <tangsiyang2001@foxmail.com>
2024-04-10 11:34:28 +08:00
217514e5dd [minor](test) Add Iceberg hadoop catalog FE unit test (#32449)
To make it easy to test the behavior of Iceberg's HadoopCatalog.listNamespaces()
2024-04-10 11:34:28 +08:00
e574b35833 [Enhancement](partition) Refine some auto partition behaviours (#32737) (#33412)
1. fix legacy planner grammar
2. fix nereids planner parsing
3. fix cases
4. forbid auto range partition with null column
5. fix CreateTableStmt with auto partition and some partition items.

1 and 2 are about #31585
doc pr: apache/doris-website#488
2024-04-09 15:51:02 +08:00
7892e7300f [fix](external catalog) Reset external table creation status on log replay (#33393) 2024-04-08 23:17:15 +08:00
5e5fffe4e3 Set enable_unique_key_partial_update to false in statistics session variable. (#33220) 2024-04-08 16:49:58 +08:00
1f3ab4fd24 [fix](jdbc catalog) fix db2 test connection sql (#33335) 2024-04-08 09:05:44 +08:00
fae55e0e46 [Feature](information_schema) add processlist table for information_schema db (#32511) 2024-04-07 23:24:22 +08:00
b882704eaf [fix](Export) Set the default value of the data_consistence property of export to partition (#32830) 2024-04-07 23:24:22 +08:00
feb2f4fae8 [feature](local-tvf) support local tvf on shared storage (#33050)
Previously, local tvf could only query data on one BE node.
But if the storage is shared (e.g., NAS), it can be executed on multiple nodes.

This PR mainly changes:
1. Add a new property `"shared_storage" = "false/true"`

    Default is false. If set to true, "backend_id" is optional. If "backend_id" is set,
    the query is still executed on that BE; if it is not set, "shared_storage" must be "true"
    and the query is executed on multiple nodes.
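
The rule above can be sketched as a small target-selection function. This is only an illustration: `resolve_local_tvf_targets` and `alive_backend_ids` are hypothetical names, and the property key is written as `shared_storage`, which is an assumed spelling.

```python
def resolve_local_tvf_targets(props, alive_backend_ids):
    """Pick the BE ids a local tvf query should run on.

    props: dict of tvf properties (string keys and values).
    alive_backend_ids: ids of the BE nodes currently available (hypothetical input).
    """
    # Assumed spelling of the property key; it defaults to "false".
    shared = props.get("shared_storage", "false").lower() == "true"
    backend_id = props.get("backend_id")

    if backend_id is not None:
        # An explicit backend_id always pins the query to that BE.
        return [int(backend_id)]
    if not shared:
        # Without shared storage the file lives on one node, so the node must be named.
        raise ValueError('"backend_id" is required unless "shared_storage" is "true"')
    # Shared storage and no backend_id: any node can read the file.
    return list(alive_backend_ids)
```

For example, `resolve_local_tvf_targets({"shared_storage": "true"}, [10001, 10002])` selects both nodes, while adding `"backend_id": "10002"` pins the query to that one BE.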

Doc: https://github.com/apache/doris-website/pull/494
2024-04-07 22:17:28 +08:00
95da52b9d8 [fix](avro) avoid BE crash if avro scanner's dependency jars are missing (#33031)
1. Check the return value of avro reader's init_fetch_table_schema_reader()
2. Also fix a bug where the parse exception of Nereids may suppress the real exception from the old planner,
    which made the real error message invisible.
2024-04-07 22:17:16 +08:00
c758a25dd8 [opt](fqdn) Add DNS Cache for FE and BE (#32869)
Previously, when FQDN was enabled, Doris called the DNS resolver to get the IP from the hostname
each time 1) FE got a BE's grpc client, or 2) BE got another BE's brpc client.
Under high concurrency, the DNS resolver could be overloaded and fail to resolve hostnames.

This PR mainly changes:

1. Add DNSCache for both FE and BE.
    The DNSCache runs on every FE and BE node. It holds a cache whose key is the hostname and whose value is the IP.
    A caller can get the IP by hostname from this cache; if the hostname is not cached, the cache resolves it
    and stores the result.
    In addition, DNSCache has a daemon thread that refreshes the cache every 1 min, because the IP may
    change at any time.
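
A minimal sketch of the behaviour described above, assuming a pluggable resolver function. The class below is illustrative only and is not the FE or BE implementation; its method names and the `resolver` parameter are hypothetical.

```python
import socket
import threading


class DNSCache:
    """Hostname -> IP cache with resolve-on-miss and periodic refresh."""

    def __init__(self, resolver=socket.gethostbyname, refresh_seconds=60.0):
        self._resolver = resolver              # injectable so it can be faked in tests
        self._refresh_seconds = refresh_seconds
        self._cache = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def get(self, hostname):
        with self._lock:
            ip = self._cache.get(hostname)
        if ip is None:
            # Cache miss: resolve once and remember the result.
            ip = self._resolver(hostname)
            with self._lock:
                self._cache[hostname] = ip
        return ip

    def refresh_all(self):
        """Re-resolve every cached hostname so IP changes are picked up."""
        with self._lock:
            hostnames = list(self._cache)
        for hostname in hostnames:
            try:
                ip = self._resolver(hostname)
            except OSError:
                continue  # keep the old entry if resolution fails
            with self._lock:
                self._cache[hostname] = ip

    def start(self):
        """Start a daemon thread that refreshes the cache periodically."""
        def loop():
            # Event.wait returns False on timeout, True once stop() is called.
            while not self._stop.wait(self._refresh_seconds):
                self.refresh_all()
        threading.Thread(target=loop, daemon=True).start()

    def stop(self):
        self._stop.set()
```

Resolution happens outside the lock so that a slow lookup for one hostname does not block readers of other entries; the trade-off is that two concurrent misses for the same hostname may both resolve it.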

There are other implementations of this DNS cache:

1. 36fed13997
    This is for the BE side, but it does not handle the IP change case.

2. https://github.com/apache/doris/pull/28479
    This is for the FE side, but it only works with the Master FE. Other FE nodes will not be aware of the IP change.
    Also, there are many BackendServiceProxy instances, and that PR only handles the cache in one of them.
2024-04-07 22:16:04 +08:00
8bb2ef1668 [opt](iceberg) no need to check the name format of iceberg's database (#32977)
No need to check the name format of iceberg's database.
We should accept all databases.
2024-04-07 22:14:51 +08:00
e9b67bc82d [bugfix](paimon)merge meta-inf/services for paimon FileIOLoader (#33166)
We introduced paimon's oss and s3 packages but did not register them in META-INF/services. As a result, when BE used the s3 or oss interface, an error was reported because the class could not be found (`Could not find a file io implementation for scheme 's3' in the classpath.`).

FYI:
https://stackoverflow.com/questions/47310215/merging-meta-inf-services-files-with-maven-assembly-plugin
https://stackoverflow.com/questions/1607220/how-can-i-merge-resource-files-in-a-maven-assembly
2024-04-07 22:13:00 +08:00
d9d950d98e [fix](iceberg) fix iceberg predicate conversion bug (#33283)
Followup #32923

Some cases are not covered in #32923
2024-04-07 22:12:38 +08:00
190763e301 [bugfix](iceberg)Convert the datetime type in the predicate according to the target column (#32923)
Convert the datetime type in the predicate according to the target column.
And add a testcase for #32194
related #30478 #30162
2024-04-07 22:12:33 +08:00
32d6a4fdd5 [opt](rowcount) refresh external table's rowcount async (#32997)
In the previous implementation, the row count cache expired after 10 min (by default),
and after expiration the next row count request missed the cache, causing unstable query plans.

In this PR, the cache will be refreshed after Config.external_cache_expire_time_minutes_after_access,
so that the cache entry will remain fresh.
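
The idea can be sketched as a cache that keeps serving the old value while a stale entry is reloaded in the background. This is an illustration under assumptions, not the Doris implementation: the class name, the `loader`, `clock`, and `submit` parameters are hypothetical, and `refresh_after_seconds` stands in for the configured interval.

```python
import threading
import time


class RefreshingRowCountCache:
    """Serve cached row counts; reload stale entries asynchronously."""

    def __init__(self, loader, refresh_after_seconds, clock=time.monotonic, submit=None):
        self._loader = loader                  # table name -> row count
        self._refresh_after = refresh_after_seconds
        self._clock = clock
        self._submit = submit or self._run_in_thread
        self._entries = {}                     # table -> (row_count, loaded_at)
        self._refreshing = set()               # tables with a reload in flight
        self._lock = threading.Lock()

    @staticmethod
    def _run_in_thread(task):
        threading.Thread(target=task, daemon=True).start()

    def _load(self, table):
        try:
            count = self._loader(table)
            with self._lock:
                self._entries[table] = (count, self._clock())
        finally:
            with self._lock:
                self._refreshing.discard(table)

    def get(self, table):
        start_refresh = False
        with self._lock:
            entry = self._entries.get(table)
            if entry is not None:
                count, loaded_at = entry
                stale = self._clock() - loaded_at >= self._refresh_after
                if stale and table not in self._refreshing:
                    self._refreshing.add(table)
                    start_refresh = True
        if entry is None:
            # First access: there is nothing to serve yet, so load synchronously.
            self._load(table)
            with self._lock:
                return self._entries[table][0]
        if start_refresh:
            # Stale entry: return the old value now and reload in the background.
            self._submit(lambda: self._load(table))
        return count
```

Because a stale entry is never dropped before its replacement arrives, callers always see some row count, which is what keeps plans from flipping between "known" and "unknown" statistics.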
2024-04-07 22:11:14 +08:00
ebf45bff20 [fix](variables) change column type of @@autocommit to BIGINT (#33282)
Some MySQL connectors (e.g., dotnet MySQL.Data) rely on a variable's column type to establish a connection.
For example, `select @@autocommit` should return column type `BIGINT`, not `BIT`; otherwise the connector throws an error like:

```
System.FormatException: The input string 'True' was not in a correct format.
   at System.Number.ThrowFormatException[TChar](ReadOnlySpan`1 value)
   at System.Convert.ToInt32(String value)
   at MySql.Data.MySqlClient.Driver.LoadCharacterSetsAsync(MySqlConnection connection, Boolean execAsync, CancellationToken cancellationToken)
   at MySql.Data.MySqlClient.Driver.ConfigureAsync(MySqlConnection connection, Boolean execAsync, CancellationToken cancellationToken)
   at MySql.Data.MySqlClient.MySqlConnection.OpenAsync(Boolean execAsync, CancellationToken cancellationToken)
   at MySql.Data.MySqlClient.MySqlConnection.Open()
```

In this PR, I add a new field to `VarAttr`: `convertBoolToLongMethod`. If set, it converts the boolean value to long.
It is set for `autocommit`.
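
A rough sketch of the effect, assuming a per-variable flag. `SessionVar`, `convert_bool_to_long`, and `select_variable` are hypothetical names that only mirror the idea of `convertBoolToLongMethod`; they are not the FE code.

```python
from dataclasses import dataclass


@dataclass
class SessionVar:
    name: str
    value: object
    # Hypothetical flag mirroring the idea of convertBoolToLongMethod.
    convert_bool_to_long: bool = False


def select_variable(var):
    """Return (column_type, value) as a client would see `select @@name`."""
    if isinstance(var.value, bool):
        if var.convert_bool_to_long:
            # Booleans are exposed as 0/1 in a BIGINT column.
            return ("BIGINT", int(var.value))
        return ("BIT", var.value)
    if isinstance(var.value, int):
        return ("BIGINT", var.value)
    return ("VARCHAR", str(var.value))
```

With the flag set, `select_variable(SessionVar("autocommit", True, convert_bool_to_long=True))` yields a `BIGINT` column holding `1`, which a connector can parse as an integer.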
2024-04-07 22:02:28 +08:00
132dbeda7f [BugFix](Iceberg Catalog) Fix iceberg catalog of hms and hadoop not support iceberg properties (#33113)
* fix iceberg catalog of  hms and hadoop not support iceberg properties

* remove unused import
2024-04-07 13:01:24 +08:00
62699c8eea [improve](function) the offset params in lead/lag function could use 0 (#33174) 2024-04-07 12:58:03 +08:00
0d0cb6d8a4 [fix](nereids)SimplifyRange didn't process NULL value correctly (#33296) 2024-04-07 11:02:32 +08:00
df8e397dd8 [Fix](executor)Fix normal group can not be appended when image exits #33197 2024-04-03 20:37:12 +08:00
Pxl
113bada7ed [Chore](runtime-filter) add check is broadcast on nlj (#33088)
add check is broadcast on nlj
2024-04-03 19:14:05 +08:00