Commit Graph

18263 Commits

SHA1 Message Date
9b8de017df [Regression test](inverted index) fix regression case for index_compound_directory_fault_injection (#28232) 2023-12-11 19:17:28 +08:00
c1f666c497 [doc] fix typo (#28245) 2023-12-11 18:09:54 +08:00
877935442f [feature](pipelineX)use markFragments instead of markInstances in pipelineX (#27829) 2023-12-11 17:59:53 +08:00
3e1e8d2ebe [fix](jdbc catalog) Fixed data conversion problem when all data is null (#28230) 2023-12-11 17:57:57 +08:00
cff1de29ce [fix](group commit) Fix group commit memory calculation (#28242) 2023-12-11 17:05:26 +08:00
ad483efca5 [regression-test](case) forbid group commit case first #28244 2023-12-11 16:06:59 +08:00
3c2e8b0ecf [fix](Nereids) rewrite cte children check wrong map for consumer (#28220) 2023-12-11 14:58:42 +08:00
c2d6fbbc85 [feature](Nereids): add filter edge in hyperGraph (#28006) 2023-12-11 14:36:43 +08:00
593cc92501 [chore] Change default max segment size to 1GB (#28201) 2023-12-11 14:30:57 +08:00
1bbc54d1b2 [regression-test](variant) change p2 case to s3 load (#28193) 2023-12-11 12:31:25 +08:00
ac167f493b [fix](join) fix decimal overflow caused by left outer join (#28221)
For left outer join or full outer join, when the build side is empty, NULLs are output for the build-side columns, but the nested data column of the nullable column is not properly initialized, which may cause decimal arithmetic overflow.
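A minimal sketch of the failure mode, assuming a simplified nullable column made of a value vector plus a separate null map (`NullableDecimalColumn`, `append_nulls`, and `multiply_checked` are hypothetical names, not the actual Doris column API). Vectorized kernels typically compute over the nested value of every row and apply the null map afterwards, so a NULL row whose nested slot holds garbage can still trip an overflow check; filling that slot with zero avoids it. The overflow check uses the GCC/Clang builtin `__builtin_mul_overflow`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified nullable column: decimal values stored as scaled
// int64_t, plus a null map where 1 marks a NULL row.
struct NullableDecimalColumn {
    std::vector<int64_t> data;
    std::vector<uint8_t> null_map;

    // Append `n` NULL rows. The nested data slot is explicitly set to 0 so
    // that kernels which touch every row never see an uninitialized value.
    void append_nulls(std::size_t n) {
        data.insert(data.end(), n, 0);
        null_map.insert(null_map.end(), n, 1);
    }

    void append_value(int64_t v) {
        data.push_back(v);
        null_map.push_back(0);
    }
};

// Multiply every row by `factor`, checking overflow on the nested data of
// all rows (including NULL rows), then carry the null map over unchanged.
// Returns false if any row overflows.
bool multiply_checked(const NullableDecimalColumn& in, int64_t factor,
                      NullableDecimalColumn& out) {
    out.data.resize(in.data.size());
    out.null_map = in.null_map;
    for (std::size_t i = 0; i < in.data.size(); ++i) {
        if (__builtin_mul_overflow(in.data[i], factor, &out.data[i])) {
            return false;  // overflow detected, even if the row is NULL
        }
    }
    return true;
}
```

With zero-filled NULL slots the multiplication succeeds; a NULL row carrying a leftover large value (simulated in the test by pushing `INT64_MAX` with null flag 1) makes the whole batch report an overflow.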
2023-12-11 11:51:05 +08:00
7c163fdf21 [test](decimal) add some cases about overflow (#28198) 2023-12-11 11:22:53 +08:00
f2fd66ad3b [feature-wip](nereids) Make nereids more compatible with spark-sql syntax. (#27231)
**Thanks to** PR #21855 for providing a wonderful reference.

Implementing **a comprehensive logical plan adapter** may be very difficult and **costly**, and there may be only small syntax variations between Doris and some other engines (such as Hive/Spark), so we can just **focus on** the **differences** here.

This PR mainly focuses on the **syntax differences between Doris and spark-sql**, for instance, applying some function transformations and overriding some syntax validations.

- add a dialect named `spark_sql`
- move method `NereidsParser#parseSQLWithDialect` to `TrinoParser`
- extract some `FnCallTransformer`/`FnCallTransformers` classes, so we can reuse the logic about the function transformers
- allow derived tables without an alias when the dialect is set to `spark_sql` (both the legacy and Nereids parsers are supported)
- add some function transformers for hive/spark built-in functions
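The function transformers can be pictured as a lookup from a spark-sql/Hive function name to its Doris equivalent. The sketch below is rename-only and lists just the two mappings visible in the test output of this PR (`get_json_object` to `json_extract`, `split` to `split_by_string`). The `FnCall` and `SparkSqlFnCallTransformers` types are hypothetical and are not the PR's `FnCallTransformer` classes, which may also rewrite arguments; function names are assumed to be already lowercased.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified function call: a name plus its argument texts.
struct FnCall {
    std::string name;
    std::vector<std::string> args;
};

// A minimal rename-only transformer table for the spark_sql dialect.
class SparkSqlFnCallTransformers {
public:
    SparkSqlFnCallTransformers()
        : renames_{{"get_json_object", "json_extract"},
                   {"split", "split_by_string"}} {}

    // Returns the call rewritten to its Doris equivalent, or unchanged if
    // no transformer is registered for the function name.
    FnCall transform(const FnCall& call) const {
        auto it = renames_.find(call.name);
        if (it == renames_.end()) {
            return call;
        }
        return FnCall{it->second, call.args};
    }

private:
    std::unordered_map<std::string, std::string> renames_;
};
```

Keeping the mappings in one table is what makes them reusable across dialects; a function with no entry passes through untouched, so Doris built-ins keep working under `spark_sql`.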

### Test case (from our online doris cluster)

- Test derived table without alias

```sql
MySQL [(none)]> show variables like '%dialect%';
+---------------+-----------+---------------+---------+
| Variable_name | Value     | Default_Value | Changed |
+---------------+-----------+---------------+---------+
| sql_dialect   | spark_sql | doris         | 1       |
+---------------+-----------+---------------+---------+
1 row in set (0.01 sec)

MySQL [(none)]> select * from (select 1);
+------+
| 1    |
+------+
|    1 |
+------+
1 row in set (0.03 sec)

MySQL [(none)]> select __auto_generated_subquery_name.a from (select 1 as a);
+------+
| a    |
+------+
|    1 |
+------+
1 row in set (0.03 sec)

MySQL [(none)]> set sql_dialect=doris;
Query OK, 0 rows affected (0.02 sec)

MySQL [(none)]> select * from (select 1);
ERROR 1248 (42000): errCode = 2, detailMessage = Every derived table must have its own alias
MySQL [(none)]> 
```

- Test spark-sql/hive built-in functions

```sql
MySQL [(none)]> show global functions;
Empty set (0.01 sec)

MySQL [(none)]> show variables like '%dialect%';
+---------------+-----------+---------------+---------+
| Variable_name | Value     | Default_Value | Changed |
+---------------+-----------+---------------+---------+
| sql_dialect   | spark_sql | doris         | 1       |
+---------------+-----------+---------------+---------+
1 row in set (0.01 sec)

MySQL [(none)]> select get_json_object('{"a":"b"}', '$.a');
+----------------------------------+
| json_extract('{"a":"b"}', '$.a') |
+----------------------------------+
| "b"                              |
+----------------------------------+
1 row in set (0.04 sec)

MySQL [(none)]> select split("a b c", " ");
+-------------------------------+
| split_by_string('a b c', ' ') |
+-------------------------------+
| ["a", "b", "c"]               |
+-------------------------------+
1 row in set (1.17 sec)
```
2023-12-11 11:16:53 +08:00
e1587537bc [Fix](status) fix unhandled status in exprs #28218
Fixes the statuses that were marked with static_cast<void> in https://github.com/apache/doris/pull/23395/files.
Partially fixes #28160.
2023-12-11 11:04:58 +08:00
53802fe0da [doc] document desc param is incorrect #26063 (#26064) 2023-12-11 10:33:07 +08:00
f236261256 [fix](regression) compaction cases adapt force_olap_table_replica_num option (#28136) 2023-12-11 10:08:21 +08:00
8f2202c89d [minor](log) Add debug info in operators (#28211) 2023-12-11 10:02:24 +08:00
1e5ff40e17 [refactor](group commit) remove future block (#27720)
Co-authored-by: huanghaibin <284824253@qq.com>
2023-12-11 08:41:51 +08:00
320ddf4987 [pipelineX](improvement) Support multiple instances execution on single tablet (#28178) 2023-12-10 20:18:41 +08:00
485d7db516 [fix](partial update) Fix missing rowsets during doing alignment when flushing memtable due to compaction (#28062) 2023-12-10 12:09:48 +08:00
a3cd36ce60 [bug](cooldown) Fix incorrect remote rowset dir after restarting BE (#28140) 2023-12-10 00:44:01 +08:00
5aa90a3bce [pipelineX](local shuffle) Fix bucket hash shuffle (#28202) 2023-12-10 00:35:00 +08:00
61379b141e [fix](insert) fix group commit regression test (#28142) 2023-12-09 16:24:20 +08:00
4e86f9bab5 [improve](move-memtable) include and check offset when append data (#28159) 2023-12-09 16:21:36 +08:00
16e232a8a1 [minor](lower-table-names) use GlobalVariable.lowerCaseTableNames instead of Config.lower_case_table_names (#27911)
Use GlobalVariable.lowerCaseTableNames instead of Config.lower_case_table_names.
2023-12-09 12:04:26 +08:00
363721e066 [Bug](udf) java-udf function open failed cause BE core dump #28063
When the java-udf open function fails, some JNI objects may not have been set up,
so the close function must not call into JNI.
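A minimal sketch of the guard this fix implies, assuming a hypothetical `JavaUdfWrapper` whose `open()` can fail before the JNI handles needed for cleanup are initialized. `close()` only calls into JNI when setup completed, and is safe to call more than once; the counter stands in for the real JNI close call.

```cpp
#include <cassert>

// Hypothetical stand-in for a UDF wrapper whose open() may fail partway
// through, before the JNI handles it needs for cleanup have been set up.
class JavaUdfWrapper {
public:
    // Simulates open(); `jni_setup_succeeds` controls whether the JNI
    // objects (executor instance, close method id, ...) get initialized.
    bool open(bool jni_setup_succeeds) {
        if (!jni_setup_succeeds) {
            return false;  // handles stay unset; jni_ready_ remains false
        }
        jni_ready_ = true;
        return true;
    }

    // close() runs whether or not open() succeeded, so it must only call
    // into JNI when the handles were actually initialized.
    void close() {
        if (closed_) return;
        closed_ = true;
        if (jni_ready_) {
            ++jni_close_calls;  // stands in for the real JNI close call
            jni_ready_ = false;
        }
    }

    int jni_close_calls = 0;

private:
    bool jni_ready_ = false;
    bool closed_ = false;
};
```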
2023-12-09 11:00:30 +08:00
42aa174405 [chore](log) Log to trace before wait rpc timeout #28024 2023-12-09 10:04:43 +08:00
9d9b6462bf [improve](group_commit) optimize group commit select be logic #28190
Group commit always chose the first non-decommissioned BE among all BEs.

It is better to choose a BE with selectBackendIdsByPolicy, as common stream load does, and to skip decommissioned BEs.
2023-12-09 05:09:52 +08:00
287bd87a4f [typo](docs)add some faq for flink-connector-doris (#26309)
* add flink-connector-doris faq

* add faq
2023-12-09 02:19:49 +08:00
bd8130154a [fix](doc) spell errors fixes hardware-info-action (#28154) 2023-12-09 01:47:19 +08:00
c6f8b1b2ee [fix](repository) the existing repo_file must contain the same name as the new repo (#27668)
The user manually adjusted the 'name' field in the __repo_info file under the repo file on S3, but did not modify the folder name. This led to an issue when the user created a repo with the same name as the folder in a certain cluster. The system parsed the 'name' field in the existing __repo_info and used an incorrect name, causing the subsequent repo to be unusable. A judgment has been added here: the 'name' field in the __repo_info must be the same as the new repo's name, otherwise, an error will be reported.
2023-12-09 01:46:54 +08:00
07336980f9 [fix](meta) show partitions with Limit for external HMS tables (27835) (#27835)
This enhancement extends the existing logic for SHOW PARTITIONS FROM to support:

- Limit/Offset
- Where [partition name only] [equal operator and like operator]
- Order by [partition name only]

Issue Number: close #27834
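A sketch of how these clauses could be applied to a list of HMS partition names, assuming the filtering happens in memory after the names are fetched (`ShowPartitionsOptions`, `show_partitions`, and `like_match` are hypothetical names). `like_match` implements SQL `LIKE` with `%` and `_` via dynamic programming and ignores escape characters. The clauses are applied in SQL order: WHERE, then ORDER BY, then OFFSET/LIMIT.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// SQL LIKE matching: '%' matches any sequence, '_' matches one character.
bool like_match(const std::string& s, const std::string& p) {
    // dp[j]: does the processed prefix of s match p[0..j)?
    std::vector<bool> dp(p.size() + 1, false);
    dp[0] = true;
    for (std::size_t j = 1; j <= p.size(); ++j) dp[j] = dp[j - 1] && p[j - 1] == '%';
    for (std::size_t i = 1; i <= s.size(); ++i) {
        bool prev_diag = dp[0];  // dp[i-1][0]
        dp[0] = false;
        for (std::size_t j = 1; j <= p.size(); ++j) {
            bool prev_row = dp[j];  // dp[i-1][j]
            if (p[j - 1] == '%') {
                dp[j] = dp[j - 1] || prev_row;  // '%' matches empty or eats s[i-1]
            } else {
                dp[j] = prev_diag && (p[j - 1] == '_' || p[j - 1] == s[i - 1]);
            }
            prev_diag = prev_row;
        }
    }
    return dp[p.size()];
}

enum class NameFilter { kNone, kEqual, kLike };

struct ShowPartitionsOptions {
    NameFilter filter = NameFilter::kNone;
    std::string pattern;               // operand of `=` or LIKE
    bool order_by_name = false;
    bool descending = false;
    std::size_t offset = 0;
    std::optional<std::size_t> limit;  // empty means no LIMIT
};

std::vector<std::string> show_partitions(std::vector<std::string> names,
                                         const ShowPartitionsOptions& opt) {
    // WHERE on partition name only.
    names.erase(std::remove_if(names.begin(), names.end(),
                               [&](const std::string& n) {
                                   if (opt.filter == NameFilter::kEqual) return n != opt.pattern;
                                   if (opt.filter == NameFilter::kLike) return !like_match(n, opt.pattern);
                                   return false;
                               }),
                names.end());
    // ORDER BY partition name only.
    if (opt.order_by_name) {
        std::sort(names.begin(), names.end());
        if (opt.descending) std::reverse(names.begin(), names.end());
    }
    // OFFSET, then LIMIT.
    if (opt.offset >= names.size()) return {};
    auto first = names.begin() + static_cast<std::ptrdiff_t>(opt.offset);
    auto last = names.end();
    if (opt.limit && *opt.limit < static_cast<std::size_t>(last - first)) {
        last = first + static_cast<std::ptrdiff_t>(*opt.limit);
    }
    return std::vector<std::string>(first, last);
}
```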
2023-12-09 01:44:45 +08:00
99b38ddca7 [improve](env) Ensure next majority is met before drop an alive follower (#28101)
Here is an example:

```
mysql> ALTER SYSTEM DROP FOLLOWER "127.0.0.1:19017";
ERROR 1105 (HY000): errCode = 2, detailMessage = Unable to drop this alive
follower, because the quorum requirements are not met after this command
execution. Current num alive followers 2, num followers 3, majority after
execution 2
```
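The check can be sketched as follows, assuming a simple-majority rule of `n / 2 + 1` voters, which agrees with the message above (3 followers become 2 after the drop, whose majority is 2, but only 1 alive follower would remain). The function names are hypothetical.

```cpp
#include <cassert>

// Majority of a group of `n` voters under a simple-majority rule.
int majority(int n) { return n / 2 + 1; }

// Returns true if dropping one *alive* follower still leaves enough alive
// followers to form a majority of the remaining follower group.
bool can_drop_alive_follower(int num_alive_followers, int num_followers) {
    int followers_after = num_followers - 1;
    int alive_after = num_alive_followers - 1;
    return alive_after >= majority(followers_after);
}
```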
2023-12-09 01:41:38 +08:00
99be9d6ad3 [fix](memlimiter) refresh memtracker before flush active memtables (#28196)
Currently, _flush_active_memtables() is using stale memtracker data, especially when some other thread has just it.
Refresh memtrackers before flushing to avoid this problem.
2023-12-09 01:40:51 +08:00
027b06059a [Feature](materialized-view) support count(1) on materialized view (#28135)
Support count(1) on materialized views.
Fix match failures for queries like select k1, sum(k1) from t group by k1.
2023-12-09 01:36:46 +08:00
b6e72d57c5 [Improvement](hms catalog) support show_create_database for hms catalog (#28145)
* [Improvement](hms catalog) support show_create_database for hms catalog

* update
2023-12-09 01:34:21 +08:00
055b3885c9 [Fix](inverted index) fix compound directory flush buffer error (#28191) 2023-12-09 00:57:35 +08:00
abc802b5ba [bugfix](core) child block is shared between operator and node, it should be shared ptr (#28106)
_child_block in the nested loop join, table value function, and repeat nodes is shared between the ExecNode and the related operator. It belongs to the exec node, so it should not be a unique ptr in the operator.

If the operator's close method is not called correctly, the block will be double freed.

It should be a shared ptr, so that the BE will not core dump even if the operator's close method is not called.
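A minimal sketch of the ownership change, using hypothetical `Block`, `ExecNode`, and `Operator` types. With a `std::shared_ptr`, the block is freed exactly once when the last owner releases it, whether or not `Operator::close()` runs; two independent owning pointers to the same block would instead each delete it.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical simplified block type.
struct Block {
    std::vector<int> rows;
};

// The exec node owns the child block...
struct ExecNode {
    std::shared_ptr<Block> child_block = std::make_shared<Block>();
};

// ...and the operator shares it instead of holding a unique_ptr. The block
// is destroyed once, when the last shared_ptr releases it.
struct Operator {
    explicit Operator(const ExecNode& node) : child_block(node.child_block) {}
    void close() { child_block.reset(); }
    std::shared_ptr<Block> child_block;
};
```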
2023-12-09 00:18:14 +08:00
8eed760704 [fix](planner) separate table's isPartitioned() method (#28163)
PR #27515 changed the logic of Table's `isPartitioned()` method.
But this method has 2 usages:

1. To check whether a table is range or list partitioned, for some DML operations such as Alter and Export.

    In this case, it should return true if the table is range or list partitioned, even if it has only
    one partition and one bucket.

2. To check whether the data is distributed (either by partitions or by buckets), for query planner.

    In this case, it should return true if the table has more than one bucket. Even if this table is not
    range or list partitioned, if it has more than one bucket, it should return true.

So we should separate this method into 2, for different usages.
Otherwise, it may produce unreasonable plan shapes.
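The split can be sketched as two separate predicates on a hypothetical `TableInfo` (these names are illustrative, not the methods the PR adds). Treating "distributed" as more than one partition or more than one bucket is an assumption based on the description above.

```cpp
#include <cassert>

enum class PartitionType { kUnpartitioned, kRange, kList };

// Hypothetical table metadata, just enough for the two checks.
struct TableInfo {
    PartitionType partition_type = PartitionType::kUnpartitioned;
    int num_partitions = 1;
    int num_buckets = 1;  // buckets per partition

    // Usage 1 (Alter, Export, ...): is the table range/list partitioned,
    // regardless of how many partitions or buckets it currently has?
    bool is_range_or_list_partitioned() const {
        return partition_type != PartitionType::kUnpartitioned;
    }

    // Usage 2 (query planner): is the data spread over more than one
    // partition or bucket?
    bool is_data_distributed() const {
        return num_partitions > 1 || num_buckets > 1;
    }
};
```

A single-partition, single-bucket range table answers "yes" to the first question and "no" to the second, while an unpartitioned table with several buckets answers the opposite, which is why one method cannot serve both callers.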
2023-12-08 23:15:45 +08:00
baf85547ae [feature](jdbc) support call function to pass sql directly to jdbc catalog #26492
Support a new stmt in Nereids:
`CALL EXECUTE_STMT("jdbc", "stmt")`

This lets us pass the original stmt directly to the data source of a JDBC catalog.

Showcase:
```
mysql> select * from mysql_catalog.db1.tbl1;
+------+------+
| k1   | k2   |
+------+------+
|  111 | 222  |
+------+------+
1 row in set (0.63 sec)

mysql> call execute_stmt("mysql_catalog", "insert into db1.tbl1 values(1,'abc')");
Query OK, 0 rows affected (0.01 sec)

mysql> select * from mysql_catalog.db1.tbl1;
+------+------+
| k1   | k2   |
+------+------+
|  111 | 222  |
|    1 | abc  |
+------+------+
2 rows in set (0.03 sec)

mysql> call execute_stmt("mysql_catalog", "delete from db1.tbl1 where k1=111");
Query OK, 0 rows affected (0.01 sec)

mysql> select * from mysql_catalog.db1.tbl1;
+------+------+
| k1   | k2   |
+------+------+
|    1 | abc  |
+------+------+
1 row in set (0.03 sec)
```
2023-12-08 23:06:05 +08:00
2b914aebb6 [opt](nereids)improve partition prune when Date function is used (#27960)
Support date functions in partition pruning.
2023-12-08 21:53:39 +08:00
18ef131410 [fix](load) select more active memtables at once in memtable limiter (#28171) 2023-12-08 21:45:35 +08:00
06404114f1 [Fix](point query) fix memleak by increasing scanReplicaIds when using prepared statement (#28184)
OlapScanNode should release memory for `scanReplicaIds`
2023-12-08 21:02:01 +08:00
5e7afa768e [fix](statistics)Avoid potential NPE #28147 2023-12-08 20:42:17 +08:00
573b594df3 [improvement](Variant Type) Support displaying subcolumns expanded for the variant column (#27764) 2023-12-08 20:34:58 +08:00
51f320a606 [bug](function) fix array_apply function return wrong result (#28133) 2023-12-08 20:14:54 +08:00
0931eb536c Revert "[Improvement](auditlog) add column catalog for audit log and audit log table (#26403)" (#28177)
This reverts commit daea751a986823bf5858704663d58f49fd5dfb39.
2023-12-08 18:46:59 +08:00
75b55f8f2f [enhance](session)check invalid value when set parallel instance variables (#28141)
In some cases, an incorrectly set value can cause a BE core dump:

10:18:19   *** SIGFPE integer divide by zero (@0x564853c204c8) received by PID 2132555 
    int max_scanners =
            config::doris_scanner_thread_pool_thread_num / state->query_parallel_instance_num();
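A sketch of the validation, assuming the check rejects non-positive values at the point where the variable is set (the function names and the exception type are hypothetical; the actual check may live elsewhere and report errors differently). With the divisor validated up front, the `max_scanners` computation shown in the crash can no longer divide by zero.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical session-variable validation: reject values that would later
// be used as a divisor, instead of letting the BE hit SIGFPE.
int validate_parallel_instance_num(int value) {
    if (value <= 0) {
        throw std::invalid_argument("parallel instance num must be > 0, got " +
                                    std::to_string(value));
    }
    return value;
}

// The computation from the crash, with the divisor validated first.
int max_scanners(int scanner_thread_pool_thread_num, int query_parallel_instance_num) {
    return scanner_thread_pool_thread_num /
           validate_parallel_instance_num(query_parallel_instance_num);
}
```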
2023-12-08 17:38:48 +08:00
226a0c3b1d [chore](memory) Warning in log when turning on THP (#28122) 2023-12-08 17:38:38 +08:00
bc40025631 [opt](Nereids)Join cluster connectivity (#27833)
* estimate join stats by connectivity
2023-12-08 14:55:10 +08:00