Commit Graph

18429 Commits

Author SHA1 Message Date
Pxl
027b06059a [Feature](materialized-view) support count(1) on materialized view (#28135)
Support count(1) on materialized view.
Fix a match failure for queries such as select k1, sum(k1) from t group by k1.
2023-12-09 01:36:46 +08:00
b6e72d57c5 [Improvement](hms catalog) support show_create_database for hms catalog (#28145)
* [Improvement](hms catalog) support show_create_database for hms catalog

* update
2023-12-09 01:34:21 +08:00
055b3885c9 [Fix](inverted index) fix compound directory flush buffer error (#28191) 2023-12-09 00:57:35 +08:00
abc802b5ba [bugfix](core) child block is shared between operator and node, it should be shared ptr (#28106)
_child_block in the nested loop join, table value function, and repeat nodes is shared between the ExecNode and its related operator. It belongs to the ExecNode, so the operator should not hold it as a unique ptr.

If the operator's close method is not called correctly, the block is freed twice.

Making it a shared ptr avoids the core dump even if the operator's close method is not called.
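
The ownership change described above can be illustrated with a minimal sketch. `Block`, `Node`, and `Operator` are hypothetical stand-ins for illustration only, not the actual Doris classes; the point is that with shared ownership the block is released exactly once, whether or not `close()` runs.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the real exec node and operator classes.
struct Block {
    int rows = 0;
};

struct Node {
    // The node owns the child block and shares it with its operator.
    std::shared_ptr<Block> child_block = std::make_shared<Block>();
};

struct Operator {
    explicit Operator(const Node& node) : child_block(node.child_block) {
        assert(child_block != nullptr);
    }

    // close() only drops this holder's reference. The block itself is freed
    // once, when the last shared_ptr referring to it goes away.
    void close() { child_block.reset(); }

    std::shared_ptr<Block> child_block;
};

// Number of owners while both the node and the operator are alive.
long owners_while_open() {
    Node node;
    Operator op(node);
    return node.child_block.use_count();
}

// Number of owners after the operator has been closed.
long owners_after_close() {
    Node node;
    Operator op(node);
    op.close();
    return node.child_block.use_count();
}
```

If `close()` is skipped, the operator's destructor releases its reference instead, so the free still happens only once.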
2023-12-09 00:18:14 +08:00
8eed760704 [fix](planner) separate table's isPartitioned() method (#28163)
PR #27515 changed the logic of Table's `isPartitioned()` method.
But this method has 2 usages:

1. To check whether a table is range or list partitioned, for some DML operation such as Alter, Export.

    For this case, it should return true if the table is range or list partitioned, even if it has only
    one partition and one bucket.

2. To check whether the data is distributed (either by partitions or by buckets), for query planner.

    For this case, it should return true if table has more than one bucket. Even if this table is not
    range or list partitioned, if it has more than one bucket, it should return true.

So we should separate this method into 2, one for each usage.
Otherwise, it may cause an unreasonable plan shape.
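
The split into two predicates can be sketched as follows. This is an illustrative C++ sketch only: the real code lives in the Java frontend, and `TableInfo`, `is_partitioned_table`, and `is_data_distributed` are hypothetical names. Counting the total number of buckets across partitions is an assumption about how "distributed" is decided.

```cpp
#include <cassert>
#include <vector>

enum class PartitionType { Unpartitioned, Range, List };

// Hypothetical table description: one bucket count per partition.
struct TableInfo {
    PartitionType type;
    std::vector<int> buckets_per_partition;
};

// Usage 1 (DML checks such as Alter or Export): true for any range or list
// partitioned table, even with a single partition and a single bucket.
bool is_partitioned_table(const TableInfo& table) {
    return table.type == PartitionType::Range || table.type == PartitionType::List;
}

// Usage 2 (query planner): true when the data is spread over more than one
// bucket, regardless of the partition type.
bool is_data_distributed(const TableInfo& table) {
    int total_buckets = 0;
    for (int buckets : table.buckets_per_partition) {
        assert(buckets > 0);  // every partition has at least one bucket
        total_buckets += buckets;
    }
    return total_buckets > 1;
}
```

A range-partitioned table with one partition and one bucket is "partitioned" but not "distributed", while an unpartitioned table with four buckets is the opposite.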
2023-12-08 23:15:45 +08:00
baf85547ae [feature](jdbc) support call function to pass sql directly to jdbc catalog #26492
Support a new stmt in Nereids:
`CALL EXECUTE_STMT("jdbc", "stmt")`

This passes the original statement directly to the data source of a jdbc catalog.

Example:
```
mysql> select * from mysql_catalog.db1.tbl1;
+------+------+
| k1   | k2   |
+------+------+
|  111 | 222  |
+------+------+
1 row in set (0.63 sec)

mysql> call execute_stmt("mysql_catalog", "insert into db1.tbl1 values(1,'abc')");
Query OK, 0 rows affected (0.01 sec)

mysql> select * from mysql_catalog.db1.tbl1;
+------+------+
| k1   | k2   |
+------+------+
|  111 | 222  |
|    1 | abc  |
+------+------+
2 rows in set (0.03 sec)

mysql> call execute_stmt("mysql_catalog", "delete from db1.tbl1 where k1=111");
Query OK, 0 rows affected (0.01 sec)

mysql> select * from mysql_catalog.db1.tbl1;
+------+------+
| k1   | k2   |
+------+------+
|    1 | abc  |
+------+------+
1 row in set (0.03 sec)
```
2023-12-08 23:06:05 +08:00
2b914aebb6 [opt](nereids)improve partition prune when Date function is used (#27960)
Support date functions in partition pruning.
2023-12-08 21:53:39 +08:00
18ef131410 [fix](load) select more active memtables at once in memtable limiter (#28171) 2023-12-08 21:45:35 +08:00
06404114f1 [Fix](point query) fix memleak by increasing scanReplicaIds when using prepared statement (#28184)
OlapScanNode should release memory for `scanReplicaIds`
2023-12-08 21:02:01 +08:00
5e7afa768e [fix](statistics)Avoid potential NPE #28147 2023-12-08 20:42:17 +08:00
573b594df3 [improvement](Variant Type) Support displaying subcolumns expanded for the variant column (#27764) 2023-12-08 20:34:58 +08:00
51f320a606 [bug](function) fix array_apply function return wrong result (#28133) 2023-12-08 20:14:54 +08:00
0931eb536c Revert "[Improvement](auditlog) add column catalog for audit log and audit log table (#26403)" (#28177)
This reverts commit daea751a986823bf5858704663d58f49fd5dfb39.
2023-12-08 18:46:59 +08:00
75b55f8f2f [enhance](session)check invalid value when set parallel instance variables (#28141)
In some cases, setting these variables incorrectly causes a BE core dump:

10:18:19   *** SIGFPE integer divide by zero (@0x564853c204c8) received by PID 2132555 
    int max_scanners =
            config::doris_scanner_thread_pool_thread_num / state->query_parallel_instance_num();
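
A minimal sketch of the kind of check this implies: reject a non-positive value when the variable is set, so the division can never see zero. The function names are hypothetical and the real check may live elsewhere (for example in the frontend).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical validator run when the session variable is set.
int checked_parallel_instance_num(int value) {
    if (value < 1) {
        throw std::invalid_argument(
                "parallel instance num must be >= 1, got " + std::to_string(value));
    }
    return value;
}

// Mirrors the division from the log above; the divisor is validated first.
int max_scanners(int scanner_threads, int parallel_instance_num) {
    assert(parallel_instance_num > 0);
    return scanner_threads / parallel_instance_num;
}
```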
2023-12-08 17:38:48 +08:00
226a0c3b1d [chore](memory) Warning in log when turning on THP (#28122) 2023-12-08 17:38:38 +08:00
bc40025631 [opt](Nereids)Join cluster connectivity (#27833)
* estimate join stats by connectivity
2023-12-08 14:55:10 +08:00
6da36e1077 [feature](merge-cloud) Refactor write path code by abstract base class (#26537)
Refactor write path code by abstract base class. Whether to use `StorageEngine` or `CloudStorageEngine` will be determined during compilation instead of runtime `config::cloud_mode` to avoid unexpected null pointer or undefined behavior issues caused by merging code.

Classes that depend on `StorageEngine` but are shared by the cloud mode need to have an abstract base class. Common code should be extracted into the base class, while the code that depends on `StorageEngine` should be implemented in a `StorageEngine` mix-in class of the base class.
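
The refactoring pattern described above might look roughly like this. `BaseWriter`, `LocalWriter`, and `CloudWriter` are hypothetical names chosen for illustration, and the two engine structs are empty stand-ins for the real classes.

```cpp
#include <cassert>
#include <string>

// Stand-in engine types; only their kind matters for this sketch.
struct StorageEngine {
    std::string kind() const { return "local"; }
};
struct CloudStorageEngine {
    std::string kind() const { return "cloud"; }
};

// Common write-path logic lives in the abstract base class.
class BaseWriter {
public:
    virtual ~BaseWriter() = default;

    std::string write(const std::string& rows) {
        std::string target = engine_kind();
        assert(!target.empty());
        return "wrote " + rows + " via " + target;
    }

protected:
    // Engine-specific behaviour is supplied by the derived class.
    virtual std::string engine_kind() const = 0;
};

// The local variant holds a StorageEngine by reference, so the engine type is
// fixed at compile time and cannot be null or of the wrong kind at runtime.
class LocalWriter final : public BaseWriter {
public:
    explicit LocalWriter(StorageEngine& engine) : _engine(engine) {}

protected:
    std::string engine_kind() const override { return _engine.kind(); }

private:
    StorageEngine& _engine;
};

// The cloud variant does the same with CloudStorageEngine.
class CloudWriter final : public BaseWriter {
public:
    explicit CloudWriter(CloudStorageEngine& engine) : _engine(engine) {}

protected:
    std::string engine_kind() const override { return _engine.kind(); }

private:
    CloudStorageEngine& _engine;
};
```

Because each derived class names its engine type directly, there is no runtime branch on a mode flag and no cast that could fail.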
2023-12-08 14:50:36 +08:00
16230b5ebd [Enhance](multi-catalog) parse hive view ddl first to avoid NPE. (#28067) 2023-12-08 13:54:50 +08:00
61d556c718 [fix](nereids)runtime filter translator failed on set operator (#28102)
* runtime filter translator failed on set operator
2023-12-08 12:58:42 +08:00
341822ec05 [regression-test](Variant) add compaction case for variant and fix bugs (#28066) 2023-12-08 12:18:46 +08:00
59ec3da899 open workload group in PR pipeline (#27744) 2023-12-08 11:56:03 +08:00
ebed055d2b [chore](clone) rename clone request field (#27591) 2023-12-08 11:53:57 +08:00
d534cdf027 [compile](BE) let arm gcc know some function no return (#28157)
Let ARM GCC know that some functions do not return.
2023-12-08 11:32:08 +08:00
cd108688c1 [Chore](docs)Fix job error docs (#28127) 2023-12-08 10:24:21 +08:00
0947bf4e97 [opt](mysql serde) Avoid core dump when converting invalid block to mysql result (#28069)
BE will core dump if the result block is invalid during result serialization.
An existing bug case is described in #28030, so we add a check branch to avoid a BE core dump caused by out-of-range problems.
2023-12-08 10:21:09 +08:00
25b90eb782 [Feature](function) support random int from specific range (#28076)
mysql> select rand(-20, -10);
+------------------+
| random(-20, -10) |
+------------------+
|              -13 |
+------------------+
1 row in set (0.10 sec)
2023-12-08 10:15:25 +08:00
e75d91c91b [regression-test](Variant) Add more cases related to schema changes (#27958)
* [regression-test](Variant) Add more cases related to schema changes

Also fix a schema change bug for variant:
schema change crashed when the tablet schema contained extracted columns
2023-12-08 10:15:12 +08:00
1d345877ce [fix](regression-test) load_to_single_tablet assertEquals usage (#28128) 2023-12-08 10:09:44 +08:00
d8d8f15bf3 [improvement](vectorization) Use requires instead of specialization for doris::vectorized::Decimal (#28027)
Use requires instead of specialization for doris::vectorized::Decimal
2023-12-08 09:59:52 +08:00
9461e86b10 [pipelineX](debug) add debug string (#28137)
* [pipelineX](debug) add debug string

* update
2023-12-07 23:21:10 +08:00
66ed093410 [test](Nereids): fix test push_down_top_n (#26937) 2023-12-07 23:07:32 +08:00
cbb238a0ff [improve](env) Add disk usage in not ready msg (#28125) 2023-12-07 22:49:52 +08:00
f9d4690023 [improve](stack_trace) avoid print stack trace in csv and json reader #28129 2023-12-07 22:45:18 +08:00
81a0f8c041 [Feature](function) support generating const values from tvf numbers (#28051)
If `const_value` is specified, the function returns a column of that constant; otherwise it returns an incremental series, as before.

mysql> select * from numbers("number" = "5", "const_value" = "-123");
+--------+
| number |
+--------+
|   -123 |
|   -123 |
|   -123 |
|   -123 |
|   -123 |
+--------+
5 rows in set (0.11 sec)
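
The two behaviours can be sketched as a pair of overloads. This is an illustration only, not the backend implementation; the function name mirrors the TVF, and starting the incremental series at 0 is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Without a constant: the incremental series 0, 1, ..., count - 1
// (the starting value 0 is an assumption of this sketch).
std::vector<std::int64_t> numbers(std::int64_t count) {
    assert(count >= 0);
    std::vector<std::int64_t> column;
    column.reserve(static_cast<std::size_t>(count));
    for (std::int64_t i = 0; i < count; ++i) {
        column.push_back(i);
    }
    return column;
}

// With a constant: `count` copies of `const_value`.
std::vector<std::int64_t> numbers(std::int64_t count, std::int64_t const_value) {
    assert(count >= 0);
    return std::vector<std::int64_t>(static_cast<std::size_t>(count), const_value);
}
```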
2023-12-07 22:26:43 +08:00
397a401241 [fix](arrow-flight) Modify FE Arrow version to 14.0.1 #28093
Arrow was previously upgraded temporarily to the dev version 15.0.0-SNAPSHOT, because jdbc:arrow-flight-sql in the latest release, Arrow 14.0.1, has a bug that prevents it from being used normally; see apache/arrow#38785.

However, Arrow 15.0.0-SNAPSHOT is not published to the Maven central repository, and the network could not always be reached, so this change goes back to Arrow 14.0.1. jdbc:arrow-flight-sql will be supported after upgrading to the Arrow 15.0.0 release.
2023-12-07 22:25:08 +08:00
a2d66911cd [chore](docs) Fix partition cache design principles #28110 2023-12-07 22:23:46 +08:00
b1c5519aa8 [doc](statistics)Update external catalog statistics doc (#28123) 2023-12-07 21:33:05 +08:00
104a822a2f [Refacotr](RuntimeFilter) refactor rf code to improve performance (#28094) 2023-12-07 20:32:30 +08:00
be81eb1a9b [feature](nereids) Support inner join query rewrite by materialized view (#27922)
Work in progress. Support inner join query rewrite by materialized view in some scenarios.
For example:

> mv = "select  lineitem.L_LINENUMBER, orders.O_CUSTKEY " +
>             "from orders " +
>             "inner join lineitem on lineitem.L_ORDERKEY = orders.O_ORDERKEY "
>     query = "select lineitem.L_LINENUMBER " +
>             "from lineitem " +
>             "inner join orders on lineitem.L_ORDERKEY = orders.O_ORDERKEY "
2023-12-07 20:29:51 +08:00
f37215a32a [fix](Nereids) insert into target table lock should include finalize (#28085) 2023-12-07 20:15:12 +08:00
65fc2e0438 [fix](Nereids) forbid two TVF in one fragment since the limit of coordinator (#28114) 2023-12-07 19:58:31 +08:00
cc9b4bcddb [Fix](variant) fallback to none partial update for mow table (#28116) 2023-12-07 19:30:24 +08:00
942450a2e5 [Fix](Variant) ColumnObject need to be finalized when doing ColumnObject::update_hash_with_value (#28119)
Otherwise, accessing rows at `n` will lead to a heap buffer overflow:

```
 5# SipHash::update(char const*, unsigned long) at /home/zcp/repo_center/doris_master/doris/be/src/vec/common/sip_hash.h:132
 6# doris::vectorized::ColumnString::update_hash_with_value(unsigned long, SipHash&) const at /home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_string.h:452
 7# doris::vectorized::ColumnObject::update_hash_with_value(unsigned long, SipHash&) const at /home/zcp/repo_center/doris_master/doris/be/src/vec/columns/column_object.cpp:1433
 8# doris::vectorized::Block::update_hash(SipHash&) const at /home/zcp/repo_center/doris_master/doris/be/src/vec/core/block.cpp:721
 9# doris::EngineChecksumTask::_compute_checksum() at
```
2023-12-07 18:48:05 +08:00
34642781c2 [fix](meta) fix ConcurrentModificationException when dump image (#28072)
```
Caused by: java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) ~[?:1.8.0_131]
        at java.util.HashMap$EntryIterator.next(HashMap.java:1471) ~[?:1.8.0_131]
        at java.util.HashMap$EntryIterator.next(HashMap.java:1469) ~[?:1.8.0_131]
        at org.apache.doris.catalog.CatalogRecycleBin.write(CatalogRecycleBin.java:1047) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env.saveRecycleBin(Env.java:2298) ~[doris-fe.jar:1.2-SNAPSHOT]
```

Calling the `/dump` API to dump the image may throw a ConcurrentModificationException,
because no lock protects `CatalogRecycleBin`.
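
The general remedy, taking the same lock for serialization as for mutation, can be sketched in C++ as an analogue (the affected code is Java, and `RecycleBin` here is a hypothetical stand-in, not `CatalogRecycleBin`).

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a recycle bin that is written into an image.
class RecycleBin {
public:
    void put(long id, const std::string& name) {
        std::lock_guard<std::mutex> guard(_lock);
        _entries[id] = name;
    }

    void erase(long id) {
        std::lock_guard<std::mutex> guard(_lock);
        _entries.erase(id);
    }

    // Serialize under the same lock, so no writer can change the map while
    // it is being iterated.
    std::vector<std::string> write() const {
        std::lock_guard<std::mutex> guard(_lock);
        std::vector<std::string> out;
        out.reserve(_entries.size());
        for (const auto& entry : _entries) {
            out.push_back(std::to_string(entry.first) + ":" + entry.second);
        }
        assert(out.size() == _entries.size());
        return out;
    }

private:
    mutable std::mutex _lock;
    std::map<long, std::string> _entries;
};
```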
2023-12-07 18:26:02 +08:00
3dcbf16404 [Fix](Outfile) The Struct type data exported from select outfile to the csv file format should contain a column name #28068
If the original data is:
```sql
+-----------------------------------------------------+
| s_info                                              |
+-----------------------------------------------------+
| {"s_id": 2, "s_name": "nereids", "s_address": "20"} |
| {"s_id": 1, "s_name": "doris", "s_address": "18"}   |
+-----------------------------------------------------+
```

In the original logic, struct type data exported to a csv file did not contain column names, like:
```
{2, "nereids", "20"} 
{1, "doris", "18"}
```

This PR does not need to be merged into branch-2.0.
2023-12-07 18:23:36 +08:00
394b420180 [Update](inverted index) use session variable for inverted index try query threshold (#28052)
* [Update](inverted index) use session variable for inverted index try query threshold

* remove unused config

* update clucene
2023-12-07 17:54:44 +08:00
172747669e [fix](Nereids)fix regression case:nereids_rules_p0/transposeJoin/transposeSemiJoinAgg #28111 2023-12-07 17:41:08 +08:00
a27c068a9d [improve](move-memtable) make StreamWait time configurable (#28086) 2023-12-07 17:27:43 +08:00
84a651d976 [improve](load) rewrite memtable memory limiter rules (#27759) 2023-12-07 17:26:26 +08:00
bc12a05915 [fix](Nereids) explain graph insert-select NPE (#28007) 2023-12-07 17:25:44 +08:00