* [enhancement](Nereids) Enable parse sql from sql cache (#33262)
Before this PR, a query had to pass through the parser, analyzer, rewriter, optimizer, and translator before we could check whether it can use the SQL cache. If the query is very long or joins many tables, the plan time is usually >= 500ms.
This PR reduces that time by skipping the normal planning path: if nothing has changed, we can reuse the previous physical plan and query result. In some cases we must not parse SQL from the SQL cache, e.g. when the table structure changed, the data changed, user policies changed, privileges changed, the query contains non-deterministic functions, or user variables changed.
In my test case, which queries a view with many joins and unions over tables with empty partitions, the query latency is about 3ms. Without parsing SQL from the SQL cache, the plan time is about 550ms.
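The invalidation conditions listed above can be sketched as a single check. This is a minimal illustration, not the Doris FE implementation: `Snapshot`, `CacheEntry`, `can_reuse`, and the `NON_DETERMINISTIC` set are hypothetical names chosen for this example.

```python
from dataclasses import dataclass

# Hypothetical names throughout; this is not the Doris FE implementation.
# Illustrative subset of functions whose result differs between calls.
NON_DETERMINISTIC = frozenset({"now", "rand", "uuid"})


@dataclass
class Snapshot:
    """State a cached result depends on, captured when the entry is stored."""
    table_versions: dict   # table name -> version covering schema and data
    row_policies: tuple    # row policies applied to the user
    privileges: frozenset  # privileges the user holds
    user_variables: dict   # user variables referenced by the query


@dataclass
class CacheEntry:
    snapshot: Snapshot
    functions_used: frozenset
    result: list


def can_reuse(entry: CacheEntry, current: Snapshot) -> bool:
    """Return True only if nothing the cached result depends on has changed."""
    if entry.functions_used & NON_DETERMINISTIC:
        return False
    # Dataclass equality compares every field, so any change invalidates.
    return entry.snapshot == current
```

Comparing one snapshot object keeps the check cheap, which matters because the point of the cache is to avoid the planning cost.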
## Features
1. Use `Config.sql_cache_manage_num` to control how many SQL cache entries can be reused on one FE.
2. If the explain output contains a `LogicalSqlCache` or `PhysicalSqlCache` node, the query can use the SQL cache, like this:
```sql
mysql> set enable_sql_cache=true;
Query OK, 0 rows affected (0.00 sec)
mysql> explain physical plan select * from test.t;
+----------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+----------------------------------------------------------------------------------+
| cost = 3.135 |
| PhysicalResultSink[53] ( outputExprs=[c1#0, c2#1] ) |
| +--PhysicalDistribute[50]@0 ( stats=3, distributionSpec=DistributionSpecGather ) |
| +--PhysicalOlapScan[t]@0 ( stats=3 ) |
+----------------------------------------------------------------------------------+
4 rows in set (0.02 sec)
mysql> select * from test.t;
+------+------+
| c1 | c2 |
+------+------+
| 1 | 2 |
| -2 | -2 |
| NULL | 30 |
+------+------+
3 rows in set (0.05 sec)
mysql> explain physical plan select * from test.t;
+-------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+-------------------------------------------------------------------------------------------+
| cost = 0.0 |
| PhysicalSqlCache[2] ( queryId=78511f515cda466b-95385d892d6c68d0, backend=127.0.0.1:9050 ) |
| +--PhysicalResultSink[52] ( outputExprs=[c1#0, c2#1] ) |
| +--PhysicalDistribute[49]@0 ( stats=3, distributionSpec=DistributionSpecGather ) |
| +--PhysicalOlapScan[t]@0 ( stats=3 ) |
+-------------------------------------------------------------------------------------------+
5 rows in set (0.01 sec)
```
(cherry picked from commit 03bd2a337d4a56ea9c91673b3bd4ae518ed10f20)
* fix
* [fix](Nereids) fix some sql cache consistence bug between multiple frontends (#33722)
Fix some SQL cache consistency bugs between multiple frontends, which were introduced by [enhancement](Nereids) Enable parse sql from sql cache #33262. The fix uses the row policy as part of the SQL cache key.
Support dynamically updating the number of SQL cache keys managed by an FE.
(cherry picked from commit 90abd76f71e73702e49794d375ace4f27f834a30)
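A rough sketch of the two ideas in this fix: a cache key that includes the row policies, and a cache whose capacity can be changed at runtime. The class and method names are hypothetical, and LRU eviction is an assumption made for illustration; the source does not state the eviction policy.

```python
from collections import OrderedDict


class SqlCacheManager:
    """Hypothetical cache manager; LRU eviction is an assumption."""

    def __init__(self, capacity: int):
        self._capacity = capacity
        self._entries = OrderedDict()

    @staticmethod
    def make_key(user: str, sql: str, row_policies) -> tuple:
        # Row policies are part of the key, so two users (or two frontends)
        # with different policies never share a cached result.
        return (user, sql, tuple(sorted(row_policies)))

    def put(self, key: tuple, value) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        self._evict()

    def get(self, key: tuple):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key]

    def resize(self, capacity: int) -> None:
        """Apply a new capacity immediately, evicting the oldest entries."""
        self._capacity = capacity
        self._evict()

    def _evict(self) -> None:
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)

    def __len__(self) -> int:
        return len(self._entries)
```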
* [fix](Nereids) fix bug of dry run query with sql cache (#33799)
1. dry run query should not use sql cache
2. fix test sql cache in cloud mode
3. enable cache OneRowRelation and EmptyRelation in frontend to skip parse sql
(cherry picked from commit dc80ecf7f33da7b8c04832dee88abd09f7db9ffe)
* remove cloud mode
* remove @NotNull
* [feature](backup) ignore tables of unsupported types during backup instead of reporting an exception
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
* fix
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
---------
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
Refactor the config for log dir of FE and BE
TLDR:
- Use env variable `LOG_DIR` to set root log dir
- Remove `sys_log_dir` for FE and BE
Details:
1. FE
1. The root log dir is set by env variable `LOG_DIR` in `fe.conf`
2. The default value of `audit_log_dir` is the same as `${LOG_DIR}/`
3. The default value of `spark_launcher_log_dir` is `${LOG_DIR}/spark_launcher_log`
4. The default value of `nereids_trace_log_dir` is `${LOG_DIR}/nereids_trace_log`
5. The original `sys_log_dir` is deprecated, and its default value is `""`.
For compatibility, if the user has already set `sys_log_dir`, Doris will still use it as the root log dir.
2. BE
1. The root log dir is set by env variable `LOG_DIR` in `be.conf`
2. Remove `pipeline_tracing_log_dir`, use `${LOG_DIR}` directly.
3. The original `sys_log_dir` is deprecated, and its default value is `""`.
For compatibility, if the user has already set `sys_log_dir`, Doris will still use it as the root log dir.
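The resolution order described above can be sketched as follows. This is an illustration only: `resolve_root_log_dir` and `fe_log_dirs` are hypothetical helpers, and deriving the FE sub-directory defaults from the resolved root (rather than strictly from `LOG_DIR`) is an assumption.

```python
import posixpath


def resolve_root_log_dir(conf: dict, env: dict) -> str:
    """Prefer a user-set (deprecated) sys_log_dir, else the LOG_DIR env variable."""
    sys_log_dir = conf.get("sys_log_dir", "")
    if sys_log_dir:
        return sys_log_dir
    if "LOG_DIR" in env:
        return env["LOG_DIR"]
    raise ValueError("neither sys_log_dir nor LOG_DIR is set")


def fe_log_dirs(conf: dict, env: dict) -> dict:
    """Return the effective FE log directories; explicit settings win over defaults."""
    root = resolve_root_log_dir(conf, env)
    return {
        "audit_log_dir": conf.get("audit_log_dir") or root,
        "spark_launcher_log_dir": conf.get("spark_launcher_log_dir")
        or posixpath.join(root, "spark_launcher_log"),
        "nereids_trace_log_dir": conf.get("nereids_trace_log_dir")
        or posixpath.join(root, "nereids_trace_log"),
    }
```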
Issue Number: #31442
dependent on #32824
add ddl(create and drop) test
add ctas test
add complex type test
TODO:
bucketed table test
truncate test
add/drop partition test
Fix the `Reach limit of connections` error.
In fe.conf, `arrow_flight_token_cache_size` must be less than `qe_max_connection/2`. Arrow Flight SQL is a stateless protocol and connections are usually not actively closed, so when a bearer token is evicted from the cache, its ConnectContext is unregistered.
Fix `ConnectContext.command` not being reset to `COM_SLEEP` in time, which caused connections to be killed frequently after a query timeout.
Fix the bearer token eviction log and exception.
TODO: use arrow flight session: https://mail.google.com/mail/u/0/#inbox/FMfcgzGxRdxBLQLTcvvtRpqsvmhrHpdH
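A small sketch of the constraint and the eviction behaviour described above. `check_token_cache_size` and `TokenCache` are hypothetical names; whether Doris rejects or clamps an oversized value is not stated here, so raising an error is an assumption, as is the LRU eviction order.

```python
from collections import OrderedDict


def check_token_cache_size(token_cache_size: int, qe_max_connection: int) -> None:
    """Reject a token cache that could hold half or more of all connections."""
    if token_cache_size >= qe_max_connection / 2:
        raise ValueError(
            "arrow_flight_token_cache_size must be less than qe_max_connection/2"
        )


class TokenCache:
    """Bearer token -> connection id; evicting a token unregisters its context."""

    def __init__(self, capacity: int, on_evict):
        self._capacity = capacity
        self._on_evict = on_evict  # callback that unregisters the ConnectContext
        self._tokens = OrderedDict()

    def put(self, token: str, connection_id: int) -> None:
        self._tokens[token] = connection_id
        self._tokens.move_to_end(token)
        while len(self._tokens) > self._capacity:
            _, evicted_id = self._tokens.popitem(last=False)
            self._on_evict(evicted_id)
```

Because clients rarely disconnect, eviction is the only point where a connection slot is released, which is why the cache has to stay well below the connection limit.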
1. rename old create/drop table to add/removeMemoryTable
2. add new create/drop table/db method
3. support hms catalog create/drop table/db
(cherry picked from commit b2e869c7414c68186de8d43b324ae736d7cc3463)
Workload groups should always be enabled, because other features such as MTMV and spill to disk depend on them.
The normal workload group should be created in the constructor.
* [Fix](Variant) support materialize view for variant and accessing variant subcolumns
1. fix schema change losing path information, which led to invalid data reads
2. support element_at function in BE side and use simdjson to parse data
3. fix multi slot expression
# 1. Motivation
PL-SQL (Stored procedure) is a collection of sql, which is defined and used similarly to functions. It supports conditional judgments, loops and other control statements, supports cursor processing of result sets, and can write business logic in SQL.
Hive uses Hplsql to support PL-SQL and is largely compatible with Oracle, Impala, MySQL, Redshift, PostgreSQL, DB2, etc. We support PL-SQL in Doris based on Hplsql to achieve compatibility with Stored procedures of database systems such as Oracle and PostgreSQL.
Reference documentation:
Hive: http://mail.hplsql.org
Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/21/lnpls/plsql-language-fundamentals.html#GUID-640DB3AA-15AF-4825-BD6C-1D4EB5AB7715
Mysql: https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html
# 2. Implementation
Take the following case as an example to explain the process of connecting Doris FE to execute stored procedures using the Mysql protocol.
```
CREATE OR REPLACE PROCEDURE A(IN name STRING, OUT result int)
BEGIN
select count(*) from test;
select count(*) into result from test where k = name;
END
declare result INT default 0;
call A('xxx', result);
print result;
```

1. Add procedure and persist the Procedure Name and Source (raw SQL) into Doris FE metadata.
2. Call procedure, extract the actual parameter Value and Procedure Name in Call Stmt. Use Procedure Name to find the Source in the metadata, extract the Name and Type of the Procedure parameter, and match them with the actual parameter Value to form a complete variable <Name, Type, Value>.
3. Execute Doris Statement
- Use Doris Logical Plan Builder to parse the Doris Statement syntax in Source, replace parameter variables, remove the into variable clause, and generate a Plan Tree that conforms to Doris syntax.
- Use stmtExecutor to execute SQL and encapsulate the query result set iterator into QueryResult.
- Output the query results to Mysql Channel, or write them into Cursor, parameters, and variables.
- Stored Programs compatible with Mysql protocol support multiple statements.
4. Execute PL-SQL Statement
- Use Plsql Logical Plan Builder to parse and execute PL-SQL Statement syntax in Source, including Loop, Cursor, IF, Declare, etc., and basically reuse HplSQL.
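Step 2 above, matching the declared parameters against the actual arguments of the call, can be sketched like this. It is a simplified illustration with hypothetical names (`Var`, `parse_params`, `bind`), and it assumes simple type names without parentheses such as `STRING` or `INT`.

```python
import re
from typing import NamedTuple

# Optional mode, then parameter name, then a simple type name.
PARAM_RE = re.compile(r"\s*(?:(INOUT|IN|OUT)\s+)?(\w+)\s+(\w+)\s*$", re.IGNORECASE)


class Var(NamedTuple):
    name: str
    type: str
    value: str
    mode: str


def parse_params(source: str) -> list:
    """Extract (mode, name, type) from the procedure header in the stored source."""
    header = source[source.index("(") + 1 : source.index(")")]
    if not header.strip():
        return []
    params = []
    for part in header.split(","):
        match = PARAM_RE.match(part)
        if match is None:
            raise ValueError(f"cannot parse parameter: {part!r}")
        mode, name, type_name = match.groups()
        params.append(((mode or "IN").upper(), name, type_name.upper()))
    return params


def bind(source: str, args: list) -> list:
    """Pair each declared parameter with the actual argument of the call."""
    params = parse_params(source)
    if len(params) != len(args):
        raise ValueError("argument count does not match the procedure signature")
    return [Var(name, type_name, arg, mode)
            for (mode, name, type_name), arg in zip(params, args)]
```

For the procedure `A` shown earlier, `bind(source, ["'xxx'", "result"])` yields one `IN` variable for `name` and one `OUT` variable for `result`.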
# 3. TODO
1. Support drop procedure.
2. Create procedure only in `PlSqlOperation`.
3. Doris Parser supports declare variable.
4. Select Statement supports insert into variable.
5. Handle parameters and fields that have the same name.
6. If Cursor exits halfway, will there be a memory leak?
7. Use getOriginSql(ctx) in syntax parsing LogicalPlanBuilder to obtain the original SQL. Is there any problem with special characters?
8. Supports complex types such as Map and Struct.
9. Test syntax such as Package.
10. Support UDF
11. In Oracle, create procedure must have AS or IS after RIGHT_PAREN, but Mysql and Hive do not support AS or IS. Compatibility issues with Oracle will be discussed and resolved later.
12. Built-in functions require separate management.
13. Doris statement add stmt: begin_transaction_stmt, end_transaction_stmt, commit_stmt, rollback_stmt.
14. Add plsql stmt: cmp_stmt, copy_from_local_stmt, copy_stmt, create_local_temp_table_stmt, merge_stmt.
# 4. Some questions
1. JDBC does not support executing stored procedures that return results. You can only use Into to store the execution results in a variable or write them into a table. When multiple result sets are returned, JDBC needs to use the prepareCall statement to execute; otherwise the Statement that returned the result executes Finalize, and sending the EOF Packet reports an error.
2. Use PL-SQL Cursor to open multiple Query result set iterators at the same time. Doris BE will cache the intermediate state of these Queries (such as HashTable) and the query results until the Query result set iteration is completed. If a Cursor is left unused for a long time, this wastes a lot of memory.
3. In plsql/Var.defineType(), the corresponding Plsql Var type will be found through the Mysql type name string, and the corresponding relationship between Doris type and Plsql Var needs to be implemented.
4. Currently, PL-SQL Statement will be forwarded to Master FE for creation and calculation, which may affect other services on Doris FE and is limited by the performance of Doris FE. Consider moving it to Doris BE for execution.
5. The format of the result returned by Doris Statement is ```xxxx\n, xxxx\n, 2 rows affected (0.03 sec)```. PL-SQL uses Print to print variable values in an unformatted format, so JDBC cannot easily obtain the real results.
# 5. Some thoughts
The above execution of Doris Statement reuses Doris Logical Plan Builder for syntax parsing, parses it from top to bottom into a Plan Tree, and calls stmtExecutor for execution. PL-SQL operations such as replacing variables and removing Into Variable are coupled into Doris syntax parsing. The advantage is that it can be made compatible with Doris grammar with only a few changes, but the disadvantage is that it intrudes on the Doris grammar parsing process.
HplSQL performs syntax parsing independently of Hive to implement variable substitution and other operations, and finally outputs SQL that conforms to Hive syntax. With this approach, even a simple parsing process requires re-implementing the parsing of select, where, expression, table name, join, agg, order, and other grammar. The advantage is that it is completely independent from the original system, but the changes are too complicated.

[Opt] (multi-catalog) Opt split assignment to resolve uneven distribution. Currently only for `FileQueryScanNode`.
Referring to the implementation of Trino,
- Local node soft affinity optimization. Prefer nodes that hold a local replica.
- When the file cache is turned on, remote splits are assigned with the consistent hash algorithm. Because consistent hashing can be uneven, the splits are then re-adjusted so that the maximum and minimum number of splits per host differ by at most `max_split_num_variance`.
- When the file cache is turned off, remote splits are assigned with the round-robin algorithm.
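The two remote-split strategies above can be sketched as follows. This is a simplified illustration, not the `FileQueryScanNode` code: the function names, the use of SHA-256, and the number of virtual nodes are assumptions, and local soft affinity is left out.

```python
import bisect
import hashlib
from itertools import cycle


def _hash(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def assign_consistent_hash(splits, hosts, max_split_num_variance=1, virtual_nodes=64):
    """Place splits on a hash ring, then rebalance to bound the skew."""
    if max_split_num_variance < 1:
        raise ValueError("max_split_num_variance must be at least 1")
    ring = sorted((_hash(f"{h}#{i}"), h) for h in hosts for i in range(virtual_nodes))
    points = [point for point, _ in ring]
    assignment = {h: [] for h in hosts}
    for split in splits:
        # First ring point clockwise from the split's hash, wrapping around.
        idx = bisect.bisect_right(points, _hash(split)) % len(ring)
        assignment[ring[idx][1]].append(split)
    # Move splits from the busiest host to the idlest until the gap is allowed.
    while True:
        busiest = max(hosts, key=lambda h: len(assignment[h]))
        idlest = min(hosts, key=lambda h: len(assignment[h]))
        if len(assignment[busiest]) - len(assignment[idlest]) <= max_split_num_variance:
            return assignment
        assignment[idlest].append(assignment[busiest].pop())


def assign_round_robin(splits, hosts):
    """Used when the file cache is off, so cache locality does not matter."""
    assignment = {h: [] for h in hosts}
    for split, host in zip(splits, cycle(hosts)):
        assignment[host].append(split)
    return assignment
```

Consistent hashing keeps the same file on the same host across queries, which is what makes the file cache effective; the rebalancing step trades a little of that locality for an even load.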