This PR improves the performance of the Nereids planner in the plan stage.
1. Refactor the expression rewriter to pattern matching, so that the many expression rewrite rules can be applied interleaved in one big bottom-up iteration, rewriting until the expression becomes stable. We can now handle more cases, because previously there was no loop and some rules only processed the top expression, like `SimplifyArithmeticRule`.
2. Replace `Collection.stream()` with `ImmutableXxx.Builder` to avoid unnecessary method calls (see the sketch after this list).
3. Unroll loops in some code, like `Expression.<init>` and `PlanTreeRewriteBottomUpJob.pushChildrenJobs`.
4. Use type- and arity-specialized code, like `OneRangePartitionEvaluator.toNereidsLiterals()`, `PartitionRangeExpander.tryExpandRange()`, `PartitionRangeExpander.enumerableCount()`.
5. Refactor `ExtractCommonFactorRule` so that more cases can be extracted, and fix the dead loop that occurred when `ExtractCommonFactorRule` and `SimplifyRange` ran in the same iteration: `SimplifyRange` generates a right-deep tree while `ExtractCommonFactorRule` generates a left-deep tree.
6. Refactor `FoldConstantRuleOnFE` to support both visitor and pattern-match modes: in ExpressionNormalization, the pattern-match mode can be applied interleaved with other rules; in PartitionPruner, the visitor mode evaluates expressions faster.
7. Lazily compute and cache some operations.
8. Use an int field to compare dates.
9. Use a BitSet to look up disabled Nereids rules (`disableNereidsRules`).
10. A two-level loop is usually faster than building a Multimap when binding slots in a Scope, so I reverted that code.
11. `PlanTreeRewriteBottomUpJob` no longer needs to clearStatePhase.
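To make a couple of the points above concrete, here is a minimal, hypothetical sketch (not the actual Doris code) of point 2's `ImmutableList.Builder` usage and point 9's `BitSet` lookup:

```java
import com.google.common.collect.ImmutableList;
import java.util.BitSet;
import java.util.List;

public class PlannerMicroOptSketch {
    // Instead of children.stream().map(...).collect(ImmutableList.toImmutableList()),
    // fill an ImmutableList.Builder directly to avoid the stream overhead.
    static List<String> rewriteAll(List<String> children) {
        ImmutableList.Builder<String> builder =
                ImmutableList.builderWithExpectedSize(children.size());
        for (String child : children) {
            builder.add(child.trim()); // stand-in for the per-child rewrite
        }
        return builder.build();
    }

    // Look up disabled rules by ordinal in a BitSet instead of scanning a String set.
    enum RuleType { SIMPLIFY_ARITHMETIC, EXTRACT_COMMON_FACTOR, SIMPLIFY_RANGE }

    static boolean isDisabled(BitSet disabledRules, RuleType rule) {
        return disabledRules.get(rule.ordinal());
    }
}
```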
### Test case
100 threads continuously send the following SQL in parallel, querying an empty table. Tested on my Mac (M2 chip, 8 cores) with SQL cache enabled.
```sql
select count(1),date_format(time_col,'%Y%m%d'),varchar_col1
from tbl
where partition_date>'2024-02-15' and (varchar_col2 ='73130' or varchar_col3='73130') and time_col>'2024-03-04'
and time_col<'2024-03-05'
group by date_format(time_col,'%Y%m%d'),varchar_col1
order by date_format(time_col,'%Y%m%d') desc, varchar_col1 desc,count(1) asc
limit 1000
```
Before this PR: 3100 peak QPS, about 2700 avg QPS
After this PR: 4800 peak QPS, about 4400 avg QPS
(cherry picked from commit 7338683fdbdf77711f2ce61e580c19f4ea100723)
Currently, when reading a Hive on COSN table, Doris returns an empty result even though the table has data.
Iceberg on COSN works fine.
The reason is a misuse of COSN's file system: according to COSN's documentation, its `fs.cosn.impl` should be `org.apache.hadoop.fs.CosFileSystem`.
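For context, the fix amounts to pointing Hadoop's `cosn://` scheme at the right implementation. A minimal sketch (the bucket name is a placeholder; only the `fs.cosn.impl` key and its value come from the COSN documentation referenced above):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CosnFsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Per COSN's documentation, the cosn scheme must be backed by CosFileSystem;
        // using the wrong implementation is what caused the empty query results.
        conf.set("fs.cosn.impl", "org.apache.hadoop.fs.CosFileSystem");
        FileSystem fs = FileSystem.get(new Path("cosn://bucket-appid/").toUri(), conf);
        System.out.println(fs.getUri());
    }
}
```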
# 1. Motivation
PL-SQL (stored procedures) is a collection of SQL that is defined and used similarly to functions. It supports conditional judgments, loops and other control statements, supports cursor processing of result sets, and allows business logic to be written in SQL.
Hive uses Hplsql to support PL-SQL and is largely compatible with Oracle, Impala, MySQL, Redshift, PostgreSQL, DB2, etc. We support PL-SQL in Doris based on Hplsql to achieve compatibility with stored procedures of database systems such as Oracle and PostgreSQL.
Reference documentation:
Hive: http://mail.hplsql.org
Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/21/lnpls/plsql-language-fundamentals.html#GUID-640DB3AA-15AF-4825-BD6C-1D4EB5AB7715
Mysql: https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html
# 2. Implementation
Take the following case as an example to explain the process of connecting to Doris FE and executing stored procedures using the MySQL protocol.
```
CREATE OR REPLACE PROCEDURE A(IN name STRING, OUT result int)
select count(*) from test;
select count(*) into result from test where k = name;
END
declare result INT default = 0;
call A('xxx', result);
print result;
```

1. Add procedure: persist the Procedure Name and Source (raw SQL) into Doris FE metadata.
2. Call procedure: extract the actual parameter values and the Procedure Name from the Call Stmt. Use the Procedure Name to find the Source in the metadata, extract the Name and Type of each procedure parameter, and match them with the actual values to form complete variables <Name, Type, Value> (a minimal sketch of steps 1-2 follows this list).
3. Execute Doris Statement
- Use the Doris Logical Plan Builder to parse the Doris Statement syntax in the Source, replace parameter variables, remove the into-variable clause, and generate a Plan Tree that conforms to Doris syntax.
- Use stmtExecutor to execute the SQL and encapsulate the query result set iterator into a QueryResult.
- Output the query results to the MySQL channel, or write them into a Cursor, parameters, or variables.
- Stored programs compatible with the MySQL protocol support multiple statements.
4. Execute PL-SQL Statement
- Use the Plsql Logical Plan Builder to parse and execute the PL-SQL Statement syntax in the Source, including Loop, Cursor, IF, Declare, etc., largely reusing HplSQL.
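The create/call flow in steps 1-2 can be pictured with the following hypothetical, simplified sketch; none of the names or structures come from the real Doris code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of steps 1-2: persist the procedure name and raw source,
// then bind the actual values of a CALL to the declared parameters.
public class ProcedureCallSketch {
    record Parameter(String name, String type) {}
    record Variable(String name, String type, String value) {}

    // step 1: procedure name -> raw SQL source (persisted in FE metadata in reality)
    private final Map<String, String> procedureSource = new LinkedHashMap<>();
    private final Map<String, List<Parameter>> procedureParams = new LinkedHashMap<>();

    void createProcedure(String name, String source, List<Parameter> params) {
        procedureSource.put(name, source);
        procedureParams.put(name, params);
    }

    // step 2: match actual values with declared <Name, Type> to form <Name, Type, Value>
    List<Variable> bindCall(String name, List<String> actualValues) {
        List<Parameter> params = procedureParams.get(name);
        if (params == null || params.size() != actualValues.size()) {
            throw new IllegalArgumentException("unknown procedure or arity mismatch: " + name);
        }
        List<Variable> bound = new ArrayList<>();
        for (int i = 0; i < params.size(); i++) {
            bound.add(new Variable(params.get(i).name(), params.get(i).type(), actualValues.get(i)));
        }
        return bound;
    }
}
```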
# 3. TODO
1. Support drop procedure.
2. Create procedure only in `PlSqlOperation`.
3. Doris Parser supports declare variable.
4. Select Statement supports insert into variable.
5. Parameters and fields have the same name.
6. If Cursor exits halfway, will there be a memory leak?
7. Use getOriginSql(ctx) in syntax parsing LogicalPlanBuilder to obtain the original SQL. Is there any problem with special characters?
8. Supports complex types such as Map and Struct.
9. Test syntax such as Package.
10. Support UDF
11. In Oracle, `create procedure` must have AS or IS after the right parenthesis,
but MySQL and Hive do not support AS or IS. Compatibility issues with Oracle will be discussed and resolved later.
12. Built-in functions require separate management.
13. Doris statements: add begin_transaction_stmt, end_transaction_stmt, commit_stmt, rollback_stmt.
14. Add plsql stmt: cmp_stmt, copy_from_local_stmt, copy_stmt, create_local_temp_table_stmt, merge_stmt.
# 4. Some questions
1. JDBC does not support executing stored procedures that return result sets; you can only select the results into a variable or write them into a table. When multiple result sets are returned, JDBC needs to execute via a prepareCall statement, otherwise the Statement of the returned result is finalized and sending the EOF packet reports an error.
2. PL-SQL Cursors can keep multiple query result set iterators open at the same time. Doris BE caches the intermediate state of these queries (such as the HashTable) and the query results until the result set iteration completes, so a Cursor that is left unused for a long time wastes a lot of memory.
3. In plsql/Var.defineType(), the corresponding Plsql Var type is looked up by the MySQL type name string; the mapping between Doris types and Plsql Var still needs to be implemented.
4. Currently, PL-SQL statements are forwarded to the Master FE for creation and execution, which may affect other services on Doris FE and is limited by the performance of Doris FE. Consider moving execution to Doris BE.
5. The result of a Doris statement is returned in the format ```xxxx\n, xxxx\n, 2 rows affected (0.03 sec)```. PL-SQL uses Print to print variable values without formatting, and JDBC cannot easily obtain the real results.
# 5. Some thoughts
The Doris Statement execution above reuses the Doris Logical Plan Builder for syntax parsing, parses the statement top-down into a Plan Tree, and calls stmtExecutor to execute it. PL-SQL operations such as replacing variables and removing the Into Variable clause are coupled into Doris syntax parsing. The advantage is that compatibility with Doris grammar can be achieved with few changes; the disadvantage is that it intrudes into the Doris parsing process.
HplSQL performs its own syntax parsing independently of Hive to implement variable substitution and other operations, and finally outputs SQL that conforms to Hive syntax. It only does a simple parse, so the parsing of select, where, expressions, table names, join, agg, order and other grammar must be re-implemented. The advantage is that it is completely independent of the original system, but the changes are too complicated.

The current logic for SQL dialect conversion is all in the `fe-core` module, which may lead to the following issues:
- The dialect conversion logic may change frequently, forcing users to frequently upgrade the Doris version just for fe-core changes, which makes the change cycle long.
- The cost of customized development is high, because users have to replace the fe-core JAR package.
Turning it into a plugin addresses these issues (a hypothetical interface shape is sketched below).
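A hypothetical shape for such a plugin boundary, purely illustrative and not the actual Doris plugin API:

```java
// Hypothetical sketch of a dialect-conversion plugin boundary; none of these
// names are the real Doris plugin API.
public interface DialectConverterPlugin {
    /** Which dialect this plugin handles, e.g. "trino". */
    String dialect();

    /** Convert a statement written in the foreign dialect into Doris-compatible SQL. */
    String convert(String foreignSql);
}
```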
This reverts commit 5b641ebd40fff71e632ee9be4ede58b744b602b9.
Currently, Deltalake Catalog is not a usable feature. We will continue to implement it in the datalake plug-in system in the future, so we will delete it from the FE code for now.
Support transforming trino dialect SQL to logical plan (#21854)
## Proposed changes
Issue Number: #21854
Use io.trino.sql.tree.AstVisitor as the visitor, visit the corresponding Trino node, and transform it into a Doris logical plan.
## Further comments
Here are some examples of function transforming:
**ascii('a')** is a Doris function and **codepoint('a')** is a Trino function; they have the same behavior and the same method signature, so we can use [TrinoFnCallTransformer](3b37b76886/fe/fe-core/src/main/java/org/apache/doris/nereids/parser/trino/TrinoFnCallTransformer.java) to handle them.
Another example, for a ComplexTransformer:
**date_diff('second', TIMESTAMP '2020-12-25 22:00:00', TIMESTAMP '2020-12-25 21:00:00')** is a Trino function
and **seconds_diff(2020-12-25 22:00:00, 2020-12-25 21:00:00)** is a Doris function. They have different method signatures, so we cannot handle them with TrinoFnCallTransformer alone and must handle them with an individual complex transformer, [DateDiffFnCallTransformer](3b37b76886/fe/fe-core/src/main/java/org/apache/doris/nereids/parser/trino/DateDiffFnCallTransformer.java).
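The simple name-mapping case can be pictured with a hypothetical, heavily simplified mapper (not the real TrinoFnCallTransformer code):

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified picture of the name-mapping case: a Trino call such as
// codepoint('a') is rewritten to the equivalent Doris call ascii('a') because the
// two functions share the same signature and semantics.
public class SimpleFnNameMapper {
    private static final Map<String, String> TRINO_TO_DORIS = Map.of(
            "codepoint", "ascii");

    static String rewrite(String trinoFnName, List<String> args) {
        String dorisFnName = TRINO_TO_DORIS.getOrDefault(trinoFnName, trinoFnName);
        return dorisFnName + "(" + String.join(", ", args) + ")";
    }

    public static void main(String[] args) {
        System.out.println(rewrite("codepoint", List.of("'a'"))); // prints ascii('a')
    }
}
```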
* BE scanner
  - Upgrade avro to 1.11.2
* FE
  - Upgrade quartz to 2.5.0-rc1
  - Upgrade maxcompute to 0.45-2-publish
  - Bind avro-ipc to 1.11.2
* Bind hbase version to 2.5.5
* Bind nimbusds version to 9.35
* Upgrade guava to 32.1.2-jre
* Set ck dependency scope to provided
* Upgrade okio to 3.4.0
* Upgrade snakeyaml to 1.33
* Upgrade aws-java-sdk to 1.12.519
* Upgrade hadoop to 3.3.6
Support this grammar:
`ANALYZE TABLE test WITH CRON "* * * * * ?"`
Such a job is scheduled as the cron expression specifies, but only minute-level scheduling is natively supported.
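Since the cron string uses Quartz-style syntax (note the trailing `?`), an expression can be sanity-checked with the Quartz library, roughly like the following sketch (assuming Quartz is on the classpath):

```java
import java.util.Date;
import org.quartz.CronExpression;

public class CronCheckSketch {
    public static void main(String[] args) throws Exception {
        // "* * * * * ?" fires every second in Quartz terms, but the analyze job
        // scheduler described above only supports minute-level granularity.
        CronExpression cron = new CronExpression("* * * * * ?");
        Date next = cron.getNextValidTimeAfter(new Date());
        System.out.println("next fire time: " + next);
    }
}
```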
1. Support filesystem metastore.
2. Support predicate and projection pushdown when generating splits.
3. Fix a partition table query error.
TODO: for now you need to manually put paimon-s3-0.4.0-incubating.jar into be/lib/java_extensions when using an S3 filesystem.
Doc PR: #21966
Keep hadoop-aliyun version consistent with hadoop main version (3.3.5)
upgrade jackson to 2.14.3
upgrade netty version to 4.1.94.final
binding checker.framework version to 3.32.0
upgrade snappy-java to 1.1.10.1
upgrade hudi version to 0.13.1
upgrade spring version to 2.7.13
upgrade orc version to 1.8.4
revert nonsensical changes
The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as java-common, java-udf, jdbc-scanner, hudi-scanner, and paimon-scanner.
Co-authored-by: lexluo <lexluo@tencent.com>
PR https://github.com/apache/doris/pull/19909 implemented the framework of the hudi reader for MOR tables. This PR completes all functions for reading MOR tables and enables end-to-end queries.
Key Implementations:
1. Use hudi meta information to generate the table schema, instead of getting it from the hive client.
2. Use the hive client to list hudi partitions, so this strongly depends on the sync tools (https://hudi.apache.org/docs/syncing_metastore/) which sync hudi partitions into the hive metastore. However, we may later get the hudi partitions directly from the .hoodie directory.
3. Remove `HudiHMSExternalCatalog`, because other catalogs like Glue are compatible with the hive catalog.
4. COW tables are still read by the original C++ reader.
5. The Hudi RecordReader uses ProcessBuilder to start a hotspot debugger process, which may get stuck when attaching to the original JNI process, so I use a tricky method to kill this useless process (see the sketch below).
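For point 5, one generic way to find and kill a stray child process from Java is via `ProcessHandle`; this is only a sketch and not necessarily the exact approach used in the PR.

```java
// Generic sketch of killing an unwanted child process spawned via ProcessBuilder;
// this is not necessarily the exact trick used in the PR.
public class KillChildProcessSketch {
    static void killChildrenMatching(String commandFragment) {
        ProcessHandle.current().descendants()
                .filter(ph -> ph.info().commandLine()
                        .map(cmd -> cmd.contains(commandFragment))
                        .orElse(false))
                .forEach(ProcessHandle::destroyForcibly);
    }
}
```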
Why upgrade? Is anything wrong?
This tries to fix the problem with opentelemetry::v1::ext::http::client::curl::HttpOperation::Send(); I have updated the PR description.
After #19246, compiling FE automatically generates the Config and Session Variables docs and overwrites the original ones.
We need to avoid this for now because the feature is not ready to use yet.
The documentation of configs (FE and BE) and session variables is hard to maintain,
because developers need to modify both the code and the documentation,
and you can see that some config documentation is already missing.
So I plan to write the documentation of configs and variables directly in code, and use a
script to generate the documentation automatically.
How To
This CL mainly changes:
Add fields to the Config and Session Variables' annotation (sketched below):
description: the description of the config or variable item. It is a String array; the first element is in Chinese, the second in English.
options: the valid options if the config or variable is an enum.
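The annotation fields could be shaped roughly like this hypothetical example (the real Doris annotation name and members may differ):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical shape of the annotation fields described above; the real Doris
// annotation may use different names.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface DocumentedConf {
    /** First element in Chinese, second in English. */
    String[] description() default {};

    /** Valid options when the item is enum-like. */
    String[] options() default {};
}

class ExampleConfig {
    @DocumentedConf(
            description = {"查询超时时间（秒）", "Query timeout in seconds"},
            options = {})
    public static int query_timeout = 300;
}
```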
Add a script docs/generate-config-and-variable-doc.sh.
Simply run `sh docs/generate-config-and-variable-doc.sh` and it will generate the docs for the FE config and variables,
saving them under docs/admin-manual/config/fe-config.md and docs/advanced/variables.md,
both in Chinese and in English.
There are template markdowns for this script to read and replace with the real doc content.
TODO
Too many descriptions need to be filled in; I will finish them in the next PR. For now the original docs remain unchanged.
Find a way to check the description field of configs and variables, to make sure we don't miss any.
Generate docs for the BE config.
bind netty-version to 4.1.89-final
bind jettison to 1.5.4
upgrade hadoop version to 3.3.5
upgrade ranger-plugins-common to 2.4.0
bind bcprov-jdk15on to 2.4.0
upgrade and bind woodstox to 6.5.1
upgrade and bind kerby to 2.0.3
upgrade hudi to 0.13.0
upgrade parquet to 1.13.0
upgrade maven-source-plugin to 3.2.1
upgrade maven-assembly-plugin to 3.3.0
upgrade maven-javadoc-plugin to 3.3.2
upgrade maven-shade-plugin to 3.3.4
upgrade maven-clean-plugin to 3.1.0
Remove meaningless plugins
Optimize doris maven path
Unify the Java modules for management in fe
`hudi-common` depends on `parquet-avro`, but the dependency scope is `provided`.
When we use `hudi-catalog`, `HoodieAvroWriteSupport` is called. This class depends on `parquet-avro`, so a ClassNotFound error is thrown.
`iceberg-hive-metastore` and `hive-storage-api` have been defined in hive-catalog-shade,
and some classes in the shade have been renamed, so we cannot declare them again.
The classes in the shade should be kept.
The `hive-metastore-api` used in `ranger` can also use the jar in the `shade`.
Since we rename the tool class used inside the `hive`, this has no effect.
`Hive 3` uses the `thrift-0.9.3` package, and `Doris` uses the `thrift-0.16.0` package.
These two packages are not compatible, so we use the `hive-shade` package to manage hive dependencies
in a unified way. This jar package renames the `thrift` classes, so the conflict can be resolved.
add the `<optional>` tag to solve the compilation issue
use 3.12.9 as the protoc.artifact's version, because there is no 3.12.21
See: https://repo.maven.apache.org/maven2/com/google/protobuf/protoc/
Remove the --show-progress argument of wget because it is not supported by older versions of wget.
1. Add a watch mechanism to listen for changes in k8s StatefulSets and update nodes in time.
2. For brokers, there is only one name by default when using deployManager.
3. Refactor the code to make it easier to understand and maintain.
4. Fix jar package conflicts between okhttp-ws and okhttp.
Previously, the logic of k8sDeployManager.getGroupHostInfos was to call the endpoints() interface of k8s.
As a result, if a pod was unexpectedly restarted, k8sDeployManager would delete the pre-restart pod from the
FE or BE list and then add the post-restart pod back to the list, which obviously does not
meet our expectations.
Now, after FQDN is enabled, we call the statefulSets() interface of k8s and watch the replica count to
determine whether nodes need to go online or offline.
In addition, a watch mechanism is added to avoid the possible A-B-A problem caused by timed polling.
For the sake of stability, when the watch mechanism does not receive messages for a period of time,
it degrades to polling mode (a rough sketch of the watch is shown below).
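A rough sketch of such a watch, assuming the fabric8 kubernetes-client (the actual client library and API used by k8sDeployManager may differ):

```java
import io.fabric8.kubernetes.api.model.apps.StatefulSet;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.Watch;
import io.fabric8.kubernetes.client.Watcher;
import io.fabric8.kubernetes.client.WatcherException;

public class StatefulSetWatchSketch {
    // Watch a StatefulSet and react to replica changes; fall back to polling
    // when the watch is closed.
    static Watch watchStatefulSet(KubernetesClient client, String namespace, String name) {
        return client.apps().statefulSets()
                .inNamespace(namespace)
                .withName(name)
                .watch(new Watcher<StatefulSet>() {
                    @Override
                    public void eventReceived(Action action, StatefulSet sts) {
                        System.out.println(action + ": replicas=" + sts.getSpec().getReplicas());
                    }

                    @Override
                    public void onClose(WatcherException cause) {
                        // degrade to periodic polling here, as described above
                    }
                });
    }
}
```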
Several environment variables have been added for the statefulset names: ENV_FE_STATEFULSET, ENV_FE_OBSERVER_STATEFULSET, ENV_BE_STATEFULSET, ENV_BROKER_STATEFULSET, and ENV_CN_STATEFULSET. They correspond one-to-one with ENV_FE_SERVICE, ENV_FE_OBSERVER_SERVICE, ENV_BE_SERVICE, ENV_BROKER_SERVICE, and ENV_CN_SERVICE. If a serviceName is configured, the corresponding statefulsetName must also be configured, otherwise the program cannot start.
In #17797, we introduced AspectJ to help log exceptions easily.
However, plugin version 1.11 does not support JDK 9 and later.
To support compiling FE with JDK 11:
- update aspectj-maven-plugin to version 1.14.0
- add a new dependency org.aspectj.aspectjrt 1.9.7 to fe-core
According to:
- AspectJ Java version compatibility
- the aspectj-maven-plugin issue
- the AspectJ release notes
- an intro to AspectJ