Push down limit-distinct through left/right outer join or cross join.
such as select t1.c1 from t1 left join t2 on t1.c1 = t2.c1 order by t1.c1 limit 1;
it is painful to align node in `explain` and node in `explain physical plan`, since they use two different sets of node IDs.
This pr makes 'explain' command use node IDs of their correspond node in 'explain physical plan'
If user has database with same name mysql, will introduce problem when doing checkpoint.
Solution:
Add check for this situation, if duplicate, exit and print log info to prevent damage of metadata;
Add fe config field: mysqldb_replace_name to make things correct if user already has mysql db.
Related pr: #23087#22868
PR https://github.com/apache/doris/pull/24606 has updated hbase version to 2.5.5, but it conflict with hudi, causing error like:
```
org.apache.doris.common.AnalysisException: errCode = 2, detailMessage = Unexpected exception: Failed to get hudi partitions
at org.apache.doris.qe.StmtExecutor.analyze(StmtExecutor.java:1021) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.executeByLegacy(StmtExecutor.java:696) ~[doris-fe.jar:1.2-SNAPSHOT]
...
Caused by: java.lang.NullPointerException
at org.apache.hadoop.fs.FilterFileSystem.getConf(FilterFileSystem.java:524) ~[hadoop-common-3.3.6.jar:?]
at org.apache.hadoop.hbase.io.hfile.ReaderContext.<init>(ReaderContext.java:53) ~[hbase-server-2.5.5.jar:2.5.5]
at org.apache.hadoop.hbase.io.hfile.ReaderContextBuilder.build(ReaderContextBuilder.java:106) ~[hbase-server-2.5.5.jar:2.5.5]
```
1. To ensure compatibility with the original optimizer, expose the non-lambda signature of highorder function externally.
2. fix some bugs in toSql function in the original optimizer
The session variable in export job should be copied from session variable in connection context.
Because both session variable in connection context and in export job may be modified at same time,
cause ConcurrentModificationException like:
2023-10-07 22:56:12,818 WARN (mysql-nio-pool-2|249) [ConnectProcessor.handleQueryException():396] Process one query failed because unknown reason:
java.util.ConcurrentModificationException: null
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) ~[?:1.8.0_131]
at java.util.HashMap$KeyIterator.next(HashMap.java:1461) ~[?:1.8.0_131]
at org.apache.doris.qe.VariableMgr.revertSessionValue(VariableMgr.java:238) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:474) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:438) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:353) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.dispatch(ConnectProcessor.java:501) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.qe.ConnectProcessor.processOnce(ConnectProcessor.java:752) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[doris-fe.jar:1.2-SNAPSHOT]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]
This error is reported by external_table_p0/export/test_export_external_table.groovy
Issue Number: close#25023
The detail of this bug has been described at the above issue. We can check if current FE is a master node to avoid such problems.
1. Add a abstraction for column stats status which is required so furthur optimization and feature development
2. Enable analyze test in p0 that disabled unexpectedly before
This pull request addresses the behavior of the `lower_case_table_names` parameter for jdbc catalog's based on the configuration of the internal table's corresponding parameter.
Changes:
- For internal tables, if `lower_case_table_names` is set to 1 or 2, thejdbc catalog's parameter is forcefully set to `true`.
- For internal tables, if `lower_case_table_names` is set to 0, the jdbc catalog's parameter can be either `true` or `false` with a default value of `false`.
These adjustments ensure consistency and predictability when working with both internal and external table configurations in Doris.
1.refactor statistics functions withSel/updateRowCountOnly/withRowCount,
2. donot use Double.MAX in stats estimation
3. dateLikeType.rangeLength() do not throw DateTimeException.
Push TopN through Join.
JoinType just can be left/right outer join or cross join, because data of their one child can't be filtered.
new TopN is (original limit + original offset, 0) as limit and offset.
`ExternalFileTableValuedFunction` now has 3 derived classes:
- LocalTableValuedFunction
- HdfsTableValuedFunction
- S3TableValuedFunction
All these tvfs are for reading data from file. The difference is where to read the file, eg, from HDFS or from local filesystem.
So I refine the fields and methods of these classes.
Now there 3 kinds of properties of these tvfs:
1. File format properties
File format properties, such as `format`, `column_separator`. For all these tvfs, they are common properties.
So these properties should be analyzed in parenet class `ExternalFileTableValuedFunction`.
2. URI or file path
The URI or file path property indicate the file location. For different storage, the format of the uri are not same.
So they should be analyzed in each derived classes.
3. Other properties
All other properties which are special for certain tvf.
So they should be analyzed in each derived classes.
There are 2 new classes:
- `FileFormatConstants`: Define some common property names or variables related to file format.
- `FileFormatUtils`: Define some util methods related to file format.
After this PR, if we want to add some common properties for all these tvfs, only need to handled it in
`ExternalFileTableValuedFunction`, to avoid missing handle it in any one of them.
### Behavior change
1. Remove `fs.defaultFS` property in `hdfs()`, it can be got from `uri`
2. Use `\t` as the default column separator of csv format, same as stream load
The current implementation needs to iterate all metrics in a lock,
which might cause latency spikes. This PR changes the underlying
data structure to ConcurrentHashMap so that removing metrics doesn't
need to block the entire registry.
when we do NormalizeToSlot, we pushed complex expression and only remain
slot of it. When we do this, we collect alias and their child and
compute its child in bottom project, remain the result slot in current
node. for example
Window(max(...), c1 as a1)
after normalization, we get
Window(max(...), a1)
+-- Project(..., c1 as a1)
But, in some cases, we remove some SlotReference by mistake, for example
Window(max(...), c1, c1 as a1)
after normalization, we get
Window(max(...), a1)
+-- Project(..., c1 as a1)
we lost the SlotReference c1. This PR fix this problem. After this Pr,
we get
Window(max(...), c1, a1)
+-- Project(..., c1, c1 as a1)