This pull request adjusts the behavior of the jdbc catalog's `lower_case_table_names` parameter based on the configuration of the corresponding parameter for internal tables.
Changes:
- For internal tables, if `lower_case_table_names` is set to 1 or 2, the jdbc catalog's parameter is forced to `true`.
- For internal tables, if `lower_case_table_names` is set to 0, the jdbc catalog's parameter can be either `true` or `false`, with a default value of `false`.
These adjustments ensure consistency and predictability when working with both internal and external table configurations in Doris.
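As a rough illustration of this rule, here is a minimal sketch with made-up class and method names (not the actual Doris code):
```
// Minimal sketch of the resolution rule described above; all names are illustrative.
public class JdbcCatalogCaseSketch {
    static boolean resolveJdbcLowerCaseTableNames(int internalLowerCaseTableNames,
                                                  Boolean catalogSetting) {
        if (internalLowerCaseTableNames == 1 || internalLowerCaseTableNames == 2) {
            // Forced to true, regardless of what was configured on the jdbc catalog.
            return true;
        }
        // internal lower_case_table_names == 0: honor the catalog setting, default false.
        return catalogSetting != null && catalogSetting;
    }
}
```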
1. Refactor the statistics functions `withSel`/`updateRowCountOnly`/`withRowCount`.
2. Do not use `Double.MAX_VALUE` in stats estimation.
3. `dateLikeType.rangeLength()` no longer throws `DateTimeException`.
Push TopN through Join.
The join type can only be a left/right outer join or a cross join, because the data of the preserved child cannot be filtered by the join.
The new pushed-down TopN uses (original limit + original offset) as its limit and 0 as its offset.
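For illustration (the limit, offset and table names below are made up, and the exact rule shape may differ), pushing a TopN with limit 10 and offset 5 through a left outer join keeps the original TopN on top and adds a new TopN(limit=15, offset=0) on the preserved (left) child:
TopN(limit=10, offset=5)
+-- LeftOuterJoin(t1.k = t2.k)
    +-- TopN(limit=15, offset=0)
    |   +-- Scan(t1)
    +-- Scan(t2)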
`ExternalFileTableValuedFunction` now has 3 derived classes:
- LocalTableValuedFunction
- HdfsTableValuedFunction
- S3TableValuedFunction
All these tvfs read data from files; the difference is where the file is read from, e.g., from HDFS or from the local filesystem.
So I refined the fields and methods of these classes.
Now there are 3 kinds of properties for these tvfs:
1. File format properties
File format properties, such as `format` and `column_separator`, are common to all these tvfs.
So these properties should be analyzed in the parent class `ExternalFileTableValuedFunction`.
2. URI or file path
The URI or file path property indicates the file location. The format of the URI differs between storage systems.
So it should be analyzed in each derived class.
3. Other properties
All other properties that are specific to a certain tvf.
So they should be analyzed in each derived class.
There are 2 new classes:
- `FileFormatConstants`: Define some common property names or variables related to file format.
- `FileFormatUtils`: Define some util methods related to file format.
After this PR, if we want to add a common property for all these tvfs, it only needs to be handled in
`ExternalFileTableValuedFunction`, which avoids missing it in any one of them.
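A simplified sketch of this split (the class bodies and property handling below are illustrative, not the exact Doris implementation):
```
import java.util.Map;

// Simplified sketch of the parent/derived split described above.
abstract class ExternalFileTableValuedFunctionSketch {
    protected String format;
    protected String columnSeparator;

    // 1. Common file-format properties are analyzed once, in the parent class.
    protected void analyzeFileFormatProperties(Map<String, String> props) {
        format = props.getOrDefault("format", "csv");
        columnSeparator = props.getOrDefault("column_separator", "\t");
    }

    // 2 & 3. The URI/path and any tvf-specific properties are analyzed in each derived class.
    protected abstract void analyzeLocationProperties(Map<String, String> props);
}

class HdfsTableValuedFunctionSketch extends ExternalFileTableValuedFunctionSketch {
    private String uri;

    @Override
    protected void analyzeLocationProperties(Map<String, String> props) {
        // fs.defaultFS is no longer required; it can be derived from the uri.
        uri = props.get("uri");
    }
}
```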
### Behavior change
1. Remove the `fs.defaultFS` property from `hdfs()`; it can be derived from the `uri`.
2. Use `\t` as the default column separator for the csv format, the same as stream load.
The current implementation needs to iterate over all metrics while holding a lock,
which might cause latency spikes. This PR changes the underlying
data structure to ConcurrentHashMap so that removing metrics doesn't
need to block the entire registry.
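An illustrative sketch of the idea (the class and method names are placeholders, not the actual registry touched by this PR):
```
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Placeholder registry showing why removal no longer blocks the whole structure.
class MetricRegistrySketch {
    interface Metric { }

    private final ConcurrentMap<String, Metric> metrics = new ConcurrentHashMap<>();

    void register(String name, Metric metric) {
        metrics.put(name, metric);
    }

    // Removal is a single map operation instead of a scan over all metrics under a
    // registry-wide lock, so other readers and writers are not blocked.
    void remove(String name) {
        metrics.remove(name);
    }
}
```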
When we do NormalizeToSlot, we push down complex expressions and keep only their
slots. To do this, we collect each alias and its child, compute the child in the
bottom project, and keep the result slot in the current node. For example
Window(max(...), c1 as a1)
after normalization, we get
Window(max(...), a1)
+-- Project(..., c1 as a1)
But in some cases we remove some SlotReferences by mistake, for example
Window(max(...), c1, c1 as a1)
after normalization, we get
Window(max(...), a1)
+-- Project(..., c1 as a1)
we lose the SlotReference c1. This PR fixes this problem. After this PR,
we get
Window(max(...), c1, a1)
+-- Project(..., c1, c1 as a1)
The current multi-window plan generation has a problem with the projection sequence, for example:
+--LogicalWindow ( windowExpressions=[avg(sum_sales#115) WindowSpec(...) AS `avg_monthly_sales`#116, rank() WindowSpec(...) AS `rn`#117], ...)
and the corresponding physical plan is:
+--PhysicalWindow[6572]@16 ( windowFrameGroup=(Funcs=[avg(sum_sales#115) WindowSpec(...) AS `avg_monthly_sales`#116], ... )
+--PhysicalWindow[6568]@29 ( windowFrameGroup=(Funcs=[rank() WindowSpec(...) AS `rn`#117], ...] )
The final plan may be generated as follows:
MultiCastDataSinks
STREAM DATA SINK
EXCHANGE ID: 20
HASH_PARTITIONED: rn[#208], i_brand[#202], cc_name[#203], i_category[#201]
Until we eventually resolve the multi-window issue, we add a projection as follows and force a mapping, but this will not cover all potential problems.
MultiCastDataSinks
STREAM DATA SINK
EXCHANGE ID: 20
HASH_PARTITIONED: rn[#219], i_brand[#213], cc_name[#214], i_category[#212]
PROJECTIONS: i_category[#184], i_brand[#185], cc_name[#186], d_year[#187], d_moy[#188], sum_sales[#189], avg_monthly_sales[#191], rn[#190]
PROJECTION TUPLE: 20
Fix some problems with the `json_length` and `json_contains` functions on Nereids.
Fix the wrong result of the `json_contains` function.
Enable Nereids for the `jsonb_p0` regression test.
Because the storage engine cannot process date comparison predicates,
we convert them into datetime comparison predicates.
However, the partition pruner cannot process predicates of the form `cast(slot) <op> literal`,
so we convert them back in the partition pruner to let pruning work well.
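For example (the column name and dates are made up): a predicate like `dt < '2023-10-01'` on a DATE column is rewritten into `CAST(dt AS DATETIME) < '2023-10-01 00:00:00'` for the storage engine, and the partition pruner converts it back to `dt < '2023-10-01'` so partition pruning still applies.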
TODO:
move the date-to-datetime conversion into the translate stage,
and only convert predicates for the storage engine.
Problem:
BE cores because of bitmap calculation.
Reason:
when a BE check fails, it cores directly.
Example:
SELECT id_bitmap FROM test_bitmap WHERE id_bitmap IN (NULL) LIMIT 20;
Solution:
Forbid this kind of expression in FE during analysis, and also forbid bitmap-type comparisons in other unsupported expressions.
Issue Number: #24858
If `isAllNode` is true, the API should only distribute the query to all FEs and not run `checkAuthByUserAndQueryId`.
If `isAllNode` is false, the API queries the profile on the current FE, and in this case it should run `checkAuthByUserAndQueryId`.
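A hedged sketch of this dispatch rule (all names except `checkAuthByUserAndQueryId` are placeholders invented for illustration):
```
// Sketch only; not the actual Doris HTTP handler.
class ProfileApiSketch {
    void handleRequest(boolean isAllNode, String user, String queryId) {
        if (isAllNode) {
            // Only distribute the query to all FEs; each FE authenticates the
            // forwarded request itself, so skip checkAuthByUserAndQueryId here.
            forwardToAllFrontends(queryId);
        } else {
            // The profile is served from this FE, so the auth check must run.
            checkAuthByUserAndQueryId(user, queryId);
            queryProfileOnThisFe(queryId);
        }
    }

    // Placeholder stubs so the sketch stays self-contained.
    void forwardToAllFrontends(String queryId) { }
    void checkAuthByUserAndQueryId(String user, String queryId) { }
    void queryProfileOnThisFe(String queryId) { }
}
```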
Support complex types in the JNI framework, and successfully run end-to-end on Hudi.
### How to Use
Other scanners only need to implement three methods of `ColumnValue`:
```
// Get array elements and append into values
void unpackArray(List<ColumnValue> values);
// Get map key array&value array, and append into keys&values
void unpackMap(List<ColumnValue> keys, List<ColumnValue> values);
// Get the struct fields specified by `structFieldIndex`, and append into values
void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values);
```
Developers can take `HudiColumnValue` as an example.
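A toy, self-contained sketch of how a scanner might implement these three methods (the real `ColumnValue` interface has more methods; `ExampleColumnValue` and its fields are made up for illustration):
```
import java.util.List;

// Toy mirror of the three methods quoted above.
interface ColumnValueSketch {
    void unpackArray(List<ColumnValueSketch> values);
    void unpackMap(List<ColumnValueSketch> keys, List<ColumnValueSketch> values);
    void unpackStruct(List<Integer> structFieldIndex, List<ColumnValueSketch> values);
}

// Hypothetical scanner value backed by already-parsed children.
class ExampleColumnValue implements ColumnValueSketch {
    private final List<ColumnValueSketch> children; // array elements / map values / struct fields
    private final List<ColumnValueSketch> mapKeys;  // map keys (null for non-map values)

    ExampleColumnValue(List<ColumnValueSketch> children, List<ColumnValueSketch> mapKeys) {
        this.children = children;
        this.mapKeys = mapKeys;
    }

    @Override
    public void unpackArray(List<ColumnValueSketch> values) {
        // Append every element of the array into the output list.
        values.addAll(children);
    }

    @Override
    public void unpackMap(List<ColumnValueSketch> keys, List<ColumnValueSketch> values) {
        // Append the key column and value column side by side.
        keys.addAll(mapKeys);
        values.addAll(children);
    }

    @Override
    public void unpackStruct(List<Integer> structFieldIndex, List<ColumnValueSketch> values) {
        // Only the requested fields are appended, in the order given by structFieldIndex.
        for (int idx : structFieldIndex) {
            values.add(children.get(idx));
        }
    }
}
```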
Mark the partition topn phase so that the BE can handle the passthrough logic correctly; this PR is the FE part of the code.
BE side logic: when the phase equals `PTopNPhase.TWO_PHASE_GLOBAL`, it should skip the bypass logic and do the second-phase partition topn operation anyway.