jdbc read array type get result from Doris is string, PG is java.sql.array, CK is java.lang.object
it's difficult to maintain and read the code,
so change all database's array result to string, then add a cast function from string to doris array type
1. create a project node to adjust the output column position when a mv is selected in olap scan node
2. pass SlotReference's column info when call Alias's toSlot() method
3. should compare plan's logical properties when compare two plans after rewrite
1. Organize http documents
2. Add http interface authentication for FE
3. **Support https interface for FE**
4. Provide authentication interface
5. Add http interface authentication for BE
6. Support https interface for BE
HiveMetastoreCache type for file split was Hadoop InputSplit. In this pr, change it to Doris defined Split
This change could avoid convert it every time.
Also fix the explain verbose result return -1 for split file length.
Add St_Angle/St_azimuth function:
St_Angle:
Enter three point, which represent two intersecting lines. Returns the angle between these lines. Point 2 and point 1 represent the first line and point 2 and point 3 represent the second line. The angle between these lines is in radians, in the range [0, 2pi). The angle is measured clockwise from the first line to the second line.
`
mysql> SELECT ST_Angle(ST_Point(1, 0),ST_Point(0, 0),ST_Point(0, 1));
+----------------------------------------------------------------------+
| st_angle(st_point(1.0, 0.0), st_point(0.0, 0.0), st_point(0.0, 1.0)) |
+----------------------------------------------------------------------+
| 4.71238898038469 |
+----------------------------------------------------------------------+
1 row in set (0.04 sec)
`
St_azimuth:
Enter two point, and returns the azimuth of the line segment formed by points 1 and 2. The azimuth is the angle in radians measured between the line from point 1 facing true North to the line segment from point 1 to point 2.
`
mysql> SELECT st_azimuth(ST_Point(0, 0),ST_Point(1, 0));
+----------------------------------------------------+
| st_azimuth(st_point(0.0, 0.0), st_point(1.0, 0.0)) |
+----------------------------------------------------+
| 1.5707963267948966 |
+----------------------------------------------------+
1 row in set (0.04 sec)
add regression-test for like this:
mysqldump -h127.0.0.1 -P9030 -uroot --no-tablespaces --databases > /backup/mysqldump/test.db
To prevent errors Unknown table 'column_statistics' in information_schema (1109), the table information_schema.column_statistics was added.
in this pr, we add a new algorithm to estimate semi/anti join row count.
In original alg., we reduce row count from cross join. usually, this is not good.
for example, L left semi join R on L.a=R.a
suppose L is larger than R, and ndv(L.a) < ndv(R.a)
the estimated row count is rowcount(R) * rowcount(L) / ndv(R.a). in most cases, the estimated row count is larger than rowcount(L).
in new alg, we use ndv(R.a)/originalNdv(R.a) to estimate result rowCount. the basic idea is as following:
1. Suppose ndv(R.a) reduced from m to n.
2. Assume that the value space of L.a is the same as R.a if R.a is not filtered.(this assumption is also hold in original alg.)
regard `L left join R` as a filter applied on L, that is, if L.a is in R.a, then this tuple stays in result.
R.a shrinks to m/n, so L.a also shrinks to m/n
1. Fix value idx in bool rle decoder
2. Iceberg table support datetimev2(3). In the previous version, we converted hive timestamp to datetimev2(0) default.
1. Introduce hadoop libhdfs
2. For Linux-X86 platform, use the hadoop libhdfs
3. For other platform, use libhdfs3, because currently we don't have hadoop libhdfs binary for other platform
Co-authored-by: adonis0147 <adonis0147@gmail.com>
sql like the format (q1, q2, q3 is a query):
``` sql
(q1)
UNION ALL (q2)
UNION ALL (q3)
ORDER BY keys
```
cannot be parsed by nereids, because order will be recognized as an alias of query, we add queryOrganization to avoid it.
In clickhouse's 4.x version of jdbc, some UInt types use special Java types, so I adapted Doris's ClickHouse JDBC External
```
com.clickhouse.data.value.UnsignedByte;
com.clickhouse.data.value.UnsignedInteger;
com.clickhouse.data.value.UnsignedLong;
com.clickhouse.data.value.UnsignedShort;
```
1. add PassNullPredicate to fix topn wrong result for NULL values
2. refactor RuntimePredicate to avoid using TCondition
3. refactor using ordering_exprs in fe and vsort_node
Since slot that reference to constant has been marked as constant expr either, just add condition check to make sure such slot wouldn't be eliminated as constant from group exprs
A framework that read data from jni scanner, which can support the data source from java ecosystem(java API).
## Java Interface
Java scanner should extends `org.apache.doris.jni.JniScanner`, implements the following methods:
```
// Initialize JniScanner
public abstract void open() throws IOException;
// Close JniScanner and release resources
public abstract void close() throws IOException;
// Scan data and save as vector table
public abstract int getNext() throws IOException;
```
See demo usage in `org.apache.doris.jni.MockJniScanner`
## c++ interface
C++ reader should use `doris::JniConnector` to get data from `org.apache.doris.jni.JniScanner`. See demo usage in `doris::MockJniReader`.
## Pushed-down predicates
Java scanner can get pushed-down predicates by `org.apache.doris.jni.vec.ScanPredicate`.
## Remaining works:
1. Implement complex nested types.
2. Read hudi MOR table as the end-to-end demo usage.
we introduce this rule by PR #17968, but some corner case do
not be processed correctly. This PR fix these bugs:
1. fix window function generation method, replace inner slot with
equivalent outer slot
2. forbid below scenes
a. inner has a mapping project
b. inner has an unexpected filter
c. outer has a mapping project
d. outer has an unexpected filter
e. outer has additional table
f. outer has same table
g. outer and inner with different join condition
h. outer and inner has same table with different join condition
forbiden create mv with where clause contained aggregate column
create table a_table(
k1 int null,
k2 int not null,
k3 bigint null,
k4 bigint sum null,
k5 bitmap bitmap_union null,
k6 hll hll_union null
)
aggregate key (k1,k2,k3)
distributed BY hash(k1) buckets 3
properties("replication_num" = "1");
create materialized view where_1 as select k1,k4 from a_table where k4 =1; // invalid, mv on agg table need group by
create materialized view where_2 as select k1,sum(k4) from a_table where k4 =1 group by k1; // invalid, k4 is agg column
create materialized view where_2 as select k1,sum(k4) from a_table where k1+k4 =1 group by k1; // invalid, k4 is agg column
nested alias function will cause bind argument exception, sql like:
``` sql
CREATE ALIAS FUNCTION f1(DATETIMEV2(3), INT)
with PARAMETER (datetime1, int1) as date_trunc(days_sub(datetime1, int1), 'day')
CREATE ALIAS FUNCTION f2(DATETIMEV2(3), int)
with PARAMETER (datetime1, int1) as DATE_FORMAT(HOURS_ADD(
date_trunc(datetime1, 'day'),
add(multiply(floor(divide(HOUR(datetime1), divide(24,int1))), 1), 1)
), '%Y%m%d:%H')
select f2(f1(now(3), 2), 3)
```
bug in FunctionCallExpr#rewriteExpr(), the retExpr will be replaced to originExpr to change the alias function to builtin function, but the retExpr.fn is not null, so when return to outer scope, the fn will be covered. That's the example:
```
f1(f1()) -> date_trunc(days_sub(date_trunc(days_sub()))) is correct and
f1(f1()) -> date_trunc(days_sub(days_sub())) is bug.
```
we fix it.
1. add LateralViewRef's id into TableRef's allTableRefIds, so the caller won't miss LateralViewRef when trying to get all the tableref ids.
2. TableFunctionNode should use child node's output tuple id as the input tuple id
improve column match performance by introducing a column name map in `MaterializedIndexMeta`
`getColumnByName` is slow due to the linear search process, using a map to speed up search.