jdbc read array type get result from Doris is string, PG is java.sql.array, CK is java.lang.object
it's difficult to maintain and read the code,
so change all database's array result to string, then add a cast function from string to doris array type
When full clone, if the max version of the local table is less than or equal to the max version of the clone table, there is no need to calculate the delete bitmap again.
For scan node with no vectorized predicate, the input column for the first short-circuit predicate is dense and we don't need to access the selector column.
This PR improve performance by ~30% on TPCH Q3.
We want to use file cache for caching cold data in S3.
When reading them, we want to know where the data come from and the time taken to read the datas.
So we support the metrics in olap scan node.
And for clearing the information, i also update the fields about the metrics.
1. create a project node to adjust the output column position when a mv is selected in olap scan node
2. pass SlotReference's column info when call Alias's toSlot() method
3. should compare plan's logical properties when compare two plans after rewrite
Optimize q20, q21, q22, q23 LIKE_SUBSTRING (like '%xxxx%'). Idea is from clickhouse stringsearcher:
Stringsearcher is about 10%~20% faster than volnitsky algorithm when needle size is less than 10 using two chars at beginning search in SIMD .
Stringsearcher is faster than volnitsky algorithm, when needle size is less than 21.
The changes are as follows:
Using first two chars of needle at beginning search. We can compare two chars of needle and [n:n+17) chars in haystack in SIMD in one loop. Filter efficiency will be higher.
When env support SIMD, we use stringsearcher.
Test result in clickbench:
q20 is about 15% up.
q20: SELECT COUNT(*) FROM hits WHERE URL LIKE '%google%';
q21, q22 is about 1%~5% up.
q21: SELECT SearchPhrase, MIN(URL), COUNT(*) AS c FROM hits WHERE URL LIKE '%google%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
q22: SELECT SearchPhrase, MIN(URL), MIN(Title), COUNT(*) AS c, COUNT(DISTINCT UserID) FROM hits WHERE Title LIKE '%Google%' AND URL NOT LIKE '%.google.%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
q23 is about 30%~40% up and not stable.
q23: SELECT * FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime LIMIT 10;
1. Organize http documents
2. Add http interface authentication for FE
3. **Support https interface for FE**
4. Provide authentication interface
5. Add http interface authentication for BE
6. Support https interface for BE
HiveMetastoreCache type for file split was Hadoop InputSplit. In this pr, change it to Doris defined Split
This change could avoid convert it every time.
Also fix the explain verbose result return -1 for split file length.
Add St_Angle/St_azimuth function:
St_Angle:
Enter three point, which represent two intersecting lines. Returns the angle between these lines. Point 2 and point 1 represent the first line and point 2 and point 3 represent the second line. The angle between these lines is in radians, in the range [0, 2pi). The angle is measured clockwise from the first line to the second line.
`
mysql> SELECT ST_Angle(ST_Point(1, 0),ST_Point(0, 0),ST_Point(0, 1));
+----------------------------------------------------------------------+
| st_angle(st_point(1.0, 0.0), st_point(0.0, 0.0), st_point(0.0, 1.0)) |
+----------------------------------------------------------------------+
| 4.71238898038469 |
+----------------------------------------------------------------------+
1 row in set (0.04 sec)
`
St_azimuth:
Enter two point, and returns the azimuth of the line segment formed by points 1 and 2. The azimuth is the angle in radians measured between the line from point 1 facing true North to the line segment from point 1 to point 2.
`
mysql> SELECT st_azimuth(ST_Point(0, 0),ST_Point(1, 0));
+----------------------------------------------------+
| st_azimuth(st_point(0.0, 0.0), st_point(1.0, 0.0)) |
+----------------------------------------------------+
| 1.5707963267948966 |
+----------------------------------------------------+
1 row in set (0.04 sec)
add regression-test for like this:
mysqldump -h127.0.0.1 -P9030 -uroot --no-tablespaces --databases > /backup/mysqldump/test.db
To prevent errors Unknown table 'column_statistics' in information_schema (1109), the table information_schema.column_statistics was added.
Result of functions grouping and grouping_id is always not nullable, but outer join will convert the result column to nullable when necessary, which will cause mismatch of column type and column object when executing unctions grouping and grouping_id.
The offset of _nullmap and _value are inconsistent in OlapDataConvertor, so the obtained null flag is incorrect when calling get_ data_ at function. When the key column or sequence column has null values, the encoding of the short key index or primary key index may be wrong.
This was introduced by #10883#10925.
in this pr, we add a new algorithm to estimate semi/anti join row count.
In original alg., we reduce row count from cross join. usually, this is not good.
for example, L left semi join R on L.a=R.a
suppose L is larger than R, and ndv(L.a) < ndv(R.a)
the estimated row count is rowcount(R) * rowcount(L) / ndv(R.a). in most cases, the estimated row count is larger than rowcount(L).
in new alg, we use ndv(R.a)/originalNdv(R.a) to estimate result rowCount. the basic idea is as following:
1. Suppose ndv(R.a) reduced from m to n.
2. Assume that the value space of L.a is the same as R.a if R.a is not filtered.(this assumption is also hold in original alg.)
regard `L left join R` as a filter applied on L, that is, if L.a is in R.a, then this tuple stays in result.
R.a shrinks to m/n, so L.a also shrinks to m/n
1. Fix value idx in bool rle decoder
2. Iceberg table support datetimev2(3). In the previous version, we converted hive timestamp to datetimev2(0) default.