Commit Graph

1240 Commits

Author SHA1 Message Date
ffadaa4935 [improvement](inverted index) skip write index on load and generate index on compaction (#20325) 2023-06-03 16:03:21 +08:00
b62c5a70c7 [fix](match query) fix array column match query failed without inverted index (#20344) 2023-06-02 21:10:12 +08:00
adc3acb283 [fix](match) fix match query with compound predicates return -6003 (#20361) 2023-06-02 18:25:37 +08:00
a20a6d2bea [refactor](jdbc catalog) Refactor the JdbcClient code (#20109)
This PR does the following:

1. This PR is a substantial refactor of the JDBC client architecture. The previous monolithic JDBC client has been refactored into an abstract base class `JdbcClient`, and a set of database-specific subclasses (e.g., `JdbcMySQLClient`, `JdbcOracleClient`, etc.), and the JdbcClient required config, abstract into an object. This allows for improved modularity, easier addition of support for new databases, and cleaner, more maintainable code. This change is backward-compatible and does not affect existing functionality.
2. As a result of client refactoring, OceanBaseClient can automatically recognize the mode of operation as MySQL or Oracle, so we cancel the oceanbase_mode property in the Jdbc Catalog, but due to the cancellation of the property, When creating a single OceanBase Jdbc Table, the table type needs to be filled in as oceanbase(mysql mode) or oceanbase_oracle(oracle_mode). The above work is a change in the usage behavior, please note.
3. For the PostgreSQL Jdbc Catalog, I did two things:

      1.   The adaptation to MATERIALIZED VIEW and FOREIGN TABLE is added
      2.   Fixed reading jsonb, which had been incorrectly changed to json in a previous PR

4. fix some jdbc catalog test case
5. modify oceanbase jdbc doc

And,Thanks @wolfboys for the guidance
2023-06-02 17:58:10 +08:00
d68f3f3b3d [Feature](array-functions)improve array functions for array_last_index (#20294)
Now we just support array_first_index for lambda input , but no array_last_index
2023-06-02 13:54:03 +08:00
8ff8705b3f [fix](olap) deletion statement with space conditions did not take effect (#20349)
Deletion statement like this:

delete from tb where k1 = '  ';
The rows whose k1's value is ' ' will not be deleted.
2023-06-02 13:52:57 +08:00
a8a4da9b9e [fix](nereids)dphyper join reorder may cache wrong project list for project node (#20209)
* [fix](nereids)dphyper join reorder may cache wrong project list for project node
2023-06-02 09:35:28 +08:00
ecdc5124be [feature-wip](duplicate-no-keys) schame change support for duplicate no keys (#19326) 2023-06-02 09:22:41 +08:00
608d2a3eca [Bug](exec) push down no group by agg min cause error result (#20289)
sql """
CREATE TABLE t1_int (
num int(11) NULL,
dgs_jkrq bigint(20) NULL
) ENGINE=OLAP
DUPLICATE KEY(num)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(num) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"storage_format" = "V2",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false"
);
"""
sql """insert into t1_int values(1,1),(1,2),(1,3),(1,4),(1,null);"""
qt_sql """
select min(dgs_jkrq) from t1_int;
"""

get the error result:4

after change we get the right result:1
2023-06-01 17:29:46 +08:00
a8b273ae31 [P2](test) Fix P2 output (#20311) 2023-06-01 15:11:12 +08:00
519f01133a [feature](decimal)support cast rounding half up and div precision increment in decimalv3. (#19811) 2023-06-01 13:09:58 +08:00
1b968c4ade [fix](multi catalog)Fix nereids planner text format include extra column index bug (#20260)
Nereids planner include all columns index in TFileScanRangeParams, this may cause the column projection incorrect for
 text format table. Because csv reader use the column index position to split a line. Extra column index will cause get 
wrong split result. This PR is to reset the column index after Projection, remove the useless column index.
2023-06-01 12:17:47 +08:00
cc41cb0e7e [Fix](Nereids) fix some insert into select bugs (#20052)
fix 3 bugs:

1. failed to insert into a table with mv.
```sql
create table t (
    id int,
   c1 int,
   c2 int,
   c3 int
) duplicate key(id)
distributed by hash(id) buckets 4

create materialized view k12s3m as select id, sum(c1), max(c3) from t group by id;

insert into t select -4, -4, -4, 'd';
```
insert will rise exception because mv column is not handled. now we will add a target column and value as defineExpr.

2. failed to insert into a table with not all the columns.
```sql
insert into t(c1, c2) select c1, c2 from t
```
and t(id ukey, c1, c2, c3), will insert too many data, we fix it by change the output partitions.

3. failed to insert into a table with complex select.
the select statement has join or agg, fix the bug by the way similar to the one at 2nd bug.
2023-06-01 12:15:19 +08:00
68e593fbf1 [fix](nereids)(planner) case when should return NullLiteral when all case result is NullLiteral (#20280) 2023-06-01 11:11:41 +08:00
9e21318834 [refactor](dynamic table) Make segment_writer unaware of dynamic schema, and ensure parsing is exception-safe. (#19594)
1. make ColumnObject exception safe
2. introduce FlushContext and construct schema at memtable flush stage to make segment independent from dynamic schema
3. add more test cases
2023-06-01 10:25:04 +08:00
65a75abecb [Fix](Nereids) bitmap type should not be used in comparison predicate (#19807)
When using nereids, if we use compare operator of bitmap type, an analyze exception need to be throwed.

like: 
select id from (select BITMAP_EMPTY() as c0 from expr_test) as ref0 where c0 = 1 order by id

Which c0 in subq0 is a bitmap type, this scenario is not supported right now.
2023-05-31 23:09:36 +08:00
6adb3fdf11 [fix](match_phrase) Fix the inconsistent query result for 'match_phrase' after creating index without support_phrase property (#20258)
if create inverted index without support_phrase property, remaining the match_phrase condition to filter by match function.
2023-05-31 18:09:50 +08:00
d93ff5d1ab [fix](pipeline) Enable pipeline explicitly in the plan shape check cases. (#20221)
enable pipeline explicitly in tpcds plan shape check
2023-05-31 14:40:24 +08:00
1f22aa6961 [fix](nereids) like function's nullable property should be PropagateNullable (#20237) 2023-05-31 12:13:38 +08:00
3f91127854 [fix](regression)Update external Brown test case out file. #20232
Update external Brown test case out file to match the new precision.
2023-05-31 09:21:04 +08:00
ff05217a1e [regression](p0) fix test for array_enumerate_uniq (#20231) 2023-05-30 22:14:19 +08:00
b7a69fbf4b [test](regression) add regression test from materialized slot bug (#20207)
The test query includes the conversion of string types to other types, and the processing of materialized columns for nested subqueries, which is the regression test for bug fix(#18783)
2023-05-30 21:23:05 +08:00
accaff1026 [Feature](compaction) wip: single replica compaction (#19237)
Currently, compaction is executed separately for each backend, and the reconstruction of the index during compaction leads to high CPU usage. To address this, we are introducing single replica compaction, where a specific primary replica is selected to perform compaction, and the remaining replicas fetch the compaction results from the primary replica.

The Backend (BE) requests replica information for all peers corresponding to a tablet from the Frontend (FE). This information includes the host where the replica is located and the replica_id. By calculating hash(replica_id), the replica with the smallest hash value is responsible for executing compaction, while the remaining replicas are responsible for fetching the compaction results from this replica.
The compaction task producer thread, before submitting a compaction task, checks whether the local replica should fetch from its peer. If it should, the task is then submitted to the single replica compaction thread pool.
When performing single replica compaction, the process begins by requesting rowset versions from the target replica. These rowset_versions are then compared with the local rowset versions. The first version that can be fetched is selected.
2023-05-30 21:12:48 +08:00
a855253543 [fix](Nereids) filter should not push through union to OneRowRelation (#20132)
## Problem summary
When we want to push the filter through the union. We should check whether the union's children are `OneRowRelation` or not. If there are some `OneRowRelation`, we shouldn't push down the filter to that part

Before this PR
```
mysql> select * from (select 1 as a, 2 as b union all select 3, 3) t where a = 1;
+------+------+
| a    | b    |
+------+------+
|    1 |    2 |
|    3 |    3 |
+------+------+
2 rows in set (0.01 sec)
```

After this PR
```
mysql> select * from (select 1 as a, 2 as b union all select 3, 3) t where a = 1;
+------+------+
| a    | b    |
+------+------+
|    1 |    2 |
+------+------+
1 row in set (0.38 sec)
```
2023-05-30 17:06:52 +08:00
0c98355fff [fix](catalog) fix create catalog with resource replay issue and kerberos auth issue (#20137)
1. Fix create catalog with resource replay bug.
	If user create catalog using `create catalog hive with resource xxx`, when replaying edit log,
	there is a bug that resource may be dropped, causing NPE and FE will fail to start.

	In this PR, I add a new FE config `disallow_create_catalog_with_resource`, default is true.
	So that `with resource` will not be allowed, and it will be deprecated later.

	And also fix the replay bug to avoid NPE.

2. Fix issue when creating 2 hive catalogs to connect with and without kerberos authentication.

	When user create 2 hive catalogs, one use simple auth, the other use kerberos auth.
	The query may fail with error like: `Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.`

	So I add a default property for hive catalog: `"ipc.client.fallback-to-simple-auth-allowed" = "true"`.
	Which means this property will be added automatically when user creating hive catalog, to avoid such problem.

3. Fix calling `hdfsExists()` issue

	When calling `hdfsExists()` with non-zero return code, should check if it encounters error or is file not found.

3. Some code refactor

	Avoid import `org.apache.parquet.Strings`
2023-05-30 16:57:39 +08:00
49ce4e6fda [fix](test) fix p2 broker load (#20196) 2023-05-30 16:26:00 +08:00
631494e05d [regression](decimalv3) Fix output for P1 regression (#20213) 2023-05-30 15:21:29 +08:00
bb12a1cb49 [Enhance](array function) add support for DecimalV3 for array_enumerate_uniq() (#17724) 2023-05-30 13:09:19 +08:00
94e1072d14 Revert "[fix](DECIMALV3) Fix the error in DECIMALV3 when explicitly casting. (#19926)" (#20204)
This reverts commit 8ca4f9306763b5a18ffda27a07ab03cc77351e35.
2023-05-30 10:35:33 +08:00
72cfe5865a [feat](optimizer) Support CTE reuse (#19934)
Before this PR, new optimizer would inline CTE directly. However in many scenario a CTE could be referenced many times, such as in TPC-DS tests, for these cases materialize the result sets of CTE and reuse it would significantly agument performance. In our tests on tpc-ds related sqls, it would improve the performance by up to almost **4 times** than before.

We introduce belowing plan node in optimizer

1. CTEConsumer: which hold a reference to CTEProducer
2. CTEProducer: Plan defined by CTE stmt
3. CTEAnchor: the father node of CTEProducer, a CTEProducer could only be referenced from  corresponding CTEAnchor's right child.

A CTEConsumer would be converted to a inlined plan if corresponding CTE referenced less than or equal `inline_cte_referenced_threshold` (it's a session variable, by default is 1).


For SQL:

```sql
EXPLAIN REWRITTEN PLAN
WITH cte AS (SELECT col2 FROM t1)
SELECT * FROM t1 WHERE (col3 IN (SELECT c1.col2 FROM cte c1))
UNION ALL
SELECT * FROM t1 WHERE (col3 IN (SELECT c1.col2 FROM cte c1));
```

Rewritten plan before this PR:

```
+------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                       |
+------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalUnion ( qualifier=ALL, outputs=[col1#14, col2#15, col3#16], hasPushedFilter=false )                                                           |
| |--LogicalJoin[559] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#6 = col2#8)], otherJoinConjuncts=[] )      |
| |  |--LogicalProject[551] ( distinct=false, projects=[col1#4, col2#5, col3#6], excepts=[], canEliminate=true )                                       |
| |  |  +--LogicalFilter[549] ( predicates=(__DORIS_DELETE_SIGN__#7 = 0) )                                                                             |
| |  |     +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                    |
| |  +--LogicalProject[555] ( distinct=false, projects=[col2#20 AS `col2`#8], excepts=[], canEliminate=true )                                          |
| |     +--LogicalFilter[553] ( predicates=(__DORIS_DELETE_SIGN__#22 = 0) )                                                                            |
| |        +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                    |
| +--LogicalProject[575] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=false )                                       |
|    +--LogicalJoin[573] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#11 = col2#13)], otherJoinConjuncts=[] ) |
|       |--LogicalProject[565] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=true )                                  |
|       |  +--LogicalFilter[563] ( predicates=(__DORIS_DELETE_SIGN__#12 = 0) )                                                                         |
|       |     +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                 |
|       +--LogicalProject[569] ( distinct=false, projects=[col2#24 AS `col2`#13], excepts=[], canEliminate=true )                                      |
|          +--LogicalFilter[567] ( predicates=(__DORIS_DELETE_SIGN__#26 = 0) )                                                                         |
|             +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                 |
+------------------------------------------------------------------------------------------------------------------------------------------------------+

```

After this PR

```
+------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                       |
+------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalUnion ( qualifier=ALL, outputs=[col1#14, col2#15, col3#16], hasPushedFilter=false )                                                           |
| |--LOGICAL_CTE_ANCHOR#-1164890733                                                                                                                    |
| |  |--LOGICAL_CTE_PRODUCER#-1164890733                                                                                                               |
| |  |  +--LogicalProject[427] ( distinct=false, projects=[col2#1], excepts=[], canEliminate=true )                                                    |
| |  |     +--LogicalFilter[425] ( predicates=(__DORIS_DELETE_SIGN__#3 = 0) )                                                                          |
| |  |        +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                 |
| |  +--LogicalJoin[373] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#6 = col2#8)], otherJoinConjuncts=[] )   |
| |     |--LogicalProject[370] ( distinct=false, projects=[col1#4, col2#5, col3#6], excepts=[], canEliminate=true )                                    |
| |     |  +--LogicalFilter[368] ( predicates=(__DORIS_DELETE_SIGN__#7 = 0) )                                                                          |
| |     |     +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                 |
| |     +--LOGICAL_CTE_CONSUMER#-1164890733#1038782805                                                                                                 |
| +--LogicalProject[384] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=false )                                       |
|    +--LogicalJoin[382] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#11 = col2#13)], otherJoinConjuncts=[] ) |
|       |--LogicalProject[379] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=true )                                  |
|       |  +--LogicalFilter[377] ( predicates=(__DORIS_DELETE_SIGN__#12 = 0) )                                                                         |
|       |     +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                 |
|       +--LOGICAL_CTE_CONSUMER#-1164890733#858618008                                                                                                  |
+------------------------------------------------------------------------------------------------------------------------------------------------------+

```
2023-05-30 10:18:59 +08:00
6f31ee9492 [fix](p0 regression)Update hive docker test case result data (#20176)
Doris updated array type output format, using double quote for Strings.
Before, it was using single quote. So we need to update the case out file using double quote.
2023-05-30 00:17:30 +08:00
90b4e127e3 [Feature](inverted index) add parser_mode properties for inverted index parser (#20116)
We add parser mode for inverted index, usage like this:
```
CREATE TABLE `inverted` (
  `FIELD0` text NULL,
  `FIELD1` text NULL,
  `FIELD2` text NULL,
  `FIELD3` text NULL,
  INDEX idx_name1 (`FIELD0`) USING INVERTED PROPERTIES("parser" = "chinese", "parser_mode" = "fine_grained") COMMENT '',
  INDEX idx_name2 (`FIELD1`) USING INVERTED PROPERTIES("parser" = "chinese", "parser_mode" = "coarse_grained") COMMENT ''
) ENGINE=OLAP
);
```
2023-05-29 23:21:52 +08:00
8ca4f93067 [fix](DECIMALV3) Fix the error in DECIMALV3 when explicitly casting. (#19926)
before

mysql [test]>select cast(1 as DECIMALV3(16, 2)) /  cast(3 as DECIMALV3(16, 2));
+-----------------------------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / CAST(3 AS DECIMALV3(16, 2)) |
+-----------------------------------------------------------+
|                                                      0.00 |
+-----------------------------------------------------------+


mysql [test]>select * from divtest;
+------+------+
| id   | val  |
+------+------+
|    3 | 5.00 |
|    2 | 4.00 |
|    1 | 3.00 |
+------+------+

mysql [test]>select cast(1 as decimalv3(16,2)) / val from divtest;
+-------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / `val` |
+-------------------------------------+
|                                   0 |
|                                   0 |
|                                   0 |
+-------------------------------------+
after

mysql [test]>select cast(1 as DECIMALV3(16, 2)) /  cast(3 as DECIMALV3(16, 2));
+-----------------------------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / CAST(3 AS DECIMALV3(16, 2)) |
+-----------------------------------------------------------+
|                                                      0.33 |
+-----------------------------------------------------------+

mysql [test]>select cast(1 as decimalv3(16,2)) / val from divtest;
+-------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / `val` |
+-------------------------------------+
|                            0.250000 |
|                            0.200000 |
|                            0.333333 |
+-------------------------------------+
This is because in the previous code, the constant 1.000 would be transformed into 1.

remove "ReduceType
2023-05-29 19:51:12 +08:00
Pxl
5788214416 [Bug](function) fix equals implements not judge order by elements of function call expr (#20083)
fix equals implements not judge order by elements of function call expr
#19296
2023-05-29 19:03:05 +08:00
55ccddb62c [Conf](decimalv3) enable decimalv3 by default 2023-05-29 15:38:31 +08:00
a86134cb39 [fix](executor) Fixed an error with cast as time. #20144
before

mysql [(none)]>select cast("10:10:10" as time);
+-------------------------------+
| CAST('10:10:10' AS TIMEV2(0)) |
+-------------------------------+
| 00:00:00                      |
+-------------------------------+
after

mysql [(none)]>select cast("10:10:10" as time);
+-------------------------------+
| CAST('10:10:10' AS TIMEV2(0)) |
+-------------------------------+
| 10:10:10                      |
+-------------------------------+
In the past, we supported this syntax.

mysql [(none)]>select cast("2023:05:01 13:14:15" as time);
+------------------------------------------+
| CAST('2023:05:01 13:14:15' AS TIMEV2(0)) |
+------------------------------------------+
| 13:14:15                                 |
+------------------------------------------+
However, "10:10:10" is also a valid datetime.

mysql [(none)]>select cast("10:10:10" as datetime);
+-----------------------------------+
| CAST('10:10:10' AS DATETIMEV2(0)) |
+-----------------------------------+
| 2010-10-10 00:00:00               |
+-----------------------------------+
So here, the order of parsing has been adjusted.
2023-05-29 12:17:21 +08:00
970efdc1cb [Feature](Nereids) support advanced materialized view (#19650)
Increase the functionality of advanced materialized view

This feature already supported by legacy planner with PR #19650

This PR implement it in Nereids. This PR implement the features as below:
1. Support multiple columns in aggregate function.  eg: select sum(c1 + c2) from t1;
2. Supports complex expressions.  eg: select abs(c1), sum(abc(c1+1) + 1) from t1;

TODO:
1. Support adding where in materialized view
2023-05-29 10:37:44 +08:00
859b03dfdf [Improvement](topn) prevent memory usage of key topn increasing unlimited (#19978) 2023-05-29 10:16:15 +08:00
ae352997b4 [Enhancement](alter inverted index) Improve alter inverted index performance with light weight add or drop inverted index (#19063) 2023-05-28 11:23:07 +08:00
4cbb6ece10 [fix](fe)ordering exprs should be substituted in the same way as select part (#20091) 2023-05-27 21:00:57 +08:00
f54a068d82 [feature](function) add json->operator convert to json_extract (#19899) 2023-05-27 12:45:45 +08:00
f3d8af330a [Bug](point query) check point query before check two phase read (#20055)
* [Bug](point query) checkAndSetPointQuery before checkEnableTwoPhaseRead

1. checkEnableTwoPhaseRead rely on thr short circuit flag
2. add more metric to display lookup profile

* fix rebase
2023-05-27 12:38:58 +08:00
9539bbf8ae Revert "[test](executor)add crud regression test for resource group (#19659)" (#20121)
This reverts commit 8b9813663d87afa7b359b31782f3864dc54881df.
2023-05-27 08:25:00 +08:00
23c95d15da [regression-test](sort) Fix unstable sorting (#20125) 2023-05-26 23:42:05 +08:00
860e28a3a3 [Fix](multi-catalog) Fix db name is not lower case when jdbc catalog configuration lower_case_table_names is true. (#20021)
Fix db name is not lower case when jdbc catalog configuration lower_case_table_names is true.
Fix regression-test test_oracle_jdbc_catalog.
2023-05-26 21:35:38 +08:00
ce45d6119d [FIX](regress-test) fix struct_export out data (#20111)
fix struct_export out data
2023-05-26 19:57:51 +08:00
317338913c [Bug](topn) Fix topn fetch set real default value (#20074)
1. Before this PR if rowset does not contain column which should be read for related SlotDescriptor will call `insert_default` to column, but it's not this real defautl value.Real default value relevant information should be provided by the frontend side.

2. Support fetch when light schema change is not enabled, but disable for AGG or UNIQUE MOR model
2023-05-26 16:06:55 +08:00
488c9ba7c2 [improvement](exchange) test: data stream sender stop sending data to receiver if it returns eos early (#20081) 2023-05-26 16:05:38 +08:00
Pxl
43aa062fb1 [Chore](hash-join) remove useless conditions and add some case (#20050) 2023-05-26 14:45:24 +08:00
315b30c23d [testcase](union) add test case for union of decimal (#20080) 2023-05-26 14:12:14 +08:00