Commit Graph

2551 Commits

Author SHA1 Message Date
3576304e3e [regression test]Test UniqueModel Schema Value Change (#31260)
* Test UniqueModel Schema Value Change

* Opt Test UniqueModel Schema value Change
2024-02-23 19:03:28 +08:00
8f77e6363a [Feature](function) Support xxhash function like murmur hash function (#31193) 2024-02-23 19:03:28 +08:00
1456785aa1 [fix](join) incorrect result of mark join in nested loop join (#31280) 2024-02-23 19:03:28 +08:00
f65876d803 [Feature](explode) support explode map type (#30151) 2024-02-22 20:08:44 +08:00
22efb1cb7a [fix](Nereids) not equals and hashCode should contains generate flag (#31123) 2024-02-22 19:51:20 +08:00
260568db17 [update](hudi) update hudi version to 0.14.1 and compatible with flink hive catalog (#31181)
1. Update hudi version from 0.13.1 to .14.1
2. Compatible with the hudi table created by flink hive catalog
2024-02-22 19:51:20 +08:00
98c3cb825f [fix](Nereids) simplify airthmetic should not change return type (#31237) 2024-02-22 19:50:47 +08:00
ef22bd3318 [fix](txn insert) Txn insert can not write to table with mv (#31167) 2024-02-22 13:01:49 +08:00
ad07dec0ed [Improve](InPredict) enhance in predict with struct type (#30840) 2024-02-22 13:01:49 +08:00
8a496bf554 [regression test]Optimize testUniqueModelSchemaKeyChange (#31150) 2024-02-22 13:01:48 +08:00
ca89ac0c23 [fix](regression) insert_group_commit_into_unique is unstable when cluster has multi BE (#31188) 2024-02-21 19:18:45 +08:00
49dd411f87 [fix](datetime) fix datetime round on BE (#31205)
with tmp as (
            select CONCAT(
                YEAR('2024-02-06 03:37:07.157'), '-', 
                LPAD(MONTH('2024-02-06 03:37:07.157'), 2, '0'), '-',
                LPAD(DAY('2024-02-06 03:37:07.157'), 2, '0'), ' ',
                LPAD(HOUR('2024-02-06 03:37:07.157'), 2, '0'), ':',
                LPAD(MINUTE('2024-02-06 03:37:07.157'), 2, '0'), ':',
                LPAD(SECOND('2024-02-06 03:37:07.157'), 2, '0'), '.', "123456789" )
            AS generated_string)
            select generated_string, cast(generated_string as DateTime(6)) from tmp
before (incorrect round)

+-------------------------------+-----------------------------------------+
| generated_string              | cast(generated_string as DATETIMEV2(6)) |
+-------------------------------+-----------------------------------------+
| 2024-02-06 03:37:07.123456789 | 2024-02-06 03:37:07.123456              |
+-------------------------------+-----------------------------------------+
after (round up, keep consistent with mysql):

+-------------------------------+-----------------------------------------+
| generated_string              | cast(generated_string as DATETIMEV2(6)) |
+-------------------------------+-----------------------------------------+
| 2024-02-06 03:37:07.123456789 | 2024-02-06 03:37:07.123457              |
+-------------------------------+-----------------------------------------+
1 row in set (0.03 sec)
same work with #30744 but implemented on BE
2024-02-21 19:18:45 +08:00
eb5e6a829a [fix](nereids)should normalize window expression by bottom project's output (#31115) 2024-02-21 19:18:45 +08:00
8d889e434b [fix](topn) Fix key topn block reverse is missed in some cases (#31199)
* move reverse block row order operation after _next_batch_internal

* add testcase
2024-02-21 19:18:45 +08:00
8fc9d80479 [compatibility](MySQL) update charset to utf8mb4, collation to utf8mb4_0900_bin (#31046)
Doris's behaviour is more like utf8mb4 and utf8mb4_0900_bin than utf8 and utf8_general_ci
2024-02-21 17:01:39 +08:00
f1fbfeba2f [fix](nereids) removeRuntimeFilter() removes more than one RFs if there are different types of RF from the buildNode to the target (#31197)
this bug can be produced on tpcds 95, when set runtime filter type: min-max + bloom
2024-02-21 17:01:39 +08:00
98106ad60f [fix](nereids) fix bug of unnest subuqery with having clause (#31152)
1. run PushDownFilterThroughAggregation, PushDownFilterThroughProject and MergeFilters before subquery unnesting
2. should keep plan unchanged even left semi join's condition is true (because the right table may be empty())
3. PushDownFilterThroughProject need check if filter's input slots are all coming from project's output
2024-02-21 17:01:30 +08:00
a8d8c6a271 [fix](file-writer) opt s3 file writer and fix empty file related issue #28983 #30703 #31169 (#31213)
* (feature)(cloud) Use dynamic allocator instead of static buffer pool for better elasticity. (#28983)

* [fix](outfile) Fix unable to export empty data (#30703)

Issue Number: close #30600
Fix unable to export empty data to hdfs / S3, this behavior is inconsistent with version 1.2.7,
version 1.2.7 can export empty data to hdfs/ S3, and there will be exported files on S3/HDFS.

* [fix](file-writer) avoid empty file for segment writer (#31169)

---------

Co-authored-by: AlexYue <yj976240184@gmail.com>
Co-authored-by: zxealous <zhouchangyue@baidu.com>
2024-02-21 16:48:54 +08:00
278b232e76 [Bug](json reader) object should stop processing when encounter error (#31159)
If DATA_QUALITY_ERROR encountered we should stop processing this document any more.Otherwise there will be UB in simdjson.
2024-02-21 13:53:32 +08:00
cd7230885f [fix](nereids)push more than one runtime filters into cte (#30901)
* push rf into cte, used by tpcds95
2024-02-21 13:53:32 +08:00
1e3968fe7e [fix](group_commit) Need to wait wal to be deleted when creating MaterializedView (#30956) 2024-02-21 13:53:19 +08:00
1d6dc9a5f0 [fix](catalog recycle bin) Forbid recover partition if table schame is changed #31146 2024-02-21 13:53:18 +08:00
97c9d75af3 [Feature](executor)Add scan_thread_num property for workload group (#31106) 2024-02-20 16:24:05 +08:00
Pxl
493385c2c7 [Bug](AggState) fix not match function when agg combinator function has alias (#31101) 2024-02-20 16:23:53 +08:00
a7a530189e [regression-test](decimalv3) Adding numerical precision exceeding the data type will error case (#31068)
* Adding numerical precision exceeding the data type will result in an error case

* add order by

---------

Co-authored-by: smallhibiscus <844981280>
2024-02-20 16:23:53 +08:00
180fc13f6b [fix](nereids) disable PushDownJoinOtherCondition rule for mark join (#31084) 2024-02-19 17:48:29 +08:00
b3ac2128dd [Refactor](catalog) Refactor Jdbc Catalog external name case mapping rules (#28414) 2024-02-19 17:22:03 +08:00
8db2824c44 [bugfix](es catalog) add constant_keyword wildcard data type (#30947) 2024-02-19 17:20:21 +08:00
6e4f76de54 [improvement](jdbc catalog) Delete unnecessary schema and optimize insert logic (#30880)
In the previous design, we were compatible with MySQL's auto-increment column and default value to bypass the null value check when writing back Jdbc External Table. However, because MySQL's default value is not completely unified with Doris, this resulted in The unsuitable default value is wrong. In response to this situation, I made the following optimizations
1. For JDBC External Table, we always allow certain columns to be missing during insertion. Even if these columns are not allowed to be empty at the source end, the error should be generated by the source end, not Doris herself.
2. When the target column is non-nullable and the insertion is done via `INSERT INTO tbl VALUES()` or `INSERT INTO tbl SELECT constants`, Doris should verify any inconsistency between them and throw an exception. This check is not applied for `INSERT INTO tbl SELECT ... FROM tbl` operations.
2024-02-19 17:20:21 +08:00
870a9342b7 [fix](function) fix extract_url_parameter's bug then get the last key (#30929)
fix extract_url_parameter's bug then get the last key
2024-02-18 14:45:25 +08:00
6cf7468073 [enhancement](function) change some function nullable mode (#30991)
change some function nullable mode
2024-02-18 14:45:25 +08:00
d70776af55 [feature](agg-func) support covar and covar_samp function (#30983) 2024-02-18 11:50:17 +08:00
acdc9575ad [fix](function) incorrect result of 'equal_for_null' (#30990) 2024-02-18 11:50:16 +08:00
e68019c10a [Function](Exec) Support windows function cume_dist (#30997) 2024-02-16 10:16:40 +08:00
f65844fae4 [Enhencement](Outfile/Export) Export data to csv file format with BOM (#30533)
The UTF8 format of the Windows system has BOM. 

We add a new user property to `Outfile/Export`。Therefore, when exporting Doris data, users can choose whether to bring BOM on the beginning of the CSV file.

**Usage:**
```sql
-- outfile:
select * from demo.student
into outfile "file:///xxx/export/exp_"
format as csv
properties(
    "column_separator" = ",",
    "with_bom" = "true"
);

-- Export:
EXPORT TABLE student TO "file:///xx/tmpdata/export/exp_"
PROPERTIES(
    "format" = "csv",
    "with_bom" = "true"
);
```
2024-02-16 10:16:40 +08:00
ce892d04e5 [feature](index) Replace BITMAP INDEX with INVERTED INDEX (#30950) 2024-02-16 10:16:40 +08:00
49cec39038 [test](mtmv) Add materialized view which contain external table rewrite test (#30975) 2024-02-16 10:16:39 +08:00
40e1326bc9 [feature](window-func) support percent_rank window function (#30926) 2024-02-16 10:12:25 +08:00
448fb70f68 [fix](nereids) collect all correlated slots from subquery in correct way (#30908) 2024-02-16 10:12:25 +08:00
d60ecdba6f [fix](regex) fix wrong escape of function LIKE (#30557)
fix wrong escape of function LIKE
2024-02-16 10:12:25 +08:00
b0835b4336 [fix](test) Increase the timeout duration for the test case (#30952) 2024-02-16 10:12:24 +08:00
5e6e2f8061 [fix](nereids)should not infer predicate for nullaware anti-join (#30924) 2024-02-16 10:12:24 +08:00
c72f634c10 [enhancement](schema change) some types changes (#30919) 2024-02-16 10:12:24 +08:00
de1724ab6a [case](mtmv) MTMV hive case (#30930) 2024-02-16 10:12:24 +08:00
3017c0a6ff [enhance](mtmv) Limit the number of partitions for table creation (#30867)
- Creating too many partitions is time-consuming, so limiting the number of partitions
- add more case,such as `mor`,`mow`
2024-02-16 10:12:24 +08:00
2bb477bae7 [feature](agg-func) support corr function #30822 2024-02-16 10:12:24 +08:00
e086d0d719 [test](statistics)Add analyze mtmv test case (#30847) 2024-02-16 10:12:23 +08:00
0442d5dc0e [fix](Variant Type) Add sparse columns meta to fix compaction (#28673)
Co-authored-by: eldenmoon <15605149486@163.com>
2024-02-16 10:12:23 +08:00
b23a785775 [Fix](Variant) support materialize view for variant and accessing variant subcolumns (#30603)
* [Fix](Variant) support materialize view for variant and accessing variant subcolumns
1. fix schema change with path lost and lead to invalid data read
2. support element_at function in BE side and use simdjson to parse data
3. fix multi slot expression
2024-02-16 10:12:23 +08:00
08508d65fd [feature-wip](plsql)(step1) Support PL-SQL (#30817)
# 1. Motivation
PL-SQL (Stored procedure) is a collection of sql, which is defined and used similarly to functions. It supports conditional judgments, loops and other control statements, supports cursor processing of result sets, and can write business logic in SQL.

Hive uses Hplsql to support PL-SQL and is largely compatible with Oracle, Impala, MySQL, Redshift, PostgreSQL, DB2, etc. We support PL-SQL in Doris based on Hplsql to achieve compatibility with Stored procedures of database systems such as Oracle and PostgreSQL.

Reference documentation:
Hive: http://mail.hplsql.org
Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/21/lnpls/plsql-language-fundamentals.html#GUID-640DB3AA-15AF-4825-BD6C-1D4EB5AB7715
Mysql: https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html

# 2. Implementation
Take the following case as an example to explain the process of connecting Doris FE to execute stored procedures using the Mysql protocol.
```
CREATE OR REPLACE PROCEDURE A(IN name STRING, OUT result int)
          select count(*) from test;
          select count(*) into result from test where k = name;
END

declare result INT default = 0;
call A(‘xxx’, result);
print result;
```
![image](https://github.com/apache/doris/assets/13197424/0b78e039-0350-4ef1-bef3-0ebbf90274cd)

1. Add procedure and persist the Procedure Name and Source (raw SQL) into Doris FE metadata.
2. Call procedure, extract the actual parameter Value and Procedure Name in Call Stmt. Use Procedure Name to find the Source in the metadata, extract the Name and Type of the Procedure parameter, and match them with the actual parameter Value to form a complete variable <Name, Type, Value>.
3. Execute Doris Statement
     - Use Doris Logical Plan Builder to parse the Doris Statement syntax in Source, replace parameter variables, remove the into variable clause, and generate a Plan Tree that conforms to Doris syntax.
     - Use stmtExecutor to execute SQL and encapsulate the query result set iterator into QueryResult.
     - Output the query results to Mysql Channel, or write them into Cursor, parameters, and variables.
     - Stored Programs compatible with Mysql protocol support multiple statements.
4. Execute PL-SQL Statement
     - Use Plsql Logical Plan Builder to parse and execute PL-SQL Statement syntax in Source, including Loop, Cursor, IF, Declare, etc., and basically reuse HplSQL.

# 3. TODO
1. Support drop procedure.
2. Create procedure only in `PlSqlOperation`.
3. Doris Parser supports declare variable.
4. Select Statement supports insert into variable.
5. Parameters and fields have the same name.
6. If Cursor exits halfway, will there be a memory leak?
7. Use getOriginSql(ctx) in syntax parsing LogicalPlanBuilder to obtain the original SQL. Is there any problem with special characters?
8. Supports complex types such as Map and Struct.
9. Test syntax such as Package.
10. Support UDF
11. In Oracle, create procedure must have AS or IS after RIGHT_PAREN,
but Mysql and Hive not support AS or IS. Compatibility issues with Oracle will be discussed and resolved later.
12. Built-in functions require a separate management.
13. Doris statement add stmt: egin_transaction_stmt, end_transaction_stmt, commit_stmt, rollback_stmt.
14. Add plsql stmt: cmp_stmt, copy_from_local_stmt, copy_stmt, create_local_temp_table_stmt, merge_stmt.

# 4. Some questions
1. JDBC does not support the execution of stored procedures that return results. You can only Into the execution results into a variable or write them into a table, because when multiple result sets are returned, JDBC needs to use the prepareCall statement to execute, otherwise the Statemnt of the returned result executes Finalize. Send EOF Packet will report an error;
2. Use PL-SQL Cursor to open multiple Query result set iterators at the same time. Doris BE will cache the intermediate status of these Queries (such as HashTable) and query results until the Query result set iteration is completed. If the Cursor is not available for a long time Being used will result in a lot of memory waste.
3. In plsql/Var.defineType(), the corresponding Plsql Var type will be found through the Mysql type name string, and the corresponding relationship between Doris type and Plsql Var needs to be implemented.
4. Currently, PL-SQL Statement will be forwarded to Master FE for creation and calculation, which may affect other services on Doris FE and is limited by the performance of Doris FE. Consider moving it to Doris BE for execution.
5. The format of the result returned by Doris Statement is ```xxxx\n, xxxx\n, 2 rows affected (0.03 sec)```. PL-SQL uses Print to print variable values in an unformatted format, and JDBC cannot easily obtain them. Real results.

# 5. Some thoughts
The above execution of Doris Statement reuses Doris Logical Plan Builder for syntax parsing, parses it from top to bottom into a Plan Tree, and calls stmtExecutor for execution. PL-SQL replacement variables, removal of Into Variable and other operations are coupled in Doris syntax parsing. The advantage is that it is easier to It can be compatible with Doris grammar with a few changes, but the disadvantage is that it will invade the Doris grammar parsing process.
HplSQL performs a syntax parsing independently of Hive to implement variable substitution and other operations, and finally outputs a SQL that conforms to Hive syntax. The following is a simple syntax parsing process for select, where, expression, table name, join, The parsing of agg, order and other grammars must be re-implemented. The advantage is that it is completely independent from the original system, but the changes are too complicated.
![image](https://github.com/apache/doris/assets/13197424/7539e485-0161-44de-9100-1a01ebe6cc07)
2024-02-16 10:12:23 +08:00