Commit Graph

1137 Commits

Author SHA1 Message Date
92e3b31f50 [feature](invert index) match_phrase_edge feature added (#31142) 2024-02-29 19:51:18 +08:00
0b5b7175d6 [fix](multi-catalog) add max compute custom odps and tunnel url (#31390)
add max compute custom odps and tunnel url
2024-02-29 16:44:40 +08:00
9c4708ee74 [function](random_bytes)add random_bytes function (#31547)
SELECT random_bytes(10);

random_bytes(10) |
----------------------+
0x9b8ea00b7d1084bc5b26|
2024-02-29 16:44:39 +08:00
7f566f9365 Reset report_workload_runtime_status to optional (#31479) 2024-02-28 13:07:47 +08:00
c0754583cb [opt](plsql) Fix procedure key compatibility (#31445)
use dbId replace dbName, because dbName may be renamed by Alter.
procedure key add package name (only reserved, currently no plans to support package)
Optimize procedure create and exception
2024-02-28 13:07:47 +08:00
c34639245e [Improvement](executor)add remote scan thread pool (#31376)
* add remote scan thread pool

* +1
2024-02-27 10:12:33 +08:00
1127b0065a [Improment](executor)Add scanbytes/scanrows condition (#31364)
* Add scanbytes/scanrows condition

* fix reg
2024-02-27 10:12:33 +08:00
f163d56a98 [feature](function) support sequence function(alias of array_range), enhance both to handle datetimev2 (#30823) 2024-02-27 10:12:19 +08:00
41a67fc218 change to 2.1.0-rc10 2024-02-26 01:36:54 +08:00
8f77e6363a [Feature](function) Support xxhash function like murmur hash function (#31193) 2024-02-23 19:03:28 +08:00
7bb276d071 change to rc09 2024-02-22 22:23:01 +08:00
4735c5b50f 2.1.0-rc08 2024-02-20 23:45:35 +08:00
97c9d75af3 [Feature](executor)Add scan_thread_num property for workload group (#31106) 2024-02-20 16:24:05 +08:00
6cf7468073 [enhancement](function) change some function nullable mode (#30991)
change some function nullable mode
2024-02-18 14:45:25 +08:00
f65844fae4 [Enhencement](Outfile/Export) Export data to csv file format with BOM (#30533)
The UTF8 format of the Windows system has BOM. 

We add a new user property to `Outfile/Export`。Therefore, when exporting Doris data, users can choose whether to bring BOM on the beginning of the CSV file.

**Usage:**
```sql
-- outfile:
select * from demo.student
into outfile "file:///xxx/export/exp_"
format as csv
properties(
    "column_separator" = ",",
    "with_bom" = "true"
);

-- Export:
EXPORT TABLE student TO "file:///xx/tmpdata/export/exp_"
PROPERTIES(
    "format" = "csv",
    "with_bom" = "true"
);
```
2024-02-16 10:16:40 +08:00
24e80b23a5 [Feature](executor)Support ShowProcessStmt Show all Fe connection (#30907) 2024-02-16 10:16:39 +08:00
08c196f3dc [enhancement](stmt-forward) record query result for proxy query to avoid EOF (#30536) 2024-02-16 10:12:25 +08:00
366a6792bf [refactor](scanner) refactoring and optimizing scanner scheduling (#30746) 2024-02-16 10:12:24 +08:00
0442d5dc0e [fix](Variant Type) Add sparse columns meta to fix compaction (#28673)
Co-authored-by: eldenmoon <15605149486@163.com>
2024-02-16 10:12:23 +08:00
08508d65fd [feature-wip](plsql)(step1) Support PL-SQL (#30817)
# 1. Motivation
PL-SQL (Stored procedure) is a collection of sql, which is defined and used similarly to functions. It supports conditional judgments, loops and other control statements, supports cursor processing of result sets, and can write business logic in SQL.

Hive uses Hplsql to support PL-SQL and is largely compatible with Oracle, Impala, MySQL, Redshift, PostgreSQL, DB2, etc. We support PL-SQL in Doris based on Hplsql to achieve compatibility with Stored procedures of database systems such as Oracle and PostgreSQL.

Reference documentation:
Hive: http://mail.hplsql.org
Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/21/lnpls/plsql-language-fundamentals.html#GUID-640DB3AA-15AF-4825-BD6C-1D4EB5AB7715
Mysql: https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html

# 2. Implementation
Take the following case as an example to explain the process of connecting Doris FE to execute stored procedures using the Mysql protocol.
```
CREATE OR REPLACE PROCEDURE A(IN name STRING, OUT result int)
          select count(*) from test;
          select count(*) into result from test where k = name;
END

declare result INT default = 0;
call A(‘xxx’, result);
print result;
```
![image](https://github.com/apache/doris/assets/13197424/0b78e039-0350-4ef1-bef3-0ebbf90274cd)

1. Add procedure and persist the Procedure Name and Source (raw SQL) into Doris FE metadata.
2. Call procedure, extract the actual parameter Value and Procedure Name in Call Stmt. Use Procedure Name to find the Source in the metadata, extract the Name and Type of the Procedure parameter, and match them with the actual parameter Value to form a complete variable <Name, Type, Value>.
3. Execute Doris Statement
     - Use Doris Logical Plan Builder to parse the Doris Statement syntax in Source, replace parameter variables, remove the into variable clause, and generate a Plan Tree that conforms to Doris syntax.
     - Use stmtExecutor to execute SQL and encapsulate the query result set iterator into QueryResult.
     - Output the query results to Mysql Channel, or write them into Cursor, parameters, and variables.
     - Stored Programs compatible with Mysql protocol support multiple statements.
4. Execute PL-SQL Statement
     - Use Plsql Logical Plan Builder to parse and execute PL-SQL Statement syntax in Source, including Loop, Cursor, IF, Declare, etc., and basically reuse HplSQL.

# 3. TODO
1. Support drop procedure.
2. Create procedure only in `PlSqlOperation`.
3. Doris Parser supports declare variable.
4. Select Statement supports insert into variable.
5. Parameters and fields have the same name.
6. If Cursor exits halfway, will there be a memory leak?
7. Use getOriginSql(ctx) in syntax parsing LogicalPlanBuilder to obtain the original SQL. Is there any problem with special characters?
8. Supports complex types such as Map and Struct.
9. Test syntax such as Package.
10. Support UDF
11. In Oracle, create procedure must have AS or IS after RIGHT_PAREN,
but Mysql and Hive not support AS or IS. Compatibility issues with Oracle will be discussed and resolved later.
12. Built-in functions require a separate management.
13. Doris statement add stmt: egin_transaction_stmt, end_transaction_stmt, commit_stmt, rollback_stmt.
14. Add plsql stmt: cmp_stmt, copy_from_local_stmt, copy_stmt, create_local_temp_table_stmt, merge_stmt.

# 4. Some questions
1. JDBC does not support the execution of stored procedures that return results. You can only Into the execution results into a variable or write them into a table, because when multiple result sets are returned, JDBC needs to use the prepareCall statement to execute, otherwise the Statemnt of the returned result executes Finalize. Send EOF Packet will report an error;
2. Use PL-SQL Cursor to open multiple Query result set iterators at the same time. Doris BE will cache the intermediate status of these Queries (such as HashTable) and query results until the Query result set iteration is completed. If the Cursor is not available for a long time Being used will result in a lot of memory waste.
3. In plsql/Var.defineType(), the corresponding Plsql Var type will be found through the Mysql type name string, and the corresponding relationship between Doris type and Plsql Var needs to be implemented.
4. Currently, PL-SQL Statement will be forwarded to Master FE for creation and calculation, which may affect other services on Doris FE and is limited by the performance of Doris FE. Consider moving it to Doris BE for execution.
5. The format of the result returned by Doris Statement is ```xxxx\n, xxxx\n, 2 rows affected (0.03 sec)```. PL-SQL uses Print to print variable values in an unformatted format, and JDBC cannot easily obtain them. Real results.

# 5. Some thoughts
The above execution of Doris Statement reuses Doris Logical Plan Builder for syntax parsing, parses it from top to bottom into a Plan Tree, and calls stmtExecutor for execution. PL-SQL replacement variables, removal of Into Variable and other operations are coupled in Doris syntax parsing. The advantage is that it is easier to It can be compatible with Doris grammar with a few changes, but the disadvantage is that it will invade the Doris grammar parsing process.
HplSQL performs a syntax parsing independently of Hive to implement variable substitution and other operations, and finally outputs a SQL that conforms to Hive syntax. The following is a simple syntax parsing process for select, where, expression, table name, join, The parsing of agg, order and other grammars must be re-implemented. The advantage is that it is completely independent from the original system, but the changes are too complicated.
![image](https://github.com/apache/doris/assets/13197424/7539e485-0161-44de-9100-1a01ebe6cc07)
2024-02-16 10:12:23 +08:00
8d4e0c50c6 [feature](sink) support paritition tablet sink shuffle (#30821) 2024-02-16 10:12:23 +08:00
1ed24117ac [function](url_decode)add url_decode function (#30667) 2024-02-05 22:23:00 +08:00
48aaaa8005 [Enhancement](fuction) change function REPEAT nullable mode (#30743) 2024-02-04 22:21:36 +08:00
6442663735 [Function](exec) upport atan2 math function (#30672)
Co-authored-by: Rohit Satardekar <rohitrs1983@gmail.com>
2024-02-04 14:28:38 +08:00
4f8730d092 [improvement](jdbc catalog) Optimize connection pool parameter settings (#30588)
This PR makes the following changes to the connection pool of JDBC Catalog
1. Set the maximum connection survival time, the default is 30 minutes

-   Moreover, one-half of the maximum survival time is the recyclable time,
-   One-tenth is the check interval for recycling connections

2. Keepalive only takes effect on the connection pool on BE, and will be activated based on one-fifth of the maximum survival time.
3. The maximum number of existing connections is changed from 100 to 10
4. Add the connection cache recycling thread on BE, and add a parameter to control the recycling time, the default is 28800 (8 hours)
5. Add CatalogID to the key of the connection pool cache to achieve better isolation, requires refresh catalog to take effect
6. Upgrade druid connection pool to version 1.2.20
7. Added JdbcResource's setting of default parameters when upgrading the FE version to avoid errors due to unset parameters.
2024-02-03 20:26:03 +08:00
94eedd8ea4 [Enhancement](function)make SUBSTRING_INDEX function DEPEND_ON_ARGUMENT (#30392) 2024-02-02 13:31:47 +08:00
cd65a8c9a7 Remove useless statistics report path (#30687) 2024-02-01 23:14:14 +08:00
7c7a423828 Sync stats cache while task finished, doesn't need to query column_statistics table. (#30609) 2024-01-31 23:53:40 +08:00
19f57b544e support cosh math function (#30602)
Co-authored-by: Rohit Satardekar <rohitrs1983@gmail.com>
2024-01-31 23:53:39 +08:00
8b61b7c6cd [exec](function) Add tanh func (#30555) 2024-01-31 23:53:39 +08:00
0433b8730d [Feature](profile)add shuffle send rows/bytes #30456 2024-01-28 18:25:08 +08:00
f988686708 2.1.0-rc07 2024-01-27 10:55:03 +08:00
2284575afa [opt](nereids)set flag to indicate if bloom filter size is calculated by ndv (#30278)
set flag to indicate if bloom filter size is calculated by ndv
2024-01-27 10:07:41 +08:00
713798d549 [feature](nereids)support mark join (#30133)
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
2024-01-27 09:09:53 +08:00
a954bab81a [fix](function) fix error result in time_to_sec and timediff (#30248) 2024-01-27 09:08:29 +08:00
584d73e3f9 change to rc06 2024-01-27 08:56:22 +08:00
64f7a5f395 [fix](fe)add from_unixtime to null_result_with_one_null_param_functions set (#30388) 2024-01-25 23:56:39 +08:00
ca5a314765 [fix](function) make STRLEFT and STRRIGHT and SUBSTR function DEPEND_ON_ARGUMENT (#28352)
make STRLEFT and STRRIGHT function DEPEND_ON_ARGUMENT
2024-01-25 13:23:59 +08:00
c7360fd014 [feature](function) support ip function named ipv4_cidr_to_range(addr, cidr) (#29819)
* support ip function ipv4_cidr_to_range

* fix ipv4_cidr_to_range function only support ipv4 type
2024-01-24 10:02:03 +08:00
4cbacb5b39 [enhancement](recover) Support skipping bad tablet in select by session variable (#30241)
In some scenarios, user has a huge amount of data and only a single replica was specified when creating the table, if one of the tablet is damaged, the table will not be able to be select. If the user does not care about the integrity of the data, they can use this variable to temporarily skip the bad tablet for querying and load the remaining data into a new table.
2024-01-24 09:59:43 +08:00
32c5153999 [fix](routine-load) pause job when json path is invalid #30197
If jsonpaths is set wrong, routine load job will report error but running all time.For example:

CREATE ROUTINE LOAD jobName ON tableName
PROPERTIES
(
    "format" = "json",
    "max_batch_interval" = "5",
    "max_batch_rows" = "300000",
    "max_batch_size" = "209715200",
    "jsonpaths" = "[\'t\',\'a\']"
)
FROM KAFKA
(
    "kafka_broker_list" = "$IP:PORT",
    "kafka_topic" = "XXX",
    "property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
Jsonpaths ['t','a'] is invalid, but job will running all time.
2024-01-23 10:12:37 +08:00
9c742d46a2 [fix](group commit) abort txn should use label if replay wal failed (#30219) 2024-01-23 10:12:35 +08:00
9e0c518aaf [Feature](executor)Workload Group support Non-Pipeline Execution (#30164) 2024-01-23 10:11:25 +08:00
d5d0e5e611 [feature](function) support ip functions named to_ipv4[or_default, or_null](string) and to_ipv6[or_default, or_null](string) (#29838) 2024-01-23 10:09:54 +08:00
e5dea910bf [feature](bitwise function) bit_count/bit_shift_left/bit_shift_right implementation (#30046) 2024-01-23 10:09:54 +08:00
dfde10d4c8 [improvement](function) switch inet(6)_aton alias origin function (#30196) 2024-01-23 10:09:54 +08:00
1bb1d35f70 [fix](group commit) Fix some group commit case (#30132) 2024-01-23 10:07:51 +08:00
ead3b4ac1d [feature](function) support ip function is_ipv4_compat, is_ipv4_mapped (#29954) 2024-01-23 10:07:51 +08:00
09d46dc6e4 2.1.0-rc05 2024-01-23 10:01:54 +08:00
97b2a3b993 [improvement](ip function) refactor some ip functions and remove dirty codes (#30080) 2024-01-19 15:48:56 +08:00