doris

Author	SHA1	Message	Date
lihangyu	b23a785775	[Fix](Variant) support materialized views for variant and accessing variant subcolumns (#30603) * [Fix](Variant) support materialized views for variant and accessing variant subcolumns 1. fix schema change losing the path, which led to invalid data reads 2. support the element_at function on the BE side and use simdjson to parse data 3. fix multi-slot expressions	2024-02-16 10:12:23 +08:00
Xinyi Zou	08508d65fd	[feature-wip](plsql)(step1) Support PL-SQL (#30817) # 1. Motivation PL-SQL (stored procedures) is a collection of SQL statements that is defined and used much like a function. It supports conditional statements, loops, and other control statements, supports processing result sets with cursors, and lets business logic be written in SQL. Hive uses HplSQL to support PL-SQL; HplSQL is largely compatible with Oracle, Impala, MySQL, Redshift, PostgreSQL, DB2, etc. We support PL-SQL in Doris based on HplSQL to achieve compatibility with the stored procedures of database systems such as Oracle and PostgreSQL. Reference documentation: Hive: http://mail.hplsql.org Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/21/lnpls/plsql-language-fundamentals.html#GUID-640DB3AA-15AF-4825-BD6C-1D4EB5AB7715 MySQL: https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html # 2. Implementation The following example explains how a client connected to Doris FE over the MySQL protocol executes a stored procedure. ``` CREATE OR REPLACE PROCEDURE A(IN name STRING, OUT result int) BEGIN select count() from test; select count() into result from test where k = name; END declare result INT default 0; call A('xxx', result); print result; ``` ![image](https://github.com/apache/doris/assets/13197424/0b78e039-0350-4ef1-bef3-0ebbf90274cd) 1. Create the procedure and persist the procedure name and source (raw SQL) in Doris FE metadata. 2. Call the procedure: extract the procedure name and the actual parameter values from the CALL statement, use the procedure name to look up the source in the metadata, extract the name and type of each procedure parameter, and match them with the actual values to form complete variables <Name, Type, Value>. 3. Execute Doris statements - Use the Doris Logical Plan Builder to parse the Doris statements in the source, substitute parameter variables, remove the INTO variable clause, and generate a plan tree that conforms to Doris syntax. 
- Use stmtExecutor to execute the SQL and wrap the query result set iterator in a QueryResult. - Output the query results to the MySQL channel, or write them into cursors, parameters, and variables. - Stored programs compatible with the MySQL protocol support multiple statements. 4. Execute PL-SQL statements - Use the PL-SQL Logical Plan Builder to parse and execute the PL-SQL statements in the source, including LOOP, CURSOR, IF, DECLARE, etc., largely reusing HplSQL. # 3. TODO 1. Support DROP PROCEDURE. 2. Create procedures only in `PlSqlOperation`. 3. Support declaring variables in the Doris parser. 4. Support SELECT ... INTO variable. 5. Handle parameters and fields that have the same name. 6. If a cursor exits halfway, will there be a memory leak? 7. LogicalPlanBuilder uses getOriginSql(ctx) during parsing to obtain the original SQL; are there any problems with special characters? 8. Support complex types such as Map and Struct. 9. Test syntax such as Package. 10. Support UDFs. 11. In Oracle, CREATE PROCEDURE must have AS or IS after RIGHT_PAREN, but MySQL and Hive do not support AS or IS. Compatibility with Oracle will be discussed and resolved later. 12. Built-in functions require separate management. 13. Add Doris statements: begin_transaction_stmt, end_transaction_stmt, commit_stmt, rollback_stmt. 14. Add PL-SQL statements: cmp_stmt, copy_from_local_stmt, copy_stmt, create_local_temp_table_stmt, merge_stmt. # 4. Some questions 1. JDBC does not support executing stored procedures that return result sets. The results can only be selected INTO a variable or written into a table, because when multiple result sets are returned, JDBC must execute them with prepareCall; otherwise, an error is reported when the statement that returned the result runs finalize and sends the EOF packet. 2. A PL-SQL cursor can open multiple query result set iterators at the same time. 
Doris BE caches the intermediate state of these queries (such as hash tables) and the query results until iteration over the result set completes. If a cursor goes unused for a long time, a lot of memory is wasted. 3. plsql/Var.defineType() finds the corresponding PL-SQL Var type from the MySQL type name string; the mapping between Doris types and PL-SQL Var types still needs to be implemented. 4. Currently, PL-SQL statements are forwarded to the Master FE for creation and computation, which may affect other services on the Doris FE and is limited by FE performance. Consider moving execution to Doris BE. 5. Doris statements return results in the format ```xxxx\n, xxxx\n, 2 rows affected (0.03 sec)```. PL-SQL uses PRINT to output variable values unformatted, so JDBC cannot easily obtain the actual results. # 5. Some thoughts The Doris statement execution described above reuses the Doris Logical Plan Builder for parsing: it parses the statement top-down into a plan tree and calls stmtExecutor to execute it. PL-SQL variable substitution, removal of INTO variable clauses, and other operations are coupled into Doris syntax parsing. The advantage is that compatibility with Doris grammar needs only a few changes; the disadvantage is that it intrudes on the Doris parsing process. HplSQL, by contrast, performs its own parsing independently of Hive to implement variable substitution and other operations, and finally outputs SQL that conforms to Hive syntax. The image below shows a simplified parsing flow; with that approach, parsing of select, where, expressions, table names, join, agg, order, and other grammar must all be re-implemented. Its advantage is complete independence from the original system, but the changes are too complex. ![image](https://github.com/apache/doris/assets/13197424/7539e485-0161-44de-9100-1a01ebe6cc07)	2024-02-16 10:12:23 +08:00
xzj7019	c6cd6b125d	[nereids] group by key elimination (#30774)	2024-02-16 10:12:23 +08:00
meiyi	2667e10ba2	[improve](group-commit) Modify some logs (#30842)	2024-02-16 10:12:23 +08:00
abmdocrt	2cb46eed94	[Feature](auto-inc) Add start value for auto increment column (#30512)	2024-02-16 10:12:23 +08:00
seawinde	5c2a4a80dd	[fix](nereids) Fix wrongly using aggregate mv when rewriting a query that only contains joins (#30858) The materialized view definition is as follows: select o_orderdate, o_shippriority, o_comment, l_orderkey, o_orderkey, count() from orders left join lineitem on l_orderkey = o_orderkey group by o_orderdate, o_shippriority, o_comment, l_orderkey; The following query should be rewritten successfully using the above materialized view: select o_orderdate, o_shippriority, o_comment, l_orderkey, ps_partkey, count() from orders left join lineitem on l_orderkey = o_orderkey left join partsupp on ps_partkey = l_orderkey group by o_orderdate, o_shippriority, o_comment, l_orderkey, ps_partkey;	2024-02-16 10:12:23 +08:00
zy-kkk	92226c986a	[fix](catalog) fix date_sub/date_add func pushdown in jdbcscan (#30807)	2024-02-06 08:35:54 +08:00
Guangdong Liu	1ed24117ac	[function](url_decode) add url_decode function (#30667)	2024-02-05 22:23:00 +08:00
HowardQin	0d32aeeaf6	[improvement](load) Enable lzo & Remove dependency on Markus F.X.J. Oberhumer's lzo library (#30573) Issue Number: close #29406 1. Increase the lzop version to 0x1040. It is set to 0x1040 only so that lzo files compressed by higher versions of lzop can be decompressed; the decompression logic is unchanged. Strictly, 0x1040 should include the "F_H_FILTER" feature, but it is mainly for audio and image data, so we do not support it. 2. Use orc::lzoDecompress() instead of lzo1x_decompress_safe() to decompress lzo data. 3. Use crc32c::Extend() instead of lzo_crc32(). 4. Use olap_adler32() instead of lzo_adler32(). 5. Thus, remove the dependency on Markus F.X.J. Oberhumer's lzo library. 6. Remove DORIS_WITH_LZO, so lzo files are supported by stream load and broker load by default. 7. Add some regression tests.	2024-02-05 22:00:24 +08:00
morrySnow	3a752b758a	[fix](Nereids) colocate node attr lost after merging fragments (#30818)	2024-02-05 21:58:08 +08:00
zhangdong	fc762f426b	[enhance](mtmv) mtmv disable hive auto refresh (#30775) - If the `related table` is `hive`, do not refresh automatically. - If the `related table` is `hive`, the partition col is allowed to be `null`; otherwise, it must be `not null`. - Add more `ut`.	2024-02-05 21:56:57 +08:00
yangshijie	8ff8d94697	[fix](ip) change IPv6 to little-endian byte order storage (like IPv4) (#30730)	2024-02-05 21:56:57 +08:00
Pxl	1d39e16eda	[Bug](compaction) pass arena to function->add_batch_range (#30709)	2024-02-04 14:28:38 +08:00
zhangdong	b275cb0f44	[feature](mtmv) mtmv support workload group (#29595) MTMV supports controlling the resource usage of refresh tasks by setting a workload group name. About workload groups: https://doris.apache.org/zh-CN/docs/dev/admin-manual/workload-group	2024-02-04 14:28:38 +08:00
Rohit Satardekar	6442663735	[Function](exec) support atan2 math function (#30672) Co-authored-by: Rohit Satardekar <rohitrs1983@gmail.com>	2024-02-04 14:28:38 +08:00
Jack Drogon	d749fc3d27	[improvement](binlog) Change BinlogConfig default TTL_SECONDS to 86400 (1 day) (#30771) * Change BinlogConfig default TTL_SECONDS to 86400 (1 day) Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com> * Fix binlog.ttl_seconds in regression test Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com> --------- Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>	2024-02-04 14:28:38 +08:00
seawinde	5aed3abb8a	[Fix](Nereids) Fix materialized view rewrite failure when join input has agg (#30734) The materialized view definition is as follows, and the query SQL is the same. When the outer group by uses col1 from the inner group, the query can be rewritten by the materialized view: select t1.o_orderdate, t1.o_orderkey, t1.col1 from ( select o_orderkey, o_custkey, o_orderstatus, o_orderdate, sum(o_shippriority) as col1 from orders group by o_orderkey, o_custkey, o_orderstatus, o_orderdate ) as t1 left join lineitem on lineitem.l_orderkey = t1.o_orderkey group by t1.o_orderdate, t1.o_orderkey, t1.col1	2024-02-03 20:27:04 +08:00
Nitin-Kashyap	d99bb51d36	[fix](legacy-planner) fixed loss of BetweenPredicate rewrite on reanalyze in legacy planner (29798) (#30328)	2024-02-03 20:26:04 +08:00
jakevin	8a0ea4b651	[enhancement](Nereids): datetime support microsecond overflow (#30744)	2024-02-03 20:26:04 +08:00
jakevin	151735748b	[test](Nereids): add push_down_distinct_through_join regression test (#30760)	2024-02-03 20:26:03 +08:00
Pxl	5687ca977d	[Bug](java-udf) fix core dump when javaudf input is a 0-row block (#30720) fix core dump when javaudf input is a 0-row block	2024-02-03 20:25:25 +08:00
Pxl	0f47f7f389	[Feature](runtime filter) normalize ignore runtime filter (#30152) normalize ignore runtime filter	2024-02-03 20:24:39 +08:00
minghong	e5bdc369e2	[runtimefilter](nereids) push down RF into cte producer (#30568) * push down RF into CTE	2024-02-03 20:24:39 +08:00
abmdocrt	82bb3ed50f	[Fix](group commit) Fix pre-allocated err handling for group commit async load and add regression test #30718	2024-02-02 13:31:47 +08:00
Qi Chen	9100fba47e	[Fix](parquet-reader) Fix decimal test case out files. (#30715)	2024-02-01 21:17:17 +08:00
zclllyybb	3315c16383	[enhance](function) refactor from_format_str and support more formats (#30452)	2024-02-01 19:08:37 +08:00
zhangstar333	fd2d9ae63e	[improve](test) fix regression test case report error when run times (#30531)	2024-02-01 19:01:08 +08:00
xueweizhang	203daba19d	[fix](outfile) fix outfile csv did not write json column with string (#29067)	2024-02-01 19:01:08 +08:00
zzzxl	1ac5b45180	[fix](invert index) fixed the issue of insufficient index idx generation during partial column updates. (#30678)	2024-02-01 19:01:08 +08:00
meiyi	ecf282ca92	[improve](catalog recycle bin) show data size info when show catalog recycle bin (#30592)	2024-02-01 19:00:51 +08:00
zzwwhh	b86bd2672f	[fix](Nereids) add logical project to prevent extra wrong column (#30459) Issue Number: close #30264	2024-02-01 19:00:50 +08:00
seawinde	1ab37737ae	[Test](Nereids) Add SSB dataset to test materialized view rewrite (#30528) * [Test](Nereids) Add SSB dataset to test materialized view rewrite * rollback irrelevant code * fix sort slot 0	2024-02-01 19:00:50 +08:00
Qi Chen	92cad69fc4	[Fix](parquet-reader) Fix reading fixed length byte array decimal in parquet reader. (#30535)	2024-01-31 23:53:40 +08:00
Jerry Hu	77b366fc4b	[fix](join) incorrect result of mark join (#30543) incorrect result of mark join	2024-01-31 23:53:40 +08:00
morrySnow	8aaae4c873	[fix](Nereids) div priority is not right (#30575)	2024-01-31 23:53:40 +08:00
Pxl	bf582cd5d3	[Chore](case) reset all variables at start on set_and_unset_variable case (#30580) reset all variables at start on set_and_unset_variable case	2024-01-31 23:53:39 +08:00
Rohit Satardekar	19f57b544e	support cosh math function (#30602) Co-authored-by: Rohit Satardekar <rohitrs1983@gmail.com>	2024-01-31 23:53:39 +08:00
lihangyu	e6fbccd3ed	[Feature](Variant) support row store for variant type (#30052)	2024-01-31 23:53:39 +08:00
yangshijie	8b61b7c6cd	[exec](function) Add tanh func (#30555)	2024-01-31 23:53:39 +08:00
wuwenchi	7d037c12bf	[bugfix](paimon) fix paimon testcases (#30514) 1. set default timezone 2. do not push down `char` type	2024-01-31 23:53:39 +08:00
yangshijie	221308f78a	[fix](datatype) fix bugs for IPv4/v6 datatype and add some basic regression test cases (#30261)	2024-01-31 23:53:39 +08:00
jakevin	12827ceb16	[fix](Nereids): fix wrong regression test (#30520)	2024-01-30 15:33:40 +08:00
wuwenchi	4648902350	[bugfix](iceberg) fix read NULL with date partition (#30478) * fix date * fix date * add case	2024-01-30 15:32:43 +08:00
谢健	5731ed7aad	[fix](Nereids): add order by when testing pkfk to avoid unstable res #30507	2024-01-30 15:32:42 +08:00
qiye	b712f0b810	[improvement](index) add index_id column in show index stmt (#30431)	2024-01-30 15:32:42 +08:00
Guangdong Liu	009bca9652	[regression test](broker load) add partition load case (#28259)	2024-01-30 15:30:39 +08:00
Guangdong Liu	5f20d7c5d0	[regression test](stream load) test for `enable_profile` (#28534)	2024-01-30 15:30:39 +08:00
Guangdong Liu	57a8c75ddc	[regression test](schema change) add case for column type change (#30472)	2024-01-30 15:30:39 +08:00
zhangstar333	f7e01ceffa	[bug](node) add dependency for set operation node (#30203) These sinks must be completed one by one in order, e.g. child(1) must wait for child(0) to finish building	2024-01-30 15:30:39 +08:00
谢健	f17d29090e	[feat](Nereids): drop foreign key after dropping primary key that is referenced by the foreign key (#30417)	2024-01-29 19:03:48 +08:00

2503 Commits