doris

Author	SHA1	Message	Date
Jibing-Li	91887a285e	Implement HLL with 128 buckets to support statistics cache. (#34124 )	2024-04-26 15:05:36 +08:00
Lei Zhang	2a1fbfd72c	[feat](fe) Add `ignore_bdbje_log_checksum_read` for BDBEnvironment (#31247 ) * https://forums.oracle.com/ords/apexds/post/je-log-checksumexception-2812 * When meeting disk damage or other reason described in the oracle forums and fe cannot start due to `com.sleepycat.je.log.ChecksumException`, we add a param `ignore_bdbje_log_checksum_read` to ignore the exception, but there is no guarantee of correctness for bdbje kv data Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>	2024-04-22 22:33:24 +08:00
Mingyu Chen	88b3d61eca	[refactor](Mysql) Refactoring the process of using external components to authenticate in MySQL connections (#32875 ) (#33958 ) bp #32875 Co-authored-by: LompleZ Liu <47652868+LompleZ@users.noreply.github.com>	2024-04-22 16:41:49 +08:00
924060929	15f8014e4e	[enhancement](Nereids) Enable parse sql from sql cache and fix some bugs (#33867 ) * [enhancement](Nereids) Enable parse sql from sql cache (#33262) Before this pr, a query had to pass through the parser, analyzer, rewriter, optimizer and translator before we could check whether it can use the sql cache. If the query is too long, or the number of joined tables is too large, the plan time is usually >= 500ms. This pr reduces this time by skipping that planning path, because we can reuse the previous physical plan and query result if nothing has changed. In some cases we should not parse sql from sql cache, e.g. table structure changed, data changed, user policies changed, privileges changed, contains non-deterministic functions, and user variables changed. In my test case: query a view which has lots of joins and unions, where the tables have empty partitions, the query latency is about 3ms. Without parsing sql from sql cache, the plan time is about 550ms ## Features 1. use Config.sql_cache_manage_num to control how many sql caches can be reused in one fe 2. 
if explain plan appear some plans contains `LogicalSqlCache` or `PhysicalSqlCache`, it means the query can use sql cache, like this: ```sql mysql> set enable_sql_cache=true; Query OK, 0 rows affected (0.00 sec) mysql> explain physical plan select * from test.t; +----------------------------------------------------------------------------------+ \| Explain String(Nereids Planner) \| +----------------------------------------------------------------------------------+ \| cost = 3.135 \| \| PhysicalResultSink[53] ( outputExprs=[c1#0, c2#1] ) \| \| +--PhysicalDistribute[50]@0 ( stats=3, distributionSpec=DistributionSpecGather ) \| \| +--PhysicalOlapScan[t]@0 ( stats=3 ) \| +----------------------------------------------------------------------------------+ 4 rows in set (0.02 sec) mysql> select * from test.t; +------+------+ \| c1 \| c2 \| +------+------+ \| 1 \| 2 \| \| -2 \| -2 \| \| NULL \| 30 \| +------+------+ 3 rows in set (0.05 sec) mysql> explain physical plan select * from test.t; +-------------------------------------------------------------------------------------------+ \| Explain String(Nereids Planner) \| +-------------------------------------------------------------------------------------------+ \| cost = 0.0 \| \| PhysicalSqlCache[2] ( queryId=78511f515cda466b-95385d892d6c68d0, backend=127.0.0.1:9050 ) \| \| +--PhysicalResultSink[52] ( outputExprs=[c1#0, c2#1] ) \| \| +--PhysicalDistribute[49]@0 ( stats=3, distributionSpec=DistributionSpecGather ) \| \| +--PhysicalOlapScan[t]@0 ( stats=3 ) \| +-------------------------------------------------------------------------------------------+ 5 rows in set (0.01 sec) ``` (cherry picked from commit 03bd2a337d4a56ea9c91673b3bd4ae518ed10f20) * fix * [fix](Nereids) fix some sql cache consistence bug between multiple frontends (#33722) fix some sql cache consistence bug between multiple frontends which introduced by [enhancement](Nereids) Enable parse sql from sql cache #33262, fix by use row policy as the part of 
sql cache key. support dynamic update the num of fe manage sql cache key (cherry picked from commit 90abd76f71e73702e49794d375ace4f27f834a30) * [fix](Nereids) fix bug of dry run query with sql cache (#33799) 1. dry run query should not use sql cache 2. fix test sql cache in cloud mode 3. enable cache OneRowRelation and EmptyRelation in frontend to skip parse sql (cherry picked from commit dc80ecf7f33da7b8c04832dee88abd09f7db9ffe) * remove cloud mode * remove @NotNull	2024-04-19 15:22:14 +08:00
Kang	ad75b9b142	[opt](auto bucket) add fe config autobucket_max_buckets (#33842 )	2024-04-19 15:03:06 +08:00
feiniaofeiafei	6776a3ad1b	[Fix](planner) fix create view star except and modify cast to sql (#33726 )	2024-04-19 15:02:49 +08:00
xueweizhang	56b7839447	[feature](backup) ignore table that not support type when backup, and… (#33158 ) * [feature](backup) ignore table that not support type when backup, and not report exception Signed-off-by: nextdreamblue <zxw520blue1@163.com> * fix Signed-off-by: nextdreamblue <zxw520blue1@163.com> --------- Signed-off-by: nextdreamblue <zxw520blue1@163.com>	2024-04-17 23:42:11 +08:00
zhangstar333	b2b385a4ff	[improve](fold) support complex type for constant folding (#32867 )	2024-04-17 23:41:59 +08:00
Mingyu Chen	38c5030f97	[opt](log) refactor the log dir config (#32933 ) Refactor the config for log dir of FE and BE TLDR: - Use env variable `LOG_DIR` to set root log dir - Remove `sys_log_dir` for FE and BE Details: 1. FE 1. The root log dir is set by env variable `LOG_DIR` in `fe.conf` 2. The default value of `audit_log_dir` is same as `${LOG_DIR}/` 3. The default value of `spark_launcher_log_dir` is `${LOG_DIR}/spark_launcher_log` 4. The default value of `nereids_trace_log_dir` is `${LOG_DIR}/nereids_trace_log` 5. The origin `sys_log_dir` is deprecated, and default value is `""`. But for compatibility, if user already set `sys_log_dir` before, Doris will still use it as root log dir. 2. BE 1. The root log dir is set by env variable `LOG_DIR` in `be.conf` 2. Remove `pipeline_tracing_log_dir`, use `${LOG_DIR}` directly. 3. The origin `sys_log_dir` is deprecated, and default value is `""`. But for compatibility, if user already set `sys_log_dir` before, Doris will still use it as root log dir.	2024-04-17 23:41:59 +08:00
slothever	07f296734a	[regression](insert)add hive DDL and CTAS regression case (#32924 ) Issue Number: #31442 dependent on #32824 add ddl(create and drop) test add ctas test add complex type test TODO: bucketed table test truncate test add/drop partition test	2024-04-12 10:24:23 +08:00
slothever	36a1bf1d73	[feature][insert]Adapt the create table statement to the nereids sql (#32458 ) issue: #31442 1. adapt create table statement from doris to hive 2. fix insert overwrite for table sink > The doris create hive table statement: ``` mysql> CREATE TABLE buck2( -> id int COMMENT 'col1', -> name string COMMENT 'col2', -> dt string COMMENT 'part1', -> dtm string COMMENT 'part2' -> ) ENGINE=hive -> COMMENT "create tbl" -> PARTITION BY LIST (dt, dtm) () -> DISTRIBUTED BY HASH (id) BUCKETS 16 -> PROPERTIES( -> "file_format" = "orc" -> ); ``` > generated hive create table statement: ``` CREATE TABLE `buck2`( `id` int COMMENT 'col1', `name` string COMMENT 'col2') PARTITIONED BY ( `dt` string, `dtm` string) CLUSTERED BY ( id) INTO 16 BUCKETS ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' LOCATION 'hdfs://HDFS8000871/usr/hive/warehouse/jz3.db/buck2' TBLPROPERTIES ( 'transient_lastDdlTime'='1710840747', 'doris.file_format'='orc') ```	2024-04-12 09:57:37 +08:00
camby	14c5247fb7	[feature](replica) support force set replicate allocation for olap tables (#32916 ) Add a config to force set replication allocation for all OLAP tables and partitions.	2024-04-10 16:00:15 +08:00
yiguolei	16f8afc408	[refactor](coordinator) split profile logic and instance report logic (#32010 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2024-04-10 15:51:32 +08:00
Xinyi Zou	80cdc74908	[fix](arrow-flight) Fix reach limit of connections error (#32911 ) Fix the Reach limit of connections error: in fe.conf, arrow_flight_token_cache_size must be less than qe_max_connection/2. Arrow flight sql is a stateless protocol and connections are usually not actively disconnected, so evicting a bearer token from the cache unregisters its ConnectContext. Fix ConnectContext.command not being reset to COM_SLEEP in time, which resulted in frequent connection kills after query timeout. Fix bearer token evict log and exception. TODO: use arrow flight session: https://mail.google.com/mail/u/0/#inbox/FMfcgzGxRdxBLQLTcvvtRpqsvmhrHpdH	2024-04-10 11:34:29 +08:00
zhangdong	1d4e5a1c58	[enhance](auth)enable col auth (#32659 )	2024-03-24 08:07:01 +08:00
deardeng	ab467f53db	[fix](partition) Fix be tablet partition id eq 0 By report tablet (#32179 ) (#32667 )	2024-03-22 15:38:58 +08:00
Mingyu Chen	279ea2f366	[feature](proxy-protocol) Support proxy protocol v1 (#32338 ) Enable proxy protocol to support IP transparency. See: `IP Transparency` in `f57387b502/docs/en/docs/admin-manual/cluster-management/load-balancing.md` for details	2024-03-21 14:07:22 +08:00
airborne12	ecadb60bcd	[Pick 2.1](inverted index) support inverted index format v2 (#30145 ) (#32418 )	2024-03-19 08:11:33 +08:00
slothever	711c0cd55c	[feature](insert)implement hive table sink plan (#31765 ) (#32386 ) from #31765	2024-03-18 22:49:30 +08:00
Mingyu Chen	4732aae628	[Refactor](insert) refactor insert command to support other type of table (#31610 ) (#32345 ) bp #31610	2024-03-17 20:46:07 +08:00
deardeng	844a1b53b7	[fix](retry) Set query encounter rpc exception default retry times to 3 (#28555 )	2024-03-16 20:53:46 +08:00
morrySnow	ea2fbfaffa	[feature](Nereids) support agg state type in create table (#32171 ) this PR introduce a behavior change, syntax of create table with agg_state type is changed.	2024-03-15 18:04:49 +08:00
zclllyybb	847ec368be	[Fix](smooth-upgrade) Fix incompatibility when upgrade from 2.0 to 2.1 (#32220 )	2024-03-14 11:23:05 +08:00
walter	b9a87c63f7	[chore](catalog recycle bin) Add option to ignore min erase latency for testing (#31417 )	2024-02-29 16:44:40 +08:00
slothever	9243b3eeee	[fix](multi-catalog) add config to disable external DDL (#31528 ) from #31453	2024-02-29 08:42:35 +08:00
Pxl	6737fdea64	[Chore](agg-state) adjust AggStateType constructor check input (#31401 ) adjust AggStateType constructor check input	2024-02-28 17:52:11 +08:00
Mingyu Chen	883d022f84	[fix](paimon) fix hadoop.username does not take effect in paimon catalog (#31478 )	2024-02-28 13:08:41 +08:00
morrySnow	a371a10603	[fix](Nereids) let time type coercion same with legacy planner (#31472 )	2024-02-28 13:07:47 +08:00
slothever	eb0416032b	[feature](multi-catalog)support hms catalog create and drop table/db (#30198 ) (#31499 ) 1. rename old create/drop table to add/removeMemoryTable 2. add new create/drop table/db method 3. support hms catalog create/drop table/db (cherry picked from commit b2e869c7414c68186de8d43b324ae736d7cc3463)	2024-02-28 09:33:54 +08:00
wangbo	1127b0065a	[Improment](executor)Add scanbytes/scanrows condition (#31364 ) * Add scanbytes/scanrows condition * fix reg	2024-02-27 10:12:33 +08:00
ZhongJinHacker	e48f4f38d0	[Fix](fe-common) Fix the Pair.java code about the hidden danger of NullPointException (#31371 ) * Fix the Pair class throwing NullPointException when equals or toString is called while first or second is null * add license	2024-02-26 19:07:10 +08:00
yiguolei	7a1caf4718	[refactor](wg) enable wg by default and init normal wg in constructor (#31373 ) should always enable workload group because other operations depend on it for example MTMV, and spill to disk. the normal workload group should be created in constructor.	2024-02-25 18:08:19 +08:00
Mingyu Chen	aee49adf1e	[opt](compute-node) refactor compute node doc and opt some default config (#31325 ) * [opt](compute-node) refactor compute node doc and opt some default config * 1 * 1	2024-02-24 11:44:53 +08:00
Jibing-Li	9a40b6c978	Refactor get row count related interface, add row count cache for external table. (#31276 )	2024-02-23 19:03:28 +08:00
zy-kkk	c27692fb3b	[Enhancement](jdbc catalog) Add security check on driver when creating Jdbc Catalog (#31153 )	2024-02-21 13:53:40 +08:00
Calvin Kirs	02bded2688	[Improve](common)Optimize logging performance with LOG.isDebugEnabled() (#31091 ) * [Improve](common)Optimize logging performance with LOG.isDebugEnabled() * fix error ut	2024-02-20 09:16:14 +08:00
slothever	4a33d9820a	[fix](multi-catalog)fix getting ugi methods and unify them (#30844 ) put all ugi login methods to HadoopUGI	2024-02-20 09:12:38 +08:00
wangbo	ac756075bb	Alter workload group queue prop sync for regression test (#30869 )	2024-02-16 10:12:24 +08:00
yujun	4052746f1c	[improvement](balance) fix multiple problems for balance on large cluster (#30713 )	2024-02-16 10:12:24 +08:00
lihangyu	b23a785775	[Fix](Variant) support materialize view for variant and accessing variant subcolumns (#30603 ) * [Fix](Variant) support materialize view for variant and accessing variant subcolumns 1. fix schema change with path lost and lead to invalid data read 2. support element_at function in BE side and use simdjson to parse data 3. fix multi slot expression	2024-02-16 10:12:23 +08:00
Xinyi Zou	08508d65fd	[feature-wip](plsql)(step1) Support PL-SQL (#30817 ) # 1. Motivation PL-SQL (Stored procedure) is a collection of sql, which is defined and used similarly to functions. It supports conditional judgments, loops and other control statements, supports cursor processing of result sets, and can write business logic in SQL. Hive uses Hplsql to support PL-SQL and is largely compatible with Oracle, Impala, MySQL, Redshift, PostgreSQL, DB2, etc. We support PL-SQL in Doris based on Hplsql to achieve compatibility with Stored procedures of database systems such as Oracle and PostgreSQL. Reference documentation: Hive: http://mail.hplsql.org Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/21/lnpls/plsql-language-fundamentals.html#GUID-640DB3AA-15AF-4825-BD6C-1D4EB5AB7715 Mysql: https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html # 2. Implementation Take the following case as an example to explain the process of connecting Doris FE to execute stored procedures using the Mysql protocol. ``` CREATE OR REPLACE PROCEDURE A(IN name STRING, OUT result int) select count() from test; select count() into result from test where k = name; END declare result INT default = 0; call A(‘xxx’, result); print result; ``` ![image](https://github.com/apache/doris/assets/13197424/0b78e039-0350-4ef1-bef3-0ebbf90274cd) 1. Add procedure and persist the Procedure Name and Source (raw SQL) into Doris FE metadata. 2. Call procedure, extract the actual parameter Value and Procedure Name in Call Stmt. Use Procedure Name to find the Source in the metadata, extract the Name and Type of the Procedure parameter, and match them with the actual parameter Value to form a complete variable <Name, Type, Value>. 3. Execute Doris Statement - Use Doris Logical Plan Builder to parse the Doris Statement syntax in Source, replace parameter variables, remove the into variable clause, and generate a Plan Tree that conforms to Doris syntax. 
- Use stmtExecutor to execute SQL and encapsulate the query result set iterator into QueryResult. - Output the query results to Mysql Channel, or write them into Cursor, parameters, and variables. - Stored Programs compatible with Mysql protocol support multiple statements. 4. Execute PL-SQL Statement - Use Plsql Logical Plan Builder to parse and execute PL-SQL Statement syntax in Source, including Loop, Cursor, IF, Declare, etc., and basically reuse HplSQL. # 3. TODO 1. Support drop procedure. 2. Create procedure only in `PlSqlOperation`. 3. Doris Parser supports declare variable. 4. Select Statement supports insert into variable. 5. Parameters and fields have the same name. 6. If Cursor exits halfway, will there be a memory leak? 7. Use getOriginSql(ctx) in syntax parsing LogicalPlanBuilder to obtain the original SQL. Is there any problem with special characters? 8. Supports complex types such as Map and Struct. 9. Test syntax such as Package. 10. Support UDF 11. In Oracle, create procedure must have AS or IS after RIGHT_PAREN, but Mysql and Hive do not support AS or IS. Compatibility issues with Oracle will be discussed and resolved later. 12. Built-in functions require separate management. 13. Doris statement add stmt: begin_transaction_stmt, end_transaction_stmt, commit_stmt, rollback_stmt. 14. Add plsql stmt: cmp_stmt, copy_from_local_stmt, copy_stmt, create_local_temp_table_stmt, merge_stmt. # 4. Some questions 1. JDBC does not support the execution of stored procedures that return results. You can only use Into to store the execution results in a variable or write them into a table, because when multiple result sets are returned, JDBC needs to use the prepareCall statement to execute, otherwise the Statement of the returned result executes Finalize. Send EOF Packet will report an error; 2. Use PL-SQL Cursor to open multiple Query result set iterators at the same time. 
Doris BE will cache the intermediate status of these Queries (such as HashTable) and query results until the Query result set iteration is completed. If the Cursor goes unused for a long time, this will result in a lot of memory waste. 3. In plsql/Var.defineType(), the corresponding Plsql Var type will be found through the Mysql type name string, and the corresponding relationship between Doris type and Plsql Var needs to be implemented. 4. Currently, PL-SQL Statement will be forwarded to Master FE for creation and calculation, which may affect other services on Doris FE and is limited by the performance of Doris FE. Consider moving it to Doris BE for execution. 5. The format of the result returned by Doris Statement is ```xxxx\n, xxxx\n, 2 rows affected (0.03 sec)```. PL-SQL uses Print to print variable values in an unformatted format, and JDBC cannot easily obtain the real results. # 5. Some thoughts The above execution of Doris Statement reuses Doris Logical Plan Builder for syntax parsing, parses it from top to bottom into a Plan Tree, and calls stmtExecutor for execution. PL-SQL replacement variables, removal of Into Variable and other operations are coupled in Doris syntax parsing. The advantage is that it can easily be made compatible with Doris grammar with a few changes, but the disadvantage is that it will invade the Doris grammar parsing process. HplSQL performs a syntax parsing independently of Hive to implement variable substitution and other operations, and finally outputs a SQL that conforms to Hive syntax. The following is a simple syntax parsing process. For select, where, expression, table name, join, agg, order and other grammars, the parsing must be re-implemented. The advantage is that it is completely independent from the original system, but the changes are too complicated. ![image](https://github.com/apache/doris/assets/13197424/7539e485-0161-44de-9100-1a01ebe6cc07)	2024-02-16 10:12:23 +08:00
Qi Chen	383850ef12	[Opt](multi-catalog) Opt split assignment to resolve uneven distribution. (#30390 ) [Opt] (multi-catalog) Opt split assignment to resolve uneven distribution. Currently only for `FileQueryScanNode`. Referring to the implementation of Trino, - Local node soft affinity optimization. Prefer local replication node. - Remote splits use the consistent hash algorithm when the file cache is turned on; because consistent hashing can be uneven, the splits are re-adjusted so that the maximum and minimum split numbers of hosts differ by at most `max_split_num_variance` splits. - Remote splits use the round-robin algorithm when the file cache is turned off.	2024-02-04 14:28:38 +08:00
HHoflittlefish777	f35803b7a0	[feature](pipeline-load) enable pipeline load by default (#30581 )	2024-01-31 23:53:39 +08:00
amory	0f81d2d533	[FIX](complextype)fix complex type nested version type but not hide version (#30419 )	2024-01-29 19:03:47 +08:00
zhangdong	3354ac48f7	[enhance](mtmv)add version and version time for table (#30437 ) Add version to record data changes in the table Scope of impact: - Transaction related operations - drop partition - replace partition	2024-01-29 19:03:47 +08:00
lihangyu	7667fe8570	[Improve)(Variant) do not allow fall back to legacy planner (#30430 )	2024-01-29 19:02:46 +08:00
Yongqiang YANG	bfdc41d37b	[fix](ccr) handle large binlog (#30435 )	2024-01-28 18:25:31 +08:00
Mingyu Chen	5d7543b30b	[feature](ranger) Support Apache ranger for Doris (#27864 ) For usage, see: `5d340ce24f/docs/zh-CN/docs/admin-manual/privilege-ldap/ranger.md` For range-doris-plugin, see: https://github.com/morningman/ranger/tree/doris-plugin To support ranger, there are several other modification: 1. Support `show resources like "pattern"` 2. Support `show workload group like "pattern"` 3. Support `show schemas like "pattern"`	2024-01-27 10:29:38 +08:00
yujun	3d22f9cfc8	[feature](replica) Add drop replica safely on backend (#30303 )	2024-01-25 13:24:52 +08:00
zclllyybb	2e6a00690f	[Fix](smooth-upgrade) fix unnecessary high version of smooth upgrade (#30283 ) fix unnecessary high version of smooth upgrade	2024-01-25 13:24:09 +08:00

427 Commits