doris

Author	SHA1	Message	Date
walter	b9a87c63f7	[chore](catalog recycle bin) Add option to ignore min erase latency for testing (#31417 )	2024-02-29 16:44:40 +08:00
slothever	9243b3eeee	[fix](multi-catalog) add config to disable external DDL (#31528 ) from #31453	2024-02-29 08:42:35 +08:00
Pxl	6737fdea64	[Chore](agg-state) adjust AggStateType constructor check input (#31401 ) adjust AggStateType constructor check input	2024-02-28 17:52:11 +08:00
Mingyu Chen	883d022f84	[fix](paimon) fix hadoop.username does not take effect in paimon catalog (#31478 )	2024-02-28 13:08:41 +08:00
morrySnow	a371a10603	[fix](Nereids) let time type coercion same with legacy planner (#31472 )	2024-02-28 13:07:47 +08:00
slothever	eb0416032b	[feature](multi-catalog)support hms catalog create and drop table/db (#30198 ) (#31499 ) 1. rename old create/drop table to add/removeMemoryTable 2. add new create/drop table/db method 3. support hms catalog create/drop table/db (cherry picked from commit b2e869c7414c68186de8d43b324ae736d7cc3463)	2024-02-28 09:33:54 +08:00
wangbo	1127b0065a	[Improment](executor)Add scanbytes/scanrows condition (#31364 ) * Add scanbytes/scanrows condition * fix reg	2024-02-27 10:12:33 +08:00
ZhongJinHacker	e48f4f38d0	[Fix](fe-common) Fix the Pair.java code about the hidden danger of NullPointException (#31371 ) * Fix the NullPointException thrown by equals and toString in the Pair class when first or second is null * add license	2024-02-26 19:07:10 +08:00
yiguolei	7a1caf4718	[refactor](wg) enable wg by default and init normal wg in constructor (#31373 ) should always enable workload group because other operations depend on it for example MTMV, and spill to disk. the normal workload group should be created in constructor.	2024-02-25 18:08:19 +08:00
Mingyu Chen	aee49adf1e	[opt](compute-node) refactor compute node doc and opt some default config (#31325 ) * [opt](compute-node) refactor compute node doc and opt some default config * 1 * 1	2024-02-24 11:44:53 +08:00
Jibing-Li	9a40b6c978	Refactor get row count related interface, add row count cache for external table. (#31276 )	2024-02-23 19:03:28 +08:00
zy-kkk	c27692fb3b	[Enhancement](jdbc catalog) Add security check on driver when creating Jdbc Catalog (#31153 )	2024-02-21 13:53:40 +08:00
Calvin Kirs	02bded2688	[Improve](common)Optimize logging performance with LOG.isDebugEnabled() (#31091 ) * [Improve](common)Optimize logging performance with LOG.isDebugEnabled() * fix error ut	2024-02-20 09:16:14 +08:00
slothever	4a33d9820a	[fix](multi-catalog)fix getting ugi methods and unify them (#30844 ) put all ugi login methods to HadoopUGI	2024-02-20 09:12:38 +08:00
wangbo	ac756075bb	Alter workload group queue prop sync for regression test (#30869 )	2024-02-16 10:12:24 +08:00
yujun	4052746f1c	[improvement](balance) fix multiple problems for balance on large cluster (#30713 )	2024-02-16 10:12:24 +08:00
lihangyu	b23a785775	[Fix](Variant) support materialize view for variant and accessing variant subcolumns (#30603 ) * [Fix](Variant) support materialize view for variant and accessing variant subcolumns 1. fix schema change with path lost and lead to invalid data read 2. support element_at function in BE side and use simdjson to parse data 3. fix multi slot expression	2024-02-16 10:12:23 +08:00
Xinyi Zou	08508d65fd	[feature-wip](plsql)(step1) Support PL-SQL (#30817 ) # 1. Motivation PL-SQL (Stored procedure) is a collection of sql, which is defined and used similarly to functions. It supports conditional judgments, loops and other control statements, supports cursor processing of result sets, and can write business logic in SQL. Hive uses Hplsql to support PL-SQL and is largely compatible with Oracle, Impala, MySQL, Redshift, PostgreSQL, DB2, etc. We support PL-SQL in Doris based on Hplsql to achieve compatibility with Stored procedures of database systems such as Oracle and PostgreSQL. Reference documentation: Hive: http://mail.hplsql.org Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/21/lnpls/plsql-language-fundamentals.html#GUID-640DB3AA-15AF-4825-BD6C-1D4EB5AB7715 Mysql: https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html # 2. Implementation Take the following case as an example to explain the process of connecting Doris FE to execute stored procedures using the Mysql protocol. ``` CREATE OR REPLACE PROCEDURE A(IN name STRING, OUT result int) select count() from test; select count() into result from test where k = name; END declare result INT default = 0; call A(‘xxx’, result); print result; ``` ![image](https://github.com/apache/doris/assets/13197424/0b78e039-0350-4ef1-bef3-0ebbf90274cd) 1. Add procedure and persist the Procedure Name and Source (raw SQL) into Doris FE metadata. 2. Call procedure, extract the actual parameter Value and Procedure Name in Call Stmt. Use Procedure Name to find the Source in the metadata, extract the Name and Type of the Procedure parameter, and match them with the actual parameter Value to form a complete variable <Name, Type, Value>. 3. Execute Doris Statement - Use Doris Logical Plan Builder to parse the Doris Statement syntax in Source, replace parameter variables, remove the into variable clause, and generate a Plan Tree that conforms to Doris syntax. 
- Use stmtExecutor to execute SQL and encapsulate the query result set iterator into QueryResult. - Output the query results to Mysql Channel, or write them into Cursor, parameters, and variables. - Stored Programs compatible with Mysql protocol support multiple statements. 4. Execute PL-SQL Statement - Use Plsql Logical Plan Builder to parse and execute PL-SQL Statement syntax in Source, including Loop, Cursor, IF, Declare, etc., and basically reuse HplSQL. # 3. TODO 1. Support drop procedure. 2. Create procedure only in `PlSqlOperation`. 3. Doris Parser supports declare variable. 4. Select Statement supports insert into variable. 5. Parameters and fields have the same name. 6. If Cursor exits halfway, will there be a memory leak? 7. Use getOriginSql(ctx) in syntax parsing LogicalPlanBuilder to obtain the original SQL. Is there any problem with special characters? 8. Support complex types such as Map and Struct. 9. Test syntax such as Package. 10. Support UDF 11. In Oracle, create procedure must have AS or IS after RIGHT_PAREN, but Mysql and Hive do not support AS or IS. Compatibility issues with Oracle will be discussed and resolved later. 12. Built-in functions require separate management. 13. Doris statement add stmt: begin_transaction_stmt, end_transaction_stmt, commit_stmt, rollback_stmt. 14. Add plsql stmt: cmp_stmt, copy_from_local_stmt, copy_stmt, create_local_temp_table_stmt, merge_stmt. # 4. Some questions 1. JDBC does not support the execution of stored procedures that return results. You can only store the execution results in a variable with Into or write them into a table, because when multiple result sets are returned, JDBC needs to use the prepareCall statement to execute, otherwise the Statement of the returned result executes Finalize and Send EOF Packet reports an error. 2. Use PL-SQL Cursor to open multiple Query result set iterators at the same time.
Doris BE will cache the intermediate status of these Queries (such as HashTable) and query results until the Query result set iteration is completed. If the Cursor is left unused for a long time, this will result in a lot of memory waste. 3. In plsql/Var.defineType(), the corresponding Plsql Var type will be found through the Mysql type name string, and the corresponding relationship between Doris type and Plsql Var needs to be implemented. 4. Currently, PL-SQL Statement will be forwarded to Master FE for creation and calculation, which may affect other services on Doris FE and is limited by the performance of Doris FE. Consider moving it to Doris BE for execution. 5. The format of the result returned by Doris Statement is ```xxxx\n, xxxx\n, 2 rows affected (0.03 sec)```. PL-SQL uses Print to print variable values in an unformatted format, and JDBC cannot easily obtain the real results. # 5. Some thoughts The above execution of Doris Statement reuses Doris Logical Plan Builder for syntax parsing, parses it from top to bottom into a Plan Tree, and calls stmtExecutor for execution. PL-SQL replacement variables, removal of Into Variable and other operations are coupled in Doris syntax parsing. The advantage is that it can be compatible with Doris grammar with only a few changes, but the disadvantage is that it will invade the Doris grammar parsing process. HplSQL performs a syntax parsing independently of Hive to implement variable substitution and other operations, and finally outputs a SQL that conforms to Hive syntax. The following is a simple syntax parsing process for select; the parsing of where, expression, table name, join, agg, order and other grammars must be re-implemented. The advantage is that it is completely independent from the original system, but the changes are too complicated. ![image](https://github.com/apache/doris/assets/13197424/7539e485-0161-44de-9100-1a01ebe6cc07)	2024-02-16 10:12:23 +08:00
Qi Chen	383850ef12	[Opt](multi-catalog) Opt split assignment to resolve uneven distribution. (#30390 ) [Opt] (multi-catalog) Opt split assignment to resolve uneven distribution. Currently only for `FileQueryScanNode`. Referring to the implementation of Trino, - Local node soft affinity optimization. Prefer local replication node. - Remote splits use the consistent hash algorithm when the file cache is turned on, and because of the possible unevenness of the consistent hash, the splits are re-adjusted so that the maximum and minimum split numbers of hosts differ by at most `max_split_num_variance` splits. - Remote splits use the round-robin algorithm when the file cache is turned off.	2024-02-04 14:28:38 +08:00
HHoflittlefish777	f35803b7a0	[feature](pipeline-load) enable pipeline load by default (#30581 )	2024-01-31 23:53:39 +08:00
amory	0f81d2d533	[FIX](complextype)fix complex type nested version type but not hide version (#30419 )	2024-01-29 19:03:47 +08:00
zhangdong	3354ac48f7	[enhance](mtmv)add version and version time for table (#30437 ) Add version to record data changes in the table Scope of impact: - Transaction related operations - drop partition - replace partition	2024-01-29 19:03:47 +08:00
lihangyu	7667fe8570	[Improve)(Variant) do not allow fall back to legacy planner (#30430 )	2024-01-29 19:02:46 +08:00
Yongqiang YANG	bfdc41d37b	[fix](ccr) handle large binlog (#30435 )	2024-01-28 18:25:31 +08:00
Mingyu Chen	5d7543b30b	[feature](ranger) Support Apache ranger for Doris (#27864 ) For usage, see: `5d340ce24f/docs/zh-CN/docs/admin-manual/privilege-ldap/ranger.md` For range-doris-plugin, see: https://github.com/morningman/ranger/tree/doris-plugin To support ranger, there are several other modifications: 1. Support `show resources like "pattern"` 2. Support `show workload group like "pattern"` 3. Support `show schemas like "pattern"`	2024-01-27 10:29:38 +08:00
yujun	3d22f9cfc8	[feature](replica) Add drop replica safely on backend (#30303 )	2024-01-25 13:24:52 +08:00
zclllyybb	2e6a00690f	[Fix](smooth-upgrade) fix unnecessary high version of smooth upgrade (#30283 ) fix unnecessary high version of smooth upgrade	2024-01-25 13:24:09 +08:00
Jibing-Li	668a68967c	[fix](statistics)Reanalyze olapTable if getRowCount is not 0 and last time row count is 0 (#30096 ) Sample analyze may write a 0 result if getRowCount is not updated while analyzing. So we need to reanalyze the table if getRowCount > 0 and the previous analyze row count is 0. Otherwise the stats for this table may stay 0 forever until the user loads new data into this table.	2024-01-19 15:48:56 +08:00
Jibing-Li	7d1b3d4704	[feature](statistics, metadata)Meta data place holder for statistics (#29867 ) Meta data place holder for statistics in version 2.1.x. Users can upgrade to this version, but rollback is not supported. After this change, statistics related functions don't need to change meta data any more in the 2.1 series.	2024-01-18 12:03:07 +08:00
amory	ade720470d	[Improve](config)delete confused config for nested complex type (#29988 )	2024-01-18 12:03:07 +08:00
zy-kkk	d658a44cef	[improvement](catalog) Change the push-down parameters of the predicate function of the table query SQL into variables (#30028 ) This PR controls whether function predicates in filter conditions are pushed down to external data source queries, changing the FE conf enable_fun_pushdown into the variable enable_ext_func_pred_pushdown	2024-01-16 21:14:35 +08:00
谢健	4e41e1d797	[feat](Nereids) persist constraint in table (#29767 )	2024-01-16 18:49:29 +08:00
deardeng	168afdb965	[fix](disk balance) Change disk rebalance unpick time to configurable (#28949 )	2024-01-16 18:49:04 +08:00
Siyang Tang	97955da749	[enhancement](fe-memory) support label num threshold to reduce fe memory consumption (#22889 ) Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>	2024-01-16 18:38:54 +08:00
Mingyu Chen	7c493b08c5	[refactor](dialect) make http sql converter plugin and audit loader as builtin plugin (#29692 ) Followup #28890. Make HttpSqlConverterPlugin and AuditLoader Doris builtin plugins, to make it simple for users to use sql dialect conversion and the audit loader. HttpSqlConverterPlugin: By default, nothing changes. There is a new global variable sql_converter_service, empty by default; if set, the HttpSqlConverterPlugin is enabled: set global sql_converter_service = "http://127.0.0.1:5001/api/v1/convert" AuditLoader: By default, nothing changes. There is a new global variable enable_audit_plugin, false by default; if set to true, the audit loader plugin is enabled. Doris creates audit_log in __internal_schema at startup. If enable_audit_plugin is true, audit logs are inserted into the audit_log table. 3 other global variables related to this plugin: audit_plugin_max_batch_interval_sec: The max interval for audit loader to insert a batch of audit log. audit_plugin_max_batch_bytes: The max batch size for audit loader to insert a batch of audit log. audit_plugin_max_sql_length: The max length of statement in audit log	2024-01-16 18:31:59 +08:00
gnehil	6598b4f7c8	[fix](http) fix exception when querying map data through http #29686 The mysql type code mapped by the map type is 400, but 400 is an unknown type for mysql. For the jdbc driver of mariadb, when querying through the http api of /api/query or using the jdbc driver of mariadb, an exception will occur. For the jdbc driver of mysql, it will be converted into binary form, and the correct data can be read through the string type. Therefore, the mysql custom type of map was removed and changed to string type, so that both the jdbc driver of mariadb and mysql can work normally.	2024-01-16 18:31:27 +08:00
Mingyu Chen	ebfbe0c8dd	[opt](information_schema) support information_schema in external catalog (#28919 ) Add `information_schema` database for all catalogs. This is useful when using BI tools to connect to Doris, as the tools can get meta info from `information_schema`. This PR mainly changes: 1. There will be an `information_schema` db in each catalog. 2. Each `information_schema` db only stores the meta info of the catalog it belongs to. 3. For `information_schema`, the `TABLE_SCHEMA` column's value is the database name. 4. There is a new global variable `show_full_dbname_in_info_schema_db`, default is false. If set to true, the `TABLE_SCHEMA` column's value is like `ctl.db`, because: When connecting to Doris, the `database` info in the connection url will be: `xxx?db=ctl.db`. And then some BI tools will try to query `information_schema` with sql like: `select * from information_schema.columns where TABLE_SCHEMA = "ctl.db"` So it has to be formatted as `ctl.db`. E.g., the `information_schema.columns` table in external catalog `doris` is like: ``` mysql> select * from information_schema.columns limit 1\G ************************* 1. row ************************* TABLE_CATALOG: doris TABLE_SCHEMA: doris.__internal_schema TABLE_NAME: column_statistics COLUMN_NAME: id ORDINAL_POSITION: 1 COLUMN_DEFAULT: NULL IS_NULLABLE: NO DATA_TYPE: varchar CHARACTER_MAXIMUM_LENGTH: 4096 CHARACTER_OCTET_LENGTH: 16384 NUMERIC_PRECISION: NULL NUMERIC_SCALE: NULL DATETIME_PRECISION: NULL CHARACTER_SET_NAME: NULL COLLATION_NAME: NULL COLUMN_TYPE: varchar(4096) COLUMN_KEY: EXTRA: PRIVILEGES: COLUMN_COMMENT: COLUMN_SIZE: 4096 DECIMAL_DIGITS: NULL GENERATION_EXPRESSION: NULL SRS_ID: NULL ``` 6. Modify the behavior of - show tables - show databases - show columns - show table status The above statements may query the `information_schema` db if there is a `where` predicate after them	2024-01-12 13:58:19 +08:00
morrySnow	e93a16ac6e	[fix](Nereids) support complex literal cast in fe (#29599 )	2024-01-12 11:59:52 +08:00
wangbo	0d691c638b	[Feature](profile)Support report runtime workload statistics #29591	2024-01-12 11:59:27 +08:00
HappenLee	463a7ab212	[Performance](exec) opt the exchange performance (#29579 )	2024-01-12 11:46:29 +08:00
Xiangyu Wang	2ca90b2bf1	[Refactor](dialect) Add sql dialect converter plugins (#28890 ) The current logic for SQL dialect conversion is all in the `fe-core` module, which may lead to the following issues: - Changes to the dialect conversion logic may occur frequently, requiring users to upgrade the Doris version frequently within the fe-core module, leading to a longer change cycle. - The cost of customized development is high, requiring users to replace the fe-core JAR package. Turning it into a plugin can address the above issues properly.	2024-01-12 11:44:20 +08:00
abmdocrt	1ea51e9f20	[Feature](group commit) Support table property "group commit data bytes" (#29484 )	2024-01-07 19:46:42 +08:00
yujun	2d89b7aed4	[fix](tablet sched) disable disk balance for single replica (#29576 )	2024-01-07 19:21:42 +08:00
xueweizhang	75efdd6e1f	[fix](http) throw RejectedExecutionException to prevent http hanging by Future (#29607 )	2024-01-06 16:17:07 +08:00
Mingyu Chen	c1ddcc5751	[opt](config) create custom conf dir if not exists (#29391 )	2024-01-05 00:14:16 +08:00
Luwei	3c6c652997	[Fix](schema change) disable convert light schema change (#28205 ) (#29300 )	2023-12-31 17:02:15 +08:00
wangbo	c3c34e10bb	[feature](executor) Add some check when create workload group/workload schedule policy (#29236 )	2023-12-29 15:41:16 +08:00
slothever	8becf053cb	[fix](multi-catalog)unsupported hive input format should throw an exception and remove useless method (#29087 ) introduce from: #28644	2023-12-28 15:43:28 +08:00
HowardQin	8a169b9906	[case](regression) Test enable pipeline load (#28172 ) Co-authored-by: qinhao <qinhao@newland.com.cn>	2023-12-28 10:49:19 +08:00
yujun	ffc6596cef	[refactor](create tablet) default create tablet round robin (#28911 )	2023-12-26 17:36:05 +08:00
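
A rough illustration of the split assignment idea described in commit 383850ef12 above. This is a minimal sketch and not the Doris implementation: the function names `assign_round_robin` and `rebalance` and the host and split labels are hypothetical, and the consistent hash step is left out. It only shows round-robin assignment and a re-adjustment pass that moves splits until the busiest and idlest hosts differ by at most `max_split_num_variance`.

```python
from itertools import cycle


def assign_round_robin(splits, hosts):
    """Assign splits to hosts in round-robin order (hypothetical helper)."""
    if not hosts:
        raise ValueError("at least one host is required")
    assignment = {host: [] for host in hosts}
    for split, host in zip(splits, cycle(hosts)):
        assignment[host].append(split)
    return assignment


def rebalance(assignment, max_split_num_variance=1):
    """Move splits from the busiest host to the idlest host until their
    split counts differ by at most max_split_num_variance.

    The input mapping is not modified; a rebalanced copy is returned.
    """
    if max_split_num_variance < 1:
        # With a variance of 0 an uneven total could never converge.
        raise ValueError("max_split_num_variance must be >= 1")
    result = {host: list(splits) for host, splits in assignment.items()}
    if not result:
        return result
    while True:
        busiest = max(result, key=lambda h: len(result[h]))
        idlest = min(result, key=lambda h: len(result[h]))
        gap = len(result[busiest]) - len(result[idlest])
        if gap <= max_split_num_variance:
            return result
        # Each move shrinks the imbalance, so the loop terminates.
        result[idlest].append(result[busiest].pop())


if __name__ == "__main__":
    # A skewed assignment, as an uneven hash might produce.
    skewed = {"be1": ["s1", "s2", "s3", "s4", "s5"], "be2": ["s6"], "be3": []}
    print(rebalance(skewed, max_split_num_variance=1))
```

The re-adjustment here is deliberately greedy, moving one split at a time. A real scheduler would presumably also weigh data locality and cache affinity, which this sketch ignores.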


404 Commits