doris

Author	SHA1	Message	Date
morrySnow	3c2e8b0ecf	[fix](Nereids) rewrite cte children check wrong map for consumer (#28220 )	2023-12-11 14:58:42 +08:00
谢健	c2d6fbbc85	[feature](Nereids): add filter edge in hyperGraph (#28006 )	2023-12-11 14:36:43 +08:00
Xiangyu Wang	f2fd66ad3b	[feature-wip](nereids) Make nereids more compatible with spark-sql syntax. (#27231 ) Thanks to PR #21855 for providing a wonderful reference. Implementing a comprehensive logical plan adapter may be very difficult and expensive, while there may be only some small syntax variations between doris and some other engines (such as hive/spark), so we can just focus on those differences here. This PR mainly focuses on the syntax differences between doris and spark-sql, for instance, doing some function transformations and overriding some syntax validations. - add a dialect named `spark_sql` - move method `NereidsParser#parseSQLWithDialect` to `TrinoParser` - extract some `FnCallTransformer`/`FnCallTransformers` classes, so we can reuse the logic of the function transformers - allow derived tables without alias when the dialect is set to `spark_sql` (both the legacy and nereids parsers are supported) - add some function transformers for hive/spark built-in functions ### Test case (from our online doris cluster) - Test derived table without alias ```sql MySQL [(none)]> show variables like '%dialect%'; +---------------+-------+---------------+---------+ \| Variable_name \| Value \| Default_Value \| Changed \| +---------------+-------+---------------+---------+ \| sql_dialect \| spark_sql \| doris \| 1 \| +---------------+-------+---------------+---------+ 1 row in set (0.01 sec) MySQL [(none)]> select * from (select 1); +------+ \| 1 \| +------+ \| 1 \| +------+ 1 row in set (0.03 sec) MySQL [(none)]> select __auto_generated_subquery_name.a from (select 1 as a); +------+ \| a \| +------+ \| 1 \| +------+ 1 row in set (0.03 sec) MySQL [(none)]> set sql_dialect=doris; Query OK, 0 rows affected (0.02 sec) MySQL [(none)]> select * from (select 1); ERROR 1248 (42000): errCode = 2, detailMessage = Every derived table must have its own alias MySQL [(none)]> ``` - Test spark-sql/hive built-in functions ```sql MySQL [(none)]> show global functions; Empty set (0.01 sec) MySQL [(none)]> show variables like '%dialect%'; +---------------+-------+---------------+---------+ \| Variable_name \| Value \| Default_Value \| Changed \| +---------------+-------+---------------+---------+ \| sql_dialect \| spark_sql \| doris \| 1 \| +---------------+-------+---------------+---------+ 1 row in set (0.01 sec) MySQL [(none)]> select get_json_object('{"a":"b"}', '$.a'); +----------------------------------+ \| json_extract('{"a":"b"}', '$.a') \| +----------------------------------+ \| "b" \| +----------------------------------+ 1 row in set (0.04 sec) MySQL [(none)]> select split("a b c", " "); +-------------------------------+ \| split_by_string('a b c', ' ') \| +-------------------------------+ \| ["a", "b", "c"] \| +-------------------------------+ 1 row in set (1.17 sec) ```	2023-12-11 11:16:53 +08:00
meiyi	1e5ff40e17	[refactor](group commit) remove future block (#27720 ) Co-authored-by: huanghaibin <284824253@qq.com>	2023-12-11 08:41:51 +08:00
Gabriel	320ddf4987	[pipelineX](improvement) Support multiple instances execution on single tablet (#28178 )	2023-12-10 20:18:41 +08:00
Gabriel	5aa90a3bce	[pipelineX](local shuffle) Fix bucket hash shuffle (#28202 )	2023-12-10 00:35:00 +08:00
Mingyu Chen	16e232a8a1	[minor](lower-table-names) use GlobalVariable.lowerCaseTableNames instead of Config.lower_case_table_names (#27911 )	2023-12-09 12:04:26 +08:00
zhiqiang	42aa174405	[chore](log) Log to trace before wait rpc timeout #28024	2023-12-09 10:04:43 +08:00
HHoflittlefish777	9d9b6462bf	[improve](group_commit) optimize group commit select be logic #28190 Group commit always chose the first non-decommissioned BE among all BEs. Choosing the BE with selectBackendIdsByPolicy, as common stream load does, while skipping decommissioned BEs may be better.	2023-12-09 05:09:52 +08:00
xueweizhang	c6f8b1b2ee	[fix](repository) the existing repo_file must contain the same name as the new repo (#27668 ) The user manually adjusted the 'name' field in the __repo_info file under the repo file on S3, but did not modify the folder name. This led to an issue when the user created a repo with the same name as the folder in a certain cluster. The system parsed the 'name' field in the existing __repo_info and used an incorrect name, causing the subsequent repo to be unusable. A check has been added here: the 'name' field in the __repo_info must be the same as the new repo's name; otherwise, an error will be reported.	2023-12-09 01:46:54 +08:00
Nitin-Kashyap	07336980f9	[fix](meta) show partitions with Limit for external HMS tables (27835) (#27835 ) This enhancement extends the existing SHOW PARTITIONS FROM logic to support: - LIMIT/OFFSET - WHERE [partition name only] [equal and LIKE operators] - ORDER BY [partition name only] Issue Number: close #27834	2023-12-09 01:44:45 +08:00
walter	99b38ddca7	[improve](env) Ensure next majority is met before dropping an alive follower (#28101 ) Here is an example: ``` mysql> ALTER SYSTEM DROP FOLLOWER "127.0.0.1:19017"; ERROR 1105 (HY000): errCode = 2, detailMessage = Unable to drop this alive follower, because the quorum requirements are not met after this command execution. Current num alive followers 2, num followers 3, majority after execution 2 ```	2023-12-09 01:41:38 +08:00
Pxl	027b06059a	[Feature](materialized-view) support count(1) on materialized view (#28135 ) Support count(1) on materialized views, and fix match failures for queries like select k1, sum(k1) from t group by k1	2023-12-09 01:36:46 +08:00
Yulei-Yang	b6e72d57c5	[Improvement](hms catalog) support show_create_database for hms catalog (#28145 ) * [Improvement](hms catalog) support show_create_database for hms catalog * update	2023-12-09 01:34:21 +08:00
Mingyu Chen	8eed760704	[fix](planner) separate table's isPartitioned() method (#28163 ) PR #27515 changed the logic of Table's `isPartitioned()` method. But this method has 2 usages: 1. To check whether a table is range or list partitioned, for some DML operations such as Alter and Export. In this case, it should return true if the table is range or list partitioned, even if it has only one partition and one bucket. 2. To check whether the data is distributed (either by partitions or by buckets), for the query planner. In this case, it should return true if the table has more than one bucket, even if the table is not range or list partitioned. So we should separate this method into 2 for the different usages; otherwise, it may cause some unreasonable plan shapes.	2023-12-08 23:15:45 +08:00
Mingyu Chen	baf85547ae	[feature](jdbc) support call function to pass sql directly to jdbc catalog #26492 Support a new stmt in Nereids: `CALL EXECUTE_STMT("jdbc", "stmt")`, so that we can pass the original stmt directly to the data source of a jdbc catalog. Showcase: ``` mysql> select * from mysql_catalog.db1.tbl1; +------+------+ \| k1 \| k2 \| +------+------+ \| 111 \| 222 \| +------+------+ 1 row in set (0.63 sec) mysql> call execute("mysql_catalog", "insert into db1.tbl1 values(1,'abc')"); Query OK, 0 rows affected (0.01 sec) mysql> select * from mysql_catalog.db1.tbl1; +------+------+ \| k1 \| k2 \| +------+------+ \| 111 \| 222 \| \| 1 \| abc \| +------+------+ 2 rows in set (0.03 sec) mysql> call execute_stmt("mysql_catalog", "delete from db1.tbl1 where k1=111"); Query OK, 0 rows affected (0.01 sec) mysql> select * from mysql_catalog.db1.tbl1; +------+------+ \| k1 \| k2 \| +------+------+ \| 1 \| abc \| +------+------+ 1 row in set (0.03 sec) ```	2023-12-08 23:06:05 +08:00
minghong	2b914aebb6	[opt](nereids)improve partition prune when Date function is used (#27960 ) date func in partition prune	2023-12-08 21:53:39 +08:00
lihangyu	06404114f1	[Fix](point query) fix memleak caused by ever-growing `scanReplicaIds` when using prepared statements (#28184 ) OlapScanNode should release memory for `scanReplicaIds`	2023-12-08 21:02:01 +08:00
Jibing-Li	5e7afa768e	[fix](statistics)Avoid potential NPE #28147	2023-12-08 20:42:17 +08:00
Sun Chenyang	573b594df3	[improvement](Variant Type) Support displaying subcolumns expanded for the variant column (#27764 )	2023-12-08 20:34:58 +08:00
zhangstar333	51f320a606	[bug](function) fix array_apply function return wrong result (#28133 )	2023-12-08 20:14:54 +08:00
zhiqiang	0931eb536c	Revert "[Improvement](auditlog) add column catalog for audit log and audit log table (#26403 )" (#28177 ) This reverts commit daea751a986823bf5858704663d58f49fd5dfb39.	2023-12-08 18:46:59 +08:00
zhangstar333	75b55f8f2f	[enhance](session) check invalid values when setting parallel instance variables (#28141 ) In some cases, an incorrect setting causes a BE core dump: 10:18:19 *** SIGFPE integer divide by zero (@0x564853c204c8) received by PID 2132555 int max_scanners = config::doris_scanner_thread_pool_thread_num / state->query_parallel_instance_num();	2023-12-08 17:38:48 +08:00
minghong	bc40025631	[opt](Nereids)Join cluster connectivity (#27833 ) * estimate join stats by connectivity	2023-12-08 14:55:10 +08:00
Xiangyu Wang	16230b5ebd	[Enhance](multi-catalog) parse hive view ddl first to avoid NPE. (#28067 )	2023-12-08 13:54:50 +08:00
minghong	61d556c718	[fix](nereids)runtime filter translator failed on set operator (#28102 ) * runtime filter translator failed on set operator	2023-12-08 12:58:42 +08:00
yujun	ebed055d2b	[chore](clone) rename clone request field (#27591 )	2023-12-08 11:53:57 +08:00
Calvin Kirs	cd108688c1	[Chore](docs)Fix job error docs (#28127 )	2023-12-08 10:24:21 +08:00
zclllyybb	25b90eb782	[Feature](function) support random int from specific range (#28076 ) mysql> select rand(-20, -10); +------------------+ \| random(-20, -10) \| +------------------+ \| -13 \| +------------------+ 1 row in set (0.10 sec)	2023-12-08 10:15:25 +08:00
walter	cbb238a0ff	[improve](env) Add disk usage in not ready msg (#28125 )	2023-12-07 22:49:52 +08:00
zclllyybb	81a0f8c041	[Feature](function) support generating const values from tvf numbers (#28051 ) If `const_value` is specified, the TVF returns a column of that constant; otherwise, it returns an incremental series as before. mysql> select * from numbers("number" = "5", "const_value" = "-123"); +--------+ \| number \| +--------+ \| -123 \| \| -123 \| \| -123 \| \| -123 \| \| -123 \| +--------+ 5 rows in set (0.11 sec)	2023-12-07 22:26:43 +08:00
seawinde	be81eb1a9b	[feature](nereids) Support inner join query rewrite by materialized view (#27922 ) Work in progress. Support inner join query rewrite by materialized view in some scenarios. For example: > mv = "select lineitem.L_LINENUMBER, orders.O_CUSTKEY " + > "from orders " + > "inner join lineitem on lineitem.L_ORDERKEY = orders.O_ORDERKEY " > query = "select lineitem.L_LINENUMBER " + > "from lineitem " + > "inner join orders on lineitem.L_ORDERKEY = orders.O_ORDERKEY "	2023-12-07 20:29:51 +08:00
morrySnow	f37215a32a	[fix](Nereids) insert into target table lock should include finalize (#28085 )	2023-12-07 20:15:12 +08:00
morrySnow	65fc2e0438	[fix](Nereids) forbid two TVF in one fragment since the limit of coordinator (#28114 )	2023-12-07 19:58:31 +08:00
lihangyu	cc9b4bcddb	[Fix](variant) fall back to non-partial update for mow table (#28116 )	2023-12-07 19:30:24 +08:00
Mingyu Chen	34642781c2	[fix](meta) fix ConcurrentModificationException when dump image (#28072 ) ``` Caused by: java.util.ConcurrentModificationException at java.util.HashMap$HashIterator.nextNode(HashMap.java:1437) ~[?:1.8.0_131] at java.util.HashMap$EntryIterator.next(HashMap.java:1471) ~[?:1.8.0_131] at java.util.HashMap$EntryIterator.next(HashMap.java:1469) ~[?:1.8.0_131] at org.apache.doris.catalog.CatalogRecycleBin.write(CatalogRecycleBin.java:1047) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.catalog.Env.saveRecycleBin(Env.java:2298) ~[doris-fe.jar:1.2-SNAPSHOT] ``` When calling the `/dump` API to dump the image, a ConcurrentModificationException may be thrown, because there is no lock protecting `CatalogRecycleBin`.	2023-12-07 18:26:02 +08:00
airborne12	394b420180	[Update](inverted index) use session variable for inverted index try query threshold (#28052 ) * [Update](inverted index) use session variable for inverted index try query threshold * remove unused config * update clucene	2023-12-07 17:54:44 +08:00
minghong	bc12a05915	[fix](Nereids) explain graph insert-select NPE (#28007 )	2023-12-07 17:25:44 +08:00
Mingyu Chen	495c01bdfd	[fix](session-variable) enable_unicode_name_support needs to be forwarded to master (#28112 ) Session variable `enable_unicode_name_support` is used to enable using unicode for table/column names. It should be forwarded to master when executing a `create table` stmt on a non-master FE. Otherwise, even if we set `enable_unicode_name_support` to true on a non-master FE, we cannot create a table with a unicode name.	2023-12-07 15:43:11 +08:00
xueweizhang	8526b9ffbe	[improvement](table property) support for alter table property disable_auto_compaction (#27961 ) In some cases, some tablets may cause a core dump or OOM during compaction, and it is necessary to manually disable the compaction of a specific table via 'disable_auto_compaction' to keep the BE service available. This commit allows modifying the disable_auto_compaction table property in schema change. --------- Signed-off-by: nextdreamblue <zxw520blue1@163.com>	2023-12-07 15:08:39 +08:00
yiguolei	8c79b86f5b	Revert "[feature](merge-on-write) enable merge-on-write by default (#27188 )" (#28096 ) This reverts commit 00c8bab84de8154052f9d323800b436cd0ad36e5.	2023-12-07 11:31:36 +08:00
Mingyu Chen	3a7a8bb107	[opt](resource-tag) root and admin user can use any resource tag by default (#28088 ) In #25331, I changed the behavior of the user's default resource tag: if a user does not set a resource tag, it can only use the default resource tag. This PR changes this logic. A normal user can only use the default resource tag if no resource tag is set, but root and admin users can use any resource tag if no resource tag is set.	2023-12-07 11:22:30 +08:00
Calvin Kirs	b6722653cf	[test](Job)Delete the JOB show syntax (now we use TVF) and add tvf case (#28058 )	2023-12-07 10:17:52 +08:00
Yulei-Yang	630de740ea	[fix](meta) fix bug for using full name in show_full_columns stmt (#28019 )	2023-12-07 10:17:25 +08:00
Jibing-Li	4cac07be30	[improvement](statistics)Analyze empty table. #28077 Analyze a table even when it's empty. The result should be like this: mysql> show column stats nation; +-------------+-------+------+----------+-----------+---------------+------+------+--------+--------------+---------+-------------+---------------------+ \| column_name \| count \| ndv \| num_null \| data_size \| avg_size_byte \| min \| max \| method \| type \| trigger \| query_times \| updated_time \| +-------------+-------+------+----------+-----------+---------------+------+------+--------+--------------+---------+-------------+---------------------+ \| n_comment \| 0.0 \| 0.0 \| 0.0 \| 0.0 \| 0.0 \| N/A \| N/A \| FULL \| FUNDAMENTALS \| MANUAL \| 0 \| 2023-12-06 19:22:09 \| \| n_nationkey \| 0.0 \| 0.0 \| 0.0 \| 0.0 \| 0.0 \| N/A \| N/A \| FULL \| FUNDAMENTALS \| MANUAL \| 0 \| 2023-12-06 19:22:09 \| \| n_regionkey \| 0.0 \| 0.0 \| 0.0 \| 0.0 \| 0.0 \| N/A \| N/A \| FULL \| FUNDAMENTALS \| MANUAL \| 0 \| 2023-12-06 19:22:09 \| \| n_name \| 0.0 \| 0.0 \| 0.0 \| 0.0 \| 0.0 \| N/A \| N/A \| FULL \| FUNDAMENTALS \| MANUAL \| 0 \| 2023-12-06 19:22:09 \| +-------------+-------+------+----------+-----------+---------------+------+------+--------+--------------+---------+----	2023-12-07 10:16:52 +08:00
Guangdong Liu	42b3dd35bb	[regression test](broker load) add case for without filepath (#27658 )	2023-12-07 10:15:37 +08:00
wuwenchi	54d062ddee	[feature](stream load) (step one)Add arrow data type for stream load (#26709 ) By using the Arrow data format, we can reduce the amount of data transferred by stream load and improve data import performance	2023-12-06 23:29:46 +08:00
yiguolei	4a4d137402	[feature](workloadgroup) support nereids internal query and all dml query (#28054 ) support binding a workload group to nereids internal queries; support binding a workload group to insert into select; support binding a workload group to create table as select; change the token wait timeout to the query timeout or queue timeout; the query queue should not be bound to the pipeline engine, so it can be used everywhere.	2023-12-06 21:07:55 +08:00
bobhan1	00c8bab84d	[feature](merge-on-write) enable merge-on-write by default (#27188 )	2023-12-06 21:06:58 +08:00
Nitin-Kashyap	0ff5a1cc25	[fix](doc) spell error and aligned with code (#27609 )	2023-12-06 20:58:39 +08:00
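
The SHOW PARTITIONS enhancement for external HMS tables (#27835) combines a WHERE on the partition name (equal or LIKE), ORDER BY partition name, and LIMIT/OFFSET. Conceptually that is filter, then sort, then page. The Java sketch below assumes the partition names have already been fetched from the metastore as plain strings; `PartitionListing` and its methods are hypothetical, LIKE escape characters are not handled, and this is not the Doris implementation.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical sketch of filtering, ordering and paging partition names; not Doris code. */
public final class PartitionListing {

    /** Translates a SQL LIKE pattern into a regex: '%' -> ".*", '_' -> "." (no escape support). */
    static Pattern likeToRegex(String like) {
        StringBuilder regex = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                // Quote every literal character so '.', '=', '-' etc. match themselves.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /**
     * Applies an optional equality filter, an optional LIKE filter, ordering by
     * partition name, and then OFFSET/LIMIT, in that order. Null filters are skipped.
     */
    static List<String> show(List<String> partitionNames, String equalTo, String like,
                             boolean descending, long offset, long limit) {
        Pattern pattern = like == null ? null : likeToRegex(like);
        Comparator<String> order = Comparator.naturalOrder();
        if (descending) {
            order = order.reversed();
        }
        return partitionNames.stream()
                .filter(name -> equalTo == null || name.equals(equalTo))
                .filter(name -> pattern == null || pattern.matcher(name).matches())
                .sorted(order)
                .skip(offset)
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("dt=2023-12-01", "dt=2023-12-03", "dt=2023-11-30", "dt=2023-12-02");
        // name LIKE 'dt=2023-12-%', ordered by name descending, OFFSET 1, LIMIT 1
        System.out.println(show(names, null, "dt=2023-12-%", true, 1, 1)); // [dt=2023-12-02]
    }
}
```

Applying the filters before sorting keeps the sort over the smallest set, and paging last matches the usual SQL evaluation order of WHERE, ORDER BY, then LIMIT/OFFSET.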

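The follower-drop guard in #28101 reports "Current num alive followers 2, num followers 3, majority after execution 2". The arithmetic can be sketched as follows, assuming a majority of n voters is floor(n/2) + 1 and that dropping an alive follower removes one voter from both the total and the alive count. `FollowerQuorumCheck` and its methods are hypothetical names for illustration, not the code in Doris's `Env`.

```java
/** Hypothetical sketch of a quorum guard before dropping an alive follower; not Doris code. */
public final class FollowerQuorumCheck {

    /** Majority of a group of n voters: strictly more than half. */
    static int majority(int numVoters) {
        return numVoters / 2 + 1;
    }

    /**
     * Returns true if, after dropping one alive follower, the remaining alive
     * followers still form a majority of the remaining followers.
     */
    static boolean canDropAliveFollower(int numFollowers, int numAliveFollowers) {
        int followersAfter = numFollowers - 1;
        int aliveAfter = numAliveFollowers - 1;
        return aliveAfter >= majority(followersAfter);
    }

    public static void main(String[] args) {
        // Numbers from the error message: 3 followers, 2 alive.
        System.out.println("majority after execution: " + majority(3 - 1)); // 2
        System.out.println("can drop: " + canDropAliveFollower(3, 2));     // false, 1 alive < 2
        // With all 3 alive, 2 alive followers remain out of 2, which meets the majority of 2.
        System.out.println("can drop: " + canDropAliveFollower(3, 3));     // true
    }
}
```

The check is evaluated on the group as it would look after the command, which is why a cluster that currently has quorum (2 of 3 alive) can still be refused.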
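
The ConcurrentModificationException in #28072 is the usual symptom of iterating a `HashMap` (here inside `CatalogRecycleBin.write`) while another thread structurally modifies it. A minimal Java sketch of the remedy, assuming the image writer takes the same monitor as the methods that add or remove entries; `LockedRecycleBin` and its methods are hypothetical stand-ins, not the real `CatalogRecycleBin` API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: mutators and the image writer share one monitor,
 * so iteration in write() never overlaps a structural modification.
 */
public final class LockedRecycleBin {
    private final Map<Long, String> idToTable = new HashMap<>();

    public synchronized void recycle(long id, String name) {
        idToTable.put(id, name);
    }

    public synchronized void erase(long id) {
        idToTable.remove(id);
    }

    public synchronized int size() {
        return idToTable.size();
    }

    /** Serializes entries while holding the same lock the mutators use. */
    public synchronized String write() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Long, String> e : idToTable.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        LockedRecycleBin bin = new LockedRecycleBin();
        Thread mutator = new Thread(() -> {
            for (long i = 0; i < 10_000; i++) {
                bin.recycle(i, "tbl" + i);
                if (i % 2 == 0) {
                    bin.erase(i);
                }
            }
        });
        mutator.start();
        // Dumping concurrently with the mutator: without the shared lock this loop
        // could throw ConcurrentModificationException, as in the stack trace above.
        for (int i = 0; i < 100; i++) {
            bin.write();
        }
        mutator.join();
        System.out.println(bin.size()); // 5000 (only odd ids remain)
    }
}
```

A coarse `synchronized` monitor is the simplest option; a read-write lock would let several dumps run in parallel, at the cost of more code.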

5450 Commits