Commit Graph

1656 Commits

Author SHA1 Message Date
f788acaa6e [fix](regression-test) fix insert overwrite case same db name issue (#19839) 2023-05-19 08:43:46 +08:00
481e9aebdb [Refactor](spark load) remove parquet scanner (#19251) 2023-05-18 19:19:13 +08:00
294599ee45 [feature](jsonb) rename JSONB type name and function name to JSON (#19774)
To be more compatible with MySQL, rename the JSONB type name and function names to JSON.

The old JSONB type name and jsonb_xx functions can still be used for backward compatibility.

The jsonb_extract function remains for now, since json_extract is already used by the JSON string functions and more work is needed to change it; it will be renamed in a follow-up.
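For illustration, a hedged sketch of the renamed surface, assuming `jsonb_parse`/`json_parse` as one affected pair:

```sql
-- New JSON-style name alongside the legacy JSONB-style name;
-- the old spelling stays valid for backward compatibility per this commit.
SELECT json_parse('{"k": 1}');
SELECT jsonb_parse('{"k": 1}');
```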
2023-05-18 16:16:52 +08:00
851886cc18 [minor](datev2) remove datev2 because datev2 is used by default (#19777) 2023-05-18 13:36:11 +08:00
f43e8cc98f [regressiontest](unionall) Regression_test_similar_query_boolean (#19553)
* regression_test_similar_query

* add the ORDER BY

* update ORDER BY to confirm correctness

---------

Co-authored-by: ZI-MA <chime316@qq.com>
2023-05-18 12:21:32 +08:00
18c1081659 [fix](nereids) fix some nereids bugs (#19711)
1. add json_unquote and json_extract functions
2. remove mv related code in visitPhysicalOlapScan
3. forbid bitmap and hll types for topn node's sort exprs
4. HashDistributionInfo of olap scan node should use the slots from the output, not the full schema
5. SelectMaterializedIndexWithoutAggregate should use the filter node's output together with the predicate to get the correct mv
6. forbid SimplifyArithmeticRule for decimal types
7. make DecimalLiteral's type and value consistent with each other if the value is decimalv2
8. json_array needs to support empty arguments
2023-05-18 11:33:56 +08:00
88ca4f3e6b [feature](like) make like/regexp usable as SQL functions (#19755) 2023-05-18 10:03:12 +08:00
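A hedged sketch of what this likely enables, assuming `like()` and `regexp()` become callable as ordinary functions:

```sql
-- Assumed function forms of the LIKE and REGEXP operators.
SELECT like('Apache Doris', 'Apache%');
SELECT regexp('Apache Doris', '^Apache');
```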
c80c4477cf [Enhancement](broker-load) broker load show stmt support display cluster name if specified (#19392) 2023-05-18 00:10:15 +08:00
97d4778ecf [enhancement](schema) dynamic_partition.time_unit support year (#19551)
dynamic_partition.time_unit now supports year
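A minimal sketch of the new unit in use (table and column names hypothetical):

```sql
-- Dynamic partitioning by YEAR, the newly supported time unit.
CREATE TABLE sales (
    dt DATE,
    amount BIGINT
)
DUPLICATE KEY(dt)
PARTITION BY RANGE(dt) ()
DISTRIBUTED BY HASH(dt) BUCKETS 4
PROPERTIES (
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "YEAR",
    "dynamic_partition.start" = "-2",
    "dynamic_partition.end" = "2",
    "dynamic_partition.prefix" = "p"
);
```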
2023-05-17 23:49:15 +08:00
60d5c82f44 [fix](tvf) fix the inconsistency between tvf backends function and show backends result (#19697) 2023-05-17 22:55:46 +08:00
67668905d6 [Improve](complex-type) add complex type support for unique tables, with regression tests #19751
add complex type support for unique tables, with regression tests
struct/map/array already support unique tables but had no regression tests
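A minimal sketch of complex-typed value columns on a unique table (names hypothetical; exact type syntax may vary by version):

```sql
-- UNIQUE KEY table with array/map/struct value columns.
CREATE TABLE events (
    id BIGINT,
    tags ARRAY<STRING>,
    props MAP<STRING, STRING>,
    detail STRUCT<code:INT, msg:STRING>
)
UNIQUE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 4
PROPERTIES ("replication_num" = "1");
```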
2023-05-17 21:32:46 +08:00
1d05feea1b [Feature](Nereids) add executable function to support fold constant for functions (#18209)
1. Add date-time functions for constant folding in Nereids.
This is the list of executable date-time functions Nereids supports so far:
- now()
- now(int)
- current_timestamp()
- current_timestamp(int)
- localtime()
- localtimestamp()
- curdate()
- current_date()
- curtime()
- current_time()
- date_{add/sub}(), {years/months/days/hours/minutes/seconds}_{add/sub}()
- datediff()
- {date/datev2}()
- {year/quarter/month/day/hour/minute/second}()
- dayof{year/month/week}()
- date_format()
- date_trunc()
- from_days()
- last_day()
- to_monday()
- from_unixtime()
- unix_timestamp()
- utc_timestamp()
- to_date()
- to_days()
- str_to_date()
- makedate()

2. Solved problems:
- enable datev2/datetimev2 by default.
- refactor Nereids foldConstantOnFE and support folding nested expressions.
- split the executable functions into multiple files for easier reading and easier addition of new functions
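A hedged sketch of the effect, assuming FE-side folding replaces the call with its literal result in the plan:

```sql
-- With foldConstantOnFE, the plan should carry the folded literal
-- (exact EXPLAIN output varies by version).
EXPLAIN SELECT days_add('2023-05-17', 1);
SELECT hours_add(now(), 1), unix_timestamp('2023-05-17 00:00:00');
```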
2023-05-17 21:26:31 +08:00
1eb929e1ca [Bugfix](Jdbc Catalog) fix data type mapping of SQLServer Catalog (#19525)
We map the `money`/`smallmoney` types of SQL Server to the decimal type of Doris.
2023-05-17 21:02:42 +08:00
30c4f25cb3 [fix](multi-catalog) verify the precision of datetime types for each data source (#19544)
Fix three bugs related to timestampv2 precision:
1. The Hive catalog doesn't set the precision of timestampv2 and can't get the precision from the hive metastore, so set the largest precision for timestampv2;
2. The Jdbc catalog uses datetimev1 to parse timestamps and converts them to timestampv2, so the precision is lost.
3. TVF doesn't use the precision from the file format's metadata.
2023-05-17 20:50:15 +08:00
05d47d43bd [Fix](Nereids) check the tableName in catalog (#19695)
# Proposed changes
In Nereids, before this PR, accessing a nonexistent table reported the following exception:
```
mysql> select * from tt;
ERROR 1105 (HY000): errCode = 2, detailMessage = Unexpected exception: null
```

After this PR, it reports the following instead:
```
mysql> select * from tt;
ERROR 1105 (HY000): errCode = 2, detailMessage = Unexpected exception: Table [tt] does not exist in database [default_cluster:test].
```

## Problem summary
This is because in this [function](f5af07f7b2/fe/fe-core/src/main/java/org/apache/doris/nereids/CascadesContext.java (L328)) we ignore the exception, so the size of `tables` in `CascadesContext` is zero rather than null, and `table = cascadesContext.getTableByName(tableName);` can only return null.
2023-05-17 19:48:30 +08:00
9b6b847745 [test](Nereids) disable Nereids explicitly on explain case in regression test (#19744) 2023-05-17 19:26:30 +08:00
bee2e2964f [refactor](Nereids) refactor adjust nullable rule as a custom rewriter (#19702)
use a custom rewriter for adjust-nullable, to avoid nullability being changed in an expression but not in the output
2023-05-17 19:24:42 +08:00
ce12cf404c [bugfix](inverted index) Fix mv inheriting unexpectedly inverted index of base table (#19722) 2023-05-17 17:18:07 +08:00
3e661a30c2 [fix](planner)just return non-empty side of ExprSubstitutionMap if one of ExprSubstitutionMap is empty (#19600) 2023-05-17 15:06:43 +08:00
48ec530d2c [fix](functions) fix least/greatest function coredump bug (#19462)
fix least/greatest function coredump bug
2023-05-17 14:12:52 +08:00
1462e44162 [Bug](topn) fix rowid fetcher merge with empty block (#19712) 2023-05-17 10:56:32 +08:00
Pxl
d784c99360 [Bug](planner) fix unassigned conjunct assigned on wrong node (#19672)
* fix unassigned conjunct assigned on wrong node
2023-05-17 10:28:22 +08:00
Pxl
7f73749b88 [Bug](pipeline) fix distributionColumnIds not updated correct when outputColumnUnique… (#19704)
fix distributionColumnIds not updated correctly when outputColumnUnique
2023-05-17 00:13:10 +08:00
325a1d4b28 [vectorized](function) support array_count function (#18557)
support the array_count function.
array_count: returns the number of non-zero and non-null elements in the given array.
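A minimal usage sketch based on that definition:

```sql
-- Counts elements that are neither zero nor null; expected result: 2.
SELECT array_count([0, 1, 2, NULL]);
```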
2023-05-16 17:00:01 +08:00
e22f5891d2 [WIP](row store) two phase opt read row store (#18654) 2023-05-16 13:21:58 +08:00
9535ed01aa [feature](tvf) Support compress file for tvf hdfs() and s3() (#19530)
We support this by adding a new property for the tvf, like:

`select * from hdfs("uri" = "xxx", ..., "compress_type" = "lz4", ...)`

Users can:

- Specify the compression explicitly by setting `"compress_type" = "xxx"`.
- Let Doris infer the compression type from the file name suffix (e.g. `file1.gz`).

Currently we only support reading compressed files in `csv` format, and the BE side already supports this.
All that remains is to analyze the `"compress_type"` property on the FE side and pass it to the BE.
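A hedged sketch of the suffix-inference path described above (URI hypothetical):

```sql
-- No "compress_type" given; the .gz suffix lets Doris infer the codec.
SELECT * FROM hdfs(
    "uri" = "hdfs://nn:8020/path/file1.gz",
    "format" = "csv"
);
```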
2023-05-16 08:50:43 +08:00
c87e78dc35 [bug](jsonb) fix jsonb query bug when the JSON key contains "." (#19185)
Issue Number: close #19173

```
mysql> SELECT jsonb_extract('{"a.b.c":{"k1":"v31", "k2.a1": 300},"a":"opentelemetry"}', '$."a.b.c".k1');
+-------------------------------------------------------------------------------------------+
| jsonb_extract('{"a.b.c":{"k1":"v31", "k2.a1": 300},"a":"opentelemetry"}', '$."a.b.c".k1') |
+-------------------------------------------------------------------------------------------+
| "v31"                                                                                     |
+-------------------------------------------------------------------------------------------+
1 row in set (0.06 sec)
```
2023-05-15 15:43:12 +08:00
052c7cff89 [Fix](Planner) fix cast from decimal to boolean (#19585) 2023-05-15 15:13:16 +08:00
6748ae4a57 [Feature] Collect query hit statistics (#18805)
1. Show the query hit statistics for `baseall`

   ```sql
    MySQL [test_query_db]> show query stats from baseall;
    +-------+------------+-------------+
    | Field | QueryCount | FilterCount |
    +-------+------------+-------------+
    | k0    | 0          | 0           |
    | k1    | 0          | 0           |
    | k2    | 0          | 0           |
    | k3    | 0          | 0           |
    | k4    | 0          | 0           |
    | k5    | 0          | 0           |
    | k6    | 0          | 0           |
    | k10   | 0          | 0           |
    | k11   | 0          | 0           |
    | k7    | 0          | 0           |
    | k8    | 0          | 0           |
    | k9    | 0          | 0           |
    | k12   | 0          | 0           |
    | k13   | 0          | 0           |
    +-------+------------+-------------+
    14 rows in set (0.002 sec)

    MySQL [test_query_db]> select k0, k1,k2, sum(k3) from baseall  where k9 > 1 group by k0,k1,k2;
    +------+------+--------+-------------+
    | k0   | k1   | k2     | sum(`k3`)   |
    +------+------+--------+-------------+
    |    0 |    6 |  32767 |        3021 |
    |    1 |   12 |  32767 | -2147483647 |
    |    0 |    3 |   1989 |        1002 |
    |    0 |    7 | -32767 |        1002 |
    |    1 |    8 |    255 |  2147483647 |
    |    1 |    9 |   1991 | -2147483647 |
    |    1 |   11 |   1989 |       25699 |
    |    1 |   13 | -32767 |  2147483647 |
    |    1 |   14 |    255 |         103 |
    |    0 |    1 |   1989 |        1001 |
    |    0 |    2 |   1986 |        1001 |
    |    1 |   15 |   1992 |        3021 |
    +------+------+--------+-------------+
    12 rows in set (0.050 sec)

    MySQL [test_query_db]> show query stats from baseall;
    +-------+------------+-------------+
    | Field | QueryCount | FilterCount |
    +-------+------------+-------------+
    | k0    | 1          | 0           |
    | k1    | 1          | 0           |
    | k2    | 1          | 0           |
    | k3    | 1          | 0           |
    | k4    | 0          | 0           |
    | k5    | 0          | 0           |
    | k6    | 0          | 0           |
    | k10   | 0          | 0           |
    | k11   | 0          | 0           |
    | k7    | 0          | 0           |
    | k8    | 0          | 0           |
    | k9    | 1          | 1           |
    | k12   | 0          | 0           |
    | k13   | 0          | 0           |
    +-------+------------+-------------+
    14 rows in set (0.001 sec)
   ```

2. Show the query hit statistics summary for all the mv in a table

   ```sql
   MySQL [test_query_db]> show query stats from baseall all;
    +-----------+------------+
    | IndexName | QueryCount |
    +-----------+------------+
    | baseall   | 1          |
    +-----------+------------+
    1 row in set (0.005 sec)
   ```

3. Show the query hit statistics detail info for all the mv in a table

   ```sql
    MySQL [test_query_db]> show query stats from baseall all verbose;
    +-----------+-------+------------+-------------+
    | IndexName | Field | QueryCount | FilterCount |
    +-----------+-------+------------+-------------+
    | baseall   | k0    | 1          | 0           |
    |           | k1    | 1          | 0           |
    |           | k2    | 1          | 0           |
    |           | k3    | 1          | 0           |
    |           | k4    | 0          | 0           |
    |           | k5    | 0          | 0           |
    |           | k6    | 0          | 0           |
    |           | k10   | 0          | 0           |
    |           | k11   | 0          | 0           |
    |           | k7    | 0          | 0           |
    |           | k8    | 0          | 0           |
    |           | k9    | 1          | 1           |
    |           | k12   | 0          | 0           |
    |           | k13   | 0          | 0           |
    +-----------+-------+------------+-------------+
    14 rows in set (0.017 sec)
   ```

4. Show the query hit statistics for a database

   ```sql
    MySQL [test_query_db]> show query stats for test_query_db;
    +----------------------------+------------+
    | TableName                  | QueryCount |
    +----------------------------+------------+
    | compaction_tbl             | 0          |
    | bigtable                   | 0          |
    | empty                      | 0          |
    | tempbaseall                | 0          |
    | test                       | 0          |
    | test_data_type             | 0          |
    | test_string_function_field | 0          |
    | baseall                    | 1          |
    | nullable                   | 0          |
    +----------------------------+------------+
    9 rows in set (0.005 sec)
   ```

5. Show query hit statistics for all the databases

   ```sql
    MySQL [(none)]> show query stats;
    +-----------------+------------+
    | Database        | QueryCount |
    +-----------------+------------+
    | test_query_db   | 1          |
    +-----------------+------------+
     1 row in set (0.005 sec)
   ```
2023-05-15 10:56:34 +08:00
92bf485abd [Bug] Fix doris pipeline shared scan and top n opt (#19599) 2023-05-15 10:00:44 +08:00
0068828a94 [Feature](insert) support insert overwrite stmt (#19616) 2023-05-14 20:01:30 +08:00
91cdb79d89 [Bugfix](Outfile) fix exporting data to parquet and orc file formats (#19436)
1. Support exporting the `LARGEINT` data type to the parquet/orc file formats.
2. Export the Doris `DATE/DATETIME` types to the `Date/Timestamp` logical types of the parquet file format.
3. Fix incorrect data when DATE type data is exported to ORC.
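A hedged sketch of the export path this touches (path hypothetical):

```sql
-- LARGEINT, DATE and DATETIME columns exported to parquet;
-- DATE/DATETIME map to the Date/Timestamp logical types.
SELECT id, dt, ts FROM t
INTO OUTFILE "file:///tmp/export/result_"
FORMAT AS PARQUET;
```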
2023-05-13 22:39:24 +08:00
c37d781942 [enhancement](statistics) manually inject table level statistics (#19495)
Supports manually injecting table-level statistics.

table stats type:
- row_count

Modify table or partition statistics:
```SQL
ALTER TABLE table_name SET STATS ('k1' = 'v1', ...) 
```
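For illustration, a concrete instance using the one supported key, `row_count` (table name hypothetical):

```sql
-- Hedged sketch: inject a table-level row count by hand.
ALTER TABLE sales SET STATS ('row_count' = '1000000');
```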

TODO:
- support other table stats type if necessary
- update statistics cache if necessary
2023-05-12 17:03:12 +08:00
a1da57c63e [opt](Nereids)(WIP) optimize agg and window normalization step 2 #19305
1. refactor aggregate normalization to avoid data amplification before aggregation
2. remove useless aggregate processing in ExtractAndNormalizeWindowExpression
3. only push down distinct aggregate function children

TODO:
1. push down redundant expression in aggregate functions
2. refactor normalize repeat rule
3. move expression normalization and optimization after plan normalization to avoid unexpected expression optimization.
2023-05-12 14:00:13 +08:00
e9392780a9 [fix](nereids)fix some nereids planner bugs (#19509)
1. some encrypt and decrypt functions have the wrong blockEncryptionMode
2. topN node should compare tuples from intermediate_row_desc with first_sort_slot.tuple_id
3. must keep the limit if it's an uncorrelated in-subquery with a limit on sort, like `select a from t1 where a in (select b from t2 order by xx limit yy)`
2023-05-12 09:06:16 +08:00
39ec8aa64c [refactor](complex-type) refactor array/map/struct literal to not invoke execute() function in prepare state (#19068) 2023-05-11 18:44:37 +08:00
ed8a4b4120 [feature-wip](duplicate_no_keys) skip sort function if the table is duplicate without keys (#19483) 2023-05-11 14:44:16 +08:00
dc497e11bb [fix](Nereids) avoid pushing the top Project of a Join Cluster in PushdownProjectThroughJoin (#19441)
We shouldn't push the top Project of a Join Cluster in PushdownProjectThroughJoin, like:

```
Project (id + 1)   <- if this Project is the top project of the Join Cluster
  |
 Join
 /    \
Join  Join
 /  ....
Join
```
2023-05-11 13:58:54 +08:00
834bf2eab7 [feature](array) Add array_last lambda function (#18388)
Add array_last lambda function
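A minimal usage sketch, assuming the same lambda-first argument order as the other array lambda functions:

```sql
-- Returns the last element for which the predicate holds; expected: 4.
SELECT array_last(x -> x > 2, [1, 2, 3, 4]);
```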
2023-05-11 13:15:54 +08:00
5167dc1251 [feature](merge-on-write) enable merge on write by default (#19017) 2023-05-11 11:10:48 +08:00
71f7e9e185 [test](cast func) add test for cast float text to int when nereids is on #19517 2023-05-11 08:24:54 +08:00
4418eb36a3 [Fix](multi-catalog) Fix some hive partition issues. (#19513)
Fix some hive partition issues.
1. Fix a BE crash when using hive partition fields of `date`, `timestamp`, or `decimal` type.
2. Fix an hdfs uri decode error when using a `timestamp` partition field, which causes url-encoding of special chars, such as `%3A` for `:`.
2023-05-11 07:49:46 +08:00
95833426e8 [BugFix](table-value-function) Fix backends() tvf (#19452)
Change the `Alive/SystemDecommissioned/ClusterDecommissioned` field types of the `backends()` tvf to bool
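A hedged sketch of why the bool type helps (column name per the commit):

```sql
-- With Alive as a real bool, it can be filtered without string compares.
SELECT * FROM backends() WHERE Alive = true;
```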
2023-05-11 07:49:27 +08:00
68505a1192 [Test](multi catalog)Add test case for Iceberg External Table. #19488 2023-05-11 01:13:40 +08:00
47edc5a06e [fix](functions) Support nullable column for multi_string functions (#19498) 2023-05-11 01:13:13 +08:00
b129c9901b [improvement](FQDN)Change the implementation of fqdn (#19123)
Main changes:
1. If FQDN is enabled in the configuration file, localAddr obtains the FQDN instead of the IP when FE starts, and priority_networks no longer takes effect
2. The IP and host name fields of Backend and Frontend are combined into one field, host. When FQDN is enabled it holds the host name; otherwise it holds the IP address
3. Communication between cluster nodes uses FQDN directly, and the various connection pools add authentication mechanisms so that connections between nodes do not break when the IP address behind a domain name changes
4. Polling to check whether the IP has changed is no longer required; fqdnManager is removed
5. Change the way FEs verify each other's legitimacy: instead of inspecting the client IP, the sending node carries its own identity in the HTTP request header or the message body
6. When processing a heartbeat, if a BE finds that its stored host is inconsistent with the host stored by the master, it verifies the host's legitimacy and then updates its own host instead of directly reporting an error
7. Simplify the FE name generation logic

Scope of influence:
1. Establishing communication connections between cluster nodes
2. Determining whether two entries refer to the same node via attributes such as IP
3. Log printing
4. Information display
5. Address concatenation
6. k8s deployment
7. Upgrade compatibility

Test plan:
1. Change the IP addresses of FE and BE while keeping the FQDN unchanged, and verify that the cluster can still read and write data normally
2. Generate metadata with the master code, then run this PR against that metadata to verify compatibility with the old version (upgrading is no longer supported if FQDN was already enabled)
3. Deploy FE and BE clusters with k8s and verify that the cluster can read and write data normally
4. Upgrade an old cluster following https://doris.apache.org/zh-CN/docs/dev/admin-manual/cluster-management/fqdn?_highlight=fqdn#%E6%97%A7%E9%9B%86%E7%BE%A4%E5%90%AF%E7%94%A8fqdn
5. Use streamload with the FQDNs of FE and BE to import data separately
6. Use different users to start transactions and write data with insert statements
2023-05-11 00:44:48 +08:00
3a22af836e [fix](jdbc catalog) fix ClickHouse uint64 type conversion error (#19463)
* [fix](jdbc catalog) fix ClickHouse uint64 type conversion error

* add test case
2023-05-10 21:53:30 +08:00
d0a8cd0fc5 [fix](nereids) dphyper join reorder may lost some join conjuncts (#19318) 2023-05-10 19:02:35 +08:00
337732ae01 [fix](nereids) lost exchange before global limit merge node sometimes (#19396)
an exchange node should be added between the global and local limit
2023-05-10 17:57:21 +08:00
894801f5ce [feature](load-refactor) Step1: InsertStmt as facade layer and run S3/Broker Load (#19142) 2023-05-10 17:48:50 +08:00