doris

Author	SHA1	Message	Date
starocean999	e9392780a9	[fix](nereids)fix some nereids planner bugs (#19509 ) 1.some encrypt and decrypt functions have wrong blockEncryptionMode 2.topN node should compare tuples from intermediate_row_desc with first_sort_slot.tuple_id 3.must keep the limit if it's an uncorrelated in-subquery with limit on sort, like select a from t1 where a in ( select b from t2 order by xx limit yy )	2023-05-12 09:06:16 +08:00
Chuang Li	a041f8eabe	[fix](fe) Fix SimpleDateFormat thread-safety issue by replacing it with DateTimeFormatter. (#19265 ) DateTimeFormatter replaces SimpleDateFormat in the fe module because SimpleDateFormat is not thread-safe.	2023-05-11 22:50:24 +08:00
jakevin	d58498841a	[fix](Nereids) Should copy JoinReorderContext for PushdownProject (#19508 ) 1. should copy JoinReorderContext 2. verify bushy tree join reorder	2023-05-11 21:05:12 +08:00
Zhang Wenxin	35c4de9fea	[fix](Nereids) convert decimalv2 type to decimalv3 type by mistake (#19491 )	2023-05-11 19:11:51 +08:00
minghong	c5a53e0caa	[tpch](nereids) estimate cost with unknown column stats #19046 make nereids generate more reasonable plans with table row count, but without column stats. TODO: q5 and q7 are still not good because of the correlation between columns ps_suppkey and ps_partkey	2023-05-11 19:03:11 +08:00
xy720	39ec8aa64c	[refactor](complex-type) refactor array/map/struct literal to not invoke execute() function in prepare state (#19068 )	2023-05-11 18:44:37 +08:00
LiBinfeng	99cef84acf	[Feature](Nereids) Add nereids minidump (#18747 )	2023-05-11 18:36:30 +08:00
AKIRA	45c89c1d3c	[Fix](stats) Stats persistence failed when a column is all null values (#19412 )	2023-05-11 17:44:44 +08:00
Xiangyu Wang	589dd8a9b3	[Fix](multi-catalog) Fix query hms tbl with compressed data files. (#19387 ) If a submitted query reads hms tables whose data files are compressed (bz2, lzo, lz4, ...), an error like this occurs: ```[INTERNAL_ERROR]Only support csv data in utf8 codec``` . This is because `org.apache.doris.planner.external.HiveScanNode` sets `fileFormatType` to `TFileFormatType.FORMAT_CSV_PLAIN` regardless of the actual compression algorithm of the data files. This PR fixes this problem.	2023-05-11 14:53:58 +08:00
AKIRA	6d2070c59d	[enhancement](stats) Make stats cache item size configurable (#19205 )	2023-05-11 13:59:37 +08:00
jakevin	dc497e11bb	[fix](Nereids) avoid to push top Project of JoinCluster in PushdownProjectThroughJoin (#19441 ) We shouldn't push top Project of JoinCluster in PushdownProjectThroughJoin like ``` * Project (id + 1) if this project is top project of Join Cluster * \| * Join * / \ * Join Join * / .... * Join ```	2023-05-11 13:58:54 +08:00
herry2038	834bf2eab7	[feature](array) Add array_last lambda function (#18388 ) Add array_last lambda function	2023-05-11 13:15:54 +08:00
zhannngchen	5167dc1251	[feature](merge-on-write) enable merge on write by default (#19017 )	2023-05-11 11:10:48 +08:00
Ashin Gau	3ba3b6c66f	[opt](FileCache) use modification time to determine whether the file is changed (#18906 ) Get the last modification time from file status, and use the combination of path and modification time to generate cache identifier. When a file is changed, the modification time will be changed, so the former cache path will be invalid.	2023-05-11 07:50:39 +08:00
Qi Chen	4418eb36a3	[Fix](multi-catalog) Fix some hive partition issues. (#19513 ) Fix some hive partition issues. 1. Fix BE crash when using hive partition fields of `date`, `timestamp`, or `decimal` type. 2. Fix hdfs uri decode error when using a `timestamp` partition field, whose values contain URL-encoded special chars, such as `%3A` for `:`.	2023-05-11 07:49:46 +08:00
Tiewei Fang	95833426e8	[BugFix](table-value-function) Fix backends() tvf (#19452 ) Change the `Alive/SystemDecommissioned/ClusterDecommissioned` field type of the `backends()`tvf to bool	2023-05-11 07:49:27 +08:00
Jibing-Li	2d1f597413	[Fix](statistics)Fix hive table statistic bug (#19365 ) Fix hive table statistic bug. Collect table/partition level statistics.	2023-05-11 07:48:58 +08:00
Yulei-Yang	41d4ed8367	[Improvement](multicatalog) support show_partitions for hms catalog (#19242 ) * [Improvement](multicatalog) support show_partitions for hms catalog * update according review advice	2023-05-11 01:17:23 +08:00
Lei Zhang	8845c2cf44	[fix](bdbje) remove `System.exit(-1)` in BDBEnvironment.close() (#19335 ) * https://github.com/apache/doris/issues/18766	2023-05-11 01:01:38 +08:00
Xiangyu Wang	0f6c69de53	[Fix](multi-catalog) Fix sync hms event failure when FE is started soon after. (#19344 ) Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>	2023-05-11 01:00:55 +08:00
zhangdong	b129c9901b	[improvement](FQDN)Change the implementation of fqdn (#19123 ) Main changes: 1. If fqdn is enabled in the configuration file, when fe starts, localAddr will obtain the fqdn instead of the IP, and priority_networks will no longer take effect 2. The IP and host name fields of Backend and Frontend are combined into one field, host. When fqdn is enabled, it holds the host name; when not enabled, it holds the IP address 3. Communication between cluster nodes uses the fqdn directly, and the various connection pools add authentication mechanisms to prevent connection errors between nodes when the IP address behind a domain name changes 4. Polling to verify whether the IP has changed is no longer required; fqdnManager is deleted 5. FEs no longer verify the legitimacy of other nodes by obtaining the client IP; instead, the sending node declares its own identity in the HTTP request header or in the Thrift message body 6. When processing the heartbeat, if BE finds that the host it stores is inconsistent with the host stored by the master, it will change its own host after verifying the legitimacy of that host instead of directly reporting an error 7. Simplify the generation logic of the fe name Scope of influence: 1. Establishing communication connections between cluster nodes 2. Determining whether two nodes are the same through attributes such as IP 3. Printing logs 4. Information display 5. Address splicing 6. k8s deployment 7. Upgrade compatibility Test plan: 1. Change the IP addresses of fe and be while keeping the fqdn unchanged, and verify whether the cluster can read and write data normally 2. Use the master code to generate metadata, and use that metadata with the current PR to verify compatibility with the old version (upgrading is no longer supported if fqdn has been enabled before) 3. Deploy fe and be clusters using k8s to verify whether the cluster can read and write data normally 4. Upgrade old clusters according to https://doris.apache.org/zh-CN/docs/dev/admin-manual/cluster-management/fqdn?_highlight=fqdn#%E6%97%A7%E9%9B%86%E7%BE%A4%E5%90%AF%E7%94%A8fqdn 5. Use streamload to specify the fqdn of fe and be to import data separately 6. Use different users to start transactions and write data using insert statements	2023-05-11 00:44:48 +08:00
yongkang.zhong	3a22af836e	[fix](jdbc catalog) fix ClickHouse UInt64 type conversion error (#19463 ) * [fix](jdbc catalog) fix ClickHouse UInt64 type conversion error * add test case	2023-05-10 21:53:30 +08:00
starocean999	d0a8cd0fc5	[fix](nereids) dphyper join reorder may lose some join conjuncts (#19318 )	2023-05-10 19:02:35 +08:00
starocean999	337732ae01	[fix](nereids) exchange before global limit merge node is sometimes lost (#19396 ) should add an exchange node between the global and local limit	2023-05-10 17:57:21 +08:00
奕冷	894801f5ce	[feature](load-refactor) Step1: InsertStmt as facade layer and run S3/Broker Load (#19142 )	2023-05-10 17:48:50 +08:00
Mryange	d20b5f90d8	[feature](executor) Automatically set the instance_num using the info from be. (#19345 ) 1. fixed some regression cases that returned wrong results with a big instance_num due to incorrect order by. 2. if parallel_fragment_exec_instance_num is set to 0, the concurrency in the Pipeline execution engine will automatically be set to half of the number of CPU cores. 3. add a limit to parallel_fragment_exec_instance_num so that it cannot be set to more than fe.conf::max_instance_num (Default: 128) ``` mysql [(none)]>set parallel_fragment_exec_instance_num = 514; ERROR 1231 (42000): errCode = 2, detailMessage = Variable 'parallel_fragment_exec_instance_num' can't be set to the value of '514(Should not be set to more than 128)' ```	2023-05-10 17:07:41 +08:00
Gabriel	4483e3a6e1	[Improvement](scan) add a config for scan queue memory limit (#19439 )	2023-05-10 13:14:23 +08:00
yiguolei	ab8cfbbfb6	[bugfix](regression-test) add some window function test (#19460 ) Only the 2000-union case causes BE to use a lot of memory, so this PR re-enables the other tests and disables only the 2000-union case. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-05-10 12:06:02 +08:00
jakevin	553068f7be	[feat](Nereids): trace enumeration of DPHyp (#19394 )	2023-05-10 11:57:35 +08:00
ElvinWei	fae2e5fd22	[enchancement](statistics) implement automatically analyzing statistics and support table level statistics #19420 Add table level statistics, support SHOW TABLE STATS statement to show table level statistics. Implement automatically analyze statistics, support ANALYZE... WITH AUTO ... statement to automatically analyze statistics. TODO: collate relevant p0 tests Supplement the design description to README.md Issue Number: close #xxx	2023-05-10 11:47:34 +08:00
Mingyu Chen	601565341b	[fix](gson) avoid gson serde with EsRepository (#19385 ) To avoid error like: class org.apache.doris.external.elasticsearch.EsRepository declares multiple JSON fields named runnable	2023-05-10 11:37:18 +08:00
Jibing-Li	78435823b6	[Fix](multi catalog)Return all partition values while reading hive table. (#19434 ) Return all partition values while reading hive table. Add a config item for the max value of hive table to partition list cache. Default value is 100.	2023-05-10 10:55:33 +08:00
Ashin Gau	68eb420cab	[fix](MySQL) the way Doris handles boolean type is consistent with MySQL (#19416 )	2023-05-10 00:58:09 +08:00
Qi Chen	096aa25ca6	[improvement](orc-reader) Implements ORC lazy materialization (#18615 ) - Implements ORC lazy materialization, integrate with the implementation of https://github.com/apache/doris-thirdparty/pull/56 and https://github.com/apache/doris-thirdparty/pull/62. - Refactor code: Move `execute_conjuncts()` and `execute_conjuncts_and_filter_block()` in `parquet_group_reader `to `VExprContext`, used by parquet reader and orc reader. - Add session variables `enable_parquet_lazy_materialization` and `enable_orc_lazy_materialization` to control whether enable lazy materialization. - Modify `build.sh` to update apache-orc submodule or download package every time.	2023-05-09 23:33:33 +08:00
yongkang.zhong	1bc405c06f	[fix](catalog) fix doris jdbc catalog largeint select error (#19407 ) When using mysql-jdbc 5.1.47 to create a doris jdbc catalog, largeint columns cannot be selected, because mysql-jdbc converts largeint values to strings when reading them, as they are too long. mysql> select `largeint` from type3; ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Fail to convert jdbc type of java.lang.String to doris type LARGEINT on column: largeint. You need to check this column type between external table and doris table.	2023-05-09 17:34:48 +08:00
chenlinzhong	aeb3450151	[feature](graph)Support querying data from the Nebula graph database (#19209 ) Support querying data from the Nebula graph database This feature comes from the needs of commercial customers who have used Doris and Nebula, hoping to connect these two databases changes mainly include: * add New Graph Database JDBC Type * Adapt the type and map the graph to the Doris type	2023-05-09 15:30:11 +08:00
Ashin Gau	e3d4723849	[fix](JDBC) set jdbc parameters to compatible with both MySQL and Doris when reading boolean type (#19399 ) Fix errors when read boolean type from external doris cluster by jdbc catalog: ``` ERROR 1105 (HY000): errCode = 2, detailMessage = (172.16.10.11)[INTERNAL_ERROR]Fail to convert jdbc type of java.lang.Integer to doris type BOOL on column: deleted. You need to check this column type between external table and doris table. ``` MySQL Types and Return Values for GetColumnTypeName and GetColumnClassName are presented in https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-reference-type-conversions.html. However when tinyInt1isBit=false, GetColumnClassName of MySQL returns java.lang.Boolean, while that of Doris returns java.lang.Integer. In order to be compatible with both MySQL and Doris, Jdbc params should set tinyInt1isBit=true&transformedBitIsBoolean=true	2023-05-09 13:53:17 +08:00
ZashJie	4302ceaee8	[Improvement](data types) enhance show data types stmt (#18831 )	2023-05-09 09:42:44 +08:00
lvshaokang	af04c3acab	[fix](sequence-column) Fix sequence_col column used default expr insert failed (#18933 )	2023-05-08 17:18:25 +08:00
yongkang.zhong	c7a04fa05a	[improvement](JDBC Catalog)Added Presto connection to Presto/Trino (#19307 )	2023-05-08 14:05:56 +08:00
yongkang.zhong	7f0d6eb644	[log](fe)add log partitionInfo is null, fe not start service (#19143 )	2023-05-08 14:04:16 +08:00
Tiewei Fang	e78149cb65	[Enhencement](Export) add property for outfile/export and add test (#18997 ) This pr does three things: 1. add `delete_existing_files` property for outfile/export. If `delete_existing_files = true`, export/outfile will delete all files under file_path first. 2. add p2 test for export 3. modify docs	2023-05-08 14:02:20 +08:00
Ashin Gau	05c5c5949c	[refactor](FileCache) set FE session variable enable_file_cache=false as default (#19327 ) Users should set `enable_file_cache=true` in FE session variables and BE configuration to enable file cache.	2023-05-08 13:53:51 +08:00
Mingyu Chen	fb5b3029a7	[fix](meta) fix image file checksum error (#19363 )	2023-05-08 10:00:09 +08:00
yongkang.zhong	32273a7a9b	[improvement](backend)Optimized error messages for insufficient replication (#19211 ) optimized the error message for creating insufficient table replications	2023-05-07 20:45:21 +08:00
Mingyu Chen	abc73ac1eb	[refactor](cluster)(step-1) remove cluster related stmt (#19355 ) * [refactor](cluster)(step-1) remove cluster stmt	2023-05-07 18:44:42 +08:00
Yusheng Xu	9edbfa37cd	[Enhancement](Broker Load) New progress manager for showing loading progress status (#19170 ) This work is in the early stage, current progress is not accurate because the scan range will be too large for gathering information, what's more, only file scan node and import job support new progress manager ## How it works for example, when we use the following load query: ``` LOAD LABEL test_broker_load ( DATA INFILE("XXX") INTO TABLE `XXX` ...... ) ``` Initial Progress: the query will call `BrokerLoadJob` to create job, then `coordinator` is called to calculate scan range and its location. Update Progress: BE will report runtime_state to FE and FE update progress status according to jobID and fragmentID we can use `show load` to see the progress PENDING: ``` State: PENDING Progress: 0.00% ``` LOADING: ``` State: LOADING Progress: 14.29% (1/7) ``` FINISH: ``` State: FINISHED Progress: 100.00% (7/7) ``` At current time, full output of `show load\G` looks like: ``` ************************* 1. row ************************* JobId: 25052 Label: test_broker State: LOADING Progress: 0.00% (0/7) Type: BROKER EtlInfo: NULL TaskInfo: cluster:N/A; timeout(s):250000; max_filter_ratio:0.0 ErrorMsg: NULL CreateTime: 2023-05-03 20:53:13 EtlStartTime: 2023-05-03 20:53:15 EtlFinishTime: 2023-05-03 20:53:15 LoadStartTime: 2023-05-03 20:53:15 LoadFinishTime: NULL URL: NULL JobDetails: {"Unfinished backends":{"5a9a3ecd203049bc-85e39a765c043228":[10080]},"ScannedRows":39611808,"TaskNumber":1,"LoadBytes":7398908902,"All backends":{"5a9a3ecd203049bc-85e39a765c043228":[10080]},"FileNumber":1,"FileSize":7895697364} TransactionId: 14015 ErrorTablets: {} User: root Comment: ``` ## TODO: 1. The current partition granularity of scan range is too large, resulting in an uneven loading process for progress." 2. Only broker load supports the new Progress Manager, support progress for other query	2023-05-06 22:44:40 +08:00
yongkang.zhong	2fe9ba7c2a	[fix](jdbc catalog) fix trino jdbc catalog varchar type err (#19298 )	2023-05-06 17:16:28 +08:00
Gabriel	4c6ca88088	Revert "[refactor](function) ignore DST for function `from_unixtime` (#19151 )" (#19333 ) This reverts commit 9dd6c8f87b73db238bfd38fb1d76f3796910f398.	2023-05-06 16:33:58 +08:00
ElvinWei	3f6e5118e6	[enchancement](statistics) support periodic collection of statistics (#19247 ) This PR enables periodic collection of statistics and is a precursor to automatic statistics collection. It mainly includes the following contents: support periodic collection of statistics. Change the type of Date in statistics p0 to DateV2 (see [Enhancement](data-type) add FE config to prohibit create date and decimalv2 type #19077) for local testing. Complement test cases (remove Chinese characters, optimize code, etc.) and improve stability. Support setting whether to keep records of statistics synchronization job info, convenient for use in p0 testing. The statistics job table was modified, and some auxiliary judgments were added to avoid the user perceiving the modification. This function will be removed once the table schema is stable.	2023-05-06 14:53:06 +08:00
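Commit a041f8eabe replaces SimpleDateFormat with DateTimeFormatter in the FE because SimpleDateFormat is not thread-safe. The following is a minimal, hypothetical Java sketch of that general pattern, not the Doris code: the class name `DateFormatSketch`, its helper methods, and the `"yyyy-MM-dd HH:mm:ss"` pattern are illustrative assumptions. One immutable DateTimeFormatter is shared by several threads, each of which round-trips timestamps through `format` and `parse`.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not taken from the Doris code base.
public class DateFormatSketch {
    // DateTimeFormatter is immutable and thread-safe, so a single shared
    // instance needs no synchronization. A shared java.text.SimpleDateFormat
    // keeps mutable Calendar state and is not thread-safe.
    static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    public static String format(LocalDateTime t) {
        return FORMATTER.format(t);
    }

    public static LocalDateTime parse(String s) {
        return LocalDateTime.parse(s, FORMATTER);
    }

    // Formats and re-parses timestamps from several threads at once using the
    // shared formatter; returns how many values failed to round-trip.
    public static int concurrentRoundTrip(int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int seed = t;
                results.add(pool.submit(() -> {
                    int mismatches = 0;
                    for (int i = 0; i < perThread; i++) {
                        LocalDateTime in = LocalDateTime.of(
                                2023, 5, 1 + (seed + i) % 28, i % 24, i % 60, seed % 60);
                        if (!parse(format(in)).equals(in)) {
                            mismatches++;
                        }
                    }
                    return mismatches;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(format(LocalDateTime.of(2023, 5, 11, 22, 50, 24))); // prints 2023-05-11 22:50:24
        System.out.println(concurrentRoundTrip(8, 10_000));                    // prints 0
    }
}
```

Sharing one formatter is safe here only because DateTimeFormatter is immutable; sharing a single SimpleDateFormat the same way would require synchronization, a ThreadLocal, or a new instance per call.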


4573 Commits