doris

Author	SHA1	Message	Date
924060929	16c218fde5	[feature](nereids) support bind external relation out of Doris fe environment (#21123 ) support bind external relation out of Doris fe environment, for example, analyze sql in other java application. see BindRelationTest.bindExternalRelation.	2023-06-29 14:29:29 +08:00
Jibing-Li	3a12b67517	[Improvement](statistics, multi catalog)Implement hive table statistic connector (#21053 ) This PR adds the function for collecting hive statistics. When the CBO fetches hive table statistics, the statistic cache first loads from the internal stats olap table. If nothing is found, it uses this PR's function to fetch from the remote Hive metastore.	2023-06-29 10:50:54 +08:00
Pxl	45f1909bc3	[Bug](lateral-view) make lateral view function's nullable mode work (#21242 ) make lateral view function's nullable mode work	2023-06-29 10:50:07 +08:00
Calvin Kirs	30b1b93353	[dependency](fe)Dependency version upgrade (#21191 ) Keep hadoop-aliyun version consistent with hadoop main version (3.3.5); upgrade jackson to 2.14.3; upgrade netty version to 4.1.94.final; binding check.framework version to 3.32.0; upgrade snappy-java to 1.1.10.1; upgrade hudi version to 0.13.1; upgrade spring version to 2.7.13; upgrade orc version to 1.8.4; revert nonsensical changes	2023-06-29 10:01:33 +08:00
morrySnow	64ffb06a79	[fix](Nereids) olap scan should not be gather since coordinator could not process (#21298 ) In PR #21168 , we refactored physical properties and the translator to avoid generating useless exchanges. An olap scan node could be gather in Nereids but be translated to hash partitioned. Since the coordinator cannot process a gather olap scan node, we remove the candidate distribution spec of olap scan.	2023-06-29 09:12:08 +08:00
Mingyu Chen	9af714bceb	[fix](catalog) disable FileSystem Cache to avoid too many fs cache (#21283 ) When creating a new hive catalog or refreshing the hive catalog, it will refresh the HiveMetaStore cache. And it will call "FileInputFormat.setInputPaths()". In this method, it will create a new FileSystem instance and store it in FileSystem's cache. So if the catalog is refreshed frequently, there will be too many FileSystem instances in the cache, causing OOM. This PR disables the FileSystem Cache.	2023-06-29 09:06:00 +08:00
Xiangyu Wang	884c908e25	[Enhancement](multi-catalog) try to reuse existing ugi. (#21274 ) Try to reuse an existing ugi at DFSFileSystem; otherwise, if we query an hms table with more than ten thousand partitions, we will do more than ten thousand login operations, and each login operation costs hundreds of ms in my test. Co-authored-by: 王翔宇 <wangxiangyu@360shuke.com>	2023-06-29 09:04:59 +08:00
zy-kkk	449c8d4568	[fix](jdbc) Handling Zero DateTime Values in Non-nullable Columns for JDBC Catalog Reading MySQL (#21296 )	2023-06-28 22:51:17 +08:00
Kang	e7dd65f551	[fix](test) fix PlannerTest testEliminatingSortNode (#21112 ) testEliminatingSortNode needs to check whether a SortNode exists in the plan tree, so it should check plan1.contains("order by:") rather than plan1.contains("SORT INFO:") or plan1.contains("SORT LIMIT:").	2023-06-28 21:29:23 +08:00
DongLiang-0	a6b51ec19a	[Feature](avro) Support Apache Avro file format (#19990 ) Support reading avro files via hdfs() or s3(). ```sql select * from s3( "uri" = "http://127.0.0.1:9312/test2/person.avro", "ACCESS_KEY" = "ak", "SECRET_KEY" = "sk", "FORMAT" = "avro"); +--------+--------------+-------------+-----------------+ \| name \| boolean_type \| double_type \| long_type \| +--------+--------------+-------------+-----------------+ \| Alyssa \| 1 \| 10.0012 \| 100000000221133 \| \| Ben \| 0 \| 5555.999 \| 4009990000 \| \| lisi \| 0 \| 5992225.999 \| 9099933330 \| +--------+--------------+-------------+-----------------+ select * from hdfs( "uri" = "hdfs://127.0.0.1:9000/input/person2.avro", "fs.defaultFS" = "hdfs://127.0.0.1:9000", "hadoop.username" = "doris", "format" = "avro"); +--------+--------------+-------------+-----------+ \| name \| boolean_type \| double_type \| long_type \| +--------+--------------+-------------+-----------+ \| Alyssa \| 1 \| 8888.99999 \| 89898989 \| +--------+--------------+-------------+-----------+ ``` The current avro reader only supports common data types; complex data types will be supported later.	2023-06-28 21:15:35 +08:00
yiguolei	325504deeb	[bugfix](recover) do not need dynamic partition recover except olap table (#21290 ) Introduced by #19031 . FE could no longer recover because the code contains a convert-to-olap-table operation. But many table types are not olap tables, such as view, jdbc table ... The conversion fails and FE does not start correctly. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-06-28 19:56:17 +08:00
starocean999	016870b673	[opt](nereids) use Expression's isConstant to check whether it could be removed from group by key (#21195 )	2023-06-28 19:12:36 +08:00
xzj7019	76620c21aa	[improvement](nereids) prune hash join output slot ids list (#20789 ) 1. prune hash join output slot ids list based on slot ids in required project and other conjunctions, to reduce the be side effort. 2. support pruning for semi/anti also	2023-06-28 17:28:18 +08:00
morrySnow	7588abe76b	[refactor](Nereids) refactor physical properties and plan translator (#21168 ) This PR: 1. refactors physical properties, the property deriver and the property regulator to ensure Nereids can generate plans with sufficient PhysicalDistribute. 2. refactors PhysicalPlanTranslator to ensure all ExchangeNodes are generated by PhysicalDistribute, except for CTEConsumer. We will refactor all CTE-related nodes later. The detailed changes of this PR: 1. update DistributionSpec of physical properties: - Any: random distribution, used in output and require - StorageAny: random distribution but constrained by where the data is stored, used in output - ExecutionAny: random distribution to represent random shuffle, used in output - Gather: gather distribution, used in output and require - StorageGather: gather distribution but constrained by where the data is stored, used in output - Replicated: broadcast distribution - Hash: bucket distribution 2. update shuffle type of DistributionSpecHash - REQUIRE: used in require - NATURAL: distribution as storage engine hash algorithm, constrained by where the data is stored - STORAGE_BUCKETED: distribution as storage engine hash algorithm - EXECUTION_BUCKETED: distribution as execution engine hash algorithm 3. update HideOneRowRelationUnderSetOperation to MergeOneRowRelationIntoSetOperation 4. update property deriver of SetOperation to ensure a suitable PhysicalDistribute is added above and below SetOperation 5. refactor PhysicalPlanTranslator to ensure no unplanned exchange node will be added	2023-06-28 15:15:11 +08:00
Jack Drogon	08fe22cb0c	[improvement](backup) Add BackupJobInfo with tableCommitSeqMap (#21255 ) Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>	2023-06-28 11:10:12 +08:00
bingquanzhao	853fa5f688	[typo](nativeInsertStmt) fix object-stored column exception description (#21221 )	2023-06-28 10:12:55 +08:00
amory	b1e973b721	[Improve](func)support array to window-func first-last-value arg type (#21201 ) * support array to window-func first-last-value arg type * add regress test for first-last-value of array type * update * format be:	2023-06-28 10:02:00 +08:00
bingquanzhao	98b2bc87b5	[typo](MultiPartitionDesc) fix Multi partition time interval exception description (#21222 )	2023-06-28 00:42:25 +08:00
zy-kkk	d871df64ca	[improvement](oracle jdbc)Support for automatically obtaining the precision of the oracle timestamp type (#21252 )	2023-06-28 00:19:01 +08:00
YueW	92882ebd91	[fix](inverted index) update output rowset index meta with input rowset when drop inverted index (#21248 )	2023-06-27 23:54:35 +08:00
Gabriel	5506faa7b4	[datetimev2](minor) Add scale parameter for datetimev2 (#21176 )	2023-06-27 19:55:35 +08:00
AKIRA	acba8648a5	[enhancement](nereids) Add log for stats (#21164 ) 1. LOG sql when analyze fails 2. Return directly for analyze_test suite when there is more than one frontend 3. Set query_timeout for tpcds suites to avoid unnecessary failures caused by analyze sync	2023-06-27 19:17:22 +08:00
wangbo	7d22910fbd	[improvement](workloadgroup)add check when drop/set workload group (#21174 ) 1. Check that the group exists when setting a group for a user property; e.g., if g1 does not exist, the set operation should fail. mysql [test]>SET PROPERTY FOR 'root' 'default_workload_group' = 'g1'; ERROR 1105 (HY000): errCode = 2, detailMessage = workload group g1 not exists 2. Check whether the group is used by a user when dropping a group; e.g., if a group is set for root, the drop should fail. mysql [test]>drop workload group test_g1; ERROR 1105 (HY000): errCode = 2, detailMessage = workload group test_g1 is set for user root	2023-06-27 18:10:32 +08:00
mch_ucchi	64a1eb77f0	[opt](planner) support delete with a subquery in predicate by constructing an insert. (#20983 ) A complex predicate in a delete stmt like: ```sql delete from t1 where t1.id in (select id from t2); ``` will be rewritten into an insert stmt. ```sql insert into t1(id, __DORIS_DELETE_SIGN__) select id, 1 from t1 where id in (select id from t2); ```	2023-06-27 17:51:13 +08:00
starocean999	c52c73c1c6	[fix](nereids)return original expr if cast to decimal literal overflow (#21189 )	2023-06-27 17:25:04 +08:00
starocean999	84554ec0fd	[fix](planner) the resultExprs should be substituted using table function node's outputSmap (#21182 )	2023-06-27 17:19:49 +08:00
zhangdong	7b93b26b8c	[feature-wip](MTMV) optimize lock of mtmv job & task, to avoid deadlock (#21054 )	2023-06-27 16:23:50 +08:00
luozenglin	efcc65a0d3	[feature-wip](workload-group) Support for workload group Authentication (#20242 )	2023-06-27 09:57:18 +08:00
zy-kkk	c9306e9c48	[improvement](ms jdbc)Support for automatically obtaining the precision of the sqlserver datetime type (#21145 )	2023-06-26 23:10:46 +08:00
minghong	095550271b	[fix](nereids) set proper sort info to scan node to enable TopN-opt (#21148 )	2023-06-26 19:54:37 +08:00
YueW	c19e35116b	[fix](inverted index)fix transaction id not unique for one index change job when light index change (#21180 )	2023-06-26 19:54:05 +08:00
lihangyu	50c1d55769	[Improve](dynamic schema) support filtering invalid data (#21160 ) * [Improve](dynamic schema) support filtering invalid data 1. Support dynamic schema to filter illegal data. 2. Expand the regular expression for ColumnName to support more column names. 3. Be compatible with PropertyAnalyzer and support legacy tables. 4. Disable parsing of multi-dimension arrays by default, since some bugs are unresolved	2023-06-26 19:32:43 +08:00
zy-kkk	9c5a0cc471	[bug](jdbc catalog) fix getPrimaryKeys fun bug (#21137 )	2023-06-26 17:13:50 +08:00
jakevin	cdc2d42c3a	[refactor](Nereids): adjust order of rewrite rules. (#21133 ) Put the rules that eliminate plans in front so they do not block other rules, which avoids invoking pushdown filter/limit again	2023-06-26 16:47:33 +08:00
starocean999	f2ed1bce1a	[fix](nereids)change PushdownFilterThroughProject post processor from bottom up to top down rewrite (#21125 ) 1. pass physicalProperties in withChildren function 2. use top down traverse in PushdownFilterThroughProject post processor	2023-06-26 15:34:41 +08:00
slothever	2b3c82f57a	[fix](multi-catalog)fix max compute scanner OOM and datetime (#20957 ) 1. Fix MC jni scanner OOM 2. add the second datetime type for MC SDK timestamp 3. make s3 uri case insensitive by the way 4. optimize max compute scanner parallel model	2023-06-26 13:53:29 +08:00
slothever	d4240ac21b	[fix](multi-catalog)add oss sdk, supported oss properties (#21029 )	2023-06-26 13:00:44 +08:00
caiconghui	f8ef4ed18f	[fix](log4j) fix some issues when modify log config (#21099 ) Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2023-06-26 08:46:33 +08:00
Pxl	0122aa79df	[Chore](vectorized) remove all isVectorized (#21076 ) isVectorized is always true now	2023-06-25 23:13:34 +08:00
starocean999	58b3e5ebdb	[fix](nereids)scan node's smap should use materialized slots and project list as left and right expr list (#21142 )	2023-06-25 22:34:43 +08:00
yuxuan-luo	8f7a62c79b	[improvement](multi-catalog) PaimonColumnValue support short and Decimal (#20723 )	2023-06-25 22:31:38 +08:00
Xiangyu Wang	2c2d56e8a0	[Feature](broker-load) Add priority info for ShowLoadStmt. (#20984 ) Following pr #20628 , add priority information of the load job.	2023-06-25 22:11:21 +08:00
yiguolei	64790a3a86	[bugfix](workloadgroup) could not upgrade from 2.0 alpha (#21149 ) --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-06-25 22:02:53 +08:00
minghong	2d1163c4d8	[refactor](nereids) update Agg stats derive method #21036 This pr has no effect on tpch queries. Some tpcds queries are impacted. They are 4/11/23/24/47/51/57/65/74, in which 4 and 51 are improved	2023-06-25 21:47:32 +08:00
minghong	34b048a2bd	[fix](nereids) update outer join estimation #21126 the row count of left outer join should be no less than left child row count.	2023-06-25 21:37:55 +08:00
Xiangyu Wang	af2b67e65a	[Fix](multi-catalog) Invalidate cache when enable auto refresh catalog. (#21070 ) The default value of RefreshCatalogStmt.invalidCache is false now, but the RefreshManager.RefreshTask does not invoke RefreshCatalogStmt.analyze() so it will not invalidate the cache. This PR mainly fixes this problem	2023-06-25 19:14:44 +08:00
AKIRA	638aa41988	[fix](planner) fix push filter through agg #21080 In the previous implementation, the check for groupby exprs was ignored. Add this necessary check to make sure it works. You can reproduce it by running the SQL below: CREATE TABLE t_push_filter_through_agg (col1 varchar(11451) not null, col2 int not null, col3 int not null) UNIQUE KEY(col1) DISTRIBUTED BY HASH(col1) BUCKETS 3 PROPERTIES( "replication_num"="1" ); CREATE VIEW `view_i` AS SELECT `b`.`col1` AS `col1`, `b`.`col2` AS `col2` FROM ( SELECT `col1` AS `col1`, sum(`cost`) AS `col2` FROM ( SELECT `col1` AS `col1`, sum(CAST(`col3` AS INT)) AS `cost` FROM `t_push_filter_through_agg` GROUP BY `col1` ) a GROUP BY `col1` ) b; SELECT SUM(`total_cost`) FROM view_a WHERE `dt` BETWEEN '2023-06-12' AND '2023-06-18' LIMIT 1;	2023-06-25 19:14:20 +08:00
starocean999	b6c9feb458	[fix](nereids) check table privilege when it's needed (#21130 ) check privilege on LogicalOlapScan, LogicalEsScan, LogicalFileScan and LogicalSchemaScan	2023-06-25 18:35:39 +08:00
Siyang Tang	46f0295b78	[feature](load-refactor-with-tvf) S3 load with S3 tvf and native insert (#19937 )	2023-06-25 17:45:31 +08:00
AKIRA	771b0cbb4c	[fix](stats) Update analyze task execute time (#21026 ) Before this PR, last_execute_time of pending analyze jobs would be 1970-01-01; you can reproduce it by running show analyze	2023-06-25 15:52:33 +08:00


5055 Commits