doris

Author	SHA1	Message	Date
Xiangyu Wang	11e18f4c98	[Fix](multi-catalog) fix NPE for FileCacheValue. (#21441 ) FileCacheValue.files may be null if there is not any files exists for some partitions.	2023-07-03 23:38:58 +08:00
starocean999	63b170251e	[fix](nereids)cast filter and join conjunct's return type to boolean (#21434 )	2023-07-03 17:22:46 +08:00
Qi Chen	f80df20b6f	[Fix](multi-catalog) Fix read error in mixed partition locations. (#21399 ) Issue Number: close #20948 Fix read error in mixed partition locations(for example, some partitions locations are on s3, other are on hdfs) by `getLocationType` of file split level instead of the table level.	2023-07-03 15:14:17 +08:00
jakevin	9fa2dac352	[fix](Nereids): DefaultPlanRewriter visit plan children. (#21395 )	2023-07-03 13:20:01 +08:00
minghong	17af099dc3	[fix](nereids)miss group id in explain plan #21402 after we introduce "PushdownFilterThroughProject" post processor, some plan node missed their groupExpression (withChildren function will remove groupExpression). this is not good for debug, since it takes more time to find the owner group of a plan node This pr record the missing owner group id in plan node mutableState.	2023-07-03 13:16:33 +08:00
minghong	2827bc1a39	[Fix](nereids) fix a bug in ColumnStatistics.numNulls update #21220 no impact on tpch has impact on tpcds 95, before 1.63 sec, after 1.30 sec	2023-07-03 10:51:23 +08:00
Pxl	59c1bbd163	[Feature](materialized view) support query match mv with agg_state on nereids planner (#21067 ) * support create mv contain aggstate column * update * update * update * support query match mv with agg_state on nereids planner update * update * update	2023-07-03 10:19:31 +08:00
Qi Chen	124516c1ea	[Fix](orc-reader) Fix `Wrong data type for column` error when column order in hive table is not same in orc file schema. (#21306 ) `Wrong data type for column` error when column order in hive table is not same in orc file schema. The root cause is in order to handle the following case: The table in orc format of Hive 1.x may encounter system column names such as `_col0`, `_col1`, `_col2`... in the underlying orc file schema, which need to use the column names in the hive table for mapping. ### Solution Currently fix this issue by handling the following case by specifying hive version to 1.x.x in the hive catalog configuration. ```sql CREATE CATALOG hive PROPERTIES ( 'hive.version' = '1.x.x' ); ```	2023-07-03 09:32:55 +08:00
slothever	f5af735fa6	[fix](multi-catalog)fix obj file cache and dlf iceberg catalog (#21238 ) 1. fix storage prefix for obj file cache: oss/cos/obs don't need convert to s3 prefix , just convert when create split 2. dlf iceberg catalog: support dlf iceberg table, use s3 file io.	2023-07-02 21:08:41 +08:00
xy720	f74e635aa5	[bug](proc) fix NumberFormatException in show proc '/current_queries' (#21400 ) If the current query is running for a very long time, the ExecTime of this query may larger than the MAX_INT value, then a NumberFormatException will be thrown when execute "show proc '/current_queries'." The query's ExecTime is long type, we should not use 'Integer.parseInt' to parse it.	2023-07-01 17:42:46 +08:00
Mingyu Chen	887d33c789	[fix](cup) add keywords KW_PERCENT (#21404 ) Or it may cause some edit log replay error, like parsing create routine load stmt, which contains this keyword as a column name	2023-07-01 16:53:54 +08:00
htyoung	603f4ab20f	[fix](truncate) it will directly return and avoid throwing IllegalStateException caused by bufferSize equals zero when table has no partition (#21378 ) if table currently has no partition, the truncate SQL will be a empty command, it should directly return and avoid throwing IllegalStateException caused by bufferSize equals zero Issue Number: close #21316 Co-authored-by: tongyang.han <tongyang.han@jiduauto.com>	2023-07-01 08:39:38 +08:00
Ashin Gau	0e17cd4d92	[fix](hudi) use hudi api to split the COW table (#21385 ) Fix tow bugs: COW & Read Optimized table will use hive splitter to split files, but it can't recognize some specific files. ERROR 1105 (HY000): errCode = 2, detailMessage = (172.21.0.101)[CORRUPTION]Invalid magic number in parquet file, bytes read: 3035, file size: 3035, path: /usr/hive/warehouse/hudi.db/test/.hoodie/metadata/.hoodie/00000000000000.deltacommit.inflight, read magic: The read optimized table created by spark will add empty partition even if the table has no partition, so we have to filter these empty partition keys in hive client. \| test_ro \| CREATE TABLE `test_ro`( `_hoodie_commit_time` string COMMENT '', ... `ts` bigint COMMENT '') PARTITIONED BY ( `` string) ROW FORMAT SERDE	2023-07-01 08:35:33 +08:00
zzzzzzzs	96aa0e5876	[fix](tvf) To fix the bug that requires adding backticks on "frontends()" in order to query the frontends TVF. (#21338 )	2023-06-30 22:37:21 +08:00
Jibing-Li	ed2cd4974e	[fix](nereids) to_date should return type datev2 for datetimev2 (#21375 ) To_date function in nereids return type should be DATEV2 if the arg type is DATETIMEV2. Before the return type was DATE which would cause BE get wrong query result.	2023-06-30 21:42:59 +08:00
jakevin	18b7d84436	[fix](Nereids): reject infer distinct when children exist NLJ (#21391 )	2023-06-30 20:29:48 +08:00
xzj7019	4117f0b93b	[improve](nereids) Support outer rf into inner left outer join (#21368 ) Support rf into left outer join from outside allowed type join. Before this pr, some join type, such as full outer join, are all not allowed to do rf pushing. For example, (a left join b on a.id = b.id) inner join c on a.id2 = c.id2, will lost the rf pushing from c.id2 to inner table a. This pr will open this limitation for supporting rf into left outer join from outside allowed type join.	2023-06-30 19:07:39 +08:00
xzj7019	164448dac3	[fix](nereids) fix rf info missing for set op (#21367 ) During physical set operation translation, we forget to inherit rf related info from set op children, which will lead the merge filter error and get a long waittime.	2023-06-30 18:50:29 +08:00
Pxl	88cbea2b56	[Bug](agg-state) fix core dump on not nullable argument for aggstate's nested argument (#21331 ) fix core dump on not nullable argument for aggstate's nested argument	2023-06-30 18:20:25 +08:00
Jack Drogon	de39632f1b	[feature](binlog) Add AddPartitionRecord && DROP_PARTITION (#21344 ) Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>	2023-06-30 16:57:11 +08:00
Calvin Kirs	2c3183f5eb	[Feature](Job)Provide unified internal Job scheduling (#21113 ) We use the time wheel algorithm to complete the scheduling and triggering of periodic tasks. The implementation of the time wheel algorithm refers to netty's HashedWheelTimer. We will periodically (10 minutes by default) put the events that need to be triggered in the future cycle into the time wheel for periodic scheduling. In order to ensure the efficient triggering of tasks and avoid task blocking and subsequent task scheduling delays, we use Disruptor to implement the production and consumption model. When the task expires and needs to be triggered, the task will be put into the RingBuffer of the Disruptor, and then the consumer thread will consume the task. Consumers need to register for events, and event registration needs to provide event executors. Event executors are a functional interface with only one method for executing events. If it is a single event, the event definition will be deleted after the scheduling is completed; if it is a periodic event, it will be put back into the time wheel according to the periodic scheduling after the scheduling is completed.	2023-06-30 16:43:20 +08:00
starocean999	8809cca74a	[fix](nereids) physical sort node's equals method should compare sort phase (#21301 )	2023-06-30 14:04:22 +08:00
Xiangyu Wang	8f4b7c8f3d	[Fix](multi-catalog) optimize hashcode for PartitionKey. (#21307 )	2023-06-30 13:48:08 +08:00
yongjinhou	df23ab3f29	[Enhancement](tvf) Add authentication for workload group tvf (#21323 )	2023-06-30 12:56:23 +08:00
minghong	9f44c2d80d	[fix](nereids) nest loop join stats estimation (#21275 ) 1. fix bug in nest loop join estimation 2. update column=column stats estimation	2023-06-30 10:00:30 +08:00
jakevin	9756ff1e25	[feature](Nereids): infer distinct from SetOperator (#21235 ) Infer distinct from Distinct SetOperator, and put distinct above children to reduce data. tpcds_sf100 q14: before 100 rows in set (7.60 sec) after 100 rows in set (6.80 sec)	2023-06-29 22:04:41 +08:00
zhannngchen	c7286c620b	[fix](unique key) agg_function is NONE when properties is null (#21337 )	2023-06-29 20:47:13 +08:00
morrySnow	6259a91d12	[opt](profile) add whether use Nereids info in Profile (#21342 ) add whether use Nereids or pipeline engine in profile, for example: Summary: - Profile ID: 460e710601674438-9df2d685bdfc20f8 - Task Type: QUERY ... - Is Nereids: Yes - Is Pipeline: Yes - Is Cached: No	2023-06-29 20:36:15 +08:00
morrySnow	f3fc606312	[minor](Nereids) change Nereids parse failed log level to debug (#21335 )	2023-06-29 19:52:48 +08:00
morrySnow	5bb79be932	[opt](Nereids) forbid gather agg and gather set operation (#21332 ) gather agg and gather set operation usually not good we cannot compute cost on them nicely, so just forbid them until we could choose realy best plan	2023-06-29 19:52:15 +08:00
minghong	419f51ca2c	[feature](nereids)set nereids cbo weights by session var #21293 good for tune cost model	2023-06-29 18:54:04 +08:00
xzj7019	59198ed59e	[improvement](nereids) Support rf into cte (#21114 ) Support runtime filter pushing down into cte internal.	2023-06-29 16:58:31 +08:00
minghong	64e9eab0dd	[fix](nereids)update Agg stats estimation #21300 Agg stats estimation should use the biggest groupby key's NDV as base, and multiply expansion factor, which is calculated by other groupby key' ndv. Before, we use the smallest ndv as base	2023-06-29 16:37:05 +08:00
Pxl	a518ea5063	[Bug](pipeline) do not call cancelPlanFragmentAsync when instance finished (#21193 ) do not call cancelPlanFragmentAsync when instance finished	2023-06-29 15:35:23 +08:00
924060929	16c218fde5	[feature](nereids) support bind external relation out of Doris fe environment (#21123 ) support bind external relation out of Doris fe environment, for example, analyze sql in other java application. see BindRelationTest.bindExternalRelation.	2023-06-29 14:29:29 +08:00
Jibing-Li	3a12b67517	[Improvement](statistics, multi catalog)Implement hive table statistic connector (#21053 ) This pr is to add the collecting hive statistic function. While the CBO fetching hive table statistics, statistic cache will first load from internal stats olap table. If not found, then using this pr's function to fetch from remote Hive metastore.	2023-06-29 10:50:54 +08:00
Pxl	45f1909bc3	[Bug](lateral-view) make lateral view function's nullable mode work (#21242 ) make lateral view function's nullable mode work	2023-06-29 10:50:07 +08:00
Calvin Kirs	30b1b93353	[dependency](fe)Dependency version upgrade (#21191 ) Keep hadoop-aliyun version consistent with hadoop main version (3.3.5) upgrade jackson to 2.14.3 upgrade netty version to 4.1.94.final binding check.freamework version to 3.32.0 upgrade snappy-java to 1.1.10.1 upgrade hudi version to 0.13.1 upgrade spring version to 2.7.13 upgrade orc version to 1.8.4 revert nonsensical changes	2023-06-29 10:01:33 +08:00
morrySnow	64ffb06a79	[fix](Nereids) olap scan should not be gather since coordinator chould not process (#21298 ) in PR #21168 , we refactor physcial properties and translator to ensure not generating useless excahange. olap scan node could be gather in Nereids but translate to hash partitioned. since coordinator could not process gather olap scan node, we remove the candidate distribution spec of olap scan	2023-06-29 09:12:08 +08:00
Mingyu Chen	9af714bceb	[fix](catalog) disble FileSystem Cache to avoid too many fs cache (#21283 ) When creating a new hive catalog or refresh the hive catalog, it will refresh the HiveMetaStore cache. And it will call "FileInputFormat.setInputPaths()". In this method, it will create a new FileSystem instance and store it in FileSystem's cache. So if refresh catalog frequently, there will be too many FileSystem instances in cache, causing OOM. This PR disable the FileSystem Cache.	2023-06-29 09:06:00 +08:00
Xiangyu Wang	884c908e25	[Enhancement](multi-catalog) try to reuse existed ugi. (#21274 ) Try to reuse an existed ugi at DFSFileSystem, otherwise if we query a more then ten-thousands partitons hms table, we will do more than ten-thousands login operations, each login operation will cost hundreds of ms from my test. Co-authored-by: 王翔宇 <wangxiangyu@360shuke.com>	2023-06-29 09:04:59 +08:00
zy-kkk	449c8d4568	[fix](jdbc) Handling Zero DateTime Values in Non-nullable Columns for JDBC Catalog Reading MySQL (#21296 )	2023-06-28 22:51:17 +08:00
Kang	e7dd65f551	[fix](test) fix PlannerTest testEliminatingSortNode (#21112 ) testEliminatingSortNode needs to check if SortNode is existed in plan tree, so it should check plan1.contains("order by:"), but rather than plan1.contains("SORT INFO:") or plan1.contains("SORT LIMIT:").	2023-06-28 21:29:23 +08:00
DongLiang-0	a6b51ec19a	[Feature](avro) Support Apache Avro file format (#19990 ) support read avro file by hdfs() or s3() . ```sql select * from s3( "uri" = "http://127.0.0.1:9312/test2/person.avro", "ACCESS_KEY" = "ak", "SECRET_KEY" = "sk", "FORMAT" = "avro"); +--------+--------------+-------------+-----------------+ \| name \| boolean_type \| double_type \| long_type \| +--------+--------------+-------------+-----------------+ \| Alyssa \| 1 \| 10.0012 \| 100000000221133 \| \| Ben \| 0 \| 5555.999 \| 4009990000 \| \| lisi \| 0 \| 5992225.999 \| 9099933330 \| +--------+--------------+-------------+-----------------+ select * from hdfs( "uri" = "hdfs://127.0.0.1:9000/input/person2.avro", "fs.defaultFS" = "hdfs://127.0.0.1:9000", "hadoop.username" = "doris", "format" = "avro"); +--------+--------------+-------------+-----------+ \| name \| boolean_type \| double_type \| long_type \| +--------+--------------+-------------+-----------+ \| Alyssa \| 1 \| 8888.99999 \| 89898989 \| +--------+--------------+-------------+-----------+ ``` current avro reader only support common data type, the complex data types will be supported later.	2023-06-28 21:15:35 +08:00
yiguolei	325504deeb	[bugfix](recover) do not need dynamic partition recover except olap table (#21290 ) introduced by #19031 FE could not recover any more because there is a convert to olap table operation in the code. But there are many table types that is not a olap table such as view jdbc table ... It will convert failed and FE will not start correctly.Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-06-28 19:56:17 +08:00
starocean999	016870b673	[opt](nereids) use Expression's isConstant to check whether could be remove from group by key (#21195 )	2023-06-28 19:12:36 +08:00
xzj7019	76620c21aa	[improvement](nereids) prune hash join output slot ids list (#20789 ) 1. prune hash join output slot ids list based on slot ids in required project and other conjunctions, to reduce the be side effort. 2. support pruning for semi/anti also	2023-06-28 17:28:18 +08:00
morrySnow	7588abe76b	[refactor](Nereids) refactor physical properties and plan translator (#21168 ) this PR 1. refactor physical properties, property deriver and property regular to ensure Nereids could generate plan with sufficent PhysicalDistribute. 2. refactor PhyscialPlanTranslator to ensure all ExchangeNode generated by PhysicalDistribute, except CTEConsumer. We will refactor all cte related node later. the detail changes of this PR: 1. update DistributionSpec of physical properties: - Any: random distribution, used in output and require - StorageAny: random distribution but constrained by where the data is stored, used in output - ExecutionAny: random distribution to present random shuffle, used in output - Gather: gather distribution, used in output and require - StorageGather: gather distribution but constrained by where the data is stored, used in output - Replicated: broadcast distribution - Hash: bucket distribution 2. update shuffle type of DistributionSpecHash - REQUIRE: used in require - NATURAL: distribution as storage engine hash algorithm, constrained by where the data is stored - STORAGE_BUCKETED: distribution as storage engine hash algorithm - EXECUTION_BUCKETED: distribution as execution engine hash algorithm 3. update HideOneRowRelationUnderSetOperation to MergeOneRowRelationIntoSetOperation 4. update property deriver of SetOperation to ensure suitable PhysicalDistribute be added at top and below of SetOperation 5. refactor PhysicalPlanTranslator to ensure no unplanned exchange node will be added	2023-06-28 15:15:11 +08:00
Jack Drogon	08fe22cb0c	[improvement](backup) Add BackupJobInfo with tableCommitSeqMap (#21255 ) Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>	2023-06-28 11:10:12 +08:00
bingquanzhao	853fa5f688	[typo](nativeInsertStmt) fix object-stored column exception description (#21221 )	2023-06-28 10:12:55 +08:00

... 63 64 65 66 67 ...

8289 Commits