Users rarely set scan_queue_mem_limit, so it almost always works out to 2G/20. However,
in some cases we need to set it to a larger value, especially for an insert into
select from a wide table.
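A minimal usage sketch, assuming `scan_queue_mem_limit` is set per session and specified in bytes; the table names and the chosen value are illustrative only:
```sql
-- raise the scan queue memory limit before a wide-table INSERT ... SELECT
set scan_queue_mem_limit = 2147483648;  -- 2 GB, an assumed value for illustration
insert into wide_target select * from wide_source;
```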
Generate more runtime filters
Example:
lineitem join partsupp on l_partkey = ps_partkey join filter(part) on ps_partkey = p_partkey
We need two RFs:
RF1: p_partkey -> ps_partkey
RF2: p_partkey -> l_partkey
This PR will generate RF2, but the current version will not.
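Written out as a full statement, a minimal sketch of the example query shape (the select list and the filter on part are chosen arbitrarily):
```sql
-- part is the runtime filter source: p_partkey should feed both ps_partkey (RF1) and l_partkey (RF2)
select count(*)
from lineitem
join partsupp on l_partkey = ps_partkey
join part on ps_partkey = p_partkey
where p_size = 15;
```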
Merge runtime filters
In the current version, if one source can affect 2 targets, we generate 2 runtime filters.
After this PR, the two RFs will be merged.
Refer to regression tests: ds_rf2/ds_rf5/ds_rf54
This is caused by using the same query id for multiple queries of the same OLAP analyze task, while many structures related to query execution depend on the query id.
Fix: a non-static inner class should not implement the Serializable interface, otherwise its serialized form contains outer-class info, which is not safe.
In this scenario the class does not use any information from the outer class, so a static nested class should be used instead.
Two improvements:
1. Move the `Job_Id` column in the return info of the `analyze table` command to the first column, to keep it consistent with `show analyze`.
```
mysql> analyze table hive.tpch100.region;
+--------+--------------+-------------------------+------------+--------------------------------+
| Job_Id | Catalog_Name | DB_Name | Table_Name | Columns |
+--------+--------------+-------------------------+------------+--------------------------------+
| 14403 | hive | default_cluster:tpch100 | region | [r_regionkey,r_comment,r_name] |
+--------+--------------+-------------------------+------------+--------------------------------+
1 row in set (0.03 sec)
```
2. Add the `analyze_timeout` session variable to control the `analyze table/database with sync` timeout.
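A usage sketch for item 2, assuming the timeout value is given in seconds:
```sql
-- allow a synchronous analyze to run for up to one hour (value is illustrative)
set analyze_timeout = 3600;
analyze table hive.tpch100.region with sync;
```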
Since we have three pieces of infrastructure to ensure that changing the input column order
does not lead to wrong results, we can remove this flag on LogicalProject to
eliminate projects as much as possible and keep the code clean:
1. output list in ResultSink node
2. regular children output in SetOperation node
3. producer to consumer slot id map in CteConsumer
The current CTE common filter extraction doesn't work if the filters can be aggregated, so the common filter cannot be pushed down inside the CTE. Consider the following case:
with main as (select c1 from t1) select * from (select m1.* from main m1, main m2 where m1.c1 = m2.c1) abc where c1 = 1;
The common c1 = 1 filter cannot be pushed down.
This PR changes the original extraction logic from a set to a list to make it work, which also makes the TPC-DS query4/11 pattern work well.
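As a hedged sketch of the intended effect (not the literal rewritten plan): once c1 = 1 is recognized as common to both CTE consumers, it can conceptually be pushed into the CTE producer, roughly:
```sql
with main as (select c1 from t1 where c1 = 1)
select * from (select m1.* from main m1, main m2 where m1.c1 = m2.c1) abc
where c1 = 1;
```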
In order to decouple PointQueryExec from the Coordinator, both PointQueryExec and Coordinator inherit from CoordInterface, and are collectively scheduled through StmtExecutor.
Error Msg:
```
Caused by: org.apache.doris.datasource.CacheException: failed to get input splits for FileCacheKey{location='viewfs://my-cluster/ns1/usr/hive/warehouse/viewfs.db/parquet_table', inputFormat='org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'} in catalog test_viewfs_hive
at org.apache.doris.datasource.hive.HiveMetaStoreCache.loadFiles(HiveMetaStoreCache.java:466) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.hive.HiveMetaStoreCache.access$400(HiveMetaStoreCache.java:112) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.hive.HiveMetaStoreCache$3.load(HiveMetaStoreCache.java:210) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.hive.HiveMetaStoreCache$3.load(HiveMetaStoreCache.java:202) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.common.util.CacheBulkLoader.lambda$null$0(CacheBulkLoader.java:42) ~[doris-fe.jar:1.2-SNAPSHOT]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_131]
... 3 more
Caused by: org.apache.doris.common.UserException: errCode = 2, detailMessage = Failed to list located status for path: viewfs://my-cluster/ns1/usr/hive/warehouse/viewfs.db/parquet_table
at org.apache.doris.fs.remote.RemoteFileSystem.listLocatedFiles(RemoteFileSystem.java:54) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.hive.HiveMetaStoreCache.getFileCache(HiveMetaStoreCache.java:381) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.hive.HiveMetaStoreCache.loadFiles(HiveMetaStoreCache.java:432) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.hive.HiveMetaStoreCache.access$400(HiveMetaStoreCache.java:112) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.hive.HiveMetaStoreCache$3.load(HiveMetaStoreCache.java:210) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.hive.HiveMetaStoreCache$3.load(HiveMetaStoreCache.java:202) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.common.util.CacheBulkLoader.lambda$null$0(CacheBulkLoader.java:42) ~[doris-fe.jar:1.2-SNAPSHOT]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_131]
... 3 more
Caused by: java.nio.file.AccessDeniedException: viewfs://my-cluster/ns1/usr/hive/warehouse/viewfs.db/parquet_table: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:215) ~[hadoop-aws-3.3.6.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.onceInTheFuture(Invoker.java:190) ~[hadoop-aws-3.3.6.jar:?]
at org.apache.hadoop.fs.s3a.Listing$ObjectListingIterator.next(Listing.java:651) ~[hadoop-aws-3.3.6.jar:?]
at org.apache.hadoop.fs.s3a.Listing$FileStatusListingIterator.requestNextBatch(Listing.java:430) ~[hadoop-aws-3.3.6.jar:?]
at org.apache.hadoop.fs.s3a.Listing$FileStatusListingIterator.<init>(Listing.java:372) ~[hadoop-aws-3.3.6.jar:?]
at org.apache.hadoop.fs.s3a.Listing.createFileStatusListingIterator(Listing.java:143) ~[hadoop-aws-3.3.6.jar:?]
at org.apache.hadoop.fs.s3a.Listing.getListFilesAssumingDir(Listing.java:211) ~[hadoop-aws-3.3.6.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListFiles(S3AFileSystem.java:4898) ~[hadoop-aws-3.3.6.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listFiles$38(S3AFileSystem.java:4840) ~[hadoop-aws-3.3.6.jar:?]
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547) ~[hadoop-common-3.3.6.jar:?]
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528) ~[hadoop-common-3.3.6.jar:?]
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449) ~[hadoop-common-3.3.6.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2480) ~[hadoop-aws-3.3.6.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2499) ~[hadoop-aws-3.3.6.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.listFiles(S3AFileSystem.java:4839) ~[hadoop-aws-3.3.6.jar:?]
at org.apache.doris.fs.remote.RemoteFileSystem.listLocatedFiles(RemoteFileSystem.java:50) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.hive.HiveMetaStoreCache.getFileCache(HiveMetaStoreCache.java:381) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.hive.HiveMetaStoreCache.loadFiles(HiveMetaStoreCache.java:432) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.hive.HiveMetaStoreCache.access$400(HiveMetaStoreCache.java:112) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.hive.HiveMetaStoreCache$3.load(HiveMetaStoreCache.java:210) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.hive.HiveMetaStoreCache$3.load(HiveMetaStoreCache.java:202) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.common.util.CacheBulkLoader.lambda$null$0(CacheBulkLoader.java:42) ~[doris-fe.jar:1.2-SNAPSHOT]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_131]
... 3 more
```
Previously, when querying a Hive table in ORC format whose files are split,
the result of select count(*) could be a multiple of the real row count.
This is because the number of rows should be obtained after ORC stripe pruning;
otherwise a wrong result may be returned.
Original SQL:
select t1.* from t1 where t1.k1 not in ( select t3.k1 from t3 where t1.k2 = t3.k2 );
Rewritten SQL:
before (wrong):
select t1.* from t1 null aware left anti join t3 on t1.k1 = t3.k1 and t1.k2 = t3.k2;
now (correct):
select t1.* from t1 left anti join t3 on t1.k2 = t3.k2 and (t1.k1 = t3.k1 or t3.k1 is null or t1.k1 is null);
1. Change function matching from string matching to Expr matching.
2. Replace the `nvl` function with `ifnull` when pushing down to MySQL (see the sketch after this list).
3. Adapt ClickHouse's `from_unixtime` function for pushdown.
4. Non-function filters can still be pushed down when `enable_func_pushdown` is set to false.
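A hedged sketch of what item 2 looks like in practice; the catalog, table, and column names are hypothetical:
```sql
-- query issued in Doris against a MySQL JDBC catalog (names are hypothetical)
select * from mysql_catalog.db1.t1 where nvl(c1, 0) > 10;

-- the filter forwarded to MySQL is expected to use ifnull rather than nvl, roughly:
-- SELECT * FROM t1 WHERE ifnull(c1, 0) > 10
```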
Support insert into table values(...) for Nereids.
SQL like:
insert into t values(1, 2, 3)
insert into t values(1 + 1, dayofweek(now()), 4), (4, 5, 6)
insert into t values('1', '6.5', cast(1.5 as int))
1. Fix a Hive partition prune bug introduced by #23845, which caused the `test_hive_default_partition` test case to fail.
2. Fix the `test_local_tvf.groovy` test case; the path of the local tvf should be a relative path.
3. Fix the `test_external_catalog_hive` test case; `partitions` is now a reserved keyword.
4. Support the `local` tvf in Nereids, and fix a related issue like:
```
Caused by: java.lang.NullPointerException
at org.apache.doris.nereids.stats.ExpressionEstimation.castMinMax(ExpressionEstimation.java:171) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.nereids.stats.ExpressionEstimation.visitCast(ExpressionEstimation.java:167) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.nereids.stats.ExpressionEstimation.visitCast(ExpressionEstimation.java:109) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.nereids.trees.expressions.Cast.accept(Cast.java:55) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.nereids.stats.ExpressionEstimation.visitAlias(ExpressionEstimation.java:394) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.nereids.stats.ExpressionEstimation.visitAlias(ExpressionEstimation.java:109) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.nereids.trees.expressions.Alias.accept(Alias.java:145) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.nereids.stats.ExpressionEstimation.estimate(ExpressionEstimation.java:119) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.nereids.stats.StatsCalculator.lambda$computeProject$7(StatsCalculator.java:785) ~[doris-fe.jar:1.2-SNAPSHOT]
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_341]
at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) ~[?:1.8.0_341]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_341]
```
If preserveRootTypes is set to false when calling substituteList, the root cast expr may be lost during substitution. For example, if the top cast expr cast(decimal_col as double) is lost, the data types mismatch between the plan node and the BE, and the BE crashes.
Enable the two-phase partition top-n optimization instead of the original full sort in the second phase.
E.g., the partial plan of TPC-DS q67 is as follows, and a full sort after the exchange hurts performance, especially if the window column's NDV is very high and the number of windows is huge.
------PhysicalTopN
--------filter((rk <= 100))
----------PhysicalWindow
------------PhysicalQuickSort
--------------PhysicalDistribute
----------------PhysicalPartitionTopN
------------------PhysicalProject
Under this scenario, the second-phase full sort can be transformed into a global PhysicalPartitionTopN, reducing the cost of the full sort. The plan will be optimized to the following:
------PhysicalTopN
--------filter((rk <= 100))
----------PhysicalWindow
------------PhysicalPartitionTopN
--------------PhysicalDistribute
----------------PhysicalPartitionTopN
------------------PhysicalProject
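For reference, a minimal sketch of the window top-n query shape this optimization targets (table and column names are hypothetical, not the actual q67 text):
```sql
select *
from (
    select category, item, sales,
           rank() over (partition by category order by sales desc) as rk
    from sales_agg
) t
where rk <= 100;
```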
A Simplified Version of the Profile
Divided into three levels:
Level 2: The original profile.
Level 1: Instances with identical structures are merged, utilizing concatenation for info strings, and recording the extremum for time types.
Note that currently, this is purely experimental, simplifying the profile on the frontend (you can view profiles at any level).
Subsequently, we will transition the simplification process to the backend. At that point, due to the simplification being done on the backend, viewing profiles at other levels won't be possible.
Due to the issue with the pipeline structure, the active time does not accurately reflect the time of the operators.
```
set enable_simply_profile = false;
set enable_simply_profile = true;
```