Commit Graph

13546 Commits

Author SHA1 Message Date
e59aa49f28 [feature](datetime-func)support milliseconds_add/sub/diff and microseconds_diff (#24114) 2023-09-20 10:38:56 +08:00
a71d7f2beb [pipelineX](operator) support partition sort operator and distinct streaming agg operator (#24544) 2023-09-20 09:50:51 +08:00
7e17e0d3f7 [fix](Nereids) select outfile column order is wrong (#24595) 2023-09-20 09:27:40 +08:00
527b284e90 [improvement](jdbc catalog) Extend conjunctExprToString to Support both 'AND' and 'OR' with Optimized DateLiteral Handling (#24537) 2023-09-19 23:11:44 +08:00
4f215a7dc3 [Improve](Fe)Ensure that only one FE process uses the metedata file (#24442) 2023-09-19 23:11:20 +08:00
1a553f7e14 [Improve](start-shell)Optimize fe&be startup (#24556)
- sh start_fe/start_be --console is used to instruct the program to run in console mode.
- sh start_fe/start_be --daemon is used to instruct the program to run in daemon mode.
- sh start_fe/start_be used starts as a background execution, records output and error logs to the specified file
2023-09-19 23:00:59 +08:00
420914abfc [Fix](RoutineLoad)multi-table query table error (#24538)
multi-table will take all tables and then convert them into OlapTable, thus causing View type conversion errors.
2023-09-19 22:57:13 +08:00
19ccb9517f [fix](iceberg) should call UserGroupInformation when enable security authentication (#24614)
Fix two bugs:
1. Call `UserGroupInformation.doAs` when enable security authentication
2. `catalogId` is 0 when `IcebergExternalCatalog` is loaded from fe image
2023-09-19 22:39:58 +08:00
32c6f5f905 [opt](test) set longer timeout for hive query cache test case (#24569)
Sometimes the first run of query may be longer then former given threshold, which case test fail.
Also add a new session variable test_query_cache_hit

So that we can use it to test if cache is hit in regression test
2023-09-19 22:25:18 +08:00
71dcb58db9 [improvement](scanner_schedule) reduce memory consumption of scanner (#24199)
* [improvement](scanner_schedule) reduce memory consumption of scanner

1. limit scanner by memory consumptin rather than blocks.
2. scheduler run correcty instread of at lest 1.
2023-09-19 21:36:23 +08:00
8afdfd58e2 [fix](case) ensure jar downloaded (#24475)
ensure jar downloaded
2023-09-19 21:26:12 +08:00
8c502f65f2 [Fix](metrics) fix wrong timer metrics for _seek_columns (#24622) 2023-09-19 20:59:09 +08:00
c3bd2a22d4 [feature](Nereids) add many array functions (#24301)
Add function array_filter, array_sortby, array_last_index, array_first_index, array_orderby, array_count
2023-09-19 18:58:49 +08:00
c9f5142420 [Imporve](UNIX_TIMESTAMP) UNIX_TIMESTAMP func support 'yyyy-MM-dd HH:mm:ss' format (#24561)
UNIX_TIMESTAMP function data format parameter supports 'yyyy-MM-dd HH:mm:ss'
The implementation is the same as the date_format function
before:
```sql
mysql> select UNIX_TIMESTAMP('2023-09-18 00:00:00','yyyy-MM-dd HH:mm:ss');
+--------------------------------------------------------------+
| unix_timestamp('2023-09-18 00:00:00', 'yyyy-MM-dd HH:mm:ss') |
+--------------------------------------------------------------+
|                                                         NULL |
+--------------------------------------------------------------+
1 row in set (0.04 sec)
```
now:
```sql
mysql> select UNIX_TIMESTAMP('2023-09-18 00:00:00','yyyy-MM-dd HH:mm:ss');
+------------+
| 1694966400 |
+------------+
| 1694966400 |
+------------+
1 row in set (0.01 sec)
```
2023-09-19 18:41:59 +08:00
037ff2d5a6 [fix](nereids) bug: runtimefilter should not be pushed through window and topN (#24439)
runtime filter should not push down through topN
runtime filter should not push down through window if target slot is not partition key of all windowExpressions
2023-09-19 18:18:06 +08:00
e54c4ef258 [pipelineX](dependency) refactor write dependency (#24555) 2023-09-19 18:01:42 +08:00
3cac6806b4 [fix](txn) persist txn record of single replica load and ccr ingestion (#24543)
Otherwise txn would be dropped when a be reboots.
2023-09-19 15:10:38 +08:00
Pxl
5e4ab7cd25 [Bug](materialized-view) add limit for drop column on mv (#24493)
add limit for drop column on mv
2023-09-19 14:32:14 +08:00
ee56783629 [fix](Java UDF) Do not use enum as the data type for JavaUdfDataType. (#24460) 2023-09-19 14:06:02 +08:00
eea84ac36c [fix](Nereids): use == instead of id to identity PhysicalHashJoin (#24535) 2023-09-19 12:06:30 +08:00
b092bdaabf [feature](load) collect loaded rows on table level after txn published (#24346)
As title.

Stream load 20 lines

```
2023-09-14 11:40:04,186 DEBUG (PUBLISH_VERSION|23) [DatabaseTransactionMgr.updateCatalogAfterVisible():1769] table id to loaded rows:{51016=20}
```

```
mysql> select count(*) from dup_tbl_basic;
+----------+
| count(*) |
+----------+
|       20 |
+----------+
1 row in set (0.05 sec)
```
2023-09-19 12:00:08 +08:00
80bcb43143 [Feature]Support external table sample stats collection (#24376)
Support hive table sample stats collection. Gramma is like

`analyze table with sample percent 10`
2023-09-19 11:20:27 +08:00
6a33e4639a [schedule](pipeline) Remove wait schedule time in pipeline query engine and change current queue to std::mutex (#24525)
This reverts commit 591aeaa98d1178e2e277278c7afeafef9bdb88d6.
2023-09-18 23:57:56 +08:00
1ac7c8f14d [improvement](scan_queue_mem_limit) scan queue mem limit is so small for (#24553)
a wide table

Users rarely set scan_queue_mem_limit, so it almost often works as 2G/20. However,
somecases we need set it to a larger value, especially for insrt into
select from a wide table.
2023-09-18 20:22:03 +08:00
c54fc82031 [improve](nereids) expand runtime filter target by hashJoin's equal condition (#23274)
generate more runtime filters
example:

lineitem join partsupp on l_partkey= ps_partkey join filter(part) on ps_partkey=p_partkey 
we need two RFs:
RF1: p_partkey->ps_partkey
RF2: p_partkey->l_partkey

This pr will generate RF2, but current version will not.

merge runtime filters
current version, if one src could affect 2 targets, we will generate 2 runtime filters.
after this pr, the two rf will be merged.
refer to regression test: ds_rf2/ds_rf5/ds_rf54
2023-09-18 18:27:01 +08:00
3f0d9d182f [doc](optimizer) Rewrite docs of statistics according to the current master (#24438)
Remove verbsoe sample, stale grammar, deleted function
And add some use advices
2023-09-18 18:25:43 +08:00
67e8951b72 [fix](stats) Fix analyze failed when there are thousands of partitions. (#24521)
It's caused by we used same query id for multiple queries of same olap analyze task, but many structures related to query execution depends on query id.
2023-09-18 17:27:10 +08:00
dcabc06510 [fix](pipeline) update check-pr-if-need-run-build.sh (#24515)
update check-pr-if-need-run-build.sh
2023-09-18 17:18:23 +08:00
9a47e8fa73 [catalog lock](log) enable catalog lock log (#24530) 2023-09-18 16:56:01 +08:00
96f197114c [Improve](explode) improve explode func with array nested other type (#24455)
improve explode func with array nested other type
2023-09-18 16:05:30 +08:00
ef4ab106d8 [fix](security): non-static inner class should not implement serialized interface, or when it is serialized it will contain outer class info, which is not safe #24454
fix: non-static inner class should not implement serialized interface, or when it is serialized it will contain outer class info, which is not safe

And in this scenario, the class does not use info of outer class, which should use static class instead
2023-09-18 15:55:43 +08:00
b9f1ac153a [improvement](profile) do not remove value 0 counter (#24487)
do not remove value 0 counter
2023-09-18 15:31:19 +08:00
Pxl
1d1eeaec4a [Chore](checks) fix Project root configuration file: NONE (#24533)
fix Project root configuration file: NONE
2023-09-18 15:30:20 +08:00
b4432ce577 [Feature](statistics)Support external table analyze partition (#24154)
Enable collect partition level stats for hive external table.
2023-09-18 14:59:26 +08:00
1153907897 [opt](nereids)add explanation why we always update col stats in StatsCalculator. 2023-09-18 13:47:37 +08:00
f3e350e8ec [Improvement](statistics)Improve statistics user experience (#24414)
Two improvements:
1. Move the `Job_id` column for the return info of `Analyze table` command to the first column. To keep consistent with `show analyze`.
```
mysql> analyze table hive.tpch100.region;
+--------+--------------+-------------------------+------------+--------------------------------+
| Job_Id | Catalog_Name | DB_Name                 | Table_Name | Columns                        |
+--------+--------------+-------------------------+------------+--------------------------------+
| 14403  | hive         | default_cluster:tpch100 | region     | [r_regionkey,r_comment,r_name] |
+--------+--------------+-------------------------+------------+--------------------------------+
1 row in set (0.03 sec)
```
2. Add `analyze_timeout` session variable, to control `analyze table/database with sync` timeout.
2023-09-18 13:36:41 +08:00
79fbc2e819 [regression-test](planner)add test for insert default values (#23559)
Co-authored-by: Yongqiang YANG <98214048+dataroaring@users.noreply.github.com>
2023-09-18 12:24:54 +08:00
c56d3237e8 [opt](Nereids) remove canEliminate flag on LogicalProject (#24362)
since we have three infrastructure to ensure changing input column order
not lead to wrong result, we could remove this flag on LogicalProject to
eliminate project as mush as possible and let code clear.

1. output list in ResultSink node
2. regular children output in SetOperation node
3. producer to consumer slot id map in CteConsumer
2023-09-18 12:22:33 +08:00
ae0b58fcde [typo](doc)modify error word (#24456) 2023-09-18 11:27:52 +08:00
7a8e3a6587 [fix](nereids) fix cte filter pushdown if the filters can be aggregated (#24489)
Current cte common filter extraction doesn't work if the filters can be aggregated, which will lead the common filter can't be pushed down inside cte. Consider the following case:
with main as (select c1 from t1) select * from (select m1.* from main m1, main m2 where m1.c1 = m2.c1) abc where c1 = 1;
The common c1=1 filter can't be pushed down.

This pr fixed the original extraction logic from set to list to make the logic works, and this will also resolve the tpcds query4/11's pattern works well also.
2023-09-18 11:26:55 +08:00
932b639086 [refactor](point query) decouple PointQueryExec from the Coordinator (#24509)
In order to decouple PointQueryExec from the Coordinator, both PointQueryExec and Coordinator inherit from CoordInterface, and are collectively scheduled through StmtExecutor.
2023-09-18 11:25:40 +08:00
23a75d0277 [FIX](decimalv3) Fix decimalv3 with abnormal value same with mysql result (#24499)
Fix decimalv3 with abnormal value same with mysql result
2023-09-18 11:12:26 +08:00
Pxl
7f7ec496cd [Chore](checks) fix sonarcloud properties have wrong path (#24517)
fix sonarcloud properties have wrong path
2023-09-18 11:11:53 +08:00
c746a89c72 [improvement](transaction) print txn edit log cost time #24501 2023-09-18 11:06:30 +08:00
f04bc05a7e [fix](agg) The offset value was added twice in 'pack_fixed' (#24506) 2023-09-18 10:24:32 +08:00
591aeaa98d Revert "[schedule](pipeline) Remove wait schedule time in pipeline query engine (#23994)" (#24472)
This reverts commit 32a7eef96a09799c8336c1964bfe7d676b7e4c98.
2023-09-18 09:57:38 +08:00
a07f59de8c [Fix](multi-catalog) Fix hadoop viewfs issues. (#24507)
Error Msg:
Caused by: org.apache.doris.datasource.CacheException: failed to get input splits for FileCacheKey{location='viewfs://my-cluster/ns1/usr/hive/warehouse/viewfs.db/parquet_table', inputFormat='org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'} in catalog test_viewfs_hive
        at org.apache.doris.datasource.hive.HiveMetaStoreCache.loadFiles(HiveMetaStoreCache.java:466) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.datasource.hive.HiveMetaStoreCache.access$400(HiveMetaStoreCache.java:112) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.datasource.hive.HiveMetaStoreCache$3.load(HiveMetaStoreCache.java:210) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.datasource.hive.HiveMetaStoreCache$3.load(HiveMetaStoreCache.java:202) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.common.util.CacheBulkLoader.lambda$null$0(CacheBulkLoader.java:42) ~[doris-fe.jar:1.2-SNAPSHOT]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_131]
        ... 3 more
Caused by: org.apache.doris.common.UserException: errCode = 2, detailMessage = Failed to list located status for path: viewfs://my-cluster/ns1/usr/hive/warehouse/viewfs.db/parquet_table
        at org.apache.doris.fs.remote.RemoteFileSystem.listLocatedFiles(RemoteFileSystem.java:54) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.datasource.hive.HiveMetaStoreCache.getFileCache(HiveMetaStoreCache.java:381) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.datasource.hive.HiveMetaStoreCache.loadFiles(HiveMetaStoreCache.java:432) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.datasource.hive.HiveMetaStoreCache.access$400(HiveMetaStoreCache.java:112) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.datasource.hive.HiveMetaStoreCache$3.load(HiveMetaStoreCache.java:210) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.datasource.hive.HiveMetaStoreCache$3.load(HiveMetaStoreCache.java:202) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.common.util.CacheBulkLoader.lambda$null$0(CacheBulkLoader.java:42) ~[doris-fe.jar:1.2-SNAPSHOT]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_131]
        ... 3 more
Caused by: java.nio.file.AccessDeniedException: viewfs://my-cluster/ns1/usr/hive/warehouse/viewfs.db/parquet_table: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
        at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:215) ~[hadoop-aws-3.3.6.jar:?]
        at org.apache.hadoop.fs.s3a.Invoker.onceInTheFuture(Invoker.java:190) ~[hadoop-aws-3.3.6.jar:?]
        at org.apache.hadoop.fs.s3a.Listing$ObjectListingIterator.next(Listing.java:651) ~[hadoop-aws-3.3.6.jar:?]
        at org.apache.hadoop.fs.s3a.Listing$FileStatusListingIterator.requestNextBatch(Listing.java:430) ~[hadoop-aws-3.3.6.jar:?]
        at org.apache.hadoop.fs.s3a.Listing$FileStatusListingIterator.<init>(Listing.java:372) ~[hadoop-aws-3.3.6.jar:?]
        at org.apache.hadoop.fs.s3a.Listing.createFileStatusListingIterator(Listing.java:143) ~[hadoop-aws-3.3.6.jar:?]
        at org.apache.hadoop.fs.s3a.Listing.getListFilesAssumingDir(Listing.java:211) ~[hadoop-aws-3.3.6.jar:?]
        at org.apache.hadoop.fs.s3a.S3AFileSystem.innerListFiles(S3AFileSystem.java:4898) ~[hadoop-aws-3.3.6.jar:?]
        at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listFiles$38(S3AFileSystem.java:4840) ~[hadoop-aws-3.3.6.jar:?]
        at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547) ~[hadoop-common-3.3.6.jar:?]
        at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528) ~[hadoop-common-3.3.6.jar:?]
        at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449) ~[hadoop-common-3.3.6.jar:?]
        at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2480) ~[hadoop-aws-3.3.6.jar:?]
        at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2499) ~[hadoop-aws-3.3.6.jar:?]
        at org.apache.hadoop.fs.s3a.S3AFileSystem.listFiles(S3AFileSystem.java:4839) ~[hadoop-aws-3.3.6.jar:?]
        at org.apache.doris.fs.remote.RemoteFileSystem.listLocatedFiles(RemoteFileSystem.java:50) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.datasource.hive.HiveMetaStoreCache.getFileCache(HiveMetaStoreCache.java:381) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.datasource.hive.HiveMetaStoreCache.loadFiles(HiveMetaStoreCache.java:432) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.datasource.hive.HiveMetaStoreCache.access$400(HiveMetaStoreCache.java:112) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.datasource.hive.HiveMetaStoreCache$3.load(HiveMetaStoreCache.java:210) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.datasource.hive.HiveMetaStoreCache$3.load(HiveMetaStoreCache.java:202) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.common.util.CacheBulkLoader.lambda$null$0(CacheBulkLoader.java:42) ~[doris-fe.jar:1.2-SNAPSHOT]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_131]
        ... 3 more
2023-09-18 09:51:33 +08:00
f1e049d4d6 [fix](java-udaf)Fix need to restart BE after replacing the jar package in java-udaf (#24469) 2023-09-17 21:17:05 +08:00
92e521bba5 [regression-test](fix) fix query_p0/having/having.groovy case bug (#24478) 2023-09-17 20:26:21 +08:00
ebe582758f [opt](Nereids): use LocalDate to replace Calendar (#24361) 2023-09-17 11:16:03 +08:00