[regression](multi-catalog) add EMR cloud env test tools (#21788)

Add EMR test tools for Alibaba Cloud, Huawei Cloud, and Tencent Cloud.

178  tools/emr_storage_regression/README.md  Normal file
@ -0,0 +1,178 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Data Lake Regression Testing Tool For External Table

Used to test Doris external tables on the object storage of cloud vendors.

> Supported storage formats: HDFS, Alibaba Cloud OSS, Tencent Cloud COS, Huawei Cloud OBS

> Supported data lake table formats: Iceberg

The following examples show the command line options:

```
sh tools/emr_storage_regression/emr_tools.sh --profile default_emr_env.sh
```

Or

```
sh tools/emr_storage_regression/emr_tools.sh --case CASE --endpoint ENDPOINT --region REGION --service SERVICE --ak AK --sk SK --host HOST --user USER --port PORT
```

The usage of each option is described below.

## Connectivity Test

When the `--case` option is set to `ping`, the script checks Doris's connectivity on EMR:

- `--endpoint`, object storage endpoint.

- `--region`, object storage region.

- `--ak`, object storage access key.

- `--sk`, object storage secret key.

- `--host`, Doris MySQL client IP.

- `--user`, Doris MySQL client username.

- `--port`, Doris MySQL client port.

- `--service`, EMR cloud vendor: ali (Alibaba), hw (Huawei), tx (Tencent).

### Environment Variables

Modify the environment variables in `default_emr_env.sh`; the script executes `source default_emr_env.sh` to make them take effect.

If the environment variables are configured, you can run the test script directly with the following command:

```
sh emr_tools.sh --profile default_emr_env.sh
```
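
For reference, a profile is just a shell file that exports the variables the tool reads; a minimal sketch is shown below (all values are placeholders, and `default_emr_env.sh` in this directory is the full template):

```
# sketch of a profile file; replace the values with your own
export SERVICE=ali                  # ali, hw or tx
export CASE=ping                    # ping or data_set
export HOST=127.0.0.1               # Doris FE host
export USER=root                    # Doris user
export PORT=9030                    # Doris MySQL port
export AK=your_access_key
export SK=your_secret_key
export ENDPOINT=oss-cn-beijing-internal.aliyuncs.com
export REGION=cn-beijing
export HMS_META_URI="thrift://172.16.1.1:9083"
export HMS_WAREHOUSE=oss://benchmark-oss/user
```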

### The Script Execution Steps For Connectivity Test

1. Create Spark and Hive tables on EMR
2. Use the Spark and Hive command lines to insert sample data
3. Create the Doris catalogs used for the connectivity test
4. Execute the connectivity test SQL in `ping.sql` (a sketch of the kind of query it runs is shown below)
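
Step 4 simply runs each statement in `ping.sql` against every catalog the test created. A single check can be reproduced by hand roughly like this (host, port, user and catalog are placeholders matching the defaults above):

```
# reproduce one ping query against a catalog created by the test
mysql -h 127.0.0.1 -P 9030 -u root -e "switch hms_hdfs; select * from spark_hms_db.types;"
```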

### Alibaba Cloud

```
sh emr_tools.sh --profile default_emr_env.sh
```

Or

Set `--service` to `ali`, and then test connectivity on Alibaba Cloud:

```
sh emr_tools.sh --case ping --endpoint oss-cn-beijing-internal.aliyuncs.com --region cn-beijing --service ali --ak ak --sk sk --host 127.0.0.1 --user root --port 9030 > log
```

Alibaba Cloud EMR also supports testing connectivity for Doris with DLF metadata and for Doris on OSS-HDFS storage.

- The DLF metadata connectivity test needs to be performed on an EMR cluster that uses DLF as its metadata store. The default value of `DLF_ENDPOINT` is `datalake-vpc.cn-beijing.aliyuncs.com`, configured in `ping_test/ping_poc.sh`.

- To test OSS-HDFS storage connectivity, you need to [enable and configure the HDFS service on the OSS bucket](https://www.alibabacloud.com/help/en/e-mapreduce/latest/oss-hdfsnew). The default value of `JINDO_ENDPOINT` is `cn-beijing.oss-dls.aliyuncs.com`, configured in `ping_test/ping_poc.sh` (see the snippet below).
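
Both endpoints are assigned near the top of `ping_test/ping_poc.sh`; if your cluster is in a different region, adjust these two lines before running the test:

```
# defaults in ping_test/ping_poc.sh; change the region part if needed
DLF_ENDPOINT=datalake-vpc.cn-beijing.aliyuncs.com
JINDO_ENDPOINT=cn-beijing.oss-dls.aliyuncs.com
```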

### Tencent Cloud

```
sh emr_tools.sh --profile default_emr_env.sh
```

Or

Set `--service` to `tx`, and then test connectivity on Tencent Cloud:

```
sh emr_tools.sh --case ping --endpoint cos.ap-beijing.myqcloud.com --region ap-beijing --service tx --ak ak --sk sk --host 127.0.0.1 --user root --port 9030 > log
```

### Huawei Cloud

```
sh emr_tools.sh --profile default_emr_env.sh
```

Or

Set `--service` to `hw`, and then test connectivity on Huawei Cloud:

```
sh emr_tools.sh --case ping --endpoint obs.cn-north-4.myhuaweicloud.com --region cn-north-4 --service hw --ak ak --sk sk --host 127.0.0.1 --user root --port 9030 > log
```

## Performance Testing on Standard Test Set

When the `--case` option is set to `data_set`, the script tests the query performance of Doris external tables:

- `--test`, test data set: ssb, ssb_flat, tpch, clickbench or all. Default: `all`.

- `--service`, EMR cloud vendor: ali (Alibaba), hw (Huawei), tx (Tencent).

- `--host`, Doris MySQL client IP.

- `--user`, Doris MySQL client username.

- `--port`, Doris MySQL client port.

### Environment Variables

As above, modify the environment variables in `default_emr_env.sh`; the script executes `source default_emr_env.sh` to make them take effect.

If the environment variables are configured, you can run the test script directly with the following command:

```
sh emr_tools.sh --profile default_emr_env.sh
```

### Prepare Data

1. To run the standard test set with the `emr_tools.sh` script, set the object storage bucket in the `BUCKET` variable and upload the prepared data under that bucket in advance (the expected layout is sketched below). The script generates the table creation statements based on the bucket.

2. Currently the `emr_tools.sh` script supports Iceberg, Parquet and ORC data for ssb, ssb_flat, tpch and clickbench.
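
The table creation statements are generated from fixed sub-paths under the bucket, so, judging from the generator script, the data is expected to be laid out roughly as follows:

```
# expected object layout under the bucket set in BUCKET
${BUCKET}/ssb/ssb100_orc
${BUCKET}/ssb/ssb100_parquet
${BUCKET}/tpch/tpch100_orc
${BUCKET}/tpch/tpch100_parquet
${BUCKET}/clickbench/hits_orc
${BUCKET}/clickbench/hits_parquet
```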

### Execution Steps

1. After the connectivity test, the Doris catalogs corresponding to the standard test set are created
2. Prepare the test set data in the object storage bucket specified by the `BUCKET` variable
3. Generate the Spark table creation statements and create the Spark object storage tables on EMR
4. Create the Spark tables in the local HDFS directory: `hdfs:///benchmark-hdfs`
5. Optionally, analyze the Doris tables ahead of time by manually executing the statements in `analyze.sql` in the Doris catalog
6. Execute the standard test set script: `run_standard_set.sh` (a manual invocation is sketched below)
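
For reference, `emr_tools.sh` drives `run_standard_set.sh` once per catalog. A single run looks roughly like this; the arguments are FE host, user, query port, catalog name and test set, and all values are placeholders:

```
# run only the ssb set against the hms_hdfs catalog
TYPE=hdfs sh stardard_set/run_standard_set.sh 127.0.0.1 root 9030 hms_hdfs ssb
```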

### Standard data set: ssb, ssb_flat, tpch, clickbench

- Full test. After executing the test command, Doris runs the ssb, ssb_flat, tpch and clickbench tests in sequence; the results cover both HDFS and the object storage specified by `--service`.

```
sh emr_tools.sh --case data_set --service ali --host 127.0.0.1 --user root --port 9030 > log
```

- Specify a single test. The `--test` option can be set to one of ssb, ssb_flat, tpch and clickbench.

```
sh emr_tools.sh --case data_set --test ssb --service ali --host 127.0.0.1 --user root --port 9030 > log
```

58  tools/emr_storage_regression/default_emr_env.sh  Normal file
@ -0,0 +1,58 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

##############################################################
# See emr_tools.sh
##############################################################

# specified services: ali, hw, tx
export SERVICE=ali
# doris host
export HOST=127.0.0.1
# doris user
export USER=root
# doris mysql cli port
export PORT=9030

# prepare endpoint, region, ak/sk
if [[ ${SERVICE} == 'ali' ]]; then
    export CASE=ping
    export AK=ak
    export SK=sk
    export ENDPOINT=oss-cn-beijing-internal.aliyuncs.com
    export REGION=oss-cn-beijing
    export HMS_META_URI="thrift://172.16.1.1:9083"
    export HMS_WAREHOUSE=oss://benchmark-oss/user
elif [[ ${SERVICE} == 'hw' ]]; then
    export CASE=ping
    export AK=ak
    export SK=sk
    export ENDPOINT=obs.cn-north-4.myhuaweicloud.com
    export REGION=cn-north-4
    export HMS_META_URI="thrift://node1:9083,thrift://node2:9083"
    export HMS_WAREHOUSE=obs://datalake-bench/user
    export BEELINE_URI="jdbc:hive2://192.168.0.1:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;hive.server2.proxy.user=hive"
elif [[ ${SERVICE} == 'tx' ]]; then
    export CASE=ping
    export AK=ak
    export SK=sk
    export ENDPOINT=cos.ap-beijing.myqcloud.com
    export REGION=ap-beijing
    export HMS_META_URI="thrift://172.21.0.1:7004"
    export HMS_WAREHOUSE=cosn://datalake-bench-cos-1308700295/user
fi

192  tools/emr_storage_regression/emr_tools.sh  Normal file
@ -0,0 +1,192 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

##############################################################
# This script is used to test EMR cloud service
# Usage:
#   provide your env arguments in default_emr_env.sh
#   sh emr_tools.sh --case ping --endpoint oss-cn-beijing-internal.aliyuncs.com --region cn-beijing --service ali --ak ak --sk sk
##############################################################

set -eo pipefail

usage() {
    echo "
Usage: $0 <options>
  Optional options:
     [no option]
     --case       regression case runner: ping, data_set
     --profile    cloud credential profile
     --ak         cloud access key
     --sk         cloud secret key
     --endpoint   cloud endpoint
     --region     cloud region
     --service    cloud optional service provider: ali, tx, hw
     --host       doris mysql cli host, example: 127.0.0.1
     --user       doris username, example: user
     --port       doris port, example: 9030

  Example:
    sh emr_tools.sh --case ping --endpoint oss-cn-beijing-internal.aliyuncs.com --region cn-beijing --service ali --ak ak --sk sk
  "
    exit 1
}

if ! OPTS="$(getopt \
    -n "$0" \
    -o '' \
    -l 'case:' \
    -l 'profile:' \
    -l 'ak:' \
    -l 'sk:' \
    -l 'endpoint:' \
    -l 'region:' \
    -l 'service:' \
    -l 'host:' \
    -l 'user:' \
    -l 'port:' \
    -l 'test:' \
    -o 'h' \
    -- "$@")"; then
    usage
fi
eval set -- "${OPTS}"

while true; do
    case "$1" in
    --profile)
        PROFILE="$2"
        # can use custom profile: sh emr_tools.sh --profile default_emr_env.sh
        if [[ -n "${PROFILE}" ]]; then
            # example: "$(pwd)/default_emr_env.sh"
            # shellcheck disable=SC1090
            source "${PROFILE}"
        fi
        shift 2
        break
        ;;
    --case)
        CASE="$2"
        shift 2
        ;;
    --ak)
        AK="$2"
        shift 2
        ;;
    --sk)
        SK="$2"
        shift 2
        ;;
    --endpoint)
        ENDPOINT="$2"
        shift 2
        ;;
    --region)
        REGION="$2"
        shift 2
        ;;
    --test)
        TEST_SET="$2"
        shift 2
        ;;
    --service)
        SERVICE="$2"
        shift 2
        ;;
    --host)
        HOST="$2"
        shift 2
        ;;
    --user)
        USER="$2"
        shift 2
        ;;
    --port)
        PORT="$2"
        shift 2
        ;;
    -h)
        usage
        ;;
    --)
        shift
        break
        ;;
    *)
        echo "$1"
        echo "Internal error"
        exit 1
        ;;
    esac
done

export FE_HOST=${HOST}
export USER=${USER}
export FE_QUERY_PORT=${PORT}

if [[ ${CASE} == 'ping' ]]; then
    if [[ ${SERVICE} == 'hw' ]]; then
        # shellcheck disable=SC2269
        HMS_META_URI="${HMS_META_URI}"
        # shellcheck disable=SC2269
        HMS_WAREHOUSE="${HMS_WAREHOUSE}"
        # shellcheck disable=SC2269
        BEELINE_URI="${BEELINE_URI}"
    elif [[ ${SERVICE} == 'ali' ]]; then
        # shellcheck disable=SC2269
        HMS_META_URI="${HMS_META_URI}"
        # shellcheck disable=SC2269
        HMS_WAREHOUSE="${HMS_WAREHOUSE}"
    else
        # [[ ${SERVICE} == 'tx' ]];
        # shellcheck disable=SC2269
        HMS_META_URI="${HMS_META_URI}"
        # shellcheck disable=SC2269
        HMS_WAREHOUSE="${HMS_WAREHOUSE}"
    fi
    sh ping_test/ping_poc.sh "${ENDPOINT}" "${REGION}" "${SERVICE}" "${AK}" "${SK}" "${HMS_META_URI}" "${HMS_WAREHOUSE}" "${BEELINE_URI}"
elif [[ ${CASE} == 'data_set' ]]; then
    if [[ ${SERVICE} == 'tx' ]]; then
        BUCKET=cosn://datalake-bench-cos-1308700295
    elif [[ ${SERVICE} == 'ali' ]]; then
        BUCKET=oss://benchmark-oss
    fi
    # gen table for spark
    if ! sh stardard_set/gen_spark_create_sql.sh "${BUCKET}" obj; then
        echo "Fail to generate spark obj table for test set"
        exit 1
    fi
    if ! sh stardard_set/gen_spark_create_sql.sh hdfs:///benchmark-hdfs hdfs; then
        echo "Fail to generate spark hdfs table for test set, import hdfs data first"
        exit 1
    fi
    # FE_HOST=172.16.1.163
    # USER=root
    # PORT=9035
    if [[ -z ${TEST_SET} ]]; then
        TEST_SET='all'
    fi
    TYPE=hdfs sh stardard_set/run_standard_set.sh "${FE_HOST}" "${USER}" "${PORT}" hms_hdfs "${TEST_SET}"
    TYPE=hdfs sh stardard_set/run_standard_set.sh "${FE_HOST}" "${USER}" "${PORT}" iceberg_hms "${TEST_SET}"
    if [[ ${SERVICE} == 'tx' ]]; then
        sh stardard_set/run_standard_set.sh "${FE_HOST}" "${USER}" "${PORT}" hms_cos "${TEST_SET}"
        sh stardard_set/run_standard_set.sh "${FE_HOST}" "${USER}" "${PORT}" iceberg_hms_cos "${TEST_SET}"
    elif [[ ${SERVICE} == 'ali' ]]; then
        sh stardard_set/run_standard_set.sh "${FE_HOST}" "${USER}" "${PORT}" hms_oss "${TEST_SET}"
        sh stardard_set/run_standard_set.sh "${FE_HOST}" "${USER}" "${PORT}" iceberg_hms_oss "${TEST_SET}"
    fi
fi
@ -0,0 +1,12 @@
|
||||
CREATE CATALOG IF NOT EXISTS hms_hdfs PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI" );
|
||||
CREATE CATALOG IF NOT EXISTS hms_oss PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI", "oss.secret_key" = "SK_INPUT", "oss.endpoint" = "ENDPOINT", "oss.access_key" = "AK_INPUT" );
|
||||
CREATE CATALOG IF NOT EXISTS hms_jindo PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI", "oss.secret_key" = "SK_INPUT", "oss.endpoint" = "JINDO_ENDPOINT", "oss.access_key" = "AK_INPUT", "oss.hdfs.enabled" = "true" );
|
||||
|
||||
CREATE CATALOG IF NOT EXISTS iceberg_hms PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI" );
|
||||
CREATE CATALOG IF NOT EXISTS iceberg_hms_oss PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI", "oss.secret_key" = "SK_INPUT", "oss.endpoint" = "ENDPOINT", "oss.access_key" = "AK_INPUT" );
|
||||
CREATE CATALOG IF NOT EXISTS iceberg_hms_jindo PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI", "oss.secret_key" = "SK_INPUT", "oss.endpoint" = "JINDO_ENDPOINT", "oss.access_key" = "AK_INPUT", "oss.hdfs.enabled" = "true" );
|
||||
|
||||
CREATE CATALOG IF NOT EXISTS dlf PROPERTIES( "type" = "hms", "hive.metastore.type" = "dlf", "dlf.proxy.mode" = "DLF_ONLY", "dlf.endpoint" = "DLF_ENDPOINT", "dlf.uid" = "217316283625971977", "dlf.access_key" = "AK_INPUT", "dlf.secret_key" = "SK_INPUT" );
|
||||
CREATE CATALOG IF NOT EXISTS dlf_jindo PROPERTIES( "type" = "hms", "hive.metastore.type" = "dlf", "dlf.proxy.mode" = "DLF_ONLY", "dlf.endpoint" = "DLF_ENDPOINT", "dlf.uid" = "217316283625971977", "dlf.access_key" = "AK_INPUT", "dlf.secret_key" = "SK_INPUT", "oss.hdfs.enabled" = "true" );
|
||||
CREATE CATALOG IF NOT EXISTS iceberg_dlf PROPERTIES ( "type"="iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.type" = "dlf", "dlf.endpoint" = "DLF_ENDPOINT", "dlf.region" = "cn-beijing", "dlf.proxy.mode" = "DLF_ONLY", "dlf.uid" = "217316283625971977", "dlf.access_key" = "AK_INPUT", "dlf.secret_key" = "SK_INPUT" );
|
||||
CREATE CATALOG IF NOT EXISTS iceberg_dlf_jindo PROPERTIES ( "type"="iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.type" = "dlf", "dlf.endpoint" = "DLF_ENDPOINT", "dlf.region" = "cn-beijing", "dlf.proxy.mode" = "DLF_ONLY", "dlf.uid" = "217316283625971977", "dlf.access_key" = "AK_INPUT", "dlf.secret_key" = "SK_INPUT", "oss.hdfs.enabled" = "true" );
|
||||
@ -0,0 +1,7 @@
|
||||
CREATE CATALOG IF NOT EXISTS hms_hdfs PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI" );
|
||||
CREATE CATALOG IF NOT EXISTS hms_s3 PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI", "s3.secret_key" = "SK_INPUT", "s3.endpoint" = "ENDPOINT", "s3.access_key" = "AK_INPUT" );
|
||||
|
||||
CREATE CATALOG IF NOT EXISTS iceberg_hms PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI" );
|
||||
CREATE CATALOG IF NOT EXISTS iceberg_hms_s3 PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI", "s3.secret_key" = "SK_INPUT", "s3.endpoint" = "ENDPOINT", "s3.access_key" = "AK_INPUT" );
|
||||
|
||||
-- glue s3
|
||||
@ -0,0 +1,5 @@
|
||||
CREATE CATALOG IF NOT EXISTS hms_hdfs PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI" );
|
||||
CREATE CATALOG IF NOT EXISTS hms_obs PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI", "obs.secret_key" = "SK_INPUT", "obs.endpoint" = "ENDPOINT", "obs.access_key" = "AK_INPUT" );
|
||||
|
||||
CREATE CATALOG IF NOT EXISTS iceberg_hms PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI" );
|
||||
CREATE CATALOG IF NOT EXISTS iceberg_hms_obs PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI", "obs.secret_key" = "SK_INPUT", "obs.endpoint" = "ENDPOINT", "obs.access_key" = "AK_INPUT" );
|
||||
@ -0,0 +1,5 @@
|
||||
CREATE CATALOG IF NOT EXISTS hms_hdfs PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI" );
|
||||
CREATE CATALOG IF NOT EXISTS hms_cos PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI", "cos.secret_key" = "SK_INPUT", "cos.endpoint" = "ENDPOINT", "cos.access_key" = "AK_INPUT" );
|
||||
|
||||
CREATE CATALOG IF NOT EXISTS iceberg_hms PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI" );
|
||||
CREATE CATALOG IF NOT EXISTS iceberg_hms_cos PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI", "cos.secret_key" = "SK_INPUT", "cos.endpoint" = "ENDPOINT", "cos.access_key" = "AK_INPUT" );
|
||||
@ -0,0 +1,9 @@
|
||||
CREATE DATABASE IF NOT EXISTS hive_dlf_db;
|
||||
CREATE TABLE IF NOT EXISTS hive_dlf_db.types ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
|
||||
CREATE TABLE IF NOT EXISTS hive_dlf_db.types_one_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) PARTITIONED BY (dt STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
|
||||
CREATE TABLE IF NOT EXISTS hive_dlf_db.types_multi_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN ) PARTITIONED BY (dt STRING, hms_timstamp TIMESTAMP, hms_date DATE) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
|
||||
|
||||
CREATE DATABASE IF NOT EXISTS hive_iceberg_db_dlf;
|
||||
CREATE TABLE IF NOT EXISTS hive_iceberg_db_dlf.types ( hms_int INT, hms_smallint INT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char STRING, hms_varchar STRING, hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) STORED BY "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler" TBLPROPERTIES ('iceberg.catalog'='dlf', 'format-version'= '2');
|
||||
CREATE TABLE IF NOT EXISTS hive_iceberg_db_dlf.types_one_part ( hms_int INT, hms_smallint INT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char STRING, hms_varchar STRING, hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) PARTITIONED BY (dt STRING) STORED BY "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler" TBLPROPERTIES ('iceberg.catalog'='dlf', 'format-version'='2');
|
||||
CREATE TABLE IF NOT EXISTS hive_iceberg_db_dlf.types_multi_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN ) PARTITIONED BY (dt STRING, hms_timstamp TIMESTAMP, hms_date DATE) STORED BY "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler" TBLPROPERTIES ('iceberg.catalog'='dlf', 'format-version'='2');
|
||||
@ -0,0 +1,9 @@
|
||||
CREATE DATABASE IF NOT EXISTS hive_hms_db;
|
||||
CREATE TABLE IF NOT EXISTS hive_hms_db.types ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
|
||||
CREATE TABLE IF NOT EXISTS hive_hms_db.types_one_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) PARTITIONED BY (dt STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
|
||||
CREATE TABLE IF NOT EXISTS hive_hms_db.types_multi_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN ) PARTITIONED BY (dt STRING, hms_timstamp TIMESTAMP, hms_date DATE) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
|
||||
|
||||
CREATE DATABASE IF NOT EXISTS hive_iceberg_db_hms;
|
||||
CREATE TABLE IF NOT EXISTS hive_iceberg_db_hms.types ( hms_int INT, hms_smallint INT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char STRING, hms_varchar STRING, hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) STORED BY "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler" TBLPROPERTIES ('format-version'='2');
|
||||
CREATE TABLE IF NOT EXISTS hive_iceberg_db_hms.types_one_part ( hms_int INT, hms_smallint INT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char STRING, hms_varchar STRING, hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) PARTITIONED BY (dt STRING) STORED BY "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler" TBLPROPERTIES ( 'format-version'='2');
|
||||
CREATE TABLE IF NOT EXISTS hive_iceberg_db_hms.types_multi_part ( hms_int INT, hms_smallint INT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char STRING, hms_varchar STRING, hms_bool BOOLEAN ) PARTITIONED BY (dt STRING, hms_timstamp TIMESTAMP, hms_date DATE) STORED BY "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler" TBLPROPERTIES ( 'format-version'='2');
|
||||
@ -0,0 +1,9 @@
|
||||
CREATE DATABASE IF NOT EXISTS spark_hms_db;
|
||||
CREATE TABLE IF NOT EXISTS spark_hms_db.types ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) using parquet;
|
||||
CREATE TABLE IF NOT EXISTS spark_hms_db.types_one_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) using parquet partitioned by (dt string);
|
||||
CREATE TABLE IF NOT EXISTS spark_hms_db.types_multi_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN ) using parquet partitioned by (dt string, hms_timstamp TIMESTAMP, hms_date DATE);
|
||||
|
||||
--CREATE DATABASE IF NOT EXISTS iceberg.spark_iceberg_db_hms;
|
||||
--CREATE TABLE IF NOT EXISTS iceberg.spark_iceberg_db_hms.types ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) USING iceberg TBLPROPERTIES ( 'format-version'='2');
|
||||
--CREATE TABLE IF NOT EXISTS iceberg.spark_iceberg_db_hms.types_one_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) PARTITIONED BY (dt STRING) USING iceberg TBLPROPERTIES ( 'format-version'='2');
|
||||
--CREATE TABLE IF NOT EXISTS iceberg.spark_iceberg_db_hms.types_multi_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN ) PARTITIONED BY (dt STRING, hms_timstamp TIMESTAMP, hms_date DATE) USING iceberg TBLPROPERTIES ( 'format-version'='2');
|
||||
@ -0,0 +1,7 @@
|
||||
insert into hive_dlf_db.types values(123,34,3455,34.667754,"wastxali",234.1234,"a23f","1234vb",false,"2023-04-23 21:23:34.123","2023-04-23");
|
||||
insert into hive_dlf_db.types_one_part values(604,376,234,123.478,"aswwas",234.1234,"a23f","wsd",false,"2023-04-23 21:23:34.123","2023-04-23","2023-04-22");
|
||||
insert into hive_dlf_db.types_one_part values(223,22,234,234.500,"awsali",234.1234,"a23f","1234vb",true,"2023-04-22 21:21:34.123","2023-04-21","2023-04-24");
|
||||
insert into hive_dlf_db.types_multi_part values(1234,346,234,123.65567,"hwaws",234.1234,"a23f","1234vb",true,"2023-04-20","2023-04-21 19:23:34.123","2023-04-19");
|
||||
insert into hive_dlf_db.types_multi_part values(3212343,34,234,123.730,"hwaws",234.1234,"a23f","1234vb",true,"2023-04-20","2023-04-22 20:23:34.123","2023-04-22");
|
||||
insert into hive_dlf_db.types_multi_part values(355,22,990,123.324235,"hwaws",234.1234,"a23f","1234vb",true,"2023-04-21","2023-04-22 20:23:34.123","2023-04-22");
|
||||
insert into hive_dlf_db.types_multi_part values(23675,22,986,123.324235,"hwaws",234.1234,"a23f","1234vb",true,"2023-04-25","2023-04-21 19:23:34.123","2023-04-24");
|
||||
@ -0,0 +1,15 @@
|
||||
insert into hive_hms_db.types values(1123,5126,51,4534.63463,"wastxali",235.2351,"a23f","1234vb",false,"2023-04-23 21:23:34.123","2023-04-23");
|
||||
insert into hive_hms_db.types_one_part values(23621,23,234,345.12512356,"aswwas",525.2352,"a23f","wsd",false,"2023-04-23 21:23:34.123","2023-04-23","2023-04-22");
|
||||
insert into hive_hms_db.types_one_part values(11625,62,234,2347.6236,"awsali",546.2342,"a23f","1234vb",true,"2023-04-22 21:21:34.123","2023-04-21","2023-04-24");
|
||||
insert into hive_hms_db.types_multi_part values(123,66,234,13.1242,"hwaws",3463.4363,"a23f","1234vb",true,"2023-04-20","2023-04-21 19:23:34.123","2023-04-19");
|
||||
insert into hive_hms_db.types_multi_part values(324,77,234,123.163446,"hwaws",345.3413,"a23f","1234vb",true,"2023-04-20","2023-04-22 20:23:34.123","2023-04-22");
|
||||
insert into hive_hms_db.types_multi_part values(423,909,234,123657.512,"hwaws",234.2363,"a23f","1234vb",true,"2023-04-21","2023-04-22 20:23:34.123","2023-04-22");
|
||||
insert into hive_hms_db.types_multi_part values(343,712,234,1234.21451,"hwaws",3564.8945,"a23f","1234vb",true,"2023-04-25","2023-04-21 19:23:34.123","2023-04-24");
|
||||
|
||||
insert into hive_iceberg_db_hms.types values(123,22,234,123.324235,"wsawh",234.1234,"a23f","1234vb",false,"2023-04-23 21:23:34.123","2023-04-23");
|
||||
insert into hive_iceberg_db_hms.types_one_part values(223,22,234,123.324235,"aswwas",234.1234,"a23f","wsd",false,"2023-04-23 21:23:34.123","2023-04-23","2023-04-22");
|
||||
insert into hive_iceberg_db_hms.types_one_part values(223,22,234,123.324235,"awsali",234.1234,"a23f","1234vb",true,"2023-04-22 21:21:34.123","2023-04-21","2023-04-24");
|
||||
insert into hive_iceberg_db_hms.types_multi_part values(323,22,234,123.324235,"hwaws",234.1234,"a23f","1234vb",true,"2023-04-20","2023-04-21 19:23:34.123","2023-04-19");
|
||||
insert into hive_iceberg_db_hms.types_multi_part values(323,22,234,123.324235,"hwaws",234.1234,"a23f","1234vb",true,"2023-04-20","2023-04-22 20:23:34.123","2023-04-22");
|
||||
insert into hive_iceberg_db_hms.types_multi_part values(323,22,234,123.324235,"hwaws",234.1234,"a23f","1234vb",true,"2023-04-21","2023-04-22 20:23:34.123","2023-04-22");
|
||||
insert into hive_iceberg_db_hms.types_multi_part values(323,22,234,123.324235,"hwaws",234.1234,"a23f","1234vb",true,"2023-04-25","2023-04-21 19:23:34.123","2023-04-24");
|
||||
@ -0,0 +1,10 @@
|
||||
insert into spark_hms_db.types values(123,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",false,"2023-04-23 21:23:34.123","2023-04-23");
|
||||
insert into spark_hms_db.types_one_part values(223,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",false,"2023-04-23 21:23:34.123","2023-04-23 21:23:34","2023-04-23");
|
||||
insert into spark_hms_db.types_one_part values(223,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",false,"2023-04-23 21:23:34.123","2023-04-23 21:23:34","2023-04-23");
|
||||
insert into spark_hms_db.types_multi_part values(323,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",true,"2023-04-23","2023-04-23 21:23:34.123","2023-04-23");
|
||||
insert into spark_hms_db.types_multi_part values(323,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",true,"2023-04-23","2023-04-23 21:23:34.123","2023-04-23");
|
||||
insert into spark_hms_db.types_multi_part values(323,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",true,"2023-04-23","2023-04-23 21:23:34.123","2023-04-23");
|
||||
|
||||
--insert into iceberg.spark_iceberg_db_hms.types values(123,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",false,"2023-04-23 21:23:34.123","2023-04-23");
|
||||
--insert into iceberg.spark_iceberg_db_hms.types_one_part values(223,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",false,"2023-04-23 21:23:34.123","2023-04-23","2023-04-23");
|
||||
--insert into iceberg.spark_iceberg_db_hms.types_multi_part values(323,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",true,"2023-04-23","2023-04-23 21:23:34.123","2023-04-23");
|
||||

15  tools/emr_storage_regression/ping_test/ping.sql  Normal file
@ -0,0 +1,15 @@
select * from spark_hms_db.types;
select * from spark_hms_db.types_one_part;
select * from spark_hms_db.types_multi_part;

select * from hive_hms_db.types;
select * from hive_hms_db.types_one_part;
select * from hive_hms_db.types_multi_part;

select * from spark_iceberg_db_hms.types;
select * from spark_iceberg_db_hms.types_one_part;
select * from spark_iceberg_db_hms.types_multi_part;

select * from hive_iceberg_db_hms.types;
select * from hive_iceberg_db_hms.types_one_part;
select * from hive_iceberg_db_hms.types_multi_part;

23  tools/emr_storage_regression/ping_test/ping_aliyun.sql  Normal file
@ -0,0 +1,23 @@
select * from spark_hms_db.types;
select * from spark_hms_db.types_one_part;
select * from spark_hms_db.types_multi_part;

select * from hive_hms_db.types;
select * from hive_hms_db.types_one_part;
select * from hive_hms_db.types_multi_part;

select * from spark_iceberg_db_hms.types;
select * from spark_iceberg_db_hms.types_one_part;
select * from spark_iceberg_db_hms.types_multi_part;

select * from hive_iceberg_db_hms.types;
select * from hive_iceberg_db_hms.types_one_part;
select * from hive_iceberg_db_hms.types_multi_part;

select * from hive_dlf_db.types;
select * from hive_dlf_db.types_one_part;
select * from hive_dlf_db.types_multi_part;

select * from hive_iceberg_db_dlf.types;
select * from hive_iceberg_db_dlf.types_one_part;
select * from hive_iceberg_db_dlf.types_multi_part;

151  tools/emr_storage_regression/ping_test/ping_poc.sh  Normal file
@ -0,0 +1,151 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

##############################################################
# See emr_tools.sh
##############################################################

## Step 1: create external table and import data
ENDPOINT=$1
REGION=$2
SERVICE=$3
AK=$4
SK=$5
HMS_META_URI=$6
HMS_WAREHOUSE=$7
BEELINE_URI=$8

# set global env to local
# shellcheck disable=SC2269
FE_HOST=${FE_HOST}
# shellcheck disable=SC2269
FE_QUERY_PORT=${FE_QUERY_PORT}
# shellcheck disable=SC2269
USER=${USER}

DLF_ENDPOINT=datalake-vpc.cn-beijing.aliyuncs.com
JINDO_ENDPOINT=cn-beijing.oss-dls.aliyuncs.com

if [[ -z ${HMS_WAREHOUSE} ]]; then
    echo "Need warehouse for ${SERVICE}"
fi
cd "$(dirname "$0")" || exit

run_spark_create_sql() {
    if [[ ${SERVICE} == 'ali' ]]; then
        PARAM="--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
        --conf spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog \
        --conf spark.sql.catalog.iceberg.catalog-impl=org.apache.iceberg.aliyun.dlf.hive.DlfCatalog \
        --conf spark.sql.catalog.iceberg.access.key.id=${AK} \
        --conf spark.sql.catalog.iceberg.access.key.secret=${SK} \
        --conf spark.sql.catalog.iceberg.dlf.endpoint=${DLF_ENDPOINT} \
        --conf spark.sql.catalog.iceberg.dlf.region-id=${REGION} \
        --conf spark.sql.catalog.hms=org.apache.iceberg.spark.SparkCatalog \
        --conf spark.sql.catalog.hms.type=hive \
        --conf spark.sql.defaultCatalog=hms \
        --conf spark.sql.catalog.hms.warehouse=${HMS_WAREHOUSE} \
        -f data/create_spark_ping.sql"
    elif [[ ${SERVICE} == 'tx' ]]; then
        PARAM="--jars /usr/local/service/iceberg/iceberg-spark-runtime-3.2_2.12-0.13.1.jar \
        --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
        --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
        --conf spark.sql.catalog.spark_catalog.type=hive \
        --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
        --conf spark.sql.catalog.local.type=hadoop \
        --conf spark.sql.catalog.local.warehouse=/usr/hive/warehouse \
        -f data/create_spark_ping.sql"
    elif [[ ${SERVICE} == 'hw' ]]; then
        PARAM="-f data/create_spark_ping.sql"
    else
        echo "Unknown service type: ${SERVICE}"
        exit 1
    fi
    # stderr of the table creation goes to spark_create.log
    eval spark-sql "${PARAM}" 2>spark_create.log
}

run_spark_create_sql
run_hive_create_sql() {
    if [[ ${SERVICE} == 'hw' ]]; then
        beeline -u "${BEELINE_URI}" -f data/create_hive_ping.sql 2>hive_create.log
    elif [[ ${SERVICE} == 'ali' ]]; then
        hive -f data/create_hive_ping.sql 2>hive_create.log
    else
        hive -f data/create_hive_ping.sql 2>hive_create.log
    fi
}

run_hive_create_sql

## Step 2: make ping data
spark-sql -f data/data_for_spark.sql >>spark_data.log
hive -f data/data_for_hive.sql >>hive_data.log

run_query() {
    QUERY_NUM=1
    TRIES=2
    sql_file=$1
    catalog=$2
    while read -r query; do
        echo -n "create catalog ${QUERY_NUM},"
        for i in $(seq 1 "${TRIES}"); do
            if [[ -n ${catalog} ]]; then
                query="switch ${catalog};${query}"
            fi
            RES=$(mysql -vvv -h"${FE_HOST}" -u"${USER}" -P"${FE_QUERY_PORT}" -e "${query}")
            echo -n "${RES}"
            [[ "${i}" != "${TRIES}" ]] && echo -n ","
        done
        QUERY_NUM=$((QUERY_NUM + 1))
    done <"${sql_file}"
}

## Step 3: create external catalog in doris
case "${SERVICE}" in
ali)
    # replace DLF_ENDPOINT/JINDO_ENDPOINT before the bare ENDPOINT token,
    # otherwise the ENDPOINT substitution would clobber them
    sed -e 's#DLF_ENDPOINT#'"${DLF_ENDPOINT}"'#g' \
        -e 's#JINDO_ENDPOINT#'"${JINDO_ENDPOINT}"'#g' \
        -e 's#ENDPOINT#'"${ENDPOINT}"'#g' \
        -e 's#META_URI#'"${HMS_META_URI}"'#g' \
        -e 's#AK_INPUT#'"${AK}"'#g' \
        -e 's#SK_INPUT#'"${SK}"'#g' \
        create_catalog_aliyun.sql >emr_catalog.sql
    ;;
tx)
    sed -e 's#ENDPOINT#'"${ENDPOINT}"'#g' -e 's#META_URI#'"${HMS_META_URI}"'#g' -e 's#AK_INPUT#'"${AK}"'#g' -e 's#SK_INPUT#'"${SK}"'#g' create_catalog_tx.sql >emr_catalog.sql
    ;;
aws)
    sed -e 's#ENDPOINT#'"${ENDPOINT}"'#g' -e 's#META_URI#'"${HMS_META_URI}"'#g' -e 's#AK_INPUT#'"${AK}"'#g' -e 's#SK_INPUT#'"${SK}"'#g' create_catalog_aws.sql >emr_catalog.sql
    ;;
hw)
    sed -e 's#ENDPOINT#'"${ENDPOINT}"'#g' -e 's#META_URI#'"${HMS_META_URI}"'#g' -e 's#AK_INPUT#'"${AK}"'#g' -e 's#SK_INPUT#'"${SK}"'#g' create_catalog_hw.sql >emr_catalog.sql
    ;;
*)
    echo "Internal error"
    exit 1
    ;;
esac

run_query emr_catalog.sql

## Step 4: query ping
EMR_CATALOG=$(awk '{print $6}' emr_catalog.sql)
# shellcheck disable=SC2116
# required echo here, or the EMR_CATALOG will not be split.
for c in $(echo "${EMR_CATALOG}"); do
    if [[ ${SERVICE} == 'ali' ]]; then
        run_query ping_aliyun.sql "${c}"
    fi
    run_query ping.sql "${c}"
done
69
tools/emr_storage_regression/standard_set/analyze.sql
Normal file
69
tools/emr_storage_regression/standard_set/analyze.sql
Normal file
@ -0,0 +1,69 @@
|
||||
analyze table ssb100_parquet.customer;
|
||||
analyze table ssb100_parquet.dates;
|
||||
analyze table ssb100_parquet.lineorder;
|
||||
analyze table ssb100_parquet.lineorder_flat;
|
||||
analyze table ssb100_parquet.part;
|
||||
analyze table ssb100_parquet.supplier;
|
||||
|
||||
analyze table ssb100_orc.customer;
|
||||
analyze table ssb100_orc.dates;
|
||||
analyze table ssb100_orc.lineorder;
|
||||
analyze table ssb100_orc.lineorder_flat;
|
||||
analyze table ssb100_orc.part;
|
||||
analyze table ssb100_orc.supplier;
|
||||
|
||||
analyze table tpch100_parquet.customer;
|
||||
analyze table tpch100_parquet.lineitem;
|
||||
analyze table tpch100_parquet.nation;
|
||||
analyze table tpch100_parquet.orders;
|
||||
analyze table tpch100_parquet.part;
|
||||
analyze table tpch100_parquet.partsupp;
|
||||
analyze table tpch100_parquet.region;
|
||||
analyze table tpch100_parquet.supplier;
|
||||
|
||||
analyze table tpch100_orc.customer;
|
||||
analyze table tpch100_orc.lineitem;
|
||||
analyze table tpch100_orc.nation;
|
||||
analyze table tpch100_orc.orders;
|
||||
analyze table tpch100_orc.part;
|
||||
analyze table tpch100_orc.partsupp;
|
||||
analyze table tpch100_orc.region;
|
||||
analyze table tpch100_orc.supplier;
|
||||
|
||||
analyze table clickbench_orc.hits;
|
||||
analyze table clickbench_parquet.hits;
|
||||
|
||||
analyze table ssb100_parquet_hdfs.customer;
|
||||
analyze table ssb100_parquet_hdfs.dates;
|
||||
analyze table ssb100_parquet_hdfs.lineorder;
|
||||
analyze table ssb100_parquet_hdfs.lineorder_flat;
|
||||
analyze table ssb100_parquet_hdfs.part;
|
||||
analyze table ssb100_parquet_hdfs.supplier;
|
||||
|
||||
analyze table ssb100_orc_hdfs.customer;
|
||||
analyze table ssb100_orc_hdfs.dates;
|
||||
analyze table ssb100_orc_hdfs.lineorder;
|
||||
analyze table ssb100_orc_hdfs.lineorder_flat;
|
||||
analyze table ssb100_orc_hdfs.part;
|
||||
analyze table ssb100_orc_hdfs.supplier;
|
||||
|
||||
analyze table tpch100_orc_hdfs.customer;
|
||||
analyze table tpch100_orc_hdfs.lineitem;
|
||||
analyze table tpch100_orc_hdfs.nation;
|
||||
analyze table tpch100_orc_hdfs.orders;
|
||||
analyze table tpch100_orc_hdfs.part;
|
||||
analyze table tpch100_orc_hdfs.partsupp;
|
||||
analyze table tpch100_orc_hdfs.region;
|
||||
analyze table tpch100_orc_hdfs.supplier;
|
||||
|
||||
analyze table tpch100_parquet_hdfs.customer;
|
||||
analyze table tpch100_parquet_hdfs.lineitem;
|
||||
analyze table tpch100_parquet_hdfs.nation;
|
||||
analyze table tpch100_parquet_hdfs.orders;
|
||||
analyze table tpch100_parquet_hdfs.part;
|
||||
analyze table tpch100_parquet_hdfs.partsupp;
|
||||
analyze table tpch100_parquet_hdfs.region;
|
||||
analyze table tpch100_parquet_hdfs.supplier;
|
||||
|
||||
analyze table clickbench_hdfs_orc.hits;
|
||||
analyze table clickbench_hdfs_parquet.hits;
|
||||
@ -0,0 +1,38 @@
|
||||
#!/usr/bin/env bash
|
||||
# Licensed to the Apache Software Foundation (ASF) under one
|
||||
# or more contributor license agreements. See the NOTICE file
|
||||
# distributed with this work for additional information
|
||||
# regarding copyright ownership. The ASF licenses this file
|
||||
# to you under the Apache License, Version 2.0 (the
|
||||
# "License"); you may not use this file except in compliance
|
||||
# with the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing,
|
||||
# software distributed under the License is distributed on an
|
||||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, either express or implied. See the License for the
|
||||
# specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
##############################################################
|
||||
# See emr_tools.sh
|
||||
##############################################################
|
||||
|
||||
# shellcheck disable=SC2129
|
||||
BUCKET=$1
|
||||
TYPE=$2
|
||||
cd "$(dirname "$0")" || exit
|
||||
sh gen_tbl/gen_ssb_create_sql.sh "${BUCKET}"/ssb/ssb100_orc ssb100_orc_"${TYPE}" orc >create_"${TYPE}".sql
|
||||
sh gen_tbl/gen_ssb_create_sql.sh "${BUCKET}"/ssb/ssb100_parquet ssb100_parquet_"${TYPE}" parquet >>create_"${TYPE}".sql
|
||||
# tpch
|
||||
sh gen_tbl/gen_tpch_create_sql.sh "${BUCKET}"/tpch/tpch100_orc tpch100_orc_"${TYPE}" orc >>create_"${TYPE}".sql
|
||||
sh gen_tbl/gen_tpch_create_sql.sh "${BUCKET}"/tpch/tpch100_parquet tpch100_parquet_"${TYPE}" parquet >>create_"${TYPE}".sql
|
||||
# clickbench
|
||||
sh gen_tbl/gen_clickbench_create_sql.sh "${BUCKET}"/clickbench/hits_parquet clickbench_parquet_"${TYPE}" parquet >>create_"${TYPE}".sql
|
||||
sh gen_tbl/gen_clickbench_create_sql.sh "${BUCKET}"/clickbench/hits_orc clickbench_orc_"${TYPE}" orc >>create_"${TYPE}".sql
|
||||
# iceberg
|
||||
# sh gen_tbl/gen_ssb_create_sql.sh oss://benchmark-oss/ssb/ssb100_iceberg ssb100_iceberg iceberg >> create_"${TYPE}".sql
|
||||
# sh gen_tbl/gen_tpch_create_sql.sh oss://benchmark-oss/tpch/tpch100_iceberg tpch100_iceberg iceberg >> create_"${TYPE}".sql
|
||||
# sh gen_tbl/gen_clickbench_create_sql.sh oss://benchmark-oss/clickbench/hits_iceberg clickbench_iceberg_hdfs >> create_"${TYPE}".sql
|
||||
@ -0,0 +1,153 @@
|
||||
#!/usr/bin/env bash
|
||||
# Licensed to the Apache Software Foundation (ASF) under one
|
||||
# or more contributor license agreements. See the NOTICE file
|
||||
# distributed with this work for additional information
|
||||
# regarding copyright ownership. The ASF licenses this file
|
||||
# to you under the Apache License, Version 2.0 (the
|
||||
# "License"); you may not use this file except in compliance
|
||||
# with the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing,
|
||||
# software distributed under the License is distributed on an
|
||||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
|
||||
# KIND, either express or implied. See the License for the
|
||||
# specific language governing permissions and limitations
|
||||
# under the License.
|
||||
|
||||
##############################################################
|
||||
# See emr_tools.sh
|
||||
##############################################################
|
||||
|
||||
if [[ -z "$1" ]]; then
|
||||
echo 'the first argument is database location'
|
||||
exit
|
||||
else
|
||||
db_loc=$1
|
||||
fi
|
||||
|
||||
if [[ -z "$2" ]]; then
|
||||
echo 'the second argument is database name'
|
||||
exit
|
||||
else
|
||||
db=$2
|
||||
fi
|
||||
if [[ -z "$3" ]]; then
|
||||
format=parquet
|
||||
else
|
||||
format=$3
|
||||
fi
|
||||
# shellcheck disable=SC2016
|
||||
echo '
|
||||
CREATE DATABASE IF NOT EXISTS '"${db}"';
|
||||
USE '"${db}"';
|
||||
CREATE TABLE IF NOT EXISTS `hits`(
|
||||
`WatchID` BIGINT,
|
||||
`JavaEnable` SMALLINT,
|
||||
`Title` STRING,
|
||||
`GoodEvent` SMALLINT,
|
||||
`EventTime` TIMESTAMP,
|
||||
`EventDate` DATE,
|
||||
`CounterID` INT,
|
||||
`ClientIP` INT,
|
||||
`RegionID` INT,
|
||||
`UserID` BIGINT,
|
||||
`CounterClass` SMALLINT,
|
||||
`OS` SMALLINT,
|
||||
`UserAgent` SMALLINT,
|
||||
`URL` STRING,
|
||||
`Referer` STRING,
|
||||
`IsRefresh` SMALLINT,
|
||||
`RefererCategoryID` SMALLINT,
|
||||
`RefererRegionID` INT,
|
||||
`URLCategoryID` SMALLINT,
|
||||
`URLRegionID` INT,
|
||||
`ResolutionWidth` SMALLINT,
|
||||
`ResolutionHeight` SMALLINT,
|
||||
`ResolutionDepth` SMALLINT,
|
||||
`FlashMajor` SMALLINT,
|
||||
`FlashMinor` SMALLINT,
|
||||
`FlashMinor2` STRING,
|
||||
`NetMajor` SMALLINT,
|
||||
`NetMinor` SMALLINT,
|
||||
`UserAgentMajor` SMALLINT,
|
||||
`UserAgentMinor` STRING,
|
||||
`CookieEnable` SMALLINT,
|
||||
`JavascriptEnable` SMALLINT,
|
||||
`IsMobile` SMALLINT,
|
||||
`MobilePhone` SMALLINT,
|
||||
`MobilePhoneModel` STRING,
|
||||
`Params` STRING,
|
||||
`IPNetworkID` INT,
|
||||
`TraficSourceID` SMALLINT,
|
||||
`SearchEngineID` SMALLINT,
|
||||
`SearchPhrase` STRING,
|
||||
`AdvEngineID` SMALLINT,
|
||||
`IsArtifical` SMALLINT,
|
||||
`WindowClientWidth` SMALLINT,
|
||||
`WindowClientHeight` SMALLINT,
|
||||
`ClientTimeZone` SMALLINT,
|
||||
`ClientEventTime` TIMESTAMP,
|
||||
`SilverlightVersion1` SMALLINT,
|
||||
`SilverlightVersion2` SMALLINT,
|
||||
`SilverlightVersion3` INT,
|
||||
`SilverlightVersion4` SMALLINT,
|
||||
`PageCharset` STRING,
|
||||
`CodeVersion` INT,
|
||||
`IsLink` SMALLINT,
|
||||
`IsDownload` SMALLINT,
|
||||
`IsNotBounce` SMALLINT,
|
||||
`FUniqID` BIGINT,
|
||||
`OriginalURL` STRING,
|
||||
`HID` INT,
|
||||
`IsOldCounter` SMALLINT,
|
||||
`IsEvent` SMALLINT,
|
||||
`IsParameter` SMALLINT,
|
||||
`DontCountHits` SMALLINT,
|
||||
`WithHash` SMALLINT,
|
||||
`HitColor` STRING,
|
||||
`LocalEventTime` TIMESTAMP,
|
||||
`Age` SMALLINT,
|
||||
`Sex` SMALLINT,
|
||||
`Income` SMALLINT,
|
||||
`Interests` SMALLINT,
|
||||
`Robotness` SMALLINT,
|
||||
`RemoteIP` INT,
|
||||
`WindowName` INT,
|
||||
`OpenerName` INT,
|
||||
`HistoryLength` SMALLINT,
|
||||
`BrowserLanguage` STRING,
|
||||
`BrowserCountry` STRING,
|
||||
`SocialNetwork` STRING,
|
||||
`SocialAction` STRING,
|
||||
`HTTPError` SMALLINT,
|
||||
`SendTiming` INT,
|
||||
`DNSTiming` INT,
|
||||
`ConnectTiming` INT,
|
||||
`ResponseStartTiming` INT,
|
||||
`ResponseEndTiming` INT,
|
||||
`FetchTiming` INT,
|
||||
`SocialSourceNetworkID` SMALLINT,
|
||||
`SocialSourcePage` STRING,
|
||||
`ParamPrice` BIGINT,
|
||||
`ParamOrderID` STRING,
|
||||
`ParamCurrency` STRING,
|
||||
`ParamCurrencyID` SMALLINT,
|
||||
`OpenstatServiceName` STRING,
|
||||
`OpenstatCampaignID` STRING,
|
||||
`OpenstatAdID` STRING,
|
||||
`OpenstatSourceID` STRING,
|
||||
`UTMSource` STRING,
|
||||
`UTMMedium` STRING,
|
||||
`UTMCampaign` STRING,
|
||||
`UTMContent` STRING,
|
||||
`UTMTerm` STRING,
|
||||
`FromTag` STRING,
|
||||
`HasGCLID` SMALLINT,
|
||||
`RefererHash` BIGINT,
|
||||
`URLHash` BIGINT,
|
||||
`CLID` INT)
|
||||
USING '"${format}"'
|
||||
LOCATION "'"${db_loc}"'";
|
||||
'
|
||||
@ -0,0 +1,166 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

##############################################################
# See emr_tools.sh
##############################################################

# Emit SSB (and flat SSB) table DDL.
# $1: database location (object storage / HDFS path)
# $2: database name
# $3: table format (defaults to parquet)
if [[ -z "$1" ]]; then
    echo 'the first argument is database location'
    exit
else
    db_loc=$1
fi

if [[ -z "$2" ]]; then
    echo 'the second argument is database name'
    exit
else
    db=$2
fi

if [[ -z "$3" ]]; then
    format=parquet
else
    format=$3
fi
# shellcheck disable=SC2016
echo '
CREATE DATABASE IF NOT EXISTS '"${db}"';
USE '"${db}"';

CREATE TABLE IF NOT EXISTS `customer`(
`c_custkey` BIGINT COMMENT "",
`c_name` VARCHAR(26) COMMENT "",
`c_address` VARCHAR(41) COMMENT "",
`c_city` VARCHAR(11) COMMENT "",
`c_nation` VARCHAR(16) COMMENT "",
`c_region` VARCHAR(13) COMMENT "",
`c_phone` VARCHAR(16) COMMENT "",
`c_mktsegment` VARCHAR(11) COMMENT "")
USING '"${format}"'
LOCATION "'"${db_loc}"/customer'";

CREATE TABLE IF NOT EXISTS `dates`(
`d_datekey` BIGINT COMMENT "",
`d_date` VARCHAR(20) COMMENT "",
`d_dayofweek` VARCHAR(10) COMMENT "",
`d_month` VARCHAR(11) COMMENT "",
`d_year` BIGINT COMMENT "",
`d_yearmonthnum` BIGINT COMMENT "",
`d_yearmonth` VARCHAR(9) COMMENT "",
`d_daynuminweek` BIGINT COMMENT "",
`d_daynuminmonth` BIGINT COMMENT "",
`d_daynuminyear` BIGINT COMMENT "",
`d_monthnuminyear` BIGINT COMMENT "",
`d_weeknuminyear` BIGINT COMMENT "",
`d_sellingseason` VARCHAR(14) COMMENT "",
`d_lastdayinweekfl` BIGINT COMMENT "",
`d_lastdayinmonthfl` BIGINT COMMENT "",
`d_holidayfl` BIGINT COMMENT "",
`d_weekdayfl` BIGINT COMMENT "")
USING '"${format}"'
LOCATION "'"${db_loc}"/dates'";

CREATE TABLE IF NOT EXISTS `lineorder`(
`lo_orderkey` BIGINT COMMENT "",
`lo_linenumber` BIGINT COMMENT "",
`lo_custkey` BIGINT COMMENT "",
`lo_partkey` BIGINT COMMENT "",
`lo_suppkey` BIGINT COMMENT "",
`lo_orderdate` BIGINT COMMENT "",
`lo_orderpriority` VARCHAR(16) COMMENT "",
`lo_shippriority` BIGINT COMMENT "",
`lo_quantity` BIGINT COMMENT "",
`lo_extendedprice` BIGINT COMMENT "",
`lo_ordtotalprice` BIGINT COMMENT "",
`lo_discount` BIGINT COMMENT "",
`lo_revenue` BIGINT COMMENT "",
`lo_supplycost` BIGINT COMMENT "",
`lo_tax` BIGINT COMMENT "",
`lo_commitdate` BIGINT COMMENT "",
`lo_shipmode` VARCHAR(11) COMMENT "")
USING '"${format}"'
LOCATION "'"${db_loc}"/lineorder'";

CREATE TABLE IF NOT EXISTS `part`(
`p_partkey` BIGINT COMMENT "",
`p_name` VARCHAR(23) COMMENT "",
`p_mfgr` VARCHAR(7) COMMENT "",
`p_category` VARCHAR(8) COMMENT "",
`p_brand` VARCHAR(10) COMMENT "",
`p_color` VARCHAR(12) COMMENT "",
`p_type` VARCHAR(26) COMMENT "",
`p_size` BIGINT COMMENT "",
`p_container` VARCHAR(11) COMMENT "")
USING '"${format}"'
LOCATION "'"${db_loc}"/part'";

CREATE TABLE IF NOT EXISTS `supplier`(
`s_suppkey` BIGINT COMMENT "",
`s_name` VARCHAR(26) COMMENT "",
`s_address` VARCHAR(26) COMMENT "",
`s_city` VARCHAR(11) COMMENT "",
`s_nation` VARCHAR(16) COMMENT "",
`s_region` VARCHAR(13) COMMENT "",
`s_phone` VARCHAR(16) COMMENT "")
USING '"${format}"'
LOCATION "'"${db_loc}"/supplier'";

CREATE TABLE IF NOT EXISTS `lineorder_flat` (
`lo_orderdate` BIGINT COMMENT "",
`lo_orderkey` BIGINT COMMENT "",
`lo_linenumber` TINYINT COMMENT "",
`lo_custkey` BIGINT COMMENT "",
`lo_partkey` BIGINT COMMENT "",
`lo_suppkey` BIGINT COMMENT "",
`lo_orderpriority` VARCHAR(100) COMMENT "",
`lo_shippriority` TINYINT COMMENT "",
`lo_quantity` TINYINT COMMENT "",
`lo_extendedprice` BIGINT COMMENT "",
`lo_ordtotalprice` BIGINT COMMENT "",
`lo_discount` TINYINT COMMENT "",
`lo_revenue` BIGINT COMMENT "",
`lo_supplycost` BIGINT COMMENT "",
`lo_tax` TINYINT COMMENT "",
`lo_commitdate` BIGINT COMMENT "",
`lo_shipmode` VARCHAR(100) COMMENT "",
`c_name` VARCHAR(100) COMMENT "",
`c_address` VARCHAR(100) COMMENT "",
`c_city` VARCHAR(100) COMMENT "",
`c_nation` VARCHAR(100) COMMENT "",
`c_region` VARCHAR(100) COMMENT "",
`c_phone` VARCHAR(100) COMMENT "",
`c_mktsegment` VARCHAR(100) COMMENT "",
`s_name` VARCHAR(100) COMMENT "",
`s_address` VARCHAR(100) COMMENT "",
`s_city` VARCHAR(100) COMMENT "",
`s_nation` VARCHAR(100) COMMENT "",
`s_region` VARCHAR(100) COMMENT "",
`s_phone` VARCHAR(100) COMMENT "",
`p_name` VARCHAR(100) COMMENT "",
`p_mfgr` VARCHAR(100) COMMENT "",
`p_category` VARCHAR(100) COMMENT "",
`p_brand` VARCHAR(100) COMMENT "",
`p_color` VARCHAR(100) COMMENT "",
`p_type` VARCHAR(100) COMMENT "",
`p_size` TINYINT COMMENT "",
`p_container` VARCHAR(100) COMMENT "")
USING '"${format}"'
LOCATION "'"${db_loc}"/lineorder_flat'";
'
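The script above only prints DDL; emr_tools.sh is responsible for actually running it. A minimal manual sketch, assuming the script is saved as create_ssb_tables.sh (the script name, bucket path and database names here are placeholders):

```
# Hypothetical manual run; adjust the OSS/COS/OBS path and database name to your setup.
sh create_ssb_tables.sh "oss://my-bucket/ssb/ssb100" ssb100_parquet_obj parquet > create_ssb.sql
spark-sql -f create_ssb.sql

# The third argument switches the table format, e.g. ORC:
sh create_ssb_tables.sh "oss://my-bucket/ssb/ssb100_orc" ssb100_orc_obj orc > create_ssb_orc.sql
spark-sql -f create_ssb_orc.sql
```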
@ -0,0 +1,140 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

##############################################################
# See emr_tools.sh
##############################################################

# Emit TPC-H table DDL.
# $1: database location (object storage / HDFS path)
# $2: database name
# $3: table format (defaults to parquet)
if [[ -z "$1" ]]; then
    echo 'the first argument is database location'
    exit
else
    db_loc=$1
fi

if [[ -z "$2" ]]; then
    echo 'the second argument is database name'
    exit
else
    db=$2
fi

if [[ -z "$3" ]]; then
    format=parquet
else
    format=$3
fi
# shellcheck disable=SC2016
echo '
CREATE DATABASE IF NOT EXISTS '"${db}"' ;
USE '"${db}"';

CREATE TABLE IF NOT EXISTS `customer`(
`c_custkey` int,
`c_name` string,
`c_address` string,
`c_nationkey` int,
`c_phone` string,
`c_acctbal` decimal(12,2),
`c_mktsegment` string,
`c_comment` string)
USING '"${format}"'
LOCATION "'"${db_loc}"/customer'";

CREATE TABLE IF NOT EXISTS `lineitem`(
`l_orderkey` int,
`l_partkey` int,
`l_suppkey` int,
`l_linenumber` int,
`l_quantity` decimal(12,2),
`l_extendedprice` decimal(12,2),
`l_discount` decimal(12,2),
`l_tax` decimal(12,2),
`l_returnflag` string,
`l_linestatus` string,
`l_shipdate` date,
`l_commitdate` date,
`l_receiptdate` date,
`l_shipinstruct` string,
`l_shipmode` string,
`l_comment` string)
USING '"${format}"'
LOCATION "'"${db_loc}"/lineitem'";

CREATE TABLE IF NOT EXISTS `nation`(
`n_nationkey` int,
`n_name` string,
`n_regionkey` int,
`n_comment` string)
USING '"${format}"'
LOCATION "'"${db_loc}"/nation'";

CREATE TABLE IF NOT EXISTS `orders`(
`o_orderkey` int,
`o_custkey` int,
`o_orderstatus` string,
`o_totalprice` decimal(12,2),
`o_orderdate` date,
`o_orderpriority` string,
`o_clerk` string,
`o_shippriority` int,
`o_comment` string)
USING '"${format}"'
LOCATION "'"${db_loc}"/orders'";

CREATE TABLE IF NOT EXISTS `part`(
`p_partkey` int,
`p_name` string,
`p_mfgr` string,
`p_brand` string,
`p_type` string,
`p_size` int,
`p_container` string,
`p_retailprice` decimal(12,2),
`p_comment` string)
USING '"${format}"'
LOCATION "'"${db_loc}"/part'";

CREATE TABLE IF NOT EXISTS `partsupp`(
`ps_partkey` int,
`ps_suppkey` int,
`ps_availqty` int,
`ps_supplycost` decimal(12,2),
`ps_comment` string)
USING '"${format}"'
LOCATION "'"${db_loc}"/partsupp'";

CREATE TABLE IF NOT EXISTS `region` (
`r_regionkey` int,
`r_name` string,
`r_comment` string)
USING '"${format}"'
LOCATION "'"${db_loc}"/region'";

CREATE TABLE IF NOT EXISTS `supplier`(
`s_suppkey` int,
`s_name` string,
`s_address` string,
`s_nationkey` int,
`s_phone` string,
`s_acctbal` decimal(12,2),
`s_comment` string)
USING '"${format}"'
LOCATION "'"${db_loc}"/supplier'";

'
@ -0,0 +1,43 @@
SELECT COUNT(*) FROM hits;
SELECT COUNT(*) FROM hits WHERE AdvEngineID <> 0;
SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits;
SELECT AVG(UserID) FROM hits;
SELECT COUNT(DISTINCT UserID) FROM hits;
SELECT COUNT(DISTINCT SearchPhrase) FROM hits;
SELECT MIN(EventDate), MAX(EventDate) FROM hits;
SELECT AdvEngineID, COUNT(*) FROM hits WHERE AdvEngineID <> 0 GROUP BY AdvEngineID ORDER BY COUNT(*) DESC;
SELECT RegionID, COUNT(DISTINCT UserID) AS u FROM hits GROUP BY RegionID ORDER BY u DESC LIMIT 10;
SELECT RegionID, SUM(AdvEngineID), COUNT(*) AS c, AVG(ResolutionWidth), COUNT(DISTINCT UserID) FROM hits GROUP BY RegionID ORDER BY c DESC LIMIT 10;
SELECT MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM hits WHERE MobilePhoneModel <> '' GROUP BY MobilePhoneModel ORDER BY u DESC LIMIT 10;
SELECT MobilePhone, MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM hits WHERE MobilePhoneModel <> '' GROUP BY MobilePhone, MobilePhoneModel ORDER BY u DESC LIMIT 10;
SELECT SearchPhrase, COUNT(*) AS c FROM hits WHERE SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT SearchPhrase, COUNT(DISTINCT UserID) AS u FROM hits WHERE SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY u DESC LIMIT 10;
SELECT SearchEngineID, SearchPhrase, COUNT(*) AS c FROM hits WHERE SearchPhrase <> '' GROUP BY SearchEngineID, SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT UserID, COUNT(*) FROM hits GROUP BY UserID ORDER BY COUNT(*) DESC LIMIT 10;
SELECT UserID, SearchPhrase, COUNT(*) FROM hits GROUP BY UserID, SearchPhrase ORDER BY COUNT(*) DESC LIMIT 10;
SELECT UserID, SearchPhrase, COUNT(*) FROM hits GROUP BY UserID, SearchPhrase LIMIT 10;
SELECT UserID, extract(minute FROM EventTime) AS m, SearchPhrase, COUNT(*) FROM hits GROUP BY UserID, m, SearchPhrase ORDER BY COUNT(*) DESC LIMIT 10;
SELECT UserID FROM hits WHERE UserID = 435090932899640449;
SELECT COUNT(*) FROM hits WHERE URL LIKE '%google%';
SELECT SearchPhrase, MIN(URL), COUNT(*) AS c FROM hits WHERE URL LIKE '%google%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT SearchPhrase, MIN(URL), MIN(Title), COUNT(*) AS c, COUNT(DISTINCT UserID) FROM hits WHERE Title LIKE '%Google%' AND URL NOT LIKE '%.google.%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT * FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime LIMIT 10;
SELECT SearchPhrase FROM hits WHERE SearchPhrase <> '' ORDER BY EventTime LIMIT 10;
SELECT SearchPhrase FROM hits WHERE SearchPhrase <> '' ORDER BY SearchPhrase LIMIT 10;
SELECT SearchPhrase FROM hits WHERE SearchPhrase <> '' ORDER BY EventTime, SearchPhrase LIMIT 10;
SELECT CounterID, AVG(length(URL)) AS l, COUNT(*) AS c FROM hits WHERE URL <> '' GROUP BY CounterID HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
SELECT REGEXP_REPLACE(Referer, '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length(Referer)) AS l, COUNT(*) AS c, MIN(Referer) FROM hits WHERE Referer <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
SELECT SUM(ResolutionWidth), SUM(ResolutionWidth + 1), SUM(ResolutionWidth + 2), SUM(ResolutionWidth + 3), SUM(ResolutionWidth + 4), SUM(ResolutionWidth + 5), SUM(ResolutionWidth + 6), SUM(ResolutionWidth + 7), SUM(ResolutionWidth + 8), SUM(ResolutionWidth + 9), SUM(ResolutionWidth + 10), SUM(ResolutionWidth + 11), SUM(ResolutionWidth + 12), SUM(ResolutionWidth + 13), SUM(ResolutionWidth + 14), SUM(ResolutionWidth + 15), SUM(ResolutionWidth + 16), SUM(ResolutionWidth + 17), SUM(ResolutionWidth + 18), SUM(ResolutionWidth + 19), SUM(ResolutionWidth + 20), SUM(ResolutionWidth + 21), SUM(ResolutionWidth + 22), SUM(ResolutionWidth + 23), SUM(ResolutionWidth + 24), SUM(ResolutionWidth + 25), SUM(ResolutionWidth + 26), SUM(ResolutionWidth + 27), SUM(ResolutionWidth + 28), SUM(ResolutionWidth + 29), SUM(ResolutionWidth + 30), SUM(ResolutionWidth + 31), SUM(ResolutionWidth + 32), SUM(ResolutionWidth + 33), SUM(ResolutionWidth + 34), SUM(ResolutionWidth + 35), SUM(ResolutionWidth + 36), SUM(ResolutionWidth + 37), SUM(ResolutionWidth + 38), SUM(ResolutionWidth + 39), SUM(ResolutionWidth + 40), SUM(ResolutionWidth + 41), SUM(ResolutionWidth + 42), SUM(ResolutionWidth + 43), SUM(ResolutionWidth + 44), SUM(ResolutionWidth + 45), SUM(ResolutionWidth + 46), SUM(ResolutionWidth + 47), SUM(ResolutionWidth + 48), SUM(ResolutionWidth + 49), SUM(ResolutionWidth + 50), SUM(ResolutionWidth + 51), SUM(ResolutionWidth + 52), SUM(ResolutionWidth + 53), SUM(ResolutionWidth + 54), SUM(ResolutionWidth + 55), SUM(ResolutionWidth + 56), SUM(ResolutionWidth + 57), SUM(ResolutionWidth + 58), SUM(ResolutionWidth + 59), SUM(ResolutionWidth + 60), SUM(ResolutionWidth + 61), SUM(ResolutionWidth + 62), SUM(ResolutionWidth + 63), SUM(ResolutionWidth + 64), SUM(ResolutionWidth + 65), SUM(ResolutionWidth + 66), SUM(ResolutionWidth + 67), SUM(ResolutionWidth + 68), SUM(ResolutionWidth + 69), SUM(ResolutionWidth + 70), SUM(ResolutionWidth + 71), SUM(ResolutionWidth + 72), SUM(ResolutionWidth + 73), SUM(ResolutionWidth + 74), SUM(ResolutionWidth + 75), SUM(ResolutionWidth + 76), SUM(ResolutionWidth + 77), SUM(ResolutionWidth + 78), SUM(ResolutionWidth + 79), SUM(ResolutionWidth + 80), SUM(ResolutionWidth + 81), SUM(ResolutionWidth + 82), SUM(ResolutionWidth + 83), SUM(ResolutionWidth + 84), SUM(ResolutionWidth + 85), SUM(ResolutionWidth + 86), SUM(ResolutionWidth + 87), SUM(ResolutionWidth + 88), SUM(ResolutionWidth + 89) FROM hits;
SELECT SearchEngineID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM hits WHERE SearchPhrase <> '' GROUP BY SearchEngineID, ClientIP ORDER BY c DESC LIMIT 10;
SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM hits WHERE SearchPhrase <> '' GROUP BY WatchID, ClientIP ORDER BY c DESC LIMIT 10;
SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM hits GROUP BY WatchID, ClientIP ORDER BY c DESC LIMIT 10;
SELECT URL, COUNT(*) AS c FROM hits GROUP BY URL ORDER BY c DESC LIMIT 10;
SELECT 1, URL, COUNT(*) AS c FROM hits GROUP BY 1, URL ORDER BY c DESC LIMIT 10;
SELECT ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3, COUNT(*) AS c FROM hits GROUP BY ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3 ORDER BY c DESC LIMIT 10;
SELECT URL, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND DontCountHits = 0 AND IsRefresh = 0 AND URL <> '' GROUP BY URL ORDER BY PageViews DESC LIMIT 10;
SELECT Title, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND DontCountHits = 0 AND IsRefresh = 0 AND Title <> '' GROUP BY Title ORDER BY PageViews DESC LIMIT 10;
SELECT URL, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND IsLink <> 0 AND IsDownload = 0 GROUP BY URL ORDER BY PageViews DESC LIMIT 10 OFFSET 1000;
SELECT TraficSourceID, SearchEngineID, AdvEngineID, CASE WHEN (SearchEngineID = 0 AND AdvEngineID = 0) THEN Referer ELSE '' END AS Src, URL AS Dst, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 GROUP BY TraficSourceID, SearchEngineID, AdvEngineID, Src, Dst ORDER BY PageViews DESC LIMIT 10 OFFSET 1000;
SELECT URLHash, EventDate, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND TraficSourceID IN (-1, 6) AND RefererHash = 3594120000172545465 GROUP BY URLHash, EventDate ORDER BY PageViews DESC LIMIT 10 OFFSET 100;
SELECT WindowClientWidth, WindowClientHeight, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND DontCountHits = 0 AND URLHash = 2868770270353813622 GROUP BY WindowClientWidth, WindowClientHeight ORDER BY PageViews DESC LIMIT 10 OFFSET 10000;
SELECT DATE_FORMAT(EventTime, '%Y-%m-%d %H:%i:00') AS M, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-14' AND EventDate <= '2013-07-15' AND IsRefresh = 0 AND DontCountHits = 0 GROUP BY DATE_FORMAT(EventTime, '%Y-%m-%d %H:%i:00') ORDER BY DATE_FORMAT(EventTime, '%Y-%m-%d %H:%i:00') LIMIT 10 OFFSET 1000;
@ -0,0 +1,13 @@
SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE LO_ORDERDATE >= 19930101 AND LO_ORDERDATE <= 19931231 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25;
SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE LO_ORDERDATE >= 19940101 AND LO_ORDERDATE <= 19940131 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;
SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE weekofyear(LO_ORDERDATE) = 6 AND LO_ORDERDATE >= 19940101 AND LO_ORDERDATE <= 19941231 AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;
SELECT SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR, P_BRAND FROM lineorder_flat WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' GROUP BY YEAR, P_BRAND ORDER BY YEAR, P_BRAND;
SELECT SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR, P_BRAND FROM lineorder_flat WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA' GROUP BY YEAR, P_BRAND ORDER BY YEAR, P_BRAND;
SELECT SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR, P_BRAND FROM lineorder_flat WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' GROUP BY YEAR, P_BRAND ORDER BY YEAR, P_BRAND;
SELECT C_NATION, S_NATION, (LO_ORDERDATE DIV 10000) AS YEAR, SUM(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND LO_ORDERDATE >= 19920101 AND LO_ORDERDATE <= 19971231 GROUP BY C_NATION, S_NATION, YEAR ORDER BY YEAR ASC, revenue DESC;
SELECT C_CITY, S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR, SUM(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND LO_ORDERDATE >= 19920101 AND LO_ORDERDATE <= 19971231 GROUP BY C_CITY, S_CITY, YEAR ORDER BY YEAR ASC, revenue DESC;
SELECT C_CITY, S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR, SUM(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_CITY IN ('UNITED KI1', 'UNITED KI5') AND S_CITY IN ('UNITED KI1', 'UNITED KI5') AND LO_ORDERDATE >= 19920101 AND LO_ORDERDATE <= 19971231 GROUP BY C_CITY, S_CITY, YEAR ORDER BY YEAR ASC, revenue DESC;
SELECT C_CITY, S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR, SUM(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_CITY IN ('UNITED KI1', 'UNITED KI5') AND S_CITY IN ('UNITED KI1', 'UNITED KI5') AND LO_ORDERDATE >= 19971201 AND LO_ORDERDATE <= 19971231 GROUP BY C_CITY, S_CITY, YEAR ORDER BY YEAR ASC, revenue DESC;
SELECT (LO_ORDERDATE DIV 10000) AS YEAR, C_NATION, SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND P_MFGR IN ('MFGR#1', 'MFGR#2') GROUP BY YEAR, C_NATION ORDER BY YEAR ASC, C_NATION ASC;
SELECT (LO_ORDERDATE DIV 10000) AS YEAR, S_NATION, P_CATEGORY, SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND LO_ORDERDATE >= 19970101 AND LO_ORDERDATE <= 19981231 AND P_MFGR IN ('MFGR#1', 'MFGR#2') GROUP BY YEAR, S_NATION, P_CATEGORY ORDER BY YEAR ASC, S_NATION ASC, P_CATEGORY ASC;
SELECT (LO_ORDERDATE DIV 10000) AS YEAR, S_CITY, P_BRAND, SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE S_NATION = 'UNITED STATES' AND LO_ORDERDATE >= 19970101 AND LO_ORDERDATE <= 19981231 AND P_CATEGORY = 'MFGR#14' GROUP BY YEAR, S_CITY, P_BRAND ORDER BY YEAR ASC, S_CITY ASC, P_BRAND ASC;
@ -0,0 +1,13 @@
SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE FROM lineorder, dates WHERE lo_orderdate = d_datekey AND d_year = 1993 AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;
SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE FROM lineorder, dates WHERE lo_orderdate = d_datekey AND d_yearmonth = 'Jan1994' AND lo_discount BETWEEN 4 AND 6 AND lo_quantity BETWEEN 26 AND 35;
SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE FROM lineorder, dates WHERE lo_orderdate = d_datekey AND d_weeknuminyear = 6 AND d_year = 1994 AND lo_discount BETWEEN 5 AND 7 AND lo_quantity BETWEEN 26 AND 35;
SELECT SUM(lo_revenue), d_year, p_brand FROM lineorder, dates, part, supplier WHERE lo_orderdate = d_datekey AND lo_partkey = p_partkey AND lo_suppkey = s_suppkey AND p_category = 'MFGR#12' AND s_region = 'AMERICA' GROUP BY d_year, p_brand ORDER BY p_brand;
SELECT SUM(lo_revenue), d_year, p_brand FROM lineorder, dates, part, supplier WHERE lo_orderdate = d_datekey AND lo_partkey = p_partkey AND lo_suppkey = s_suppkey AND p_brand BETWEEN 'MFGR#2221' AND 'MFGR#2228' AND s_region = 'ASIA' GROUP BY d_year, p_brand ORDER BY d_year, p_brand;
SELECT SUM(lo_revenue), d_year, p_brand FROM lineorder, dates, part, supplier WHERE lo_orderdate = d_datekey AND lo_partkey = p_partkey AND lo_suppkey = s_suppkey AND p_brand = 'MFGR#2239' AND s_region = 'EUROPE' GROUP BY d_year, p_brand ORDER BY d_year, p_brand;
SELECT c_nation, s_nation, d_year, SUM(lo_revenue) AS REVENUE FROM customer, lineorder, supplier, dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_orderdate = d_datekey AND c_region = 'ASIA' AND s_region = 'ASIA' AND d_year >= 1992 AND d_year <= 1997 GROUP BY c_nation, s_nation, d_year ORDER BY d_year ASC, REVENUE DESC;
SELECT c_city, s_city, d_year, SUM(lo_revenue) AS REVENUE FROM customer, lineorder, supplier, dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_orderdate = d_datekey AND c_nation = 'UNITED STATES' AND s_nation = 'UNITED STATES' AND d_year >= 1992 AND d_year <= 1997 GROUP BY c_city, s_city, d_year ORDER BY d_year ASC, REVENUE DESC;
SELECT c_city, s_city, d_year, SUM(lo_revenue) AS REVENUE FROM customer, lineorder, supplier, dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_orderdate = d_datekey AND ( c_city = 'UNITED KI1' OR c_city = 'UNITED KI5' ) AND ( s_city = 'UNITED KI1' OR s_city = 'UNITED KI5' ) AND d_year >= 1992 AND d_year <= 1997 GROUP BY c_city, s_city, d_year ORDER BY d_year ASC, REVENUE DESC;
SELECT c_city, s_city, d_year, SUM(lo_revenue) AS REVENUE FROM customer, lineorder, supplier, dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_orderdate = d_datekey AND ( c_city = 'UNITED KI1' OR c_city = 'UNITED KI5' ) AND ( s_city = 'UNITED KI1' OR s_city = 'UNITED KI5' ) AND d_yearmonth = 'Dec1997' GROUP BY c_city, s_city, d_year ORDER BY d_year ASC, REVENUE DESC;
SELECT d_year, c_nation, SUM(lo_revenue - lo_supplycost) AS PROFIT FROM dates, customer, supplier, part, lineorder WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_partkey = p_partkey AND lo_orderdate = d_datekey AND c_region = 'AMERICA' AND s_region = 'AMERICA' AND ( p_mfgr = 'MFGR#1' OR p_mfgr = 'MFGR#2' ) GROUP BY d_year, c_nation ORDER BY d_year, c_nation;
SELECT d_year, s_nation, p_category, SUM(lo_revenue - lo_supplycost) AS PROFIT FROM dates, customer, supplier, part, lineorder WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_partkey = p_partkey AND lo_orderdate = d_datekey AND c_region = 'AMERICA' AND s_region = 'AMERICA' AND ( d_year = 1997 OR d_year = 1998 ) AND ( p_mfgr = 'MFGR#1' OR p_mfgr = 'MFGR#2' ) GROUP BY d_year, s_nation, p_category ORDER BY d_year, s_nation, p_category;
SELECT d_year, s_city, p_brand, SUM(lo_revenue - lo_supplycost) AS PROFIT FROM dates, customer, supplier, part, lineorder WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_partkey = p_partkey AND lo_orderdate = d_datekey AND s_nation = 'UNITED STATES' AND ( d_year = 1997 OR d_year = 1998 ) AND p_category = 'MFGR#14' GROUP BY d_year, s_city, p_brand ORDER BY d_year, s_city, p_brand;
@ -0,0 +1,22 @@
select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date '1998-12-01' - interval '90' day group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus;
select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 15 and p_type like '%BRASS' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' and ps_supplycost = ( select min(ps_supplycost) from partsupp, supplier, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' ) order by s_acctbal desc, n_name, s_name, p_partkey limit 100;
select l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority from customer, orders, lineitem where c_mktsegment = 'BUILDING' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date '1995-03-15' and l_shipdate > date '1995-03-15' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate limit 10;
select o_orderpriority, count(*) as order_count from orders where o_orderdate >= date '1993-07-01' and o_orderdate < date '1993-07-01' + interval '3' month and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority;
select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue from customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' and o_orderdate >= date '1994-01-01' and o_orderdate < date '1994-01-01' + interval '1' year group by n_name order by revenue desc;
select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= date '1994-01-01' and l_shipdate < date '1994-01-01' + interval '1' year and l_discount between .06 - 0.01 and .06 + 0.01 and l_quantity < 24;
select supp_nation, cust_nation, l_year, sum(volume) as revenue from ( select n1.n_name as supp_nation, n2.n_name as cust_nation, extract(year from l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume from supplier, lineitem, orders, customer, nation n1, nation n2 where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ( (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY') or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE') ) and l_shipdate between date '1995-01-01' and date '1996-12-31' ) as shipping group by supp_nation, cust_nation, l_year order by supp_nation, cust_nation, l_year;
select o_year, sum(case when nation = 'BRAZIL' then volume else 0 end) / sum(volume) as mkt_share from ( select extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) as volume, n2.n_name as nation from part, supplier, lineitem, orders, customer, nation n1, nation n2, region where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name = 'AMERICA' and s_nationkey = n2.n_nationkey and o_orderdate between date '1995-01-01' and date '1996-12-31' and p_type = 'ECONOMY ANODIZED STEEL' ) as all_nations group by o_year order by o_year;
select nation, o_year, sum(amount) as sum_profit from ( select n_name as nation, extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount from part, supplier, lineitem, partsupp, orders, nation where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%green%' ) as profit group by nation, o_year order by nation, o_year desc;
select c_custkey, c_name, sum(l_extendedprice * (1 - l_discount)) as revenue, c_acctbal, n_name, c_address, c_phone, c_comment from customer, orders, lineitem, nation where c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= date '1993-10-01' and o_orderdate < date '1993-10-01' + interval '3' month and l_returnflag = 'R' and c_nationkey = n_nationkey group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment order by revenue desc limit 20;
select ps_partkey, sum(ps_supplycost * ps_availqty) as value from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'GERMANY' group by ps_partkey having sum(ps_supplycost * ps_availqty) > ( select sum(ps_supplycost * ps_availqty) * 0.000002 from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'GERMANY' ) order by value desc;
select l_shipmode, sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count, sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count from orders, lineitem where o_orderkey = l_orderkey and l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= date '1994-01-01' and l_receiptdate < date '1994-01-01' + interval '1' year group by l_shipmode order by l_shipmode;
select c_count, count(*) as custdist from ( select c_custkey, count(o_orderkey) as c_count from customer left outer join orders on c_custkey = o_custkey and o_comment not like '%special%requests%' group by c_custkey ) as c_orders group by c_count order by custdist desc, c_count desc;
select 100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount) else 0 end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue from lineitem, part where l_partkey = p_partkey and l_shipdate >= date '1995-09-01' and l_shipdate < date '1995-09-01' + interval '1' month;
with revenue0 as ( select l_suppkey supplier_no, sum(l_extendedprice * (1 - l_discount)) total_revenue from lineitem where l_shipdate >= date '1996-01-01' and l_shipdate < date '1996-01-01' + interval '3' month group by l_suppkey ) select /*+SET_VAR(enable_nereids_planner=true,enable_pipeline_engine=true) */ s_suppkey, s_name, s_address, s_phone, total_revenue from supplier, revenue0 where s_suppkey = supplier_no and total_revenue = ( select max(total_revenue) from revenue0 ) order by s_suppkey;
select p_brand, p_type, p_size, count(distinct ps_suppkey) as supplier_cnt from partsupp, part where p_partkey = ps_partkey and p_brand <> 'Brand#45' and p_type not like 'MEDIUM POLISHED%' and p_size in (49, 14, 23, 45, 19, 3, 36, 9) and ps_suppkey not in ( select s_suppkey from supplier where s_comment like '%Customer%Complaints%' ) group by p_brand, p_type, p_size order by supplier_cnt desc, p_brand, p_type, p_size;
select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#23' and p_container = 'MED BOX' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey );
select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_orderkey in ( select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 300 ) and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate limit 100;
select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#12' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 1 and l_quantity <= 1 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#23' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 10 and l_quantity <= 10 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#34' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 20 and l_quantity <= 20 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' );
select s_name, s_address from supplier, nation where s_suppkey in ( select ps_suppkey from partsupp where ps_partkey in ( select p_partkey from part where p_name like 'forest%' ) and ps_availqty > ( select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date '1994-01-01' and l_shipdate < date '1994-01-01' + interval '1' year ) ) and s_nationkey = n_nationkey and n_name = 'CANADA' order by s_name;
select s_name, count(*) as numwait from supplier, lineitem l1, orders, nation where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey ) and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate ) and s_nationkey = n_nationkey and n_name = 'SAUDI ARABIA' group by s_name order by numwait desc, s_name limit 100;
select cntrycode, count(*) as numcust, sum(c_acctbal) as totacctbal from ( select substring(c_phone, 1, 2) as cntrycode, c_acctbal from customer where substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17') and c_acctbal > ( select avg(c_acctbal) from customer where c_acctbal > 0.00 and substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17') ) and not exists ( select * from orders where o_custkey = c_custkey ) ) as custsale group by cntrycode order by cntrycode;
46
tools/emr_storage_regression/standard_set/run_queries.sh
Normal file
@ -0,0 +1,46 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

##############################################################
# See emr_tools.sh
##############################################################

# Replay a query file against Doris through the MySQL client and record timings.
# $1: FE host, $2: user, $3: FE query port, $4: database, $5: query file (one query per line)
set -e

FE_HOST=$1
USER=$2
FE_QUERY_PORT=$3
DB=$4

TRIES=3
QUERY_NUM=1
RESULT_FILE=result-master-"${DB}".csv
touch "${RESULT_FILE}"
truncate -s 0 "${RESULT_FILE}"

while read -r query; do
    echo -n "query${QUERY_NUM}," | tee -a "${RESULT_FILE}"
    for i in $(seq 1 "${TRIES}"); do
        RES=$(mysql -vvv -h"${FE_HOST}" -u"${USER}" -P"${FE_QUERY_PORT}" -D"${DB}" -e "${query}" | perl -nle 'print $1 if /((\d+\.\d+)+ sec)/' || :)
        echo -n "${RES}" | tee -a "${RESULT_FILE}"
        [[ "${i}" != "${TRIES}" ]] && echo -n "," | tee -a "${RESULT_FILE}"
    done
    echo "" | tee -a "${RESULT_FILE}"

    QUERY_NUM=$((QUERY_NUM + 1))
done <"$5"
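run_queries.sh is normally invoked by run.sh below, but it can also be run on its own. A minimal sketch with placeholder connection settings; the fifth argument is the file whose queries are replayed, one per line:

```
# Placeholder host/credentials; the database argument may be a catalog-qualified name.
sh run_queries.sh 127.0.0.1 root 9030 my_catalog.ssb100_parquet_obj queries/ssb_queries.sql

# Each query runs TRIES (3) times; the timings reported by the MySQL client are appended
# to result-master-<DB>.csv, one row per query, for example:
#   query1,1.25 sec,0.98 sec,0.97 sec
```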
@ -0,0 +1,76 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

##############################################################
# See emr_tools.sh
##############################################################

# usage: sh run.sh <fe_host> <user> <query_port> <catalog_name> [ssb|ssb_flat|tpch|clickbench|all]
# TYPE (environment variable, default obj) selects the database suffix, e.g. ssb100_parquet_obj.
FE_HOST=$1
USER=$2
PORT=$3
if [[ -z "$4" ]]; then
    echo 'need catalog name'
    exit
else
    catalog_name=$4
fi

if [[ -z "$5" ]]; then
    echo "run all test default"
elif [[ "$5" = 'all' ]]; then
    echo "run all test"
else
    case=$5
fi

if [[ -z ${TYPE} ]]; then
    TYPE=obj
fi
echo "execute ${case} benchmark for ${TYPE}..."

if [[ "${case}" = 'ssb' ]]; then
    # ssb
    sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".ssb100_parquet_"${TYPE}" queries/ssb_queries.sql
    sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".ssb100_orc_"${TYPE}" queries/ssb_queries.sql
elif [[ "${case}" = 'ssb_flat' ]]; then
    # ssb_flat
    sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".ssb100_parquet_"${TYPE}" queries/ssb_flat_queries.sql
    sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".ssb100_orc_"${TYPE}" queries/ssb_flat_queries.sql
elif [[ "${case}" = 'tpch' ]]; then
    # tpch
    sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".tpch100_parquet_"${TYPE}" queries/tpch_queries.sql
    sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".tpch100_orc_"${TYPE}" queries/tpch_queries.sql
elif [[ "${case}" = 'clickbench' ]]; then
    # clickbench
    sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".clickbench_parquet_"${TYPE}" queries/clickbench_queries.sql
    sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".clickbench_orc_"${TYPE}" queries/clickbench_queries.sql
else
    # run all
    sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".ssb100_parquet_"${TYPE}" queries/ssb_queries.sql
    sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".ssb100_orc_"${TYPE}" queries/ssb_queries.sql

    sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".ssb100_parquet_"${TYPE}" queries/ssb_flat_queries.sql
    sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".ssb100_orc_"${TYPE}" queries/ssb_flat_queries.sql

    sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".tpch100_parquet_"${TYPE}" queries/tpch_queries.sql
    sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".tpch100_orc_"${TYPE}" queries/tpch_queries.sql

    sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".clickbench_parquet_"${TYPE}" queries/clickbench_queries.sql
    sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".clickbench_orc_"${TYPE}" queries/clickbench_queries.sql
fi
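Putting the pieces together, two hedged examples of driving the standard set (host, port and catalog name are placeholders); the TYPE environment variable, which defaults to obj, selects the database suffix used in the calls above:

```
# Run only the SSB flat queries against the *_obj databases of catalog `emr_iceberg`:
sh run.sh 127.0.0.1 root 9030 emr_iceberg ssb_flat

# Run every suite (ssb, ssb_flat, tpch, clickbench) against the *_hdfs databases:
TYPE=hdfs sh run.sh 127.0.0.1 root 9030 emr_iceberg
```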