[regression](multi-catalog) add EMR cloud env test tools (#21788)

Add EMR test tools for Alibaba Cloud, Huawei Cloud, and Tencent Cloud.
slothever
2023-07-28 09:45:10 +08:00
committed by GitHub
parent 8caa5a9ba4
commit f0f3548dfe
27 changed files with 1484 additions and 0 deletions

@@ -0,0 +1,178 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# Data Lake Regression Testing Tool For External Tables
Used to test Doris external tables on object storage for cloud vendors.
> Supported storage systems: HDFS, Alibaba Cloud OSS, Tencent Cloud COS, Huawei Cloud OBS
> Supported data lake table formats: Iceberg

The following are examples of the command line usage:
```
sh tools/emr_storage_regression/emr_tools.sh --profile default_emr_env.sh
```
Or
```
sh tools/emr_storage_regression/emr_tools.sh --case CASE --endpoint ENDPOINT --region REGION --service SERVICE --ak AK --sk SK --host HOST --user USER --port PORT
```
The usage of each option is described below.
## Connectivity Test
When the `--case` option is set to `ping`, the tool checks Doris connectivity on EMR using the following options:
- `--endpoint`, Object Storage Endpoint.
- `--region`, Object Storage Region.
- `--ak`, Object Storage Access Key.
- `--sk`, Object Storage Secret Key.
- `--host`, Doris Mysql Client IP.
- `--user`, Doris Mysql Client Username.
- `--port`, Doris Mysql Client Port.
- `--service`, EMR cloud vendor: ali (Alibaba Cloud), hw (Huawei Cloud), tx (Tencent Cloud).
### Environment Variables
Modify the environment variables in `default_emr_env.sh`; the script executes `source default_emr_env.sh` to make them take effect.
If environment variables are configured, you can run the test script directly with the following command:
```
sh emr_tools.sh --profile default_emr_env.sh
```
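For reference, `default_emr_env.sh` exports variables like the following (excerpt; the credentials and endpoints below are placeholders to replace with your own):
```
export SERVICE=ali
export HOST=127.0.0.1
export USER=root
export PORT=9030
export AK=ak
export SK=sk
export ENDPOINT=oss-cn-beijing-internal.aliyuncs.com
export REGION=cn-beijing
```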
### The Script Execution Steps For Connectivity Test
1. Create Spark and Hive tables on EMR
2. Insert sample data via the Spark and Hive command lines
3. Create the Doris catalogs for the connectivity test
4. Execute the connectivity test SQL in `ping.sql` (see the excerpt below)
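The connectivity check in step 4 is a set of simple scans over the sample tables; `ping.sql` contains statements such as:
```
select * from spark_hms_db.types;
select * from hive_hms_db.types;
select * from hive_iceberg_db_hms.types;
```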
### Alibaba Cloud
```
sh emr_tools.sh --profile default_emr_env.sh
```
Or
Set `--service` to `ali`, and then test connectivity on Alibaba Cloud.
```
sh emr_tools.sh --case ping --endpoint oss-cn-beijing-internal.aliyuncs.com --region cn-beijing --service ali --ak ak --sk sk --host 127.0.0.1 --user root --port 9030 > log
```
Alibaba Cloud EMR also supports testing connectivity for Doris with DLF metadata and for Doris on OSS-HDFS storage.
- The DLF metadata connectivity test must be performed on an EMR cluster that uses DLF as its metastore. The default value of `DLF_ENDPOINT` is `datalake-vpc.cn-beijing.aliyuncs.com`, configured in `ping_test/ping_poc.sh`.
- To test OSS-HDFS storage connectivity, first [enable and configure the HDFS service on the OSS bucket](https://www.alibabacloud.com/help/en/e-mapreduce/latest/oss-hdfsnew). The default value of `JINDO_ENDPOINT` is `cn-beijing.oss-dls.aliyuncs.com`, configured in `ping_test/ping_poc.sh`.
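For reference, the DLF catalog used by the test is created from a template like the following (excerpt from `create_catalog_aliyun.sql`; `DLF_ENDPOINT`, `AK_INPUT`, and `SK_INPUT` are placeholders that the script substitutes):
```
CREATE CATALOG IF NOT EXISTS dlf PROPERTIES (
    "type" = "hms",
    "hive.metastore.type" = "dlf",
    "dlf.proxy.mode" = "DLF_ONLY",
    "dlf.endpoint" = "DLF_ENDPOINT",
    "dlf.access_key" = "AK_INPUT",
    "dlf.secret_key" = "SK_INPUT"
);
```
The template also sets `dlf.uid`, which should be your own Alibaba Cloud account UID.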
### Tencent Cloud
```
sh emr_tools.sh --profile default_emr_env.sh
```
Or
Set `--service` to `tx`, and then test connectivity on Tencent Cloud.
```
sh emr_tools.sh --case ping --endpoint cos.ap-beijing.myqcloud.com --region ap-beijing --service tx --ak ak --sk sk --host 127.0.0.1 --user root --port 9030 > log
```
### Huawei Cloud
```
sh emr_tools.sh --profile default_emr_env.sh
```
Or
Set `--service` to `hw`, and then test connectivity on Huawei Cloud.
```
sh emr_tools.sh --case ping --endpoint obs.cn-north-4.myhuaweicloud.com --region cn-north-4 --service hw --ak ak --sk sk --host 127.0.0.1 --user root --port 9030 > log
```
## Performance Testing on Standard Test Set
When the `--case` option is set to `data_set`, the tool tests the query performance of Doris external tables using the following options:
- `--test`, test data set: ssb, ssb_flat, tpch, clickbench, or all. Defaults to `all`.
- `--service`, EMR cloud vendor: ali (Alibaba Cloud), hw (Huawei Cloud), tx (Tencent Cloud).
- `--host`, Doris Mysql Client IP.
- `--user`, Doris Mysql Client Username.
- `--port`, Doris Mysql Client Port.
### Environment Variables
As with the connectivity test, modify the environment variables in `default_emr_env.sh`; the script executes `source default_emr_env.sh` to make them take effect.
If environment variables are configured, you can run the test script directly with the following command:
```
sh emr_tools.sh --profile default_emr_env.sh
```
### Prepare Data
1. To run the standard test sets with the `emr_tools.sh` script, rewrite the object storage bucket specified by the `BUCKET` variable and upload the prepared data to that bucket in advance. The script generates table creation statements based on the bucket (see the layout sketch below).
2. The `emr_tools.sh` script currently supports Iceberg, Parquet, and ORC data for ssb, ssb_flat, tpch, and clickbench.
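`gen_spark_create_sql.sh` derives table locations from the bucket, so the data is expected under paths like:
```
${BUCKET}/ssb/ssb100_orc
${BUCKET}/ssb/ssb100_parquet
${BUCKET}/tpch/tpch100_orc
${BUCKET}/tpch/tpch100_parquet
${BUCKET}/clickbench/hits_orc
${BUCKET}/clickbench/hits_parquet
```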
### Execution Steps
1. After the connectivity test passes, create the Doris catalogs corresponding to the standard test sets
2. Prepare the test set data in the object storage bucket specified by the `BUCKET` variable
3. Generate Spark table creation statements and create the Spark object storage tables on EMR
4. Create the Spark tables in the local HDFS directory: `hdfs:///benchmark-hdfs`
5. Optionally, analyze the Doris tables ahead of time by manually executing the statements in `analyze.sql` in the Doris Catalog (see the excerpt after this list)
6. Execute the standard test set script: `run_standard_set.sh`
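The statistics collection in step 5 amounts to plain `analyze table` statements, for example (excerpt from `analyze.sql`):
```
analyze table ssb100_parquet.customer;
analyze table tpch100_parquet.lineitem;
analyze table clickbench_parquet.hits;
```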
### Standard data set: ssb, ssb_flat, tpch, clickbench
- Full test. After the test command is executed, Doris runs the ssb, ssb_flat, tpch, and clickbench tests in sequence. The results cover both the tables on HDFS and those on the object storage specified by `--service`.
```
sh emr_tools.sh --case data_set --service ali --host 127.0.0.1 --user root --port 9030 > log
```
- Specify a single test. The `--test` option can be set to one of ssb, ssb_flat, tpch, or clickbench.
```
sh emr_tools.sh --case data_set --test ssb --service ali --host 127.0.0.1 --user root --port 9030 > log
```

@@ -0,0 +1,58 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
##############################################################
# See emr_tools.sh
##############################################################
# specified services: ali, hw, tx
export SERVICE=ali
# doris host
export HOST=127.0.0.1
# doris user
export USER=root
# doris mysql cli port
export PORT=9030
# prepare endpoint, region, and AK/SK for the chosen service
if [[ ${SERVICE} == 'ali' ]]; then
    export CASE=ping
    export AK=ak
    export SK=sk
    export ENDPOINT=oss-cn-beijing-internal.aliyuncs.com
    export REGION=cn-beijing
    export HMS_META_URI="thrift://172.16.1.1:9083"
    export HMS_WAREHOUSE=oss://benchmark-oss/user
elif [[ ${SERVICE} == 'hw' ]]; then
    export CASE=ping
    export AK=ak
    export SK=sk
    export ENDPOINT=obs.cn-north-4.myhuaweicloud.com
    export REGION=cn-north-4
    export HMS_META_URI="thrift://node1:9083,thrift://node2:9083"
    export HMS_WAREHOUSE=obs://datalake-bench/user
    export BEELINE_URI="jdbc:hive2://192.168.0.1:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;hive.server2.proxy.user=hive"
elif [[ ${SERVICE} == 'tx' ]]; then
    export CASE=ping
    export AK=ak
    export SK=sk
    export ENDPOINT=cos.ap-beijing.myqcloud.com
    export REGION=ap-beijing
    export HMS_META_URI="thrift://172.21.0.1:7004"
    export HMS_WAREHOUSE=cosn://datalake-bench-cos-1308700295/user
fi

@@ -0,0 +1,192 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
##############################################################
# This script is used to test EMR cloud service
# Usage:
# provide your env arguments in default_emr_env.sh
# sh emr_tools.sh --case ping --endpoint oss-cn-beijing-internal.aliyuncs.com --region cn-beijing --service ali --ak ak --sk sk
##############################################################
set -eo pipefail
usage() {
echo "
Usage: $0 <options>
Optional options:
[no option]
    --case        regression case runner: ping, data_set
    --profile     cloud credential profile, example: default_emr_env.sh
    --ak          cloud access key
    --sk          cloud secret key
    --endpoint    cloud endpoint
    --region      cloud region
    --service     cloud service provider: ali, tx, hw
    --test        test data set for the data_set case: ssb, ssb_flat, tpch, clickbench, all
    --host        doris mysql cli host, example: 127.0.0.1
    --user        doris username, example: user
    --port        doris mysql cli port, example: 9030
Example:
sh emr_tools.sh --case ping --endpoint oss-cn-beijing-internal.aliyuncs.com --region cn-beijing --service ali --ak ak --sk sk
"
exit 1
}
if ! OPTS="$(getopt \
-n "$0" \
-o '' \
-l 'case:' \
-l 'profile:' \
-l 'ak:' \
-l 'sk:' \
-l 'endpoint:' \
-l 'region:' \
-l 'service:' \
-l 'host:' \
-l 'user:' \
-l 'port:' \
-l 'test:' \
-o 'h' \
-- "$@")"; then
usage
fi
eval set -- "${OPTS}"
while true; do
case "$1" in
--profile)
PROFILE="$2"
# can use custom profile: sh emr_tools.sh --profile default_emr_env.sh
if [[ -n "${PROFILE}" ]]; then
# example: "$(pwd)/default_emr_env.sh"
# shellcheck disable=SC1090
source "${PROFILE}"
fi
shift 2
break
;;
--case)
CASE="$2"
shift 2
;;
--ak)
AK="$2"
shift 2
;;
--sk)
SK="$2"
shift 2
;;
--endpoint)
ENDPOINT="$2"
shift 2
;;
--region)
REGION="$2"
shift 2
;;
--test)
TEST_SET="$2"
shift 2
;;
--service)
SERVICE="$2"
shift 2
;;
--host)
HOST="$2"
shift 2
;;
--user)
USER="$2"
shift 2
;;
--port)
PORT="$2"
shift 2
;;
-h)
usage
;;
--)
shift
break
;;
*)
echo "$1"
echo "Internal error"
exit 1
;;
esac
done
export FE_HOST=${HOST}
export USER=${USER}
export FE_QUERY_PORT=${PORT}
if [[ ${CASE} == 'ping' ]]; then
if [[ ${SERVICE} == 'hw' ]]; then
# shellcheck disable=SC2269
HMS_META_URI="${HMS_META_URI}"
# shellcheck disable=SC2269
HMS_WAREHOUSE="${HMS_WAREHOUSE}"
# shellcheck disable=SC2269
BEELINE_URI="${BEELINE_URI}"
elif [[ ${SERVICE} == 'ali' ]]; then
# shellcheck disable=SC2269
HMS_META_URI="${HMS_META_URI}"
# shellcheck disable=SC2269
HMS_WAREHOUSE="${HMS_WAREHOUSE}"
else
# [[ ${SERVICE} == 'tx' ]];
# shellcheck disable=SC2269
HMS_META_URI="${HMS_META_URI}"
# shellcheck disable=SC2269
HMS_WAREHOUSE="${HMS_WAREHOUSE}"
fi
sh ping_test/ping_poc.sh "${ENDPOINT}" "${REGION}" "${SERVICE}" "${AK}" "${SK}" "${HMS_META_URI}" "${HMS_WAREHOUSE}" "${BEELINE_URI}"
elif [[ ${CASE} == 'data_set' ]]; then
    # Default buckets for tx/ali; rewrite BUCKET to point at your own bucket
    if [[ ${SERVICE} == 'tx' ]]; then
        BUCKET=cosn://datalake-bench-cos-1308700295
    elif [[ ${SERVICE} == 'ali' ]]; then
        BUCKET=oss://benchmark-oss
    fi
    # generate tables for spark
    if ! sh stardard_set/gen_spark_create_sql.sh "${BUCKET}" obj; then
        echo "Failed to generate spark object storage tables for the test set"
        exit 1
    fi
    if ! sh stardard_set/gen_spark_create_sql.sh hdfs:///benchmark-hdfs hdfs; then
        echo "Failed to generate spark hdfs tables for the test set, import hdfs data first"
        exit 1
    fi
    if [[ -z ${TEST_SET} ]]; then
        TEST_SET='all'
    fi
    # Run each suite against the HDFS-backed catalogs first, then against the
    # object storage catalogs for the chosen service.
    TYPE=hdfs sh stardard_set/run_standard_set.sh "${FE_HOST}" "${USER}" "${PORT}" hms_hdfs "${TEST_SET}"
    TYPE=hdfs sh stardard_set/run_standard_set.sh "${FE_HOST}" "${USER}" "${PORT}" iceberg_hms "${TEST_SET}"
    if [[ ${SERVICE} == 'tx' ]]; then
        sh stardard_set/run_standard_set.sh "${FE_HOST}" "${USER}" "${PORT}" hms_cos "${TEST_SET}"
        sh stardard_set/run_standard_set.sh "${FE_HOST}" "${USER}" "${PORT}" iceberg_hms_cos "${TEST_SET}"
    elif [[ ${SERVICE} == 'ali' ]]; then
        sh stardard_set/run_standard_set.sh "${FE_HOST}" "${USER}" "${PORT}" hms_oss "${TEST_SET}"
        sh stardard_set/run_standard_set.sh "${FE_HOST}" "${USER}" "${PORT}" iceberg_hms_oss "${TEST_SET}"
    fi
fi

@@ -0,0 +1,12 @@
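-- Doris catalog templates for Alibaba Cloud (OSS, OSS-HDFS/JINDO, and DLF); placeholders such as
-- META_URI, ENDPOINT, AK_INPUT, SK_INPUT, DLF_ENDPOINT, and JINDO_ENDPOINT are substituted by ping_poc.sh.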
CREATE CATALOG IF NOT EXISTS hms_hdfs PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI" );
CREATE CATALOG IF NOT EXISTS hms_oss PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI", "oss.secret_key" = "SK_INPUT", "oss.endpoint" = "ENDPOINT", "oss.access_key" = "AK_INPUT" );
CREATE CATALOG IF NOT EXISTS hms_jindo PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI", "oss.secret_key" = "SK_INPUT", "oss.endpoint" = "JINDO_ENDPOINT", "oss.access_key" = "AK_INPUT", "oss.hdfs.enabled" = "true" );
CREATE CATALOG IF NOT EXISTS iceberg_hms PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI" );
CREATE CATALOG IF NOT EXISTS iceberg_hms_oss PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI", "oss.secret_key" = "SK_INPUT", "oss.endpoint" = "ENDPOINT", "oss.access_key" = "AK_INPUT" );
CREATE CATALOG IF NOT EXISTS iceberg_hms_jindo PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI", "oss.secret_key" = "SK_INPUT", "oss.endpoint" = "JINDO_ENDPOINT", "oss.access_key" = "AK_INPUT", "oss.hdfs.enabled" = "true" );
CREATE CATALOG IF NOT EXISTS dlf PROPERTIES( "type" = "hms", "hive.metastore.type" = "dlf", "dlf.proxy.mode" = "DLF_ONLY", "dlf.endpoint" = "DLF_ENDPOINT", "dlf.uid" = "217316283625971977", "dlf.access_key" = "AK_INPUT", "dlf.secret_key" = "SK_INPUT" );
CREATE CATALOG IF NOT EXISTS dlf_jindo PROPERTIES( "type" = "hms", "hive.metastore.type" = "dlf", "dlf.proxy.mode" = "DLF_ONLY", "dlf.endpoint" = "DLF_ENDPOINT", "dlf.uid" = "217316283625971977", "dlf.access_key" = "AK_INPUT", "dlf.secret_key" = "SK_INPUT", "oss.hdfs.enabled" = "true" );
CREATE CATALOG IF NOT EXISTS iceberg_dlf PROPERTIES ( "type"="iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.type" = "dlf", "dlf.endpoint" = "DLF_ENDPOINT", "dlf.region" = "cn-beijing", "dlf.proxy.mode" = "DLF_ONLY", "dlf.uid" = "217316283625971977", "dlf.access_key" = "AK_INPUT", "dlf.secret_key" = "SK_INPUT" );
CREATE CATALOG IF NOT EXISTS iceberg_dlf_jindo PROPERTIES ( "type"="iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.type" = "dlf", "dlf.endpoint" = "DLF_ENDPOINT", "dlf.region" = "cn-beijing", "dlf.proxy.mode" = "DLF_ONLY", "dlf.uid" = "217316283625971977", "dlf.access_key" = "AK_INPUT", "dlf.secret_key" = "SK_INPUT", "oss.hdfs.enabled" = "true" );

@@ -0,0 +1,7 @@
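-- Doris catalog templates for AWS (S3); placeholders are substituted by ping_poc.sh.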
CREATE CATALOG IF NOT EXISTS hms_hdfs PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI" );
CREATE CATALOG IF NOT EXISTS hms_s3 PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI", "s3.secret_key" = "SK_INPUT", "s3.endpoint" = "ENDPOINT", "s3.access_key" = "AK_INPUT" );
CREATE CATALOG IF NOT EXISTS iceberg_hms PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI" );
CREATE CATALOG IF NOT EXISTS iceberg_hms_s3 PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI", "s3.secret_key" = "SK_INPUT", "s3.endpoint" = "ENDPOINT", "s3.access_key" = "AK_INPUT" );
-- glue s3

@@ -0,0 +1,5 @@
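-- Doris catalog templates for Huawei Cloud (OBS); placeholders are substituted by ping_poc.sh.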
CREATE CATALOG IF NOT EXISTS hms_hdfs PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI" );
CREATE CATALOG IF NOT EXISTS hms_obs PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI", "obs.secret_key" = "SK_INPUT", "obs.endpoint" = "ENDPOINT", "obs.access_key" = "AK_INPUT" );
CREATE CATALOG IF NOT EXISTS iceberg_hms PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI" );
CREATE CATALOG IF NOT EXISTS iceberg_hms_obs PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI", "obs.secret_key" = "SK_INPUT", "obs.endpoint" = "ENDPOINT", "obs.access_key" = "AK_INPUT" );

@@ -0,0 +1,5 @@
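-- Doris catalog templates for Tencent Cloud (COS); placeholders are substituted by ping_poc.sh.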
CREATE CATALOG IF NOT EXISTS hms_hdfs PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI" );
CREATE CATALOG IF NOT EXISTS hms_cos PROPERTIES ( "type" = "hms", "hive.metastore.uris" = "META_URI", "cos.secret_key" = "SK_INPUT", "cos.endpoint" = "ENDPOINT", "cos.access_key" = "AK_INPUT" );
CREATE CATALOG IF NOT EXISTS iceberg_hms PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI" );
CREATE CATALOG IF NOT EXISTS iceberg_hms_cos PROPERTIES ( "type" = "iceberg", "iceberg.catalog.type" = "hms", "hive.metastore.uris" = "META_URI", "cos.secret_key" = "SK_INPUT", "cos.endpoint" = "ENDPOINT", "cos.access_key" = "AK_INPUT" );

@@ -0,0 +1,9 @@
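-- Hive DDL for the DLF ping test: unpartitioned, single-partition, and multi-partition type tables,
-- plus Iceberg (DLF catalog) variants.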
CREATE DATABASE IF NOT EXISTS hive_dlf_db;
CREATE TABLE IF NOT EXISTS hive_dlf_db.types ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
CREATE TABLE IF NOT EXISTS hive_dlf_db.types_one_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) PARTITIONED BY (dt STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
CREATE TABLE IF NOT EXISTS hive_dlf_db.types_multi_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN ) PARTITIONED BY (dt STRING, hms_timstamp TIMESTAMP, hms_date DATE) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
CREATE DATABASE IF NOT EXISTS hive_iceberg_db_dlf;
CREATE TABLE IF NOT EXISTS hive_iceberg_db_dlf.types ( hms_int INT, hms_smallint INT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char STRING, hms_varchar STRING, hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) STORED BY "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler" TBLPROPERTIES ('iceberg.catalog'='dlf', 'format-version'= '2');
CREATE TABLE IF NOT EXISTS hive_iceberg_db_dlf.types_one_part ( hms_int INT, hms_smallint INT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char STRING, hms_varchar STRING, hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) PARTITIONED BY (dt STRING) STORED BY "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler" TBLPROPERTIES ('iceberg.catalog'='dlf', 'format-version'='2');
CREATE TABLE IF NOT EXISTS hive_iceberg_db_dlf.types_multi_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN ) PARTITIONED BY (dt STRING, hms_timstamp TIMESTAMP, hms_date DATE) STORED BY "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler" TBLPROPERTIES ('iceberg.catalog'='dlf', 'format-version'='2');

@@ -0,0 +1,9 @@
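-- Hive DDL for the HMS ping test: unpartitioned, single-partition, and multi-partition type tables,
-- plus Iceberg (HMS) variants.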
CREATE DATABASE IF NOT EXISTS hive_hms_db;
CREATE TABLE IF NOT EXISTS hive_hms_db.types ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
CREATE TABLE IF NOT EXISTS hive_hms_db.types_one_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) PARTITIONED BY (dt STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
CREATE TABLE IF NOT EXISTS hive_hms_db.types_multi_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN ) PARTITIONED BY (dt STRING, hms_timstamp TIMESTAMP, hms_date DATE) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat';
CREATE DATABASE IF NOT EXISTS hive_iceberg_db_hms;
CREATE TABLE IF NOT EXISTS hive_iceberg_db_hms.types ( hms_int INT, hms_smallint INT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char STRING, hms_varchar STRING, hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) STORED BY "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler" TBLPROPERTIES ('format-version'='2');
CREATE TABLE IF NOT EXISTS hive_iceberg_db_hms.types_one_part ( hms_int INT, hms_smallint INT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char STRING, hms_varchar STRING, hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) PARTITIONED BY (dt STRING) STORED BY "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler" TBLPROPERTIES ( 'format-version'='2');
CREATE TABLE IF NOT EXISTS hive_iceberg_db_hms.types_multi_part ( hms_int INT, hms_smallint INT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char STRING, hms_varchar STRING, hms_bool BOOLEAN ) PARTITIONED BY (dt STRING, hms_timstamp TIMESTAMP, hms_date DATE) STORED BY "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler" TBLPROPERTIES ( 'format-version'='2');

@@ -0,0 +1,9 @@
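-- Spark DDL for the ping test: unpartitioned, single-partition, and multi-partition type tables
-- (the Iceberg variants are left commented out).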
CREATE DATABASE IF NOT EXISTS spark_hms_db;
CREATE TABLE IF NOT EXISTS spark_hms_db.types ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) using parquet;
CREATE TABLE IF NOT EXISTS spark_hms_db.types_one_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) using parquet partitioned by (dt string);
CREATE TABLE IF NOT EXISTS spark_hms_db.types_multi_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN ) using parquet partitioned by (dt string, hms_timstamp TIMESTAMP, hms_date DATE);
--CREATE DATABASE IF NOT EXISTS iceberg.spark_iceberg_db_hms;
--CREATE TABLE IF NOT EXISTS iceberg.spark_iceberg_db_hms.types ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) USING iceberg TBLPROPERTIES ( 'format-version'='2');
--CREATE TABLE IF NOT EXISTS iceberg.spark_iceberg_db_hms.types_one_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN, hms_timstamp TIMESTAMP, hms_date DATE ) PARTITIONED BY (dt STRING) USING iceberg TBLPROPERTIES ( 'format-version'='2');
--CREATE TABLE IF NOT EXISTS iceberg.spark_iceberg_db_hms.types_multi_part ( hms_int INT, hms_smallint SMALLINT, hms_bigint BIGINT, hms_double DOUBLE, hms_string STRING, hms_decimal DECIMAL(12,4), hms_char CHAR(50), hms_varchar VARCHAR(50), hms_bool BOOLEAN ) PARTITIONED BY (dt STRING, hms_timstamp TIMESTAMP, hms_date DATE) USING iceberg TBLPROPERTIES ( 'format-version'='2');

@@ -0,0 +1,7 @@
insert into hive_dlf_db.types values(123,34,3455,34.667754,"wastxali",234.1234,"a23f","1234vb",false,"2023-04-23 21:23:34.123","2023-04-23");
insert into hive_dlf_db.types_one_part values(604,376,234,123.478,"aswwas",234.1234,"a23f","wsd",false,"2023-04-23 21:23:34.123","2023-04-23","2023-04-22");
insert into hive_dlf_db.types_one_part values(223,22,234,234.500,"awsali",234.1234,"a23f","1234vb",true,"2023-04-22 21:21:34.123","2023-04-21","2023-04-24");
insert into hive_dlf_db.types_multi_part values(1234,346,234,123.65567,"hwaws",234.1234,"a23f","1234vb",true,"2023-04-20","2023-04-21 19:23:34.123","2023-04-19");
insert into hive_dlf_db.types_multi_part values(3212343,34,234,123.730,"hwaws",234.1234,"a23f","1234vb",true,"2023-04-20","2023-04-22 20:23:34.123","2023-04-22");
insert into hive_dlf_db.types_multi_part values(355,22,990,123.324235,"hwaws",234.1234,"a23f","1234vb",true,"2023-04-21","2023-04-22 20:23:34.123","2023-04-22");
insert into hive_dlf_db.types_multi_part values(23675,22,986,123.324235,"hwaws",234.1234,"a23f","1234vb",true,"2023-04-25","2023-04-21 19:23:34.123","2023-04-24");

@@ -0,0 +1,15 @@
insert into hive_hms_db.types values(1123,5126,51,4534.63463,"wastxali",235.2351,"a23f","1234vb",false,"2023-04-23 21:23:34.123","2023-04-23");
insert into hive_hms_db.types_one_part values(23621,23,234,345.12512356,"aswwas",525.2352,"a23f","wsd",false,"2023-04-23 21:23:34.123","2023-04-23","2023-04-22");
insert into hive_hms_db.types_one_part values(11625,62,234,2347.6236,"awsali",546.2342,"a23f","1234vb",true,"2023-04-22 21:21:34.123","2023-04-21","2023-04-24");
insert into hive_hms_db.types_multi_part values(123,66,234,13.1242,"hwaws",3463.4363,"a23f","1234vb",true,"2023-04-20","2023-04-21 19:23:34.123","2023-04-19");
insert into hive_hms_db.types_multi_part values(324,77,234,123.163446,"hwaws",345.3413,"a23f","1234vb",true,"2023-04-20","2023-04-22 20:23:34.123","2023-04-22");
insert into hive_hms_db.types_multi_part values(423,909,234,123657.512,"hwaws",234.2363,"a23f","1234vb",true,"2023-04-21","2023-04-22 20:23:34.123","2023-04-22");
insert into hive_hms_db.types_multi_part values(343,712,234,1234.21451,"hwaws",3564.8945,"a23f","1234vb",true,"2023-04-25","2023-04-21 19:23:34.123","2023-04-24");
insert into hive_iceberg_db_hms.types values(123,22,234,123.324235,"wsawh",234.1234,"a23f","1234vb",false,"2023-04-23 21:23:34.123","2023-04-23");
insert into hive_iceberg_db_hms.types_one_part values(223,22,234,123.324235,"aswwas",234.1234,"a23f","wsd",false,"2023-04-23 21:23:34.123","2023-04-23","2023-04-22");
insert into hive_iceberg_db_hms.types_one_part values(223,22,234,123.324235,"awsali",234.1234,"a23f","1234vb",true,"2023-04-22 21:21:34.123","2023-04-21","2023-04-24");
insert into hive_iceberg_db_hms.types_multi_part values(323,22,234,123.324235,"hwaws",234.1234,"a23f","1234vb",true,"2023-04-20","2023-04-21 19:23:34.123","2023-04-19");
insert into hive_iceberg_db_hms.types_multi_part values(323,22,234,123.324235,"hwaws",234.1234,"a23f","1234vb",true,"2023-04-20","2023-04-22 20:23:34.123","2023-04-22");
insert into hive_iceberg_db_hms.types_multi_part values(323,22,234,123.324235,"hwaws",234.1234,"a23f","1234vb",true,"2023-04-21","2023-04-22 20:23:34.123","2023-04-22");
insert into hive_iceberg_db_hms.types_multi_part values(323,22,234,123.324235,"hwaws",234.1234,"a23f","1234vb",true,"2023-04-25","2023-04-21 19:23:34.123","2023-04-24");

@@ -0,0 +1,10 @@
insert into spark_hms_db.types values(123,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",false,"2023-04-23 21:23:34.123","2023-04-23");
insert into spark_hms_db.types_one_part values(223,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",false,"2023-04-23 21:23:34.123","2023-04-23 21:23:34","2023-04-23");
insert into spark_hms_db.types_one_part values(223,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",false,"2023-04-23 21:23:34.123","2023-04-23 21:23:34","2023-04-23");
insert into spark_hms_db.types_multi_part values(323,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",true,"2023-04-23","2023-04-23 21:23:34.123","2023-04-23");
insert into spark_hms_db.types_multi_part values(323,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",true,"2023-04-23","2023-04-23 21:23:34.123","2023-04-23");
insert into spark_hms_db.types_multi_part values(323,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",true,"2023-04-23","2023-04-23 21:23:34.123","2023-04-23");
--insert into iceberg.spark_iceberg_db_hms.types values(123,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",false,"2023-04-23 21:23:34.123","2023-04-23");
--insert into iceberg.spark_iceberg_db_hms.types_one_part values(223,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",false,"2023-04-23 21:23:34.123","2023-04-23","2023-04-23");
--insert into iceberg.spark_iceberg_db_hms.types_multi_part values(323,22,234,123.324235,"sawer",234.1234,"a23f","1234vb",true,"2023-04-23","2023-04-23 21:23:34.123","2023-04-23");

@@ -0,0 +1,15 @@
select * from spark_hms_db.types;
select * from spark_hms_db.types_one_part;
select * from spark_hms_db.types_multi_part;
select * from hive_hms_db.types;
select * from hive_hms_db.types_one_part;
select * from hive_hms_db.types_multi_part;
select * from spark_iceberg_db_hms.types;
select * from spark_iceberg_db_hms.types_one_part;
select * from spark_iceberg_db_hms.types_multi_part;
select * from hive_iceberg_db_hms.types;
select * from hive_iceberg_db_hms.types_one_part;
select * from hive_iceberg_db_hms.types_multi_part;

@@ -0,0 +1,23 @@
select * from spark_hms_db.types;
select * from spark_hms_db.types_one_part;
select * from spark_hms_db.types_multi_part;
select * from hive_hms_db.types;
select * from hive_hms_db.types_one_part;
select * from hive_hms_db.types_multi_part;
select * from spark_iceberg_db_hms.types;
select * from spark_iceberg_db_hms.types_one_part;
select * from spark_iceberg_db_hms.types_multi_part;
select * from hive_iceberg_db_hms.types;
select * from hive_iceberg_db_hms.types_one_part;
select * from hive_iceberg_db_hms.types_multi_part;
select * from hive_dlf_db.types;
select * from hive_dlf_db.types_one_part;
select * from hive_dlf_db.types_multi_part;
select * from hive_iceberg_db_dlf.types;
select * from hive_iceberg_db_dlf.types_one_part;
select * from hive_iceberg_db_dlf.types_multi_part;

@@ -0,0 +1,151 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
##############################################################
# See emr_tools.sh
##############################################################
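# Usage (as invoked from emr_tools.sh):
#   sh ping_poc.sh <endpoint> <region> <service> <ak> <sk> <hms_meta_uri> <hms_warehouse> [beeline_uri]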
## Step 1: create external table and import data
ENDPOINT=$1
REGION=$2
SERVICE=$3
AK=$4
SK=$5
HMS_META_URI=$6
HMS_WAREHOUSE=$7
BEELINE_URI=$8
# set global env to local
# shellcheck disable=SC2269
FE_HOST=${FE_HOST}
# shellcheck disable=SC2269
FE_QUERY_PORT=${FE_QUERY_PORT}
# shellcheck disable=SC2269
USER=${USER}
DLF_ENDPOINT=datalake-vpc.cn-beijing.aliyuncs.com
JINDO_ENDPOINT=cn-beijing.oss-dls.aliyuncs.com
if [[ -z ${HMS_WAREHOUSE} ]]; then
    echo "Need warehouse for ${SERVICE}"
    exit 1
fi
cd "$(dirname "$0")" || gexit
run_spark_create_sql() {
    if [[ ${SERVICE} == 'ali' ]]; then
        PARAM="--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
            --conf spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog \
            --conf spark.sql.catalog.iceberg.catalog-impl=org.apache.iceberg.aliyun.dlf.hive.DlfCatalog \
            --conf spark.sql.catalog.iceberg.access.key.id=${AK} \
            --conf spark.sql.catalog.iceberg.access.key.secret=${SK} \
            --conf spark.sql.catalog.iceberg.dlf.endpoint=${DLF_ENDPOINT} \
            --conf spark.sql.catalog.iceberg.dlf.region-id=${REGION} \
            --conf spark.sql.catalog.hms=org.apache.iceberg.spark.SparkCatalog \
            --conf spark.sql.catalog.hms.type=hive \
            --conf spark.sql.defaultCatalog=hms \
            --conf spark.sql.catalog.hms.warehouse=${HMS_WAREHOUSE} \
            -f data/create_spark_ping.sql"
    elif [[ ${SERVICE} == 'tx' ]]; then
        PARAM="--jars /usr/local/service/iceberg/iceberg-spark-runtime-3.2_2.12-0.13.1.jar \
            --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
            --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
            --conf spark.sql.catalog.spark_catalog.type=hive \
            --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
            --conf spark.sql.catalog.local.type=hadoop \
            --conf spark.sql.catalog.local.warehouse=/usr/hive/warehouse \
            -f data/create_spark_ping.sql"
    elif [[ ${SERVICE} == 'hw' ]]; then
        PARAM="-f data/create_spark_ping.sql"
    else
        echo "Unknown service type: ${SERVICE}"
        exit 1
    fi
    # the stderr redirect belongs on the spark-sql invocation, not on the PARAM assignment
    eval spark-sql "${PARAM}" 2>spark_create.log
}
run_spark_create_sql
run_hive_create_sql() {
    if [[ ${SERVICE} == 'hw' ]]; then
        beeline -u "${BEELINE_URI}" -f data/create_hive_ping.sql 2>hive_create.log
    else
        hive -f data/create_hive_ping.sql 2>hive_create.log
    fi
}
run_hive_create_sql
## Step 2: make ping data
spark-sql -f data/data_for_spark.sql >>spark_data.log
hive -f data/data_for_hive.sql >>hive_data.log
run_query() {
    QUERY_NUM=1
    TRIES=2
    sql_file=$1
    catalog=$2
    while read -r query; do
        echo -n "query ${QUERY_NUM},"
        # switch to the target catalog once, before the retries
        if [[ -n ${catalog} ]]; then
            query="switch ${catalog};${query}"
        fi
        for i in $(seq 1 "${TRIES}"); do
            RES=$(mysql -vvv -h"${FE_HOST}" -u"${USER}" -P"${FE_QUERY_PORT}" -e "${query}")
            echo -n "${RES}"
            [[ "${i}" != "${TRIES}" ]] && echo -n ","
        done
        QUERY_NUM=$((QUERY_NUM + 1))
    done <"${sql_file}"
}
## Step 3: create external catalog in doris
case "${SERVICE}" in
ali)
    # Substitute DLF_ENDPOINT/JINDO_ENDPOINT before the bare ENDPOINT placeholder,
    # otherwise the ENDPOINT substitution would clobber them.
    sed -e 's#DLF_ENDPOINT#'"${DLF_ENDPOINT}"'#g' \
        -e 's#JINDO_ENDPOINT#'"${JINDO_ENDPOINT}"'#g' \
        -e 's#ENDPOINT#'"${ENDPOINT}"'#g' \
        -e 's#META_URI#'"${HMS_META_URI}"'#g' \
        -e 's#AK_INPUT#'"${AK}"'#g' \
        -e 's#SK_INPUT#'"${SK}"'#g' create_catalog_aliyun.sql >emr_catalog.sql
    ;;
tx)
    sed -e 's#ENDPOINT#'"${ENDPOINT}"'#g' -e 's#META_URI#'"${HMS_META_URI}"'#g' -e 's#AK_INPUT#'"${AK}"'#g' -e 's#SK_INPUT#'"${SK}"'#g' create_catalog_tx.sql >emr_catalog.sql
    ;;
aws)
    sed -e 's#ENDPOINT#'"${ENDPOINT}"'#g' -e 's#META_URI#'"${HMS_META_URI}"'#g' -e 's#AK_INPUT#'"${AK}"'#g' -e 's#SK_INPUT#'"${SK}"'#g' create_catalog_aws.sql >emr_catalog.sql
    ;;
hw)
    sed -e 's#ENDPOINT#'"${ENDPOINT}"'#g' -e 's#META_URI#'"${HMS_META_URI}"'#g' -e 's#AK_INPUT#'"${AK}"'#g' -e 's#SK_INPUT#'"${SK}"'#g' create_catalog_hw.sql >emr_catalog.sql
    ;;
*)
    echo "Internal error"
    exit 1
    ;;
esac
run_query emr_catalog.sql
## Step 4: query ping
EMR_CATALOG=$(awk '{print $6}' emr_catalog.sql)
# shellcheck disable=SC2116
# the echo is required here so that EMR_CATALOG is word-split into catalog names
for c in $(echo "${EMR_CATALOG}"); do
    if [[ ${SERVICE} == 'ali' ]]; then
        run_query ping_aliyun.sql "${c}"
    fi
    run_query ping.sql "${c}"
done

@@ -0,0 +1,69 @@
analyze table ssb100_parquet.customer;
analyze table ssb100_parquet.dates;
analyze table ssb100_parquet.lineorder;
analyze table ssb100_parquet.lineorder_flat;
analyze table ssb100_parquet.part;
analyze table ssb100_parquet.supplier;
analyze table ssb100_orc.customer;
analyze table ssb100_orc.dates;
analyze table ssb100_orc.lineorder;
analyze table ssb100_orc.lineorder_flat;
analyze table ssb100_orc.part;
analyze table ssb100_orc.supplier;
analyze table tpch100_parquet.customer;
analyze table tpch100_parquet.lineitem;
analyze table tpch100_parquet.nation;
analyze table tpch100_parquet.orders;
analyze table tpch100_parquet.part;
analyze table tpch100_parquet.partsupp;
analyze table tpch100_parquet.region;
analyze table tpch100_parquet.supplier;
analyze table tpch100_orc.customer;
analyze table tpch100_orc.lineitem;
analyze table tpch100_orc.nation;
analyze table tpch100_orc.orders;
analyze table tpch100_orc.part;
analyze table tpch100_orc.partsupp;
analyze table tpch100_orc.region;
analyze table tpch100_orc.supplier;
analyze table clickbench_orc.hits;
analyze table clickbench_parquet.hits;
analyze table ssb100_parquet_hdfs.customer;
analyze table ssb100_parquet_hdfs.dates;
analyze table ssb100_parquet_hdfs.lineorder;
analyze table ssb100_parquet_hdfs.lineorder_flat;
analyze table ssb100_parquet_hdfs.part;
analyze table ssb100_parquet_hdfs.supplier;
analyze table ssb100_orc_hdfs.customer;
analyze table ssb100_orc_hdfs.dates;
analyze table ssb100_orc_hdfs.lineorder;
analyze table ssb100_orc_hdfs.lineorder_flat;
analyze table ssb100_orc_hdfs.part;
analyze table ssb100_orc_hdfs.supplier;
analyze table tpch100_orc_hdfs.customer;
analyze table tpch100_orc_hdfs.lineitem;
analyze table tpch100_orc_hdfs.nation;
analyze table tpch100_orc_hdfs.orders;
analyze table tpch100_orc_hdfs.part;
analyze table tpch100_orc_hdfs.partsupp;
analyze table tpch100_orc_hdfs.region;
analyze table tpch100_orc_hdfs.supplier;
analyze table tpch100_parquet_hdfs.customer;
analyze table tpch100_parquet_hdfs.lineitem;
analyze table tpch100_parquet_hdfs.nation;
analyze table tpch100_parquet_hdfs.orders;
analyze table tpch100_parquet_hdfs.part;
analyze table tpch100_parquet_hdfs.partsupp;
analyze table tpch100_parquet_hdfs.region;
analyze table tpch100_parquet_hdfs.supplier;
analyze table clickbench_hdfs_orc.hits;
analyze table clickbench_hdfs_parquet.hits;

@@ -0,0 +1,38 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
##############################################################
# See emr_tools.sh
##############################################################
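# Usage (as invoked from emr_tools.sh): sh gen_spark_create_sql.sh <bucket> <type>
#   <type> suffixes the generated database names, e.g. obj or hdfs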
# shellcheck disable=SC2129
BUCKET=$1
TYPE=$2
cd "$(dirname "$0")" || exit
sh gen_tbl/gen_ssb_create_sql.sh "${BUCKET}"/ssb/ssb100_orc ssb100_orc_"${TYPE}" orc >create_"${TYPE}".sql
sh gen_tbl/gen_ssb_create_sql.sh "${BUCKET}"/ssb/ssb100_parquet ssb100_parquet_"${TYPE}" parquet >>create_"${TYPE}".sql
# tpch
sh gen_tbl/gen_tpch_create_sql.sh "${BUCKET}"/tpch/tpch100_orc tpch100_orc_"${TYPE}" orc >>create_"${TYPE}".sql
sh gen_tbl/gen_tpch_create_sql.sh "${BUCKET}"/tpch/tpch100_parquet tpch100_parquet_"${TYPE}" parquet >>create_"${TYPE}".sql
# clickbench
sh gen_tbl/gen_clickbench_create_sql.sh "${BUCKET}"/clickbench/hits_parquet clickbench_parquet_"${TYPE}" parquet >>create_"${TYPE}".sql
sh gen_tbl/gen_clickbench_create_sql.sh "${BUCKET}"/clickbench/hits_orc clickbench_orc_"${TYPE}" orc >>create_"${TYPE}".sql
# iceberg
# sh gen_tbl/gen_ssb_create_sql.sh oss://benchmark-oss/ssb/ssb100_iceberg ssb100_iceberg iceberg >> create_"${TYPE}".sql
# sh gen_tbl/gen_tpch_create_sql.sh oss://benchmark-oss/tpch/tpch100_iceberg tpch100_iceberg iceberg >> create_"${TYPE}".sql
# sh gen_tbl/gen_clickbench_create_sql.sh oss://benchmark-oss/clickbench/hits_iceberg clickbench_iceberg_hdfs >> create_"${TYPE}".sql

@@ -0,0 +1,153 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
##############################################################
# See emr_tools.sh
##############################################################
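# Usage: sh gen_clickbench_create_sql.sh <db_location> <db_name> [format]
#   format defaults to parquet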
if [[ -z "$1" ]]; then
echo 'the first argument is database location'
exit
else
db_loc=$1
fi
if [[ -z "$2" ]]; then
echo 'the second argument is database name'
exit
else
db=$2
fi
if [[ -z "$3" ]]; then
format=parquet
else
format=$3
fi
# shellcheck disable=SC2016
echo '
CREATE DATABASE IF NOT EXISTS '"${db}"';
USE '"${db}"';
CREATE TABLE IF NOT EXISTS `hits`(
`WatchID` BIGINT,
`JavaEnable` SMALLINT,
`Title` STRING,
`GoodEvent` SMALLINT,
`EventTime` TIMESTAMP,
`EventDate` DATE,
`CounterID` INT,
`ClientIP` INT,
`RegionID` INT,
`UserID` BIGINT,
`CounterClass` SMALLINT,
`OS` SMALLINT,
`UserAgent` SMALLINT,
`URL` STRING,
`Referer` STRING,
`IsRefresh` SMALLINT,
`RefererCategoryID` SMALLINT,
`RefererRegionID` INT,
`URLCategoryID` SMALLINT,
`URLRegionID` INT,
`ResolutionWidth` SMALLINT,
`ResolutionHeight` SMALLINT,
`ResolutionDepth` SMALLINT,
`FlashMajor` SMALLINT,
`FlashMinor` SMALLINT,
`FlashMinor2` STRING,
`NetMajor` SMALLINT,
`NetMinor` SMALLINT,
`UserAgentMajor` SMALLINT,
`UserAgentMinor` STRING,
`CookieEnable` SMALLINT,
`JavascriptEnable` SMALLINT,
`IsMobile` SMALLINT,
`MobilePhone` SMALLINT,
`MobilePhoneModel` STRING,
`Params` STRING,
`IPNetworkID` INT,
`TraficSourceID` SMALLINT,
`SearchEngineID` SMALLINT,
`SearchPhrase` STRING,
`AdvEngineID` SMALLINT,
`IsArtifical` SMALLINT,
`WindowClientWidth` SMALLINT,
`WindowClientHeight` SMALLINT,
`ClientTimeZone` SMALLINT,
`ClientEventTime` TIMESTAMP,
`SilverlightVersion1` SMALLINT,
`SilverlightVersion2` SMALLINT,
`SilverlightVersion3` INT,
`SilverlightVersion4` SMALLINT,
`PageCharset` STRING,
`CodeVersion` INT,
`IsLink` SMALLINT,
`IsDownload` SMALLINT,
`IsNotBounce` SMALLINT,
`FUniqID` BIGINT,
`OriginalURL` STRING,
`HID` INT,
`IsOldCounter` SMALLINT,
`IsEvent` SMALLINT,
`IsParameter` SMALLINT,
`DontCountHits` SMALLINT,
`WithHash` SMALLINT,
`HitColor` STRING,
`LocalEventTime` TIMESTAMP,
`Age` SMALLINT,
`Sex` SMALLINT,
`Income` SMALLINT,
`Interests` SMALLINT,
`Robotness` SMALLINT,
`RemoteIP` INT,
`WindowName` INT,
`OpenerName` INT,
`HistoryLength` SMALLINT,
`BrowserLanguage` STRING,
`BrowserCountry` STRING,
`SocialNetwork` STRING,
`SocialAction` STRING,
`HTTPError` SMALLINT,
`SendTiming` INT,
`DNSTiming` INT,
`ConnectTiming` INT,
`ResponseStartTiming` INT,
`ResponseEndTiming` INT,
`FetchTiming` INT,
`SocialSourceNetworkID` SMALLINT,
`SocialSourcePage` STRING,
`ParamPrice` BIGINT,
`ParamOrderID` STRING,
`ParamCurrency` STRING,
`ParamCurrencyID` SMALLINT,
`OpenstatServiceName` STRING,
`OpenstatCampaignID` STRING,
`OpenstatAdID` STRING,
`OpenstatSourceID` STRING,
`UTMSource` STRING,
`UTMMedium` STRING,
`UTMCampaign` STRING,
`UTMContent` STRING,
`UTMTerm` STRING,
`FromTag` STRING,
`HasGCLID` SMALLINT,
`RefererHash` BIGINT,
`URLHash` BIGINT,
`CLID` INT)
USING '"${format}"'
LOCATION "'"${db_loc}"'";
'

@@ -0,0 +1,166 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
##############################################################
# See emr_tools.sh
##############################################################
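# Usage: sh gen_ssb_create_sql.sh <db_location> <db_name> [format]
#   format defaults to parquet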
if [[ -z "$1" ]]; then
echo 'the first argument is database location'
exit
else
db_loc=$1
fi
if [[ -z "$2" ]]; then
echo 'the second argument is database name'
exit
else
db=$2
fi
if [[ -z "$3" ]]; then
format=parquet
else
format=$3
fi
# shellcheck disable=SC2016
echo '
CREATE DATABASE IF NOT EXISTS '"${db}"';
USE '"${db}"';
CREATE TABLE IF NOT EXISTS `customer`(
`c_custkey` BIGINT COMMENT "",
`c_name` VARCHAR(26) COMMENT "",
`c_address` VARCHAR(41) COMMENT "",
`c_city` VARCHAR(11) COMMENT "",
`c_nation` VARCHAR(16) COMMENT "",
`c_region` VARCHAR(13) COMMENT "",
`c_phone` VARCHAR(16) COMMENT "",
`c_mktsegment` VARCHAR(11) COMMENT "")
USING '"${format}"'
LOCATION "'"${db_loc}"/customer'";
CREATE TABLE IF NOT EXISTS `dates`(
`d_datekey` BIGINT COMMENT "",
`d_date` VARCHAR(20) COMMENT "",
`d_dayofweek` VARCHAR(10) COMMENT "",
`d_month` VARCHAR(11) COMMENT "",
`d_year` BIGINT COMMENT "",
`d_yearmonthnum` BIGINT COMMENT "",
`d_yearmonth` VARCHAR(9) COMMENT "",
`d_daynuminweek` BIGINT COMMENT "",
`d_daynuminmonth` BIGINT COMMENT "",
`d_daynuminyear` BIGINT COMMENT "",
`d_monthnuminyear` BIGINT COMMENT "",
`d_weeknuminyear` BIGINT COMMENT "",
`d_sellingseason` VARCHAR(14) COMMENT "",
`d_lastdayinweekfl` BIGINT COMMENT "",
`d_lastdayinmonthfl` BIGINT COMMENT "",
`d_holidayfl` BIGINT COMMENT "",
`d_weekdayfl` BIGINT COMMENT "")
USING '"${format}"'
LOCATION "'"${db_loc}"/dates'";
CREATE TABLE IF NOT EXISTS `lineorder`(
`lo_orderkey` BIGINT COMMENT "",
`lo_linenumber` BIGINT COMMENT "",
`lo_custkey` BIGINT COMMENT "",
`lo_partkey` BIGINT COMMENT "",
`lo_suppkey` BIGINT COMMENT "",
`lo_orderdate` BIGINT COMMENT "",
`lo_orderpriority` VARCHAR(16) COMMENT "",
`lo_shippriority` BIGINT COMMENT "",
`lo_quantity` BIGINT COMMENT "",
`lo_extendedprice` BIGINT COMMENT "",
`lo_ordtotalprice` BIGINT COMMENT "",
`lo_discount` BIGINT COMMENT "",
`lo_revenue` BIGINT COMMENT "",
`lo_supplycost` BIGINT COMMENT "",
`lo_tax` BIGINT COMMENT "",
`lo_commitdate` BIGINT COMMENT "",
`lo_shipmode` VARCHAR(11) COMMENT "")
USING '"${format}"'
LOCATION "'"${db_loc}"/lineorder'";
CREATE TABLE IF NOT EXISTS `part`(
`p_partkey` BIGINT COMMENT "",
`p_name` VARCHAR(23) COMMENT "",
`p_mfgr` VARCHAR(7) COMMENT "",
`p_category` VARCHAR(8) COMMENT "",
`p_brand` VARCHAR(10) COMMENT "",
`p_color` VARCHAR(12) COMMENT "",
`p_type` VARCHAR(26) COMMENT "",
`p_size` BIGINT COMMENT "",
`p_container` VARCHAR(11) COMMENT "")
USING '"${format}"'
LOCATION "'"${db_loc}"/part'";
CREATE TABLE IF NOT EXISTS `supplier`(
`s_suppkey` BIGINT COMMENT "",
`s_name` VARCHAR(26) COMMENT "",
`s_address` VARCHAR(26) COMMENT "",
`s_city` VARCHAR(11) COMMENT "",
`s_nation` VARCHAR(16) COMMENT "",
`s_region` VARCHAR(13) COMMENT "",
`s_phone` VARCHAR(16) COMMENT "")
USING '"${format}"'
LOCATION "'"${db_loc}"/supplier'";
CREATE TABLE IF NOT EXISTS `lineorder_flat` (
`lo_orderdate` BIGINT COMMENT "",
`lo_orderkey` BIGINT COMMENT "",
`lo_linenumber` TINYINT COMMENT "",
`lo_custkey` BIGINT COMMENT "",
`lo_partkey` BIGINT COMMENT "",
`lo_suppkey` BIGINT COMMENT "",
`lo_orderpriority` VARCHAR(100) COMMENT "",
`lo_shippriority` TINYINT COMMENT "",
`lo_quantity` TINYINT COMMENT "",
`lo_extendedprice` BIGINT COMMENT "",
`lo_ordtotalprice` BIGINT COMMENT "",
`lo_discount` TINYINT COMMENT "",
`lo_revenue` BIGINT COMMENT "",
`lo_supplycost` BIGINT COMMENT "",
`lo_tax` TINYINT COMMENT "",
`lo_commitdate` BIGINT COMMENT "",
`lo_shipmode` VARCHAR(100) COMMENT "",
`c_name` VARCHAR(100) COMMENT "",
`c_address` VARCHAR(100) COMMENT "",
`c_city` VARCHAR(100) COMMENT "",
`c_nation` VARCHAR(100) COMMENT "",
`c_region` VARCHAR(100) COMMENT "",
`c_phone` VARCHAR(100) COMMENT "",
`c_mktsegment` VARCHAR(100) COMMENT "",
`s_name` VARCHAR(100) COMMENT "",
`s_address` VARCHAR(100) COMMENT "",
`s_city` VARCHAR(100) COMMENT "",
`s_nation` VARCHAR(100) COMMENT "",
`s_region` VARCHAR(100) COMMENT "",
`s_phone` VARCHAR(100) COMMENT "",
`p_name` VARCHAR(100) COMMENT "",
`p_mfgr` VARCHAR(100) COMMENT "",
`p_category` VARCHAR(100) COMMENT "",
`p_brand` VARCHAR(100) COMMENT "",
`p_color` VARCHAR(100) COMMENT "",
`p_type` VARCHAR(100) COMMENT "",
`p_size` TINYINT COMMENT "",
`p_container` VARCHAR(100) COMMENT "")
USING '"${format}"'
LOCATION "'"${db_loc}"/lineorder_flat'";
'

@@ -0,0 +1,140 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
##############################################################
# See emr_tools.sh
##############################################################
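# Usage: sh gen_tpch_create_sql.sh <db_location> <db_name> [format]
#   format defaults to parquet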
if [[ -z "$1" ]]; then
echo 'the first argument is database location'
exit
else
db_loc=$1
fi
if [[ -z "$2" ]]; then
echo 'the second argument is database name'
exit
else
db=$2
fi
if [[ -z "$3" ]]; then
format=parquet
else
format=$3
fi
# shellcheck disable=SC2016
echo '
CREATE DATABASE IF NOT EXISTS '"${db}"' ;
USE '"${db}"';
CREATE TABLE IF NOT EXISTS `customer`(
`c_custkey` int,
`c_name` string,
`c_address` string,
`c_nationkey` int,
`c_phone` string,
`c_acctbal` decimal(12,2),
`c_mktsegment` string,
`c_comment` string)
USING '"${format}"'
LOCATION "'"${db_loc}"/customer'";
CREATE TABLE IF NOT EXISTS `lineitem`(
`l_orderkey` int,
`l_partkey` int,
`l_suppkey` int,
`l_linenumber` int,
`l_quantity` decimal(12,2),
`l_extendedprice` decimal(12,2),
`l_discount` decimal(12,2),
`l_tax` decimal(12,2),
`l_returnflag` string,
`l_linestatus` string,
`l_shipdate` date,
`l_commitdate` date,
`l_receiptdate` date,
`l_shipinstruct` string,
`l_shipmode` string,
`l_comment` string)
USING '"${format}"'
LOCATION "'"${db_loc}"/lineitem'";
CREATE TABLE IF NOT EXISTS `nation`(
`n_nationkey` int,
`n_name` string,
`n_regionkey` int,
`n_comment` string)
USING '"${format}"'
LOCATION "'"${db_loc}"/nation'";
CREATE TABLE IF NOT EXISTS `orders`(
`o_orderkey` int,
`o_custkey` int,
`o_orderstatus` string,
`o_totalprice` decimal(12,2),
`o_orderdate` date,
`o_orderpriority` string,
`o_clerk` string,
`o_shippriority` int,
`o_comment` string)
USING '"${format}"'
LOCATION "'"${db_loc}"/orders'";
CREATE TABLE IF NOT EXISTS `part`(
`p_partkey` int,
`p_name` string,
`p_mfgr` string,
`p_brand` string,
`p_type` string,
`p_size` int,
`p_container` string,
`p_retailprice` decimal(12,2),
`p_comment` string)
USING '"${format}"'
LOCATION "'"${db_loc}"/part'";
CREATE TABLE IF NOT EXISTS `partsupp`(
`ps_partkey` int,
`ps_suppkey` int,
`ps_availqty` int,
`ps_supplycost` decimal(12,2),
`ps_comment` string)
USING '"${format}"'
LOCATION "'"${db_loc}"/partsupp'";
CREATE TABLE IF NOT EXISTS `region` (
`r_regionkey` int,
`r_name` string,
`r_comment` string)
USING '"${format}"'
LOCATION "'"${db_loc}"/region'";
CREATE TABLE IF NOT EXISTS `supplier`(
`s_suppkey` int,
`s_name` string,
`s_address` string,
`s_nationkey` int,
`s_phone` string,
`s_acctbal` decimal(12,2),
`s_comment` string)
USING '"${format}"'
LOCATION "'"${db_loc}"/supplier'";
'

@@ -0,0 +1,43 @@
SELECT COUNT(*) FROM hits;
SELECT COUNT(*) FROM hits WHERE AdvEngineID <> 0;
SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits;
SELECT AVG(UserID) FROM hits;
SELECT COUNT(DISTINCT UserID) FROM hits;
SELECT COUNT(DISTINCT SearchPhrase) FROM hits;
SELECT MIN(EventDate), MAX(EventDate) FROM hits;
SELECT AdvEngineID, COUNT(*) FROM hits WHERE AdvEngineID <> 0 GROUP BY AdvEngineID ORDER BY COUNT(*) DESC;
SELECT RegionID, COUNT(DISTINCT UserID) AS u FROM hits GROUP BY RegionID ORDER BY u DESC LIMIT 10;
SELECT RegionID, SUM(AdvEngineID), COUNT(*) AS c, AVG(ResolutionWidth), COUNT(DISTINCT UserID) FROM hits GROUP BY RegionID ORDER BY c DESC LIMIT 10;
SELECT MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM hits WHERE MobilePhoneModel <> '' GROUP BY MobilePhoneModel ORDER BY u DESC LIMIT 10;
SELECT MobilePhone, MobilePhoneModel, COUNT(DISTINCT UserID) AS u FROM hits WHERE MobilePhoneModel <> '' GROUP BY MobilePhone, MobilePhoneModel ORDER BY u DESC LIMIT 10;
SELECT SearchPhrase, COUNT(*) AS c FROM hits WHERE SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT SearchPhrase, COUNT(DISTINCT UserID) AS u FROM hits WHERE SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY u DESC LIMIT 10;
SELECT SearchEngineID, SearchPhrase, COUNT(*) AS c FROM hits WHERE SearchPhrase <> '' GROUP BY SearchEngineID, SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT UserID, COUNT(*) FROM hits GROUP BY UserID ORDER BY COUNT(*) DESC LIMIT 10;
SELECT UserID, SearchPhrase, COUNT(*) FROM hits GROUP BY UserID, SearchPhrase ORDER BY COUNT(*) DESC LIMIT 10;
SELECT UserID, SearchPhrase, COUNT(*) FROM hits GROUP BY UserID, SearchPhrase LIMIT 10;
SELECT UserID, extract(minute FROM EventTime) AS m, SearchPhrase, COUNT(*) FROM hits GROUP BY UserID, m, SearchPhrase ORDER BY COUNT(*) DESC LIMIT 10;
SELECT UserID FROM hits WHERE UserID = 435090932899640449;
SELECT COUNT(*) FROM hits WHERE URL LIKE '%google%';
SELECT SearchPhrase, MIN(URL), COUNT(*) AS c FROM hits WHERE URL LIKE '%google%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT SearchPhrase, MIN(URL), MIN(Title), COUNT(*) AS c, COUNT(DISTINCT UserID) FROM hits WHERE Title LIKE '%Google%' AND URL NOT LIKE '%.google.%' AND SearchPhrase <> '' GROUP BY SearchPhrase ORDER BY c DESC LIMIT 10;
SELECT * FROM hits WHERE URL LIKE '%google%' ORDER BY EventTime LIMIT 10;
SELECT SearchPhrase FROM hits WHERE SearchPhrase <> '' ORDER BY EventTime LIMIT 10;
SELECT SearchPhrase FROM hits WHERE SearchPhrase <> '' ORDER BY SearchPhrase LIMIT 10;
SELECT SearchPhrase FROM hits WHERE SearchPhrase <> '' ORDER BY EventTime, SearchPhrase LIMIT 10;
SELECT CounterID, AVG(length(URL)) AS l, COUNT(*) AS c FROM hits WHERE URL <> '' GROUP BY CounterID HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
SELECT REGEXP_REPLACE(Referer, '^https?://(?:www\.)?([^/]+)/.*$', '\1') AS k, AVG(length(Referer)) AS l, COUNT(*) AS c, MIN(Referer) FROM hits WHERE Referer <> '' GROUP BY k HAVING COUNT(*) > 100000 ORDER BY l DESC LIMIT 25;
SELECT SUM(ResolutionWidth), SUM(ResolutionWidth + 1), SUM(ResolutionWidth + 2), SUM(ResolutionWidth + 3), SUM(ResolutionWidth + 4), SUM(ResolutionWidth + 5), SUM(ResolutionWidth + 6), SUM(ResolutionWidth + 7), SUM(ResolutionWidth + 8), SUM(ResolutionWidth + 9), SUM(ResolutionWidth + 10), SUM(ResolutionWidth + 11), SUM(ResolutionWidth + 12), SUM(ResolutionWidth + 13), SUM(ResolutionWidth + 14), SUM(ResolutionWidth + 15), SUM(ResolutionWidth + 16), SUM(ResolutionWidth + 17), SUM(ResolutionWidth + 18), SUM(ResolutionWidth + 19), SUM(ResolutionWidth + 20), SUM(ResolutionWidth + 21), SUM(ResolutionWidth + 22), SUM(ResolutionWidth + 23), SUM(ResolutionWidth + 24), SUM(ResolutionWidth + 25), SUM(ResolutionWidth + 26), SUM(ResolutionWidth + 27), SUM(ResolutionWidth + 28), SUM(ResolutionWidth + 29), SUM(ResolutionWidth + 30), SUM(ResolutionWidth + 31), SUM(ResolutionWidth + 32), SUM(ResolutionWidth + 33), SUM(ResolutionWidth + 34), SUM(ResolutionWidth + 35), SUM(ResolutionWidth + 36), SUM(ResolutionWidth + 37), SUM(ResolutionWidth + 38), SUM(ResolutionWidth + 39), SUM(ResolutionWidth + 40), SUM(ResolutionWidth + 41), SUM(ResolutionWidth + 42), SUM(ResolutionWidth + 43), SUM(ResolutionWidth + 44), SUM(ResolutionWidth + 45), SUM(ResolutionWidth + 46), SUM(ResolutionWidth + 47), SUM(ResolutionWidth + 48), SUM(ResolutionWidth + 49), SUM(ResolutionWidth + 50), SUM(ResolutionWidth + 51), SUM(ResolutionWidth + 52), SUM(ResolutionWidth + 53), SUM(ResolutionWidth + 54), SUM(ResolutionWidth + 55), SUM(ResolutionWidth + 56), SUM(ResolutionWidth + 57), SUM(ResolutionWidth + 58), SUM(ResolutionWidth + 59), SUM(ResolutionWidth + 60), SUM(ResolutionWidth + 61), SUM(ResolutionWidth + 62), SUM(ResolutionWidth + 63), SUM(ResolutionWidth + 64), SUM(ResolutionWidth + 65), SUM(ResolutionWidth + 66), SUM(ResolutionWidth + 67), SUM(ResolutionWidth + 68), SUM(ResolutionWidth + 69), SUM(ResolutionWidth + 70), SUM(ResolutionWidth + 71), SUM(ResolutionWidth + 72), SUM(ResolutionWidth + 73), SUM(ResolutionWidth + 74), SUM(ResolutionWidth + 75), SUM(ResolutionWidth + 76), SUM(ResolutionWidth + 77), SUM(ResolutionWidth + 78), SUM(ResolutionWidth + 79), SUM(ResolutionWidth + 80), SUM(ResolutionWidth + 81), SUM(ResolutionWidth + 82), SUM(ResolutionWidth + 83), SUM(ResolutionWidth + 84), SUM(ResolutionWidth + 85), SUM(ResolutionWidth + 86), SUM(ResolutionWidth + 87), SUM(ResolutionWidth + 88), SUM(ResolutionWidth + 89) FROM hits;
SELECT SearchEngineID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM hits WHERE SearchPhrase <> '' GROUP BY SearchEngineID, ClientIP ORDER BY c DESC LIMIT 10;
SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM hits WHERE SearchPhrase <> '' GROUP BY WatchID, ClientIP ORDER BY c DESC LIMIT 10;
SELECT WatchID, ClientIP, COUNT(*) AS c, SUM(IsRefresh), AVG(ResolutionWidth) FROM hits GROUP BY WatchID, ClientIP ORDER BY c DESC LIMIT 10;
SELECT URL, COUNT(*) AS c FROM hits GROUP BY URL ORDER BY c DESC LIMIT 10;
SELECT 1, URL, COUNT(*) AS c FROM hits GROUP BY 1, URL ORDER BY c DESC LIMIT 10;
SELECT ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3, COUNT(*) AS c FROM hits GROUP BY ClientIP, ClientIP - 1, ClientIP - 2, ClientIP - 3 ORDER BY c DESC LIMIT 10;
SELECT URL, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND DontCountHits = 0 AND IsRefresh = 0 AND URL <> '' GROUP BY URL ORDER BY PageViews DESC LIMIT 10;
SELECT Title, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND DontCountHits = 0 AND IsRefresh = 0 AND Title <> '' GROUP BY Title ORDER BY PageViews DESC LIMIT 10;
SELECT URL, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND IsLink <> 0 AND IsDownload = 0 GROUP BY URL ORDER BY PageViews DESC LIMIT 10 OFFSET 1000;
SELECT TraficSourceID, SearchEngineID, AdvEngineID, CASE WHEN (SearchEngineID = 0 AND AdvEngineID = 0) THEN Referer ELSE '' END AS Src, URL AS Dst, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 GROUP BY TraficSourceID, SearchEngineID, AdvEngineID, Src, Dst ORDER BY PageViews DESC LIMIT 10 OFFSET 1000;
SELECT URLHash, EventDate, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND TraficSourceID IN (-1, 6) AND RefererHash = 3594120000172545465 GROUP BY URLHash, EventDate ORDER BY PageViews DESC LIMIT 10 OFFSET 100;
SELECT WindowClientWidth, WindowClientHeight, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND IsRefresh = 0 AND DontCountHits = 0 AND URLHash = 2868770270353813622 GROUP BY WindowClientWidth, WindowClientHeight ORDER BY PageViews DESC LIMIT 10 OFFSET 10000;
SELECT DATE_FORMAT(EventTime, '%Y-%m-%d %H:%i:00') AS M, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-14' AND EventDate <= '2013-07-15' AND IsRefresh = 0 AND DontCountHits = 0 GROUP BY DATE_FORMAT(EventTime, '%Y-%m-%d %H:%i:00') ORDER BY DATE_FORMAT(EventTime, '%Y-%m-%d %H:%i:00') LIMIT 10 OFFSET 1000;

View File

@ -0,0 +1,13 @@
SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE LO_ORDERDATE >= 19930101 AND LO_ORDERDATE <= 19931231 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25;
SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE LO_ORDERDATE >= 19940101 AND LO_ORDERDATE <= 19940131 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;
SELECT SUM(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE weekofyear(LO_ORDERDATE) = 6 AND LO_ORDERDATE >= 19940101 AND LO_ORDERDATE <= 19941231 AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;
SELECT SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR, P_BRAND FROM lineorder_flat WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' GROUP BY YEAR, P_BRAND ORDER BY YEAR, P_BRAND;
SELECT SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR, P_BRAND FROM lineorder_flat WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA' GROUP BY YEAR, P_BRAND ORDER BY YEAR, P_BRAND;
SELECT SUM(LO_REVENUE), (LO_ORDERDATE DIV 10000) AS YEAR, P_BRAND FROM lineorder_flat WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' GROUP BY YEAR, P_BRAND ORDER BY YEAR, P_BRAND;
SELECT C_NATION, S_NATION, (LO_ORDERDATE DIV 10000) AS YEAR, SUM(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND LO_ORDERDATE >= 19920101 AND LO_ORDERDATE <= 19971231 GROUP BY C_NATION, S_NATION, YEAR ORDER BY YEAR ASC, revenue DESC;
SELECT C_CITY, S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR, SUM(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND LO_ORDERDATE >= 19920101 AND LO_ORDERDATE <= 19971231 GROUP BY C_CITY, S_CITY, YEAR ORDER BY YEAR ASC, revenue DESC;
SELECT C_CITY, S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR, SUM(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_CITY IN ('UNITED KI1', 'UNITED KI5') AND S_CITY IN ('UNITED KI1', 'UNITED KI5') AND LO_ORDERDATE >= 19920101 AND LO_ORDERDATE <= 19971231 GROUP BY C_CITY, S_CITY, YEAR ORDER BY YEAR ASC, revenue DESC;
SELECT C_CITY, S_CITY, (LO_ORDERDATE DIV 10000) AS YEAR, SUM(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_CITY IN ('UNITED KI1', 'UNITED KI5') AND S_CITY IN ('UNITED KI1', 'UNITED KI5') AND LO_ORDERDATE >= 19971201 AND LO_ORDERDATE <= 19971231 GROUP BY C_CITY, S_CITY, YEAR ORDER BY YEAR ASC, revenue DESC;
SELECT (LO_ORDERDATE DIV 10000) AS YEAR, C_NATION, SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND P_MFGR IN ('MFGR#1', 'MFGR#2') GROUP BY YEAR, C_NATION ORDER BY YEAR ASC, C_NATION ASC;
SELECT (LO_ORDERDATE DIV 10000) AS YEAR, S_NATION, P_CATEGORY, SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND LO_ORDERDATE >= 19970101 AND LO_ORDERDATE <= 19981231 AND P_MFGR IN ('MFGR#1', 'MFGR#2') GROUP BY YEAR, S_NATION, P_CATEGORY ORDER BY YEAR ASC, S_NATION ASC, P_CATEGORY ASC;
SELECT (LO_ORDERDATE DIV 10000) AS YEAR, S_CITY, P_BRAND, SUM(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE S_NATION = 'UNITED STATES' AND LO_ORDERDATE >= 19970101 AND LO_ORDERDATE <= 19981231 AND P_CATEGORY = 'MFGR#14' GROUP BY YEAR, S_CITY, P_BRAND ORDER BY YEAR ASC, S_CITY ASC, P_BRAND ASC;

View File

@ -0,0 +1,13 @@
SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE FROM lineorder, dates WHERE lo_orderdate = d_datekey AND d_year = 1993 AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;
SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE FROM lineorder, dates WHERE lo_orderdate = d_datekey AND d_yearmonth = 'Jan1994' AND lo_discount BETWEEN 4 AND 6 AND lo_quantity BETWEEN 26 AND 35;
SELECT SUM(lo_extendedprice * lo_discount) AS REVENUE FROM lineorder, dates WHERE lo_orderdate = d_datekey AND d_weeknuminyear = 6 AND d_year = 1994 AND lo_discount BETWEEN 5 AND 7 AND lo_quantity BETWEEN 26 AND 35;
SELECT SUM(lo_revenue), d_year, p_brand FROM lineorder, dates, part, supplier WHERE lo_orderdate = d_datekey AND lo_partkey = p_partkey AND lo_suppkey = s_suppkey AND p_category = 'MFGR#12' AND s_region = 'AMERICA' GROUP BY d_year, p_brand ORDER BY p_brand;
SELECT SUM(lo_revenue), d_year, p_brand FROM lineorder, dates, part, supplier WHERE lo_orderdate = d_datekey AND lo_partkey = p_partkey AND lo_suppkey = s_suppkey AND p_brand BETWEEN 'MFGR#2221' AND 'MFGR#2228' AND s_region = 'ASIA' GROUP BY d_year, p_brand ORDER BY d_year, p_brand;
SELECT SUM(lo_revenue), d_year, p_brand FROM lineorder, dates, part, supplier WHERE lo_orderdate = d_datekey AND lo_partkey = p_partkey AND lo_suppkey = s_suppkey AND p_brand = 'MFGR#2239' AND s_region = 'EUROPE' GROUP BY d_year, p_brand ORDER BY d_year, p_brand;
SELECT c_nation, s_nation, d_year, SUM(lo_revenue) AS REVENUE FROM customer, lineorder, supplier, dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_orderdate = d_datekey AND c_region = 'ASIA' AND s_region = 'ASIA' AND d_year >= 1992 AND d_year <= 1997 GROUP BY c_nation, s_nation, d_year ORDER BY d_year ASC, REVENUE DESC;
SELECT c_city, s_city, d_year, SUM(lo_revenue) AS REVENUE FROM customer, lineorder, supplier, dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_orderdate = d_datekey AND c_nation = 'UNITED STATES' AND s_nation = 'UNITED STATES' AND d_year >= 1992 AND d_year <= 1997 GROUP BY c_city, s_city, d_year ORDER BY d_year ASC, REVENUE DESC;
SELECT c_city, s_city, d_year, SUM(lo_revenue) AS REVENUE FROM customer, lineorder, supplier, dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_orderdate = d_datekey AND ( c_city = 'UNITED KI1' OR c_city = 'UNITED KI5' ) AND ( s_city = 'UNITED KI1' OR s_city = 'UNITED KI5' ) AND d_year >= 1992 AND d_year <= 1997 GROUP BY c_city, s_city, d_year ORDER BY d_year ASC, REVENUE DESC;
SELECT c_city, s_city, d_year, SUM(lo_revenue) AS REVENUE FROM customer, lineorder, supplier, dates WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_orderdate = d_datekey AND ( c_city = 'UNITED KI1' OR c_city = 'UNITED KI5' ) AND ( s_city = 'UNITED KI1' OR s_city = 'UNITED KI5' ) AND d_yearmonth = 'Dec1997' GROUP BY c_city, s_city, d_year ORDER BY d_year ASC, REVENUE DESC;
SELECT d_year, c_nation, SUM(lo_revenue - lo_supplycost) AS PROFIT FROM dates, customer, supplier, part, lineorder WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_partkey = p_partkey AND lo_orderdate = d_datekey AND c_region = 'AMERICA' AND s_region = 'AMERICA' AND ( p_mfgr = 'MFGR#1' OR p_mfgr = 'MFGR#2' ) GROUP BY d_year, c_nation ORDER BY d_year, c_nation;
SELECT d_year, s_nation, p_category, SUM(lo_revenue - lo_supplycost) AS PROFIT FROM dates, customer, supplier, part, lineorder WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_partkey = p_partkey AND lo_orderdate = d_datekey AND c_region = 'AMERICA' AND s_region = 'AMERICA' AND ( d_year = 1997 OR d_year = 1998 ) AND ( p_mfgr = 'MFGR#1' OR p_mfgr = 'MFGR#2' ) GROUP BY d_year, s_nation, p_category ORDER BY d_year, s_nation, p_category;
SELECT d_year, s_city, p_brand, SUM(lo_revenue - lo_supplycost) AS PROFIT FROM dates, customer, supplier, part, lineorder WHERE lo_custkey = c_custkey AND lo_suppkey = s_suppkey AND lo_partkey = p_partkey AND lo_orderdate = d_datekey AND s_nation = 'UNITED STATES' AND ( d_year = 1997 OR d_year = 1998 ) AND p_category = 'MFGR#14' GROUP BY d_year, s_city, p_brand ORDER BY d_year, s_city, p_brand;

View File

@ -0,0 +1,22 @@
select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date '1998-12-01' - interval '90' day group by l_returnflag, l_linestatus order by l_returnflag, l_linestatus;
select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 15 and p_type like '%BRASS' and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' and ps_supplycost = ( select min(ps_supplycost) from partsupp, supplier, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'EUROPE' ) order by s_acctbal desc, n_name, s_name, p_partkey limit 100;
select l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority from customer, orders, lineitem where c_mktsegment = 'BUILDING' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < date '1995-03-15' and l_shipdate > date '1995-03-15' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate limit 10;
select o_orderpriority, count(*) as order_count from orders where o_orderdate >= date '1993-07-01' and o_orderdate < date '1993-07-01' + interval '3' month and exists ( select * from lineitem where l_orderkey = o_orderkey and l_commitdate < l_receiptdate ) group by o_orderpriority order by o_orderpriority;
select n_name, sum(l_extendedprice * (1 - l_discount)) as revenue from customer, orders, lineitem, supplier, nation, region where c_custkey = o_custkey and l_orderkey = o_orderkey and l_suppkey = s_suppkey and c_nationkey = s_nationkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = 'ASIA' and o_orderdate >= date '1994-01-01' and o_orderdate < date '1994-01-01' + interval '1' year group by n_name order by revenue desc;
select sum(l_extendedprice * l_discount) as revenue from lineitem where l_shipdate >= date '1994-01-01' and l_shipdate < date '1994-01-01' + interval '1' year and l_discount between .06 - 0.01 and .06 + 0.01 and l_quantity < 24;
select supp_nation, cust_nation, l_year, sum(volume) as revenue from ( select n1.n_name as supp_nation, n2.n_name as cust_nation, extract(year from l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume from supplier, lineitem, orders, customer, nation n1, nation n2 where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ( (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY') or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE') ) and l_shipdate between date '1995-01-01' and date '1996-12-31' ) as shipping group by supp_nation, cust_nation, l_year order by supp_nation, cust_nation, l_year;
select o_year, sum(case when nation = 'BRAZIL' then volume else 0 end) / sum(volume) as mkt_share from ( select extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) as volume, n2.n_name as nation from part, supplier, lineitem, orders, customer, nation n1, nation n2, region where p_partkey = l_partkey and s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name = 'AMERICA' and s_nationkey = n2.n_nationkey and o_orderdate between date '1995-01-01' and date '1996-12-31' and p_type = 'ECONOMY ANODIZED STEEL' ) as all_nations group by o_year order by o_year;
select nation, o_year, sum(amount) as sum_profit from ( select n_name as nation, extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount from part, supplier, lineitem, partsupp, orders, nation where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%green%' ) as profit group by nation, o_year order by nation, o_year desc;
select c_custkey, c_name, sum(l_extendedprice * (1 - l_discount)) as revenue, c_acctbal, n_name, c_address, c_phone, c_comment from customer, orders, lineitem, nation where c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= date '1993-10-01' and o_orderdate < date '1993-10-01' + interval '3' month and l_returnflag = 'R' and c_nationkey = n_nationkey group by c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment order by revenue desc limit 20;
select ps_partkey, sum(ps_supplycost * ps_availqty) as value from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'GERMANY' group by ps_partkey having sum(ps_supplycost * ps_availqty) > ( select sum(ps_supplycost * ps_availqty) * 0.000002 from partsupp, supplier, nation where ps_suppkey = s_suppkey and s_nationkey = n_nationkey and n_name = 'GERMANY' ) order by value desc;
select l_shipmode, sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count, sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count from orders, lineitem where o_orderkey = l_orderkey and l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= date '1994-01-01' and l_receiptdate < date '1994-01-01' + interval '1' year group by l_shipmode order by l_shipmode;
select c_count, count(*) as custdist from ( select c_custkey, count(o_orderkey) as c_count from customer left outer join orders on c_custkey = o_custkey and o_comment not like '%special%requests%' group by c_custkey ) as c_orders group by c_count order by custdist desc, c_count desc;
select 100.00 * sum(case when p_type like 'PROMO%' then l_extendedprice * (1 - l_discount) else 0 end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue from lineitem, part where l_partkey = p_partkey and l_shipdate >= date '1995-09-01' and l_shipdate < date '1995-09-01' + interval '1' month;
with revenue0 as ( select l_suppkey supplier_no, sum(l_extendedprice * (1 - l_discount)) total_revenue from lineitem where l_shipdate >= date '1996-01-01' and l_shipdate < date '1996-01-01' + interval '3' month group by l_suppkey ) select /*+SET_VAR(enable_nereids_planner=true,enable_pipeline_engine=true) */ s_suppkey, s_name, s_address, s_phone, total_revenue from supplier, revenue0 where s_suppkey = supplier_no and total_revenue = ( select max(total_revenue) from revenue0 ) order by s_suppkey;
select p_brand, p_type, p_size, count(distinct ps_suppkey) as supplier_cnt from partsupp, part where p_partkey = ps_partkey and p_brand <> 'Brand#45' and p_type not like 'MEDIUM POLISHED%' and p_size in (49, 14, 23, 45, 19, 3, 36, 9) and ps_suppkey not in ( select s_suppkey from supplier where s_comment like '%Customer%Complaints%' ) group by p_brand, p_type, p_size order by supplier_cnt desc, p_brand, p_type, p_size;
select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#23' and p_container = 'MED BOX' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey );
select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_orderkey in ( select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 300 ) and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate limit 100;
select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#12' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 1 and l_quantity <= 1 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#23' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 10 and l_quantity <= 10 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#34' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 20 and l_quantity <= 20 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' );
select s_name, s_address from supplier, nation where s_suppkey in ( select ps_suppkey from partsupp where ps_partkey in ( select p_partkey from part where p_name like 'forest%' ) and ps_availqty > ( select 0.5 * sum(l_quantity) from lineitem where l_partkey = ps_partkey and l_suppkey = ps_suppkey and l_shipdate >= date '1994-01-01' and l_shipdate < date '1994-01-01' + interval '1' year ) ) and s_nationkey = n_nationkey and n_name = 'CANADA' order by s_name;
select s_name, count(*) as numwait from supplier, lineitem l1, orders, nation where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey ) and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate ) and s_nationkey = n_nationkey and n_name = 'SAUDI ARABIA' group by s_name order by numwait desc, s_name limit 100;
select cntrycode, count(*) as numcust, sum(c_acctbal) as totacctbal from ( select substring(c_phone, 1, 2) as cntrycode, c_acctbal from customer where substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17') and c_acctbal > ( select avg(c_acctbal) from customer where c_acctbal > 0.00 and substring(c_phone, 1, 2) in ('13', '31', '23', '29', '30', '18', '17') ) and not exists ( select * from orders where o_custkey = c_custkey ) ) as custsale group by cntrycode order by cntrycode;

View File

@ -0,0 +1,46 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
##############################################################
# See emr_tools.sh
##############################################################
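# Runs each query in the given file against Doris TRIES times and appends the
# per-run wall-clock times, comma-separated, to result-master-<DB>.csv.
# Args: $1 FE host, $2 MySQL user, $3 FE query port, $4 database, $5 query file.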
set -e
FE_HOST=$1
USER=$2
FE_QUERY_PORT=$3
DB=$4
TRIES=3
QUERY_NUM=1
RESULT_FILE=result-master-"${DB}".csv
: >"${RESULT_FILE}"
while read -r query; do
echo -n "query${QUERY_NUM}," | tee -a "${RESULT_FILE}"
for i in $(seq 1 "${TRIES}"); do
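# mysql -vvv prints per-statement timing, e.g. "(0.05 sec)"; keep only the time token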
RES=$(mysql -vvv -h"${FE_HOST}" -u"${USER}" -P"${FE_QUERY_PORT}" -D"${DB}" -e "${query}" | perl -nle 'print $1 if /((\d+\.\d+)+ sec)/' || :)
echo -n "${RES}" | tee -a "${RESULT_FILE}"
[[ "${i}" != "${TRIES}" ]] && echo -n "," | tee -a "${RESULT_FILE}"
done
echo "" | tee -a "${RESULT_FILE}"
QUERY_NUM=$((QUERY_NUM + 1))
done <"$5"

View File

@ -0,0 +1,76 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
##############################################################
# See emr_tools.sh
##############################################################
# usage: sh run.sh FE_HOST USER FE_QUERY_PORT CATALOG_NAME [CASE]
#   CASE is one of: ssb, ssb_flat, tpch, clickbench, all (default: all)
FE_HOST=$1
USER=$2
PORT=$3
if [[ -z "$4" ]]; then
echo 'need catalog name'
exit 1
else
catalog_name=$4
fi
if [[ -z "$5" ]]; then
echo "no case given, run all tests by default"
case=all
elif [[ "$5" = 'all' ]]; then
echo "run all tests"
case=all
else
case=$5
fi
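# TYPE is expected to be exported by the caller (see emr_tools.sh); it is used
# as the storage-type suffix in database names and defaults to obj when unset.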
if [[ -z ${TYPE} ]]; then
TYPE=obj
fi
echo "execute ${case} benchmark for ${TYPE}..."
if [[ "${case}" = 'ssb' ]]; then
# ssb
sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".ssb100_parquet_"${TYPE}" queries/ssb_queries.sql
sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".ssb100_orc_"${TYPE}" queries/ssb_queries.sql
elif [[ "${case}" = 'ssb_flat' ]]; then
# ssb_flat
sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".ssb100_parquet_"${TYPE}" queries/ssb_flat_queries.sql
sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".ssb100_orc_"${TYPE}" queries/ssb_flat_queries.sql
elif [[ "${case}" = 'tpch' ]]; then
# tpch
sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".tpch100_parquet_"${TYPE}" queries/tpch_queries.sql
sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".tpch100_orc_"${TYPE}" queries/tpch_queries.sql
elif [[ "${case}" = 'clickbench' ]]; then
# clickbench
sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".clickbench_parquet_"${TYPE}" queries/clickbench_queries.sql
sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".clickbench_orc_"${TYPE}" queries/clickbench_queries.sql
else
# run all
sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".ssb100_parquet_"${TYPE}" queries/ssb_queries.sql
sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".ssb100_orc_"${TYPE}" queries/ssb_queries.sql
sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".ssb100_parquet_"${TYPE}" queries/ssb_flat_queries.sql
sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".ssb100_orc_"${TYPE}" queries/ssb_flat_queries.sql
sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".tpch100_parquet_"${TYPE}" queries/tpch_queries.sql
sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".tpch100_orc_"${TYPE}" queries/tpch_queries.sql
sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".clickbench_parquet_"${TYPE}" queries/clickbench_queries.sql
sh run_queries.sh "${FE_HOST}" "${USER}" "${PORT}" "${catalog_name}".clickbench_orc_"${TYPE}" queries/clickbench_queries.sql
fi
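# Example invocation (hypothetical host/catalog values):
#   TYPE=obj sh run.sh 127.0.0.1 root 9030 emr_catalog ssb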