The inferface of hive metastore changes from version to version.
Currently, Doris use hive 2.3.7 as hms client version.
When using to connect hive 1.x, some interface such as get_table_req does not exist
in hive 1.x. So we can't get metadata from hive 1.x.
In this PR, I copied the HiveMetastoreClient from hive 2.3.7 release, and modify some of interface's
implementation, so that it will use old interface to connect to hive 1.x.
And when creating hms catalog, you can specify the hive version, eg:
CREATE CATALOG `hive` PROPERTIES (
"hive.metastore.uris" = "thrift://127.0.0.1:9083",
"type" = "hms",
"hive.version" = "1.1"
);
If hive.version does not specified, Doris will use hive 2.3.x compatible interface to visit hms.
When we create a materialized view for multiple tables, users may not figure out the partition rule for the materialized view, because the query result can be too complex. If the query result doesn't match one of the partition rules, the insertion will fail.
We can resolve this issue by mapping the partition rule of base table to the materialized view. As a result, users don't need specify the partition rules and query results are all valid because they are retrieved from the partitions of the base table.
## Use case
mysql> CREATE TABLE t1 (pk INT NOT NULL, v1 INT SUM) PARTITION BY RANGE(pk) (
-> PARTITION p1 VALUES LESS THAN ('10'),
-> PARTITION p2 VALUES LESS THAN ('90')
-> )
-> DISTRIBUTED BY HASH(pk)
-> PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.04 sec)
mysql> CREATE TABLE t2 (pk INT NOT NULL, v2 INT SUM) PARTITION BY LIST(pk) (
-> PARTITION odd VALUES IN ('10', '30', '50', '70', '90'),
-> PARTITION even VALUES IN ('20', '40', '60', '80')
-> )
-> DISTRIBUTED BY HASH(pk)
-> PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.02 sec)
mysql> CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE
-> KEY (mpk) PARTITION BY (t1.pk) DISTRIBUTED BY HASH(mpk) PROPERTIES ('replication_num' = '1')
-> AS SELECT t1.pk AS mpk, v1, v2 FROM t1, t2 WHERE t1.pk = t2.pk;
Query OK, 0 rows affected (0.10 sec)
mysql> SHOW CREATE TABLE mv;
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Materialized View | Create Materialized View |
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| mv | CREATE MATERIALIZED VIEW `mv`
BUILD IMMEDIATE REFRESH COMPLETE ON DEMAND
KEY(`mpk`)
PARTITION BY RANGE(`mpk`)
(PARTITION p1 VALUES [("-2147483648"), ("10")),
PARTITION p2 VALUES [("10"), ("90")))
DISTRIBUTED BY HASH(`mpk`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2",
"disable_auto_compaction" = "false"
)
AS SELECT `t1`.`pk` AS `mpk`, `v1` AS `v1`, `v2` AS `v2` FROM `default_cluster:dev`.`t1` , `default_cluster:dev`.`t2` WHERE `t1`.`pk` = `t2`.`pk`; |
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Fix a bug that JDBC catalog/database/table should be add to GsonUtil
Fix a class loader issue that sometime it will cause ClassNotFoundException
Fix regression test to use different catalog name.
Comment out 2 regression tests:
regression-test/suites/query_p0/system/test_query_sys.groovy
regression-test/suites/statistics/alter_col_stats.groovy
Need to be fixed later
[What is DLF](https://www.alibabacloud.com/product/datalake-formation)
This PR is a preparation for support DLF, with some changes of multi catalog
1. Add RuntimeException for most of hive meta store or es client visit operation.
2. Add DLF related dependencies.
3. Move the checks of es catalog properties to the analysis phase of creating es catalog
TODO(in next PR):
1. Refactor the `getSplit` method to support not only hdfs, but s3-compatible object storage.
2. Finish the implementation of supporting DLF
1. fix all checkstyle warning
2. change all checkstyle rules to error
3. remove some java doc rules
a. RequireEmptyLineBeforeBlockTagGroup
b. JavadocStyle
c. JavadocParagraph
4. suppress some rules for old codes
a. all java doc rules only affect on Nereids
b. DeclarationOrder only affect on Nereids
c. OverloadMethodsDeclarationOrder only affect on Nereids
d. VariableDeclarationUsageDistance only affect on Nereids
e. suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/ColumnParser.java
f. suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/SparkRDDAggregator.java
g. suppress LineLength on org/apache/doris/catalog/FunctionSet.java
h. suppress LineLength on org/apache/doris/common/ErrorCode.java
Current fe check style check all files. But some rules should be only applied on production files.
Add suppressions to suppress some rules on test files.
Currently, we use `UtFrameUtils` to start a FE server in the FE unit test.
Each test class has to do some initialization and clean up stuff with the JUnit4
`@BeforeClass` and `@AfterClass` annotation. It's redundant and boring.
Besides, almost all the APIs in `UtFrameUtils` has a `ConnectContext` parameter, which is not easy to use.
This PR proposes to use an inherit-manner, i.e., wrap all the common logic in base class `TestWithFeService`,
leveraging the
JUnit5 `@BeforeAll` and `@AfterAll` annotation to narrow down the setup and cleanup lifecycle to each test class instance.
At the same time, the derived concrete test class could directly use utility methods inherited from the base class,
without calling a util class and passing a `ConnectContext` argument.
`UtFrameUtils` and `DorisAssert` are marked as deprecated. We could remove these two classes
if this refactor works well for a time.
In #8319, I remove mysql-connector-java dependency because of license incompatibility.
But we need a mysql compatible driver for http query api. So I choose mariadb-java-client,
which is under LGPL.
This PR fixes the #8731 and refactor the `build.sh` script.
The build.sh script is currently responsible for the compilation of the following Doris components.
1. FE
- fe-common
- fe-core
- spark-dpp
- hive-udf
- java-udf
- ui
2. BE
- palo_be
- meta_tool
3. broker
In the FE module.
- The 4 submodules `fe-common, fe-core, spark-dpp and ui` together form Frontend.
- `spark-dpp, hive-udf and java-udf` can be compiled separately to produce jar packages for individual use.
In the BE module.
- `palo_be` can start the BE process separately.
- `meta_tool` can be compiled separately to produce binaries.
The modified build.sh script has the following changes:
1. there is no longer an option to compile `ui` separately, build together with `--fe`.
2. `fe/be/spark-dpp/hive-udf/java-udf/palo_be/meta_tool` can be compiled separately.
3. all components except `java-udf` will be compiled by default (`java-udf` is in development)
Remaining issues:
Several submodules of FE have messy dependencies.
For example, `java-udf` depends on `fe-core`, and `fe-core` depends on `spark-dpp`,
resulting in a large binary jar of `java-udf`.
It needs to be reorganized afterwards.
This feature is propsoed in [DSIP-1](https://cwiki.apache.org/confluence/display/DORIS/DSIP-001%3A+Java+UDF).
This PR support fixed-length input and output Java UDF. Phase I in DIP-1 is done after this PR.
To support Java UDF effeciently, I use no data copy in JNI call and all compute operations are off-heap in Java.
To achieve that, I use a UdfExecutor instead.
For users, a UDF class must have a public evaluate method.
1. mysql-connector-java
mysql-connector-java is under GLPv2 license, which is not compatible with APLv2, and Doris does not use it.
2. org.json
org.json is under JSON license, which is not compatible with APLv2. I use `json-simple` to replace it.
This reverts commit df7e848cbbc8170c7bd83d812d7cac58b5574570.
Reverts apache/incubator-doris#8218
Because when using grpc 1.44.1, the corresponding `protoc-gen-grpc-java` plugin
requried GLIBC_2.14, which is not found in CentOS 6.
So I suggest to revert this commit this time. And considering upgrading this component
after most systems have reached glibc version 2.14.
And for Mac M1, you may have to change this version manually for now
This bug is introduced from #8112.
Also , I change the `grpc-netty` dependency to `grpc-netty-shaded`, to avoid dependency conflict:
```
java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.
```
Hive Bitmap UDF provides UDFs for generating bitmap and bitmap operations in hive tables.
The bitmap in Hive is exactly the same as the Doris bitmap.
The bitmap in Hive can be imported into Doris through spark bitmap load.
Close related #7389
Support create Iceberg external table in Doris.
This is the first step to support Iceberg external table.
### Create Iceberg external table
This pr describes two ways to create Iceberg external tables. Both ways do not require explicitly specifying column definitions, Doris automatically converts them based on Iceberg's column definitions.
1. Create an Iceberg external table directly
```sql
CREATE [EXTERNAL] TABLE table_name
ENGINE = ICEBERG
[COMMENT "comment"]
PROPERTIES (
"iceberg.database" = "iceberg_db_name",
"iceberg.table" = "icberg_table_name",
"iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);
```
2. Create an Iceberg database and automatically create all the tables under that db.
```sql
CREATE DATABASE db_name
[COMMENT "comment"]
PROPERTIES (
"iceberg.database" = "iceberg_db_name",
"iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);
```
### Show table creation
1. For individual tables you can view them with `help show create table`.
```sql
mysql> show create table iceberg_db.logs_1;
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logs_1 | CREATE TABLE `logs_1` (
`level` varchar(-1) NOT NULL COMMENT "null",
`event_time` datetime NOT NULL COMMENT "null",
`message` varchar(-1) NOT NULL COMMENT "null"
) ENGINE=ICEBERG
COMMENT "ICEBERG"
PROPERTIES (
"iceberg.database" = "doris",
"iceberg.table" = "logs_1",
"iceberg.hive.metastore.uris" = "thrift://10.10.10.10:9087",
"iceberg.catalog.type" = "HIVE_CATALOG"
) |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
2. For Iceberg database, you can view it with `help show table creation`.
```sql
mysql> show table creation from iceberg_db;
+--------+---------+---------------------+---------------------------------------------------------+
| Table | Status | Create Time | Error Msg |
+--------+---------+---------------------+---------------------------------------------------------+
| logs | fail | 2021-12-14 13:50:10 | Cannot convert unknown type to Doris type: list<string> |
| logs_1 | success | 2021-12-14 13:50:10 | |
+--------+---------+---------------------+---------------------------------------------------------+
2 rows in set (0.00 sec)
```
This is a new syntax.
Show table creation records in Iceberg database:
Syntax:
```sql
SHOW TABLE CREATION [FROM db] [LIKE mask]
```