Why upgrade? Is anything wrong?
Tried to fix the problem with opentelemetry::v1::ext::http::client::curl::HttpOperation::Send(); I have updated the PR info.
bind netty version to 4.1.89.Final
bind jettison to 1.5.4
upgrade hadoop version to 3.3.5
upgrade ranger-plugins-common to 2.4.0
bind bcprov-jdk15on to 2.4.0
upgrade and bind woodstox to 6.5.1
upgrade and bind kerby to 2.0.3
upgrade hudi to 0.13.0
upgrade parquet to 1.13.0
upgrade maven-source-plugin to 3.2.1
upgrade maven-assembly-plugin to 3.3.0
upgrade maven-javadoc-plugin to 3.3.2
upgrade maven-shade-plugin to 3.3.4
upgrade maven-clean-plugin to 3.1.0
Remove meaningless plugins
Optimize doris maven path
Unify the management of the Java modules in fe
`hudi-common` depends on `parquet-avro`, but the dependency scope is `provided`.
When we use `hudi-catalog`, `HoodieAvroWriteSupport` will be called. This class depends on `parquet-avro`, so it will throw ClassNotFoundException.
Changes:
`iceberg-hive-metastore` and `hive-storage-api` have already been defined in hive-catalog-shade,
and some classes in the shade have been renamed, so we cannot declare them again.
The classes in the shade should be kept.
The `hive-metastore-api` used in `ranger` can also use the jar in the `shade`.
Since we renamed the tool classes used inside `hive`, this has no effect.
`Hive 3` uses the `thrift-0.9.3` package, and `Doris` uses the `thrift-0.16.0` package.
These two packages are not compatible, so we use the `hive-shade` package to manage hive dependencies
in a unified way. This jar package renames the `thrift` classes, so the conflict is resolved.
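As a rough illustration, one can check at runtime that both thrift copies coexist after shading. This is only a sketch: the relocated package prefix below is a hypothetical placeholder, not the actual relocation used by the shade jar.

```java
public class ShadeCheckSketch {
    public static void main(String[] args) {
        String[] candidates = {
            "org.apache.thrift.TException",                    // Doris's thrift 0.16.0
            "shade.doris.hive.org.apache.thrift.TException"    // hypothetical relocated thrift 0.9.3
        };
        for (String name : candidates) {
            try {
                Class.forName(name);
                System.out.println("found:   " + name);
            } catch (ClassNotFoundException e) {
                System.out.println("missing: " + name);
            }
        }
    }
}
```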
add the `<optional>` tag to solve the compilation issue
use 3.12.9 as the protoc.artifact version, because there is no 3.12.21
See: https://repo.maven.apache.org/maven2/com/google/protobuf/protoc/
Remove the --show-progress argument of wget because it is not supported by older versions of wget
When upgrading the jackson version, the k8s client will fail with:
java.lang.NoClassDefFoundError: org/yaml/snakeyaml/LoaderOptions
at com.fasterxml.jackson.dataformat.yaml.YAMLParser.(YAMLParser.java:191) ~[jackson-dataformat-yaml-2.14.2.jar:2.14.2]
at com.fasterxml.jackson.dataformat.yaml.YAMLFactory._createParser(YAMLFactory.java:509) ~[jackson-dataformat-yaml-2.14.2.jar:2.14.2]
at com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFactory.java:413) ~[jackson-dataformat-yaml-2.14.2.jar:2.14.2]
at com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFactory.java:386) ~[jackson-dataformat-yaml-2.14.2.jar:2.14.2]
at com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFactory.java:15) ~[jackson-dataformat-yaml-2.14.2.jar:2.14.2]
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3677) ~[jackson-databind-2.14.2.jar:2.14.2]
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3645) ~[jackson-databind-2.14.2.jar:2.14.2]
at io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:47) ~[kubernetes-client-5.12.2.jar:?]
...
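For reference, a minimal diagnostic for this class of failure: jackson-dataformat-yaml 2.14.x needs a matching snakeyaml on the classpath, so checking for `org.yaml.snakeyaml.LoaderOptions` up front makes the version mismatch obvious. This is an illustrative check, not part of the fix itself.

```java
public class SnakeYamlCompatCheck {
    public static void main(String[] args) {
        try {
            // jackson-dataformat-yaml 2.14.x references this class when constructing a parser
            Class.forName("org.yaml.snakeyaml.LoaderOptions");
            System.out.println("snakeyaml provides LoaderOptions; YAML parsing can proceed");
        } catch (ClassNotFoundException e) {
            System.err.println("snakeyaml is missing or too old for jackson-dataformat-yaml 2.14.x");
        }
    }
}
```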
Changes:
1. Add the watch mechanism to listen for changes in the k8s statefulSet and update nodes in time.
2. For broker, there is only one name by default when using deployManager.
3. Refactor the code to make it easier to understand and maintain.
4. Fix jar package conflicts between okhttp-ws and okhttp.
Previously, k8sDeployManager.getGroupHostInfos called the endpoints() interface of k8s. If a pod
restarted unexpectedly, k8sDeployManager would delete the pre-restart pod from the fe or be list
and add the post-restart pod to it, which obviously does not meet our expectations.
Now, after fqdn is enabled, we call the statefulSets() interface of k8s and watch the replica count
to determine whether nodes need to go online or offline.
In addition, the watch mechanism is added to avoid the possible A-B-A problem caused by timed polling.
For the sake of stability, when the watch mechanism does not receive messages for a period of time,
it degrades to polling mode.
Several environment variables have been added: ENV_FE_STATEFULSET, ENV_FE_OBSERVER_STATEFULSET, ENV_BE_STATEFULSET, ENV_BROKER_STATEFULSET, and ENV_CN_STATEFULSET hold the statefulset names, corresponding one-to-one with ENV_FE_SERVICE, ENV_FE_OBSERVER_SERVICE, ENV_BE_SERVICE, ENV_BROKER_SERVICE, and ENV_CN_SERVICE. If a serviceName is configured, the corresponding statefulsetName must also be configured; otherwise the program cannot start.
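A minimal sketch of the watch mechanism with the fabric8 client (the 5.x API seen in the stack trace above); the namespace and statefulset name are hypothetical placeholders for the values taken from the new environment variables:

```java
import io.fabric8.kubernetes.api.model.apps.StatefulSet;
import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.Watcher;
import io.fabric8.kubernetes.client.WatcherException;

public class StatefulSetWatchSketch {
    public static void main(String[] args) {
        // The client stays open for the lifetime of the watch.
        KubernetesClient client = new DefaultKubernetesClient();
        client.apps().statefulSets()
                .inNamespace("doris")      // hypothetical namespace
                .withName("doris-fe")      // hypothetical name, e.g. from ENV_FE_STATEFULSET
                .watch(new Watcher<StatefulSet>() {
                    @Override
                    public void eventReceived(Action action, StatefulSet sts) {
                        // The replica count decides whether nodes go online or offline.
                        Integer replicas = sts.getSpec().getReplicas();
                        System.out.println(action + ": replicas=" + replicas);
                    }

                    @Override
                    public void onClose(WatcherException cause) {
                        // Watch dropped: this is where the code degrades to polling mode.
                        System.err.println("watch closed, fall back to polling: " + cause);
                    }
                });
    }
}
```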
In #17797, we introduced AspectJ to help log exceptions easily.
However, plugin version 1.11 does not support JDK 9 and later.
To support compiling FE with JDK 11:
update aspectj-maven-plugin to version 1.14.0
add the new dependency org.aspectj:aspectjrt 1.9.7 to fe-core
According to:
AspectJ/Java version compatibility
aspectj-maven-plugin issue
AspectJ release notes
intro to AspectJ
* Upgrade log4j to 2.X
- binding log4j version to 2.18.0
- use log4j-1.2-api to complete a smooth upgrade
* Upgrade commons-fileupload to 1.5
* Upgrade commons-io to 2.7
* Upgrade commons-compress to 1.22
* Upgrade gson to 2.8.9
* Upgrade guava to 30.0-jre
* Binding jackson version to 2.14.2
* Upgrade netty-all to 4.1.89.Final
* Upgrade protobuf to 3.21.12
* Upgrade kafka-clients to 3.4.0
* Upgrade calcite version to 1.33.0
* Upgrade aws-java-sdk to 1.12.302
* Upgrade hadoop to 3.3.4
* Upgrade zookeeper to 3.4.14
* Binding tomcat-embed-core to 8.5.86
* Upgrade apache parent pom to 25
* Use hive-exec-core as the hive dependency, and add the missing hive-serde jar separately
* Basic public dependencies are extracted to parent dependencies
* Use jackson uniformly as the basic json tool
* Remove springloaded, spring-boot-devtools has the same functionality
* Change the spark-related dependency scope to provided, since these should be provided at runtime
This PR does three things:
1. Use Druid instead of HikariCP in JdbcClient (a pool sketch follows this list)
2. When downloading a UDF jar, append the jar package name to the local file name.
3. Refactor some JdbcResource code
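A minimal sketch of the Druid pool that takes HikariCP's place in JdbcClient; the JDBC URL, credentials, and pool sizes are hypothetical placeholders, not Doris's actual configuration:

```java
import com.alibaba.druid.pool.DruidDataSource;
import java.sql.Connection;

public class JdbcClientPoolSketch {
    public static void main(String[] args) throws Exception {
        DruidDataSource ds = new DruidDataSource();
        ds.setUrl("jdbc:mysql://127.0.0.1:3306/db");       // hypothetical external JDBC source
        ds.setDriverClassName("com.mysql.cj.jdbc.Driver"); // driver from the downloaded jar
        ds.setUsername("user");
        ds.setPassword("password");
        ds.setInitialSize(1);   // example pool sizing
        ds.setMaxActive(10);
        try (Connection conn = ds.getConnection()) {
            System.out.println("pooled connection ok: " + !conn.isClosed());
        }
        ds.close();
    }
}
```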
Support IPv6 in Apache Doris. The main changes are:
1. enable binding to an IPv6 address if the network priority in the config file contains an IPv6 CIDR string (a bind sketch follows this list)
2. BRPC and HTTP support binding to an IPv6 address
3. BRPC and HTTP support visiting IPv6 services
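To illustrate the binding side, a minimal sketch in plain Java; "::" is the IPv6 wildcard, and 8030 is just an example port, not the exact code path used by BRPC/HTTP in Doris:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class Ipv6BindSketch {
    public static void main(String[] args) throws Exception {
        // "::" is the IPv6 analogue of 0.0.0.0; binding to it accepts IPv6 clients.
        InetAddress addr = InetAddress.getByName("::");
        try (ServerSocket server = new ServerSocket()) {
            server.bind(new InetSocketAddress(addr, 8030)); // example port
            System.out.println("listening on " + server.getLocalSocketAddress());
        }
    }
}
```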
1. Spark dpp
Move `DppResult` and `EtlJobConfig` to the sparkdpp package in the `fe-common` module,
so that `fe-core` no longer depends on the `spark-dpp` module and `spark-dpp.jar`
will not be moved into `fe/lib`, which reduces the size of the FE output.
2. Modify start_fe.sh
Modify the CLASSPATH to make sure that doris-fe.jar comes first, so that
when loading classes with the same qualified name, they are taken from doris-fe.jar first
(see the class-origin sketch after this list).
3. Upgrade hadoop and hive version
hadoop: 2.10.2 -> 3.3.3
hive: 2.3.7 -> 3.1.3
4. Override the IHiveMetastoreClient implementations from the dependencies:
`ProxyMetaStoreClient.java` for Aliyun DLF.
`HiveMetaStoreClient.java` for the original Apache Hive metastore.
This is because I needed to modify some of their methods to make them compatible with
different versions of Hive.
5. Exclude some unused dependencies to reduce the size of FE output
Now it is only 370MB (previously 600MB)
6. Upgrade aws-java-sdk version to 1.12.31
7. Support AWS Glue Data Catalog
8. Remove HudiScanNode (no longer supported)
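As referenced in item 2 above, a minimal sketch for verifying which jar a class is actually loaded from after the CLASSPATH reordering; the class name is just an example:

```java
public class ClassOriginSketch {
    public static void main(String[] args) throws Exception {
        // Prints the jar (or directory) the class was resolved from; running this
        // against a class that exists in both doris-fe.jar and another jar shows
        // which copy wins after the CLASSPATH change.
        Class<?> clazz = Class.forName("ClassOriginSketch"); // example class name
        System.out.println(clazz.getProtectionDomain().getCodeSource().getLocation());
    }
}
```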
The interface of hive metastore changes from version to version.
Currently, Doris uses hive 2.3.7 as the hms client version.
When connecting to hive 1.x, some interfaces such as get_table_req do not exist
in hive 1.x, so we can't get metadata from hive 1.x.
In this PR, I copied the HiveMetaStoreClient from the hive 2.3.7 release and modified some of the interface
implementations, so that it uses the old interfaces to connect to hive 1.x (a version-dispatch sketch follows the example below).
And when creating hms catalog, you can specify the hive version, eg:
CREATE CATALOG `hive` PROPERTIES (
"hive.metastore.uris" = "thrift://127.0.0.1:9083",
"type" = "hms",
"hive.version" = "1.1"
);
If `hive.version` is not specified, Doris will use the hive 2.3.x compatible interface to visit hms.
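A minimal sketch (with hypothetical method names) of the version-dispatch idea behind the modified client: keep the hive 2.3.7 code path, but fall back to the old interface when `hive.version` indicates 1.x:

```java
public class HmsCompatSketch {
    private final String hiveVersion;

    public HmsCompatSketch(String hiveVersion) {
        this.hiveVersion = hiveVersion;
    }

    // Pick the thrift call by version: hive 1.x has no get_table_req.
    public String tableCall(String db, String tbl) {
        if (hiveVersion.startsWith("1.")) {
            return "get_table(" + db + ", " + tbl + ")";     // old hive 1.x interface
        }
        return "get_table_req(" + db + ", " + tbl + ")";     // hive 2.x+ interface
    }

    public static void main(String[] args) {
        System.out.println(new HmsCompatSketch("1.1").tableCall("db", "t"));
        System.out.println(new HmsCompatSketch("2.3").tableCall("db", "t"));
    }
}
```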
When we create a materialized view for multiple tables, users may not figure out the partition rule for the materialized view, because the query result can be too complex. If the query result doesn't match one of the partition rules, the insertion will fail.
We can resolve this issue by mapping the partition rule of the base table to the materialized view. As a result, users don't need to specify the partition rules, and query results are all valid because they are retrieved from the partitions of the base table.
## Use case
mysql> CREATE TABLE t1 (pk INT NOT NULL, v1 INT SUM) PARTITION BY RANGE(pk) (
-> PARTITION p1 VALUES LESS THAN ('10'),
-> PARTITION p2 VALUES LESS THAN ('90')
-> )
-> DISTRIBUTED BY HASH(pk)
-> PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.04 sec)
mysql> CREATE TABLE t2 (pk INT NOT NULL, v2 INT SUM) PARTITION BY LIST(pk) (
-> PARTITION odd VALUES IN ('10', '30', '50', '70', '90'),
-> PARTITION even VALUES IN ('20', '40', '60', '80')
-> )
-> DISTRIBUTED BY HASH(pk)
-> PROPERTIES ('replication_num' = '1');
Query OK, 0 rows affected (0.02 sec)
mysql> CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE
-> KEY (mpk) PARTITION BY (t1.pk) DISTRIBUTED BY HASH(mpk) PROPERTIES ('replication_num' = '1')
-> AS SELECT t1.pk AS mpk, v1, v2 FROM t1, t2 WHERE t1.pk = t2.pk;
Query OK, 0 rows affected (0.10 sec)
mysql> SHOW CREATE TABLE mv;
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Materialized View | Create Materialized View |
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| mv | CREATE MATERIALIZED VIEW `mv`
BUILD IMMEDIATE REFRESH COMPLETE ON DEMAND
KEY(`mpk`)
PARTITION BY RANGE(`mpk`)
(PARTITION p1 VALUES [("-2147483648"), ("10")),
PARTITION p2 VALUES [("10"), ("90")))
DISTRIBUTED BY HASH(`mpk`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2",
"disable_auto_compaction" = "false"
)
AS SELECT `t1`.`pk` AS `mpk`, `v1` AS `v1`, `v2` AS `v2` FROM `default_cluster:dev`.`t1` , `default_cluster:dev`.`t2` WHERE `t1`.`pk` = `t2`.`pk`; |
+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Fix a bug that the JDBC catalog/database/table classes should be added to GsonUtil.
Fix a class loader issue that sometimes causes ClassNotFoundException.
Fix regression tests to use different catalog names.
Comment out 2 regression tests:
regression-test/suites/query_p0/system/test_query_sys.groovy
regression-test/suites/statistics/alter_col_stats.groovy
They need to be fixed later.
[What is DLF](https://www.alibabacloud.com/product/datalake-formation)
This PR is a preparation for supporting DLF, with some changes to multi catalog:
1. Add RuntimeException for most hive metastore or es client visit operations.
2. Add DLF-related dependencies.
3. Move the checks of es catalog properties to the analysis phase of creating an es catalog.
TODO (in next PR):
1. Refactor the `getSplit` method to support not only hdfs but also s3-compatible object storage.
2. Finish the implementation of DLF support.
1. fix all checkstyle warnings
2. change all checkstyle rules to error
3. remove some javadoc rules
a. RequireEmptyLineBeforeBlockTagGroup
b. JavadocStyle
c. JavadocParagraph
4. suppress some rules for old code
a. all javadoc rules apply only to Nereids
b. DeclarationOrder applies only to Nereids
c. OverloadMethodsDeclarationOrder applies only to Nereids
d. VariableDeclarationUsageDistance applies only to Nereids
e. suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/ColumnParser.java
f. suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/SparkRDDAggregator.java
g. suppress LineLength on org/apache/doris/catalog/FunctionSet.java
h. suppress LineLength on org/apache/doris/common/ErrorCode.java
Currently, the FE checkstyle checks all files, but some rules should only be applied to production files.
Add suppressions to disable some rules on test files.
Currently, we use `UtFrameUtils` to start a FE server in the FE unit test.
Each test class has to do some initialization and clean up stuff with the JUnit4
`@BeforeClass` and `@AfterClass` annotation. It's redundant and boring.
Besides, almost all the APIs in `UtFrameUtils` have a `ConnectContext` parameter, which is not easy to use.
This PR proposes an inheritance manner, i.e., wrapping all the common logic in the base class `TestWithFeService`
and leveraging the JUnit5 `@BeforeAll` and `@AfterAll` annotations to narrow the setup and cleanup
lifecycle down to each test class instance.
At the same time, the derived concrete test classes can directly use utility methods inherited from the base class,
without calling a util class and passing a `ConnectContext` argument.
`UtFrameUtils` and `DorisAssert` are marked as deprecated. We could remove these two classes
if this refactor works well for some time.
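A minimal sketch of the pattern, assuming illustrative method names rather than the real `TestWithFeService` API; the `PER_CLASS` lifecycle is what lets `@BeforeAll`/`@AfterAll` run on non-static methods, once per test class instance:

```java
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.TestInstance;

@TestInstance(TestInstance.Lifecycle.PER_CLASS)
abstract class FeServiceTestBase {
    @BeforeAll
    void setUp() {
        // start the in-process FE and create the shared ConnectContext once per class
    }

    @AfterAll
    void tearDown() {
        // stop the FE and clean up runtime directories
    }

    protected void executeSql(String sql) {
        // forward to the ConnectContext held by the base class, so subclasses
        // never pass it around explicitly
    }
}

class MyQueryTest extends FeServiceTestBase {
    @Test
    void selectWorks() {
        executeSql("SELECT 1"); // utility inherited from the base class
    }
}
```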