Fix some Hive partition issues.
1. Fix BE crash when using Hive partition fields of `date`, `timestamp`, or `decimal` type.
2. Fix an HDFS URI decode error when using a `timestamp` partition field: special characters in the partition value are URL-encoded in the path, e.g. `:` becomes `%3A`.
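A small, hedged illustration of the encoding issue (the path segment and `PartitionPathDemo` are made up; `URLDecoder` stands in for wherever the partition value is actually decoded):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class PartitionPathDemo {
    public static void main(String[] args) {
        // A timestamp partition value contains ':', which shows up
        // URL-encoded as %3A in the HDFS path and must be decoded
        // before being used as the partition value.
        String segment = "ts=2023-01-01 12%3A00%3A00";
        String decoded = URLDecoder.decode(segment, StandardCharsets.UTF_8);
        System.out.println(decoded); // ts=2023-01-01 12:00:00
    }
}
```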
bind netty-version to 4.1.89.Final
bind jettison to 1.5.4
upgrade hadoop version to 3.3.5
upgrade ranger-plugins-common to 2.4.0
bind bcprov-jdk15on to 2.4.0
upgrade and bind woodstox to 6.5.1
upgrade and bind kerby to 2.0.3
upgrade hudi to 0.13.0
upgrade parquet to 1.13.0
upgrade maven-source-plugin to 3.2.1
upgrade maven-assembly-plugin to 3.3.0
upgrade maven-javadoc-plugin to 3.3.2
upgrade maven-shade-plugin to 3.3.4
upgrade maven-clean-plugin to 3.1.0
Remove unnecessary plugins
Optimize the Doris Maven path
Unify the management of Java modules in the FE
1. Spark dpp
Move `DppResult` and `EtlJobConfig` to the sparkdpp package in the `fe-common` module,
so that `fe-core` no longer depends on the `spark-dpp` module and `spark-dpp.jar`
is no longer copied into `fe/lib`, which reduces the size of the FE output.
2. Modify start_fe.sh
Modify the CLASSPATH to make sure that doris-fe.jar comes first, so that
when classes with the same qualified name are loaded, the one from doris-fe.jar is picked up first.
3. Upgrade hadoop and hive versions
hadoop: 2.10.2 -> 3.3.3
hive: 2.3.7 -> 3.1.3
4. Override the IHiveMetastoreClient implementations from the dependencies:
`ProxyMetaStoreClient.java` for Aliyun DLF.
`HiveMetaStoreClient.java` for the original Apache Hive metastore.
This is needed because some of their methods must be modified to stay compatible with
different versions of Hive.
5. Exclude some unused dependencies to reduce the size of the FE output.
It is now only 370MB (previously 600MB).
6. Upgrade aws-java-sdk version to 1.12.31
7. Support AWS Glue Data Catalog
8. Remove HudiScanNode (no longer supported)
1. fix all checkstyle warnings
2. change all checkstyle rules to error
3. remove some Javadoc rules
a. RequireEmptyLineBeforeBlockTagGroup
b. JavadocStyle
c. JavadocParagraph
4. suppress some rules for old code
a. all Javadoc rules apply only to Nereids
b. DeclarationOrder applies only to Nereids
c. OverloadMethodsDeclarationOrder applies only to Nereids
d. VariableDeclarationUsageDistance applies only to Nereids
e. suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/ColumnParser.java
f. suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/SparkRDDAggregator.java
g. suppress LineLength on org/apache/doris/catalog/FunctionSet.java
h. suppress LineLength on org/apache/doris/common/ErrorCode.java
Issue Number: close #9403
Set the following rules' severity to error and format the code according to the check info.
a. Merge conflicts unresolved
b. Avoid using corresponding octal or Unicode escape
c. Avoid Escaped Unicode Characters
d. No Line Wrap
e. Package Name
f. Type Name
g. Annotation Location
h. Interface Type Parameter
i. CatchParameterName
j. Pattern Variable Name
k. Record Component Name
l. Record Type Parameter Name
m. Method Type Parameter Name
n. Redundant Import
o. Custom Import Order
p. Unused Imports
q. Avoid Star Import
r. tab character in file
s. Newline At End Of File
t. Trailing whitespace found
Currently, we use `UtFrameUtils` to start a FE server in the FE unit tests.
Each test class has to do some initialization and cleanup with the JUnit4
`@BeforeClass` and `@AfterClass` annotations. This is redundant and tedious.
Besides, almost all the APIs in `UtFrameUtils` take a `ConnectContext` parameter, which is not easy to use.
This PR proposes an inheritance-based approach: wrap all the common logic in a base class `TestWithFeService`,
leveraging the JUnit5 `@BeforeAll` and `@AfterAll` annotations to narrow the setup and cleanup lifecycle
down to each test class instance.
At the same time, the derived concrete test classes can directly use the utility methods inherited from the base class,
without calling a util class and passing a `ConnectContext` argument.
`UtFrameUtils` and `DorisAssert` are marked as deprecated. We can remove these two classes
once this refactor has proven itself for a while.
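A minimal sketch of the pattern, assuming JUnit5's per-class test lifecycle; the method names are hypothetical, not the actual `TestWithFeService` API:

```java
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.TestInstance;

// PER_CLASS lifecycle lets @BeforeAll/@AfterAll be instance methods,
// scoping the FE server to each test class instance.
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
abstract class TestWithFeServiceSketch {
    @BeforeAll
    void setUp() throws Exception {
        // start a single FE server for this test class instance
    }

    @AfterAll
    void tearDown() {
        // stop the FE server and clean up runtime directories
    }

    // inherited utility: no ConnectContext argument needed,
    // the base class holds one internally
    protected void createTable(String sql) throws Exception {
        // parse and execute the DDL with the held ConnectContext
    }
}

class CreateTableTest extends TestWithFeServiceSketch {
    @Test
    void testCreate() throws Exception {
        createTable("create table db1.t1 (k1 int) distributed by hash(k1) buckets 1");
    }
}
```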
Buffer flip was used incorrectly.
When the hash key is of string type, the hash value is always zero.
The reason is that the buffer for a string is obtained via wrap, which does not need to be flipped.
If we flip it anyway, the buffer's limit for reading becomes zero.
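A self-contained demonstration of the mistake (`FlipDemo` is illustrative, not the original code):

```java
import java.nio.ByteBuffer;

public class FlipDemo {
    public static void main(String[] args) {
        byte[] key = "hello".getBytes();

        // wrap() returns a buffer already positioned for reading:
        // position = 0, limit = key.length
        ByteBuffer ok = ByteBuffer.wrap(key);
        System.out.println(ok.remaining()); // 5

        // flip() sets limit = position (which is 0) and resets position,
        // so a wrapped-then-flipped buffer has nothing left to read
        ByteBuffer broken = ByteBuffer.wrap(key);
        broken.flip();
        System.out.println(broken.remaining()); // 0 -> the hash sees no bytes

        // flip() is only correct after *writing* into a buffer:
        ByteBuffer written = ByteBuffer.allocate(16);
        written.put(key); // position advances to 5
        written.flip();   // position = 0, limit = 5, ready for reading
    }
}
```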
* fix(sparkload): bitmap deep copy in `or` operator
Fix multiple rollups holding the same reference to a `BitmapValue`, which may otherwise be updated repeatedly.
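A hedged illustration of the aliasing bug, using `RoaringBitmap` as a stand-in for Doris's `BitmapValue`:

```java
import org.roaringbitmap.RoaringBitmap;

public class DeepCopyDemo {
    public static void main(String[] args) {
        RoaringBitmap shared = RoaringBitmap.bitmapOf(1, 2, 3);

        // BUG: two rollups hold the same reference, so or-ing into one
        // silently changes the value the other rollup sees.
        RoaringBitmap rollupA = shared;
        rollupA.or(RoaringBitmap.bitmapOf(4)); // `shared` is now {1,2,3,4}

        // FIX: deep-copy before accumulating, so each rollup is independent.
        RoaringBitmap rollupB = shared.clone();
        rollupB.or(RoaringBitmap.bitmapOf(5)); // `shared` is unchanged
    }
}
```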
Co-authored-by: weixiang <weixiang06@meituan.com>
The distinct count result of a bitmap/hll column may be incorrect in spark load mode.
Fix some bugs in spark load to solve the above problem:
1. The FE is big-endian but the BE is little-endian, so `BitmapValue`s should be converted to little-endian during the FE's serialization.
2. `BitmapUnionAggregator`/`HllUnionAggregator` now ignore `null` values.
3. Make sure `encodeVarint64` in the FE is consistent with the BE.
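A hedged sketch of points 1 and 3, assuming the protobuf-style LEB128 varint encoding (the exact Doris wire format may differ):

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SerializationSketch {
    // LEB128 varint: 7 payload bits per byte, high bit set while more
    // bytes follow. FE and BE must agree on exactly this byte shape.
    static void encodeVarint64(long value, ByteArrayOutputStream out) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // payload + continuation bit
            value >>>= 7;                             // unsigned shift
        }
        out.write((int) value);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        encodeVarint64(300, out); // bytes: 0xAC 0x02

        // Java's ByteBuffer defaults to big-endian, while the C++ BE reads
        // little-endian, so the byte order must be set explicitly.
        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(0x0102030405060708L); // wire bytes: 08 07 06 05 04 03 02 01
    }
}
```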
Co-authored-by: weixiang <weixiang06@meituan.com>
The IO-related code may be used by new modules, so it is better to move it to fe-common.
fe-core is modified frequently, but the many Java files generated by thrift
slow down its compilation, so it is better to move the thrift generation process to fe-common as well.
Currently both log4j1 and log4j2 are used, which causes logs to be written to the wrong files.
This change removes log4j1 from the dependencies and logs through slf4j routed to log4j2 instead.
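A minimal sketch of what logging through the slf4j facade looks like from application code, assuming the log4j2 binding (`log4j-slf4j-impl`) is the only one on the classpath:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingSketch {
    // Code depends only on the slf4j API; with log4j1 gone, the single
    // log4j2 configuration decides which file each logger writes to.
    private static final Logger LOG = LoggerFactory.getLogger(LoggingSketch.class);

    public static void main(String[] args) {
        LOG.info("FE started, config={}", "log4j2.xml");
    }
}
```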
When we use spark load from a Hive table, the function `loadDataFromHiveTable`
reads the whole Hive table and then filters the data in `process()`.
If the Hive table has many partitions and a lot of historical data, the load costs too much time and resources.
So we can instead do the filtering inside `loadDataFromHiveTable` when reading from the Hive table, as sketched below.
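A hedged sketch of the idea in the Spark Java API (`db.hive_table` and the `dt` partition column are made up):

```java
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveReadSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("spark-load-sketch")
                .enableHiveSupport()
                .getOrCreate();

        // Filtering at read time lets Spark prune Hive partitions up front,
        // instead of scanning the whole table and discarding rows in process().
        Dataset<Row> rows = spark.table("db.hive_table")
                .where(col("dt").equalTo("2022-01-01"));
        System.out.println(rows.count());
        spark.stop();
    }
}
```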
Co-authored-by: 杜安明 <anming.du@mihoyo.com>