1. fix all checkstyle warnings
2. change all checkstyle rules to error
3. remove some Javadoc rules
a. RequireEmptyLineBeforeBlockTagGroup
b. JavadocStyle
c. JavadocParagraph
4. suppress some rules for old code
a. all Javadoc rules only apply to Nereids
b. DeclarationOrder only applies to Nereids
c. OverloadMethodsDeclarationOrder only applies to Nereids
d. VariableDeclarationUsageDistance only applies to Nereids
e. suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/ColumnParser.java
f. suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/SparkRDDAggregator.java
g. suppress LineLength on org/apache/doris/catalog/FunctionSet.java
h. suppress LineLength on org/apache/doris/common/ErrorCode.java
Issue Number: close #9403
Set the severity of the rules below to error and format the code according to the check output.
a. Merge conflicts unresolved
b. Avoid using corresponding octal or Unicode escape
c. Avoid Escaped Unicode Characters
d. No Line Wrap
e. Package Name
f. Type Name
g. Annotation Location
h. Interface Type Parameter
i. CatchParameterName
j. Pattern Variable Name
k. Record Component Name
l. Record Type Parameter Name
m. Method Type Parameter Name
n. Redundant Import
o. Custom Import Order
p. Unused Imports
q. Avoid Star Import
r. tab character in file
s. Newline At End Of File
t. Trailing whitespace found
Currently, we use `UtFrameUtils` to start an FE server in the FE unit tests.
Each test class has to do its own initialization and cleanup with the JUnit4
`@BeforeClass` and `@AfterClass` annotations, which is redundant and tedious.
Besides, almost all the APIs in `UtFrameUtils` take a `ConnectContext` parameter, which makes them awkward to use.
This PR proposes an inheritance-based approach: wrap all the common logic in the base class `TestWithFeService`,
leveraging the JUnit5 `@BeforeAll` and `@AfterAll` annotations to narrow the setup and cleanup lifecycle
down to each test class instance.
At the same time, a derived concrete test class can directly use the utility methods inherited from the base class,
without calling a util class and passing a `ConnectContext` argument.
`UtFrameUtils` and `DorisAssert` are marked as deprecated. We can remove these two classes
once this refactor has worked well for a while.
Buffer flip is used incorrectly.
When the hash key is of string type, the hash value is always zero.
The reason is that the buffer for a string is obtained via wrap, so it does not need to be flipped.
If we flip it anyway, the buffer's read limit becomes zero.
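The effect described above can be reproduced with plain `java.nio.ByteBuffer`. This is a minimal sketch, not the Doris code itself: the class and method names are hypothetical, and only the standard `wrap`, `flip`, `put`, and `remaining` calls are used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only illustrates standard ByteBuffer semantics.
final class WrapVsFlipDemo {
    // wrap() yields position = 0 and limit = data.length, so the buffer is already readable.
    static int readableAfterWrap(byte[] data) {
        return ByteBuffer.wrap(data).remaining();
    }

    // flip() sets limit = position and position = 0. After wrap() the position is
    // still 0, so the limit becomes 0 and nothing can be read.
    static int readableAfterWrapAndFlip(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        buf.flip();
        return buf.remaining();
    }

    // flip() is the right call after put(): the position has advanced past the written bytes.
    static int readableAfterPutAndFlip(byte[] data) {
        ByteBuffer buf = ByteBuffer.allocate(data.length);
        buf.put(data);
        buf.flip();
        return buf.remaining();
    }

    public static void main(String[] args) {
        byte[] key = "doris".getBytes(StandardCharsets.UTF_8);
        System.out.println(readableAfterWrap(key));         // 5
        System.out.println(readableAfterWrapAndFlip(key));  // 0
        System.out.println(readableAfterPutAndFlip(key));   // 5
    }
}
```

A hash computed over the flipped, wrapped buffer sees zero bytes, which is consistent with the constant hash value reported above.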
* fix(sparkload): bitmap deep copy in `or` operator
Fix multiple rollups holding the same reference to a BitmapValue, which may then be updated repeatedly.
Co-authored-by: weixiang <weixiang06@meituan.com>
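The aliasing problem behind the `or` fix above can be sketched with `java.util.BitSet` standing in for the bitmap storage. This is an illustration under assumptions: the `Bitmap` class, `orShallow`, and `orDeep` are hypothetical names, and the real BitmapValue implementation may differ.

```java
import java.util.BitSet;

// Hypothetical model of two rollups aggregating the same source bitmap.
final class SharedBitmapDemo {
    static final class Bitmap {
        private BitSet bits = null; // null models an empty bitmap

        static Bitmap of(int... values) {
            Bitmap b = new Bitmap();
            b.bits = new BitSet();
            for (int v : values) {
                b.bits.set(v);
            }
            return b;
        }

        // Shallow variant: an empty bitmap adopts the other's storage, so both now alias it.
        void orShallow(Bitmap other) {
            if (other.bits == null) {
                return;
            }
            if (bits == null) {
                bits = other.bits;
            } else {
                bits.or(other.bits);
            }
        }

        // Deep-copy variant: an empty bitmap clones the other's storage before using it.
        void orDeep(Bitmap other) {
            if (other.bits == null) {
                return;
            }
            if (bits == null) {
                bits = (BitSet) other.bits.clone();
            } else {
                bits.or(other.bits);
            }
        }

        int cardinality() {
            return bits == null ? 0 : bits.cardinality();
        }
    }

    // Both rollups absorb {1}; then only rollup A absorbs {2}. Returns rollup B's cardinality.
    static int rollupBCardinality(boolean deepCopy) {
        Bitmap source = Bitmap.of(1);
        Bitmap next = Bitmap.of(2);
        Bitmap rollupA = new Bitmap();
        Bitmap rollupB = new Bitmap();
        if (deepCopy) {
            rollupA.orDeep(source);
            rollupB.orDeep(source);
            rollupA.orDeep(next);
        } else {
            rollupA.orShallow(source);
            rollupB.orShallow(source);
            rollupA.orShallow(next);
        }
        return rollupB.cardinality();
    }
}
```

With the shallow variant, rollup B ends up with two elements although it only ever saw one; with the deep copy it keeps exactly one.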
The distinct count result of a bitmap/hll column may be incorrect in spark load mode.
Fix some bugs in spark load to solve this problem.
1. FE is big-endian but BE is little-endian. BitmapValues should be converted to little-endian in FE's serialization
2. BitmapUnionAggregator/HllUnionAggregator ignore `null` values
3. Make sure encodeVarint64 in FE is consistent with BE
Co-authored-by: weixiang <weixiang06@meituan.com>
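Items 1 and 3 above concern byte-level encoding. The sketch below shows one way to write a little-endian integer and a base-128 varint in Java. It is an assumption that BE expects the common protobuf-style varint layout; the class name and `int32LittleEndian` helper are hypothetical, and this `encodeVarint64` is a standalone illustration rather than the FE method.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class; not part of the Doris code base.
final class LittleEndianCodecSketch {
    // ByteBuffer defaults to big-endian, so the order must be set explicitly.
    static byte[] int32LittleEndian(int value) {
        return ByteBuffer.allocate(4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(value)
                .array();
    }

    // Base-128 varint (assumed layout): 7 payload bits per byte, least significant
    // group first, high bit set on every byte except the last.
    static byte[] encodeVarint64(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7; // unsigned shift so negative inputs terminate
        }
        out.write((int) value);
        return out.toByteArray();
    }
}
```

For example, `int32LittleEndian(1)` yields the bytes `01 00 00 00`, and `encodeVarint64(300)` yields `AC 02`.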
The IO-related code may be used by new modules, so it's better to move it to fe-common.
fe-core is modified frequently, but the many Java files generated by thrift
slow down its compilation, so it's better to move the thrift generation process to fe-common.
Currently both log4j1 and log4j2 are used, which causes logs to be written to the wrong files.
This change removes log4j1 from the dependencies and uses slf4j, bridged from slf4j to log4j2, instead.
When we use spark load from a hive table, the function loadDataFromHiveTable
reads the whole hive table and only then filters the data in process().
If the hive table has lots of partitions and historical data, the load costs too much time and too many resources.
So we can do the filtering in loadDataFromHiveTable while reading from the hive table.
Co-authored-by: 杜安明 <anming.du@mihoyo.com>
For [Spark Load]
1 support decimal and largeint
2 add validate logic for char/varchar/decimal
3 check data load from hive with strict mode
4 support decimal/date/datetime aggregator
1. fix writing the dpp result when dpp throws an exception
2. boolean value: true, false (case-insensitive), 0, 1
3. wrong dest column for source data check
4. support * in source file path
5. if the job state is cancelled or finished, submitPushTasks would throw an "all partitions have no load data" exception,
because tableToLoadPartitions was already cleaned up
#3433
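The accepted boolean spellings from item 2 above can be sketched as a small parser. This is a hypothetical helper, not the dpp code: the class name is invented, and returning `null` for unrecognized input is an assumption about how invalid values are reported.

```java
// Hypothetical helper illustrating the accepted boolean spellings.
final class BooleanValueSketch {
    // Returns TRUE or FALSE for an accepted spelling, or null for anything else.
    static Boolean parse(String raw) {
        if (raw == null) {
            return null;
        }
        if (raw.equalsIgnoreCase("true") || raw.equals("1")) {
            return Boolean.TRUE;
        }
        if (raw.equalsIgnoreCase("false") || raw.equals("0")) {
            return Boolean.FALSE;
        }
        return null; // assumption: the caller treats null as a validation failure
    }
}
```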