Now column `Array<T>` contains column `offsets` and `data`, and type of column `offsets` is UInt32 now.
If we call array_union to merge arrays repeatedly, the size of array may overflow.
So we need to extend it before `Array Data Type` release.
1. fix all checkstyle warning
2. change all checkstyle rules to error
3. remove some java doc rules
a. RequireEmptyLineBeforeBlockTagGroup
b. JavadocStyle
c. JavadocParagraph
4. suppress some rules for old codes
a. all java doc rules only affect on Nereids
b. DeclarationOrder only affect on Nereids
c. OverloadMethodsDeclarationOrder only affect on Nereids
d. VariableDeclarationUsageDistance only affect on Nereids
e. suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/ColumnParser.java
f. suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/SparkRDDAggregator.java
g. suppress LineLength on org/apache/doris/catalog/FunctionSet.java
h. suppress LineLength on org/apache/doris/common/ErrorCode.java
add some logic to opt compaction:
1.seperate base&cumu compaction in case base compaction runs too long and
affect cumu compaction
2.fix level size in cu compaction so that file size below 64M have a right level
size, when choose rowsets to do compaction, the policy will ignore big rowset,
this will reduce about 25% cpu in high frequency concurrent load
3.remove skip window restriction so rowset can do compaction right after
generated, cause we'll not delete rowset after compaction. This will highly
reduce compaction score in concurrent log.
4.remove version consistence check in can_do_compaction, we'll choose a
consecutive rowset to do compaction, so this logic is useless
after add logic above, compaction score and cpu cost will have a substantial
optimize in concurrent load.
Co-authored-by: yixiutt <yixiu@selectdb.com>
* [Vectorized][Function] add orthogonal bitmap agg functions
save some file about orthogonal bitmap function
add some file to rebase
update functions file
* refactor union_count function
refactor orthogonal union count functions
* remove bool is_variadic
In some cases, query mem tracker does not exist in BE when transmit block. This will result in a null pointer for get query mem tracker in brpc transmit_block
Support query hive table on S3. Pass AK/SK, Region and s3 endpoint to hive table while creating the external table.
example create table sql:
```
CREATE TABLE `region_s3` (
`r_regionkey` integer NOT NULL,
`r_name` char(25) NOT NULL,
`r_comment` varchar(152) )
engine=hive
properties
("database"="default",
"table"="region_s3",
“hive.metastore.uris"="thrift://127.0.0.1:9083",
“AWS_ACCESS_KEY”=“YOUR_ACCESS_KEY",
“AWS_SECRET_KEY”=“YOUR_SECRET_KEY",
"AWS_ENDPOINT"="s3.us-east-1.amazonaws.com",
“AWS_REGION”=“us-east-1”);
```
* If we disable AVX2 by config USE_AVX2=0, we need to croaringbitmap with ROARING_DISABLE_AVX=ON
* update to trigger regression test again
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
* compaction quickly for small data import #9791
1.merge small versions of rowset as soon as possible to increase the import frequency of small version data
2.small version means that the number of rows is less than config::small_compaction_rowset_rows default 1000