Commit Graph

32 Commits

Author SHA1 Message Date
976e7685db [minor](*): remove redundant log and unused code. (#11620) 2022-08-10 19:28:04 +08:00
486cf0ebd4 [Feature] Lightweight schema change of add/drop column (#10136)
* [Schema Change] support fast add/drop column  (#49)

* [feature](schema-change) support fast schema change. co-author: yixiutt

* [schema change] Use the columns desc from FE to read data. co-author: Lchangliang

* [feature](schema change) optimize schema change for add/drop columns.

1. Add a uniqueId field to the Column class.
2. Schema change for add/drop columns directly updates the schema meta.

Co-authored-by: yixiutt <yixiu@selectdb.com>
Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com>

[Feature](schema change) fix write and add regression test (#69)

Co-authored-by: yixiutt <yixiu@selectdb.com>

[schema change] BE supports delete using the newest schema

add delete regression test

fix regression case (#107)

[feature](schema change) light schema change excludes rollup and agg/uniq/dup key types.

[feature](schema change) FE OlapTable maxUniqueId is written to disk.

[feature](schema change) add an RPC interface for schema-change add column.

[feature](schema change) add columnsDesc to TPushReq for light schema change.

resolve the deadlock during schema change (#124)

fix columns from FE missing the bitmap_index flag (#134)

add update/delete case

construct MATERIALIZED schema from the origin schema on insert

fix non-vectorized compaction coredump

use segment cache

choose newest schema by schema version when compaction (#182)

[bugfix](schema change) fix light schema change problem.

[feature](schema change) light schema change add alter job. (#1)

fix be ut

[bug](schema change) dropping a key column on a unique table should not use light schema change

[feature](schema change) add schema change regression-test.

fix regression test

[bugfix](schema change) fix multi alter clauses for light schema change. (#2)

[bugfix](schema change) fix multi clauses calculate column unique id (#3)

modify PushTask process (#217)

[Bugfix](schema change) fix jobId replay causing a bdbje exception.

[bug](schema change) fix repetitive max column unique id. (#232)

[optimize](schema change) modify the pendingMaxColUniqueId generation rule.

fix compaction error
* fix be ut

* fix snapshot load core

fix unique_id error (#278)

[refactor](fe) remove redundant code for light schema change. (#4)

format fe core

format be core

fix be ut

modify fe meta version

fix rebase error

flush schema into rowset_meta in old table

[refactor](schema change) refactor FE light schema change. (#5)

remove the schema hash change and support getting the max-version schema

* modify for review

* fix be ut

* fix schema change test
2022-07-12 19:41:06 +08:00
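To make the mechanism in the commit above concrete: with a per-column uniqueId, add/drop column becomes a metadata-only operation. A minimal Java sketch, assuming illustrative Column/Schema types (not the actual Doris classes):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: adding/dropping a column just rewrites schema meta and bumps
// the schema version; no tablet data is rewritten. Names are illustrative.
class Column {
    final int uniqueId;  // stable id, independent of column position, never reused
    final String name;

    Column(int uniqueId, String name) {
        this.uniqueId = uniqueId;
        this.name = name;
    }
}

class Schema {
    final List<Column> columns = new ArrayList<>();
    int schemaVersion = 0;
    int maxColUniqueId = -1;  // persisted in FE meta, per the commits above

    void addColumn(String name) {
        // allocate the next unique id; only the schema meta changes
        columns.add(new Column(++maxColUniqueId, name));
        schemaVersion++;  // compaction/readers pick the newest schema by version
    }

    void dropColumn(String name) {
        // old rowsets still carry the column; reads resolve columns by uniqueId
        columns.removeIf(c -> c.name.equals(name));
        schemaVersion++;
    }
}
```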
6a54fc2fe5 [feature-wip](multi-catalog)(resubmit) add catalog level privileges (#10345) 2022-06-23 14:10:11 +08:00
47dba440d0 Revert "[feature-wip](multi-catalog) add CatalogPrivTable to support unified authority management of datalake (#10246)" (#10297)
This reverts commit 41cb4c8f9cf1b58fb33a1e46d2b7db803a15a59f.
2022-06-21 15:55:15 +08:00
41cb4c8f9c [feature-wip](multi-catalog) add CatalogPrivTable to support unified authority management of datalake (#10246)
Supported:
1. Change FeMetaVersion to 111, compatible with upgrade from 110.
2. Add catalog level privileges, and degrade global level privileges to catalog level if FeMetaVersion < 111.
3. Support 'show all grants', 'show roles' statements.
4. Compatible with the previous version of the SQL syntax.

Todo:
1. Support the three-segment format catalog.database.table in SQL syntax.
2. User documentation for the unified authority management of datalake.
3. LDAP services to provide authentication.
2022-06-21 10:26:50 +08:00
b7b78ae707 [style](fe)the last step of fe CheckStyle (#10134)
1. fix all checkstyle warnings
2. change all checkstyle rules to error
3. remove some Javadoc rules
    a. RequireEmptyLineBeforeBlockTagGroup
    b. JavadocStyle
    c. JavadocParagraph
4. suppress some rules for old codes
    a. all Javadoc rules affect only Nereids
    b. DeclarationOrder affects only Nereids
    c. OverloadMethodsDeclarationOrder affects only Nereids
    d. VariableDeclarationUsageDistance affects only Nereids
    e. suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/ColumnParser.java
    f. suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/SparkRDDAggregator.java
    g. suppress LineLength on org/apache/doris/catalog/FunctionSet.java
    h. suppress LineLength on org/apache/doris/common/ErrorCode.java
2022-06-17 21:02:45 +08:00
da33a48f39 [refactor](policy) Refactor the hierarchy of Policy. (#9786)
`RowPolicy` now extends `Policy`.
2022-06-04 11:29:09 +08:00
e701c057dc [style](fe) wrap and whitespace rules (#9764)
Change the severity of the rules below to error and fix existing code violations:

- EmptyBlock
- EmptyCatchBlock
- LeftCurly
- RightCurly
- IllegalTokenText
- MultipleVariableDeclarations
- OneStatementPerLine
- StringLiteralEquality
- UnusedLocalVariable
- Indentation
- OuterTypeFilename
- MethodParamPad
- GenericWhitespace
- NoWhitespaceBefore
- OperatorWrap
- ParenPad
- WhitespaceAfter
- WhitespaceAround
2022-05-26 16:56:20 +08:00
0c70359404 [fix](resource-tag) Consider resource tags when assigning tasks for broker & routine load (#9492)
This CL mainly changes:
1. Broker Load
    When assigning backends, use the user-level resource tag to find available backends.
    If the user-level resource tag is not set, broker load tasks can be assigned to any BE node;
    otherwise, tasks can only be assigned to BE nodes that match the user-level tags.

2. Routine Load
    The current routine load job does not carry user info, so it cannot get the user-level tag when assigning tasks.
    So there are 2 ways:
    1. For old routine load jobs, use the tags of the replica allocation info to select BE nodes.
    2. For new routine load jobs, the user info will be added and persisted in the routine load job.
2022-05-26 08:42:09 +08:00
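A minimal sketch of the tag-matching rule described above; the Backend type, the tag representation, and the exact matching semantics are assumptions:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: choose candidate BEs for a load task. With no user-level tag set,
// any BE qualifies; otherwise a BE must carry all of the user's tags
// (the real matching rule in Doris may differ).
final class BeSelector {
    record Backend(String host, Set<String> tags) {}

    static List<Backend> selectBackends(List<Backend> all, Set<String> userTags) {
        if (userTags == null || userTags.isEmpty()) {
            return all;  // user-level resource tag not set
        }
        return all.stream()
                .filter(be -> be.tags().containsAll(userTags))
                .collect(Collectors.toList());
    }
}
```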
235d586f11 [style](fe) code correct rules and name rules (#9670)
* [style](fe) code correct rules and name rules

* revert some change according to comments
2022-05-19 16:36:03 +08:00
8a0097cfb9 [style](java) format fe code with some check rules (#9460)
Issue Number: close #9403 

Set the severity of the rules below to error and format the code according to the check output.
a. Merge conflicts unresolved
b. Avoid using corresponding octal or Unicode escape
c. Avoid Escaped Unicode Characters
d. No Line Wrap
e. Package Name
f. Type Name
g. Annotation Location
h. Interface Type Parameter
i. CatchParameterName
j. Pattern Variable Name
k. Record Component Name
l. Record Type Parameter Name
m. Method Type Parameter Name
n. Redundant Import
o. Custom Import Order
p. Unused Imports
q. Avoid Star Import
r. tab character in file
s. Newline At End Of File
t. Trailing whitespace found
2022-05-12 20:14:38 +08:00
f11d320213 [feature] support row policy filter (#9206) 2022-05-11 22:11:10 +08:00
d1b85d51a0 [code style](fe) Include test sources (#9366)
Include test sources, since we also need to check them.
2022-05-09 09:40:44 +08:00
1746f61388 [refactor](test) Refactor FE unit test framework that starts a FE server. (#9388)
Currently, we use `UtFrameUtils` to start a FE server in the FE unit test. 
Each test class has to do some initialization and clean up stuff with the JUnit4
`@BeforeClass` and `@AfterClass` annotation. It's redundant and boring.
Besides, almost all the APIs in `UtFrameUtils` have a `ConnectContext` parameter, which is not easy to use.

This PR proposes an inheritance-based approach, i.e., wrapping all the common logic in the base class `TestWithFeService`,
leveraging the JUnit5 `@BeforeAll` and `@AfterAll` annotations to narrow down the setup and cleanup lifecycle to each test class instance.
At the same time, the derived concrete test class could directly use utility methods inherited from the base class,
without calling a util class and passing a `ConnectContext` argument.

`UtFrameUtils` and `DorisAssert` are marked as deprecated. We could remove these two classes
once this refactor has proven stable for a while.
2022-05-07 21:28:42 +08:00
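A hedged sketch of the inheritance pattern this refactor describes, using the JUnit5 per-class lifecycle so that `@BeforeAll`/`@AfterAll` can be instance methods; the real `TestWithFeService` differs, and all member names here are assumptions:

```java
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.TestInstance;

// PER_CLASS lifecycle lets setup/cleanup run once per test class instance,
// with non-static @BeforeAll/@AfterAll methods.
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
abstract class TestWithFeServiceSketch {
    @BeforeAll
    void setUp() throws Exception {
        // start an embedded FE server once and create the shared connect context
    }

    @AfterAll
    void tearDown() {
        // stop the FE server and clean up the runtime directory
    }

    // utility inherited by concrete tests; no ConnectContext argument needed
    protected void createTable(String sql) throws Exception {
        // parse and execute the DDL against the embedded FE
    }
}

class MyDdlTest extends TestWithFeServiceSketch {
    @Test
    void testCreateTable() throws Exception {
        createTable("create table db1.t1 (k1 int) distributed by hash(k1) "
                + "buckets 1 properties('replication_num' = '1')");
    }
}
```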
c5941fd166 [FE Code Style][sub] Adjust some check rules (#9345)
Adjust `RedundantImport`,`UnusedImports`,`EmptyStatement`,`NewlineAtEndOfFile`,`UpperEll`, `AvoidStarImport`, `MissingOverride` rules.
2022-05-04 23:34:55 +08:00
784681f106 [FE Code Style][step 0]add github action to check incremental code in pr (#9328)
1. add rules to checkstyle
2. add github action to check incremental code in pr
2022-05-01 17:30:29 +08:00
bca121333e [feature](cold-hot) support s3 resource (#8808)
Add cold-hot support to the FE meta; support the ALTER RESOURCE DDL in FE
2022-04-13 09:52:03 +08:00
18098c5ceb [fix](fe-ut) Fix FE unit test (#8293)
Fix following ut:
1. GlobalTransactionMgrTest
2. BackupJobTest
3. ReplicaTest
4. SparkLoadJobTest

Also remove old FE Meta version
2022-03-03 09:30:17 +08:00
f7c18d300c [Improvement] Add minimum fe meta version check (#8203)
There is a lot of old code in FE for old FE meta versions, such as `if (FeMetaVersion < VERSION_45) xxxxx`.
The latest FE meta version is 107, so these code paths may never be reached,
but we have not removed them because "sometimes" old images still exist.

Add a minimum required version check to allow us to remove this old code.
2022-02-25 11:14:00 +08:00
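The check itself can be very small; a hedged sketch, where the constant value and all names are assumptions rather than the actual FE code:

```java
// Sketch: refuse to load an image older than the minimum supported meta
// version, so code handling ancient versions can be deleted safely.
final class MetaVersionCheck {
    static final int MIN_SUPPORTED_META_VERSION = 100;  // assumed value

    static void check(int imageMetaVersion) {
        if (imageMetaVersion < MIN_SUPPORTED_META_VERSION) {
            throw new IllegalStateException("FE image meta version " + imageMetaVersion
                    + " is older than the minimum supported version "
                    + MIN_SUPPORTED_META_VERSION
                    + "; upgrade through an intermediate release first");
        }
    }
}
```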
3b8d48f08b [feature-wip](iceberg) Step1: Support create Iceberg external table (#7391)
Close related #7389

Support create Iceberg external table in Doris. 

This is the first step to support Iceberg external table.

### Create Iceberg external table
This PR provides two ways to create Iceberg external tables. Neither way requires explicitly specifying column definitions; Doris automatically converts them based on Iceberg's column definitions.

1. Create an Iceberg external table directly

```sql
    CREATE [EXTERNAL] TABLE table_name 
    ENGINE = ICEBERG
    [COMMENT "comment"]
    PROPERTIES (
    "iceberg.database" = "iceberg_db_name",
    "iceberg.table" = "icberg_table_name",
    "iceberg.hive.metastore.uris"  =  "thrift://192.168.0.1:9083",
    "iceberg.catalog.type"  =  "HIVE_CATALOG"
    );
```

2. Create an Iceberg database and automatically create all the tables under that db.

```sql
    CREATE DATABASE db_name 
    [COMMENT "comment"]
    PROPERTIES (
    "iceberg.database" = "iceberg_db_name",
    "iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
    "iceberg.catalog.type" = "HIVE_CATALOG"
    );
```

### Show table creation

1. For individual tables you can view them with `help show create table`.

```sql 
mysql> show create table iceberg_db.logs_1;
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table  | Create Table                                                                                                                                                                                                                                                                                                                                                 |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logs_1 | CREATE TABLE `logs_1` (
  `level` varchar(-1) NOT NULL COMMENT "null",
  `event_time` datetime NOT NULL COMMENT "null",
  `message` varchar(-1) NOT NULL COMMENT "null"
) ENGINE=ICEBERG
COMMENT "ICEBERG"
PROPERTIES (
"iceberg.database" = "doris",
"iceberg.table" = "logs_1",
"iceberg.hive.metastore.uris"  =  "thrift://10.10.10.10:9087",
"iceberg.catalog.type"  =  "HIVE_CATALOG"
) |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```

2. For Iceberg database, you can view it with `help show table creation`.

```sql
mysql> show table creation from iceberg_db;
+--------+---------+---------------------+---------------------------------------------------------+
| Table  | Status  | Create Time         | Error Msg                                               |
+--------+---------+---------------------+---------------------------------------------------------+
| logs   | fail    | 2021-12-14 13:50:10 | Cannot convert unknown type to Doris type: list<string> |
| logs_1 | success | 2021-12-14 13:50:10 |                                                         |
+--------+---------+---------------------+---------------------------------------------------------+
2 rows in set (0.00 sec)
```

  This is a new syntax.
  
  Show table creation records in Iceberg database:
  
  Syntax:
  ```sql
      SHOW TABLE CREATION [FROM db] [LIKE mask]
  ```
2022-01-27 10:22:47 +08:00
4bdeef3b64 [chore][fix][doc](fe-plugin)(mysqldump) fix build auditlog plugin error (#7804)
1. fix problems when building fe_plugins
2. format code
3. add docs about dumping data using mysqldump
2022-01-26 09:11:23 +08:00
4ac8b3c9a9 [fix][s3] Fix bug that Aliyun OSS cannot be accessed with the AWS S3 SDK (#7691)
Close #7690

1. Exclude httpclient and httpcore dependencies from thrift@0.13

    Explicitly use httpclient@4.5.13 and httpcore@4.4.15
    https://stackoverflow.com/questions/59265959/java-lang-bootstrapmethoderror-call-site-initialization-exception-from-athena-j

2. Exclude aws-java-sdk-s3 dependency from hadoop-aws

    Explicitly use aws-java-sdk-s3@1.11.95
    https://github.com/aws/aws-sdk-java/issues/1032
2022-01-11 15:00:31 +08:00
738d2d2e07 [refactor] update parent pom version and optimize build scripts (#7548) 2022-01-05 10:45:11 +08:00
2872dbfeb8 [refactor] Standardize the writing of pom files, prepare for deployment to maven (#7477) 2021-12-30 10:16:37 +08:00
ab60c5eb59 [fix](spark-load) fix Roaring64Map big-endian read/write in de/serialization (#7480)
See #7479
This bug is triggered when the bitmap contains values exceeding 32 bits.
2021-12-26 11:09:50 +08:00
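One way to see why only bitmaps beyond 32 bits trigger the bug: 64-bit roaring bitmaps bucket values by their high 32 bits, and a high key of 0 reads the same in either byte order, so small bitmaps can round-trip by accident. An illustrative sketch of such an endianness mismatch (an assumption about the failure mode, not the actual de/serialization code):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianMismatch {
    public static void main(String[] args) {
        long value = 1L << 33;               // needs a non-zero high bucket key
        int highKey = (int) (value >>> 32);  // = 2

        // writer emits the 32-bit key in big-endian (Java DataOutput default)...
        ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN);
        buf.putInt(highKey).flip();

        // ...reader assumes little-endian and gets a corrupted key
        int readBack = buf.order(ByteOrder.LITTLE_ENDIAN).getInt();
        System.out.println(readBack);        // 33554432, not 2

        // with highKey == 0 (all values < 2^32) both byte orders read 0,
        // which is why the bug only shows past 32 bits
    }
}
```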
926540c561 [feature] Support returning bitmap/hll data in select statements (#7276)
Support returning bitmap/hll data in select statements; this can be used when show_object_data is set to true.
2021-12-15 09:48:27 +08:00
24d38614a0 [Dependency] Upgrade thirdparty libs (#6766)
Upgrade the following dependencies:

libevent -> 2.1.12
OpenSSL 1.0.2k -> 1.1.1l
thrift 0.9.3 -> 0.13.0
protobuf 3.5.1 -> 3.14.0
gflags 2.2.0 -> 2.2.2
glog 0.3.3 -> 0.4.0
googletest 1.8.0 -> 1.10.0
snappy 1.1.7 -> 1.1.8
gperftools 2.7 -> 2.9.1
lz4 1.7.5 -> 1.9.3
curl 7.54.1 -> 7.79.0
re2 2017-05-01 -> 2021-02-02
zstd 1.3.7 -> 1.5.0
brotli 1.0.7 -> 1.0.9
flatbuffers 1.10.0 -> 2.0.0
apache-arrow 0.15.1 -> 5.0.0
CRoaring 0.2.60 -> 0.3.4
orc 1.5.8 -> 1.6.6
libdivide 4.0.0 -> 5.0
brpc 0.97 -> 1.0.0-rc02
librdkafka 1.7.0 -> 1.8.0

After this PR, compiling Doris requires build-env:1.4.0.
2021-10-15 13:03:04 +08:00
b1f5979103 [New Feature][Meta][Image] Add file header and footer for image (#6207)
#6206 

At present, our image file has no file header or footer. When we need to change the image format (such as adding different journal versions to the image), there is no way to distinguish between formats.

Therefore, we suggest adding a file header and footer to the image. With the new image format, we can freely distinguish and define different ways of reading the image.

The format of the image is as follows:

```
/**
 * Image Format:
 * |- Image --------------------------------------|
 * | - Magic String (4 bytes)                     |
 * | - Header Length (4 bytes)                    |
 * | |- Header -----------------------------|     |
 * | | |- Json Header ---------------|      |     |
 * | | | - version                   |      |     |
 * | | | - other key/value(undecided)|      |     |
 * | | |-----------------------------|      |     |
 * | |--------------------------------------|     |
 * |                                              |
 * | |- Image Body -------------------------|     |
 * | | Object a                             |     |
 * | | Object b                             |     |
 * | | ...                                  |     |
 * | |--------------------------------------|     |
 * |                                              |
 * | |- Footer -----------------------------|     |
 * | | | - Checksum (8 bytes)               |     |
 * | | |- object index --------------|      |     |
 * | | | - index a                   |      |     |
 * | | | - index b                   |      |     |
 * | | | ...                         |      |     |
 * | | |-----------------------------|      |     |
 * | | - other value(undecided)             |     |
 * | |--------------------------------------|     |
 * | - Footer Length (8 bytes)                    |
 * | - Magic String (4 bytes)                     |
 * |----------------------------------------------|
 */
```
1. Magic Number
An image format is identified by a magic string and a version field. The magic string is saved in the first 4 bytes and the last 4 bytes of the image.

2. Image Header:
The version is currently saved in the header in JSON format.

3. Image Body:
The same as the original image content.

4. Image Footer:
The image footer stores the file offsets (indexes) of the image objects. If necessary, we can read individual objects in the image via the footer.
2021-07-27 13:36:53 +08:00
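A hedged sketch of reading and validating the leading part of the layout in the diagram above (magic string, 4-byte header length, JSON header); the magic value and all names are assumptions:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of the read path for the new format:
// | magic (4 bytes) | header length (4 bytes) | json header | body ... |
final class ImageHeaderReader {
    static final byte[] MAGIC = "IMG1".getBytes(StandardCharsets.UTF_8);  // assumed value

    static String readJsonHeader(InputStream in) throws IOException {
        DataInputStream dis = new DataInputStream(in);

        byte[] magic = new byte[4];
        dis.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            // old images have no header: callers fall back to the legacy read path
            throw new IOException("not a new-format image");
        }

        int headerLength = dis.readInt();  // 4-byte header length
        byte[] header = new byte[headerLength];
        dis.readFully(header);             // e.g. {"version": 111}
        return new String(header, StandardCharsets.UTF_8);
    }
}
```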
40bc3fed53 [Code] basic property related classes supports create, query, read, write, etc. (#6153)
Provides basic property-related classes supporting create, query, read, write, etc.
Currently, Doris FE mostly uses `if` statements to check properties in SQL, which leads to a lot of redundancy in the code.
The `PropertySet` class can be used in the analysis phase of a `Statement`. The validity and correctness of the input properties are automatically verified. This simplifies the code and improves its readability.

Usage:
1. Create a custom class that implements the `SchemaGroup` interface.
2. Define the properties to be used. For a required parameter, there is no need to set a default value.
3. According to the requirements, call `readFromStrMap` and other functions in the logic to check and obtain parameters.

Demo:

Class definition

```
public class FileFormat implements PropertySchema.SchemaGroup {
    public static final PropertySchema<FileFormat.Type> FILE_FORMAT_TYPE =
            new PropertySchema.EnumProperty<>("type", FileFormat.Type.class).setDefauleValue(FileFormat.Type.CSV);
    public static final PropertySchema<String> RECORD_DELIMITER =
            new PropertySchema.StringProperty("record_delimiter").setDefauleValue("\n");
    public static final PropertySchema<String> FIELD_DELIMITER =
            new PropertySchema.StringProperty("field_delimiter").setDefauleValue("|");
    public static final PropertySchema<Integer> SKIP_HEADER =
            new PropertySchema.IntProperty("skip_header", true).setMin(0).setDefauleValue(0);

    private static final FileFormat INSTANCE = new FileFormat();

    private ImmutableMap<String, PropertySchema> schemas = PropertySchema.createSchemas(
            FILE_FORMAT_TYPE,
            RECORD_DELIMITER,
            FIELD_DELIMITER,
            SKIP_HEADER);

    public ImmutableMap<String, PropertySchema> getSchemas() {
        return schemas;
    }

    public static FileFormat get() {
        return INSTANCE;
    }
}

```

Usage
```
public class CreateXXXStmt extends DdlStmt {
    private PropertiesSet<FileFormat> analyzedFileFormat = PropertiesSet.empty(FileFormat.get());
    private final Map<String, String> fileFormatOptions;
    ...

    public void analyze(Analyzer analyzer) throws UserException {
        ...
        if (fileFormatOptions != null) {
            try {
                analyzedFileFormat = PropertiesSet.readFromStrMap(FileFormat.get(), fileFormatOptions);
            } catch (IllegalArgumentException e) {
                ...
            }
        }

        // 1. Get property value
        String recordDelimiter = analyzedFileFormat.get(FileFormat.RECORD_DELIMITER);
        // 2. Check the validity of parameters
        PropertiesSet.verifyKey(FileFormat.get(), fileFormatOptions);
        ...
    }

}
```
2021-07-08 09:55:07 +08:00
77485521d3 [Enhancement] move FeMetaVersion.java from fe-common to fe-core #5426 (#5427)
Currently, FeMetaVersion.java is in fe-common, and users may forget to copy fe-common.jar when upgrading the service.
This is really dangerous because the data may be corrupted and cannot be recovered.
2021-03-04 22:25:03 +08:00
d8202ca9cc [Enhancement] move common codes from fe-core to fe-common and remove log4j1 (#5317) (#5318)
The IO-related code may be used by new modules, so it's better to move it to fe-common.

fe-core is modified frequently, but the many Java files generated by thrift
slow down compilation, so it's better to move the thrift generation process to fe-common.

Currently both log4j1 and log4j2 are used, which leads to logs being written to the wrong files.
Our modification removes log4j1 from the dependencies and uses slf4j with an slf4j -> log4j2 binding instead.
2021-02-04 13:41:03 +08:00
0e79f6908b [CodeRefactor] Modify FE modules (#4146)
This CL mainly changes:

1. Add 2 new FE modules

    1. fe-common

        holds all common classes for other modules; currently only `jmockit`
        
    2. spark-dpp

        The Spark DPP application for Spark Load. All dpp-related classes, including unit tests, were moved into this module.
        
2. Change the `build.sh`

    Add a new param `--spark-dpp` to compile the `spark-dpp` module alone, while `--fe` compiles all FE modules.
    
    the output of `spark-dpp` module is `spark-dpp-1.0.0-jar-with-dependencies.jar`, and it will be installed to `output/fe/spark-dpp/`.

3. Fix some bugs in spark load
2020-07-29 16:18:05 +08:00