1. mysql-connector-java
mysql-connector-java is under the GPLv2 license, which is not compatible with APLv2, and Doris does not use it.
2. org.json
org.json is under the JSON license, which is not compatible with APLv2. I replaced it with `json-simple`.
This reverts commit df7e848cbbc8170c7bd83d812d7cac58b5574570.
Reverts apache/incubator-doris#8218
When using grpc 1.44.1, the corresponding `protoc-gen-grpc-java` plugin
requires GLIBC_2.14, which is not available on CentOS 6.
So I suggest reverting this commit for now, and considering an upgrade of this component
after most systems have reached glibc version 2.14.
For Mac M1, you may have to change this version manually for now.
This bug was introduced by #8112.
Also, I changed the `grpc-netty` dependency to `grpc-netty-shaded` to avoid a dependency conflict:
```
java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.
```
Hive Bitmap UDF provides UDFs for generating bitmaps and performing bitmap operations in Hive tables.
The bitmap in Hive is exactly the same as the Doris bitmap,
so bitmaps built in Hive can be imported into Doris through Spark bitmap load (a usage sketch follows).
Close related #7389
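As an illustration, here is a minimal usage sketch in Hive; the UDF names `to_bitmap` and `bitmap_count`, and the table layout, are assumptions based on the PR description, not a confirmed API:
```sql
-- Hypothetical sketch: aggregate user ids into one bitmap per day in Hive.
-- The resulting binary column uses the same format as the Doris bitmap,
-- so it can later be imported via Spark bitmap load.
CREATE TABLE hive_bitmap_table (
    dt      STRING,
    user_bm BINARY
);

INSERT INTO hive_bitmap_table
SELECT dt, to_bitmap(user_id)        -- assumed bitmap-building UDAF
FROM hive_user_table
GROUP BY dt;

-- Assumed bitmap operation: count distinct users per day.
SELECT dt, bitmap_count(user_bm) FROM hive_bitmap_table;
```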
Support create Iceberg external table in Doris.
This is the first step to support Iceberg external table.
### Create Iceberg external table
This PR provides two ways to create Iceberg external tables. Neither requires explicitly specifying column definitions; Doris automatically converts them based on Iceberg's column definitions.
1. Create an Iceberg external table directly
```sql
CREATE [EXTERNAL] TABLE table_name
ENGINE = ICEBERG
[COMMENT "comment"]
PROPERTIES (
"iceberg.database" = "iceberg_db_name",
"iceberg.table" = "icberg_table_name",
"iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);
```
2. Create an Iceberg database and automatically create all the tables under that db.
```sql
CREATE DATABASE db_name
[COMMENT "comment"]
PROPERTIES (
"iceberg.database" = "iceberg_db_name",
"iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
"iceberg.catalog.type" = "HIVE_CATALOG"
);
```
### Show table creation
1. For an individual table, you can view it with `show create table` (see `help show create table`).
```sql
mysql> show create table iceberg_db.logs_1;
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logs_1 | CREATE TABLE `logs_1` (
`level` varchar(-1) NOT NULL COMMENT "null",
`event_time` datetime NOT NULL COMMENT "null",
`message` varchar(-1) NOT NULL COMMENT "null"
) ENGINE=ICEBERG
COMMENT "ICEBERG"
PROPERTIES (
"iceberg.database" = "doris",
"iceberg.table" = "logs_1",
"iceberg.hive.metastore.uris" = "thrift://10.10.10.10:9087",
"iceberg.catalog.type" = "HIVE_CATALOG"
) |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
2. For an Iceberg database, you can view the creation records with `show table creation` (see `help show table creation`).
```sql
mysql> show table creation from iceberg_db;
+--------+---------+---------------------+---------------------------------------------------------+
| Table | Status | Create Time | Error Msg |
+--------+---------+---------------------+---------------------------------------------------------+
| logs | fail | 2021-12-14 13:50:10 | Cannot convert unknown type to Doris type: list<string> |
| logs_1 | success | 2021-12-14 13:50:10 | |
+--------+---------+---------------------+---------------------------------------------------------+
2 rows in set (0.00 sec)
```
This is a new syntax for showing table creation records in an Iceberg database.
Syntax:
```sql
SHOW TABLE CREATION [FROM db] [LIKE mask]
```
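For example, to list creation records for tables matching a pattern (a hypothetical invocation against the `iceberg_db` from the example above):
```sql
SHOW TABLE CREATION FROM iceberg_db LIKE "%logs%";
```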
Increase compatibility with MySQL
1. Added two system tables: `files` and `partitions`
2. Improved the MySQL error code return logic to make the error codes more compatible with MySQL
3. Added `LOCK TABLES`/`UNLOCK TABLES` and `SHOW COLUMNS` statements for compatibility with mysqldump (see the sketch below)
4. Compatible with the mysqldump tool; you can now use mysqldump to dump data and table structures from Doris
Currently, using mysqldump may print an error message like:
```
$ mysqldump -h127.0.0.1 -P9130 -uroot test_query_qa > a
mysqldump: Error: 'errCode = 2, detailMessage = select list expression not produced by aggregation output (missing from GROUP BY clause?): `EXTRA`' when trying to dump tablespaces
```
This error does not affect the exported file; you can add `--no-tablespaces` to avoid it.
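For reference, a minimal sketch of the new compatibility statements from item 3; how strictly Doris enforces the locks is an assumption here, since they exist mainly so that mysqldump's usual sequence succeeds:
```sql
LOCK TABLES t1 READ;     -- issued by mysqldump before dumping a table
SHOW COLUMNS FROM t1;    -- used to fetch the table structure
UNLOCK TABLES;
```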
Users can directly query data in Hive tables from Doris, and can use joins to perform complex queries, without laboriously importing data from Hive.
The main changes are listed below:
FE:
Extend HiveScanNode from BrokerScanNode.
Add HiveMetaStoreClientHelper to communicate with Hive and HDFS.
BE:
Treat HiveScanNode as BrokerScanNode, and HiveTable as BrokerTable.
broker_scanner.cpp: support reading columns from the HDFS path.
orc_scanner.cpp: support reading HDFS files.
POM:
Add hive.version=2.3.7, hive-metastore and hive-exec
Add hadoop.version=2.8.0, hadoop-hdfs
Upgrade commons-lang to fix incompatibility with Java 9 and later.
Thrift:
Add THiveTable
Add read_by_column_def in TBrokerRangeDesc
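For context, a sketch of what creating and querying a Hive external table might look like; the property names (`database`, `table`, `hive.metastore.uris`) are assumptions modeled on the Iceberg example above, and the column definitions must match the Hive table:
```sql
CREATE EXTERNAL TABLE hive_tbl (
    k1 INT,
    k2 VARCHAR(64)
) ENGINE = HIVE
PROPERTIES (
    "database" = "hive_db",
    "table" = "hive_table",
    "hive.metastore.uris" = "thrift://192.168.0.1:9083"
);

-- Query it like any other table, including joins with local Doris tables:
SELECT h.k1, d.v1
FROM hive_tbl h JOIN doris_tbl d ON h.k1 = d.k1;
```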
Add a use_path_style property for S3.
Upgrade hadoop-common and hadoop-aws to 2.8.0 to support the path style property.
Fix some S3 URI bugs.
Add some logs for tracing the load process.
1. Add license, total lines, and release badges.
2. Add monthly active contributor and contributor growth graphs.
3. Fix a pom.xml bug.
4. Modify some routine load logs on the BE side.
1. /api/cluster_overview to view some statistics of the cluster
2. /api/meta/ to view the database/table schema
3. /api/import/file_review to preview file content in CSV or PARQUET format
This is part of the array type support and has not been fully completed.
The following functions are implemented:
1. FE array type support and implementation of array functions; support for array syntax analysis and planning
2. Support importing array type data through INSERT INTO
3. Support selecting array type data
4. The array type is only supported on the value columns of duplicate tables (see the sketch below)
This PR merges some code from #4655, #4650, #4644, #4643, #4623 and #2979.
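To make the current scope concrete, a usage sketch under the constraints above; the table layout and the `[1, 2, 3]` array literal syntax are assumptions:
```sql
-- Array column on the value side of a duplicate-key table (item 4):
CREATE TABLE array_tbl (
    k1 INT,
    v1 ARRAY<INT>
) DUPLICATE KEY(k1)
DISTRIBUTED BY HASH(k1) BUCKETS 1;

-- Import array data through INSERT INTO (item 2):
INSERT INTO array_tbl VALUES (1, [1, 2, 3]);

-- Select array data (item 3):
SELECT k1, v1 FROM array_tbl;
```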
Add command:
1. EXPLAIN GRAPH SELECT ...
2. SHOW QUERY PROFILE "..."
Documentation will be added in the next PR.
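A brief usage sketch; since the documentation lands in a later PR, the query shape and the profile path argument `"/"` are assumptions:
```sql
-- Render the query plan as a graph instead of the usual text form:
EXPLAIN GRAPH SELECT k1, count(*) FROM t GROUP BY k1;

-- Browse runtime profiles of finished queries, starting from the root:
SHOW QUERY PROFILE "/";
```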
Change-Id: Ifd9365e10b1f9ff4fdf8ae0556343783d97545f0
* [doris-1008] support backup and restore directly to cloud storage via aws s3 protocol
* [Internal][S3DirectAccess] Support backup, restore, load, and export connecting directly to S3
1. Support loading and exporting data from/to S3 directly.
2. Add a config to automatically convert broker access to S3 access when available.
Change-Id: Iac96d4b3670776708bc96a119ff491db8cb4cde7
(cherry picked from commit 2f03832ca52221cc7436069b96c45c48c4bc7201)
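As an illustration of the direct S3 load, a hypothetical sketch; the label, bucket paths, and the exact `WITH S3` property names are assumptions:
```sql
LOAD LABEL db1.label_s3_demo
(
    DATA INFILE("s3://my-bucket/path/file.csv")
    INTO TABLE t1
    COLUMNS TERMINATED BY ","
)
WITH S3
(
    "AWS_ENDPOINT" = "http://s3.us-east-1.amazonaws.com",
    "AWS_ACCESS_KEY" = "xxxx",
    "AWS_SECRET_KEY" = "yyyy",
    "AWS_REGION" = "us-east-1"
);
```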
* [Internal][S3DirectAccess] File path glob compatible with broker
Change-Id: Ie55e07a547aa22c6fa8d432ca926216c10384e68
(cherry picked from commit d4fb25544c0dc06d23e1ada571ec3f8edd4ba56f)
* [internal] [doris-1008] fix log4j class not found
Change-Id: I468176aca0d821383c74ee658d461aba9e7d5be3
(cherry picked from commit 029adaa9d6ded8503acbd6644c1519456f3db232)
* add poms
Co-authored-by: yangzhengguo01 <yangzhengguo01@baidu.com>
The IO-related code may be used by new modules, so it's better to move it to fe-common.
Modifications to fe-core are frequent, but the many Java files generated by thrift
slow down compilation, so it's better to move the thrift generation process to fe-common as well.
Currently both log4j1 and log4j2 are used, which leads to logs being written to the wrong files.
Our modification removes log4j1 from the dependencies and uses slf4j routed to log4j2 instead.
This CL mainly changes:
1. Add 2 new FE modules
1. fe-common
Holds all common classes for other modules; currently only `jmockit`.
2. spark-dpp
The Spark DPP application for Spark Load. I moved all DPP-related classes into this module, including unit tests.
2. Change the `build.sh`
Add a new param `--spark-dpp` to compile `spark-dpp` alone, while `--fe` compiles all FE modules.
The output of the `spark-dpp` module is `spark-dpp-1.0.0-jar-with-dependencies.jar`, which will be installed to `output/fe/spark-dpp/`.
3. Fix some bugs in Spark Load
Fix #3403
The new version of Guava has moved `toStringHelper` from `Objects` to `MoreObjects`.
This CL has passed our test environment and appears to run well.
The index name in MaterializedViewMeta still carries the `__doris_shadow` prefix
after a schema change finishes.
In this CL, I remove the index name field from MaterializedViewMeta,
which makes managing name changes less error-prone.