Commit Graph

79 Commits

a4f9628576 [improvement](datax) improve JSON import and support CSV writing
1. Previously, JSON-format writing used read_json_by_line and fuzzy_parse, which degraded stream load write performance. It now uses strip_outer_array and fuzzy_parse, making writes about 3x faster.

2. Add CSV writing, with the column separator set to \x01 and the row separator set to \x02; performance is about 5x higher than before.
2022-08-09 11:50:24 +08:00
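The separators and JSON flags above are standard Stream Load headers. A minimal sketch of the request this implies, with host, database, table, and credentials as placeholders (a real client must also handle the FE's 307 redirect to a BE):

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import java.util.Base64

// Sketch of a Stream Load request using the headers named in the commit above.
// fe_host, example_db/example_tbl and the credentials are placeholders.
object StreamLoadSketch {
  def main(args: Array[String]): Unit = {
    val url = new URL("http://fe_host:8030/api/example_db/example_tbl/_stream_load")
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("PUT")
    conn.setDoOutput(true)
    val auth = Base64.getEncoder.encodeToString("root:".getBytes(StandardCharsets.UTF_8))
    conn.setRequestProperty("Authorization", s"Basic $auth")
    // JSON path: send the whole batch as one outer array and let the BE strip it.
    conn.setRequestProperty("format", "json")
    conn.setRequestProperty("strip_outer_array", "true")
    conn.setRequestProperty("fuzzy_parse", "true")
    // The CSV path described in point 2 would instead set:
    //   conn.setRequestProperty("column_separator", "\\x01")
    //   conn.setRequestProperty("line_delimiter", "\\x02")
    val body = """[{"k1":1,"v1":"a"},{"k1":2,"v1":"b"}]"""
    conn.getOutputStream.write(body.getBytes(StandardCharsets.UTF_8))
    println(s"status=${conn.getResponseCode}")
  }
}
```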
65dd8eb885 Update init-env.sh (#11111)
This script is missing a "!".
2022-07-22 21:55:12 +08:00
468040974e [compile]Update init-env.sh (#10451) 2022-06-30 11:28:06 +08:00
67f341f44e [TLP](step-1) Remove incubator prefix (#10230)
Remove some `incubator-` prefixes from the source code.
The documentation is not modified; that will be done in the next PR.
2022-06-19 19:34:52 +08:00
87e3904cc6 Fix some typos for docs. (#9680) 2022-05-19 20:55:21 +08:00
c1707ca388 [feature][datax]doriswriter support timeZone (#9327) 2022-05-06 18:39:10 +08:00
7af79e1df5 [Feature][dbt] add partition_type support (#9389) 2022-05-06 15:27:34 +08:00
2c81624765 [Features]Add dbt doris adapter (#9299)
* Add dbt doris adapter

* Add licence header to each file

* Fix licence header
2022-04-29 11:40:29 +08:00
3dd6b42781 [fix](datax) Fix the problem of keyword error when importing datax (#8893) 2022-04-08 09:20:54 +08:00
3b159a9820 support doriswriter build on macOS (#8330)
2022-03-07 09:53:16 +08:00
c3b010b277 [refactor] Remove flink/spark connectors (#8004)
As we discussed on dev@doris [1],
the Flink/Spark connectors have been moved to a new repo: https://github.com/apache/incubator-doris-connectors

[1] https://lists.apache.org/thread/hnb7bf0l6y6rzb9pr6lhxz3jjoo04skl
2022-02-10 15:00:36 +08:00
4ada8e4854 [fix](httpv2) make http v2 and v1 interface compatible (#7848)
The http v2 TableSchemaAction now returns aggregation_type,
and the corresponding Flink/Spark Connector code is modified accordingly.
2022-01-31 22:12:34 +08:00
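A hedged sketch of reading that field back: the `_schema` URL shape follows what the connectors' RestService uses, and the response layout shown in the comment is an assumption.

```scala
import java.net.{HttpURLConnection, URL}
import scala.io.Source

// Sketch: fetch a table schema via the http v2 API and inspect aggregation_type.
// URL, credentials and the exact response layout are assumptions for illustration.
object SchemaSketch {
  def main(args: Array[String]): Unit = {
    val url = new URL("http://fe_host:8030/api/example_db/example_tbl/_schema")
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestProperty("Authorization", "Basic cm9vdDo=") // "root:" base64-encoded
    val json = Source.fromInputStream(conn.getInputStream).mkString
    // Each column entry is expected to now carry aggregation_type, e.g.
    // {"name":"cost","type":"BIGINT","aggregation_type":"SUM"}
    println(json)
  }
}
```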
4bdeef3b64 [chore][fix][doc](fe-plugin)(mysqldump) fix build auditlog plugin error (#7804)
1. fix problems when building fe_plugins
2. format
3. add docs about dumping data using mysqldump
2022-01-26 09:11:23 +08:00
60c6bb4f92 [Feature][flink-connector] support flink delete option (#7457)
* The Flink Connector supports the delete option on Unique Key models
Co-authored-by: wudi <wud3@shuhaisc.com>
2022-01-23 20:24:41 +08:00
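Later connector versions expose this as a `sink.enable-delete` table option; the sketch below assumes that option name and a Flink SQL setup.

```scala
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment

// Sketch: enabling deletes on a Unique Key table through the Flink SQL sink.
// 'sink.enable-delete' is assumed from later connector docs.
object DeleteOptionSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = StreamTableEnvironment.create(env)
    tEnv.executeSql(
      """CREATE TABLE doris_sink (id INT, name STRING) WITH (
        |  'connector' = 'doris',
        |  'fenodes' = 'FE_IP:8030',
        |  'table.identifier' = 'db.table',
        |  'username' = 'root',
        |  'password' = '',
        |  'sink.enable-delete' = 'true'
        |)""".stripMargin)
  }
}
```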
a6ff1bd79e Flink / Spark connector compilation problem (#7725)
2022-01-14 22:14:48 +08:00
6864a376ca [improvement](spark-connector) Throw an exception when the data push fails and there are too many retries (#7531) 2022-01-11 15:03:06 +08:00
7254bcc8ca [refactor](spark-connector) delete useless maven dependencies and some code variable definition issues (#7655) 2022-01-09 16:58:16 +08:00
9aaa3f63f7 [improvement](spark-connector) Stream load http exception handling (#7514)
2022-01-09 16:54:55 +08:00
3a8a85b739 [Optimize][Extension] optimize extension datax doriswriter: remove importing to Doris via CSV in DataxWriter, only support JSON (#7568)
* 1. Remove importing to Doris via CSV in DataxWriter; only support JSON.
2. Format the DataxWriter code.
3. Optimize exception handling and reduce repeated logging of exceptions.
4. Update the DataxWriter documentation.

* Delete DorisCsvCodec.java

delete unused file extension/DataX/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCsvCodec.java

* 1. Remove the `format` config key.
2. Optimize the serialization code in the DorisJsonCodec class.
2022-01-09 13:27:52 +08:00
ad35067a2a [chore][docs] add deploy spark/flink connectors to maven release repo docs (#7616) 2022-01-06 23:23:33 +08:00
738d2d2e07 [refactor] update parent pom version and optimize build scripts (#7548) 2022-01-05 10:45:11 +08:00
2872dbfeb8 [refactor] Standardize the writing of pom files, prepare for deployment to maven (#7477) 2021-12-30 10:16:37 +08:00
80587e7ac2 [improvement](spark-connector)(flink-connector) Modify the maximum batch size written by the Spark/Flink connectors each time (#7485)
Increase the default batch size and flush interval.
2021-12-26 11:13:47 +08:00
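On the Flink SQL side, the same knobs appear as table options; a fragment continuing a setup like the one sketched earlier, with option names taken from the connector docs and illustrative values:

```scala
// Sketch: overriding batch size and flush interval on the Flink SQL sink.
// The 'sink.batch.size' / 'sink.batch.interval' names are assumed from the docs.
tEnv.executeSql(
  """CREATE TABLE doris_sink (id INT, name STRING) WITH (
    |  'connector' = 'doris',
    |  'fenodes' = 'FE_IP:8030',
    |  'table.identifier' = 'db.table',
    |  'username' = 'root',
    |  'password' = '',
    |  'sink.batch.size' = '10000',
    |  'sink.batch.interval' = '10s'
    |)""".stripMargin)
```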
b4ce189646 [improvement](flink-connector) flush data without multi httpclients (#7329) (#7450)
Reuse the HTTP client when flushing data.
2021-12-24 21:28:35 +08:00
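The pattern is simply building one HttpClient and reusing it across flushes; a minimal sketch with Apache HttpClient 4.x (class and method names here are hypothetical):

```scala
import org.apache.http.client.methods.HttpPut
import org.apache.http.entity.StringEntity
import org.apache.http.impl.client.{CloseableHttpClient, HttpClients}

// Sketch of "one client, many flushes": the client is created once and reused
// for every batch instead of being rebuilt per flush.
class ReusedClientLoader(loadUrl: String) extends AutoCloseable {
  private val client: CloseableHttpClient = HttpClients.createDefault()

  def flush(batch: String): Int = {
    val put = new HttpPut(loadUrl)          // a fresh request per flush...
    put.setEntity(new StringEntity(batch))  // ...over the same pooled client
    val response = client.execute(put)
    try response.getStatusLine.getStatusCode
    finally response.close()
  }

  override def close(): Unit = client.close()
}
```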
e9049605b6 [fix](flink-connector) Connector should visit the surviving BE nodes (#7435) 2021-12-21 11:05:42 +08:00
549e849400 [improvement](flink-connector) DataSourceFunction read doris supports parallel (#7232)
The previous DataSourceFunction inherited from RichSourceFunction,
so no matter how high the Flink parallelism was set, the parallelism of DataSourceFunction was always 1.
It is now changed to RichParallelSourceFunction,
and when Flink runs with multiple parallel subtasks, the Doris partitions are assigned across them.
For example, with dorisPartitions.size = 10 and flink.parallelism = 4,
the task split is as follows (see the sketch below this entry):
task0: dorisPartitions[0],[4],[8]
task1: dorisPartitions[1],[5],[9]
task2: dorisPartitions[2],[6]
task3: dorisPartitions[3],[7]
2021-12-15 16:21:29 +08:00
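The split is a plain round-robin by partition index modulo the subtask count; a minimal sketch with hypothetical names:

```scala
// Round-robin assignment of Doris partitions to Flink subtasks.
def assignedPartitions[T](dorisPartitions: IndexedSeq[T],
                          numSubtasks: Int,
                          subtaskIndex: Int): IndexedSeq[T] =
  dorisPartitions.indices
    .filter(_ % numSubtasks == subtaskIndex)
    .map(dorisPartitions)

// With 10 partitions and parallelism 4, subtask 0 receives partitions 0, 4 and 8,
// reproducing the task0..task3 split listed in the commit above.
```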
c8bc0cf523 [chore][community](github) Remove travis and add github action (#7380)
1. Remove travis
2. Add github action to build extension:
    1. docs
    2. fs_broker
    3. flink/spark connectors
2021-12-15 13:27:37 +08:00
19a3c393a9 [Improvement](spark-connector) Add 'sink.batch.size' and 'sink.max-retries' options in spark-connector (#7281)
Add `sink.batch.size` and `sink.max-retries` options to the `Doris Spark-connector`,
consistent with the `flink-connector` options.
eg:
```scala
   df.write
      .format("doris")
      // maximum number of rows in a single flush
      .option("sink.batch.size", 2048)
      // number of retries after a failed write
      .option("sink.max-retries", 3)
      .save()
```
2021-12-06 10:29:33 +08:00
dcad6ff5e5 [License] Add License header for missing files (#7130)
1. Add License header for missing files
2. Modify the spark pom.xml to correct the location of `thrift`
2021-11-16 18:37:54 +08:00
88651a47c7 [Feature] Support Flink and Spark connector support String type (#7075)
Support String type for Flink and Spark connector
2021-11-13 17:10:22 +08:00
ed61055912 [SparkConnector] Add thrift dir for spark connector (#7074)
Add thrift dir for spark connector, to fix error when building spark-doris-connector
2021-11-13 17:09:52 +08:00
8e9f36877c [Compile] Fix spark-connector compile problem (#7048)
Use `thrift` in thirdparty
2021-11-11 15:42:30 +08:00
b54a12ef11 [Build]Compile and output the jar file, add Spark, Flink version and Scala version (#7051)
The jar files compiled for the Flink and Spark connectors now carry the corresponding Flink/Spark version
and the Scala version used at compile time, so users can tell whether the version numbers match when using them.

Example of an output file name: doris-spark-1.0.0-spark-3.2.0_2.12.jar
2021-11-09 10:02:08 +08:00
29838f07da [HTTP][API] Add backends info API for spark/flink connector (#6984)
Doris should provide an HTTP API that returns the backends list for connectors to submit stream loads,
without privilege checking, so that common users can use it.
2021-11-05 09:43:06 +08:00
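A hedged sketch of a connector-side caller; the `/api/backends` path and response shape are assumptions based on this commit's description:

```scala
import java.net.{HttpURLConnection, URL}
import scala.io.Source

// Sketch: ask the FE for its backends before submitting a stream load.
object BackendsSketch {
  def main(args: Array[String]): Unit = {
    val conn = new URL("http://fe_host:8030/api/backends")
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestProperty("Authorization", "Basic cm9vdDo=") // "root:" base64-encoded
    val body = Source.fromInputStream(conn.getInputStream).mkString
    println(body) // expected to list host/port pairs of alive BEs
  }
}
```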
d19a971582 [Revert] Revert RestService.java (#6994) 2021-11-04 12:13:18 +08:00
f39a5bc1d0 [Feature] Spark connector supports to specify fields to write (#6973)
1. By default, the Spark connector writes all field values to the `Doris` table.
With this feature, the user can specify a subset of fields to write, and even the order in which they are written.

eg:
I have a table named `student` with three columns (name, gender, age),
created with the following SQL:
```sql
create table student (name varchar(255), gender varchar(10), age int) duplicate key (name) distributed by hash(name) buckets 2;
```
Now I just want to write values to two columns: name and gender.
The code is as follows:
```scala
    val df = spark.createDataFrame(Seq(
      ("m", "zhangsan"),
      ("f", "lisi"),
      ("m", "wangwu")
    ))
    df.write
      .format("doris")
      .option("doris.fenodes", dorisFeNodes)
      .option("doris.table.identifier", dorisTable)
      .option("user", dorisUser)
      .option("password", dorisPwd)
      // specify your fields and their order
      .option("doris.write.field", "gender,name")
      .save()
```
2021-11-02 16:35:29 +08:00
466cd5dd09 [Optimize] Spark connector supports multiple spark versions:2.1.x/2.3.x/2.4.x/3.x (#6956)
Co-authored-by: wei.zhao <wei.zhao@aispeech.com>
2021-10-29 17:06:05 +08:00
1f65de1a5d Fix spark connector build error (#6948)
Fix a pom.xml error.
2021-10-29 14:59:05 +08:00
addfff74c4 Support using chars like \x01 as the flink-doris-sink column & line delimiters (#6937)
* Support using chars like \x01 as the column & line delimiters

* extend imports

* add docs
2021-10-29 13:56:52 +08:00
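The feature boils down to turning the literal text `\x01` from a user's config into the actual 0x01 control character; a hypothetical helper showing the idea:

```scala
// Hypothetical helper: convert a delimiter spec such as the four characters
// backslash-x-0-1 into the single control character it names.
def parseDelimiter(spec: String): String = {
  val HexEscape = """\\x([0-9a-fA-F]{2})""".r
  HexEscape.replaceAllIn(spec, m => Integer.parseInt(m.group(1), 16).toChar.toString)
}

// parseDelimiter("""\x01""") yields the 0x01 character;
// ordinary delimiters such as "," pass through unchanged.
```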
ebb4c282b1 [Flink] Simplify the use of flink connector (#6892)
1. Simplify the use of the Flink connector, making it work like other stream sinks, via GenericDorisSinkFunction.
2. Add use cases for the Flink connector.

## Use case
```
env.fromElements("{\"longitude\": \"116.405419\", \"city\": \"北京\", \"latitude\": \"39.916927\"}")
     .addSink(
          DorisSink.sink(
             DorisOptions.builder()
                   .setFenodes("FE_IP:8030")
                   .setTableIdentifier("db.table")
                   .setUsername("root")
                   .setPassword("").build()
                ));
```
2021-10-23 18:10:47 +08:00
6029082c2a [Flink][Bug] Fix potential NPE when cancel DorisSourceFunction (#6838)
Fix potential NPE of `scalaValueReader` when cancelling DorisSourceFunction.
2021-10-23 16:45:24 +08:00
24d38614a0 [Dependency] Upgrade thirdparty libs (#6766)
Upgrade the following dependencies:

libevent -> 2.1.12
OpenSSL 1.0.2k -> 1.1.1l
thrift 0.9.3 -> 0.13.0
protobuf 3.5.1 -> 3.14.0
gflags 2.2.0 -> 2.2.2
glog 0.3.3 -> 0.4.0
googletest 1.8.0 -> 1.10.0
snappy 1.1.7 -> 1.1.8
gperftools 2.7 -> 2.9.1
lz4 1.7.5 -> 1.9.3
curl 7.54.1 -> 7.79.0
re2 2017-05-01 -> 2021-02-02
zstd 1.3.7 -> 1.5.0
brotli 1.0.7 -> 1.0.9
flatbuffers 1.10.0 -> 2.0.0
apache-arrow 0.15.1 -> 5.0.0
CRoaring 0.2.60 -> 0.3.4
orc 1.5.8 -> 1.6.6
libdivide 4.0.0 -> 5.0
brpc 0.97 -> 1.0.0-rc02
librdkafka 1.7.0 -> 1.8.0

After this PR, Doris should be compiled with build-env:1.4.0.
2021-10-15 13:03:04 +08:00
237a8ae948 [Feature] support spark connector sink data using sql (#6796)
Co-authored-by: wei.zhao <wei.zhao@aispeech.com>
2021-10-09 15:47:36 +08:00
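The SQL path works through a temporary view backed by the connector; a sketch of the documented usage, with connection values as placeholders:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: writing to Doris via SQL through a connector-backed temporary view.
object SparkSqlSinkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("doris-sql-sink").getOrCreate()
    spark.sql(
      """CREATE TEMPORARY VIEW spark_doris
        |USING doris
        |OPTIONS(
        |  "table.identifier" = "db.table",
        |  "fenodes" = "FE_IP:8030",
        |  "user" = "root",
        |  "password" = ""
        |)""".stripMargin)
    // INSERTs against the view are pushed to Doris by the connector.
    spark.sql("""INSERT INTO spark_doris VALUES ("zhangsan", 1)""")
  }
}
```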
8d471007a6 [Feature] support spark connector sink stream data to doris (#6761)
* [Feature] support spark connector sink stream data to doris

* [Doc] Add spark-connector batch/stream writing instructions

* add license headers and remove meaningless blank code

Co-authored-by: wei.zhao <wei.zhao@aispeech.com>
2021-09-28 17:46:19 +08:00
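A sketch of the streaming path, mirroring the batch options that appear elsewhere in this log; the source, checkpoint path, and connection values are placeholders:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: Structured Streaming sink into Doris.
object SparkStreamSinkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("doris-stream-sink").getOrCreate()
    val source = spark.readStream.format("rate").load() // placeholder source
    source.selectExpr("CAST(value AS STRING) AS name", "CAST(value AS INT) AS age")
      .writeStream
      .format("doris")
      .option("checkpointLocation", "/tmp/doris-checkpoint")
      .option("doris.table.identifier", "db.table")
      .option("doris.fenodes", "FE_IP:8030")
      .option("user", "root")
      .option("password", "")
      .start()
      .awaitTermination()
  }
}
```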
df5ba6b5a2 [Fix] Flink connector supports JSON import and uses httpclient for stream load (#6740)
* [Bug]: fix NullPointerException thrown when data is null

* [Bug]:Distinguish between null and empty string

* [Feature]:flink-connector supports streamload parameters

* [Fix]:code style

* [Fix]: support JSON format import and use httpclient for stream load

* [Fix]:remove System out

* [Fix]: upgrade httpclient version

* [Doc]: add json format import doc

Co-authored-by: wudi <wud3@shuhaisc.com>
2021-09-28 17:37:03 +08:00
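The JSON path is driven by stream load properties on the execution options; a sketch assuming the `DorisExecutionOptions.setStreamLoadProp` builder method from the connector docs of this era:

```scala
import java.util.Properties
import org.apache.doris.flink.cfg.DorisExecutionOptions

// Sketch: stream load properties that switch the Flink sink to JSON import.
// Class/method names are assumed from the connector docs; verify signatures.
object JsonImportSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.setProperty("format", "json")
    props.setProperty("strip_outer_array", "true")

    val executionOptions = DorisExecutionOptions.builder()
      .setStreamLoadProp(props)
      .build()
    // executionOptions would then be passed to DorisSink.sink(...) alongside
    // the DorisOptions shown in the earlier use case.
    println(executionOptions)
  }
}
```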
68529d20f3 [Flink] Fix bug of flink doris connector (#6655)
Flink-Doris-Connector did not support Flink 1.13; refactor the Doris sink format
to use RowData::FieldGetter instead of GenericRowData.
2021-09-24 21:38:35 +08:00
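A sketch of what the refactor amounts to: precompute one FieldGetter per column so any RowData implementation can be read without casting to GenericRowData.

```scala
import org.apache.flink.table.data.RowData
import org.apache.flink.table.types.logical.LogicalType

// One FieldGetter per column, built once; works for any RowData implementation.
class RowDataConverter(fieldTypes: Array[LogicalType]) {
  private val getters: Array[RowData.FieldGetter] =
    fieldTypes.zipWithIndex.map { case (t, i) => RowData.createFieldGetter(t, i) }

  def toValues(row: RowData): Array[AnyRef] =
    getters.map(_.getFieldOrNull(row))
}
```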
a7b8d110a0 Spark 2.x and 3.x version compilation instructions (#6503)
2021-08-27 10:55:29 +08:00
4ff6eb55d0 [FlinkConnector] Make flink datastream source parameterized (#6473)
Make the Flink DataStream source parameterized as List<?> instead of Object.
2021-08-22 22:03:32 +08:00
4ea2fcefbc [Improve]The connector supports spark 3.0, flink 1.13 (#6449)
Modify the flink/spark compilation documentation
2021-08-18 15:57:50 +08:00
2f90aaab8e [Doc] flink/spark connector: add sources/javadoc plugins (#6435)
spark-doris-connector/flink-doris-connector: add plugins to generate javadoc and sources jars,
so they are easy to distribute and debug.
2021-08-16 22:41:24 +08:00