doris

Author	SHA1	Message	Date
Zhengguo Yang	738d2d2e07	[refactor] update parent pom version and optimize build scripts (#7548 )	2022-01-05 10:45:11 +08:00
Zhengguo Yang	2872dbfeb8	[refactor] Standardize the writing of pom files, prepare for deployment to maven (#7477 )	2021-12-30 10:16:37 +08:00
jiafeng.zhang	80587e7ac2	[improvement](spark-connector)(flink-connector) Modify the max num of batch written by Spark/Flink connector each time. (#7485 ) Increase the default batch size and flush interval	2021-12-26 11:13:47 +08:00
Heng Zhao	b4ce189646	[improvement](flink-connector) flush data without multi httpclients (#7329 ) (#7450 ) reuse http client to flush data	2021-12-24 21:28:35 +08:00
Heng Zhao	e9049605b6	[fix](flink-connector) Connector should visit the surviving BE nodes (#7435 )	2021-12-21 11:05:42 +08:00
wudi	549e849400	[improvement](flink-connector) DataSourceFunction read doris supports parallel (#7232 ) The previous DataSourceFunction inherited from RichSourceFunction. As a result, no matter how high the Flink parallelism was set, the parallelism of DataSourceFunction was only 1. It now extends RichParallelSourceFunction, and when Flink runs with a parallelism greater than 1, the Doris partitions are distributed across the parallel subtasks. For example, with dorisPartitions.size = 10 and flink.parallelism = 4, the tasks are split as follows: task0: dorisPartitions[0],[4],[8] task1: dorisPartitions[1],[5],[9] task2: dorisPartitions[2],[6] task3: dorisPartitions[3],[7]	2021-12-15 16:21:29 +08:00
Mingyu Chen	c8bc0cf523	[chore][community](github) Remove travis and add github action (#7380 ) 1. Remove travis 2. Add github action to build extension: 1. docs 2. fs_broker 3. flink/spark/connector	2021-12-15 13:27:37 +08:00
wei zhao	19a3c393a9	[Improvement](spark-connector) Add 'sink.batch.size' and 'sink.max-retries' options in spark-connector (#7281 ) Add the `sink.batch.size` and `sink.max-retries` options to `Doris Spark-connector`, consistent with the `flink-connector` options. e.g.: ```scala df.write .format("doris") // specify maximum number of lines in a single flushing .option("sink.batch.size",2048) // specify number of retries after writing failed .option("sink.max-retries",3) .save() ```	2021-12-06 10:29:33 +08:00
Mingyu Chen	dcad6ff5e5	[License] Add License header for missing files (#7130 ) 1. Add License header for missing files 2. Modify the spark pom.xml to correct the location of `thrift`	2021-11-16 18:37:54 +08:00
wudi	88651a47c7	[Feature] Support Flink and Spark connector support String type (#7075 ) Support String type for Flink and Spark connector	2021-11-13 17:10:22 +08:00
tinkerrrr	ed61055912	[SparkConnector] Add thrift dir for spark connector (#7074 ) Add thrift dir for spark connector, to fix error when building spark-doris-connector	2021-11-13 17:09:52 +08:00
wei zhao	8e9f36877c	[Compile] Fix spark-connector compile problem (#7048 ) Use `thrift` in thirdparty	2021-11-11 15:42:30 +08:00
jiafeng.zhang	b54a12ef11	[Build]Compile and output the jar file, add Spark, Flink version and Scala version (#7051 ) The jar files compiled for the Flink and Spark connectors now carry the corresponding Flink or Spark version and the Scala version used at compile time, so that users can check whether the versions match when using them. Example of output file name: doris-spark-1.0.0-spark-3.2.0_2.12.jar	2021-11-09 10:02:08 +08:00
Mingyu Chen	29838f07da	[HTTP][API] Add backends info API for spark/flink connector (#6984 ) Doris should provide an HTTP API that returns the backends list so that connectors can submit stream loads. The API performs no privilege checking, so ordinary users can call it.	2021-11-05 09:43:06 +08:00
wei zhao	d19a971582	[Revert] Revert RestService.java (#6994 )	2021-11-04 12:13:18 +08:00
wei zhao	f39a5bc1d0	[Feature] Spark connector supports to specify fields to write (#6973 ) 1. By default, the Spark connector must write values for all fields to the `Doris` table. With this feature, the user can specify a subset of fields to write, and even the order in which they are written. e.g.: I have a table named `student` with three columns (name, gender, age), created with the following SQL: ```sql create table student (name varchar(255), gender varchar(10), age int) duplicate key (name) distributed by hash(name) buckets 2; ``` Now I only want to write values to two columns: name and gender. The code is as follows: ```scala val df = spark.createDataFrame(Seq( ("m", "zhangsan"), ("f", "lisi"), ("m", "wangwu") )) df.write .format("doris") .option("doris.fenodes", dorisFeNodes) .option("doris.table.identifier", dorisTable) .option("user", dorisUser) .option("password", dorisPwd) //specify your fields or the order .option("doris.write.field", "gender,name") .save() ```	2021-11-02 16:35:29 +08:00
wei zhao	466cd5dd09	[Optimize] Spark connector supports multiple spark versions:2.1.x/2.3.x/2.4.x/3.x (#6956 ) * Spark connector supports multiple spark versions:2.1.x/2.3.x/2.4.x/3.x Co-authored-by: wei.zhao <wei.zhao@aispeech.com>	2021-10-29 17:06:05 +08:00
jiafeng.zhang	1f65de1a5d	Fix spark connector build error (#6948 ) pom.xml error	2021-10-29 14:59:05 +08:00
wunan1210	addfff74c4	support use char like \x01 in flink-doris-sink column & line delimiter (#6937 ) * support use char like \x01 in flink-doris-sink column & line delimiter * extend imports * add docs	2021-10-29 13:56:52 +08:00
xiaokangguo	ebb4c282b1	[Flink]Simplify the use of flink connector (#6892 ) 1. Simplify the use of flink connector like other stream sink by GenericDorisSinkFunction. 2. Add the use cases of flink connector. ## Use case ``` env.fromElements("{\"longitude\": \"116.405419\", \"city\": \"北京\", \"latitude\": \"39.916927\"}") .addSink( DorisSink.sink( DorisOptions.builder() .setFenodes("FE_IP:8030") .setTableIdentifier("db.table") .setUsername("root") .setPassword("").build() )); ```	2021-10-23 18:10:47 +08:00
Yun Tang	6029082c2a	[Flink][Bug] Fix potential NPE when cancel DorisSourceFunction (#6838 ) Fix potential NPE of `scalaValueReader` when cancelling DorisSourceFunction.	2021-10-23 16:45:24 +08:00
Zhengguo Yang	24d38614a0	[Dependency] Upgrade thirdparty libs (#6766 ) Upgrade the following dependencies: libevent -> 2.1.12 OpenSSL 1.0.2k -> 1.1.1l thrift 0.9.3 -> 0.13.0 protobuf 3.5.1 -> 3.14.0 gflags 2.2.0 -> 2.2.2 glog 0.3.3 -> 0.4.0 googletest 1.8.0 -> 1.10.0 snappy 1.1.7 -> 1.1.8 gperftools 2.7 -> 2.9.1 lz4 1.7.5 -> 1.9.3 curl 7.54.1 -> 7.79.0 re2 2017-05-01 -> 2021-02-02 zstd 1.3.7 -> 1.5.0 brotli 1.0.7 -> 1.0.9 flatbuffers 1.10.0 -> 2.0.0 apache-arrow 0.15.1 -> 5.0.0 CRoaring 0.2.60 -> 0.3.4 orc 1.5.8 -> 1.6.6 libdivide 4.0.0 -> 5.0 brpc 0.97 -> 1.0.0-rc02 librdkafka 1.7.0 -> 1.8.0 After this PR, Doris should be compiled with build-env:1.4.0	2021-10-15 13:03:04 +08:00
wei zhao	237a8ae948	[Feature] support spark connector sink data using sql (#6796 ) Co-authored-by: wei.zhao <wei.zhao@aispeech.com>	2021-10-09 15:47:36 +08:00
chovy	8d471007a6	[Feature] support spark connector sink stream data to doris (#6761 ) * [Feature] support spark connector sink stream data to doris * [Doc] Add spark-connector batch/stream writing instructions * add license and remove meaningless blanks code Co-authored-by: wei.zhao <wei.zhao@aispeech.com>	2021-09-28 17:46:19 +08:00
wudi	df5ba6b5a2	[Fix] Flink connector support json import and use httpclient to streamload (#6740 ) * [Bug]:fix when data null, throw NullPointerException * [Bug]:Distinguish between null and empty string * [Feature]:flink-connector supports streamload parameters * [Fix]:code style * [Fix]: support json format import and use httpclient to streamload * [Fix]:remove System out * [Fix]:upgrade httpclient version * [Doc]: add json format import doc Co-authored-by: wudi <wud3@shuhaisc.com>	2021-09-28 17:37:03 +08:00
xhmz	68529d20f3	[Flink] Fix bug of flink doris connector (#6655 ) Flink-Doris-Connector does not support Flink 1.13. Refactor the Doris sink format to use RowData::FieldGetter instead of GenericRowData.	2021-09-24 21:38:35 +08:00
jiafeng.zhang	a7b8d110a0	Spark 2.x and 3.x version compilation instructions (#6503 ) Spark 2.x and 3.x version compilation instructions	2021-08-27 10:55:29 +08:00
wunan1210	4ff6eb55d0	[FlinkConnector] Make flink datastream source parameterized (#6473 ) make flink datastream source parameterized as List<?> instead of Object.	2021-08-22 22:03:32 +08:00
jiafeng.zhang	4ea2fcefbc	[Improve]The connector supports spark 3.0, flink 1.13 (#6449 ) Modify the flink/spark compilation documentation	2021-08-18 15:57:50 +08:00
wunan1210	2f90aaab8e	[Doc] flink/spark connector: add sources/javadoc plugins (#6435 ) Add plugins to spark-doris-connector and flink-doris-connector that generate the javadoc and sources jars, making the connectors easier to distribute and debug.	2021-08-16 22:41:24 +08:00
huzk	b13e512a65	[Feature] Support spark connector sink data to Doris (#6256 ) Support writing a DataFrame to Doris with the Spark connector	2021-08-16 22:40:43 +08:00
Mingyu Chen	1a5b03167a	[Doc] Add document for datax and sample codes (#6389 ) Add documents for datax in extension catalog. Add documents for samples in best-practice catalog.	2021-08-11 11:51:13 +08:00
wunan1210	929b33ac0a	[DataX] doriswriter support csv (#6373 ) Make the DataX doriswriter support the csv format, which is simpler and faster than json when the data is simple. Add property format: csv/json. Add property column_separator: takes effect when format is csv, for example "\x01", "^", etc.	2021-08-10 10:14:21 +08:00
wudi	d9fc1bf3ca	[Feature]:Flink-connector supports streamload parameters (#6243 ) Flink-connector supports streamload parameters #6199	2021-08-09 22:12:46 +08:00
Mingyu Chen	8fe5c75877	[DataX] Refactor doriswriter (#6188 ) 1. Use `read_json_by_line` to load data 2. Use FE http server as the target host of stream load	2021-07-13 11:36:40 +08:00
wudi	fcd31f29b6	[Bug][Flink] Fix when data null , flink-connector throw NullPointerException (#6165 )	2021-07-08 09:55:50 +08:00
huzk	c33321ff42	[Feature][DataX] Implementation Datax doriswriter plugin (#6107 )	2021-07-08 09:33:02 +08:00
Mingyu Chen	b69ebc3ec4	[Extension] Add DataX doriswriter extension directory (#6111 ) This CL only add the script for building DataX development environment	2021-06-30 09:55:19 +08:00
wudi	28e7d01ef7	[FlinkConnector] Support time interval for flink connector (#5934 )	2021-06-30 09:27:12 +08:00
zhangboya1	04cc6eaadc	[Log] Fix a mistake in DorisDynamicOutputFormat.java (#5963 ) Fix a mistake DorisDynamicOutputFormat.java	2021-06-06 22:06:57 +08:00
jiafeng.zhang	7ef9aa13d4	[Bug] Modify spark, flink doris connector to send request to FE, fix the problem of POST method, it should be the same as the method when sending the request (#5788 ) Modify spark, flink doris connector to send request to FE, fix the problem of POST method, it should be the same as the method when sending the request	2021-05-19 09:28:21 +08:00
wudi	7eea811f6b	[Feature] Flink Doris Connector (#5372 ) (#5375 )	2021-04-23 09:43:48 +08:00
924060929	39136011c2	[Spark-Doris-Connector][Bug-Fix] Resolve deserialize exception when Spark Doris Connector in async deserialize mode (#5336 ) Resolve the deserialization exception that occurs when Spark Doris Connector is in async deserialize mode Co-authored-by: lanhuajian <lanhuajian@sankuai.com>	2021-03-04 17:48:59 +08:00
Zhengguo Yang	5781d67afe	Fix file licences (#5414 ) Add license to files For Doris 0.14	2021-02-24 16:37:17 +08:00
张家锋	9c022e3764	[Bug] Spark doris connector http v2 authentication fails, and HTTP v2 interface returns json nesting problem (#5366 ) 1. Deal with the problem of inconsistent data format returned by http v1 and v2 2. Deal with user authentication failure	2021-02-07 09:28:55 +08:00
HuangWei	56d0cc3f54	[Spark on Doris] fix the encode of varchar when convertArrowToRowBatch (#5202 ) `convertArrowToRowBatch` use the default charset to encode String. Set it to UTF_8, because we use `arrow::utf8` on the Backends.	2021-01-10 20:48:46 +08:00
wfjcmcb	86d235a76a	[Extension] Logstash Doris output plugin (#3800 ) This plugin is used to output data to Doris for logstash Use the HTTP protocol to interact with the Doris FE Http interface Load data through Doris's stream load	2020-06-11 08:54:51 +08:00
Mingyu Chen	4cbcae1574	[Spark on Doris] Shade and provide the thrift lib in spark-doris-connector (#3631 ) Mainly changes: 1. Shade and provide the thrift lib in spark-doris-connector 2. Add a `build.sh` for spark-doris-connector 3. Move the README.md of spark-doris-connector to `docs/` 4. Change the line delimiter of `fe/src/test/java/org/apache/doris/analysis/AggregateTest.java`	2020-05-19 14:20:21 +08:00
lichaoyong	c9c58342b2	[License] Add License to codes (#3272 )	2020-04-07 16:35:13 +08:00
Youngwb	16b61b62f5	[Spark] Support convert Arrow data to RowBatch asynchronously in Spark-Doris-Connector (#3186 ) Currently, in the Spark-Doris-Connector, when Spark iteratively obtains each row of data, it needs to synchronously convert the Arrow format data into the row format required by Spark. To speed up the conversion, we can add an asynchronous thread in the Connector that is responsible for obtaining the Arrow format data from BE and converting it into the row format required by Spark. In our test environment, the Doris cluster used 1 FE and 7 BE (32C+128G). When using Spark-Doris-Connector to query a table containing 67 columns, the original query, which returned 69 million rows, took about 2.5min; after the improvement it took about 1.6min, a reduction of about 30%	2020-03-26 21:34:37 +08:00
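
The partition split described in commit 549e849400 (10 Doris partitions over a Flink parallelism of 4) is consistent with a simple round-robin assignment. The following is a minimal sketch of that idea, not the connector's actual code: the class `PartitionSplit` and the method `assign` are hypothetical names introduced only for illustration, and partitions are represented by their indices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not taken from the Doris Flink connector sources.
class PartitionSplit {

    // Returns, for each subtask index, the list of partition indices it would read.
    // Partition p goes to subtask (p % parallelism), i.e. round-robin.
    static List<List<Integer>> assign(int numPartitions, int parallelism) {
        List<List<Integer>> tasks = new ArrayList<>();
        for (int t = 0; t < parallelism; t++) {
            tasks.add(new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            tasks.get(p % parallelism).add(p);
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Same numbers as the commit message: 10 partitions, parallelism 4.
        List<List<Integer>> tasks = assign(10, 4);
        for (int t = 0; t < tasks.size(); t++) {
            System.out.println("task" + t + ": " + tasks.get(t));
        }
    }
}
```

With these inputs the sketch assigns indices 0, 4, 8 to task0 and 3, 7 to task3, matching the split listed in the commit message.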

109 Commits