The jar files built from the Flink and Spark connectors now carry the corresponding Flink/Spark version
and Scala version used at compile time, so that users can tell whether the versions match.
Example output file name: doris-spark-1.0.0-spark-3.2.0_2.12.jar
Doris should provide an HTTP API that returns the backends list, so that connectors can submit Stream Load
requests directly to a BE. The API should not require privilege checking, so that ordinary users can use it.
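As a rough sketch of how a connector might call such an endpoint (the path `/api/backends` is hypothetical here; the proposal only requires an HTTP API that returns the backends list):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BackendsLookupSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint name; only the behavior (return the BE list
        // without privilege checking) is part of the proposal.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://FE_IP:8030/api/backends"))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Expected: a body listing alive BE host:port pairs that the
        // connector can pick as Stream Load targets.
        System.out.println(response.body());
    }
}
```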
1. By default, the Spark connector writes values for all fields to the `Doris` table.
With this feature, the user can write only a subset of the fields, and can also specify the order in which the fields are written.
e.g.:
I have a table named `student` with three columns (name, gender, age),
created with the following SQL:
```sql
create table student (name varchar(255), gender varchar(10), age int)
duplicate key (name)
distributed by hash(name) buckets 2;
```
Now I want to write values to only two columns: name and gender.
The code is as follows:
```scala
// The DataFrame column names must match the Doris columns named in
// doris.write.field, hence toDF("gender", "name").
import spark.implicits._

val df = Seq(
  ("m", "zhangsan"),
  ("f", "lisi"),
  ("m", "wangwu")
).toDF("gender", "name")

df.write
  .format("doris")
  .option("doris.fenodes", dorisFeNodes)
  .option("doris.table.identifier", dorisTable)
  .option("user", dorisUser)
  .option("password", dorisPwd)
  // specify the fields to write, and their order
  .option("doris.write.field", "gender,name")
  .save()
```
1. Simplify the use of the Flink connector, so that it works like other stream sinks, via GenericDorisSinkFunction.
2. Add use cases for the Flink connector.
## Use case
```java
env.fromElements("{\"longitude\": \"116.405419\", \"city\": \"北京\", \"latitude\": \"39.916927\"}")
   .addSink(
       DorisSink.sink(
           DorisOptions.builder()
               .setFenodes("FE_IP:8030")
               .setTableIdentifier("db.table")
               .setUsername("root")
               .setPassword("")
               .build()));
```
* [Bug]: fix the NullPointerException thrown when data is null
* [Bug]: distinguish between null and empty string (see the sketch after this list)
* [Feature]: flink-connector supports Stream Load parameters
* [Fix]: code style
* [Fix]: support JSON format import and use HttpClient for Stream Load
* [Fix]: remove System.out calls
* [Fix]: upgrade the HttpClient version
* [Doc]: add JSON format import documentation
Co-authored-by: wudi <wud3@shuhaisc.com>
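A minimal sketch of the idea behind the two [Bug] fixes above (not the connector's actual code): in csv Stream Load, Doris interprets `\N` as NULL, so writing `\N` for null values avoids the null-dereference path and keeps null distinct from a genuine empty string:

```java
public class NullFieldSketch {
    // Convert one field to its csv representation for Stream Load.
    static String toCsvField(Object value) {
        // \N is Doris's csv null literal; an empty string stays empty.
        return value == null ? "\\N" : value.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsvField(null)); // prints \N -> NULL in Doris
        System.out.println(toCsvField(""));   // prints "" -> empty string, not NULL
    }
}
```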
Make the doriswriter of DataX support the csv format. The csv format is simpler and faster than
the json format when the data itself is simple.
Add the property `format`: csv/json.
Add the property `column_separator`: takes effect when format is csv, for example "\x01", "^", etc. A sketch of the record layout follows.
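Illustrative only (not doriswriter's actual code): how a csv record could be assembled under these two properties, where the `"\x01"` setting corresponds to the `\u0001` control character:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CsvRecordSketch {
    public static void main(String[] args) {
        String columnSeparator = "\u0001"; // from column_separator = "\x01"
        List<Object> row = Arrays.asList("zhangsan", "m", 20);
        // Join the column values into a single csv record.
        String record = row.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(columnSeparator));
        System.out.println(record);
    }
}
```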
Modify the Spark and Flink Doris connectors to send requests to the FE, and fix the handling of the POST method:
the redirected request should use the same method as the original request.
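One way to express the fix with Apache HttpClient 4.x (a sketch under the assumption that the connectors use this client, which the changelog above suggests): the FE answers a load request with a 307 redirect to a BE, and a 307 must be replayed with the original method and body, so redirects are enabled for every method:

```java
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultRedirectStrategy;
import org.apache.http.impl.client.HttpClients;

public class RedirectSketch {
    public static CloseableHttpClient buildClient() {
        return HttpClients.custom()
                .setRedirectStrategy(new DefaultRedirectStrategy() {
                    @Override
                    protected boolean isRedirectable(String method) {
                        // Follow redirects for every HTTP method; on a 307
                        // the client re-sends the same method and entity.
                        return true;
                    }
                })
                .build();
    }
}
```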
This plugin is used to output data from Logstash to Doris.
It uses the HTTP protocol to interact with the Doris FE HTTP interface,
and loads data through Doris's Stream Load.
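A minimal sketch of a Stream Load request as such a plugin might issue it (FE address, database, table, and credentials are placeholders; `/api/{db}/{table}/_stream_load` is Doris's documented Stream Load endpoint):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.UUID;

import org.apache.http.client.methods.HttpPut;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultRedirectStrategy;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class StreamLoadSketch {
    public static void main(String[] args) throws Exception {
        HttpPut put = new HttpPut(
                "http://FE_IP:8030/api/example_db/example_table/_stream_load");
        put.setHeader("Authorization", "Basic " + Base64.getEncoder()
                .encodeToString("root:".getBytes(StandardCharsets.UTF_8)));
        put.setHeader("Expect", "100-continue");
        put.setHeader("label", UUID.randomUUID().toString()); // dedup label
        put.setHeader("format", "json");
        put.setEntity(new StringEntity(
                "{\"name\": \"zhangsan\", \"gender\": \"m\", \"age\": 20}",
                StandardCharsets.UTF_8));
        // The FE redirects (307) to a BE, so follow redirects for every method.
        try (CloseableHttpClient client = HttpClients.custom()
                .setRedirectStrategy(new DefaultRedirectStrategy() {
                    @Override
                    protected boolean isRedirectable(String method) {
                        return true;
                    }
                }).build()) {
            System.out.println(
                    EntityUtils.toString(client.execute(put).getEntity()));
        }
    }
}
```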
Main changes:
1. Shade and provide the thrift lib in spark-doris-connector
2. Add a `build.sh` for spark-doris-connector
3. Move the README.md of spark-doris-connector to `docs/`
4. Change the line delimiter of `fe/src/test/java/org/apache/doris/analysis/AggregateTest.java`
Currently, in the Spark-Doris-Connector, when Spark iterates over each row of data,
it must synchronously convert the Arrow-format data into the row format required by Spark.
To speed up this conversion, we can add an asynchronous thread in the Connector,
responsible for fetching the Arrow-format data from the BE and converting it into the row
format required by Spark's computation, as sketched below.
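A simplified sketch of the producer/consumer structure (class and method names here are illustrative, not the connector's real API):

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class AsyncConversionSketch {
    private static final List<Object[]> EOS = List.of(); // end-of-stream marker

    public static void main(String[] args) throws InterruptedException {
        // Bounded queue of already-converted row batches.
        BlockingQueue<List<Object[]>> rowBatches = new ArrayBlockingQueue<>(64);

        // Background thread: fetch Arrow batches from BE and convert them.
        Thread converter = new Thread(() -> {
            try {
                List<Object[]> batch;
                while ((batch = fetchAndConvertNextArrowBatch()) != null) {
                    rowBatches.put(batch); // blocks if the reader lags behind
                }
                rowBatches.put(EOS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "arrow-converter");
        converter.start();

        // Reader side: Spark's row iterator only drains converted batches.
        List<Object[]> batch;
        while ((batch = rowBatches.take()) != EOS) {
            for (Object[] row : batch) {
                // hand the row to Spark here
            }
        }
    }

    // Hypothetical stand-in for reading one Arrow batch from the BE and
    // turning it into rows; returns null when the scan is exhausted.
    private static List<Object[]> fetchAndConvertNextArrowBatch() {
        return null; // placeholder
    }
}
```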
In our test environment, the Doris cluster used 1 FE and 7 BEs (32C + 128G). When using the Spark-Doris-Connector
to query a table containing 67 columns, the original query returned 69 million rows of data
in about 2.5 minutes; after this improvement it took about 1.6 minutes, a reduction of roughly 30%.
For #2722
In our test environment, the Doris cluster used 1 FE and 7 BEs (32C + 128G). When using the Spark-Doris-Connector to query a table containing 67 columns, it took about 1 hour for the query to return 69 million rows of data. After the improvement, the same query took about 2.5 minutes, a significant performance gain.