1. At present, JSON-format stream load writes use `read_json_by_line` and `fuzzy_parse`, which degrades write performance. This is changed to write with `strip_outer_array` and `fuzzy_parse`, making writes roughly 3x faster.
2. Add CSV writing with the column separator set to `\x01` and the row separator set to `\x02`; this is roughly 5x faster than before. A sketch of the corresponding stream load headers follows.
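For reference, a minimal sketch of the stream load request headers implied by the two modes above, assuming Apache HttpClient and the standard `_stream_load` endpoint; the host, database, table, and authentication handling are placeholders:

```java
import org.apache.http.client.methods.HttpPut;

public class StreamLoadHeaders {
    // Sketch only: host, database, table, and auth handling are placeholders.
    static HttpPut jsonModeRequest() {
        HttpPut put = new HttpPut("http://FE_IP:8030/api/db/table/_stream_load");
        put.setHeader("format", "json");
        put.setHeader("strip_outer_array", "true"); // each batch is one outer JSON array
        put.setHeader("fuzzy_parse", "true");       // speeds up JSON parsing on the backend
        return put;
    }

    static HttpPut csvModeRequest() {
        HttpPut put = new HttpPut("http://FE_IP:8030/api/db/table/_stream_load");
        // Invisible separators avoid collisions with real column data.
        put.setHeader("column_separator", "\\x01");
        put.setHeader("line_delimiter", "\\x02");
        return put;
    }
}
```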
* 1. Remove importing into Doris via CSV in DataxWriter; only JSON is supported.
2. Format the DataxWriter code.
3. Optimize exception handling and reduce repeated output of exception logs.
4. Update the DataxWriter documentation.
* Delete DorisCsvCodec.java
Delete the unused file extension/DataX/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCsvCodec.java
* 1. Remove the `format` config key.
2. Optimize the serialization code in the DorisJsonCodec class. A rough sketch of the codec's job follows.
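As an illustration of what such a codec does (not the actual DorisJsonCodec implementation), the class serializes one row into a JSON object keyed by field name; the sketch below assumes Jackson and a hypothetical `fieldNames` list:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonRowCodec {
    private static final ObjectMapper MAPPER = new ObjectMapper();
    private final List<String> fieldNames; // hypothetical: column names in table order

    public JsonRowCodec(List<String> fieldNames) {
        this.fieldNames = fieldNames;
    }

    // Serialize one row's values into a JSON object keyed by field name.
    public String codec(List<Object> rowValues) throws Exception {
        Map<String, Object> row = new LinkedHashMap<>(fieldNames.size());
        for (int i = 0; i < fieldNames.size(); i++) {
            row.put(fieldNames.get(i), rowValues.get(i));
        }
        return MAPPER.writeValueAsString(row);
    }
}
```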
The previous DataSourceFunction inherited from RichSourceFunction,
so no matter how high the Flink parallelism was set, the parallelism of DataSourceFunction was always 1.
It is now changed to RichParallelSourceFunction,
and when Flink runs with multiple parallel subtasks, the Doris partitions are assigned across them.
For example, with dorisPartitions.size = 10 and flink.parallelism = 4,
the work is split as follows (see the sketch after this list):
task0: dorisPartitions[0],[4],[8]
task1: dorisPartitions[1],[5],[9]
task2: dorisPartitions[2],[6]
task3: dorisPartitions[3],[7]
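A minimal sketch of that round-robin assignment, using the standard RichParallelSourceFunction runtime context; `List<Object>` stands in for the connector's actual partition type:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

// Sketch only: List<Object> stands in for the connector's partition type.
public abstract class ParallelDorisSource<T> extends RichParallelSourceFunction<T> {
    private final List<Object> dorisPartitions;   // all partitions read from Doris
    private transient List<Object> taskPartitions; // partitions owned by this subtask

    protected ParallelDorisSource(List<Object> dorisPartitions) {
        this.dorisPartitions = dorisPartitions;
    }

    @Override
    public void open(Configuration parameters) {
        int subtask = getRuntimeContext().getIndexOfThisSubtask();
        int parallelism = getRuntimeContext().getNumberOfParallelSubtasks();
        taskPartitions = new ArrayList<>();
        // Round-robin: subtask i takes partitions i, i+p, i+2p, ...
        // e.g. with 10 partitions and parallelism 4, task0 gets [0],[4],[8].
        for (int i = subtask; i < dorisPartitions.size(); i += parallelism) {
            taskPartitions.add(dorisPartitions.get(i));
        }
    }
}
```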
Add the `sink.batch.size` and `sink.max-retries` options to the `Doris Spark-connector`,
consistent with the `flink-connector` options.
e.g.:
```scala
df.write
  .format("doris")
  // specify the maximum number of rows in a single flush
  .option("sink.batch.size", 2048)
  // specify the number of retries after a write failure
  .option("sink.max-retries", 3)
  .save()
```
The jar files compiled for the Flink and Spark connectors now carry the corresponding Flink or Spark version
and the Scala version used at compile time, so that users can tell whether the version numbers match when using them.
Example output file name: doris-spark-1.0.0-spark-3.2.0_2.12.jar
Doris should provide an HTTP API that returns the backends list for connectors to submit stream loads,
without privilege checking, so that common users can use it. A sketch of how a connector might call such an API follows.
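A minimal sketch from the connector side, assuming a hypothetical `/api/backends` endpoint on the FE that returns the list as JSON; the path and response shape are assumptions, not a confirmed interface:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class BackendsFetcher {
    // Sketch only: the /api/backends path and JSON response shape are assumed.
    public static String fetchBackends(String feHost) throws Exception {
        URL url = new URL("http://" + feHost + "/api/backends");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line);
            }
            return body.toString(); // e.g. a JSON list of backend host/port entries
        } finally {
            conn.disconnect();
        }
    }
}
```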
1. By default, the Spark connector must write values for all fields of the `Doris` table.
With this feature, the user can specify a subset of the fields to write, and even the order in which they are written.
e.g.:
Suppose a table named `student` with three columns (name, gender, age),
created with the following SQL:
```sql
CREATE TABLE student (name VARCHAR(255), gender VARCHAR(10), age INT)
DUPLICATE KEY (name)
DISTRIBUTED BY HASH(name) BUCKETS 2;
```
Now, suppose we only want to write values to two columns: name and gender.
The code is as follows:
```scala
// columns in (gender, name) order, matching doris.write.field below
val df = spark.createDataFrame(Seq(
  ("m", "zhangsan"),
  ("f", "lisi"),
  ("m", "wangwu")
))
df.write
  .format("doris")
  .option("doris.fenodes", dorisFeNodes)
  .option("doris.table.identifier", dorisTable)
  .option("user", dorisUser)
  .option("password", dorisPwd)
  // specify the fields to write, and their order
  .option("doris.write.field", "gender,name")
  .save()
```
1. Simplify the use of the Flink connector, like other stream sinks, via GenericDorisSinkFunction.
2. Add use cases for the Flink connector.
## Use case
```java
env.fromElements("{\"longitude\": \"116.405419\", \"city\": \"北京\", \"latitude\": \"39.916927\"}")
   .addSink(
       DorisSink.sink(
           DorisOptions.builder()
               .setFenodes("FE_IP:8030")
               .setTableIdentifier("db.table")
               .setUsername("root")
               .setPassword("")
               .build()));
```
* [Bug]: fix NullPointerException thrown when data is null
* [Bug]: distinguish between null and empty string
* [Feature]: flink-connector supports stream load parameters (see the sketch at the end)
* [Fix]: code style
* [Fix]: support JSON-format import and use HttpClient for stream load
* [Fix]: remove System.out output
* [Fix]: upgrade HttpClient version
* [Doc]: add JSON-format import doc
Co-authored-by: wudi <wud3@shuhaisc.com>
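As a rough illustration of the stream-load-parameters feature above, a sketch of passing stream load headers through the Flink connector's execution options; the `DorisExecutionOptions` package path and builder method names are assumptions and may differ by connector version:

```java
import java.util.Properties;
import org.apache.doris.flink.cfg.DorisExecutionOptions;

public class StreamLoadPropsExample {
    // Sketch only: builder/method names are assumed and may differ by version.
    static DorisExecutionOptions withJsonStreamLoad() {
        Properties props = new Properties();
        props.setProperty("format", "json");            // forwarded as a stream load header
        props.setProperty("strip_outer_array", "true"); // batch is one outer JSON array
        return DorisExecutionOptions.builder()
                .setStreamLoadProp(props)
                .build();
    }
}
```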