Commit Graph

97 Commits

Author SHA1 Message Date
e1184bf4dc [fix](dbt) dbt incremental append (#20513) 2023-06-09 01:41:33 +08:00
24dd3f19cd [feature](extension) support beats output to doris (#18448) 2023-04-16 18:17:48 +08:00
45dbd4d872 [fix](dbt) fix dbt incremental #16840
Fix dbt incremental: a new approach that needs no rollback and supports re-running incremental data (see the sketch after this entry).
Add snapshot support.
Use the 'mysql-connector-python' MySQL driver in place of the 'MySQLdb' driver.
2023-02-18 20:40:56 +08:00
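As context for the driver swap mentioned above, here is a minimal, hypothetical sketch of querying Doris over the MySQL protocol with mysql-connector-python instead of the MySQLdb C extension; the host, port, credentials, database, and query are placeholders, not values from the dbt-doris adapter.

```python
# Hypothetical illustration of the driver swap: mysql-connector-python is a
# pure-Python driver, so no MySQLdb C extension needs to be built.
import mysql.connector  # pip install mysql-connector-python

# Placeholder connection settings; a real dbt profile would supply these.
conn = mysql.connector.connect(
    host="127.0.0.1", port=9030, user="root", password="", database="demo_db"
)
try:
    cur = conn.cursor()
    cur.execute("SHOW TABLES")  # queries run the same way as with MySQLdb
    for (table_name,) in cur.fetchall():
        print(table_name)
finally:
    conn.close()
```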
1f07e54178 [typo](docs)fix logstash readme url (#15573) 2023-01-03 22:57:36 +08:00
bd5882d08a [fix](datax) fix a doris writer write error (#14276)
2022-11-18 18:20:13 +08:00
f5761c658f [Fix] Fix the mysql_to_doris extension bug (#13723)
* Fix the mysql_to_doris extension bug

e_mysql_to_doris.sh: a command error made the script fail with: ERROR 1103 (42000) at line 1: Incorrect table name ''.
The backtick (`) was in the wrong position.

* Update extension/mysql_to_doris/bin/e_mysql_to_doris.sh


Co-authored-by: Adonis Ling <adonis0147@gmail.com>
2022-10-31 08:45:34 +08:00
6ff6a4f8b2 [fix] Fix problems with special database and table names (#13519) 2022-10-21 22:45:35 +08:00
ed19562cb3 Align configuration naming with Ali DataX: rename maxBatchSize to batchSize (#13278)
2022-10-11 14:51:19 +08:00
6ee150755a [refactor](datax) Refactor the doris writer code (#13226)
2022-10-11 08:47:05 +08:00
54e6f12110 [improvement](mysql-to-doris) Fully resolve MySQL external table issues (#13229)
2022-10-10 16:48:52 +08:00
29fc167548 [Bug](Datax) Fix a bug where the datax writer drops columns when converting a map to JSON (#13042)
* When a value is null, toJSONString drops that key/value pair.
2022-09-29 11:37:10 +08:00
f1811e41bc [fix](config)Update user_define_tables.sh #12542 2022-09-16 10:27:28 +08:00
ef37396b63 [fix](dbt)fix dbt incremental bug (#12280) 2022-09-04 16:40:40 +08:00
df51c78593 [fix](dbt)fix dbt run abnormal #12242 2022-09-01 12:10:48 +08:00
e48b691139 Fix mysql_to_doris failing to get the doris_odbc_name value and replacing the driver value incorrectly (#11965)
2022-08-29 19:13:54 +08:00
4217b9c1d3 [feature] (dbt) add an incremental model and an interactive init command line (#11870)
Add the dbt-doris incremental model and an interactive init command line.
2022-08-25 15:03:28 +08:00
d4749c2652 [extension](mysql-to-doris) add odbc conf and some fix (#11692) 2022-08-20 18:27:48 +08:00
27f652aaff [extension](feature) Import MySQL databases into Doris via external tables (#10905) 2022-08-11 10:18:45 +08:00
a4f9628576 [improvement](datax) improve JSON import and support CSV writing
1. JSON-format writing previously used read_json_by_line with fuzzy_parse, which slowed down stream load writes. It now uses strip_outer_array with fuzzy_parse, which is roughly 3x faster.

2. Add CSV writing with the column separator set to \x01 and the row separator set to \x02; this is roughly 5x faster than before (see the sketch after this entry).
2022-08-09 11:50:24 +08:00
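The separator choice described above corresponds to the stream load `format`, `column_separator`, and `line_delimiter` headers. Below is a rough Python sketch of such a CSV load, not the DataX writer's actual Java code path; the FE address, database, table, and credentials are placeholders, and the hex notation for the separators follows the stream load convention.

```python
# Hypothetical sketch: stream load a CSV payload that uses \x01 / \x02 as
# column / row separators, mirroring the separators described in the commit.
import uuid
import requests

rows = [["1", "alice"], ["2", "bob"]]
payload = "\x02".join("\x01".join(cols) for cols in rows)

resp = requests.put(
    "http://127.0.0.1:8030/api/demo_db/demo_table/_stream_load",  # placeholder FE address
    data=payload.encode("utf-8"),
    auth=("root", ""),  # placeholder credentials
    headers={
        "label": f"csv-sketch-{uuid.uuid4()}",  # unique label per load
        "format": "csv",
        "column_separator": "\\x01",            # hex notation for the invisible separator
        "line_delimiter": "\\x02",
    },
)
# Note: the FE usually redirects the load to a BE; depending on the HTTP client,
# credentials may need to be re-sent on that redirect.
print(resp.json())
```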
65dd8eb885 Update init-env.sh (#11111)
This script was missing a "!".
2022-07-22 21:55:12 +08:00
468040974e [compile]Update init-env.sh (#10451) 2022-06-30 11:28:06 +08:00
67f341f44e [TLP](step-1) Remove incubator prefix (#10230)
Remove some `incubator-` prefix in source code.
The docs are not modified; that will be done in the next PR.
2022-06-19 19:34:52 +08:00
87e3904cc6 Fix some typos for docs. (#9680) 2022-05-19 20:55:21 +08:00
c1707ca388 [feature][datax]doriswriter support timeZone (#9327) 2022-05-06 18:39:10 +08:00
7af79e1df5 [Feature][dbt] add partition_type support (#9389) 2022-05-06 15:27:34 +08:00
2c81624765 [Features]Add dbt doris adapter (#9299)
* Add dbt doris adapter

* Add licence header to each file

* Fix licence header
2022-04-29 11:40:29 +08:00
3dd6b42781 [fix](datax) Fix the keyword error problem when importing with datax (#8893) 2022-04-08 09:20:54 +08:00
3b159a9820 support doriswriter build on macOS (#8330)
2022-03-07 09:53:16 +08:00
c3b010b277 [refactor] Remove flink/spark connectors (#8004)
As we discussed in dev@doris[1],
the Flink/Spark connectors have been moved to a new repo: https://github.com/apache/incubator-doris-connectors

[1] https://lists.apache.org/thread/hnb7bf0l6y6rzb9pr6lhxz3jjoo04skl
2022-02-10 15:00:36 +08:00
4ada8e4854 [fix](httpv2) make http v2 and v1 interface compatible (#7848)
The http v2 TableSchemaAction now returns aggregation_type,
and the corresponding Flink/Spark connector code is updated accordingly.
2022-01-31 22:12:34 +08:00
4bdeef3b64 [chore][fix][doc](fe-plugin)(mysqldump) fix build auditlog plugin error (#7804)
1. Fix problems when building fe_plugins
2. Format the code
3. Add docs about dumping data with mysqldump
2022-01-26 09:11:23 +08:00
60c6bb4f92 [Feature][flink-connector] support flink delete option (#7457)
* Flink Connector supports delete option on Unique models
Co-authored-by: wudi <wud3@shuhaisc.com>
2022-01-23 20:24:41 +08:00
a6ff1bd79e Fix a Flink/Spark connector compilation problem (#7725)
2022-01-14 22:14:48 +08:00
6864a376ca [improvement](spark-connector) Throw an exception when the data push fails and there are too many retries (#7531) 2022-01-11 15:03:06 +08:00
7254bcc8ca [refactor](spark-connector) delete useless maven dependencies and fix some variable definition issues (#7655) 2022-01-09 16:58:16 +08:00
9aaa3f63f7 [improvement](spark-connector) Stream load http exception handling (#7514)
2022-01-09 16:54:55 +08:00
3a8a85b739 [Optimize][Extension] Optimize the datax doriswriter extension: remove CSV import into Doris, support JSON only (#7568)
* 1. Remove CSV import into Doris from the datax writer; only JSON is supported.
2. Format the datax writer code.
3. Optimize exception handling and reduce repeated exception log output.
4. Update the datax writer's documentation.

* Delete DorisCsvCodec.java

delete unused file extension/DataX/doriswriter/src/main/java/com/alibaba/datax/plugin/writer/doriswriter/DorisCsvCodec.java

* 1. Remove the `format` config key.
2. Optimize the serialization code in the DorisJsonCodec class.
2022-01-09 13:27:52 +08:00
ad35067a2a [chore][docs] add deploy spark/flink connectors to maven release repo docs (#7616) 2022-01-06 23:23:33 +08:00
738d2d2e07 [refactor] update parent pom version and optimize build scripts (#7548) 2022-01-05 10:45:11 +08:00
2872dbfeb8 [refactor] Standardize the writing of pom files, prepare for deployment to maven (#7477) 2021-12-30 10:16:37 +08:00
80587e7ac2 [improvement](spark-connector)(flink-connector) Modify the maximum batch size written by the Spark/Flink connector each time. (#7485)
Increase the default batch size and flush interval
2021-12-26 11:13:47 +08:00
b4ce189646 [improvement](flink-connector) flush data without multiple http clients (#7329) (#7450)
Reuse a single http client to flush data (an analogous sketch follows this entry).
2021-12-24 21:28:35 +08:00
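The connector change itself is Java (keeping one HttpClient alive across flushes). Purely as an analogous Python illustration, the sketch below reuses a single session for every flush instead of constructing a new client per batch; the endpoint, credentials, and class name are made up for the example.

```python
# Analogous illustration only: reuse one HTTP session for all flushes rather
# than building a new client each time a batch is written.
import requests


class BatchFlusher:
    def __init__(self, load_url: str, user: str, password: str):
        self.load_url = load_url            # placeholder load endpoint
        self.session = requests.Session()   # created once, reused for every flush
        self.session.auth = (user, password)

    def flush(self, payload: bytes, headers: dict) -> dict:
        # Each call reuses the pooled connections held by the shared session.
        resp = self.session.put(self.load_url, data=payload, headers=headers)
        resp.raise_for_status()
        return resp.json()

    def close(self):
        self.session.close()
```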
e9049605b6 [fix](flink-connector) Connector should visit the surviving BE nodes (#7435) 2021-12-21 11:05:42 +08:00
549e849400 [improvement](flink-connector) DataSourceFunction supports reading doris in parallel (#7232)
The previous DataSourceFunction inherited from RichSourceFunction,
so no matter how high the Flink parallelism was set, the parallelism of DataSourceFunction was always 1.
It is now changed to RichParallelSourceFunction.

When Flink runs with multiple parallel subtasks, the doris partitions are distributed across them.
For example, with dorisPartitions.size = 10 and flink.parallelism = 4,
the tasks are split as follows (a sketch of this rule follows this entry):
task0: dorisPartitions[0],[4],[8]
task1: dorisPartitions[1],[5],[9]
task2: dorisPartitions[2],[6]
task3: dorisPartitions[3],[7]
2021-12-15 16:21:29 +08:00
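A small illustration of the splitting rule described above (not the connector's Java code): each parallel subtask takes every partition whose index is congruent to its subtask index modulo the parallelism.

```python
# Reproduces the example from the commit message: 10 partitions, parallelism 4.
def assign_partitions(num_partitions: int, parallelism: int, subtask_index: int) -> list:
    return [p for p in range(num_partitions) if p % parallelism == subtask_index]


for i in range(4):
    print(f"task{i}: {assign_partitions(10, 4, i)}")
# task0: [0, 4, 8]
# task1: [1, 5, 9]
# task2: [2, 6]
# task3: [3, 7]
```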
c8bc0cf523 [chore][community](github) Remove travis and add github action (#7380)
1. Remove travis
2. Add github action to build extension:
    1. docs
    2. fs_broker
    3. flink/spark connectors
2021-12-15 13:27:37 +08:00
19a3c393a9 [Improvement](spark-connector) Add 'sink.batch.size' and 'sink.max-retries' options in spark-connector (#7281)
Add the `sink.batch.size` and `sink.max-retries` options to the `Doris Spark-connector`,
consistent with the `flink-connector` options.
e.g.:
```scala
   df.write
      .format("doris")
      // maximum number of rows in a single flush
      .option("sink.batch.size",2048)
      // number of retries after a failed write
      .option("sink.max-retries",3)
      .save()
```
2021-12-06 10:29:33 +08:00
dcad6ff5e5 [License] Add License header for missing files (#7130)
1. Add License header for missing files
2. Modify the spark pom.xml to correct the location of `thrift`
2021-11-16 18:37:54 +08:00
88651a47c7 [Feature] Flink and Spark connectors support the String type (#7075)
Support the String type in the Flink and Spark connectors
2021-11-13 17:10:22 +08:00
ed61055912 [SparkConnector] Add thrift dir for spark connector (#7074)
Add thrift dir for spark connector, to fix error when building spark-doris-connector
2021-11-13 17:09:52 +08:00
8e9f36877c [Compile] Fix spark-connector compile problem (#7048)
Use `thrift` in thirdparty
2021-11-11 15:42:30 +08:00