13 Commits

Author SHA1 Message Date
86d235a76a [Extension] Logstash Doris output plugin (#3800)
This plugin outputs Logstash data to Doris.
It uses the HTTP protocol to interact with the Doris FE HTTP interface
and loads data through Doris's stream load.
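
As a rough illustration of what a stream load request over the FE HTTP interface involves, the Java sketch below only assembles the target URL and headers; it does not send anything. The class and method names are hypothetical and not taken from the plugin. The host, port 8030, database, table, user and label values are placeholders assumed for the example.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds the pieces of a Doris stream load HTTP request. */
public class StreamLoadRequestSketch {

    /** Stream load endpoint on the FE: /api/{db}/{table}/_stream_load. */
    static URI streamLoadUri(String feHost, int httpPort, String db, String table) {
        return URI.create(String.format(
                "http://%s:%d/api/%s/%s/_stream_load", feHost, httpPort, db, table));
    }

    /** Headers for a stream load: HTTP Basic auth plus a caller-chosen load label. */
    static Map<String, String> headers(String user, String password, String label) {
        String token = Base64.getEncoder().encodeToString(
                (user + ":" + password).getBytes(StandardCharsets.UTF_8));
        Map<String, String> h = new LinkedHashMap<>();
        h.put("Authorization", "Basic " + token);
        h.put("Expect", "100-continue");
        h.put("label", label);
        return h;
    }

    public static void main(String[] args) {
        // Placeholder values, assumed for illustration only.
        System.out.println(streamLoadUri("fe-host", 8030, "example_db", "example_tbl"));
        System.out.println(headers("root", "", "logstash_batch_0001").keySet());
    }
}
```

An actual client would send the batch as the body of an HTTP PUT to this URL and would need to handle the redirect that the FE may issue to a BE node.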
2020-06-11 08:54:51 +08:00
4cbcae1574 [Spark on Doris] Shade and provide the thrift lib in spark-doris-connector (#3631)
Main changes:
1. Shade and provide the thrift lib in spark-doris-connector
2. Add a `build.sh` for spark-doris-connector
3. Move the README.md of spark-doris-connector to `docs/`
4. Change the line delimiter of `fe/src/test/java/org/apache/doris/analysis/AggregateTest.java`
2020-05-19 14:20:21 +08:00
c9c58342b2 [License] Add License to codes (#3272) 2020-04-07 16:35:13 +08:00
16b61b62f5 [Spark] Support convert Arrow data to RowBatch asynchronously in Spark-Doris-Connector (#3186)
Currently, in the Spark-Doris-Connector, when Spark iteratively obtains each row of data,
it needs to synchronously convert the Arrow format data into the row format required by Spark.
In order to speed up the conversion process, we can add an asynchronous thread in the Connector,
which is responsible for obtaining the Arrow format data from BE and converting it into the row
format required by Spark calculation.
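
The asynchronous conversion described above is a producer/consumer arrangement. The Java sketch below shows one way it could look and is not the connector's actual code: `AsyncRowIterator`, its type parameters and the queue capacity are hypothetical, and plain `int[]` arrays stand in for Arrow batches. A background thread converts each batch and puts the result on a bounded queue, while the caller iterates over rows that are already converted.

```java
import java.util.Arrays;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

/** Hypothetical sketch: converts batches of type B into rows of type R on a background thread. */
public class AsyncRowIterator<B, R> implements Iterator<R> {
    // Sentinel object placed on the queue once the source is exhausted.
    private static final Object EOS = new Object();

    private final BlockingQueue<Object> queue;
    private Iterator<R> current = Collections.emptyIterator();
    private boolean done = false;

    public AsyncRowIterator(Iterator<B> source, Function<B, List<R>> convert, int capacity) {
        // Bounded queue: the producer blocks when it gets too far ahead of the consumer.
        this.queue = new ArrayBlockingQueue<>(capacity);
        Thread worker = new Thread(() -> {
            try {
                while (source.hasNext()) {
                    queue.put(convert.apply(source.next()));
                }
                queue.put(EOS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (RuntimeException e) {
                // Hand the failure to the consumer instead of losing it on this thread.
                try {
                    queue.put(e);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "async-row-converter");
        worker.setDaemon(true);
        worker.start();
    }

    @Override
    public boolean hasNext() {
        // Pull converted batches until one has rows or the stream ends.
        while (!current.hasNext() && !done) {
            Object item;
            try {
                item = queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for rows", e);
            }
            if (item == EOS) {
                done = true;
            } else if (item instanceof RuntimeException) {
                done = true;
                throw (RuntimeException) item;
            } else {
                @SuppressWarnings("unchecked")
                List<R> rows = (List<R>) item;
                current = rows.iterator();
            }
        }
        return current.hasNext();
    }

    @Override
    public R next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    public static void main(String[] args) {
        // Stand-in "batches": each int[] plays the role of one Arrow batch.
        List<int[]> batches = Arrays.asList(new int[]{1, 2}, new int[]{}, new int[]{3});
        AsyncRowIterator<int[], String> rows = new AsyncRowIterator<>(batches.iterator(), batch -> {
            List<String> out = new ArrayList<>();
            for (int v : batch) {
                out.add("row-" + v);
            }
            return out;
        }, 2);
        while (rows.hasNext()) {
            System.out.println(rows.next());
        }
    }
}
```

The bounded queue is the main design choice in this sketch: it lets conversion overlap with row consumption while capping how many converted batches sit in memory at once.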

In our test environment, the Doris cluster used 1 FE and 7 BEs (32C+128G). When using Spark-Doris-Connector
to query a table containing 67 columns, the original query, which returned 69 million rows of data,
took about 2.5 min. After the improvement, it took about 1.6 min, a reduction of about 30%.
2020-03-26 21:34:37 +08:00
e20d905d70 Remove unused KUDU codes (#3175)
KUDU tables have not been supported for a long time. Remove the code related to them.
2020-03-24 13:54:05 +08:00
1550401d4b Support param exec_mem_limit for spark-doris-connector (#2775) 2020-01-18 00:14:39 +08:00
8ea5907252 Update arrow's version to 0.15.1 and shade it in spark-doris-connector (#2769) 2020-01-15 21:08:34 +08:00
18a11f5663 Convert from arrow to rowbatch (#2723)
For #2722
In our test environment, the Doris cluster used 1 FE and 7 BEs (32C+128G). When using spark-doris-connector to query a table containing 67 columns, the query took about 1 hour to return 69 million rows of data. After the improvement, the same query took about 2.5 minutes, a significant performance improvement.
2020-01-10 14:11:15 +08:00
feda66f99f Return error to users when a Spark on Doris query fails (#2531) 2019-12-30 21:58:13 +08:00
435fdd236e Fix npe in spark-doris-connector when query is complex (#2503) 2019-12-19 14:53:29 +08:00
48f559600f Fix bug when Spark on Doris runs for a long time (#2485) 2019-12-18 13:08:21 +08:00
0e84a88c1a Fix document bugs in spark-doris-connector (#2275) 2019-11-22 18:05:36 +08:00
732c473043 Add spark-doris-connector extension (#2228) 2019-11-22 15:38:05 +08:00