This pull request includes part of the implementation of statistics (https://github.com/apache/incubator-doris/issues/6370). It does not affect any existing code, and users will not yet be able to create statistics jobs.

After receiving a statistics collection statement, the system generates a job. This PR implements the division of statistics collection jobs according to the following statistics categories:

table:
- `row_count`: the table row count, which is critical for estimating cardinality and the memory usage of scan nodes.
- `data_size`: the table size; not applicable to the CBO, mainly used to monitor and manage table size.

column:
- `num_distinct_value`: used to determine the selectivity of an equality expression.
- `min`: the minimum value.
- `max`: the maximum value.
- `num_nulls`: the number of nulls.
- `avg_col_len`: the average length of the column, in bytes; used to estimate memory and network IO.
- `max_col_len`: the maximum length of the column, in bytes; used to estimate memory and network IO.

After the job is divided, the statistics tasks are obtained.
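The division described above can be illustrated with a minimal sketch. It is not the code in this PR: `StatsTask` and `divide_job` are hypothetical names, and producing one task per statistic per table or column is an assumption made for illustration. Only the statistic names and the table/column grouping come from the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Statistic names are taken from the PR description; the two tuples mirror
# its "table" and "column" categories.
TABLE_STATS = ("row_count", "data_size")
COLUMN_STATS = (
    "num_distinct_value", "min", "max",
    "num_nulls", "avg_col_len", "max_col_len",
)


@dataclass(frozen=True)
class StatsTask:
    """Hypothetical unit of work: one statistic for one table or one column."""
    table: str
    column: Optional[str]  # None for table-level statistics
    stat: str


def divide_job(tables: Dict[str, List[str]]) -> List[StatsTask]:
    """Split a job (table name -> column names) into tasks.

    Assumption: one task per (target, statistic) pair. The real
    granularity used by the PR may differ.
    """
    tasks: List[StatsTask] = []
    for table, columns in tables.items():
        # Table-level statistics first.
        for stat in TABLE_STATS:
            tasks.append(StatsTask(table, None, stat))
        # Then every column-level statistic for every column.
        for column in columns:
            for stat in COLUMN_STATS:
                tasks.append(StatsTask(table, column, stat))
    return tasks


if __name__ == "__main__":
    for task in divide_job({"orders": ["id", "note"]}):
        print(task)
```

Under these assumptions, a job covering one table with two columns yields 2 table-level tasks plus 2 × 6 column-level tasks.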
Apache Doris (incubating)
Doris is an MPP-based interactive SQL data warehouse for reporting and analysis. It was originally developed at Baidu under the name Palo. After being donated to the Apache Software Foundation, it was renamed Doris.
- Doris provides high-concurrency, low-latency point query performance, as well as high-throughput ad-hoc analytical queries.
- Doris provides batch data loading and real-time mini-batch data loading.
- Doris provides high availability, reliability, fault tolerance, and scalability.
The main advantages of Doris are its simplicity (in development, deployment, and use) and its ability to meet many data-serving requirements in a single system. For details, refer to Overview.
Official website: https://doris.apache.org/
License
Note
Some licenses of the third-party dependencies are not compatible with the Apache 2.0 License, so you need to disable some Doris features to comply with the Apache 2.0 License. For details, refer to thirdparty/LICENSE.txt.
Technology
Doris mainly integrates the technology of Google Mesa and Apache Impala. It is based on a column-oriented storage engine and can be accessed with a MySQL client.
Compile and install
See Compilation
Getting started
See Basic Usage
Doris Connector
Doris provides connectors that let Spark and Flink read data stored in Doris and write data to Doris.
apache/incubator-doris-flink-connector
apache/incubator-doris-spark-connector
Report issues or submit a pull request
If you find any bugs, feel free to file a GitHub issue or fix it by submitting a pull request.
Contact Us
Contact us through the following mailing list.
| Name | Scope | | | |
|---|---|---|---|---|
| dev@doris.apache.org | Development-related discussions | Subscribe | Unsubscribe | Archives |
Links
- Doris official site - http://doris.incubator.apache.org
- Developer mailing list - dev@doris.apache.org. Send mail to dev-subscribe@doris.apache.org and follow the reply to subscribe to the mailing list.
- Slack channel - Join the Slack