ElvinWei dfbeeccd47 [feature-wip](statistics) step2: schedule the statistics job and generate executable tasks (#8859)
This pull request implements part of the statistics feature (https://github.com/apache/incubator-doris/issues/6370). It does not affect any existing code, and users cannot yet create statistics jobs.

After receiving a statistics collection statement, the system generates a job. This change implements the division of statistics collection jobs according to the following statistics categories:
table:
- `row_count`: the table row count, which is critical for estimating the cardinality and memory usage of scan nodes.
- `data_size`: the table size; not used by the CBO, mainly used to monitor and manage table size.
column:
- `num_distinct_value`: the number of distinct values, used to determine the selectivity of an equality expression.
- `min`: the minimum value.
- `max`: the maximum value.
- `num_nulls`: the number of null values.
- `avg_col_len`: the average length of the column, in bytes; used to estimate memory and network I/O.
- `max_col_len`: the maximum length of the column, in bytes; used to estimate memory and network I/O.
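
As a rough illustration of how `row_count`, `num_distinct_value`, and `num_nulls` can feed a cardinality estimate, the sketch below applies the textbook uniform-distribution assumption: each distinct non-null value is taken to be equally likely. The `ColumnStats` class and both function names are hypothetical and are not taken from the Doris code base; the actual optimizer may use a different formula.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical container for two of the column statistics listed above."""
    num_distinct_value: int
    num_nulls: int


def equality_selectivity(row_count: int, col: ColumnStats) -> float:
    """Estimate the fraction of rows matching `col = <constant>`.

    Assumption: values are uniformly distributed, so each distinct value
    covers an equal share of the non-null rows.
    """
    non_null = row_count - col.num_nulls
    if row_count <= 0 or non_null <= 0 or col.num_distinct_value <= 0:
        return 0.0
    return (non_null / row_count) / col.num_distinct_value


def estimated_rows(row_count: int, col: ColumnStats) -> float:
    """Estimated output cardinality of an equality filter on the column."""
    return row_count * equality_selectivity(row_count, col)
```

Under this assumption, a column with few distinct values gives a high selectivity estimate (many rows per value), while a near-unique column gives an estimate close to one row.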

After the job is divided, statistics tasks will be obtained.
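
The division step described above can be sketched as follows. This is a minimal illustration, not the code in this pull request: the `StatisticsTask` class, the `divide_job` function, and the chosen granularity (one task for the table-level statistics plus one task per column) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Statistic names are the ones listed in the commit message.
TABLE_STATS = ("row_count", "data_size")
COLUMN_STATS = (
    "num_distinct_value", "min", "max",
    "num_nulls", "avg_col_len", "max_col_len",
)


@dataclass(frozen=True)
class StatisticsTask:
    """Hypothetical unit of work produced by dividing a statistics job."""
    job_id: int
    table: str
    column: Optional[str]      # None for table-level statistics
    stats: Tuple[str, ...]     # statistics this task should collect


def divide_job(job_id: int, table: str, columns: List[str]) -> List[StatisticsTask]:
    """Split one job into a table-level task and one task per column."""
    tasks = [StatisticsTask(job_id, table, None, TABLE_STATS)]
    for column in columns:
        tasks.append(StatisticsTask(job_id, table, column, COLUMN_STATS))
    return tasks
```

For example, `divide_job(1, "orders", ["id", "price"])` yields three tasks: one for the table-level statistics of `orders` and one each for the `id` and `price` columns. The table and column names here are made up.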
2022-04-27 11:05:43 +08:00

Apache Doris (incubating)

Join the chat at https://gitter.im/apache-doris/Lobby

Doris is an MPP-based interactive SQL data warehouse for reporting and analysis. It was originally developed at Baidu under the name Palo. After being donated to the Apache Software Foundation, it was renamed Doris.

  • Doris provides high-concurrency, low-latency point queries as well as high-throughput ad-hoc analytical queries.

  • Doris provides batch data loading and real-time mini-batch data loading.

  • Doris provides high availability, reliability, fault tolerance, and scalability.

The main advantages of Doris are its simplicity (in development, deployment, and use) and its ability to meet many data serving requirements in a single system. For details, refer to Overview.

Official website: https://doris.apache.org/

License

Apache License, Version 2.0

Note

The licenses of some third-party dependencies are not compatible with the Apache 2.0 License, so you need to disable some Doris features to comply with the Apache 2.0 License. For details, refer to thirdparty/LICENSE.txt.

Technology

Doris mainly integrates the technology of Google Mesa and Apache Impala. It is based on a column-oriented storage engine and can be accessed with a MySQL client.

Compile and install

See Compilation

Getting started

See Basic Usage

Doris Connector

Doris provides connectors that allow Spark and Flink to read data stored in Doris and to write data to Doris.

apache/incubator-doris-flink-connector

apache/incubator-doris-spark-connector

Report issues or submit pull requests

If you find any bugs, feel free to file a GitHub issue or fix them by submitting a pull request.

Contact Us

Contact us through the following mailing list.

dev@doris.apache.org: development-related discussions