Kang efdb3b79a5 [feature] add zstd compression codec (#9747)
ZSTD compression is fast with high compression ratio. It can be used to archive higher compression ratio
than default Lz4f codec for storing cost sensitive data such as logs.

Compared to Lz4f codec, we see zstd codec get 35% compressed size off, 30% faster at first time read without OS page 
cache, 40% slower at second time read with OS page cache in the following comparison test.

test data: 25GB text log, 110 million rows
test table: test_table(ts varchar(30), log string)
test SQL: set enable_vectorized_engine=1; select sum(length(log)) from test_table
be.conf: disable_storage_page_cache = true
set this config to disable doris page cache to avoid all data cached in memory for test real decompression speed.
test result

master branch with lz4f codec result: 
- compressed size 4.3G
- SQL first exec time(read data from disk + decompress + little computation) : 18.3s
- SQL second exec time(read data from OS pagecache + decompress + little computation) : 2.4s

this branch with zstd codec (hardcode enable it) result:
- compressed size: 2.8G
- SQL first exec time: 12.8s
- SQL second exec time: 3.4s
2022-05-27 21:56:18 +08:00
2022-05-19 20:55:21 +08:00
2022-05-23 11:59:15 +08:00
2022-04-30 16:55:43 +08:00

Apache Doris (incubating)

License Total Lines GitHub release Join the Doris Community at Slack Join the chat at https://gitter.im/apache-doris/Lobby

Doris is an MPP-based interactive SQL data warehousing for reporting and analysis. Its original name was Palo, developed in Baidu. After donated to Apache Software Foundation, it was renamed Doris.

  • Doris provides high concurrent low latency point query performance, as well as high throughput queries of ad-hoc analysis.

  • Doris provides batch data loading and real-time mini-batch data loading.

  • Doris provides high availability, reliability, fault tolerance, and scalability.

The main advantages of Doris are the simplicity (of developing, deploying and using) and meeting many data serving requirements in a single system. For details, refer to Overview.

Official website: https://doris.apache.org/

Monthly Active Contributors

Contributor over time

License

Apache License, Version 2.0

Note

Some licenses of the third-party dependencies are not compatible with Apache 2.0 License. So you need to disable some Doris features to be complied with Apache 2.0 License. For details, refer to the thirdparty/LICENSE.txt

Technology

Doris mainly integrates the technology of Google Mesa and Apache Impala, and it is based on a column-oriented storage engine and can communicate by MySQL client.

Compile and install

See Compilation

Getting start

See Basic Usage

Doris Connector

Doris provides support for Spark/Flink to read data stored in Doris through Connector, and also supports to write data to Doris through Connector.

apache/incubator-doris-flink-connector

apache/incubator-doris-spark-connector

Doris Manager

Doris provides one-click visual automatic installation and deployment, cluster management and monitoring tools for clusters.

apache/incubator-doris-manager

Report issues or submit pull request

If you find any bugs, feel free to file a GitHub issue or fix it by submitting a pull request.

Contact Us

Contact us through the following mailing list.

Name Scope
dev@doris.apache.org Development-related discussions Subscribe Unsubscribe Archives
Description
No description provided
Readme 825 MiB
Languages
Java 31.7%
Groovy 22.6%
C++ 20.5%
Csound 18.9%
Python 4.2%
Other 1.8%