<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> <div align="center">

Apache Doris

<div>

Official Website Quick Download

</div> <div> <a href="https://twitter.com/doris_apache"><img src="https://img.shields.io/badge/- @Doris_Apache -424549?style=social&logo=x" height=25></a> &nbsp; <a href="https://github.com/apache/doris/discussions"><img src="https://img.shields.io/badge/- Discussion -red?style=social&logo=discourse" height=25></a> &nbsp; <a href="https://apachedoriscommunity.slack.com/join/shared_invite/zt-2kl08hzc0-SPJe4VWmL_qzrFd2u2XYQA"><img src="https://img.shields.io/badge/-Slack-red?style=social&logo=slack" height=25></a> &nbsp; <a href="https://medium.com/@ApacheDoris"><img src="https://img.shields.io/badge/-Medium-red?style=social&logo=medium" height=25></a> </div> </div>

Apache Doris is an easy-to-use, high-performance, real-time analytical database based on an MPP architecture. It returns query results over massive datasets with sub-second response times, and it supports both high-concurrency point queries and high-throughput complex analysis.

All this makes Apache Doris an ideal tool for scenarios including report analysis, ad-hoc queries, unified data warehousing, and data lake query acceleration. On Apache Doris, users can build various applications, such as user behavior analysis, A/B testing platforms, log retrieval analysis, user portrait analysis, and order analysis.

🎉 Version 2.1.4 is now released. Check out the 🔗Release Notes here. The 2.1 version delivers 100% higher out-of-the-box query performance, as shown by TPC-DS 1TB tests; data lake analytics that are 4-6 times faster than Trino and Spark; solid support for semi-structured data analysis with the new Variant type and a suite of analytical functions; asynchronous materialized views for query acceleration; optimized real-time writing at scale; and better workload management with improved stability and runtime SQL resource tracking.

🎉 Version 2.0.12 is now released! This fully evolved and stable release is ready for all users to upgrade. Check out the 🔗Release Notes here.

👀 Have a look at the 🔗Official Website for a comprehensive list of Apache Doris's core features, blogs, and user cases.

📈 Usage Scenarios

As shown in the figure below, after various data integration and processing steps, data from the sources is usually stored in the real-time data warehouse Apache Doris and in an offline data lake or data warehouse (Apache Hive, Apache Iceberg, or Apache Hudi).

<br /> <img src="https://cdn.selectdb.com/static/What_is_Apache_Doris_3_a61692c2ce.png" /> <br />

Apache Doris is widely used in the following scenarios:

🖥️ Core Concepts

📂 Architecture of Apache Doris

The overall architecture of Apache Doris is shown in the following figure. The Doris architecture is very simple, with only two types of processes: Frontend (FE) and Backend (BE).

Both types of processes are horizontally scalable, and a single cluster can support hundreds of machines and tens of petabytes of storage. Both also guarantee high availability of services and high reliability of data through consistency protocols. This highly integrated architecture greatly reduces the operation and maintenance cost of a distributed system.

<br />

The overall architecture of Apache Doris

<br />

In terms of interfaces, Apache Doris adopts the MySQL protocol, supports standard SQL, and is highly compatible with the MySQL dialect. Users can access Doris through various client tools, and it connects seamlessly with BI tools.

💾 Storage Engine

Doris uses a columnar storage engine, which encodes, compresses, and reads data by column. This enables a very high compression ratio and greatly reduces irrelevant data scans, thus making more efficient use of IO and CPU resources. Doris supports various index structures to minimize data scans.
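To make the idea of skipping irrelevant data concrete, here is a minimal Python sketch of a single column stored in blocks, each with a min/max summary that lets a scan skip blocks without reading them. It is a toy in-memory model, not Doris code: `ColumnBlock`, `build_blocks`, `scan_greater_than`, and the `order_amount` column are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ColumnBlock:
    """One block of a single column, with a min/max summary kept next to it."""
    values: list
    min_value: int
    max_value: int


def build_blocks(values, block_size):
    """Split one column into fixed-size blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(ColumnBlock(chunk, min(chunk), max(chunk)))
    return blocks


def scan_greater_than(blocks, threshold):
    """Return the matching values and the number of blocks that had to be read."""
    matches, blocks_read = [], 0
    for block in blocks:
        if block.max_value <= threshold:
            continue  # the summary proves nothing in this block can match
        blocks_read += 1
        matches.extend(v for v in block.values if v > threshold)
    return matches, blocks_read


# Hypothetical "order_amount" column, stored apart from every other column.
order_amount = [5, 7, 9, 12, 40, 42, 45, 48, 90, 95, 97, 99]
blocks = build_blocks(order_amount, block_size=4)
matches, blocks_read = scan_greater_than(blocks, threshold=50)
print(matches, blocks_read)
```

In this sketch only one of the three blocks is read for the predicate `order_amount > 50`, and no other column of the table is touched at all.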

💿 Storage Models

Doris supports a variety of storage models and has optimized them for different scenarios:

Doris also supports strongly consistent materialized views. Materialized views are automatically selected and updated, which greatly reduces maintenance costs for users.
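As a rough illustration of how storage models can differ, the sketch below mimics three row-merging behaviors in plain Python. The Doris documentation describes Duplicate Key, Unique Key, and Aggregate Key models; the functions here (`load_duplicate`, `load_unique`, `load_aggregate`) are hypothetical helpers that only imitate those semantics in memory and say nothing about how Doris implements them.

```python
import operator


def load_duplicate(rows):
    """Duplicate-style model: every loaded row is kept as-is."""
    return list(rows)


def load_unique(rows):
    """Unique-style model: the last row loaded for a key replaces earlier ones."""
    table = {}
    for key, value in rows:
        table[key] = value
    return table


def load_aggregate(rows, combine):
    """Aggregate-style model: rows sharing a key are merged with `combine`."""
    table = {}
    for key, value in rows:
        table[key] = combine(table[key], value) if key in table else value
    return table


# Hypothetical (user_id, amount) rows arriving in load order.
rows = [("u1", 10), ("u2", 5), ("u1", 7)]

print(load_duplicate(rows))                # all three rows survive
print(load_unique(rows))                   # u1 keeps only its latest value
print(load_aggregate(rows, operator.add))  # u1's values are summed
```

The same input produces three different tables, which is why the choice of model depends on whether a workload needs raw detail, latest-state lookups, or pre-aggregated results.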

🔍 Query Engine

Doris adopts the MPP model in its query engine to realize parallel execution between and within nodes. It also supports distributed shuffle join for multiple large tables so as to handle complex queries.
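The general idea behind a distributed shuffle join can be sketched in a few lines of Python: both inputs are repartitioned by a hash of the join key so that matching rows land on the same node, and each node then joins only its own partitions. This is a single-process simulation under stated assumptions; `shuffle`, `local_hash_join`, `shuffle_join`, and the sample tables are hypothetical and do not reflect Doris internals.

```python
from collections import defaultdict
from zlib import crc32


def shuffle(rows, num_nodes):
    """Route each (key, payload) row to a node chosen by hashing its join key."""
    partitions = defaultdict(list)
    for row in rows:
        node = crc32(str(row[0]).encode()) % num_nodes
        partitions[node].append(row)
    return partitions


def local_hash_join(left_rows, right_rows):
    """Join one node's rows: build a hash table on the right, probe with the left."""
    build = defaultdict(list)
    for key, payload in right_rows:
        build[key].append(payload)
    return [(key, left_payload, right_payload)
            for key, left_payload in left_rows
            for right_payload in build.get(key, [])]


def shuffle_join(left, right, num_nodes=3):
    """Shuffle both sides by key, join per node, and collect the results."""
    left_parts = shuffle(left, num_nodes)
    right_parts = shuffle(right, num_nodes)
    result = []
    for node in range(num_nodes):
        result.extend(local_hash_join(left_parts.get(node, []),
                                      right_parts.get(node, [])))
    return sorted(result)


# Hypothetical tables: (user_id, order) and (user_id, name).
orders = [(1, "order-a"), (2, "order-b"), (3, "order-c"), (1, "order-d")]
users = [(1, "alice"), (2, "bob"), (4, "dave")]
print(shuffle_join(orders, users))
```

Because equal keys always hash to the same node, the result does not depend on `num_nodes`; only the amount of work done per node changes.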

<br />

Query Engine

<br />

The Doris query engine is vectorized, with all memory structures laid out in a columnar format. This can substantially reduce virtual function calls, improve cache hit rates, and make efficient use of SIMD instructions. Doris delivers 5-10 times higher performance in wide table aggregation scenarios than non-vectorized engines.

<br />

Doris query engine

<br />

Apache Doris uses Adaptive Query Execution technology to dynamically adjust the execution plan based on runtime statistics. For example, it can generate a runtime filter, push it to the probe side, and automatically pass it down to the Scan node at the bottom, which drastically reduces the amount of data read on the probe side and improves join performance. The runtime filter in Doris supports In, Min/Max, and Bloom filters.
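A runtime filter can be pictured with a small Python sketch: the join keys seen on the build side are summarized as a min/max range plus an exact IN set, and that summary is applied while scanning the probe side so that non-matching rows never reach the join. This is a simplified model, not the Doris implementation; the Bloom filter variant is omitted, and `build_runtime_filter`, `scan_with_filter`, and the `in_filter_limit` parameter are hypothetical.

```python
def build_runtime_filter(build_keys, in_filter_limit=1024):
    """Summarize build-side join keys as a min/max range and, if small, an IN set.

    Assumes a non-empty build side.
    """
    keys = set(build_keys)
    return {
        "min": min(keys),
        "max": max(keys),
        # Keep the exact set only while it is small; otherwise rely on min/max.
        "in_set": keys if len(keys) <= in_filter_limit else None,
    }


def scan_with_filter(probe_rows, runtime_filter):
    """Apply the filter during the scan and yield only rows that may still join."""
    for key, payload in probe_rows:
        if not (runtime_filter["min"] <= key <= runtime_filter["max"]):
            continue  # outside the build side's key range
        in_set = runtime_filter["in_set"]
        if in_set is not None and key not in in_set:
            continue  # inside the range, but not an actual build-side key
        yield key, payload


# Hypothetical small dimension table (build side) and larger fact table (probe side).
dim = [(3, "cn"), (7, "us")]
fact = [(1, "a"), (3, "b"), (5, "c"), (7, "d"), (9, "e")]

runtime_filter = build_runtime_filter(key for key, _ in dim)
survivors = list(scan_with_filter(fact, runtime_filter))
print(survivors)
```

With only the min/max part available, the row with key 5 would still pass the scan and be rejected later by the join itself; the IN set removes it earlier.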

🚅 Query Optimizer

In terms of optimizers, Doris uses a combination of cost-based optimization (CBO) and rule-based optimization (RBO). The RBO supports constant folding, subquery rewriting, and predicate pushdown, while the CBO supports join reordering. The Doris CBO is under continuous optimization for more accurate statistics collection and derivation, and for more accurate cost model prediction.
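Constant folding, one of the rule-based rewrites named above, is easy to show on a toy expression tree. The sketch below assumes a made-up representation in which integers are literals, strings are column references, and `(op, left, right)` tuples are operators; `fold_constants` and `OPS` are hypothetical and unrelated to the Doris optimizer's actual data structures.

```python
import operator

# Operators supported by this toy expression language.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       ">": operator.gt, "<": operator.lt, "=": operator.eq}


def fold_constants(expr):
    """Rewrite an expression bottom-up, evaluating sub-trees that contain only literals."""
    if not isinstance(expr, tuple):
        return expr  # an int literal or a column reference (str)
    op, left, right = expr
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)  # both sides are known at planning time
    return (op, left, right)


# price > (1 + 2) * 10, with "price" standing in for a hypothetical column.
predicate = (">", "price", ("*", ("+", 1, 2), 10))
print(fold_constants(predicate))
```

After folding, the predicate compares the column against a single literal, so the arithmetic runs once during planning instead of once per row.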

Technical Overview: 🔗Introduction to Apache Doris

🎆 Why choose Apache Doris?

🙌 Contributors

Apache Doris graduated from the Apache Incubator and became a Top-Level Project in June 2022.

Currently, the Apache Doris community has gathered more than 400 contributors from nearly 200 companies in different industries, and the number of active contributors is close to 100 per month.

Monthly Active Contributors

Contributor over time

We deeply appreciate 🔗community contributors for their contributions to Apache Doris.

👨‍👩‍👧‍👦 Users

Apache Doris has a wide user base in China and around the world, and is used in production environments at thousands of companies. More than 80% of the top 50 Internet companies in China by market capitalization or valuation are long-term users of Apache Doris, including Baidu, Meituan, Xiaomi, Jingdong, Bytedance, Tencent, NetEase, Kwai, Sina, 360, Mihoyo, and Ke Holdings. It is also widely used in traditional industries such as finance, energy, manufacturing, and telecommunications.

The users of Apache Doris: 🔗Users

Add your company logo at Apache Doris Website: 🔗Add Your Company

👣 Get Started

📚 Docs

All Documentation 🔗Docs

⬇️ Download

All releases and binary versions 🔗Download

🗄️ Compile

See how to compile 🔗Compilation

📮 Install

See how to install and deploy 🔗Installation and deployment

🧩 Components

📝 Doris Connector

Through its connectors, Doris lets Spark and Flink read data stored in Doris and write data to Doris.

🔗apache/doris-flink-connector

🔗apache/doris-spark-connector

🌈 Community and Support

📤 Subscribe Mailing Lists

Mailing lists are the most recognized form of communication in the Apache community. See how to 🔗Subscribe Mailing Lists

🙋 Report Issues or Submit Pull Request

If you have any questions, feel free to file a 🔗GitHub Issue or post them in 🔗GitHub Discussion, and fix problems by submitting a 🔗Pull Request.

🍻 How to Contribute

We welcome your suggestions, comments (including criticisms), and contributions. See 🔗How to Contribute and 🔗Code Submission Guide.

⌨️ Doris Improvement Proposals (DSIP)

🔗Doris Improvement Proposal (DSIP) can be thought of as a collection of design documents for all major feature updates or improvements.

🔑 Backend C++ Coding Specification

The 🔗Backend C++ Coding Specification should be strictly followed, which will help us achieve better code quality.

💬 Contact Us

Contact us through the following mailing list.

| Name | Scope | | | |
|---|---|---|---|---|
| dev@doris.apache.org | Development-related discussions | Subscribe | Unsubscribe | Archives |

🧰 Links

📜 License

Apache License, Version 2.0

Note: Some third-party dependencies have licenses that are not compatible with the Apache 2.0 License, so you need to disable some Doris features to comply with the Apache 2.0 License. For details, refer to thirdparty/LICENSE.txt.