All Articles

Storage and Retrieval (2) - B Tree

Preface

This is the second article of a series to summarize the key concepts of Chapter 3. Storage and Retrieval in the Designing Data Intensive Application book. The series consists of 3 articles, including log-structure storage engine (SSTables and LSM Tree), page-oriented storage engine (B Tree) and column based storage engine.

Landscape of database storages

Landscape of database storages

In short, database storages are divided into two categories, row-based and column based. We will talk more about the difference of them in next article. In this article, we will talk more about the page-oriented storage under row-based storage.

Page-oriented storage engine

Almost log-structured storage engine have pro-found influence on database evolution, the most widely adopted engine is actually page-oriented storage engine.

Different from log-structure storage engine, page oriented storage engine update the value in place (in its original page), instead of appending data to latest segment file.

B Tree, which were adopted by MySQL, PostgreSQL and MongoDB, are the most adopted data structure.

B Tree

Landscape of database storages

How it works

  • For read request:

    • Binary search for root page
    • Go to corresponding child page if necessary or get the value of corresponding key directly
  • For write request:

    • Find corresponding node first
    • If found, update the value in that page directly
    • If not found, insert a node in the page.
    • If the page is full, split page into 2 pages, and equally distribute the nodes
  • Branching factor: the number of nodes in one page, usually it’s several hundreds.

    • Since B Tree is a balanced tree, the depth of tree is O(log n), where n is the number of key-value pairs.
    • A four-level tree of 4KB pages with a branching factor of 500 can store up to 256 TB.

Pros & Cons

  • Better read performance: because we are searching a N-ary search tree, the time complexity is O(log n). Where N is branching factor. On the other hands, the worst case of log-structured storage engine is nearly O(n).
  • Worse write performance: write request in log-structured storage engine is to intuitively append data, but in B Tree, we need to search key first, open the target page, update value, and then save the page back.
  • More complex concurrency control & crash recovery.

Comparison matrix of Log-structure & Page-oriented Storage engine

Category Pros & Cons Well know data structure Real world DB
Log-structure Better write performance SSTables & LSM Tree Cassandra LevelDB HBase Lucence
Page-oriented Better read performance more complex concurrency control & crash recovery B Tree MySQL PostgreSQL MongoDB

About write performance

  • Write Amplification: one write query can produce how many disk writes ?

    • For write-intensive application, disk I/O is usually the bottleneck. Usually log-structured storage engine will have less write amplification, so its write performance is usually better.
  • Log structure engines have higher write latency at high percentiles:

    • although log structure storage engines usually have better write performance, however, they are sometimes largely influenced by background compaction process disk I/O. So log structure storage engine usually have higher write latency at higher percentile.
    • On the other hands, the write latency of page oriented storage engine is more stable.

Summary

In this article, we walk through the design of page-oriented storage engine, mainly B Tree. We can know that data was updated directly in page file, so in general we can have worse write performance. But Since it’s a balanced tree structure, it can have much better read performance.

So in short, if the scenario requires much more read than write, we can consider use page-oriented storage engine. On the other hands, if we to handle a write intensive application, we can consider using log-structured storage engine.

In next article, we will talk more about column-based storage engine, which has many different design philosophy comparing to row-based storage engine. Let’s go.