This project is an in-memory implementation of the Bε-tree (B-epsilon tree) data structure, a write-optimized variant of B-trees. The Bε-tree improves insert and update performance by buffering updates in internal nodes and flushing them in batches down the tree, while maintaining efficient query performance.
The implementation follows the design described by Bender et al. in their seminal paper "An Introduction to Bε-trees and Write-Optimization", and supports the core operations:
- Insert
- Update (only if key exists)
- Delete
- Point Query
It is designed to be modular, configurable, and extensible, with clear separation of tree and node logic.
- Write-optimized: Buffers updates in internal nodes to batch writes and reduce overhead.
- Balanced tree: Splits leaves and internal nodes to maintain logarithmic height.
- Upsert support: Handles insert, update, and delete messages as buffered operations.
- Configurable parameters: Node size, buffer size, and epsilon parameter can be tuned.
- Query correctness: Queries apply buffered updates along the root-to-leaf path.
- Real-time interaction: Includes an interactive CLI mode for live operations.
- C++ compiler with C++11 support or higher (tested with
g++
) - Standard C++ libraries
- Clone or download the project source code.
- Navigate to the project directory.
- Compile all source files together:
g++ *.cpp -o betree
Run the compiled executable:
./betree
You will enter an interactive prompt supporting the following commands:
insert
— Insert a key-value pair.update
— Update the value of an existing key.delete
— Delete a key.query
— Query the value of a key.exit
— Exit the program.
Example:
> insert
key: 10
value: 100
Inserted (10, 100)
> query
key: 10
Key 10 has value 100
> update
key: 10
value: 200
Updated (10, 200)
> query
key: 10
Key 10 has value 200
> delete
key: 10
Deleted key 10
> query
key: 10
Key 10 not found
> exit
You can tune the tree’s behavior by modifying constants in be_tree_config.hpp
:
NUM_DATA_PAIRS
— Maximum key-value pairs in a leaf node.NUM_PIVOTS
— Number of pivots in internal nodes (controls fanout).NUM_UPSERTS
— Size of the upsert buffer in internal nodes.FLUSH_THRESHOLD
andLEAF_FLUSH_THRESHOLD
— Flush batch sizes.
These parameters correspond to the node size (B) and epsilon (ε) in Bε-tree theory:
NUM_PIVOTS
≈ $ B^\epsilon $NUM_UPSERTS
≈ $ B - B^\epsilon $
Adjust these to balance query and insert performance.
be_tree_config.hpp
— Configuration constants and types.be_tree_node.hpp/cpp
— Node class and internal node/leaf logic.be_tree.hpp/cpp
— Tree class and high-level operations.be_tree_insert.cpp
— Insert operation.be_tree_update.cpp
— Update operation with presence check.be_tree_delete.cpp
— Delete operation.be_tree_query.cpp
— Query operation.main.cpp
— Interactive CLI program.
- Michael A. Bender et al., An Introduction to Bε-trees and Write-Optimization, USENIX Login, Oct 2015. https://www.usenix.org/publications/login/oct15/bender
- Bε-tree theory balances write-optimization and query efficiency by tuning ε between 0 and 1.
- Compared to B-trees and LSM-trees, Bε-trees offer superior insert throughput and comparable query performance.