Add composite keys for secondary indexes (refs #180) by shrayarora8 · Pull Request #254 · apache/incubator-resilientdb

shrayarora8 · 2026-05-12T01:55:36Z

What this PR does

This is the first part of work on the composite keys feature from issue #180. It adds the storage layer primitives that make secondary attribute lookups O(log N + M) instead of O(N).

I split the work into 3 phases so each PR stays small and reviewable:

Phase 1 (this PR): codec for encoding/decoding composite keys, four new storage methods, a filter for user-facing scans, tests, and a benchmark
Phase 2 (future PR): proto + KVExecutor wiring
Phase 3 (future PR): client SDK + end-to-end benchmark

Encoding

A composite key ("ck") is a single string with this layout:

"ck"  \0  index_name  \0  attribute_1  \0  [attribute_2  \0  ...]  primary_key

The "ck" namespace plus the null delimiter (\0), 3 bytes in total, keeps these entries from colliding with regular user keys. Even a user key like "ck_balance" is safe: it starts with "ck" but not with "ck\0".
\0 is used as the field separator because it is not allowed in user-supplied fields: the encoder rejects any input that contains it, so it can never appear inside a field.
The value stored alongside the key is empty (""). The key itself encodes everything we need.
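
A minimal sketch of the codec described above, not the code in this PR. The function names and kCompositeKeyDelim come from the description; the exact signatures (in particular the out-parameters of DecodeCompositeKey) and kCompositeKeyNamespace are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

const char kCompositeKeyDelim = '\0';
const std::string kCompositeKeyNamespace = "ck";  // assumed constant name

// Returns "" when any field contains the delimiter, as the description states.
std::string EncodeCompositeKey(const std::string& index_name,
                               const std::vector<std::string>& attributes,
                               const std::string& primary_key) {
  auto has_delim = [](const std::string& s) {
    return s.find(kCompositeKeyDelim) != std::string::npos;
  };
  if (has_delim(index_name) || has_delim(primary_key)) return "";
  for (const auto& attr : attributes) {
    if (has_delim(attr)) return "";
  }

  std::string key = kCompositeKeyNamespace;
  key.push_back(kCompositeKeyDelim);
  key += index_name;
  key.push_back(kCompositeKeyDelim);
  for (const auto& attr : attributes) {
    key += attr;
    key.push_back(kCompositeKeyDelim);
  }
  key += primary_key;
  return key;
}

// Splits on the delimiter. Returns false for malformed input
// (wrong namespace, or fewer than namespace + index name + primary key).
bool DecodeCompositeKey(const std::string& key, std::string* index_name,
                        std::vector<std::string>* attributes,
                        std::string* primary_key) {
  std::vector<std::string> fields;
  std::string::size_type start = 0;
  while (true) {
    const std::string::size_type pos = key.find(kCompositeKeyDelim, start);
    if (pos == std::string::npos) {
      fields.push_back(key.substr(start));
      break;
    }
    fields.push_back(key.substr(start, pos - start));
    start = pos + 1;
  }
  if (fields.size() < 3 || fields[0] != kCompositeKeyNamespace) return false;
  *index_name = fields[1];
  attributes->assign(fields.begin() + 2, fields.end() - 1);
  *primary_key = fields.back();
  return true;
}
```

Encoding ("byOwner", {"alice", "active"}, "user:1") with this sketch yields a 30-byte key, and decoding it returns the same three components in the same order.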

Why is the value empty?

The composite key is an index entry, not user data. The user's actual data is stored separately under their primary key (e.g. via SetValueWithVersion("user:1", <data>, 0)). The composite key's job is just to record the existence of an (attribute → primary_key) association so that later, when someone asks "give me everyone whose city is Davis", we can find all the matching primary keys without scanning everything. The onus is on the client to create the composite key for each secondary attribute it wants indexed.
The workflow looks like this:

App stores user data: SetValueWithVersion("user:1", <data>, 0)
App creates an index marker: CreateCompositeKey(EncodeCompositeKey("byCity", {"Davis"}, "user:1"))
Later, app queries by city: GetByCompositeKeyPrefix(EncodeCompositeKeyPrefix("byCity", {"Davis"}))
- that returns all composite keys with prefix "ck\0byCity\0Davis\0"
- app decodes each one to pull out the primary key
- app calls GetValue(primary_key) for the data

The composite key string already encodes everything we need (index name, attribute values, primary key), so storing anything in the value would just be duplication. LevelDB sorts by key, not by value, so a value could not help the lookup anyway. Keeping it empty also keeps the index cheap on disk.

For a prefix scan, the codec builds the same string up to (and including) the trailing \0 after the last attribute. That makes it a strict byte prefix of every key we want to match.
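
As an illustration of that property, here is a sketch of a prefix encoder next to a full-key encoder. Input validation is left out, the signatures are assumed, and IsStrictBytePrefix is a hypothetical helper that exists only for this example.

```cpp
#include <string>
#include <vector>

// Builds "ck\0index_name\0attr_1\0...attr_n\0" (every field ends with \0).
std::string EncodeCompositeKeyPrefix(const std::string& index_name,
                                     const std::vector<std::string>& attributes) {
  std::string prefix = "ck";
  prefix.push_back('\0');
  prefix += index_name;
  prefix.push_back('\0');
  for (const auto& attr : attributes) {
    prefix += attr;
    prefix.push_back('\0');
  }
  return prefix;
}

// Full key = prefix + primary key (delimiter validation omitted in this sketch).
std::string EncodeCompositeKey(const std::string& index_name,
                               const std::vector<std::string>& attributes,
                               const std::string& primary_key) {
  return EncodeCompositeKeyPrefix(index_name, attributes) + primary_key;
}

// Hypothetical helper: true when `prefix` is a strict byte prefix of `key`.
bool IsStrictBytePrefix(const std::string& prefix, const std::string& key) {
  return prefix.size() < key.size() &&
         key.compare(0, prefix.size(), prefix) == 0;
}
```

Because the prefix ends with \0, a scan for "Davis" does not match an entry whose attribute is "Davison": the byte after "Davis" is \0 in the prefix but 'o' in that key.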

Architecture / flow

Composite keys live in the storage layer only for this PR.

Write path:

application
   ↓ CreateCompositeKey(encoded_string)
Storage interface
   ↓
ResLevelDB::CreateCompositeKey  →  leveldb::DB::Put(key, "")
   or
MemoryDB::CreateCompositeKey    →  ck_map_[key] = ""

Read path:

application
   ↓ GetByCompositeKeyPrefix(prefix)
Storage interface
   ↓
ResLevelDB  →  iter.Seek(prefix), walk while memcmp matches  →  O(log N + M)
MemoryDB    →  ck_map_.lower_bound(prefix), walk while matches  →  O(log N + M)

Both backends use sorted-order data structures, so Seek / lower_bound jumps to the first candidate key, and the iteration stops at the first non-match.
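
The MemoryDB-style read path can be sketched with a plain std::map. This is an illustration under assumptions, not the PR's implementation: the free-function signature and the MakePrefix / MakeKey helpers (single-attribute keys only) are made up for the example.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: "ck\0index\0attr\0".
std::string MakePrefix(const std::string& index, const std::string& attr) {
  std::string prefix = "ck";
  prefix.push_back('\0');
  prefix += index;
  prefix.push_back('\0');
  prefix += attr;
  prefix.push_back('\0');
  return prefix;
}

// Hypothetical helper: prefix followed by the primary key.
std::string MakeKey(const std::string& index, const std::string& attr,
                    const std::string& primary_key) {
  return MakePrefix(index, attr) + primary_key;
}

// lower_bound jumps to the first key >= prefix in O(log N); the loop then
// walks forward and stops at the first key that no longer starts with it.
std::vector<std::string> GetByCompositeKeyPrefix(
    const std::map<std::string, std::string>& ck_map,
    const std::string& prefix) {
  std::vector<std::string> matches;
  for (auto it = ck_map.lower_bound(prefix);
       it != ck_map.end() &&
       it->first.compare(0, prefix.size(), prefix) == 0;
       ++it) {
    matches.push_back(it->first);
  }
  return matches;
}
```

Since std::map iterates in key order, the matches come back sorted regardless of insertion order, which is the property the PrefixScanOrdering test relies on.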

UpdateCompositeKey is implemented as an atomic delete + insert. On LevelDB this uses WriteBatch so we never end up in a state where the old key is gone but the new one didn't make it in.
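
For the in-memory side, the same "no half-applied update" idea could look like the sketch below. The class name, the mutex, and the choice to return false when the old key is missing are all assumptions for illustration; the description does not say how MemoryDB handles these details.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for the composite-key part of an in-memory backend.
class CompositeKeyIndex {
 public:
  void Create(const std::string& key) {
    std::lock_guard<std::mutex> lock(mu_);
    ck_map_[key] = "";
  }

  // Delete + insert under a single lock, so no reader can observe the state
  // where the old key is gone but the new key has not been written yet.
  bool Update(const std::string& old_key, const std::string& new_key) {
    std::lock_guard<std::mutex> lock(mu_);
    if (ck_map_.erase(old_key) == 0) return false;  // assumed behavior
    ck_map_[new_key] = "";
    return true;
  }

  bool Contains(const std::string& key) const {
    std::lock_guard<std::mutex> lock(mu_);
    return ck_map_.count(key) > 0;
  }

 private:
  mutable std::mutex mu_;
  std::map<std::string, std::string> ck_map_;
};
```

On LevelDB the equivalent guarantee comes from putting the Delete and the Put into one WriteBatch, as described above.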

Filter for user-facing scans

GetAllItems() and GetKeyRange() had to be tweaked so they don't return composite-key markers to applications that just want their own data. The filter is:

if (key starts with "ck\0") continue;
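
Spelled out in C++, that check could look like the sketch below. IsCompositeKey, UserVisibleKeys, and kCompositeKeyMarker are hypothetical names, and a std::map stands in for the LevelDB iterator. One detail worth noting: the marker has to be built with an explicit length, because std::string("ck\0") stops at the null byte and would produce just "ck".

```cpp
#include <map>
#include <string>
#include <vector>

// 3-byte marker 'c', 'k', '\0'. The explicit length keeps the null byte.
const std::string kCompositeKeyMarker("ck\0", 3);

bool IsCompositeKey(const std::string& key) {
  return key.compare(0, kCompositeKeyMarker.size(), kCompositeKeyMarker) == 0;
}

// Returns only the keys an application should see from a full scan.
std::vector<std::string> UserVisibleKeys(
    const std::map<std::string, std::string>& all_entries) {
  std::vector<std::string> keys;
  for (const auto& entry : all_entries) {
    if (IsCompositeKey(entry.first)) continue;  // skip index markers
    keys.push_back(entry.first);
  }
  return keys;
}
```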

I tested explicitly that:

composite keys are hidden from GetAllItems
a user key called "ck_balance" is NOT hidden (the filter checks "ck\0", not "ck")
composite keys are still reachable via GetByCompositeKeyPrefix — they're hidden, not deleted

For MemoryDB, no filter is needed: composite keys live in a separate std::map<std::string, std::string> ck_map_, so they can't leak into GetAllItems by construction.

Tests

`composite_key_codec_test.cpp` (5 tests)

RoundTrip — Encode ("byOwner", ["alice", "active"], "user:1"), then decode the result and check I get back the same index name, the attributes in the same order, and the same primary key. Basic correctness check.
RejectDelimInInput — If any input field contains \0, encoding must refuse and return "". Otherwise you'd produce a key that can't be decoded. Side note: C-string literals like "in\0dex" truncate at the null byte and never actually trigger this path, so the test builds the strings explicitly with std::string("in") + kCompositeKeyDelim + "dex".
PrefixIsStrictBytePrefix — For any (idx, attrs), EncodeCompositeKeyPrefix(idx, attrs) must be a strict byte prefix of EncodeCompositeKey(idx, attrs, any_pk). This is the property that makes the LevelDB Seek + walk pattern correct — if the prefix wasn't an exact byte prefix, Seek could land in the wrong spot.
DecodeMalformed — Garbage inputs (missing namespace, only one field after the namespace) return false instead of crashing or returning wrong data.
EmptyAttributes — Encoding/decoding still works with zero attributes (just index_name + primary_key). Unusual case but allowed by the format, so it has to work.
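
The side note under RejectDelimInInput can be seen in a few lines. The function names below are hypothetical and only exist to contrast the two ways of building the input:

```cpp
#include <string>

const char kCompositeKeyDelim = '\0';

// The const char* constructor stops at the first null byte,
// so this string is just "in" and never contains the delimiter.
std::string BuiltFromLiteral() { return std::string("in\0dex"); }

// Appending the delimiter as a char keeps all six bytes: i n \0 d e x.
std::string BuiltExplicitly() {
  return std::string("in") + kCompositeKeyDelim + "dex";
}

bool ContainsDelim(const std::string& s) {
  return s.find(kCompositeKeyDelim) != std::string::npos;
}
```

A test that passed BuiltFromLiteral() to the encoder would exercise the happy path by accident, which is why the test constructs its inputs the second way.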

`kv_storage_test.cpp` (6 new parametrized tests)

Each runs against all three backends — MemoryDB, ResLevelDB, ResLevelDB-with-block-cache — so 18 cases total.

CreateAndRetrieveCompositeKey — Insert three Davis users, prefix-scan, verify all three come back.
PrefixScanOrdering — Insert keys out of order (SF, Davis-1, Davis-2, NYC), then do a Davis prefix scan. The result must be [Davis-1, Davis-2] in that exact order. This matters for BFT determinism as every honest replica running the same scan against the same state has to produce the same byte-for-byte output, otherwise replicas would diverge during consensus.
DeleteRemovesEntry — Create, delete, prefix-scan returns 0 results. Confirms DeleteCompositeKey actually removes the marker.
UpdateIsAtomic — Move user:1 from Davis to SF via UpdateCompositeKey. After the call, Davis prefix has 0 entries and SF prefix has 1 entry. On LevelDB this is enforced with WriteBatch.
EmptyPrefixScanReturnsNothing — Insert Davis keys, scan for NYC prefix, get 0 results. Confirms no false positives — the Seek + walk pattern doesn't accidentally pick up keys from other prefixes.
GetAllItemsExcludesCompositeKeys — The filter test. Setup: insert two regular user keys (user:1, user:2), one user key that starts with "ck" but is real data (ck_balance, the important edge case), and two composite key markers. Then GetAllItems() must return exactly the 3 real keys (including ck_balance) and exclude the markers. As a sanity check, I also verify the markers are still reachable via GetByCompositeKeyPrefix: they're hidden from the user-facing API, not deleted.
All 48 tests in kv_storage_test pass (30 + 18 cases).

Benchmark

benchmark/storage/composite_key_benchmark.cpp measures the two paths head-to-head. Setup for each (N, selectivity) cell:

Spin up a fresh LevelDB instance in /tmp
Insert N user records with primary keys user:0, user:1, ..., user:N-1. The first selectivity * N records get value "Davis"; the rest get "Other".
For every record, also create a composite-key index entry under byCity. So LevelDB ends up with roughly 2N keys.

The two paths being measured:

OLD: GetAllItems() returns every user record, and the caller filters for value == "Davis" in C++ code. O(N), because it has to touch every record.
NEW: GetByCompositeKeyPrefix("ck\0byCity\0Davis\0") returns just the matching index entries. O(log N + M): Seek lands at the prefix in O(log N), then walks the M matches.

Sample run on Apple M2:

| Records | Selectivity | OLD (ms) | NEW (ms) | Speedup |
|---|---|---|---|---|
| 1,000 | 1% | 0.39 | 0.00 | 151.9× |
| 1,000 | 10% | 0.37 | 0.01 | 32.0× |
| 10,000 | 1% | 4.44 | 0.02 | 225.3× |
| 10,000 | 10% | 4.43 | 0.13 | 34.9× |
| 100,000 | 1% | 52.60 | 0.15 | 340.8× |
| 100,000 | 10% | 52.20 | 1.37 | 38.2× |

Run it with:

bazel run //benchmark/storage:composite_key_benchmark --copt=-Wno-implicit-function-declaration
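
For readers without the repository at hand, the shape of the two measured paths can be sketched in memory. This is not the benchmark source: two std::map instances stand in for the single LevelDB instance, and Dataset, CityPrefix, BuildDataset, ScanAndFilter, PrefixLookup, and TimeMs are hypothetical names.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <vector>

struct Dataset {
  std::map<std::string, std::string> records;  // primary key -> city
  std::map<std::string, std::string> index;    // composite key -> ""
};

// Builds "ck\0byCity\0<city>\0".
std::string CityPrefix(const std::string& city) {
  std::string prefix = "ck";
  prefix.push_back('\0');
  prefix += "byCity";
  prefix.push_back('\0');
  prefix += city;
  prefix.push_back('\0');
  return prefix;
}

// The first selectivity * n records get "Davis", the rest get "Other".
Dataset BuildDataset(int n, double selectivity) {
  Dataset d;
  const int davis_count = static_cast<int>(selectivity * n);
  for (int i = 0; i < n; ++i) {
    const std::string pk = "user:" + std::to_string(i);
    const std::string city = i < davis_count ? "Davis" : "Other";
    d.records[pk] = city;
    d.index[CityPrefix(city) + pk] = "";
  }
  return d;
}

// OLD path: touch every record and filter on the value.
std::vector<std::string> ScanAndFilter(const Dataset& d, const std::string& city) {
  std::vector<std::string> hits;
  for (const auto& entry : d.records) {
    if (entry.second == city) hits.push_back(entry.first);
  }
  return hits;
}

// NEW path: jump to the prefix, walk the matches, strip the prefix.
std::vector<std::string> PrefixLookup(const Dataset& d, const std::string& city) {
  const std::string prefix = CityPrefix(city);
  std::vector<std::string> hits;
  for (auto it = d.index.lower_bound(prefix);
       it != d.index.end() &&
       it->first.compare(0, prefix.size(), prefix) == 0;
       ++it) {
    hits.push_back(it->first.substr(prefix.size()));
  }
  return hits;
}

// Wall-clock time of one call, in milliseconds.
template <typename Fn>
double TimeMs(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  fn();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Both paths return the same primary keys; wrapping each call in TimeMs gives a rough comparison, though in-memory numbers will not match the LevelDB figures in the table.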

Why benchmark at the storage layer

The composite-keys feature is a storage-layer optimization. Consensus, networking, and the executor are unchanged. Consensus adds a constant per-request overhead, so an end-to-end benchmark would be measuring consensus variance more than the storage gains.

End-to-end measurement makes more sense once Phase 2 (executor) and Phase 3 (client) are in place. That benchmark will live in the Phase 3 PR.

Files touched

New files:

chain/storage/composite_key_codec.h
chain/storage/composite_key_codec.cpp
chain/storage/composite_key_codec_test.cpp
benchmark/storage/BUILD
benchmark/storage/composite_key_benchmark.cpp

Modified files (all in chain/storage/):

storage.h — 4 pure virtual methods added to the interface
leveldb.h — 4 method declarations
leveldb.cpp — 4 method implementations + filter in GetAllItems/GetKeyRange
memory_db.h — 4 method declarations + ck_map_ member
memory_db.cpp — 4 method implementations
kv_storage_test.cpp — 6 new parametrized tests
BUILD — wire up the codec library + add deps to existing rules

Every change is contained within chain/storage/ and the new benchmark/storage/ directory. Consensus, networking, the executor, the client SDK, and protobuf definitions are all untouched.

Future work

Phase 2: add a KVRequest op for composite keys and route it through KVExecutor
Phase 3: client SDK + end-to-end benchmark. Once the executor knows about composite keys, the next step is letting actual applications use them. The client SDK would expose something like client.CreateCompositeKey(index_name, attributes, primary_key) and client.LookupByPrefix(index_name, attribute_prefix). After that's done, we can run a real benchmark over a running ResilientDB cluster and measure end-to-end speedup with real-world overhead baked in.

Refs #180

cjcchen · 2026-05-25T02:55:12Z

Please address the action issues above: build failed/ UT failed.

Bismanpal-Singh · 2026-05-29T23:45:47Z

@shrayarora8 Please address the build issues before we merge in.

chain/storage: add composite keys for secondary indexes (refs apache#180)

c8a8315

Implements Phase 1 + 2 of apache#180: codec, four storage methods on both backends, scan filter, unit tests, and a microbenchmark. Executor and client wiring follow in separate PRs.

shrayarora8 force-pushed the feature/composite-keys-storage branch from b1cfa50 to c8a8315 (May 12, 2026 02:10)

Merge branch 'master' into feature/composite-keys-storage

328fb2f

shrayarora8 wants to merge 2 commits into apache:master from shrayarora8:feature/composite-keys-storage
