
Proposal to Add Interface to Create and Retrieve CompositeKeys #180

@harish876

Description

Background

Currently, secondary indexing is offloaded to an external datastore by exporting all chain data to an external source. This introduces several moving parts, including a utility like Python Cache.

Although the chain is in-memory, every client issues a GetAllBlocks request, which retrieves the entire blockchain state. While modifying this structure could improve overall state retrieval, that is not our current focus.

Goal

Instead, our goal is to improve read latencies for queries based on secondary attribute lookups at the storage layer, which is the most common use case for applications. This applies to applications that use the storage engine as a document store, i.e., storing values as JSON objects, or as a simple key-value store. A few applications in this release that do so are:

  1. ResLens - Uses ResDB as a simple KV store
  2. ResCanvas - Uses ResDB as a document store
  3. Consensus - Uses ResDB as a document store

Problem Statement

Currently, if an application needs to perform a lookup based on a secondary attribute, the following steps must be taken.

  • Call the GetAllValues endpoint to fetch every record.
  • Manually apply filtering logic in memory at the application layer.
  • Alternatively, export all the data to an external datastore, keep it constantly in sync, and apply the filtering logic there.
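The O(n) client-side flow described above can be sketched as follows; the `Record` shape and the `FilterByOwner` helper are illustrative stand-ins, not the actual ResilientDB client API.

```cpp
#include <string>
#include <vector>

// Hypothetical record shape; field names are illustrative, not the real
// ResilientDB wire format.
struct Record {
  std::string key;
  std::string owner;  // a secondary attribute we want to query by
};

// Sketch of today's client-side flow: GetAllValues() pulls every record,
// then the application filters in memory -- an O(n) scan per query.
std::vector<Record> FilterByOwner(const std::vector<Record>& all_values,
                                  const std::string& owner) {
  std::vector<Record> matches;
  for (const auto& r : all_values) {
    if (r.owner == owner) matches.push_back(r);
  }
  return matches;
}
```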

Proposed Solution

Composite keys are a well-established way to add indexing support on non-primary attributes. They allow indexing by a single field or by multiple fields, and let indexes be tailored to different workloads. They are widely used for this purpose, for example in MySQL's MyRocks storage engine.

They are also used by the Hyperledger Fabric blockchain as a lightweight indexing mechanism atop a key-value store (see "Composite keys in Hyperledger").

We can leverage LevelDB or RocksDB's BlobDB; the latter reduces write amplification for large values, a common issue for applications that use ResilientDB as a document store.
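As a rough illustration of the idea (the actual key format is a design detail of this proposal), a composite key can be built by joining an index name and attribute values with a separator, so that all entries of one index are contiguous in the sorted keyspace of LevelDB/RocksDB. The separator below is an assumption, loosely following Hyperledger Fabric's convention.

```cpp
#include <string>
#include <vector>

// Assumed separator that sorts below printable characters, so keys sharing
// the same index name and attribute prefix stay contiguous. Not the real
// ResilientDB encoding.
constexpr char kSep = '\x01';

// Builds an index entry key such as index~attr1~attr2 (with kSep as "~").
std::string MakeCompositeKey(const std::string& index_name,
                             const std::vector<std::string>& attributes) {
  std::string key = index_name;
  for (const auto& attr : attributes) {
    key += kSep;
    key += attr;
  }
  return key;
}
```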

Technical Details

Interface Details

Design Details

Scope and Assumptions

  1. Transaction Management: Atomic APIs to create and update composite keys will be provided to the end user. The onus is on the user to call these APIs from the application layer or an SDK, which wraps the two operations in a transaction. Transaction management is therefore an application-level construct.

The storage layer does not guarantee index consistency if composite key creation is not wrapped in a transaction. Applications are expected to use SDK-level transaction helpers to ensure atomicity.

  2. Usage: Instead of providing an enterprise-style feature that creates an index over a whole dataset, users create on-demand indexes; how and when to create composite keys is entirely up to the user. These low-level constructs enable:
  • Gas fee application at the transaction level.
  • Index / composite key creation as a registered operation on the ledger.
  3. Update Queries: Currently, a SET query is a PUT, where a SET on an existing key updates the key-value pair with the new value. We provide an UpdateCompositeKey API, which replaces the old composite key with one built from the new value. This operation involves the following steps:
    a. Check for the existing composite key. If the key is absent, then return.
    b. If the old composite key exists, then delete the old key.
    c. Construct the composite key with the new value and insert it.
    d. Wrap steps b and c in a transaction, so that both operations constitute a single atomic operation.

Alternatives Considered

  1. We implemented a multi-tenant solution with custom modifications to the LSM engine to attach one global secondary index. Details can be found below:
  • Implementation
  • Presentation
  • This implementation was rejected because multi-tenancy is incompatible with the current blockchain design.
  2. Our current solution syncs the blockchain state to an external data source such as MongoDB and applies filtering queries there. While this design works, we observed significant maintenance overhead, and new application developers need help setting up two different databases and maintaining two different ways of querying a single state of data.

Failure Modes & Consistency

  1. What happens if a Put succeeds but CreateCompositeKey fails?
    Transaction management for all operation-specific APIs is pushed to the application, so each operation carries the overhead of one network call. An application-level construct such as an SDK is therefore responsible for ensuring that if the PUT fails, composite key creation is not performed, and that a failed CreateCompositeKey rolls the PUT back.
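The SDK-side responsibility described above can be sketched as a compensating-write helper; all names here are illustrative, and the real SDK would issue network calls rather than touch a local map.

```cpp
#include <functional>
#include <map>
#include <string>

// In-memory stand-in for the storage layer.
using KvStore = std::map<std::string, std::string>;

// If CreateCompositeKey fails (simulated by the callback returning false),
// the Put is compensated with a delete so the pair fails together.
bool PutWithIndex(KvStore& store, const std::string& key,
                  const std::string& value,
                  const std::string& composite_key,
                  const std::function<bool()>& create_composite_key_ok) {
  store[key] = value;                 // Put
  if (!create_composite_key_ok()) {   // CreateCompositeKey failed
    store.erase(key);                 // compensating delete
    return false;
  }
  store[composite_key] = key;         // index entry -> primary key
  return true;
}
```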

Performance Considerations

  1. Current Approach

    • Secondary lookups require a full O(n) scan across all records.
    • In practice, this also includes the cost of syncing data to MongoDB and issuing queries there, further increasing latency.
  2. Composite Key Approach

    • Lookup cost is O(log n) to position the iterator via prefix search.
    • Retrieval cost is O(M), where M is the number of matching results.
    • Overall complexity: O(log n + M), a significant improvement over O(n) for large datasets.
  3. Why does Prefix Search work in KV stores? LSM-based stores keep keys in sorted order, so all keys sharing a prefix are contiguous; an iterator can seek to the prefix once and then scan forward.
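A minimal sketch of the seek-then-scan pattern, using `std::map` as a stand-in for the LSM tree's sorted key order (all names are illustrative): `lower_bound` performs the O(log n) seek, and the loop collects the M contiguous matches.

```cpp
#include <map>
#include <string>
#include <vector>

// Returns the primary keys of all index entries whose key starts with
// `prefix`. std::map keeps keys sorted, mimicking the LSM keyspace.
std::vector<std::string> PrefixScan(
    const std::map<std::string, std::string>& store,
    const std::string& prefix) {
  std::vector<std::string> primary_keys;
  for (auto it = store.lower_bound(prefix);   // O(log n) seek to the prefix
       it != store.end() &&
       it->first.compare(0, prefix.size(), prefix) == 0;  // still matching
       ++it) {
    primary_keys.push_back(it->second);  // O(M) sequential retrieval
  }
  return primary_keys;
}
```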

Role of Bloom Filters in Prefix Searches

Bloom filters complement composite key lookups by reducing unnecessary I/O during prefix queries:

  1. File Pruning for Seeks

    • For a prefix search, the initial Seek(prefix) must locate the first matching key across multiple SST files.
    • Bloom filters allow LevelDB/RocksDB to skip SST files that cannot possibly contain keys with the given prefix, avoiding wasted binary searches.
  2. Faster Lookup Startup

    • The O(log n) seek is accelerated since only candidate files are probed.
    • This is especially valuable in large datasets with many SST files spread across levels.
  3. Reduced I/O Amplification

    • Queries with narrow prefixes benefit the most: Bloom filters prevent LevelDB / RocksDB from scanning irrelevant files.
    • Once the iterator is positioned, sequential retrieval of matches (O(M)) proceeds as normal and does not use Bloom filters.

Summary:
Bloom filters do not optimize the sequential scan phase of a prefix search, but they significantly speed up the initial seek by pruning SST files. This ensures that prefix lookups retain O(log n + M) complexity rather than approaching O(n) in practice.
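To make the pruning concrete, here is a toy Bloom filter over key prefixes. The size, hash scheme, and the idea of one filter per simulated SST file are illustrative only and do not reflect RocksDB's actual filter implementation; the key property shown is that a negative answer is definitive, so that file can be skipped during the seek.

```cpp
#include <bitset>
#include <functional>
#include <string>

// Toy per-file Bloom filter over key prefixes.
class PrefixBloom {
 public:
  void Add(const std::string& prefix) {
    bits_.set(Hash(prefix, 1) % kBits);
    bits_.set(Hash(prefix, 2) % kBits);
  }
  // false => the file definitely has no key with this prefix (skip it);
  // true  => "maybe", so the file must still be probed.
  bool MayContain(const std::string& prefix) const {
    return bits_.test(Hash(prefix, 1) % kBits) &&
           bits_.test(Hash(prefix, 2) % kBits);
  }

 private:
  static constexpr size_t kBits = 1024;
  // Two cheap derived hashes; quality is irrelevant for this sketch.
  static size_t Hash(const std::string& s, size_t seed) {
    return std::hash<std::string>{}(s) * (seed * 2654435761u + 1);
  }
  std::bitset<kBits> bits_;
};
```

Because Bloom filters never produce false negatives, skipping a file on a negative answer is always safe; false positives only cost an extra probe.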

Migration / Backwards Compatibility

  • Data Compatibility:
    Existing data remains untouched. Composite keys are stored as additional key-value entries, so no schema or data migration is required.

  • API Compatibility:
    Existing APIs (Put, Get, GetAllValues) continue to work as before. Composite key APIs (CreateCompositeKey, GetByCompositeKey, UpdateCompositeKey) are additive and optional.

  • Application Migration:
    Applications can adopt composite keys incrementally. They may continue to rely on client-side filtering or external databases, and gradually replace those with composite key lookups as needed.

  • Optional Adoption:
    No changes are required for current users. Only applications that explicitly invoke the new APIs will see the benefits.

Operational Concerns

  • As discussed, a transaction-level gas fee is an actively discussed feature requirement; users are currently not levied a fee on any existing or newly proposed operation. With our current design, however, gas fees can be applied later: the components of index management have been extracted into individual operations, so a fee can be charged per operation.

Preliminary Results

Environment Details

Running /home/ubuntu/.cache/bazel/_bazel_ubuntu/4daa26ab31ab249b7607bfe58eb14371/execroot/com_resdb_nexres/bazel-out/k8-fastbuild/bin/benchmark/storage/composite_key_benchmark
Run on (8 X 3600.53 MHz CPUs)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 1024 KiB (x4)
  L3 Unified 36608 KiB (x1)
Load Average: 2.07, 0.97, 0.48
***WARNING*** Library was built as DEBUG. Timings may be affected.
| Records | Filter Type | Current Solution | Composite Key Solution | Perceived Speedup |
|---------|-------------|------------------|------------------------|-------------------|
| 1,000   | String      | 43.7 ms          | 6.34 ms                | 6.9×              |
| 10,000  | String      | 472 ms           | 73.1 ms                | 6.5×              |
| 100,000 | String      | 5159 ms          | 745 ms                 | 6.9×              |
| 1,000   | Integer     | 43.3 ms          | 6.36 ms                | 6.8×              |
| 10,000  | Integer     | 471 ms           | 69.7 ms                | 6.8×              |
| 100,000 | Integer     | 5132 ms          | 732 ms                 | 7.0×              |

Advantages of our Solution

  • Improved Read Latency: Enables faster lookups by indexing secondary attributes directly in the storage layer.

  • Reduces Application Complexity: Eliminates the need for client-side filtering logic or full data scans.

  • No External Sync Required: Removes dependency on external databases and continuous export/sync pipelines.

  • Lightweight Implementation: Composite keys require minimal overhead and can be implemented without significant architectural changes.

  • Supports Richer Queries: Enables filtering and retrieval by a single field, multiple fields, or field combinations.

  • Built on Proven Techniques: Uses battle-tested patterns from systems like MyRocks (MySQL) and Hyperledger Fabric.

  • Document Store Friendly: Optimized for use cases where values are stored as JSON or large blobs, especially with RocksDB’s BlobDB.

  • Scalable Design: Can handle high write and read throughput while preserving query efficiency.

  • Expands Use Cases: Unlocks new classes of applications like dashboards, real-time analytics, and search-backed services.

  • Easy Adoption: Current applications do not need to change their code. All these features are add-ons and can be adopted without deleting, modifying, or migrating data.

Disadvantages of our Solution

  • Increased Write Amplification: Any secondary index needs extra space; this is the classic space-versus-time trade-off. We can limit the number of composite keys an application may create, and the composite keys themselves take up very little space: each entry stores only a reference to the primary key, behaving like a non-clustered index.

  • Increased App Complexity: We do not create composite keys automatically; we ask the developer to create them explicitly, which works just like inserting another key-value pair. The write thereby becomes non-atomic, but the application can handle this by wrapping the data insert and the index creation in a transaction.

Future Directions / Ideas

  • Currently, the in-memory blockchain state is maintained in a C++ unordered_map container. Using an in-memory database to manage blockchain state would allow more robust querying and maintenance of the in-memory chain state.
