
(tree): Incremental summarization of chunked forest #24949


Open · wants to merge 32 commits into base: main

Conversation

@agarwal-navin agarwal-navin (Contributor) commented Jul 1, 2025

Reviewer guidance

  • There are a couple of TODOs that will be fixed before merging the PR. I have kept them in to help with the review.
  • There is some outstanding work that I plan to do in a follow-up. It is called out in the Next steps section below.

Next steps

  • Add a schema API to allow consumers to specify which properties can be incrementally summarized. Currently, they are hardcoded, but that will be changed before making this PR ready for review.
  • Add logic to gate incremental summaries on the format version of FieldBatchCodec. That format version will also be updated as per the cross-client compat policy.

Description

AB#1263 has an overall description and link to a design document.

This PR adds incremental summarization support to the chunked forest. Currently, the chunked forest data is encoded and added to a single blob in the summary tree. This change breaks it down into multiple subtrees, each of which can be incrementally summarized, i.e., if such a subtree doesn't change between summaries, a summary handle is added to the summary instead of the entire subtree.
The schema will support marking properties as incrementally summarizable (TODO - not in this PR). Any chunks in the fields corresponding to such a property form the boundary at which incremental summarization works. Basically, if the property changes, or anything within it changes (for objects and arrays), its chunks will be re-summarized; otherwise, a summary handle will be used.
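The handle-vs-resummarize decision described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: `SummaryEntry`, `summarizeChunk`, and the `/forest/` path are all hypothetical names.

```typescript
// Hypothetical sketch: an unchanged incremental chunk contributes a summary
// handle pointing at its location in the previous summary, while a changed
// chunk is re-encoded into a full summary subtree.

type SummaryEntry =
	| { type: "tree"; content: string }
	| { type: "handle"; path: string };

function summarizeChunk(
	chunkId: number,
	changedSinceLastSummary: boolean,
	encode: (id: number) => string,
): SummaryEntry {
	return changedSinceLastSummary
		? { type: "tree", content: encode(chunkId) }
		: { type: "handle", path: `/forest/${chunkId}` };
}
```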

To achieve this, this PR makes the following changes:

  • Added a new encoded format, EncodedIncrementalChunkShape, to EncodedChunkShape for chunks whose data will be incrementally summarized.
  • Added a new shape, IncrementalChunkShape, which represents chunks whose data can be incrementally summarized, aka incremental chunks.
  • Each incremental chunk is assigned a ChunkReferenceId which uniquely identifies it under its parent field.
  • Added a new field encoder, incrementalFieldEncoder, which encodes its chunks as IncrementalChunkShape. The ChunkReferenceIds of these chunks are stored as EncodedNestedArrayShape.
  • Added a new decoder, IncrementalChunkDecoder, which can decode an incremental chunk.
  • Added IncrementalEncoder and IncrementalDecoder interfaces whose properties help with encoding and decoding of incremental chunks. These are added to the FieldBatchEncodingContext.
  • Added a ForestIncrementalSummaryBuilder which tracks chunks across summaries and generates summary trees or summary handles for them depending on whether they changed between summaries.
  • The forest summary tree consists of a blob called ForestTree at the root that contains the data for the forest. If there are any incremental fields in the forest, an array of chunk reference ids will be encoded for their chunks in the data.
    • For each incremental chunk, a separate summary tree will be added with its reference id as the key for the tree.
    • The data for the chunk will be added to a blob called contents under the tree. If the chunk has any incremental fields, an array of chunk reference ids will be encoded for their chunks in the data.
    • The same format is repeated until all incremental fields have been summarized.
    • Note that currently, an incremental field will have exactly one chunk. This may change in the future if the data is encoded into multiple chunks.
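The recursive layout described in the last bullet group can be pictured as a nested tree. The sketch below uses the blob names from the PR text ("ForestTree", "contents"); the SummaryObject type and the placeholder blob contents are simplified stand-ins, not the runtime's actual summary format.

```typescript
// Hypothetical summary layout: a forest with one incremental chunk
// (reference id 0) whose own data contains a nested incremental chunk
// (reference id 1).

type SummaryObject =
	| { type: "blob"; content: string }
	| { type: "tree"; tree: Record<string, SummaryObject> };

const forestSummary: SummaryObject = {
	type: "tree",
	tree: {
		// Root blob: forest data, with chunk reference ids [0] encoded in it.
		ForestTree: { type: "blob", content: "<forest data; refIds: [0]>" },
		// One subtree per incremental chunk, keyed by its reference id.
		"0": {
			type: "tree",
			tree: {
				contents: { type: "blob", content: "<chunk 0 data; refIds: [1]>" },
				"1": {
					type: "tree",
					tree: {
						contents: { type: "blob", content: "<chunk 1 data>" },
					},
				},
			},
		},
	},
};
```

If chunk 0 is unchanged between summaries, the whole subtree under key "0" (including the nested "1" subtree) would be replaced by a single summary handle.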

AB#14308 AB#41861 AB#41863 AB#41864

@github-actions github-actions bot added area: dds Issues related to distributed data structures area: dds: tree area: runtime Runtime related issues area: tests Tests to add, test infrastructure improvements, etc dependencies Pull requests that update a dependency file base: main PRs targeted against main branch labels Jul 1, 2025
agarwal-navin added a commit that referenced this pull request Jul 9, 2025
…ce (#24970)

## Description
Added `getSnapshotTree` API to `IChannelStorageService` to let a DDS access its snapshot tree. This will help DDSes examine their summary when the summary consists of dynamic subtrees / blobs. Currently, all DDSes store their data in fixed blobs, so they can download these specific blobs by name.

However, there can be scenarios where a DDS writes its data into dynamic trees / blobs, similar to the container runtime, where the contents of its channels subtree depend on the number of data stores and their ids. In such cases, without access to the snapshot tree, the DDS would need to write the paths of all its blobs into its summary tree (into a blob with a specific name) and use that to fetch them during load.
For example, shared tree is implementing incremental summarization of its data, where the number and paths of blobs change dynamically. Having access to the snapshot tree is important for it to fetch the contents of all the blobs in its summary tree. PR for shared tree's incremental summarization: #24949.


[AB#43434](https://dev.azure.com/fluidframework/235294da-091d-4c29-84fc-cdfc3d90890b/_workitems/edit/43434)
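As a rough illustration of why snapshot-tree access matters here, a DDS with dynamic subtrees could walk the tree to find every blob path. The tree type below is a simplified stand-in for Fluid's snapshot tree shape, and the walk helper is illustrative, not part of the API.

```typescript
// Hypothetical simplified snapshot tree: named blobs plus named subtrees.
interface SnapshotTreeSketch {
	blobs: Record<string, string>; // blob name -> blob id
	trees: Record<string, SnapshotTreeSketch>;
}

// Collect the path of every blob so a DDS with dynamically named subtrees
// can fetch them all without a hardcoded list of blob names.
function collectBlobPaths(tree: SnapshotTreeSketch, prefix = ""): string[] {
	const paths = Object.keys(tree.blobs).map((name) => `${prefix}${name}`);
	for (const [key, subtree] of Object.entries(tree.trees)) {
		paths.push(...collectBlobPaths(subtree, `${prefix}${key}/`));
	}
	return paths;
}
```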
@agarwal-navin agarwal-navin force-pushed the incrementalSummaryTree branch from 5b54069 to a18ca83 Compare July 14, 2025 23:31
@github-actions github-actions bot removed the dependencies Pull requests that update a dependency file label Jul 14, 2025
@agarwal-navin agarwal-navin force-pushed the incrementalSummaryTree branch from eac3c35 to e0f4f71 Compare July 28, 2025 23:50
Comment on lines 243 to 248
const chunkReferenceId = readStream(stream);
assert(
typeof chunkReferenceId === "number",
"Expected incremental field's data to be a chunk reference id",
);

Contributor

You could use:

Suggested change
const chunkReferenceId = readStream(stream);
assert(
typeof chunkReferenceId === "number",
"Expected incremental field's data to be a chunk reference id",
);
const chunkReferenceId = readStreamNumber(stream);

That has the assert built in.
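A minimal sketch of what a `readStreamNumber` helper like the one suggested could look like: `readStream` plus a built-in type assert. The stream type here is a hypothetical simplification of the codec's actual stream type.

```typescript
// Hypothetical stream: an array of values plus a cursor offset.
interface StreamCursor {
	readonly data: readonly unknown[];
	offset: number;
}

function readStream(stream: StreamCursor): unknown {
	return stream.data[stream.offset++];
}

// Reads the next value and asserts it is a number, so callers don't have
// to repeat the typeof check at every call site.
function readStreamNumber(stream: StreamCursor): number {
	const value = readStream(stream);
	if (typeof value !== "number") {
		throw new Error("Expected a number in the stream");
	}
	return value;
}

// Usage mirroring the diff above: reading a chunk reference id.
const stream: StreamCursor = { data: [42], offset: 0 };
const chunkReferenceId = readStreamNumber(stream); // 42
```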

extends ShapeGeneric<EncodedChunkShape>
implements FieldEncoder
{
export class NestedArrayEncoder extends NestedArrayShape implements FieldEncoder {
Contributor

I think it would be cleaner to not extend NestedArrayShape, and instead just store an instance of it.

That will make caching/reuse of the shape more effective (we can cache the NestedArrayShape instances without worrying about whether they are a NestedArrayEncoder).

It will also separate concerns here so the encoder can clearly just be about implementing the encoder interface, which will be cleaner.

I think we would want to make the constructor take just the public readonly shape: NestedArrayShape

Contributor Author

@agarwal-navin agarwal-navin Jul 31, 2025

Added it as a constructor parameter, but made it default to the inner encoder's shape since that is going to be the common case.
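The composition-over-inheritance shape of this thread's resolution could look roughly like the sketch below. All types are simplified stand-ins for the real ones (hence the "Sketch" suffixes), and the encoding logic is invented for illustration.

```typescript
// Simplified stand-in for NestedArrayShape.
class NestedArrayShapeSketch {
	public constructor(public readonly innerShapeName: string) {}
}

interface FieldEncoderSketch {
	readonly shape: NestedArrayShapeSketch;
	encodeField(values: readonly number[]): string;
}

class NestedArrayEncoderSketch implements FieldEncoderSketch {
	// The shape is a constructor parameter with a default (the common case),
	// so NestedArrayShape instances can be cached and reused independently
	// of any particular encoder, and the encoder is only about encoding.
	public constructor(
		private readonly innerName: string,
		public readonly shape: NestedArrayShapeSketch = new NestedArrayShapeSketch(
			innerName,
		),
	) {}

	public encodeField(values: readonly number[]): string {
		return `${this.shape.innerShapeName}:[${values.join(",")}]`;
	}
}
```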

);

let chunkReferenceIds: ChunkReferenceId[] = [];
if (cursor.getFieldLength() !== 0) {
Contributor

IncrementalEncoder.encodeIncrementalField does not say anything about not supporting empty fields so I think we should skip this check, and let it handle the empty case. I don't think we gain anything by adding the special casing of empty here.

Contributor Author

I added this check so that we don't end up adding an empty summary tree for fields that are empty. I could move this check to the ForestIncrementalSummaryBuilder and it can decide to return an empty array in case the field is empty.
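The alternative floated here can be sketched as follows. This is a hypothetical API, not the PR's actual ForestIncrementalSummaryBuilder: the builder owns the empty-field case and returns an empty reference id array, so no empty summary subtree is created and the encoder needs no special casing.

```typescript
type ChunkReferenceId = number;

// Hypothetical builder: assigns reference ids to an incremental field's
// chunks, with empty fields handled internally.
class IncrementalSummaryBuilderSketch {
	private nextId: ChunkReferenceId = 0;

	// Returns one reference id per chunk; an empty field yields an empty
	// array, so no summary subtree is created for it.
	public encodeIncrementalField(chunkCount: number): ChunkReferenceId[] {
		if (chunkCount === 0) {
			return [];
		}
		return Array.from({ length: chunkCount }, () => this.nextId++);
	}
}
```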
