(tree): Incremental summarization of chunked forest #24949
base: main
Conversation
Files with resolved review threads:
- packages/dds/tree/src/feature-libraries/chunked-forest/basicChunk.ts
- packages/dds/tree/src/feature-libraries/chunked-forest/codec/chunkCodecUtilities.ts
- packages/dds/tree/src/feature-libraries/chunked-forest/codec/chunkDecoding.ts
- packages/dds/tree/src/feature-libraries/chunked-forest/codec/codecs.ts
- packages/dds/tree/src/feature-libraries/forest-summary/forestSummarizer.ts
…ce (#24970)

## Description

Added a `getSnapshotTree` API to `IChannelStorageService` to let DDSes access their snapshot tree. This helps a DDS examine its summary when that summary consists of dynamic subtrees and blobs. Currently, all DDSes store their data in fixed blobs, so they can download those specific blobs by name. However, there are scenarios where a DDS writes its data into dynamic trees and blobs, similar to the container runtime, where the contents of its channels subtree depend on the number of data stores and their ids. In such cases, without access to the snapshot tree, the DDS would need to write the paths of all its blobs into its summary tree (into a blob with a specific name) and use that to fetch them during load. For example, shared tree is implementing incremental summarization of its data, where the number and paths of blobs change dynamically. Access to the snapshot tree lets it fetch the contents of all the blobs in its summary tree. PR for shared tree's incremental summarization: #24949. [AB#43434](https://dev.azure.com/fluidframework/235294da-091d-4c29-84fc-cdfc3d90890b/_workitems/edit/43434)
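The benefit of the API can be illustrated with a small sketch. Everything below (the `SnapshotTree` shape and the `collectBlobPaths` helper) is a simplified assumption for illustration, not the real Fluid interfaces: with access to the snapshot tree, a DDS can enumerate dynamically named blobs instead of having to know their paths up front.

```typescript
// Hypothetical, simplified snapshot tree: blob name -> blob id,
// plus named subtrees. Stand-in for illustration only.
interface SnapshotTree {
	blobs: Record<string, string>;
	trees: Record<string, SnapshotTree>;
}

// Collect the paths of all blobs under a snapshot tree, so a DDS
// with dynamic subtrees can fetch each blob's contents during load.
function collectBlobPaths(tree: SnapshotTree, prefix = ""): string[] {
	const paths = Object.keys(tree.blobs).map((name) => `${prefix}${name}`);
	for (const [subtreeName, subtree] of Object.entries(tree.trees)) {
		paths.push(...collectBlobPaths(subtree, `${prefix}${subtreeName}/`));
	}
	return paths;
}

const snapshot: SnapshotTree = {
	blobs: { header: "b1" },
	trees: { "0": { blobs: { contents: "b2" }, trees: {} } },
};
console.log(collectBlobPaths(snapshot)); // ["header", "0/contents"]
```

Without the snapshot tree, the only way to discover the `0/contents` blob would be to have recorded its path somewhere with a fixed name.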
Force-pushed from 5b54069 to a18ca83.
Force-pushed from eac3c35 to e0f4f71.
const chunkReferenceId = readStream(stream);
assert(
	typeof chunkReferenceId === "number",
	"Expected incremental field's data to be a chunk reference id",
);
You could use:
const chunkReferenceId = readStreamNumber(stream);
That has the assert built in.
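For readers unfamiliar with that helper, here is a minimal sketch of what a `readStreamNumber`-style function could look like. The `StreamCursor` type and both implementations are hypothetical stand-ins, not the actual chunkCodecUtilities code:

```typescript
// Hypothetical stream cursor: a flat array of values with a read offset.
interface StreamCursor {
	data: readonly unknown[];
	offset: number;
}

// Read the next value from the stream, whatever its type.
function readStream(stream: StreamCursor): unknown {
	const value = stream.data[stream.offset];
	stream.offset += 1;
	return value;
}

// Read the next value and assert it is a number, so call sites
// don't need to repeat the typeof check.
function readStreamNumber(stream: StreamCursor): number {
	const value = readStream(stream);
	if (typeof value !== "number") {
		throw new Error("Expected a number in the stream");
	}
	return value;
}

const cursor: StreamCursor = { data: [42, "a"], offset: 0 };
console.log(readStreamNumber(cursor)); // 42
```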
extends ShapeGeneric<EncodedChunkShape>
implements FieldEncoder
{
export class NestedArrayEncoder extends NestedArrayShape implements FieldEncoder {
I think it would be cleaner to not extend NestedArrayShape, and instead just store an instance of it.
That will make caching/reuse of the shape more effective (we can cache the NestedArrayShape instances without worrying about whether they are a NestedArrayEncoder).
It will also separate concerns here so the encoder can clearly just be about implementing the encoder interface, which will be cleaner.
I think we would want to make the constructor take in just the `public readonly shape: NestedArrayShape`.
Added it as a parameter, but gave it a default value of the inner encoder's shape since that is going to be the common case.
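A minimal sketch of the composition the review suggests, using simplified stand-in types (these are not the real codec classes): the encoder stores a `NestedArrayShape` instance rather than extending it, so shape instances can be cached and reused independently of encoders.

```typescript
// Stand-in shape: in the real code this would describe the encoded
// layout; here it just carries an id for demonstration.
class NestedArrayShape {
	public constructor(public readonly innerShapeId: number) {}
}

// Stand-in encoder interface, simplified to numbers for the sketch.
interface FieldEncoder {
	encodeField(values: readonly number[]): number[];
}

// Composition instead of inheritance: the constructor takes in just
// the shape, so the class is clearly only about encoding.
class NestedArrayEncoder implements FieldEncoder {
	public constructor(public readonly shape: NestedArrayShape) {}

	public encodeField(values: readonly number[]): number[] {
		return [this.shape.innerShapeId, ...values];
	}
}

const shape = new NestedArrayShape(7);
const encoder = new NestedArrayEncoder(shape);
console.log(encoder.encodeField([1, 2])); // [7, 1, 2]
```

The same `shape` instance can now be cached and shared between encoders, or used on its own, without ever being a `NestedArrayEncoder`.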
);
let chunkReferenceIds: ChunkReferenceId[] = [];
if (cursor.getFieldLength() !== 0) {
IncrementalEncoder.encodeIncrementalField does not say anything about not supporting empty fields, so I think we should skip this check and let it handle the empty case. I don't think we gain anything by special-casing empty fields here.
I added this check so that we don't end up adding an empty summary tree for fields that are empty. I could move this check into ForestIncrementalSummaryBuilder and have it return an empty array when the field is empty.
Reviewer guidance
See the "Next steps" section below.
Next steps
Description
AB#1263 has an overall description and link to a design document.
This PR adds incremental summarization support to chunked forest. Currently, the chunked forest data is encoded and added to a single blob in the summary tree. This change breaks it down into multiple subtrees, where each subtree can be incrementally summarized, i.e., if such a subtree doesn't change between summaries, a summary handle is added to the summary instead of the entire subtree.
The schema will support marking properties as incrementally summarizable (TODO - not in this PR). The chunks in the fields corresponding to such a property form the boundary at which incremental summarization works. Basically, if the property changes, or anything within it changes (for objects and arrays), its chunks are re-summarized; otherwise, a summary handle is used.
To achieve this, this PR makes the following changes:
- Adds `EncodedIncrementalChunkShape` to `EncodedChunkShape` for chunks whose data will be incrementally summarized.
- Adds `IncrementalChunkShape`, which represents chunks whose data can be incrementally summarized, aka incremental chunks.
- Each incremental chunk gets a `ChunkReferenceId`, which uniquely identifies it under its parent field.
- Adds `incrementalFieldEncoder`, which encodes its chunks as `IncrementalChunkShape`. The `ChunkReferenceId`s of these chunks are stored as `EncodedNestedArrayShape`.
- Adds `IncrementalChunkDecoder`, which can decode an incremental chunk.
- Adds `IncrementalEncoder` and `IncrementalDecoder` interfaces, which have properties that help with encoding and decoding of incremental chunks. These are added to the `FieldBatchEncodingContext`.
- Adds `ForestIncrementalSummaryBuilder`, which tracks chunks across summaries and generates a summary tree or a summary handle for them depending on whether they changed between summaries.
- The summary tree has a `ForestTree` at the root that contains the data for the forest. If there are any incremental fields in the forest, an array of chunk reference ids is encoded for their chunks in the data.
- Each incremental chunk's data is written to a blob named `contents` under its tree. If the chunk has any incremental fields, an array of chunk reference ids is encoded for its chunks in the data.

AB#14308 AB#41861 AB#41863 AB#41864