Skip to content

Conversation

leonanos8
Copy link
Contributor

@leonanos8 leonanos8 commented Aug 27, 2025

Description

Follow-up PR on the actual implementation of the collection and calculation of the 2D anon stats

Changes

  • Introduced a 2D anonymized statistics data model to represent joint left/right distance buckets.
  • Extended the GPU actor to compute, clear between batches, and include 2D stats in its result payload.
  • Updated the server to receive the 2D stats and prepare them for SNS publishing.
    - Decided to go with a new message type and attributes so we can forward the 2D messages on their own queue to be consumed by signup-service-worker independently.
  • Scoped to Normal flow; mirror flow and existing 1D stats behavior remain unchanged.

[edit]

  • Serialize the computed 2D anonymized statistics into a file and offload this file to S3 to avoid violating the SNS message size limit(256KB). I discovered during testing that with the 2D stats we go over this limit. Solution for this is to offload the paylod in S3 and only publish the key in the SNS message.

(We could try minimizing the size of the 2D stats that need to be sent, by either keep only the ones with count >=1 or use compression. But since we already have an existing S3 bucket for SNS requests (used for shares), which has a cleaning lifecycle policy, its not big of an overhead to proceed with the classic offload-to-S3 pattern )

@leonanos8 leonanos8 self-assigned this Aug 27, 2025
@leonanos8 leonanos8 force-pushed the leonidasnanos-pop-2659-2d-hd-and-implementation-of-the-required-wiring branch from 1693073 to 9425683 Compare August 27, 2025 13:47
@leonanos8 leonanos8 marked this pull request as ready for review August 27, 2025 14:47
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements collection and propagation of 2D anonymized bucket statistics from the GPU actor to the server and SNS. The 2D stats represent joint left/right distance buckets for matches that occur on both eyes, providing more granular analytics than the existing 1D statistics.

  • Adds a new BucketStatistics2D data model with 2D histogram buckets and associated metadata
  • Extends the GPU actor to compute, store, and include 2D stats in the ServerJobResult payload
  • Adds configuration flag and SNS message type to enable independent queue routing for 2D statistics

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
iris-mpc-gpu/src/server/actor.rs Integrates 2D stats collection into GPU actor workflow with buffer management and calculation
iris-mpc-cpu/src/execution/hawk_main.rs Updates CPU execution path to include placeholder 2D stats in result
iris-mpc-common/src/job.rs Adds 2D stats field to ServerJobResult structure
iris-mpc-common/src/helpers/statistics.rs Implements complete 2D statistics data model with serialization and bucket filling logic
iris-mpc-common/src/helpers/smpc_request.rs Defines new message type constant for 2D statistics SNS messages
iris-mpc-common/src/config/mod.rs Adds configuration flag to control 2D statistics message publishing

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +160 to +175
impl Eq for Bucket2DResult {}
impl PartialEq for Bucket2DResult {
fn eq(&self, other: &Self) -> bool {
self.count == other.count
&& (self.left_hamming_distance_bucket[0] - other.left_hamming_distance_bucket[0]).abs()
<= 1e-9
&& (self.left_hamming_distance_bucket[1] - other.left_hamming_distance_bucket[1]).abs()
<= 1e-9
&& (self.right_hamming_distance_bucket[0] - other.right_hamming_distance_bucket[0])
.abs()
<= 1e-9
&& (self.right_hamming_distance_bucket[1] - other.right_hamming_distance_bucket[1])
.abs()
<= 1e-9
}
}
Copy link

Copilot AI Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementing Eq for a type with floating-point fields violates Rust's equivalence relation requirements. The PartialEq implementation uses approximate equality (1e-9 tolerance) which is not transitive, reflexive, or symmetric as required by Eq. Remove the Eq implementation and only keep PartialEq.

Copilot uses AI. Check for mistakes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is actually true, the impl Eq should be removed.

Comment on lines +188 to +190
#[serde(skip_serializing)]
#[serde(skip_deserializing)]
#[serde(with = "ts_seconds_option")]
Copy link

Copilot AI Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The next_start_time_utc_timestamp field has conflicting serde attributes. skip_serializing and skip_deserializing will prevent serialization/deserialization, making the with = \"ts_seconds_option\" attribute ineffective. Remove the with attribute since the field is being skipped.

Suggested change
#[serde(skip_serializing)]
#[serde(skip_deserializing)]
#[serde(with = "ts_seconds_option")]

Copilot uses AI. Check for mistakes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also seems a reasonable suggestion.

@leonanos8 leonanos8 merged commit 940eb8f into dk/2d-anon-stats-gpu Sep 3, 2025
14 of 15 checks passed
@leonanos8 leonanos8 deleted the leonidasnanos-pop-2659-2d-hd-and-implementation-of-the-required-wiring branch September 3, 2025 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants