@carlomazzaferro (Collaborator) commented Sep 19, 2025

Summary

  • Add operation-aware anonymized statistics for both 1D (per-eye) and 2D (both eyes) flows, splitting aggregation into
    Uniqueness vs Reauth buckets.
  • Keep GPU kernels unchanged; the split happens entirely in host code, using the existing per-match indices.
  • Add a configurable minimum threshold for publishing Reauth 1D stats; default 10,000 distances.
  • Extend result payloads and JSON schema to include an operation tag and reauth-specific stats fields.
  • Exclude reset, update, and deletion flows from both stats sets; reauth gets no mirror-orientation stats.

How It Works

  • 1D (per-eye):

    • The openResults kernel still pushes 16-bit distance shares and 64-bit match_ids into GPU ring buffers.
    • The host decodes each match_id:
      • Let MQ = query_length (max_batch_size * ALL_ROTATIONS) and MD = max_db_size.
      • query_idx = match_id % MQ, query_no_rot = query_idx / ALL_ROTATIONS, batch_id = match_id / (MD * MQ).
  • The host keeps batch_ops_map: HashMap<u64, Vec>, which records the op of every filtered query in each batch.
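A minimal sketch of the match_id decode and op lookup above, in Rust. It assumes batch_ops_map is keyed by batch_id and that its Vec holds one Operation per rotation-collapsed query, indexed by query_no_rot; Operation, DecodedMatch, decode_match_id, and op_for_match are illustrative names, not the PR's code.

```rust
use std::collections::HashMap;

/// Illustrative operation tag; variants mirror the ops named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operation {
    Uniqueness,
    Reauth,
    Reset,
    Update,
    Deletion,
}

/// Fields recovered from a 64-bit match_id.
#[derive(Debug, PartialEq, Eq)]
pub struct DecodedMatch {
    pub batch_id: u64,
    pub query_idx: u64,
    pub query_no_rot: u64,
}

/// mq = query_length (max_batch_size * ALL_ROTATIONS), md = max_db_size.
pub fn decode_match_id(match_id: u64, mq: u64, md: u64, all_rotations: u64) -> DecodedMatch {
    let query_idx = match_id % mq;
    DecodedMatch {
        batch_id: match_id / (md * mq),
        query_idx,
        // Collapse rotations: every rotation of a query maps to the same index.
        query_no_rot: query_idx / all_rotations,
    }
}

/// Assumption (hypothetical layout): batch_ops_map is keyed by batch_id and
/// holds one op per rotation-collapsed query, indexed by query_no_rot.
pub fn op_for_match(
    batch_ops_map: &HashMap<u64, Vec<Operation>>,
    m: &DecodedMatch,
) -> Option<Operation> {
    batch_ops_map
        .get(&m.batch_id)?
        .get(m.query_no_rot as usize)
        .copied()
}
```

For example, with max_batch_size = 4 and ALL_ROTATIONS = 2 (so MQ = 8) and MD = 10, match_id 287 decodes to batch_id 3, query_idx 7, query_no_rot 3.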

  • Build two per-device subsets:

    • Uniqueness subset and Reauth subset (reset/update/deletion queries are ignored).
    • Per subset: sort the indices, build a “first-of-query” bitmask (rotation-collapsed), re-sort the distance shares to match,
      and run compare_multiple_thresholds_while_aggregating_per_query(...).
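The per-subset preparation could look roughly like this. The sketch filters match positions by op, sorts them so all rotations of one query are adjacent, marks the first entry of each query, and reorders the distance shares into the same order. build_subset, Subset, and the op_of/query_key hooks are hypothetical; the real code then hands the result to compare_multiple_thresholds_while_aggregating_per_query(...), which is not reproduced here.

```rust
/// Illustrative operation tag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operation {
    Uniqueness,
    Reauth,
    Reset,
}

/// One per-device subset, ready for threshold aggregation.
pub struct Subset {
    pub match_ids: Vec<u64>,
    pub shares: Vec<u16>,
    /// true where a new (rotation-collapsed) query starts.
    pub first_of_query: Vec<bool>,
}

/// Keep matches whose op equals `wanted`, group them by query, and reorder shares.
/// `op_of` and `query_key` are hypothetical hooks, e.g. backed by batch_ops_map
/// and by (batch_id, query_no_rot) from the match_id decode.
pub fn build_subset(
    match_ids: &[u64],
    shares: &[u16],
    wanted: Operation,
    op_of: impl Fn(u64) -> Option<Operation>,
    query_key: impl Fn(u64) -> (u64, u64),
) -> Subset {
    // Positions of matches in this subset; every other op is skipped.
    let mut idx: Vec<usize> = (0..match_ids.len())
        .filter(|&i| op_of(match_ids[i]) == Some(wanted))
        .collect();
    // Sort so that all rotations of the same query sit next to each other.
    idx.sort_by_key(|&i| (query_key(match_ids[i]), match_ids[i]));

    // Mark the first entry of each query group.
    let mut first_of_query = Vec::with_capacity(idx.len());
    let mut prev = None;
    for &i in &idx {
        let key = query_key(match_ids[i]);
        first_of_query.push(prev != Some(key));
        prev = Some(key);
    }

    Subset {
        match_ids: idx.iter().map(|&i| match_ids[i]).collect(),
        // Re-sort the distance shares into the same order.
        shares: idx.iter().map(|&i| shares[i]).collect(),
        first_of_query,
    }
}
```

Sorting by (query key, match_id) is one way to make the bitmask well defined; the sort key used in the PR may differ.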
  • Publish:

    • Uniqueness: always (Normal and Mirror).
    • Reauth: Normal orientation only, and only if the per-flush count across devices is ≥ reauth_match_distances_min_count.
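The publish rule reduces to a small predicate. This sketch assumes the counts passed in are each device's number of distances in the relevant subset for the current flush; should_publish_1d and Orientation are illustrative names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operation {
    Uniqueness,
    Reauth,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Orientation {
    Normal,
    Mirror,
}

/// Should this flush publish 1D stats for `op` in `orientation`?
/// `per_device_counts` holds each device's number of distances in the subset.
pub fn should_publish_1d(
    op: Operation,
    orientation: Orientation,
    per_device_counts: &[usize],
    reauth_min_count: usize,
) -> bool {
    match (op, orientation) {
        // Uniqueness: always published, Normal and Mirror.
        (Operation::Uniqueness, _) => true,
        // Reauth: Normal only, and only once the total across devices reaches the minimum.
        (Operation::Reauth, Orientation::Normal) => {
            per_device_counts.iter().sum::<usize>() >= reauth_min_count
        }
        // No mirror-orientation stats for reauth.
        (Operation::Reauth, Orientation::Mirror) => false,
    }
}
```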
  • 2D (both eyes):

    • Aggregate two-sided matches into separate caches: both_side_match_distances_buffer_uni and _reauth.
    • Partition matches per device: for each grouped match, decode a representative index and look up its op in batch_ops_map.
    • When a cache reaches match_distances_2d_buffer_size, compute thresholds and fill the corresponding BucketStatistics2D.
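A sketch of the 2D routing into the two caches, assuming reset/update/deletion matches were already dropped and that the caller computes thresholds on a drained batch to fill BucketStatistics2D. TwoDCaches, BothSideMatch, and push are hypothetical names; buffer_size stands in for match_distances_2d_buffer_size.

```rust
/// Illustrative op tag; other ops are assumed to be filtered out earlier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operation {
    Uniqueness,
    Reauth,
}

/// A grouped two-sided match: one representative index plus both distance shares.
#[derive(Debug, Clone)]
pub struct BothSideMatch {
    pub repr_idx: u64,
    pub left_share: u16,
    pub right_share: u16,
}

/// Stand-ins for both_side_match_distances_buffer_uni and its _reauth counterpart.
pub struct TwoDCaches {
    pub uni: Vec<BothSideMatch>,
    pub reauth: Vec<BothSideMatch>,
    /// Plays the role of match_distances_2d_buffer_size.
    pub buffer_size: usize,
}

impl TwoDCaches {
    pub fn new(buffer_size: usize) -> Self {
        Self { uni: Vec::new(), reauth: Vec::new(), buffer_size }
    }

    /// Add a match to the cache for `op`. When that cache reaches buffer_size,
    /// drain it and return the batch so the caller can compute thresholds and
    /// fill the matching BucketStatistics2D.
    pub fn push(&mut self, m: BothSideMatch, op: Operation) -> Option<(Operation, Vec<BothSideMatch>)> {
        let limit = self.buffer_size;
        let cache = match op {
            Operation::Uniqueness => &mut self.uni,
            Operation::Reauth => &mut self.reauth,
        };
        cache.push(m);
        if cache.len() >= limit {
            Some((op, std::mem::take(cache)))
        } else {
            None
        }
    }
}
```

std::mem::take leaves an empty Vec behind, so each cache starts accumulating again right after a flush, independently of the other.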

Config

  • New: reauth_match_distances_min_count: usize (default 10_000).
  • Used to gate publishing of 1D reauth anonymized stats; configurable via the existing config system.
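A dependency-free sketch of the new setting and its default. The default_reauth_match_distances_min_count() function in the diff suggests the real config wires it through serde's default = "..." field attribute, but that is an assumption; AnonStatsConfig and with_override are hypothetical names.

```rust
use std::num::ParseIntError;

/// Hypothetical stand-in for the server config; only the new field is shown.
#[derive(Debug, Clone)]
pub struct AnonStatsConfig {
    /// Minimum number of reauth distances per flush before 1D reauth stats are published.
    pub reauth_match_distances_min_count: usize,
}

fn default_reauth_match_distances_min_count() -> usize {
    10_000
}

impl Default for AnonStatsConfig {
    fn default() -> Self {
        Self {
            reauth_match_distances_min_count: default_reauth_match_distances_min_count(),
        }
    }
}

impl AnonStatsConfig {
    /// Apply an optional override (e.g. a raw value read from the environment).
    pub fn with_override(mut self, raw: Option<&str>) -> Result<Self, ParseIntError> {
        if let Some(v) = raw {
            self.reauth_match_distances_min_count = v.parse()?;
        }
        Ok(self)
    }
}
```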

Payload Changes

  • BucketStatistics and BucketStatistics2D gain an operation: Operation field, serialized into the JSON (additive).
  • ServerJobResult now includes:
    • 1D Reauth: anonymized_bucket_statistics_left_reauth, ..._right_reauth
    • 2D Reauth: anonymized_bucket_statistics_2d_reauth
  • Sending:
    • Uniqueness 1D (Normal) and 2D continue as before.
    • Reauth 1D (Normal) sent when present.
    • Reauth 2D is sent via S3 plus an SNS pointer, like Uniqueness 2D.
    • Mirror orientation stats sent only for Uniqueness.
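The sending rules above, written as a routing function. Route::Send1D stands for the existing 1D sending path, whose transport this PR leaves unchanged, and present models the sender guards that skip absent stats; all names are illustrative.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operation {
    Uniqueness,
    Reauth,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Orientation {
    Normal,
    Mirror,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Dim {
    OneD,
    TwoD,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// The existing 1D sending path (unchanged by this PR).
    Send1D,
    /// Upload to S3 and publish an SNS message pointing at it.
    S3WithSnsPointer,
    Skip,
}

pub fn route(dim: Dim, op: Operation, orientation: Orientation, present: bool) -> Route {
    // Guarded senders: nothing to send when the stats were not produced.
    if !present {
        return Route::Skip;
    }
    match (dim, op, orientation) {
        // Mirror-orientation stats are sent only for Uniqueness.
        (_, Operation::Reauth, Orientation::Mirror) => Route::Skip,
        (Dim::OneD, _, _) => Route::Send1D,
        // 2D stats (Uniqueness and Reauth) go via S3 plus an SNS pointer.
        (Dim::TwoD, _, _) => Route::S3WithSnsPointer,
    }
}
```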

Backward Compatibility

  • JSON: The new operation field is additive and compatible with consumers that ignore unknown fields.
  • New ServerJobResult fields are additive; existing processing is unchanged, and the senders are guarded so reauth stats go out only when present.
  • No changes to kernel ABI or GPU buffers.

Example Payloads

  • 1D:
    • {"party_id":1,"eye":"Left","operation":"Uniqueness","buckets":[{"count":123,"hamming_distance_bucket":[0.000,0.001]},...]}
    • {"party_id":1,"eye":"Right","operation":"Reauth","buckets":[...]}
  • 2D:
    • {"party_id":1,"operation":"Uniqueness","buckets":[{"count":42,"left_hamming_distance_bucket": [0.000,0.001],"right_hamming_distance_bucket":[0.000,0.001]},...]}
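To show where the operation tag lands, here is a hand-rolled serializer that reproduces the 1D example shape. It avoids a JSON crate only to stay self-contained; the real code presumably derives serialization, and bucket_statistics_json and Bucket are hypothetical names. Bounds are printed with three decimals solely to match the example.

```rust
use std::fmt::Write;

#[derive(Debug, Clone, Copy)]
pub enum Operation {
    Uniqueness,
    Reauth,
}

#[derive(Debug, Clone, Copy)]
pub enum Eye {
    Left,
    Right,
}

impl Operation {
    fn as_str(self) -> &'static str {
        match self {
            Operation::Uniqueness => "Uniqueness",
            Operation::Reauth => "Reauth",
        }
    }
}

impl Eye {
    fn as_str(self) -> &'static str {
        match self {
            Eye::Left => "Left",
            Eye::Right => "Right",
        }
    }
}

/// One histogram bucket: a count and its [lo, hi] hamming-distance bounds.
pub struct Bucket {
    pub count: usize,
    pub lo: f64,
    pub hi: f64,
}

/// Serialize 1D stats in the shape of the example payloads, including the operation field.
pub fn bucket_statistics_json(party_id: u32, eye: Eye, op: Operation, buckets: &[Bucket]) -> String {
    let mut s = String::new();
    write!(s, r#"{{"party_id":{},"eye":"{}","operation":"{}","buckets":["#, party_id, eye.as_str(), op.as_str()).unwrap();
    for (i, b) in buckets.iter().enumerate() {
        if i > 0 {
            s.push(',');
        }
        // Three decimals only to mirror the example payload.
        write!(s, r#"{{"count":{},"hamming_distance_bucket":[{:.3},{:.3}]}}"#, b.count, b.lo, b.hi).unwrap();
    }
    s.push_str("]}");
    s
}
```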

Review thread on the config defaults:

        1 << 13 // 8192
    }

    fn default_reauth_match_distances_min_count() -> usize {
        10_000
    }
@carlomazzaferro (Collaborator, Author):

Unsure whether we need this; it was just one way to limit how many reauth bucket stats we send.

@carlomazzaferro changed the title from "wip: anon stats separately for reauth and uniqueness" to "[POP-2913] wip: anon stats separately for reauth and uniqueness" on Sep 23, 2025.