
Epic: Cloud Push Integration #487

@deepakduggirala

Description

Implement support for pushing datasets to user-provided cloud storage (AWS S3, GCP GCS, Azure Blob) via signed URLs and OAuth2. Address credential handling, tool integration (rclone), and secure runtime configuration.
This epic captures integration complexity not shared by HPFS/Globus targets.

  • Design Secure Credential Handling for Signed URLs and OAuth Tokens

    • Define an ephemeral storage strategy for signed URLs and OAuth2 credentials.
    • Decide on an encryption-at-rest strategy (e.g., Vault, or database storage with envelope encryption).
    • Define the credential lifecycle: when credentials are stored, accessed, and purged.
    • Document the audit and expiration policies.
  • Implement Signed URL Handling for Cloud Push

    • Add support in the CloudPush class for writing to S3/GCS/Azure using pre-signed URLs.
    • Verify upload success and integrity (e.g., via response status codes or a follow-up HEAD request).
    • Add a test suite for signed-URL uploads against mock endpoints.
  • Design OAuth2 Delegation Model for Cloud Providers

    • Evaluate provider-specific OAuth2 workflows:
      • AWS: Cognito/federation
      • GCP: service accounts / delegated auth
      • Azure: AAD + delegated user access
    • Choose the smallest viable auth surface to support.
    • Write flow diagrams for the user authentication experience.
  • Implement Cloud OAuth Flow Handling and Token Storage

    • Add backend support for receiving authorization grants.
    • Use provider SDK or generic OAuth2 client for token exchange.
    • Securely store refresh/access tokens with limited scope.
    • Hook into the CloudPush logic so that uploads always use a valid (unexpired) access token.
  • Add CloudPush Implementation Using rclone

    • Implement a CloudPush class that delegates uploads to rclone.
    • Configure rclone with an in-memory or ephemeral config (no writes to the user's home directory).
    • Support logging and error inspection.
  • Cloud Push Audit and Metadata Logging

    • Extend the metadata model to store bucket URI, upload time, and status.
    • Add flags recording whether upload success was verified.
    • Store the credential type used (signed URL vs. OAuth).
    • Record error codes when an upload fails.
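
The lifecycle items in the credential-handling design task (ephemeral storage; when credentials are stored, accessed, and purged; audit and expiration) could be prototyped with an in-memory cache like the one below. This is a minimal sketch assuming a Python backend; `EphemeralCredentialStore` and its methods are hypothetical names, and it covers only expiry and audit events, not encryption at rest (Vault or envelope encryption would sit underneath it).

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class _Entry:
    secret: str          # signed URL or OAuth token
    expires_at: float    # absolute deadline on the store's clock


@dataclass
class EphemeralCredentialStore:
    """Hypothetical in-memory credential cache with per-entry TTL and an audit trail."""
    clock: Callable[[], float] = time.monotonic
    # repr=False keeps secrets out of logs that print the store.
    _entries: Dict[str, _Entry] = field(default_factory=dict, repr=False)
    # Audit events are (event, key) pairs; the secret itself is never recorded.
    audit: List[Tuple[str, str]] = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def put(self, key: str, secret: str, ttl_seconds: float) -> None:
        with self._lock:
            self._entries[key] = _Entry(secret, self.clock() + ttl_seconds)
            self.audit.append(("stored", key))

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return None
            if self.clock() >= entry.expires_at:
                # Expired entries are dropped on first access after their deadline.
                del self._entries[key]
                self.audit.append(("expired", key))
                return None
            self.audit.append(("accessed", key))
            return entry.secret

    def purge(self, key: str) -> None:
        """Explicit purge, e.g. right after an upload job finishes."""
        with self._lock:
            if self._entries.pop(key, None) is not None:
                self.audit.append(("purged", key))

    def purge_expired(self) -> int:
        """Sweep all expired entries; returns how many were removed."""
        with self._lock:
            now = self.clock()
            expired = [k for k, e in self._entries.items() if now >= e.expires_at]
            for k in expired:
                del self._entries[k]
                self.audit.append(("expired", k))
            return len(expired)
```

A periodic `purge_expired()` sweep bounds how long an unused credential lingers, while `purge()` lets a job drop its credential as soon as the push completes.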
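
For the signed-URL upload task: a pre-signed URL already embeds its authorization, so an upload reduces to a plain HTTP `PUT` whose status code signals success. The sketch below assumes Python and the standard-library `urllib`; `put_to_signed_url`, `UploadResult`, and `md5_etag_matches` are hypothetical names. Provider details to confirm per target: Azure Put Blob requires an `x-ms-blob-type: BlockBlob` header, and an ETag equals the body's hex MD5 only for S3 single-part uploads without SSE-C or SSE-KMS encryption.

```python
import hashlib
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UploadResult:
    ok: bool                  # True only for a 2xx response
    status: Optional[int]     # HTTP status, or None if no response was received
    etag: Optional[str]       # ETag returned by the storage service, if any
    error: Optional[str] = None


def put_to_signed_url(url: str, data: bytes,
                      content_type: str = "application/octet-stream",
                      extra_headers: Optional[Dict[str, str]] = None,
                      timeout: float = 60.0) -> UploadResult:
    """PUT `data` to a pre-signed URL (hypothetical helper)."""
    # Set Content-Type explicitly: urllib would otherwise send
    # application/x-www-form-urlencoded, which may not match the signature.
    headers = {"Content-Type": content_type}
    # Provider-specific headers, e.g. {"x-ms-blob-type": "BlockBlob"} for Azure.
    headers.update(extra_headers or {})
    req = urllib.request.Request(url, data=data, method="PUT", headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return UploadResult(ok=200 <= resp.status < 300, status=resp.status,
                                etag=resp.headers.get("ETag"))
    except urllib.error.HTTPError as exc:
        # Non-2xx responses (e.g. 403 for an expired signature) arrive here.
        return UploadResult(ok=False, status=exc.code, etag=None, error=str(exc.reason))
    except OSError as exc:
        # URLError (DNS failure, refused connection) and socket timeouts land here.
        reason = getattr(exc, "reason", exc)
        return UploadResult(ok=False, status=None, etag=None, error=str(reason))


def md5_etag_matches(etag: Optional[str], data: bytes) -> bool:
    """Integrity check valid for S3 single-part uploads without SSE-C/SSE-KMS."""
    return etag is not None and etag.strip('"') == hashlib.md5(data).hexdigest()
```

The status code alone covers the "success" half of verification; the ETag comparison is an optional integrity check that only applies where the provider's ETag is a plain MD5.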
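
For the token-storage and CloudPush hook-up items in the OAuth task, one common shape is a small wrapper that performs the standard OAuth2 refresh-token grant (RFC 6749, section 6) whenever the cached access token is missing or about to expire. This sketch assumes a generic OAuth2 token endpoint rather than any provider SDK; `RefreshingToken`, the 60-second skew, and the 3600-second fallback lifetime are illustrative assumptions. It sends the client secret in the form body, which RFC 6749 permits but ranks below HTTP Basic authentication.

```python
import json
import time
import urllib.parse
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


def _post_form(url: str, form: Dict[str, str]) -> dict:
    """POST an application/x-www-form-urlencoded body and decode the JSON reply."""
    body = urllib.parse.urlencode(form).encode()
    req = urllib.request.Request(url, data=body, method="POST", headers={
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/json",
    })
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode())


@dataclass
class RefreshingToken:
    """Hypothetical access-token cache backed by the refresh_token grant."""
    token_url: str
    client_id: str
    client_secret: str = field(repr=False)
    refresh_token: str = field(repr=False)
    scope: Optional[str] = None          # request a narrower scope if supported
    skew_seconds: float = 60.0           # refresh this long before expiry (assumption)
    post_form: Callable[[str, Dict[str, str]], dict] = _post_form
    clock: Callable[[], float] = time.time
    _access_token: Optional[str] = field(default=None, repr=False)
    _expires_at: float = 0.0

    def access_token(self) -> str:
        """Return a valid access token, refreshing it first if needed."""
        if self._access_token is None or self.clock() >= self._expires_at - self.skew_seconds:
            self._refresh()
        return self._access_token

    def _refresh(self) -> None:
        form = {"grant_type": "refresh_token", "refresh_token": self.refresh_token,
                "client_id": self.client_id, "client_secret": self.client_secret}
        if self.scope:
            form["scope"] = self.scope
        payload = self.post_form(self.token_url, form)
        self._access_token = payload["access_token"]
        # expires_in is only RECOMMENDED by RFC 6749; 3600 s is an assumed fallback.
        self._expires_at = self.clock() + float(payload.get("expires_in", 3600))
        # Some providers rotate refresh tokens; keep the newest one.
        self.refresh_token = payload.get("refresh_token", self.refresh_token)
```

In this shape the CloudPush upload path would call `access_token()` immediately before each request, and any rotated `refresh_token` would be written back to the encrypted credential store.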
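
For the rclone task: rclone can define a remote entirely through environment variables of the form `RCLONE_CONFIG_<REMOTE>_<OPTION>`, so no config file needs to be written. The sketch below assumes Python, an S3 target with static or temporary keys, and `rclone` on `PATH`; `build_rclone_s3_invocation` and `run_rclone` are hypothetical helpers, and the option names used (`type`, `provider`, `access_key_id`, `secret_access_key`, `session_token`, `region`) come from rclone's S3 backend and are worth re-checking against the installed version.

```python
import os
import re
import subprocess
from typing import Dict, List, Tuple

_REMOTE_NAME = re.compile(r"[A-Za-z0-9]+")


def build_rclone_s3_invocation(remote: str, access_key_id: str, secret_access_key: str,
                               region: str, src: str, dest_path: str,
                               session_token: str = "") -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for an rclone upload to S3 with no config file (hypothetical)."""
    # Restrict the remote name to letters/digits so it maps cleanly onto env var names.
    if not _REMOTE_NAME.fullmatch(remote):
        raise ValueError(f"unsupported rclone remote name: {remote!r}")
    prefix = f"RCLONE_CONFIG_{remote.upper()}_"
    env = {
        prefix + "TYPE": "s3",
        prefix + "PROVIDER": "AWS",
        prefix + "ACCESS_KEY_ID": access_key_id,
        prefix + "SECRET_ACCESS_KEY": secret_access_key,
        prefix + "REGION": region,
    }
    if session_token:  # temporary (e.g. federated) credentials
        env[prefix + "SESSION_TOKEN"] = session_token
    # copyto uploads one file to an exact destination path; `copy` handles directories.
    argv = ["rclone", "copyto", src, f"{remote}:{dest_path}", "--log-level", "INFO"]
    return argv, env


def run_rclone(argv: List[str], extra_env: Dict[str, str]) -> subprocess.CompletedProcess:
    """Run the command with the remote defined only in the child's environment."""
    env = {**os.environ, **extra_env}
    # rclone logs to stderr by default; capture it for error inspection.
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=False)
```

Passing secrets through the child's environment rather than its argv keeps them out of `ps` output, and a non-zero `returncode` together with the captured `stderr` gives the error-inspection hook the task asks for.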
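
The audit and metadata task could start from a record holding exactly the fields listed there: bucket URI, upload time, status, a verification flag, the credential type, and an error code. This is a sketch assuming Python dataclasses; `CloudPushRecord`, the enums, and the `mark_*` methods are hypothetical names, and the record stores only the credential type, never the credential itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class CredentialType(str, Enum):
    SIGNED_URL = "signed_url"
    OAUTH = "oauth"


class PushStatus(str, Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class CloudPushRecord:
    """Hypothetical per-push metadata row; holds no secrets."""
    bucket_uri: str                        # e.g. "s3://bucket/prefix/"
    credential_type: CredentialType
    status: PushStatus = PushStatus.PENDING
    uploaded_at: Optional[datetime] = None  # UTC completion time
    verified: bool = False                  # success confirmed (status code, ETag, ...)
    error_code: Optional[str] = None        # e.g. HTTP status or rclone exit code

    def mark_succeeded(self, verified: bool) -> None:
        self.status = PushStatus.SUCCEEDED
        self.uploaded_at = datetime.now(timezone.utc)
        self.verified = verified
        self.error_code = None

    def mark_failed(self, error_code: str) -> None:
        self.status = PushStatus.FAILED
        self.verified = False
        self.error_code = error_code
```

Storing `str`-valued enums keeps the record easy to serialize into the existing metadata store.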
