Usage with PyTorch FSDP #135

@benihime91

How can this be used with PyTorch FSDP for 2D parallel training? For example, we want to apply USP attention within a single node and FSDP across multiple nodes.
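
To make the intended layout concrete, here is a minimal sketch of the rank grouping this question implies. It is not taken from the library; it assumes ranks are numbered contiguously per node, and `build_2d_groups`, `sp_groups`, and `fsdp_groups` are hypothetical names used only for illustration. Each intra-node list would be a sequence-parallel (USP) group, and each cross-node list would be an FSDP sharding group.

```python
from typing import List, Tuple


def build_2d_groups(world_size: int, gpus_per_node: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Split global ranks into intra-node and cross-node groups.

    Assumption: rank r lives on node r // gpus_per_node (contiguous numbering).
    Hypothetical helper, not part of any library API.
    """
    if world_size % gpus_per_node != 0:
        raise ValueError("world_size must be a multiple of gpus_per_node")
    num_nodes = world_size // gpus_per_node

    # One group per node: ranks that would share sequence-parallel attention.
    sp_groups = [
        list(range(node * gpus_per_node, (node + 1) * gpus_per_node))
        for node in range(num_nodes)
    ]
    # One group per local GPU index: ranks that would shard parameters across nodes.
    fsdp_groups = [
        list(range(local, world_size, gpus_per_node))
        for local in range(gpus_per_node)
    ]
    return sp_groups, fsdp_groups


if __name__ == "__main__":
    sp_groups, fsdp_groups = build_2d_groups(world_size=16, gpus_per_node=8)
    print("intra-node (USP) groups:", sp_groups)
    print("cross-node (FSDP) groups:", fsdp_groups)
```

Each rank list could then be turned into a communicator with `torch.distributed.new_group(ranks=...)`. How those groups should be handed to the USP attention layers and to FSDP is exactly what this issue is asking about.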

Labels: documentation
