selective activation checkpoint with cpu offload #130

@zhangvia

Description

I'm using selective activation checkpointing to train a video generation model, which is quite different from training large language models: a video generation model has fewer parameters but far more intermediate activations.

I'm training the HunyuanVideo model, which you can find in the diffusers library. When I try to use selective activation checkpointing together with CPU offload to reduce GPU memory usage, I still get an OOM inside the CPU offload hook (save_on_cpu). The traceback shows that torch tries to create a head_num * seqlen * seqlen tensor in the save_on_cpu hook right before F.scaled_dot_product_attention, which is huge and would cost 69 GB of VRAM. That seems abnormal: since I'm already using F.scaled_dot_product_attention to compute attention, there shouldn't be any seqlen * seqlen tensors at all.
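For reference, here is roughly how I'm combining the two. This is a minimal sketch, not my exact training code: `run_block`, `policy_fn`, and the specific SDPA op picked out in the policy are illustrative.

```python
import torch
from torch.utils.checkpoint import (
    checkpoint,
    create_selective_checkpoint_contexts,
    CheckpointPolicy,
)

# Illustrative policy: keep the SDPA output, recompute everything else.
# The exact aten op saved here is an example, not necessarily what my script uses.
def policy_fn(ctx, op, *args, **kwargs):
    if op == torch.ops.aten._scaled_dot_product_flash_attention.default:
        return CheckpointPolicy.MUST_SAVE
    return CheckpointPolicy.PREFER_RECOMPUTE

def run_block(block, hidden_states):
    # Selective checkpointing decides which activations get saved;
    # save_on_cpu then moves whatever *is* saved to pinned host memory
    # instead of keeping it on the GPU.
    with torch.autograd.graph.save_on_cpu(pin_memory=True):
        return checkpoint(
            block,
            hidden_states,
            use_reentrant=False,
            context_fn=lambda: create_selective_checkpoint_contexts(policy_fn),
        )
```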

Can anyone explain what is happening, and how can I use CPU offload to leverage my large host memory?
