
Commit 8223242

add wan2.1 notebook
1 parent 35ceb0a commit 8223242

File tree: 4 files changed, +1405 −0 lines changed
notebooks/wan2.1-text-to-video/README.md
Lines changed: 35 additions & 0 deletions
# Text to Video generation with Wan2.1 and OpenVINO

Wan2.1 is a comprehensive and open suite of video foundation models that pushes the boundaries of video generation.

Built upon the mainstream diffusion transformer paradigm, Wan2.1 achieves significant advancements in generative capabilities through a series of innovations, including a novel spatio-temporal variational autoencoder (VAE), scalable pre-training strategies, large-scale data construction, and automated evaluation metrics. These contributions collectively enhance the model's performance and versatility.

You can find more details about the model in the [model card](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B) and the [original repository](https://github.com/Wan-Video/Wan2.1).
In this tutorial, we consider how to convert, optimize, and run the Wan2.1 model using OpenVINO.
Additionally, to speed up inference, we will apply the [CausVid](https://causvid.github.io/) distillation approach using LoRA, as sketched below.
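As a rough sketch of the starting point (not the notebook's exact code), the original PyTorch pipeline can be loaded through diffusers and a CausVid LoRA applied on top of it; the LoRA checkpoint path below is a placeholder, not a real file:

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline

model_id = "Wan-AI/Wan2.1-T2V-1.3B-Diffusers"

# The Wan VAE is typically kept in float32 for numerical stability.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.float32)

# Apply a CausVid distillation LoRA. The path below is a placeholder;
# substitute an actual CausVid LoRA checkpoint for Wan2.1-T2V-1.3B.
pipe.load_lora_weights("path/to/causvid_lora.safetensors", adapter_name="causvid")
pipe.fuse_lora()
```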
![CausVid method overview](https://causvid.github.io/images/methods.jpg)

Current video diffusion models achieve impressive generation quality but struggle in interactive applications due to bidirectional attention dependencies: generating a single frame requires the model to process the entire sequence, including future frames. CausVid addresses this limitation by adapting a pretrained bidirectional diffusion transformer into an autoregressive transformer that generates frames on the fly. To further reduce latency, the authors extend distribution matching distillation (DMD) to videos, distilling a 50-step diffusion model into a 4-step generator.
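For reference, the core DMD objective (as described in the original DMD work, sketched here rather than taken from this notebook) matches the student's output distribution to the teacher's; its gradient is typically written as

$$\nabla_\theta \mathcal{L}_\mathrm{DMD} \approx \mathbb{E}_{z,t}\left[\left(s_\mathrm{fake}(x_t, t) - s_\mathrm{real}(x_t, t)\right)\frac{\partial G_\theta(z)}{\partial \theta}\right],$$

where $G_\theta$ is the few-step student, $x_t$ is a noised version of the student sample $G_\theta(z)$, $s_\mathrm{real}$ is the teacher's score function, and $s_\mathrm{fake}$ is the score of an auxiliary model tracking the student's output distribution.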
The method distills a many-step, bidirectional video diffusion model into a 4-step, causal generator. The training process consists of two stages (see the sketch after this list):

1. Student Initialization: the causal student is initialized by pretraining it on a small set of ODE solution pairs generated by the bidirectional teacher. This step helps stabilize the subsequent distillation training.
2. Asymmetric Distillation: using the bidirectional teacher model, the causal student generator is trained through a distribution matching distillation loss.
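A minimal, schematic sketch of the two stages follows; every name in it (`teacher`, `student`, `ode_pairs`, `dmd_loss`, and so on) is a hypothetical placeholder rather than an API from this notebook or the CausVid codebase:

```python
# Stage 1: student initialization on teacher ODE solution pairs.
for noise, teacher_video in ode_pairs:  # pairs precomputed with the bidirectional teacher
    pred = student(noise)  # causal (autoregressive) student
    loss = regression_loss(pred, teacher_video)  # match the teacher's ODE solution
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Stage 2: asymmetric distillation with a distribution matching loss.
for noise in noise_loader:
    fake_video = student(noise)  # few-step causal generation
    loss = dmd_loss(fake_video, teacher)  # bidirectional teacher guides the student
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```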
More details about CausVid can be found in the [paper](https://arxiv.org/abs/2412.07772), [original repository](https://github.com/tianweiy/CausVi), and [project page](https://causvid.github.io/).
## Notebook contents

This tutorial consists of the following steps:

- Prerequisites
- Convert and Optimize model (a conversion sketch follows this list)
- Run inference pipeline
- Interactive inference
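The conversion step roughly follows the standard OpenVINO flow; a simplified sketch is below, where `pipe.transformer` and `example_input` stand in for the actual submodule and dummy inputs prepared in the notebook:

```python
import openvino as ov
import nncf

# Convert a PyTorch submodule (e.g. the diffusion transformer) to OpenVINO IR.
# `example_input` must be dummy tensors matching the module's forward signature.
ov_transformer = ov.convert_model(pipe.transformer, example_input=example_input)

# Optional: compress weights (INT8 by default) to reduce size and speed up inference.
ov_transformer = nncf.compress_weights(ov_transformer)
ov.save_model(ov_transformer, "wan_transformer.xml")

# Compile the converted model for a target device.
core = ov.Core()
compiled_transformer = core.compile_model("wan_transformer.xml", "CPU")
```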
## Installation instructions

This is a self-contained example that relies solely on its own code.<br/>
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the [Installation Guide](../../README.md).

<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/wan2.1-text-to-video/README.md" />
Lines changed: 59 additions & 0 deletions
"""Gradio demo helper for the Wan2.1 text-to-video OpenVINO notebook."""

import gradio as gr
import torch
from diffusers.utils import export_to_video
import numpy as np

# Upper bound for the seed slider (largest 32-bit signed integer).
MAX_SEED = np.iinfo(np.int32).max


def make_demo(pipeline):
    def generate_video(prompt, negative_prompt="", guidance_scale=1.0, seed=42, progress=gr.Progress(track_tqdm=True)):
        # Run the 4-step (CausVid-distilled) pipeline; frames[0] is the first
        # (and only) generated video in the batch.
        output = pipeline(
            prompt=prompt,
            negative_prompt=negative_prompt,
            height=480,
            width=832,
            num_frames=20,
            guidance_scale=guidance_scale,
            num_inference_steps=4,
            generator=torch.Generator().manual_seed(seed),
        ).frames[0]

        # Save the generated frames as an MP4 file and hand the path back to Gradio.
        video_path = "output.mp4"
        export_to_video(output, video_path, fps=10)
        return video_path

    iface = gr.Interface(
        fn=generate_video,
        inputs=[
            gr.Textbox(label="Prompt", placeholder="Enter your video prompt here"),
            gr.Textbox(label="Negative Prompt", placeholder="Optional negative prompt", value=""),
            gr.Slider(
                label="Guidance scale",
                minimum=0.0,
                maximum=20.0,
                step=0.1,
                value=1.0,
            ),
            gr.Slider(
                label="Seed",
                minimum=0,
                maximum=MAX_SEED,
                step=1,
                value=42,
            ),
        ],
        outputs=gr.Video(label="Generated Video"),
        title="Wan2.1-T2V-1.3B OpenVINO Video Generator",
        flagging_mode="never",
        examples=[
            ["a penguin playfully dancing in the snow, Antarctica", "", 1.0, 42],
            [
                "A cat walks on the grass, realistic",
                "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards",
                2.5,
                678,
            ],
        ],
    )
    return iface
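Assuming a `pipeline` object with a diffusers-style call signature (for example, the OpenVINO-backed pipeline built earlier in the notebook), the helper is used like this:

```python
demo = make_demo(pipeline)
demo.launch()
```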
