Description
I observed that after generating many blocks of frames without action conditioning (i.e. mouse input = "u", keyboard input = "q"), the frames still show the same scene but the quality degrades over time (e.g. distortion, visual artifacts).
This can be reproduced with both inference.py and inference_streaming.py using this branch, which contains a small change to generate mouse and keyboard condition tensors with all zeros (i.e. no mouse or keyboard input).
For inference.py, I used this command:
python inference.py \
--config_path configs/inference_yaml/inference_universal.yaml \
--checkpoint_path Matrix-Game-2.0/base_distilled_model/base_distill.safetensors \
--img_path demo_images/universal/0011.png \
--output_folder outputs \
--num_output_frames 150 \
--seed 42 \
--pretrained_model_path Matrix-Game-2.0
This is the resulting video:
repro_static_quality_degradation_inference.mp4
For inference_streaming.py, I used this command:
python inference_streaming.py \
--config_path configs/inference_yaml/inference_universal.yaml \
--checkpoint_path Matrix-Game-2.0/base_distilled_model/base_distill.safetensors \
--output_folder outputs \
--seed 42 \
--pretrained_model_path Matrix-Game-2.0
I also used demo_images/universal/0011.png as the input image and repeatedly entered "u" for mouse input and "q" for keyboard input.
This is the resulting video:
repro_static_quality_degradation_inference_streaming.mp4
Is this known or expected behavior? If so, does anyone have thoughts on the best way to avoid the quality degradation? As-is, any pause in actions that results in a static scene causes the quality to degrade. A workaround might be to skip frame generation when there are no actions, but the downside is that you would lose the ability to get frames of the same scene with minor animation details, e.g. changes in the water in the demo image used above.
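To make the workaround concrete, here is a minimal sketch of what skipping generation on idle input could look like. It is not code from the repository: `run_with_idle_skip`, `generate_block`, and `block_len` are hypothetical names, and the only details taken from the report above are the "u" and "q" idle inputs. The generator is passed in as a plain callable so the sketch makes no assumptions about the real pipeline API.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")

# Idle inputs as described in this report: "u" = no mouse input, "q" = no keyboard input.
IDLE_MOUSE = "u"
IDLE_KEYBOARD = "q"


def is_idle(mouse: str, keyboard: str) -> bool:
    """Return True when neither mouse nor keyboard input is given."""
    return mouse == IDLE_MOUSE and keyboard == IDLE_KEYBOARD


def run_with_idle_skip(
    actions: Sequence[Tuple[str, str]],
    first_frame: Frame,
    generate_block: Callable[[Frame, str, str], List[Frame]],  # hypothetical generator
    block_len: int = 1,  # hypothetical: frames to hold per skipped block
) -> List[Frame]:
    """Generate frames block by block, holding the last frame on idle input."""
    frames: List[Frame] = [first_frame]
    for mouse, keyboard in actions:
        if is_idle(mouse, keyboard):
            # Skip the model call and repeat the last frame instead,
            # so idle blocks cannot accumulate generation error.
            frames.extend([frames[-1]] * block_len)
        else:
            # Only non-idle actions reach the (hypothetical) generator.
            frames.extend(generate_block(frames[-1], mouse, keyboard))
    return frames
```

With this approach the scene stays frozen during idle periods, which is exactly the downside mentioned above: the water would stop moving until the next non-idle action.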