[WIP] Add Deepcache #705
Conversation
I'm very interested in this PR; I wish I had time to test DeepCache in ComfyUI and compare the results with your PR.
@FSSRepo Thanks for your interest! Here's a comparison with ComfyUI, using the same model and parameters as above.
The results are so much better in ComfyUI compared to what I'm getting. My implementation doesn't seem to work without CFG, which is really odd since DeepCache is supposed to be CFG-agnostic. I'm definitely doing something wrong. I'd love to continue working on this, but I've run out of ideas. It would be great if you could take a look and share any feedback!
My guess is that it's sharing the same cache between the unconditional and conditional passes, and it's probably not supposed to.
I tried to create a separate cache for the conditional and unconditional passes, but it broke things even more. In any case, I think we should fix things with CFG first before addressing the CFG-free issue; I don't think those are related.
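For reference, a minimal sketch of what a per-pass cache could look like (the struct and all names here are hypothetical, not taken from this PR):

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Keep one cached feature buffer per CFG pass, so the unconditional and
// conditional forward passes never overwrite each other's deep features.
enum CfgPass { PASS_UNCOND = 0, PASS_COND = 1 };

struct DeepCacheState {
    std::array<std::vector<float>, 2> cached;   // deep features, one slot per pass
    std::array<bool, 2> valid{{false, false}};

    void store(CfgPass pass, const float* data, std::size_t n) {
        cached[pass].assign(data, data + n);
        valid[pass] = true;
    }

    // Returns nullptr until the first full (non-skipped) step has filled
    // the cache for that pass.
    const std::vector<float>* load(CfgPass pass) const {
        return valid[pass] ? &cached[pass] : nullptr;
    }
};
```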
What is CFG? According to my understanding, it's when we pass the negative prompt? Or is it a DeepCache configuration?
CFG means Classifier-Free Guidance. It's basically a way to change how much effect the prompt has on conditional generation, by linearly extrapolating from the conditioned prediction away from the prediction without text conditioning (or with a negative prompt). So it needs 2 forward passes at each step: one with the positive prompt, and one with an empty/negative prompt.
It's just the `--cfg-scale` parameter.

I did try separating the cache between the conditional and unconditional passes, but that didn't help; in fact, it broke the case where we run with CFG > 1. From my understanding, DeepCache operates at a higher level and shouldn't be affected by the conditional/unconditional distinction. Something is seriously wrong here, but I can't quite put my finger on it.

EDIT: I may have wrongly assumed that you're familiar with the concept of CFG, but @stduhpf already explained it well. Basically, during inference, you're doing:

```
final_prediction = prediction_unconditional + w * (prediction_conditional - prediction_unconditional)
```

When w = 1, you're effectively running only the conditional pass. That's useful because it means you can double your inference speed, and distilled models support this approach. However, you do trade off some prompt fidelity when doing so.

I recently read a paper that concluded CFG might actually be useless: it only appears to work because we end up using twice the compute.
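To make the formula concrete, here is that extrapolation written out as a standalone C++ sketch (not the actual sd.cpp code):

```cpp
#include <cstddef>

// Classifier-free guidance: linearly extrapolate from the unconditional
// prediction toward (and past, for w > 1) the conditional prediction.
// With w = 1 this reduces to the conditional prediction alone.
void apply_cfg(const float* pred_uncond, const float* pred_cond,
               float* out, std::size_t n, float w) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = pred_uncond[i] + w * (pred_cond[i] - pred_uncond[i]);
    }
}
```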
This PR is currently in progress and far from complete. It adds DeepCache, a method for U-Net architectures that skips the deeper blocks on some steps and reuses their cached output from an earlier step, in order to save compute time.
I have been inspired by this ComfyUI implementation.
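Roughly, the idea looks like this. Below is a simplified sketch with identity placeholders standing in for the real block groups (all names are illustrative, not the actual graph code in this PR):

```cpp
#include <cstddef>
#include <vector>

// Simplified DeepCache-style U-Net forward pass. The "blocks" here are
// identity placeholders for what would be ggml graph segments.
using Tensor = std::vector<float>;

static Tensor shallow_down(const Tensor& x) { return x; }           // cheap, runs every step
static Tensor deep_blocks(const Tensor& h)  { return h; }           // expensive, cached
static Tensor shallow_up(const Tensor& skip, const Tensor& deep) {  // cheap, runs every step
    Tensor out = deep;
    for (std::size_t i = 0; i < out.size() && i < skip.size(); ++i)
        out[i] += skip[i];                                          // stand-in for skip connections
    return out;
}

Tensor unet_forward(const Tensor& x, bool reuse_cache, Tensor& cache) {
    Tensor h = shallow_down(x);    // shallow features change a lot per step
    if (!reuse_cache) {
        cache = deep_blocks(h);    // full step: refresh the deep features
    }
    return shallow_up(h, cache);   // skip step: splice in cached deep features
}
```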
It adds a `--deepcache interval,depth,start,stop` argument. Currently, it's not working well and I can't figure out why or how to achieve better results. I have been debugging the cache step and counter logic for a week, but the issue seems to be more subtle than that.
Command example:
./build/bin/sd -m ../models/realisticVisionV60B1_v51HyperVAE.safetensors -v -p "cute cat" --cfg-scale 2.5 --steps 8 --deepcache 2,3,0,8
Results with `--deepcache 2,3,0,8`:

Results with `--deepcache 3,3,0,8`:
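For context, here is roughly the step gating those four parameters suggest, under my reading of the description above (a sketch, not the actual code in this PR):

```cpp
// Gating for --deepcache interval,depth,start,stop: run the full U-Net
// and refresh the cache on the first step of each interval inside
// [start, stop); reuse the cached depth-level features on the others.
bool should_reuse_cache(int step, int interval, int start, int stop) {
    if (step < start || step >= stop) {
        return false;  // outside the caching window: always run the full model
    }
    return (step - start) % interval != 0;
}
```

Under this reading, `--deepcache 2,3,0,8` would run the full model on steps 0, 2, 4 and 6, and reuse the cached depth-3 features on steps 1, 3, 5 and 7.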
If someone could help by taking a look or continuing the work, I would be grateful. Otherwise, I don't think I'll spend more time on it.