
Allow models to run without all text encoder(s) #645


Open
wants to merge 3 commits into master

Conversation

stduhpf
Contributor

@stduhpf stduhpf commented Apr 2, 2025

For now, only Flux and SD3.x are supported.

Instead of crashing when text encoders are missing, this just prints a warning and proceeds without them.

TODO:

  • Re-enable GPU prompt processing if T5 isn't actually used
  • Support UNet models (SDXL?)

Comparisons:

  • Using clip_l/clip_g q8_0 and t5xxl q4_k.
  • 5 steps
  • 1024x1024
  • default seed
  • cfg 1
  • default guidance
  • tiled vae
  • prompt: 'Illustration of a cute cat holding a sign saying "You do not need all text encoders!"'
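For reference, a run like the ones above could look roughly like this with sd.cpp's CLI, simply omitting `--t5xxl` (paths are placeholders; flag names are as I recall them from stable-diffusion.cpp and may differ slightly):

```shell
# Hypothetical invocation: only clip_l and clip_g supplied; with this PR
# the missing t5xxl triggers a warning instead of an abort.
./sd --diffusion-model sd3.5_large_turbo-iq4_nl.gguf \
     --clip_l clip_l-q8_0.gguf \
     --clip_g clip_g-q8_0.gguf \
     --vae-tiling \
     --cfg-scale 1.0 --steps 5 -H 1024 -W 1024 \
     -p 'Illustration of a cute cat holding a sign saying "You do not need all text encoders!"'
```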

SD3.5 Large Turbo (iq4_nl):

With t5_xxl:

|                | with clip_l | without clip_l |
|----------------|-------------|----------------|
| with clip_g    | 3.5lt-all   | 3.5lt-nocl     |
| without clip_g | 3.5lt-nocg  | 3.5lt-t5       |

Without t5_xxl:

|                | with clip_l | without clip_l |
|----------------|-------------|----------------|
| with clip_g    | 3.5lt-not5  | 3.5lt-cg       |
| without clip_g | 3.5lt-cl    | 3.5lt-nop      |

Flux Schnell (iq4_nl imatrix):

|                | with clip_l | without clip_l |
|----------------|-------------|----------------|
| with T5_xxl    | fs-all      | fs-noclip      |
| without T5_xxl | fs-not5     | fs-nop         |

@rmatif

rmatif commented Apr 5, 2025

Thanks to this, one can now run Flux on an 8GB Android phone.


@Green-Sky
Contributor

@rmatif is your comment about this PR specifically? It kind of sounds unrelated.

BTW, did you try one of the Flux 8B "lite" prunes?
https://huggingface.co/Green-Sky/flux.1-lite-8B-GGUF/tree/main/lora-experiments
Those have the Hyper-SD LoRA merged in, for a lower step count.

@stduhpf
Contributor Author

stduhpf commented Apr 6, 2025

@Green-Sky I think @rmatif meant that with this PR it's possible to drop T5, which makes Flux fit in only 8GB of system memory.

@rmatif

rmatif commented Apr 6, 2025

> @Green-Sky I think @rmatif meant that with this PR it's possible to drop T5, which makes Flux fit in only 8GB of system memory.

This is exactly what I meant, sorry if I wasn't clear. With this PR, we can drop the heavy T5, so we can squeeze Flux into just an 8GB phone.

@Green-Sky I just tested Flux.1-lite, and the q4_k version can now also fit into those kinds of devices, although you can't run inference at resolutions larger than 512x512 due to the compute buffer size. I bet q3_k will do just fine.

@stduhpf stduhpf changed the title from "Allow models to run with without all text encoder(s)" to "Allow models to run without all text encoder(s)" on Apr 25, 2025