
Why not using "uv" instead of conda? #744

@pduy


Hello everyone,

I recently stumbled across this repo while trying to tune a diffusion model. I'm not sure whether this is still relevant, as there seem to have been no commits for the last 2 years, but the environment configured here is very outdated. Updating it is not straightforward, because conda struggles with dependency resolution and the Anaconda repository usually lags behind PyPI (the mainstream index used by pip) in terms of package versions.

I have tried to solve this myself in a few different ways and will share my experience here.

Problem

If you take the environment here out of the box and install it via conda (conda env create -f environment.yaml), you will not be able to load any model from OpenAI. For example, this simple script tries to load an OpenAI CLIP model and its processor:

from transformers import CLIPModel, CLIPProcessor


def main():
    # Both calls fetch the pretrained weights/config by model ID.
    model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

    print("model", model)
    print("processor", processor)


if __name__ == "__main__":
    main()

It reports the error already described here. The root cause is simply that the transformers package is outdated, which breaks the connection.
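For anyone who wants to confirm which versions the conda environment actually resolved, here is a minimal sketch using only the standard library. The helper name `installed_versions` is my own, not something from this repo, and `importlib.metadata` needs Python 3.8 or newer.

```python
from importlib import metadata
from typing import Dict, List, Optional


def installed_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The packages discussed in this issue; adjust the list as needed.
    for name, version in installed_versions(["transformers", "torch", "torchvision", "numpy"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Running it inside the activated environment should make it obvious how old the resolved transformers release is.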

Why can't we just put a newer version in environment.yaml?

Well, if it were that simple, there would be no frustration and there would not be dozens of tools for resolving dependencies. As a simple example, transformers depends on torch, and torchvision also depends on torch. torch in turn depends on numpy and on the Python version. With Python 3.8 configured here, we can at best get versions that are two years old.

We could resolve all of this manually, but there are better tools for it.
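To illustrate how an old Python pin caps everything above it, here is a toy sketch. The package names, release numbers, and Python floors in `TOY_INDEX` are all made up for illustration and do not correspond to real torch or transformers releases. The resolver is deliberately naive (lower bounds only, newest-first), nothing like what conda or uv actually do.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, int]
# (release version, minimum Python, {dependency name: minimum dependency version})
Release = Tuple[Version, Version, Dict[str, Version]]

# Hypothetical index: every number below is invented for this example.
TOY_INDEX: Dict[str, List[Release]] = {
    "tensor-lib": [((1, 0), (3, 7), {}), ((2, 0), (3, 9), {})],
    "models-lib": [
        ((1, 0), (3, 7), {"tensor-lib": (1, 0)}),
        ((2, 0), (3, 7), {"tensor-lib": (2, 0)}),
    ],
}


def newest_installable(package: str, python: Version) -> Optional[Version]:
    """Return the newest release whose whole dependency chain supports `python`."""
    releases = sorted(TOY_INDEX[package], key=lambda release: release[0], reverse=True)
    for version, min_python, deps in releases:
        if python < min_python:
            continue  # this release itself refuses the interpreter
        # Every dependency must have an installable release that is new enough.
        satisfied = True
        for dep, min_dep in deps.items():
            best = newest_installable(dep, python)
            if best is None or best < min_dep:
                satisfied = False
                break
        if satisfied:
            return version
    return None


if __name__ == "__main__":
    print("Python 3.8:", newest_installable("models-lib", (3, 8)))
    print("Python 3.9:", newest_installable("models-lib", (3, 9)))
```

In this toy index, models-lib 2.0 itself accepts Python 3.8, but it needs tensor-lib 2.0, which does not, so the old interpreter holds back a package that never mentions it directly.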

Updating packages

I have tried two main ways of doing this.

  • Using conda, but starting afresh in the hope of getting the newest versions. This means I create a new conda environment and conda install each of the packages in environment.yaml one by one, hoping to get the newest version possible. Short answer: newer versions are installed, but they are still heavily outdated because of the inter-dependencies between packages. This can be solved by cherry-picking versions, but that is a lot of manual work.

  • Converting everything to a more mainstream Python package manager. Here I chose uv (but poetry should work just fine too). This works perfectly: the script above runs again thanks to the updated transformers. The only catch is that, to structure a project using uv or poetry, we can't just throw everything in the home directory. There needs to be some sort of standard Python project structure. An example PR is here.
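As a rough idea of what the "standard project structure" boils down to, uv and poetry both read project metadata from a pyproject.toml file. The sketch below renders a minimal one with a `[project]` table. The project name, version, and Python floor are placeholders I picked for illustration, not what the example PR uses, and the dependency list is only the packages mentioned in this issue.

```python
from typing import List


def render_pyproject(name: str, requires_python: str, dependencies: List[str]) -> str:
    """Render a minimal pyproject.toml ([project] table only) as text."""
    lines = [
        "[project]",
        f'name = "{name}"',
        'version = "0.1.0"',  # placeholder version
        f'requires-python = "{requires_python}"',
        "dependencies = [",
    ]
    # One unpinned requirement per line; the resolver picks compatible versions.
    lines += [f'    "{dep}",' for dep in dependencies]
    lines.append("]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical project name and Python floor, chosen only for this example.
    print(render_pyproject("diffusion-finetune", ">=3.10", ["torch", "torchvision", "transformers"]))
```

With a file like this at the project root, uv sync should resolve and install a mutually compatible set of versions, instead of us picking them by hand.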

Please let me know what you think.
