ezyang
Contributor

@ezyang ezyang commented Sep 22, 2025

This is required by the new VLM code:

[rank0]:Traceback (most recent call last):
[rank0]:  File "/home/ezyang/.local/share/uv/python/cpython-3.10.18-linux-x86_64-gnu/lib/python3.10/runpy.py", line 187, in _run_module_as_main
[rank0]:    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
[rank0]:  File "/home/ezyang/.local/share/uv/python/cpython-3.10.18-linux-x86_64-gnu/lib/python3.10/runpy.py", line 110, in _get_module_details
[rank0]:    __import__(pkg_name)
[rank0]:  File "/data/users/ezyang/b/torchtitan/torchtitan/__init__.py", line 12, in <module>
[rank0]:    import torchtitan.experiments  # noqa: F401
[rank0]:  File "/data/users/ezyang/b/torchtitan/torchtitan/experiments/__init__.py", line 10, in <module>
[rank0]:    import torchtitan.experiments.vlm  # noqa: F401
[rank0]:  File "/data/users/ezyang/b/torchtitan/torchtitan/experiments/vlm/__init__.py", line 17, in <module>
[rank0]:    from .datasets.mm_datasets import build_mm_dataloader
[rank0]:  File "/data/users/ezyang/b/torchtitan/torchtitan/experiments/vlm/datasets/mm_datasets.py", line 29, in <module>
[rank0]:    from .mm_collator_nld import MultiModalCollatorNLD
[rank0]:  File "/data/users/ezyang/b/torchtitan/torchtitan/experiments/vlm/datasets/mm_collator_nld.py", line 17, in <module>
[rank0]:    from .utils.image import (
[rank0]:  File "/data/users/ezyang/b/torchtitan/torchtitan/experiments/vlm/datasets/utils/image.py", line 12, in <module>
[rank0]:    import einops as E
[rank0]:ModuleNotFoundError: No module named 'einops'

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

[ghstack-poisoned]
@ezyang
Contributor Author

ezyang commented Sep 22, 2025

Stack from ghstack (oldest at bottom):

ezyang added a commit that referenced this pull request Sep 22, 2025
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
ghstack-source-id: 42575a7
ghstack-comment-id: 3319174692
Pull-Request: #1734
@meta-cla meta-cla bot added the CLA Signed label Sep 22, 2025
@wwwjn
Contributor

wwwjn commented Sep 22, 2025

Models under the experiments folder each have their own requirements file (e.g., for VLM it's here: https://github.com/pytorch/torchtitan/blob/refs/heads/main/torchtitan/experiments/vlm/requirements.txt). We don't want the dependencies of models under the experiments folder to become part of the main requirements.txt, since that might break main-branch development.

@ruisizhang123
Contributor

FYI, I tried to run experiments/simple_fsdp and hit a missing-package error for PIL in experiments/vlm here. However, the PIL (Pillow) package is missing from VLM's requirements.txt.
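
One way to catch this kind of gap before a run is to check that each third-party module an experiment imports can actually be found. The following is only a minimal sketch: the helper name `missing_modules` and the `VLM_IMPORTS` list are hypothetical, and the list just holds the two modules mentioned in this thread (`einops` and `PIL`), not a full audit of the VLM code.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on this interpreter."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list: only the two modules named in this thread.
VLM_IMPORTS = ["einops", "PIL"]

if __name__ == "__main__":
    missing = missing_modules(VLM_IMPORTS)
    if missing:
        print("missing modules:", ", ".join(missing))
    else:
        print("all listed modules are importable")
```

Note that the import name and the package name can differ (`PIL` is installed by the Pillow package), so a check like this reports import names, not what to put in requirements.txt.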

@tianyu-l
Contributor

@wwwjn @fegin as we add more and more experiments, we probably shouldn't import everything or register every TrainSpec eagerly. We may need to import them optionally with importlib.import_module, like this: https://github.com/pytorch/torchtitan/blob/main/torchtitan/train.py#L80

But we need to make sure it works with existing JobConfig extension, which also uses this field.

We may have to extend config/manager.py to handle TrainSpec registration.
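
To make the idea concrete, here is a rough sketch of importing an experiment only when it is requested, so that a missing optional dependency in one experiment cannot break `import torchtitan` for everyone else. This is not the change that landed in torchtitan: `load_experiment` and the `_EXPERIMENT_MODULES` table are hypothetical names, and the module paths are assumed from the traceback and comments above.

```python
import importlib

# Hypothetical table mapping an experiment name to the module that registers it.
# The paths are assumptions based on the paths quoted in this thread.
_EXPERIMENT_MODULES = {
    "vlm": "torchtitan.experiments.vlm",
    "simple_fsdp": "torchtitan.experiments.simple_fsdp",
}


def load_experiment(name, modules=_EXPERIMENT_MODULES):
    """Import one experiment module on demand and return it."""
    try:
        module_path = modules[name]
    except KeyError:
        raise ValueError(f"unknown experiment: {name!r}") from None
    try:
        # Importing the module is what would trigger its registration code.
        return importlib.import_module(module_path)
    except ModuleNotFoundError as exc:
        # exc.name is the module that could not be found, e.g. 'einops'.
        raise ModuleNotFoundError(
            f"experiment {name!r} could not import {exc.name!r}; "
            "install that experiment's own requirements.txt"
        ) from exc
```

With something along these lines, only the experiment that is actually selected gets imported, and a missing package produces an error that points at the per-experiment requirements file. How the experiment name reaches this function (for example through the existing JobConfig extension field) is left open here.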

@tianyu-l
Contributor

closing as this is fixed in #1740

@tianyu-l tianyu-l closed this Sep 23, 2025
Labels
CLA Signed, high priority
4 participants