- Load a `VisionEncoderDecoderModel` model from transformers.
+ Load an Image-to-Text model from transformers.
:param model_name_or_path: Directory of a saved model or the name of a public model.
- Currently, only `VisionEncoderDecoderModel` models are supported.
To find these models:
1. Visit [Hugging Face image to text models](https://huggingface.co/models?pipeline_tag=image-to-text).
2. Open the model you want to check.
3. On the model page, go to the "Files and Versions" tab.
- 4. Open the `config.json` file and make sure the `architectures` field contains `VisionEncoderDecoderModel`.
+ 4. Open the `config.json` file and make sure the `architectures` field contains `VisionEncoderDecoderModel`, `BlipForConditionalGeneration`, or `Blip2ForConditionalGeneration`.
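
The architecture check in step 4 can also be scripted. Below is a minimal sketch using `transformers.AutoConfig`; the `is_supported` helper and the example model name are illustrative only, not part of this component:

```python
from transformers import AutoConfig

# Architectures this component supports (from step 4 above).
SUPPORTED_ARCHITECTURES = {
    "VisionEncoderDecoderModel",
    "BlipForConditionalGeneration",
    "Blip2ForConditionalGeneration",
}

def is_supported(model_name_or_path: str) -> bool:
    """Check the model's config.json for a supported architecture."""
    config = AutoConfig.from_pretrained(model_name_or_path)
    # `architectures` mirrors the field in config.json; it can be None.
    architectures = getattr(config, "architectures", None) or []
    return any(arch in SUPPORTED_ARCHITECTURES for arch in architectures)

# Example Hub model known to use VisionEncoderDecoderModel:
print(is_supported("nlpconnect/vit-gpt2-image-captioning"))  # True
```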
:param model_version: The version of the model to use from the Hugging Face model hub. This can be the tag name, branch name, or commit hash.
:param generation_kwargs: Dictionary containing arguments for the `generate()` method of the Hugging Face model.
See [generate()](https://huggingface.co/docs/transformers/en/main_classes/text_generation#transformers.GenerationMixin.generate) in Hugging Face documentation.
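
For context, here is a minimal sketch of what loading and captioning look like when done directly with transformers. The model name, image URL, and generation arguments are examples only; keyword arguments like those shown are what `generation_kwargs` forwards to `generate()`:

```python
import requests
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

model_name = "nlpconnect/vit-gpt2-image-captioning"  # example Hub model

# `model_version` corresponds to the Hub `revision` (tag, branch, or commit hash).
model = VisionEncoderDecoderModel.from_pretrained(model_name, revision="main")
processor = ViTImageProcessor.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Placeholder URL; any RGB image works.
image = Image.open(requests.get("https://example.com/photo.jpg", stream=True).raw)
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# These keyword arguments are the kind passed via `generation_kwargs`.
output_ids = model.generate(pixel_values, max_new_tokens=30, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```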