Interpolate positional embeddings for input images with larger sizes by EeroHeikkinen · Pull Request #416 · mlfoundations/open_clip

EeroHeikkinen · 2023-02-11T16:22:49Z

This makes it possible to use input images with resolutions larger than the trained resolution, by interpolating the positional embeddings.

Simple example:

import torch
from PIL import Image
import open_clip

model, _, _ = open_clip.create_model_and_transforms('xlm-roberta-base-ViT-B-32', pretrained='laion5b_s13b_b90k')
# Build the preprocessing pipeline at 448 px, larger than the resolution the model was trained at.
preprocess = open_clip.image_transform(
    448,
    is_train=False,
    mean=None,
    std=None,
)
tokenizer = open_clip.get_tokenizer('xlm-roberta-base-ViT-B-32')

image = preprocess(Image.open("file.jpg")).unsqueeze(0)
text = tokenizer(["a penguin", "vanilla ice cream", "a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print("Label probs:", text_probs)
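For readers unfamiliar with the technique, here is a minimal sketch of how positional embeddings can be interpolated to a larger patch grid. It is not the code from this PR: the helper name `interpolate_pos_embed`, the choice of bicubic interpolation, and the assumed layout (one class-token row followed by a square grid of patch rows) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def interpolate_pos_embed(pos_embed: torch.Tensor, new_grid: int) -> torch.Tensor:
    """Resize a (1 + g*g, dim) positional embedding to (1 + new_grid**2, dim).

    Hypothetical helper: assumes row 0 is the class token and the
    remaining rows form a square g x g patch grid in row-major order.
    """
    cls_tok, patch_tok = pos_embed[:1], pos_embed[1:]
    old_grid = int(round(patch_tok.shape[0] ** 0.5))
    assert old_grid * old_grid == patch_tok.shape[0], "expected a square patch grid"
    dim = patch_tok.shape[1]

    # (g*g, dim) -> (1, dim, g, g), the layout F.interpolate expects for 2D resizing.
    grid = patch_tok.reshape(old_grid, old_grid, dim).permute(2, 0, 1).unsqueeze(0)
    # Bicubic is an assumption here; bilinear would also work.
    grid = F.interpolate(grid, size=(new_grid, new_grid), mode="bicubic", align_corners=False)
    # (1, dim, G, G) -> (G*G, dim)
    patch_tok = grid.squeeze(0).permute(1, 2, 0).reshape(new_grid * new_grid, dim)

    # The class-token embedding is kept unchanged.
    return torch.cat([cls_tok, patch_tok], dim=0)


# A patch-32 model at 224 px has a 7x7 grid; a 448 px input needs 14x14.
old = torch.randn(1 + 7 * 7, 768)
new = interpolate_pos_embed(old, 448 // 32)
print(new.shape)  # torch.Size([197, 768])
```

Only the patch-position rows are resized; the class-token row is carried over as-is, so the result has 14 * 14 + 1 = 197 rows.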

EeroHeikkinen added 2 commits February 11, 2023 18:19

Interpolate positional embeddings for larger input

f5a4730

Move interpolated pos encoding to correct device

3f96a63

Tsingularity mentioned this pull request Aug 10, 2023

How to choose features from OpenCLIP? Tsingularity/dift#6

Closed

EeroHeikkinen wants to merge 2 commits into mlfoundations:main from EeroHeikkinen:main
