Performance degradation on certain vision models from v4.51.* #37748

Open · yuan-thomas opened this issue Apr 24, 2025 · 1 comment

yuan-thomas commented Apr 24, 2025

System Info

  • transformers version: 4.51.3
  • Platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
  • Python version: 3.12.3
  • Huggingface_hub version: 0.30.2
  • Safetensors version: 0.5.3
  • Accelerate version: not installed
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (GPU?): 2.7.0a0+7c8ec84dab.nv25.03 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: Yes
  • GPU type: NVIDIA GeForce RTX 3070 Laptop GPU

Who can help?

vision models: @amyeroberts, @qubvel

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run this script:

from transformers import AutoImageProcessor, ConvNextV2Model
import torch
import torch.nn as nn
import time
from datasets import load_dataset

dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

image_processor = AutoImageProcessor.from_pretrained("facebook/convnextv2-large-1k-224")
model = ConvNextV2Model.from_pretrained("facebook/convnextv2-large-1k-224")

inputs = image_processor(image, return_tensors="pt")

start_time = time.time()

model.train()

logits = model(**inputs).last_hidden_state.mean(dim=1)  # average over channels: [batch_size, height, width]
criterion = nn.BCEWithLogitsLoss()

fake_logits = torch.randn_like(logits)

loss = criterion(logits, fake_logits)
loss.backward()

print(time.time() - start_time)

Expected behavior

There appears to be a performance degradation between versions 4.50.* and 4.51.*. I tried PyTorch versions 2.4 and 2.7.

In my testing, 4.51.* is about 4x slower than the previous version. Using the script above (a more careful timing variant is sketched below):

  • 4.50.* takes ~1.1 s
  • 4.51.* takes ~4.5 s
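
For reference, here is a minimal sketch of a less noise-sensitive measurement. It assumes a CUDA device is available; the warm-up pass, the averaging loop, and the torch.cuda.synchronize() calls are additions for illustration and are not part of the original report:

import time
import torch
import torch.nn as nn
from transformers import AutoImageProcessor, ConvNextV2Model
from datasets import load_dataset

device = torch.device("cuda")

dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

image_processor = AutoImageProcessor.from_pretrained("facebook/convnextv2-large-1k-224")
model = ConvNextV2Model.from_pretrained("facebook/convnextv2-large-1k-224").to(device)
model.train()

inputs = image_processor(image, return_tensors="pt").to(device)
criterion = nn.BCEWithLogitsLoss()

def step():
    # One forward + backward pass, mirroring the repro script.
    logits = model(**inputs).last_hidden_state.mean(dim=1)
    loss = criterion(logits, torch.randn_like(logits))
    loss.backward()
    model.zero_grad(set_to_none=True)

# Warm-up pass so one-time costs (CUDA context, lazy init) are excluded.
step()
torch.cuda.synchronize()

start = time.time()
for _ in range(5):
    step()
torch.cuda.synchronize()
print((time.time() - start) / 5)

Running this once per transformers version (for example, pip install "transformers==4.50.*" vs. "transformers==4.51.*" in the same environment) should isolate the regression from one-time start-up overhead.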

yuan-thomas (Author) commented:
While the run takes longer on 4.51.*, GPU utilization is also much higher while running this script: roughly 30% on average with 4.50.* versus roughly 90% with 4.51.*.
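
As a side note, utilization can be sampled while the repro script runs by polling nvidia-smi from a second process; this is an illustrative sketch, not the exact tool used to obtain the numbers above:

import subprocess
import time

# Print GPU utilization once per second; stop with Ctrl+C.
try:
    while True:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
            text=True,
        )
        print(f"GPU utilization: {out.strip()}%")
        time.sleep(1)
except KeyboardInterrupt:
    pass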
