⚡️ Speed up method PixelNorm.forward by 7% #7741

Open · wants to merge 1 commit into base: master

Conversation

@aseembits93 commented Apr 23, 2025

📄 7% (0.07x) speedup for PixelNorm.forward in comfy/ldm/lightricks/vae/pixel_norm.py

⏱️ Runtime: 2.28 milliseconds → 2.13 milliseconds (best of 51 runs)

📝 Explanation and details

To optimize the runtime of this program, we can lean on PyTorch's built-in primitives. Specifically, computing the mean of the squared values with torch.mean and then applying torch.rsqrt expresses the normalization with fewer elementwise operations, since the reciprocal square root is computed in a single internal kernel rather than as a square root followed by a division.

Here is an optimized version of the code.
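A minimal sketch of the rewritten forward, as described by the bullet points below. It assumes PixelNorm keeps its dim and eps constructor arguments (the default values shown are assumptions), and it keeps eps inside the rsqrt so the result matches the original x / torch.sqrt(mean + eps):

```python
import torch
from torch import nn


class PixelNorm(nn.Module):
    def __init__(self, dim=1, eps=1e-8):  # default values assumed for illustration
        super().__init__()
        self.dim = dim
        self.eps = eps

    def forward(self, x):
        # Mean of the squared values along the normalization dimension.
        mean_square = torch.mean(x * x, dim=self.dim, keepdim=True)
        # Multiply by the reciprocal square root instead of dividing by a sqrt;
        # adding eps inside keeps the result equal to x / sqrt(mean_square + eps).
        return x * torch.rsqrt(mean_square + self.eps)
```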

Explanation.

  • torch.mean(x * x, dim=self.dim, keepdim=True): Calculating the mean of the squared values directly.
  • torch.rsqrt(mean_square): Using torch.rsqrt to compute the reciprocal of the square root. This can be more efficient than computing the square root and then taking the reciprocal separately.
  • x * torch.rsqrt(mean_square): Multiplying x by the reciprocal square root we computed above.

This reformulation can improve performance because it replaces a square root followed by a division with a single rsqrt and a multiplication, both of which map directly onto PyTorch's optimized backend operations.
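Not part of the report itself, but one way to sanity-check a claim like this locally is torch.utils.benchmark; the tensor shape and eps below are arbitrary illustrative choices, not the ones behind the measurement above:

```python
import torch
from torch.utils import benchmark

x = torch.randn(64, 128, 32, 32)  # arbitrary example shape
eps = 1e-8                        # arbitrary example epsilon

baseline = benchmark.Timer(
    stmt="x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + eps)",
    globals={"x": x, "eps": eps, "torch": torch},
)
optimized = benchmark.Timer(
    stmt="x * torch.rsqrt(torch.mean(x * x, dim=1, keepdim=True) + eps)",
    globals={"x": x, "eps": eps, "torch": torch},
)

print(baseline.timeit(100))   # original formulation
print(optimized.timeit(100))  # rsqrt-based formulation
```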

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 58 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests Details
import pytest  # used for our unit tests
import torch
from comfy.ldm.lightricks.vae.pixel_norm import PixelNorm
from torch import nn

# unit tests

# Basic Functionality
def test_basic_functionality():
    pn = PixelNorm()
    x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Single Element Tensor
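def test_single_element_tensor():
    # Hypothetical reconstruction sketched from the section header above; the
    # original test body is not shown in this report.
    pn = PixelNorm()
    x = torch.tensor([[3.0]])
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Higher Dimensional Tensors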

def test_higher_dimensional_tensors():
    pn = PixelNorm()
    x = torch.rand(2, 3, 4)
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Different Dimensions for Normalization
def test_different_dimensions_for_normalization():
    pn = PixelNorm(dim=0)
    x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=0, keepdim=True) + pn.eps)

# Edge Case: Zero Tensor
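def test_zero_tensor():
    # Hypothetical reconstruction sketched from the section header above; the
    # original test body is not shown in this report. A zero tensor should
    # normalize to zeros, since x / sqrt(0 + eps) == 0.
    pn = PixelNorm()
    x = torch.zeros(2, 3)
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Edge Case: Large Values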

def test_large_values():
    pn = PixelNorm()
    x = torch.tensor([[1e10, 2e10], [3e10, 4e10]])
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Edge Case: Small Values
def test_small_values():
    pn = PixelNorm()
    x = torch.tensor([[1e-10, 2e-10], [3e-10, 4e-10]])
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Negative Values
def test_negative_values():
    pn = PixelNorm()
    x = torch.tensor([[-1.0, -2.0], [-3.0, -4.0]])
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Mixed Positive and Negative Values
def test_mixed_positive_and_negative_values():
    pn = PixelNorm()
    x = torch.tensor([[-1.0, 2.0], [-3.0, 4.0]])
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Large Scale Test
def test_large_scale():
    pn = PixelNorm()
    x = torch.rand(1000, 1000)
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Non-Float Tensors
def test_non_float_tensors():
    pn = PixelNorm()
    x = torch.tensor([[1, 2], [3, 4]], dtype=torch.int32)
    with pytest.raises(RuntimeError):
        pn.forward(x)

# Different Epsilon Values
def test_different_epsilon_values():
    pn = PixelNorm(eps=1e-5)
    x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Non-Contiguous Tensors
def test_non_contiguous_tensors():
    pn = PixelNorm()
    x = torch.rand(2, 3).transpose(0, 1)
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Gradient Check
def test_gradient_check():
    pn = PixelNorm()
    x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
    codeflash_output = pn.forward(x); result = codeflash_output
    result.sum().backward()

# Empty Tensor
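def test_empty_tensor():
    # Hypothetical reconstruction sketched from the section header above; the
    # original test body is not shown in this report. An empty batch should pass
    # through without raising.
    pn = PixelNorm()
    x = torch.empty(0, 4)
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Tensor with NaN Values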

def test_tensor_with_nan_values():
    pn = PixelNorm()
    x = torch.tensor([[float('nan'), 2.0], [3.0, 4.0]])
    codeflash_output = pn.forward(x); result = codeflash_output

# Tensor with Inf Values
def test_tensor_with_inf_values():
    pn = PixelNorm()
    x = torch.tensor([[float('inf'), 2.0], [3.0, 4.0]])
    codeflash_output = pn.forward(x); result = codeflash_output

# Tensor with Mixed NaN and Inf Values
def test_tensor_with_mixed_nan_and_inf_values():
    pn = PixelNorm()
    x = torch.tensor([[float('nan'), float('inf')], [float('-inf'), 4.0]])
    codeflash_output = pn.forward(x); result = codeflash_output

# Tensor with Repeated Values
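def test_tensor_with_repeated_values():
    # Hypothetical reconstruction sketched from the section header above; the
    # original test body is not shown in this report.
    pn = PixelNorm()
    x = torch.full((2, 4), 3.0)
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Tensor with Extremely High Variance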

def test_tensor_with_extremely_high_variance():
    pn = PixelNorm()
    x = torch.tensor([[1.0, 1e10], [1e-10, 1.0]])
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Tensor with Zero Mean
def test_tensor_with_zero_mean():
    pn = PixelNorm()
    x = torch.tensor([[1.0, -1.0], [2.0, -2.0]])
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Tensor with Non-Standard Shapes
def test_tensor_with_non_standard_shapes():
    pn = PixelNorm()
    x = torch.rand(1, 1000)
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Tensor with Only One Dimension
def test_tensor_with_only_one_dimension():
    pn = PixelNorm(dim=0)
    x = torch.tensor([1.0, 2.0, 3.0, 4.0])
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=0, keepdim=True) + pn.eps)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import pytest  # used for our unit tests
import torch
from comfy.ldm.lightricks.vae.pixel_norm import PixelNorm
from torch import nn

# unit tests

# Basic Functionality Tests

def test_basic_multi_dimensional():
    pn = PixelNorm()
    x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Edge Case Tests



def test_normalize_along_dim0():
    pn = PixelNorm(dim=0)
    x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=0, keepdim=True) + pn.eps)

def test_normalize_along_dim2():
    pn = PixelNorm(dim=2)
    x = torch.tensor([[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]])
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=2, keepdim=True) + pn.eps)

# Large Scale Test Cases
def test_large_tensor():
    pn = PixelNorm()
    x = torch.randn(1000, 1000)
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Special Values


def test_very_large_tensor():
    pn = PixelNorm()
    x = torch.randn(10000, 10)  # Keeping it under 100MB
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

# Rare or Unexpected Edge Cases

def test_high_dimensional_single_element():
    pn = PixelNorm()
    x = torch.tensor([[[[[1.0]]]]])
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2) + pn.eps)

def test_negative_dimension():
    pn = PixelNorm(dim=-1)
    x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=-1, keepdim=True) + pn.eps)

def test_non_contiguous_tensor():
    pn = PixelNorm()
    x = torch.tensor([[1.0, 2.0], [3.0, 4.0]]).t()
    codeflash_output = pn.forward(x); result = codeflash_output
    expected = x / torch.sqrt(torch.mean(x**2, dim=1, keepdim=True) + pn.eps)

To edit these changes, run `git checkout codeflash/optimize-PixelNorm.forward-m9j2ee0l` and push.

Codeflash
