Fix checkpoint export errors for the Dream model #311

Merged
merged 12 commits into from
Jun 25, 2025

Conversation

nitsanluke
Contributor

@nitsanluke nitsanluke commented Jun 17, 2025

✨ Description

This PR fixes an existing issue with checkpoint export for a diffusion model (Dream).

πŸ” Type of change

Select all that apply:

  • πŸ› Bug fix (non-breaking change that addresses a specific issue)
  • πŸš€ New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • πŸ“ˆ Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • πŸ› οΈ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • πŸ“¦ Dependency bump (updates dependencies, including Dockerfile or package changes)
  • πŸ“ Documentation change (updates documentation, including new content or typo fixes)
  • πŸ”§ Infrastructure/Build change (affects build process, CI/CD, or dependencies)

πŸ“ Changes

✅ Checklist

Make sure the following tasks are completed before submitting the PR:

General

  • 📜 I have read and followed the contributing guidelines.
  • 🏷️ I am using a clear and descriptive PR title that summarizes the key change or feature introduced.
  • 🎉 The functionality is complete, and I have tested the changes.
  • 📝 I have updated the documentation if needed.
  • ⚠️ The change does not introduce any new issues (e.g., runtime warnings, type checker errors, linting problems, unhandled edge cases).
  • 🧩 I have commented my code, especially in hard-to-understand areas.

Dependencies and Configuration

  • πŸ‹ I have updated the Docker configuration or dependencies, if applicable.
  • πŸ”„ I have ensured compatibility with the existing setup after dependency changes.

Testing

  • 🧪 I have added or updated tests to cover my changes.
  • ✔️ New and existing tests pass locally with my changes.
  • 🚦 I have tested these changes on GPUs and verified training stability.
  • 🏋️ I have tested the changes on realistic training workloads, if applicable.

Performance Impact

  • 📊 I have run benchmarks where applicable to evaluate the performance impact.
  • ✅ The benchmarks show no performance regression.
  • 🚀 The benchmarks indicate a potential performance improvement.
  • ⚠️ The benchmarks indicate a potential performance degradation.
  • 📈 I have provided benchmark results and detailed any performance impact below, if applicable.

📊 Performance Impact Details

If there is any impact on performance, describe it and provide benchmark results, if applicable:


πŸ—’οΈ Additional Notes

Include any additional context, information, or considerations here, such as known issues, follow-up tasks, or backward compatibility concerns.

@nitsanluke nitsanluke requested a review from jlamypoirier June 17, 2025 23:01
@nitsanluke nitsanluke changed the title from "Fix export checkpoint error for dream" to "Fix checkpoint export errors for the Dream model" Jun 17, 2025
@nitsanluke nitsanluke marked this pull request as ready for review June 18, 2025 14:02
@jlamypoirier jlamypoirier mentioned this pull request Jun 18, 2025
8 tasks
@nitsanluke nitsanluke requested a review from jlamypoirier June 19, 2025 15:17
@jlamypoirier
Collaborator

jlamypoirier commented Jun 19, 2025

Just a heads up: #264 introduced a new bug (@bigximik): TypeError: Can't instantiate abstract class HybridSSMModel without an implementation for abstract method 'get_tflops'. We previously relied on the trainer inheritance for it, but HybridSSMModel doesn't inherit from GPT. The simplest fix would be to make it inherit; the formula will be wrong, but at least the crash will be gone. (edit: fixed in #310)
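
For readers unfamiliar with this class of error, here is a minimal sketch of how it arises and how the inheritance workaround silences it. Only HybridSSMModel and get_tflops come from the comment above; BaseModel, GPTModel, HybridSSMModelPatched, try_instantiate, and the placeholder formula are hypothetical and do not reflect the project's real class hierarchy or the actual fix in #310.

```python
from abc import ABC, abstractmethod
from typing import Optional


class BaseModel(ABC):
    """Hypothetical base class that requires a throughput estimate."""

    @abstractmethod
    def get_tflops(self, batch_size: int) -> float:
        ...


class GPTModel(BaseModel):
    """Hypothetical GPT-style model that provides the estimate."""

    def get_tflops(self, batch_size: int) -> float:
        # Placeholder formula, for illustration only.
        return 1.0 * batch_size


class HybridSSMModel(BaseModel):
    """Does not inherit from GPTModel, so get_tflops remains abstract."""


class HybridSSMModelPatched(GPTModel):
    """Workaround: inherit the GPT formula (possibly wrong for SSMs, but it no longer crashes)."""


def try_instantiate(cls: type) -> Optional[str]:
    """Return the TypeError message if cls cannot be instantiated, else None."""
    try:
        cls()
    except TypeError as error:
        return str(error)
    return None


if __name__ == "__main__":
    # The exact wording of the message depends on the Python version.
    print(try_instantiate(HybridSSMModel))
    print(try_instantiate(HybridSSMModelPatched))
```

The trade-off matches the one described in the comment: inheriting gives a concrete get_tflops, so instantiation succeeds, but the reported number is only as meaningful as the inherited formula.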

@nitsanluke nitsanluke requested a review from jlamypoirier June 23, 2025 17:37
Collaborator

@jlamypoirier jlamypoirier left a comment

One minor comment, otherwise LGTM

@nitsanluke nitsanluke merged commit 37a4944 into main Jun 25, 2025
0 of 2 checks passed
@nitsanluke nitsanluke deleted the fx_test_checkpoint_bug branch June 25, 2025 22:57

2 participants