Skip to content

Block interface: refactor #339

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 16 commits into
base: tp_mamba
Choose a base branch
from
Draft

Block interface: refactor #339

wants to merge 16 commits into from

Conversation

jlamypoirier
Copy link
Collaborator

@jlamypoirier jlamypoirier commented Jul 29, 2025

✨ Description

Decouple the block interface from the transformer. This will make the SSM interface cleaner, improve readability, prevent bug, simplify the implementation of future mixers, etc. It's also a step towards merging SSMs into the GPT model, and varying block configurations (#242). I also included some groundwork for these tasks, so that upcoming PRs are smaller and simpler.

This PR is pure refactor, and changes that would cause backward compatibility concerns are left for future work.

  • Create a block submodule and move everything non-specific to transformers there from the transformer directory.
  • Extract base classes BlockConfig, MLPConfig, AttentionConfig from TransformerConfig. Using inheritance for backward compatibility, but composition would be preferable.
  • Extract base classes BlockDimNames, BlockKwargs from their transformer counterparts for common variables. Use inheritance in their specializations to simplify usage. (Ex. we can systematically use SSMDimNames, SSMKwargs when dealing with SSMs.)
  • Remove unused fields per_layer_lr_scale(from incorrect inheritance of BlockConfig) and normalization from SSMConfig, which would cause unexpected behaviour if defined.
  • Make a layer debugging utility to merge the various debug calls and help add new ones.
  • Move initialization to its own file.
  • Lots of misc interface fixes and improvements.

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

@tscholak
Copy link
Collaborator

tscholak commented Aug 1, 2025

@jlamypoirier, thanks for pushing on this. Unifying GPT and HybridSSM is absolutely the right long-term goal, and this PR on block abstractions is valuable groundwork.

Before we get too deep into this, I want to clarify expectations on the direction for the block architecture.

Eventually, we need to land the feature #242 exactly as scoped: declarative, per-block config, explicit architecture, full heterogeneity, and named block/weight sharing. Please have a look also at #277.

That's the path forward we need. Anything that diverges from this or locks us into an alternative config approach (including any override machinery revival) is off the table and won't be accepted by me or the team.

Your skills are exactly what we need to lay the groundwork for #242 quickly and cleanly. That doesn't mean you need to implement the whole feature yourself. What matters is that your work enables us to land #242 without obstacles.

Thanks.

@jlamypoirier
Copy link
Collaborator Author

@tscholak I'm aiming quite close to the scope https://github.com/ServiceNow/Fast-LLM/pull/342/files#diff-a600ef5264541f7a4a8b245115233343926edacce94e81249e31134e8760adf1R174.

After having a look, I think it's safer that I implement it myself. There are very complex backward compatibility implications to deal with, plus it's a good time to deal with some old technical debt and add much-needed improvements.

I'll leave out weight sharing for now though because it's a brand new feature and is not directly related.

@jlamypoirier jlamypoirier changed the title [Prototype] Block interface Block interface Aug 1, 2025
@jlamypoirier jlamypoirier changed the title Block interface Block interface: refactor Aug 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants