Block interface: refactor #339
base: tp_mamba
Conversation
@jlamypoirier, thanks for pushing on this. Unifying GPT and HybridSSM is absolutely the right long-term goal, and this PR on block abstractions is valuable groundwork. Before we get too deep into this, I want to clarify expectations about the direction for the block architecture. Eventually, we need to land feature #242 exactly as scoped: declarative, per-block config, explicit architecture, full heterogeneity, and named block/weight sharing. Please also have a look at #277. That's the path forward we need. Anything that diverges from this or locks us into an alternative config approach (including any revival of the override machinery) is off the table and won't be accepted by me or the team. Your skills are exactly what we need to lay the groundwork for #242 quickly and cleanly. That doesn't mean you need to implement the whole feature yourself. What matters is that your work enables us to land #242 without obstacles. Thanks.
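For illustration only, here is a rough sketch of the kind of declarative, per-block config described for #242; every key, block name, and the weight-sharing mechanism below is hypothetical, not an agreed schema:

```python
# Hypothetical sketch of a declarative, heterogeneous, per-block model config
# in the spirit of #242. Keys, block names and the weight-sharing mechanism
# are illustrative only, not an agreed-upon schema.
model_config = {
    "blocks": {
        # Named block definitions: heterogeneity comes from mixing block types.
        "attn": {"mixer": {"type": "attention", "heads": 32}, "mlp": {"ffn_hidden_size": 4096}},
        "mamba": {"mixer": {"type": "mamba", "state_size": 16}, "mlp": {"ffn_hidden_size": 4096}},
    },
    # Explicit architecture: an ordered sequence of named blocks.
    "pattern": ["attn", "mamba", "mamba", "attn", "mamba", "mamba"],
    # Named weight sharing: blocks tagged with the same name would share parameters.
    "shared_weights": {"attn": "shared_attention_weights"},
}
```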
@tscholak I'm aiming quite close to that scope: https://github.com/ServiceNow/Fast-LLM/pull/342/files#diff-a600ef5264541f7a4a8b245115233343926edacce94e81249e31134e8760adf1R174. After having a look, I think it's safer that I implement it myself. There are very complex backward-compatibility implications to deal with, and it's also a good time to address some old technical debt and add much-needed improvements. I'll leave out weight sharing for now, though, since it's a brand-new feature and not directly related.
✨ Description
Decouple the block interface from the transformer. This will make the SSM interface cleaner, improve readability, prevent bugs, simplify the implementation of future mixers, etc. It's also a step towards merging SSMs into the GPT model and towards varying block configurations (#242). I also included some groundwork for these tasks, so that upcoming PRs are smaller and simpler.
This PR is a pure refactor; changes that would raise backward-compatibility concerns are left for future work.
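For context, a minimal sketch of what such a decoupled block could look like, assuming placeholder names (this is not the actual Fast-LLM interface): a generic block composes any mixer with an MLP, so neither piece depends on transformer-specific config.

```python
import torch

class Block(torch.nn.Module):
    """Hypothetical generic block: a mixer (attention, SSM, ...) plus an MLP.
    Names and structure are illustrative, not the actual Fast-LLM classes."""

    def __init__(self, mixer: torch.nn.Module, mlp: torch.nn.Module, hidden_size: int):
        super().__init__()
        self.norm_1 = torch.nn.LayerNorm(hidden_size)
        self.mixer = mixer  # e.g. an attention or Mamba module
        self.norm_2 = torch.nn.LayerNorm(hidden_size)
        self.mlp = mlp

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Pre-norm residual structure; the block never inspects the mixer type.
        hidden_states = hidden_states + self.mixer(self.norm_1(hidden_states))
        return hidden_states + self.mlp(self.norm_2(hidden_states))
```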
- Create a `block` submodule and move everything non-specific to transformers there from the `transformer` directory.
- Extract `BlockConfig`, `MLPConfig` and `AttentionConfig` from `TransformerConfig` (see the sketch below). Inheritance is used for backward compatibility, but composition would be preferable.
- Extract `BlockDimNames` and `BlockKwargs` from their transformer counterparts for common variables, and use inheritance in their specializations to simplify usage (e.g. we can systematically use `SSMDimNames` and `SSMKwargs` when dealing with SSMs).
- Remove `per_layer_lr_scale` (which came from incorrect inheritance of `BlockConfig`) and `normalization` from `SSMConfig`, since they would cause unexpected behaviour if defined.
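A minimal sketch of the config extraction described above, with purely illustrative fields (inheritance keeps the existing `TransformerConfig` surface intact for now, even though composition would be the cleaner target):

```python
from dataclasses import dataclass, field

@dataclass
class MLPConfig:
    # Illustrative fields only; the real classes live in the new `block` submodule.
    ffn_hidden_size: int = 4096
    activation: str = "gelu"

@dataclass
class AttentionConfig:
    num_attention_heads: int = 32
    head_size: int = 64

@dataclass
class BlockConfig:
    hidden_size: int = 1024
    mlp: MLPConfig = field(default_factory=MLPConfig)

@dataclass
class TransformerConfig(BlockConfig):
    # Inheriting keeps existing transformer config fields working unchanged;
    # composing AttentionConfig inside a block config would be preferable long term.
    attention: AttentionConfig = field(default_factory=AttentionConfig)
```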
🔍 Type of change
Select all that apply: