[Prototype] Generalize dynamic config classes #245


Draft: wants to merge 39 commits into main

Conversation

jlamypoirier
Collaborator

✨ Description

Fix: #126

Generalize the concept of dynamic config classes from the dataset mechanism to all config classes. I opted for a unique global registry of all config classes. This choice is motivated by simplicity (multiple registries would be hell to manage), though it means registry keys need to be globally unique, so they'll tend to be longer. Config classes are now specified through the type field, using the config class name or its shorthand without the Config suffix.

Included a proposal to simplify the CLI, so the runnable is selected through its type field, e.g. fast-llm type=GPTTrainer ... instead of fast-llm train gpt.

Added backward compatibility for the old dataset dynamic type names.
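As a rough sketch of the mechanism described above (all names here are illustrative, not the actual Fast-LLM API):

```python
# Illustrative sketch of a single global registry keyed by class name,
# with the shorthand (class name minus the "Config" suffix) as an alias.
_REGISTRY: dict[str, type] = {}

def register_config(cls: type) -> type:
    _REGISTRY[cls.__name__] = cls
    if cls.__name__.endswith("Config"):
        _REGISTRY[cls.__name__[: -len("Config")]] = cls  # shorthand alias
    return cls

@register_config
class Config:
    pass

@register_config
class GPTTrainerConfig(Config):
    pass

def resolve(type_name: str, expected_base: type = Config) -> type:
    # Keys are globally unique, so validation only needs to check that the
    # selected class has the expected base class.
    cls = _REGISTRY[type_name]
    if not issubclass(cls, expected_base):
        raise TypeError(f"{type_name} is not a {expected_base.__name__}")
    return cls
```

With this, both resolve("GPTTrainerConfig") and the shorthand resolve("GPTTrainer") return the same class.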

TODO:

  • Add backward compatibility for the cli?
  • The type field conflicts with the normalization, rotary and peft types. These are good use cases for dynamic classes, so I'll implement them, but I'll wait until after Review the architecture split #237.
  • Some other potential use cases in the optimizer (LR schedule), but no need to do it in this PR.

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

@tscholak
Collaborator

tscholak commented May 2, 2025

looks great so far!

fast-llm type=GPTTrainer is principled (because it taps into the override logic) but ugly (because spelling out type= is mandatory and because it's using class names as values). I think the better approach would be to still use the subcommand trick and have commands set type to a str.

about the global registry thing. another approach would be to outsource the functionality to a mixin class that can be optionally added to a config class. that way we can have more control over where this feature is used, and it would also allow for different registries for different config base classes.

@jlamypoirier
Collaborator Author

fast-llm type=GPTTrainer is principled (because it taps into the override logic) but ugly (because spelling out type= is mandatory and because it's using class names as values). I think the better approach would be to still use the subcommand trick and have commands set type to a str.

It would be a bit of a hack, but I can make it work.

about the global registry thing. another approach would be to outsource the functionality to a mixin class that can be optionally added to a config class. that way we can have more control over where this feature is used, and it would also allow for different registries for different config base classes.

I considered several alternatives, but they all tend to run into trouble once we start dealing with more than just a few registries. The user has to think about which registry a class belongs to when setting the name, and make sure to pick a unique name for that registry, so the benefit of shorter names would come at a cost to developers. There is also the issue of classes potentially belonging to multiple registries, which can be solved only at the price of extra complexity and varying type field names, which makes things even worse.

I tried to think of a developer-friendly solution for a while but failed, so I went for a single global registry instead. It's good enough because the validation mechanism ensures the selected class has the expected base class. The only downsides are the less user-friendly names and the impossibility of restricting which subclasses are allowed for a given field.

@jlamypoirier
Collaborator Author

@tscholak I started working on more dynamic classes and realized user-friendly names are essential. So I implemented a simple solution where each class can have its own registry, and the class is selected by iterating through the registries in the MRO. We still build a Config registry from class names, and others can be created with a simple register call.

So we get neat names without adding complexity on the developer side. The only real missing piece would be some way for users to find out the available types; we should add it to the automatic config documentation once we have it.
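A rough sketch of that lookup, with register_subclass named after the PR's method and everything else illustrative:

```python
# Illustrative sketch: each base class can own its own registry, and
# lookup walks the MRO so the most specific registry that knows the
# name wins. Class and method names here are assumptions for the demo.
class Config:
    @classmethod
    def register_subclass(cls, name: str, subclass: type) -> None:
        # Each registering class gets its own registry dict on first use.
        if "_registry" not in cls.__dict__:
            cls._registry = {}
        cls._registry[name] = subclass

    @classmethod
    def get_subclass(cls, name: str) -> type:
        # Iterate through the registries in the MRO, most specific first.
        for base in cls.__mro__:
            registry = base.__dict__.get("_registry")
            if registry is not None and name in registry:
                return registry[name]
        raise KeyError(f"Unknown type {name!r} for {cls.__name__}")

class RunnableConfig(Config):
    pass

class TrainerConfig(RunnableConfig):
    pass

class GPTTrainerConfig(TrainerConfig):
    pass

# The same class can take a short name in a specialized registry and a
# longer, unambiguous one in a more general registry.
TrainerConfig.register_subclass("gpt", GPTTrainerConfig)
RunnableConfig.register_subclass("train_gpt", GPTTrainerConfig)
```

Here TrainerConfig.get_subclass("gpt") and RunnableConfig.get_subclass("train_gpt") both resolve to GPTTrainerConfig, while RunnableConfig.get_subclass("gpt") correctly fails because that short name is scoped to the trainer registry.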

@tscholak
Collaborator

tscholak commented May 6, 2025

yes, this is great!

fast_llm/cli.py Outdated
# TODO: Remove backward compatibility.
if len(args) >= 2 and args[0] == "train":
    if args[1] == "gpt":
        args = ["type=train_gpt"] + args[2:]
Collaborator

Regarding deprecating the fast-llm train command syntax: I'm not super confident that we ever will.

Collaborator Author

GPTSampledDatasetConfig.register_subclass("concatenated_memmap", GPTConcatenatedMemmapConfig)
GPTSampledDatasetConfig.register_subclass("fim", GPTFimSampledDatasetConfig)
GPTSampledDatasetConfig.register_subclass("legacy", GPTLegacyDatasetConfig)
GPTSampledDatasetConfig.register_subclass("test_slow", GPTTestSlowDatasetConfig)
Collaborator

hmm, so this is for registering user-friendly names or the classes themselves?
can we make registration automatic? didn't we want to use the type classvar for that?

Collaborator Author

type is for the instance field. We could add other classvars, but it wouldn't be ideal because we'd still need to distinguish which base class it belongs to (e.g. one classvar for each base class), and we may want to register multiple user-friendly names. Also, it adds the magic of the classvar being converted to a dynamic type name. Overall, the external register call seems better.

Collaborator

I am confused by this answer. I was expecting something like this:

If a config class defines/overrides the type classvar field, it automatically becomes a registered dynamic class with the value of the type field becoming its human-readable name, which will be used for serialization and deserialization.

Example:

@config_class
class MainConfig:
    foo: FooConfig

@config_class
class FooConfig:
    type: typing.ClassVar[str]

@config_class
class BarConfig(FooConfig):
    type = "bar"
    something: int

@config_class
class BazConfig(FooConfig):
    type = "baz"
    something_else: int

Serialized version:

foo:
  type: baz
  something_else: 5

Collaborator

I think unless we can show a concrete need for multiple human-readable names per class or for registering the same subclass under multiple base classes (imho both extremely niche), the explicit register_subclass(...) pattern adds too much noise and duplication.

Collaborator Author

type is a field for instances https://github.com/ServiceNow/Fast-LLM/pull/245/files#diff-15aab2503b9a097809b7fa450b3f63ea0db5d700982d7265610fdf28b476788aR989. What you are referring to is an equivalent to the type_ classvar we had before https://github.com/ServiceNow/Fast-LLM/pull/245/files#diff-94ac1cad97ec073f07a7d6ead2489610430a6fb40aa3e34d520b86bd88e6c9fcL102. These two can't be the same.

Adding back the same type_ variable is no longer an option, because all base classes have a registry and we don't know which one is concerned. So we'd need some adjustments. I can think of a few options: a registry_class classvar, separate type variables (e.g. gpt_dataset_type), or a more complex type: typing.ClassVar[tuple[tuple[type[Config], str], ...]]. None of these look better than register_subclass.

I think we do want to support multiple names and/or base classes. Multiple names may be needed for backward compatibility, or to simplify usage. (The final saved config uses the full class name either way, so that's just convenience.) And multiple base classes are useful for specialization; for example GPTTrainerConfig should be registered as train_gpt as a RunnableConfig, but can take the shorter name gpt as a TrainerConfig. Concretely this allows the fast-llm train gpt syntax.

Contributor

@bigximik commented May 15, 2025

Having a decorator looks very good. However, why can't we use one global registry for all Config classes?
Names should be unique per class hierarchy, and conflicts will arise on registration (not implemented in the example below).

Below is an example using a global ConfigRegistry, which supports:

  • Command runners
  • Typed class hierarchies (models, trainers, evaluators, etc.)
  • Instantiating proper subclasses from YAML
  • Auto-dispatching based on type: keys in config

The system includes demo configs and commands at the end to show how it works in practice.

Features

  • Decorator for easy class registration: @ConfigRegistry.register("name", implementation_class)
  • Automatic resolution of subclassed dataclasses from YAML with type field
  • Reusable base logic for any component: models, runners, evaluators, checkpoints, etc.
  • Command line integration with flexible structure

🔧 Subclass Resolution Logic

In this design, the "base class" used to resolve registered subclasses is determined by context:

  • ✅ In nested dataclass fields (e.g., model: ModelConfig), the base class is what's declared in the field's type annotation.

  • ✅ In top-level commands or manual use, the base class is explicitly passed to from_dict(base_cls, data) or ConfigRegistry.get(type_name, base_cls).

This allows the same global registry to work across unrelated class hierarchies (e.g., ModelConfig, TrainerConfig, EvaluatorConfig), because each lookup is scoped to its specific base class.

What do you think?

from dataclasses import is_dataclass, fields, dataclass
from typing import get_args, get_origin, Union, Any
import yaml
import types

# ---------------- CONFIG REGISTRY MACHINERY (SIMPLIFIED) ----------------

@dataclass
class Config:
    # Each Config can optionally define a "configurable" runtime class it instantiates
    configurable_class = None

    def get_configurable(self):
        # Returns an instance of the associated runtime class using self as config
        if self.configurable_class is None:
            return None
        return self.configurable_class(self)


class ConfigRegistry:
    # Global registry mapping friendly names (like "gpt", "train") to config classes
    _table: dict[str, set[type]] = {}

    @classmethod
    def register(cls, name: str | None = None, configurable_class: type | None = None):
        # Decorator to register a Config class under a name and link to its runtime class
        def deco(target: Config):
            key = name or target.__name__
            target.configurable_class = configurable_class
            cls._table.setdefault(key, set()).add(target)
            return target
        return deco

    @classmethod
    def get(cls, name: str, base: type) -> type:
        # Retrieve the config class registered under `name` that is a subclass of `base`
        for cand in cls._table.get(name, ()):
            if issubclass(cand, base):
                return cand
        raise KeyError(f"No registered class named '{name}' that is subclass of {base.__name__}")


def from_dict(cls: type, raw: dict | list | Any) -> Any:
    """Recursively convert a dict (or list) into a dataclass instance."""
    if is_dataclass(cls) and isinstance(raw, dict):
        init_kwargs = {}
        for f in fields(cls):
            field_value = raw.get(f.name)
            field_type = f.type

            if field_value is None:
                init_kwargs[f.name] = None
                continue

            init_kwargs[f.name] = _convert_value(field_value, field_type)

        return cls(**init_kwargs)

    return _convert_value(raw, cls)


def _convert_value(value: Any, target_type: type) -> Any:
    # Handles typing annotations like list[T], dict[K, V], Union, and dataclasses
    origin = get_origin(target_type)
    args = get_args(target_type)

    if origin is list and isinstance(value, list):
        return [_convert_value(v, args[0]) for v in value]

    elif origin is dict and isinstance(value, dict):
        return {_convert_value(k, args[0]): _convert_value(v, args[1]) for k, v in value.items()}

    elif is_dataclass(target_type) and isinstance(value, dict):
        # If a 'type' key is present, use it to find a registered subclass
        typ_name = value.pop("type", None)
        if typ_name:
            subclass = ConfigRegistry.get(typ_name, target_type)
            if subclass:
                return from_dict(subclass, value)
        return from_dict(target_type, value)

    elif origin is types.UnionType:
        # Handle Optional[T] (which is Union[T, None])
        non_none_args = [arg for arg in args if arg is not type(None)]
        if len(non_none_args) == 1:
            return _convert_value(value, non_none_args[0])

    return value


# ---------------- MODEL CLASSES ----------------

@dataclass
class CheckPointConfig(Config):
    path: str


@dataclass
class GPTCheckPointConfig(CheckPointConfig):
    pass


@dataclass
class SSMCheckPointConfig(CheckPointConfig):
    pass


@ConfigRegistry.register("llama3")  # matches the "type: llama3" used in the demos below
@dataclass
class LlamaCheckPointConfig(GPTCheckPointConfig):
    pass


@ConfigRegistry.register("mamba")
@dataclass
class MambaCheckPointConfig(SSMCheckPointConfig):
    pass


# Model runtime classes (used during execution)
class Model: pass
class GPTModel(Model): pass
class SSMModel(Model): pass


@dataclass
class ModelConfig(Config):
    checkpoint: CheckPointConfig | None = None

    def get_model(self):
        return self.get_configurable()


@ConfigRegistry.register("gpt", GPTModel)
@dataclass
class GPTModelConfig(ModelConfig):
    pass


@ConfigRegistry.register("ssm", SSMModel)
@dataclass
class SSMModelConfig(ModelConfig):
    pass


# ---------------- TRAINER CLASSES ----------------

@dataclass
class TrainerConfig:
    train_iters: int


# ---------------- EVALUATOR CLASSES ----------------

class Evaluator: pass
class EvaluatorLoss(Evaluator): pass
class EvaluatorLmEval(Evaluator): pass


@dataclass
class EvaluatorConfig(Config):
    def get_evaluator(self):
        return self.get_configurable()


@ConfigRegistry.register("loss", EvaluatorLoss)
@dataclass
class EvaluatorLossConfig(EvaluatorConfig):
    pass


@ConfigRegistry.register("lm_eval", EvaluatorLmEval)
@dataclass
class EvaluatorLmEvalConfig(EvaluatorConfig):
    pass


# ---------------- RUNNER CLASSES ----------------

class Runner:
    def __init__(self, cfg):
        self.cfg = cfg

    def run(self):
        raise NotImplementedError


class EvaluateRunner(Runner):
    def run(self):
        print("Running evaluation for", self.cfg.model.__class__.__name__)
        print("Checkpoint is", self.cfg.model.checkpoint.__class__.__name__)
        for name, ev in self.cfg.evaluations.items():
            print(f" • {name}: {ev.__class__.__name__}")


class TrainRunner(Runner):
    def run(self):
        print("Running Training for", self.cfg.model.__class__.__name__)
        print("Checkpoint is", self.cfg.model.checkpoint.__class__.__name__)
        print("Iterations:", self.cfg.trainer.train_iters)


@dataclass
class RunnerConfig(Config):
    def get_runner(self):
        return self.get_configurable()


@ConfigRegistry.register("evaluate", EvaluateRunner)
@dataclass
class EvaluateConfig(RunnerConfig):
    model: ModelConfig
    evaluations: dict[str, EvaluatorConfig]


@ConfigRegistry.register("train", TrainRunner)
@dataclass
class TrainConfig(RunnerConfig):
    model: ModelConfig
    trainer: TrainerConfig


# ---------------- CLI ENTRYPOINT ----------------

class Command:
    @staticmethod
    def run(argv: list[str]):
        if len(argv) != 2:
            raise SystemExit("Usage: <command> <yaml-string>")

        cmd_name, yaml_blob = argv
        raw_cfg = yaml.safe_load(yaml_blob)

        # Look up the appropriate config class from registry
        cmd_cfg_cls = ConfigRegistry.get(cmd_name, RunnerConfig)

        # Parse YAML to a typed dataclass
        cmd_cfg = from_dict(cmd_cfg_cls, raw_cfg)

        # Instantiate and run the corresponding Runner
        cmd_cfg.get_configurable().run()


# ---------------- DEMO ----------------

evaluate_command_args = """
model:
    type: gpt
    checkpoint:
        type: llama3
        path: /path/to/model
        
evaluations:
    stack3b:
        type: lm_eval
    fineweb:
        type: loss
"""

args = ["evaluate", evaluate_command_args]
Command.run(args)


train_command_args = """
model:
    type: ssm
    checkpoint:
        type: mamba
        path: /path/to/model
trainer:
    train_iters: 1000
"""

args = ["train", train_command_args]
Command.run(args)

Collaborator Author

@bigximik thanks for the suggestion. It's actually very close to my initial implementation, minus the implicit registration. But I moved away from it following the discussions here and other concerns that arose when trying it out on existing dynamic classes:

  • This is opt-in from the developer perspective, but always enabled for the user. I like that, but we got an explicit request against it ([Prototype] Generalize dynamic config classes #245 (comment)). There is also an explicit request to have multiple registries.
  • Name clashes are too much of a problem. We can have a "gpt" model, but also a "gpt" trainer, a "gpt" dataset, a "gpt" huggingface model, etc. With a single global registry, this is not possible, and we need longer names to properly disambiguate, at which point they're basically the same as the class names. So the only viable option is to use the class name itself.
  • That means long and annoying names. We can't just write "default" or "llama3" for rotary embeddings; we have to provide redundant information, e.g. "Llama3RotaryConfig" or an adjusted form like "llama3_rotary".
  • It's also not great for backward compatibility, since we'd have to deprecate all existing human-friendly names.
  • Concerning the CLI part, it goes against the request to preserve the syntax. Your suggestion solves the "train" part (it's basically the same as mine) but leaves out the "gpt" part, so we'd still need to dispatch the actual trainer dynamically.

Contributor

@jlamypoirier Thanks for the comments!

Regarding the first four points:

Name conflicts aren't a concern from my point of view when using a global registry — the registry supports multiple classes registered under the same name. Real name conflicts can occur if more than one class is a subclass of the required base at resolution time. However, the same issue would occur even if separate registries were used in the decorator. These conflicts can still be detected and reported. Yes, conflict detection is deferred from decorator application to resolution time, but it's still caught early — e.g., during a dry run or config loading.

This approach ensures the base class (i.e., the separate registry) needs to be specified only once, and only in the place it actually matters — such as in the config field type — which improves maintainability.

To my understanding, this also aligns with an opt-in model.

Perhaps @tscholak can clarify whether this satisfies the multiple registry and opt-in requirement?

Regarding implicit Configurable registration

We can explicitly separate the two decorators: one for configurable registration, and one for registering user-friendly names. Or we can keep it as it is now for configurable registration.

Regarding leaving the CLI interface intact

We can keep the CLI interface unchanged, so it remains fast-llm train gpt --config ..., still using a single global registry. Actually, we can have both, but I would still advocate for the new one, which is fast-llm train --config ...

@dataclass
class RunnerConfig(Config):
    def get_runner(self):
        return self.get_configurable()


@ConfigRegistry.register("evaluate", EvaluateRunner)
@dataclass
class EvaluateConfig(RunnerConfig):
    model: ModelConfig
    evaluations: dict[str, EvaluatorConfig]


@ConfigRegistry.register("train", TrainRunner)
@dataclass
class TrainConfig(RunnerConfig):
    model: ModelConfig
    trainer: TrainerConfig


@ConfigRegistry.register("evaluate_gpt", EvaluateRunner)
@dataclass
class GPTEvaluateConfig(RunnerConfig):
    model: GPTModelConfig
    evaluations: dict[str, EvaluatorConfig]


@ConfigRegistry.register("train_ssm", TrainRunner)
@dataclass
class SSMTrainConfig(RunnerConfig):
    model: SSMModelConfig
    trainer: TrainerConfig


# ---------------- CLI ENTRYPOINT ----------------


class Command:
    @staticmethod
    def run(argv: list[str]):
        if len(argv) != 3 and len(argv) != 2:
            raise SystemExit("Usage: <command> <model> <yaml-string> or <command> <yaml-string>")

        if len(argv) == 3:
            cmd_name, model_name, yaml_blob = argv
        else:
            cmd_name, yaml_blob = argv
            model_name = None
        raw_cfg = yaml.safe_load(yaml_blob)

        if model_name is None:
            full_command_name = cmd_name
        else:
            full_command_name = f'{cmd_name}_{model_name}'

        # Look up the appropriate config class from the registry
        cmd_cfg_cls = ConfigRegistry.get(full_command_name, RunnerConfig)


        # Parse YAML to a typed dataclass
        cmd_cfg = from_dict(cmd_cfg_cls, raw_cfg)

        # Instantiate and run the corresponding Runner
        cmd_cfg.get_configurable().run()


# ---------------- DEMO ----------------

evaluate_command_args = """
model:
    checkpoint:
        type: llama3
        path: /path/to/model
        
evaluations:
    stack3b:
        type: lm_eval
    fineweb:
        type: loss
"""

args = ["evaluate", "gpt", evaluate_command_args]
Command.run(args)


train_command_args = """
model:
    checkpoint:
        type: mamba
        path: /path/to/model
trainer:
    train_iters: 1000
"""

args = ["train", "ssm", train_command_args]
Command.run(args)


train_command_args = """
model:
    type: ssm
    checkpoint:
        type: mamba
        path: /path/to/model
trainer:
    train_iters: 1000
"""

args = ["train", train_command_args]
Command.run(args)

Note

My example is just a demo to illustrate the concept. It doesn't implement type field injection during config serialization or other advanced features already handled by the config classes, nor is it autocompletion-friendly.

Collaborator Author

This is the same kind of opt-in that got rejected here: all base classes potentially participate in the dynamic dispatch, depending on what their derived classes may do. I don't have a problem with it, but @tscholak specifically requested enabling dynamic dispatch for specific base classes only.

I'm not a fan of having a single registry with duplicate names. It works, but looks potentially confusing. I prefer an explicit specification of what base class is allowed for a given registration; it's easier to reason about and document, e.g. we can make clear and explicit lists of allowed types for the base classes we care about for dynamic dispatch.

As for the cli part, your implementation would be difficult to run in practice because the actual model/trainer class is specified through inheritance rather than composition, and changing that would be really difficult. Once we take that into account, we basically get the implementation I'm proposing (see #269), independently of the actual registration mechanism.

Anyway, I've spent enough time on this and would rather move on to other tasks. #268 is fully functional and satisfies all requirements as far as I know, so I'd rather not change it anymore unless there is a very good reason to do so.

Collaborator

makes sense @jlamypoirier

@@ -349,30 +338,23 @@ class TrainerConfig(PretrainedFastLLMModelConfig, ExperimentConfig):
_abstract = True
# TODO: Generalize data, schedule, logging, etc.
training: TrainingConfig = Field(
default_factory=TrainingConfig,
Collaborator

you mentioned in the other PR that this is necessary now, but I don't quite understand why. if you've got several classes of type TrainingConfig, you want to have a straightforward way to say which one is the default. that's pretty much it

Collaborator Author

The problem was all sorts of potential issues happening when not going through _from_dict, not necessarily related to this PR (e.g. dataclasses doing the wrong thing when calling it for nested configs). But now that I enforce going through it (with the metaclass __call__), I think I can safely bring it back.

Collaborator Author

Actually, this won't work without some hacking around. The problem is we don't just want to set a default; we want to set a default for the nested config's type field even if a value is provided. And we can't just set a default on type because it would propagate to subclasses and cause all sorts of trouble.

Collaborator

Ok, though we still need a clean, consistent way to specify default values for config fields.

field: SomeConfig = Field(default_factory=SomeConcreteConfig)

should just work. If SomeConcreteConfig is a subclass in a dynamic dispatch / tagged union style setup, then instantiating it directly should still produce a valid config object. The fact that it has a type or tag field shouldn't block the default from being usable. If it's not currently reliable, that's a sign that we're over-complicating the dispatch mechanism rather than a reason to give up on defaults.

Collaborator Author

This would work perfectly fine, just not in a useful way, and it would be a ton of trouble.

default_factory (and default) is for when a value is not provided. This means it is (and should be) ignored whenever anything is passed, including an empty dict. So it's not suitable for setting the default field type, because setting any value will undo it.

On top of that, the config mechanism doesn't distinguish whether nested fields are passed explicitly or just empty. So some defaults may lead to incorrect serialization paths, e.g.: {} --(instantiate)-> Config({}) --(serialize)-> [MISSING] --(instantiate)-> Config([DEFAULT]). (In fact this happens for all nested fields, not just configs, but configs are the biggest risk.)
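To illustrate the point with plain dataclasses (names here are hypothetical, not the Fast-LLM config mechanism): the factory only runs when the field is absent entirely, so an explicit empty dict silently undoes the intended default type.

```python
# Sketch of why default_factory can't set a default *type* for a nested
# dynamic config: any provided value, even an empty dict, bypasses it.
from dataclasses import dataclass, field

@dataclass
class NormalizationConfig:
    type: str = "layer_norm"

@dataclass
class LayerConfig:
    # Factory runs only when `normalization` is absent from the input.
    normalization: NormalizationConfig = field(
        default_factory=lambda: NormalizationConfig(type="rms_norm")
    )

def from_dict(data: dict) -> LayerConfig:
    if "normalization" in data:
        # An explicit (even empty) dict bypasses the factory, so the
        # nested `type` falls back to the class default, not "rms_norm".
        return LayerConfig(normalization=NormalizationConfig(**data["normalization"]))
    return LayerConfig()
```

With this sketch, from_dict({}) yields the intended "rms_norm" default, while from_dict({"normalization": {}}) silently yields "layer_norm" instead.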

Collaborator

I've spent many hours this week diving into this PR and its discussions to understand the broader impact. I want to be crystal-clear about where we stand:

Fast-LLM is a model training framework, not a configuration R&D project.

Yet the config subsystem keeps burning days of senior time. That's time we're not spending on training models, fixing bugs, or shipping features. We've now burned more than a week on a problem Pydantic solves out of the box. That's not sustainable.

Every extra layer of custom logic raises the bar for contributors, concentrates knowledge in one person, and stalls the real work: training models and shipping features. As the team lead, I have to call it what it is: it's an operational risk.

We need to reverse this pattern. The config system should get out of our way, not pull the whole team into its gravity well.

And no, this isn't about shutting down ideas. It's about protecting this team from complexity creep that slows everyone down.

As I mentioned multiple times already, I do care a lot about it, clear and concise configs are essential to me for development, testing, debugging, understanding bug reports, etc.

This illustrates the underlying issue: the current design optimizes for what the framework maintainer finds convenient for himself, but not for what contributors actually need when developing models or features.

Now, let's look at what works and doesn't work in Pydantic:

from pydantic import BaseModel, Field, ValidationError
from typing import Literal, Union

class Foo(BaseModel):
    type: Literal['foo'] = 'foo'
    x: int = 1

class Bar(BaseModel):
    type: Literal['bar'] = 'bar'
    y: int = 2

Item = Union[Foo, Bar]

class Wrapper(BaseModel):
    item: Item = Field(default_factory=Foo, discriminator='type')

print(Wrapper().model_dump())  # works, default_factory runs, uses Foo()
# print(Wrapper(item={}))  # doesn't work, validation fails because of missing discriminator
# print(Wrapper(item={'x': 42}))  # doesn't work, validation fails because of missing discriminator
print(Wrapper(item={'type': 'bar', 'y': 99}).model_dump())  # works, dispatches to Bar

This setup gets most things right:

  • Dynamic dispatch is opt-in.
  • The discriminator is explicit and user-controllable.
  • Defaults work when the field is absent.
  • There is no magic registry.

Does it handle every edge case? No. But it works well for the common case. And that's the bar Fast-LLM needs to meet.

Right now, we're debating edge cases that haven't materialized (e.g., sub-subclassing edge cases should be rare, and if we're running into them, that's a sign to simplify the class hierarchy, not to complicate things further). Meanwhile, the team is stuck. That's the wrong tradeoff.

In hindsight, we should have adopted Pydantic from the start. But since we're here now, and deadlines are real, we need to course-correct pragmatically.

We've seen the cost of keeping this overly flexible. It's time to simplify. Here are the principles we need to lock in now:

  1. Dynamic config behaviour must be opt-in, via a mixin or decorator. Most config classes will just be plain config dataclasses.

  2. Subclasses declare their own tag, whether via a ClassVar[str] or a decorator parameter; the exact mechanism doesn't really matter and is your choice, @jlamypoirier. But it must be self-contained and discoverable.

  3. If a config field is missing, we use default_factory. If a dict is provided without a tag (it shouldn't), we either:

    • inject the default tag automatically (preferred), or
    • error clearly with a message explaining the missing tag (acceptable fallback).
  4. No global subclass registries tied to CLI routing. CLI dispatch should be its own, explicit layer. Config structure should not bend to command-line ergonomics.
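For concreteness, principles 1 through 3 can be sketched in a few lines of plain Python. All names here (dynamic, config_from_dict, _REGISTRY) are hypothetical and illustrative only, not the PR's actual API:

```python
import dataclasses

_REGISTRY: dict[str, type] = {}

def dynamic(tag: str):
    """Opt-in: register a config class under an explicit, self-declared tag."""
    def wrap(cls):
        _REGISTRY[tag] = cls
        cls._tag = tag
        return cls
    return wrap

def config_from_dict(data: dict, default: type):
    """Dispatch on the 'type' key, injecting the default class's tag when missing."""
    data = dict(data)  # don't mutate the caller's dict
    tag = data.pop("type", getattr(default, "_tag", None))
    return _REGISTRY.get(tag, default)(**data)

@dataclasses.dataclass
class PlainConfig:  # untouched: no registry entry, no tag, no indirection
    x: int = 1

@dynamic("foo")
@dataclasses.dataclass
class FooConfig:
    x: int = 1

@dynamic("bar")
@dataclasses.dataclass
class BarConfig:
    y: int = 2

print(config_from_dict({"type": "bar", "y": 9}, FooConfig))  # dispatches to BarConfig
print(config_from_dict({"x": 3}, FooConfig))  # no tag: default tag injected, stays FooConfig
```

A class that skips the decorator, like PlainConfig above, behaves as an ordinary dataclass, which is the point of principle 1.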

These constraints exist to make Fast-LLM maintainable and usable by the whole team. The bar is not "can @jlamypoirier make this work." It's "can any contributor pick this up, understand it, and extend it easily."

That's what makes this framework sustainable.

Collaborator Author

Let's be clear, the time we're taking is due to the overly detailed discussion we're having here. I had a working solution 2 weeks ago, with near-zero complexity on the user side.

We've seen the cost of keeping this overly flexible. It's time to simplify. Here are the principles we need to lock in now:

  1. Dynamic config behaviour must be opt-in, via a mixin or decorator. Most config classes will just be plain config dataclasses.

It's already opt-in: there is nothing to do when we don't want it, and nothing will happen if we don't use it. Everything else is implementation details.

  2. Subclasses declare their own tag via something, a ClassVar[str], or decorator param. It doesn't really matter. Your choice, @jlamypoirier. But it must be self-contained and discoverable.

Already done, see the decorator implementation.

  3. If a config field is missing, we use default_factory. If a dict is provided without a tag (it shouldn't), we either:

    • inject the default tag automatically (preferred), or
    • error clearly with a message explaining the missing tag (acceptable fallback).

That's already what we're doing.

  4. No global subclass registries tied to CLI routing. CLI dispatch should be its own, explicit layer. Config structure should not bend to command-line ergonomics.

It is and always has been the opposite: the CLI adapts to the config system. I updated the CLI system according to what the config system now supports and to your request to preserve the syntax.

Collaborator

Thanks, @jlamypoirier. I want to respond clearly to your points:

"I had a working solution 2 weeks ago..."

The PR is still in draft, tests were failing non-stop until the last commit, and key review feedback -- including on default handling, tagging, and registration -- remains unresolved or blatantly ignored. There was no point at which this was merge-ready. It stalled until I pushed.

"There is nothing to do when we don't want [dynamic dispatch]..."

That's not what "opt-in" means. Right now, every config class implicitly enters a world with base-class registries, tag handling, and dynamic behaviours -- even if it doesn't use them. Every single config class pays this complexity tax. That's passive inclusion, not opt-in.

What I've asked for is explicit control: if a developer doesn't add a mixin or decorator, the config class behaves like a plain (config) dataclass. No registry, no tagging, no indirection. That's what "opt-in" means in a maintainable system. And I had described how this can work eons ago: #104 (comment)

"Already done..." / "That's already what we're doing."

If that were the case, this conversation wouldn't be happening. Are the default factories back? No? Well then.

“The CLI adapts to the config system..."

We shouldn't even be debating this. Why is there even coupling between the CLI commands and the config system? We have 3 or 4 commands right now. Give them one hard-coded root config class each and call it a day.
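The "one hard-coded root config class per command" idea amounts to something like the following sketch. The command names and config classes are made up for illustration and this is not Fast-LLM's actual CLI code:

```python
import argparse
import dataclasses

@dataclasses.dataclass
class TrainConfig:
    steps: int = 100

@dataclasses.dataclass
class ConvertConfig:
    output: str = "out"

# CLI dispatch as its own explicit layer: a static command -> root config mapping,
# with no registry coupling back into the config system.
COMMANDS = {
    "train": TrainConfig,
    "convert": ConvertConfig,
}

def main(argv: list[str]):
    parser = argparse.ArgumentParser(prog="fast-llm")
    parser.add_argument("command", choices=COMMANDS)
    args, rest = parser.parse_known_args(argv)
    config_cls = COMMANDS[args.command]
    # ... parse `rest` into an instance of `config_cls` and run it; elided here.
    return config_cls

print(main(["train"]))
```

Adding a command is then a one-line change to the mapping, and the config classes themselves never learn about the CLI.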

We're deep in this because we didn't push for simplicity early. I'm asking for a course correction now, not to block you, but to unblock everyone else.

Collaborator Author

That's not what "opt-in" means. Right now, every config class implicitly enters a world with base-class registries, tag handling, and dynamic behaviours -- even if it doesn't use them. Every single config class pays this complexity tax. That's passive inclusion, not opt-in.

What I've asked for is explicit control: if a developer doesn't add a mixin or decorator, the config class behaves like a plain (config) dataclass. No registry, no tagging, no indirection. That's what "opt-in" means in a maintainable system. And I had described how this can work eons ago: #104 (comment)

Let's not debate semantics. Nothing visible happens if the developer doesn't do anything; what happens in the background is irrelevant. Unless I'm missing something?

If that were the case, this conversation wouldn't be happening. Are the default factories back? No? Well then.

Default factories were never there for config classes. All I can do at this time is force them to be an exact match with the type hint, which is equivalent to them not being there. Anything more is not an option.

We shouldn't even be debating this. Why is there even coupling between the CLI commands and the config system? We have 3 or 4 commands right now. Give them one, hard-coded root config class each and call it a day.

I don't get what's wrong here. I can revert the changes if you want. But we have a perfect match between cli commands and the new dynamic config system, it would be weird to keep and maintain a separate infrastructure just because we want to.

Collaborator

@tscholak tscholak May 14, 2025

Thanks for the update, @jlamypoirier. I don't want to drag this out further, so we'll move forward for now.

That said, I want to clearly record the unresolved concerns:

Every config class now implicitly participates in dynamic dispatch and registry logic, even when that complexity isn't needed. The cognitive overhead of the config system remains too high, and this PR made it worse. The sheer review burden here shows how far the design diverges from what most developers expect in a config system. We've burned over a week of senior time debating mechanics, and that's time we could have spent on models, training, or features.

We continue to build and maintain a bespoke config layer, which distracts from model development. A better choice would be to lean on mature solutions like Pydantic, which are battle-tested and solve 95% of our use cases with far less friction.

If the complexity tax of the current system continues to stall progress, I will propose a full migration to Hydra + OmegaConf + Pydantic.

For now, I'm prioritizing unblocking the team and moving forward.

attention_factor: None | float = Field(
default=None,
desc="Attention factor for yarn-type scaling.",
Collaborator

is this because the desc is not informative or because something has changed so that descriptions don't work anymore?

Collaborator Author

"Attention factor" just repeats the field name, and "for yarn-type scaling" is now redundant with the type/class name. So the existing descriptions are completely useless.

Collaborator

ok, redundancy, I get it. But rather than dropping the description entirely, should we consider replacing it with something more informative? e.g., when this field matters, what values are valid, what happens if it's None, etc. Otherwise we're losing a chance to document intent.

Collaborator Author

I agree, but it's outside the scope of this PR.

Collaborator

Removing the desc is touching the field, so scope isn’t the issue. If we're already changing it, let's replace the description with something useful rather than delete it and lose intent. Otherwise please revert this hunk and keep the existing description until we can document it properly.

Collaborator Author

These descriptions were already irrelevant, so whether I'm removing them or leaving them as a repetition of the field name doesn't matter. And more importantly, I'm not familiar enough with these classes to make a proper description, feel free to suggest some.

layers: list[TransformerSubLayerName] = Field(
-    default=None,
+    default=(TransformerSubLayerName.query, TransformerSubLayerName.value_),
Collaborator

Suggested change
-    default=(TransformerSubLayerName.query, TransformerSubLayerName.value_),
+    default_factory=lambda: [TransformerSubLayerName.query, TransformerSubLayerName.value_],

Collaborator Author

The tuple is fine; it gets transformed into a list automatically.

Collaborator

I'm OK with the tuple if there's a documented guarantee that the config layer converts it automatically and that states the conversion rules clearly.

Right now, the behaviour isn't obvious at all. The type of this field is annotated as a list but initialized as a tuple. That's a straightforward type mismatch. I wasn't aware that coercion rules existed beyond serialized data, and I doubt anyone else is either. Does it always convert tuple to list? What happens with typing.Sequence or other container types?

Explicit is always better than implicit. Let's document these conversions clearly, or just avoid them entirely by using explicit defaults.
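The coercion under discussion could be documented as something like this hypothetical helper (a sketch of the rule, not the actual Fast-LLM implementation): convert between tuple and list only when the field's type annotation asks for the other container, and leave everything else untouched.

```python
import typing

def coerce(value, hint):
    """Convert tuple <-> list when the annotated container type requires it."""
    origin = typing.get_origin(hint)
    if origin is list and isinstance(value, tuple):
        return list(value)
    if origin is tuple and isinstance(value, list):
        return tuple(value)
    return value  # anything else passes through unchanged

print(coerce(("q", "v"), list[str]))  # tuple default, list annotation -> list
```

Writing the rule down this explicitly would answer the questions above: only tuple/list are interchanged, and other container hints such as typing.Sequence would pass through unchanged under this sketch.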

Collaborator Author

Yes, this will be part of the config documentation.

@jlamypoirier jlamypoirier mentioned this pull request May 14, 2025
@jlamypoirier jlamypoirier changed the base branch from main to misc May 14, 2025 21:50
@jlamypoirier jlamypoirier mentioned this pull request May 14, 2025
@jlamypoirier jlamypoirier changed the base branch from misc to minimalistic_dynamic_classes May 15, 2025 14:45
Base automatically changed from minimalistic_dynamic_classes to main May 21, 2025 21:07
@jlamypoirier jlamypoirier changed the base branch from main to simplify_cli May 27, 2025 14:59

if expected_class is not None:
# Should be handled in `from_dict`, but can fail if instantiating directly.
Assert.is_(self.__class__, expected_class)
Contributor

@oleksost oleksost May 27, 2025

@jlamypoirier shouldn't this be something like Assert.custom(issubclass, expected_class, self.__class__) instead?

I.e., the dynamically inferred class (expected_class) should be a subclass of self.__class__; it should not necessarily match the class exactly?

Collaborator Author

The value is overridden in from_dict. Anyway, this PR isn't too relevant anymore; I broke it into pieces.

@jlamypoirier jlamypoirier changed the title Generalize dynamic config classes [Prototype] Generalize dynamic config classes Jun 4, 2025
Base automatically changed from simplify_cli to main June 12, 2025 22:14

Successfully merging this pull request may close these issues.

[feat] Generalize dynamic config classes
5 participants