[Prototype] Generalize dynamic config classes #245


Draft: wants to merge 39 commits into main

Conversation

jlamypoirier
Collaborator

✨ Description

Fix: #126

Generalize the concept of dynamic config classes from the dataset mechanism to all config classes. I opted for a unique global registry of all config classes. This choice is motivated by simplicity (multiple registries would be hell to manage), though it means registry keys need to be globally unique, so they'll tend to be longer. Config classes are now specified through the type field, using the config class name or its shorthand without the Config suffix.

Included a proposal to simplify the CLI, so the runnable is selected through its type field, e.g. fast-llm type=GPTTrainer ... instead of fast-llm train gpt.

Added backward compatibility for the old dataset dynamic type names.
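As a rough sketch of the mechanism described above (all names here are illustrative, not the actual Fast-LLM API):

```python
# Illustrative sketch of a single global registry keyed by class name,
# with the shorthand (class name minus the "Config" suffix) as an alias.
_REGISTRY: dict[str, type] = {}

def register_config(cls: type) -> type:
    _REGISTRY[cls.__name__] = cls
    if cls.__name__.endswith("Config"):
        _REGISTRY[cls.__name__[: -len("Config")]] = cls  # shorthand alias
    return cls

@register_config
class Config:
    pass

@register_config
class GPTTrainerConfig(Config):
    pass

def resolve(type_name: str, expected_base: type = Config) -> type:
    # Keys are globally unique, so validation only needs to check that the
    # selected class has the expected base class.
    cls = _REGISTRY[type_name]
    if not issubclass(cls, expected_base):
        raise TypeError(f"{type_name} is not a {expected_base.__name__}")
    return cls
```

With this, both resolve("GPTTrainerConfig") and the shorthand resolve("GPTTrainer") return the same class.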

TODO:

  • Add backward compatibility for the cli?
  • The type field conflicts with the normalization, rotary and peft types. These are good use cases for dynamic classes, so I'll implement them, but I'll wait until after Review the architecture split #237.
  • Some other potential use cases in the optimizer (LR schedule), but no need to do it in this PR.

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

@tscholak
Collaborator

tscholak commented May 2, 2025

looks great so far!

fast-llm type=GPTTrainer is principled (because it taps into the override logic) but ugly (because spelling out type= is mandatory and because it's using class names as values). I think the better approach would be to still use the subcommand trick and have commands set type to a str.

about the global registry thing. another approach would be to outsource the functionality to a mixin class that can be optionally added to a config class. that way we can have more control over where this feature is used, and it would also allow for different registries for different config base classes.

@jlamypoirier
Collaborator Author

fast-llm type=GPTTrainer is principled (because it taps into the override logic) but ugly (because spelling out type= is mandatory and because it's using class names as values). I think the better approach would be to still use the subcommand trick and have commands set type to a str.

It would be a bit of a hack, but I can make it work.

about the global registry thing. another approach would be to outsource the functionality to a mixin class that can be optionally added to a config class. that way we can have more control over where this feature is used, and it would also allow for different registries for different config base classes.

I considered several alternatives, but they all tend to run into trouble once we start dealing with more than just a few registries. The user has to think about which registry a class belongs to when setting the name, and make sure to pick a unique name for that registry, so the benefit of shorter names would come at a cost to developers. There is also the issue of classes potentially belonging to multiple registries, which can be solved only at the price of extra complexity and varying type field names, which makes things even worse.

I tried to think of a developer-friendly solution for a while but failed, so I went for a single global registry instead. It's good enough because the validation mechanism ensures the selected class has the expected base class. The only downsides are the less user-friendly names and the impossibility of restricting which subclasses are allowed for a given field.

@jlamypoirier
Collaborator Author

@tscholak I started working on more dynamic classes and realized user-friendly names are essential. So I implemented a simple solution where each class can have its own registry, and the class is selected by iterating through the registries in the MRO. We still build a Config registry from class names, and others can be created with a simple register call.

So we get neat names without adding complexity on the developer side. The only real missing piece would be some way for users to find out the available types; we should add it to the automatic config documentation once we have it.
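A rough sketch of that lookup, with register_subclass named after the PR's method and everything else illustrative:

```python
# Illustrative sketch: each base class can own its own registry, and
# lookup walks the MRO so the most specific registry that knows the
# name wins. Class and method names here are assumptions for the demo.
class Config:
    @classmethod
    def register_subclass(cls, name: str, subclass: type) -> None:
        # Each registering class gets its own registry dict on first use.
        if "_registry" not in cls.__dict__:
            cls._registry = {}
        cls._registry[name] = subclass

    @classmethod
    def get_subclass(cls, name: str) -> type:
        # Iterate through the registries in the MRO, most specific first.
        for base in cls.__mro__:
            registry = base.__dict__.get("_registry")
            if registry is not None and name in registry:
                return registry[name]
        raise KeyError(f"Unknown type {name!r} for {cls.__name__}")

class RunnableConfig(Config):
    pass

class TrainerConfig(RunnableConfig):
    pass

class GPTTrainerConfig(TrainerConfig):
    pass

# The same class can take a short name in a specialized registry and a
# longer, unambiguous one in a more general registry.
TrainerConfig.register_subclass("gpt", GPTTrainerConfig)
RunnableConfig.register_subclass("train_gpt", GPTTrainerConfig)
```

Here TrainerConfig.get_subclass("gpt") and RunnableConfig.get_subclass("train_gpt") both resolve to GPTTrainerConfig, while RunnableConfig.get_subclass("gpt") correctly fails because that short name is scoped to the trainer registry.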

@tscholak
Collaborator

tscholak commented May 6, 2025

yes, this is great!

fast_llm/cli.py Outdated
# TODO: Remove backward compatibility.
if len(args) >= 2 and args[0] == "train":
    if args[1] == "gpt":
        args = ["type=train_gpt"] + args[2:]
Collaborator

Regarding deprecating the fast-llm train command syntax: I'm not super confident that we ever will.

Collaborator Author

GPTSampledDatasetConfig.register_subclass("concatenated_memmap", GPTConcatenatedMemmapConfig)
GPTSampledDatasetConfig.register_subclass("fim", GPTFimSampledDatasetConfig)
GPTSampledDatasetConfig.register_subclass("legacy", GPTLegacyDatasetConfig)
GPTSampledDatasetConfig.register_subclass("test_slow", GPTTestSlowDatasetConfig)
Collaborator

hmm, so this is for registering user-friendly names or the classes themselves?
can we make registration automatic? didn't we want to use the type classvar for that?

Collaborator Author

type is for the instance field. We could add other classvars, but it wouldn't be ideal because we'd still need to distinguish which base class it belongs to (e.g. one classvar for each base class), and we may want to register multiple user-friendly names. Also, it adds the magic of the classvar being converted to a dynamic type name. Overall, the external register call seems better.

Collaborator

I am confused by this answer. I was expecting something like this:

If a config class defines/overrides the type classvar field, it automatically becomes a registered dynamic class with the value of the type field becoming its human-readable name, which will be used for serialization and deserialization.

Example:

@config_class
class MainConfig:
    foo: FooConfig

@config_class
class FooConfig:
    type: typing.ClassVar[str]

@config_class
class BarConfig(FooConfig):
    type = "bar"
    something: int

@config_class
class BazConfig(FooConfig):
    type = "baz"
    something_else: int

Serialized version:

foo:
  type: baz
  something_else: 5

Collaborator

I think unless we can show a concrete need for multiple human-readable names per class or for registering the same subclass under multiple base classes (imho both extremely niche), the explicit register_subclass(...) pattern adds too much noise and duplication.

Collaborator Author

type is a field for instances https://github.com/ServiceNow/Fast-LLM/pull/245/files#diff-15aab2503b9a097809b7fa450b3f63ea0db5d700982d7265610fdf28b476788aR989. What you are referring to is an equivalent to the type_ classvar we had before https://github.com/ServiceNow/Fast-LLM/pull/245/files#diff-94ac1cad97ec073f07a7d6ead2489610430a6fb40aa3e34d520b86bd88e6c9fcL102. These two can't be the same.

Adding back the same type_ variable is no longer an option, because all base classes have a registry and we don't know which one is concerned. So we'd need some adjustments. I can think of a few options: a registry_class classvar, separate type variables (e.g. gpt_dataset_type), or a more complex type: typing.ClassVar[tuple[tuple[type[Config], str], ...]]. None of these look better than register_subclass.

I think we do want to support multiple names and/or base classes. Multiple names may be needed for backward compatibility, or to simplify usage. (The final saved config uses the full class name either way, so that's just convenience.) And multiple base classes are useful for specialization; for example GPTTrainerConfig should be registered as train_gpt as a RunnableConfig, but can take the shorter name gpt as a TrainerConfig. Concretely this allows the fast-llm train gpt syntax.

Contributor

@bigximik commented May 15, 2025

Having a decorator looks very good. However, why can't we use one global registry for all Config classes?
Names should be unique per class hierarchy, and conflicts will arise on registration (not implemented in the example below).

Below is an example using a global ConfigRegistry, which supports:

  • Command runners
  • Typed class hierarchies (models, trainers, evaluators, etc.)
  • Instantiating proper subclasses from YAML
  • Auto-dispatching based on type: keys in config

The system includes demo configs and commands at the end to show how it works in practice.

Features

  • Decorator for easy class registration: @ConfigRegistry.register("name", implementation_class)
  • Automatic resolution of subclassed dataclasses from YAML with type field
  • Reusable base logic for any component: models, runners, evaluators, checkpoints, etc.
  • Command line integration with flexible structure

🔧 Subclass Resolution Logic

In this design, the "base class" used to resolve registered subclasses is determined by context:

  • ✅ In nested dataclass fields (e.g., model: ModelConfig), the base class is what's declared in the field's type annotation.

  • ✅ In top-level commands or manual use, the base class is explicitly passed to from_dict(base_cls, data) or ConfigRegistry.get(type_name, base_cls).

This allows the same global registry to work across unrelated class hierarchies (e.g., ModelConfig, TrainerConfig, EvaluatorConfig), because each lookup is scoped to its specific base class.

What do you think?

from dataclasses import is_dataclass, fields, dataclass
from typing import get_args, get_origin, Union, Any
import yaml
import types

# ---------------- CONFIG REGISTRY MACHINERY (SIMPLIFIED) ----------------

@dataclass
class Config:
    # Each Config can optionally define a "configurable" runtime class it instantiates
    configurable_class = None

    def get_configurable(self):
        # Returns an instance of the associated runtime class using self as config
        if self.configurable_class is None:
            return None
        return self.configurable_class(self)


class ConfigRegistry:
    # Global registry mapping friendly names (like "gpt", "train") to config classes
    _table: dict[str, set[type]] = {}

    @classmethod
    def register(cls, name: str | None = None, configurable_class: type | None = None):
        # Decorator to register a Config class under a name and link to its runtime class
        def deco(target: Config):
            key = name or target.__name__
            target.configurable_class = configurable_class
            cls._table.setdefault(key, set()).add(target)
            return target
        return deco

    @classmethod
    def get(cls, name: str, base: type) -> type:
        # Retrieve the config class registered under `name` that is a subclass of `base`
        for cand in cls._table.get(name, ()):
            if issubclass(cand, base):
                return cand
        raise KeyError(f"No registered class named '{name}' that is subclass of {base.__name__}")


def from_dict(cls: type, raw: dict | list | Any) -> Any:
    """Recursively convert a dict (or list) into a dataclass instance."""
    if is_dataclass(cls) and isinstance(raw, dict):
        init_kwargs = {}
        for f in fields(cls):
            field_value = raw.get(f.name)
            field_type = f.type

            if field_value is None:
                init_kwargs[f.name] = None
                continue

            init_kwargs[f.name] = _convert_value(field_value, field_type)

        return cls(**init_kwargs)

    return _convert_value(raw, cls)


def _convert_value(value: Any, target_type: type) -> Any:
    # Handles typing annotations like list[T], dict[K, V], Union, and dataclasses
    origin = get_origin(target_type)
    args = get_args(target_type)

    if origin is list and isinstance(value, list):
        return [_convert_value(v, args[0]) for v in value]

    elif origin is dict and isinstance(value, dict):
        return {_convert_value(k, args[0]): _convert_value(v, args[1]) for k, v in value.items()}

    elif is_dataclass(target_type) and isinstance(value, dict):
        # If a 'type' key is present, use it to find a registered subclass
        typ_name = value.pop("type", None)
        if typ_name:
            subclass = ConfigRegistry.get(typ_name, target_type)
            if subclass:
                return from_dict(subclass, value)
        return from_dict(target_type, value)

    elif origin is types.UnionType:
        # Handle Optional[T] (which is Union[T, None])
        non_none_args = [arg for arg in args if arg is not type(None)]
        if len(non_none_args) == 1:
            return _convert_value(value, non_none_args[0])

    return value


# ---------------- MODEL CLASSES ----------------

@dataclass
class CheckPointConfig(Config):
    path: str


@dataclass
class GPTCheckPointConfig(CheckPointConfig):
    pass


@dataclass
class SSMCheckPointConfig(CheckPointConfig):
    pass


@ConfigRegistry.register("llama3")  # matches the "type: llama3" used in the demos below
@dataclass
class LlamaCheckPointConfig(GPTCheckPointConfig):
    pass


@ConfigRegistry.register("mamba")
@dataclass
class MambaCheckPointConfig(SSMCheckPointConfig):
    pass


# Model runtime classes (used during execution)
class Model: pass
class GPTModel(Model): pass
class SSMModel(Model): pass


@dataclass
class ModelConfig(Config):
    checkpoint: CheckPointConfig | None = None

    def get_model(self):
        return self.get_configurable()


@ConfigRegistry.register("gpt", GPTModel)
@dataclass
class GPTModelConfig(ModelConfig):
    pass


@ConfigRegistry.register("ssm", SSMModel)
@dataclass
class SSMModelConfig(ModelConfig):
    pass


# ---------------- TRAINER CLASSES ----------------

@dataclass
class TrainerConfig:
    train_iters: int


# ---------------- EVALUATOR CLASSES ----------------

class Evaluator: pass
class EvaluatorLoss(Evaluator): pass
class EvaluatorLmEval(Evaluator): pass


@dataclass
class EvaluatorConfig(Config):
    def get_evaluator(self):
        return self.get_configurable()


@ConfigRegistry.register("loss", EvaluatorLoss)
@dataclass
class EvaluatorLossConfig(EvaluatorConfig):
    pass


@ConfigRegistry.register("lm_eval", EvaluatorLmEval)
@dataclass
class EvaluatorLmEvalConfig(EvaluatorConfig):
    pass


# ---------------- RUNNER CLASSES ----------------

class Runner:
    def __init__(self, cfg):
        self.cfg = cfg

    def run(self):
        raise NotImplementedError


class EvaluateRunner(Runner):
    def run(self):
        print("Running evaluation for", self.cfg.model.__class__.__name__)
        print("Checkpoint is", self.cfg.model.checkpoint.__class__.__name__)
        for name, ev in self.cfg.evaluations.items():
            print(f" • {name}: {ev.__class__.__name__}")


class TrainRunner(Runner):
    def run(self):
        print("Running Training for", self.cfg.model.__class__.__name__)
        print("Checkpoint is", self.cfg.model.checkpoint.__class__.__name__)
        print("Iterations:", self.cfg.trainer.train_iters)


@dataclass
class RunnerConfig(Config):
    def get_runner(self):
        return self.get_configurable()


@ConfigRegistry.register("evaluate", EvaluateRunner)
@dataclass
class EvaluateConfig(RunnerConfig):
    model: ModelConfig
    evaluations: dict[str, EvaluatorConfig]


@ConfigRegistry.register("train", TrainRunner)
@dataclass
class TrainConfig(RunnerConfig):
    model: ModelConfig
    trainer: TrainerConfig


# ---------------- CLI ENTRYPOINT ----------------

class Command:
    @staticmethod
    def run(argv: list[str]):
        if len(argv) != 2:
            raise SystemExit("Usage: <command> <yaml-string>")

        cmd_name, yaml_blob = argv
        raw_cfg = yaml.safe_load(yaml_blob)

        # Look up the appropriate config class from registry
        cmd_cfg_cls = ConfigRegistry.get(cmd_name, RunnerConfig)

        # Parse YAML to a typed dataclass
        cmd_cfg = from_dict(cmd_cfg_cls, raw_cfg)

        # Instantiate and run the corresponding Runner
        cmd_cfg.get_configurable().run()


# ---------------- DEMO ----------------

evaluate_command_args = """
model:
    type: gpt
    checkpoint:
        type: llama3
        path: /path/to/model
        
evaluations:
    stack3b:
        type: lm_eval
    fineweb:
        type: loss
"""

args = ["evaluate", evaluate_command_args]
Command.run(args)


train_command_args = """
model:
    type: ssm
    checkpoint:
        type: mamba
        path: /path/to/model
trainer:
    train_iters: 1000
"""

args = ["train", train_command_args]
Command.run(args)

Collaborator Author

@bigximik thanks for the suggestion. It's actually very close to my initial implementation, minus the implicit registration. But I moved away from it following the discussions here and other concerns that arose when trying it out on existing dynamic classes:

  • This is opt-in from the developer perspective, but always enabled for the user. I like that, but we got an explicit request against it ([Prototype] Generalize dynamic config classes #245 (comment)). There is also an explicit request to have multiple registries.
  • Name clashes are too much of a problem. We can have a "gpt" model, but also a "gpt" trainer, a "gpt" dataset, a "gpt" huggingface model, etc. With a single global registry, this is not possible, and we need longer names to properly disambiguate, at which point they're basically the same as the class names. So the only viable option is to use the class name itself.
  • That means long and annoying names. We can't just write "default" or "llama3" for rotary embeddings; we have to provide redundant information, e.g. "Llama3RotaryConfig" or an adjusted form like "llama3_rotary".
  • It's also not great for backward compatibility, since we'd have to deprecate all existing human-friendly names.
  • Concerning the CLI part, it goes against the request to preserve the syntax. Your suggestion solves the "train" part (it's basically the same as mine) but leaves out the "gpt" part, so we'd still need to dispatch the actual trainer dynamically.

Contributor

@jlamypoirier Thanks for the comments!

Regarding the first four points:

Name conflicts aren't a concern from my point of view when using a global registry — the registry supports multiple classes registered under the same name. Real name conflicts can occur if more than one class is a subclass of the required base at resolution time. However, the same issue would occur even if separate registries were used in the decorator. These conflicts can still be detected and reported. Yes, conflict detection is deferred from decorator application to resolution time, but it's still caught early — e.g., during a dry run or config loading.

This approach ensures the base class (i.e., the separate registry) needs to be specified only once, and only in the place it actually matters — such as in the config field type — which improves maintainability.

To my understanding, this also aligns with an opt-in model.

Perhaps @tscholak can clarify whether this satisfies the multiple registry and opt-in requirement?

Regarding implicit Configurable registration

We can explicitly separate the two decorators: one for configurable registration, and one for registering user-friendly names. Or we can keep it as it is now for configurable registration.

Regarding leaving the CLI interface intact

We can keep the CLI interface unchanged, so it remains fast-llm train gpt --config ..., still using a single global registry. Actually, we can have both, but I would still advocate for the new one, which is fast-llm train --config ...

@dataclass
class RunnerConfig(Config):
    def get_runner(self):
        return self.get_configurable()


@ConfigRegistry.register("evaluate", EvaluateRunner)
@dataclass
class EvaluateConfig(RunnerConfig):
    model: ModelConfig
    evaluations: dict[str, EvaluatorConfig]


@ConfigRegistry.register("train", TrainRunner)
@dataclass
class TrainConfig(RunnerConfig):
    model: ModelConfig
    trainer: TrainerConfig


@ConfigRegistry.register("evaluate_gpt", EvaluateRunner)
@dataclass
class GPTEvaluateConfig(RunnerConfig):
    model: GPTModelConfig
    evaluations: dict[str, EvaluatorConfig]


@ConfigRegistry.register("train_ssm", TrainRunner)
@dataclass
class SSMTrainConfig(RunnerConfig):
    model: SSMModelConfig
    trainer: TrainerConfig


# ---------------- CLI ENTRYPOINT ----------------


class Command:
    @staticmethod
    def run(argv: list[str]):
        if len(argv) != 3 and len(argv) != 2:
            raise SystemExit("Usage: <command> <model> <yaml-string> or <command> <yaml-string>")

        if len(argv) == 3:
            cmd_name, model_name, yaml_blob = argv
        else:
            cmd_name, yaml_blob = argv
            model_name = None
        raw_cfg = yaml.safe_load(yaml_blob)

        if model_name is None:
            full_command_name = cmd_name
        else:
            full_command_name = f'{cmd_name}_{model_name}'

        # Look up the appropriate config class from the registry
        cmd_cfg_cls = ConfigRegistry.get(full_command_name, RunnerConfig)


        # Parse YAML to a typed dataclass
        cmd_cfg = from_dict(cmd_cfg_cls, raw_cfg)

        # Instantiate and run the corresponding Runner
        cmd_cfg.get_configurable().run()


# ---------------- DEMO ----------------

evaluate_command_args = """
model:
    checkpoint:
        type: llama3
        path: /path/to/model
        
evaluations:
    stack3b:
        type: lm_eval
    fineweb:
        type: loss
"""

args = ["evaluate", "gpt", evaluate_command_args]
Command.run(args)


train_command_args = """
model:
    checkpoint:
        type: mamba
        path: /path/to/model
trainer:
    train_iters: 1000
"""

args = ["train", "ssm", train_command_args]
Command.run(args)


train_command_args = """
model:
    type: ssm
    checkpoint:
        type: mamba
        path: /path/to/model
trainer:
    train_iters: 1000
"""

args = ["train", train_command_args]
Command.run(args)

Note

My example is just a demo to illustrate the concept. It doesn't implement type field injection during config serialization or other advanced features already handled by the config classes, nor is it autocompletion-friendly.

Collaborator Author

This is the same kind of opt-in that got rejected here: all base classes potentially participate in the dynamic dispatch, depending on what their derived classes may do. I don't have a problem with it, but @tscholak specifically requested enabling dynamic dispatch for specific base classes only.

I'm not a fan of having a single registry with duplicate names. It works, but looks potentially confusing. I prefer an explicit specification of what base class is allowed for a given registration; it's easier to reason about and document, e.g. we can make clear and explicit lists of allowed types for the base classes we care about for dynamic dispatch.

As for the cli part, your implementation would be difficult to run in practice because the actual model/trainer class is specified through inheritance rather than composition, and changing that would be really difficult. Once we take that into account, we basically get the implementation I'm proposing (see #269), independently of the actual registration mechanism.

Anyway, I've spent enough time on this and would rather move on to other tasks. #268 is fully functional and satisfies all requirements as far as I know, so I'd rather not change it anymore unless there is a very good reason to do so.

Collaborator

makes sense @jlamypoirier

@@ -349,30 +338,23 @@ class TrainerConfig(PretrainedFastLLMModelConfig, ExperimentConfig):
_abstract = True
# TODO: Generalize data, schedule, logging, etc.
training: TrainingConfig = Field(
default_factory=TrainingConfig,
Collaborator

you mentioned in the other PR that this is necessary now, but I don't quite understand why. if you've got several classes of type TrainingConfig, you want to have a straightforward way to say which one is the default. that's pretty much it

Collaborator Author

The problem was all sorts of potential issues happening when not going through _from_dict, not necessarily related to this PR (e.g. dataclasses doing the wrong thing when calling it for nested configs). But now that I enforce going through it (with the metaclass __call__), I think I can safely bring it back.

Collaborator Author

Actually, this won't work without some hacking around. The problem is we don't just want to set a default; we want to set a default for the nested config's type field even if a value is provided. And we can't just set a default on type because it would propagate to subclasses and cause all sorts of trouble.

Collaborator

Ok, though we still need a clean, consistent way to specify default values for config fields.

field: SomeConfig = Field(default_factory=SomeConcreteConfig)

should just work. If SomeConcreteConfig is a subclass in a dynamic dispatch / tagged union style setup, then instantiating it directly should still produce a valid config object. The fact that it has a type or tag field shouldn't block the default from being usable. If it's not currently reliable, that's a sign that we're over-complicating the dispatch mechanism rather than a reason to give up on defaults.

Collaborator Author

This would work perfectly fine, just not in a useful way, and it would be a ton of trouble.

default_factory (and default) is for when a value is not provided. This means it is (and should be) ignored whenever anything is passed, including an empty dict. So it's not suitable for setting the default field type, because setting any value will undo it.

On top of that, the config mechanism doesn't distinguish whether nested fields are passed explicitly or just empty. So some defaults may lead to incorrect serialization paths, e.g.: {} --(instantiate)-> Config({}) --(serialize)-> [MISSING] --(instantiate)-> Config([DEFAULT]). (In fact this happens for all nested fields, not just configs, but configs are the biggest risk.)
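To illustrate the point with plain dataclasses (names here are hypothetical, not the Fast-LLM config mechanism): the factory only runs when the field is absent entirely, so an explicit empty dict silently undoes the intended default type.

```python
# Sketch of why default_factory can't set a default *type* for a nested
# dynamic config: any provided value, even an empty dict, bypasses it.
from dataclasses import dataclass, field

@dataclass
class NormalizationConfig:
    type: str = "layer_norm"

@dataclass
class LayerConfig:
    # Factory runs only when `normalization` is absent from the input.
    normalization: NormalizationConfig = field(
        default_factory=lambda: NormalizationConfig(type="rms_norm")
    )

def from_dict(data: dict) -> LayerConfig:
    if "normalization" in data:
        # An explicit (even empty) dict bypasses the factory, so the
        # nested `type` falls back to the class default, not "rms_norm".
        return LayerConfig(normalization=NormalizationConfig(**data["normalization"]))
    return LayerConfig()
```

With this sketch, from_dict({}) yields the intended "rms_norm" default, while from_dict({"normalization": {}}) silently yields "layer_norm" instead.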

Collaborator

I've spent many hours this week diving into this PR and its discussions to understand the broader impact. I want to be crystal-clear about where we stand:

Fast-LLM is a model training framework, not a configuration R&D project.

Yet the config subsystem keeps burning days of senior time. That's time we're not spending on training models, fixing bugs, or shipping features. We've now burned more than a week on a problem Pydantic solves out of the box. That's not sustainable.

Every extra layer of custom logic raises the bar for contributors, concentrates knowledge in one person, and stalls the real work: training models and shipping features. As the team lead, I have to call it what it is: it's an operational risk.

We need to reverse this pattern. The config system should get out of our way, not pull the whole team into its gravity well.

And no, this isn't about shutting down ideas. It's about protecting this team from complexity creep that slows everyone down.

As I mentioned multiple times already, I do care a lot about it, clear and concise configs are essential to me for development, testing, debugging, understanding bug reports, etc.

This illustrates the underlying issue: the current design optimizes for what the framework maintainer finds convenient for himself, but not for what contributors actually need when developing models or features.

Now, let's look at what works and doesn't work in Pydantic:

from pydantic import BaseModel, Field, ValidationError
from typing import Literal, Union

class Foo(BaseModel):
    type: Literal['foo'] = 'foo'
    x: int = 1

class Bar(BaseModel):
    type: Literal['bar'] = 'bar'
    y: int = 2

Item = Union[Foo, Bar]

class Wrapper(BaseModel):
    item: Item = Field(default_factory=Foo, discriminator='type')

print(Wrapper().model_dump())  # works, default_factory runs, uses Foo()
# print(Wrapper(item={}))  # doesn't work, validation fails because of missing discriminator
# print(Wrapper(item={'x': 42}))  # doesn't work, validation fails because of missing discriminator
print(Wrapper(item={'type': 'bar', 'y': 99}).model_dump())  # works, dispatches to Bar

This setup gets most things right:

  • Dynamic dispatch is opt-in.
  • The discriminator is explicit and user-controllable.
  • Defaults work when the field is absent.
  • There is no magic registry.

Does it handle every edge case? No. But it works well for the common case. And that's the bar Fast-LLM needs to meet.

Right now, we're debating edge cases that haven't materialized (e.g., sub-subclassing edge cases should be rare, and if we're running into them, that's a sign to simplify the class hierarchy, not to complicate things further). Meanwhile, the team is stuck. That's the wrong tradeoff.

In hindsight, we should have adopted Pydantic from the start. But since we're here now, and deadlines are real, we need to course-correct pragmatically.

We've seen the cost of keeping this overly flexible. It's time to simplify. Here are the principles we need to lock in now:

  1. Dynamic config behaviour must be opt-in, via a mixin or decorator. Most config classes will just be plain config dataclasses.

  2. Subclasses declare their own tag, whether via a ClassVar[str] or a decorator parameter; the exact mechanism doesn't really matter and is your choice, @jlamypoirier. But it must be self-contained and discoverable.

  3. If a config field is missing, we use default_factory. If a dict is provided without a tag (it shouldn't), we either:

    • inject the default tag automatically (preferred), or
    • error clearly with a message explaining the missing tag (acceptable fallback).
  4. No global subclass registries tied to CLI routing. CLI dispatch should be its own, explicit layer. Config structure should not bend to command-line ergonomics.
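For concreteness, principles 1 through 3 can be sketched in a few lines of plain Python. All names here (dynamic, config_from_dict, _REGISTRY) are hypothetical and illustrative only, not the PR's actual API:

```python
import dataclasses

_REGISTRY: dict[str, type] = {}

def dynamic(tag: str):
    """Opt-in: register a config class under an explicit, self-declared tag."""
    def wrap(cls):
        _REGISTRY[tag] = cls
        cls._tag = tag
        return cls
    return wrap

def config_from_dict(data: dict, default: type):
    """Dispatch on the 'type' key, injecting the default class's tag when missing."""
    data = dict(data)  # don't mutate the caller's dict
    tag = data.pop("type", getattr(default, "_tag", None))
    return _REGISTRY.get(tag, default)(**data)

@dataclasses.dataclass
class PlainConfig:  # untouched: no registry entry, no tag, no indirection
    x: int = 1

@dynamic("foo")
@dataclasses.dataclass
class FooConfig:
    x: int = 1

@dynamic("bar")
@dataclasses.dataclass
class BarConfig:
    y: int = 2

print(config_from_dict({"type": "bar", "y": 9}, FooConfig))  # dispatches to BarConfig
print(config_from_dict({"x": 3}, FooConfig))  # no tag: default tag injected, stays FooConfig
```

A class that skips the decorator, like PlainConfig above, behaves as an ordinary dataclass, which is the point of principle 1.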

These constraints exist to make Fast-LLM maintainable and usable by the whole team. The bar is not "can @jlamypoirier make this work." It's "can any contributor pick this up, understand it, and extend it easily."

That's what makes this framework sustainable.

Collaborator Author

Let's be clear, the time we're taking is due to the overly detailed discussion we're having here. I had a working solution 2 weeks ago, with near-zero complexity on the user side.

We've seen the cost of keeping this overly flexible. It's time to simplify. Here are the principles we need to lock in now:

  1. Dynamic config behaviour must be opt-in, via a mixin or decorator. Most config classes will just be plain config dataclasses.

It's already opt-in: there is nothing to do when we don't want it, and nothing will happen if we don't use it. Everything else is implementation details.

  2. Subclasses declare their own tag via something, a ClassVar[str], or decorator param. It doesn't really matter. Your choice, @jlamypoirier. But it must be self-contained and discoverable.

Already done, see the decorator implementation.

  3. If a config field is missing, we use default_factory. If a dict is provided without a tag (it shouldn't), we either:

    • inject the default tag automatically (preferred), or
    • error clearly with a message explaining the missing tag (acceptable fallback).

That's already what we're doing.

  4. No global subclass registries tied to CLI routing. CLI dispatch should be its own, explicit layer. Config structure should not bend to command-line ergonomics.

It is and always has been the opposite: the CLI adapts to the config system. I updated the CLI system according to what the config system now supports and to your request to preserve the syntax.

Collaborator

Thanks, @jlamypoirier. I want to respond clearly to your points:

"I had a working solution 2 weeks ago..."

The PR is still in draft, tests were failing non-stop until the last commit, and key review feedback -- including on default handling, tagging, and registration -- remains unresolved or blatantly ignored. There was no point at which this was merge-ready. It stalled until I pushed.

"There is nothing to do when we don't want [dynamic dispatch]..."

That's not what "opt-in" means. Right now, every config class implicitly enters a world with base-class registries, tag handling, and dynamic behaviours -- even if it doesn't use them. Every single config class pays this complexity tax. That's passive inclusion, not opt-in.

What I've asked for is explicit control: if a developer doesn't add a mixin or decorator, the config class behaves like a plain (config) dataclass. No registry, no tagging, no indirection. That's what "opt-in" means in a maintainable system. And I had described how this can work eons ago: #104 (comment)

"Already done..." / "That's already what we're doing."

If that were the case, this conversation wouldn't be happening. Are the default factories back? No? Well then.

“The CLI adapts to the config system..."

We shouldn't even be debating this. Why is there even coupling between the CLI commands and the config system? We have 3 or 4 commands right now. Give them one hard-coded root config class each and call it a day.
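The "one hard-coded root config class per command" idea amounts to something like the following sketch. The command names and config classes are made up for illustration and this is not Fast-LLM's actual CLI code:

```python
import argparse
import dataclasses

@dataclasses.dataclass
class TrainConfig:
    steps: int = 100

@dataclasses.dataclass
class ConvertConfig:
    output: str = "out"

# CLI dispatch as its own explicit layer: a static command -> root config mapping,
# with no registry coupling back into the config system.
COMMANDS = {
    "train": TrainConfig,
    "convert": ConvertConfig,
}

def main(argv: list[str]):
    parser = argparse.ArgumentParser(prog="fast-llm")
    parser.add_argument("command", choices=COMMANDS)
    args, rest = parser.parse_known_args(argv)
    config_cls = COMMANDS[args.command]
    # ... parse `rest` into an instance of `config_cls` and run it; elided here.
    return config_cls

print(main(["train"]))
```

Adding a command is then a one-line change to the mapping, and the config classes themselves never learn about the CLI.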

We're deep in this because we didn't push for simplicity early. I'm asking for a course correction now, not to block you, but to unblock everyone else.

Collaborator Author

That's not what "opt-in" means. Right now, every config class implicitly enters a world with base-class registries, tag handling, and dynamic behaviours -- even if it doesn't use them. Every single config class pays this complexity tax. That's passive inclusion, not opt-in.

What I've asked for is explicit control: if a developer doesn't add a mixin or decorator, the config class behaves like a plain (config) dataclass. No registry, no tagging, no indirection. That's what "opt-in" means in a maintainable system. And I had described how this can work eons ago: #104 (comment)

Let's not debate semantics. Nothing visible happens if the developer doesn't do anything; what happens in the background is irrelevant. Unless I'm missing something?

If that were the case, this conversation wouldn't be happening. Are the default factories back? No? Well then.

Default factories were never there for config classes. All I can do at this time is force them to be an exact match with the type hint, which is equivalent to them not being there. Anything more is not an option.

We shouldn't even be debating this. Why is there even coupling between the CLI commands and the config system? We have 3 or 4 commands right now. Give them one, hard-coded root config class each and call it a day.

I don't get what's wrong here. I can revert the changes if you want. But we have a perfect match between cli commands and the new dynamic config system, it would be weird to keep and maintain a separate infrastructure just because we want to.

Collaborator

@tscholak tscholak May 14, 2025

Thanks for the update, @jlamypoirier. I don't want to drag this out further, so we'll move forward for now.

That said, I want to clearly record the unresolved concerns:

Every config class now implicitly participates in dynamic dispatch and registry logic, even when that complexity isn't needed. The cognitive overhead of the config system remains too high, and this PR made it worse. The sheer review burden here shows how far the design diverges from what most developers expect in a config system. We've burned over a week of senior time debating mechanics, and that's time we could have spent on models, training, or features.

We continue to build and maintain a bespoke config layer, which distracts from model development. A better choice would be to lean on mature solutions like Pydantic, which are battle-tested and solve 95% of our use cases with far less friction.

If the complexity tax of the current system continues to stall progress, I will propose a full migration to Hydra + OmegaConf + Pydantic.

For now, I'm prioritizing unblocking the team and moving forward.

attention_factor: None | float = Field(
default=None,
desc="Attention factor for yarn-type scaling.",
Collaborator

is this because the desc is not informative or because something has changed so that descriptions don't work anymore?

Collaborator Author

"Attention factor" just repeats the field name, and "for yarn-type scaling" is now redundant with the type/class name. So the existing descriptions are completely useless.

Collaborator

ok, redundancy, I get it. But rather than dropping the description entirely, should we consider replacing it with something more informative? e.g., when this field matters, what values are valid, what happens if it's None, etc. Otherwise we're losing a chance to document intent.

Collaborator Author

I agree, but it's outside the scope of this PR.

Collaborator

Removing the desc is touching the field, so scope isn’t the issue. If we're already changing it, let's replace the description with something useful rather than delete it and lose intent. Otherwise please revert this hunk and keep the existing description until we can document it properly.

Collaborator Author

These descriptions were already irrelevant, so whether I'm removing them or leaving them as a repetition of the field name doesn't matter. And more importantly, I'm not familiar enough with these classes to make a proper description, feel free to suggest some.

layers: list[TransformerSubLayerName] = Field(
-    default=None,
+    default=(TransformerSubLayerName.query, TransformerSubLayerName.value_),
Collaborator

Suggested change
-    default=(TransformerSubLayerName.query, TransformerSubLayerName.value_),
+    default_factory=lambda: [TransformerSubLayerName.query, TransformerSubLayerName.value_],

Collaborator Author

The tuple is fine; it gets transformed into a list automatically.

Collaborator

I'm OK with the tuple if there's a documented guarantee that the config layer converts it automatically and that states the conversion rules clearly.

Right now, the behaviour isn't obvious at all. The type of this field is annotated as a list but initialized as a tuple. That's a straightforward type mismatch. I wasn't aware that coercion rules existed beyond serialized data, and I doubt anyone else is either. Does it always convert tuple to list? What happens with typing.Sequence or other container types?

Explicit is always better than implicit. Let's document these conversions clearly, or just avoid them entirely by using explicit defaults.
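The coercion under discussion could be documented as something like this hypothetical helper (a sketch of the rule, not the actual Fast-LLM implementation): convert between tuple and list only when the field's type annotation asks for the other container, and leave everything else untouched.

```python
import typing

def coerce(value, hint):
    """Convert tuple <-> list when the annotated container type requires it."""
    origin = typing.get_origin(hint)
    if origin is list and isinstance(value, tuple):
        return list(value)
    if origin is tuple and isinstance(value, list):
        return tuple(value)
    return value  # anything else passes through unchanged

print(coerce(("q", "v"), list[str]))  # tuple default, list annotation -> list
```

Writing the rule down this explicitly would answer the questions above: only tuple/list are interchanged, and other container hints such as typing.Sequence would pass through unchanged under this sketch.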

Collaborator Author

Yes, this will be part of the config documentation.

@jlamypoirier jlamypoirier mentioned this pull request May 14, 2025
@jlamypoirier jlamypoirier changed the base branch from main to misc May 14, 2025 21:50
@jlamypoirier jlamypoirier mentioned this pull request May 14, 2025
@jlamypoirier jlamypoirier changed the base branch from misc to minimalistic_dynamic_classes May 15, 2025 14:45
Base automatically changed from minimalistic_dynamic_classes to main May 21, 2025 21:07
@jlamypoirier jlamypoirier changed the base branch from main to simplify_cli May 27, 2025 14:59

if expected_class is not None:
# Should be handled in `from_dict`, but can fail if instantiating directly.
Assert.is_(self.__class__, expected_class)
Contributor

@oleksost oleksost May 27, 2025

@jlamypoirier shouldn't this be something like Assert.custom(issubclass, expected_class, self.__class__) instead?

I.e., the dynamically inferred class (expected_class) should be a subclass of self.__class__; it should not necessarily match the class exactly?

Collaborator Author

The value is overridden in from_dict. Anyway, this PR isn't too relevant anymore; I broke it into pieces.

@jlamypoirier jlamypoirier changed the title Generalize dynamic config classes [Prototype] Generalize dynamic config classes Jun 4, 2025
Base automatically changed from simplify_cli to main June 12, 2025 22:14

Successfully merging this pull request may close these issues.

[feat] Generalize dynamic config classes
5 participants