Add register_fields method #5219

brittonsmith · 2025-07-18T14:43:05Z

PR Summary

This allows users to define or override properties of on-disk fields, namely units, aliases, and display_name. This is particularly useful if a dataset contains extra fields not defined in a frontend for which the user would like to define units. The syntax is:

ds = yt.load(...)
ds.register_field(("enzo", "DinosaurDensity"), units="code_mass/code_length**3")

Happy to add docs and tests if this looks broadly acceptable.

PR Checklist

New features are documented, with docstrings and narrative docs
Adds a test for any bugs fixed. Adds tests for new features.

…erties.

matthewturk

Like it!

neutrinoceros · 2025-07-18T14:54:12Z

would it be possible to make the api ds.fields.register instead ? I'm not sure what type ds.fields is, but that's where I'd look first as a user, and Dataset is already very busy as far as attributes and methods go.

matthewturk · 2025-07-18T14:55:28Z

I think that might require that the index be created first? Which would get in the way. I do like it more, though.

brittonsmith · 2025-07-18T14:57:18Z

I think that might require that the index be created first? Which would get in the way. I do like it more, though.

Yes, this is correct. You need to call this before the field_list is populated.

neutrinoceros · 2025-07-18T15:05:13Z

so... maybe it should be a keyword argument to yt.load then ?

matthewturk · 2025-07-18T15:06:51Z

I think both would be good, but I think for clarity ds.register_fields is probably a bit cleaner than adding more to load.

brittonsmith · 2025-07-18T16:06:47Z

Unless I am mistaken, adding a keyword argument to load would be fairly invasive as I would have to make sure it is threaded through every frontend.

neutrinoceros · 2025-07-18T20:16:39Z

not necessarily ! It is true that adding a keyword argument to yt.load would prevent the introduction of a similarly-named keyword argument in any frontend, and we should be careful about not shadowing any existing one, but as long as we choose careffuly there is a design space there that we can exploit.

neutrinoceros · 2025-07-18T20:18:40Z

ds.register_fields is probably a bit cleaner

for completeness; I feel the opposite way: an API that can only be called between yt.load and the index being filled seems very brittle to me, and very likely to confuse users (especially in an interactive context, which we constantly encourage users towards)

brittonsmith · 2025-07-30T14:39:43Z

@neutrinoceros, you might have to lay this out for me because I'm not sure I see how a keyword arg can be added to yt.load in a simple way. If I understand this function correctly, load is choosing the appropriate (i.e., frontend-specific) subclass of Dataset, instantiating an instance of that, and returning it. If I look at, for example, EnzoDataset, it does not accept **kwargs, so how would I pass say a registered_fields keyword without adding it there, the baseclass, and every other frontend. The only other thing I can see to do is to insert functionality into load itself, which feels even less elegant.

I take your point about this being a somewhat brittle operation, but there is error checking in place to prevent this from being called incorrectly as well as documentation, so to me it is acceptable, and with the above point, the simpler solution.

neutrinoceros · 2025-07-30T14:47:55Z

Yes, I'm specifically suggesting that the logic live in yt.load (see for instance how the hint keyword argument is implemented). If the logic takes too much space, it can be written as a private helper function.

What matters most in my opinion is that we avoid taking shortcuts in designing the interface, because that part will need to be maintained forever. In contrast,
elegance of implementation details seems irrelevant to me, as long as it can be refactored later, and I don't see it any reason why it couldn't be.

brittonsmith · 2025-07-31T12:45:26Z

Ok, I understand now, thank you. To be honest, I'm not in favor of this solution. The hint keyword has an important place here as it aids in the process of figuring out the right frontend, i.e., one of the main functions of load. On the other hand, adding field registering here prevents the user from instantiating a frontend-specific dataset subclass directly. From an organization of duties perspective, I don't think this is where it belongs. From an implementation perspective, I think this also makes it more difficult to thread the logic into the field system. I'm not in favor of, for example, creating another global configuration object to query. We may need another perspective here.

neutrinoceros · 2025-07-31T12:48:30Z

adding field registering here prevents the user from instantiating a frontend-specific dataset subclass directly.

that's precisely what the hint argument was added to avoid. Are there other reasons to work around yt.load this way that I'm unaware of ?

matthewturk · 2025-07-31T13:37:26Z

I've been mulling this over, and I think I'm increasingly opposed (GASP, HORROR!) to adding more things to load. I don't think it's a bad place to do it, but I think we should take the approach that adding new things there that aren't meant for a single frontend can be challenging.

I genuinely think that adding it to the Dataset object may be the simplest way, and particularly because it mirrors the way we manage other fields. I also think that it might be possible to do this in a way that -- like adding gradient fields -- we override things appropriate. That should probably be a follow-up system.

In short, what I'm coming to think is that we are in a time when we're making some compromises about the existing field system that will stay within where it is (such as this) and those that seek to move it forward (such as #5226 ). I don't think this is such an invasive change that we should forestall it to a day when the field system is improved -- a day which I never thought before would come, but now which seems much more likely. So I think we should do this like it is, even if it's not where we are in agreement about an ideal.

neutrinoceros · 2025-07-31T14:43:31Z

To be clear, I'm never going to block any of this:

I'm not actually holding any stakes
if my argument isn't compelling already I doubt I'll be able to convince anyone

However, for what it's worth, I would like to explain where my reluctance is coming from:
I've been frustrated for years by how much responsibilities and interfaces are bound to the Dataset class. I think its current design can only be justified by historical reasons, which is another way of saying that it grew organically, without a clear definition of its scope. The heavy use of inheritance makes it even harder to maintain, so I would rather we avoid cramming even more stuff into this already overcrowded space (as a general guideline). I understand it's the easy way to do things now, and in some cases the only one, but I genuinely believe that's it's a negative feedback loop, so naturally I tend to struggle against it. Anyway, it is also possible that my strong feelings in that area are clouding my judgement, and as I said, I don't consider myself a mn actual stakeholder, so do as you see fit. I just needed this out my chest.

neutrinoceros · 2025-07-31T14:46:15Z

(In case it needs be said: this is in no way personal, and I genuinely respect everyone in this conversation. I'm sorry if it comes across any differently.)

brittonsmith · 2025-08-01T12:14:59Z

@neutrinoceros, thank you, I've not taken this discussion personally. I appreciate fleshing out the pros and cons of my proposed solution and your alternative. I think I understand your points and where you are coming from in terms of the bloat to the Dataset class, but I'm still of the opinion that putting this in Dataset is the lesser evil for where the code is now. It seems like at least @matthewturk is on board. Can we move forward with this then? I don't think it sinks us significantly deeper into technical debt against future refactoring.

matthewturk · 2025-08-04T19:32:23Z

I am inclined to thumbs up on this, but need another day to think.

neutrinoceros · 2025-08-05T07:26:14Z

Overall I'm -0.1 on this: I'm leaning against it, but I'm not invested enough that my vote should be blocking.

Add register_fields method to register or override on-disk field prop…

cb1d649

…erties.

brittonsmith added the enhancement Making something better label Jul 18, 2025

matthewturk previously approved these changes Jul 18, 2025

View reviewed changes

Fix typo.

4e1b705

brittonsmith dismissed matthewturk’s stale review via 4e1b705 July 18, 2025 14:51

brittonsmith added 3 commits July 18, 2025 16:22

Add error message if register_field cannot be called.

e561e7b

Add docs.

2a9a98e

Add a test.

3cb6d64

chrishavlin added the workshop-2025 label Jul 21, 2025

Add register_fields method #5219

Are you sure you want to change the base?

Add register_fields method #5219

Conversation

brittonsmith commented Jul 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Summary

PR Checklist

Uh oh!

matthewturk left a comment

Choose a reason for hiding this comment

Uh oh!

neutrinoceros commented Jul 18, 2025

Uh oh!

matthewturk commented Jul 18, 2025

Uh oh!

brittonsmith commented Jul 18, 2025

Uh oh!

neutrinoceros commented Jul 18, 2025

Uh oh!

matthewturk commented Jul 18, 2025

Uh oh!

brittonsmith commented Jul 18, 2025

Uh oh!

neutrinoceros commented Jul 18, 2025

Uh oh!

neutrinoceros commented Jul 18, 2025

Uh oh!

brittonsmith commented Jul 30, 2025

Uh oh!

neutrinoceros commented Jul 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

brittonsmith commented Jul 31, 2025

Uh oh!

neutrinoceros commented Jul 31, 2025

Uh oh!

matthewturk commented Jul 31, 2025

Uh oh!

neutrinoceros commented Jul 31, 2025

Uh oh!

neutrinoceros commented Jul 31, 2025

Uh oh!

brittonsmith commented Aug 1, 2025

Uh oh!

matthewturk commented Aug 4, 2025

Uh oh!

neutrinoceros commented Aug 5, 2025

Uh oh!

Uh oh!

brittonsmith commented Jul 18, 2025 •

edited

Loading

neutrinoceros commented Jul 30, 2025 •

edited

Loading