brittonsmith
Member

The field info container is notably brittle in that either a dataset's index or field_list must be accessed/created before it can come into existence. This PR makes Dataset.field_info a property, allowing it to be created on first access. Additionally, we now poke the index at the start of create_field_info so that it can be run before the field_list is created. I argue this should have been considered a bug: the method name has no leading underscore, which implies the user was free to call it.
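
For readers unfamiliar with the pattern, here is a minimal sketch of a lazily created property. It is a toy stand-in rather than yt's actual code: the `Dataset` class body, the `create_calls` counter, and the dictionary contents are assumptions made for illustration.

```python
class Dataset:
    """Toy stand-in for a dataset class; not yt's actual implementation."""

    def __init__(self):
        self._field_info = None  # backing attribute, filled lazily
        self.create_calls = 0    # hypothetical counter, for illustration only

    def create_field_info(self):
        # Placeholder for the real (potentially expensive) setup work.
        self.create_calls += 1
        self._field_info = {("gas", "density"): "g/cm**3"}

    @property
    def field_info(self):
        # Build the container the first time it is requested.
        if self._field_info is None:
            self.create_field_info()
        return self._field_info


ds = Dataset()
first = ds.field_info   # triggers creation
second = ds.field_info  # reuses the cached object
```

The caller never has to know whether the container exists yet; the second access returns the same object without repeating the setup.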

@brittonsmith brittonsmith added bug enhancement Making something better labels Aug 4, 2025
self.create_field_info()
return self._field_info

@field_info.setter
Member

Is this necessary?
Immutability has a lot of value when dealing with parallelism, so I would advise against introducing mutability where it's not needed.

Member Author

@brittonsmith brittonsmith Aug 4, 2025

We are already doing this in far more critical places, e.g., the index. In fact, I would argue that we want some measure of this here.

Member

But what for? Unless there's a test exercising it, I don't understand the point.

Member Author

Sorry, I replied without really thinking this through. We don't do this with index, but we probably do want to allow this to be settable.

Member Author

Specifically, without this, we can't do what's in the tests in PR #5211.

Member

I think I'm even more confused now. I don't see the connection with #5211.

Member Author

I'm referring to how the tests in that PR manually alter entries in field_info. You wouldn't be able to do that without having the setter.

Member Author

I've come around on not having a setter for field_info. When it wasn't a property we allowed it to be set by anyone, but maybe it's better to be safe since we can.

Member Author

Just to tie this off, the setter is necessary if we want to avoid modifying all the places in the frontends that set up the field_info on their own, which I believe we do. I also think we should not worry about immutability, since we didn't have that before anyway. There's no reason to impose this restriction now. This change doesn't modify existing behavior and only allows us to refer to ds.field_info without having to think about whether it has been created yet. I don't see a reason not to do this.

@neutrinoceros neutrinoceros added api-consistency naming conventions, code deduplication, informative error messages, code smells... bug and removed bug labels Aug 4, 2025
@neutrinoceros
Member

Whoops, I removed the bug label before I read your argument for it. Sounds reasonable to me now.

def create_field_info(self):
    self.index
Member Author

I'm going to admit that this isn't a great thing to do, since instantiating the index itself calls create_field_info. What I think would be better is renaming this to _create_field_info to remove the expectation that anyone ever call it. I would then change line 661 above to simply self.index.

Member Author

I've refactored this so that we only ever do one pass through create_field_info.
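
One way the arrangement discussed in this thread could look is sketched below. It is not necessarily what the refactor did: the class body, the `passes` counter, and the placeholder index object are assumptions for illustration. The idea is that the property only touches the index, and index creation is the single place that runs the (now private) field setup.

```python
class Dataset:
    """Toy stand-in; a sketch of the proposal, not yt's actual code."""

    def __init__(self):
        self._index = None
        self._field_info = None
        self.passes = 0  # hypothetical counter of field-setup passes

    @property
    def index(self):
        if self._index is None:
            self._index = object()     # placeholder for building the index
            self._create_field_info()  # index creation triggers field setup
        return self._index

    @property
    def field_info(self):
        if self._field_info is None:
            self.index  # poke the index; this fills _field_info as a side effect
        return self._field_info

    def _create_field_info(self):
        # Leading underscore: callers are not expected to invoke this directly.
        self.passes += 1
        self._field_info = {"fields": ["density"]}


ds_a = Dataset()
ds_a.field_info  # field_info first, then index
ds_a.index

ds_b = Dataset()
ds_b.index       # index first, then field_info
ds_b.field_info
```

In both access orders the setup runs exactly once, which is the property the refactor is described as guaranteeing.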

@matthewturk
Member

So I looked into the field_info setter issue, and what it looks like is that the issue arises from where we actually do need to set field_info as an attribute. Basically, there are places scattered throughout the code in different Dataset subclasses (as well as the base class) where we actually create the field_info object. I see two ways around this:

  1. Define a setter that manages this
  2. Go through and change every single one of these to actually set the backing attribute (_field_info), rather than the property itself.

I prefer the first, though -- it touches fewer things and also operates exactly how a property is supposed to operate. I think we should have the setter.
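
To make the trade-off concrete, here is a small sketch of the two options. All three classes are hypothetical stand-ins (`FrontendDataset` and `NoSetterDataset` do not exist in yt); they only illustrate why existing assignments to the public name keep working when a setter is defined.

```python
class Dataset:
    """Toy base class; a stand-in, not yt's real Dataset."""

    def __init__(self):
        self._field_info = None

    @property
    def field_info(self):
        if self._field_info is None:
            self.create_field_info()
        return self._field_info

    @field_info.setter
    def field_info(self, value):
        # Option 1: the setter forwards to the backing attribute.
        self._field_info = value

    def create_field_info(self):
        self.field_info = {"source": "base"}


class FrontendDataset(Dataset):
    """Hypothetical frontend that builds field_info on its own."""

    def create_field_info(self):
        # Unchanged assignment to the public name; works because of the setter.
        self.field_info = {"source": "frontend"}


class NoSetterDataset:
    """Same getter but no setter, to show what option 2 would have to fix."""

    def __init__(self):
        self._field_info = None

    @property
    def field_info(self):
        return self._field_info


frontend_source = FrontendDataset().field_info["source"]

try:
    NoSetterDataset().field_info = {}
    assignment_failed = False
except AttributeError:
    # Without a setter, every such assignment would need to target _field_info.
    assignment_failed = True
```

With the setter in place the subclass code is untouched; without it, each assignment site would have to be rewritten to use the backing attribute.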

Member

@cphyc cphyc left a comment

I think that's overall a positive change, but I would just suggest we roll back 458ab76, since this is no longer required.

One issue that this PR raises is the fact that we rely heavily on side effects in the code. This seems to be mostly a consequence of having a lazy approach, so I wonder whether we could (should?) refactor to remove these side effects as much as possible, or take a step back and think about the possible states the dataset and index objects can have (see #5252).
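
As a purely illustrative sketch of what "being explicit about states" might mean, the snippet below tracks a dataset's lifecycle with an enum instead of relying on which attributes happen to be populated. The state names, the `Dataset` class, and its methods are all assumptions; nothing here reflects the design discussed in #5252.

```python
from enum import Enum, auto


class DatasetState(Enum):
    """Hypothetical lifecycle states, named for illustration only."""

    CREATED = auto()
    INDEXED = auto()
    FIELDS_READY = auto()


class Dataset:
    """Toy stand-in that records its state explicitly."""

    def __init__(self):
        self.state = DatasetState.CREATED

    def build_index(self):
        # Only advance from CREATED; later states already have an index.
        if self.state is DatasetState.CREATED:
            self.state = DatasetState.INDEXED

    def build_fields(self):
        # Field setup requires an index, so request it explicitly first.
        self.build_index()
        if self.state is DatasetState.INDEXED:
            self.state = DatasetState.FIELDS_READY


ds_states = Dataset()
ds_states.build_fields()
ds_states.build_fields()  # idempotent: the state does not change again
```

The transitions are spelled out in one place, so a reader can see which steps depend on which rather than inferring it from property access order.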

@cphyc cphyc merged commit 82ac140 into yt-project:main Aug 29, 2025
19 of 22 checks passed
@brittonsmith brittonsmith deleted the finfo branch August 29, 2025 15:50
@chrishavlin
Contributor

what milestone ya'll want this in? it's got a bug label so i'm assuming we'd want this to ship with 4.4.2?

@brittonsmith
Member Author

I'll have a look through the tests and see if there are other places where we do that. This is a very good point you raise about side-effects and that dataset objects can be in different states. It's something we should probably try to be more explicit about.

@brittonsmith
Member Author

what milestone ya'll want this in? it's got a bug label so i'm assuming we'd want this to ship with 4.4.2?

Yes, I think this was technically a bug, although one we happily lived with for quite some time. I think we can go with 4.4.2. Thanks @chrishavlin!

@cphyc
Member

cphyc commented Aug 29, 2025

I'll have a look through the tests and see if there are other places where we do that. This is a very good point you raise about side-effects and that dataset objects can be in different states. It's something we should probably try to be more explicit about.

Sorry, I triggered the merge button by mistake while playing in the CLI... However, we can just open an issue to keep a log that we need to remove these unnecessary calls to ds.index in the tests (and maybe elsewhere?).

@brittonsmith
Member Author

@cphyc, no problem, happy to do a separate PR for that.

@chrishavlin
Contributor

@meeseeksdev backport to yt-4.4.x

lumberbot-app bot commented Aug 29, 2025

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict; please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it:
       git checkout yt-4.4.x
       git pull
  2. Cherry-pick the first parent branch of this PR on top of the older branch:
       git cherry-pick -x -m1 82ac1403a99046061cc978fcff38a5903886e4be
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
       git commit -am 'Backport PR #5250: Convert field_info into a property so it can be created on first access.'
  4. Push to a named branch:
       git push YOURFORK yt-4.4.x:auto-backport-of-pr-5250-on-yt-4.4.x
  5. Create a PR against branch yt-4.4.x. I would have named this PR:

"Backport PR #5250 on branch yt-4.4.x (Convert field_info into a property so it can be created on first access.)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

chrishavlin added a commit to chrishavlin/yt that referenced this pull request Aug 29, 2025
@neutrinoceros
Member

@chrishavlin did you intend to open a manual backport PR? I don't see one linked yet; however, there seems to be a reference to it from your fork.

@chrishavlin
Contributor

oh! ya, i messed up the cherry picking somehow though (that branch is showing changes to tests/pytest_mpl_baseline ???), I need to re-do it.

@chrishavlin
Contributor

(I'll have another go in the next ~10 mins)

@neutrinoceros
Member

that branch is showing changes to tests/pytest_mpl_baseline ???

This is a common (and easy to make) mistake. Make sure you run git submodule update --init in between branch checkouts, and prefer partial commits (git commit -p ...) over all-in commits (git commit -a ...) to add another level of manual checking :)

chrishavlin pushed a commit to chrishavlin/yt that referenced this pull request Sep 2, 2025
…y so it can be created on first access.

(cherry picked from commit 82ac140)
chrishavlin pushed a commit to chrishavlin/yt that referenced this pull request Sep 2, 2025
… can be created on first access.

(cherry picked from commit 82ac140)
Labels
api-consistency naming conventions, code deduplication, informative error messages, code smells... bug enhancement Making something better
5 participants