Skip to content

api: revamp subsampling factors to avoid duplicates #2575

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

mloubout
Copy link
Contributor

@mloubout mloubout commented Apr 7, 2025

No description provided.

@mloubout mloubout added bug-py API api (symbolics, types, ...) labels Apr 7, 2025
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

tests/test_dimension.py:1900

  • [nitpick] Consider renaming 'test_cond_copy' to a more descriptive name (e.g., 'test_operator_parameters_caching') to better convey the intent of the test.
def test_cond_copy(self):

devito/types/dimension.py:826

  • [nitpick] Consider adding a docstring to 'SubsamplingFactor' to clearly document its caching functionality and intended usage.
class SubsamplingFactor(Constant, Cached):

@mloubout mloubout force-pushed the fix-subsampling-rebuild branch from 4edf17b to 07c5547 Compare April 7, 2025 21:26
Copy link

codecov bot commented Apr 7, 2025

Codecov Report

Attention: Patch coverage is 98.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 91.91%. Comparing base (4196aa4) to head (f59a4fe).

Files with missing lines Patch % Lines
devito/types/dimension.py 97.05% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2575   +/-   ##
=======================================
  Coverage   91.91%   91.91%           
=======================================
  Files         245      245           
  Lines       48313    48362   +49     
  Branches     4244     4248    +4     
=======================================
+ Hits        44406    44454   +48     
+ Misses       3232     3231    -1     
- Partials      675      677    +2     
Flag Coverage Δ
pytest-gpu-aomp-amdgpuX 72.54% <93.18%> (+0.01%) ⬆️
pytest-gpu-nvc-nvidiaX 73.61% <93.18%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@EdCaunt EdCaunt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok to me. Where were the duplicated subsampling factors becoming an issue?

@mloubout mloubout force-pushed the fix-subsampling-rebuild branch from 07c5547 to 6828f53 Compare April 17, 2025 14:21
Copy link
Contributor

@FabioLuporini FabioLuporini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes me think here is that it feels like we're relying on caching for correctness, and that shouldn't the case. Caching should just be a thing for performance. We know it's not always been like that in the past, but we've improved the situation over the years. This however looks like a step back

Also, it doesn't really make much sense that we use a Constant that is an Uncached, being a subclass of DataSymbol, and now here we have a subclass of Constant that is Cached instead, as it's like bending the class hierarchy.

I think the true issue here is that we're using a Constant, whose underlying value can change dynamically, as subsampling factor.

So, with all this in mind, rolling back to our starting point, I think a potentially better solution here is to switch from Constant to DataSymbol, as a DataSymbol still allows user overrides while not carrying any data explicitly, and so behaving as expected. Check the following out -- no surprises.

In [1]: from devito.types import Constant

In [2]: a = Constant(name='a')

In [3]: a1 = Constant(name='a1')

In [4]: a == a1
Out[4]: False

...

In [8]: from devito.types.basic import DataSymbol

In [9]: b = DataSymbol(name='b')

In [10]: b1 = DataSymbol(name='b')

In [11]: b == b1
Out[11]: True

In [12]: b.data
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[12], line 1
----> 1 b.data

AttributeError: 'DataSymbol' object has no attribute 'data'

@FabioLuporini
Copy link
Contributor

FabioLuporini commented Apr 28, 2025

for maximum neatness, we could still have a class SubsamplingFactor(DataSymbol): pass

user-provided Constants as factors should still be supported for backwards compatibility obviously

@mloubout
Copy link
Contributor Author

as a DataSymbol still allows user overrides while not carrying any data explicitly

Well, that's the issue, it needs to carry some, forcing the user to provide the value at apply when they already provided it at the dimension creation is not good. The API is "it allows overrides" not "you have to provide it at apply"

@mloubout
Copy link
Contributor Author

Also you need b is b1 to be True not b == b1 for the symbol to be treated as unique and since DataSymbol is Uncached it won't work or will require the same massaging to make it cached.

@mloubout mloubout force-pushed the fix-subsampling-rebuild branch 4 times, most recently from 05ce88e to 3ce097c Compare April 28, 2025 18:08
@mloubout mloubout changed the title api: cache subsampling factors to avoid duplicates api: revamp subsampling factors to avoid duplicates Apr 29, 2025
else:
raise ValueError("factor must be an integer or integer Constant")

if self._factor is not None:
# Always make the factor symbolic to allow overrides with different factor.
self._symbolic_factor = symbolic_factor or \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this need to be an instance attribute? just like for example symbolic_size or symbolic_max are cached_property, can (should) this not be the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need it there so that it uses the name of the input Constant when one is used for backward compatibility. Can't do that in a cached_property.

# Parent dimension define the interval
fact = self._factor.data if self._factor is not None else 1
def _arg_values(self, interval, grid=None, args=None, **kwargs):
if self.symbolic_factor is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assuming it's possible to turn symbolic_factor into a cached_property, here we would need if self.factor. is not None


def __init_finalize__(self, name, parent=None, factor=None, condition=None,
indirect=False, **kwargs):
indirect=False, symbolic_factor=None, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring needs updating since symbolic_factor now included in kwargs. How is symbolic_factor different to factor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

factor is just the number it was created with, symbolic_factor is the symbol in the code but it's a plain Scalar that doesn't carry data and can be passed at apply.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Does it _rebuild correctly? I seem to remember that there is still an issue where rebuilding a ConditionalDimension with no factor introduces a factor of one unless that got patched when I wasn't looking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rebuild issue been patch a while back yes

@mloubout mloubout force-pushed the fix-subsampling-rebuild branch from 3ce097c to 7b48777 Compare May 1, 2025 12:17
@@ -305,7 +305,7 @@ def spacing_map(self):
# Special case subsampling: `Grid.dimensions` -> (xb, yb, zb)`
# where `xb, yb, zb` are ConditionalDimensions whose parents
# are SpaceDimensions
mapper[d.root.spacing] = s/self.dtype(d.factor.data)
mapper[d.root.spacing] = s/self.dtype(d.factor)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my last worry, after looking at this line, is that this set of changes might somehow break user code, because now .factor is a different thing, while the user may have expected a Constant if a Constant was supplied

@mloubout mloubout force-pushed the fix-subsampling-rebuild branch from 7b48777 to 5c268dc Compare May 2, 2025 15:36
@mloubout mloubout force-pushed the fix-subsampling-rebuild branch from 5c268dc to f59a4fe Compare May 2, 2025 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API api (symbolics, types, ...) bug-py
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants