-
Notifications
You must be signed in to change notification settings - Fork 18
[Ref Mode] PyTorch reference mode (eager only) #339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
stack-info: PR: #339, branch: yf225/stack/34
helion/ref/hl_patch.py
Outdated
) | ||
|
||
# Step 3: Handle block_size (in ref mode, full dim size is always used as block_size) | ||
block_size_list = [None] * len(end_list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always use full dim size in ref modes regardless of block_size
value
test/test_ref_compile.py
Outdated
class TestExamplesRefCompile(test_examples.TestExamples): | ||
"""Run all TestExamples tests in reference torch.compile mode via HELION_REF_COMPILE=1.""" | ||
|
||
# NOTE: All tests in TestExamples are run in ref torch.compile(fullgraph=True) mode by default in this test file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently all examples in TestExamples
pass with ref eager mode and ref compile mode.
Planning to add more ref mode unit tests to cover test_reduce.py
, test_associative_scan.py
etc. in the next PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a step back, it seems like most test files are using RefEagerTestDisabled
and even for the tests that are enabled we are pretty aggressive disabling stuff like assertTrue and assertEqual.
How complete is this implementation?
self.assertGreater( # type: ignore[attr-defined] | ||
total_assertions, | ||
0, | ||
f"Test {self._testMethodName} did not call torch.testing.assert_close, assertRaises, or skipTest", # type: ignore[attr-defined] # pyright: ignore[reportAttributeAccessIssue] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even for the tests that are enabled we are pretty aggressive disabling stuff like assertTrue and assertEqual
@jansel we have a requirement here that all enabled tests must either call torch.testing.assert_close
or assertRaises
or self.skipTest
, so I believe even though assertTrue and assertEqual etc. are disabled, we are still checking correctness via those enabled assertions
Updated to not disable assertTrue and assertEqual, and added assertEqualCode / assertNotEqualCode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like most test files are using
RefEagerTestDisabled
yes I'll work on polishing the next PR (#404) that enables all the test files. (At one time I had the combined PR passing all tests, but it's 3000+ lines so I split into two PRs 😆)
self, first: str, second: str, msg: str | None = None | ||
) -> None: | ||
if not self._in_ref_eager_mode: | ||
super().assertNotEqual(first, second, msg) # type: ignore[misc] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New state: we only disable journal / in / not-in / code comparison in ref mode
path: | | ||
~/.cache/uv | ||
~/.venv | ||
key: ${{ runner.os }}-deps-${{ matrix.python-version }}-${{ hashFiles('.github/workflows/test.yml', 'requirements.txt') }}-${{ steps.date.outputs.month }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yf225 i think you broke the cache key with this. Why did you need to use templates? we could just do this with strategy matrix instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can clean it up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stamped, sorry for breakage and thanks so much for catching this!
Stacked PRs:
[Ref Mode] PyTorch reference mode (eager only)
This PR focuses on enabling ref mode for APIs like hl.grid / hl.tile /
hl.dot. A follow-up PR will enable ref mode for reduction / scan / indexing APIs.
Part of #77.
Please see inline code comments on the PR.