-
Notifications
You must be signed in to change notification settings - Fork 339
[AutoTuner] Restore resume check #3070
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
99ea816
to
8e18ff7
Compare
tools/AutoTuner/test/resume_check.py
Outdated
If no folders are found, returns a default value of 9e99. | ||
""" | ||
if iteration < 0 or iteration >= self.iterations: | ||
raise ValueError("Iteration must be between 0 and iterations - 1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise ValueError("Iteration must be between 0 and iterations - 1") | |
raise ValueError("Iteration must be between 0 and (iterations - 1)") |
tools/AutoTuner/test/resume_check.py
Outdated
folders = glob.glob(os.path.join(experiment_dir, f"variant-*-or-{iteration}")) | ||
return max((os.path.getmtime(folder) for folder in folders), default=9e99) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little obfuscated:
- The function's name does not match the expected return value type/format. A check should return True/False.
- Returning 9e99 is not clear on what is going on.
- Creating a folder does not confirm the run status. For this test to be true to its purpose, we need to guarantee that it has not finished, not just started.
We should consider using a get_experiment_status
function to check if all iterations have finished running; this function would return at least "RUNNING" and "FINISHED", other states might be helpful but are not required now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The function's name does not match the expected return value type/format. A check should return True/False.
Can possibly change it to
get_trial_times
- Returning 9e99 is not clear on what is going on.
It is just a dummy value, to compare for latest modified time in while loop line 121-128
# Check if first config is complete
while True:
cur_modified_time = self.check_trial_times()
print(f"Current modified time: {cur_modified_time}")
print(f"Latest modified time: {latest_modified_time}")
if abs(cur_modified_time - latest_modified_time) < 1e-3:
break
latest_modified_time = cur_modified_time
time.sleep(10)
- Creating a folder does not confirm the run status. For this test to be true to its purpose, we need to guarantee that it has not finished, not just started.
This function returns the latest modified time of a given iteration (folder names are matched using
iteration
glob) - so if a run is completed the folder should no longer be modified.
4, We should consider using a get_experiment_status function to check if all iterations have finished running; this function would return at least "RUNNING" and "FINISHED", other states might be helpful but are not required now.
get_experiment_status
is helpful in general, but might not be too useful in resume_check because we need to check iteration completion, as opposed to experiment completion (or all_iterations completion)
Signed-off-by: Jack Luar <jluar@precisioninno.com>
…ror handling Signed-off-by: Jack Luar <jluar@precisioninno.com>
Signed-off-by: Jack Luar <jluar@precisioninno.com>
Signed-off-by: Jack Luar <jluar@precisioninno.com>
8e18ff7
to
cf5ed3d
Compare
Signed-off-by: Jack Luar <jluar@precisioninno.com>
Signed-off-by: Jack Luar <jluar@precisioninno.com>
This pull request includes several changes to the
AutoTuner
tests to re-enable previously disabled tests, improve the test setup, and add new utility methods.Fixes #3005
Re-enabling tests and improving setup:
flow/test/test_autotuner.sh
: Re-enabled thetest_tune_resume
test in theAutoTuner
script by uncommenting the test execution line.Codebase simplification and improvements:
tools/AutoTuner/test/resume_check.py
: Removed unnecessary directory changes and imported theglob
module to facilitate file pattern matching. [1] [2]tools/AutoTuner/test/resume_check.py
: Introduced thecheck_trial_times
method to check the modification times of trial iterations, improving the robustness of thetest_tune_resume
test.