@mabruzzo mabruzzo commented Mar 12, 2023

This fix is very important if you want to schedule the following actions at given simulation times:

  • dump outputs using the new load-balanced approach (i.e. using the "output" method)
  • write checkpoint files at arbitrary times using the new load-balanced approach (i.e. using the "output" method)
  • execute any other arbitrary method object

Currently, we can only reliably schedule the above to occur at particular cycle numbers.

Prior to this PR, 2 problems prevented this from working.

  1. The calculated dt was not being restricted to coincide with the simulation times at which methods are scheduled to be executed. This issue has been fixed by this PR.
  2. The next() method is never called on the Schedule instances associated with any of the Method objects. This creates issues for schedules based on "simulation time" or "wall-clock time" that are implemented with ScheduleInterval (the existing implementation needs to know how many times next() has been called in order to determine the next simulation time or wall-clock time at which a method should be executed). This issue has not been addressed by this PR.
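
To make Problem-1 concrete, here is a minimal sketch of the kind of restriction involved. It is an illustration only, not the code in this PR: `restrict_dt` and its `upcoming` argument are hypothetical names, and it assumes each method's next scheduled simulation time is already known.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not the actual code in this PR): shrink dt so that
// time + dt does not step past the earliest upcoming scheduled time.
// `upcoming` holds the next scheduled simulation time of each method.
double restrict_dt(double time, double dt, const std::vector<double>& upcoming) {
  for (double t_next : upcoming) {
    // Times at or before the current time impose no constraint.
    if (t_next > time) {
      dt = std::min(dt, t_next - time);
    }
  }
  return dt;
}
```

For example, with `time = 1.0`, a proposed `dt = 0.5`, and a method scheduled at `1.25`, the sketch returns `0.25`, so the step ends on the scheduled time instead of jumping over it.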

In lieu of fixing Problem-2, this PR adopts a short-term fix that modifies ScheduleInterval so that it ignores the number of times next() has been called when scheduling is based on simulation time (this is very similar to what it does when scheduling is based on simulation cycle). However, scheduling based on wall-clock time remains broken; it will require a more comprehensive solution to Problem-2. While this short-term fix is fairly crude, it's better than nothing: without it, the new output and checkpoint machinery effectively can't be used in many real-world simulations.
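
As a rough illustration of what "ignoring the number of next() calls" can mean, the sketch below derives the next scheduled time from the current simulation time alone. `IntervalSketch` and `time_next` are hypothetical names, not the real ScheduleInterval interface. The sketch assumes `step > 0` and ignores floating-point round-off, which a real implementation would need to handle.

```cpp
#include <cmath>
#include <limits>

// Hypothetical, simplified stand-in for an interval-based schedule.
// Scheduled times are start, start + step, start + 2*step, ... up to stop.
struct IntervalSketch {
  double start, step, stop;

  // Next scheduled time strictly after `time`, computed from `time` alone
  // (no internal counter). Returns +infinity once the schedule is exhausted.
  double time_next(double time) const {
    const double inf = std::numeric_limits<double>::infinity();
    if (time < start) return (start <= stop) ? start : inf;
    // Whole steps completed so far, plus one, gives the next event index.
    const double k = std::floor((time - start) / step) + 1.0;
    const double t = start + k * step;
    return (t <= stop) ? t : inf;
  }
};
```

Because the result depends only on its argument, it does not matter how many times, or on which processing element, the function is called during a cycle. The same approach does not carry over directly to wall-clock time, which is not a deterministic function of the simulation state.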

Problem-2 requires a somewhat more involved solution.

  • The way that the next() method is currently implemented requires that it be called once per Processing Element at the end of the compute phase for each method's schedule object (for the methods that were scheduled to be executed during that cycle). To accomplish this, we would probably need to add a callback to a new method of the Problem object to be executed at the end of the compute phase (after each block has executed each relevant method object).
  • Alternatively, we can modify the next() machinery so that it can be called multiple times per cycle (to do this, we would probably need to record the current cycle, current simulation time, and wall-clock time at the start of the cycle)...
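
The second option could look something like the sketch below, where repeated calls within one cycle are treated as a single advance. This is only a sketch under stated assumptions: `CountingScheduleSketch`, its members, and the `cycle` argument to `next` are all hypothetical, and the real Schedule classes track more state than a single counter.

```cpp
// Hypothetical sketch: a schedule whose next() is safe to call several
// times per cycle, because it remembers the cycle it last advanced on.
struct CountingScheduleSketch {
  long last_cycle_advanced = -1;  // cycle of the most recent advance
  long count = 0;                 // number of distinct cycles advanced

  void next(long cycle) {
    // Calls after the first one in a given cycle are no-ops.
    if (cycle == last_cycle_advanced) return;
    last_cycle_advanced = cycle;
    ++count;
  }
};
```

With this guard, calling `next(5)` once per block on a processing element advances the counter only once for cycle 5. A wall-clock-based schedule would additionally need a single time sample per cycle that every caller agrees on.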

I'm leaving this to be addressed in the future...

@mabruzzo mabruzzo force-pushed the method-schedule branch 2 times, most recently from 3d1ac2a to 8e716c9 Compare March 21, 2023 14:31
…counts for Method::schedule()

In the longer term, additional changes will be required to get ScheduleInterval working when using wall-clock time for outputs. Complications arise when trying to call Schedule::next for the schedules configured for Method objects. In the future, we will need to either:
1. refactor control flow so that Schedule::next only gets called once per PE per cycle
2. refactor internals of Schedule so that it's okay to call Schedule::next multiple times on a given PE during a single cycle.
@zeus753mln zeus753mln requested a review from jobordner July 27, 2023 21:45
@jwise77 jwise77 requested review from WillHicks96 and removed request for jobordner February 23, 2024 17:29
@jwise77 jwise77 self-requested a review December 12, 2024 17:13
@jwise77 jwise77 removed the request for review from WillHicks96 March 26, 2025 09:49