-
Notifications
You must be signed in to change notification settings - Fork 96
Isolated start-up and shut-down graphs. #5090
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
e9ee0ee
to
b4e5429
Compare
bcb5b05
to
8d09900
Compare
I'm tentatively putting this one back to 8.1.0.
|
Merged master and deconflicted. |
@dpmatthews - pinging you for a big-picture review (given your involvement in the discussions that led to this). |
acc0ca3
to
5c4532c
Compare
Squashed and rebased. |
5c4532c
to
5617b1c
Compare
(squashed, rebased, and deconflicted) |
Also, as expected, if you use clock triggers in this section:
let's see what happens if I use workflow_state:
(otherwise it would work I think haha).. Perhaps we can just use raw argument value if Are these tasks even recorded in the task table, and with |
Well, stop/restart works while |
Should |
Probably not. The The main cycling section still has an ICP for all the usual purposes. |
commit 629ead573f5a306a23a5153eb02c6cf0523dd436 Merge: 16387d0 7284f25 Author: Hilary James Oliver <hilary.j.oliver@gmail.com> Date: Fri Mar 14 00:26:41 2025 +0000 Merge branch 'master' into initial-final-graph commit 16387d0 Author: Hilary James Oliver <hilary.j.oliver@gmail.com> Date: Fri Feb 14 00:45:31 2025 +0000 Always start new graph from flow 1. commit 2c95d33 Author: Hilary James Oliver <hilary.j.oliver@gmail.com> Date: Thu Feb 13 23:32:20 2025 +0000 Extend a functional test. commit a0b65ba Author: Hilary James Oliver <hilary.j.oliver@gmail.com> Date: Mon Feb 10 09:59:06 2025 +0000 Post-merge tweaks. commit 3bed101 Merge: db03e50 4f11996 Author: Hilary James Oliver <hilary.j.oliver@gmail.com> Date: Fri Feb 7 00:35:13 2025 +0000 Merge branch 'master' into initial-final-graph commit db03e50 Author: Hilary James Oliver <hilary.j.oliver@gmail.com> Date: Mon Sep 4 17:49:47 2023 +1200 Post-rebase tweakage. commit af0bdc1 Author: Hilary James Oliver <hilary.j.oliver@gmail.com> Date: Mon Mar 20 10:32:09 2023 +1300 Squashed commit of the following: commit ad8231a8f22dd3cd8d887774474f97df2c83e429 Merge: acc0ca3 6fc3c58 Author: Hilary James Oliver <hilary.j.oliver@gmail.com> Date: Mon Mar 20 10:27:19 2023 +1300 Merge branch 'master' into initial-final-graph commit acc0ca3 Author: Hilary Oliver <hilary.j.oliver@gmail.com> Date: Wed Oct 26 12:00:00 2022 +1300 Style fix. commit a7170fb Author: Hilary Oliver <hilary.j.oliver@gmail.com> Date: Wed Oct 26 11:53:31 2022 +1300 Update change log. commit 9024065 Merge: 858a8c1 b0ff549 Author: Hilary Oliver <hilary.j.oliver@gmail.com> Date: Wed Oct 26 11:32:26 2022 +1300 Merge branch 'master' into initial-final-graph commit 858a8c1 Author: Hilary James Oliver <hilary.j.oliver@gmail.com> Date: Mon Sep 26 23:27:01 2022 +1300 Fix alpha/omega graph runahead release. commit b581ab2 Author: Hilary James Oliver <hilary.j.oliver@gmail.com> Date: Mon Sep 26 23:26:33 2022 +1300 Add new func tests. commit 8d09900 Author: Hilary James Oliver <hilary.j.oliver@gmail.com> Date: Mon Sep 26 15:53:42 2022 +1300 Revert graph sorting change. commit 75c1763 Author: Hilary James Oliver <hilary.j.oliver@gmail.com> Date: Fri Sep 23 22:44:08 2022 +1200 Adapt integration tests. commit 1b685a0 Author: Hilary James Oliver <hilary.j.oliver@gmail.com> Date: Fri Sep 23 15:55:57 2022 +1200 Tidy up. commit 7a62450 Author: Hilary James Oliver <hilary.j.oliver@gmail.com> Date: Fri Sep 23 13:20:51 2022 +1200 Switch to alpha and omega. commit 129c2e3 Author: Hilary James Oliver <hilary.j.oliver@gmail.com> Date: Fri Sep 23 10:32:54 2022 +1200 Implicit integer ICP if only nocycle graphs. commit a53ac82 Author: Hilary James Oliver <hilary.j.oliver@gmail.com> Date: Thu Sep 22 16:23:16 2022 +1200 startup and shutdown graphs; needs tidying
d82ebc6
to
aa736c0
Compare
(rebased onto current master) |
I initially called the new isolated graphs "alpha" and "omega" to avoid any confusion with the concepts of initial, start, final, stop, cycle points. However, that sounds a bit too quirky and I've switched to "startup graph" and "shutdown graph", which is descriptive, and it shouldn't be confusing if we consistently use "startup" and "shutdown" (not "start" and "stop"). ... updated issue description. |
I don't want to throw a curve-ball in, but that nomenclature made me think.. Would it make sense to change:
to this:
That way people can utilize whatever cycling, xtriggers (i.e. workflow_state), point reference they like in the start/shutdown graphs.. |
Yes, but only if we extend the scope of this PR.
Here's a curveball loosely based on a real example: [scheduling]
[[graph]]
R1 = install => spinup => run => report
[[[install]]]
R1 = install<model>
[[[spinup]]]
cycling mode = integer
initial cycle point = 1
final cycle point = {{ SPINUP_LENGTH }}
R1 = spinup<model>_cold => spinup<model>
P1 = spinup<model> => spinup<model + 1>
R1/$ = spinup<model> => make_bc
[[[run]]]
initial cycle point = {{ START_CYCLE }}
final cycle point = {{ END_CYCLE }}
P1D = foo & bar => run<model> => baz
[[[report]]]
R1 = pub => qux => report Three part graphs (startup, run, teardown) are a tad restrictive. The real word example has a date-time cycling part, but an ungodly number of spinup tasks (currently implemented via parameters) that would better fit integer-cycling logic. However, I don't think it would be possible to implement under the runahead limit model. Solving that one would involve clearing and rebuilding the task pool? |
I quite like David's suggestion, but the problem with it is it suggests that non-R1 (i.e., cycling) is possible in the startup and shutdown graphs. (That is perfectly possible and not much of a change to this PR, but I either decided or was persuaded against it two years ago ... I don't recall which at this point). |
As for @oliver-sanders suggestion, it is also quite easy to change this to an arbitrary number of isolated graph sections, and my original experiments did that, but as I recall @dpmatthews persuaded me out of that. Not sure of the runahead implications at this point, would need to think through that. |
(Note, my example above also supported both integer and datetime cycling) |
For the sake of our operation sync startup and removing our dependency on ICP tasks, we should get this going. The idea I think will satisfy most parties with minimal backend headache is to have:
e.g.
or more back compatible:
(or Internally this would be the easiest to implement (and perhaps build on later), you'd essentially create a couple of structures to check for runahead release in the
(and/or perhaps on the
IMHO this is the most simple back compatible approach that can be built on in the future.. |
Agreed that it's important to flesh out the direction of this feature now to ensure we don't close the door for future development. We don't have to deliver the full vision upfront, just ensure that the syntax and implementation model we pick now is compatible with the longer term vision. I'm not sure I'm following on the "priority" options above? Is that an alternative way of specifying the order in which the graph sections run, or a way of controlling the order if multiple graph sections were permitted to run in parallel? Here are the implementation options I've thought of so far (there may be more). If we could work out which of these we want to aim for, we can settle the syntax, etc to leave room for this feature to grow as desired in the future: Graph sections:
Cycling model:
Interfaces:
|
@dwsutherland - couple of initial responses:
Yes I was also considering that yesterday, as currently on this PR branch the special no-cycling points preclude (obviously) cycling, but they also require hacking any cycle point comparison code (e.g. for runahead limiting).
The runahead limit itself would almost do the job, if we have the same kind of cycle points in each section (I mean for when startup tasks get retriggered mid-run; when run "naturally" the order is absolute)... but perhaps not quite (if the main graph front is not beyond the runahead limit with respect to the startup tasks) but users could just pause the workflow before retriggering startup tasks, to handle that. Integer cycling in the startup graph of a datetime cycilng workflow should probably be supported, unfortunately. We might get away with a short datetime cycle (e.g. minutes) that the jobs just ignore the value of - but that's kinda nasty. f Otherwise we'll need to think through the implications of a mixed datetime/integer task pool, if that's even possible. It might be better to keep the sections separate - if you trigger a startup graph again, the main graph will be suspended until the startup one finishes again. |
I think it's the latter, as one important use case for this is retriggering deployment tasks (in the startup graph) mid-run. So for manual triggering purposes I guess we have to be able to:
The same-task-pool model works fine on this branch, but only because the startup and shutdown graphs are non-cycling, and I have to carefully exclude those special cycle points from comparison operations and the like (which is not great).
Reminded me, that's partly why I chose the special cycle point values "startup" and "shutdown" - it works with our existing UIs. It won't be so easy if we allow cycling there. |
Yes, both really, this was just a way of implementing the section run order, i.e.:
(the under the hood priority would be 1, 2, 3 in this case) I'd support bespoke graph section, and freeform cycling..
So the rest of the features will be mostly backend implementation in follow-on PRs.. The implementation I proposed (using existing pool) will allow us to get underway quickly, because we can then re-architect under the hood are release features piece wise after.
Exactly, it's really a requirement that we're able to rerun these sections and have downstream/lower-priority section essentially freeze while it runs.
Well we could just use the queuing system, so any actively waiting tasks in lower priority sections get queued, even if they are released from runahead, while higher priority sections have active tasks. Also, I like the fact we get to use cycling in the bespoke section with the inherited cycling mode and ICP/FCP.. And we get this for free by using the same pool (but restricting, by validation, the mixing of same named tasks between sections). This might mean we need different runahead pool/internals for each graph section, otherwise a queued task in a lower priority section might runahead limit higher priority sections. Must stress, it's important that the same named tasks can not be permitted to be in multiple graph sections..
Perhaps - with only one graph section:
with multiple:
(IMO - the visuals aren't as high a priority, can stay as is for now (cycle roots)) |
It would be a good idea to gather the use cases the feature is intended to solve. E.g, what is difficult about ICP/FCP dependence under the current arrangement? What patterns of graphs can we identify? Etc. |
Created a new issue for the general discussion - this old PR isn't really the right place for it: #7020 |
Converted to DRAFT - no point in continuing with this PR while we're revisiting the requirements. |
[UPDATE: likely to be superseded by a more general solution; partially addresses the retrospectively-written issue #7020]
Supersede #4912
Supersede #5036
(Really addresses the original #4903 but specifically just for start-up and shut-down graphs)
Implements isolated NON-CYCLING startup and shutdown graphs as special cycle points, to avoid the pain of eternal dependence on initial and final tasks.
[UPDATE from below] I initially called the new isolated graphs "alpha" and "omega" to avoid any confusion with the concepts of initial, start, final, stop, cycle points. However, that sounds a bit too quirky and I've switched to "startup graph" and "shutdown graph", which is descriptive, and it shouldn't be confusing if we consistently use "startup" and "shutdown" (not "start" and "stop").
How it works:
Also:
startup/foo
will coexist in the task pool with3/bar
etc.EXAMPLE
Result:
Check List
CONTRIBUTING.md
and added my name as a Code Contributor.setup.cfg
andconda-environment.yml
.CHANGES.md
entry included if this is a change that can affect users