@@ -24,7 +26,8 @@ As of now, the image is of a UBI 8 system, with Spack-installed compilers and al
#### Triggering the Testing Workflow

-
This autotesting workflow is triggered by opening a pull request to `main` and also by a handful of actions on such a PR that is already open, including:
+
This autotesting workflow is triggered by opening a pull request to `main` and
+
also by a handful of actions on such a PR that is already open, including:
-
The AT2 configuration on `blake` currently attempts to keep 3 runners available
-
to accept jobs at all times.
+
The AT2 configuration on `blake` and `caraway` currently attempts to keep 3
+
runners per machine available to accept jobs at all times.
This workflow is configured to allow concurrent testing, so up to 3 test-matrix
configurations can run at once.
The concurrency setting is also configured to kill any active job if another
@@ -58,13 +61,17 @@ instance of this workflow is started for the same PR ref.
## Development Details

-
Most of the required configuration is provided by the AT2 docs and instructional Confluence page (on the Sandia network :confused:--reach out if you need access).
+
Most of the required configuration is provided by the AT2 docs and instructional
+
Confluence page (on the Sandia network :confused:--reach out if you need access).
However, some non-obvious choices and configurations are listed here.

-
- To add some info to the testing output, we employ a custom action, cribbed from E3SM/EAMxx, that prints out the workflow's trigger.
+
- To add some info to the testing output, we employ a custom action, cribbed
+
from E3SM/EAMxx, that prints out the workflow's trigger.

### Hacks

+
- [ ] FIXME(@mjs): This should not be necessary any more, after the changes to the haero build. `build-haero.sh` should be functional for this build now.

- For whatever reason, Skywalker does not like building in the `gcc_12-3-0_cuda_12-1` container for the H100 GPU.
- This appears to be an issue of the (Haero?) build not auto-detecting the correct Compute Capability (CC 9.0 => `sm_90`).
- To overcome this, we first obtain the CC flag via `nvidia-smi` within the testing container.
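
The CC lookup described above can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual script: the helper name `cc_to_sm` is hypothetical, and the `compute_cap` query field assumes a reasonably recent NVIDIA driver.

```shell
# Hypothetical helper: turn a Compute Capability string such as "9.0"
# into the matching architecture name ("sm_90") by dropping the dot
# and any surrounding whitespace.
cc_to_sm() {
  printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.[:space:]')"
}

# Query the first visible GPU, but only if nvidia-smi is available.
# Assumption: the driver supports the `compute_cap` query field.
if command -v nvidia-smi >/dev/null 2>&1; then
  cc="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
  echo "Detected CC ${cc} => $(cc_to_sm "${cc}")"
fi
```

On an H100, the query would be expected to report CC 9.0, which the helper maps to `sm_90`; how that value is then passed to the build is not shown here.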
@@ -77,4 +84,4 @@ However, some non-obvious choices and configurations are listed here.
- One token used to fetch and read/write runner information.
- **Expires 11 April 2026**
- One token used to fetch and read repository information via the API.
| CPU GH-runner Ubuntu 22.04[^gh-ubu2204] | Linux - Ubuntu 22.04 | GitHub Runners | `gcc` 12.3 |

### The Flow of the CI Workflow
@@ -48,6 +49,13 @@ Based on the trigger and/or inputs, `MAM4xx Autotester` dispatches sub-workflows
- ***Note:*** AT2 = "Autotester 2," the second generation of a Sandia-developed GitHub-based testing product.
- See the [AT2 README](./AT2-README.md) for details about the implementation of the AT2 product.

+
#### GPU AT2 `gcc` 13.3 `hip` 6.2
+
- This is largely identical to the above CUDA-based workflow, the salient difference being that we run on AMD hardware, using the `hipcc` C++ compiler.
+
- The `caraway` machine has 2 different AMD_GFX90A-architecture MI200-series GPUs available, MI210 and MI250.
+
- As of the time of writing, autotesting jobs are assigned one or the other based on availability, to speed up matters.
+
- ***Note:*** This could change based on future needs.

#### GitHub CPU Auto-test Ubuntu 22.04

- The full version of this test runs a "matrix-strategy" test running all combinations of
@@ -86,6 +94,7 @@ The current options when manually triggering a workflow are:
- Test Machine Architecture
- Current Options:
- `GPU-NVIDIA_H100`
+
- `GPU-AMD_MI200-series`
- `CPU-Ubuntu_22-04`
- `ALL`
- Floating-point Precision
@@ -135,7 +144,7 @@ Refer to the section on [Other Types of Job Control](./AT2-README.md#other-types
- [x] Unify all CI into a single top-level yaml file that calls the sub-cases.
- This should provide finer control over what runs and when.