---
layout: page
title: Available Computers
nav_order: 1
description: "Information about computing resources available to the group"
permalink: /details/computers
---

# Computers

This page provides information about the computing resources available to our research group.

## Georgia Tech

* GT PACE Phoenix
  * User guide [here](https://docs.pace.gatech.edu/phoenix_cluster/gettingstarted_phnx/)
  * Login via `ssh <GTusername>@login-phoenix-rh9.pace.gatech.edu` to get the RHEL9 nodes
  * Purpose: All-purpose campus resource for CPU and GPU jobs with a variety of hardware.
  * "Rules": Use the `embers` queue type to run on idle nodes at zero cost (see the example job script below).
  * To get access, let Spencer know, and he will fill out [this form](https://gatech.service-now.com/home?id=sc_cat_item&sys_id=61bc5e351b37f994a8622f4b234bcbf0).

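A minimal Slurm batch-script sketch for a free `embers` run. The charge account, resource sizes, and executable are placeholders (check the PACE user guide or ask Spencer for the correct account string), and since `embers` is backfill, jobs can be preempted, so long runs should checkpoint.

```bash
#!/bin/bash
#SBATCH -J phoenix-test                # job name
#SBATCH -A gts-<pi-username>           # charge account (placeholder)
#SBATCH -q embers                      # free, preemptible backfill QOS
#SBATCH -N 1 --ntasks-per-node=4       # one node, four tasks
#SBATCH -t 1:00:00                     # walltime
#SBATCH -o %x-%j.out                   # output file

cd $SLURM_SUBMIT_DIR
srun ./my_executable                   # placeholder executable
```

Submit with `sbatch <script>.sh` and check the queue with `squeue -u <GTusername>`.
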
* GT ICE
  * [Resources/User guide](https://gatech.service-now.com/home?id=kb_article_view&sysparm_article=KB0042095) (e.g., click `Available Resources`)
  * At last check, the cluster has roughly 40 V100s, 8 A100s, 4 A40s, 20 RTX6000s, and 4 MI210s.
  * You may need to contact Spencer for access.
  * __Most GPU nodes sit idle__
    * On those nodes: `MaxNodes=UNLIMITED MaxTime=18:00:00` (see the example job script below)

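A sketch of a Slurm batch script for grabbing one of those idle GPUs, up to the 18-hour limit noted above. The GRES spec and executable are placeholders; check the ICE user guide for the exact GPU type and partition names.

```bash
#!/bin/bash
#SBATCH -J ice-gpu-test                # job name
#SBATCH -N 1 --gres=gpu:1              # one GPU (add a type, e.g. gpu:V100:1, if needed)
#SBATCH -t 18:00:00                    # matches the MaxTime above
#SBATCH -o %x-%j.out                   # output file

nvidia-smi                             # record which GPU the job landed on
srun ./my_gpu_executable               # placeholder executable
```
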
* GT Rogues Gallery
  * User guide [here](https://gt-crnch-rg.readthedocs.io/en/main/)
  * Purpose: Use of brand-new, forward-looking, or weird hardware. At the time of writing, it includes an NV H100 server, GH200 nodes, an AMD MI210 GPU server, BlueField-2/3 SmartNICs, RISC-V and Arm CPUs, etc.
  * "Rules": There are few rules; just follow the guidelines in the documentation. There are no limitations on hardware access/node hours.
  * Get access via [this link](https://crnch-rg.cc.gatech.edu/request-rogues-gallery-access/)

* GT Wingtip-gpu3
  * User guide [here](https://github.gatech.edu/cse-computing/compute-resources/blob/main/docs/systems/wingtip-gpu.md)
  * Purpose: Small (but possibly very long) GPU jobs; it currently hosts 5x NV A100-80GB PCIe GPUs
  * "Rules": Be mindful of others' use of this machine, as it does not have a scheduler (see the sketch below for basic etiquette).
  * Get access by emailing [Will Powell](mailto:will.powell@cc.gatech.edu) and cc'ing Spencer.

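Since Wingtip has no scheduler, check what is already running and pin your work to an idle GPU before launching. A minimal sketch of the interactive steps (the GPU index and executable are placeholders):

```bash
# 1) See what is already running and which GPUs are busy
nvidia-smi

# 2) Start a tmux session so a dropped ssh connection does not kill a long run
tmux new -s myrun

# 3) Inside tmux: pin to an idle GPU (GPU 3 here, as an example) and launch
export CUDA_VISIBLE_DEVICES=3
./my_gpu_executable > run.log 2>&1     # placeholder executable
```
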
## University Clusters

* ACCESS-CI computers
  * These are a set of university supercomputers listed [here](https://access-ci.org/resource-providers/). Each has its own user guide. At the time of writing, we have access to NCSA Delta (A100 GPUs), PSC Bridges2 (V100 GPUs), Purdue Anvil, and Texas A&M ACES (H100 GPUs), but we can change to others as needed. We primarily use NCSA Delta.
  * Purpose: All-purpose resources for CPU and GPU simulation.
  * "Rules": Be mindful of the available node hours. Queue times might be long.
  * Our account numbers:
    * `PHY240200` (ACCESS-CI Maximize, NCSA Delta only)
    * `PHY210084` (ACCESS-CI Accelerate; Bridges2, Delta, and so on)
  * Get access by
    * Creating an account [here](https://identity.access-ci.org/new-user.html)
    * Then messaging Spencer on Slack with your username
  * On [NCSA Delta](https://docs.ncsa.illinois.edu/systems/delta/en/latest/)
    * The account name is `bdiy-delta-gpu` (ACCESS-CI Maximize) or `bbsc-delta-gpu` (ACCESS-CI Accelerate) for GPU resources (see the example job script below)
    * Replace `-gpu` with `-cpu` for CPU resources

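A sketch of a Delta GPU batch script using those account names. The partition name, resource sizes, and executable are placeholders based on the Delta docs at the time of writing; double-check them there.

```bash
#!/bin/bash
#SBATCH -J delta-gpu-test              # job name
#SBATCH -A bdiy-delta-gpu              # or bbsc-delta-gpu (Accelerate)
#SBATCH -p gpuA100x4                   # assumed A100 partition name; check the Delta docs
#SBATCH -N 1 --gpus-per-node=1         # one node, one GPU
#SBATCH -t 01:00:00                    # walltime
#SBATCH -o %x-%j.out                   # output file

srun ./my_gpu_executable               # placeholder executable
```
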
## DOE Labs

* Oak Ridge National Lab OLCF: Frontier/Wombat/Andes/etc.
  * Purpose
    * Frontier: Very large-scale GPU simulation on AMD MI250X GPUs.
    * Wombat: Testbed for next-gen HPC platforms, including Arm nodes and, soon, next-generation NVIDIA Grace Hopper nodes.
    * Andes: For postprocessing
  * Our account number: `CFD154` (see the example job script below)
  * "Rules": Ask Spencer before running any jobs that use a very large number of node hours.
  * Get access by
    * Creating an account by following [these instructions](https://docs.olcf.ornl.gov/accounts/accounts_and_projects.html#applying-for-a-user-account)
    * Using `CFD154` as the account/allocation number

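A sketch of a Frontier batch script charged to `CFD154`. The node count, launch flags, and executable are placeholders; the OLCF Frontier user guide has the authoritative `srun` binding recommendations.

```bash
#!/bin/bash
#SBATCH -A CFD154                      # our OLCF allocation
#SBATCH -J frontier-test               # job name
#SBATCH -p batch                       # assumed default partition; check the Frontier guide
#SBATCH -N 2                           # two nodes; each exposes 8 MI250X GCDs as GPUs
#SBATCH -t 00:30:00                    # walltime
#SBATCH -o %x-%j.out                   # output file

# One MPI rank per GCD: 8 ranks per node, 16 total
srun -N 2 -n 16 --gpus-per-node=8 --gpu-bind=closest ./my_gpu_executable
```
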
* Sandia National Lab (SNL)
  * Purpose: Resources for DOE-sponsored/funded research projects; they are available only to students working on those projects. You will only have access to non-restricted resources.
  * "Rules": Usually, there are not many rules aside from the very many that they will impose on you as you acquire access to these machines.
  * Login process (Sandia National Lab-specific)
    * Onto the DaaS
      * VMware Horizon ([download online](https://customerconnect.vmware.com/en/downloads/info/slug/desktop_end_user_computing/vmware_horizon_clients/horizon_8))
      * URL: `daas.sandia.gov`
      * Passcode: `[PIN] + [yubikey one-time password]`
      * Password: `[kerberos password]`
      * Three options appear:
        * badge update
        * conference room
        * daas <- open this one
      * Here you can complete training and do other things, like checking WebCARS to get your `WC_ID` (which you need to submit jobs)
    * Onto a computer remotely
      * See the [SSH access docs](https://hpc.sandia.gov/access/ssh/)
      * You can do the following from DaaS (using the example username `[usrname]`; a `~/.ssh/config` shortcut is sketched after this list)
        * `ssh [usrname]@srngate.sandia.gov`
        * Passcode: `[PIN] + [yubikey one-time password]`
        * Choose a computer: e.g., Skybridge, Attaway, Weaver, etc.
        * Press `1` for an `ssh session`
        * Use the default username (`[usrname]`)
        * The password is (usually) the Kerberos one
        * If it asks for a token OTP (e.g., on Weaver), this is `[PIN] + [yubikey one-time password]`

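To save some typing when going through the gateway, a small `~/.ssh/config` entry (wherever you run `ssh` from) can alias it; the interactive passcode prompts and computer menu above still apply, and `[usrname]` is a placeholder:

```
# ~/.ssh/config
Host srngate
    HostName srngate.sandia.gov
    # your Sandia username
    User [usrname]
```

After that, `ssh srngate` is enough to reach the gateway.
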
* LLNL Livermore Computing: Lassen, Tioga, etc.
  * Anyone working on a specific LLNL project can use [LLNL CZ](https://lc.llnl.gov/) (non-restricted) resources
  * Talk to Spencer about getting access to the CZ (collaboration zone) if you are working on an LLNL project
  * "Rules": Usually, there are not many rules aside from the very many that they will impose on you as you acquire access to these machines.
  * Login process (Lawrence Livermore National Lab-specific)
    * Onto LC-idm
      * URL: `lc-idm.llnl.gov`
      * Passcode: `[PIN] + [RSA one-time password]`
      * Can be used to view your user profile and request roles (i.e., ask for resources on specific machines)
    * Onto the LC
      * URL: `lc.llnl.gov`
      * Passcode: `[PIN] + [RSA one-time password]`
      * Requires three logins to fully log in
      * Can be used to access collaboration tools such as Confluence and GitLab, user documentation, and MyLC for alerts, machine status, and job status
    * Onto a computer remotely
      * You can do the following with ssh (using the example username `[usrname]` and a specific LLNL machine `[llnlmachine]`; a connection-sharing `~/.ssh/config` snippet is sketched after this list)
        * `ssh [usrname]@[llnlmachine].llnl.gov`
        * Passcode: `[PIN] + [RSA one-time password]`

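Because every new connection prompts for `[PIN] + [RSA one-time password]`, OpenSSH connection sharing lets later shells and `scp`/`rsync` transfers reuse one authenticated connection. This is plain OpenSSH behavior rather than anything LLNL-specific; a sketch (create `~/.ssh/sockets` first):

```
# ~/.ssh/config
Host *.llnl.gov
    # your LC username
    User [usrname]
    # share one authenticated connection among later sessions
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h-%p
    # keep the master connection alive for 4 hours after the first shell exits
    ControlPersist 4h
```
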
## DOD Labs

* Department of Defense
  * Anyone working on a DOD project can use [DOD HPCMP](https://www.hpc.mil/) (non-restricted) resources
    * The process of getting permissions to the non-restricted systems is a bit tedious but usually worth it
  * See [here](https://centers.hpc.mil/) for information on the available supercomputers
    * In particular, it's useful to keep an eye on [upcoming systems](https://centers.hpc.mil/systems/hardware.html#upcoming)
    * Current unclassified systems are listed [here](https://centers.hpc.mil/systems/unclassified.html)
  * Talk to Spencer about getting access to a DOD machine if you are working on a DOD project
  * Subproject: `ONRDC51242690`, Group: `5124D690`, Project: `5124` (see the example job script below)
    * Site: `NAVY`
      * Nautilus
      * Narwhal
    * Site: `ERDC`
      * Carpenter
  * [Docs available here](https://centers.hpc.mil/users/docs/index.html#general)

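Several of the HPCMP systems we use (e.g., Narwhal and Carpenter) schedule jobs with PBS, where the subproject above goes in the `-A` directive. A minimal sketch; the queue name, node geometry, and MPI launcher are placeholders, so check the relevant center's user guide.

```bash
#!/bin/bash
#PBS -A ONRDC51242690                  # subproject listed above
#PBS -q standard                       # assumed queue name; check the center's docs
#PBS -l select=1:ncpus=128:mpiprocs=128
#PBS -l walltime=01:00:00              # walltime
#PBS -N hpcmp-test                     # job name
#PBS -j oe                             # join stdout and stderr

cd $PBS_O_WORKDIR
mpiexec ./my_executable                # placeholder executable; the launcher varies by system
```
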