| 1 | +# Parallel Computing Toolbox plugin for MATLAB Parallel Server with Grid Engine |
| 2 | + |
| 3 | +[View on File Exchange](https://www.mathworks.com/matlabcentral/fileexchange/52816) |
| 4 | + |
| 5 | +Parallel Computing Toolbox™ provides the `Generic` cluster type for submitting MATLAB® jobs to a cluster running a third-party scheduler. |
| 6 | +`Generic` uses a set of plugin scripts to define how your machine running MATLAB or Simulink® communicates with your scheduler. |
| 7 | +You can customize the plugin scripts to configure how MATLAB interacts with the scheduler to best suit your cluster's setup and to support custom submission options. |
| 8 | + |
| 9 | +This repository contains MATLAB code files and shell scripts that you can use to submit jobs from a MATLAB or Simulink session running on Windows®, Linux®, or macOS to a Grid Engine® scheduler running on Linux. |
| 10 | + |
| 11 | +## Products Required |
| 12 | + |
| 13 | +- [MATLAB](https://mathworks.com/products/matlab.html) and [Parallel Computing Toolbox](https://mathworks.com/products/parallel-computing.html), release R2017a or newer, installed on your computer. |
| 14 | +Refer to the documentation for [how to install MATLAB and toolboxes](https://mathworks.com/help/install/index.html) on your computer. |
| 15 | +- [MATLAB Parallel Server™](https://mathworks.com/products/matlab-parallel-server.html) installed on the cluster. |
| 16 | +Refer to the documentation for [how to install MATLAB Parallel Server](https://mathworks.com/help/matlab-parallel-server/integrate-matlab-with-third-party-schedulers.html) on your cluster. |
| 17 | +The cluster administrator normally does this step. |
| 18 | +- Grid Engine running on the cluster. |
| 19 | + |
| 20 | +## Which Plugin Scripts Should I Use? |
| 21 | + |
| 22 | +This repository provides a set of plugin scripts in each of the `shared`, `remote`, and `nonshared` folders. |
| 23 | +Each folder corresponds to a **submission mode**, determining how MATLAB submits jobs to the scheduler and retrieves the results. |
| 24 | +Use this section to determine which submission mode is appropriate for your network setup. |
| 25 | + |
| 26 | +To decide which submission mode to use, consult the following diagram. |
| 27 | + |
| 28 | +```mermaid |
| 29 | +flowchart TD |
| 30 | + Q1[Can MATLAB and cluster machines\n access a shared filesystem?] |
| 31 | + Q1 -->|Yes| Q2 |
| 32 | + Q1 --->|No| N([Nonshared]) |
| 33 | + Q2[Are the scheduler utilities available\n on the machine running MATLAB?] |
| 34 | + Q2 -->|Yes| S([Shared]) |
| 35 | + Q2 -->|No| R([Remote]) |
| 36 | +``` |
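The decision in the diagram can also be written as a few lines of MATLAB. This is only an illustrative sketch: the variable names `hasSharedFilesystem`, `hasSchedulerUtilities`, and `mode` are hypothetical and are not part of the plugin scripts.

```matlab
% Hypothetical answers to the two questions in the diagram (edit to match your setup).
hasSharedFilesystem = true;     % Can MATLAB and the cluster machines access a shared filesystem?
hasSchedulerUtilities = false;  % Are the scheduler utilities available on the machine running MATLAB?

if ~hasSharedFilesystem
    mode = 'nonshared';
elseif hasSchedulerUtilities
    mode = 'shared';
else
    mode = 'remote';
end

% The result names the folder of plugin scripts to use.
fprintf('Use the plugin scripts in the "%s" folder.\n', mode);
```

With the values shown, the sketch selects the `remote` folder, because a shared filesystem exists but the scheduler utilities are not available locally.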
| 37 | + |
| 38 | +### Shared Submission Mode |
| 39 | + |
| 40 | +MATLAB uses files on disk to send tasks to the Parallel Server workers and fetch their results back. |
| 41 | +This is most effective when there is a disk location accessible to both your machine running MATLAB and the workers on the cluster. |
| 42 | +Your computer can communicate with the workers by reading and writing to this shared filesystem. |
| 43 | + |
| 44 | +To manage work on the cluster, MATLAB calls the Grid Engine command line utilities. |
| 45 | +For example, the `qsub` command is used to submit work and `qstat` to query the state of submitted jobs. |
| 46 | +If your MATLAB session is running on a machine with the scheduler utilities available, the plugin scripts can call the utilities on the command line. |
| 47 | +This is typically true if your MATLAB session is running on the same Grid Engine cluster you want to submit to. |
| 48 | + |
| 49 | +If your MATLAB session and workers have a shared filesystem and the scheduler utilities are available on your machine, use **shared submission mode**. |
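One way to check whether the scheduler utilities are available to your MATLAB session is to look them up on the system path from within MATLAB. The following is a minimal sketch, assuming that `which` (Linux and macOS) or `where` (Windows) is a suitable lookup command on your machine. The variable names are hypothetical.

```matlab
% Pick the lookup command for the local operating system.
if ispc
    lookup = 'where';
else
    lookup = 'which';
end

% Utilities named in this section; extend the list if your workflow needs others.
utilities = {'qsub', 'qstat'};
found = false(size(utilities));
for k = 1:numel(utilities)
    % system returns status 0 when the lookup command finds the utility.
    [status, ~] = system(sprintf('%s %s', lookup, utilities{k}));
    found(k) = (status == 0);
end

hasSchedulerUtilities = all(found);
fprintf('Scheduler utilities found on this machine: %d\n', hasSchedulerUtilities);
```

The environment that MATLAB passes to `system` can differ from your login shell, so treat a negative result as a hint to investigate rather than as proof that the utilities are missing.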
| 50 | + |
| 51 | +### Remote Submission Mode |
| 52 | + |
| 53 | +If MATLAB cannot directly access the scheduler utilities on the command line, but can access the same filesystem as the workers, use **remote submission mode**. |
| 54 | +MATLAB creates an SSH session to the cluster and runs scheduler commands over that connection. |
| 55 | +Job files are still shared between your MATLAB session and the workers using the shared filesystem. |
| 56 | + |
| 57 | +This submission mode is useful for submitting from a MATLAB session on a Windows computer to a Linux Grid Engine cluster on the same network. |
| 58 | +Your Windows machine creates an SSH session to the cluster head node to access the Grid Engine utilities, while using a shared networked folder to store job data files. |
| 59 | + |
| 60 | +If your MATLAB session is running on a compute node of the cluster to which you want to submit work, you can use remote submission mode to create an SSH session to the cluster's head node and submit more jobs from there. |
| 61 | + |
| 62 | +### Nonshared Submission Mode |
| 63 | + |
| 64 | +If there isn't a shared filesystem, you need to use **nonshared submission mode**. |
| 65 | +In this mode, MATLAB uses SSH to submit commands to the scheduler and SFTP to copy job and task files between your computer and the cluster. |
| 66 | + |
| 67 | +Transferring large data files (for example, hundreds of MB) over the SFTP connection adds noticeable overhead to job submission and to fetching results, so prefer remote submission mode with a shared filesystem if one is available. |
| 68 | +Even in nonshared submission mode, there must be a shared filesystem location available to all the workers, although your computer does not need access to it. |
| 69 | + |
| 70 | +## Setup Instructions |
| 71 | + |
| 72 | +Before proceeding, ensure that the required products listed above are installed. |
| 73 | + |
| 74 | +### Download or Clone this Repository |
| 75 | + |
| 76 | +To download a zip file of this repository, at the top of this repository page, select **Code > Download ZIP**. |
| 77 | +Alternatively, to clone this repository to your computer with git installed, run the following command on your operating system's command line: |
| 78 | +``` |
| 79 | +git clone https://github.com/mathworks/matlab-parallel-gridengine-plugin |
| 80 | +``` |
| 81 | +You can execute a system command from the MATLAB command line by adding a `!` before the command. |
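For example, the clone command could be run from the MATLAB command line as shown below. This sketch assumes that git is installed and on the system path, and that you want the repository cloned into MATLAB's current folder.

```matlab
% The leading ! passes the rest of the line to the operating system shell.
!git clone https://github.com/mathworks/matlab-parallel-gridengine-plugin
```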
| 82 | + |
| 83 | +### Create a Cluster Profile in MATLAB |
| 84 | + |
| 85 | +You can create a cluster profile by using either the Cluster Profile Manager or the MATLAB command line. |
| 86 | + |
| 87 | +To open the Cluster Profile Manager, on the **Home** tab, in the **Environment** section, select **Parallel > Create and Manage Clusters**. |
| 88 | +Within the Cluster Profile Manager, select **Add Cluster Profile > Generic** from the menu to create a new `Generic` cluster profile. |
| 89 | + |
| 90 | +Alternatively, for a command line workflow without using graphical user interfaces, create a new `Generic` cluster object by running: |
| 91 | +```matlab |
| 92 | +c = parallel.cluster.Generic; |
| 93 | +``` |
| 94 | + |
| 95 | +### Configure Cluster Properties |
| 96 | + |
| 97 | +The table below gives the minimum properties required for `Generic` to work correctly. |
| 98 | +For a full list of cluster properties, see the documentation for [`parallel.Cluster`](https://mathworks.com/help/parallel-computing/parallel.cluster.html). |
| 99 | + |
| 100 | +**Property** | **Description** |
| 101 | +----------------------|---------------- |
| 102 | +JobStorageLocation | Where job data is stored by your machine. |
| 103 | +NumWorkers | Number of workers your license allows. |
| 104 | +ClusterMatlabRoot | Full path to the MATLAB install folder on the cluster. |
| 105 | +OperatingSystem | The cluster's operating system. |
| 106 | +HasSharedFilesystem | True for shared and remote submission modes, false for nonshared. |
| 107 | +PluginScriptsLocation | Full path to the shared, remote or nonshared folder, depending on your submission mode. If using R2019a or earlier, this property is called IntegrationScriptsLocation. |
| 108 | + |
| 109 | +In the Cluster Profile Manager, set each property value in the boxes provided. |
| 110 | +Alternatively, at the command line, set each property on the cluster object using dot notation: |
| 111 | +```matlab |
| 112 | +c.JobStorageLocation = 'C:\MatlabJobs'; |
| 113 | +% etc. |
| 114 | +``` |
| 115 | + |
| 116 | +At the command line, you can also set properties at the same time you create the `Generic` cluster object, by specifying name-value pairs in the constructor: |
| 117 | +```matlab |
| 118 | +c = parallel.cluster.Generic( ... |
| 119 | + 'JobStorageLocation', 'C:\MatlabJobs', ... |
| 120 | + 'NumWorkers', 20, ... |
| 121 | + 'ClusterMatlabRoot', '/usr/local/MATLAB/R2022a', ... |
| 122 | + 'OperatingSystem', 'unix', ... |
| 123 | + 'HasSharedFilesystem', true, ... |
| 124 | + 'PluginScriptsLocation', 'C:\MatlabGridEnginePlugin\shared'); |
| 125 | +``` |
| 126 | + |
| 127 | +If you're submitting from a Windows machine to a Linux cluster in remote submission mode, you can specify the `JobStorageLocation` as a structure specifying how to find the shared folder on each operating system. |
| 128 | +For example: |
| 129 | +```matlab |
| 130 | +struct('windows', '\\organization\matlabjobs\jobstorage', 'unix', '/organization/matlabjobs/jobstorage') |
| 131 | +``` |
| 132 | +or if you have the drive letter `M:` mapped to `\\organization\matlabjobs`: |
| 133 | +```matlab |
| 134 | +struct('windows', 'M:\jobstorage', 'unix', '/organization/matlabjobs/jobstorage') |
| 135 | +``` |
| 136 | + |
| 137 | +### Configure AdditionalProperties |
| 138 | + |
| 139 | +You can use `AdditionalProperties` to modify the behavior of `Generic` without having to edit the plugin scripts. |
| 140 | +For a full list of the `AdditionalProperties` supported by the plugin scripts in this repository, see [Customize Behavior of Sample Plugin Scripts](https://mathworks.com/help/matlab-parallel-server/customize-behavior-of-sample-plugin-scripts.html). |
| 141 | +By modifying the plugins, you can add support for your own custom `AdditionalProperties`. |
| 142 | + |
| 143 | +In shared submission mode, you do not need to set any `AdditionalProperties` to use these plugins. |
| 144 | + |
| 145 | +In both remote and nonshared submission modes, set `ClusterHost` to the name of the machine MATLAB should SSH to. |
| 146 | +This is the machine on which MATLAB runs scheduler utilities such as `qsub` and `qstat`, so you typically select the cluster head node or login node. |
| 147 | + |
| 148 | +In nonshared mode only, set `RemoteJobStorageLocation` to a folder available on the cluster for the workers to write results to. |
| 149 | +MATLAB uses SFTP to copy files to and from this folder. |
| 150 | + |
| 151 | +In the Cluster Profile Manager, add new `AdditionalProperties` by clicking **Add** under the table of `AdditionalProperties`. |
| 152 | +On the command line, use dot notation to add new fields: |
| 153 | +```matlab |
| 154 | +c.AdditionalProperties.ClusterHost = 'gridengine01.organization.com'; % MATLAB will SSH to gridengine01 to submit jobs |
| 155 | +``` |
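Putting the cluster properties and `AdditionalProperties` together, a nonshared configuration might look like the sketch below. The property names come from the sections above. Every path and the host name are placeholders, assumed only for illustration, and should be replaced with locations that exist on your machine and cluster.

```matlab
% Sketch of a nonshared-mode configuration; all values are placeholders.
c = parallel.cluster.Generic;
c.JobStorageLocation = 'C:\MatlabJobs';              % local folder on your machine
c.NumWorkers = 20;                                   % number of workers your license allows
c.ClusterMatlabRoot = '/usr/local/MATLAB/R2022a';    % MATLAB install folder on the cluster
c.OperatingSystem = 'unix';
c.HasSharedFilesystem = false;                       % false for nonshared submission mode
c.PluginScriptsLocation = 'C:\MatlabGridEnginePlugin\nonshared';  % hypothetical path

% Host that MATLAB connects to over SSH to run the scheduler utilities.
c.AdditionalProperties.ClusterHost = 'gridengine01.organization.com';
% Folder on the cluster that the workers can write to (hypothetical path).
c.AdditionalProperties.RemoteJobStorageLocation = '/home/myuser/matlabjobs';
```

For remote submission mode, the same sketch would apply with `HasSharedFilesystem` set to `true`, the `remote` folder as the plugin scripts location, and no `RemoteJobStorageLocation`.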
| 156 | + |
| 157 | +### Save Your New Profile |
| 158 | + |
| 159 | +In the Cluster Profile Manager, click **Done**. |
| 160 | +On the command line, run: |
| 161 | +```matlab |
| 162 | +saveAsProfile(c, "myGrid EngineCluster"); |
| 163 | +``` |
| 164 | +Your cluster profile is now ready to use. |
| 165 | + |
| 166 | +### Validate the Cluster Profile |
| 167 | + |
| 168 | +Cluster validation submits one of each type of job to test that the cluster profile has been configured correctly. |
| 169 | +In the Cluster Profile Manager, click the **Validate** button. |
| 170 | +If you make a change to a cluster profile, you can rerun cluster validation to ensure there are no errors. |
| 171 | +You do not need to validate each time you use the profile or each time you start MATLAB. |
| 172 | + |
| 173 | +## Examples |
| 174 | + |
| 175 | +First create a cluster object using your profile: |
| 176 | +```matlab |
| 177 | +c = parcluster("myGrid EngineCluster") |
| 178 | +``` |
| 179 | + |
| 180 | +### Submit Work for Batch Processing |
| 181 | + |
| 182 | +The `batch` command runs a MATLAB script or function on a worker on the cluster. |
| 183 | +For more information about batch processing, see the documentation for the [batch command](https://mathworks.com/help/parallel-computing/batch.html). |
| 184 | + |
| 185 | +```matlab |
| 186 | +% Create and submit a job to the cluster |
| 187 | +job = batch( ... |
| 188 | + c, ... % cluster object created using parcluster |
| 189 | + @sqrt, ... % function/script to run |
| 190 | + 1, ... % number of output arguments |
| 191 | + {[64 100]}); % input arguments |
| 192 | + |
| 193 | +% Your MATLAB session is now available to do other work, such |
| 194 | +% as create and submit more jobs to the cluster. You can also |
| 195 | +% shut down your MATLAB session and come back later - the work |
| 196 | +% will continue running on the cluster. Once you've recreated |
| 197 | +% the cluster object using parcluster, you can view existing |
| 198 | +% jobs using the Jobs property on the cluster object. |
| 199 | + |
| 200 | +% Wait for the job to complete. If the job is already complete, |
| 201 | +% this will return immediately. |
| 202 | +wait(job); |
| 203 | + |
| 204 | +% Retrieve the output arguments for each task. For this example, |
| 205 | +% results will be a 1x1 cell array containing the vector [8 10]. |
| 206 | +results = fetchOutputs(job) |
| 207 | +``` |
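The comments in the example above mention returning to a job later through the `Jobs` property. A minimal sketch of that workflow follows, assuming the profile name used in this document and assuming that the finished jobs completed without task errors (otherwise `fetchOutputs` reports an error).

```matlab
% Recreate the cluster object in a new MATLAB session.
c = parcluster("myGrid EngineCluster");

% List every job that MATLAB knows about for this cluster.
disp(c.Jobs)

% Collect the results of jobs that have finished running.
finishedJobs = findJob(c, 'State', 'finished');
for k = 1:numel(finishedJobs)
    job = finishedJobs(k);
    fprintf('Fetching outputs of job %d\n', job.ID);
    results = fetchOutputs(job);

    % Remove the job and its files once you no longer need them.
    delete(job);
end
```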
| 208 | + |
| 209 | +### Open a Parallel Pool |
| 210 | + |
| 211 | +A parallel pool (parpool) is a group of MATLAB workers that you can interactively run work on. |
| 212 | +When you run the `parpool` command, MATLAB will submit a special job to the cluster to start the workers. |
| 213 | +Once the workers have started, your MATLAB session will connect to them. |
| 214 | +Depending on the network configuration at your organization, including whether it is permissible to connect to a program running on a compute node, parpools may not be functional in nonshared submission mode. |
| 215 | +For more information about parpools, see the documentation for the [parpool command](https://mathworks.com/help/parallel-computing/parpool.html). |
| 216 | + |
| 217 | +```matlab |
| 218 | +% Open a parallel pool on the cluster. This command will return |
| 219 | +% once the pool is opened. |
| 220 | +pool = parpool(c); |
| 221 | + |
| 222 | +% List the hosts the workers are running on. For a small pool, |
| 223 | +% all the workers will likely be on the same machine. For a large |
| 224 | +% pool, the workers will be spread over multiple nodes. |
| 225 | +future = parfevalOnAll(pool, @getenv, 1, 'HOST') |
| 226 | +wait(future); |
| 227 | +fetchOutputs(future) |
| 228 | + |
| 229 | +% Output the numbers 1 to 10 in a parallel for (parfor) loop. |
| 230 | +% Unlike a regular for loop, iterations of the loop will not |
| 231 | +% be executed in order. |
| 232 | +parfor idx = 1:10 |
| 233 | + disp(idx) |
| 234 | +end |
| 235 | + |
| 236 | +% Use the pool to calculate the first 500 magic squares. |
| 237 | +parfor idx = 1:500 |
| 238 | + magicSquare{idx} = magic(idx); |
| 239 | +end |
| 240 | +``` |
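When you have finished with the pool, you may want to shut it down so that the workers are released back to the scheduler. A small sketch of one way to do this:

```matlab
% Get the current pool without starting a new one.
pool = gcp('nocreate');

if ~isempty(pool)
    fprintf('Shutting down a pool with %d workers.\n', pool.NumWorkers);
    % Deleting the pool object stops the workers and ends the pool job.
    delete(pool);
end
```

Using `gcp('nocreate')` avoids accidentally starting a new pool just to delete it.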
| 241 | + |
| 242 | +## License |
| 243 | + |
| 244 | +The license is available in the [license.txt](license.txt) file in this repository. |
| 245 | + |
| 246 | +## Community Support |
| 247 | + |
| 248 | +[MATLAB Central](https://www.mathworks.com/matlabcentral) |
| 249 | + |
| 250 | +## Technical Support |
| 251 | + |
| 252 | +If you require assistance or have a request for additional features or capabilities, please contact [MathWorks Technical Support](https://www.mathworks.com/support/contact_us.html). |
| 253 | + |
| 254 | +Copyright 2022 The MathWorks, Inc. |