Commit 44f074e

Merge pull request #2 from mathworks/discovery
Add cluster discovery files
2 parents a9c6aed + 8f04183 commit 44f074e

12 files changed (+300, -89 lines changed)

README.md

Lines changed: 12 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Parallel Computing Toolbox plugin for MATLAB Parallel Server with Grid Engine

-[![View on File Exchange](https://www.mathworks.com/matlabcentral/images/matlab-file-exchange.svg)](https://www.mathworks.com/matlabcentral/fileexchange/52816)
+[![View Parallel Computing Toolbox Plugin for Grid Engine on File Exchange](https://www.mathworks.com/matlabcentral/images/matlab-file-exchange.svg)](https://mathworks.com/matlabcentral/fileexchange/127389-parallel-computing-toolbox-plugin-for-grid-engine)

 Parallel Computing Toolbox™ provides the `Generic` cluster type for submitting MATLAB® jobs to a cluster running a third-party scheduler.
 The `Generic` cluster type uses a set of plugin scripts to define how your machine communicates with your scheduler.
@@ -66,6 +66,16 @@ $ echo "hostname" | qsub -pe matlab 1
 Check that the job runs correctly using "qstat", and check that the output file contains the name of the host that ran the job.
 The default filename for the output file is "~/STDIN.o###", where "###" is the Grid Engine job number.

+### Cluster Discovery
+
+Since version R2023a, MATLAB can discover clusters running third-party schedulers such as Grid Engine.
+As a cluster admin, you can create a configuration file that describes how to configure the Parallel Computing Toolbox on the user's machine to submit MATLAB jobs to the cluster.
+The cluster configuration file is a plain text file with the extension `.conf` containing key-value pairs that describe the cluster configuration information.
+The MATLAB client will use the cluster configuration file to create a cluster profile for the user who discovers the cluster.
+Therefore, users will not need to follow the instructions in the sections below.
+You can find an example of a cluster configuration file in [discover/example.conf](discover/example.conf).
+For full details on how to make a cluster running a third-party scheduler discoverable, see the documentation for [Configure for Third-Party Scheduler Cluster Discovery](https://mathworks.com/help/matlab-parallel-server/configure-for-cluster-discovery.html).
+
 ### Create a Cluster Profile in MATLAB

 Create a cluster profile by using either the Cluster Profile Manager or the MATLAB Command Window.
@@ -257,4 +267,4 @@ The license is available in the [license.txt](license.txt) file in this reposito

 If you require assistance or have a request for additional features or capabilities, please contact [MathWorks Technical Support](https://www.mathworks.com/support/contact_us.html).

-Copyright 2022 The MathWorks, Inc.
+Copyright 2022-2023 The MathWorks, Inc.

communicatingJobWrapper.sh

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@
 # The following environment variables are set by Grid Engine:
 # PE_HOSTFILE - list of hostnames with their associated number of processors allocated to this Grid Engine job

-# Copyright 2006-2022 The MathWorks, Inc.
+# Copyright 2006-2023 The MathWorks, Inc.

 # If PARALLEL_SERVER_ environment variables are not set, assign any
 # available values with form MDCE_ for backwards compatibility

communicatingJobWrapperSmpd.sh

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@
 # PARALLEL_SERVER_STORAGE_CONSTRUCTOR - used by decode function
 # PARALLEL_SERVER_JOB_LOCATION - used by decode function

-# Copyright 2006-2022 The MathWorks, Inc.
+# Copyright 2006-2023 The MathWorks, Inc.

 # If PARALLEL_SERVER_ environment variables are not set, assign any
 # available values with form MDCE_ for backwards compatibility

communicatingSubmitFcn.m

Lines changed: 38 additions & 27 deletions
@@ -6,9 +6,9 @@ function communicatingSubmitFcn(cluster, job, environmentProperties)
 %
 % See also parallel.cluster.generic.communicatingDecodeFcn.

-% Copyright 2010-2022 The MathWorks, Inc.
+% Copyright 2010-2023 The MathWorks, Inc.

-% Store the current filename for the errors, warnings and dctSchedulerMessages
+% Store the current filename for the errors, warnings and dctSchedulerMessages.
 currFilename = mfilename;
 if ~isa(cluster, 'parallel.Cluster')
 error('parallelexamples:GenericGridEngine:NotClusterObject', ...
@@ -17,9 +17,22 @@ function communicatingSubmitFcn(cluster, job, environmentProperties)

 decodeFunction = 'parallel.cluster.generic.communicatingDecodeFcn';

-if ~strcmpi(cluster.OperatingSystem, 'unix')
+clusterOS = cluster.OperatingSystem;
+if ~strcmpi(clusterOS, 'unix')
 error('parallelexamples:GenericGridEngine:UnsupportedOS', ...
-'The function %s only supports clusters with unix OS.', currFilename)
+'The function %s only supports clusters with the unix operating system.', currFilename)
+end
+
+% Get the correct quote and file separator for the Cluster OS.
+% This check is unnecessary in this file because we explicitly
+% checked that the ClusterOsType is unix. This code is an example
+% of how to deal with clusters that can be unix or pc.
+if strcmpi(clusterOS, 'unix')
+quote = '''';
+fileSeparator = '/';
+else
+quote = '"';
+fileSeparator = '\';
 end

 if isprop(cluster.AdditionalProperties, 'ClusterHost')
@@ -47,18 +60,6 @@ function communicatingSubmitFcn(cluster, job, environmentProperties)
 end
 end

-% Get the correct quote and file separator for the Cluster OS.
-% This check is unnecessary in this file because we explicitly
-% checked that the ClusterOsType is unix. This code is an example
-% of how to deal with clusters that can be unix or pc.
-if strcmpi(cluster.OperatingSystem, 'unix')
-quote = '''';
-fileSeparator = '/';
-else
-quote = '"';
-fileSeparator = '\';
-end
-
 % The job specific environment variables
 % Remove leading and trailing whitespace from the MATLAB arguments
 matlabArguments = strtrim(environmentProperties.MatlabArguments);
@@ -104,6 +105,7 @@ function communicatingSubmitFcn(cluster, job, environmentProperties)
 else
 jobDirectoryOnCluster = remoteConnection.getRemoteJobLocation(job.ID, cluster.OperatingSystem);
 end
+
 % Specify the job wrapper script to use.
 % Prior to R2019a, only the SMPD process manager is supported.
 if verLessThan('matlab', '9.6') || ...
@@ -118,7 +120,7 @@ function communicatingSubmitFcn(cluster, job, environmentProperties)
 dirpart = fileparts(mfilename('fullpath'));
 localScript = fullfile(dirpart, jobWrapperName);
 % Copy the local wrapper script to the job directory
-copyfile(localScript, localJobDirectory);
+copyfile(localScript, localJobDirectory, 'f');

 % The script to execute on the cluster to run the job
 wrapperPath = sprintf('%s%s%s', jobDirectoryOnCluster, fileSeparator, jobWrapperName);
@@ -143,17 +145,26 @@ function communicatingSubmitFcn(cluster, job, environmentProperties)
 commonSubmitArgs = getCommonSubmitArgs(cluster);
 additionalSubmitArgs = strtrim(sprintf('%s %s', additionalSubmitArgs, commonSubmitArgs));

-% Create a script to submit a Grid Engine job - this will be created in the job directory
-dctSchedulerMessage(5, '%s: Generating script for job.', currFilename);
-localSubmitScriptPath = tempname(localJobDirectory);
-createSubmitScript(localSubmitScriptPath, jobName, quotedLogFile, quotedWrapperPath, ...
-variables, additionalSubmitArgs);
+% Extension to use for scripts
+scriptExt = '.sh';

-% Path to the submit script as seen by the cluster
-[~, submitScriptName] = fileparts(localSubmitScriptPath);
-submitScriptPathOnCluster = sprintf('%s%s%s', jobDirectoryOnCluster, fileSeparator, submitScriptName);
+% Path to the submit script, to submit the Grid Engine job using qsub
+localSubmitScriptPath = [tempname(localJobDirectory) scriptExt];
+[~, submitScriptName, submitScriptExt] = fileparts(localSubmitScriptPath);
+submitScriptPathOnCluster = sprintf('%s%s%s%s', jobDirectoryOnCluster, fileSeparator, submitScriptName, submitScriptExt);
 quotedSubmitScriptPathOnCluster = sprintf('%s%s%s', quote, submitScriptPathOnCluster, quote);

+% Path to the environment wrapper, which will set the environment variables
+% for the job then execute the job wrapper
+localEnvScriptPath = [tempname(localJobDirectory) scriptExt];
+[~, envScriptName, envScriptExt] = fileparts(localEnvScriptPath);
+envScriptPathOnCluster = sprintf('%s%s%s%s', jobDirectoryOnCluster, fileSeparator, envScriptName, envScriptExt);
+quotedEnvScriptPathOnCluster = sprintf('%s%s%s', quote, envScriptPathOnCluster, quote);
+
+createEnvironmentWrapper(localEnvScriptPath, quotedWrapperPath, variables);
+createSubmitScript(localSubmitScriptPath, jobName, quotedLogFile, ...
+quotedEnvScriptPathOnCluster, additionalSubmitArgs);
+
 % Create the command to run on the cluster
 commandToRun = sprintf('sh %s', quotedSubmitScriptPathOnCluster);

@@ -163,13 +174,13 @@ function communicatingSubmitFcn(cluster, job, environmentProperties)
 remoteConnection.startMirrorForJob(job);
 end

-if isprop(cluster.AdditionalProperties, 'ClusterHost')
+if strcmpi(cluster.OperatingSystem, 'unix')
 % Add execute permissions to shell scripts
 runSchedulerCommand(cluster, sprintf( ...
 'chmod u+x %s%s*.sh', jobDirectoryOnCluster, fileSeparator));
 % Convert line endings to Unix
 runSchedulerCommand(cluster, sprintf( ...
-'dos2unix %s%s*.sh', jobDirectoryOnCluster, fileSeparator));
+'dos2unix --allow-chown %s%s*.sh', jobDirectoryOnCluster, fileSeparator));
 end

 % Now ask the cluster to run the submission command
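With this change, communicatingSubmitFcn.m makes the generated scripts executable and normalizes their line endings whenever the cluster OS is unix, rather than only when ClusterHost is set. The sketch below shows roughly what those two cluster-side steps do, written as a standalone POSIX shell function. The function name `prepare_job_scripts` is hypothetical, and the `tr` fallback (for machines without a `dos2unix` that accepts `--allow-chown`) is an assumption added for portability, not part of the plugin.

```shell
#!/bin/sh
# Hypothetical helper: prepare every *.sh script in a job directory,
# mirroring the "chmod u+x" and "dos2unix --allow-chown" steps above.
prepare_job_scripts() {
    jobDir="$1"
    for script in "$jobDir"/*.sh; do
        # If the glob matched nothing, the literal pattern remains; skip it
        [ -e "$script" ] || continue
        # Give the owner execute permission
        chmod u+x "$script"
        # Convert CRLF to LF; fall back to tr if dos2unix is missing or
        # rejects --allow-chown (assumption: older or minimal dos2unix)
        if ! dos2unix --allow-chown "$script" >/dev/null 2>&1; then
            tmp="$script.tmp"
            # Rewrite through cat so the original file, and its mode, is kept
            tr -d '\r' < "$script" > "$tmp" && cat "$tmp" > "$script"
            rm -f "$tmp"
        fi
    done
}
```

Rewriting the file with `cat` instead of `mv` keeps the original inode, so the execute bit set just before the conversion survives.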

createEnvironmentWrapper.m

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+function createEnvironmentWrapper(outputFilename, quotedWrapperPath, environmentVariables)
+% Create a script that sets the correct environment variables and then
+% calls the job wrapper.
+
+% Copyright 2023 The MathWorks, Inc.
+
+dctSchedulerMessage(5, '%s: Creating environment wrapper at %s', mfilename, outputFilename);
+
+% Open file in binary mode to make it cross-platform.
+fid = fopen(outputFilename, 'w');
+if fid < 0
+error('parallelexamples:GenericGridEngine:FileError', ...
+'Failed to open file %s for writing', outputFilename);
+end
+fileCloser = onCleanup(@() fclose(fid));
+
+% Specify shell to use
+fprintf(fid, '#!/bin/sh\n');
+
+formatSpec = 'export %s=''%s''\n';
+
+% Write the commands to set and export environment variables
+for ii = 1:size(environmentVariables, 1)
+fprintf(fid, formatSpec, environmentVariables{ii,1}, environmentVariables{ii,2});
+end
+
+% Write the command to run the job wrapper
+fprintf(fid, '%s\n', quotedWrapperPath);
+
+end
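The new createEnvironmentWrapper.m writes a small `/bin/sh` script that exports each job variable and then runs the job wrapper. A rough shell equivalent is sketched below; `create_environment_wrapper` and its `NAME=VALUE` argument convention are hypothetical. Unlike the MATLAB function, which writes each value between single quotes without escaping, this sketch also escapes embedded single quotes with the `'\''` idiom so that such values survive.

```shell
#!/bin/sh
# Hypothetical shell analogue of createEnvironmentWrapper.m.
# Usage: create_environment_wrapper OUTFILE QUOTED_WRAPPER NAME=VALUE...
create_environment_wrapper() {
    outputFile="$1"
    quotedWrapperPath="$2"
    shift 2
    {
        # Specify shell to use
        printf '#!/bin/sh\n'
        for pair in "$@"; do
            name=${pair%%=*}
            value=${pair#*=}
            # Escape embedded single quotes as '\'' (an addition to the original)
            escaped=$(printf '%s' "$value" | sed "s/'/'\\\\''/g")
            printf "export %s='%s'\n" "$name" "$escaped"
        done
        # Last line: run the job wrapper with the exported environment
        printf '%s\n' "$quotedWrapperPath"
    } > "$outputFile"
}
```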

createSubmitScript.m

Lines changed: 7 additions & 17 deletions
@@ -1,11 +1,10 @@
-function createSubmitScript(outputFilename, jobName, quotedLogFile, quotedWrapperPath, ...
-environmentVariables, additionalSubmitArgs, jobArrayString)
-% Create a script that sets the correct environment variables and then
-% executes the Grid Engine qsub command.
+function createSubmitScript(outputFilename, jobName, quotedLogFile, ...
+quotedWrapperPath, additionalSubmitArgs, jobArrayString)
+% Create a script that runs the Grid Engine qsub command.

-% Copyright 2010-2022 The MathWorks, Inc.
+% Copyright 2010-2023 The MathWorks, Inc.

-if nargin < 7
+if nargin < 6
 jobArrayString = [];
 end

@@ -19,20 +18,11 @@ function createSubmitScript(outputFilename, jobName, quotedLogFile, quotedWrappe
 end
 fileCloser = onCleanup(@() fclose(fid));

-% Specify Shell to use
+% Specify shell to use
 fprintf(fid, '#!/bin/sh\n');

-% Write the commands to set and export environment variables
-for ii = 1:size(environmentVariables, 1)
-fprintf(fid, 'export %s=''%s''\n', environmentVariables{ii,1}, environmentVariables{ii,2});
-end
-
-% Generate the command to run and write it.
-% We will forward all environment variables with this job in the call
-% to qsub
-variablesToForward = environmentVariables(:,1);
 commandToRun = getSubmitString(jobName, quotedLogFile, quotedWrapperPath, ...
-variablesToForward, additionalSubmitArgs, jobArrayString);
+additionalSubmitArgs, jobArrayString);
 fprintf(fid, '%s\n', commandToRun);

 end
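After this change, createSubmitScript.m no longer exports variables or forwards their names to getSubmitString: the submit script only calls qsub, and the job script it submits is the environment wrapper, which sets the variables itself when the job starts. The exact qsub arguments come from getSubmitString, which this commit does not show, so the `-N`, `-j y` and `-o` options below are standard Grid Engine options chosen as an assumption, and `create_submit_script` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: write a submit script that calls qsub on the
# environment wrapper. The flags are assumptions, not getSubmitString output.
create_submit_script() {
    outputFile="$1"
    jobName="$2"
    quotedLogFile="$3"
    quotedEnvScript="$4"
    additionalArgs="$5"
    {
        printf '#!/bin/sh\n'
        # -N names the job, -j y merges stderr into stdout, -o sets the log
        # file. The job script is the environment wrapper, so no variables
        # need to be forwarded on the qsub command line.
        printf 'qsub -N %s -j y -o %s %s%s\n' "$jobName" "$quotedLogFile" \
            "${additionalArgs:+$additionalArgs }" "$quotedEnvScript"
    } > "$outputFile"
}
```

The `${additionalArgs:+...}` expansion adds the extra arguments and a trailing space only when they are non-empty, so an empty argument list does not leave a double space in the command.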

discover/example.conf

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+# Since version R2023a, MATLAB can discover clusters running third-party
+# schedulers such as Grid Engine. The Discover Clusters functionality
+# automatically configures the Parallel Computing Toolbox to submit MATLAB
+# jobs to the cluster. To use this functionality, you must create a cluster
+# configuration file and store it at a location accessible to MATLAB users.
+#
+# This file is an example of a cluster configuration which MATLAB can
+# discover. You can copy and modify this file to make your cluster discoverable.
+#
+# For more information, including the required format for this file, see
+# the online documentation for making a cluster running a third-party
+# scheduler discoverable:
+# https://mathworks.com/help/matlab-parallel-server/configure-for-cluster-discovery.html
+
+# Copyright 2023 The MathWorks, Inc.
+
+# The name MATLAB will display for the cluster when discovered.
+Name = My Grid Engine cluster
+
+# Maximum number of MATLAB workers a single user can use in a single job.
+# This number must not exceed the number of available MATLAB Parallel
+# Server licenses.
+NumWorkers = 32
+
+# Path to the MATLAB install on the cluster for the workers to use. Note
+# the variable "$MATLAB_VERSION_STRING" returns the release number of the
+# MATLAB client that is running discovery, e.g. 2023a. If multiple versions
+# of MATLAB are installed on the cluster, this allows discovery to select
+# the correct installation path. Add a leading "R" or "r" if needed to
+# complete the MATLAB version.
+ClusterMatlabRoot = /opt/matlab/R"$MATLAB_VERSION_STRING"
+
+# Location where the MATLAB client stores job and task information.
+JobStorageLocation = /home/matlabjobs
+# If the client and cluster share a filesystem but the client is running
+# the Windows operating system and the cluster running a Linux operating
+# system, you must specify the JobStorageLocation using a structure by
+# commenting out the previous line and uncommenting the following lines.
+# The 'windows' and 'unix' fields must correspond to the same folder as
+# viewed from each of those operating systems.
+#JobStorageLocation.windows = \\organization\home\matlabjobs
+#JobStorageLocation.unix = /organization/home/matlabjobs
+
+# Folder that contains the scheduler plugin scripts that describe how
+# MATLAB interacts with the scheduler. A property can take different values
+# depending on the operating system of the client MATLAB by specifying the
+# name of the OS in parentheses.
+PluginScriptsLocation (Windows) = \\organization\matlab\pluginscripts
+PluginScriptsLocation (Unix) = /organization/matlab/pluginscripts
+
+# The operating system on the cluster. Valid values are 'unix' and 'windows'.
+OperatingSystem = unix
+
+# Specify whether client and cluster nodes share JobStorageLocation. To
+# configure MATLAB to copy job input and output files to and from the
+# cluster using SFTP, set this property to false and specify a value for
+# AdditionalProperties.RemoteJobStorageLocation below.
+HasSharedFilesystem = true
+
+# Specify whether the cluster uses online licensing.
+RequiresOnlineLicensing = false
+
+# LicenseNumber for the workers to use. Specify only if
+# RequiresOnlineLicensing is set to true.
+#LicenseNumber = 123456
+
+[AdditionalProperties]
+
+# To configure the user's machine to connect to the submission host via
+# SSH, uncomment the following line and enter the hostname of the cluster
+# machine that has the scheduler utilities to submit jobs.
+#ClusterHost = gridengine-headnode
+
+# If the user's machine and the cluster nodes do not have a shared file
+# system, MATLAB can copy job input and output files to and from the
+# cluster using SFTP. To activate this feature, set HasSharedFilesystem
+# above to false. Then uncomment the following lines and enter the location
+# on the cluster to store job files.
+#RemoteJobStorageLocation (Windows) = /home/"$USERNAME"/.matlab/generic_cluster_jobs
+#RemoteJobStorageLocation (Unix) = /home/"$USER"/.matlab/generic_cluster_jobs
+
+# Username to log in to ClusterHost with. On Linux and Mac, use the USER
+# environment variable. On Windows, use the USERNAME variable.
+Username (Unix) = "$USER"
+Username (Windows) = "$USERNAME"
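example.conf uses one `Key = Value` pair per line, `#` comments, an `[AdditionalProperties]` section, and optional `(Windows)`/`(Unix)` qualifiers that select a value by client OS. The hypothetical awk-based reader below only illustrates that layout under those assumptions; it is not how MATLAB parses the file, and it does not expand variables such as `"$USER"`.

```shell
#!/bin/sh
# Hypothetical reader for the key = value layout shown in example.conf.
# Prints "Key=Value" lines for one client OS ("Windows" or "Unix").
# This is NOT MATLAB's parser; it only illustrates the file structure.
read_cluster_conf() {
    confFile="$1"
    clientOS="$2"
    awk -v os="$clientOS" '
        # Skip comments and blank lines
        /^[ \t]*#/ || /^[ \t]*$/ { next }
        # A [Section] header prefixes the keys that follow it
        /^[ \t]*\[.*\][ \t]*$/ {
            section = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", section)
            next
        }
        {
            eq = index($0, "=")
            if (eq == 0) next
            key = substr($0, 1, eq - 1)
            value = substr($0, eq + 1)
            # Trim surrounding whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            # Keep only entries whose OS qualifier matches, then drop it
            if (match(key, /[ \t]*\([A-Za-z]+\)$/)) {
                qual = substr(key, RSTART, RLENGTH)
                gsub(/[ \t()]/, "", qual)
                if (tolower(qual) != tolower(os)) next
                key = substr(key, 1, RSTART - 1)
            }
            if (section != "") key = section "." key
            print key "=" value
        }
    ' "$confFile"
}
```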

discover/runDiscovery.sh

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+#!/bin/sh
+
+# Copyright 2023 The MathWorks, Inc.
+
+usage="$(basename "$0") matlabroot [folder] -- run third-party scheduler discovery in MATLAB R2023a onwards
+matlabroot - path to the folder where MATLAB is installed
+folder - folder to search for cluster configuration files
+(defaults to pwd)"
+
+# Print usage
+if [ -z "$1" ] || [ "$1" = "-h" ] || [ "$1" = "--help" ] ; then
+echo "$usage"
+exit 0
+fi
+
+# MATLAB executable to launch
+matlabExe="$1/bin/matlab"
+if [ ! -f "${matlabExe}" ] ; then
+echo "Could not find MATLAB executable at ${matlabExe}"
+exit 1
+fi
+
+# Folder to run discovery on. If specified, wrap in single-quotes to make a MATLAB charvec.
+discoveryFolder="$2"
+if [ ! -z "$discoveryFolder" ] ; then
+discoveryFolder="'${discoveryFolder}'"
+fi
+
+# Command to run in MATLAB
+matlabCmd="parallel.cluster.generic.discoverGenericClusters(${discoveryFolder})"
+
+# Arguments to pass to MATLAB
+matlabArgs="-nojvm -parallelserver -batch"
+
+# Build and run system command
+CMD="\"${matlabExe}\" ${matlabArgs} \"${matlabCmd}\""
+eval $CMD
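runDiscovery.sh builds a MATLAB command line and runs it with `eval`. The dry-run sketch below reproduces the same string construction but prints the command instead of running it, which makes it easy to check how the optional folder argument becomes a single-quoted MATLAB character vector. `build_discovery_command` is a hypothetical helper; the MATLAB function name and flags are copied from the script above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the command runDiscovery.sh would eval.
build_discovery_command() {
    matlabExe="$1/bin/matlab"
    discoveryFolder="$2"
    # Wrap a non-empty folder in single quotes to form a MATLAB char vector
    if [ -n "$discoveryFolder" ]; then
        discoveryFolder="'${discoveryFolder}'"
    fi
    matlabCmd="parallel.cluster.generic.discoverGenericClusters(${discoveryFolder})"
    # Same layout as CMD in runDiscovery.sh: "exe" args "command"
    printf '"%s" %s "%s"\n' "$matlabExe" "-nojvm -parallelserver -batch" "$matlabCmd"
}
```

For example, `build_discovery_command /opt/matlab/R2023a /shared/conf` prints the command that `runDiscovery.sh /opt/matlab/R2023a /shared/conf` would execute, without needing MATLAB installed at that path.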
