MPI Example
The tutorial below shows you how to run Wes Kendall's basic "hello world" program, written in C, using the Message Passing Interface (MPI) to scale across our HPC compute nodes [1]. The job will be submitted to the HPC through SLURM (Simple Linux Utility for Resource Management), our batch scheduling system.
Additional examples can be found in the C++, Fortran, and Python sections.
Note: Do not execute jobs on the login nodes; only use the login nodes to access your compute nodes. Processor-intensive, memory-intensive, or otherwise disruptive processes running on login nodes will be killed without warning.
Step 1: Access Your Allocation
1. Open a Bash terminal.
2. Execute `ssh username@hpc.cofc.edu`.
3. When prompted, enter your password.

Once you have connected to the login node, you can proceed to Step 2 and begin assembling your SLURM submission script.
Step 2: Create a SLURM Script
Below is the SLURM script we are using to run an MPI "hello world" program as a batch job. SLURM scripts use directives to specify things like the number of nodes and cores used to execute your job, the estimated walltime, and which compute resources to use (e.g., GPU vs. CPU). The sections below present an example SLURM script for our HPC resources and show you how to create, save, and submit your own SLURM script.
Consult the official SLURM documentation and FAQ for a complete list of options and common questions.
Example SLURM Script
Here is an example SLURM script for running a batch job on our HPC. Please save it to a file named `mpi-test.slurm`; we break down each command in the section below.
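What follows is a minimal sketch of such a script, assuming a partition named `stdmemq`, the `mpich/3.3` module used later in this tutorial, and placeholder job, file, and email names; adjust these values for your own allocation.

```bash
#!/bin/bash
#SBATCH -p stdmemq                    # partition (queue) to submit to -- assumed name
#SBATCH -J MPItest                    # job name
#SBATCH -o MPItest-%j.out             # standard output file (%j expands to the job number)
#SBATCH -e MPItest-%j.err             # standard error file
#SBATCH --mail-type=ALL               # email on job begin, end, and failure
#SBATCH --mail-user=username@cofc.edu # placeholder email address
#SBATCH -N 1                          # number of nodes
#SBATCH -n 20                         # number of MPI tasks (one output line per task)
#SBATCH --mem-per-cpu=4GB             # memory per core
#SBATCH -t 0-00:10:00                 # walltime limit in D-HH:MM:SS

module load mpich/3.3                 # load the MPI library used to build the binary

mpirun ./hello_world_c                # launch the MPI program on the allocated cores
```

The `%j` placeholder tells SLURM to substitute the job number into the file names, which produces the `MPItest-<jobnumber>.out` and `MPItest-<jobnumber>.err` files described in Step 4.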
SLURM Script Breakdown
You can always type `man sbatch` to see all the SLURM batch submission options. Below is an explanation of the options used above.
| Option | Description |
| --- | --- |
| `-p, --partition` | Submit the job to the specified partition (queue). |
| `-J, --job-name` | Name the job so you can identify it in the queue. |
| `-o, --output` | Write the job's standard output to the named file. |
| `-e, --error` | Write the job's standard error messages to the named file. |
| `--mail-type` | Notify the user by email when certain event types occur (e.g., BEGIN, END, FAIL, ALL). |
| `--mail-user` | Email address that receives those notifications. |
| `-N, --nodes` | Request that this many nodes be allocated to the job. |
| `-n, --ntasks` | Request that this many tasks (MPI processes) be launched. |
| `--mem` | Specify the real memory required per node, including the unit (e.g., 8GB). |
| `--mem-per-cpu` | Specify the memory required per core; 4GB is a reasonable number. |
| `-t, --time` | Maximum run time for your job in the format `D-HH:MM:SS`. |
Step 3: Compile the C Program from Source
Below is Wes Kendall's simple "hello world" C program that utilizes MPI to run the job in parallel [1]. We will need to compile this source code on one of the compute nodes.
MPI Hello World Source Code
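The program below follows the hello world example from the MPI Tutorial site cited in [1]: each MPI process reports its processor name, its rank, and the total number of ranks.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the total number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of this process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor this rank is running on
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Print a hello world message from this rank
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment
    MPI_Finalize();
    return 0;
}
```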
C Procedure
When creating and editing your `hello_world.c` source code, we will be working on the login node using a text editor such as Vi, Emacs, or Nano.

1. Create a file named `hello_world.c` and paste the contents of the above code into it.
2. Load the compiler and MPI library. Enter `module list` to see which modules are loaded. If MPICH is not loaded, swap the current MPI library with MPICH to proceed. Please note that `GNU8` and `OpenMPI3` are the defaults on our cluster; this exercise uses a different flavor of MPI called MPICH, so search for the available MPICH module.
3. Try loading the suggested MPICH module, namely `mpich/3.3`. As noted above, you can only have one MPI library in your path at a time, so you need to swap the default `openmpi3` library with `mpich`.
4. Compile the C source into a binary executable file.
5. Use `ls -al` to verify the presence of the `hello_world_c` binary in your working directory.

A sketch of these commands is shown below.
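The following is a minimal sketch of that terminal session; the module names match those mentioned above, but the exact module versions and swap command may differ on your cluster.

```bash
# See which modules are currently loaded
module list

# Search for an available MPICH module
module avail mpich

# Swap the default OpenMPI library for MPICH (assumed module names)
module swap openmpi3 mpich/3.3

# Compile the C source into a binary named hello_world_c
mpicc hello_world.c -o hello_world_c

# Confirm the binary is present in the working directory
ls -al
```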
With the C code compiled into a binary (`hello_world_c`), we can now schedule and run the job on our compute nodes.
Step 4: Run the Job
1. Before proceeding, please check that you are in the same path/directory as your SLURM script and C binary. Use `ls -al` to confirm their presence.
2. Use `sbatch` to schedule your batch job in the queue. This command will automatically queue your job using SLURM and produce a job number. You can check the status of your job at any time with the `squeue` command, and you can stop it at any time with the `scancel` command.
3. View your results. Once your job completes, SLURM will produce two output/data files. Unless otherwise specified in the SLURM script, these files are placed in the same path as your binary. One file (`MPItest-<jobnumber>.out`) contains the results of the binary you just executed, and the other (`MPItest-<jobnumber>.err`) contains any errors that occurred during execution. Please replace `<jobnumber>` with your job number. You can view the contents of these files using the `more` command followed by the file name.

A sketch of this workflow is shown below. Your output should look something like the sample at the end, with one line per processor core (20 in this case).
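Here is an illustrative sketch of those commands; replace `username` with your account and `<jobnumber>` with the number SLURM reports when you submit.

```bash
# Submit the batch job; SLURM replies with "Submitted batch job <jobnumber>"
sbatch mpi-test.slurm

# Check the status of your jobs in the queue
squeue -u username

# Cancel the job if necessary
scancel <jobnumber>

# View the program output once the job completes
more MPItest-<jobnumber>.out
```

A representative sample of the output, assuming 20 ranks on a single node (the processor name is a placeholder, and ranks are not guaranteed to print in order):

```
Hello world from processor compute001, rank 0 out of 20 processors
Hello world from processor compute001, rank 1 out of 20 processors
...
Hello world from processor compute001, rank 19 out of 20 processors
```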
You have successfully created an MPI code and run it through a batch queue manager!
Works Cited
[1] Wes Kendall, "MPI Hello World," MPI Tutorial, accessed June 14, 2017, http://mpitutorial.com/tutorials/mpi-hello-world/.
Additional Examples