The following assumes that you are using MPICH on a cluster of machines running some variant of UNIX, and that you have access to some or all of the nodes via the mpirun command. (Modifications for LAM and for using schedulers are to be written later.) It is also assumed that you are executing the commands to compile, copy, and run the code from a command line in a terminal window.
Typically, running an MPI program will consist of three steps: compile, copy, and run.

Step 1: Compile

Assuming you have code to compile (if you have binary executables only, proceed to step 2, copy), you need to create an executable. This involves compiling your code with the appropriate compiler, linked against the MPI libraries. It is possible to pass all of the necessary options through a standard cc or f77 command, but MPICH provides "wrapper" compilers (mpicc for cc/gcc, mpiCC for c++/g++ on UNIX/Linux, mpicxx for c++/g++ on BSD/Mac OS X, and mpif77 for f77) that link against the MPI libraries and set the appropriate include and library paths for you.
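If you are curious what a wrapper actually does, MPICH's wrapper compilers typically accept a -show option, which prints the underlying compiler command (with its include and library paths) without running it:

mpicc -show hello.c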
Example: hello.c (use your preferred text editor to create the file
hello.c)
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    char name[MPI_MAX_PROCESSOR_NAME]; // buffer large enough for any name
    int length;

    MPI_Init(&argc, &argv); // note that argc and argv are passed by address
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &length);
    printf("Hello MPI: processor %d of %d on %s\n", rank, size, name);
    MPI_Finalize();
    return 0;
}
After saving the above example file, you can compile the program using
the mpicc command.
mpicc -o hello hello.c
The "-o" option provides an output file name, otherwise your executable
would be saved as "a.out". Be careful to make sure you provide an
executable name if you use the "-o" option. Many programmers have
deleted part of their source code by accidentally giving their source
code as their output file name.
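For example, with the file names reversed, a command like the following tells the compiler to write its output over your source file. Depending on your compiler, hello.c may be truncated or overwritten even if the command itself fails:

mpicc -o hello.c hello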
If you have typed the file correctly, you should successfully compile the code, and an "ls" command should show that you have created the file "hello":
[chilledmonkey:~/Desktop/HELLO_MPI] djoiner% ls
hello.c
[chilledmonkey:~/Desktop/HELLO_MPI] djoiner% mpicc -o hello hello.c
[chilledmonkey:~/Desktop/HELLO_MPI] djoiner% ls
hello    hello.c
[chilledmonkey:~/Desktop/HELLO_MPI] djoiner%
Step 2: Copy

In order for your program to run on each node, the executable must exist on each node. There are as many ways to make sure that your executable exists on all of the nodes as there are ways to put the cluster together in the first place. I will cover one method here, appropriate for use with the BCCD (Bootable Cluster CD).
This method assumes that an account (bccd) exists on all machines with the same home directory (/home/bccd), that authentication is being done via ssh, and that public keys have been shared for the account to allow login and remote execution without a password.

(If you are using the BCCD and have not set up your nodes for remote access, make sure you have logged in to each machine as bccd, started the heartbeat program, and included the appropriate public keys in each machine's authorized_keys file. If there are other BCCD nodes on your network, check on each node that you have all of the public keys from the other nodes [ls /tmp/bccd] and add those public keys to each node's authorized_keys file [cat /tmp/bccd/*/id_rsa.pub >> /home/bccd/.ssh/authorized_keys].)
One command that can be used to copy files between machines is "rsync". rsync is a UNIX command that synchronizes files between remote machines, and in its simplest use acts as a secure remote copy. It takes arguments similar to those of the UNIX "cp" command. If you have saved your example in a directory HELLO (i.e., the file is saved as /home/bccd/HELLO/hello.c), the following command will copy the entire HELLO directory to a remote node:
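rsync -arv /home/bccd/HELLO bccd@node_hostname:/home/bccd/

(Replace node_hostname with the name of the remote node.)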
The format for rsync is "rsync arguments source destination". Here, the arguments are -arv (archive, recursive, verbose), the source is /home/bccd/HELLO, and the destination is in username@host:location format, which in this case is bccd@node_hostname:/home/bccd/. This will need to be done for each host.
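If your node hostnames are listed one per line in a file, a short shell loop can perform the copy to every host. This is a sketch that assumes such a file exists at /home/bccd/machines (the machines file described in step 3 below):

for host in $(cat /home/bccd/machines); do
    rsync -arv /home/bccd/HELLO bccd@$host:/home/bccd/
done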
If you are not sure whether your copy worked properly, ssh into each host and check that the files are there using the "ls" command.
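For example, a command such as the following (with node_hostname replaced by each node's name) will list the copied directory without an interactive login:

ssh bccd@node_hostname ls /home/bccd/HELLO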
Step 3: Run

Once you have compiled the code and copied it to all of the nodes, you can run the code using the mpirun command. Two of the more common arguments to mpirun are "-np", which lets you specify how many processes to use, and "-machinefile", which lets you specify exactly which nodes are available for use.
On many machines, the -machinefile argument is optional and defaults to the file mpich_install_dir/util/machines/machines.ARCH or something similar, where ARCH stands for that machine's architecture (e.g. FreeBSD, LINUX, etc.). For BCCD users, since the mpich directory is on the CD, this file cannot be modified, and you should provide your own machines file. (A simple way of creating a machines file from available BCCD nodes is to check which nodes have broadcast their public keys [ls /tmp/bccd > /home/bccd/machines].)
For this example, we will assume that a user bccd with a home directory
/home/bccd has a machines file /home/bccd/machines.
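A machines file is simply a plain text file listing one hostname per line. For example, a machines file for four nodes (with hypothetical hostnames) might look like:

node000
node001
node002
node003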
Change to the directory where your executable is located, and run your hello program using 4 processes:
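Assuming the executable hello is in /home/bccd/HELLO and the machines file is /home/bccd/machines, the commands would look like:

cd /home/bccd/HELLO
mpirun -np 4 -machinefile /home/bccd/machines hello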