math-cluster4. We are running the mpich implementation of MPI, and all explanations pertain to that.
My intention is to expand or modify this page during the semester in order to include other information as needed. If you find mistakes on this page or have suggestions, please contact me.
node4.math-cluster4.math.umbc.edu. I will refer to them as
node4, respectively, in the following.
Each machine has two 1000 MHz Intel Pentium III processors and 1 GB of memory.
The four nodes of
math-cluster4 are connected
by 100 Mbps ethernet cables and a dedicated switch;
therefore, the cluster is called a Beowulf cluster.
The cables and the switch are off-the-shelf commodity products.
All four computers access the central SCSI harddisk on
The only machine with a connection to the outside network is
The department uses the port number 49207 for ssh connections, so
you would log in to
ssh -p 49207 email@example.com
where username denotes your username on
math-cluster4. Here, 49207 with the -p flag specifies the port number. You may want to check that you can connect to the other machines from
pc51 using ssh node2, for instance. You will need to enter a password every time; this makes parallel computing impractical, so you must follow the instructions below to overcome this problem.
pc51 and start by saying in your home directory
ssh-keygen -t dsa
at the Linux prompt; do not choose any passphrase, rather just hit RETURN. Then copy
cp .ssh/id_dsa.pub .ssh/authorized_keys2
Try to see that you can now log onto the other machines from
pc51 just by saying ssh node2 without being prompted for a password.
Finally, set up one environment variable and add the MPI directory
to the path, as follows. Add the following two lines to your
setenv MPIHOME /usr/local/mpich
set path = ( $path $MPIHOME/bin )
Do a source .cshrc to activate the changes in
.cshrc in your current shell.
To be precise, you will have to enter your password once to enable the automatic authentication; to this end, you may want to ssh to all machines (node2, node3, and node4) manually at this time. From then on, you should be able to log in to these other machines as well as run your parallel code without entering your password.
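The manual logins to all machines can be scripted. The following is a minimal sketch in Bourne shell (run it with sh, independently of your csh login shell). The function name visit_nodes is hypothetical and not part of the cluster setup; the node names are passed as arguments.

```sh
#!/bin/sh
# Hypothetical helper (not part of the cluster setup):
# log in to each named node once and print its hostname.
# The first login to a node may still prompt you, as described above.
visit_nodes() {
    for node in "$@"; do
        echo "== $node =="
        # Run the harmless command "hostname" on the remote node.
        ssh "$node" hostname || echo "could not reach $node" >&2
    done
}
```

Calling visit_nodes node2 node3 node4 from pc51 contacts each of the three nodes in turn.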
sample.c that contains some MPI commands. Compile and link the MPI code by
mpicc -o sample sample.c
The script mpicc (located in directory
$MPIHOME/bin) works on the surface just like the regular compiler gcc. For instance, the option -o sample chooses the name of the output file (here the executable file), and mpicc compiles and links in one (apparent) step.
If your code includes mathematical functions (like
cos, etc.), you need to link to the mathematics library
libm.so. This is done, just like for serial compiling,
by adding -lm to the end of your compile command, that is,
mpicc -o sample sample.c -lm
In a similar fashion, other libraries can be linked.
More formally, I could also separate the compile from the link step.
That is, the C file
sample.c is first compiled into object code
sample.o, which then gets linked to the required libraries
to obtain the executable
The sequence of the two commands
mpicc -c sample.c
mpicc -o sample sample.o -lm
accomplishes this, where the option
-c stands for "compile-only".
See the man page of mpicc for more information by saying
mpirun -np 4 sample
Again, mpirun is a script located in
$MPIHOME/bin. The last argument sample is the name of the executable file, and the option -np 4 gives the number of processors that you are requesting for this MPI run.
The processors on which your code is run are taken from the default machinefile given by $MPIHOME/share/machines.LINUX. You can inspect its contents by saying
If you wish to control more precisely which machines your code is run
on, you should create your own machinefile called, for instance,
machines.sample. If this file is placed in
your current directory along with the executable
you use it by saying
mpirun -machinefile machines.sample -np 4 sample
Using -np 4 assumes of course that your machinefile contains a list of at least four processors.
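A custom machinefile such as machines.sample can be written by hand or generated. The sketch below, in Bourne shell, assumes the machinefile format is one host name per line; the default machinefile shows the convention actually used on this system, including how dual-processor machines are listed. The function name write_machinefile is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: write a machinefile with one host name per line.
# Assumption: mpirun accepts one host name per line in the machinefile.
# Usage: write_machinefile FILE HOST...
write_machinefile() {
    file=$1
    shift
    : > "$file"                  # create the file, or empty an existing one
    for host in "$@"; do
        echo "$host" >> "$file"  # append one host name per line
    done
}
```

For instance, write_machinefile machines.sample node2 node3 node4 produces a three-line file in the current directory, which can then be passed to mpirun with the -machinefile option.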
Notice that it clearly does not make sense to use more than 8 processors on our system, since it consists only of four dual PCs. See the man page of mpirun for more information by saying
mpirun does not check whether the processors given in the machinefile (default or custom) are available for use. That is, if someone else is already running a job on one or more processors, it will still run your job, as requested. This has two main disadvantages: (1) You are stepping on someone else and impeding his/her running of code, and (2) your code will run slower than necessary if another processor is available.
While the second issue above should be reason enough to be more careful, the first issue is more important: We rely on all users to be courteous to other users by not stepping on them and impeding their work. There is no automatic scheduler set up on our system. Rather, scheduling is on a first-come first-serve basis. That means that it is your obligation before starting your job to make sure that all desired processors are available!
This means usually that you need to manually ssh to all desired machines and use the command top to assess the system status. Only then is it okay to start your parallel job. If you do not adhere to these common-sense rules of courtesy to other users, the department reserves the right to suspend or revoke your user privileges on this system without notice!
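A quick first look at all machines can be sketched as a small Bourne shell function. This is only an illustration: the name show_load is hypothetical, and it uses uptime (which prints the load averages on one line) as a stand-in for an interactive top session. It assumes the password-less ssh setup described earlier.

```sh
#!/bin/sh
# Hypothetical helper: print one line of load information per node.
# uptime is a stand-in for top here; it reports the load averages.
show_load() {
    for node in "$@"; do
        printf '%s: ' "$node"
        # Run uptime remotely; report the node as unreachable on failure.
        ssh "$node" uptime || echo "unreachable"
    done
}
```

Calling show_load node2 node3 node4 prints one line per node. Load averages near zero suggest idle processors, but top on the node itself remains the more detailed check.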
pc51 only allows secure connections from the outside. Hence, the above instructions referred to ssh. Correspondingly, to transfer files in and out of the machine, you must use scp or similar secure software. You will need to do this to print out code written on the cluster, for instance.
Let me explain the use of scp by the following example:
username has a file
math627/hw1. To copy the file to the current
directory on another Unix/Linux system with scp installed,
e.g., other machines in the departmental computer lab, use
scp -P 49207 firstname.lastname@example.org:math627/hw1/hello.c .
The period "." at the end of the above sample command is significant; it signifies that you want the file copied to your current directory (without changing the name of the file). Here, 49207 is the port number, which is given by the -P option of scp; notice carefully that this is different from the -p for ssh. (As a side note, -p for scp stands for "preserve" and preserves the date stamp and other settings.)
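If you copy files often, the port option is easy to forget. As a minimal sketch in Bourne shell, the command above can be wrapped in a function; the name fetch_file is hypothetical, and the login (username and machine) and the remote path are passed as arguments rather than fixed in the code.

```sh
#!/bin/sh
# Hypothetical helper: copy one remote file into the current directory.
# Usage: fetch_file USER@HOST REMOTE_PATH
fetch_file() {
    # -P (capital) sets the port for scp; the final "." is the destination.
    scp -P 49207 "$1:$2" .
}
```

The call takes the same login and path as the scp command above, for example math627/hw1/hello.c as the second argument.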
As with ssh, you can leave out the username@,
if your username agrees on both machines. If issuing the command
from within the Mathematics domain, you can also abbreviate the
machine name to