math-cluster4
.
We are running the mpich implementation of MPI, and all explanations
pertain to that.
My intention is to expand or modify this page during the semester in order to include other information as needed. If you find mistakes on this page or have suggestions, please contact me.
The cluster consists of four machines: pc51.math.umbc.edu, node2.math-cluster4.math.umbc.edu, node3.math-cluster4.math.umbc.edu, and node4.math-cluster4.math.umbc.edu.
I will refer to them as pc51, node2, node3, and node4, respectively, in the following.
Each machine has two 1000 MHz Intel Pentium III processors and 1 GB of memory.
The four nodes of math-cluster4 are connected by 100 Mbps Ethernet cables and a dedicated switch.
The cables and the switch are off-the-shelf commodity products; therefore, the cluster is called a Beowulf cluster.
All four computers access the central SCSI hard disk on pc51.
The only machine with a connection to the outside network is pc51.
The department uses the port number 49207 for ssh connections, so
you would log in to pc51 by
ssh -p 49207 username@pc51.math.umbc.edu
where username denotes your username on math-cluster4.
Here, 49207 with the -p flag specifies the port number.
You may want to verify that you can connect to the other machines,
like node2, from pc51 by saying ssh node2, for instance. You will need
to enter a password every time; this makes parallel computing impractical,
and you must follow the instructions below to overcome this problem.
Log in to pc51 and start by saying in your home directory
ssh-keygen -t dsa
at the Linux prompt; do not choose any passphrase, rather just hit RETURN.
Then copy .ssh/id_dsa.pub to .ssh/authorized_keys2 by
cp .ssh/id_dsa.pub .ssh/authorized_keys2
Check that you can now log in to the other machines from pc51
just by saying ssh node2, without being prompted for a password.
Finally, set up one environment variable and add the MPI directory
to the path, as follows. Add the following two lines to your .cshrc file:
setenv MPIHOME /usr/local/mpich
set path = ( $path $MPIHOME/bin )
Do a source .cshrc to activate the changes in .cshrc in your current shell.
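The two .cshrc lines above use csh syntax. If your login shell happens to be sh or bash instead (an assumption; this page itself only covers csh), a rough equivalent for a file such as .profile might look like the following sketch. The final check is only a convenience and not part of the required set-up.

```shell
# Assumption: sh or bash is your login shell; csh users keep the .cshrc lines.
MPIHOME=/usr/local/mpich
export MPIHOME

# Append the MPI scripts (mpicc, mpirun) to the search path.
PATH="$PATH:$MPIHOME/bin"
export PATH

# Optional check: print where mpicc was found, or a note if it was not.
command -v mpicc || echo "mpicc not found on PATH"
```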
To be precise, you will have to enter your password once to enable
the automatic authentication; to this end, you may want to
ssh to all machines
(node2, node3, and node4)
manually at this time.
From then on, you should be able to log in to these other machines
as well as run your parallel code without entering your password.
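Once you have logged in to each node manually, the check that no password is requested any more can also be scripted. The following is a minimal sketch in POSIX shell, not something installed on the cluster: the function name check_nodes and the variable SSH_CMD are hypothetical, and it assumes that your ssh accepts the OpenSSH option BatchMode, which makes ssh fail instead of prompting for a password.

```shell
#!/bin/sh
# Hypothetical helper: report whether each node accepts key-based login.
# SSH_CMD is an illustrative override so that the loop can be tried out
# without contacting any machine.
SSH_CMD=${SSH_CMD:-ssh}

check_nodes() {
    for node in "$@"; do
        # BatchMode=yes: fail rather than ask for a password.
        # The remote command "true" does nothing and exits successfully.
        if $SSH_CMD -o BatchMode=yes "$node" true </dev/null >/dev/null 2>&1; then
            echo "$node: ok"
        else
            echo "$node: password still required or node unreachable"
        fi
    done
}

# Run the check only if node names were given on the command line.
if [ $# -gt 0 ]; then
    check_nodes "$@"
fi
```

Saved under a name of your choice, say check_nodes.sh, it would be run on pc51 as sh check_nodes.sh node2 node3 node4.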
Suppose that you have a C file sample.c that contains
some MPI commands. Compile and link the MPI code by
mpicc -o sample sample.c
The script mpicc (located in directory $MPIHOME/bin)
works on the surface just like the regular compiler gcc.
For instance, the option -o sample chooses the name of the
output file (here the executable file), and mpicc compiles
and links in one (apparent) step.
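If you do not have an MPI code at hand yet, the following sketch writes a small one. It is only an illustration and not the sample.c used in class: the function name write_sample is hypothetical, and the C program is the customary "hello" test that uses only the standard calls MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Get_processor_name, and MPI_Finalize.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal MPI test program to the given file.
write_sample() {
    # The quoted 'EOF' keeps the shell from expanding anything in the C code.
    cat > "$1" <<'EOF'
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                 /* start up MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* id of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */
    MPI_Get_processor_name(name, &len);     /* machine we run on */
    printf("Hello from process %d of %d on %s\n", rank, size, name);
    MPI_Finalize();                         /* shut down MPI */
    return 0;
}
EOF
}

# Write the file only if a file name was given on the command line.
if [ $# -gt 0 ]; then
    write_sample "$1"
fi
```

Saved as, say, write_sample.sh, it would be used as sh write_sample.sh sample.c, followed by the mpicc command above.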
If your code includes mathematical functions (like exp, cos, etc.),
you need to link to the mathematics library libm.so.
This is done, just like for serial compiling,
by adding -lm to the end of your compile command, that is,
mpicc -o sample sample.c -lm
In a similar fashion, other libraries can be linked.
More formally, you can also separate the compile step from the link step.
That is, the C file sample.c is first compiled into object code sample.o,
which then gets linked to the required libraries
to obtain the executable sample.
The sequence of the two commands
mpicc -c sample.c
mpicc -o sample sample.o -lm
accomplishes this, where the option -c stands for "compile-only".
See the man page of mpicc for more information by saying
man mpicc
To run the executable on 4 processors, say
mpirun -np 4 sample
Again, mpirun is a script located in $MPIHOME/bin.
The last argument sample is the name of the executable file,
and the option -np 4 gives the number of processors that you
are requesting for this MPI run.
The processors on which your code is run are taken from the default machinefile $MPIHOME/share/machines.LINUX. You can inspect its contents by saying
more $MPIHOME/share/machines.LINUX
If you wish to control more precisely which machines your code is run
on, you should create your own machinefile called, for instance,
machines.sample. If this file is placed in
your current directory along with the executable sample,
you use it by saying
mpirun -machinefile machines.sample -np 4 sample
Using -np 4 assumes of course that your machinefile contains a list of at least 4 processors.
Notice that it clearly does not make sense to use more than 8 processors on our system, since it consists only of four dual-processor PCs. See the man page of mpirun for more information by saying
man mpirun
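A custom machinefile can be typed by hand, but it can also be generated. The following sketch assumes that the machinefile lists one machine name per line, as the default file does, and that a dual-processor node may simply be listed twice; please compare with $MPIHOME/share/machines.LINUX before relying on this. The function name make_machinefile is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a machinefile that lists every given node
# a fixed number of times (assumed format: one machine name per line).
# Usage: make_machinefile processes_per_node node [node ...]
make_machinefile() {
    per_node=$1
    shift
    for node in "$@"; do
        i=0
        while [ "$i" -lt "$per_node" ]; do
            echo "$node"
            i=$((i + 1))
        done
    done
}

# Print a machinefile only if arguments were given on the command line.
if [ $# -gt 1 ]; then
    make_machinefile "$@"
fi
```

Saved as, say, make_machinefile.sh, the call sh make_machinefile.sh 2 node2 node3 > machines.sample produces a file with four entries, which matches the mpirun -machinefile machines.sample -np 4 sample command above.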
Note that mpirun does not check whether the
processors given in the machinefile (default or custom) are available
for use. That is, if someone else is already running a job on one or
more processors, it will still run your job, as requested. This has
two main disadvantages: (1) You are stepping on someone else and impeding
his/her running of code, and (2) your code will run slower than necessary
if another processor is available.
While the second issue above should be reason enough to be more careful, the first issue is more important: We rely on all users to be courteous to other users by not stepping on them and impeding their work. There is no automatic scheduler set up on our system. Rather, scheduling is on a first-come, first-served basis. That means that it is your obligation before starting your job to make sure that all desired processors are available!
This usually means that you need to manually ssh to all desired machines and use the command top to assess the system status. Only then is it okay to start your parallel job. If you do not adhere to these common-sense rules of courtesy to other users, the department reserves the right to suspend or revoke your user privileges on this system without notice!
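As a first quick look before the manual check with top, the load of several nodes can be collected in one go. The following sketch is not a replacement for looking at top yourself: it uses uptime instead of top because its one-line output is easier to parse, it assumes the usual "load average: ..." output format of Linux, and the threshold 0.5 as well as the names node_load, report_loads, and SSH_CMD are arbitrary choices made for this illustration.

```shell
#!/bin/sh
# SSH_CMD is an illustrative override so the functions can be tried out
# without contacting any machine.
SSH_CMD=${SSH_CMD:-ssh}

# Print the 1-minute load average that uptime reports on one node.
node_load() {
    $SSH_CMD "$1" uptime 2>/dev/null |
        sed -n 's/.*load average[s]*: *\([0-9.]*\).*/\1/p'
}

# Print one line per node: idle, busy, or no answer.
report_loads() {
    for node in "$@"; do
        load=$(node_load "$node" </dev/null)
        if [ -z "$load" ]; then
            echo "$node: no answer"
        elif awk -v l="$load" 'BEGIN { exit !(l < 0.5) }'; then
            # Assumed threshold: below 0.5 counts as idle.
            echo "$node: load $load, looks idle"
        else
            echo "$node: load $load, looks busy"
        fi
    done
}

# Query nodes only if node names were given on the command line.
if [ $# -gt 0 ]; then
    report_loads "$@"
fi
```

Saved as, say, report_loads.sh, it would be run on pc51 as sh report_loads.sh node2 node3 node4. A node reported as busy should not be used; a node reported as idle still deserves a look with top.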
pc51 only allows secure connections from the outside. Hence, the above
instructions referred to ssh. Correspondingly, to transfer files in and
out of the machine, you must use scp or similar secure software.
You will need to do this to print out code written on the cluster,
for instance.
Let me explain the use of scp by the following example:
The user username has a file hello.c in directory math627/hw1.
To copy the file to the current directory on another Unix/Linux system
with scp installed, e.g., other machines in the departmental computer lab, use
scp -P 49207 username@pc51.math.umbc.edu:math627/hw1/hello.c .
The period "." at the end of the above sample command is significant; it signifies that you want the file copied to your current directory (without changing the name of the file). Here, 49207 is the port number, which is given by the -P option of scp; notice carefully that this is different from the -p for ssh. (As a side note, -p for scp stands for "preserve" and preserves the date stamp and other settings.)
As with ssh, you can leave out the username@
if your username agrees on both machines. If issuing the command
from within the Mathematics domain, you can also abbreviate the
machine name to pc51.
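Since files need to travel in both directions, the scp command above and its counterpart for copying a file to the cluster can be wrapped in two small functions. This is only a sketch: the function names fetch_file and push_file and the variable SCP_CMD are hypothetical, while the port number and machine name are the ones given above.

```shell
#!/bin/sh
# SCP_CMD is an illustrative override so the functions can be tried out
# without contacting any machine.
SCP_CMD=${SCP_CMD:-scp}
PORT=49207
HOST=pc51.math.umbc.edu

# fetch_file username remote_path
# Copy a file from the cluster into the current directory.
fetch_file() {
    $SCP_CMD -P "$PORT" "$1@$HOST:$2" .
}

# push_file username local_file remote_directory
# Copy a local file into a directory on the cluster.
push_file() {
    $SCP_CMD -P "$PORT" "$2" "$1@$HOST:$3"
}
```

After these definitions, fetch_file username math627/hw1/hello.c runs exactly the sample command above, and push_file username hello.c math627/hw1/ copies the file in the opposite direction.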