Restrictions on Rank-Host Mapping in MPICH2

The MPICH2 User’s Manual recommends starting a parallel task (consisting of multiple processes on one or more computers) with the command mpiexec.

mpiexec accepts a command-line parameter --machinefile, which specifies a text file. Each line of this file names a machine (host) on which one or more processes can be started. An example machine file given in the manual is:

hosta
hostb:2
hostc
hostd:4

which means: start one process each on hosta and hostc, 2 on hostb, and 4 on hostd.
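To make the host:count notation concrete, here is a minimal Python sketch that reads machine-file text into (host, count) pairs. The function name parse_machinefile is hypothetical and is not part of MPICH2; the sketch handles only the plain "host" and "host:n" forms shown above and makes no attempt to mirror how mpiexec itself parses the file.

```python
def parse_machinefile(text):
    """Return a list of (host, nprocs) pairs from machine-file text.

    Illustrative only: handles the "host" and "host:n" forms, where a
    missing count means one process on that host.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        host, sep, count = line.partition(":")
        entries.append((host, int(count) if sep else 1))
    return entries


example = """hosta
hostb:2
hostc
hostd:4
"""

print(parse_machinefile(example))
```

For the example file this prints [('hosta', 1), ('hostb', 2), ('hostc', 1), ('hostd', 4)], that is, eight processes spread unevenly over four hosts.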

However, this example machine file is unlikely to work in practice, because MPD constrains the pattern of rank-host mapping. (In MPI terminology, a rank is a zero-based process ID.) Indeed, checking the MPD code (mpd.py) in MPICH2 1.2.1p1 shows that only two patterns of rank-host mapping are currently supported:

  1. block regular, and
  2. round-robin regular.

Block regular is easy to understand: given a set of (unique) hosts, start k processes on each host. Round-robin regular is a slightly more complex pattern, whose details can be found on Wikipedia.
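The two patterns can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code taken from mpd.py: the function names are hypothetical, and "round-robin regular" is assumed here to mean that consecutive ranks cycle through the host list in order. In both functions the position in the returned list is the rank.

```python
def block_mapping(hosts, k):
    """Block regular: k consecutive ranks on each host, in host order."""
    return [host for host in hosts for _ in range(k)]


def round_robin_mapping(hosts, nprocs):
    """Round-robin (assumed meaning): rank r goes to host r mod len(hosts)."""
    return [hosts[rank % len(hosts)] for rank in range(nprocs)]


def is_block_regular(entries):
    """True if every (host, count) entry has the same count and hosts are unique."""
    hosts = [host for host, _ in entries]
    counts = {count for _, count in entries}
    return len(set(hosts)) == len(hosts) and len(counts) <= 1


hosts = ["hosta", "hostb", "hostc"]
print(block_mapping(hosts, 2))
print(round_robin_mapping(hosts, 5))

# The manual's example asks for 1, 2, 1 and 4 processes per host.
manual_example = [("hosta", 1), ("hostb", 2), ("hostc", 1), ("hostd", 4)]
print(is_block_regular(manual_example))
```

With three hosts, block_mapping(hosts, 2) places ranks 0 and 1 on hosta, 2 and 3 on hostb, and 4 and 5 on hostc, while round_robin_mapping(hosts, 5) places ranks 0 through 4 on hosta, hostb, hostc, hosta, hostb. The last call prints False: the per-host counts in the manual's example differ, so it does not fit the block regular pattern as described above.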