This commit was SVN r24343.
Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana
                        University Research and Technology
                        Corporation. All rights reserved.
Copyright (c) 2004-2007 The University of Tennessee and The University
                        of Tennessee Research Foundation. All rights
                        reserved.
Copyright (c) 2004-2008 High Performance Computing Center Stuttgart,
                        University of Stuttgart. All rights reserved.
Copyright (c) 2004-2007 The Regents of the University of California.
                        All rights reserved.
Copyright (c) 2006-2010 Cisco Systems, Inc. All rights reserved.
Copyright (c) 2006-2007 Voltaire, Inc. All rights reserved.
Copyright (c) 2006-2010 Oracle and/or its affiliates. All rights reserved.
Copyright (c) 2007 Myricom, Inc. All rights reserved.
Copyright (c) 2008 IBM Corporation. All rights reserved.
Copyright (c) 2010 Oak Ridge National Labs. All rights reserved.
$COPYRIGHT$

Additional copyrights may follow

$HEADER$

===========================================================================

When submitting questions and problems, be sure to include as much
extra information as possible. This web page details all the
information that we request in order to provide assistance:

     http://www.open-mpi.org/community/help/

The best way to report bugs, send comments, or ask questions is to
sign up on the user's and/or developer's mailing list (for user-level
and developer-level questions; when in doubt, send to the user's
list):

        users@open-mpi.org
        devel@open-mpi.org

Because of spam, only subscribers are allowed to post to these lists
(ensure that you subscribe with and post from exactly the same e-mail
address -- joe@example.com is considered different than
joe@mycomputer.example.com!). Visit these pages to subscribe to the
lists:

     http://www.open-mpi.org/mailman/listinfo.cgi/users
     http://www.open-mpi.org/mailman/listinfo.cgi/devel

Thanks for your time.

===========================================================================

Much, much more information is also available in the Open MPI FAQ:

    http://www.open-mpi.org/faq/

===========================================================================

Detailed Open MPI v1.5 Feature List:

  o Open MPI RunTime Environment (ORTE) improvements
    - General robustness improvements
    - Scalable job launch (we've seen ~16K processes in less than a
      minute in a highly-optimized configuration)
    - New process mappers
    - Support for Platform/LSF environments (v7.0.2 and later)
    - More flexible processing of host lists
    - new mpirun cmd line options and associated functionality

  o Fault-Tolerance Features
    - Asynchronous, transparent checkpoint/restart support
      - Fully coordinated checkpoint/restart coordination component
      - Support for the following checkpoint/restart services:
        - blcr: Berkeley Lab's Checkpoint/Restart
        - self: Application level callbacks
      - Support for the following interconnects:
        - tcp
        - mx
        - openib
        - sm
        - self
    - Improved Message Logging

  o MPI_THREAD_MULTIPLE support for point-to-point messaging in the
    following BTLs (note that only MPI point-to-point messaging API
    functions support MPI_THREAD_MULTIPLE; other API functions likely
    do not):
    - tcp
    - sm
    - mx
    - elan
    - self

  o Point-to-point Messaging Layer (PML) improvements
    - Memory footprint reduction
    - Improved latency
    - Improved algorithm for multiple communication device
      ("multi-rail") support

  o Numerous Open Fabrics improvements/enhancements
    - Added iWARP support (including RDMA CM)
    - Memory footprint and performance improvements
    - "Bucket" SRQ support for better registered memory utilization
    - XRC/ConnectX support
    - Message coalescing
    - Improved error report mechanism with Asynchronous events
    - Automatic Path Migration (APM)
    - Improved processor/port binding
    - Infrastructure for additional wireup strategies
    - mpi_leave_pinned is now enabled by default

  o uDAPL BTL enhancements
    - Multi-rail support
    - Subnet checking
    - Interface include/exclude capabilities

  o Processor affinity
    - Linux processor affinity improvements
    - Core/socket <--> process mappings

  o Collectives
    - Performance improvements
    - Support for hierarchical collectives (must be activated
      manually; see below)
    - Support for Voltaire FCA (Fabric Collective Accelerator) technology

  o Miscellaneous
    - MPI 2.1 compliant
    - Sparse process groups and communicators
    - Support for Cray Compute Node Linux (CNL)
    - One-sided RDMA component (BTL-level based rather than PML-level
      based)
    - Aggregate MCA parameter sets
    - MPI handle debugging
    - Many small improvements to the MPI C++ bindings
    - Valgrind support
    - VampirTrace support
    - Updated ROMIO to the version from MPICH2 1.0.7
    - Removed the mVAPI IB stacks
    - Display most error messages only once (vs. once for each
      process)
    - Many other small improvements and bug fixes, too numerous to
      list here

Known issues
------------

  o MPI_REDUCE_SCATTER does not work with counts of 0.
    https://svn.open-mpi.org/trac/ompi/ticket/1559

  o Please also see the Open MPI bug tracker for bugs beyond this release.
    https://svn.open-mpi.org/trac/ompi/report

===========================================================================

The following abbreviated list of release notes applies to this code
base as of this writing (5 October 2010):

General notes
-------------

- Open MPI includes support for a wide variety of supplemental
  hardware and software packages. When configuring Open MPI, you may
  need to supply additional flags to the "configure" script in order
  to tell Open MPI where the header files, libraries, and any other
  required files are located. As such, running "configure" by itself
  may not include support for all the devices (etc.) that you expect,
  especially if their support headers / libraries are installed in
  non-standard locations. Network interconnects are an easy example
  to discuss -- Myrinet and OpenFabrics networks, for example, both
  have supplemental headers and libraries that must be found before
  Open MPI can build support for them. You must specify where these
  files are with the appropriate options to configure. See the
  listing of configure command-line switches, below, for more details.

- The majority of Open MPI's documentation is here in this file, the
  included man pages, and on the web site FAQ
  (http://www.open-mpi.org/). This will eventually be supplemented
  with cohesive installation and user documentation files.

- Note that Open MPI documentation uses the word "component"
  frequently; the word "plugin" is probably more familiar to most
  users. As such, end users can probably completely substitute the
  word "plugin" wherever you see "component" in our documentation.
  For what it's worth, we use the word "component" for historical
  reasons, mainly because it is part of our acronyms and internal API
  function calls.

- The run-time systems that are currently supported are:
  - rsh / ssh
  - LoadLeveler
  - PBS Pro, Open PBS, Torque
  - Platform LSF (v7.0.2 and later)
  - SLURM
  - Cray XT-3 and XT-4
  - Sun Grid Engine (SGE) 6.1, 6.2 and open source Grid Engine
  - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008)

- Systems that have been tested are:
  - Linux (various flavors/distros), 32 bit, with gcc and Sun Studio 12
  - Linux (various flavors/distros), 64 bit (x86), with gcc, Absoft,
    Intel, Portland, Pathscale, and Sun Studio 12 compilers (*)
  - OS X (10.4), 32 and 64 bit (i386, PPC, PPC64, x86_64), with gcc
    and Absoft compilers (*)
  - Solaris 10 update 2, 3 and 4, 32 and 64 bit (SPARC, i386, x86_64),
    with Sun Studio 10, 11 and 12

  (*) Be sure to read the Compiler Notes, below.

- Other systems have been lightly (but not fully) tested:
  - Other 64 bit platforms (e.g., Linux on PPC64)
  - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008);
    see the README.WINDOWS file.

Compiler Notes
--------------

- Mixing compilers from different vendors when building Open MPI
  (e.g., using the C/C++ compiler from one vendor and the F77/F90
  compiler from a different vendor) has been successfully employed by
  some Open MPI users (discussed on the Open MPI user's mailing list),
  but such configurations are not tested and not documented. For
  example, such configurations may require additional compiler /
  linker flags to make Open MPI build properly.

- Open MPI does not support the Sparc v8 CPU target, which is the
  default on Sun Solaris. The v8plus (32 bit) or v9 (64 bit)
  targets must be used to build Open MPI on Solaris. This can be
  done by including a flag in CFLAGS, CXXFLAGS, FFLAGS, and FCFLAGS,
  -xarch=v8plus for the Sun compilers, -mcpu=v9 for GCC.

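As a rough illustration of the Solaris note above, the sketch below
prints the four flag variables for one compiler family so they can be
pasted onto a configure command line. The helper name
sparc_arch_flags and the family labels "sun" and "gnu" are made up
for this example; only the two flags themselves come from the note.

```shell
# Print CFLAGS/CXXFLAGS/FFLAGS/FCFLAGS settings for a compiler family.
# "sun" and "gnu" are hypothetical labels chosen for this sketch.
sparc_arch_flags() {
    case "$1" in
        sun) flag="-xarch=v8plus" ;;   # Sun compilers, 32 bit target
        gnu) flag="-mcpu=v9" ;;        # GCC, v9 target
        *)   echo "unknown compiler family: $1" >&2; return 1 ;;
    esac
    echo "CFLAGS=$flag CXXFLAGS=$flag FFLAGS=$flag FCFLAGS=$flag"
}

# Show the resulting configure command line without running it.
echo "./configure $(sparc_arch_flags sun)"
```

Passing the variables on the configure command line (rather than
exporting them) keeps the settings visible in config.log.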
- At least some versions of the Intel 8.1 compiler seg fault while
  compiling certain Open MPI source code files. As such, it is not
  supported.

- The Intel 9.0 v20051201 compiler on IA64 platforms seems to have a
  problem with optimizing the ptmalloc2 memory manager component (the
  generated code will segv). As such, the ptmalloc2 component will
  automatically disable itself if it detects that it is on this
  platform/compiler combination. The only effect that this should
  have is that the MCA parameter mpi_leave_pinned will be inoperative.

- Early versions of the Portland Group 6.0 compiler have problems
  creating the C++ MPI bindings as a shared library (e.g., v6.0-1).
  Tests with later versions show that this has been fixed (e.g.,
  v6.0-5).

- The Portland Group compilers prior to version 7.0 require the
  "-Msignextend" compiler flag to extend the sign bit when converting
  from a shorter to longer integer. This is different than other
  compilers (such as GNU). When compiling Open MPI with the Portland
  compiler suite, the following flags should be passed to Open MPI's
  configure script:

  shell$ ./configure CFLAGS=-Msignextend CXXFLAGS=-Msignextend \
         --with-wrapper-cflags=-Msignextend \
         --with-wrapper-cxxflags=-Msignextend ...

  This will both compile Open MPI with the proper compile flags and
  also automatically add "-Msignextend" when the C and C++ MPI wrapper
  compilers are used to compile user MPI applications.

- Using the MPI C++ bindings with the Pathscale compiler is known
  to fail, possibly due to Pathscale compiler issues.

- Using the Absoft compiler to build the MPI Fortran bindings on Suse
  9.3 is known to fail due to a Libtool compatibility issue.

- Open MPI will build bindings suitable for all common forms of
  Fortran 77 compiler symbol mangling on platforms that support it
  (e.g., Linux). On platforms that do not support weak symbols (e.g.,
  OS X), Open MPI will build Fortran 77 bindings just for the compiler
  that Open MPI was configured with.

  Hence, on platforms that support it, if you configure Open MPI with
  a Fortran 77 compiler that uses one symbol mangling scheme, you can
  successfully compile and link MPI Fortran 77 applications with a
  Fortran 77 compiler that uses a different symbol mangling scheme.

  NOTE: For platforms that support the multi-Fortran-compiler bindings
  (i.e., weak symbols are supported), due to limitations in the MPI
  standard and in Fortran compilers, it is not possible to hide these
  differences in all cases. Specifically, the following two cases may
  not be portable between different Fortran compilers:

  1. The C constants MPI_F_STATUS_IGNORE and MPI_F_STATUSES_IGNORE
     will only compare properly to Fortran applications that were
     created with Fortran compilers that use the same
     name-mangling scheme as the Fortran compiler that Open MPI was
     configured with.

  2. Fortran compilers may have different values for the logical
     .TRUE. constant. As such, any MPI function that uses the Fortran
     LOGICAL type may only get .TRUE. values back that correspond to
     the .TRUE. value of the Fortran compiler that Open MPI was
     configured with. Note that some Fortran compilers allow forcing
     .TRUE. to be 1 and .FALSE. to be 0. For example, the Portland
     Group compilers provide the "-Munixlogical" option, and Intel
     compilers (version 8 and later) provide the "-fpscomp logicals"
     option.

  You can use the ompi_info command to see the Fortran compiler that
  Open MPI was configured with.

- The Fortran 90 MPI bindings can now be built in one of three sizes
  using --with-mpi-f90-size=SIZE (see description below). These sizes
  reflect the number of MPI functions included in the "mpi" Fortran 90
  module and therefore which functions will be subject to strict type
  checking. All functions not included in the Fortran 90 module can
  still be invoked from F90 applications, but will fall back to
  Fortran-77 style checking (i.e., little/none).

  - trivial: Only includes F90-specific functions from MPI-2. This
    means overloaded versions of MPI_SIZEOF for all the MPI-supported
    F90 intrinsic types.

  - small (default): All the functions in "trivial" plus all MPI
    functions that take no choice buffers (meaning buffers that are
    specified by the user and are of type (void*) in the C bindings --
    generally buffers specified for message passing). Hence,
    functions like MPI_COMM_RANK are included, but functions like
    MPI_SEND are not.

  - medium: All the functions in "small" plus all MPI functions that
    take one choice buffer (e.g., MPI_SEND, MPI_RECV, ...). All
    one-choice-buffer functions have overloaded variants for each of
    the MPI-supported Fortran intrinsic types up to the number of
    dimensions specified by --with-f90-max-array-dim (default value is
    4).

  Increasing the size of the F90 module (in order: trivial, small,
  medium) will generally increase the length of time required to
  compile user MPI applications. Specifically, "trivial"- and
  "small"-sized F90 modules generally allow user MPI applications to
  be compiled fairly quickly but lose type safety for all MPI
  functions with choice buffers. "medium"-sized F90 modules generally
  take longer to compile user applications but provide greater type
  safety for MPI functions.

  Note that MPI functions with two choice buffers (e.g., MPI_GATHER)
  are not currently included in Open MPI's F90 interface. Calls to
  these functions will automatically fall through to Open MPI's F77
  interface. A "large" size that includes the two choice buffer MPI
  functions is possible in future versions of Open MPI.


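The F90 size choice described above could be wrapped in a small
helper like the following. This is only a sketch: the function name
f90_configure_args is hypothetical, and it emits
--with-f90-max-array-dim only for "medium" because that is the only
size for which the note above mentions the dimension limit.

```shell
# Print configure arguments for one of the three documented F90 sizes.
# Usage: f90_configure_args [SIZE] [MAX_DIM]
f90_configure_args() {
    size="${1:-small}"     # "small" is the documented default size
    max_dim="${2:-4}"      # 4 is the documented default dimension limit
    case "$size" in
        trivial|small)
            echo "--with-mpi-f90-size=$size" ;;
        medium)
            echo "--with-mpi-f90-size=$size --with-f90-max-array-dim=$max_dim" ;;
        *)
            echo "unsupported F90 size: $size" >&2; return 1 ;;
    esac
}

# Show the resulting configure command line without running it.
echo "./configure $(f90_configure_args medium 3)"
```

Lowering the dimension limit for "medium" is one plausible way to
trade some type checking for shorter application compile times.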
General Run-Time Support Notes
------------------------------

- The Open MPI installation must be in your PATH on all nodes (and
  potentially LD_LIBRARY_PATH, if libmpi is a shared library), unless
  using the --prefix or --enable-mpirun-prefix-by-default
  functionality (see below).

- Open MPI's run-time behavior can be customized via MCA ("MPI
  Component Architecture") parameters (see below for more information
  on how to get/set MCA parameter values). Some MCA parameters can be
  set in a way that renders Open MPI inoperable (see notes about MCA
  parameters later in this file). In particular, some parameters have
  required options that must be included.

  - If specified, the "btl" parameter must include the "self"
    component, or Open MPI will not be able to deliver messages to the
    same rank as the sender. For example: "mpirun --mca btl tcp,self
    ..."
  - If specified, the "btl_tcp_if_exclude" parameter must include the
    loopback device ("lo" on many Linux platforms), or Open MPI will
    not be able to route MPI messages using the TCP BTL. For example:
    "mpirun --mca btl_tcp_if_exclude lo,eth1 ..."

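The two required entries above ("self" in btl, the loopback device in
btl_tcp_if_exclude) can be guarded with a small list helper. The
following is a minimal sketch, not part of Open MPI: the function
name ensure_in_list is hypothetical, and it does not handle
exclusion lists written with a leading "^".

```shell
# Append a required entry to a comma-separated list unless it is
# already present.  Usage: ensure_in_list LIST REQUIRED
ensure_in_list() {
    list="$1"; required="$2"
    case ",$list," in
        *",$required,"*) echo "$list" ;;            # already present
        ",,")            echo "$required" ;;        # empty list
        *)               echo "$list,$required" ;;  # append
    esac
}

btl=$(ensure_in_list "tcp" "self")
exclude=$(ensure_in_list "eth1" "lo")

# Show the resulting mpirun command line without running it.
echo "mpirun --mca btl $btl --mca btl_tcp_if_exclude $exclude ..."
```

Wrapping the list in commas before matching avoids false positives
on names that merely contain the required string.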
- Running on nodes with different endian and/or different datatype
  sizes within a single parallel job is supported in this release.
  However, Open MPI does not resize data when datatypes differ in size
  (for example, sending a 4 byte MPI_DOUBLE and receiving an 8 byte
  MPI_DOUBLE will fail).


MPI Functionality and Features
------------------------------

- All MPI-2.1 functionality is supported.

- MPI_THREAD_MULTIPLE support is included, but is only lightly tested.
  It likely does not work for thread-intensive applications. Note
  that *only* the MPI point-to-point communication functions for the
  BTLs listed above are considered thread safe. Other support
  functions (e.g., MPI attributes) have not been certified as safe
  when simultaneously used by multiple threads.

  Note that Open MPI's thread support is in a fairly early stage; the
  above devices are likely to *work*, but the latency is likely to be
  fairly high. Specifically, efforts so far have concentrated on
  *correctness*, not *performance* (yet).

- MPI_REAL16 and MPI_COMPLEX32 are only supported on platforms where a
  portable C datatype can be found that matches the Fortran type
  REAL*16, both in size and bit representation.

- The "libompitrace" library is bundled in Open MPI and is installed
  by default (it can be disabled via the --disable-libompitrace
  flag). This library provides a simplistic tracing of select MPI
  function calls via the MPI profiling interface. Linking it in to
  your application (e.g., via -lompitrace) will automatically
  output to stderr when some MPI functions are invoked:

  $ mpicc hello_world.c -o hello_world -lompitrace
  $ mpirun -np 1 hello_world
  MPI_INIT: argc 1
  Hello, world, I am 0 of 1
  MPI_BARRIER[0]: comm MPI_COMM_WORLD
  MPI_FINALIZE[0]
  $

  Keep in mind that the output from the trace library is going to
  stderr, so it may output in a slightly different order than the
  stdout from your application.

  This library is being offered as a "proof of concept" / convenience
  from Open MPI. If there is interest, it is trivially easy to extend
  it to printf for other MPI functions. Patches and/or suggestions
  would be gratefully appreciated on the Open MPI developer's list.

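Because the trace goes to stderr, it can be captured separately
(e.g., "mpirun -np 1 hello_world 2> trace.log") and summarized
afterwards. The sketch below counts traced calls per MPI function.
It assumes trace lines start with the function name as in the sample
output above; the helper name count_trace_calls is hypothetical, and
any application stderr line that also starts with "MPI_" would be
counted too.

```shell
# Count how often each traced MPI function appears in a stderr log.
# Trace lines are assumed to look like "MPI_NAME: ..." or
# "MPI_NAME[rank]...", as in the sample output above.
count_trace_calls() {
    sed -n 's/^\(MPI_[A-Z_0-9]*\).*/\1/p' "$1" \
        | sort | uniq -c | awk '{print $2, $1}'
}

# Build a sample log from the trace lines shown above and summarize it.
log=$(mktemp)
printf '%s\n' 'MPI_INIT: argc 1' \
    'MPI_BARRIER[0]: comm MPI_COMM_WORLD' \
    'MPI_FINALIZE[0]' > "$log"
count_trace_calls "$log"
rm -f "$log"
```

Each output line holds a function name followed by its call count.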
Collectives
-----------

- The "hierarch" coll component (i.e., an implementation of MPI
  collective operations) attempts to discover network layers of
  latency in order to segregate individual "local" and "global"
  operations as part of the overall collective operation. In this
  way, network traffic can be reduced -- or possibly even minimized
  (similar to MagPIe). The current "hierarch" component only
  separates MPI processes into on- and off-node groups.

  Hierarch has had sufficient correctness testing, but has not
  received much performance tuning. As such, hierarch is not
  activated by default -- it must be enabled manually by setting its
  priority level to 100:

    mpirun --mca coll_hierarch_priority 100 ...

  We would appreciate feedback from the user community about how well
  hierarch works for your applications.

- The "fca" coll component: Voltaire Fabric Collective Accelerator (FCA)
  is a solution for offloading collective operations from the MPI process
  onto Voltaire QDR InfiniBand switch CPUs.

  See http://www.voltaire.com/Products/Application_Acceleration_Software/voltaire_fabric_collective_accelerator_fca
  for details.


Network Support
---------------

- The OpenFabrics Enterprise Distribution (OFED) software package v1.0
  will not work properly with Open MPI v1.2 (and later) due to how its
  Mellanox InfiniBand plugin driver is created. The problem is fixed
  in OFED v1.1 (and later).

- Older mVAPI-based InfiniBand drivers (Mellanox VAPI) are no longer
  supported. Please use an older version of Open MPI (1.2 series or
  earlier) if you need mVAPI support.

- The use of fork() with the openib BTL is only partially supported,
  and only on Linux kernels >= v2.6.15 with libibverbs v1.1 or later
  (first released as part of OFED v1.2), per restrictions imposed by
  the OFED network stack.

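The kernel half of the fork() requirement above can be checked with a
dotted-version comparison. This is a sketch under stated assumptions:
the helper name version_at_least is hypothetical, only the leading
numeric major.minor.patch fields are compared, and the libibverbs
version is not checked at all.

```shell
# Return success if dotted version $1 is >= dotted version $2.
# Suffixes such as "-92.el5" are dropped before comparing.
version_at_least() {
    have=$(echo "$1" | sed 's/[^0-9.].*//')
    want=$2
    old_ifs=$IFS; IFS=.
    set -- $have; h1=${1:-0}; h2=${2:-0}; h3=${3:-0}
    set -- $want; w1=${1:-0}; w2=${2:-0}; w3=${3:-0}
    IFS=$old_ifs
    [ "$h1" -gt "$w1" ] && return 0
    [ "$h1" -lt "$w1" ] && return 1
    [ "$h2" -gt "$w2" ] && return 0
    [ "$h2" -lt "$w2" ] && return 1
    [ "$h3" -ge "$w3" ]
}

# Compare the running kernel against the documented minimum.
if version_at_least "$(uname -r)" 2.6.15; then
    echo "kernel meets the 2.6.15 minimum from the fork() note"
else
    echo "kernel is older than 2.6.15"
fi
```

The check is only meaningful on Linux, since the note above applies
to Linux kernels.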
- There are three MPI network models available: "ob1", "csum", and
  "cm". "ob1" and "csum" use BTL ("Byte Transfer Layer") components
  for each supported network. "cm" uses MTL ("Matching Transport
  Layer") components for each supported network.

  - "ob1" supports a variety of networks that can be used in
    combination with each other (per OS constraints; e.g., there are
    reports that the GM and OpenFabrics kernel drivers do not operate
    well together):
    - OpenFabrics: InfiniBand and iWARP
    - Loopback (send-to-self)
    - Myrinet: GM and MX (including Open-MX)
    - Portals
    - Quadrics Elan
    - Shared memory
    - TCP
    - SCTP
    - uDAPL

  - "csum" is exactly the same as "ob1", except that it performs
    additional data integrity checks to ensure that the received data
    is intact (vs. trusting the underlying network to deliver the data
    correctly). csum supports all the same networks as ob1, but there
    is a performance penalty for the additional integrity checks.

  - "cm" supports a smaller number of networks (and they cannot be
    used together), but may provide better overall MPI
    performance:
    - Myrinet MX (including Open-MX, but not GM)
    - InfiniPath PSM
    - Portals

  Open MPI will, by default, choose to use "cm" when the InfiniPath
  PSM MTL can be used. Otherwise, "ob1" will be used and the
  corresponding BTLs will be selected. "csum" will never be selected
  by default. Users can force the use of ob1, csum, or cm if desired
  by setting the "pml" MCA parameter at run-time:

    shell$ mpirun --mca pml ob1 ...
    or
    shell$ mpirun --mca pml csum ...
    or
    shell$ mpirun --mca pml cm ...

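A wrapper script might validate the requested PML name before handing
it to mpirun. The following sketch is illustrative only: the function
name pml_args is hypothetical, and an empty argument is taken to mean
"no override", leaving the default selection described above in place.

```shell
# Print the mpirun arguments that force one of the three PMLs.
# An empty argument prints nothing, so Open MPI chooses by default.
pml_args() {
    case "$1" in
        ob1|csum|cm) echo "--mca pml $1" ;;
        "")          echo "" ;;
        *)           echo "unknown pml: $1" >&2; return 1 ;;
    esac
}

# Show the resulting mpirun command line without running it.
echo "mpirun $(pml_args csum) ..."
```

Rejecting unknown names up front gives a clearer error than letting
the job fail during component selection.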
- Myrinet MX (and Open-MX) support is shared between the 2 internal
  devices, the MTL and the BTL. The design of the BTL interface in
  Open MPI assumes that only naive one-sided communication
  capabilities are provided by the low level communication layers.
  However, modern communication layers such as Myrinet MX, InfiniPath
  PSM, or Portals, natively implement highly-optimized two-sided
  communication semantics. To leverage these capabilities, Open MPI
  provides the "cm" PML and corresponding MTL components to transfer
  messages rather than bytes. The MTL interface implements a shorter
  code path and lets the low-level network library decide which
  protocol to use (depending on issues such as message length,
  internal resources and other parameters specific to the underlying
  interconnect). However, Open MPI cannot currently use multiple MTL
  modules at once. In the case of the MX MTL, process loopback and
  on-node shared memory communications are provided by the MX library.
  Moreover, the current MX MTL does not support message pipelining,
  resulting in lower performance for non-contiguous datatypes.

  The "ob1" and "csum" PMLs and BTL components use Open MPI's internal
  on-node shared memory and process loopback devices for high
  performance. The BTL interface allows multiple devices to be used
  simultaneously. For the MX BTL it is recommended that the first
  segment (which acts as the threshold between the eager and the
  rendezvous protocols) always be at most 4KB, but there is no
  further restriction on the size of subsequent fragments.

  The MX MTL is recommended in the common case for best performance on
  10G hardware when most of the data transfers cover contiguous memory
  layouts. The MX BTL is recommended in all other cases, such as when
  using multiple interconnects at the same time (including TCP), or
  transferring non-contiguous datatypes.

- Linux "knem" support is used when the "sm" (shared memory) BTL is
  compiled with knem support (see the --with-knem configure option)
  and the knem Linux module is loaded in the running kernel. If the
  knem Linux kernel module is not loaded, the knem support is (by
  default) silently deactivated during Open MPI jobs.

  See http://runtime.bordeaux.inria.fr/knem/ for details on Knem.

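Since knem support is deactivated silently when the kernel module is
missing, it can be worth checking for the module before a run. The
sketch below looks for a "knem" entry in /proc/modules; the helper
name knem_loaded is hypothetical, and the optional file argument
exists only so the check can be exercised against a sample file.

```shell
# Return success if a module named "knem" is listed in the given
# modules file (default: /proc/modules on Linux).
knem_loaded() {
    modules_file="${1:-/proc/modules}"
    [ -r "$modules_file" ] && grep -q '^knem ' "$modules_file"
}

if knem_loaded; then
    echo "knem module is loaded"
else
    echo "knem module not loaded; the sm BTL would run without knem"
fi
```

This only reports whether the module is loaded; it does not show
whether Open MPI itself was built with --with-knem.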
Open MPI Extensions
-------------------

- Extensions framework added. See the "Open MPI API Extensions"
  section below for more information on compiling and using
  extensions.

- The following extensions are included in this version of Open MPI:

  - affinity: Provides the OMPI_Affinity_str() routine for retrieving
    a string that describes what resources a process is bound to. See
    its man page for more details.
  - cr: Provides routines to access checkpoint/restart functionality.
    See ompi/mpiext/cr/mpiext_cr_c.h for a listing of available
    functions.
  - example: A non-functional extension; its only purpose is to
    provide an example for how to create other extensions.

===========================================================================

Building Open MPI
-----------------

Open MPI uses a traditional configure script paired with "make" to
build. Typical installs can be of the pattern:

---------------------------------------------------------------------------
shell$ ./configure [...options...]
shell$ make all install
---------------------------------------------------------------------------

There are many available configure options (see "./configure --help"
for a full list); a summary of the more commonly used ones follows:

--prefix=<directory>
  Install Open MPI into the base directory named <directory>. Hence,
  Open MPI will place its executables in <directory>/bin, its header
  files in <directory>/include, its libraries in <directory>/lib, etc.

--with-elan=<directory>
  Specify the directory where the Quadrics Elan library and header
  files are located. This option is generally only necessary if the
  Elan headers and libraries are not in default compiler/linker
  search paths.

  Elan is the support library for Quadrics-based networks.

--with-elan-libdir=<directory>
  Look in directory for the Quadrics Elan libraries. By default, Open
  MPI will look in <elan directory>/lib and <elan directory>/lib64,
  which covers most cases. This option is only needed for special
  configurations.

--with-gm=<directory>
  Specify the directory where the GM libraries and header files are
  located. This option is generally only necessary if the GM headers
  and libraries are not in default compiler/linker search paths.

  GM is the support library for older Myrinet-based networks (GM has
  been obsoleted by MX).

--with-gm-libdir=<directory>
  Look in directory for the GM libraries. By default, Open MPI will
  look in <gm directory>/lib and <gm directory>/lib64, which covers
  most cases. This option is only needed for special configurations.

--with-hwloc=<location>
  Build hwloc support. If <location> is "internal", Open MPI's
  internal copy of hwloc is used. If <location> is "external", Open
  MPI will search in default locations for an hwloc installation.
  Finally, if <location> is a directory, that directory will be
  searched for a valid hwloc installation, just like other
  --with-FOO=<directory> configure options.

  hwloc is a support library that provides processor and memory
  affinity information for NUMA platforms.

--with-hwloc-libdir=<directory>
  Look in directory for the hwloc libraries. This option is only
  usable when building Open MPI against an external hwloc
  installation. Just like other --with-FOO-libdir configure options,
  this option is only needed for special configurations.

--with-knem=<directory>
  Specify the directory where the knem libraries and header files are
  located. This option is generally only necessary if the knem headers
  and libraries are not in default compiler/linker search paths.

  knem is a Linux kernel module that allows direct process-to-process
  memory copies (optionally using hardware offload), potentially
  increasing bandwidth for large messages sent between processes on
  the same server. See http://runtime.bordeaux.inria.fr/knem/ for
  details.

--with-mx=<directory>
  Specify the directory where the MX libraries and header files are
  located. This option is generally only necessary if the MX headers
  and libraries are not in default compiler/linker search paths.

  MX is the support library for Myrinet-based networks. An open
  source software package named Open-MX provides the same
  functionality on Ethernet-based clusters (Open-MX can provide
  MPI performance improvements compared to TCP messaging).

--with-mx-libdir=<directory>
  Look in directory for the MX libraries. By default, Open MPI will
  look in <mx directory>/lib and <mx directory>/lib64, which covers
  most cases. This option is only needed for special configurations.

--with-openib=<directory>
  Specify the directory where the OpenFabrics (previously known as
  OpenIB) libraries and header files are located. This option is
  generally only necessary if the OpenFabrics headers and libraries
  are not in default compiler/linker search paths.

  "OpenFabrics" refers to iWARP- and InfiniBand-based networks.

--with-openib-libdir=<directory>
  Look in directory for the OpenFabrics libraries. By default, Open
  MPI will look in <openib directory>/lib and <openib
  directory>/lib64, which covers most cases. This option is only
  needed for special configurations.

--with-portals=<directory>
|
|
Specify the directory where the Portals libraries and header files
|
|
are located. This option is generally only necessary if the Portals
|
|
headers and libraries are not in default compiler/linker search
|
|
paths.
|
|
|
|
Portals is the support library for Cray interconnects, but is also
|
|
available on other platforms (e.g., there is a Portals library
|
|
implemented over regular TCP).
|
|
|
|
--with-portals-config=<type>
|
|
Configuration to use for Portals support. The following <type>
|
|
values are possible: "utcp", "xt3", "xt3-modex" (default: utcp).
|
|
|
|
--with-portals-libs=<libs>
|
|
Additional libraries to link with for Portals support.
|
|
|
|
--with-psm=<directory>
|
|
Specify the directory where the QLogic InfiniPath PSM library and
|
|
header files are located. This option is generally only necessary
|
|
if the InfiniPath headers and libraries are not in default
|
|
compiler/linker search paths.
|
|
|
|
PSM is the support library for QLogic InfiniPath network adapters.
|
|
|
|
--with-psm-libdir=<directory>
|
|
Look in directory for the PSM libraries. By default, Open MPI will
|
|
look in <psm directory>/lib and <psm directory>/lib64, which covers
|
|
most cases. This option is only needed for special configurations.
|
|
|
|
--with-sctp=<directory>
|
|
Specify the directory where the SCTP libraries and header files are
|
|
located. This option is generally only necessary if the SCTP headers
|
|
and libraries are not in default compiler/linker search paths.
|
|
|
|
SCTP is a special network stack over Ethernet networks.
|
|
|
|
--with-sctp-libdir=<directory>
|
|
Look in directory for the SCTP libraries. By default, Open MPI will
|
|
look in <sctp directory>/lib and <sctp directory>/lib64, which covers
|
|
most cases. This option is only needed for special configurations.
|
|
|
|
--with-udapl=<directory>
|
|
Specify the directory where the UDAPL libraries and header files are
|
|
located. Note that UDAPL support is disabled by default on Linux;
|
|
the --with-udapl flag must be specified in order to enable it.
|
|
Specifying the directory argument is generally only necessary if the
|
|
UDAPL headers and libraries are not in default compiler/linker
|
|
search paths.
|
|
|
|
UDAPL is the support library for high performance networks in Sun
|
|
HPC ClusterTools and on Linux OpenFabrics networks (although the
|
|
"openib" options are preferred for Linux OpenFabrics networks, not
|
|
UDAPL).
|
|
|
|
--with-udapl-libdir=<directory>
|
|
Look in directory for the UDAPL libraries. By default, Open MPI
|
|
will look in <udapl directory>/lib and <udapl directory>/lib64,
|
|
which covers most cases. This option is only needed for special
|
|
configurations.
|
|
|
|
--with-lsf=<directory>
|
|
Specify the directory where the LSF libraries and header files are
|
|
located. This option is generally only necessary if the LSF headers
|
|
and libraries are not in default compiler/linker search paths.
|
|
|
|
LSF is a resource manager system, frequently used as a batch
|
|
scheduler in HPC systems.
|
|
|
|
NOTE: If you are using LSF version 7.0.5, you will need to add
|
|
"LIBS=-ldl" to the configure command line. For example:
|
|
|
|
./configure LIBS=-ldl --with-lsf ...
|
|
|
|
This workaround should *only* be needed for LSF 7.0.5.
|
|
|
|
--with-lsf-libdir=<directory>
|
|
Look in directory for the LSF libraries. By default, Open MPI will
|
|
look in <lsf directory>/lib and <lsf directory>/lib64, which covers
|
|
most cases. This option is only needed for special configurations.
|
|
|
|
--with-tm=<directory>
|
|
Specify the directory where the TM libraries and header files are
|
|
located. This option is generally only necessary if the TM headers
|
|
and libraries are not in default compiler/linker search paths.
|
|
|
|
TM is the support library for the Torque and PBS Pro resource
|
|
manager systems, both of which are frequently used as a batch
|
|
scheduler in HPC systems.
|
|
|
|
--with-sge
|
|
Specify to build support for the Sun Grid Engine (SGE) resource
|
|
manager. SGE support is disabled by default; this option must be
|
|
specified to build OMPI's SGE support.
|
|
|
|
The Sun Grid Engine (SGE) is a resource manager system, frequently
|
|
used as a batch scheduler in HPC systems.
|
|
|
|
--with-esmtp=<directory>
|
|
|
|
Specify the directory where the libESMTP libraries and header files are
|
|
located. This option is generally only necessary if the libESMTP
|
|
headers and libraries are not included in the default
|
|
compiler/linker search paths.
|
|
|
|
libESMTP is a support library for sending e-mail.
|
|
|
|
--with-mpi-param-check(=value)
|
|
"value" can be one of: always, never, runtime. If --with-mpi-param
|
|
is not specified, "runtime" is the default. If --with-mpi-param-check
|
|
is specified with no value, "always" is used. Using
|
|
--without-mpi-param-check is equivalent to "never".
|
|
|
|
- always: the parameters of MPI functions are always checked for
|
|
errors
|
|
- never: the parameters of MPI functions are never checked for
|
|
errors
|
|
- runtime: whether the parameters of MPI functions are checked
|
|
depends on the value of the MCA parameter mpi_param_check
|
|
(default: yes).
|
|
|
|
--with-threads=value
|
|
Since thread support is only partially tested, it is disabled by
|
|
default. To enable threading, use "--with-threads=posix". This is
|
|
most useful when combined with --enable-mpi-thread-multiple.
|
|
|
|
--enable-mpi-thread-multiple
|
|
Allows the MPI thread level MPI_THREAD_MULTIPLE. See
|
|
--with-threads; this is currently disabled by default. Enabling
|
|
this feature will automatically imply --enable-opal-multi-threads.
|
|
|
|
--with-fca=<directory>
|
|
Specify the directory where the Voltaire FCA library and
|
|
header files are located.
|
|
|
|
FCA is the support library for Voltaire QDR switches.
|
|
|
|
|
|
--enable-opal-multi-threads
|
|
Enables thread lock support in the OPAL and ORTE layers. Does
|
|
not enable MPI_THREAD_MULTIPLE - see above option for that feature.
|
|
This is currently disabled by default.
|
|
|
|
--disable-mpi-cxx
|
|
Disable building the C++ MPI bindings. Note that this does *not*
|
|
disable the C++ checks during configure; some of Open MPI's tools
|
|
are written in C++ and therefore require a C++ compiler to be built.
|
|
|
|
--disable-mpi-cxx-seek
|
|
Disable the MPI::SEEK_* constants. Due to a problem with the MPI-2
|
|
specification, these constants can conflict with system-level SEEK_*
|
|
constants. Open MPI attempts to work around this problem, but the
|
|
workaround may fail in some esoteric situations. The
|
|
--disable-mpi-cxx-seek switch disables Open MPI's workarounds (and
|
|
therefore the MPI::SEEK_* constants will be unavailable).
|
|
|
|
--disable-mpi-f77
|
|
Disable building the Fortran 77 MPI bindings.
|
|
|
|
--disable-mpi-f90
|
|
Disable building the Fortran 90 MPI bindings. Also related to the
|
|
--with-f90-max-array-dim and --with-mpi-f90-size options.
|
|
|
|
--with-mpi-f90-size=<SIZE>
|
|
Three sizes of the MPI F90 module can be built: trivial (only a
|
|
handful of MPI-2 F90-specific functions are included in the F90
|
|
module), small (trivial + all MPI functions that take no choice
|
|
buffers), and medium (small + all MPI functions that take 1 choice
|
|
buffer). This parameter is only used if the F90 bindings are
|
|
enabled.
|
|
|
|
--with-f90-max-array-dim=<DIM>
|
|
The F90 MPI bindings are strictly typed, even including the number of
|
|
dimensions for arrays for MPI choice buffer parameters. Open MPI
|
|
generates these bindings at compile time with a maximum number of
|
|
dimensions as specified by this parameter. The default value is 4.
|
|
|
|
--enable-mpi-ext(=<list>)
|
|
Enable Open MPI's non-portable API extensions. If no <list> is
|
|
specified, all of the extensions are enabled.
|
|
|
|
See "Open MPI API Extensions", below, for more details.
|
|
|
|
--enable-mpirun-prefix-by-default
|
|
This option forces the "mpirun" command to always behave as if
|
|
"--prefix $prefix" was present on the command line (where $prefix is
|
|
the value given to the --prefix option to configure). This prevents
|
|
most rsh/ssh-based users from needing to modify their shell startup
|
|
files to set the PATH and/or LD_LIBRARY_PATH for Open MPI on remote
|
|
nodes. Note, however, that such users may still desire to set PATH
|
|
-- perhaps even in their shell startup files -- so that executables
|
|
such as mpicc and mpirun can be found without needing to type long
|
|
path names. --enable-orterun-prefix-by-default is a synonym for
|
|
this option.
|
|
|
|
--disable-shared
|
|
By default, libmpi is built as a shared library, and all components
|
|
are built as dynamic shared objects (DSOs). This switch disables
|
|
this default; it is really only useful when used with
|
|
--enable-static. Specifically, this option does *not* imply
|
|
--enable-static; enabling static libraries and disabling shared
|
|
libraries are two independent options.
|
|
|
|
--enable-static
|
|
Build libmpi as a static library, and statically link in all
|
|
components. Note that this option does *not* imply
|
|
--disable-shared; enabling static libraries and disabling shared
|
|
libraries are two independent options.
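As a sketch of how the two switches combine, the following prints a
configure command line for a static-only build. The install prefix is
a placeholder chosen for this example, and the command is only echoed
so that it can be reviewed before being run from the top of the Open
MPI source tree.

```shell
# Placeholder install prefix (an assumption for this example).
prefix="$HOME/opt/openmpi-static"

# --enable-static adds the static libraries; --disable-shared removes
# the shared ones.  Neither switch implies the other, so both are given.
configure_args="--prefix=$prefix --enable-static --disable-shared"

# Print the command instead of running it.
echo "./configure $configure_args"
```

Dropping --disable-shared from the list would request both static and
shared libraries, since the two options are independent.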
|
|
|
|
--enable-sparse-groups
|
|
Enable the use of sparse groups. This can save memory
|
|
significantly, especially if you are creating large
|
|
communicators. (Disabled by default)
|
|
|
|
--enable-peruse
|
|
Enable the PERUSE MPI data analysis interface.
|
|
|
|
--enable-dlopen
|
|
Build all of Open MPI's components as standalone Dynamic Shared
|
|
Objects (DSOs) that are loaded at run-time. The opposite of this
|
|
option, --disable-dlopen, causes two things:
|
|
|
|
1. All of Open MPI's components will be built as part of Open MPI's
|
|
normal libraries (e.g., libmpi).
|
|
2. Open MPI will not attempt to open any DSOs at run-time.
|
|
|
|
Note that this option does *not* imply that OMPI's libraries will be
|
|
built as static objects (e.g., libmpi.a). It only specifies the
|
|
location of OMPI's components: standalone DSOs or folded into the
|
|
Open MPI libraries. You can control whether Open MPI's libraries
|
|
are built as static or dynamic via --enable|disable-static and
|
|
--enable|disable-shared.
|
|
|
|
--with-libltdl[=VALUE]
|
|
This option specifies where to find the GNU Libtool libltdl support
|
|
library. The following VALUEs are permitted:
|
|
|
|
internal: Use Open MPI's internal copy of libltdl.
|
|
external: Use an external libltdl installation (rely on default
|
|
compiler and linker paths to find it)
|
|
<no value>: Same as "internal".
|
|
<directory>: Specify the location of a specific libltdl
|
|
installation to use
|
|
|
|
By default (or if --with-libltdl is specified with no VALUE), Open
|
|
MPI will build and use the copy of libltdl that it has in its source
|
|
tree. However, if the VALUE is "external", Open MPI will look for
|
|
the relevant libltdl header file and library in default compiler /
|
|
linker locations. Or, VALUE can be a directory tree where the
|
|
libltdl header file and library can be found. This option allows
|
|
operating systems to include Open MPI and use their default libltdl
|
|
installation instead of Open MPI's bundled libltdl.
|
|
|
|
Note that this option is ignored if --disable-dlopen is specified.
|
|
|
|
--enable-heterogeneous
|
|
Enable support for running on heterogeneous clusters (e.g., machines
|
|
with different endian representations). Heterogeneous support is
|
|
disabled by default because it imposes a minor performance penalty.
|
|
|
|
--disable-libompitrace
|
|
Disable building the simple "libompitrace" library (see note above
|
|
about libompitrace)
|
|
|
|
--disable-vt
|
|
Disable building VampirTrace.
|
|
|
|
--enable-contrib-no-build=<list>
|
|
<list> is a comma-delimited list of the Open MPI contributed
|
|
software packages (e.g., libompitrace, VampirTrace) to disable.
|
|
|
|
Using this form is exactly equivalent to the contributed packages'
|
|
--disable-<name> form; this form may be slightly more compact if
|
|
disabling multiple packages.
|
|
|
|
--disable-sysv
|
|
Disable System V (sysv) shared memory support. By default, System V
|
|
shared memory support is enabled.
|
|
|
|
--disable-posix-shmem
|
|
Disable POSIX shared memory support. By default, POSIX shared memory support
|
|
is enabled.
|
|
|
|
--with-wrapper-cflags=<cflags>
|
|
--with-wrapper-cxxflags=<cxxflags>
|
|
--with-wrapper-fflags=<fflags>
|
|
--with-wrapper-fcflags=<fcflags>
|
|
--with-wrapper-ldflags=<ldflags>
|
|
--with-wrapper-libs=<libs>
|
|
Add the specified flags to the default flags that are used in Open
|
|
MPI's "wrapper" compilers (e.g., mpicc -- see below for more
|
|
information about Open MPI's wrapper compilers). By default, Open
|
|
MPI's wrapper compilers use the same compilers used to build Open
|
|
MPI and specify an absolute minimum set of additional flags that are
|
|
necessary to compile/link MPI applications. These configure options
|
|
give system administrators the ability to embed additional flags in
|
|
OMPI's wrapper compilers (which is a local policy decision). The
|
|
meanings of the different flags are:
|
|
|
|
<cflags>: Flags passed by the mpicc wrapper to the C compiler
|
|
<cxxflags>: Flags passed by the mpic++ wrapper to the C++ compiler
|
|
<fflags>: Flags passed by the mpif77 wrapper to the F77 compiler
|
|
<fcflags>: Flags passed by the mpif90 wrapper to the F90 compiler
|
|
<ldflags>: Flags passed by all the wrappers to the linker
|
|
<libs>: Flags passed by all the wrappers to the linker
|
|
|
|
There are other ways to configure Open MPI's wrapper compiler
|
|
behavior; see the Open MPI FAQ for more information.
|
|
|
|
There are many other options available -- see "./configure --help".
|
|
|
|
Changing the compilers that Open MPI uses to build itself uses the
|
|
standard Autoconf mechanism of setting special environment variables
|
|
either before invoking configure or on the configure command line.
|
|
The following environment variables are recognized by configure:
|
|
|
|
CC - C compiler to use
|
|
CFLAGS - Compile flags to pass to the C compiler
|
|
CPPFLAGS - Preprocessor flags to pass to the C compiler
|
|
|
|
CXX - C++ compiler to use
|
|
CXXFLAGS - Compile flags to pass to the C++ compiler
|
|
CXXCPPFLAGS - Preprocessor flags to pass to the C++ compiler
|
|
|
|
F77 - Fortran 77 compiler to use
|
|
FFLAGS - Compile flags to pass to the Fortran 77 compiler
|
|
|
|
FC - Fortran 90 compiler to use
|
|
FCFLAGS - Compile flags to pass to the Fortran 90 compiler
|
|
|
|
LDFLAGS - Linker flags to pass to all compilers
|
|
LIBS - Libraries to pass to all compilers (it is rarely
|
|
necessary for users to need to specify additional LIBS)
|
|
|
|
For example:
|
|
|
|
shell$ ./configure CC=mycc CXX=myc++ F77=myf77 FC=myf90 ...
|
|
|
|
***Note: We generally suggest using the above command line form for
|
|
setting different compilers (vs. setting environment variables and
|
|
then invoking "./configure"). The above form will save all
|
|
variables and values in the config.log file, which makes
|
|
post-mortem analysis easier when problems occur.
|
|
|
|
Note that you may also want to ensure that the value of
|
|
LD_LIBRARY_PATH is set appropriately (or not at all) for your build
|
|
(or whatever environment variable is relevant for your operating
|
|
system). For example, some users have been tripped up by choosing to
|
|
use non-default Fortran compilers via FC / F77, but then failing to
|
|
set LD_LIBRARY_PATH to include the directory containing that
|
|
non-default Fortran compiler's support libraries. This causes Open
|
|
MPI's configure script to fail when it tries to compile / link / run
|
|
simple Fortran programs.
|
|
|
|
It is required that the compilers specified be compile and link
|
|
compatible, meaning that object files created by one compiler must be
|
|
able to be linked with object files from the other compilers and
|
|
produce correctly functioning executables.
|
|
|
|
Open MPI supports all the "make" targets that are provided by GNU
|
|
Automake, such as:
|
|
|
|
all - build the entire Open MPI package
|
|
install - install Open MPI
|
|
uninstall - remove all traces of Open MPI from the $prefix
|
|
clean - clean out the build tree
|
|
|
|
Once Open MPI has been built and installed, it is safe to run "make
|
|
clean" and/or remove the entire build tree.
|
|
|
|
VPATH and parallel builds are fully supported.
|
|
|
|
Generally speaking, the only thing that users need to do to use Open
|
|
MPI is ensure that <prefix>/bin is in their PATH and <prefix>/lib is
|
|
in their LD_LIBRARY_PATH. Users may need to set the PATH
|
|
and LD_LIBRARY_PATH in their shell setup files (e.g., .bashrc, .cshrc)
|
|
so that non-interactive rsh/ssh-based logins will be able to find the
|
|
Open MPI executables.
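A minimal sketch of such a shell setup fragment for Bourne-style
shells (e.g., in .bashrc) is shown here. OMPI_PREFIX is a name
invented for this example, and its default value is only a
placeholder for whatever was given to configure's --prefix option;
csh-family shells need the equivalent setenv syntax in .cshrc.

```shell
# OMPI_PREFIX is a hypothetical helper variable for this example; set
# it to the value that was given to configure's --prefix option.
OMPI_PREFIX="${OMPI_PREFIX:-$HOME/opt/openmpi}"

# Put Open MPI's executables (mpicc, mpirun, ...) first in PATH.
PATH="$OMPI_PREFIX/bin:$PATH"

# Prepend Open MPI's library directory; only add the ":" separator if
# LD_LIBRARY_PATH already had a value.
LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

export PATH LD_LIBRARY_PATH
```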
|
|
|
|
===========================================================================
|
|
|
|
Open MPI Version Numbers and Binary Compatibility
|
|
-------------------------------------------------
|
|
|
|
Open MPI has two sets of version numbers that are likely of interest
|
|
to end users / system administrators:
|
|
|
|
* Software version number
|
|
* Shared library version numbers
|
|
|
|
Both are described below, followed by a discussion of application
|
|
binary interface (ABI) compatibility implications.
|
|
|
|
Software Version Number
|
|
-----------------------
|
|
|
|
Open MPI's version numbers are the union of several different values:
|
|
major, minor, release, and an optional quantifier.
|
|
|
|
* Major: The major number is the first integer in the version string
|
|
(e.g., v1.2.3). Changes in the major number typically indicate a
|
|
significant change in the code base and/or end-user
|
|
functionality. The major number is always included in the version
|
|
number.
|
|
|
|
* Minor: The minor number is the second integer in the version
|
|
string (e.g., v1.2.3). Changes in the minor number typically
|
|
indicate an incremental change in the code base and/or end-user
|
|
functionality. The minor number is always included in the version
|
|
number. Starting with Open MPI v1.3.0, the minor release number
|
|
took on additional significance (see this wiki page for more
|
|
details):
|
|
|
|
o Even minor release numbers are part of "super-stable"
|
|
release series (e.g., v1.4.0). Releases in super stable series
|
|
are well-tested, time-tested, and mature. Such releases are
|
|
recommended for production sites. Changes between subsequent
|
|
releases in super stable series are expected to be fairly small.
|
|
o Odd minor release numbers are part of "feature" release
|
|
series (e.g., 1.3.7). Releases in feature series are
|
|
well-tested, but they are not necessarily time-tested or as
|
|
mature as super stable releases. Changes between subsequent
|
|
releases in feature series may be large.
|
|
|
|
* Release: The release number is the third integer in the version
|
|
string (e.g., v1.2.3). Changes in the release number typically
|
|
indicate a bug fix in the code base and/or end-user
|
|
functionality. If the release number is 0, it is omitted from the
|
|
version number (e.g., v1.2 has a release number of 0).
|
|
|
|
* Quantifier: Open MPI version numbers sometimes have an arbitrary
|
|
string affixed to the end of the version number. Common strings
|
|
include:
|
|
|
|
o aX: Indicates an alpha release. X is an integer indicating
|
|
the number of the alpha release (e.g., v1.2.3a5 indicates the
|
|
5th alpha release of version 1.2.3).
|
|
o bX: Indicates a beta release. X is an integer indicating
|
|
the number of the beta release (e.g., v1.2.3b3 indicates the 3rd
|
|
beta release of version 1.2.3).
|
|
o rcX: Indicates a release candidate. X is an integer
|
|
indicating the number of the release candidate (e.g., v1.2.3rc4
|
|
indicates the 4th release candidate of version 1.2.3).
|
|
o rV or hgV: Indicates the Subversion / Mercurial repository
|
|
number string that the release was made from (V is usually an
|
|
integer for Subversion releases and usually a string for
|
|
Mercurial releases). Although all official Open MPI releases are
|
|
tied to a single, specific Subversion or Mercurial repository
|
|
number (which can be obtained from the ompi_info command), only
|
|
some releases have the Subversion / Mercurial repository number
|
|
in the version number. Development snapshot tarballs, for
|
|
example, have the Subversion repository included in the version
|
|
to reflect that they are a development snapshot of an upcoming
|
|
release (e.g., v1.2.3r1234 indicates a development snapshot of
|
|
version 1.2.3 corresponding to Subversion repository number
|
|
1234).
|
|
|
|
Quantifiers may be mixed together -- for example v1.2.3rc7r2345
|
|
indicates a development snapshot of an upcoming 7th release
|
|
candidate for version 1.2.3 corresponding to Subversion repository
|
|
number 2345.
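The scheme above can be illustrated with a small POSIX shell sketch
that splits a version string into its parts. parse_ompi_version is a
hypothetical helper written for this example (it is not an Open MPI
tool), and it assumes the string always contains at least
"major.minor". The series label follows the even/odd rule described
above and is reported as "n/a" for versions before v1.3.

```shell
# parse_ompi_version VERSION
# Hypothetical helper: prints the major, minor, release, and quantifier
# parts of a version string such as "v1.2.3rc7r2345" or "v1.4".
parse_ompi_version() {
    v=${1#v}                       # drop an optional leading "v"
    major=${v%%.*}                 # digits before the first "."
    rest=${v#*.}                   # everything after the first "."
    minor=${rest%%[!0-9]*}         # leading digits of what is left
    rest=${rest#"$minor"}
    case "$rest" in
        .*) rest=${rest#.}         # a release number is present
            release=${rest%%[!0-9]*}
            rest=${rest#"$release"} ;;
        *)  release=0 ;;           # an omitted release number means 0
    esac
    quantifier=$rest               # e.g. "a5", "rc4", "rc7r2345", or empty

    # Even/odd minor numbers only carry meaning from v1.3 onward.
    if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 3 ]; }; then
        if [ $((minor % 2)) -eq 0 ]; then series=super-stable; else series=feature; fi
    else
        series=n/a
    fi
    echo "major=$major minor=$minor release=$release quantifier=$quantifier series=$series"
}

parse_ompi_version v1.2.3rc7r2345
parse_ompi_version v1.4
parse_ompi_version 1.3.7
```

The quantifier is returned as one opaque string; separating mixed
quantifiers such as "rc7r2345" further is left out of this sketch.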
|
|
|
|
Shared Library Version Number
|
|
-----------------------------
|
|
|
|
Open MPI started using the GNU Libtool shared library versioning
|
|
scheme with the release of v1.3.2.
|
|
|
|
NOTE: Only official releases of Open MPI adhere to this versioning
|
|
scheme. "Beta" releases, release candidates, nightly
|
|
tarballs, developer snapshots, and Subversion/Mercurial snapshot
|
|
tarballs likely will all have arbitrary/meaningless shared
|
|
library version numbers.
|
|
|
|
For deep voodoo technical reasons, only the MPI API libraries were
|
|
versioned until Open MPI v1.5 was released (i.e., libmpi*.so --
|
|
libopen-rte.so or libopen-pal.so were not versioned until v1.5).
|
|
Please see https://svn.open-mpi.org/trac/ompi/ticket/2092 for more
|
|
details.
|
|
|
|
NOTE: This policy change will cause an ABI incompatibility between MPI
|
|
applications compiled/linked against the Open MPI v1.4 series;
|
|
such applications will not be able to upgrade to the Open MPI
|
|
v1.5 series without re-linking. Sorry folks!
|
|
|
|
The GNU Libtool official documentation details how the versioning
|
|
scheme works. The quick version is that the shared library versions
|
|
are a triple of integers: (current,revision,age), or "c:r:a". This
|
|
triple is not related to the Open MPI software version number. There
|
|
are six simple rules for updating the values (taken almost verbatim
|
|
from the Libtool docs):
|
|
|
|
1. Start with version information of "0:0:0" for each shared library.
|
|
|
|
2. Update the version information only immediately before a public
|
|
release of your software. More frequent updates are unnecessary,
|
|
and only guarantee that the current interface number gets larger
|
|
faster.
|
|
|
|
3. If the library source code has changed at all since the last
|
|
update, then increment revision ("c:r:a" becomes "c:r+1:a").
|
|
|
|
4. If any interfaces have been added, removed, or changed since the
|
|
last update, increment current, and set revision to 0.
|
|
|
|
5. If any interfaces have been added since the last public release,
|
|
then increment age.
|
|
|
|
6. If any interfaces have been removed since the last public release,
|
|
then set age to 0.
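Rules 3 through 6 can be sketched as a small POSIX shell function.
next_libtool_version is a hypothetical helper written for this
example (it is not part of Libtool or Open MPI); each of its four
flags is 1 if the corresponding kind of change happened since the
last public release and 0 otherwise.

```shell
# next_libtool_version C:R:A SRC_CHANGED IFACE_ADDED IFACE_REMOVED IFACE_CHANGED
# Hypothetical helper: applies rules 3-6 in order and prints the new triple.
next_libtool_version() {
    c=${1%%:*}; rest=${1#*:}; r=${rest%%:*}; a=${rest#*:}
    src_changed=$2; added=$3; removed=$4; changed=$5

    # Rule 3: any source change increments revision.
    if [ "$src_changed" -eq 1 ]; then r=$((r + 1)); fi

    # Rule 4: any interface change increments current and resets revision.
    if [ "$added" -eq 1 ] || [ "$removed" -eq 1 ] || [ "$changed" -eq 1 ]; then
        c=$((c + 1)); r=0
    fi

    # Rule 5: added interfaces increment age.
    if [ "$added" -eq 1 ]; then a=$((a + 1)); fi

    # Rule 6: removed interfaces reset age.
    if [ "$removed" -eq 1 ]; then a=0; fi

    echo "$c:$r:$a"
}

next_libtool_version 0:0:0 1 0 0 0   # bug fix only
next_libtool_version 3:2:1 1 1 0 0   # interfaces added
next_libtool_version 3:2:1 1 0 1 0   # interfaces removed
```

With these inputs the function prints 0:1:0, 4:0:2, and 4:0:0,
respectively.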
|
|
|
|
Here's how we apply those rules specifically to Open MPI:
|
|
|
|
1. The above rules do not apply to MCA components (a.k.a. "plugins");
|
|
MCA component .so versions stay unspecified.
|
|
|
|
2. The above rules apply exactly as written to the following
|
|
libraries starting with Open MPI version v1.5 (prior to v1.5,
|
|
libopen-pal and libopen-rte were still at 0:0:0 for reasons
|
|
discussed in bug ticket #2092
|
|
https://svn.open-mpi.org/trac/ompi/ticket/2092):
|
|
|
|
* libopen-rte
|
|
* libopen-pal
|
|
* libmca_common_*
|
|
|
|
3. The following libraries use a slightly modified version of the
|
|
above rules: rules 4, 5, and 6 only apply to the official MPI
|
|
interfaces (functions, global variables). The rationale for this
|
|
decision is that the vast majority of our users only care about
|
|
the official/public MPI interfaces; we therefore want the .so
|
|
version number to reflect only changes to the official MPI API.
|
|
Put simply: non-MPI API / internal changes to the
|
|
MPI-application-facing libraries are irrelevant to pure MPI
|
|
applications.
|
|
|
|
* libmpi
|
|
* libmpi_f77
|
|
* libmpi_f90
|
|
* libmpi_cxx
|
|
|
|
4. Note, however, that libmpi.so can have its "revision" number
|
|
incremented if libopen-rte or libopen-pal change (because these
|
|
two libraries are wholly included in libmpi.so). Specifically:
|
|
the revision will change, but since we have defined that the only
|
|
relevant API interface in libmpi.so is the official MPI API,
|
|
updates to libopen-rte and libopen-pal do not change the "current"
|
|
or "age" numbers of libmpi.so.
|
|
|
|
Application Binary Interface (ABI) Compatibility
|
|
------------------------------------------------
|
|
|
|
Open MPI provided forward application binary interface (ABI)
|
|
compatibility for MPI applications starting with v1.3.2. Prior to
|
|
that version, no ABI guarantees were provided.
|
|
|
|
NOTE: Prior to v1.3.2, subtle and strange failures are almost
|
|
guaranteed to occur if applications were compiled and linked
|
|
against shared libraries from one version of Open MPI and then
|
|
run with another. The Open MPI team strongly discourages making
|
|
any ABI assumptions before v1.3.2.
|
|
|
|
Starting with v1.3.2, Open MPI provides forward ABI compatibility in
|
|
all versions of a given feature release series and its corresponding
|
|
super stable series. For example, on a single platform, an MPI
|
|
application linked against Open MPI v1.3.2 shared libraries can be
|
|
updated to point to the shared libraries in any successive v1.3.x or
|
|
v1.4 release and still work properly (e.g., via the LD_LIBRARY_PATH
|
|
environment variable or other operating system mechanism).
|
|
|
|
Open MPI reserves the right to break ABI compatibility at new feature
|
|
release series. For example, the same MPI application from above
|
|
(linked against Open MPI v1.3.2 shared libraries) will *not* work with
|
|
Open MPI v1.5 shared libraries.
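The policy above can be approximated by the following POSIX shell
sketch. Both helper functions are hypothetical and written for this
example. The sketch assumes that a feature series with odd minor
number N pairs with the super stable series N+1 (as v1.3 pairs with
v1.4), that "forward" means the run-time version is the same as or
newer than the link-time version, and that version strings carry no
quantifier. It is an illustration of the rule, not an official
compatibility checker.

```shell
# version_fields VERSION
# Prints "MAJOR MINOR RELEASE"; an omitted release number counts as 0.
version_fields() {
    v=${1#v}
    old_ifs=$IFS; IFS=.
    set -- $v                      # split on "."
    IFS=$old_ifs
    echo "${1:-0} ${2:-0} ${3:-0}"
}

# abi_forward_compatible LINKED_VERSION RUNTIME_VERSION
# Succeeds if the text above promises that an application linked
# against LINKED_VERSION can run with RUNTIME_VERSION.
abi_forward_compatible() {
    set -- $(version_fields "$1") $(version_fields "$2")
    lmaj=$1; lmin=$2; lrel=$3; rmaj=$4; rmin=$5; rrel=$6

    # One sortable number per version (assumes every field is below 1000).
    lnum=$(( (lmaj * 1000 + lmin) * 1000 + lrel ))
    rnum=$(( (rmaj * 1000 + rmin) * 1000 + rrel ))

    # Map an odd (feature) minor N and its even partner N+1 to one value.
    lseries=$(( lmin + lmin % 2 ))
    rseries=$(( rmin + rmin % 2 ))

    # No guarantees before v1.3.2; the run-time version must not be
    # older; major number and series pairing must match.
    [ "$lnum" -ge 1003002 ] && [ "$rnum" -ge "$lnum" ] &&
        [ "$lmaj" -eq "$rmaj" ] && [ "$lseries" -eq "$rseries" ]
}

for runtime in v1.3.4 v1.4 v1.5; do
    if abi_forward_compatible v1.3.2 "$runtime"; then
        echo "v1.3.2 -> $runtime: covered by the ABI guarantee"
    else
        echo "v1.3.2 -> $runtime: not covered by the ABI guarantee"
    fi
done
```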
|
|
|
|
===========================================================================
|
|
|
|
Checking Your Open MPI Installation
|
|
-----------------------------------
|
|
|
|
The "ompi_info" command can be used to check the status of your Open
|
|
MPI installation (located in <prefix>/bin/ompi_info). Running it with
|
|
no arguments provides a summary of information about your Open MPI
|
|
installation.
|
|
|
|
Note that the ompi_info command is extremely helpful in determining
|
|
which components are installed as well as listing all the run-time
|
|
settable parameters that are available in each component (as well as
|
|
their default values).
|
|
|
|
The following options may be helpful:
|
|
|
|
--all Show a *lot* of information about your Open MPI
|
|
installation.
|
|
--parsable Display all the information in an easily
|
|
grep/cut/awk/sed-able format.
|
|
--param <framework> <component>
|
|
A <framework> of "all" and a <component> of "all" will
|
|
show all parameters to all components. Otherwise, the
|
|
parameters of all the components in a specific framework,
|
|
or just the parameters of a specific component can be
|
|
displayed by using an appropriate <framework> and/or
|
|
<component> name.
|
|
|
|
Changing the values of these parameters is explained in the "The
|
|
Modular Component Architecture (MCA)" section, below.
|
|
|
|
===========================================================================
|
|
|
|
Open MPI API Extensions
|
|
-----------------------
|
|
|
|
Open MPI contains a framework for extending the MPI API that is
|
|
available to applications. Each extension is usually a standalone set of
|
|
functionality that is distinct from other extensions (similar to how
|
|
Open MPI's plugins are usually unrelated to each other). These
|
|
extensions provide new functions and/or constants that are available
|
|
to MPI applications.
|
|
|
|
WARNING: These extensions are neither standard nor portable to other
|
|
MPI implementations!
|
|
|
|
Compiling the extensions
|
|
------------------------
|
|
|
|
Open MPI extensions are not enabled by default; they must be enabled
|
|
by Open MPI's configure script. The --enable-mpi-ext command line
|
|
switch accepts a comma-delimited list of extensions to enable, or, if
|
|
it is specified without a list, all extensions are enabled.
|
|
|
|
Since extensions are meant to be used by advanced users only, this
|
|
file does not document which extensions are available or what they
|
|
do. Look in the ompi/mpiext/ directory to see the extensions; each
|
|
subdirectory of that directory contains an extension. Each has a
|
|
README file that describes what it does.
|
|
|
|
Using the extensions
|
|
--------------------
|
|
|
|
To reinforce the fact that these extensions are non-standard, you must
|
|
include a separate header file after <mpi.h> to obtain the function
|
|
prototypes, constant declarations, etc. For example:
|
|
|
|
#include <mpi.h>
/* The extensions header exists only in Open MPI, so guard it. */
#if defined(OPEN_MPI) && OPEN_MPI
#include <mpi-ext.h>
#endif

int main() {
    MPI_Init(NULL, NULL);
#if defined(OPEN_MPI) && OPEN_MPI
    {
        /* Output buffers filled in by the affinity extension. */
        char ompi_bound[OMPI_AFFINITY_STRING_MAX];
        char current_binding[OMPI_AFFINITY_STRING_MAX];
        char exists[OMPI_AFFINITY_STRING_MAX];
        OMPI_Affinity_str(OMPI_AFFINITY_LAYOUT_FMT, ompi_bound, current_binding,
                          exists);
    }
#endif
    MPI_Finalize();
    return 0;
}
|
|
|
|
Notice that the Open MPI-specific code is surrounded by the #if
|
|
statement to ensure that it is only ever compiled by Open MPI.
|
|
|
|
The Open MPI wrapper compilers (mpicc and friends) should
|
|
automatically insert all relevant compiler and linker flags necessary
|
|
to use the extensions. No special flags or steps should be necessary
|
|
compared to "normal" MPI applications.
|
|
|
|
===========================================================================
|
|
|
|
Compiling Open MPI Applications
|
|
-------------------------------
|
|
|
|
Open MPI provides "wrapper" compilers that should be used for
|
|
compiling MPI applications:
|
|
|
|
C: mpicc
|
|
C++: mpiCC (or mpic++ if your filesystem is case-insensitive)
|
|
Fortran 77: mpif77
|
|
Fortran 90: mpif90
|
|
|
|
For example:
|
|
|
|
shell$ mpicc hello_world_mpi.c -o hello_world_mpi -g
|
|
shell$
|
|
|
|
All the wrapper compilers do is add a variety of compiler and linker
|
|
flags to the command line and then invoke a back-end compiler. To be
|
|
specific: the wrapper compilers do not parse source code at all; they
|
|
are solely command-line manipulators, and have nothing to do with the
|
|
actual compilation or linking of programs. The end result is an MPI
|
|
executable that is properly linked to all the relevant libraries.
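To make the "command-line manipulator" idea concrete, here is a toy
stand-in for a wrapper compiler written as a POSIX shell function.
toy_mpicc, the back-end compiler name, and every flag and path in it
are assumptions made up for this example; the real wrappers determine
their flags from how Open MPI was configured. The function only
prints the command it would run.

```shell
# toy_mpicc ARGS...
# Hypothetical wrapper: surrounds the user's arguments with extra
# flags and shows the resulting back-end compiler command.
toy_mpicc() {
    backend_cc="cc"                          # assumed back-end compiler
    extra_cflags="-I/opt/openmpi/include"    # placeholder include path
    extra_ldflags="-L/opt/openmpi/lib"       # placeholder library path
    extra_libs="-lmpi"                       # link against libmpi
    echo "$backend_cc $extra_cflags $* $extra_ldflags $extra_libs"
}

toy_mpicc hello_world_mpi.c -o hello_world_mpi -g
```

Note that the source file is never opened; the arguments are passed
through untouched between the added flags.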
|
|
|
|
Customizing the behavior of the wrapper compilers is possible (e.g.,
|
|
changing the compiler [not recommended] or specifying additional
|
|
compiler/linker flags); see the Open MPI FAQ for more information.
|
|
|
|
Alternatively, starting in the Open MPI v1.5 series, Open MPI also
|
|
installs pkg-config(1) configuration files under $libdir/pkgconfig.
|
|
If pkg-config is configured to find these files, then compiling /
|
|
linking Open MPI programs can be performed like this:
|
|
|
|
shell$ gcc hello_world_mpi.c -o hello_world_mpi -g \
|
|
`pkg-config ompi-c --cflags --libs`
|
|
shell$
|
|
|
|
Open MPI supplies multiple pkg-config(1) configuration files; one for
|
|
each different wrapper compiler (language):
|
|
|
|
------------------------------------------------------------------------
|
|
ompi Synonym for "ompi-c"; Open MPI applications using the C
|
|
MPI bindings
|
|
ompi-c Open MPI applications using the C MPI bindings
|
|
ompi-cxx Open MPI applications using the C or C++ MPI bindings
|
|
ompi-f77 Open MPI applications using the C or "mpif.h" MPI bindings
|
|
ompi-f90 Open MPI applications using the C, "mpif.h" or "use mpi" MPI
|
|
bindings
|
|
------------------------------------------------------------------------
|
|
|
|
The following pkg-config(1) configuration files *may* be installed,
|
|
depending on which command line options were specified to Open MPI's
|
|
configure script. They are not necessary for MPI applications, but
|
|
may be used by applications that use Open MPI's lower layer support
|
|
libraries.
|
|
|
|
orte: Open MPI Run-Time Environment applications
|
|
opal: Open Portable Access Layer applications
|
|
|
|
===========================================================================
|
|
|
|
Running Open MPI Applications
|
|
-----------------------------
|
|
|
|
Open MPI supports both mpirun and mpiexec (they are exactly
|
|
equivalent). For example:
|
|
|
|
shell$ mpirun -np 2 hello_world_mpi
|
|
or
|
|
shell$ mpiexec -np 1 hello_world_mpi : -np 1 hello_world_mpi
|
|
|
|
are equivalent. Some of mpiexec's switches (such as -host and -arch)
|
|
are not yet functional, although they will not error if you try to use
|
|
them.
|
|
|
|
The rsh launcher accepts a -hostfile parameter (the option
|
|
"-machinefile" is equivalent); you can specify a -hostfile parameter
|
|
indicating a standard mpirun-style hostfile (one hostname per line):
|
|
|
|
shell$ mpirun -hostfile my_hostfile -np 2 hello_world_mpi
|
|
|
|
If you intend to run more than one process on a node, the hostfile can
|
|
use the "slots" attribute. If "slots" is not specified, a count of 1
|
|
is assumed. For example, using the following hostfile:
|
|
|
|
---------------------------------------------------------------------------
|
|
node1.example.com
|
|
node2.example.com
|
|
node3.example.com slots=2
|
|
node4.example.com slots=4
|
|
---------------------------------------------------------------------------
|
|
|
|
shell$ mpirun -hostfile my_hostfile -np 8 hello_world_mpi
|
|
|
|
will launch MPI_COMM_WORLD rank 0 on node1, rank 1 on node2, ranks 2
|
|
and 3 on node3, and ranks 4 through 7 on node4.
|
|
|
|
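The placement described above can be reproduced with a short Python
sketch that reads the hostfile and fills each node's slots in order.
This is a simplified model under stated assumptions, not Open MPI's
mapping code: the function names parse_hostfile and map_ranks are
hypothetical, only the "slots" attribute is recognized, and asking
for more processes than there are slots is treated as an error.

```python
def parse_hostfile(text):
    """Return a list of (hostname, slots) pairs from mpirun-style hostfile text."""
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        slots = 1                         # default when "slots" is absent
        for field in fields[1:]:
            if field.startswith("slots="):
                slots = int(field[len("slots="):])
        hosts.append((fields[0], slots))
    return hosts


def map_ranks(hosts, nprocs):
    """Fill each host's slots in hostfile order; return {rank: hostname}."""
    placement = {}
    rank = 0
    for name, slots in hosts:
        for _ in range(slots):
            if rank == nprocs:
                return placement
            placement[rank] = name
            rank += 1
    if rank < nprocs:
        # Simplification: this sketch refuses to oversubscribe.
        raise ValueError("hostfile provides only %d slots" % rank)
    return placement


HOSTFILE = """\
node1.example.com
node2.example.com
node3.example.com slots=2
node4.example.com slots=4
"""

for rank, host in sorted(map_ranks(parse_hostfile(HOSTFILE), 8).items()):
    print(rank, host)
```

With the example hostfile and 8 processes, the sketch places rank 0
on node1, rank 1 on node2, ranks 2 and 3 on node3, and ranks 4
through 7 on node4.
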
Other starters, such as the resource manager / batch scheduling
environments, do not require hostfiles (and will ignore the hostfile
if it is supplied). They will also launch as many processes as slots
have been allocated by the scheduler if no "-np" argument has been
provided. For example, running a SLURM job with 8 processors:

  shell$ salloc -n 8 mpirun a.out

The above command will reserve 8 processors and run 1 copy of mpirun,
which will, in turn, launch 8 copies of a.out in a single
MPI_COMM_WORLD on the processors that were allocated by SLURM.

Note that the values of component parameters can be changed on the
mpirun / mpiexec command line. This is explained in the section
below, "The Modular Component Architecture (MCA)".

===========================================================================

The Modular Component Architecture (MCA)
----------------------------------------

The MCA is the backbone of Open MPI -- most services and functionality
are implemented through MCA components. Here is a list of all the
component frameworks in Open MPI:

---------------------------------------------------------------------------

MPI component frameworks:
-------------------------

allocator - Memory allocator
bml       - BTL management layer
btl       - MPI point-to-point Byte Transfer Layer, used for MPI
            point-to-point messages on some types of networks
coll      - MPI collective algorithms
crcp      - Checkpoint/restart coordination protocol
dpm       - MPI-2 dynamic process management
io        - MPI-2 I/O
mpool     - Memory pooling
mtl       - Matching transport layer, used for MPI point-to-point
            messages on some types of networks
op        - Back end computations for intrinsic MPI_Op operators
osc       - MPI-2 one-sided communications
pml       - MPI point-to-point management layer
pubsub    - MPI-2 publish/subscribe management
rcache    - Memory registration cache
topo      - MPI topology routines

Back-end run-time environment (RTE) component frameworks:
---------------------------------------------------------

errmgr    - RTE error manager
ess       - RTE environment-specific services
filem     - Remote file management
grpcomm   - RTE group communications
iof       - I/O forwarding
notifier  - System/network administrator notification system
odls      - OpenRTE daemon local launch subsystem
oob       - Out of band messaging
plm       - Process lifecycle management
ras       - Resource allocation system
rmaps     - Resource mapping system
rml       - RTE message layer
routed    - Routing table for the RML
snapc     - Snapshot coordination

Miscellaneous frameworks:
-------------------------

backtrace   - Debugging call stack backtrace support
carto       - Cartography (host/network mapping) support
crs         - Checkpoint and restart service
installdirs - Installation directory relocation services
maffinity   - Memory affinity
memchecker  - Run-time memory checking
memcpy      - Memory copy support
memory      - Memory management hooks
paffinity   - Processor affinity
pstat       - Process status
sysinfo     - Basic system information
timer       - High-resolution timers

---------------------------------------------------------------------------

Each framework typically has one or more components that are used at
run-time. For example, the btl framework is used by the MPI layer to
send bytes across different types of underlying networks. The tcp btl,
for example, sends messages across TCP-based networks; the openib btl
sends messages across OpenFabrics-based networks; the MX btl sends
messages across Myrinet MX / Open-MX networks.

Each component typically has some tunable parameters that can be
changed at run-time. Use the ompi_info command to check a component
to see what its tunable parameters are. For example:

  shell$ ompi_info --param btl tcp

shows all the parameters (and default values) for the tcp btl
component.

These values can be overridden at run-time in several ways. At
run-time, the following locations are examined (in order) for new
values of parameters:

1. <prefix>/etc/openmpi-mca-params.conf

   This file is intended to set any system-wide default MCA parameter
   values -- it will apply, by default, to all users who use this Open
   MPI installation. The default file that is installed contains many
   comments explaining its format.

2. $HOME/.openmpi/mca-params.conf

   If this file exists, it should be in the same format as
   <prefix>/etc/openmpi-mca-params.conf. It is intended to provide
   per-user default parameter values.

3. environment variables of the form OMPI_MCA_<name> set equal to a
   <value>

   Where <name> is the name of the parameter. For example, set the
   variable named OMPI_MCA_btl_tcp_frag_size to the value 65536
   (Bourne-style shells):

   shell$ OMPI_MCA_btl_tcp_frag_size=65536
   shell$ export OMPI_MCA_btl_tcp_frag_size

4. the mpirun command line: --mca <name> <value>

   Where <name> is the name of the parameter. For example:

   shell$ mpirun --mca btl_tcp_frag_size 65536 -np 2 hello_world_mpi

These locations are checked in order, and a value found in a later
location overrides one found in an earlier location. For example, a
parameter value passed on the mpirun command line will override an
environment variable; an environment variable will override the
system-wide defaults.

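The override order described above can be sketched in Python as a
series of dictionary merges, one per location. This is a minimal
model, not Open MPI's implementation. It assumes that the parameter
files hold "name = value" lines with "#" comments (the installed
default file documents the actual format). All function names are
hypothetical, and the values 16384 and 32768 are made up for the
example.

```python
def parse_param_file(text):
    """Parse "name = value" lines; '#' starts a comment (assumed format)."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


def params_from_environ(environ):
    """Collect OMPI_MCA_<name> variables as {name: value}."""
    prefix = "OMPI_MCA_"
    return {k[len(prefix):]: v for k, v in environ.items() if k.startswith(prefix)}


def params_from_argv(argv):
    """Collect "--mca <name> <value>" triples from a command line."""
    params = {}
    i = 0
    while i < len(argv):
        if argv[i] == "--mca" and i + 2 < len(argv):
            params[argv[i + 1]] = argv[i + 2]
            i += 3
        else:
            i += 1
    return params


def resolve(system_text, user_text, environ, argv):
    """Merge the four locations; later locations override earlier ones."""
    merged = {}
    for source in (parse_param_file(system_text),
                   parse_param_file(user_text),
                   params_from_environ(environ),
                   params_from_argv(argv)):
        merged.update(source)
    return merged


settings = resolve(
    system_text="btl_tcp_frag_size = 16384  # site default (made-up value)\n",
    user_text="",
    environ={"OMPI_MCA_btl_tcp_frag_size": "32768"},   # made-up value
    argv="mpirun --mca btl_tcp_frag_size 65536 -np 2 hello_world_mpi".split(),
)
print(settings["btl_tcp_frag_size"])
```

Because the command line is merged last, the sketch ends up with the
value 65536 even though the system-wide file and the environment
variable supply other values.
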
Each component typically activates itself when relevant. For example,
the MX component will detect that MX devices are present and will
automatically be used for MPI communications. The SLURM component
will automatically detect when running inside a SLURM job and activate
itself. And so on.

Components can be manually activated or deactivated if necessary, of
course. The most common components that are manually activated,
deactivated, or tuned are the "BTL" components -- components that are
used for MPI point-to-point communications on many common types of
networks.

For example, to activate *only* the TCP and "self" (process loopback)
components for MPI communications, specify them in a comma-delimited
list to the "btl" MCA parameter:

  shell$ mpirun --mca btl tcp,self hello_world_mpi

To add shared memory support, add "sm" into the comma-delimited list
(list order does not matter):

  shell$ mpirun --mca btl tcp,sm,self hello_world_mpi

To deactivate a specific component, the comma-delimited list can be
prepended with a "^" to negate it:

  shell$ mpirun --mca btl ^tcp hello_world_mpi

The above command will use any BTL component other than the tcp
component.

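The include and exclude forms of the "btl" parameter shown above can
be modeled with a few lines of Python. This is only a sketch of the
list semantics: the function name select_components and the set of
available components are hypothetical, and the real selection also
depends on which components are built and usable on a given system.

```python
def select_components(spec, available):
    """Apply an include list ("tcp,self") or an exclude list ("^tcp").

    Simplified model: returns the names from `available` that the spec
    permits, preserving the order of `available`.
    """
    negate = spec.startswith("^")
    names = {n for n in spec.lstrip("^").split(",") if n}
    if negate:
        return [c for c in available if c not in names]
    return [c for c in available if c in names]


available = ["self", "sm", "tcp", "openib"]   # illustrative set of BTL components
print(select_components("tcp,self", available))
print(select_components("tcp,sm,self", available))
print(select_components("^tcp", available))
```

In this model "tcp,self" and "self,tcp" select the same components,
which mirrors the statement that list order does not matter, and
"^tcp" keeps every available component except tcp.
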
===========================================================================

Common Questions
----------------

Many common questions about building and using Open MPI are answered
on the FAQ:

    http://www.open-mpi.org/faq/

===========================================================================

Got more questions?
-------------------

Found a bug? Got a question? Want to make a suggestion? Want to
contribute to Open MPI? Please let us know!

When submitting questions and problems, be sure to include as much
extra information as possible. This web page details all the
information that we request in order to provide assistance:

    http://www.open-mpi.org/community/help/

User-level questions and comments should generally be sent to the
user's mailing list (users@open-mpi.org). Because of spam, only
subscribers are allowed to post to this list (ensure that you
subscribe with and post from *exactly* the same e-mail address --
joe@example.com is considered different than
joe@mycomputer.example.com!). Visit this page to subscribe to the
user's list:

    http://www.open-mpi.org/mailman/listinfo.cgi/users

Developer-level bug reports, questions, and comments should generally
be sent to the developer's mailing list (devel@open-mpi.org). Please
do not post the same question to both lists. As with the user's list,
only subscribers are allowed to post to the developer's list. Visit
the following web page to subscribe:

    http://www.open-mpi.org/mailman/listinfo.cgi/devel

Make today an Open MPI day!