diff --git a/NEWS b/NEWS
index 6b0e20595c..5133373efe 100644
--- a/NEWS
+++ b/NEWS
@@ -142,7 +142,9 @@ Trunk (not on release branches yet)
 - Added new "mindist" process mapper, allowing placement of processes
   via PCI locality information reported by the BIOS.
 - MPI-2.2: Add support for MPI_Dist_graph functionality.
-
+- Enable generic, client-side support for PMI2 implementations.  Can
+  be leveraged by any resource manager that implements PMI2; e.g.,
+  SLURM versions 2.6 and higher.
 
 1.7.2
 -----
diff --git a/README b/README
index ffcd7f4ac7..b7917cda87 100644
--- a/README
+++ b/README
@@ -63,6 +63,31 @@ base as of this writing (22 February 2012):
 General notes
 -------------
 
+- Open MPI now includes two public software layers: MPI and OpenSHMEM.
+  Throughout this document, references to Open MPI implicitly include
+  both of these layers.  When a distinction between the two layers is
+  necessary, we will reference them as the "MPI" and "OSHMEM" layers,
+  respectively.
+
+  OpenSHMEM is a collaborative effort between academia, industry,
+  and the U.S. Government to create a specification for a
+  standardized API for parallel programming in the Partitioned
+  Global Address Space (PGAS).  For more information about the OpenSHMEM
+  project, including access to the current OpenSHMEM specification,
+  please visit:
+
+      http://openshmem.org/
+
+  The OpenSHMEM implementation contained herein is provided by
+  Mellanox Technologies Inc., made possible by the support and patient
+  guidance of the Open MPI community.  This implementation attempts
+  to be portable, to allow it to be deployed in multiple environments,
+  and to be a starting point for optimizations targeted to particular
+  hardware platforms.  However, until other network vendors and/or
+  institutions contribute platform-specific optimizations, this
+  implementation will most likely provide optimal performance on Mellanox
+  hardware and software stacks.
+
 - Open MPI includes support for a wide variety of supplemental
   hardware and software package.  When configuring Open MPI, you may
   need to supply additional flags to the "configure" script in order
@@ -88,7 +113,7 @@ General notes
   word "plugin" wherever you see "component" in our documentation.
   For what it's worth, we use the word "component" for historical
   reasons, mainly because it is part of our acronyms and internal API
-  functionc calls.
+  function calls.
 
 - The run-time systems that are currently supported are:
   - rsh / ssh
@@ -242,22 +267,23 @@ Compiler Notes
 
   ********************************************************************
  ********************************************************************
-  *** There is now only a single Fortran MPI wrapper compiler:
-  *** mpifort.  mpif77 and mpif90 still exist, but they are symbolic
-  *** links to mpifort.
+  *** There is now only a single Fortran MPI wrapper compiler and
+  *** a single Fortran OSHMEM wrapper compiler:
+  *** mpifort and oshfort.  mpif77 and mpif90 still exist, but they
+  *** are symbolic links to mpifort.
   ********************************************************************
-  *** Similarly, Open MPI's configure script only recongizes the FC
+  *** Similarly, Open MPI's configure script only recognizes the FC
   *** and FCFLAGS environment variables (to specify the Fortran
   *** compiler and compiler flags, respectively).  The F77 and FFLAGS
   *** environment variables are IGNORED.
  ********************************************************************
  ********************************************************************
 
-  You can use ompi_info to see with which Fortran compiler Open MPI
-  was configured and compiled.
+  You can use either ompi_info or oshmem_info to see with which Fortran
+  compiler Open MPI was configured and compiled.
 
   There are up to three sets of Fortran MPI bindings that may be
-  provided (depending on your Fortran compiler):
+  provided, depending on your Fortran compiler:
 
   - mpif.h: This is the first MPI Fortran interface that was defined
     in MPI-1.  It is a file that is included in Fortran source code.
@@ -281,6 +307,14 @@ Compiler Notes
     is provided, allowing mpi_f08 to be used in new subroutines in
     legacy MPI applications.
 
+  Per the OSHMEM specification, there is only one Fortran OSHMEM binding
+  provided:
+
+  - shmem.fh: All Fortran OpenSHMEM programs **should** include 'shmem.fh',
+    and Fortran OSHMEM programs that use constants defined by OpenSHMEM
+    **MUST** include 'shmem.fh'.
+
+
   The following notes apply to the above-listed Fortran bindings:
 
   - The mpi_f08 module is new and has been tested with the Intel
@@ -291,7 +325,7 @@ Compiler Notes
     The gfortran compiler is *not* supported with the mpi_f08 module
     (gfortran lacks some necessary modern Fortran features, sorry).
 
-  - All Fortran compilers support the mpif.h-based bindings.
+  - All Fortran compilers support the mpif.h/shmem.fh-based bindings.
 
   - If Open MPI is built with a non-GNU Fortran compiler, all MPI
     subroutines will be prototyped in the mpi module, meaning that all
@@ -322,8 +356,8 @@ General Run-Time Support Notes
 ------------------------------
 
 - The Open MPI installation must be in your PATH on all nodes (and
-  potentially LD_LIBRARY_PATH (or DYLD_LIBRARY_PATH), if libmpi is a
-  shared library), unless using the --prefix or
+  potentially LD_LIBRARY_PATH (or DYLD_LIBRARY_PATH), if libmpi/libshmem
+  is a shared library), unless using the --prefix or
   --enable-mpirun-prefix-by-default functionality (see below).
 
 - Open MPI's run-time behavior can be customized via MCA ("MPI
@@ -415,7 +449,13 @@ MPI Functionality and Features
   --disable-io-romio flag to configure when building on OpenBSD.
 
 
-Collectives
+OSHMEM Functionality and Features
+---------------------------------
+
+- All OpenSHMEM-1.0 functionality is supported.
+
+
+MPI Collectives
 -----------
 
 - The "hierarch" coll component (i.e., an implementation of MPI
@@ -481,6 +521,17 @@ Collectives
     * "commpatterns" - Provides collectives for bootstrap
 
 
+OSHMEM Collectives
+------------------
+
+- The "fca" scoll component: the Mellanox Fabric Collective Accelerator
+  (FCA) is a solution for offloading collective operations from the
+  MPI process onto Mellanox QDR InfiniBand switch CPUs and HCAs.
+
+- The "basic" scoll component: Reference implementation of all OSHMEM
+  collective operations.
+
+
 Network Support
 ---------------
 
@@ -524,9 +575,22 @@ Network Support
     or
 
     shell$ mpirun --mca pml cm ...
 
-- MXM is a MellanoX Messaging library utilizing full range of IB
+- Similarly, there are two OSHMEM network models available: "yoda" and
+  "ikrit".  "yoda" also uses the BTL components for many supported
+  networks; "ikrit" interfaces directly with Mellanox MXM.
+
+  - "yoda" supports a variety of networks that can be used:
+
+    - OpenFabrics: InfiniBand, iWARP, and RoCE
+    - Loopback (send-to-self)
+    - Shared memory
+    - TCP
+
+  - "ikrit" only supports Mellanox MXM.
+
+- MXM is the Mellanox Messaging Accelerator library utilizing a full range of IB
   transports to provide the following messaging services to the upper
-  level MPI:
+  level MPI/OSHMEM libraries:
   - Usage of all available IB transports
   - Native RDMA support
@@ -641,6 +705,16 @@ Open MPI Extensions
 Building Open MPI
 -----------------
 
+Building Open MPI implies building both the MPI and OpenSHMEM libraries;
+as such, configure flags whose names are neither MPI- nor OSHMEM-specific
+should be regarded as applicable to both libraries.  Some pairs of MPI-
+and OSHMEM-specific switches may be mutually exclusive; e.g., passing both
+
+  --disable-mpi-fortran --enable-oshmem-fortran
+
+will cause configure to abort, since the OSHMEM Fortran bindings depend
+upon the MPI Fortran bindings being built.
+
 Open MPI uses a traditional configure script paired with "make" to
 build.  Typical installs can be of the pattern:
@@ -660,15 +734,15 @@ INSTALLATION OPTIONS
   files in <directory>/include, its libraries in <directory>/lib, etc.
 
 --disable-shared
-  By default, libmpi is built as a shared library, and all components
-  are built as dynamic shared objects (DSOs).  This switch disables
-  this default; it is really only useful when used with
+  By default, libmpi and libshmem are built as shared libraries, and
+  all components are built as dynamic shared objects (DSOs).  This
+  switch disables this default; it is really only useful when used with
   --enable-static.  Specifically, this option does *not* imply
   --enable-static; enabling static libraries and disabling shared
   libraries are two independent options.
 
 --enable-static
-  Build libmpi as a static library, and statically link in all
+  Build libmpi and libshmem as static libraries, and statically link in all
   components.  Note that this option does *not* imply
   --disable-shared; enabling static libraries and disabling shared
   libraries are two independent options.
@@ -681,7 +755,7 @@ INSTALLATION OPTIONS
   By default, the wrapper compilers (e.g., mpicc) will enable "rpath"
   support in generated executables on systems that support it.  That
   is, they will include a file reference to the location of Open MPI's
-  libraries in the MPI application executable itself.  This mean that
+  libraries in the application executable itself.  This means that
   the user does not have to set LD_LIBRARY_PATH to find Open MPI's
   libraries (e.g., if they are installed in a location that the
   run-time linker does not search by default).
@@ -691,13 +765,13 @@ INSTALLATION OPTIONS
   is an important difference between the two:
 
   "rpath": the location of the Open MPI libraries is hard-coded into
-      the MPI application and cannot be overridden at run-time.
+      the MPI/OSHMEM application and cannot be overridden at run-time.
   "runpath": the location of the Open MPI libraries is hard-coded into
-      the MPI application, but can be overridden at run-time by
+      the MPI/OSHMEM application, but can be overridden at run-time by
       setting the LD_LIBRARY_PATH environment variable.
 
   For example, consider that you install Open MPI vA.B.0 and
-  compile/link your MPI application against it.  Later, you install
+  compile/link your MPI/OSHMEM application against it.  Later, you install
   Open MPI vA.B.1 to a different installation prefix (e.g.,
   /opt/openmpi/A.B.1 vs. /opt/openmpi/A.B.0), and you leave the old
   installation intact.
@@ -933,7 +1007,8 @@ RUN-TIME SYSTEM SUPPORT
 
 --with-pmi
   Build PMI support (by default, it is not built).  If PMI support
-  cannot be found, configure will abort.
+  cannot be found, configure will abort.  If the pmi2.h header is found
+  in addition to pmi.h, then support for PMI2 will be built.
 
 --with-slurm
   Force the building of SLURM scheduler support.  If SLURM support
@@ -1133,7 +1208,11 @@ MPI FUNCTIONALITY
   interface.  See README.JAVA.txt for more details.
 
 --disable-mpi-fortran
-  Disable building the Fortran MPI bindings.
+  Disable building the Fortran MPI bindings.  Mutually exclusive with
+  --enable-oshmem-fortran.
+
+--disable-oshmem-fortran
+  Disable building the Fortran OSHMEM bindings.
 
 --enable-mpi-ext(=<list>)
   Enable Open MPI's non-portable API extensions.  If no <list> is
@@ -1186,7 +1265,7 @@ MPI FUNCTIONALITY
   information about Open MPI's wrapper compilers).  By default, Open
   MPI's wrapper compilers use the same compilers used to build Open
   MPI and specify an absolute minimum set of additional flags that are
-  necessary to compile/link MPI applications.  These configure options
+  necessary to compile/link MPI/OSHMEM applications.  These configure options
   give system administrators the ability to embed additional flags in
   OMPI's wrapper compilers (which is a local policy decision).  The
   meanings of the different flags are:
@@ -1554,7 +1633,7 @@ The following options may be helpful:
 Changing the values of these parameters is explained in the "The
 Modular Component Architecture (MCA)" section, below.
 
-When verifying a new Open MPI installation, we recommend running three
+When verifying a new Open MPI installation, we recommend running six
 tests:
 
 1. Use "mpirun" to launch a non-MPI program (e.g., hostname or uptime)
@@ -1568,7 +1647,17 @@ tests:
    receives a few MPI messages (e.g., the ring_c program in the
   examples/ directory in the Open MPI distribution).
 
-If you can run all three of these tests successfully, that is a good
+4. Use "oshrun" to launch a non-OSHMEM program across multiple nodes.
+
+5. Use "oshrun" to launch a trivial OSHMEM program that does no OSHMEM
+   communication (e.g., the hello_shmem.c program in the examples/
+   directory in the Open MPI distribution).
+
+6. Use "oshrun" to launch a trivial OSHMEM program that puts and gets
+   a few messages (e.g., the ring_shmem.c program in the examples/
+   directory in the Open MPI distribution).
+
+If you can run all six of these tests successfully, that is a good
 indication that Open MPI built and installed properly.
 
 ===========================================================================
@@ -1644,17 +1733,22 @@ Compiling Open MPI Applications
 -------------------------------
 
 Open MPI provides "wrapper" compilers that should be used for
-compiling MPI applications:
+compiling MPI and OSHMEM applications:
 
-C:       mpicc
-C++:     mpiCC (or mpic++ if your filesystem is case-insensitive)
-Fortran: mpifort
+C:       mpicc, oshcc
+C++:     mpiCC, oshCC (or mpic++ if your filesystem is case-insensitive)
+Fortran: mpifort, oshfort
 
 For example:
 
   shell$ mpicc hello_world_mpi.c -o hello_world_mpi -g
   shell$
 
+For OSHMEM applications:
+
+  shell$ oshcc hello_shmem.c -o hello_shmem -g
+  shell$
+
 All the wrapper compilers do is add a variety of compiler and linker
 flags to the command line and then invoke a back-end compiler.  To be
 specific: the wrapper compilers do not parse source code at all; they
@@ -1701,7 +1795,7 @@ Running Open MPI Applications
 -----------------------------
 
 Open MPI supports both mpirun and mpiexec (they are exactly
-equivalent).  For example:
+equivalent) to launch MPI applications.  For example:
 
   shell$ mpirun -np 2 hello_world_mpi
 or
@@ -1750,6 +1844,17 @@ Note that the values of component parameters can be changed on the
 mpirun / mpiexec command line.  This is explained in the section
 below, "The Modular Component Architecture (MCA)".
 
+Open MPI supports oshrun to launch OSHMEM applications.  For example:
+
+  shell$ oshrun -np 2 hello_world_oshmem
+
+OSHMEM applications may also be launched directly by resource managers
+such as SLURM.  For example, when OMPI is configured with --with-pmi and
+--with-slurm, one may launch OSHMEM applications via srun:
+
+  shell$ srun -N 2 hello_world_oshmem
+
+
 ===========================================================================
 
 The Modular Component Architecture (MCA)
@@ -1790,6 +1895,17 @@ sharedfp - shared file pointer operations for MPI I/O
 topo      - MPI topology routines
 vprotocol - Protocols for the "v" PML
 
+OSHMEM component frameworks:
+----------------------------
+
+atomic    - OSHMEM atomic operations
+memheap   - OSHMEM memory allocators that support the
+            PGAS memory model
+scoll     - OSHMEM collective operations
+spml      - OSHMEM "pml-like" layer: supports one-sided,
+            point-to-point operations
+
+
 Back-end run-time environment (RTE) component frameworks:
 ---------------------------------------------------------
 
@@ -1826,7 +1942,7 @@ memchecker - Run-time memory checking
 memcpy     - Memopy copy support
 memory     - Memory management hooks
 pstat      - Process status
-shmem      - Shared memory support
+shmem      - Shared memory support (NOT related to OSHMEM)
 timer      - High-resolution timers
 
 ---------------------------------------------------------------------------
@@ -1860,18 +1976,18 @@ MPI, we have interpreted these nine levels as three groups of three:
    5. Application tuner / detailed
    6. Application tuner / all
-   7. MPI developer / basic
-   8. MPI developer / detailed
-   9. MPI developer / all
+   7. MPI/OSHMEM developer / basic
+   8. MPI/OSHMEM developer / detailed
+   9. MPI/OSHMEM developer / all
 
 Here's how the three sub-groups are defined:
 
  1. End user: Generally, these are parameters that are required for
     correctness, meaning that someone may need to set these just to
-    get their MPI application to run correctly.
+    get their MPI/OSHMEM application to run correctly.
 
  2. Application tuner: Generally, these are parameters that can be
     used to tweak MPI application performance.
 
- 3. MPI developer: Parameters that either don't fit in the other two,
+ 3. MPI/OSHMEM developer: Parameters that either don't fit in the other two,
     or are specifically intended for debugging / development of Open
     MPI itself.
@@ -1916,7 +2032,7 @@ values of parameters:
        shell$ OMPI_MCA_btl_tcp_frag_size=65536
        shell$ export OMPI_MCA_btl_tcp_frag_size
 
-4. the mpirun command line: --mca <name> <value>
+4. the mpirun/oshrun command line: --mca <name> <value>
 
    Where <name> is the name of the parameter.  For example:
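
As a hedged sketch of this usage only (the parameter, component, and
program names are reused from examples elsewhere in this README, and the
values shown are illustrative rather than required), setting a parameter
or selecting a component on the mpirun/oshrun command line might look
like:

  shell$ mpirun --mca btl_tcp_frag_size 65536 -np 2 hello_world_mpi
  shell$ oshrun --mca spml ikrit -np 2 hello_world_oshmem

These mirror the "--mca pml cm" invocation shown in the Network Support
section above.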