Copyright (c) 2004-2005 The Trustees of Indiana University.
                        All rights reserved.
Copyright (c) 2004-2005 The Trustees of the University of Tennessee.
                        All rights reserved.
Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
                        University of Stuttgart.  All rights reserved.
Copyright (c) 2004-2005 The Regents of the University of California.
                        All rights reserved.
$COPYRIGHT$

Additional copyrights may follow

$HEADER$

This is a preliminary README file.  It will be scrubbed formally
before release.

The best way to report bugs, send comments, or ask questions is to
sign up on the user's and/or developer's mailing list (for user-level
and developer-level questions; when in doubt, send to the user's
list):

        users@open-mpi.org
        devel@open-mpi.org

Because of spam, only subscribers are allowed to post to these lists
(ensure that you subscribe with and post from exactly the same e-mail
address -- joe@example.com is considered different than
joe@mycomputer.example.com!).  Visit these pages to subscribe to the
lists:

     http://www.open-mpi.org/mailman/listinfo.cgi/users
     http://www.open-mpi.org/mailman/listinfo.cgi/devel

Thanks for your time.

===========================================================================

The following abbreviated list of release notes applies to this code
base as of this writing (8 Aug 2005):

- The Open MPI installation must be in your PATH on all nodes (and
  potentially LD_LIBRARY_PATH, if libmpi is a shared library).

- LAM/MPI-like mpirun notation of "C" and "N" is not yet supported.

- Shared memory support will not function properly on machines that
  have a weak memory consistency model.  The default in this beta is
  to disable shared memory support on all Power PC architectures, even
  though some Power PC platforms have strong memory consistency
  models.  See the description of the --enable-ptl-sm configure flag,
  below.
- Striping MPI messages across multiple networks is supported (and
  happens automatically when multiple networks are available), but
  needs performance tuning.

- The only run-time systems currently supported are:
  - rsh / ssh
  - Recent versions of BProc

- Complete user and system administrator documentation is missing
  (this file comprises the majority of the current user
  documentation).

- The Fortran 90 MPI API is disabled by default (we have only been
  able to get it to work with gfortran).  You can enable it with
  configure options; see below.

- Missing MPI functionality:
  - MPI-2 one-sided functionality will not be included in the first
    few releases of Open MPI.

- Systems that have been tested are:
  - Linux, 32 bit, with gcc
  - OS X (10.3), 32 bit, with gcc
  - OS X (10.4), 32 bit, with gcc

- Other systems have been lightly (but not fully) tested:
  - Other compilers on Linux, 32 bit
  - 64 bit platforms (AMD, PPC64, Sparc); they "mostly work", but
    there are still some known issues

- There are some cases where after running MPI applications, the
  directory /tmp/openmpi-sessions-<username>@<hostname>* will exist
  (but will likely be empty).  It is safe to remove after the run is
  complete.

- The MPI and run-time layers do not free all used memory properly
  during MPI_FINALIZE.

- Running on nodes with different endian and/or different datatype
  sizes within a single parallel application is not supported in this
  beta.

- Threading support (both asynchronous progress and
  MPI_THREAD_MULTIPLE) is included, but is only lightly tested.

===========================================================================

Building Open MPI
-----------------

Open MPI uses a traditional configure script paired with "make" to
build.  Typical installs can be of the pattern:

---------------------------------------------------------------------------
shell$ ./configure [...options...]
shell$ make all install
---------------------------------------------------------------------------

There are many available configure options (see "./configure --help"
for a full list); a summary of the more important ones follows:

--prefix=<directory>
  Install Open MPI into the base directory named <directory>.  Hence,
  Open MPI will place its executables in <directory>/bin, its header
  files in <directory>/include, its libraries in <directory>/lib, etc.

--with-btl-gm=<directory>
  Specify the directory where the GM libraries and header files are
  located.  This enables GM support in Open MPI.

--with-btl-mx=<directory>
  Specify the directory where the MX libraries and header files are
  located.  This enables MX support in Open MPI.

--with-btl-mvapi=<directory>
  Specify the directory where the mVAPI libraries and header files are
  located.  This enables mVAPI support in Open MPI.

--with-btl-openib=<directory>
  Specify the directory where the Open IB libraries and header files
  are located.  This enables Open IB support in Open MPI.

--with-mpi-param-check(=value)
  "value" can be one of: always, never, runtime.  If no value is
  specified, or this option is not used, "always" is the default.
  Using --without-mpi-param-check is equivalent to "never".
  - always: the parameters of MPI functions are always checked for
    errors
  - never: the parameters of MPI functions are never checked for
    errors
  - runtime: whether the parameters of MPI functions are checked
    depends on the value of the MCA parameter mpi_param_check
    (default: yes).

--with-threads=value
  Since thread support (both support for MPI_THREAD_MULTIPLE and
  asynchronous progress) is only partially tested, it is disabled by
  default.  To enable threading, use "--with-threads=posix".  This is
  most useful when combined with --enable-mpi-threads and/or
  --enable-progress-threads.

--enable-mpi-threads
  Allows the MPI thread level MPI_THREAD_MULTIPLE.  See
  --with-threads; this is currently disabled by default.

--enable-progress-threads
  Allows asynchronous progress in some transports.  See
  --with-threads; this is currently disabled by default.

--enable-f90
  Enable building the Fortran 90 MPI bindings (disabled by default).
  We have only been able to get these bindings to build with
  gfortran.  See also the --with-f90-max-array-dim option.

--with-f90-max-array-dim=<dim>
  The F90 MPI bindings are strictly typed, even including the number
  of dimensions for arrays for MPI choice buffer parameters.  Open MPI
  generates these bindings at compile time with a maximum number of
  dimensions as specified by this parameter.  The default value is 4.

--disable-shared
  By default, libmpi is built as a shared library, and all components
  are built as dynamic shared objects (DSOs).  This switch disables
  this default; it is really only useful when used with
  --enable-static.

--enable-static
  Build libmpi as a static library, and statically link in all
  components.

There are other options available -- see "./configure --help".

Open MPI supports all the "make" targets that are provided by GNU
Automake, such as:

all       - build the entire package
install   - install the package
uninstall - remove all traces of the package from the $prefix
clean     - clean out the build tree

Once Open MPI has been built and installed, it is safe to run "make
clean" and/or remove the entire build tree.

VPATH builds are fully supported.

Generally speaking, the only thing that users need to do to use Open
MPI is ensure that <prefix>/bin is in their PATH.  Users may need to
ensure that this directory is set in their PATH in their shell setup
files (e.g., .bashrc, .cshrc) so that rsh/ssh-based logins will be
able to find the Open MPI executables.

Setting LD_LIBRARY_PATH is typically not necessary, but in some cases,
if libmpi.so cannot be found when MPI applications are run,
<prefix>/lib should be added to LD_LIBRARY_PATH.

===========================================================================

Checking Your Open MPI Installation
-----------------------------------

The "ompi_info" command can be used to check the status of your Open
MPI installation (located in <prefix>/bin/ompi_info).
Running it with no arguments provides a summary of information about
your Open MPI installation.

Note that the ompi_info command is extremely helpful in determining
which components are installed as well as listing all the run-time
settable parameters that are available in each component (as well as
their default values).

The following options may be helpful:

--all       Show a *lot* of information about your Open MPI
            installation.
--parsable  Display all the information in an easily
            grep/cut/awk/sed-able format.
--param <framework> <component>
            A <framework> of "all" and a <component> of "all" will
            show all parameters to all components.  Otherwise, the
            parameters of all the components in a specific framework,
            or just the parameters of a specific component can be
            displayed by using an appropriate <framework> and/or
            <component> name.

Changing the values of these parameters is explained in the "The
Modular Component Architecture (MCA)" section, below.

===========================================================================

Compiling Open MPI Applications
-------------------------------

Open MPI provides "wrapper" compilers that should be used for
compiling MPI applications:

C:          mpicc
C++:        mpiCC (or mpic++ if your filesystem is case-insensitive)
Fortran 77: mpif77
Fortran 90: mpif90

For example:

  shell$ mpicc hello_world_mpi.c -o hello_world_mpi -g
  shell$

All the wrapper compilers do is add a variety of compiler and linker
flags to the command line and then invoke a back-end compiler.  The
end result is an MPI executable that is properly linked to all the
relevant libraries.

===========================================================================

Running Open MPI Applications
-----------------------------

Open MPI supports both mpirun and mpiexec (they are actually the
same).  For example:

  shell$ mpirun -np 2 hello_world_mpi

or

  shell$ mpiexec -np 1 hello_world_mpi : -np 1 hello_world_mpi

are equivalent.  Many of mpiexec's switches (such as -host and -arch)
are not yet functional, although they will not error if you try to use
them.
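
To see why the two commands above are equivalent, note that ":"
separates independent application contexts, each with its own -np
count.  The Python sketch below is a simplified model of that syntax
only; it is not Open MPI's argument parser.  The function names
split_contexts and total_processes are hypothetical, and the sketch
assumes every context carries an explicit -np switch:

```python
def split_contexts(args):
    """Split an mpirun/mpiexec-style argument list on ":" separators."""
    contexts, current = [], []
    for token in args:
        if token == ":":
            contexts.append(current)
            current = []
        else:
            current.append(token)
    contexts.append(current)
    return contexts


def total_processes(args):
    """Sum the -np counts over all application contexts.

    Assumption of this sketch: each context has an explicit "-np <n>".
    """
    total = 0
    for context in split_contexts(args):
        if "-np" not in context:
            raise ValueError("this sketch requires -np in every context")
        total += int(context[context.index("-np") + 1])
    return total


single = "-np 2 hello_world_mpi".split()
mpmd = "-np 1 hello_world_mpi : -np 1 hello_world_mpi".split()
print(total_processes(single), total_processes(mpmd))  # prints "2 2"
```

Both argument lists describe two processes of the same executable,
which is all that the equivalence claimed above amounts to.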

Since rsh is probably the launcher that you will be using (if you are
outside of Los Alamos National Laboratory), you can also specify a
-hostfile parameter, indicating a standard mpirun-style hostfile (one
hostname per line):

  shell$ mpirun -hostfile my_hostfile -np 2 hello_world_mpi

If you intend to run more than one process on a node, the hostfile can
use the "slots" attribute.  If "slots" is not specified, a count of 1
is assumed.  For example, using the following hostfile:

---------------------------------------------------------------------------
node1.example.com
node2.example.com
node3.example.com slots=2
node4.example.com slots=4
---------------------------------------------------------------------------

  shell$ mpirun -hostfile my_hostfile -np 8 hello_world_mpi

will launch MPI_COMM_WORLD rank 0 on node1, rank 1 on node2, ranks 2
and 3 on node3, and ranks 4 through 7 on node4.

Note that the values of component parameters can be changed on the
mpirun / mpiexec command line.  This is explained in the section
below, "The Modular Component Architecture (MCA)".

===========================================================================

The Modular Component Architecture (MCA)
----------------------------------------

The MCA is the backbone of Open MPI -- most services and functionality
are implemented through MCA components.
Here is a list of all the component frameworks in Open MPI:

---------------------------------------------------------------------------
MPI component frameworks:
-------------------------

coll      - MPI collective algorithms
io        - MPI-2 I/O
pml       - MPI point-to-point management layer
bml       - BTL management layer
btl       - MPI point-to-point byte transfer layer
topo      - MPI topology routines

Back-end run-time environment component frameworks:
---------------------------------------------------

errmgr    - RTE error manager
gpr       - General purpose registry
iof       - I/O forwarding
ns        - Name server
oob       - Out of band messaging
pls       - Process launch system
ras       - Resource allocation system
rds       - Resource discovery system
rmaps     - Resource mapping system
rmgr      - Resource manager
rml       - RTE message layer
soh       - State of health monitor

Miscellaneous frameworks:
-------------------------

allocator - Memory allocator
mpool     - Memory pooling
paffinity - Processor affinity
timer     - High-resolution timers
memory    - Memory subsystem hooks
---------------------------------------------------------------------------

Each framework typically has one or more components that are used at
run-time.  For example, the btl framework is used by MPI to send bytes
across underlying networks.  The tcp btl, for example, sends messages
across TCP-based networks; the gm btl sends messages across GM
Myrinet-based networks.

Each component typically has some tunable parameters that can be
changed at run-time.  Use the ompi_info command to check a component
to see what its tunable parameters are.  For example:

  shell$ ompi_info --param btl tcp

shows all the parameters (and default values) for the tcp btl
component.

These values can be overridden at run-time in several ways.  At
run-time, the following locations are examined (in order) for new
values of parameters:

1. <prefix>/etc/openmpi-mca-params.conf

   This file is intended to set any system-wide default MCA parameter
   values -- it will apply, by default, to all users who use this Open
   MPI installation.  The default file that is installed contains many
   comments explaining its format.

2. $HOME/.openmpi/mca-params.conf

   If this file exists, it should be in the same format as
   <prefix>/etc/openmpi-mca-params.conf.  It is intended to provide
   per-user default parameter values.

3. environment variables of the form OMPI_MCA_<name> set equal to a
   <value>

   Where <name> is the name of the parameter.  For example, set the
   variable named OMPI_MCA_btl_tcp_frag_size to the value 65536
   (Bourne-style shells):

     shell$ OMPI_MCA_btl_tcp_frag_size=65536
     shell$ export OMPI_MCA_btl_tcp_frag_size

4. the mpirun command line: --mca <name> <value>

   Where <name> is the name of the parameter.  For example:

     shell$ mpirun --mca btl_tcp_frag_size 65536 -np 2 hello_world_mpi

These locations are checked in order, and a value found in a later
location overrides one found in an earlier location.  For example, a
parameter value passed on the mpirun command line will override an
environment variable; an environment variable will override the
system-wide defaults.

===========================================================================

Common Questions
----------------

1. How do I change the rsh/ssh launcher to use rsh?

   The default remote shell agent for the rsh/ssh launcher is ssh, but
   you can set an MCA parameter to change it to rsh (or use a specific
   path for ssh, pass different parameters to rsh/ssh, etc.).  The MCA
   parameter name is pls_rsh_agent.  You can use any of the methods
   for setting MCA parameters described above; for example:

     shell$ mpirun --mca pls_rsh_agent rsh -np 4 a.out

2. When I run "make", it looks very much like the build system is
   going into a loop, and I see messages similar to:

     Warning: File `Makefile.am' has modification time 3.6e+04 s in the future

   Open MPI uses an Automake-based build system, and is therefore
   highly dependent upon filesystem timestamps.
   If building on a networked file system, you *must* ensure that the
   time of the machine that you are building on is tightly
   synchronized with the time on your network fileserver (e.g., using
   ntp).  If this is not possible, you will need to build Open MPI on
   a non-networked filesystem.

...we'll add more questions here as they are asked by real users.

===========================================================================

Got more questions?
-------------------

Found a bug?  Got a question?  Want to make a suggestion?  Want to
contribute to Open MPI?  Please let us know!

User-level questions and comments should generally be sent to the
user's mailing list (users@open-mpi.org).  Because of spam, only
subscribers are allowed to post to this list (ensure that you
subscribe with and post from *exactly* the same e-mail address --
joe@example.com is considered different than
joe@mycomputer.example.com!).  Visit this page to subscribe to the
user's list:

     http://www.open-mpi.org/mailman/listinfo.cgi/users

Developer-level bug reports, questions, and comments should generally
be sent to the developer's mailing list (devel@open-mpi.org).  Please
do not post the same question to both lists.  As with the user's list,
only subscribers are allowed to post to the developer's list.  Visit
the following web page to subscribe:

     http://www.open-mpi.org/mailman/listinfo.cgi/devel

When submitting bug reports to either list, be sure to include the
following information in your mail (please compress!):

- the stdout and stderr from Open MPI's configure
- the top-level config.log file
- the stdout and stderr from building Open MPI
- the output from "ompi_info --all" (if possible)

For Bourne-type shells, here's one way to capture this information:

     shell$ ./configure ... 2>&1 | tee config.out
     [...lots of configure output...]
     shell$ make 2>&1 | tee make.out
     [...lots of make output...]
     shell$ mkdir ompi-output
     shell$ cp config.out config.log make.out ompi-output
     shell$ ompi_info --all 2>&1 | tee ompi-output/ompi-info.out
     shell$ tar cvf ompi-output.tar ompi-output
     [...output from tar...]
     shell$ gzip ompi-output.tar

For C shell-type shells, the procedure is only slightly different:

     shell% ./configure ... |& tee config.out
     [...lots of configure output...]
     shell% make |& tee make.out
     [...lots of make output...]
     shell% mkdir ompi-output
     shell% cp config.out config.log make.out ompi-output
     shell% ompi_info --all |& tee ompi-output/ompi-info.out
     shell% tar cvf ompi-output.tar ompi-output
     [...output from tar...]
     shell% gzip ompi-output.tar

In either case, attach the resulting ompi-output.tar.gz file to your
mail.  This provides the Open MPI developers with a lot of information
about your installation and can greatly assist us in helping with your
problem.

Be sure to also include any other useful files (in the
ompi-output.tar.gz tarball), such as output showing specific errors.
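
If a scripted alternative to the mkdir / cp / tar / gzip steps above
is more convenient, the packing step can be sketched in Python.  This
is only an illustration and not part of Open MPI; the function name
bundle_outputs and its defaults are hypothetical, chosen to mirror the
file and directory names used in the shell examples:

```python
import os
import tarfile


def bundle_outputs(paths, archive="ompi-output.tar.gz", top_dir="ompi-output"):
    """Pack the given files into a gzip-compressed tarball.

    Each file is stored as <top_dir>/<basename>, which mirrors copying
    the files into an ompi-output directory before running tar.
    """
    with tarfile.open(archive, "w:gz") as tar:
        for path in paths:
            tar.add(path, arcname=f"{top_dir}/{os.path.basename(path)}")
    return archive
```

After capturing the configure, make, and ompi_info output as shown
above, a call such as bundle_outputs(["config.out", "config.log",
"make.out", "ompi-info.out"]) would produce ompi-output.tar.gz in the
current directory; any extra files showing specific errors can simply
be appended to the list.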