/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil -*- */
/*
 * Copyright (c) 2004-2010 The Trustees of Indiana University and Indiana
 *                         University Research and Technology
 *                         Corporation.  All rights reserved.
 * Copyright (c) 2004-2011 The University of Tennessee and The University
 *                         of Tennessee Research Foundation.  All rights
 *                         reserved.
 * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
 *                         University of Stuttgart.  All rights reserved.
 * Copyright (c) 2004-2005 The Regents of the University of California.
 *                         All rights reserved.
 * Copyright (c) 2006-2015 Cisco Systems, Inc.  All rights reserved.
 * Copyright (c) 2006-2014 Los Alamos National Security, LLC.  All rights
 *                         reserved.
 * Copyright (c) 2006      University of Houston.  All rights reserved.
 * Copyright (c) 2009      Sun Microsystems, Inc.  All rights reserved.
 * Copyright (c) 2011      Sandia National Laboratories. All rights reserved.
 * Copyright (c) 2014-2015 Intel, Inc. All rights reserved.
 *
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
 *
 * $HEADER$
 */

#include "ompi_config.h"

#ifdef HAVE_SYS_TYPES_H
#include <sys/types.h>
#endif
#ifdef HAVE_UNISTD_H
#include <unistd.h>
#endif
#ifdef HAVE_SYS_PARAM_H
#include <sys/param.h>
#endif
#ifdef HAVE_NETDB_H
#include <netdb.h>
#endif

#include "opal/mca/event/event.h"
#include "opal/util/output.h"
#include "opal/runtime/opal_progress.h"
#include "opal/mca/base/base.h"
#include "opal/sys/atomic.h"
#include "opal/runtime/opal.h"
#include "opal/util/show_help.h"
#include "opal/mca/mpool/base/base.h"
#include "opal/mca/mpool/base/mpool_base_tree.h"
#include "opal/mca/rcache/base/base.h"
#include "opal/mca/allocator/base/base.h"
#include "opal/mca/pmix/pmix.h"
#include "opal/util/timings.h"

#include "mpi.h"
#include "ompi/constants.h"
#include "ompi/errhandler/errcode.h"
#include "ompi/communicator/communicator.h"
#include "ompi/datatype/ompi_datatype.h"
#include "ompi/message/message.h"
#include "ompi/op/op.h"
#include "ompi/file/file.h"
#include "ompi/info/info.h"
#include "ompi/runtime/mpiruntime.h"
#include "ompi/attribute/attribute.h"
#include "ompi/mca/pml/pml.h"
#include "ompi/mca/bml/bml.h"
#include "ompi/mca/pml/base/base.h"
#include "ompi/mca/bml/base/base.h"
#include "ompi/mca/osc/base/base.h"
#include "ompi/mca/coll/base/base.h"
#include "ompi/mca/rte/rte.h"
#include "ompi/mca/rte/base/base.h"
#include "ompi/mca/topo/base/base.h"
#include "ompi/mca/io/io.h"
#include "ompi/mca/io/base/base.h"
#include "ompi/mca/pml/base/pml_base_bsend.h"
#include "ompi/runtime/params.h"
#include "ompi/dpm/dpm.h"
#include "ompi/mpiext/mpiext.h"

#if OPAL_ENABLE_FT_CR == 1
#include "ompi/mca/crcp/crcp.h"
#include "ompi/mca/crcp/base/base.h"
#endif
#include "ompi/runtime/ompi_cr.h"

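/* Timing controls set elsewhere in the runtime; they determine whether the
   OPAL_TIMING_* probes below report how long the finalize barrier took. */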
extern bool ompi_enable_timing;
extern bool ompi_enable_timing_ext;

int ompi_mpi_finalize(void)
{
    int ret = MPI_SUCCESS;
    opal_list_item_t *item;
    ompi_proc_t** procs;
    size_t nprocs;
    OPAL_TIMING_DECLARE(tm);
    OPAL_TIMING_INIT_EXT(&tm, OPAL_TIMING_GET_TIME_OF_DAY);

    /* Be a bit social if an erroneous program calls MPI_FINALIZE in
       two different threads, otherwise we may deadlock in
       ompi_comm_free() (or run into other nasty lions, tigers, or
       bears).

       This lock is held for the duration of ompi_mpi_init() and
       ompi_mpi_finalize().  Hence, if we get it, then no other thread
       is inside the critical section (and we don't have to check the
       *_started bool variables). */
    opal_mutex_lock(&ompi_mpi_bootstrap_mutex);
    if (!ompi_mpi_initialized || ompi_mpi_finalized) {
        /* Note that if we're not initialized or already finalized, we
           cannot raise an MPI exception.  The best that we can do is
           write something to stderr. */
        char hostname[MAXHOSTNAMELEN];
        pid_t pid = getpid();
        gethostname(hostname, sizeof(hostname));

        if (ompi_mpi_initialized) {
            opal_show_help("help-mpi-runtime.txt",
                           "mpi_finalize: not initialized",
                           true, hostname, pid);
        } else if (ompi_mpi_finalized) {
            opal_show_help("help-mpi-runtime.txt",
                           "mpi_finalize:invoked_multiple_times",
                           true, hostname, pid);
        }
        opal_mutex_unlock(&ompi_mpi_bootstrap_mutex);
        return MPI_ERR_OTHER;
    }
    ompi_mpi_finalize_started = true;

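    /* Shut down any registered MPI extensions (ompi/mpiext) before the rest
       of the teardown proceeds. */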
    ompi_mpiext_fini();

    /* Per MPI-2:4.8, we have to free MPI_COMM_SELF before doing
       anything else in MPI_FINALIZE (to include setting up such that
       MPI_FINALIZED will return true). */

    if (NULL != ompi_mpi_comm_self.comm.c_keyhash) {
        ompi_attr_delete_all(COMM_ATTR, &ompi_mpi_comm_self,
                             ompi_mpi_comm_self.comm.c_keyhash);
        OBJ_RELEASE(ompi_mpi_comm_self.comm.c_keyhash);
        ompi_mpi_comm_self.comm.c_keyhash = NULL;
    }

    /* Proceed with MPI_FINALIZE */

    ompi_mpi_finalized = true;

    /* As finalize is the last legal MPI call, we are allowed to force the release
     * of the user buffer used for bsend, before going anywhere further.
     */
    (void)mca_pml_base_bsend_detach(NULL, NULL);

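    /* With no separate progress thread, tell opal_progress to run the event
       loop one pass at a time, non-blocking, from here on. */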
#if OPAL_ENABLE_PROGRESS_THREADS == 0
    opal_progress_set_event_flag(OPAL_EVLOOP_ONCE | OPAL_EVLOOP_NONBLOCK);
#endif

    /* Redo ORTE calling opal_progress_event_users_increment() during
       MPI lifetime, to get better latency when not using TCP */
    opal_progress_event_users_increment();

    /* check to see if we want timing information */
    OPAL_TIMING_MSTART((&tm,"time to execute finalize barrier"));

    /* NOTE: MPI-2.1 requires that MPI_FINALIZE is "collective" across
       *all* connected processes.  This only means that all processes
       have to call it.  It does *not* mean that all connected
       processes need to synchronize (either directly or indirectly).

       For example, it is quite easy to construct complicated
       scenarios where one job is "connected" to another job via
       transitivity, but the two have no direct knowledge of each other.
       Consider the following case: job A spawns job B, and job B
       later spawns job C.  A "connectedness" graph looks something
       like this:

           A <--> B <--> C

       So what are we *supposed* to do in this case?  If job A is
       still connected to B when it calls FINALIZE, should it block
       until jobs B and C also call FINALIZE?

       After lengthy discussions many times over the course of this
       project, the issue was finally decided at the Louisville Feb
       2009 meeting: no.

       Rationale:

       - "Collective" does not mean synchronizing.  It only means that
         every process calls it.  Hence, in this scenario, every
         process in A, B, and C must call FINALIZE.

       - KEY POINT: if A calls FINALIZE, then it is erroneous for B or
         C to try to communicate with A again.

       - Hence, OMPI is *correct* to only effect a barrier across each
         job's MPI_COMM_WORLD before exiting.  Specifically, if A
         calls FINALIZE long before B or C, it's *correct* if A exits
         at any time (and doesn't notify B or C that it is exiting).

       - Arguably, if B or C do try to communicate with the now-gone
         A, OMPI should try to print a nice error ("you tried to
         communicate with a job that is already gone...") instead of
         segv or other Badness.  However, that is an *extremely*
         difficult problem -- sure, it's easy for A to tell B that it
         is finalizing, but how can A tell C?  A doesn't even know
         about C.  You'd need to construct a "connected" graph in a
         distributed fashion, which is fraught with race conditions,
         etc.

       Hence, our conclusion is: OMPI is *correct* in its current
       behavior (of only doing a barrier across its own COMM_WORLD)
       before exiting.  Any problems that occur are the result of
       erroneous MPI applications.  We *could* tighten up the erroneous
       cases and ensure that we print nice error messages / don't
       crash, but that is such a difficult problem that we decided we
       have many other, much higher priority issues to handle that deal
       with non-erroneous cases. */

    /* Wait for everyone to reach this point.  This is a grpcomm
       barrier instead of an MPI barrier for (at least) two reasons:

       1. An MPI barrier doesn't ensure that all messages have been
          transmitted before exiting (e.g., a BTL can lie and buffer a
          message without actually injecting it to the network, and
          therefore require further calls to that BTL's progress), so
          the possibility of a stranded message exists.

       2. If the MPI communication is using an unreliable transport,
          there's a problem of knowing that everyone has *left* the
          barrier.  E.g., one proc can send its ACK to the barrier
          message to a peer and then leave the barrier, but the ACK
          can get lost and therefore the peer is left in the barrier.

       Point #1 has been known for a long time; point #2 emerged after
       we added the first unreliable BTL to Open MPI and fixed the
       del_procs behavior around May of 2014 (see
       https://svn.open-mpi.org/trac/ompi/ticket/4669#comment:4 for
       more details). */

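    /* This PMIx fence is the runtime-level barrier described above. */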
    opal_pmix.fence(NULL, 0);

    /* check for timing request - get stop time and report elapsed
       time if so */
    OPAL_TIMING_MSTOP(&tm);
    OPAL_TIMING_DELTAS(ompi_enable_timing, &tm);
    OPAL_TIMING_REPORT(ompi_enable_timing_ext, &tm);
    OPAL_TIMING_RELEASE(&tm);

    /*
     * Shutdown the Checkpoint/Restart Mech.
     */
    if (OMPI_SUCCESS != (ret = ompi_cr_finalize())) {
        OMPI_ERROR_LOG(ret);
    }

    /* Shut down any bindings-specific issues: C++, F77, F90 */

    /* Remove all memory associated with MPI_REGISTER_DATAREP (per
       MPI-2:9.5.3, there is no way for an MPI application to
       *un*register datareps, but we don't want the OMPI layer causing
       memory leaks). */
    while (NULL != (item = opal_list_remove_first(&ompi_registered_datareps))) {
        OBJ_RELEASE(item);
    }
    OBJ_DESTRUCT(&ompi_registered_datareps);

    /* Remove all F90 types from the hash tables. As the OBJ_DESTRUCT will
     * call a special destructor able to release predefined types, we can
     * simply call the OBJ_DESTRUCT on the hash table and all memory will
     * be correctly released.
     */
    OBJ_DESTRUCT( &ompi_mpi_f90_integer_hashtable );
    OBJ_DESTRUCT( &ompi_mpi_f90_real_hashtable );
    OBJ_DESTRUCT( &ompi_mpi_f90_complex_hashtable );

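    /* From here on, each subsystem is shut down in dependency order; any
       failure jumps to the common exit path below so the bootstrap mutex is
       always released and the error code returned. */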
    /* Free communication objects */

    /* free file resources */
    if (OMPI_SUCCESS != (ret = ompi_file_finalize())) {
        goto done;
    }

    /* free window resources */
    if (OMPI_SUCCESS != (ret = ompi_win_finalize())) {
        goto done;
    }
    if (OMPI_SUCCESS != (ret = ompi_osc_base_finalize())) {
        goto done;
    }

    /* free communicator resources. this MUST come before finalizing the PML
     * as this will call into the pml */
    if (OMPI_SUCCESS != (ret = ompi_comm_finalize())) {
        goto done;
    }

    /* call del_procs on all allocated procs even though some may not be known
     * to the pml layer. the pml layer is expected to be resilient and ignore
     * any unknown procs. */
    nprocs = 0;
    procs = ompi_proc_get_allocated (&nprocs);
    MCA_PML_CALL(del_procs(procs, nprocs));
    free(procs);

    /* free pml resource */
    if (OMPI_SUCCESS != (ret = mca_pml_base_finalize())) {
        goto done;
    }

    /* free requests */
    if (OMPI_SUCCESS != (ret = ompi_request_finalize())) {
        goto done;
    }

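    /* free MPI_Message (matched-probe) resources */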
    if (OMPI_SUCCESS != (ret = ompi_message_finalize())) {
        goto done;
    }

    /* If requested, print out a list of memory allocated by ALLOC_MEM
       but not freed by FREE_MEM */
    if (0 != ompi_debug_show_mpi_alloc_mem_leaks) {
        mca_mpool_base_tree_print(ompi_debug_show_mpi_alloc_mem_leaks);
    }

    /* Now that all MPI objects dealing with communications are gone,
       shut down MCA types having to do with communications */
    if (OMPI_SUCCESS != (ret = mca_base_framework_close(&ompi_pml_base_framework))) {
        OMPI_ERROR_LOG(ret);
        goto done;
    }

    /* shut down buffered send code */
    mca_pml_base_bsend_fini();

#if OPAL_ENABLE_FT_CR == 1
    /*
     * Shutdown the CRCP Framework, must happen after PML shutdown
     */
    if (OMPI_SUCCESS != (ret = mca_base_framework_close(&ompi_crcp_base_framework))) {
        OMPI_ERROR_LOG(ret);
        goto done;
    }
#endif

    /* Free secondary resources */

    /* free attr resources */
    if (OMPI_SUCCESS != (ret = ompi_attr_finalize())) {
        goto done;
    }

    /* free group resources */
    if (OMPI_SUCCESS != (ret = ompi_group_finalize())) {
        goto done;
    }

    /* finalize the DPM subsystem */
    if (OMPI_SUCCESS != (ret = ompi_dpm_finalize())) {
        goto done;
    }

    /* free internal error resources */
    if (OMPI_SUCCESS != (ret = ompi_errcode_intern_finalize())) {
        goto done;
    }

    /* free error code resources */
    if (OMPI_SUCCESS != (ret = ompi_mpi_errcode_finalize())) {
        goto done;
    }

    /* free errhandler resources */
    if (OMPI_SUCCESS != (ret = ompi_errhandler_finalize())) {
        goto done;
    }

    /* Free all other resources */

    /* free op resources */
    if (OMPI_SUCCESS != (ret = ompi_op_finalize())) {
        goto done;
    }

    /* free ddt resources */
    if (OMPI_SUCCESS != (ret = ompi_datatype_finalize())) {
        goto done;
    }

    /* free info resources */
    if (OMPI_SUCCESS != (ret = ompi_info_finalize())) {
        goto done;
    }

    /* Close down MCA modules */

    /* io is opened lazily, so it's only necessary to close it if it
       was actually opened */
    if (0 < ompi_io_base_framework.framework_refcnt) {
        /* May have been "opened" multiple times. We want it closed now */
        ompi_io_base_framework.framework_refcnt = 1;

        if (OMPI_SUCCESS != mca_base_framework_close(&ompi_io_base_framework)) {
            goto done;
        }
    }

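    /* Close the remaining communication-related frameworks; the topo close
       result is intentionally ignored (void cast), the rest abort finalize
       on error. */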
    (void) mca_base_framework_close(&ompi_topo_base_framework);
    if (OMPI_SUCCESS != (ret = mca_base_framework_close(&ompi_osc_base_framework))) {
        goto done;
    }
    if (OMPI_SUCCESS != (ret = mca_base_framework_close(&ompi_coll_base_framework))) {
        goto done;
    }
    if (OMPI_SUCCESS != (ret = mca_base_framework_close(&ompi_bml_base_framework))) {
        goto done;
    }
    if (OMPI_SUCCESS != (ret = mca_base_framework_close(&opal_mpool_base_framework))) {
        goto done;
    }
    if (OMPI_SUCCESS != (ret = mca_base_framework_close(&opal_rcache_base_framework))) {
        goto done;
    }
    if (OMPI_SUCCESS != (ret = mca_base_framework_close(&opal_allocator_base_framework))) {
        goto done;
    }

    /* free proc resources */
    if (OMPI_SUCCESS != (ret = ompi_proc_finalize())) {
        goto done;
    }

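    /* Release the object that tracks the thread which called MPI_INIT */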
    if (NULL != ompi_mpi_main_thread) {
        OBJ_RELEASE(ompi_mpi_main_thread);
        ompi_mpi_main_thread = NULL;
    }

    /* Clean up memory/resources from the MPI dynamic process
       functionality checker */
    ompi_mpi_dynamics_finalize();

    /* Leave the RTE */

    if (OMPI_SUCCESS != (ret = ompi_rte_finalize())) {
        goto done;
    }
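    /* Flag that the RTE is no longer available so later error/abort paths
       do not try to use it. */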
    ompi_rte_initialized = false;

    /* now close the rte framework */
    if (OMPI_SUCCESS != (ret = mca_base_framework_close(&ompi_rte_base_framework))) {
        OMPI_ERROR_LOG(ret);
        goto done;
    }

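    /* Finally, release our reference to the OPAL utility layer (paired with
       the matching opal_init_util() made during startup) */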
    if (OPAL_SUCCESS != (ret = opal_finalize_util())) {
        goto done;
    }

    /* All done */

 done:
    opal_mutex_unlock(&ompi_mpi_bootstrap_mutex);

    return ret;
}