openmpi/ompi/runtime
Jeff Squyres 852af8b834 ompi_mpi_abort: fix corner cases, simplify logic
I recently found a case where ompi_mpi_abort() segv's:

{{{
$ mpirun --mca btl non_existent_btl_name ...
}}}

In this case, the BML init fails because we have no paths to any
peers.  It calls ompi_mpi_abort(), but this is before ompi_comm_self
has been set up.  ompi_mpi_abort() assumes that if the comm parameter
is != NULL, it can be used.  But since we aborted so early in
MPI_INIT, that's a false assumption.
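
A minimal sketch of the kind of guard this implies, for illustration
only.  All names here (sketch_comm_t, sketch_mpi_initialized,
sketch_mpi_finalized, sketch_comm_usable) are hypothetical stand-ins
and are not the actual Open MPI types, globals, or the code in this
commit.  The idea is just that a non-NULL pointer is not sufficient on
its own: the library state has to be checked as well.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a communicator; not the real Open MPI type. */
typedef struct {
    int size;
} sketch_comm_t;

/* Hypothetical state flags; assumed to be set by init/finalize. */
static bool sketch_mpi_initialized = false; /* true once init has completed */
static bool sketch_mpi_finalized = false;   /* true once finalize has begun */

/* Returns true only when it should be safe to dereference comm while
 * aborting: the pointer is non-NULL AND init has completed AND
 * finalize has not started. */
static bool sketch_comm_usable(const sketch_comm_t *comm)
{
    return comm != NULL && sketch_mpi_initialized && !sketch_mpi_finalized;
}
```

With a check like this, an abort that fires early in MPI_INIT would
skip any per-communicator work and go straight to the process-level
abort path.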

(Note that this does not happen on v1.8 because the check for
INIT/FINALIZE in ompi_mpi_abort() is a little different there.  Hence,
this is a trunk-only issue, at least for now.)

When fixing this problem, I noticed a few other problems in ompi_mpi_abort():

* the group access was incorrect (it didn't use accessor functions)
* it wasn't clear that ORTE's ompi_rte_abort_peers() returns
  NOT_IMPLEMENTED and falls through down to ompi_rte_abort()
* the check for my proc in the communicator was a little more
  complicated than necessary
* the logic for checking for aborts early in MPI_INIT wasn't right
* some comments were stale
* the hostname output in error messages would be NULL if MPI_FINALIZE
  had been invoked
* it was possible to abort, but still exit with a 0 status

This commit fixes all of the above problems, and makes the logic a
little more straightforward.  Thanks to Ralph Castain and George
Bosilca for the assists with this patch.
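
Two of the items above (the NULL hostname and the 0 exit status) can
be illustrated with a short sketch.  This is one plausible way to
handle them, not necessarily what this commit does; the helper names
sketch_abort_status and sketch_abort_hostname, and the "unknown"
placeholder string, are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical helper: an abort must never report success, so map an
 * error code of 0 to a non-zero exit status. */
static int sketch_abort_status(int errcode)
{
    return (errcode == 0) ? 1 : errcode;
}

/* Hypothetical helper: if the cached hostname is gone (for example,
 * because finalize already released it), fall back to a placeholder
 * rather than passing NULL to the error message formatter. */
static const char *sketch_abort_hostname(const char *cached)
{
    return (cached != NULL) ? cached : "unknown";
}
```

A caller would pass its error code and cached hostname through these
helpers before printing the abort message and exiting.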

This commit was SVN r32125.
2014-07-03 02:38:27 +00:00
help-mpi-runtime.txt Remove tabs for spaces, fix some error messages. 2013-03-01 19:13:06 +00:00
Makefile.am Fix longstanding issue with our multi-project support. Rather than using 2014-01-07 22:11:15 +00:00
mpiruntime.h ompi_mpi_abort had one extra argument that was never used. Clean it up. 2014-07-03 00:34:44 +00:00
ompi_cr.c MCA/base: Add new MCA variable system 2013-03-27 21:09:41 +00:00
ompi_cr.h Move the RTE framework change into the trunk. With this change, all non-CR 2013-01-27 23:25:10 +00:00
ompi_info_support.c tools: Add oshmem_info utility 2013-10-12 19:03:32 +00:00
ompi_info_support.h tools: Add oshmem_info utility 2013-10-12 19:03:32 +00:00
ompi_module_exchange.c As per the RFC: 2014-04-29 21:49:23 +00:00
ompi_module_exchange.h As per the RFC: 2014-04-29 21:49:23 +00:00
ompi_mpi_abort.c ompi_mpi_abort: fix corner cases, simplify logic 2014-07-03 02:38:27 +00:00
ompi_mpi_finalize.c finalize/disconnect: add explicit comment about why we use an RTE barrier 2014-06-26 14:31:40 +00:00
ompi_mpi_init.c Revert r32082 and r32070 - the developer's conference has decided to go a different direction on the threaded progress effort. This will involve some degree of prototyping to understand the tradeoffs prior to making a final design decision, and so we'll hold off on the final change until that is completed. 2014-06-25 20:43:28 +00:00
ompi_mpi_params.c For large scale systems, we would like to avoid doing a full modex during MPI_Init so that launch will scale a little better. At the moment, our options are somewhat limited as only a few BTLs don't immediately call modex_recv on all procs during startup. However, for those situations where someone can take advantage of it, add the ability to do a "modex on demand" retrieval of data from remote procs when we launch via mpirun. 2014-01-11 17:36:06 +00:00
ompi_mpi_preconnect.c MCA/base: Add new MCA variable system 2013-03-27 21:09:41 +00:00
params.h Since the calls to "PMI get" scale by number of procs (not nodes), it makes more sense to have the MCA param be the cutoff based on number of procs. Also, it occurred to me that this shouldn't impact the nidmap process as that is built and circulated when we launch via mpirun, not during direct launch. 2013-08-22 03:40:26 +00:00