openmpi/ompi/runtime
Ralph Castain 0995a6f3b9 Revert r29917 and replace it with a fix that resolves the thread deadlock while retaining the desired debug info. In an earlier commit, we had changed the modex accordingly:
* automatically retrieve the hostname (and all RTE info) for all procs during MPI_Init if nprocs < cutoff

* if nprocs > cutoff, retrieve the hostname (and all RTE info) for a proc upon the first call to modex_recv for that proc. This still provides the hostname for debugging purposes: we only report errors on messages, so modex_recv must already have been called to get the endpoint info

* BTLs are not to call modex_recv until they need the endpoint info for the first message, i.e., not during add_procs, so that we call it only for those procs with whom we communicate rather than for every process in the job

My understanding is that only some BTLs have been modified to meet that third requirement, but those include the Cray ones, where jobs are big enough that launch times were becoming an issue. Other BTLs would hopefully be modified as time went on and interest in using them at scale arose. In the meantime, those BTLs would call modex_recv on every proc, and we would therefore be no worse off than with the prior behavior.

This commit revises the MPI-RTE interface to pass the ompi_proc_t instead of the ompi_process_name_t for the proc so that the hostname can be easily inserted. I have advised the ORNL folks of the change.
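
The retrieval policy and the interface change described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the Open MPI implementation: every identifier prefixed with sketch_ is hypothetical, the hostname lookup is faked with a formatted string, and the case nprocs == cutoff (which the message does not specify) is treated as lazy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a per-process record; NOT the real ompi_proc_t. */
typedef struct {
    int  rank;             /* stand-in for the process name */
    char hostname[64];     /* empty until RTE info has been retrieved */
    bool rte_info_loaded;  /* true once the hostname has been filled in */
} sketch_proc_t;

/* Pretend to query the runtime for a proc's RTE info (hostname only here).
 * A real lookup would go through the RTE; this one just fabricates a name. */
void sketch_fetch_rte_info(sketch_proc_t *proc)
{
    assert(proc != NULL);
    if (proc->rte_info_loaded) {
        return;  /* already retrieved, nothing to do */
    }
    snprintf(proc->hostname, sizeof(proc->hostname), "node%03d", proc->rank);
    proc->rte_info_loaded = true;
}

/* Init-time policy: retrieve everything eagerly only for small jobs. */
void sketch_init(sketch_proc_t *procs, size_t nprocs, size_t cutoff)
{
    for (size_t i = 0; i < nprocs; i++) {
        procs[i].rank = (int)i;
        procs[i].hostname[0] = '\0';
        procs[i].rte_info_loaded = false;
    }
    if (nprocs < cutoff) {
        for (size_t i = 0; i < nprocs; i++) {
            sketch_fetch_rte_info(&procs[i]);
        }
    }
    /* Otherwise leave every entry empty; it is filled on first modex_recv. */
}

/* Taking the proc record (rather than just its name) lets the receive path
 * store the hostname directly in the record on the first call for that proc. */
const char *sketch_modex_recv(sketch_proc_t *proc)
{
    sketch_fetch_rte_info(proc);  /* no-op if already loaded at init */
    return proc->hostname;
}
```

For example, after sketch_init(procs, 16, 8) no hostnames are loaded; the first sketch_modex_recv(&procs[5]) fills in the entry for that proc only, while procs that are never contacted stay empty.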

cmr=v1.7.4:reviewer=jsquyres:subject=Fix thread deadlock

This commit was SVN r29931.

The following SVN revision numbers were found above:
  r29917 --> open-mpi/ompi@1a972e2c9d
2013-12-17 03:26:00 +00:00
help-mpi-runtime.txt Remove tabs for spaces, fix some error messages. 2013-03-01 19:13:06 +00:00
Makefile.am Per RFC add initial support for the MPI 3.0 tools interface. 2013-04-24 15:59:23 +00:00
mpiruntime.h Per RFC add initial support for the MPI 3.0 tools interface. 2013-04-24 15:59:23 +00:00
ompi_cr.c MCA/base: Add new MCA variable system 2013-03-27 21:09:41 +00:00
ompi_cr.h Move the RTE framework change into the trunk. With this change, all non-CR 2013-01-27 23:25:10 +00:00
ompi_info_support.c tools: Add oshmem_info utility 2013-10-12 19:03:32 +00:00
ompi_info_support.h tools: Add oshmem_info utility 2013-10-12 19:03:32 +00:00
ompi_module_exchange.c Revert r29917 and replace it with a fix that resolves the thread deadlock while retaining the desired debug info. In an earlier commit, we had changed the modex accordingly: 2013-12-17 03:26:00 +00:00
ompi_module_exchange.h Move the RTE framework change into the trunk. With this change, all non-CR 2013-01-27 23:25:10 +00:00
ompi_mpi_abort.c ***** THIS INCLUDES A SMALL CHANGE IN THE MPI-RTE INTERFACE ***** 2013-10-08 18:37:59 +00:00
ompi_mpi_finalize.c Due to MPI_Comm_idup we can no longer use the communicator's CID as 2013-10-03 01:11:28 +00:00
ompi_mpi_init.c Due to MPI_Comm_idup we can no longer use the communicator's CID as 2013-10-03 01:11:28 +00:00
ompi_mpi_params.c Fix conditional: don't just check the constant (thanks to clang for an 2013-12-02 19:41:59 +00:00
ompi_mpi_preconnect.c MCA/base: Add new MCA variable system 2013-03-27 21:09:41 +00:00
params.h Since the calls to "PMI get" scale by number of procs (not nodes), it makes more sense to have the MCA param be the cutoff based on number of procs. Also, it occurred to me that this shouldn't impact the nidmap process as that is built and circulated when we launch via mpirun, not during direct launch. 2013-08-22 03:40:26 +00:00