finalize/disconnect: add explicit comment about why we use an RTE barrier

Based on extensive discussions before/at the June 2014 developer's
meeting, add a lengthy comment explaining a second reason why we
''must'' use an RTE barrier during MPI_FINALIZE and
MPI_COMM_DISCONNECT (i.e., unreliable transports).  Also expand
slightly on the original reason why we do this (BTLs can lie/buffer a
message without actually injecting it on the network).

This commit was SVN r32095.
This commit is contained in:
Jeff Squyres 2014-06-26 14:31:40 +00:00
parent 47b118c0ae
commit 8e52ba423f
2 changed files with 24 additions and 8 deletions

View file

@@ -753,9 +753,10 @@ static int disconnect(ompi_communicator_t *comm)
     orte_dpm_prequest_t *req, *preq;
     ompi_group_t *group;
 
-    /* JMS Temporarily disable PML-based barrier and use RTE-based
-       barrier instead.  This is related to
-       https://svn.open-mpi.org/trac/ompi/ticket/4643. */
+    /* Note that we explicitly use an RTE-based barrier (vs. an MPI
+       barrier).  See a lengthy comment in
+       ompi/runtime/ompi_mpi_finalize.c for a much more detailed
+       rationale. */
     OPAL_OUTPUT_VERBOSE((3, ompi_dpm_base_framework.framework_output,
                          "%s dpm:orte:disconnect comm_cid %d",

View file

@@ -207,11 +207,26 @@ int ompi_mpi_finalize(void)
        have many other, much higher priority issues to handle that deal
        with non-erroneous cases. */
 
-    /* wait for everyone to reach this point
-       This is a grpcomm barrier instead of an MPI barrier because an
-       MPI barrier doesn't ensure that all messages have been transmitted
-       before exiting, so the possibility of a stranded message exists.
-    */
+    /* Wait for everyone to reach this point.  This is a grpcomm
+       barrier instead of an MPI barrier for (at least) two reasons:
+
+       1. An MPI barrier doesn't ensure that all messages have been
+          transmitted before exiting (e.g., a BTL can lie and buffer a
+          message without actually injecting it to the network, and
+          therefore require further calls to that BTL's progress), so
+          the possibility of a stranded message exists.
+
+       2. If the MPI communication is using an unreliable transport,
+          there's a problem of knowing that everyone has *left* the
+          barrier.  E.g., one proc can send its ACK to the barrier
+          message to a peer and then leave the barrier, but the ACK
+          can get lost and therefore the peer is left in the barrier.
+
+       Point #1 has been known for a long time; point #2 emerged after
+       we added the first unreliable BTL to Open MPI and fixed the
+       del_procs behavior around May of 2014 (see
+       https://svn.open-mpi.org/trac/ompi/ticket/4669#comment:4 for
+       more details). */
     coll = OBJ_NEW(ompi_rte_collective_t);
     coll->id = ompi_process_info.peer_fini_barrier;
     coll->active = true;
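The hunk ends just as the RTE collective is set up.  For completeness, here is a hedged sketch of how such a barrier is typically driven to completion; the ompi_rte_barrier() call and the progress-spinning wait reflect the RTE interfaces of that era, and the error-path names (ret, error, done) are placeholders, not lines from this diff:

    /* Assumed continuation (not part of this diff): post the
     * out-of-band barrier, then spin progress until it completes.
     * Spinning opal_progress() while waiting is what addresses reason
     * #1 above: each iteration also progresses the BTLs, draining any
     * message a BTL buffered without actually injecting it on the
     * network.  Reason #2 is addressed by running the barrier over
     * the RTE's reliable out-of-band channel instead of the MPI
     * transports. */
    if (OMPI_SUCCESS != (ret = ompi_rte_barrier(coll))) {
        error = "ompi_rte_barrier failed";   /* placeholder error path */
        goto done;
    }
    while (coll->active) {   /* in the style of OMPI_WAIT_FOR_COMPLETION */
        opal_progress();
    }
    OBJ_RELEASE(coll);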