Add a lengthy comment about correctness and features of MPI_FINALIZE,
per a lengthy discussion at the Louisville, Feb 2009 OMPI meeting. This commit was SVN r20656.
parent 57be80c983
commit 2002c576fe
@@ -150,6 +150,61 @@ int ompi_mpi_finalize(void)
        gettimeofday(&ompistart, NULL);
    }

    /* NOTE: MPI-2.1 requires that MPI_FINALIZE is "collective" across
       *all* connected processes.  This only means that all processes
       have to call it.  It does *not* mean that all connected
       processes need to synchronize (either directly or indirectly).

       For example, it is quite easy to construct complicated
       scenarios where one job is "connected" to another job via
       transitivity, but they have no direct knowledge of each other.
       Consider the following case: job A spawns job B, and job B
       later spawns job C.  A "connectedness" graph looks something
       like this:

           A <--> B <--> C

       So what are we *supposed* to do in this case?  If job A is
       still connected to B when it calls FINALIZE, should it block
       until jobs B and C also call FINALIZE?

       After lengthy discussions many times over the course of this
       project, the issue was finally decided at the Louisville Feb
       2009 meeting: no.

       Rationale:

       - "Collective" does not mean synchronizing.  It only means that
         every process calls it.  Hence, in this scenario, every
         process in A, B, and C must call FINALIZE.

       - KEY POINT: if A calls FINALIZE, then it is erroneous for B or
         C to try to communicate with A again.

       - Hence, OMPI is *correct* to only effect a barrier across each
         job's MPI_COMM_WORLD before exiting.  Specifically, if A
         calls FINALIZE long before B or C, it's *correct* if A exits
         at any time (and doesn't notify B or C that it is exiting).

       - Arguably, if B or C do try to communicate with the now-gone
         A, OMPI should try to print a nice error ("you tried to
         communicate with a job that is already gone...") instead of
         a segv or other Badness.  However, that is an *extremely*
         difficult problem -- sure, it's easy for A to tell B that it
         is finalizing, but how can A tell C?  A doesn't even know
         about C.  You'd need to construct a "connected" graph in a
         distributed fashion, which is fraught with race conditions,
         etc.

       Hence, our conclusion is: OMPI is *correct* in its current
       behavior (of only doing a barrier across its own COMM_WORLD)
       before exiting.  Any problems that occur are a result of
       erroneous MPI applications.  We *could* tighten up the erroneous
       cases and ensure that we print nice error messages / don't
       crash, but that is such a difficult problem that we decided we
       have many other, much higher priority issues to handle that deal
       with non-erroneous cases. */

    /* wait for everyone to reach this point
       This is a grpcomm barrier instead of an MPI barrier because an
       MPI barrier doesn't ensure that all messages have been transmitted
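The scenario the new comment describes can be reproduced with a small spawn chain. The sketch below is illustrative only and is not part of this commit; the program and binary name spawn_chain are hypothetical. Job A runs the program with a depth of 2, spawns job B, which in turn spawns job C; every job then calls MPI_Finalize without synchronizing with the jobs it is connected to.

    /* spawn_chain.c -- hypothetical example, not part of this commit. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        MPI_Comm parent, child;
        int depth = (argc > 1) ? atoi(argv[1]) : 2;   /* job A = 2, B = 1, C = 0 */

        MPI_Init(&argc, &argv);
        MPI_Comm_get_parent(&parent);
        if (MPI_COMM_NULL == parent) {
            printf("job A: root of the spawn chain (depth %d)\n", depth);
        }

        if (depth > 0) {
            char next_depth[16];
            char *spawn_argv[] = { next_depth, NULL };
            snprintf(next_depth, sizeof(next_depth), "%d", depth - 1);
            /* Spawning connects this job to its child.  The child is not
               directly connected to this job's parent, yet all three jobs
               are transitively "connected" in the MPI-2.1 sense. */
            MPI_Comm_spawn("spawn_chain", spawn_argv, 1, MPI_INFO_NULL, 0,
                           MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
        }

        /* Deliberately no MPI_Comm_disconnect: the jobs stay connected.
           Per the Louisville decision described above, FINALIZE is
           collective (every process must call it) but not synchronizing,
           so this job may exit long before or after the jobs it is
           connected to.  Trying to communicate with a job that has
           already finalized would be erroneous. */
        MPI_Finalize();
        return 0;
    }

Under the behavior described in the comment, each job's finalize only barriers across its own MPI_COMM_WORLD, so A may exit while B and C are still running.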