than can be used (e.g., number of on-node peers), that no additional
room is set aside for those FIFOs that will never be created. This
makes it easier to have dedicated FIFOs: just set btl_sm_num_fifos
to be very large rather than setting it to be the local number of
procs. In practice, we ask for extra headroom anyhow, so this change
generally won't matter.
This commit was SVN r22291.
other request-using frameworks.
- Rather than having mpi/c/* functions allocate requests explicitly,
pass the MPI_Request* down to the I/O component and have it
perform the allocation.
- While the I/O base provides a base request which can be used,
it is not required and all request management occurs within
the component.
- Push progress management into the component, rather than having it
happen in the base. Progress functions are now easily registered,
and not all (ie, the one existing) components use progress functions
in any rational way.
ROMIO switched to generalized requests instead of MPIO_Requests many
moons ago, and Open MPI now uses ROMIO's generalized requests, so there
is no reason to wrap those requests (which are OMPI requests) in another
level of request.
Now the file function passes the MPI_Request* to the ROMIO component,
which passes it to the underlying ROMIO function, which calls
MPI_Grequest_start to create an OMPI request, which is what gets set
as the request to the user. Much cleaner.
This patch has two motivations. One, a whole heck of a lot of code
just got removed, and request handling is now much cleaner for I/O
components. Two, by adding support for Argonne's proposed generalized
request extensions, we can allow ROMIO to provide async I/O through
generalized requests, which we couldn't rationally do in the old
setup due to the crazy request completion rules.
This commit was SVN r22235.
use the new Automake "silent rules" if available.
If you are using an Automake prior to v1.11, you won't see the new
silent rules -- it will automatically default back to the "verbose"
rules.
Note, too, that even with these changes, you can enable the verbose
"make all" output in one of two ways:
1. Add "V=1" to your "make" command line
{{{
shell$ make all V=1
}}}
2. Add "--disable-silent-rules" to your "configure" command line:
{{{
shell$ ./configure --disable-silent-rules ...
}}}
The one down side of using the silent rules by default is that we'll
get less diagnostic information when users send their build logs. I
think we should update the web page to request that users send build
logs of "make V=1", but I'm guessing that not everyone will do it.
Note that I did ''not'' silent-ize the libltdl build (which is a dozen
or so files in the beginning of the build) because we wholly import
libltdl at autogen time. I therefore didn't want to patch libltdl
(further) after importing it a) to remain as forward- compatible as
possible, and b) patching the imported libltdl build system might be
tricky in terms of timestamps / dependencies. So those dozen-or-so
files will still be "verbose", but the rest of the files in OMPI will
be "silent".
This commit was SVN r22189.
area, we cap the size at LONG_MAX. But we are figuring out how much
we need. So, if that amount exceeds LONG_MAX, we should return an
"out of resource" error code.
This commit was SVN r22172.
This commit does a bunch of things:
* Address all remaining code review items from CMR #2023:
* Defer mmap setup to be lazy; only set it up the first time we
invoke a collective. In this way, we don't penalize apps that
make lots of communicators but don't invoke collectives on them
(per #2027).
* Remove the extra assignments of mca_coll_sm_one (fixing a
convertor count setup that was the real problem).
* Remove another extra/unnecessary assignment.
* Increase libevent polling frequency when using the RML to
bootstrap mmap'ed memory.
* Fix a minor procs-related memory leak in btl_sm.
* Commit a datatype fix that George and I discovered along the way to
fixing the coll sm.
* Improve error messages when mmap fails, potentially trying to
de-alloc any allocated memory when that happens.
* Fix a previously-unnoticed confusion between extent and true_extent
in coll sm reduce.
This commit was SVN r22049.
The following Trac tickets were found above:
Ticket 2023 --> https://svn.open-mpi.org/trac/ompi/ticket/2023
shmem progress (or the Windows equiv). Instead, poll hard on the
condition, but periocially call opal_progress(). This allows
badly-formed apps (e.g., the ibm test communicator/bsend_free) to
actually complete.
To be clear, there are far too many apps out there that assume that
MPI collectives will actually progress the rest of MPI. I don't like
putting in a feature to enable broken apps, but I have a dim
recollection of this issue coming up before (apps "hanging" when
testing the sm coll because they assumed that calling collectives
would trigger other MPI progress). Rather than have people claim that
OMPI is broken, I prefer to put in this "workaround". :-(
Indeed, the bsend_free test ''may'' be coded that way for exactly that
reason...? I don't remember offhand...
This commit was SVN r21984.
This commit fixes the ft_event logic so that it uses the normal destroy funcitonality instead of the workaround with the component that was previously there. All and all it made for cleaner code, which is always good.
If r21967 moves to v1.3, this patch will need to be moved as well.
This commit was SVN r21972.
The following SVN revision numbers were found above:
r21967 --> open-mpi/ompi@533633b8cb
Before this, we would restore the topmost old session directory. This commit makes sure that we remove it when we are done with it.
This commit was SVN r21971.
in the v1.2 series the cid's could never go above the max. allowed for a
particular pml. Because of that, pml_add_comm never checked for the cid, and
in fact pml_add_comm was called in comm_set, which is *before* we knew the
cid.
in the v1.3 series (and trunk) we check now the cid to detect overflow, and
because of that pml_add_comm has been moved *after* the cid allocation
routine, namely into the comm_activate routine.
in the v1.2 series, the comm_activate contained a synchronization step of the
old communicator in order to prevent incoming fragments on the new
communicator, with the main problem being that the allreduce in the
communicator allocation finished at different times on different processes,
and thus, this scenario could and did really occur.
in the v1.3 series, the comm_activate does not contain the synchronization
step anymore, since we introduced the new queue for fragments with unknown
cid. The problem is however, that whether a fragment is known or not is
decided by using ompi_comm_lookup(), which will return something useful as
soon as the cid allocation finished, even before pml_add_comm has been
called. So there is a small time gap where we will not post a message into
queue for unknown cid's, but we can also not look up the process structure
belonging to the rank in that comm ( that is in pml_ob1_match_recv_frag or
something like that).
The current fix reintroduces the synchronization step in comm_activate, and
ensures that no fragment can be received for a new communicator before the
synchronization occurs , and thus comm_nextcid() and pml_add_comm has been
called. It seems to be the safest and easiest way for now. Welcome back, v1.2.
This commit was SVN r21970.