MSGDEBUG2 now means "print a one-liner for all PML calls into BTL, and
also when BTL calls PML with a recv completion (not send completions)"
MSGDEBUG1 means print more internal gory detail
MSGDEBUG is gone, replaced by MSGDEBUG1
While making this change, also found that PUT_DEST style fragments could
be leaked in usnic_free(), because send_fragment tests were being
applied to decide whether a fragment was eligible to be freed.
This commit was SVN r29185.
changes required to support MPI_Bsend(). Introduces concept of
attaching a buffer to a large segment that the PML can scribble into and
we will send from. The reason we don't use a pinned buffer and send
directly from that is that usnic_verbs does not (yet) support num_sge>1
for regular sends. This means the data gets copied twice, but that is
unavoidable.
changed the logic in handle_large_send to be more sensible
Incorporated David's review comments
This commit was SVN r29184.
Do not assume that the "size" passed to alloc_send() will be the same as
the size of the message the resulting fragment will hold when
usnic_send() is called. This means usnic_send()/usnic_put() can never
trust any pre-computed size values, and are only allowed to look at the
lengths and pointers of the elements in the desc SG list.
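
The rule above can be sketched as follows. This is a minimal
illustration, not the BTL's actual code: `struct sg_entry`,
`struct send_desc`, and `desc_payload_len()` are hypothetical names
standing in for the real descriptor types. The point is only that the
payload size is recomputed from the SG list at send time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one element of a descriptor's SG list. */
struct sg_entry {
    void  *addr;  /* start of this piece of the payload */
    size_t len;   /* number of valid bytes at addr */
};

/* Hypothetical stand-in for a BTL send descriptor. */
struct send_desc {
    struct sg_entry *sg;      /* scatter/gather entries */
    size_t           sg_cnt;  /* number of entries in sg */
};

/* Recompute the payload size from the SG list alone, ignoring whatever
 * size was requested when the fragment was allocated. */
static size_t desc_payload_len(const struct send_desc *desc)
{
    size_t total = 0;

    assert(desc != NULL);
    for (size_t i = 0; i < desc->sg_cnt; ++i) {
        total += desc->sg[i].len;
    }
    return total;
}
```
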
This commit was SVN r29183.
- tag needs to be sent in *our* header, not the PML header
- usnic_alloc() should return smaller value if too much data requested
- be careful about callbacks vs removing items from lists
(we need to remove from our lists *before* the callback)
- improve send callback handling
- add some more MSGDEBUG2 logging and cleanup
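
The "remove from our lists before the callback" point can be sketched
like this. All names here (`struct frag`, `struct pending_list`,
`complete_head()`, `release_cb()`) are hypothetical and only illustrate
the ordering. The callback may free or requeue the fragment, so it must
already be unlinked when the callback runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct frag;
typedef void (*send_cb_fn)(struct frag *frag);

/* Hypothetical fragment on a singly linked pending list. */
struct frag {
    struct frag *next;
    send_cb_fn   cb;
    bool         released;
};

struct pending_list {
    struct frag *head;
};

/* Example callback: hands the fragment back to its owner. It relies on
 * the fragment no longer being linked into any list. */
static void release_cb(struct frag *frag)
{
    assert(frag->next == NULL);
    frag->released = true;
}

/* Complete the fragment at the head of the list: unlink first, then
 * invoke the callback. */
static void complete_head(struct pending_list *list)
{
    struct frag *frag = list->head;

    if (frag == NULL) {
        return;
    }
    list->head = frag->next;  /* step 1: remove from our list */
    frag->next = NULL;
    if (frag->cb != NULL) {
        frag->cb(frag);       /* step 2: callback may free or requeue */
    }
}
```
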
This commit was SVN r29181.
child stdout and falls back to plain pipe if openpty fails. Child uses
the 'usepty' flag to decide whether to treat this descriptor as a pty
or as a pipe.
Set 'usepty' flag to 0 upon openpty failure to inform the child that
it isn't dealing with a pty even though pty has been requested.
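
A minimal sketch of the fallback described above, under stated
assumptions: `struct io_opts`, `setup_child_stdout()`, and the
`pty_open_fn` hook are hypothetical names, and the pty opener is passed
in as a function pointer (in real code it would wrap openpty()) so the
failure path is easy to exercise. Only pipe() is a real library call
here.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical I/O setup state that the child later inspects. */
struct io_opts {
    int usepty;  /* in: pty requested; out: pty actually in use */
    int fds[2];  /* [0] parent (read) side, [1] child (write) side */
};

/* Hook for opening a pty pair, e.g. a thin wrapper around openpty().
 * Returns 0 on success, -1 on failure. */
typedef int (*pty_open_fn)(int *parent_fd, int *child_fd);

/* Opener that always fails, used to exercise the fallback path. */
static int failing_pty_open(int *parent_fd, int *child_fd)
{
    (void)parent_fd;
    (void)child_fd;
    return -1;
}

/* Try a pty if requested; otherwise, or on failure, use a plain pipe
 * and clear usepty so the child treats the descriptor as a pipe. */
static int setup_child_stdout(struct io_opts *opts, pty_open_fn open_pty)
{
    assert(opts != NULL);
    if (opts->usepty && open_pty != NULL &&
        open_pty(&opts->fds[0], &opts->fds[1]) == 0) {
        return 0;
    }
    opts->usepty = 0;
    return pipe(opts->fds);
}
```
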
Thanks to Michal Peclo for reporting it and providing a patch.
cmr:v1.7.3:reviewer=jsquyres
cmr:v1.6.6:reviewer=jsquyres
This commit was SVN r29169.
The intercomm "merge" function can create a linkage between procs that was not reflected anywhere in a modex, and so at least some of the procs in the resulting communicator don't know how to talk to some of the new communicator's peers.
For example, consider the case where:
1. parent job A comm_spawns a process (job B) - these processes exchange modex and can communicate
2. parent job A now comm_spawns another process (job C) - again, these can communicate, but the proc in C knows nothing of B
3. do an intercomm merge across the communicators created by the two comm_spawns. This puts B and C into the same communicator, but they know nothing about how to talk to each other as they were not involved in any exchange of contact info. Hence, collectives on that communicator now fail.
This fix adds an API to the ompi/dpm framework that (a) exchanges the modex info across the procs in the merge to ensure all procs know how to communicate, and (b) calls add_procs to give the btl's a chance to select transports to any new procs.
cmr:v1.7.3:reviewer=jsquyres
This commit was SVN r29166.
The following Trac tickets were found above:
Ticket 2904 --> https://svn.open-mpi.org/trac/ompi/ticket/2904
(http://www.open-mpi.org/community/lists/devel/2013/09/12889.php), I
renamed all "f77" and "f90" directory/file names to "fortran"
(including removing shmemf77 / shmemf90 wrapper compilers and
replacing them with "shmemfort").
2. Fixed several Fortran coding errors.
3. Removed lots of old/stale comments that were clearly the result of
copying from the OMPI layer and then not cleaning up afterwards (i.e.,
the comments were wholly inaccurate in the oshmem layer).
4. Removed both redundant and harmful code from oshmem_config.h.in.
5. Temporarily tie building the oshmem Fortran bindings to
--enable-mpi-fortran. This doesn't seem like a good long-term
solution, but at least you can now build all Fortran bindings (MPI +
oshmem) or not. *** SEE MY NOTE IN config/oshmem_configure_options.m4
FOR WORK THAT STILL NEEDS TO BE DONE!
This commit was SVN r29165.
The FREE_LIST_*_MT stuff was introduced on the SVN trunk in r28722
(2013-07-04), but so far, has not been merged into the v1.7 branch yet
(2013-09-06). So put it in its own #ifdef, rather than defining it
based on OMPI_MAJOR_VERSION/OMPI_MINOR_VERSION.
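
The idea of testing for the feature itself rather than for a version
number can be sketched as below. The `TOY_*` names are hypothetical and
are not the real free-list macros. The sketch only shows the pattern of
guarding on the macro's own existence.

```c
#include <assert.h>
#include <stddef.h>

/* Toy free list: just a counter of available items. */
struct toy_free_list {
    int items_left;
};

/* Older-style accessor that every tree is assumed to provide. */
#define TOY_FREE_LIST_GET(list, got)          \
    do {                                      \
        if ((list)->items_left > 0) {         \
            (list)->items_left--;             \
            (got) = 1;                        \
        } else {                              \
            (got) = 0;                        \
        }                                     \
    } while (0)

/* A newer tree would already define TOY_FREE_LIST_GET_MT. Check for the
 * macro directly instead of comparing major/minor version numbers, so
 * the code keeps working once the feature is merged to other branches. */
#ifndef TOY_FREE_LIST_GET_MT
#define TOY_FREE_LIST_GET_MT(list, got) TOY_FREE_LIST_GET(list, got)
#endif

/* Take one item; returns 1 on success and 0 if the list is empty. */
static int take_item(struct toy_free_list *list)
{
    int got;

    assert(list != NULL);
    TOY_FREE_LIST_GET_MT(list, got);
    return got;
}
```
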
This commit was SVN r29148.
The following SVN revision numbers were found above:
r28722 --> open-mpi/ompi@c9e5ab9ed1
conflict that can cause messages to be lost. Add detection of this
condition, and have both processes cancel their connect operations. The
process with the higher rank will reconnect, while the lower rank
process will simply wait for the connection to be created.
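
The tie-break above can be written as a small pure function. This is
only a sketch: `enum collision_action` and `resolve_collision()` are
hypothetical names, and the ranks are assumed to be distinct unsigned
integers.

```c
#include <assert.h>
#include <stdint.h>

/* What a process does after both sides have cancelled their connects. */
enum collision_action {
    COLLISION_RECONNECT,      /* initiate a fresh connection */
    COLLISION_WAIT_FOR_PEER   /* wait for the peer to connect */
};

/* Deterministic tie-break: both sides evaluate the same comparison, so
 * exactly one of them reconnects. */
static enum collision_action resolve_collision(uint32_t my_rank,
                                               uint32_t peer_rank)
{
    assert(my_rank != peer_rank);
    return (my_rank > peer_rank) ? COLLISION_RECONNECT
                                 : COLLISION_WAIT_FOR_PEER;
}
```

Because the two processes see the ranks in swapped order, they always
reach opposite decisions.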
Refs trac:3696
This commit was SVN r29139.
The following Trac tickets were found above:
Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
The Cisco-maintained v1.6 port of the usnic BTL has diverged from the
upstream trunk and v1.7 branches. This commit adjusts the trunk to more
closely match the v1.6 branch to simplify future merging and
cherry-picking.
The usnic MCA parameters also need work on this side.
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)
This commit was SVN r29138.
The following Trac tickets were found above:
Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
The fix for the HPL SEGV was incorrect because it assumed the
prepare_src() routine was always allowed to return "bytes processed"
less than the requested "bytes to send". It turns out this is only true
if the convertor is what limits the size; we are not allowed to limit
the data sent for our own reasons, else we break logic in the upper
layers.
This means we need to learn how many of the requested bytes the
convertor will actually give us, no matter how large the request is.
Unfortunately, this is a destructive test, and (currently) the only way to
learn that number is to actually have the convertor copy the data out into
buffers.
This change implements this, copying the entire data out into a chain of
send segments which are attached to the large send fragment. Now we can
always return the proper size value to the PML.
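
A rough sketch of the copy-out approach, under simplifying assumptions:
the convertor is modelled as a flat source buffer that may hold fewer
bytes than requested, and `struct seg`, `struct large_frag`,
`pack_into_chain()`, and `free_chain()` are hypothetical names. The
segment size is deliberately tiny so the chain is visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SEG_PAYLOAD 8  /* illustrative only; real segments are larger */

/* One send segment in the chain. */
struct seg {
    struct seg   *next;
    size_t        len;
    unsigned char data[SEG_PAYLOAD];
};

/* Large send fragment with its attached chain of segments. */
struct large_frag {
    struct seg *head;
    struct seg *tail;
    size_t      total;  /* bytes actually copied into the chain */
};

/* Copy up to `requested` bytes out of a source holding `avail` bytes.
 * The return value is the number of bytes really copied, which is the
 * size that can be reported upward. */
static size_t pack_into_chain(struct large_frag *frag,
                              const unsigned char *src, size_t avail,
                              size_t requested)
{
    size_t want = (requested < avail) ? requested : avail;
    size_t copied = 0;

    assert(frag != NULL && src != NULL);
    while (copied < want) {
        size_t n = want - copied;
        struct seg *s = malloc(sizeof(*s));

        if (s == NULL) {
            break;  /* report only what was copied so far */
        }
        if (n > SEG_PAYLOAD) {
            n = SEG_PAYLOAD;
        }
        memcpy(s->data, src + copied, n);
        s->len = n;
        s->next = NULL;
        if (frag->tail != NULL) {
            frag->tail->next = s;
        } else {
            frag->head = s;
        }
        frag->tail = s;
        copied += n;
    }
    frag->total += copied;
    return copied;
}

/* Release every segment attached to the fragment. */
static void free_chain(struct large_frag *frag)
{
    struct seg *s = frag->head;

    while (s != NULL) {
        struct seg *next = s->next;
        free(s);
        s = next;
    }
    frag->head = frag->tail = NULL;
    frag->total = 0;
}
```
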
Fixes Cisco bug CSCuj08024
Authored-by: Reese Faucette <rfaucett@cisco.com>
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)
This commit was SVN r29137.
The following Trac tickets were found above:
Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
Authored-by: Reese Faucette <rfaucett@cisco.com>
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)
This commit was SVN r29136.
The following Trac tickets were found above:
Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)
This commit was SVN r29135.
The following Trac tickets were found above:
Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)
This commit was SVN r29134.
The following Trac tickets were found above:
Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
- round segment buffer allocation to cache-line
- split some routines into an inline fast section and a called
slower section
- introduce receive fastpath in component_progress that:
o returns immediately if there is a packet available on priority
queue and fastpath is enabled
o disables fastpath for one call after use to provide fairness to
other processing
o defers receive buffer posting
o defers bookkeeping for receive until next call
to usnic_component_progress
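
The fastpath behaviour listed above can be sketched as a small state
machine. This is an illustration under assumptions, not the component's
real progress function: `struct progress_state` and its fields are
hypothetical, and packet arrival is reduced to a boolean argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-component progress state. */
struct progress_state {
    bool fastpath_ok;     /* may the next call take the fastpath? */
    int  deferred_recvs;  /* receives with bookkeeping/reposting pending */
    int  slow_passes;     /* number of full (slow) passes performed */
};

/* Returns the number of packets handled by this call. */
static int component_progress(struct progress_state *st,
                              bool priority_pkt_ready)
{
    int handled = 0;

    assert(st != NULL);
    if (priority_pkt_ready && st->fastpath_ok) {
        /* Fastpath: deliver the packet and return immediately. Buffer
         * reposting and bookkeeping are deferred, and the fastpath is
         * disabled for the next call so other work gets a turn. */
        st->fastpath_ok = false;
        st->deferred_recvs++;
        return 1;
    }

    /* Slow path: catch up on deferred work, then do the full pass. */
    st->deferred_recvs = 0;
    if (priority_pkt_ready) {
        handled = 1;
    }
    st->slow_passes++;
    st->fastpath_ok = true;  /* fastpath is allowed again next time */
    return handled;
}
```
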
Authored-by: Reese Faucette <rfaucett@cisco.com>
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)
This commit was SVN r29133.
The following Trac tickets were found above:
Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
Reese actually authored several usnic BTL changes prior to this commit,
but they were committed on his behalf by Jeff or me.
cmr=v1.7.3:reviewer=jsquyres
This commit was SVN r29132.
Without this, an `--enable-debug` build would hit an assertion in the
list code when run under valgrind with `--malloc-fill=0xff` or any other
case where malloc returned non-zeroed buffers.
Also allow the normal OBJ_ machinery to handle the constructor
invocation ordering for us instead of doing it by hand (which could have
led to future bugs).
Reviewed-by: jsquyres@cisco.com
cmr=v1.7.4
Depends on trunk functionality in r29095 and r29096. Refs trac:3740,#3741.
This commit was SVN r29127.
The following SVN revision numbers were found above:
r29095 --> open-mpi/ompi@d1b5940e97
r29096 --> open-mpi/ompi@a552921171
The following Trac tickets were found above:
Ticket 3740 --> https://svn.open-mpi.org/trac/ompi/ticket/3740