This commit fixes a bug that can cause request and communicator leaks
when cleaning up an OSC window. This should prevent a hang seen with
IMB-EXT.
cmr=v1.8.2:reviewer=jsquyres
This commit was SVN r31539.
Due to a leak in the osc/rdma component we were running out of cids on
one-sided tests. This resulted in a hang instead of an error. This
commit causes the nextcid algorithm to return an error if we run out
of cids.
cmr=v1.8.2:reviewer=jsquyres
This commit was SVN r31538.
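As a rough illustration of the behavior described in r31538 (report an error when the id space is exhausted instead of spinning), here is a minimal single-process sketch. It is not the actual nextcid algorithm, which has to agree on a cid across processes. Every name here (sketch_next_cid, SKETCH_MAX_CIDS, the SKETCH_* return codes) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_CIDS 8                  /* hypothetical; tiny on purpose */
#define SKETCH_SUCCESS 0
#define SKETCH_ERR_OUT_OF_RESOURCE (-2)    /* hypothetical error code */

typedef struct {
    bool in_use[SKETCH_MAX_CIDS];
} sketch_cid_table_t;

/* Hand out the lowest free id. When every id is taken, return an error
 * so the caller can fail cleanly instead of searching forever. */
static int sketch_next_cid(sketch_cid_table_t *table, int *cid_out)
{
    assert(table != NULL && cid_out != NULL);
    for (int i = 0; i < SKETCH_MAX_CIDS; ++i) {
        if (!table->in_use[i]) {
            table->in_use[i] = true;
            *cid_out = i;
            return SKETCH_SUCCESS;
        }
    }
    return SKETCH_ERR_OUT_OF_RESOURCE;
}

/* Return an id to the pool; a leaked id is one that never gets here. */
static void sketch_release_cid(sketch_cid_table_t *table, int cid)
{
    assert(table != NULL);
    if (cid >= 0 && cid < SKETCH_MAX_CIDS) {
        table->in_use[cid] = false;
    }
}
```

A caller would check the return value of sketch_next_cid() and propagate the error upward, which is what turns the leak from a hang into a visible failure.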
We will track #4568 from the 1.8 CMR.
Closes trac:4568
cmr=v1.8.2:reviewer=jsquyres
This commit was SVN r31535.
The following Trac tickets were found above:
Ticket 4568 --> https://svn.open-mpi.org/trac/ompi/ticket/4568
This commit will improve the message rate when using the sendi function
by not waiting for the send to get to the remote process.
cmr=v1.8.2:reviewer=ompi-rm1.8
This commit was SVN r31526.
This combination does not make sense but is not explicitly forbidden by
the standard so remove the argument check for this combination.
cmr=v1.8.2:reviewer=jsquyres
This commit was SVN r31523.
feature
This commit should fix a hang seen when running some of the one-sided
tests. The downside of this fix is it reduces the maximum size of the
messages that use the fast boxes. I will fix this in a later commit.
To improve performance under a heavy load I introduced sequencing to
ensure messages are given to the pml in order. I have seen little to no
impact on the message rate or latency with this change and there is a
clear improvement to the heavy message rate case.
Let's let this sit in the trunk for a couple of days to ensure that
everything is working correctly.
cmr=v1.8.2:reviewer=jsquyres
This commit was SVN r31522.
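The sequencing idea mentioned in r31522 can be sketched roughly as follows: each message carries a sequence number, and the receiver parks early arrivals until the gap is filled, so the upper layer always sees them in order. This is only an illustration under assumed names and sizes (sketch_reorder_t, SKETCH_WINDOW, a 16-bit sequence number), not the code from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_WINDOW 16u   /* hypothetical reorder window; divides 65536 */

typedef void (*sketch_deliver_fn)(uint16_t seq, int payload, void *ctx);

typedef struct {
    uint16_t expected;               /* next sequence number to deliver */
    bool     parked[SKETCH_WINDOW];  /* slot holds an early arrival */
    int      payload[SKETCH_WINDOW];
} sketch_reorder_t;

/* Accept one message. Deliver it (and any parked successors) in sequence
 * order. Returns false if seq is outside the window (too far ahead or
 * already delivered). */
static bool sketch_receive(sketch_reorder_t *r, uint16_t seq, int payload,
                           sketch_deliver_fn deliver, void *ctx)
{
    assert(r != NULL && deliver != NULL);
    /* Unsigned 16-bit subtraction handles sequence wraparound. */
    uint16_t offset = (uint16_t)(seq - r->expected);
    if (offset >= SKETCH_WINDOW) {
        return false;
    }
    unsigned slot = seq % SKETCH_WINDOW;
    r->parked[slot] = true;
    r->payload[slot] = payload;

    /* Drain every message that is now contiguous with 'expected'. */
    for (;;) {
        unsigned s = r->expected % SKETCH_WINDOW;
        if (!r->parked[s]) {
            break;
        }
        r->parked[s] = false;
        deliver(r->expected, r->payload[s], ctx);
        r->expected++;
    }
    return true;
}

/* Example upper layer: record payloads in delivery order. */
typedef struct {
    size_t count;
    int    values[64];
} sketch_log_t;

static void sketch_log_deliver(uint16_t seq, int payload, void *ctx)
{
    sketch_log_t *log = ctx;
    (void)seq;
    if (log->count < 64) {
        log->values[log->count++] = payload;
    }
}
```

If messages 1 and 2 arrive before message 0, nothing is delivered until 0 shows up, at which point all three are handed over in order.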
Patch from Gilles Gouaillardet on #4517 to fix handling 0-sized
messages in coll tuned with MPI_ALLTOALLV and MPI_IN_PLACE.
Reviewed by Jeff Squyres.
Fixes trac:4517
cmr=v1.8.2:reviewer=ompi-rm1.8
This commit was SVN r31521.
The following Trac tickets were found above:
Ticket 4517 --> https://svn.open-mpi.org/trac/ompi/ticket/4517
Patch from Gilles Gouaillardet on #4506 to correctly handle 0-sized
messages in coll/basic MPI_Alltoallv and MPI_Alltoallw.
Reviewed by Jeff Squyres.
Fixes trac:4506.
cmr=v1.8.2:reviewer=ompi-rm1.8
This commit was SVN r31519.
The following Trac tickets were found above:
Ticket 4506 --> https://svn.open-mpi.org/trac/ompi/ticket/4506
Patch submitted by Gilles Gouaillardet on #4518. Reviewed by Jeff.
Fixes trac:4518
cmr=v1.8.2:reviewer=ompi-rm1.8
This commit was SVN r31517.
The following Trac tickets were found above:
Ticket 4518 --> https://svn.open-mpi.org/trac/ompi/ticket/4518
Make sure to also OBJ_RELEASE the neighbor and ineighbor modules.
Fixes trac:4444 (this patch is from that ticket).
This commit was SVN r31516.
The following Trac tickets were found above:
Ticket 4444 --> https://svn.open-mpi.org/trac/ompi/ticket/4444
Child processes now look clean; I can't find any more fd's that are
leaking from the parent to children.
Refs trac:4550
This commit was SVN r31515.
The following Trac tickets were found above:
Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
code.
This commit fixes minor errors in the incorrectly-committed r31513
(new fd close-on-exec convenience function).
Refs trac:4550
This commit was SVN r31514.
The following SVN revision numbers were found above:
r31513 --> open-mpi/ompi@e1655ae68d
The following Trac tickets were found above:
Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
Paul Hargrove pointed out that Stevens tells us that we should
F_GETFD before F_SETFD. And so we shall.
Make a new convenience function to do this (opal_fd_set_cloexec()),
just so that we don't have to litter this 2-step process throughout
the code.
Refs trac:4550
This commit was SVN r31513.
The following Trac tickets were found above:
Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
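For reference, the 2-step process that r31513 wraps in a convenience function looks roughly like this with POSIX fcntl(). The function name sketch_fd_set_cloexec() and its 0 / -1 return convention are assumptions for illustration; the real opal_fd_set_cloexec() may differ in signature and error codes.

```c
#include <fcntl.h>
#include <unistd.h>

/* Mark fd as close-on-exec without clobbering its other descriptor flags.
 * Returns 0 on success, -1 on failure (errno is set by fcntl). */
static int sketch_fd_set_cloexec(int fd)
{
    /* Step 1: read the current descriptor flags. */
    int flags = fcntl(fd, F_GETFD, 0);
    if (flags == -1) {
        return -1;
    }
    /* Step 2: write them back with FD_CLOEXEC added. */
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) {
        return -1;
    }
    return 0;
}
```

Reading the flags first means any descriptor flags already set on the fd are preserved instead of being overwritten by a bare F_SETFD.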
This pipe is used to communicate between threads in this process.
Mark both fd as close-on-exec so that children don't inherit this
pipe.
Refs trac:4550
This commit was SVN r31512.
The following Trac tickets were found above:
Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
Make sure that an internal, long-lived hwloc fd is marked as
close-on-exec so that children don't inherit it. This patch is
committed upstream in the hwloc master and v1.9 branches as 7489287
and b654e19, respectively. The patch applied here is the exact same
logic, but the surrounding code changed slightly since the hwloc v1.7
series, so the patch doesn't apply cleanly.
Refs trac:4550
This commit was SVN r31511.
The following Trac tickets were found above:
Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
A new OFED release changed a struct layout, and static assignment caused a segv.
Detect the new struct layout and use dynamic assignment.
Fixed by AlexM, reviewed by Miked.
cmr=v1.8.2:reviewer=ompi-rm1.8
This commit was SVN r31502.
Make sure the debugger attach fifo is marked as close-on-exec so that
children procs don't inherit it. For example, if you salloc a SLURM
allocation and run "mpirun ..." in there (i.e., mpirun is running on
the head node, and launching on to back-end nodes), the forked srun's
will inherit this fd if it is still open.
Refs trac:4550
This commit was SVN r31499.
The following Trac tickets were found above:
Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
One more commit for this ticket... as pointed out by Gilles, we have
ompi_op_is_commute(). We should use that instead of replicating the
logic for the test.
Refs trac:4548
This commit was SVN r31497.
The following Trac tickets were found above:
Ticket 4548 --> https://svn.open-mpi.org/trac/ompi/ticket/4548
MPI_OP_COMMUTATIVE should work on all MPI_Op's -- regardless of
whether they are predefined or not.
Refs trac:4548.
This commit was SVN r31491.
The following Trac tickets were found above:
Ticket 4548 --> https://svn.open-mpi.org/trac/ompi/ticket/4548
Add some verbiage about how mpirun now defaults to disallowing running
as root, but you can use the --allow-run-as-root option to override
this default behavior.
Refs trac:4536
This commit was SVN r31477.
The following Trac tickets were found above:
Ticket 4536 --> https://svn.open-mpi.org/trac/ompi/ticket/4536
Prior to r29058, this same logic was in place (i.e., ensure that the
extra fd to /dev/null is closed). It looks like it was accidentally
removed in the ORTE conversion to the state machine in r29058.
This ''might'' have something to do with many hangs that we're seeing
in Cisco MTT with jobs that exhibit failure (e.g., call MPI_ABORT)...?
cmr=v1.8.2:reviewer=rhc
This commit was SVN r31469.
The following SVN revision numbers were found above:
r29058 --> open-mpi/ompi@a200e4f865
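The kind of logic r31469 refers to is, in general terms, the usual idiom for pointing a descriptor at /dev/null: open it, dup2() it onto the target, then close the temporary descriptor so it does not linger. The sketch below assumes that this is where the extra fd comes from; the function name and the choice of O_RDWR are illustrative and not taken from the ORTE code.

```c
#include <fcntl.h>
#include <unistd.h>

/* Point target_fd at /dev/null without leaving the temporary descriptor
 * open afterwards. Returns 0 on success, -1 on failure. */
static int sketch_redirect_to_devnull(int target_fd)
{
    int fd = open("/dev/null", O_RDWR);
    if (fd < 0) {
        return -1;
    }
    if (fd != target_fd) {
        int rc = dup2(fd, target_fd);
        close(fd);   /* the "extra" fd: close it on success and on failure */
        if (rc < 0) {
            return -1;
        }
    }
    return 0;
}
```

Forgetting the close() leaves one stray /dev/null descriptor per call, which is then inherited by anything the process forks and execs.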