closed
We were leaving the selected component open. This commit should
eliminate a leak detected by valgrind.
cmr=v1.8.2:reviewer=jsquyres
This commit was SVN r31749.
The smsg_mboxes free list was not getting destructed. The construct
has been moved to module initialization and a matching destruct is now
in the module destruct.
This commit was SVN r31746.
This commit fixes memory leaks discovered in the sbgp setup code. We
were leaking an opal_argv as well as some list items. I took the
opportunity to clean up the code a little which includes making use of
the opal_argv_free function.
cmr=v1.8.2:reviewer=manjugv
This commit was SVN r31745.
The items in the available bcol list were getting leaked. This commit
fixes this leak. I also cleaned up the code a bit. This includes
making use of the opal_argv_free function.
cmr=v1.8.2:reviewer=manjugv
This commit was SVN r31744.
It is essential to call mca_base_framework_close for every framework
that is opened. coll/ml was not doing this so neither bcol nor sbgp
were getting cleaned up. This commit fixes this omission.
Also fixed a leak caused by calling OBJ_DESTRUCT for something created
with OBJ_NEW. With these changes coll/ml appears to be valgrind clean.
cmr=v1.8.2:reviewer=manjugv
This commit was SVN r31743.
Simple bug. The dist_graph pointer must be a constructed object. The
change from malloc to OBJ_NEW was missing from r31716. Tested with MTT
and everything looks ok now.
This commit was SVN r31739.
The following SVN revision numbers were found above:
r31716 --> open-mpi/ompi@e3df77548d
There is no reason to cancel the listening thread. It should die
automatically when the file descriptor is closed. It is sufficient
to just wait for the thread to exit with pthread_join.
cmr=v1.8.2:ticket=trac:4616:reviewer=jsquyres
This commit was SVN r31738.
The following Trac tickets were found above:
Ticket 4616 --> https://svn.open-mpi.org/trac/ompi/ticket/4616
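A minimal sketch of the shutdown pattern described in r31738, under stated assumptions: the struct and function names are hypothetical, and a pipe stands in for the real listening descriptor. Closing the write end of a pipe delivers end-of-file to the reader, so the thread leaves its loop on its own and pthread_join is enough. Whether a thread blocked on another descriptor type wakes up the same way is not shown here.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical listener state; not taken from the Open MPI sources. */
struct listener {
    int read_fd;       /* descriptor the thread blocks on */
    int write_fd;      /* closing this end makes read() return 0 */
    long bytes_seen;   /* bytes consumed before shutdown */
    pthread_t thread;
};

static void *listener_main(void *arg)
{
    struct listener *l = arg;
    char buf[64];
    ssize_t n;

    /* read() returns 0 at end-of-file, i.e. once the write end is closed. */
    while ((n = read(l->read_fd, buf, sizeof(buf))) > 0) {
        l->bytes_seen += n;
    }
    return NULL;
}

/* Start the listener; returns 0 on success, -1 on failure. */
static int listener_start(struct listener *l)
{
    int fds[2];

    assert(l != NULL);
    if (pipe(fds) != 0) {
        return -1;
    }
    l->read_fd = fds[0];
    l->write_fd = fds[1];
    l->bytes_seen = 0;
    if (pthread_create(&l->thread, NULL, listener_main, l) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return 0;
}

/* Stop without pthread_cancel: close, then wait for the thread to exit. */
static int listener_stop(struct listener *l)
{
    close(l->write_fd);
    if (pthread_join(l->thread, NULL) != 0) {
        return -1;
    }
    close(l->read_fd);
    return 0;
}
```

Calling listener_start() and later listener_stop() is the whole life cycle; bytes_seen is only safe to read after the join has returned.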
We were leaking file descriptors when coll/ml was in use. It turns out
this was because basesmuma was failing to unmap files it had previously
mapped. This commit cleans up the setup code to ensure that we only
attempt to map the control files once per module and then ensures the
files are unmapped when the module is released.
cmr=v1.8.2:reviewer=manjugv
This commit was SVN r31737.
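The map-once, unmap-on-release rule from r31737 could look roughly like the following. This is a sketch, not the basesmuma code: the struct, the function names, and the way the file is created and sized are all assumptions made for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-module state; names are illustrative only. */
struct ctl_module {
    void  *ctl_base;   /* NULL until the control file is mapped */
    size_t ctl_len;    /* length of the mapping in bytes */
};

/* Map the control file on first use; later calls reuse the mapping. */
static int ctl_module_map(struct ctl_module *m, const char *path, size_t len)
{
    int fd;
    void *base;

    assert(m != NULL && path != NULL && len > 0);
    if (m->ctl_base != NULL) {
        return 0;                   /* already mapped: nothing to do */
    }
    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, (off_t) len) != 0) {
        close(fd);
        return -1;
    }
    base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping outlives the descriptor */
    if (base == MAP_FAILED) {
        return -1;
    }
    m->ctl_base = base;
    m->ctl_len = len;
    return 0;
}

/* Undo the mapping when the module is released; safe to call twice. */
static int ctl_module_release(struct ctl_module *m)
{
    int rc = 0;

    if (m->ctl_base != NULL) {
        rc = munmap(m->ctl_base, m->ctl_len);
        m->ctl_base = NULL;
        m->ctl_len = 0;
    }
    return rc;
}
```

Closing the descriptor right after mmap and keeping a NULL check in front of the mapping step are the two pieces that keep descriptors and mappings from accumulating in this sketch.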
algorithm
Per a suggestion from Manju, make sure there isn't a gap in the size ranges
for the available algorithms.
cmr=v1.8.2:ticket=trac:4437:reviewer=ompi-rm1.8
This commit was SVN r31728.
The following Trac tickets were found above:
Ticket 4437 --> https://svn.open-mpi.org/trac/ompi/ticket/4437
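One way to check for the kind of gap r31728 refers to, as a sketch only: the table layout and the names size_range and ranges_are_contiguous are hypothetical, and the assumption is that ranges are inclusive, sorted by message size, and meant to follow one another directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one algorithm per inclusive size range. */
struct size_range {
    size_t min_bytes;
    size_t max_bytes;
    int    algorithm;
};

/* Return 1 when each range starts exactly one byte after the previous
 * one ends (no gap, no overlap), 0 otherwise. */
static int ranges_are_contiguous(const struct size_range *r, size_t n)
{
    size_t i;

    assert(r != NULL || n == 0);
    if (n == 0) {
        return 0;
    }
    for (i = 0; i < n; i++) {
        if (r[i].min_bytes > r[i].max_bytes) {
            return 0;               /* malformed range */
        }
        if (i > 0) {
            if (r[i - 1].max_bytes == SIZE_MAX) {
                return 0;           /* nothing can follow an open-ended range */
            }
            if (r[i].min_bytes != r[i - 1].max_bytes + 1) {
                return 0;           /* gap or overlap */
            }
        }
    }
    return 1;
}
```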
MPI_Cart_Create/MPI_Graph_create/MPI_Dist_Graph
Fixes trac:4581
This commit was SVN r31716.
The following Trac tickets were found above:
Ticket 4581 --> https://svn.open-mpi.org/trac/ompi/ticket/4581
In preparation for moving the BTLs down to OPAL, discontinue the use
of the RML for connectivity client/agent communication. Instead, use
local unix domain sockets in the job session directory (all
communication is between processes on the same server, so unix domain
sockets are fine).
This commit was SVN r31710.
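For readers unfamiliar with unix domain sockets, the client/agent setup described in r31710 can be sketched as follows. The function names and the socket path are hypothetical; the real code places its socket in the job session directory, which is not reproduced here.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill a sockaddr_un for the given filesystem path; -1 if it is too long. */
static int fill_unix_addr(struct sockaddr_un *addr, const char *path)
{
    assert(addr != NULL && path != NULL);
    if (strlen(path) >= sizeof(addr->sun_path)) {
        return -1;
    }
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);
    return 0;
}

/* Agent side: bind a stream socket to the path and start listening. */
static int agent_listen(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (fill_unix_addr(&addr, path) != 0) {
        return -1;
    }
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    unlink(path);                   /* remove a stale socket file, if any */
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) != 0 ||
        listen(fd, 8) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Client side: connect to the agent's socket on the same host. */
static int client_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (fill_unix_addr(&addr, path) != 0) {
        return -1;
    }
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Because the socket is just a path in the filesystem, only processes on the same server can reach it, which matches the constraint stated in the commit message.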
Update the Open MPI description and fix lots of grammatical errors in
the OpenSHMEM description.
cmr=v1.8.2:reviewer=ompi-rm1.8
This commit was SVN r31709.
Add the component use_udp value into the modex. If my component's
use_udp value doesn't agree with the use_udp value from a peer's modex
data, print a helpful message and disqualify the usnic BTL (the usnic
BTL will not be used). This prevents accidental customer
misconfigurations.
Reviewed by Dave Goodell
cmr=v1.8.2:reviewer=ompi-rm1.8
This commit was SVN r31689.
Trivial struct re-ordering to eliminate holes in the middle of the
struct (although there's still a hole at the end) and reduce the
overall size of the struct from 64 to 56 bytes. Also change mtu from
int to uint16_t; there was no need for it to be that large.
Reviewed by Dave Goodell
cmr=v1.8.2:reviewer=ompi-rm1.8
This commit was SVN r31688.
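The effect of the re-ordering in r31688 can be illustrated with a made-up struct. The fields below are hypothetical and are not the usnic struct; the point is only that placing members from largest to smallest removes interior padding. On a typical x86-64 ABI the first layout should come out to 40 bytes and the second to 24, with one byte of tail padding left over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with holes: small fields sit before 64-bit ones,
 * so the compiler inserts padding to align each uint64_t. */
struct padded {
    uint8_t  flag;
    uint64_t id;
    uint16_t mtu;
    uint64_t addr;
    uint32_t count;
};

/* Same fields ordered largest first; only tail padding can remain. */
struct reordered {
    uint64_t id;
    uint64_t addr;
    uint32_t count;
    uint16_t mtu;
    uint8_t  flag;
};

/* Bytes saved per instance by the re-ordering on the current ABI. */
static size_t bytes_saved(void)
{
    assert(sizeof(struct reordered) <= sizeof(struct padded));
    return sizeof(struct padded) - sizeof(struct reordered);
}
```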
Fix mismatch between the MCA param (which expresses the timeout in
*milli*seconds) and the struct timeval timeout (which expresses the
timeout in *micro*seconds).
Reviewed by Dave Goodell
cmr=v1.8.2:reviewer=ompi-rm1.8
This commit was SVN r31687.
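The unit mismatch fixed in r31687 comes down to a conversion like the one below. This is a sketch with a hypothetical helper name, not the code from the commit: a millisecond value is split into whole seconds and a microsecond remainder before it goes into a struct timeval.

```c
#include <assert.h>
#include <sys/time.h>

/* Convert a timeout in milliseconds to a struct timeval, whose tv_usec
 * field counts microseconds (1 ms = 1000 us). */
static struct timeval timeout_ms_to_timeval(long timeout_ms)
{
    struct timeval tv;

    assert(timeout_ms >= 0);
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return tv;
}
```

Assigning the millisecond value straight to tv_usec would make every timeout 1000 times shorter than the parameter claims.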
So track that the rte has reached that point, and only emit the new message if it is accurate.
Note that we still generate a TON of output for a minor error:
Ralphs-iMac:examples rhc$ mpirun -n 3 -mca btl sm ./hello_c
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.
Process 1 ([[50239,1],2]) is on host: Ralphs-iMac
Process 2 ([[50239,1],2]) is on host: Ralphs-iMac
BTLs attempted: sm
Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
--------------------------------------------------------------------------
MPI_INIT has failed because at least one MPI process is unreachable
from another. This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used. Your MPI job will now abort.
You may wish to try to narrow down the problem;
* Check the output of ompi_info to see which BTL/MTL plugins are
available.
* Run your application with MPI_THREAD_SINGLE.
* Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
if using MTL-based communications) to see exactly which
communication plugins were considered and/or discarded.
--------------------------------------------------------------------------
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[50239,1],2]
Exit code: 1
--------------------------------------------------------------------------
[Ralphs-iMac.local:23227] 2 more processes have sent help message help-mca-bml-r2.txt / unreachable proc
[Ralphs-iMac.local:23227] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[Ralphs-iMac.local:23227] 2 more processes have sent help message help-mpi-runtime / mpi_init:startup:pml-add-procs-fail
Ralphs-iMac:examples rhc$
Hopefully, we can agree on a way to reduce this verbiage!
This commit was SVN r31686.
Bring down 3aa0ed6 from the hwloc v1.7 branch: Stevens says we should
GETFD before we SETFD, so we do.
cmr=v1.8.2:reviewer=rhc
This commit was SVN r31683.
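The GETFD-before-SETFD advice in r31683 is usually applied as shown below. The sketch assumes the flag being set is FD_CLOEXEC, which the commit message does not state, and the helper name is hypothetical. Reading the current descriptor flags first means any flags already set are preserved instead of overwritten.

```c
#include <fcntl.h>
#include <unistd.h>

/* Set close-on-exec on fd without clobbering other descriptor flags.
 * Returns 0 on success, -1 on failure (errno is set by fcntl). */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);     /* read the current flags first */

    if (flags == -1) {
        return -1;
    }
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) {
        return -1;
    }
    return 0;
}
```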
top_ompi_srcdir -> OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR
We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.
The only thing left is to treat ompilibdir the same way we did srcdir/builddir. Coming soon.
This commit was SVN r31678.