This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.
All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
Use of the old ompi_free_list_t and ompi_free_list_item_t is
deprecated. These classes will be removed in a future commit.
This commit updates the entire code base to use opal_free_list_t and
opal_free_list_item_t.
Notes:
OMPI_FREE_LIST_*_MT -> opal_free_list_* (uses opal_using_threads ())
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds an owner file in each of the component directories
for each framework. This allows for a simple script to parse
the contents of the files and generate, among other things, tables
to be used on the project's wiki page. Currently there are two
"fields" in the file, an owner and a status. A tool to parse
the files and generate tables for the wiki page will be added
in a subsequent commit.
WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL
All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.
This commit was SVN r32317.
We were still leaking 1) file descriptors for data files, and 2) some
control files. I fixed both of these leaks and everything is looking
good. This should fix the bug where we are running out of file
descriptors when running the loop_spawn test. I also too the
opportunity to refactor the code a bit to make the mapping/unmapping
simpler. This should help avoid these sorts of issues in the future.
Depends on #4678
cmr=v1.8.2:reviewer=manjugv
This commit was SVN r31893.
if in_ptr is NULL, the MAP_FIXED flag cannot be passed to mmap
this caused a hang in topology/cart and topology/sub from ibm
test suite on trunk.
cmr=v1.8.2:reviewer=hjelmn
This commit was SVN r31890.
Basesmuma was vallocing space for control data then mmapping over that
data. Nothing in the code suggests any need for mmapping a specific
address so I did the following to remove the leak:
- Removed the valloc of the buffer space
- ftruncate the mmaped file to ensure there is sufficient memory to
allocate space for the control data.
Ideally this code should be using opal/shmem but that is a larger
change. Keeping it simple for now.
cmr=v1.8.2:reviewer=manjugv
This commit was SVN r31822.
netpatterns_setup_narray_knomial_tree.
Fix a bug in ptpcoll that caused memory allocated by
netpatterns_setup_narray_knomial_tree to leak.
cmr=v1.8.2:reviewer=manjugv
This commit was SVN r31781.
The items in the available bcol list were getting leaked. This commit
fixes this leak. I also cleaned up the code a bit. This includes
making use of the opal_argv_free function.
cmr=v1.8.2:reviewer=manjugv
This commit was SVN r31744.
We were leaking file descriptors when coll/ml was in use. It turn out
this was because basesmuma was failing to unmap files it had previously
mapped. This commit cleans up the setup code to ensure that we only
attempt to map the control files once per module and then ensures the
files are unmapped when the module is released.
cmr=v1.8.2:reviewer=manjugv
This commit was SVN r31737.
algorithm
Per suggestion from Manju make sure there isn't a gap in the size ranges
for the available algorithms.
cmr=v1.8.2:ticket=trac:4437:reviewer=ompi-rm1.8
This commit was SVN r31728.
The following Trac tickets were found above:
Ticket 4437 --> https://svn.open-mpi.org/trac/ompi/ticket/4437
top_ompi_srcdir -> OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR
We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.
Only thing left is ompilibdir being treated similar to what we dif for srcdir/builddir. Coming soon.
This commit was SVN r31678.
Not closing this file descriptor will cause us to leak file
descriptors. It is safe to close the file after it has been mmapped.
cmr=v1.8.2:reviewer=manjugv
This commit was SVN r31579.
The algorithm was failing ibm/collective/allgather and iallgather. I
cleaned up the code to eliminate duplicate code paths and tracked the
issue down to an error in the way extra nodes in the knomial exchange
are handled. The new code is more compact and has been tested with up
to 64 ranks with the ibm test suite.
cmr=v1.8.1:reviewer=manjugv
This commit was SVN r31419.
Thanks to ggouaillardet for finding and fixing these issues.
Closes trac:4460
cmr=v1.8.1:reviewer=manjugv
This commit was SVN r31264.
The following Trac tickets were found above:
Ticket 4460 --> https://svn.open-mpi.org/trac/ompi/ticket/4460
This commit should finish the work started for #869. Closing that ticket
with this commit.
Closes trac:869
cmr=v1.8.1:reviewer=jsquyres
This commit was SVN r31257.
The following Trac tickets were found above:
Ticket 869 --> https://svn.open-mpi.org/trac/ompi/ticket/869
When we are only using local ranks basesmuma needs to provide an allreduce
function for both large and small message or else the coll/ml selection
logic will fail. In the future this logic should probably be updated to
just disable allreduce in coll/ml instead of disabling coll/ml. For now
it should be correct to say the basesmuma allgather works for larger
messages.
cmr=v1.8:reviewer=manjugv
This commit was SVN r31190.
After discussion with Manju we decided to update these the process count
limits of the shared memory collectives to an arbitrarily large number.
cmr=v1.7.5:ticket=trac:4405
This commit was SVN r31126.
The following SVN revision numbers were found above:
r31096 --> open-mpi/ompi@3f469d08e7
The following Trac tickets were found above:
Ticket 4405 --> https://svn.open-mpi.org/trac/ompi/ticket/4405
The initialization code did several allgathers on void *'s using
MPI_LONG_LONG_INT. This will produce the wrong result on 32-bit
platforms. Instead use MPI_BYTE with count = sizeof (void *).
cmr=v1.7.5:ticket=trac:4158
This commit was SVN r30627.
The following Trac tickets were found above:
Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
Found two bugs in basesmuma:
- Release all resources when tearing down the bcol module.
- Allways call the allreduce in the smcm code. We do not know
beforehand whether all procs have all the files mapped.
cmr=v1.7.5:ticket=trac:4158
This commit was SVN r30623.
The following Trac tickets were found above:
Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158