Modify the mapper to better bookmark its stopping place each time, and to pick up the next time from there. This needs to be validated on a multi-node system.
Fix a major memory corruption problem in the registry put/get functions, which were doing multiple frees. Not sure how valgrind missed this one, though it only occurred in specific circumstances (such as comm_spawn).
This commit was SVN r12179.
This change does a couple of things:
1. Since the USE_PARENT_ALLOC attribute is a directive regarding allocation of resources to a job, it more properly should be an attribute of the RAS. Change the name to reflect that and move the attribute define to the ras_types.h file.
2. Add the attributes list to the RMAPS map_job interface. This provides us with the desired flexibility to dynamically specify directives for mapping. The system will - in the absence of any attribute-based directive - default to the values provided in the MCA parameters (either from environment or command-line interface).
This commit was SVN r12164.
size and displacement of data-type. After this patch, all data can contain size_t bytes
and the displacements are defined as ptrdiff_t. All of the files I was able to compile
have been modified to match this requirement.
This commit was SVN r12146.
In this implementation, we begin mapping on the first node that has at least one slot available as measured by the slots_inuse versus the soft limit. If none of the nodes meet that criterion, we just start at the beginning of the node list since we are oversubscribed anyway.
Note that we ignore this logic if the user specifies a mapping - then it's just "user beware".
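The starting-node rule described above can be sketched roughly as follows. This is only an illustration: demo_node_t and demo_pick_start_node are hypothetical names, not the actual ORTE node structure or RMAPS code, and the sketch assumes a node counts as available when slots_inuse is below its soft limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node record. The field names follow the terms used in the
 * commit message, but this is not the real ORTE data structure. */
typedef struct {
    int slots_inuse;  /* processes already mapped onto this node */
    int soft_limit;   /* slots available before the node is oversubscribed */
} demo_node_t;

/* Return the index of the first node with at least one free slot.
 * If every node is at or above its soft limit, return 0: we are
 * oversubscribed anyway, so start at the beginning of the list. */
size_t demo_pick_start_node(const demo_node_t *nodes, size_t num_nodes)
{
    assert(nodes != NULL || num_nodes == 0);
    for (size_t i = 0; i < num_nodes; ++i) {
        if (nodes[i].slots_inuse < nodes[i].soft_limit) {
            return i;
        }
    }
    return 0;
}
```

A user-specified mapping would bypass this selection entirely, per the note above.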
The real root cause of the problem is that we don't adjust sched_yield as we add processes onto a node. Hence, the node becomes oversubscribed and performance goes into the toilet. What we REALLY need to do to solve the problem is:
(a) modify the PLS components so they reuse the existing daemons,
(b) create a way to tell a running process to adjust its sched_yield, and
(c) modify the ODLS components to update the sched_yield on a process per the new method
Until we do that, we will continue to have this problem - all this fix (and any subsequent one that focuses solely on the mapper) does is hopefully make it happen less often.
This commit was SVN r12145.
The UD BTL isn't gone - the latest version is in my afriedle-ud branch. This version on the trunk was very old, ompi_ignore'd, lacked performance, and probably contained bugs. The maintained version on my branch is working solid, and will eventually come back, but not for v1.2.
This commit was SVN r12144.
Fix the problem observed by multiple people that comm_spawned children were (once again) being mapped onto the same nodes as their parents. This was caused by going through the RAS a second time, thus overwriting the mapper's bookkeeping that told RMAPS where it had left off.
To solve this - and to continue moving forward on the ORTE development - we introduce the concept of attributes to control the behavior of the RM frameworks. I defined the attributes and a list of attributes as new ORTE data types to make it easier for people to pass them around (since they are now fundamental to the system, and therefore we will be packing and unpacking them frequently). Thus, all the functions to manipulate attributes can be implemented and debugged in one place.
I used those capabilities in two places:
1. Added an attribute list to the rmgr.spawn interface.
2. Added an attribute list to the ras.allocate interface. At the moment, the only attribute I modified the various RAS components to recognize is the USE_PARENT_ALLOCATION one (as defined in rmgr_types.h).
So the RAS components now know how to reuse an allocation. I have debugged this under rsh, but it now needs to be tested on a wider set of platforms.
This commit was SVN r12138.
* Update comments in some MPI_FILE_* functions to reflect that the
MPI specs have different page numbers in the ps and pdf (woof!).
* Update comments to say "Retain" where we meant retain (not "return")
* Add a check in MPI_ERRHANDLER_FREE to raise an MPI exception if the
user attempts to free an intrinsic errhandler *and* the refcount is
1 (meaning that it would actually free the intrinsic). This
protects erroneous programs from segv'ing.
* Remove lengthy comment from comm_get_errhandler.c which is no
longer valid (because of the MPI-2 errata that says that users *do*
have to call MPI_ERRHANDLER_FREE).
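A rough sketch of the MPI_ERRHANDLER_FREE check described in the list above. All names here (demo_errhandler_t, demo_errhandler_free, DEMO_SUCCESS, DEMO_ERR_ARG) are hypothetical stand-ins, not the Open MPI internals, and the real function raises an MPI exception rather than returning a plain error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { DEMO_SUCCESS = 0, DEMO_ERR_ARG = 1 };

/* Hypothetical errhandler object with a reference count. */
typedef struct {
    int  refcount;   /* number of outstanding references */
    bool intrinsic;  /* true for predefined (intrinsic) errhandlers */
} demo_errhandler_t;

/* Drop one reference. Refuse when the handle is intrinsic and this is the
 * last reference, because that release would actually free the intrinsic. */
int demo_errhandler_free(demo_errhandler_t *eh)
{
    if (eh == NULL) {
        return DEMO_ERR_ARG;
    }
    if (eh->intrinsic && eh->refcount == 1) {
        return DEMO_ERR_ARG;
    }
    eh->refcount--;
    return DEMO_SUCCESS;
}
```

With this shape, a user who obtained an extra reference to an intrinsic errhandler can still release it, while an erroneous extra free is rejected instead of segv'ing.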
This commit was SVN r12128.
The following SVN revision numbers were found above:
r12122 --> open-mpi/ompi@407b3cb788
The following Trac tickets were found above:
Ticket 502 --> https://svn.open-mpi.org/trac/ompi/ticket/502
MPI::SEEK_* because iostreams (well, ios_base, but I don't think that
should be included directly) can use SEEK_* as values in an enum, which
means that 'const int' is bad for them.
* Remove now useless comments in the cxx example programs
* include iostream after mpi.h so that our examples work with other MPI
implementations that don't try to be as friendly with the constants.
Refs trac:387
This commit was SVN r12125.
The following Trac tickets were found above:
Ticket 387 --> https://svn.open-mpi.org/trac/ompi/ticket/387
* Document --disable-mpi-cxx-seek
* Document that you need to include "mpi.h" after system-level
headers that create the SEEK_* constants
* Make the C++ examples follow this behavior (include "mpi.h" after
<iostream>)
This commit was SVN r12123.
The following Trac tickets were found above:
Ticket 387 --> https://svn.open-mpi.org/trac/ompi/ticket/387
* Fix MPI-2 page number in comments for a specific reference in the
spec
* Allow getting/setting the errhandler on MPI_FILE_NULL
* Allow freeing of intrinsic errhandlers, per MPI-2 errata (if you GET
an errhandler on a communicator, you must be able to FREE it, even
if it's an intrinsic).
Thanks to Lisandro Dalcin for reporting these problems.
This commit was SVN r12122.
some issues with the C #defines SEEK_{SET, CUR, END}. The workaround
involves some hackery that should work in almost every common use case
for the C stdio constants (and all the legal issues of the MPI constants).
The one issue is that the C stdio constants are now const ints instead
of #defines, which means that #ifdef checks will fail for the constants.
Behavior can be disabled at either configure time or build time.
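To illustrate the #ifdef caveat: the preprocessor only knows about macros, so a name that is a const int looks undefined to #ifdef. The sketch below uses hypothetical DEMO_* names rather than the real SEEK_* constants so that it does not depend on any particular header.

```c
/* A macro: visible to the preprocessor. */
#define DEMO_MACRO_SEEK_SET 0

/* A const int: visible to the compiler only, not the preprocessor. */
static const int DEMO_CONST_SEEK_SET = 0;

/* Returns 1 because the macro is defined. */
int demo_macro_visible(void)
{
#ifdef DEMO_MACRO_SEEK_SET
    return 1;
#else
    return 0;
#endif
}

/* Returns 0: #ifdef cannot see a const int, so the check fails even
 * though the name is perfectly usable in ordinary code. */
int demo_const_visible(void)
{
#ifdef DEMO_CONST_SEEK_SET
    return 1;
#else
    return 0;
#endif
}

/* The const int is still a normal value at compile time. */
int demo_const_value(void)
{
    return DEMO_CONST_SEEK_SET;
}
```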
Refs trac:387
This commit was SVN r12121.
The following Trac tickets were found above:
Ticket 387 --> https://svn.open-mpi.org/trac/ompi/ticket/387
where a window was in both the passive and active side of a lock sequence.
Refs trac:488
This commit was SVN r12112.
The following Trac tickets were found above:
Ticket 488 --> https://svn.open-mpi.org/trac/ompi/ticket/488
seed value have something set to true. Allow selection of the listen
type to thread if (and only if) the process is the HNP...
This commit was SVN r12105.
processes launched locally for the stdio file names. This was causing
the expected files to not exist and bproc_vexecmove_io to fail.
* Clean up a bunch of debugging output in the bproc pls
This commit was SVN r12102.
I was running into a problem where, if a string in the argument list contained a printf
escape sequence, we would segfault. In particular, I was using opal_output
to print the environment and had something like:
LESSOPEN=|/usr/bin/lesspipe.sh %s
in my environment. So I called opal_output(0, "%s", environ[i]) and
got a segfault because the fprintf tried to expand the %s in the
environment variable.
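A minimal sketch of this class of bug and one way to avoid it, assuming the message is first formatted into a buffer and then written out. The helper names demo_format and demo_emit are hypothetical and are not the opal_output implementation. The point is that already-formatted text must never be passed again as a format string.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Format a message into buf, printf-style. Returns what vsnprintf returns. */
int demo_format(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}

/* Write already-formatted text to a stream.
 * Unsafe variant: fprintf(stream, text) would treat any "%s" inside
 * text as a conversion with no matching argument.
 * Safe variant, used here: fputs() copies the text verbatim. */
int demo_emit(FILE *stream, const char *text)
{
    assert(stream != NULL && text != NULL);
    return (fputs(text, stream) == EOF) ? -1 : 0;
}
```

An equivalent fix is fprintf(stream, "%s", text), which also keeps the text out of the format position.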
This commit was SVN r12094.