name server).
* Add return status message for kill messages from the contact pcm doing
the actual killing so that MPI_Abort or ompi_rte_{kill,term}_{proc,job}
have a useful return value.
* Clean up the per-started-process local storage in the pcm base to
reduce code duplication
* Since there are now 4 kill functions (signal or terminate a proc or job)
combine them into one function in the pcm interface. Makes life easier
all around for PCM authors. Already had to combine for the message
transfer.
* Fix a race condition in the bootproxy code that caused it to miscount
the number of alive processes
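A rough illustration of the combined kill entry point described above, written in Python rather than the project's C. The names `kill` and `KillTarget`, the `table` argument, and the return format are all hypothetical and are not the actual pcm interface; the sketch only shows how one function can cover signal-or-terminate for a proc or a job and still hand back a status message.

```python
from enum import Enum


class KillTarget(Enum):
    PROC = "proc"   # act on a single process
    JOB = "job"     # act on every process in a job


def kill(target, ident, signum=None, table=None):
    """Single entry point standing in for four separate functions.

    target: KillTarget.PROC or KillTarget.JOB
    ident:  process name or job id (hypothetical identifiers)
    signum: signal number to deliver; None means "terminate"
    table:  hypothetical map of job id -> list of process names
    Returns (status, message) so the caller gets a useful return value.
    """
    table = table or {}
    if target is KillTarget.JOB:
        procs = table.get(ident)
        if procs is None:
            return (1, "unknown job %s" % ident)
    else:
        procs = [ident]
    action = "terminate" if signum is None else "signal %d" % signum
    # A real pcm would deliver the signal here; the sketch only reports it.
    return (0, "%s sent to %d process(es)" % (action, len(procs)))
```

For example, `kill(KillTarget.JOB, 7, table={7: ["a", "b"]})` covers the "terminate job" case, and passing `signum=9` with `KillTarget.PROC` covers "signal proc".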
This commit was SVN r3796.
support is included because ROMIO is inherently thread-unsafe.
One possible way to have true asynchronous progress would be to use a
progress thread that wakes up and polls at some frequency when there
are non-blocking IO requests pending. This is pretty icky, though --
it should definitely have an MCA parameter to enable/disable this
functionality, as well as another to control the polling frequency.
This also strengthens the argument that we need a v2 of the io
framework -- one that is not designed to exclusively support ROMIO --
one that does something unimaginably "better" for the parallel MPI-2
IO interface. :-)
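A minimal sketch of the progress-thread idea above, in Python for brevity. `ProgressThread`, `poll_fn`, `enabled`, and `interval_s` are hypothetical names; the last two merely stand in for the enable/disable and polling-frequency MCA parameters the text proposes. The thread sleeps while nothing is pending and polls once per interval otherwise.

```python
import threading


class ProgressThread:
    """Polls pending non-blocking requests at a fixed interval."""

    def __init__(self, poll_fn, interval_s=0.01, enabled=True):
        self.poll_fn = poll_fn          # hypothetical: advances pending IO requests
        self.interval_s = interval_s    # stand-in for a polling-frequency parameter
        self.enabled = enabled          # stand-in for an enable/disable parameter
        self._pending = 0
        self._wake = threading.Condition(threading.Lock())
        self._stop = False
        self._thread = None

    def start(self):
        # Only spawn the thread when the feature is switched on.
        if self.enabled and self._thread is None:
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()

    def request_started(self):
        with self._wake:
            self._pending += 1
            self._wake.notify()

    def request_finished(self):
        with self._wake:
            self._pending -= 1

    def stop(self):
        with self._wake:
            self._stop = True
            self._wake.notify()
        if self._thread is not None:
            self._thread.join()

    def _run(self):
        while True:
            with self._wake:
                # Sleep until there is something to poll or we are told to stop.
                while self._pending == 0 and not self._stop:
                    self._wake.wait()
                if self._stop:
                    return
            self.poll_fn()  # called without the lock held
            # Wait one polling interval (or until stop) before polling again.
            with self._wake:
                if self._wake.wait_for(lambda: self._stop, timeout=self.interval_s):
                    return
```

Blocking on a condition variable while no requests are pending keeps the thread from burning CPU when there is nothing to progress.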
This commit was SVN r3786.
* add debugging code for the callback. Since gdb doesn't really like
doing things like waitpid for processes, spin when we are in the
handler in a way that gdb can easily attach and debug
This commit was SVN r3785.
memory freed on receipt of FIN packet with memory region information
from the sender (mca_ptl_ib_fin_header_t); RDMA write source
memory freed on completion notification from RDMA write...
This commit was SVN r3776.
Comm_spawn is now fully functional. I'll send out a separate message about some of the problems encountered, and resulting action items.
This commit was SVN r3770.
w/o threads runs correctly at this stage. fixed errors with using
addresses valid in remote memory scope, not local memory scope.
This commit was SVN r3767.
* remove ignore on the bjs discovery / mapping code. It seems to work
correctly
* Fix up svn:ignore properties in the dynamic-mca directory
This commit was SVN r3761.
node, since the default hostfile shipped with Open MPI contains only
"localhost". This is performed after all the other checks, so it should
only get activated as a last-ditch effort.
This commit was SVN r3758.
as loop ending condition
* Add resolver code for a couple of different node formats for BProc. Given
hostfiles can now be a combination of notations:
0
n1
2
master
self
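To illustrate the mixed notations listed above, here is a small resolver sketch in Python. It is not the actual BProc resolver code: `resolve_node` and `resolve_hostfile` are hypothetical names, and the numeric id chosen for "master" is an assumption made for the example.

```python
MASTER_NODE = -1  # assumption: placeholder id for the master node


def resolve_node(name, self_node):
    """Map one hostfile entry to a numeric node id.

    Accepts a bare number ("0", "2"), an n-prefixed number ("n1"),
    "master", and "self". self_node is the caller's own node id.
    """
    entry = name.strip()
    if entry == "master":
        return MASTER_NODE
    if entry == "self":
        return self_node
    # Strip an optional leading "n" and require what is left to be a number.
    digits = entry[1:] if entry.startswith("n") else entry
    if not digits.isdecimal():
        raise ValueError("unrecognized node name: %r" % name)
    return int(digits)


def resolve_hostfile(lines, self_node):
    """Resolve every non-blank hostfile line, preserving order."""
    return [resolve_node(line, self_node) for line in lines if line.strip()]
```

Calling `resolve_hostfile(["0", "n1", "2", "master", "self"], self_node=3)` resolves all five notations from the example hostfile in one pass.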
This commit was SVN r3757.
PLEASE NOTE: there are some diagnostic messages in oob_xcast that will print out. Please don't have a cow about them - they won't hurt or injure anyone, and they're just there for a little while to help Tim and me debug a problem. Just didn't want to create yet another MCA parameter to debug 10 lines of code. :-)
This commit was SVN r3756.
for hints, and an ompi_rb_tree_t for all other search operations;
registration occurs only when a different memory range or different
access privileges are needed; deregistration is lazy, and only done when
registration fails with VAPI_EAGAIN (no resources...)
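The register-on-miss, deregister-only-on-failure policy described above can be sketched as follows. This is a Python illustration, not the ptl code: `RegistrationCache`, `NoResources`, `register_fn`, and `deregister_fn` are hypothetical names, a plain list stands in for the tree, and evicting the oldest entry first is an assumption (the commit message does not say which entry is dropped).

```python
class NoResources(Exception):
    """Stand-in for a registration failure such as VAPI_EAGAIN."""


class RegistrationCache:
    def __init__(self, register_fn, deregister_fn):
        self.register_fn = register_fn      # hypothetical: pins (addr, length, access)
        self.deregister_fn = deregister_fn  # hypothetical: releases a handle
        self.entries = []                   # (addr, length, access, handle), oldest first

    def _find(self, addr, length, access):
        for a, l, acc, handle in self.entries:
            covers = a <= addr and addr + length <= a + l
            # access values are sets; the request must be a subset of what is held
            if covers and access <= acc:
                return handle
        return None

    def get(self, addr, length, access):
        handle = self._find(addr, length, access)
        if handle is not None:
            return handle                   # same range and privileges: reuse
        while True:
            try:
                handle = self.register_fn(addr, length, access)
                break
            except NoResources:
                if not self.entries:
                    raise                   # nothing left to give back
                # Lazy deregistration: release an old entry only now.
                _, _, _, old = self.entries.pop(0)
                self.deregister_fn(old)
        self.entries.append((addr, length, access, handle))
        return handle
```

Registrations are never released eagerly; an entry is dropped only when a new registration cannot otherwise succeed.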
This commit was SVN r3755.
Most of this checkin consists of more debugging stuff. Hopefully, you won't see any printf's that aren't protected by debug flags - if you do, let me know and I'll take them out with my apologies.
Outside of debugging, the biggest change was a revamp of the shutdown process. For several reasons, we had chosen to have all processes "wait" for a shutdown message before exiting. This message is typically generated by mpirun, but in the case of comm_spawn we needed to do something else. We have decided that the best way to solve this problem is to:
(a) replace the shutdown message (which needed to be generated by somebody - usually mpirun) with an oob_barrier call. This still requires that the rank 0 process be alive. However, we terminate all processes if one abnormally terminates anyway, so this isn't a problem (with the standard or our implementation); and
(b) have the state-of-health monitoring subsystem issue the call to clean up the job from the registry. Since the state-of-health subsystem isn't available yet, we have temporarily assigned that responsibility to the rank 0 process. Once the state-of-health subsystem is available, we will have it monitor the job for all-processes-complete and then it can tell the registry to clean up the job (i.e., remove all data relating to this job).
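The shutdown sequence in (a) and (b) can be pictured with a toy model, using Python threads in place of MPI processes. Everything here is an illustrative assumption: `run_job` is a hypothetical name, `threading.Barrier` stands in for the oob_barrier call, and a dict stands in for the registry.

```python
import threading


def run_job(nprocs):
    registry = {"job-data": "present"}   # hypothetical stand-in for the registry
    barrier = threading.Barrier(nprocs)  # stand-in for the oob_barrier call
    exited = []
    lock = threading.Lock()

    def process(rank):
        # ... application work would happen here ...
        barrier.wait()                   # nobody exits until everyone has arrived
        if rank == 0:
            registry.clear()             # rank 0 temporarily owns job cleanup
        with lock:
            exited.append(rank)

    threads = [threading.Thread(target=process, args=(r,)) for r in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(exited), registry
```

No process waits for an externally generated shutdown message; the barrier itself is the synchronization point, and cleanup happens exactly once afterwards.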
Hope that helps a little. I'll put all this into the design docs soon.
This commit was SVN r3754.
.ompi_ignore file.
If you have an empty .ompi_unignore file (presumably alongside an
.ompi_ignore file), we'll build the component.
If you have a non-empty .ompi_unignore file (presumably alongside an
.ompi_ignore file), and your username can be found in that file, we'll
build the component. If your username is *not* in the .ompi_unignore
file, we *won't* build the component.
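The rules above amount to a small decision function. The sketch below is in Python and is not the actual build-system logic: `should_build` is a hypothetical name, matching usernames as whitespace-separated tokens is an assumption, and so is the fallback that a lone .ompi_ignore file means "skip" while no marker files at all means "build".

```python
import os


def should_build(component_dir, username):
    """Decide whether a component directory should be built for this user."""
    ignore = os.path.join(component_dir, ".ompi_ignore")
    unignore = os.path.join(component_dir, ".ompi_unignore")
    if os.path.exists(unignore):
        with open(unignore) as fh:
            names = fh.read().split()
        # Empty file: build for everyone. Non-empty: build only for listed users.
        return not names or username in names
    # No .ompi_unignore: assumed that .ompi_ignore alone means "skip".
    return not os.path.exists(ignore)
```

With a directory holding both files and `bob` listed in .ompi_unignore, `should_build(path, "bob")` is true and `should_build(path, "alice")` is false.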
This commit was SVN r3747.