MPI and non-ORTE applications for RSH on one node with or without
threads. I think we're approaching convergence with the tim branch.
This commit was SVN r4895.
Comm_spawn is now fully functional. I'll send out a separate message about some of the problems encountered, and resulting action items.
This commit was SVN r3770.
PLEASE NOTE: there are some diagnostic messages in oob_xcast that will print out. Please don't have a cow about them - they won't hurt or injure anyone, and they're just there for a little while to help Tim and me debug a problem. Just didn't want to create yet another MCA parameter to debug 10 lines of code. :-)
This commit was SVN r3756.
Most of this checkin consists of more debugging stuff. Hopefully, you won't see any printf's that aren't protected by debug flags - if you do, let me know and I'll take them out with my apologies.
Outside of debugging, the biggest change was a revamp of the shutdown process. For several reasons, we had chosen to have all processes "wait" for a shutdown message before exiting. This message is typically generated by mpirun, but in the case of comm_spawn we needed to do something else. We have decided that the best way to solve this problem is to:
(a) replace the shutdown message (which needed to be generated by somebody - usually mpirun) with an oob_barrier call. This still requires that the rank 0 process be alive. However, we terminate all processes if one abnormally terminates anyway, so this isn't a problem (with the standard or our implementation); and
(b) have the state-of-health monitoring subsystem issue the call to clean up the job from the registry. Since the state-of-health subsystem isn't available yet, we have temporarily assigned that responsibility to the rank 0 process. Once the state-of-health subsystem is available, we will have it monitor the job for all-processes-complete and then it can tell the registry to clean up the job (i.e., remove all data relating to this job).
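For anyone who wants a picture of (a) and (b) together, here is a rough sketch only, not the actual code. Threads stand in for processes, a pthread barrier stands in for oob_barrier, and a plain counter stands in for the job's registry data. NPROCS, run_job, proc_main, registry_entries and cleanups_run are all made-up names for illustration.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>

#define NPROCS 4                            /* hypothetical job size */

static pthread_barrier_t shutdown_barrier;  /* stand-in for oob_barrier */
static int registry_entries;                /* stand-in for per-job registry data */
static int cleanups_run;                    /* how many times cleanup was issued */

static void *proc_main(void *arg)
{
    int rank = *(int *)arg;

    /* ... application work would happen here ... */

    /* (a) everyone meets at the barrier instead of waiting for a
     * shutdown message from mpirun */
    pthread_barrier_wait(&shutdown_barrier);

    /* (b) temporary rule: rank 0 removes the job's data from the registry */
    if (rank == 0) {
        registry_entries = 0;
        cleanups_run++;
    }
    return NULL;
}

/* Runs one simulated job; returns the registry entries left afterwards. */
int run_job(void)
{
    pthread_t tid[NPROCS];
    int rank[NPROCS];
    int i, rc;

    registry_entries = NPROCS;
    cleanups_run = 0;

    rc = pthread_barrier_init(&shutdown_barrier, NULL, NPROCS);
    assert(rc == 0);

    for (i = 0; i < NPROCS; i++) {
        rank[i] = i;
        rc = pthread_create(&tid[i], NULL, proc_main, &rank[i]);
        assert(rc == 0);
    }
    for (i = 0; i < NPROCS; i++) {
        pthread_join(tid[i], NULL);
    }
    pthread_barrier_destroy(&shutdown_barrier);
    (void)rc;   /* silence unused warning when asserts are compiled out */
    return registry_entries;
}
```

Calling run_job() should leave registry_entries at 0 with exactly one cleanup issued, no matter which thread reaches the barrier last. Once the state-of-health subsystem exists, the rank == 0 branch is the part that would move there.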
Hope that helps a little. I'll put all this into the design docs soon.
This commit was SVN r3754.
This may trigger a complete rebuild :(. Short overview of changes:
- reduce number of network slams at startup
- prevent gpr from hanging when doing process death code
- general gpr cleanups
This commit was SVN r3584.
--------
1. malloc casts to the right pointers
2. function parameter casts in the components (e.g., recv requires a (char *) typecast,
else the CL compiler barfs)
3. added my own errno indirection. this is only in oob/tcp module. ompi_errno is #defined
to errno in unix land and to a function ompi_get_error which returns the equivalent
error code.
4. implemented our own fcntl to prevent spaghetti coding. this currently only takes
F_GETFL and F_SETFL arguments, does nothing on F_GETFL and sets the nonblocking
option on F_SETFL
5. Moved some extern declarations to global scope since the CL compiler does not do
the right things if they are declared and used in static inline functions.
6. Protection around some header files. changed sys/errno to errno.
7. defined in_proto_t (unsigned uint16_t) to DWORD ... comments are welcome
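To make items 3 and 4 a bit more concrete, here is a minimal sketch of the idea, assuming a Winsock build on the Windows side. It is not the code that went into oob/tcp. ompi_errno and ompi_get_error are the names from item 3, but the body of ompi_get_error, the name ompi_fcntl, the flag values in the Windows branch and the set_nonblocking helper are all assumptions made up for illustration.

```c
#include <errno.h>

#ifdef _WIN32
#include <winsock2.h>

/* Hypothetical mapping from Winsock errors to errno-style codes. */
static int ompi_get_error(void)
{
    switch (WSAGetLastError()) {
    case WSAEWOULDBLOCK: return EAGAIN;
    case WSAEINTR:       return EINTR;
    default:             return EIO;
    }
}
#define ompi_errno ompi_get_error()

/* Placeholder values; Windows has no fcntl flags of its own. */
#define F_GETFL    3
#define F_SETFL    4
#define O_NONBLOCK 0x0004

/* Hypothetical fcntl stand-in: F_GETFL does nothing, F_SETFL always
 * switches the socket to nonblocking mode, as described in item 4. */
static int ompi_fcntl(int fd, int cmd, int arg)
{
    u_long nonblocking = 1;
    (void)arg;
    if (cmd == F_GETFL) {
        return 0;
    }
    if (cmd == F_SETFL) {
        return ioctlsocket((SOCKET)fd, FIONBIO, &nonblocking) == 0 ? 0 : -1;
    }
    return -1;
}
#else
#include <fcntl.h>
#define ompi_errno errno      /* unix land: plain errno */
#define ompi_fcntl fcntl      /* unix land: the real fcntl */
#endif

/* Caller code stays free of #ifdefs: returns 0 on success,
 * otherwise the errno-style code reported by ompi_errno. */
static int set_nonblocking(int fd)
{
    int flags = ompi_fcntl(fd, F_GETFL, 0);
    if (flags < 0) {
        return ompi_errno;
    }
    if (ompi_fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        return ompi_errno;
    }
    return 0;
}
```

The point of the indirection is that callers such as set_nonblocking look the same on both platforms; only the small block at the top differs.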
This commit was SVN r3394.
I have to commit this to clean up a break in my tree. I'm hoping it won't break the compile of the tree, but will fix it as quickly as possible.
Jeff - you are welcome to set an "ignore" on the gpr if you like - I'll let you know when I've got the "kinks" out.
This commit was SVN r2145.
code base.
- many (most) mca type names have "component" or "module" in them,
as relevant, just to further distinguish the difference between
component data/actions and module data/actions. All developers
are encouraged to perpetuate this convention when you create
types that are specific to a framework, component, or module
- did very little to entire framework (just the basics to make it
compile) because it's just about to be almost entirely replaced
- ditto for io / romio
- did not work on elan or ib components; have to commit and then
convert those on a different machine with the right libraries and
headers
- renamed a bunch of *_module.c files to *_component.c and *module*c
to *component*c (a few still remain, e.g., ptl/ib, ptl/elan, etc.)
- modified autogen/configure/build process to match new filenames
(e.g., output static-components.h instead of static-modules.h)
- removed DOS-style cr/lf stuff in ns/ns.h
- added newline to end of file src/util/numtostr.h
- removed some redundant error checking in the top-level topo
functions
- added a few {} here and there where people "forgot" to put them in
for 1 line blocks ;-)
- removed a bunch of MPI_* types from mca header files (replaced with
corresponding ompi_* types)
- all the ptl components had version numbers in their structs; removed
- converted a few more elements in the MCA base to use the OBJ
interface -- removed some old manual reference counting cruft
This commit was SVN r1830.