This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.
This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.
This commit closes trac:158
More details to follow.
This commit was SVN r14051.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r13912
The following Trac tickets were found above:
Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
remote nodes. It will also kill off rogue orteds and orterun
processes. The killing of processes is ifdef'ed out for Windows
since I do not know how to do it there. Note that this change
will require an autogen.
This commit was SVN r13477.
yesterday and set it to "true" in orte_init(). But ompi_mpi_init()
doesn't call orte_init() -- it calls orte_init_stage1() and
orte_init_stage2(). So orte_initialized was never set to true, and
Badness happened from there (w.r.t. ompi_mpi_abort()).
This patch moves the setting of orte_initialized to orte_init_stage2()
so that everyone will always get it set properly.
It also moves setting orte_universe_info.state to RUNNING into
stage2() as well -- Ralph confirmed that that should have been there
for the same reasons that orte_initialized needs to be there.
This commit was SVN r13374.
completed successfully, Bad Things(tm) could happen.
* Now we explicitly check orte_initialized (a new global in ORTE
indicating whether we are between orte_init() and orte_finalize()
or not), and if so, react accordingly.
* If ORTE is initialized, use orte_system_info.nodename; otherwise,
use gethostname().
* Add loop protection to ensure that ompi_mpi_abort() is not invoked
multiple times recursively.
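The abort path described above can be sketched in C roughly as follows. This is a minimal, hypothetical sketch: sketch_mpi_abort(), sketch_orte_initialized and sketch_nodename are illustrative stand-ins, not the real ompi_mpi_abort() or ORTE globals. It shows the two ideas from the log: choose the hostname source based on whether ORTE is initialized, and refuse to re-enter the abort path.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-ins for orte_initialized and orte_system_info.nodename. */
static bool sketch_orte_initialized = false;
static const char *sketch_nodename = "node-from-orte";

/* Loop protection: set on first entry and never cleared, because a real
   abort does not return. */
static bool sketch_abort_in_progress = false;

/* Returns 0 when it performs the abort, -1 if an abort is already in
   progress (a recursive or repeated call). The real function kills the
   job instead of returning; returning here keeps the sketch testable. */
int sketch_mpi_abort(char *host, size_t len)
{
    assert(host != NULL && len > 0);
    if (sketch_abort_in_progress) {
        return -1;
    }
    sketch_abort_in_progress = true;

    if (sketch_orte_initialized) {
        /* Between orte_init() and orte_finalize(): ORTE data is valid. */
        snprintf(host, len, "%s", sketch_nodename);
    } else {
        /* ORTE is not usable, so ask the OS instead. */
        if (gethostname(host, len) != 0) {
            snprintf(host, len, "%s", "unknown");
        }
        host[len - 1] = '\0'; /* POSIX does not guarantee termination on truncation */
    }
    /* ... print the error message naming `host`, then kill the job ... */
    return 0;
}
```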
This commit was SVN r13354.
the ORTE_DAEMON_CMD type, which, unfortunately, is used all over
the place. Without this, we get the following errors:
[msc01:12341] [0,0,0] ORTE_ERROR_LOG: Data pack failed in file ../../ompi-trunk/orte/dss/dss_pack.c at line 83
[msc01:12341] [0,0,0] ORTE_ERROR_LOG: Data pack failed in file ../../ompi-trunk/orte/dss/dss_pack.c at line 58
[msc01:12341] [0,0,0] ORTE_ERROR_LOG: Data pack failed in file ../../../../ompi-trunk/orte/mca/pls/base/pls_base_orted_cmds.c at line 136
This commit was SVN r13320.
1. adds a "cancel_operation" API to the pls components that allows orterun to demand that an orted operation (e.g., terminate_job) be immediately cancelled and abandoned.
2. changes the pls orted commands from blocking to non-blocking. This allows us to interrupt those operations should an orted be non-responsive. The change also adds an orte_abort_timeout that limits how long orterun will automatically wait for the orteds to respond - if the terminate command, for example, doesn't see an orted response within that time, then we print out an appropriate error message and just give up.
3. modifies orterun to allow multiple ctrl-c's to simply abort the program even if the orteds have not responded
4. does some cleanup on the orte-level mca params so that their implementation looks a lot more like that of ompi - makes it easier to maintain. This change also includes the definition of an orte_abort_timeout struct and associated MCA param (can't have too many!) so you can set the time after which orterun gives up on waiting for orteds to respond
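How the abort timeout and repeated ctrl-c's interact can be sketched as a single "keep waiting?" check. This is a minimal sketch under stated assumptions: a single-threaded waiter, hypothetical sketch_* names, and a timeout passed in directly rather than read from the real orte_abort_timeout MCA param.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>

/* Number of SIGINTs (ctrl-c's) received so far. */
static volatile sig_atomic_t sketch_sigint_count = 0;

static void sketch_sigint_handler(int sig)
{
    (void)sig;
    sketch_sigint_count++;
}

/* Install with sigaction() and no SA_RESETHAND, so the handler stays
   installed after the first ctrl-c. */
int sketch_install_sigint_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sketch_sigint_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Should orterun keep waiting for orted responses? Stop once the timeout
   has elapsed, or as soon as the user presses ctrl-c a second time. */
bool sketch_keep_waiting(time_t start, time_t now, int timeout_secs)
{
    if (sketch_sigint_count >= 2) {
        return false; /* user insists: abort without waiting for orteds */
    }
    return difftime(now, start) < timeout_secs;
}
```

In this sketch the first ctrl-c only starts the orderly shutdown. The caller would then poll sketch_keep_waiting() while the non-blocking orted commands progress.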
This needs more testing before migrating to 1.2.
This commit was SVN r13304.
but remove them also. This current set of changes will affect
nothing as no one is making use of this ability. However, orte-clean
will be changed soon to utilize this new feature.
This commit was SVN r12996.
This has now been corrected. The singleton startup will dutifully call the mapper framework so that the proper data storage locations get initialized. Unfortunately, we then had to instruct the RMAPS not to allocate a vpid range for this job - otherwise, it would make a mistake and think there were two processes in it. Hence, a change was required to RMAPS to tell it "map this job, but don't allocate a vpid range for it".
This change will need to migrate across to 1.2 after it "soaks" the appropriate time.
This commit was SVN r12952.
Also, take the first step in updating how we handle mca params in ORTE - bring it closer to how it is done in the other two layers. Much more work to be done here.
This commit was SVN r12838.
the same time, remove some of the MPI-related options from OPAL:
- provide mechanism to change at runtime whether sched_yield() should
be called when the progress engine is idle
- provide mechanism for changing the rate at which the event engine
is called when there are "no" users of the event engine (i.e., when
using MPI but not TCP)
- fix some function names in the progress engine to better match
their intended use (and remove MPI naming scheme)
- remove progress_mpi_enable / progress_mpi_disable because
we can now use the functions to set the sched_yield and
tick rate interfaces
- rename opal_progress_events() to opal_progress_set_event_flag()
because the first really isn't descriptive of what the function
does and I always got confused by it
This commit was SVN r12645.
Fix comm_spawn by singletons. orte_init does some voodoo to let the system know about localhost when we are a singleton. This includes allocating it so that any comm_spawn'd children can use their parent "allocation". Unfortunately, the fix that bproc needs (due to that smr filling up the node segment!) causes the singleton startup to fail. The fix is to just have the singleton startup force an allocation of its localhost.
Only issue here is: what happens if we are in a persistent universe? The singleton will now overwrite any prior info on slots used on localhost by other jobs (won't affect anything else). The answer, of course, is to do something more intelligent - lookup localhost on the registry and just update its info instead of overwriting it.
Something for another day (or month....or year)
This commit was SVN r12644.
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.
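What "terminate this jobid AND all jobs that descended from it" means can be shown with a small job-tree walk. This is a hypothetical sketch: the flat parent table and the sketch_* names are illustrative, not the name service's real data structures, and the include_descendants flag only plays the role of the ORTE_NS_INCLUDE_DESCENDANTS attribute.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_JOBS 16
#define SKETCH_NO_PARENT (-1)

/* Hypothetical job table: parent[j] is the jobid that spawned job j, or
   SKETCH_NO_PARENT for a job launched directly. Links are assumed acyclic. */
typedef struct {
    int njobs;
    int parent[SKETCH_MAX_JOBS];
} sketch_job_table_t;

/* Depth-first walk: append jobid, then every job it spawned, recursively. */
static int sketch_collect(const sketch_job_table_t *t, int jobid, int *out, int n)
{
    out[n++] = jobid;
    for (int j = 0; j < t->njobs; j++) {
        if (t->parent[j] == jobid) {
            n = sketch_collect(t, j, out, n);
        }
    }
    return n;
}

/* Fill `out` with the jobids a terminate_job request should act on and
   return how many were written. */
int sketch_jobs_to_terminate(const sketch_job_table_t *t, int jobid,
                             bool include_descendants, int *out)
{
    assert(t->njobs <= SKETCH_MAX_JOBS && jobid >= 0 && jobid < t->njobs);
    if (!include_descendants) {
        out[0] = jobid;
        return 1;
    }
    return sketch_collect(t, jobid, out, 0);
}
```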
I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).
This commit was SVN r12597.
Get the ordering right so that a singleton can start.
Protect the rmgr copy app_context function from NULL fields
Tell the mapper it is okay for there not to be a pre-existing mapping plan for a parent when dynamically spawning processes
This commit was SVN r12257.
Fix the persistent daemon problem where it was exiting when a job completed. Problem was that the persistent daemon would order the job daemons to exit. They would then send an 'ack' back to the persistent daemon - but the ack consisted of an echo of the "exit" command, which was recv'd by the wrong listener who treated it as a properly sent cmd....and exited.
This commit was SVN r12243.
Fix the problem observed by multiple people that comm_spawned children were (once again) being mapped onto the same nodes as their parents. This was caused by going through the RAS a second time, thus overwriting the mapper's bookkeeping that told RMAPS where it had left off.
To solve this - and to continue moving forward on the ORTE development - we introduce the concept of attributes to control the behavior of the RM frameworks. I defined the attributes and a list of attributes as new ORTE data types to make it easier for people to pass them around (since they are now fundamental to the system, and therefore we will be packing and unpacking them frequently). Thus, all the functions to manipulate attributes can be implemented and debugged in one place.
I used those capabilities in two places:
1. Added an attribute list to the rmgr.spawn interface.
2. Added an attribute list to the ras.allocate interface. At the moment, the only attribute I modified the various RAS components to recognize is the USE_PARENT_ALLOCATION one (as defined in rmgr_types.h).
So the RAS components now know how to reuse an allocation. I have debugged this under rsh, but it now needs to be tested on a wider set of platforms.
This commit was SVN r12138.
- use the OPAL functions for PATH and environment variables
- make all headers C++ friendly
- no unnamed structures
- no implicit casts
Plus a full implementation for the orte_wait functions.
This commit was SVN r11347.
different macros, one for each project. Therefore, now we have OPAL_DECLSPEC,
ORTE_DECLSPEC and OMPI_DECLSPEC. Please use them based on the sub-project.
This commit was SVN r11270.
Other changes:
1. Remove the old xcpu components as they are not functional.
2. Fix a "bug" in orterun whereby we called dump_aborted_procs even when we normally terminated. There is still some kind of bug in this procedure, however, as we appear to be calling the orterun job_state_callback function every time a process terminates (instead of only once when they have all terminated). I'll continue digging into that one.
This will require an autogen/configure, I'm afraid.
This commit was SVN r11228.
Clean up the remainder of the size_t references in the runtime itself. Convert to orte_std_cntr_t wherever it makes sense (only avoid those places where the actual memory size is referenced).
Remove the obsolete oob barrier function (we actually obsoleted it a long time ago - just never bothered to clean it up).
I have done my best to go through all the components and catch everything, even if I couldn't test compile them since I wasn't on that type of system. Still, I cannot guarantee that problems won't show up when you test this on specific systems. Usually, these will just show as "warning: comparison between signed and unsigned" notes which are easily fixed (just change a size_t to orte_std_cntr_t).
In some places, people didn't use size_t, but instead used some other variant (e.g., I found several places with uint32_t). I tried to catch all of them, but...
Once we get all the instances caught and fixed, this should once and for all resolve many of the heterogeneity problems.
This commit was SVN r11204.
- orte-clean.c : check to see if the base session directory is empty
and delete it if it is.
- orte_universe_exists.c : Fix a downstream problem resulting from
George's r10718 commit. Don't use the 'fulldirpath' since
that is no longer guaranteed to be the absolute path
to the session directory. Construct this value outside of that
function from the prefix and frontend vars.
This commit was SVN r10741.
The following SVN revision numbers were found above:
r10718 --> open-mpi/ompi@47eef2e002
After seeing the ugliness that is removing directories in the
codebase I decided to push down this to the OPAL by extending the
opal/os_create_dirpath.(c|h) to contain some more functionality.
In this process I renamed 'os_create_dirpath' to 'os_dirpath' since it
is a bit more general now.
Added a few functions to:
- check if a directory is empty
- check to see if the access permissions are set correctly
- destroy the directory at the end of the dirpath
- By using a caller callback function (a la Perl, I believe)
for every file, the caller can have fine grained control over
whether a specific file is deleted or not.
This simplifies things a bit for orte_session_dir_(finalize|cleanup)
as it should no longer contain any of this functionality, but uses
these functions to do the work.
From the external perspective nothing has changed, from the
developer point of view we have some cleaner, more generic code.
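The callback-driven cleanup described above can be sketched with plain POSIX directory calls. This is a hypothetical sketch, not the actual opal/os_dirpath API: sketch_dir_is_empty(), sketch_dirpath_destroy() and the example callback are illustrative names, and the sketch only handles regular files in a single directory level.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Caller-supplied policy: return true if `path` should be deleted. */
typedef bool (*sketch_should_delete_fn)(const char *path);

bool sketch_dir_is_empty(const char *dirpath)
{
    DIR *dp = opendir(dirpath);
    if (dp == NULL) {
        return false;
    }
    bool empty = true;
    struct dirent *ep;
    while ((ep = readdir(dp)) != NULL) {
        if (strcmp(ep->d_name, ".") != 0 && strcmp(ep->d_name, "..") != 0) {
            empty = false;
            break;
        }
    }
    closedir(dp);
    return empty;
}

/* Delete each entry the callback approves (every entry if cb is NULL), then
   remove the directory if it ended up empty. unlink() fails on
   subdirectories, so those are left in place. Returns 0 if the directory
   was removed, 1 if it was kept, -1 on error. */
int sketch_dirpath_destroy(const char *dirpath, sketch_should_delete_fn cb)
{
    DIR *dp = opendir(dirpath);
    if (dp == NULL) {
        return -1;
    }
    char path[4096];
    struct dirent *ep;
    while ((ep = readdir(dp)) != NULL) {
        if (strcmp(ep->d_name, ".") == 0 || strcmp(ep->d_name, "..") == 0) {
            continue;
        }
        snprintf(path, sizeof path, "%s/%s", dirpath, ep->d_name);
        if (cb == NULL || cb(path)) {
            unlink(path);
        }
    }
    closedir(dp);
    if (!sketch_dir_is_empty(dirpath)) {
        return 1;
    }
    return rmdir(dirpath) == 0 ? 0 : -1;
}

/* Example policy: delete everything except files ending in ".keep". */
bool sketch_delete_unless_keep(const char *path)
{
    size_t n = strlen(path);
    return !(n >= 5 && strcmp(path + n - 5, ".keep") == 0);
}
```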
This commit was SVN r10640.
from the tmp/jjhursey-ft-cr branch.
In this commit we change the way universe names are created.
Before, we first created "default-universe" by default, then
if there was a conflict we created "default-universe-PID"
where PID is the PID of the HNP.
Now we create "default-universe-PID" all the time (when
a default universe name is used). This makes it much
easier when trying to find a HNP from an outside app
(e.g. orte-ps, orteconsole, ...)
This also adds a "search" function to find all of the
universes on the machine. This is useful in many contexts
when trying to find a persistent daemon or when trying to
connect to a HNP.
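Why a PID suffix makes the HNP easier to find can be sketched as a name builder plus the inverse parse that a search would apply to each candidate universe name. This is a hypothetical sketch: the sketch_* functions are illustrative, and only the "default-universe-PID" format comes from this log.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_UNIVERSE_PREFIX "default-universe-"

/* Build "default-universe-<PID>" for the given HNP pid.
   Returns 0 on success, -1 if the buffer is too small. */
int sketch_universe_name(char *buf, size_t len, pid_t hnp_pid)
{
    int n = snprintf(buf, len, SKETCH_UNIVERSE_PREFIX "%ld", (long)hnp_pid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Recover the HNP pid from a default universe name, e.g. while scanning
   the universes on a machine. Returns -1 for names that don't match. */
long sketch_universe_pid(const char *name)
{
    size_t plen = strlen(SKETCH_UNIVERSE_PREFIX);
    if (strncmp(name, SKETCH_UNIVERSE_PREFIX, plen) != 0) {
        return -1;
    }
    const char *digits = name + plen;
    if (*digits == '\0') {
        return -1;
    }
    char *end;
    long pid = strtol(digits, &end, 10);
    return (*end == '\0' && pid > 0) ? pid : -1;
}
```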
This commit also makes orte_universe_t an opal_object_t,
which is something that needed to happen, and only affected
the SDS in one of its base functions.
I was asked to bring this over to aid in fixing orteconsole
and orteprobe. Due to the change of orte_universe_t to
an object orteprobe may need to be updated to reflect this
change. Since orteprobe needs to be looked at anyway I'll
leave this to Ralph to take care of.
*Note*:
These changes do not depend upon any of the FT work (but
the FT work does depend upon them). These were brought over
to help in fixing some of the ORTE tool set that require
the functionality laid out in this patch.
Testing:
Ran the 'ibm' tests before and after this change, and all was
as well as before the change. If anyone notices additional
irregularities in the system let me know. But none are expected.
This commit was SVN r10550.
- move files out of toplevel include/ and etc/, moving it into the
sub-projects
- rather than including config headers with <project>/include,
have them as <project>
- require all headers to be included with a project prefix, with
the exception of the config headers ({opal,orte,ompi}_config.h,
mpi.h, and mpif.h)
This commit was SVN r8985.
debugger scheme described in
http://www.open-mpi.org/community/lists/users/2005/10/0214.php. This
makes our user-level debugger scheme much more vendor-independent
(although the "-tv" option will still work for backwards compatibility
-- it'll just be a synonym of "--debug").
This commit was SVN r8206.
This takes care of Troy's first segfault problem, and compile errors that will likely happen as soon as Ken applies George's patch and runs make again.
This commit was SVN r7833.
originally suggested by Ralf Wildenhues, to try to speed autogen, configure,
and make (and possibly even make install). Use automake's include directive
to drastically reduce the number of Makefile files (although the number of
Makefile.am files is the same - most are just included in a top-level
Makefile.am). Also use an Automake SUBDIRs feature to eliminate the
dynamic-mca tree, which was no longer really needed. This makes adding
a framework easier (since you don't have to remember the dynamic-mca
tree) and makes building faster (as make doesn't have to recurse through
the dynamic-mca tree).
This commit was SVN r7777.
command:
svn merge -r 7567:7663 https://svn.open-mpi.org/svn/ompi/tmp/jjhursey-rmaps .
(where "." is a trunk checkout)
The logs from this branch are much more descriptive than I will put
here (including a *really* long description from last night). Here's
the short version:
- fixed some broken implementations in ras and rmaps
- "orterun --host ..." now works and has clearly defined semantics
(this was the impetus for the branch and all these fixes -- LANL had
a requirement for --host to work for 1.0)
- there is still a little bit of cleanup left to do post-1.0 (we got
correct functionality for 1.0 -- we did not fix bad implementations
that still "work")
- rds/hostfile and ras/hostfile handshaking
- singleton node segment assignments in stage1
- remove the default hostfile (no need for it anymore with the
localhost ras component)
- clean up pls components to avoid duplicate ras mapping queries
- [possible] -bynode/-byslot being specific to a single app context
This commit was SVN r7664.
Some of the functions in opal_init are void or return a bool (opal_output_init, but always returns true.. eh?), so I don't check them.
This commit was SVN r7638.
some orted's to stall on locks in the MPI Dynamics cases. Since it
is not essential that we call these functions, they can go away.
Unlock the peer lock when aborting. This causes a potential deadlock
in do_waitall [see comment in code]. This was causing orteds to
deadlock at times when the seed had terminated. With proper interleaving
and timing the orted was deadlocking. This seems to have fixed this in
my stress testing with MPI 2 Dynamics.
This commit was SVN r7539.
However we do want to do a bit of cleanup on the node before we exit,
specifically clean out the session directory. I also had a couple of the
subsystems that don't depend upon peers (which is key) clean up as well.
Pedantic formatting issue in oob_tcp.h
This commit was SVN r7387.
that multiple processes don't overwrite each other. Change that
default in orte_init_stage1() to just "output-" (because the file will
be in a process-unique directory at that point; the pid is no longer
necessary).
This commit was SVN r7256.
opal_output_set_output_file_info(). This allows getting and setting
the default directory where output stream files will be opened (for
all *new* streams). Until this function is invoked, the default
location is $TMPDIR or $HOME (if $TMPDIR is not defined).
Added a call into orte_init_stage1() to call this function
immediately after the session directory is created and set the default
location of stream files to be the process' session directory.
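The default-directory rule above can be sketched as a getter/setter pair. This is a hypothetical sketch: sketch_set_output_dir() and sketch_get_output_dir() are illustrative names and do not reproduce the real opal_output_set_output_file_info() signature. It assumes the caller keeps the directory string alive.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Directory for *new* output streams; NULL until explicitly set. */
static const char *sketch_output_dir = NULL;

/* Analogous to what orte_init_stage1() does once the session directory
   exists: point all new streams at it. The string must outlive its use. */
void sketch_set_output_dir(const char *dir)
{
    sketch_output_dir = dir;
}

/* Before anything is set: $TMPDIR, else $HOME (NULL if neither is set). */
const char *sketch_get_output_dir(void)
{
    if (sketch_output_dir != NULL) {
        return sketch_output_dir;
    }
    const char *tmp = getenv("TMPDIR");
    return (tmp != NULL) ? tmp : getenv("HOME");
}
```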
This commit was SVN r7254.
AM_INIT_AUTOMAKE, instead of the deprecated version.
* Work around dumbness in modern AC_INIT that requires the version
number to be set at autoconf time (instead of at configure time, as
it was before). Set the version number, minus the subversion r number,
at autoconf time. Override the internal variables to include the r
number (if needed) at configure time. Basically, the right thing
should always happen. The only place it might not is the version
reported as part of configure --help will not have an r number.
* Since AM_INIT_AUTOMAKE takes a list of options, no need to specify
them in all the Makefile.am files.
* Adds support for subdir-objects, meaning that object files are put
in the directory containing source files, even if the Makefile.am is
in another directory. This should start making it feasible to
reduce the number of Makefile.am files we have in the tree, which
will greatly reduce the time to run autogen and configure.
This commit was SVN r7211.
CTRL-C'd.
We were calling orte_finalize recursively which caused a segv when it tried to
use a freed framework (orte_rmgr in this case).
I added a status flag to orte_universe_info to indicate where we are in the code.
This was needed to determine if we should call orte_abort or not when shutting
down in the tcp oob.
This commit was SVN r7160.
orte_init_stage1(), since not all ORTE processes call orte_init().
* Expand opal_error test case to make sure ORTE error codes print
properly
* Make project error codes start at easy values (OPAL is -1 to -100,
ORTE is -101 to -200, OMPI is less than -201) to make it easier
to figure out what an error code as an integer means. Also has
the nice property of not changing the values of error codes every
time a new error code is added.
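The fixed per-project ranges can be sketched as base offsets plus a classifier that answers "which project does this integer come from?". This is a hypothetical sketch with illustrative sketch_* macros rather than the real OPAL/ORTE/OMPI definitions. The log says OMPI codes are "less than -201"; the sketch assumes -201 itself also belongs to OMPI.

```c
#include <assert.h>
#include <string.h>

/* Bases following the ranges in the log: OPAL -1..-100, ORTE -101..-200,
   OMPI -201 and below (assumption for -201, see the lead-in). */
#define SKETCH_OPAL_ERR_BASE    0
#define SKETCH_ORTE_ERR_BASE (-100)
#define SKETCH_OMPI_ERR_BASE (-200)

/* Each code is a fixed offset from its project's base, so adding a new
   OPAL code never shifts the values of existing ORTE or OMPI codes. */
#define SKETCH_OPAL_ERR_EXAMPLE (SKETCH_OPAL_ERR_BASE - 3)
#define SKETCH_ORTE_ERR_EXAMPLE (SKETCH_ORTE_ERR_BASE - 5)
#define SKETCH_OMPI_ERR_EXAMPLE (SKETCH_OMPI_ERR_BASE - 7)

/* Tell which project an integer error code belongs to. */
const char *sketch_error_project(int code)
{
    if (code >= 0) {
        return "none";              /* success, not an error */
    } else if (code >= SKETCH_ORTE_ERR_BASE) {
        return "OPAL";              /* -1 .. -100 */
    } else if (code >= SKETCH_OMPI_ERR_BASE) {
        return "ORTE";              /* -101 .. -200 */
    }
    return "OMPI";                  /* -201 and below */
}
```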
This commit was SVN r7061.
tree.
- fix up #include's throughout the tree (yay contrib/search_replace.pl!)
- remove a few extraneous #include's
- remove orte_sys_info*() from opal_init()/opal_finalize() (it's
already in orte_init_stage1() and orte_system_finalize())
- remove dependencies in opal on orte_system_info -- util/os_path.c
and util/os_create_dirpath.c (they only used path_sep, anyway --
easily changed to #defines)
This commit was SVN r7059.
- Change orte_base_infrastructre to orte_infrastructre to conform with
ompi_info's needs
- Move MCA Param registration in ORTE to a centralized function that is
called first in orte_init_stage1
- Set the infrastructure flag as an argument to orte_init
- Adjust initialization functions to properly pass down the infrastructure
flag.
This commit was SVN r7053.
OPAL_ERROR, same for all the other error codes. Also, make sure that there
are never conflicts between OPAL and ORTE error codes (for example).
Finally, provide opal_perror(), opal_strerror(), and opal_strerror_r() to
give stringified error messages for the different error codes
This commit was SVN r6969.
- change the framework opens to [mostly] use the new MCA param API
- properly pass in framework debug output streams to the
mca_base_component_open() function
This commit was SVN r6888.
ns_replica.c
- Removed the error logging since I use this function in orte_init_stage1 to
check if we have created a cellid yet or not.
ras_types.h & ras_base_node.h
- This was an empty file. moved the orte_ras_node_t from base/ras_base_node.h
to this file.
- Changed the name of orte_ras_base_node_t to orte_ras_node_t to match the
naming mechanisms in place.
ras.h
- Exposed 2 functions:
- node_insert:
This takes a list of orte_ras_base_node_t's and places them in the Node
Segment of the GPR. This is to be used in orte_init_stage1 for singleton
processes, and the hostfile parsing (see rds_hostfile.c). This just puts
in the appropriate API interface to keep from calling the
orte_ras_base_node_insert function directly.
- node_query:
This is used in hostfile parsing. This just puts in the appropriate API
interface to keep from calling the orte_ras_base_node_query function
directly.
- Touched all of the implemented components to add reference to these new
function pointers
ras_base_select.c & ras_base_open.c
- Add and set the global module reference
rds.h
- Exposed 1 function:
- store_resource:
This stores a list of rds_cell_desc_t's to the Resource Segment.
This is used in conjunction with the orte_ras.node_insert function in
both the orte_init_stage1 for singleton processes and rds_hostfile.c
rds_base_select.c & rds_base_open.c
- Add and set the global module reference
rds_hostfile.c
- Added functionality to create a new cellid for each hostfile, placing
each entry in the hostfile into the same cellid. Currently this is
commented out with the cellid hard coded to 0, with the intention of
taking this out once ORTE is able to handle multiple cellid's
- Instead of just adding hosts to the Node Segment via a direct call to
the ras_base_node_insert() function, first add the hosts to the Resource
Segment of the GPR using the orte_rds.store_resource() function, then use
the API version of orte_ras.node_insert() to store the hosts on the Node
Segment.
- Add 1 new function pointer to module as required by the API.
rds_hostfile_component.c
- Converted this to use the new MCA parameter registration
orte_init_stage1.c
- It is possible that a cellid was not created yet for the current environment.
So I put in some logic to test if the cellid 0 existed. If it does then
continue, otherwise create the cellid so we can properly interact with the
GPR via the RDS.
- For the singleton case we insert some 'dummy' data into the GPR. The RAS
matches this logic, so I took out the duplicate GPR put logic, and
replaced it with a call to the orte_ras.node_insert() function.
- Further, before calling orte_ras.node_insert() in the singleton case,
we also call orte_rds.store_resource() to add the singleton node to the
Resource Segment.
Console:
- Added a bunch of new functions. Still experimenting with many aspects of the
implementation. This is a checkpoint, and has very limited functionality.
- Should not be considered stable at the moment.
This commit was SVN r6813.
using orteprobe.
Created a header file for orte_setup_hnp. [HNP = Head Node Process]
General cleanup and added a bit of documentation in orte_setup_hnp.c
Also fixed a cellid tokens issue (circa line 285)
Changed the launched scope from private to public
In orteprobe:
- added reference to orted.h to avoid duplicate header contents in orteprobe.h
- removed the version tag, and put in a verbose argument
- Fixed a buffer packing problem that was preventing the parent from receiving the
proper contact information for the new daemon.
This commit was SVN r6802.
This required a little fiddling with a number of areas. Biggest problem was that it uncovered a potential for an infinite loop to be created in the registry. If a callback function modified the registry, the registry checked the triggers to see if anything had fired. Well, if the original callback was due to a trigger firing, that condition hadn't changed - so the trigger fired again....which caused the callback to be called, which modified the registry, which checked the triggers, etc. etc.
Triggers are now checked and then "flagged" as being "in process" so that the registry will NOT recheck that trigger until all callbacks have been processed. Tried doing this with subscriptions as well, but that caused a problem - when we release processes from a stagegate, they (at the moment) immediately place data on the registry that should cause a subscription to fire. Unfortunately, the system will just hang if that subscription doesn't get processed. So, I have left the subscription system alone - any callback function that modifies the registry in a fashion that will fire a subscription will indeed fire that subscription. We'll have to see if this causes problems - it shouldn't, but a careless user could lock things up if the callback generates a callback to itself.
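The "in process" flag can be sketched as follows. This is a hypothetical sketch, not the GPR code: sketch_registry_put() stands in for any registry modification, and the sketch only models triggers, not subscriptions. Without the flag, the example callback would recurse until the stack overflows, because the trigger condition is still true when the callback modifies the registry.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_TRIGGERS 4

typedef struct sketch_trigger {
    bool fired;        /* trigger condition is currently satisfied */
    bool in_process;   /* callbacks are running: do not recheck */
    int calls;         /* number of times the callback has run */
    void (*callback)(struct sketch_trigger *);
} sketch_trigger_t;

static sketch_trigger_t sketch_triggers[SKETCH_MAX_TRIGGERS];
static int sketch_ntriggers = 0;

int sketch_add_trigger(void (*cb)(sketch_trigger_t *))
{
    assert(sketch_ntriggers < SKETCH_MAX_TRIGGERS);
    sketch_trigger_t *t = &sketch_triggers[sketch_ntriggers];
    t->fired = false;
    t->in_process = false;
    t->calls = 0;
    t->callback = cb;
    return sketch_ntriggers++;
}

void sketch_check_triggers(void)
{
    for (int i = 0; i < sketch_ntriggers; i++) {
        sketch_trigger_t *t = &sketch_triggers[i];
        if (t->fired && !t->in_process) {
            t->in_process = true;   /* flag before running callbacks */
            t->calls++;
            t->callback(t);
            t->in_process = false;  /* all callbacks have been processed */
        }
    }
}

/* Every registry modification rechecks the triggers. */
void sketch_registry_put(void)
{
    sketch_check_triggers();
}

/* A callback that itself modifies the registry. */
void sketch_put_from_callback(sketch_trigger_t *t)
{
    (void)t;
    sketch_registry_put();
}
```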
Also fixed the code that placed a process' RML contact info on the registry to eliminate the leading '/' from the string.
This commit was SVN r6684.
test from orte_init_stage1 into a new framework, Startup Discovery Service
(sds). This allows us to have more flexibility with platforms like
Red Storm, which do not have a universe in the usual meaning and don't have
a seed daemon they can contact.
This commit was SVN r6630.
* Add ability to completely disable libltdl (the dlopen code to load
dynamic shared objects) to configure: --disable-dlopen
* Added MCA param (component_disable_dlopen) to disable DSO loading
at runtime
* Made the event library behave in some not-completely-erroneous way
on platforms where it has absolutely no eventops support (i.e., no
select, poll, or epoll)
* Disabled orte_wait, opal_few, and opal_daemon_init code on
platforms without fork, waitpid support. All non-init functions
will return OMPI_ERR_NOT_SUPPORTED
* Disable orteprobe tool when fork or pipe aren't supported
This commit was SVN r6490.
* rename ompi_malloc to opal_malloc
* rename ompi_numtostr to opal_numtostr
* start of rename of ompi_environ to opal_environ
This commit was SVN r6332.
* rename ompi_basename to opal_basename
* rename ompi bitop functions to opal
* rename ompi_cmd_line to opal_cmd_line
* rename ompi_sizet2int to opal_sizet2int
* rename orte_daemon_init to opal_daemon_init
* rename ompi_few to opal_few
This commit was SVN r6330.