openmpi

Author	SHA1	Message	Date
Jeff Squyres	ecd603256a	* Rename opal_hwloc_components to opal_hwloc_base_components * Fix some comments This commit was SVN r25150.	2011-09-17 11:54:36 +00:00
Ralph Castain	92c7372e20	Per the RFC from Jeff, move hwloc from opal/mca/common to its own static framework ala libevent. Have ORTE daemons collect the topology info at startup and, if --enable-hwloc-xml is set, send that info back to the HNP for later use. The HNP only retains unique topology "templates" to reduce memory footprint. Have the daemon include the local topology info in the nidmap buffer sent to each app so the apps don't all hammer the local system to discover it for themselves. Remove the sysinfo framework as hwloc replaces that functionality. This commit was SVN r25124.	2011-09-11 19:02:24 +00:00
Wesley Bland	4e7ff0bd5e	By popular demand the epoch code is now disabled by default. To enable the epochs and the resilient orte code, use the configure flag: --enable-resilient-orte This will define both: ORTE_ENABLE_EPOCH ORTE_RESIL_ORTE This commit was SVN r25093.	2011-08-26 22:16:14 +00:00
Shiqing Fan	6d0ab9bd6c	One library was missing for linking orterun on Windows. This commit was SVN r25057.	2011-08-18 09:33:41 +00:00
Shiqing Fan	3af7c9f7bb	Complete the MinGW build support on Windows. This commit was SVN r25048.	2011-08-15 09:47:23 +00:00
Ralph Castain	715f871605	Ignore the daemon job when reporting parseable output This commit was SVN r24944.	2011-07-25 20:44:08 +00:00
Ralph Castain	199804fc35	complete implementation of parseable output This commit was SVN r24929.	2011-07-23 22:23:24 +00:00
Ralph Castain	00647fa342	Update orte-ps to add parseable output - not fully tested because I couldn't get other parts of the system to work. This commit was SVN r24927.	2011-07-23 20:20:31 +00:00
Ralph Castain	1ad110d2e9	After a nice, calm, rational discussion between Brian, Jeff, and myself, we decided to revert r24864 and r24862 to restore the reference counters in opal_init/finalize. The rationale was that we should instead change orte_init/finalize to also use reference counters to support multi-embedded libraries. Jeff and Brian will discuss proposing a similar change to mpi_init/finalize to the MPI Forum so that all three libraries will behave in similar manners. It was agreed that opal_init_util had wound up being used in unintended ways, which raised the problem of getting reference counts to work right. However, fixing it would involve more pain than it was worth - and so long as the other layers are made to behave similarly, I have no preference either way. Complete implementation will follow - for now, this just reverts the prior changes. This commit was SVN r24886. The following SVN revision numbers were found above: r24862 --> open-mpi/ompi@aa92e0c4eb r24864 --> open-mpi/ompi@a5062385c2	2011-07-12 17:07:41 +00:00
Ralph Castain	aa92e0c4eb	Replace a useless counter with a boolean check to see if we have already passed thru opal_finalize so we don't call finalize, and then don't pass thru it (as was happening on several tools) This commit was SVN r24862.	2011-07-08 06:43:19 +00:00
Wesley Bland	e1ba09ad51	Add a resilience to ORTE. Allows the runtime to continue after a process (or ORTED) failure. Note that more work will be necessary to allow the MPI layer to take advantage of this. Per RFC: http://www.open-mpi.org/community/lists/devel/2011/06/9299.php This commit was SVN r24815.	2011-06-23 20:38:02 +00:00
Samuel Gutierrez	81f38b258a	commit of new shared memory backing facility framework (shmem) and its components. This commit was SVN r24795.	2011-06-21 15:41:57 +00:00
Jeff Squyres	9531e205e1	Minor fix to a comment This commit was SVN r24789.	2011-06-20 17:51:01 +00:00
Josh Hursey	6539a31b23	Cleanup configure checks for C/R functionality. Add a WANT_FT_CR flag different from WANT_FT so tools like *-checkpoint are not built when a different FT technique is requested. Also fix the C/R thread check so that it is only enabled if C/R is enabled, not generally when threads are enabled. This commit was SVN r24769.	2011-06-09 19:45:29 +00:00
Ralph Castain	8c08ee9c3d	Remove stale tool This commit was SVN r24720.	2011-05-21 00:38:35 +00:00
Ralph Castain	b47ec2ee87	Remove lingering references to opal_profile option This commit was SVN r24709.	2011-05-18 18:27:29 +00:00
Ralph Castain	9678e62613	Fix possible corruption of environ. Thanks to Ariel Burton and Peter Thompson for finding it! This commit was SVN r24708.	2011-05-18 16:25:35 +00:00
Ralph Castain	0ff0d20e72	Grr...get the prefix right - need to strip the bin out of absolute path to mpirun. This commit was SVN r24658.	2011-04-28 22:20:55 +00:00
Ralph Castain	6af2677fb8	Check for both absolute-path-to-mpirun and -prefix being specified. If the two differ, print out a warning and ignore -prefix. If they are the same, or only one was given, then proceed as directed. This commit was SVN r24657.	2011-04-28 22:12:41 +00:00
Ralph Castain	9988b97b97	Extend/update how we handle process stats. Add the ability to collect node-level stats separate from the process stats. Update the process stat memory fields to report in MBytes instead of KBytes as I can't find any process that runs in KBytes nowadays. Rename the memusage sensor plugin to "resusage" as it will soon be updated to include full process stat monitoring. Extend the heartbeat sensor to report node and process stats in the heartbeat. Store the process and node stats in their respective orte_xxx_t object. This commit was SVN r24629.	2011-04-21 22:55:45 +00:00
Ralph Castain	3a28556472	Expand our handling of non-zero exit status. If a process exits with non-zero status, pass that info along to the user in case it means something to them, even if the process also exited without calling MPI_Finalize. If the process calls MPI_Abort, that trumps the exit status question. Provide a new MCA param that allows the user to direct that we abort the job once a process exits with non-zero status. No recovery is allowed in such cases to avoid trying to restart a process that has already exited MPI. This commit was SVN r24614.	2011-04-14 15:04:21 +00:00
Ralph Castain	d17b50e1ff	Add the appropriate hooks to tell Totalview to display the user's main program upon startup. Apparently, this hook got lost somewhere after the 1.2 series :-( Thanks to David Turner and the TV folks for passing this along. This commit was SVN r24549.	2011-03-21 17:40:58 +00:00
Eugene Loh	2770a12beb	Continue clean up of thread options started in r22841, 22842, and 22849. No need for any CMRs to 1.5... that was already done in CMR 2728. This commit was SVN r24545. The following SVN revision numbers were found above: r22841 --> open-mpi/ompi@b400b84162	2011-03-18 21:36:35 +00:00
Ralph Castain	ebabe9c83a	Forgot that Terry wanted to control the vm launch with an mca param - set one up for that purpose This commit was SVN r24525.	2011-03-13 00:46:42 +00:00
Ralph Castain	dc6f616599	Enable VM launch. For some time, ORTE has had the ability to launch daemons on all nodes prior to launching an application. It has largely been used outside of the OMPI community, and so was never explicitly turned "on" inside OMPI releases. Nevertheless, the code has been there. Allowing VM launches does not require ANY changes to existing PLM components. All that was required was to have orterun launch the daemons as a separate call to orte_plm.spawn -prior- to launching the applications. The rest of the VM support code resides in the rmaps framework: (a) a check when asked to map a job to see if it is the daemon job, and (b) a separate "setup_virtual_machine" mapper in the rmaps base that creates the required map so the PLM's will do the right thing. In order to support those users who have no RM allocation but like to give the allocation in the form of a -host or -hostfile argument to their application, there is a little more code in orterun and the setup_virtual_machine mapper to capture information passed in that manner. This has been tested with rsh and slurm environments, and, since there is nothing environment-specific in the implementation, should work in others as well - but needs to be proven. This commit was SVN r24524.	2011-03-12 22:50:53 +00:00
George Bosilca	80fe617cd2	If we don't release the OPAL utils explicitly there will be a memory leak. This commit was SVN r24505.	2011-03-10 00:42:28 +00:00
George Bosilca	7f34a28c8f	Correct a comment. This commit was SVN r24504.	2011-03-10 00:41:41 +00:00
Jeff Squyres	3f4d4886f2	Minor update for something that has been bugging me for quite a while: OMPI supports multiple different repository systems (SVN, hg, git). But the VERSION file has listed "want_svn" and "svn_r" as fields, even though the actual repo system and version may not be SVN. So search/replace those fields (and derrivative values that come from those fields) with "want_repo_rev" and "repo_rev", respectively. This commit was SVN r24405.	2011-02-16 22:53:23 +00:00
Ralph Castain	a9dca25ca5	Remove the distinction between local and global restarts - leave it up to the error strategy to decide which to do. Cleanup the heartbeat handling so it is associated with the proc, not a node. Cleanup handling of recovery options so that defaults do not override user values iff they are provided. This commit was SVN r24382.	2011-02-14 20:49:12 +00:00
Jeff Squyres	ec3d18dc9f	As noted on the mailing list by Gabriele Fatigati (http://www.open-mpi.org/community/lists/users/2011/01/15427.php), the --tv (and friends) switches to mpirun would effectively munge the orterun command line together and then split it apart again before exec'ing the underlying debugger. We would therefore lose multi-token argv[x] value and split them into multiple tokens. For example: mpirun --tv -np 2 a.out "foo bar" would get launched with "foo" and "bar" as separate arguments; not one argument. This was due to the underlying code joining the argv into a single string and then re-splitting it. This commit removed the argv join; it now does the parsing and re-jigering of the argv by only looking at each individual argv item; multi-word tokens like "foo bar" will never be split into separate tokens. This commit was SVN r24322.	2011-01-28 13:01:06 +00:00
Josh Hursey	81fd41f811	Return an informative error message if the user requests a migration of a job that is not capable of it. C/R Functionality cleanup This commit was SVN r24307.	2011-01-26 15:36:34 +00:00
Josh Hursey	e4d13d338f	Fix a couple of compiler warnings This commit was SVN r24295.	2011-01-25 22:22:32 +00:00
Shiqing Fan	f43862420c	Convert the bad dos line endings to unix style for all windows related files. This commit was SVN r24137.	2010-12-02 12:08:08 +00:00
Rolf vandeVaart	1d62542c23	Fix another Sun Studio warning. jobid and vpid need to be uint32_t. This commit was SVN r24074.	2010-11-19 18:12:46 +00:00
Shiqing Fan	358b4a5cba	Add an option to enable the debug postfix for executables. This commit was SVN r24070.	2010-11-19 15:54:13 +00:00
Jeff Squyres	e4744b4ed5	Per http://www.open-mpi.org/community/lists/devel/2010/11/8671.php , change a bunch of OMPI_<foo> names to OPAL_<foo>. This commit was SVN r24046.	2010-11-12 23:22:11 +00:00
Shiqing Fan	c03ea1a5f3	A more clean way to build on Windows. It's not possible to combine two shared libraries on Windows, so we have to do it a bit different. First generate a small event static library by just linking the object files, and link it into other libraries that needs the libevent API. This commit was SVN r24039.	2010-11-11 12:02:54 +00:00
Shiqing Fan	7bac326920	Fix Windows build, add custom command to generate static libraries (opal and orte) for shared build. This commit was SVN r24012.	2010-11-09 08:32:45 +00:00
Abhishek Kulkarni	132c8d1b00	removing some unneeded calls to ORTE_ERROR_LOG This commit was SVN r23999.	2010-11-06 22:00:18 +00:00
Ralph Castain	9ea2b196ce	Convert the opal_event framework to use direct function calls instead of hiding functions behind function pointers. Eliminate the opal_object_t abstraction of libevent's event struct so it can be directly passed to the libevent functions. Note: the ompi_check_libfca.m4 file had to be modified to avoid it stomping on global CPPFLAGS and the like. The file was also relocated to the ompi/config directory as it pertains solely to an ompi-layer component. Forgive the mid-day configure change, but I know Shiqing is working the windows issues and don't want to cause him unnecessary redo work. This commit was SVN r23966.	2010-10-28 15:22:46 +00:00
Ralph Castain	c13b0bb668	Update some debugger attachment code per LLNL request This commit was SVN r23965.	2010-10-28 03:06:20 +00:00
Ralph Castain	7f103c8a9d	Supply missing argument This commit was SVN r23945.	2010-10-26 07:24:55 +00:00
Ralph Castain	86c7365e8e	Clean up a few initialization issues - don't think these are impacting the shared memory situation as it didn't fix the problem. Setup the event API to support multiple bases in preparation for splitting the OMPI and ORTE events. Holding here pending shared memory resolution. This commit was SVN r23943.	2010-10-26 02:41:42 +00:00
Ralph Castain	fceabb2498	Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac. This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects. Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems. Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct. I have gone thru the code base and converted all the libevent calls I could find. However, I cannot compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things: 1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new) 2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it. There is also the issue of the new libevent "no recursion" behavior. As I described on a recent email, we will have to discuss this and figure out what, if anything, we need to do. This commit was SVN r23925.	2010-10-24 18:35:54 +00:00
Ralph Castain	2c1a658232	Fix debugger attach This commit was SVN r23921.	2010-10-22 20:07:24 +00:00
Jeff Squyres	4322a78e60	Update wrapper compiler scripts to search for perl during configure, per request from BSD maintainers. This commit was SVN r23914.	2010-10-19 22:45:54 +00:00
Jeff Squyres	c891ed34e2	More verbatim escaping. This commit was SVN r23873.	2010-10-07 22:26:51 +00:00
Ralph Castain	871b685e89	Ensure all debugger interface symbols are present in orterun This commit was SVN r23823.	2010-09-30 21:36:00 +00:00
Ralph Castain	3631e4e936	Revert remaining svn kruft from r23764 This commit was SVN r23786. The following SVN revision numbers were found above: r23764 --> open-mpi/ompi@40a2bfa238	2010-09-22 01:11:40 +00:00
Ralph Castain	40a2bfa238	WARNING: Work on the temp branch being merged here encountered problems with bugs in subversion. Considerable effort has gone into validating the branch. However, not all conditions can be checked, so users are cautioned that it may be advisable to not update from the trunk for a few days to allow MTT to identify platform-specific issues. This merges the branch containing the revamped build system based around converting autogen from a bash script to a Perl program. Jeff has provided emails explaining the features contained in the change. Please note that configure requirements on components HAVE CHANGED. For example. a configure.params file is no longer required in each component directory. See Jeff's emails for an explanation. This commit was SVN r23764.	2010-09-17 23:04:06 +00:00

656 Commits