openmpi

Author	SHA1	Message	Date
Jeff Squyres	01fd96bfa5	Revert "Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build." This reverts commit `63f619f871`.	2014-10-22 10:32:11 -07:00
Jeff Squyres	206eade32c	mpirun.1in: whitespace cleanup Whitespace cleanup only; no content changes.	2014-10-20 05:18:25 -07:00
Jeff Squyres	9529289319	mpirun.1in: more updates about binding/etc. Follow on to `91e9686` and `f9d620e`.	2014-10-20 05:17:49 -07:00
Ralph Castain	91e96861dd	Cleanup the orterun man page per review by Gus Correa	2014-10-19 10:21:50 -07:00
Ralph Castain	f9d620e3a7	Update the orterun man page	2014-10-16 21:05:04 -07:00
Ralph Castain	ecbae03009	Fix typo	2014-10-16 13:30:06 -07:00
Ralph Castain	b6aa691e0a	Fix incorrect implementation of new MCA param mca_base_env_list - it was not picking up envars and forwarding them, but only worked if you explicitly set a value for the envar. Ensure it works for both direct and indirect launch modes. Remove stale code as this replaced orte_forward_envars. Ensure it doesn't get passed to the ORTE daemons.	2014-10-16 12:58:56 -07:00
Ralph Castain	63f619f871	Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build.	2014-10-10 11:39:08 -07:00
Jeff Squyres	72704441a2	URLs: update URLs for GitHub	2014-10-01 14:44:09 -07:00
Ralph Castain	84810b80fd	Cover the remaining code paths for Java apps to define class path Refs trac:4926 This commit was SVN r32823. The following Trac tickets were found above: Ticket 4926 --> https://svn.open-mpi.org/trac/ompi/ticket/4926	2014-09-30 22:27:03 +00:00
Ralph Castain	040a69c38b	Correct the classpath to correctly include the local directory so Java programs find the application class cmr=v1.8.4:reviewer=jsquyres This commit was SVN r32817.	2014-09-30 16:35:12 +00:00
Ralph Castain	0445052a1c	Check for multiple declarations of a given MCA param and error out if detected as that can create an ambiguous definition of the param value. Refs trac:4897 This commit was SVN r32719. The following Trac tickets were found above: Ticket 4897 --> https://svn.open-mpi.org/trac/ompi/ticket/4897	2014-09-12 22:21:30 +00:00
Ralph Castain	e671620ac7	Per request from Jeff: tune up the help messages for binding options Refs trac:4898 This commit was SVN r32691. The following Trac tickets were found above: Ticket 4898 --> https://svn.open-mpi.org/trac/ompi/ticket/4898	2014-09-09 22:39:22 +00:00
Ralph Castain	4207b4c4ad	Improve the --bind-to help message to better indicate the default options under various values of np. Remove the warning message if the user doesn't specify a binding policy and we are overloaded cmr=v1.8.3:reviewer=jsquyres This commit was SVN r32687.	2014-09-08 21:03:51 +00:00
Ralph Castain	4df1aa63f7	Since we've run into the situation where someone puts a script wrapper around a launcher such as srun, we need to always protect MCA cmd line params with quotes. This means we also need to protect the backend from quotes coming into the system as part of a value, or else the parser gets confused. So add a new function for wrapping MCA arguments, and tell the backend parser to ignore/remove leading/trailing quotes. cmr=v1.8.3:reviewer=jsquyres This commit was SVN r32686.	2014-09-08 20:38:46 +00:00
Ralph Castain	aec5cd08bd	Per the PMIx RFC: WHAT: Merge the PMIx branch into the devel repo, creating a new OPAL “pmix” framework to abstract PMI support for all RTEs. Replace the ORTE daemon-level collectives with a new PMIx server and update the ORTE grpcomm framework to support server-to-server collectives WHY: We’ve had problems dealing with variations in PMI implementations, and need to extend the existing PMI definitions to meet exascale requirements. WHEN: Mon, Aug 25 WHERE: https://github.com/rhc54/ompi-svn-mirror.git Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding. All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level. Accordingly, we have: * created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations. * Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported. * Replaced the current global collective id with a signature based on the names of the participating procs. 
This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint * removed the prior OMPI/OPAL modex code * added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform. * retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand This commit was SVN r32570.	2014-08-21 18:56:47 +00:00
Gilles Gouaillardet	f96d382d1d	Fix typo. Thanks to Christopher Samuel for reporting it This commit was SVN r32520.	2014-08-13 05:54:59 +00:00
Gilles Gouaillardet	e184733ef6	check-help-strings cleanup This commit was SVN r32496.	2014-08-11 03:26:21 +00:00
Jeff Squyres	4da3c85b54	fortran: revert Absoft-based fixes Revert r32246, r32254, and r32255 -- they were fixing side-effects of the real bug. Real fix coming after this one. This commit was SVN r32286. The following SVN revision numbers were found above: r32246 --> open-mpi/ompi@08d2a1a48d r32254 --> open-mpi/ompi@232d4dbb7b	2014-07-22 21:49:22 +00:00
Jeff Squyres	6cc538ae16	help-orterun.txt: wrap long messages, clarify new messages Clarify the new -x/mca_base_env_list help messages. This commit was SVN r32199.	2014-07-10 17:24:52 +00:00
Ralph Castain	796f57f709	Protect against problems if someone passes us thru a pipe and then abnormally terminates the pipe early This commit was SVN r32189.	2014-07-09 22:41:53 +00:00
Joshua Ladd	801e2cb544	Fix error and warning messages after reverting the mca_base_env_list to being semicolon delimited. This commit was SVN r32179.	2014-07-09 14:46:19 +00:00
Joshua Ladd	30da6d3a17	Opal: add a new MCA parameter that allows the user to specify a list of environment variables. This parameter will become the standard mechanism by which environment variables are set for OMPI applications replacing the -x option. mpirun ... -x env_foo1=val1 -x env_foo2 -x env_foo3=val3 should now be expressed as mpirun ... -mca mca_base_env_list env_foo1=val1+env_foo2+env_foo3=val3. The motivation for doing this is so that a list of environment variables may be set via standard MCA mechanisms such as mca parameter files, amca lists, etc. This feature was developed by Elena Shipunova and was reviewed by Josh Ladd. This commit was SVN r32163.	2014-07-09 00:38:25 +00:00
Adrian Reber	cabf1d4e68	use the orte attributes in the FT code to fix compile errors This commit was SVN r32093.	2014-06-26 03:19:17 +00:00
Ralph Castain	5f6be06b54	Per request from Gilles and discussion at devel conference, have the --oversubscribe option automatically set both oversubscribe and overload-allowed properties as this is likely what the user intended. cmr=v1.8.2:reviewer=rhc:subject=automatically set oversub/load This commit was SVN r32072.	2014-06-24 18:11:39 +00:00
Ralph Castain	8db76e9c6f	Ensure that we change to the session dir if we preload binaries so we'll use the loaded one Special patch created for v1.8 and CMR filed This commit was SVN r31963.	2014-06-06 21:43:23 +00:00
Ralph Castain	f1978fba7c	Cleanup a set of typos on the orte_get_attribute call This commit was SVN r31942.	2014-06-03 20:36:38 +00:00
Ralph Castain	8736a1c138	Per RFC: http://www.open-mpi.org/community/lists/devel/2014/05/14822.php Revamp the ORTE global data structures to reduce memory footprint and add new features. Add ability to control/set cpu frequency, though this can only be done if the sys admin has setup the system to support it (or you run as root). This commit was SVN r31916.	2014-06-01 16:14:10 +00:00
Oscar Vega-Gisbert	83bdebbf81	Java bindings for OSHMEM. This commit was SVN r31810.	2014-05-18 21:48:09 +00:00
Ralph Castain	5602156a1c	Use the correct abstraction layer name for the data dirs This commit was SVN r31684.	2014-05-08 14:32:24 +00:00
Ralph Castain	11faab1091	The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees. This commit was SVN r31679.	2014-05-08 02:01:35 +00:00
Ralph Castain	4def94900a	Per RFC: OMPI_INSTALL_BINARIES -> OPAL_INSTALL_BINARIES This commit was SVN r31634.	2014-05-05 21:43:05 +00:00
Ralph Castain	7a79b25577	Ensure we cleanup some files so session dirs can be rolled up cmr=v1.8.2:reviewer=jsquyres This commit was SVN r31569.	2014-04-30 17:52:10 +00:00
Ralph Castain	c4c9bc1573	As per the RFC: http://www.open-mpi.org/community/lists/devel/2014/04/14496.php Revamp the opal database framework, including renaming it to "dstore" to reflect that it isn't a "database". Move the "db" framework to ORTE for now, soon to move to ORCM This commit was SVN r31557.	2014-04-29 21:49:23 +00:00
Jeff Squyres	e1655ae68d	opal/util/fd.c: add new convenience function for setting FD_CLOEXEC Paul Hargrove pointed out that Stevens tells us that we should FD_GETFL before FD_SETFL. And so we shall. Make a new convenience function to do this (opal_fd_set_cloexec()), just so that we don't have to litter this 2-step process throughout the code. Refs trac:4550 This commit was SVN r31513. The following Trac tickets were found above: Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550	2014-04-24 13:04:49 +00:00
Jeff Squyres	87e6232e67	orterun.c: set an fd to be close-on-exec Make sure the debugger attach fifo is marked as close-on-exec so that children procs don't inherit it. For example, if you salloc a SLURM allocation and run "mpirun ..." in there (i.e., mpirun is running on the head node, and launching on to back-end nodes), the forked srun's will inherit this fd if it is still open. Refs trac:4550 This commit was SVN r31499. The following Trac tickets were found above: Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550	2014-04-22 21:55:09 +00:00
Jeff Squyres	63b7ef4103	orterun.1in: Document --allow-run-as-root option Add some verbiage about how mpirun now defaults to disallowing running as root, but you can use the --allow-run-as-root option to override this default behavior. Refs trac:4536 This commit was SVN r31477. The following Trac tickets were found above: Ticket 4536 --> https://svn.open-mpi.org/trac/ompi/ticket/4536	2014-04-22 14:34:32 +00:00
Jeff Squyres	482b465c05	Trivial format change: use the same length of lines and \n offsets as opal_show_help(). Refs trac:4536 This commit was SVN r31437. The following Trac tickets were found above: Ticket 4536 --> https://svn.open-mpi.org/trac/ompi/ticket/4536	2014-04-18 23:14:45 +00:00
Ralph Castain	12094eb7b2	Add some further protections after discussion with Jeff Refs trac:4536 This commit was SVN r31422. The following Trac tickets were found above: Ticket 4536 --> https://svn.open-mpi.org/trac/ompi/ticket/4536	2014-04-18 16:21:55 +00:00
Ralph Castain	7c4fa3446c	Per the telecon, revert r31302 for now pending an RFC review on the idea of setting app proc envar's using an MCA param This commit was SVN r31345. The following SVN revision numbers were found above: r31302 --> open-mpi/ompi@6a1b78e26b	2014-04-08 15:47:12 +00:00
Mike Dubman	6a1b78e26b	opal: add mca param to control ranks env variables add -mca base_env_list "var1=val1 var2=val2 ..." mca parameter that can be used in mca param files or with -am app.conf mpirun commandline to set rank env variables with mca mechanism fixed by Elena, reviewed by Miked cmr=v1.8.1:reviewer=ompi-rm1.8 This commit was SVN r31302.	2014-04-01 21:14:31 +00:00
Jeff Squyres	173c046617	build: add Automake-like silent/verbose macros for "ln -s ..." operations Also, since I put some of the macros for these silent/verbose rules up in the top-level Makefile.man-page-rules file, I renamed it to Makefile.ompi-rules. I've had this sitting around for a while; now seems like as good a time as any to commit it. This commit was SVN r31271.	2014-03-28 18:24:32 +00:00
Ralph Castain	f7df960198	Silence warning This commit was SVN r31139.	2014-03-18 23:15:29 +00:00
Ralph Castain	518ba55cf4	Ensure MPIEXEC_TIMEOUT calls the correct state to exit cmr=v1.7.5:reviewer=dgoodell This commit was SVN r31125.	2014-03-18 20:12:02 +00:00
Ralph Castain	38e02890aa	ORTE doesn't care about cxx flags cmr=v1.8:reviewer=jsquyres This commit was SVN r31086.	2014-03-17 21:21:54 +00:00
Ralph Castain	7869402f5f	Sigh - looks like I did too good a job of turning things off. Back some of it out in favor of trying again when more time is available Refs trac:4368 This commit was SVN r31017. The following Trac tickets were found above: Ticket 4368 --> https://svn.open-mpi.org/trac/ompi/ticket/4368	2014-03-12 02:10:35 +00:00
Ralph Castain	9c66c4f439	Correctly implement --disable-oshmem and --without-orte so we don't build the disabled section of code. Fix a bunch of code rot in the PMI rte component, and add several missing headers when building --without-orte. NOTE: I transferred the oshmem-disabled-by-default from the 1.7 branch to the trunk to minimize future disruption if/when we change that option. cmr=v1.8:reviewer=jsquyres This commit was SVN r31006.	2014-03-11 22:02:40 +00:00
Adrian Reber	e5bef82ee1	OPAL_ENABLE_FT_CR: remove compiler warnings When compiling --with-ft there are a few compiler warnings about unused variables. This patch fixes those compiler warnings. This commit was SVN r30927.	2014-03-04 15:28:07 +00:00
Ralph Castain	0ac97761cc	Now that we are binding by default, the issue of #slots and what to do when oversubscribed has become a bit more complicated. This isn't a problem in managed environments as we are always provided an accurate assignment for the #slots, or when -host is used to define the allocation since we automatically assume one slot for every time a node is named. The problem arises when a hostfile is used, and the user provides host names without specifying the slots= parameter. In these cases, we assign slots=1, but automatically allow oversubscription since that number isn't confirmed. We then provide a separate parameter by which the user can direct that we assign the number of slots based on the sensed hardware - e.g., by telling us to set the #slots equal to the #cores on each node. However, this has been set to "off" by default. In order to make this a little less complex for the user, set the default such that we automatically set #slots equal to #cores (or #hwt's if use_hwthreads_as_cpus has been set) only for those cases where the user provides names in a hostfile but does not provide slot information. Also clean up a couple of issues in the mapping/binding system: * ensure we only override the binding directive if we are oversubscribed and overload is not allowed * ensure that the MPI procs don't attempt to bind themselves if they are launched by an orted as any binding directive (no matter what it was) would have been serviced by the orted on launch * minor cleanup to the warning message when oversubscribed and binding was requested cmr=v1.7.5:reviewer=rhc:subject=update mapping/binding system This commit was SVN r30909.	2014-03-03 16:46:37 +00:00
Ralph Castain	1565816988	Do a little better job of cleaning up the session directory left by mpirun by ensuring we delete the event associated with debugger attachment and unlinking the pipe used for that purpose. Also, we no longer leave "abort" files around, so remove that check when deleting session directory trees cmr=v1.7.5:reviewer=jsquyres:subject=cleanup session directories better This commit was SVN r30689.	2014-02-11 22:16:17 +00:00
Ralph Castain	a49e0db8dd	We haven't supported a c++ wrapper for ORTE in quite some time cmr=v1.7.5:reviewer=ompi-gk1.7:subject=remove c++ cruft This commit was SVN r30653.	2014-02-10 17:16:30 +00:00
Ralph Castain	bc7cc09749	After a lot of pain, I've managed to resolve the problem of conflicting mapping directives caused by mismatched MCA params - i.e., where someone has one variant of an MCA param (e.g., rmaps_base_mapping_policy) in their default MCA param file, and then specifies another variant (e.g., --npernode) on the command line. I can't fully resolve the problem as there is no way to know precisely what the user meant - we can only guess which param was really intended since the MCA param system can't apply its normal precedence rules. So...print a big "deprecated" warning for the old params and error out if a conflict is detected. I know that isn't what people really wanted, but it's the best we can do. If only the old style param is given, then process it after the warning. Extend the current map-by param to add support for ppr and cpus-per-proc, adding the latter to the list of allowed modifiers using "pe=n" for processing elements/proc. Thus, you can map-by socket:pe=2,oversubscribe to map by socket, binding 2 processing elements/process, with oversubscription allowed. Or you can map-by ppr:2:socket:pe=4 to map two processes to every socket in the allocation, binding each process to 4 processing elements. For those wondering, a processing element is defined as a hwthread if --use-hwthreads-as-cpus is given, or else as a core. Refs trac:4117 This commit was SVN r30620. The following Trac tickets were found above: Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117	2014-02-07 21:25:40 +00:00
Jeff Squyres	4edeb229cc	Add MPIEXEC_TIMEOUT environment variable to the man page. cmr=v1.7.4:reviewer=rhc This commit was SVN r30455.	2014-01-28 14:40:17 +00:00
Jeff Squyres	21ffddbbd0	Addendum to r30408: if we're going to remove stale kruft, let's remove all of it. :-) Refs trac:4175. This commit was SVN r30417. The following SVN revision numbers were found above: r30408 --> open-mpi/ompi@31acdb15bc The following Trac tickets were found above: Ticket 4175 --> https://svn.open-mpi.org/trac/ompi/ticket/4175	2014-01-24 22:19:36 +00:00
Ralph Castain	e3cb4b4a5b	Grant Nathan his wish - add an --disable-getpwuid to the configure options and protect all users of that code so it disappears if disabled. cmr=v1.7.5:reviewer=hjelmn:subject=disable getpwuid if requested This commit was SVN r30413.	2014-01-24 19:18:37 +00:00
Ralph Castain	31acdb15bc	We haven't really supported orteCC in a long time, so let's remove the stale cruft. Thanks to Paul Hargrove for noticing! cmr=v1.7.4:reviewer=jsquyres:subject=remove stale orteCC cruft This commit was SVN r30408.	2014-01-24 17:26:54 +00:00
Adrian Reber	0af2897c12	removed trailing whitespaces in orte-checkpoint.c This commit was SVN r30407.	2014-01-24 17:23:49 +00:00
Adrian Reber	659eb1b10a	silence two compiler warnings This commit was SVN r30406.	2014-01-24 17:22:28 +00:00
Adrian Reber	919260a0d2	fix communication between orte-checkpoint and orterun Right after starting the communication with orterun the buffer containing the message is deleted. This patch removes the deletion of the buffer which is now done by orte_rml_send_callback(). This is now also the callback function used by orte_rml.send_buffer_nb(). The previous callback hnp_receiver() was introduced by an earlier patch which only was trying to get the code to compile again. This commit was SVN r30405.	2014-01-24 17:18:28 +00:00
Jeff Squyres	87e476ebd8	Clean up many references to "rank": usually change to "process" and/or specifically delineate that we're referring to the process' rank in MPI_COMM_WORLD. Refs trac:4068 This commit was SVN r30181. The following Trac tickets were found above: Ticket 4068 --> https://svn.open-mpi.org/trac/ompi/ticket/4068	2014-01-09 16:37:49 +00:00
Ralph Castain	2a0e4b5e62	Update the orterun help messages and man page to reflect new map/rank/bind options and defaults. Thanks to Paul Hargrove for reporting it. cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30173.	2014-01-09 04:44:28 +00:00
Jeff Squyres	13b29cff2c	This commit complements/completes r30140. r30140 made all the configury/Makefile.am changes; this commit renames the internal installdirs.h framework struct field names to match the configury macro names: * pkgdatdir -> ompidatadir * pkglibdir -> ompilibdir * pkgincludedir -> ompiincludedir This commit was SVN r30145. The following SVN revision numbers were found above: r30140 --> open-mpi/ompi@8b778903d8	2014-01-07 23:36:33 +00:00
Brian Barrett	8b778903d8	Fix longstanding issue with our multi-project support. Rather than using pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is always set to {datadir,libdir,includedir}/openmpi. This will keep us from having help files in prefix/share/open-rte when building without Open MPI, but in prefix/share/openmpi when building with Open MPI. This commit was SVN r30140.	2014-01-07 22:11:15 +00:00
Ralph Castain	d5a5caa7e0	Restore the bycore mpirun option for backward compatibility Refs trac:4044 cmr=v1.7.4:reviewer=jsquyres This commit was SVN r30103. The following Trac tickets were found above: Ticket 4044 --> https://svn.open-mpi.org/trac/ompi/ticket/4044	2014-01-02 04:16:43 +00:00
Adrian Reber	53a70fe87f	Trying to get the C/R code to compile again. (send__nb) This patch changes all send/send_buffer occurrences in the C/R code to send_nb/send_buffer_nb. The new code compiles but does not work. Changes from V1: #ifdef out the code (so it is preserved for later re-design) * marked the broken C/R code with ENABLE_FT_FIXED Changes from V2: * just replace the blocking calls with the non-blocking calls * all #ifdef's introduced in V1 are gone * send_* returns error code or ORTE_SUCCESS (not the number of bytes) This commit was SVN r30036.	2013-12-20 21:58:28 +00:00
Adrian Reber	a3813d37c7	Trying to get the C/R code to compile again. (recv__nb) This patch changes all recv/recv_buffer occurrences in the C/R code to recv_nb/recv_buffer_nb. The old code is still there but disabled using ifdefs (ENABLE_FT_FIXED). The new code compiles but does not work. Changes from V1: #ifdef out the code (so it is preserved for later re-design) * marked the broken C/R code with ENABLE_FT_FIXED Changes from V2: * only #ifdef out the code where the behaviour is changed (used to be blocking; now non-blocking) This commit was SVN r30035.	2013-12-20 21:05:40 +00:00
Ralph Castain	71b52fe861	Ensure that comm_spawn'd procs get user-specified forwarded envars Thanks to Tim Miller for reporting the regression from the 1.6 series cmr=v1.7.4:reviewer=jsquyres:subject=Ensure that comm_spawn'd procs get user-specified forwarded envars This commit was SVN r30012.	2013-12-20 14:47:35 +00:00
Ralph Castain	d47d2569f3	We stripped the process info packing routine to minimize message size when sending the launch message, but tools still require all the info. So modify the tool-hnp handshake to explicitly add the missing info Refs trac:3992 This commit was SVN r29989. The following Trac tickets were found above: Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992	2013-12-19 20:42:20 +00:00
Ralph Castain	6239e64f36	Further cleanup of orte-ps so it doesn't abort when hitting a stale HNP - only report that event once and just keep working. Refs trac:3992 This commit was SVN r29974. The following Trac tickets were found above: Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992	2013-12-19 03:28:05 +00:00
Brian Barrett	121ca26c59	Per discussion at Developer's Meeting, remove Solaris threads support. Solaris will just fall back to pthreads, which should be no problem. This commit was SVN r29893.	2013-12-13 20:07:11 +00:00
Ralph Castain	9604f36c3b	Specify units for the job completion timeout This commit was SVN r29839.	2013-12-08 04:51:58 +00:00
Ralph Castain	62c9e5c64c	Really is better if we output a message indicating that the job was aborted due to hitting the execution time limit Refs trac:3960 This commit was SVN r29833. The following Trac tickets were found above: Ticket 3960 --> https://svn.open-mpi.org/trac/ompi/ticket/3960	2013-12-07 15:33:56 +00:00
Ralph Castain	d44e4a311f	Per request from Dave Goodell, add support for MPIEXEC_TIMEOUT - if set in the environment, terminate the job after the specified number of seconds has passed. Equivalent to MPICH functionality. cmr=v1.7.4:reviewer=dgoodell:subject=add support for MPIEXEC_TIMEOUT This commit was SVN r29831.	2013-12-07 01:58:32 +00:00
Jeff Squyres	ed9aba3896	This patch fixes error: void value not ignored as it ought to be in the C/R code by ignoring the return value of functions which no longer return a value (only void). Signed-off-by: Adrian Reber <adrian.reber@hs-esslingen.de> This commit was SVN r29816.	2013-12-06 14:40:10 +00:00
Brian Barrett	6d7a1fbb82	Move opal_portable_platform.h to opal/include/opal, which is where it really should have been all along and fix one place that uses the file Update opal_portable_platform.h with changes to mpi_portable_platform.h made in r29608. Make mpi_portable_platform.h a symlink to opal_portable_platform.h, so that they won't get out of sync. I'd like to remove mpi_portable_platform.h, but we don't automatically add -I${includedir}/openmpi/ to make that sane from a header include point of view, so that's future work. This commit was SVN r29618. The following SVN revision numbers were found above: r29608 --> open-mpi/ompi@b71bd51cdd	2013-11-06 17:12:26 +00:00
Ralph Castain	eb132f923b	Check for bozo error of negative np for an app as this will cause ORTE to spin forever. cmr:v1.7.3:reviewer=jsquyres:subject=Check for negative np cmr:v1.6.6:reviewer=jsquyres:subject=Check for negative np This commit was SVN r29157.	2013-09-11 19:21:22 +00:00
Ralph Castain	a200e4f865	As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: * THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE * Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro. *************************************************************************************** I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week. The code is in https://bitbucket.org/rhc/ompi-oob2 WHAT: Rewrite of ORTE OOB WHY: Support asynchronous progress and a host of other features WHEN: Wed, August 21 SYNOPSIS: The current OOB has served us well, but a number of limitations have been identified over the years. Specifically: * it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code) * we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface. 
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients * there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort * only one transport (i.e., component) can be "active" The revised OOB resolves these problems: * async progress is used for all application processes, with the progress thread blocking in the event library * each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on") * multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC. * a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions. * opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object * NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. 
If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions * obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel * the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport * routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active * all blocking send/recv APIs have been removed. Everything operates asynchronously. KNOWN LIMITATIONS: * although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline * the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker * routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways * obviously, not every error path has been tested nor necessarily covered * determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when all transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost. 
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways * the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC This commit was SVN r29058.	2013-08-22 16:37:40 +00:00
Nathan Hjelm	299d5b3dd7	Fix two debugger attach bugs. - orte_debugger_init_after_spawn was not being called for debuggers that use the MPIR_attach_fifo to co-locate debugger daemons. - MPIR_Breakpoint was not getting called if a debugger reattached. Add a job state (ORTE_JOB_STATE_DEBUGGER_DETACH) to reset mpir_breakpoint_fired to false when a debugger detaches to ensure MPIR_Breakpoint is called if another debugger attaches. Tested with STAT 2.0/launchmon 1.0. cmr:v1.7 This commit was SVN r28665.	2013-06-20 16:18:05 +00:00
Jeff Squyres	089c632cce	Remove a bunch of dead code: gcc 4.7 warns of set-but-unused variables. So get rid of them. This commit was SVN r28538.	2013-05-17 21:45:49 +00:00
Ralph Castain	f15fe5045e	Ensure that debugger connect can occur by getting the rml contact info updated before calling init_after_spawn cmr:v1.7.3,reviewer=jsquyres This commit was SVN r28455.	2013-05-06 22:00:45 +00:00
Ralph Castain	fb2a694587	Fix print This commit was SVN r28446.	2013-05-04 22:37:34 +00:00
Ralph Castain	27e3e382d5	No need for ORTE tools to use orte progress thread This commit was SVN r28445.	2013-05-04 21:13:20 +00:00
Nathan Hjelm	9d4a26f47d	Update OMPI frameworks to use the MCA framework system. Notes: - This commit also eliminates the need for an available components list in use in several frameworks. None of the code in question was making use of the priority field of the priority component list item so these extra lists were removed. - Cleaned up selection code in several frameworks to sort lists using opal_list_sort. - Cleans up the ompi/orte-info functions. Expose the functions that construct the list of params so they can be used elsewhere. patches for mtl/portals4 from brian missed a few output variables in openib This commit was SVN r28241.	2013-03-27 21:17:31 +00:00
Nathan Hjelm	cf377db823	MCA/base: Add new MCA variable system Features: - Support for an override parameter file (openmpi-mca-param-override.conf). Variable values in this file cannot be overridden by any file or environment value. - Support for boolean, unsigned, and unsigned long long variables. - Support for true/false values. - Support for enumerations on integer variables. - Support for MPIT scope, verbosity, and binding. - Support for command line source. - Support for setting variable source via the environment using OMPI_MCA_SOURCE_<var name>=source (either command or file:filename) - Cleaner API. - Support for variable groups (equivalent to MPIT categories). Notes: - Variables must be created with a backing store (char *, int , or bool *) that must live at least as long as the variable. - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE flag enables the use of mca_base_var_set_value() to change the value. - String values are duplicated when the variable is registered. It is up to the caller to free the original value if necessary. The new value will be freed by the mca_base_var system and must not be freed by the user. - Variables with constant scope may not be settable. - Variable groups (and all associated variables) are deregistered when the component is closed or the component repository item is freed. This prevents a segmentation fault from accessing a variable after its component is unloaded. - After some discussion we decided we should remove the automatic registration of component priority variables. Few components actually made use of this feature. - The enumerator interface was updated to be general enough to handle future uses of the interface. - The code to generate ompi_info output has been moved into the MCA variable system. See mca_base_var_dump(). 
opal: update core and components to mca_base_var system orte: update core and components to mca_base_var system ompi: update core and components to mca_base_var system This commit also modifies the rmaps framework. The following variables were moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode, rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables. This commit was SVN r28236.	2013-03-27 21:09:41 +00:00
Ralph Castain	6ee32767d4	Restore the cpus-per-proc option for byslot and bynode mapping. Remove the bind_idx (which recorded the index of the hwloc object where the proc was bound) as this would no longer be unique, and just use the bitmap as the standard reference for location. Update the relative locality computation to take bitmaps as its argument. This commit was SVN r28219.	2013-03-26 18:27:50 +00:00
Ralph Castain	a4b6fb241f	Remove all remaining vestiges of the Windows integration This commit was SVN r28137.	2013-02-28 17:31:47 +00:00
Ralph Castain	cf9796accd	Remove the old configure option for disabling full rte support - we now use the OMPI rte framework for such purposes This commit was SVN r28134.	2013-02-28 01:35:55 +00:00
Jeff Squyres	9bd4b814db	Fix one more nroff macro issue This commit was SVN r28090.	2013-02-21 17:38:06 +00:00
Jeff Squyres	76fcd42bc3	Fix minor nroff macro issues. This commit was SVN r28088.	2013-02-21 17:35:36 +00:00
Jeff Squyres	12e047e594	Update documentation for rankfiles in orterun.1: * Add a little more description of what rankfiles are * Update that we use logical numbering for socket:core notation * Mention +nX notation This commit was SVN r28067.	2013-02-16 17:52:30 +00:00
Ralph Castain	c0b670bea8	I guess some profiling tools and debuggers require that the argv[0] of each rank be unique so they can create a filename based on that value. For those obscure cases, provide an mpirun cmd line option that indexes each argv[0] by rank This commit was SVN r28064.	2013-02-15 20:20:49 +00:00
Ralph Castain	744ed49b2d	Begin cleanup of the thread_lock calls in ORTE. We'll ignore the ones in the rml/oob for now as that code block is being rewritten anyway. This commit was SVN r28053.	2013-02-13 01:53:12 +00:00
Brian Barrett	504a6d036f	* Rather than use the extra_includes directive, add the extra includes (which is really just -I${includedir}/openmpi/ for devel headers) to CPPFLAGS, since all the other necessary -Is for devel headers (like libevent and hwloc) are added to CPPFLAGS. * Clean up ${includedir} and ${libdir} for script wrapper compilers * Update script wrapper compilers to work like the C wrapper compilers w.r.t static and dynamic linking * Remove the ORTE script wrapper compilers since they didn't support the ${includedir} stuff and Ralph said they weren't used anymore. This commit was SVN r28052.	2013-02-13 00:33:05 +00:00
Brian Barrett	b8442ba505	Revamp the handling of wrapper compiler flags. The user flags, main configure flags, and mca flags are kept separate until the very end. The main configure wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD macro. MCA components should either let <framework>_<component>_{LIBS,LDFLAGS} be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}. The set of situations in which WRAPPER CPPFLAGS can be set by MCA components was made very small to match the one use case where it makes sense. This commit was SVN r27950.	2013-01-29 00:00:43 +00:00
Brian Barrett	f42783ae1a	Move the RTE framework change into the trunk. With this change, all non-CR runtime code goes through one of the rte, dpm, or pubsub frameworks. This commit was SVN r27934.	2013-01-27 23:25:10 +00:00
Brian Barrett	0e799a93c3	Automake will ship the .in file whether or not the conditional is taken, so don't install orte_wrapper_script when it's not used This commit was SVN r27902.	2013-01-24 21:36:25 +00:00
Ralph Castain	6e2cabb87f	Remove duplicate code This commit was SVN r27889.	2013-01-23 02:07:06 +00:00
George Bosilca	e69dc00460	Don't duplicate headers or global variables. This commit was SVN r27864.	2013-01-18 11:51:56 +00:00
Ralph Castain	c96cc2d5a0	In order to properly connect to debuggers like STAT, we need to get the hostname in its unstripped version for the MPIR_proctab. Unfortunately, we need a stripped version for Cray's alps launcher. So when we are stripping the hostname prefix, retain alias hostnames and add the ability to specify an alias to use in the proctab. This commit was SVN r27863.	2013-01-18 05:00:05 +00:00
Ralph Castain	5b8de0b9f4	Ouch - opal_progress calls event_loop with a NO_BLOCK flag. So when run without progress threads, the ORTE tools were not blocking in the event lib as they should be. Avoid calling opal_progress inside ORTE by directly using the event_loop call instead of ORTE_WAIT_FOR_COMPLETION as parts of the OMPI layer are using that macro. Thanks to George for spotting the problem. This commit was SVN r27815.	2013-01-14 23:06:42 +00:00
Ralph Castain	72bea688f1	Fix typo This commit was SVN r27717.	2012-12-23 18:13:39 +00:00
Ralph Castain	852a709c0e	Add libopen-pal to the libraries as all these tools directly reference OPAL functions, and the list of OS's that don't support indirect linking grows (Mac and Ubuntu, for now). This commit was SVN r27716.	2012-12-23 15:54:05 +00:00
Jeff Squyres	c5b0bcd9f7	Refs trac:3422 * Add some comments in the -wrapper-data-txt.in files just so that someone doesn't forget in the future why we link in what we do in the MPI and ORTE wrapper compilers. Update ompi_wrapper_script.in to match the new behavior. * Update orte_wrapper_script.in to support --openmpi:linkall (which is a no-op in this case) This commit was SVN r27672. The following Trac tickets were found above: Ticket 3422 --> https://svn.open-mpi.org/trac/ompi/ticket/3422	2012-12-14 16:34:20 +00:00
Jeff Squyres	f779b1ded9	Put back the static-library-detection stuff from r27668, with some additional functionality. Rationale (refs trac:3422): * Normal MPI applications only ever use the MPI API. Hence, -lmpi is sufficient (they'll never directly call ORTE or OPAL functions). This is arguably the most common case. * That being said, we do have some test programs (e.g., those in orte/test/mpi) that call MPI functions but also call ORTE/OPAL functions. I've also written the occasional MPI test program that calls opal_output, for example (there even might be a few tests in the IBM test suite that directly call ORTE/OPAL functions). * Even though this is not a common case, these applications should also compile/link with mpicc. * So we should add a --openmpi:linkall option that will also link in whatever is necessary to call ORTE/OPAL functions * Yes, we could hard-code "-lopen-rte -lopen-pal" in Makefiles, but we do reserve the right to change those library names and/or add others someday, so it's better to abstract out the names and let the wrapper supply whatever is necessary. * ORTE programs, however, are different. They almost always call OPAL functions (e.g., if they want to send a message, they must use the OPAL DSS). As such, it seems like the ORTE programs should always link in OPAL. Therefore: * Add undocumented --openmpi:linkall flag to the wrapper compilers. See the comment in opal_wrapper.c for an explanation of what it does. This flag is only intended for Open MPI developers -- not end users. That's why it's undocumented. * Update orte/test/mpi/Makefile.am to add --openmpi:linkall * Make ortecc/ortec++'s wrapper data text files always explicitly link in libopen-pal This commit was SVN r27670. The following SVN revision numbers were found above: r27668 --> open-mpi/ompi@cf845897aa The following Trac tickets were found above: Ticket 3422 --> https://svn.open-mpi.org/trac/ompi/ticket/3422	2012-12-13 22:31:37 +00:00
Jeff Squyres	cf845897aa	Temporarily revert r27662 and r27667 because something wonky is happening on OS X. Grumble... This commit was SVN r27668. The following SVN revision numbers were found above: r27662 --> open-mpi/ompi@97cc916007 r27667 --> open-mpi/ompi@529f6244ca	2012-12-11 23:08:14 +00:00
Jeff Squyres	97cc916007	Per discussion at the Open MPI developer meeting last week: 1. Restore libopen-pal.la, libopen-rte.la, and libmpi.la to be separate entities (i.e., don't have libopen-rte.la include libopen-pal.la, and don't have libmpi.la include libopen-pal.la). Yay! 2. Consequently, make the wrapper compilers look for flags indicating that the user wants to compile statically (currently: -static, --static, -Bstatic, and "-Wl," in front of all of those). If so, follow a 6-way matrix for determining which libraries to list on the underlying command line. 3. To support that, add the name of a token static and dynamic library to look for in each of the wrapper compiler data files. 4. Fix a long-standing typo in the opalcc wrapper data file. This commit was SVN r27662.	2012-12-11 01:46:59 +00:00
Nathan Hjelm	a427a7e727	do not include c99 flag in compiler wrappers This commit was SVN r27625.	2012-11-20 19:33:14 +00:00
Ralph Castain	7a5f6b584c	Have orte-info show thread support as well This commit was SVN r27624.	2012-11-18 18:15:22 +00:00
Ralph Castain	fefec03e78	Enable all ORTE tools to use progress threads if they are enabled This commit was SVN r27593.	2012-11-12 02:54:09 +00:00
Ralph Castain	bd887f7f56	Add a new "test" component to the DFS that treats all files as remote in order to test the app-to-daemon interactions on a single machine. Set a global param to indicate we are using staged execution. Add a param to indicate it is okay for non-MPI processes to execute without finalizing. Cleanup file map load and fetch operations. This commit was SVN r27587.	2012-11-10 14:09:12 +00:00
Ralph Castain	fd632147df	Per patch from Nathan, with a few fixes, cleanup the orte-info tool This commit was SVN r27581.	2012-11-10 04:11:40 +00:00
Nathan Hjelm	2acd0f83de	Revert "Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter". It appears the problem was not with the command line parser but the rsh plm. I don't know why this problem was not occurring before the command line parser changes but it appears to be resolved now. This commit was SVN r27527. The following SVN revision numbers were found above: r27451 --> open-mpi/ompi@d59034e6ef r27456 --> open-mpi/ompi@ecdbf34937	2012-10-30 19:45:18 +00:00
Ralph Castain	a080de188f	Enable orterun to directly support staged execution, treating each app as a separate job. Support transfer of file maps when support exists. This commit was SVN r27516.	2012-10-29 23:11:30 +00:00
Ralph Castain	e6014bf2e1	Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter This commit was SVN r27477. The following SVN revision numbers were found above: r27451 --> open-mpi/ompi@d59034e6ef r27456 --> open-mpi/ompi@ecdbf34937	2012-10-24 18:38:44 +00:00
Nathan Hjelm	d59034e6ef	MCA: remove deprecated mca_base_param functions (mca_base_param_register_int, mca_base_param_register_string, mca_base_param_environ_variable). Remove all uses of deprecated functions. cmr:v1.7 This commit was SVN r27451.	2012-10-17 20:17:37 +00:00
Ralph Castain	9daaa001d9	Remove tools that are no longer required This commit was SVN r27383.	2012-09-29 17:33:16 +00:00
Jeff Squyres	fb2e543a57	Refs trac:3275. We ran into a case where the OMPI SVN trunk grew a new acceptable MCA parameter value, but this new value was not accepted on the v1.6 branch (hwloc_base_mem_bind_failure_action -- on the trunk it accepts the value "silent", but on the older v1.6 branch, it doesn't). If you set "hwloc_base_mem_bind_failure_action=silent" in the default MCA params file and then accidentally ran with the v1.6 branch, every OMPI executable (including ompi_info) just failed because hwloc_base_open() would say "hey, 'silent' is not a valid value for hwloc_base_mem_bind_failure_action!". Kaboom. The only problem is that it didn't give you any indication of where this value was being set. Quite maddening, from a user perspective. So we changed how ompi_info handles this case. If any framework open function returns OMPI_ERR_BAD_PARAM (either because its base MCA params got a bad value or because one of its component register/open functions returned OMPI_ERR_BAD_PARAM), ompi_info will stop, print out a warning that it received an error, and then dump out the parameters that it has received so far in the framework that had a problem. At a minimum, this will show the user the MCA param that had an error (it's usually the last one), and ''where it was set from'' (so that they can go fix it). We updated ompi_info to check for O???_ERR_BAD_PARAM from each of the framework opens. Also updated the doxygen docs in mca.h for this O???_BAD_PARAM behavior. And we noticed that mca.h had MCA_SUCCESS and MCA_ERR_??? codes. Why? I think we used them in exactly one place in the code base (mca_base_components_open.c). So we deleted those and just used the normal OPAL_* codes instead. While we were doing this, we also cleaned up a little memory management during ompi_info/orte-info/opal-info finalization. 
Valgrind still reports a truckload of memory still in use at ompi_info termination, but they mostly look to be components not freeing memory/resources properly (and outside the scope of this fix). This commit was SVN r27306. The following Trac tickets were found above: Ticket 3275 --> https://svn.open-mpi.org/trac/ompi/ticket/3275	2012-09-11 20:47:24 +00:00
Jeff Squyres	a8f8064d8b	Add a missing free(). Refs trac:3292. This commit was SVN r27298. The following Trac tickets were found above: Ticket 3292 --> https://svn.open-mpi.org/trac/ompi/ticket/3292	2012-09-11 17:59:40 +00:00
Ralph Castain	6d29cecce1	Fix the help message warning of multiple prefixes so it correctly prints out the info, and fix a typo. cmr:v1.7 This commit was SVN r27241.	2012-09-05 16:28:36 +00:00
Ralph Castain	bae5dab916	If (and only if) a user requests, set the default number of slots on any node to the number of objects of the specified type. This only takes effect in an unmanaged environment - i.e., if an external resource manager assigns us a number of slots, then that is what we use. However, if we are using a hostfile, then the user may or may not have given us a value for the number of slots on each node. For those nodes (and only those nodes) where the user does not specify a slot count, we will set the number of slots according to their direction: either to the number of cores, numas, sockets, or hwthreads. Otherwise, the slot count is set to 1. Note that the default behavior remains unchanged: in the absence of any value for #slots, and in the absence of any directive to set #slots, we will set #slots=1. This commit was SVN r27236.	2012-09-04 20:58:26 +00:00
Ralph Castain	98580c117b	Introduce staged execution. If you don't have adequate resources to run everything without oversubscribing, don't want to oversubscribe, and aren't using MPI, then staged execution lets you (a) run as many procs as there are available resources, and (b) start additional procs as others complete and free up resources. Adds a new mapper as well as a new state machine. Remove some stale configure.m4's we no longer need. Optimize the nidmaps a bit by only sending info that has changed each time, instead of sending a complete copy of everything. Makes no difference for the typical MPI job - only impacts things like staged execution where we are sending multiple (possibly many) launch messages. This commit was SVN r27165.	2012-08-28 21:20:17 +00:00
Ralph Castain	e0c39c94e8	Complete the cleanup of the preload files system. Remove the dest_dir option as moving things to arbitrary locations - especially absolute paths - can prove disastrous. Remove the preload_libs option as these can be treated as just files. Cleanup some of the pack/unpack code as the dss handles NULL strings just fine. Deal a little better with absolute paths, noting that tar now strips the leading '/' for us (showing my age as it didn't use to do so). Remove the odls_base_state.c file as that code is now covered by the new broadcast form of preload_files. This commit was SVN r27127.	2012-08-24 02:28:29 +00:00
Ralph Castain	b4a544ad2a	Per discussion with Josh, use the --preload-xxx cmd line options to broadcast files to all nodes. Add --set-cwd-to-session-dir option to start procs in their session directories. Add OMPI_FILE_LOCATION envar to tell procs where their prepositioned files went. This commit was SVN r27125.	2012-08-23 21:28:05 +00:00
Ralph Castain	a572b6fa9f	Pick the right place This commit was SVN r27085.	2012-08-17 00:28:28 +00:00
Ralph Castain	35fef87202	Make the "no virtual machine" selection more intuitive by providing a --novm option to mpirun. This commit was SVN r27048.	2012-08-15 14:55:03 +00:00
Ralph Castain	589acf550c	Improve the new MPI_INFO_ENV to better handle Java applications and to correctly report the info for singletons. This commit was SVN r27025.	2012-08-13 22:13:49 +00:00
Jeff Squyres	3719b6c68b	After some further discussion between Jeff, Ralph, and Josh, revert r26951. The feeling is that fixing the actual problem of the command line parser not always identifying when invalid command line options were specified (i.e., r26953) was a better solution. This commit was SVN r26979. The following SVN revision numbers were found above: r26951 --> open-mpi/ompi@1f8df92c3c r26953 --> open-mpi/ompi@0b7b3feba9	2012-08-09 20:56:01 +00:00
Ralph Castain	1f8df92c3c	Remove the confusion over which options are "to" and which are "by" by creating synonyms so that either spelling works. This commit was SVN r26951.	2012-08-05 14:40:38 +00:00
Ralph Castain	c7f9a0fa34	Check for recursive use of mpirun - issue error message and abort if detected This commit was SVN r26903.	2012-07-28 21:50:56 +00:00
Abhishek Kulkarni	5c58a1c9c1	Fix C/R support in the trunk. Among other things, this patch deals with the following issues: * fix ompi-checkpoint argument parsing * ompi-restart -showme prints an extraneous "Restarted child with PID" message. Move around the debug statement to avoid this. * fixes for the state machine changes This commit was SVN r26770.	2012-07-09 23:34:13 +00:00
Ralph Castain	e6f3586415	Remove the orte notifier framework, per discussion at the devel meeting and follow-up with Jeff (who took the action item) This commit was SVN r26637.	2012-06-22 18:09:23 +00:00
Brian Barrett	9af72072a3	Use MKDIR_P instead of mkdir_p in Makefiles, as MKDIR_P is the only one defined in recent versions of AC/AM. This commit was SVN r26625.	2012-06-21 16:52:37 +00:00
Ralph Castain	0a713cd27e	Add database framework to ORTE and refactor modex code to utilize it. Create the "hash" db component from the prior modex db code. Leave the other components ignored for now - will activate them later. Modex is still a blocking operation at this point. This commit was SVN r26618.	2012-06-19 13:38:42 +00:00
Ralph Castain	269cb2b8d9	Some cleanup to remove calls to opal_progress when running with orte progress threads, and to ensure that all orte-related events are in the orte event base. This commit was SVN r26591.	2012-06-11 19:59:53 +00:00
Ralph Castain	d6279fc971	Fix the debugger daemon launch support to fit the new state machine. Treat debugger daemons just like any other job, except that we map them only to nodes where an app process currently exists (as opposed to every node in the system). Trigger breakpoint and rank0 release only after the debugger daemons are in position. This commit was SVN r26556.	2012-06-06 02:01:23 +00:00
Jeff Squyres	99c5afb397	Remove clang compiler warnings. This commit was SVN r26523.	2012-05-29 23:36:06 +00:00
Ralph Castain	be6ed9c2df	Allow partial use of allocations by specifying the max number of daemons (i.e., max VM size) for the job This commit was SVN r26499.	2012-05-27 16:48:19 +00:00
Jeff Squyres	7969faf372	Fixes trac:3057: minor update to the man page to state that slot locations in rankfiles use ''physical'' device indexes (vs. logical indexes). This commit was SVN r26478. The following Trac tickets were found above: Ticket 3057 --> https://svn.open-mpi.org/trac/ompi/ticket/3057	2012-05-23 11:43:33 +00:00
Ralph Castain	b217124bd8	Symlink instead of copy This commit was SVN r26464.	2012-05-21 23:07:48 +00:00
Ralph Castain	da3873af6f	Rename the mapreduce tool to "mr+" per the marketing types This commit was SVN r26463.	2012-05-21 21:17:44 +00:00
Ralph Castain	a526afae92	Ensure we always cleanup local procs, no matter how we exited. This commit was SVN r26454.	2012-05-18 23:37:40 +00:00
Ralph Castain	12ebc0e269	Don't need this to be a bin program as the class is captured in the jar This commit was SVN r26453.	2012-05-18 23:37:18 +00:00
Ralph Castain	b16e43f489	Silence a warning on Mac This commit was SVN r26449.	2012-05-18 15:27:04 +00:00
Ralph Castain	ca1b325738	Tweak the java setup so it works better on Mac. Only build mapreduce and allocators if hadoop support was requested. This commit was SVN r26448.	2012-05-18 01:02:01 +00:00
Jeff Squyres	2d78728d38	Fix the macro name in the comment: it's EXTRA_DIST, not EXTRA_SOURCES. This commit was SVN r26429.	2012-05-10 14:07:36 +00:00
Jeff Squyres	b325c17c72	It's a little weird to put in a blank _SOURCES line for the HDFSFileFinder PROGRAM, but if we don't put in a _SOURCES line at all, Automake will default to "HDFSFileFinder_class_SOURCES = HDFSFileFinder.c", which clearly will cause problems. But we don't want to put the .java file in _SOURCES, either, because we haven't configured Automake to handle Java (because current versions of Automake only have GCJ, not other Java compilers). So set HDFSFileFinder_class_SOURCES to blank and list the .java file in EXTRA_SOURCES (so that they get picked up for "make dist"). This commit was SVN r26424.	2012-05-10 13:54:51 +00:00
Ralph Castain	640f0610aa	Fix the makefile to install the perl scripts properly This commit was SVN r26416.	2012-05-09 14:06:02 +00:00
Ralph Castain	fd796cce0a	Add an allocator tool for finding HDFS file locations and obtaining allocations for those nodes (supports both Hadoop 1 and 2). Split the Java support into two parts: detection of Java support and request for Java bindings. This commit was SVN r26414.	2012-05-09 01:13:49 +00:00
Jeff Squyres	2ba10c37fe	Per RFC, bring in the following changes: * Remove paffinity, maffinity, and carto frameworks -- they've been wholly replaced by hwloc. * Move ompi_mpi_init() affinity-setting/checking code down to ORTE. * Update sm, smcuda, wv, and openib components to no longer use carto. Instead, use hwloc data. There are still optimizations possible in the sm/smcuda BTLs (i.e., making multiple mpools). Also, the old carto-based code found out how many NUMA nodes were ''available'' -- not how many were used ''in this job''. The new hwloc-using code computes the same value -- it was not updated to calculate how many NUMA nodes are used ''by this job.'' * Note that I cannot compile the smcuda and wv BTLs -- I ''think'' they're right, but they need to be verified by their owners. * The openib component now does a bunch of stuff to figure out where "near" OpenFabrics devices are. '''THIS IS A CHANGE IN DEFAULT BEHAVIOR!!''' and still needs to be verified by OpenFabrics vendors (I do not have a NUMA machine with an OpenFabrics device that is a non-uniform distance from multiple different NUMA nodes). * Completely rewrite the OMPI_Affinity_str() routine from the "affinity" mpiext extension. This extension now understands hyperthreads; the output format of it has changed a bit to reflect this new information. * Bunches of minor changes around the code base to update names/types from maffinity/paffinity-based names to hwloc-based names. * Add some helper functions into the hwloc base, mainly having to do with the fact that we have the hwloc data reporting ''all'' topology information, but sometimes you really only want the (online | available) data. This commit was SVN r26391.	2012-05-07 14:52:54 +00:00
Ralph Castain	b2f77bf08f	Extend the iof by adding two new components to support map-reduce IO chaining. Add a mapreduce tool for running such applications. Fix the state machine to support multiple jobs being simultaneously launched as this is not only required for mapreduce, but can happen under comm-spawn applications as well. This commit was SVN r26380.	2012-05-02 21:00:22 +00:00
Ralph Castain	a927318ea1	Add -N option as synonym for "npernode" This commit was SVN r26367.	2012-05-01 16:18:14 +00:00
Jeff Squyres	501a86afe1	No need to include the generated files in the tarball. Thanks to Eugene for pointing this out. This commit was SVN r26339.	2012-04-25 14:19:18 +00:00
Ralph Castain	5d14fa7546	Fix mpi_abort, minimize error output. This commit was SVN r26266.	2012-04-11 14:37:08 +00:00
Ralph Castain	14d5525fb1	Some minor cleanups. Get singletons working. Cleanup abort handling so it gets properly identified. This commit was SVN r26261.	2012-04-10 19:08:54 +00:00
Ralph Castain	bd8b4f7f1e	Sorry for mid-day commit, but I had promised on the call to do this upon my return. Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code. Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch. This commit was SVN r26242.	2012-04-06 14:23:13 +00:00
Ralph Castain	6dc44dc4b8	Look at the basename of the appname for the "java" keyword This commit was SVN r26190.	2012-03-24 00:38:18 +00:00
Josh Hursey	a595525366	Add the caller's name to the 'comm failed' error message, so we know between which two peers the communication failed. This commit was SVN r26117.	2012-03-08 21:55:19 +00:00
Ralph Castain	c3cf46af65	Ensure install_dirs are filled in before parsing prefix This commit was SVN r26093.	2012-03-03 23:14:15 +00:00
Ralph Castain	b8f093d1a0	Switch precedence - take the --prefix value over the absolute-path-to-mpirun so the backend prefix can be different from that of mpirun on hetero machines. This commit was SVN r26085.	2012-03-02 22:59:13 +00:00
Ralph Castain	6c93dd13b0	Cleanup the prefix handling by mpirun. Important note: we do NOT support per-app_context prefixes!! Don't let app_files trump given prefix values. Assign according to following precedence rules: 1. absolute path to mpirun, if given 2. --prefix value, if given to mpirun 3. default prefix, if configured with --enable-orterun-prefix-default 4. prefix from first app in app_file, if given 5. no prefix This commit was SVN r26081.	2012-03-02 19:48:25 +00:00
Jeff Squyres	97b3603036	A bunch of fixes and improvements to Open MPI's various command line tools. * fixed some bugs where "unknown" tokens were allowed on the command line (which should really only be used for orterun). * if an unknown token is encountered, print a short error to stderr and quit with a nonzero exit status * if we don't find the right number of parameters to an option, print a short error to stderr and quit with a nonzero exit status * when --help is given, print the help message to stdout (not stderr) and quit with a zero exit status * added --showme:help option to the wrapper compilers * updated docs in opal/util/cmd_line.h * other small/miscellaneous CLI parsing bugs in various tools I won't bore you with what we did before. :-) Here are some examples of what the new behavior looks like: {{{ % ompi_info --bogus ompi_info: Error: unknown option "--bogus" Type 'ompi_info --help' for usage. % ompi_info --param bogus ompi_info: Error: option "--param" did not have enough parameters (2) Type 'ompi_info --help' for usage. % }}} This commit was SVN r26072.	2012-02-29 17:52:38 +00:00
Ralph Castain	bc5886707f	Document the mpirun exit status behavior This commit was SVN r26009.	2012-02-22 23:47:00 +00:00
Ralph Castain	47c64ec837	Roll in Java bindings per telecon discussion. Man pages still under revision This commit was SVN r25973.	2012-02-20 22:12:43 +00:00
Ralph Castain	d7d8a8cdf7	Some cleanup of the tmpdir session directory specifications. Remove the --tmpdir option from orterun as it was confusing. Create an orte_local_tmpdir_base mca param in its place. Clarify the role of the local vs remote vs global tmpdir base params, and ensure that you don't set conflicting options. Remove the OMPI_PREFIX_ENV environmental variable as that was totally confusing as a way of setting a tmpdir base location. This commit was SVN r25941.	2012-02-16 16:10:01 +00:00
Jeff Squyres	54cf60eb4b	$(RM) is not a standard macro. Just use "rm" -- every platform has it. This commit was SVN r25934.	2012-02-15 19:51:59 +00:00
Jeff Squyres	ae9503db6e	Remove the sentence that says that --prefix is a per-context option. This commit was SVN r25932.	2012-02-15 18:31:27 +00:00
Ralph Castain	61ac2bb11b	If no session directories are being created, then we cannot create the debugger attachment fifo - so don't complain about it. This commit was SVN r25802.	2012-01-27 04:05:23 +00:00
Ralph Castain	6db8c56cd4	Add local and node ranks to debugger daemon procs so the odls properly launches them This commit was SVN r25774.	2012-01-25 03:17:10 +00:00
Ralph Castain	bf09133631	Correctly track the number of debugger daemons being spawned This commit was SVN r25741.	2012-01-19 18:17:07 +00:00
Ralph Castain	6235a355de	Correctly handle co-spawning of daemons when attaching to a running job. We cannot use the general process mappers as we only want debugger daemons spawned on nodes where application procs already exist. So custom build the map for the debugger daemon job, and have the plm just launch that job without doing its usual vm-spawn step. This commit was SVN r25736.	2012-01-18 00:19:49 +00:00
Ralph Castain	fd0d9f73c6	Make preload_binaries an MCA param so it can be set in the default MCA parameters for a system This commit was SVN r25728.	2012-01-17 17:16:05 +00:00
Shiqing Fan	f57f873404	Disable the debugger support for Windows. This commit was SVN r25725.	2012-01-17 16:21:33 +00:00
Ralph Castain	ce7ddd0e10	Create the debugger attach fifo unless the user requests that we periodically poll instead. This commit was SVN r25714.	2012-01-11 19:44:22 +00:00
Ralph Castain	bf103de66c	My apologies for doing this outside of the usual time restrictions, but we need to get this in so we can make progress. Move the ORTE-level debugger code back into orterun and out of the ORTE library to resolve symbol conflicts. This commit was SVN r25713.	2012-01-11 15:53:09 +00:00
Jeff Squyres	a4c8bb27fa	Pull in the MPIR_Breakpoint symbol via a dummy function in debuggers_base_fns.c: orte_debugger_base_pull_mpir_breakpoint(). This commit was SVN r25660.	2011-12-15 18:39:34 +00:00
Nathan Hjelm	9dec101043	fix totalview launch through --debug This commit was SVN r25654.	2011-12-15 15:19:13 +00:00
Ralph Castain	f531b09a8d	Correctly handle -host and -hostfile options. Ensure the initial vm launch constrains itself to the union of specified hosts if those options are given. Get oversubscribe set correctly for that case. This commit was SVN r25648.	2011-12-14 20:01:15 +00:00
Ralph Castain	7510339725	Remove stale orte_vm_launch param. Add a param that allows users to specify envars to forward/set so they can do it in the MCA param file instead of only via mpirun cmd line. This commit was SVN r25580.	2011-12-06 21:31:22 +00:00
Ralph Castain	90b7f2a7bf	The rest of the multi app_context fix. Remove the restriction on number of app_contexts that can have zero np specified as multiple mappers now support that use-case. Update the ranking algorithms to respect and track bookmarks. Ensure we properly set the oversubscribed flag on a per-node basis. This commit was SVN r25578.	2011-12-06 17:28:29 +00:00
Ralph Castain	6fefe236a4	Warn users if they set opal_paffinity_alone, either to true or false, that this parameter is no longer functional - they must use the --bind-to option and its corresponding mca param. This commit was SVN r25567.	2011-12-03 01:10:52 +00:00
Ralph Castain	c56acf60ca	Although we never really thought about it, we made an unconscious assumption in the mapper system - we assumed that the daemons would be placed on nodes in the order that the nodes appear in the allocation. In other words, we assumed that the launch environment would map processes in node order. Turns out, this isn't necessarily true. The Cray, for example, launches processes in a toroidal pattern, thus causing the daemons to wind up somewhere other than what we thought. Other environments (e.g., slurm) are also capable of such behavior, depending upon the default mapping algorithm they are told to use. Resolve this problem by making the daemon-to-node assignment in the affected environments when the daemon calls back and tells us what node it is on. Order the nodes in the mapping list so they are in daemon-vpid order as opposed to the order in which they show in the allocation. For environments that don't exhibit this mapping behavior (e.g., rsh), this won't have any impact. Also, clean up the vm launch procedure a little bit so it more closely aligns with the state machine implementation that is coming, and remove some lingering "slave" code. This commit was SVN r25551.	2011-11-30 19:58:24 +00:00
Ralph Castain	b475421c16	As promised, rationalize the rsh support. Remove rshbase and the base rsh support, centralizing all rsh support into the rsh component. Remove the "slave" launch support as that experiment is complete. Fix tree spawn and make that the default method for rsh launch, turning it "off" for qrsh as that system does not support tree spawn. This commit was SVN r25507.	2011-11-26 02:33:05 +00:00
Ralph Castain	9b59d8de6f	This is actually a much smaller commit than it appears at first glance - it just touches a lot of files. The --without-rte-support configuration option has never really been implemented completely. The option caused various objects not to be defined and conditionally compiled some base functions, but did nothing to prevent build of the component libraries. Unfortunately, since many of those components use objects covered by the option, it caused builds to break if those components were allowed to build. Brian dealt with this in the past by creating platform files and using "no-build" to block the components. This was clunky, but acceptable when only one organization was using that option. However, that number has now expanded to at least two more locations. Accordingly, make --without-rte-support actually work by adding appropriate configury to prevent components from building when they shouldn't. While doing so, remove two frameworks (db and rmcast) that are no longer used as ORCM comes to a close (besides, they belonged in ORCM now anyway). Do some minor cleanups along the way. This commit was SVN r25497.	2011-11-22 21:24:35 +00:00
Ralph Castain	6310361532	At long last, the fabled revision to the affinity system has arrived. A more detailed explanation of how this all works will be presented here: https://svn.open-mpi.org/trac/ompi/wiki/ProcessPlacement The wiki page is incomplete at the moment, but I hope to complete it over the next few days. I will provide updates on the devel list. As the wiki page states, the default and most commonly used options remain unchanged (except as noted below). New, esoteric and complex options have been added, but unless you are a true masochist, you are unlikely to use many of them beyond perhaps an initial curiosity-motivated experimentation. In a nutshell, this commit revamps the map/rank/bind procedure to take into account topology info on the compute nodes. I have, for the most part, preserved the default behaviors, with three notable exceptions: 1. I have at long last bowed my head in submission to the system admins of managed clusters. For years, they have complained about our default of allowing users to oversubscribe nodes - i.e., to run more processes on a node than allocated slots. Accordingly, I have modified the default behavior: if you are running off of hostfile/dash-host allocated nodes, then the default is to allow oversubscription. If you are running off of RM-allocated nodes, then the default is to NOT allow oversubscription. Flags to override these behaviors are provided, so this only affects the default behavior. 2. both cpus/rank and stride have been removed. The latter was demanded by those who didn't understand the purpose behind it - and I agreed as the users who requested it are no longer using it. The former was removed temporarily pending implementation. 3. vm launch is now the sole method for starting OMPI. It was just too darned hard to maintain multiple launch procedures - maybe someday, provided someone can demonstrate a reason to do so. As Jeff stated, it is impossible to fully test a change of this size. I have tested it on Linux and Mac, covering all the default and simple options, singletons, and comm_spawn. That said, I'm sure others will find problems, so I'll be watching MTT results until this stabilizes. This commit was SVN r25476.	2011-11-15 03:40:11 +00:00
Ralph Castain	729935dffb	Minor cleanups, mirroring what Jeff did to ompi_info This commit was SVN r25438.	2011-11-05 00:42:49 +00:00
Ralph Castain	fcee46b063	Add an option for printing a diffable process map for testing mappers This commit was SVN r25428.	2011-11-03 14:22:07 +00:00
Ralph Castain	d28dd55d33	Minimize the amount of topology info returned by the daemons. Most clusters, especially at scale, use the same node topology on every node, so there is no reason to return the topology from every daemon. Borrow a page from the --hetero-apps page and let users indicate that the node topology differs by adding a --hetero-nodes option to mpirun. If the option is set, then every daemon returns topology info. If not set, then only daemon vpid=1 returns it. We always want one daemon to return the topology as the head node is often different from the compute nodes. Having one daemon return the compute node topology allows us to detect any such difference. All compute nodes are then set to the same topology. This commit was SVN r25408.	2011-11-01 18:43:10 +00:00
Ralph Castain	648c85b41b	Add a simple pattern mapper as an example of how to use the topology info to create desired mappings. Let the user specify a pattern based on resource types, and map that pattern across all available nodes as resources permit. Don't automatically display the topology for each node when --display-devel-map is set as it can overwhelm the reader. Use a separate flag --display-topo to get it. This commit was SVN r25396.	2011-10-29 15:12:45 +00:00
Jeff Squyres	ecd603256a	* Rename opal_hwloc_components to opal_hwloc_base_components * Fix some comments This commit was SVN r25150.	2011-09-17 11:54:36 +00:00
Ralph Castain	92c7372e20	Per the RFC from Jeff, move hwloc from opal/mca/common to its own static framework ala libevent. Have ORTE daemons collect the topology info at startup and, if --enable-hwloc-xml is set, send that info back to the HNP for later use. The HNP only retains unique topology "templates" to reduce memory footprint. Have the daemon include the local topology info in the nidmap buffer sent to each app so the apps don't all hammer the local system to discover it for themselves. Remove the sysinfo framework as hwloc replaces that functionality. This commit was SVN r25124.	2011-09-11 19:02:24 +00:00
Wesley Bland	4e7ff0bd5e	By popular demand the epoch code is now disabled by default. To enable the epochs and the resilient orte code, use the configure flag: --enable-resilient-orte This will define both: ORTE_ENABLE_EPOCH ORTE_RESIL_ORTE This commit was SVN r25093.	2011-08-26 22:16:14 +00:00
Shiqing Fan	6d0ab9bd6c	One library was missing for linking orterun on Windows. This commit was SVN r25057.	2011-08-18 09:33:41 +00:00
Shiqing Fan	3af7c9f7bb	Complete the MinGW build support on Windows. This commit was SVN r25048.	2011-08-15 09:47:23 +00:00
Ralph Castain	715f871605	Ignore the daemon job when reporting parseable output This commit was SVN r24944.	2011-07-25 20:44:08 +00:00
Ralph Castain	199804fc35	complete implementation of parseable output This commit was SVN r24929.	2011-07-23 22:23:24 +00:00
Ralph Castain	00647fa342	Update orte-ps to add parseable output - not fully tested because I couldn't get other parts of the system to work. This commit was SVN r24927.	2011-07-23 20:20:31 +00:00
Ralph Castain	1ad110d2e9	After a nice, calm, rational discussion between Brian, Jeff, and myself, we decided to revert r24864 and r24862 to restore the reference counters in opal_init/finalize. The rationale was that we should instead change orte_init/finalize to also use reference counters to support multi-embedded libraries. Jeff and Brian will discuss proposing a similar change to mpi_init/finalize to the MPI Forum so that all three libraries will behave in similar manners. It was agreed that opal_init_util had wound up being used in unintended ways, which raised the problem of getting reference counts to work right. However, fixing it would involve more pain than it was worth - and so long as the other layers are made to behave similarly, I have no preference either way. Complete implementation will follow - for now, this just reverts the prior changes. This commit was SVN r24886. The following SVN revision numbers were found above: r24862 --> open-mpi/ompi@aa92e0c4eb r24864 --> open-mpi/ompi@a5062385c2	2011-07-12 17:07:41 +00:00
Ralph Castain	aa92e0c4eb	Replace a useless counter with a boolean check to see if we have already passed thru opal_finalize so we don't call finalize, and then don't pass thru it (as was happening on several tools) This commit was SVN r24862.	2011-07-08 06:43:19 +00:00
Wesley Bland	e1ba09ad51	Add resilience to ORTE. Allows the runtime to continue after a process (or ORTED) failure. Note that more work will be necessary to allow the MPI layer to take advantage of this. Per RFC: http://www.open-mpi.org/community/lists/devel/2011/06/9299.php This commit was SVN r24815.	2011-06-23 20:38:02 +00:00
Samuel Gutierrez	81f38b258a	commit of new shared memory backing facility framework (shmem) and its components. This commit was SVN r24795.	2011-06-21 15:41:57 +00:00

994 commits