openmpi

Author	SHA1	Message	Date
Rainer Keller	bff1b2a22b	- Finally add the missing opal/util/output.h for the OPAL_OUTPUT_VERBOSE macro. - ompi/errhandler/errhandler_predefined.h: Well, just the missing fwd declarations... This commit was SVN r20820.	2009-03-17 22:37:15 +00:00
Rainer Keller	6f808d9b05	Preparation work for another commit (after RFC): - This patch solely _adds_ required headers and is rather localized The next patch (after RFC) heavily removes headers (based on script) - ompi/communicator/communicator.h: For sources that use ompi_mpi_comm_world, don't require them to include "mpi.h" - ompi/debuggers/ompi_common_dll.c: mca_topo_base_comm_1_0_0_t needs #include "ompi/mca/topo/topo.h" - ompi/errhandler/errhandler_predefined.h: ompi/communicator/communicator.h depends on this header file! To prevent recursion just have fwd declarations. #include "ompi/types.h" for fwd declarations of the main structs. - ompi/mca/btl/btl.h: #include "opal/types.h" for ompi_ptr_t - ompi/mca/mpool/base/mpool_base_tree.c: We use ompi_free_list_t and ompi_rb_tree_t, so have the proper classes - ompi/mca/op/op.h: Op is pretty self-contained: Nobody up to now has done #include "opal/class/opal_object.h" - ompi/mca/osc/pt2pt/osc_pt2pt_replyreq.h: #include "opal/types.h" for ompi_ptr_t - ompi/mca/pml/base/base.h: We use opal_lists - ompi/mca/pml/dr/pml_dr_vfrag.h: #include "opal/types.h" for ompi_ptr_t - ompi/mca/pml/ob1/pml_ob1_hdr.h: #include "ompi/mca/btl/btl.h" for mca_btl_base_segment_t - opal/dss/dss_unpack.c: #include "opal/types.h" - opal/mca/base/base.h: #include "opal/util/cmd_line.h" for opal_cmd_line_t - orte/mca/oob/tcp/oob_tcp.c: #include "opal/types.h" for opal_socklen_t - orte/mca/oob/tcp/oob_tcp.h: #include "opal/threads/threads.h" for opal_thread_t - orte/mca/oob/tcp/oob_tcp_msg.c: #include "opal/types.h" - orte/mca/oob/tcp/oob_tcp_peer.c: #include "opal/types.h" for opal_socklen_t - orte/mca/oob/tcp/oob_tcp_send.c: #include "opal/types.h" - orte/mca/plm/base/plm_base_proxy.c: #include "orte/util/name_fns.h" for ORTE_NAME_PRINT - orte/mca/rml/base/rml_base_receive.c: #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE - orte/mca/rml/oob/rml_oob_recv.c: #include "opal/types.h" for ompi_iov_base_ptr_t - orte/mca/rml/oob/rml_oob_send.c: #include "opal/types.h" for ompi_iov_base_ptr_t - orte/runtime/orte_data_server.c: #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE - orte/runtime/orte_globals.h: #include "orte/util/name_fns.h" for ORTE_NAME_PRINT Tested on Linux/x86-64 This commit was SVN r20817.	2009-03-17 21:34:30 +00:00
Rainer Keller	6a72c0f4d1	- As long as a header declares _DECLSPEC functionality it should include the corresponding _config.h header file. Tested on Linux/x86-64 This commit was SVN r20795.	2009-03-17 01:45:19 +00:00
Rainer Keller	d8cf4c0fec	- Get pgcc on XT to complain less: In case we use memcmp, strlen, strdup and friends include <string.h> Also several constants.h are not included directly - Let's have mca_topo_base_cart_create return ompi-errors in ompi/mca/topo/base/topo_base_cart_create.c This commit was SVN r20773.	2009-03-13 02:10:32 +00:00
Rainer Keller	ec0ed48718	- Revert r20739 This commit was SVN r20742. The following SVN revision numbers were found above: r20739 --> open-mpi/ompi@781caee0b6	2009-03-05 21:56:03 +00:00
Rainer Keller	a94438343b	- Revert r20740 This commit was SVN r20741. The following SVN revision numbers were found above: r20740 --> open-mpi/ompi@2a70618a77	2009-03-05 21:50:47 +00:00
Rainer Keller	2a70618a77	- Second patch, as discussed in Louisville. Replace short macros in orte/util/name_fns.h to the actual fct. call. - Compiles on linux/x86-64 This commit was SVN r20740.	2009-03-05 21:14:18 +00:00
Rainer Keller	781caee0b6	- First of two or three patches, in orte/util/proc_info.h: Adapt orte_process_info to orte_proc_info, and change orte_proc_info() to orte_proc_info_init(). - Compiled on linux-x86-64 - Discussed with Ralph This commit was SVN r20739.	2009-03-05 20:36:44 +00:00
Ralph Castain	f11931306a	Modify the accounting system to recycle jobids. Properly recover resources from nodes and jobs upon completion. Adjustments in several places were required to deal with sparsely populated job, node, and proc arrays as a result of this change. Correct an error wrt how jobids were being computed. Needed to ensure that the job family field was not overrun as we increment jobids for comm_spawn. Update the slurm plm module so it uses the new slurm termination procedure (brings trunk back into alignment with 1.3 branch). Update the slurmd ess component so it doesn't get selected if we are running a singleton inside of a slurm allocation. Cleanup HNP init by moving some code that had been in orte_globals.c for historical reasons into the ess hnp module, and removing the call to that code from the ess_base_std_prolog NOTE: this change allows orte to support an infinite aggregate number of comm_spawn's, with up to 64k being alive at any one instant. HOWEVER, the MPI layer currently does -not- support re-use of jobids. I did some prototype coding to revise the ompi_proc_t structures, but the BTLs are caching their own data, and there was no readily apparent way to update it. Thus, attempts to spawn more than the 64k limit will abort to avoid causing the MPI layer to hang. This commit was SVN r20700.	2009-03-03 16:39:13 +00:00
Ralph Castain	b8ffa302da	Separate abnormal job termination from abnormal orted termination so we can continue to use xcast for orted cmds, but can know to turn off reading of stdin as the job is being terminated. This commit was SVN r20650.	2009-02-27 10:16:25 +00:00
Rainer Keller	04567d3af0	- Header orte/mca/errmgr/errmgr.h is not needed. Once again compiles fine with -Wimplicit-function-declaration This commit was SVN r20640.	2009-02-26 04:05:30 +00:00
Rainer Keller	96e1b9b747	- Header orte/mca/rml/rml.h is not needed if no occurrence of orte_rml or ORTE_RML. Like the others, this compiles fine with -Wimplicit-function-declaration This commit was SVN r20639.	2009-02-26 03:52:31 +00:00
Ralph Castain	dcff523244	Fix a race condition that causes corruption of a buffer in mpirun while trying to process launch_local_proc cmds. Cleanup the pidmap handling by changing from value to pointer arrays. This commit was SVN r20629.	2009-02-25 02:43:22 +00:00
George Bosilca	8f1c7cf8c2	Make sure we correctly unregister all persistent events and signal handlers. This commit was SVN r20568.	2009-02-17 00:20:05 +00:00
George Bosilca	4004cb11bc	Release the orte_default_hostfile. This commit was SVN r20561.	2009-02-14 21:49:56 +00:00
Rainer Keller	d81443cc5a	- On the way to get the BTLs split out and lessen dependency on orte: Often, orte/util/show_help.h is included, although no functionality is required -- instead, most often opal_output.h, or orte/mca/rml/rml_types.h Please see orte_show_help_replacement.sh committed next. - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration actually showed two missing #include "orte/util/show_help.h" in orte/mca/odls/base/odls_base_default_fns.c and in orte/tools/orte-top/orte-top.c Manually added these. Let's let MTT have the last word. This commit was SVN r20557.	2009-02-14 02:26:12 +00:00
Jeff Squyres	91d302fd67	A bunch of minor ORTE valgrind-inspired memory leak cleanups (reviewed by Ralph). This commit was SVN r20544.	2009-02-13 04:14:10 +00:00
Ralph Castain	62dd763a8f	Add ability for local slave spawns to pre-position supporting files. Update comm_spawn and comm_spawn_multiple man pages to cover new info_keys. This commit was SVN r20527.	2009-02-12 15:56:45 +00:00
Ralph Castain	f0af389910	Enable comm_spawn of slave processes, currently only active for the rsh, slurm, and tm environments. Establish support for local rsh environments in the plm/base so that rsh of local slaves can be done by any environment that supports it. Create new orte_rsh_agent param so users can specify rsh agent from outside of rsh plm, and sym link that to the old plm_rsh_agent and pls_rsh_agent options. Modify the orte-bootproxy to pass prefix for the remote slave to support hetero/hybrid scenarios This commit was SVN r20492.	2009-02-09 20:44:44 +00:00
Ralph Castain	f8cd188367	Make the orte_pmap_t an object so it can be properly initialized. Adjust the construct function to properly indicate invalid node/local ranks This commit was SVN r20465.	2009-02-06 15:29:33 +00:00
Ralph Castain	dbba261451	Commit missing change in #define so r20427 doesn't break trunk This commit was SVN r20428. The following SVN revision numbers were found above: r20427 --> open-mpi/ompi@b100513022	2009-02-04 22:37:24 +00:00
Ralph Castain	2966206f58	Fix a race condition in the IOF and add some new user-requested features: 1. fix a race condition whereby a proc's output could trigger an event prior to the other outputs being set up, thus causing the IOF to declare the proc "terminated" too early. This was really rare, but could happen. 2. add a new "timestamp-output" option that timestamps each line of output 3. add a new "output-filename" option that redirects each proc's output to a separate rank-named file. 4. add a new "xterm" option that redirects the output of the specified ranks to a separate xterm window. This commit was SVN r20392.	2009-01-30 22:47:30 +00:00
Rolf vandeVaart	0704b98668	Add the ability to forward SIGTSTP (converted to SIGSTOP) and SIGCONT to the a.outs. By default, they are not forwarded and the behavior remains as it has always been. However, if one runs with --mca orte_forward_job_control 1, then mpirun will catch those two signals and forward them to the orteds which will deliver them to the a.outs. We have had requests for this feature. This commit was SVN r20391.	2009-01-30 18:50:10 +00:00
Ralph Castain	0435108834	Improve the efficiency of the launch system by changing the outer loop to being over app_context, and adding a flag to the app_context so the daemon can record that "this app is on my node" when decoding the launch msg. If the --wdir option is given, check to see if the user provided a relative path. If so, convert it to an absolute path. This is needed to maintain consistent behavior across environments. Some environments automatically chdir to your current working directory when launching the remote orted, while others (e.g., ssh) don't. This levels the playing field and reduces user surprise. This commit was SVN r20342.	2009-01-25 12:39:24 +00:00
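The --wdir handling described in the entry above can be sketched roughly as follows. This is a minimal Python illustration, not the ORTE code: the helper name `resolve_wdir` and its parameters are hypothetical, and it assumes a POSIX path layout and that a relative path is resolved against the directory mpirun was started in.

```python
import os.path


def resolve_wdir(wdir, launch_cwd):
    """Return an absolute working directory for the launched procs.

    Hypothetical helper: a relative `wdir` is resolved against
    `launch_cwd` (assumed here to be the directory mpirun was started
    in), so every launch environment ends up with the same path whether
    or not it chdir's before starting the remote daemon.
    """
    if os.path.isabs(wdir):
        # Already absolute: nothing to convert.
        return wdir
    # Relative: anchor it to the launch directory and tidy up "..".
    return os.path.normpath(os.path.join(launch_cwd, wdir))
```

With this sketch, `resolve_wdir("run/out", "/home/user")` and an ssh-launched daemon that starts in a different directory would still agree on `/home/user/run/out`.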
Ralph Castain	4da9f53fa4	Implement the xml formatted output of stdout/err/diag. Force -tag-output if -xml is set. This commit was SVN r20302.	2009-01-20 16:58:31 +00:00
Ralph Castain	88a0af9726	Revise the way we output resolved hostnames to make life easier for the Eclipse folks. Store aliases for individual nodes (only when requested to show resolved hostnames) and then report them out as part of the display-map option. This commit was SVN r20284.	2009-01-15 18:11:50 +00:00
Brian Barrett	d3310a5ad1	fixes to get compiling on Red Storm again This commit was SVN r20252.	2009-01-12 22:30:00 +00:00
Ralph Castain	2778c13fac	Continue to refine the timing instrumentation to identify where launch time is being spent This commit was SVN r20244.	2009-01-12 19:12:58 +00:00
Jeff Squyres	d1c6f3f89a	* Fix a truckload of Cisco copyrights to be the same as the rest of the code base. * Fix a few misspellings in other copyrights. This commit was SVN r20241.	2009-01-11 02:30:00 +00:00
Ralph Castain	25f578a7d2	Continue to improve timing instrumentation. Add ability to store timing data directly to a file instead of just to stdout. This commit was SVN r20229.	2009-01-08 14:27:52 +00:00
Ralph Castain	007d68becc	Make the data on local children and their jobs available globally on both daemons and the HNP. This simply shifts the data structures from the ODLS base to the orte globals area to support subsequent movement of the daemon collective operations from the odls to the grpcomm framework. As that will be a larger change, it will be implemented on a branch and rolled over separately. This commit was SVN r20228.	2009-01-08 14:25:56 +00:00
Ralph Castain	7818779760	Expose the nidmap and pidmap as orte globals so that components in other frameworks can access and/or manipulate them without forcing API modifications - modify the individual ess components that were affected so they use the global variables. Add a list of attributes to the nids for storing node-related data (e.g., modex attrs), and define a new object for that purpose. Consolidate the nid/pid lookup code with the rest of the nid/pid code so that changes are easier to track. Add the ability to send cluster profile info as part of the nidmap. Cleanup the setup and teardown of the new global nidmap and pidmap objects. This commit was SVN r20219.	2009-01-07 14:58:38 +00:00
Brian Barrett	64f7848a84	Number of small fixes to get the trunk to build again on Catamount This commit was SVN r20141.	2008-12-16 20:09:56 +00:00
Ralph Castain	e878ee4fa3	Revert r20128. Setting a default hostfile name breaks all the filtering code we added to the system. It would require multiple entries in several places to ensure that, should the default hostfile in fact not exist, the system will still work correctly. Too much complexity - just put the name in the default mca param file iff you actually have a default hostfile. This commit was SVN r20129. The following SVN revision numbers were found above: r20128 --> open-mpi/ompi@ea01da0eee	2008-12-15 17:37:21 +00:00
Ralph Castain	ea01da0eee	Set default name for "default-hostfile" param to "openmpi-default-hostfile" to retain backwards compatibility with OMPI 1.2 This commit was SVN r20128.	2008-12-15 17:08:59 +00:00
Shiqing Fan	20cea164db	- 3/4 commit for Windows Visual Studio and CCP support: corrections to non-windows files (but within ifdef __WINDOWS__) type casts, event library for windows use win32. in orte runtime, add windows sockets handling and object construction. This commit was SVN r20110.	2008-12-10 21:13:10 +00:00
Ralph Castain	728a24c8ec	After considerable patience and help with debugging/testing from Tim M and Jeff S, return a completed and pretty well tested patch of the IOF to the trunk. This commit includes the previously reverted r20074, r20068, and r20064, as well as changes to fix those commits. Basically, the remaining problem turned out to be: 1. closing stdout/stderr during orte_finalize of mpirun 2. inadvertently setting up a write event on fd = -1 3. devising a scheme to more accurately track when the stdin write event was active vs closed so it only got released once This passed prelim MTT testing by Jeff and Tim, but should soak for awhile before migrating to 1.3. This commit was SVN r20106. The following SVN revision numbers were found above: r20064 --> open-mpi/ompi@a07660aea8 r20068 --> open-mpi/ompi@ec930d14a9 r20074 --> open-mpi/ompi@2940309613	2008-12-10 20:40:47 +00:00
Ralph Castain	e28210d0dc	Revert r20074, r20068, and r20064: remove the IOF proc completion code pending further off-trunk work. This commit was SVN r20089. The following SVN revision numbers were found above: r20064 --> open-mpi/ompi@a07660aea8 r20068 --> open-mpi/ompi@ec930d14a9 r20074 --> open-mpi/ompi@2940309613	2008-12-09 17:11:59 +00:00
Ralph Castain	51789c9049	Cleanup the output for nodename resolve reporting This commit was SVN r20081.	2008-12-08 19:00:36 +00:00
Ralph Castain	a07660aea8	Bring over the IOF completion changes. This commit fixes the long-occurring problem whereby application procs could, under some circumstances, lose their final prints to stdout/err. The commit includes: 1. coordination of job completion notification to include a requirement for both waitpid detection AND notification that all iof pipes have been closed by the app 2. change of all IOF read and write events to be non-persistent so they can properly be shutdown and restarted only when required 3. addition of a delay (currently set to 10ms) before restarting the stdin read event. This was required to ensure that the stdout, stderr, and stddiag read events had an opportunity to be serviced in scenarios where large files are attached to stdin. This commit was SVN r20064.	2008-12-03 17:45:42 +00:00
Ralph Castain	ff8e83ff3b	Per request from IBM/Eclipse, provide MCA param to request output when nodes are resolved to a different nodename. This really only happens for the node that mpirun executes on, but they need the alert so they can do string matching of node names. This commit was SVN r20032.	2008-11-24 19:57:08 +00:00
Ralph Castain	182b15e252	Remove duplicate definition of orte_xml_output - thanks Shiqing for catching it! This commit was SVN r20017.	2008-11-18 13:53:13 +00:00
Ralph Castain	586334d1c8	Per discussion with Tim Mattox, reset the trunk to pre-19991 level for the iof only. I will shortly add a changeset that will repair the one known error where we were incorrectly closing the stdout/err/diag file descriptors when all we wanted to do was close stdin. I will leave out the changes associated with coordinating proc termination due to race conditions IU encountered during MTT testing. I have been unable to replicate those so far, but we hope to resolve it in the near future. This commit was SVN r19998.	2008-11-14 20:22:36 +00:00
Ralph Castain	555bbf0c02	Fix the iof race conditions wrt proc termination. This is comprised of two sections: 1. modify the iof to track when a proc actually closes all of its open iof output pipes. When this occurs, notify the odls that the proc's iof is complete. This is done via a zero-time event so that we can step out of the read event before processing the notification. 2. in the odls, modify the waitpid callback so it only flags that it was called. Add a function to receive the iof-complete notification, and a function that checks for both iof complete and waitpid callback before declaring a proc fully terminated. This ensures that we read and deliver -all- of the IO prior to declaring the job complete. Also modified the odls call to orte_iof.close (and the component's implementation) so it only closes stdin, leaving the other io channels alone. This fixes the other half of the known problem. This should fix the ticket on this subject, but I'll wait to close it pending further testing in the trunk. This commit was SVN r19991.	2008-11-12 23:32:01 +00:00
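The two-condition termination check described in the entry above can be sketched as a small state machine. This is a Python illustration under stated assumptions, not the odls implementation: the `ChildState` structure and the function names are hypothetical, and the zero-time event used to defer the notification is left out.

```python
from dataclasses import dataclass


@dataclass
class ChildState:
    """Per-process termination bookkeeping (hypothetical structure)."""
    waitpid_fired: bool = False  # the waitpid callback has been called
    iof_complete: bool = False   # all IOF output pipes have been closed
    terminated: bool = False     # the proc is declared fully terminated


def check_complete(child):
    """Declare the proc terminated only when BOTH conditions hold."""
    if child.waitpid_fired and child.iof_complete:
        child.terminated = True
    return child.terminated


def on_waitpid(child):
    """waitpid callback: only flag that it was called, then re-check."""
    child.waitpid_fired = True
    return check_complete(child)


def on_iof_complete(child):
    """IOF-complete notification: flag it, then re-check."""
    child.iof_complete = True
    return check_complete(child)
```

Either event may arrive first; the proc is reported as terminated only after the second one, which is what keeps trailing output from being dropped.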
Ralph Castain	f54fda489e	This is a first step towards supporting fully-routed OOB communications: 1. remove direct routed module (hooray!) 2. add radix tree routed module (binomial remains default) 3. remove duplicate data storage - orteds were storing nidmap and pidmap data in odls, everyone else in ess 4. add ess APIs to update nidmap, add new pidmap - used only by orteds for MPI-2 support 5. modify code to eliminate multiple calls to orte_routed.update_route that recreated info already in ess pidmap. Add ess API to lookup that info instead. Modify routed modules to utilize that capability 6. setup new ability to shutdown orteds without sending back an "ack" message to mpirun - not utilized yet, will require some changes to plm terminate_orteds functions in managed environments (coming soon) Initial tests indicating that fully routing comm via defined routing trees may not actually have a significant cost for operations like IB QP setup. More tests required to confirm. This will require an autogen... This commit was SVN r19866.	2008-10-31 21:10:00 +00:00
Ralph Castain	6e5d844c36	Roll in the revamped IOF subsystem. Per the devel mailing list email, this is a complete rewrite of the iof framework designed to simplify the code for maintainability, and to support features we had planned to do, but were too difficult to implement in the old code. Specifically, the new code: 1. completely and cleanly separates responsibilities between the HNP, orted, and tool components. 2. removes all wireup messaging during launch and shutdown. 3. maintains flow control for stdin to avoid large-scale consumption of memory by orteds when large input files are forwarded. This is done using an xon/xoff protocol. 4. enables specification of stdin recipients on the mpirun cmd line. Allowed options include rank, "all", or "none". Default is rank 0. 5. creates a new MPI_Info key "ompi_stdin_target" that supports the above options for child jobs. Default is "none". 6. adds a new tool "orte-iof" that can connect to a running mpirun and display the output. Cmd line options allow selection of any combination of stdout, stderr, and stddiag. Default is stdout. 7. adds a new mpirun and orte-iof cmd line option "tag-output" that will tag each line of output with process name and stream ident. For example, "[1,0]<stdout>this is output" This is not intended for the 1.3 release as it is a major change requiring considerable soak time. This commit was SVN r19767.	2008-10-18 00:00:49 +00:00
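The "tag-output" format quoted in the entry above ("[1,0]<stdout>this is output") can be mimicked with a few lines of Python. This is only a sketch: the function name `tag_output` is hypothetical, and reading the two numbers in the tag as job and rank is an assumption, since the commit message only says "process name and stream ident".

```python
def tag_output(job, rank, stream, text):
    """Prefix every line of `text` with "[job,rank]<stream>".

    Assumption: the process name in the tag is rendered as "job,rank".
    Line endings are preserved so the result can be written out as-is.
    """
    tag = f"[{job},{rank}]<{stream}>"
    return "".join(tag + line for line in text.splitlines(keepends=True))
```

For example, `tag_output(1, 0, "stdout", "this is output\n")` yields the line shown in the commit message.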
Ralph Castain	b46d3e766e	Cleanup the plm failed-to-start problem a little - ensure that the event is always defined so we don't have to check when trying to trigger it, thus avoiding potential race conditions. This commit was SVN r19755.	2008-10-16 14:58:32 +00:00
Ralph Castain	48c3de1865	Fix a problem in the plm "failed to start" code observed by Jeff. When we are unable to launch to a specific node because it doesn't exist or is down, the system would hang and/or segv. The reason for the hang was that we were "firing" the orted exit trigger prior to its timer event being defined - thus "locking" that one-shot and preventing it from firing when we actually were ready to use it. The segv was caused by the fact that we don't really know which daemon failed to start (at least, in most cases), so we didn't set a pointer to the aborted proc object. All we really wanted, though, was to ensure that mpirun returned a non-zero exit status, so the fix was to simply return the default error status. This commit was SVN r19754.	2008-10-16 14:21:37 +00:00
Ralph Castain	802d14b130	Default job controls to forward IO. Adjust debugger code to not forward IO unless requested. This commit was SVN r19690.	2008-10-06 17:56:23 +00:00
Ralph Castain	15c47a2473	Revise the daemon collective system to handle comm_spawn patterns that cross into new nodes that are not direct children on the routing tree of the HNP. Refers to ticket #1548. Although this appears to fix the problem, the ticket will be held open pending further test prior to transition to the 1.3 branch. This commit was SVN r19674.	2008-10-02 20:08:27 +00:00
Ralph Castain	4f89adae0c	Prettify the user level display of allocation and map to make it easier to see and understand This commit was SVN r19655.	2008-09-28 16:44:09 +00:00
Ralph Castain	037231fbcb	Modify the node_rank and local_rank fields to be uint16_t so we can handle more than 256 procs/node. Change the type to a defined one so that any future change can be easily done, if required. This commit was SVN r19637.	2008-09-25 13:39:08 +00:00
Ralph Castain	e64b79f30f	Modify the --display-map and --display-alloc per note on devel list to reduce info for user understanding. Add --display-devel-map and --display-devel-alloc to display all the detailed info we used to provide - it is only of use/interest to developers anyway and confuses users. This commit was SVN r19608.	2008-09-23 15:46:34 +00:00
Ralph Castain	f326ee356e	Add some error output to the plm rsh This commit was SVN r19532.	2008-09-10 01:59:49 +00:00
Shiqing Fan	04ee20a880	- Mainly type casts. Microsoft VC++ compiler is too strict. This commit was SVN r19517.	2008-09-08 15:39:30 +00:00
Shiqing Fan	c90e6e4f6d	- The correct function to close a socket. Thanks to George for noticing it. This commit was SVN r19513.	2008-09-08 14:35:47 +00:00
Shiqing Fan	93897c87a8	- Update the orte wait function for Windows. This commit was SVN r19512.	2008-09-08 14:11:26 +00:00
Brian Barrett	c2c5a34cb1	Missed a symbol that needs to exist in non-full RTE case This commit was SVN r19482.	2008-09-02 15:07:48 +00:00
Brian Barrett	52d4be78dc	Add missing header file for the full rte case caused by changes in r19471. This commit was SVN r19474. The following SVN revision numbers were found above: r19471 --> open-mpi/ompi@38eb301919	2008-09-01 17:49:31 +00:00
Brian Barrett	38eb301919	Follow-on to r19457. Rather than have #ifs in the middle of functions (which neither Ralph nor I liked), don't allow the functions we don't need to be visible. Still not happy about the number of #ifs in the code, but splitting the code further would have been a nightmare and this was a good cutting point. Also protected some variables that were declared but not instanced so that users would be notified at compile time instead of link or run time (in the case of dss constants) that things wouldn't work. This commit was SVN r19471. The following SVN revision numbers were found above: r19457 --> open-mpi/ompi@a15171e46b	2008-09-01 17:15:01 +00:00
Brian Barrett	a15171e46b	Some fixes for the disabled ORTE case * Protect an orte variable used in the orte debugger stuff * Initialize the datatype code in the Catamount code, as we need it for intercommunicators (the proc code needs it to pack the remote name) * Turn on a bunch of the orte datatype code so that ORTE_NAME is available. This commit was SVN r19457.	2008-08-31 18:06:55 +00:00
Ralph Castain	4e0f34a062	When we hit an error prior to actually launching daemons, it would be nice if orterun didn't bark about daemons failing to launch, mpirun detecting a job failed, etc. Add a new job state to indicate that we never attempted to launch. Flag such a scenario and avoid hitting all the other error messages. This commit was SVN r19366.	2008-08-19 15:19:30 +00:00
Ralph Castain	49745c5f40	Provide a new option that allows a user to leave an ssh session open without getting deluged by ORTE debug output. The new option is --leave-session-attached, with a corresponding MCA param of orte_leave_session_attached. Theoretically, any PLM could use this - but in reality, all of them except rsh/ssh already leave the session attached anyway. This fixes trac:656 - a REALLY old ticket This commit was SVN r19294. The following Trac tickets were found above: Ticket 656 --> https://svn.open-mpi.org/trac/ompi/ticket/656	2008-08-14 18:59:01 +00:00
Ralph Castain	30f37f762d	Enable co-location of debugger daemons during initial launch and when debugging a running job. Provide support for four MPIR extensions that allow specification of debugger daemon executable, argv for the debugger daemon, whether or not to forward debugger daemon IO, and whether or not debugger daemon will piggy-back on ORTE OOB network. Last is not yet implemented. No change in behavior or operation occurs unless (a) the debugger specifically utilizes the extensions and (b), for co-locate while running, the user specifically enables the capability via an MCA param. Two of the MPIR extensions supported here are used in a widely-used debugger for a large-scale installation. The other two extensions are new and being utilized in prototype work by several debuggers for possible future release. This commit was SVN r19275.	2008-08-13 17:47:24 +00:00
Ralph Castain	d7da6b3226	Just a minor cleanup of race conditions on trigger events for exit. Close the trigger pipe upon use since it is only a one-shot anyway. This removes the need to destruct the object, leaving the lock available to protect one-time termination routines throughout the life of the program. This commit was SVN r19208.	2008-08-06 21:53:35 +00:00
Ralph Castain	63c33a9c32	Some minor updates to the locking system changes. Remove obsolete locks. Ensure the trigger event objects do not get deconstructed until the very end to avoid possible problems due to race conditions. Route all orted abnormal term tests through the trigger. This commit was SVN r19172.	2008-08-06 11:31:06 +00:00
Shiqing Fan	bb90ad793a	- Move the entire OBJ_CLASS_INSTANCE of orte_trigger_event_t into #if blocks, so that windows can have its own destructor for socket. Thanks to Ralph. - The modification for handling windows socket will first be applied to windows branch. This commit was SVN r19170.	2008-08-06 09:42:48 +00:00
Ralph Castain	be02211b4f	Modify the wakeup system to make it more Windows-friendly. This allows Shiqing to consolidate the Windows-specific modifications into one location, and generalizes the wakeup procedure in case we hit other system-specific requirements. This needs some soak time to ensure we haven't opened any race conditions. I tried to loop everything in the shutdown procedure through that trigger event call to ensure it all goes through the one-time locks as it did before so that someone hitting ctrl-c when we are already shutting down shouldn't cause problems. Just want to let people use it for awhile to verify. This commit was SVN r19159.	2008-08-05 15:09:29 +00:00
Ralph Castain	35a86b3347	Establish an MCA param "orte_allocation_required" so that a system can require the user have an RM-provided allocation in order to run. This helps prevent the problem where a user forgets to get an allocation on an RM-managed cluster, and then executes mpirun on the head node - thus causing all of their mpi procs to launch on the head node, usually bringing it to its knees. Since OMPI allows mpirun to default to the local node, and since users want to retain the option to co-locate procs with mpirun, we needed another param to block this error case. This commit was SVN r19135.	2008-08-04 14:25:19 +00:00
Jeff Squyres	4bdc093746	Fixes trac:1361: mainly add new internal MCA parameter that orterun will set when it launches under debuggers using the --debug option. This commit was SVN r19116. The following Trac tickets were found above: Ticket 1361 --> https://svn.open-mpi.org/trac/ompi/ticket/1361	2008-07-31 22:11:46 +00:00
Ralph Castain	a62b2a0150	Per the July technical meeting: Standardize the handling of the orte launch agent option across PLMs. This has been a consistent complaint I have received - each PLM would register its own MCA param to get input on the launch agent for remote nodes (in fact, one or two didn't, but most did). This would then get handled in various and contradictory ways. Some PLMs would accept only a one-word input. Others accepted multi-word args such as "valgrind orted", but then some would error by putting any prefix specified on the cmd line in front of the incorrect argument. For example, while using the rsh launcher, if you specified "valgrind orted" as your launch agent and had "--prefix foo" on your cmd line, you would attempt to execute "ssh foo/valgrind orted" - which obviously wouldn't work. This was all -very- confusing to users, who had to know which PLM was being used so they could even set the right mca param in the first place! And since we don't warn about non-recognized or non-used mca params, half of the time they would wind up not doing what they thought they were telling us to do. To solve this problem, we did the following: 1. removed all mca params from the individual plms for the launch agent 2. added a new mca param "orte_launch_agent" for this purpose. To further simplify for users, this comes with a new cmd line option "--launch-agent" that can take a multi-word string argument. The value of the param defaults to "orted". 3. added a PLM base function that processes the orte_launch_agent value and adds the contents to a provided argv array. This can subsequently be harvested at-will to handle multi-word values 4. modified the PLMs to use this new function. All the PLMs except for the rsh PLM required very minor change - just called the function and moved on. The rsh PLM required much larger changes as - because of the rsh/ssh cmd line limitations - we had to correctly prepend any provided prefix to the correct argv entry. 5. added a new opal_argv_join_range function that allows the caller to "join" argv entries between two specified indices Please let me know of any problems. I tried to make this as clean as possible, but cannot compile all PLMs to ensure all is correct. This commit was SVN r19097.	2008-07-30 18:26:24 +00:00
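The launch-agent handling described in the entry above can be illustrated with a short Python sketch. Nothing here is the actual PLM or OPAL code: both function names are hypothetical, the half-open `[start, end)` range is an assumption made for the sketch, and so is the choice to treat the entry literally named "orted" as the one that receives the prefix.

```python
def argv_join_range(argv, start, end, delimiter=" "):
    """Join argv entries from index `start` up to, not including, `end`.

    Hypothetical stand-in for the "join a range of argv entries" idea;
    the half-open range is an assumption of this sketch.
    """
    return delimiter.join(argv[start:end])


def build_launch_agent_argv(launch_agent, prefix=None, daemon="orted"):
    """Split a multi-word launch agent and prefix only the daemon entry.

    Assumption: the daemon is the entry literally named `daemon`, so
    "valgrind orted" with prefix "foo" becomes ["valgrind", "foo/orted"]
    rather than the broken "foo/valgrind orted".
    """
    argv = launch_agent.split()
    if prefix is not None:
        argv = [f"{prefix}/{arg}" if arg == daemon else arg for arg in argv]
    return argv
```

Joining the tail of an rsh-style command line then gives a single string for the remote command, for example `argv_join_range(["ssh", "node1", "valgrind", "foo/orted"], 2, 4)`.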
Ralph Castain	718cceddaa	Ensure that we only launch procs on the HNP if that node is actually included in the allocation. This commit was SVN r19038.	2008-07-25 17:13:22 +00:00
Thomas Herault	28dc80b67e	Deal with the SIGCHLD issue in LSF. lsb_launch tampers with SIGCHLD signal handler. We are forced to reinstall our own signal handler after a call to this function. This commit fixes trac:1356. This commit was SVN r19033. The following Trac tickets were found above: Ticket 1356 --> https://svn.open-mpi.org/trac/ompi/ticket/1356	2008-07-25 15:23:23 +00:00
Ralph Castain	a1d296ae03	This commit fixes ticket #1410 Fix a few bugs in the mappers: 1. Ensure that bynode with no -np fills all available slots - it just does so with the ranks set bynode instead of byslot 2. fix --nolocal behavior so it works correctly in all cases. We still have to test the host's name using opal_ifislocal in the mapper because the name returned by gethostname to orte_process_info.hostname can be an FQDN, but a hostfile may contain a non-FQDN version. 3. Add missing --nolocal logic to the seq mapper Oversubscribed mapping seemed to be working okay without repair, so I couldn't verify my own bug report in that regard. Also included are some preliminary changes to support the modified hostfile behavior, which will be committed shortly: 1. removed the totally useless "allocate" field in the orte_node_t object since every node is automatically allocated for use - and everything ignored the field anyway 2. correctly initialize the slots_alloc field when the allocation is read This commit was SVN r19030.	2008-07-25 13:35:12 +00:00
Shiqing Fan	0646cd2491	- Move wait object instance code out of the #ifdef block, so that systems with waitpid and Windows can both use it. Thanks to Ralph. This commit was SVN r19003.	2008-07-23 16:20:42 +00:00
Ralph Castain	dbc35b60f6	Okay, one last time - get the xml output of the map correct...sigh. This commit was SVN r18988.	2008-07-23 02:45:08 +00:00
Ralph Castain	26cfac94e6	Fix a formatting problem with xml output of map This commit was SVN r18976.	2008-07-22 13:14:02 +00:00
Ralph Castain	ba5498cdc6	Repair the MPI-2 dynamic operations. This includes: 1. repair of the linear and direct routed modules 2. repair of the ompi/pubsub/orte module to correctly init routes to the ompi-server, and correctly handle failure to correctly parse the provided ompi-server URI 3. modification of orterun to accept both "file" and "FILE" for designating where the ompi-server URI is to be found - purely a convenience feature 4. resolution of a message ordering problem during the connect/accept handshake that allowed the "send-first" proc to attempt to send to the "recv-first" proc before the HNP had actually updated its routes. Let this be a further reminder to all - message ordering is NOT guaranteed in the OOB 5. Repair the ompi/dpm/orte module to correctly init routes during connect/accept. Reminder to all: messages sent to procs in another job family (i.e., started by a different mpirun) are ALWAYS routed through the respective HNPs. As per the comments in orte/routed, this is REQUIRED to maintain connect/accept (where only the root proc on each side is capable of init'ing the routes), allow communication between mpirun's using different routing modules, and to minimize connections on tools such as ompi-server. It is all taken care of "under the covers" by the OOB to ensure that a route back to the sender is maintained, even when the different mpirun's are using different routed modules. 6. corrections in the orte/odls to ensure proper identification of daemons participating in a dynamic launch 7. corrections in build/nidmap to support update of an existing nidmap during dynamic launch 8. corrected implementation of the update_arch function in the ESS, along with consolidation of a number of ESS operations into base functions for easier maintenance. The ability to support info from multiple jobs was added, although we don't currently do so - this will come later to support further fault recovery strategies 9. 
minor updates to several functions to remove unnecessary and/or no longer used variables and envar's, add some debugging output, etc. 10. addition of a new macro ORTE_PROC_IS_DAEMON that resolves to true if the provided proc is a daemon There is still more cleanup to be done for efficiency, but this at least works. Tested on single-node Mac, multi-node SLURM via odin. Tests included connect/accept, publish/lookup/unpublish, comm_spawn, comm_spawn_multiple, and singleton comm_spawn. Fixes ticket #1256 This commit was SVN r18804.	2008-07-03 17:53:37 +00:00
Brian Barrett	cbd6749c22	Move the lock initialization back to orte_init so that the finalize lock is properly initialized and available in all cases (like ompi_info, where the ess is never actually initialized). Fixes trac:1364. This commit was SVN r18733. The following Trac tickets were found above: Ticket 1364 --> https://svn.open-mpi.org/trac/ompi/ticket/1364	2008-06-25 03:18:37 +00:00
Ralph Castain	b118779c08	It is okay for us to init the ORTE mca params multiple times. Indeed, it is absolutely required by orterun as the first time has to be done prior to parsing the command line, which means that the mca values haven't been parsed yet! Add ability for sys admins to prohibit putting session directories under specified locations. Thus, they can now protect parallel file systems from foolish user mistakes. This commit was SVN r18721.	2008-06-24 17:50:56 +00:00
Ralph Castain	26c9ad5799	Clean-up the DSS API to remove two functions that are supposed to be used solely internally to the DSS. These were likely exposed because we need to call them when packing/unpacking declared types, but this means that developers may accidentally use the wrong functions, causing the DSS buffer to get confused. Instead, return the system to the way it used to work and hide those functions. This commit was SVN r18684.	2008-06-19 18:46:25 +00:00
Ralph Castain	0532d799d6	Complete implementation of the --without-rte-support configure option. Working with Brian, this has been tested on RedStorm. Some minor changes to help facilitate debugger support so that both mpirun and yod can operate with it. Still to be completed. This commit was SVN r18664.	2008-06-18 03:15:56 +00:00
Ralph Castain	9613b3176c	Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach. I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive. This commit was SVN r18619.	2008-06-09 14:53:58 +00:00
Ralph Castain	7bee71aa59	Fix a potential, albeit perhaps esoteric, race condition that can occur for fast HNP's, slow orteds, and fast apps. Under those conditions, it is possible for the orted to be caught in its original send of contact info back to the HNP, and thus for the progress stack never to recover back to a high level. In those circumstances, the orted can "hang" when trying to exit. Add a new function to opal_progress that tells us our recursion depth to support that solution. Yes, I know this sounds picky, but good ol' Jeff managed to make it happen by driving his cluster near to death... Also ensure that we declare "failed" for the daemon job when daemons fail instead of the application job. This is important so that orte knows that it cannot use xcast to tell daemons to "exit", nor should it expect all daemons to respond. Otherwise, it is possible to hang. After lots of testing, decide to default (again) to slurm detecting failed orteds. This proved necessary to avoid rather annoying hangs that were difficult to recover from. There are conditions where slurm will fail to launch all daemons (slurm folks are working on it), and yet again, good ol' Jeff managed to find both of them. Thank you Jeff! :-/ This commit was SVN r18611.	2008-06-06 19:36:27 +00:00
Ralph Castain	0da811ce79	Initial work on xml support - allocation and job map outputs completed. More to come. This commit was SVN r18587.	2008-06-04 20:53:12 +00:00
Ralph Castain	c992e99035	Remove the tags from orte_output_open and the filtering operation from orte_output - this will be handled differently to improve the XML output interface This commit was SVN r18557.	2008-06-03 14:24:01 +00:00
Ralph Castain	b456fb2d42	Upgrade the node/orted failure detection code to cover all environments. Use the native environment's capabilities where possible - e.g., SLURM detects orted failure and can report it. Elsewhere, use a heartbeat system to detect orted failure - e.g., for TM and rsh. Heart rate is set via mca param. The HNP checks for callback every 2*heartrate, declares orted failure if not seen in last 2*heartrate time. Also detect orted failed-to-start by setting timeout on launch. Currently only used in TM launcher. Neither detection is enabled by default; they are active only if heartrate is set and/or launch timeout is set. Exception for SLURM as orted failure is always detected and reported. More info to come on devel list. This commit was SVN r18555.	2008-06-02 21:46:34 +00:00
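The heartbeat check in the commit above can be sketched in a few lines of Python. This is an illustration only, not the ORTE implementation: `failed_daemons` and its arguments are hypothetical names, and it reads the commit's check interval as twice the heart rate, which is an assumption.

```python
def failed_daemons(last_heartbeat, now, heartrate):
    """Return the sorted ids of daemons not heard from within 2 * heartrate seconds.

    last_heartbeat maps a daemon id to the time its last heartbeat arrived.
    A heartrate of 0 or None means detection is disabled, mirroring the
    commit's note that detection is only active when the rate is set.
    """
    if not heartrate:
        return []
    window = 2 * heartrate
    return sorted(
        daemon for daemon, seen in last_heartbeat.items() if now - seen > window
    )


# Daemon 3 was last seen 11 seconds ago with a 5 second heart rate.
beats = {1: 100.0, 2: 91.0, 3: 89.0}
late = failed_daemons(beats, now=100.0, heartrate=5)
```

A caller would run this check periodically from a timer; the timer itself is left out of the sketch.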
Ralph Castain	72530f8fed	Cleanly handle the failed start of an orted, or its unexpected failure after start. This commit will allow mpirun to exit cleanly when this occurs, and does a best-effort attempt to cleanup the mess. However, it still has two unresolved issues that need to be eventually addressed: 1. it depends upon the ability of the native environment to alert us that the orted has died/failed to start. I have included that support for SLURM, but other environments need to be done. 2. for some yet-to-be-determined reason, the message that tells the remaining daemons to "die" isn't getting out of the RML, even though no obvious blockage is standing in the way. Work will continue on resolving that problem. For now, the orteds appear to be exiting on their own quite nicely when they see their HNP "lifeline" disappear. This represents the best-available fix for ticket #221 so I am closing that ticket at this time. This commit was SVN r18536.	2008-05-29 13:38:27 +00:00
Ralph Castain	f76240e7cc	Modify the nidmap utility to pass daemon vpids for nodes. In some mapping algo's, it is possible for nodes to be skipped. This results in daemon vpids that differ from the index of their respective node in the node array, causing the daemon to not recognize procs that it is supposed to launch. This commit was SVN r18528.	2008-05-28 18:38:47 +00:00
Ralph Castain	828ae26d90	ORTE-level MCA params are defined in several places. Ompi_info cannot call orte_init due to an issue with the memory allocator, thus making it impossible for ompi_info to display all of the ORTE-level MCA params. By consolidating them all into one function, ompi_info can call that function and register the desired variables. This also requires, however, that ompi_info call orte_output_init to avoid generating tons of error messages, so make that adjustment too. Fixes ticket #1314 In addition, orte_output has a race condition issue whereby calls to orte_output/verbose can occur prior to either the RML being defined/setup, or the HNP being defined. This latter occurs during the initialization of the orte_process_info structure. In both cases, there is no way orte_output can send the output to the HNP. Hence, the message must be simply output locally. Fixes ticket #1315 This commit was SVN r18524.	2008-05-28 13:29:58 +00:00
Terry Dontje	ef7ac86929	created opal_version_string and orte_version_string to match the ompi changes made in r18345 for ompi_version_string. This was done per request from Jeff Squyres to maintain consistency and to remove some warnings caused by the non-use of some static const char. This commit was SVN r18461. The following SVN revision numbers were found above: r18345 --> open-mpi/ompi@8dd0421015	2008-05-20 12:13:19 +00:00
Jeff Squyres	e7ecd56bd2	This commit represents a bunch of work on a Mercurial side branch. As such, the commit message back to the master SVN repository is fairly long. = ORTE Job-Level Output Messages = Add two new interfaces that should be used for all new code throughout the ORTE and OMPI layers (we already made the search-and-replace on the existing ORTE / OMPI layers): * orte_output(): (and corresponding friends ORTE_OUTPUT, orte_output_verbose, etc.) This function sends the output directly to the HNP for processing as part of a job-specific output channel. It supports all the same outputs as opal_output() (syslog, file, stdout, stderr), but for stdout/stderr, the output is sent to the HNP for processing and output. More on this below. * orte_show_help(): This function is a drop-in-replacement for opal_show_help(), with two differences in functionality: 1. the rendered text help message output is sent to the HNP for display (rather than outputting directly into the process' stderr stream) 1. the HNP detects duplicate help messages and does not display them (so that you don't see the same error message N times, once from each of your N MPI processes); instead, it counts "new" instances of the help message and displays a message every ~5 seconds when there are new ones ("I got X new copies of the help message...") opal_show_help and opal_output still exist, but they only output in the current process. The intent for the new orte_* functions is that they can apply job-level intelligence to the output. As such, we recommend that all new ORTE and OMPI code use the new orte_* functions, not the opal_* functions. === New code === For ORTE and OMPI programmers, here's what you need to do differently in new code: * Do not include opal/util/show_help.h or opal/util/output.h. Instead, include orte/util/output.h (this one header file has declarations for both the orte_output() series of functions and orte_show_help()). 
* Effectively s/opal_output/orte_output/gi throughout your code. Note that orte_output_open() takes a slightly different argument list (as a way to pass data to the filtering stream -- see below), so if you explicitly call opal_output_open(), you'll need to slightly adapt to the new signature of orte_output_open(). * Literally s/opal_show_help/orte_show_help/. The function signature is identical. === Notes === * orte_output'ing to stream 0 will do similar to what opal_output'ing did, so leaving a hard-coded "0" as the first argument is safe. * For systems that do not use ORTE's RML or the HNP, the effect of orte_output_* and orte_show_help will be identical to their opal counterparts (the additional information passed to orte_output_open() will be lost!). Indeed, the orte_* functions simply become trivial wrappers to their opal_* counterparts. Note that we have not tested this; the code is simple but it is quite possible that we mucked something up. = Filter Framework = Messages sent via the new orte_* functions described above and messages output via the IOF on the HNP will now optionally be passed through a new "filter" framework before being output to stdout/stderr. The "filter" OPAL MCA framework is intended to allow preprocessing of messages before they are sent to their final destinations. The first component that was written in the filter framework was to create an XML stream, segregating all the messages into different XML tags, etc. This will allow 3rd party tools to read the stdout/stderr from the HNP and be able to know exactly what each text message is (e.g., a help message, another OMPI infrastructure message, stdout from the user process, stderr from the user process, etc.). Filtering is not active by default. Filter components must be specifically requested, such as: {{{ $ mpirun --mca filter xml ... }}} There can only be one filter component active. 
= New MCA Parameters = The new functionality described above introduces two new MCA parameters: * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that help messages will be aggregated, as described above. If set to 0, all help messages will be displayed, even if they are duplicates (i.e., the original behavior). * '''orte_base_show_output_recursions''': An MCA parameter to help debug one of the known issues, described below. It is likely that this MCA parameter will disappear before v1.3 final. = Known Issues = * The XML filter component is not complete. The current output from this component is preliminary and not real XML. A bit more work needs to be done to configure.m4 search for an appropriate XML library/link it in/use it at run time. * There are possible recursion loops in the orte_output() and orte_show_help() functions -- e.g., if RML send calls orte_output() or orte_show_help(). We have some ideas how to fix these, but figured that it was ok to commit before feature freeze with known issues. The code currently contains sub-optimal workarounds so that this will not be a problem, but it would be good to actually solve the problem rather than have hackish workarounds before v1.3 final. This commit was SVN r18434.	2008-05-13 20:00:55 +00:00
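The duplicate suppression that the orte_show_help() commit above describes for the HNP can be sketched as follows. This is a simplified illustration, not the ORTE code: the `HelpAggregator` class, its method names, and the wording of the summary line are all hypothetical, and the periodic (roughly 5 second) timer is left to the caller.

```python
from collections import Counter


class HelpAggregator:
    """Show each distinct help message once and count later duplicates."""

    def __init__(self, aggregate=True):
        # aggregate=False mimics turning aggregation off, so every copy is shown.
        self.aggregate = aggregate
        self.seen = set()
        self.pending = Counter()

    def submit(self, key, text):
        """Return the text if it should be displayed now, otherwise None."""
        if not self.aggregate or key not in self.seen:
            self.seen.add(key)
            return text
        self.pending[key] += 1
        return None

    def flush(self):
        """Return one summary line per message with new duplicates, then reset."""
        lines = [
            f"{count} more copies of help message {key!r}"
            for key, count in sorted(self.pending.items())
        ]
        self.pending.clear()
        return lines


# Three processes report the same (file, topic) pair.
agg = HelpAggregator()
key = ("help-example.txt", "some-topic")
shown = [agg.submit(key, "Something went wrong") for _ in range(3)]
summary = agg.flush()
```

Here only the first copy is displayed; the other two are folded into a single summary line the next time `flush` runs.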
Ralph Castain	b2c73f6e11	Fix tree-spawn to work within the new modex system This commit was SVN r18349.	2008-05-01 19:19:34 +00:00
Ralph Castain	3e55fe6f6d	Fold in the revised modex scheme. Move the ompi_proc_t modex portions to the RTE level since the daemons already have that info. Provide each process with the equivalent of a "nidmap" - both a map of what nodes are in the job, and a map of which node each process is on. This enables the use of static ports, though that hasn't been turned "on" in this commit. Update the rsh tree spawn capability so we spawn the next wave of daemons before launching our own local procs. Add an ability to encode nodenames for large clusters with contiguous node name numbering schemes - this allows communication of all node names in a few bytes instead of tens-of-bytes/node. This commit was SVN r18338.	2008-04-30 19:49:53 +00:00
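The node name encoding mentioned in the modex commit above is not specified in the log, so the following Python sketch only illustrates the general idea of collapsing contiguously numbered names into a few fields. The tuple layout and the `encode_nodenames` / `decode_nodenames` helpers are assumptions, not the ORTE format.

```python
import re


def encode_nodenames(names):
    """Collapse names such as node001..node004 into (prefix, width, first, count).

    Returns None when the names do not share a prefix and digit width or are
    not numbered contiguously, in which case a caller would fall back to
    sending the full list.
    """
    parsed = []
    for name in names:
        match = re.fullmatch(r"(.*?)(\d+)", name)
        if match is None:
            return None
        parsed.append((match.group(1), match.group(2)))
    if not parsed:
        return None
    prefix, first_digits = parsed[0]
    width = len(first_digits)
    first = int(first_digits)
    for offset, (this_prefix, digits) in enumerate(parsed):
        if this_prefix != prefix or len(digits) != width:
            return None
        if int(digits) != first + offset:
            return None
    return (prefix, width, first, len(parsed))


def decode_nodenames(encoded):
    """Rebuild the full name list from the compact tuple."""
    prefix, width, first, count = encoded
    return [f"{prefix}{first + i:0{width}d}" for i in range(count)]


names = ["node001", "node002", "node003", "node004"]
compact = encode_nodenames(names)
```

The compact form has a fixed size regardless of how many nodes it covers, which is the property the commit is after.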
Josh Hursey	cc83d41ad9	Merge in tmp/jjh-scratch {{{ svn merge -r 18218:18240 https://svn.open-mpi.org/svn/ompi/tmp/jjh-scratch . }}} Contains: * Primarily a fix for a user reported problem where a cached file descriptor is causing a SIGPIPE on restart. * Cleanup some small memory leaks from using mca_base_param_env_var() - Thanks Jeff * Cleanup ORTE FT tool compilation in non-FT builds - Thanks Tim P. * Cleanup mpi interface with misplaced {{{OPAL_CR_ENTER_LIBRARY}}} - Thanks Terry * Some other sundry cleanup items all dealing with C/R functionality in the trunk. This commit was SVN r18241.	2008-04-23 00:17:12 +00:00
Ralph Castain	e7487ad533	Implement the seq rmaps module that sequentially maps process ranks to a list of hosts in a hostfile. Restore the "do-not-launch" functionality so users can test a mapping without launching it. Add a "do-not-resolve" cmd line flag to mpirun so the opal/util/if.c code does not attempt to resolve network addresses, thus enabling a user to test a hostfile mapping without hanging on network resolve requests. Add a function to hostfile to generate an ordered list of host names from a hostfile This commit was SVN r18190.	2008-04-17 13:50:59 +00:00
Ralph Castain	7c7304466c	Add a binomial tree-based launch to ssh, turned "on" only when the plm_rsh_tree_spawned mca param is set to a non-zero value. This probably isn't a very optimized capability, but it does execute a tree-based launch that may scale better than linear at high node counts. Add the daemon map capability to the ODLS to create and save a map of daemon vpid vs nodename from the launch message. Cleanup a few places in the base plm launch support where we didn't adequately protect rml recv's from potentially executing sends. This commit was SVN r18143.	2008-04-14 18:26:08 +00:00
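The shape of a binomial tree launch like the one in the commit above can be sketched in Python. This shows one common binomial layout rooted at daemon 0 and is not necessarily the layout the rsh PLM uses; `binomial_children` is a hypothetical helper.

```python
def binomial_children(vpid, num_daemons):
    """Return the daemons that `vpid` would launch in a binomial tree rooted at 0.

    A daemon's children are vpid + 2**k for every k above the position of
    its own highest set bit, as long as the result is a valid daemon id.
    """
    children = []
    mask = 1 << vpid.bit_length()
    while vpid + mask < num_daemons:
        children.append(vpid + mask)
        mask <<= 1
    return children


# With 8 daemons the root launches 1, 2 and 4, and those launch the rest.
tree = {vpid: binomial_children(vpid, 8) for vpid in range(8)}
```

Every daemon other than the root appears as a child exactly once, and the depth grows with log2 of the daemon count, which is why such a launch may scale better than a linear one.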
Ralph Castain	3a0d09300b	Fully implement the inbound binomial allgather for daemon-based collectives. Supports both modex and barrier operations. Comm_spawn still uses the rank=0 method - shifting that algo to the daemons is under study. This commit was SVN r18115.	2008-04-09 22:10:53 +00:00
Tim Prins	313edd8955	- Fix a problem reported on the users list where we would segfault in finalize after calling spawn if the user did not call MPI_Comm_disconnect - Fix the app context constructor so it initializes all the fields. This commit was SVN r18079.	2008-04-04 15:07:39 +00:00
Ralph Castain	537395b924	Make two important MCA params "visible" to ompi_info This commit was SVN r18074.	2008-04-02 14:54:57 +00:00
Ralph Castain	8dca132604	Cleanup some ignores Add missing variables! This commit was SVN r18063.	2008-04-01 20:32:17 +00:00
Ralph Castain	6fcaa8df39	Remove stale define. Add global variable to be used soon. This commit was SVN r18005.	2008-03-28 02:20:37 +00:00
Josh Hursey	55044c3c4f	A fix resulting from r17944. Need to make sure we go through orte_proc_info_finalize properly so the 'init' flag is set on restart. This is a bit cleaner anyway, esp since the GPR is gone. This commit was SVN r17978. The following SVN revision numbers were found above: r17944 --> open-mpi/ompi@ec76fe4fe4	2008-03-26 14:13:33 +00:00
Ralph Castain	dc7f45dafd	Remove the obsolete and largely unused orte_system_info structure. The only fields that were used in that struct were nodeid and nodename - these have been transferred to the orte_process_info structure. Only one place used the user name field - session_dir, when formulating the name of the top-level directory. Accordingly, the code for getting the user's id has been moved to the session_dir code. This commit was SVN r17926.	2008-03-23 23:10:15 +00:00
Ralph Castain	27a73ad9ee	Fix a race condition between the orteds and HNP that can cause the orteds to output the "lost lifeline" message. This has been a long-time problem. I tried to reduce the problem by having the orteds tell the HNP they were finalizing, and having the HNP wait until all orteds had reported or we timed out. What was observed was that all the orteds were correctly reporting that they are leaving, but the HNP is able to exit before the orteds, thus closing the orteds lifeline socket and generating the error output. This is caused by the fact that the orteds have to whack all remaining session directories, which includes that blasted monster shared memory file! Cleaning up the SM file can take quite a while. The HNP doesn't have that problem as there is no SM file there! So it gets out first. What we had done in the past to resolve that problem was put a little test in the OOB that checks to see if we are finalizing. If we are, then we ignore the lifeline connection being lost. That check was still in the code - however, we had lost the line in orte_finalize that set the flag!! This commit was SVN r17893.	2008-03-20 13:30:51 +00:00
Ralph Castain	2ed0e60321	Bring some sanity to the exit code returned by mpirun. Ensure that we provide a non-zero code if something goes wrong, including someone exiting after calling mpi_init without calling mpi_finalize. Jeff is preparing an (undoubtedly lengthy) explanation/matrix of how these codes are determined for the OMPI FAQ. This commit was SVN r17879.	2008-03-19 19:00:51 +00:00
Lenny Verkhovsky	13ff2a0f34	local declaration instead of using global variable This commit was SVN r17876.	2008-03-19 13:04:40 +00:00
Lenny Verkhovsky	647bce6d3e	Support for new RMAPS rank mapping component This commit was SVN r17860.	2008-03-18 09:39:07 +00:00
Jeff Squyres	6ad96df8bc	Add the declspec's in here so that they're visible. This commit was SVN r17846.	2008-03-17 18:37:03 +00:00
Ralph Castain	629b95a2fe	Afraid this has a couple of things mixed into the commit. Couldn't be helped - had missed one commit prior to running out the door on vacation. Fix race conditions in abnormal terminations. We had done a first-cut at this in a prior commit. However, the window remained partially open due to the fact that the HNP has multiple paths leading to orte_finalize. Most of our frameworks don't care if they are finalized more than once, but one of them does, which meant we segfaulted if orte_finalize got called more than once. Besides, we really shouldn't be doing that anyway. So we now introduce a set of atomic locks that prevent us from multiply calling abort, attempting to call orte_finalize, etc. My initial tests indicate this is working cleanly, but since it is a race condition issue, more testing will have to be done before we know for sure that this problem has been licked. Also, some updates relevant to the tool comm library snuck in here. Since those also touched the orted code (as did the prior changes), I didn't want to attempt to separate them out - besides, they are coming in soon anyway. More on them later as that functionality approaches completion. This commit was SVN r17843.	2008-03-17 17:58:59 +00:00
Ralph Castain	b110a247be	Fix comm_spawn (maybe). Comm_spawn was sticking during spawn_multiple because of a problem in the dpm - the modex there is asking processes to talk to each other in an allgather_list operation, but the procs don't have the required contact info to do so. The solution here was to ensure that all parent procs have full contact info for procs in the child job. Admittedly, this isn't the long-term answer. We would like to have the contact info given to only the parent procs that were involved in the comm_spawn. There is a way to do that, but this will suffice to keep things working until that can be implemented and tested. This commit was SVN r17772.	2008-03-06 21:56:00 +00:00
Ralph Castain	097cc83be2	Fix a race condition - ensure we don't call terminate in orterun more than once, even if the timeout fires while we are doing so This commit was SVN r17766.	2008-03-06 19:35:57 +00:00
Ralph Castain	ff99aa054f	In order to prevent orphaned processes when using non-unity routing methods, the procs need to realize that their local daemon is a critical connection - if that connection unexpectedly closes, they need to terminate. This commit adds definition for a "lifeline" connection. For an HNP, there is no lifeline, so the lifeline proc is NULL. For a daemon, the lifeline is the HNP - the daemon should abort if it loses that connection. For a proc using unity routed, the lifeline is the HNP since it connects directly to the HNP. For a proc using tree routed, the lifeline is the local daemon. Adjusted OOB to call abort if the lifeline (as opposed to HNP) connection is lost. This commit was SVN r17761.	2008-03-06 15:30:44 +00:00
Tim Prins	f61c2333c0	Remove unneeded field, and the two uses of it. This commit was SVN r17757.	2008-03-06 12:46:36 +00:00
Tim Prins	f9916811ae	Make it so we do not mangle the options the user passes to their executable. Fixes trac:1124 The change also: - cleans up and simplifies the command line processing code - adds an error output if more than one hostfile passed for a single app context - gets rid of the superfluous orte_app_context_map_t type, and instead use a simple argv of -host options This commit was SVN r17750. The following Trac tickets were found above: Ticket 1124 --> https://svn.open-mpi.org/trac/ompi/ticket/1124	2008-03-05 22:12:27 +00:00
Ralph Castain	06d3145fe4	First cut at direct launch for TM. Able to launch non-ORTE procs and detect their completion for a clean shutdown. This commit was SVN r17732.	2008-03-05 13:51:32 +00:00
Jeff Squyres	d0f5be023c	Restore r17703; it was accidentally removed as part of r17704. This commit was SVN r17728. The following SVN revision numbers were found above: r17703 --> open-mpi/ompi@1bedaea79b r17704 --> open-mpi/ompi@8189fcc7d5	2008-03-05 12:01:37 +00:00
Josh Hursey	3b4073e32c	This commit fixes the checkpoint/restart functionality on the trunk. Included in this commit are: * Extension to the ESS framework to support C/R * Fixed support for {{{snapc_base_establish_global_snapshot_dir}}} * Fixed FileM support * Misc. minor code modifications There are some outstanding visibility issues that I want to fix next. This commit was SVN r17725.	2008-03-05 04:57:23 +00:00
Ralph Castain	edb8e32a7a	Add default hostfile parameter plus --default-hostfile command line option. Fix error message when job setup failed This commit was SVN r17724.	2008-03-05 04:54:57 +00:00
Ralph Castain	9413d6cf5d	Define a default exit code for when things fail prior to a job launch - still needs work, but a start. Fix a deadlock loop when things really, really go bad. If we timeout trying to kill the job, then it's time to bail as cleanly as possible, not go back and keep trying. This commit was SVN r17715.	2008-03-05 01:46:30 +00:00
Jeff Squyres	8189fcc7d5	Back out r17702; it went very badly. This commit was SVN r17704. The following SVN revision numbers were found above: r17702 --> open-mpi/ompi@3df754ebd7	2008-03-05 00:42:39 +00:00
Shiqing Fan	1bedaea79b	Add support of orte event wait functions for Windows. This commit was SVN r17703.	2008-03-05 00:25:23 +00:00
Ralph Castain	841d0e5208	Cleanup an attribute warning - not sure which one to set or where it should go, so I'll leave that to someone more familiar with "attributes". Ensure some debugging is only enabled when have_debug is set. This commit was SVN r17681.	2008-03-03 16:06:47 +00:00
Ralph Castain	6450962d59	Add some debugging to the message event object. Cleanup some no-longer-used values This commit was SVN r17671.	2008-02-29 20:10:31 +00:00
Ralph Castain	5e6928d710	Cleanup recursions in ORTE caused by processing recv'd messages that can cause the system to take action resulting in receipt of another message. Basically, the method employed here is to have a recv create a zero-time timer event that causes the event library to execute a function that processes the message once the recv returns. Thus, any action taken as a result of processing the message occur outside of a recv. Created two new macros to assist: ORTE_MESSAGE_EVENT: creates the zero-time event, passing info in a new orte_message_event_t object ORTE_PROGRESSED_WAIT: while waiting for specified conditions, just calls progress so messages can be recv'd. Also fixed the failed_launch function as we no longer block in the orted callback function. Updated the error messages to reflect revision. No change in API to this function, but PLM "owners" may want to check their internal error messages to avoid duplication and excessive output. This has been tested on Mac, TM, and SLURM. This commit was SVN r17647.	2008-02-28 19:58:32 +00:00
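The deferral technique in the commit above, where a recv only schedules the message for processing once the recv has returned, can be sketched with a small queue in Python. This is a toy model and not the ORTE event library: `EventLoop`, `post_zero_time_event` and `handler` are hypothetical names.

```python
from collections import deque


class EventLoop:
    """Toy event loop in which recv() never processes a message itself."""

    def __init__(self):
        self._pending = deque()
        self.in_recv = False
        self.log = []

    def post_zero_time_event(self, callback, message):
        # Stands in for a timer event with a zero timeout.
        self._pending.append((callback, message))

    def recv(self, message, handler):
        # Only schedule the work; processing happens later in progress().
        self.in_recv = True
        self.post_zero_time_event(handler, message)
        self.in_recv = False

    def progress(self):
        # Run queued callbacks, including ones queued by other callbacks.
        while self._pending:
            callback, message = self._pending.popleft()
            callback(self, message)


def handler(loop, message):
    # Record whether we are being called from inside recv().
    loop.log.append((message, loop.in_recv))
    if message == "launch":
        # Processing one message triggers receipt of another.
        loop.recv("ack", handler)
```

Because the handler runs from `progress` rather than from inside `recv`, a message whose processing leads to another message being received does not nest one recv inside another.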
George Bosilca	9d421bea2a	Replace all occurrences of orte_pointer_array by opal_pointer_array. Remove the implementation of orte_pointer_array. This commit was SVN r17636.	2008-02-28 05:32:23 +00:00
Ralph Castain	d70e2e8c2b	Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer This commit was SVN r17632.	2008-02-28 01:57:57 +00:00
Jeff Squyres	d47ea89181	George rightly pointed out that this should be 0600, not 0660. This commit was SVN r16927.	2007-12-11 12:55:08 +00:00
Jeff Squyres	1640897272	Ensure to use the 3rd argument to open(), per suggestion from Sebastian Schmitzdorff, because Fedora 8 no longer accepts the 2-argument form. This commit was SVN r16923.	2007-12-10 22:19:23 +00:00
Ethan Mallove	005652c9d4	* Embed ident strings into the Open MPI libraries using one of the following methods (in order of precedence): 1. #pragma ident <ident string> (e.g., Intel and Sun) 1. #ident <ident string> (e.g., GCC) 1. static const char ident[] = <ident string> (all others) By default, the ident string used is the standard Open MPI version string. Only the following libraries will get the embedded version strings (e.g., DSOs will not): * libmpi.so * libmpi_cxx.so * libmpi_f77.so * libopen-pal.so * libopen-rte.so * Added two new configure options: * `--with-package-name="STRING"` (defaults to "Open MPI username@hostname Distribution"). `STRING` is displayed by `ompi_info` next to the "Package" heading. * `--with-ident-string="STRING"` (defaults to the standard Open MPI version string - e.g., X.Y.Zr######). `%VERSION%` will expand to the Open MPI version string if it is supplied to this configure option. This commit was SVN r16644.	2007-11-03 02:40:22 +00:00
Ralph Castain	b6196e8a39	When we can detect that a daemon has failed, then we would like to terminate the system without having it lock up. The "hang" is currently caused by the system attempting to send messages to the daemons (specifically, ordering them to kill their local procs and then terminate). Unfortunately, without some idea of which daemon has died, the system hangs while attempting to send a message to someone who is no longer alive. This commit introduces the necessary logic to avoid that conflict. If a PLS component can identify that a daemon has failed, then we will set a flag indicating that fact. The xcast system will subsequently check that flag and, if it is set, will send all messages direct to the recipient. In the case of "kill local procs" and "terminate", the messages will go directly to each orted, thus bypassing any orted that has failed. In addition, the xcast system will -not- wait for the messages to complete, but will return immediately (i.e., operate in non-blocking mode). Orterun will wait (via an event timer) for a period of time based on the number of daemons in the system to allow the messages to attempt to be delivered - at the end of that time, orterun will simply exit, alerting the user to the problem and -strongly- recommending they run orte-clean. I could only test this on slurm for the case where all daemons unexpectedly died - srun apparently only executes its waitpid callback when all launched functions terminate. I have asked that Jeff integrate this capability into the OOB as he is working on it so that we execute it whenever a socket to an orted is unexpectedly closed. Meantime, the functionality will rarely get called, but at least the logic is available for anyone whose environment can support it. This commit was SVN r16451.	2007-10-15 18:00:30 +00:00
Josh Hursey	7437f37e96	This commit contains the following: * Fix some missing includes in a few places. * Add the cr_request() functionality to the BLCR CRS component. We are now dependent upon the 0.6.* series of BLCR. * Made the CR notification mechanism a registered function. This way we can have an OPAL-only version and it can be replaced at runtime with the ORTE version. * Add an 'opal_cr_allow_opal_only' parameter that will enable OPAL-only CR functionality when the user wants it. Default: Disabled. * Fix the placement of a checkpoint request check in MPI_Init * Pull the OPAL notification mechanism into the SnapC framework. * We no longer fork/exec the 'opal-checkpoint' command for local checkpointing, the Local coordinator in the orted does this directly. * The Local and Application coordinator talk together bypassing the OPAL notification mechanism. * Optimized the Local <-> App Coordinator communication. * Improved the structure used to track vpid_snapshots in the local coord. * Fix a race condition in which an application under heavy communication load may produce an inconsistent global checkpoint. This commit was SVN r16389.	2007-10-08 20:53:02 +00:00
Ralph Castain	54b2cf747e	These changes were mostly captured in a prior RFC (except for #2 below) and are aimed specifically at improving startup performance and setting up the remaining modifications described in that RFC. The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is known to be needed in the odls process component. This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done: As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To-date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in. In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in. The incoming changes revamp these procedures in three ways: 1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTL's are not active at that point. 
At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step. The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTL's are active at that point), but - as we discussed on the telecon - these are not currently true barriers so the job would hang when we fell through while messages were still in process. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm as the one in "basic" is very simplistic. Summarizing this change: ORTE no longer tracks process state nor has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure. 2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed. The size of this data has been reduced in three ways: (a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support. 
Accordingly, the default size of these fields has been reduced to 16-bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes. To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32-bits. Someday, if necessary, someone can add yet another option to increase them to 64-bits, I suppose. (b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction. (c) when the mca param requesting that nodenames be sent to support pretty error messages is set, a second mca param is now used to request FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using. While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly. 3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted. 
All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup. It also has the benefit of removing the sharing of every proc's OOB contact with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k * 50 = 1.6MBytes sent to 32k procs = 51GBytes of messaging. Note that you can use the new routing method by specifying -mca routed tree - if you so desire. This mode will become the default at some point in the future. There are a few minor additional changes in the commit that I'll just note in passing: * propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details. * requiring of "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details. * cleanup of some stale header files This commit was SVN r16364.	2007-10-05 19:48:23 +00:00
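The routing estimate in r16364 can be checked with a short worked calculation. This is only a sketch of the arithmetic: it assumes "32k" is read as 32,000 (reading it as 32,768 gives about 1.64 MBytes and about 53.7 GBytes instead), and the variable names are illustrative.

```python
PROCS = 32_000            # "32k proc job", read here as 32,000 (assumption)
OOB_CONTACT_BYTES = 50    # "~50 bytes/process" of OOB contact info

# Contact info for every proc that no longer rides in the startup message.
bytes_per_startup_message = PROCS * OOB_CONTACT_BYTES

# That message used to be delivered to every proc in the job.
total_bytes_saved = bytes_per_startup_message * PROCS

# Process name: two fields (jobid, vpid), 32 bits each before, 16 bits after.
old_name_bytes = 2 * 32 // 8
new_name_bytes = 2 * 16 // 8

print(f"per message: {bytes_per_startup_message / 1e6:.1f} MBytes")
print(f"across job:  {total_bytes_saved / 1e9:.1f} GBytes")
print(f"name size:   {old_name_bytes} -> {new_name_bytes} bytes")
```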
Ralph Castain	45767b038c	Ensure that no-daemonize is correctly set This commit was SVN r16079.	2007-09-10 14:50:54 +00:00
Jeff Squyres	8f10c285ef	Fix Coverity CID 466: remove unused variable / dead code. This commit was SVN r15998.	2007-08-29 01:25:03 +00:00
George Bosilca	2e2bf472ff	Mark the orte_abort function as noreturn and change the return value from int to void. This function calls exit at the end, so there is no way to return from there. Apply the same thing to the errmsg_abort function and update all components. This commit was SVN r15704.	2007-07-31 16:09:52 +00:00
Jeff Squyres	d0137acaa4	If --debug-daemons-file is specified, it should also imply --debug-daemons. This commit was SVN r15640.	2007-07-26 17:49:13 +00:00
Ralph Castain	d99c764e75	Resolve a problem where the orte daemon comm functions were being accessed by mpirun while still retaining occasional reference to the orted_globals. Remove all dependence on orted_globals from the comm functions. Move those functions back into their own file to make it easier to maintain the separation. Ensure that mpirun ignores any "exit" commands being sent to daemons as it will exit on its own. This commit was SVN r15562.	2007-07-23 18:36:33 +00:00
Brian Barrett	5b9fa7e998	reapply r15517 and r15520, which were removed in r15527 so that I could get the RML/OOB merge in slightly easier This commit was SVN r15530. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95 r15520 --> open-mpi/ompi@9cbc9df1b8 r15527 --> open-mpi/ompi@2d17dd9516	2007-07-20 02:34:29 +00:00
Brian Barrett	39a6057fc6	A number of improvements / changes to the RML/OOB layers: * General TCP cleanup for OPAL / ORTE * Simplifying the OOB by moving much of the logic into the RML * Allowing the OOB RML component to do routing of messages * Adding a component framework for handling routing tables * Moving the xcast functionality from the OOB base to its own framework Includes merge from tmp/bwb-oob-rml-merge revisions: r15506, r15507, r15508, r15510, r15511, r15512, r15513 This commit was SVN r15528. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r15506 r15507 r15508 r15510 r15511 r15512 r15513	2007-07-20 01:34:02 +00:00
Brian Barrett	2d17dd9516	temporarily back out r15517 and 15520 so that I can get the RML / OOB changes to cleanly apply This commit was SVN r15527. The following SVN revision numbers were found above: r15517 --> open-mpi/ompi@41977fcc95	2007-07-20 01:10:34 +00:00
Ralph Castain	41977fcc95	Remove the cellid field from the orte_process_name_t structure. This only affects a handful of files in itself, but... Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point. Note that I could not possibly find outputs that directly print the orte_process_name_t fields, but only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings. This commit was SVN r15517.	2007-07-19 20:56:46 +00:00
Tim Prins	d2f0806420	Fix a problem introduced by r15390 which was causing strange failures in numerous areas. We no longer store whether we are a singleton in a MCA parameter, we now use a global constant. So all references to the MCA parameter must be removed. This commit was SVN r15408. The following SVN revision numbers were found above: r15390 --> open-mpi/ompi@bd65f8ba88	2007-07-13 17:52:16 +00:00
Ralph Castain	2bded34a1d	Fix a problem observed by Brian where processes launched local to mpirun lost their environment except for MCA params. The problem stemmed from no longer launching a local orted on the same node as mpirun. The orted would save and reuse the base environment. Mpirun didn't do that, and the odls was using the orted's globally saved environment (which wasn't being set). This fix establishes a globally accessible base launch environment that both the orted and mpirun can utilize. Since we now use that, we don't need to pass it to the odls_launch_proc function, so remove that param from the API (and modify all components to handle the change). This commit was SVN r15405.	2007-07-13 15:47:57 +00:00
Ralph Castain	bd65f8ba88	Bring in an updated launch system for the orteds. This commit restores the ability to execute singletons and singleton comm_spawn, both in single node and multi-node environments. Short description: major changes include - 1. singletons now fork/exec a local daemon to manage their operations. 2. the orte daemon code now resides in libopen-rte 3. daemons no longer use the orte triggering system during startup. Instead, they directly call back to their parent pls component to report ready to operate. A base function to count the callbacks has been provided. I have modified all the pls components except xcpu and poe (don't understand either well enough to do it). Full functionality has been verified for rsh, SLURM, and TM systems. Compile has been verified for xgrid and gridengine. This commit was SVN r15390.	2007-07-12 19:53:18 +00:00
Jeff Squyres	64083570f5	Add support for DDT parallel debugger, which required several things: * Making some symbols and types be global (vs. static) in orterun * Adding a "ddt" entry in the MCA parameter orte_base_user_debugger default value * Add support for @executable@, @executable_argv@, and @single_app@ tokens in the orte_base_user_debugger MCA parameter. * Added various error checks and corresponding help messages after finding a debugger in the PATH Fixes trac:1081 This commit was SVN r15323. The following Trac tickets were found above: Ticket 1081 --> https://svn.open-mpi.org/trac/ompi/ticket/1081	2007-07-10 12:53:48 +00:00
George Bosilca	de324502bc	Update the Windows wait functions. The most important change is for the event registration, which, when used to detect the death of a process, should be marked as fire-once and as taking a long time. This commit was SVN r15068.	2007-06-14 04:35:46 +00:00
Brian Barrett	84d1512fba	Add the potential for doing some basic error checking on mutexes during single threaded builds. In its default configuration, all this does is ensure that there's at least a good chance of threads building based on non-threaded development (since the variable names will be checked). There is also code to make sure that a "mutex" is never "double locked" when using the conditional macro mutex operations. This is off by default because there are a number of places in both ORTE and OMPI where this alarm spews megabytes of errors on a simple test. So we have some work to do on our path towards thread support. Also removed the macro versions of the non-conditional thread locks, as the only places they were used, the author of the code intended to use the conditional thread locks. So now you have upper-case macros for conditional thread locks and lowercase functions for non-conditional locks. Simple, right? :). This commit was SVN r15011.	2007-06-12 16:25:26 +00:00
Ralph Castain	85df3bd92f	Bring in the generalized xcast communication system along with the correspondingly revised orted launch. I will send a message out to developers explaining the basic changes. In brief: 1. generalize orte_rml.xcast to become a general broadcast-like messaging system. Messages can now be sent to any tag on the daemons or processes. Note that any message sent via xcast will be delivered to ALL processes in the specified job - you don't get to pick and choose. At a later date, we will introduce an augmented capability that will use the daemons as relays, but will allow you to send to a specified array of process names. 2. extended orte_rml.xcast so it supports more scalable message routing methodologies. At the moment, we support three: (a) direct, which sends the message directly to all recipients; (b) linear, which sends the message to the local daemon on each node, which then relays it to its own local procs; and (c) binomial, which sends the message via a binomial algo across all the daemons, each of which then relays to its own local procs. The crossover points between the algos are adjustable via MCA param, or you can simply demand that a specific algo be used. 3. orteds no longer exhibit two types of behavior: bootproxy or VM. Orteds now always behave like they are part of a virtual machine - they simply launch a job if mpirun tells them to do so. This is another step towards creating an "orteboot" functionality, but also provided a clean system for supporting message relaying. Note one major impact of this commit: multiple daemons on a node cannot be supported any longer! Only a single daemon/node is now allowed. This commit is known to break support for the following environments: POE, Xgrid, Xcpu, Windows. It has been tested on rsh, SLURM, and Bproc. Modifications for TM support have been made but could not be verified due to machine problems at LANL. Modifications for SGE have been made but could not be verified. 
The developers for the non-verified environments will be separately notified along with suggestions on how to fix the problems. This commit was SVN r15007.	2007-06-12 13:28:54 +00:00
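To make the binomial mode in r15007 concrete, here is a generic binomial-tree relay sketch. It is a textbook construction written in Python, not the ORTE implementation: the daemon numbering from 0 (with 0 as the sender) and the function names are assumptions.

```python
def binomial_children(rank, size):
    """Return the daemons that `rank` relays to in a binomial tree rooted at 0."""
    step = 1
    while step <= rank:        # skip powers of two that do not exceed rank
        step <<= 1
    children = []
    while rank + step < size:  # each larger power of two names one child
        children.append(rank + step)
        step <<= 1
    return children


def relay_rounds(size):
    """Return how many relay hops the farthest daemon is from daemon 0."""
    depth = {0: 0}
    frontier = [0]
    while frontier:
        next_frontier = []
        for rank in frontier:
            for child in binomial_children(rank, size):
                depth[child] = depth[rank] + 1
                next_frontier.append(child)
        frontier = next_frontier
    return max(depth.values())


# With 8 daemons, daemon 0 sends to 1, 2 and 4; daemon 1 relays to 3 and 5.
print(binomial_children(0, 8), binomial_children(1, 8), relay_rounds(8))
```

In the linear mode the sender contacts every daemon itself, so its work grows with the number of daemons; in a binomial tree the longest relay chain grows only with the logarithm of that number.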
Brian Barrett	1d11cc4b2d	Fix mis-declared variable type This commit was SVN r14994.	2007-06-11 16:48:35 +00:00
Ralph Castain	983fd3432a	Fix singleton comm_spawn. Ensure that singletons start the RML receive function so they can receive RML updates during xconnect procedures once any comm_spawn'd children start. Since singletons only use the RMGR/URM component, update that component to also hold us until xconnect is completed (if it is invoked) before returning to the caller. This commit was SVN r14914.	2007-06-06 17:39:23 +00:00
Brian Barrett	508da4e959	OS X apparently really doesn't like shared libraries with unresolvable symbols in them and environ is defined only in the final application (probably in crt1.o). Apple provides a function for getting at the environment, so use that instead if it's available. This commit was SVN r14857.	2007-06-05 03:03:59 +00:00
Brian Barrett	34fea87819	* Only need to do the opal_progress_event_users_increment() once between OPAL and ORTE. Since we now do opal_progress_init(), we do it there. Fixes a performance issue introduced in r14773. * While trying to find the above, noticed that we did the reference counting for the init in init_util and for finalize in fini. That isn't right, so make them both in the non-util versions. This commit was SVN r14830. The following SVN revision numbers were found above: r14773 --> open-mpi/ompi@1e678c3f55	2007-06-01 02:43:46 +00:00
George Bosilca	905570a6d2	Call opal_show_help with the expected number of arguments. This commit was SVN r14802.	2007-05-30 18:49:43 +00:00
Josh Hursey	1e678c3f55	per conversation with Ralph and Jeff take out the opal_init_only logic. This commit moves the initialization/finalization of opal_event and opal_progress to opal_init/finalize. These were previously init/final in ORTE which is an abstraction violation. After talking about it we concluded that there are no ordering issues that require these to be init/final in ORTE instead of OPAL. I ran the IBM test suite against this commit and it didn't turn up any new failures so I think it is good to go. Let us know if this causes problems. This commit was SVN r14773.	2007-05-24 21:54:58 +00:00
Ralph Castain	b582d98d4a	Fix singleton comm_spawn so it can see available resources This commit was SVN r14719.	2007-05-22 13:29:07 +00:00
Ralph Castain	677eb5e4bc	Ensure the singleton wakes up when comm_spawn fails This commit was SVN r14714.	2007-05-21 20:13:31 +00:00
Ralph Castain	4fff584a68	Commit the orted-failed-to-start code. This correctly causes the system to detect the failure of an orted to start and allows the system to terminate all procs/orteds that did start. The primary change that underlies all this is in the OOB. Specifically, the problem in the code until now has been that the OOB attempts to resolve an address when we call the "send" to an unknown recipient. The OOB would then wait forever if that recipient never actually started (and hence, never reported back its OOB contact info). In the case of an orted that failed to start, we would correctly detect that the orted hadn't started, but then we would attempt to order all orteds (including the one that failed to start) to die. This would cause the OOB to "hang" the system. Unfortunately, revising how the OOB resolves addresses introduced a number of additional problems. Specifically, and most troublesome, was the fact that comm_spawn involved the immediate transmission of the rendezvous point from parent-to-child after the child was spawned. The current code used the OOB address resolution as a "barrier" - basically, the parent would attempt to send the info to the child, and then "hold" there until the child's contact info had arrived (meaning the child had started) and the send could be completed. Note that this also caused comm_spawn to "hang" the entire system if the child never started... The app-failed-to-start helped improve that behavior - this code provides additional relief. With this change, the OOB will return an ADDRESSEE_UNKNOWN error if you attempt to send to a recipient whose contact info isn't already in the OOB's hash tables. To resolve comm_spawn issues, we also now force the cross-sharing of connection info between parent and child jobs during spawn. Finally, to aid in setting triggers to the right values, we introduce the "arith" API for the GPR. 
This function allows you to atomically change the value in a registry location (either divide, multiply, add, or subtract) by the provided operand. It is equivalent to first fetching the value using a "get", then modifying it, and then putting the result back into the registry via a "put". This commit was SVN r14711.	2007-05-21 18:31:28 +00:00
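The "arith" operation described in r14711 is an atomic read-modify-write. The sketch below models it in Python with a lock. The `Registry` class, its method names, the operation strings, and the use of integer division are all assumptions for illustration, not the GPR API.

```python
import operator
import threading


class Registry:
    """Toy key/value registry with an atomic arithmetic update."""

    _OPS = {
        "add": operator.add,
        "sub": operator.sub,
        "mul": operator.mul,
        "div": operator.floordiv,  # integer division is an assumption
    }

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._values[key] = value

    def get(self, key):
        with self._lock:
            return self._values[key]

    def arith(self, key, op, operand):
        # Fetch, modify and store under one lock, so no other caller can
        # slip in between the "get" and the "put".
        with self._lock:
            result = self._OPS[op](self._values[key], operand)
            self._values[key] = result
            return result


registry = Registry()
registry.put("trigger_level", 10)
registry.arith("trigger_level", "sub", 1)
```

A separate `get` followed by `put` from two callers can lose an update; doing both steps inside `arith` avoids that.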
Josh Hursey	596062d34b	Seems that the recent changes in the sds and oob exposed some invalid assumptions in the FT restart code for the ORTE layer. This fixes those problems by having the RML completely shutdown and restart the OOB framework (instead of just the module as before). This makes it much easier to manage, and maintainable as the OOB changes in the future. The SDS now does communication as part of its startup procedure, so we need to make sure we restart the RML before the SDS so that it can communicate properly. OOB base [close\|open] used a static bool to determine if they have been called previously or not. I needed to expose this boolean so that I can close() then open() the oob base in the restart procedure. The functionality has not changed, we just now have the ability to open/close the framework as many times as we need to as long as we always call them in that order. (So calling open twice in a row is not allowed as before, it is only allowed if you open(), close(), then open() again). Things seem to be working now. This commit was SVN r14515.	2007-04-25 19:51:52 +00:00
Ralph Castain	18cb5c9762	Complete modifications for failed-to-start of applications. Modifications for failed-to-start of orteds coming next. This completes the minor changes required to the PLS components. Basically, there is a small change required to the parameter list of the orted cmd functions. I caught and did it for xcpu and poe, in addition to the components listed in my email - so I think that only leaves xgrid unconverted. The orted fail-to-start mods will also make changes in the PLS components, but those can be localized so they come in one at a time. This commit was SVN r14499.	2007-04-24 20:53:54 +00:00
Ralph Castain	18b2dca51c	Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly. There is a binomial algorithm in the code (i.e., the HNP would send to a subset of the orteds, which then relay it on according to the typical log-2 algo), but that has a bug in it so the code won't let you select it even if you tried (and the mca param doesn't show, so you'd really have to try). This also involved a slight change to the oob.xcast API, so propagated that as required. Note: this has only been tested on rsh, SLURM, and Bproc environments (now that it has been transferred to the OMPI trunk, I'll need to re-test it [only done rsh so far]). It should work fine on any environment that uses the ORTE daemons - anywhere else, you are on your own... :-) Also, correct a mistake where the orte_debug_flag was declared an int, but the mca param was set as a bool. Move the storage for that flag to the orte/runtime/params.c and orte/runtime/params.h files appropriately. This commit was SVN r14475.	2007-04-23 18:41:04 +00:00
Ralph Castain	f47e7382e3	Add a new function to wake orterun up - used in failed-to-start scenarios, but can be used anytime a lower level needs to ensure orterun wakes up This commit was SVN r14466.	2007-04-23 12:49:25 +00:00
George Bosilca	ef4baeb6ab	Don't reset the pid, as at this point it is already set. This commit was SVN r14235.	2007-04-05 20:13:50 +00:00
George Bosilca	f2a6b9394f	Deal with the include spree. Protect "environ" on Windows. Some other minor modifications in order to make it compile [again] on Windows. This commit was SVN r14188.	2007-04-01 16:16:54 +00:00
Josh Hursey	dadca7da88	Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). This merge adds Checkpoint/Restart support to Open MPI. The initial frameworks and components support a LAM/MPI-like implementation. This commit follows the risk assessment presented to the Open MPI core development group on Feb. 22, 2007. This commit closes trac:158 More details to follow. This commit was SVN r14051. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r13912 The following Trac tickets were found above: Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158	2007-03-16 23:11:45 +00:00
George Bosilca	7750ed22e0	Correct the Windows part of the universe detection. This commit was SVN r13547.	2007-02-07 22:37:28 +00:00
Rolf vandeVaart	bf5113198d	Update to orte-clean so it will remove files on local and remote nodes. It will also kill off rogue orteds and orterun processes. The killing of processes is ifdef'ed out for Windows since I do not know how to do it there. Note that this change will require an autogen. This commit was SVN r13477.	2007-02-03 00:25:42 +00:00
Jeff Squyres	78a13bc3ea	Fix the MPI_ABORT problem. We added an orte_initialized variable yesterday and set it to "true" in orte_init(). But ompi_mpi_init() doesn't call orte_init() -- it calls orte_init_stage1() and orte_init_stage2(). So orte_initialized was never set to true, and Badness happened from there (w.r.t. ompi_mpi_abort()). This patch moves the setting of orte_initialized to orte_init_stage2() so that everyone will always get it set properly. It also moves setting orte_universe_info.state to RUNNING into stage2() as well -- Ralph confirmed that that should have been there for the same reasons that orte_initialized needs to be there. This commit was SVN r13374.	2007-01-30 23:00:43 +00:00
Jeff Squyres	e90b3e415b	* Before this commit, if we called ompi_mpi_abort() before MPI_INIT completed successfully, Bad Things(tm) could happen. * Now we explicitly check orte_initialized (a new global in ORTE indicating whether we are between orte_init() and orte_finalize() or not), and if so, react accordingly. * If ORTE is initialized, use orte_system_info.nodename; otherwise, use gethostname(). * Add loop protection to ensure that ompi_mpi_abort() is not invoked multiple times recursively. This commit was SVN r13354.	2007-01-29 22:01:28 +00:00
Jeff Squyres	974dcebf9f	Finish backing out r13316 by also removing the comments that it inserted. This commit was SVN r13324. The following SVN revision numbers were found above: r13316 --> open-mpi/ompi@35c1370a13	2007-01-26 13:09:18 +00:00
George Bosilca	29597cf0c5	We need to initialize the ODLS as they are the only one to define the ORTE_DAEMON_CMD type. Which, unfortunately, is used all over the place. Without this, we get error: [msc01:12341] [0,0,0] ORTE_ERROR_LOG: Data pack failed in file ../../ompi-trunk/orte/dss/dss_pack.c at line 83 [msc01:12341] [0,0,0] ORTE_ERROR_LOG: Data pack failed in file ../../ompi-trunk/orte/dss/dss_pack.c at line 58 [msc01:12341] [0,0,0] ORTE_ERROR_LOG: Data pack failed in file ../../../../ompi-trunk/orte/mca/pls/base/pls_base_orted_cmds.c at line 136 This commit was SVN r13320.	2007-01-26 04:32:15 +00:00
Ralph Castain	0905dfdfba	Make sure the params.h file gets included in the tarballs This commit was SVN r13318.	2007-01-26 03:05:30 +00:00
Rich Graham	35c1370a13	odls components are handled only by daemon procs. This commit was SVN r13316.	2007-01-25 21:18:59 +00:00
Ralph Castain	ab5ea61100	Bring over the rest of the ctrl-c fixes. This commit includes: 1. add a "cancel_operation" API to the pls components that allows orterun to demand that an orted operation (e.g., terminate_job) be immediately cancelled and abandoned. 2. changes the pls orted commands from blocking to non-blocking. This allows us to interrupt those operations should an orted be non-responsive. The change also adds an orte_abort_timeout that limits how long orterun will automatically wait for the orteds to respond - if the terminate command, for example, doesn't see orted response within that time, then we printout an appropriate error message and just give up. 3. modifies orterun to allow multiple ctrl-c's to simply abort the program even if the orteds have not responded 4. does some cleanup on the orte-level mca params so that their implementation looks a lot more like that of ompi - makes it easier to maintain. This change also includes the definition of an orte_abort_timeout struct and associated MCA param (can't have too many!) so you can set the time after which orterun gives up on waiting for orteds to respond This needs more testing before migrating to 1.2. This commit was SVN r13304.	2007-01-25 14:17:44 +00:00
George Bosilca	5711583bdf	Force only one thread to come out from the socket engine. This commit was SVN r13298.	2007-01-25 07:36:42 +00:00
George Bosilca	950b07d860	Work around the Windows sockets model. This commit was SVN r13294.	2007-01-25 00:19:02 +00:00
George Bosilca	3169a29da4	Revert commit r13235. This commit was SVN r13238. The following SVN revision numbers were found above: r13235 --> open-mpi/ompi@2636881324	2007-01-22 06:46:58 +00:00
George Bosilca	2636881324	Remove unused variables. This commit was SVN r13235.	2007-01-22 05:46:57 +00:00
George Bosilca	93c3e3a21f	__WINDOWS__ is defined or not. This commit was SVN r13234.	2007-01-22 05:46:30 +00:00
Brian Barrett	2755d5ccef	Only do Windows things if we're on Windows. Need another case for when we don't have windows and we don't have waitpid() (ie, the Cray) This commit was SVN r13173.	2007-01-17 23:16:52 +00:00
George Bosilca	409d1b8a8d	Make the universe creation functions Windows friendly again. This commit was SVN r13041.	2007-01-08 21:58:57 +00:00
Rolf vandeVaart	fdf44cc4ab	Add the ability to not only report broken files and directories, but remove them also. This current set of changes will affect nothing as no one is making use of this ability. However, orte-clean will be changed soon to utilize this new feature. This commit was SVN r12996.	2007-01-04 21:48:34 +00:00
Ralph Castain	90f5e3fad8	Fix a buglet in the singleton startup procedure. For purposes of minimizing the xcast message, we "strip" the descriptive info on all subscription messages. This means, though, that we have to store the process name and other info so it can be retrieved in the body of the subscription data (as opposed to in the description). This wasn't being done for singletons because they don't call the RMAPS to "map" themselves. This has now been corrected. The singleton startup will dutifully call the mapper framework so that the proper data storage locations get initialized. Unfortunately, we then had to instruct the RMAPS not to allocate a vpid range for this job - otherwise, it would make a mistake and think there were two processes in it. Hence, a change was required to RMAPS to tell it "map this job, but don't allocate a vpid range for it". This change will need to migrate across to 1.2 after it "soaks" the appropriate time. This commit was SVN r12952.	2007-01-02 16:14:44 +00:00
Ralph Castain	82946cb220	Add a new option to orterun: "--do-not-launch" directs the system to do the allocation, map, job setup, etc., but don't actually launch the job. This lets us test all the setup portions of the code. Also, take the first step in updating how we handle mca params in ORTE - bring it closer to how it is done in the other two layers. Much more work to be done here. This commit was SVN r12838.	2006-12-13 04:51:38 +00:00
Brian Barrett	6f8b366acb	Rename liborte to libopen-rte and libopal to libopen-pal per telecon today and bug #632. Refs trac:632 This commit was SVN r12762. The following Trac tickets were found above: Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632	2006-12-05 18:27:24 +00:00
Rainer Keller	e61dd8722e	- Silence compiler on ORTE_TRANSPORT_KEY_FMT, it is fixed to llx - No functional changes, just indentation and corrections to error output. This commit was SVN r12734.	2006-12-03 13:59:23 +00:00
Rainer Keller	b63500f62c	- Don't unlock ompi_rte_mutex unconditionally, use the macro instead. This commit was SVN r12655.	2006-11-22 21:01:43 +00:00
Ralph Castain	6fca1431f3	Back out some prior commits. These commits fixed bproc so it would run, but broke several other things (singleton comm_spawn and hostfile operations have been identified so far). Since bproc is the culprit here, let's leave bproc broken for now - I'll work on a fix for that environment that doesn't impact everything else. This commit was SVN r12648.	2006-11-22 13:30:21 +00:00
Brian Barrett	33320b7165	Rework the opal_progress interface to better support dynamic processes and at the same time, remove some of the MPI-related options from OPAL: - provide mechanism to change at runtime whether sched_yield() should be called when the progress engine is idle - provide mechanism for changing the rate at which the event engine is called when there are "no" users of the event engine (ie, when using MPI but not TCP) - fix some function names in the progress engine to better match their intended use (and remove MPI naming scheme) - remove progress_mpi_enable / progress_mpi_disable because we can now use the functions to set the sched_yield and tick rate interfaces - rename opal_progress_events() to opal_progress_set_event_flag() because the first really isn't descriptive of what the function does and I always got confused by it This commit was SVN r12645.	2006-11-22 02:06:52 +00:00
Ralph Castain	9f3dcd147a	Round and round the mulberry bush we go... Fix comm_spawn by singletons. orte_init does some voodoo to let the system know about localhost when we are a singleton. This includes allocating it so that any comm_spawn'd children can use their parent "allocation". Unfortunately, the fix that bproc needs (due to that smr filling up the node segment!) causes the singleton startup to fail. The fix is to just have the singleton startup force an allocation of its localhost. Only issue here is: what happens if we are in a persistent universe? The singleton will now overwrite any prior info on slots used on localhost by other jobs (won't affect anything else). The answer, of course, is to do something more intelligent - lookup localhost on the registry and just update its info instead of overwriting it. Something for another day (or month....or year) This commit was SVN r12644.	2006-11-21 21:51:58 +00:00
Ralph Castain	6d6cebb4a7	Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things). Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it. I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn). This commit was SVN r12597.	2006-11-14 19:34:59 +00:00
Ralph Castain	a3be8261fb	Fix a bug that had us generate an error message and abort startup when there were stale universe directories around. Now, we just ignore them. This commit was SVN r12472.	2006-11-07 21:34:57 +00:00
Ralph Castain	194bdd413b	Cleanup the problem of connecting to default universes. This commit was SVN r12438.	2006-11-06 15:28:38 +00:00
Ralph Castain	7a77ef0ae3	Given the amount of pain singletons cause, one can't help but wonder if it REALLY was that much trouble for people to type "mpirun -n 1 foo"....sigh. Get the ordering right so that a singleton can start. Protect the rmgr copy app_context function from NULL fields Tell the mapper it is okay for there not to be a pre-existing mapping plan for a parent when dynamically spawning processes This commit was SVN r12257.	2006-10-23 15:15:45 +00:00
Ralph Castain	153e38ffc9	Lesson to be learned: if you send an ack to a recv'd command, better not send it to the same tag it came from - at least, not if there is a persistent recv on that tag! Fix the persistent daemon problem where it was exiting when a job completed. Problem was that the persistent daemon would order the job daemons to exit. They would then send an 'ack' back to the persistent daemon - but the ack consisted of an echo of the "exit" command, which was recv'd by the wrong listener who treated it as a properly sent cmd....and exited. This commit was SVN r12243.	2006-10-21 02:53:19 +00:00
Ralph Castain	13227e36ab	This commit looks a lot bigger than it is, so relax :-) Fix the problem observed by multiple people that comm_spawned children were (once again) being mapped onto the same nodes as their parents. This was caused by going through the RAS a second time, thus overwriting the mapper's bookkeeping that told RMAPS where it had left off. To solve this - and to continue moving forward on the ORTE development - we introduce the concept of attributes to control the behavior of the RM frameworks. I defined the attributes and a list of attributes as new ORTE data types to make it easier for people to pass them around (since they are now fundamental to the system, and therefore we will be packing and unpacking them frequently). Thus, all the functions to manipulate attributes can be implemented and debugged in one place. I used those capabilities in two places: 1. Added an attribute list to the rmgr.spawn interface. 2. Added an attribute list to the ras.allocate interface. At the moment, the only attribute I modified the various RAS components to recognize is the USE_PARENT_ALLOCATION one (as defined in rmgr_types.h). So the RAS components now know how to reuse an allocation. I have debugged this under rsh, but it now needs to be tested on a wider set of platforms. This commit was SVN r12138.	2006-10-17 16:06:17 +00:00
Rainer Keller	3f88937081	- Error logging is really not yet enabled. - Correct the error log for orte_errmgr_base_select - Spelling fixes This commit was SVN r12135.	2006-10-17 09:11:20 +00:00
Ralph Castain	37dfdb76eb	Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done. This commit was SVN r11661.	2006-09-14 21:29:51 +00:00
George Bosilca	f52c10d18e	And ORTE is ready for prime-time. All Windows tricks are in: - use the OPAL functions for PATH and environment variables - make all headers C++ friendly - no unnamed structures - no implicit cast. Plus a full implementation for the orte_wait functions. This commit was SVN r11347.	2006-08-23 03:32:36 +00:00
George Bosilca	6afa4c6c64	Windows friendly version. We have to split the OMPI_DECLSPEC in at least 3 different macros, one for each project. Therefore, now we have OPAL_DECLSPEC, ORTE_DECLSPEC and OMPI_DECLSPEC. Please use them based on the sub-project. This commit was SVN r11270.	2006-08-20 15:54:04 +00:00

411 commits