
2161 Commits

Author SHA1 Message Date
Jeff Squyres
49b60029e6 * Set svn:ignore
* Fix filename in Makefile.am

This commit was SVN r20868.
2009-03-25 13:32:55 +00:00
Shiqing Fan
36a813415d When building from a tarball, there will be Linux-generated files that cannot be used on Windows, so exclude them and use the ones generated by CMake.
This commit was SVN r20858.
2009-03-24 18:10:57 +00:00
Ralph Castain
d0b50a2b9b Eliminate an annoying "not found" message when a job abnormally terminates. In this case, we can get a race condition where the job object has been removed, but updates are continuing to flow into the system.
This commit was SVN r20857.
2009-03-24 18:06:49 +00:00
Ralph Castain
9564ab8cc3 Resolve a user-reported problem where the ifislocal check would return false, but the node name is the same as the HNP's node name because we are on the same node.
This commit was SVN r20856.
2009-03-24 15:23:22 +00:00
George Bosilca
faca1aeeb9 Support for password structures without a specified shell or a NULL/empty shell.
This commit was SVN r20853.
2009-03-24 13:26:57 +00:00
Rainer Keller
be66cc2279 - We're using uint16_t, uint32_t, and friends,
so #include <stdint.h> if we have it...

This commit was SVN r20835.
2009-03-21 01:26:27 +00:00
Rainer Keller
bff1b2a22b - Finally add the missing opal/util/output.h
for the OPAL_OUTPUT_VERBOSE macro.
 - ompi/errhandler/errhandler_predefined.h:
   Well, just the missing fwd declarations...

This commit was SVN r20820.
2009-03-17 22:37:15 +00:00
Rainer Keller
64dcd85ba1 - This one was missing
This commit was SVN r20818.
2009-03-17 22:02:51 +00:00
Rainer Keller
6f808d9b05 Preparation work for another commit (after RFC):
- This patch solely _adds_ required headers and is rather localized
   The next patch (after RFC) heavily removes headers (based on script)
 - ompi/communicator/communicator.h: For sources that use
   ompi_mpi_comm_world, don't require them to include "mpi.h"
 - ompi/debuggers/ompi_common_dll.c: mca_topo_base_comm_1_0_0_t needs
   #include "ompi/mca/topo/topo.h"
 - ompi/errhandler/errhandler_predefined.h:
   ompi/communicator/communicator.h depends on this header file!
   To prevent recursion just have fwd declarations.
   #include "ompi/types.h" for fwd declarations of the main structs.
 - ompi/mca/btl/btl.h: #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/mpool/base/mpool_base_tree.c: We use ompi_free_list_t and
   ompi_rb_tree_t, so have the proper classes
 - ompi/mca/op/op.h:
   Op is pretty self-contained: Nobody up to now has done
   #include "opal/class/opal_object.h"
 - ompi/mca/osc/pt2pt/osc_pt2pt_replyreq.h:
   #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/pml/base/base.h:
   We use opal_lists  
 - ompi/mca/pml/dr/pml_dr_vfrag.h:
   #include "opal/types.h" for ompi_ptr_t
 - ompi/mca/pml/ob1/pml_ob1_hdr.h:
   #include "ompi/mca/btl/btl.h" for mca_btl_base_segment_t
 - opal/dss/dss_unpack.c:
   #include "opal/types.h"
 - opal/mca/base/base.h:
   #include "opal/util/cmd_line.h" for opal_cmd_line_t
 - orte/mca/oob/tcp/oob_tcp.c:
   #include "opal/types.h" for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp.h:
   #include "opal/threads/threads.h" for opal_thread_t
 - orte/mca/oob/tcp/oob_tcp_msg.c:
   #include "opal/types.h" 
 - orte/mca/oob/tcp/oob_tcp_peer.c:
   #include "opal/types.h"  for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp_send.c:
   #include "opal/types.h" 
 - orte/mca/plm/base/plm_base_proxy.c:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT
 - orte/mca/rml/base/rml_base_receive.c:
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/mca/rml/oob/rml_oob_recv.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/mca/rml/oob/rml_oob_send.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/runtime/orte_data_server.c
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/runtime/orte_globals.h:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT

 Tested on Linux/x86-64

This commit was SVN r20817.
2009-03-17 21:34:30 +00:00
Ralph Castain
4af623076d Add a test for hanging in a loop over mpi_reduce
This commit was SVN r20798.
2009-03-17 13:57:23 +00:00
Jeff Squyres
b5c38f74b0 Always tie the child stdin to /dev/null.
This commit was SVN r20796.
2009-03-17 03:17:50 +00:00
Rainer Keller
6a72c0f4d1 - As long as a header declares _DECLSPEC functionality
it should include the corresponding _config.h header file.

   Tested on Linux/x86-64

This commit was SVN r20795.
2009-03-17 01:45:19 +00:00
Jeff Squyres
27bacbee3c Per discussion on #1833, we should ''always'' dup /dev/null into ssh's
stdin.

This commit was SVN r20790.
2009-03-16 19:12:48 +00:00
George Bosilca
02bee12de8 Small cleanup.
This commit was SVN r20781.
2009-03-14 23:11:24 +00:00
Brian Barrett
716b505789 Fix for #1832. The stdin for the forked ssh wasn't being closed, so it was
"eating" all the stdin, leaving nothing for mpirun to forward on to the child
process

This commit was SVN r20776.
2009-03-13 22:34:56 +00:00
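
The stdin fixes above (716b505789, 27bacbee3c, b5c38f74b0) all come down to one idea: a helper child that should not consume the launcher's input gets its stdin tied to the null device. The following is a minimal sketch of that idea in Python, not Open MPI's C code; the function name `run_with_null_stdin` and the stand-in child command `CHILD` are hypothetical.

```python
import subprocess
import sys


def run_with_null_stdin(argv):
    """Run argv with stdin tied to the null device and return its stdout bytes."""
    completed = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,  # child sees immediate EOF, cannot "eat" our stdin
        stdout=subprocess.PIPE,
        check=True,
    )
    return completed.stdout


# Hypothetical stand-in for the forked helper: it tries to read all of stdin
# and reports how many characters it received.
CHILD = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
```

Because the child reads from the null device, whatever arrives on the parent's stdin stays available for the parent to forward elsewhere.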
Rainer Keller
d8cf4c0fec - Get pgcc on XT to complain less:
In case we use memcmp, strlen, strdup and friends include <string.h>
   Also several constants.h are not included directly
 - Let's have mca_topo_base_cart_create  return ompi-errors in
   ompi/mca/topo/base/topo_base_cart_create.c

This commit was SVN r20773.
2009-03-13 02:10:32 +00:00
Rainer Keller
0b59a59129 - Rather have 0xff instead of 0Xff...
This commit was SVN r20769.
2009-03-12 22:17:42 +00:00
Terry Dontje
d97e398b49 Clarify the error message for out of pipes condition.
This commit was SVN r20762.
2009-03-11 17:48:46 +00:00
Rolf vandeVaart
4d7071dadb Add some explanation of the -x flag as we have
gotten questions on its use.

This commit was SVN r20760.
2009-03-11 13:25:03 +00:00
Rolf vandeVaart
2b365d7d90 Fix so it builds on Solaris.
This commit was SVN r20758.
2009-03-10 18:38:42 +00:00
Jeff Squyres
4e53885f73 Fix a compiler warning and ensure that "sent" is initialized to 0.
This commit was SVN r20756.
2009-03-09 15:37:04 +00:00
Jeff Squyres
8b5e6c0425 Because I could. :-)
Relevant MCA params:

 * notifier_twitter_username: Twitter username
 * notifier_twitter_password: Twitter password

This commit was SVN r20750.
2009-03-06 22:02:17 +00:00
Jeff Squyres
2373bc36e2 Add the "smtp" notifier component. It uses libesmtp
(http://www.stafford.uklinux.net/libesmtp/) via the --with-esmtp(=DIR)
configure option.  Several MCA parameters must be set in order to use
this component:

 * notifier_smtp_server: SMTP server IP address or name; must be supplied
 * notifier_smtp_port: port to talk to on the server; defaults to 25
 * notifier_smtp_to: comma-delimited list of email addresses to send
   the mail to; must be supplied
 * notifier_smtp_from_name: free-form "name" who the mail is from;
   defaults to "Open MPI Notifier"
 * notifier_smtp_from_addr: email address the mail is from; must
   be supplied
 * notifier_smtp_subject: subject of the mail; defaults to "Open MPI
   notifier"
 * notifier_smtp_body_prefix: prefix of the body of the mail; defaults
   to a sensible value
 * notifier_smtp_body_suffix: suffix of the body of the mail; defaults
   to a sensible value

Although libesmtp supports SMTP AUTH protocols, this component does not.
If people want/need those kinds of features, they're relatively easy
to add -- I just didn't bother [yet] before I knew if anyone cared.

This commit was SVN r20749.
2009-03-06 21:59:19 +00:00
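
The smtp notifier entry above (2373bc36e2) lists which MCA parameters must be supplied and which have defaults. The sketch below restates those rules as a small Python check. It is only an illustration of the required/default split described in the commit message, not the component's implementation; `resolve_smtp_params` is a hypothetical helper, and the body prefix/suffix defaults are left out because the message does not state them.

```python
# Parameters the commit message says must be supplied.
REQUIRED = (
    "notifier_smtp_server",
    "notifier_smtp_to",
    "notifier_smtp_from_addr",
)

# Defaults quoted in the commit message.
DEFAULTS = {
    "notifier_smtp_port": 25,
    "notifier_smtp_from_name": "Open MPI Notifier",
    "notifier_smtp_subject": "Open MPI notifier",
}


def resolve_smtp_params(user_params):
    """Apply defaults and reject a configuration missing a required value."""
    missing = [name for name in REQUIRED if not user_params.get(name)]
    if missing:
        raise ValueError("missing required MCA params: " + ", ".join(missing))
    resolved = dict(DEFAULTS)
    resolved.update(user_params)
    # notifier_smtp_to is a comma-delimited list of addresses.
    resolved["notifier_smtp_to"] = [
        addr.strip()
        for addr in resolved["notifier_smtp_to"].split(",")
        if addr.strip()
    ]
    return resolved
```

The addresses and server name passed to such a helper would be site-specific; none are implied by the commit.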
Jeff Squyres
c17616c332 Change the ordering slightly; don't save anything until we know all
went well.

This commit was SVN r20748.
2009-03-06 21:49:38 +00:00
Shiqing Fan
ddc82f3831 Correct the output global variables.
This commit was SVN r20745.
2009-03-06 15:31:12 +00:00
Shiqing Fan
a8cb7d2ab1 Fix an array overflow, detected by compiling with C++ compilers. This fix is mainly for the Windows build.
This commit was SVN r20744.
2009-03-06 13:27:38 +00:00
Rainer Keller
ec0ed48718 - Revert r20739
This commit was SVN r20742.

The following SVN revision numbers were found above:
  r20739 --> open-mpi/ompi@781caee0b6
2009-03-05 21:56:03 +00:00
Rainer Keller
a94438343b - Revert r20740
This commit was SVN r20741.

The following SVN revision numbers were found above:
  r20740 --> open-mpi/ompi@2a70618a77
2009-03-05 21:50:47 +00:00
Rainer Keller
2a70618a77 - Second patch, as discussed in Louisville.
Replace short macros in orte/util/name_fns.h
   to the actual fct. call.

 - Compiles on linux/x86-64

This commit was SVN r20740.
2009-03-05 21:14:18 +00:00
Rainer Keller
781caee0b6 - First of two or three patches, in orte/util/proc_info.h:
Adapt orte_process_info to orte_proc_info, and
   change orte_proc_info() to orte_proc_info_init().
 - Compiled on linux-x86-64
 - Discussed with Ralph

This commit was SVN r20739.
2009-03-05 20:36:44 +00:00
Rainer Keller
fd28b392bf - An intrusive commit yet again (sorry): with the separation we
get bitten by header depending on having already included
   the corresponding [opal|orte|ompi]_config.h header.
   When separating, things like [OPAL|ORTE|OMPI]_DECLSPEC
   are missed.

   Script to add the corresponding header in front of all following
   (taking care of possible #ifdef HAVE_...)

 - Including some minor cleanups to
   - ompi/group/group.h -- include _after_ #ifndef OMPI_GROUP_H
   - ompi/mca/btl/btl.h -- include _after_ #ifndef MCA_BTL_H
   - ompi/mca/crcp/bkmrk/crcp_bkmrk_btl.c -- still no need for
     orte/util/output.h
   - ompi/mca/pml/dr/pml_dr_recvreq.c -- no need for mpool.h
   - ompi/mca/btl/btl.h -- reorder to fit
   - ompi/mca/bml/bml.h -- reorder to fit
   - ompi/runtime/ompi_mpi_finalize.c -- reorder to fit
   - ompi/request/request.h -- additionally need ompi/constants.h

 - Tested on linux/x86-64

This commit was SVN r20720.
2009-03-04 15:35:54 +00:00
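
Commit fd28b392bf above mentions a script that adds the matching [opal|orte|ompi]_config.h include in front of all other includes, taking care of `#ifdef HAVE_...` guards. That script is not shown in the log. The following is a rough sketch of what such a transformation might look like, under the assumption that "in front of" means "above the first #include, or above its HAVE_ guard if it has one"; `add_config_include` is a hypothetical name.

```python
def add_config_include(source, layer):
    """Insert '#include "<layer>_config.h"' ahead of the first #include.

    layer is one of "opal", "orte", "ompi". If the first include sits
    directly under an '#ifdef HAVE_...' line, insert above that guard.
    """
    wanted = f'#include "{layer}_config.h"'
    lines = source.splitlines()
    if any(line.strip() == wanted for line in lines):
        return source  # already present, nothing to do
    for i, line in enumerate(lines):
        if line.lstrip().startswith("#include"):
            if i > 0 and lines[i - 1].lstrip().startswith("#ifdef HAVE_"):
                i -= 1  # keep the guard and its include together
            lines.insert(i, wanted)
            break
    trailing = "\n" if source.endswith("\n") else ""
    return "\n".join(lines) + trailing
```

Inserting after the header's own `#ifndef`/`#define` guard falls out naturally here, since the first `#include` normally follows it.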
Rainer Keller
84408f2fb7 - A follow-up to the commit:
As orte/mca/routed/base/base.h does not require opal_bitmap.h
   Include it in the C-files, based on routed/base/base.h...

This commit was SVN r20709.
2009-03-03 22:36:58 +00:00
George Bosilca
af9c2e10a3 Really cycle when we have several IP addresses.
This commit was SVN r20705.
2009-03-03 19:29:03 +00:00
Ralph Castain
f11931306a Modify the accounting system to recycle jobids. Properly recover resources from nodes and jobs upon completion. Adjustments in several places were required to deal with sparsely populated job, node, and proc arrays as a result of this change.
Correct an error wrt how jobids were being computed. Needed to ensure that the job family field was not overrun as we increment jobids for comm_spawn.

Update the slurm plm module so it uses the new slurm termination procedure (brings trunk back into alignment with 1.3 branch).

Update the slurmd ess component so it doesn't get selected if we are running a singleton inside of a slurm allocation.

Cleanup HNP init by moving some code that had been in orte_globals.c for historical reasons into the ess hnp module, and removing the call to that code from the ess_base_std_prolog


NOTE: this change allows orte to support an infinite aggregate number of comm_spawn's, with up to 64k being alive at any one instant. HOWEVER, the MPI layer currently does -not- support re-use of jobids. I did some prototype coding to revise the ompi_proc_t structures, but the BTLs are caching their own data, and there was no readily apparent way to update it. Thus, attempts to spawn more than the 64k limit will abort to avoid causing the MPI layer to hang.

This commit was SVN r20700.
2009-03-03 16:39:13 +00:00
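
The jobid recycling described in f11931306a above can be pictured as a bounded pool of ids: at most 64k are live at once, and ids released by completed jobs become available again. The sketch below is a minimal Python illustration of that accounting idea only. The class `JobIdPool` and its methods are hypothetical and do not reflect ORTE's data structures or the job family encoding.

```python
class JobIdPool:
    """Hand out small integer ids and reuse the ones that were released."""

    def __init__(self, capacity=1 << 16):  # 64k ids live at any one instant
        self.capacity = capacity
        self._next = 0       # next never-used id
        self._free = []      # ids released by completed jobs
        self._live = set()   # ids currently in use

    def acquire(self):
        if self._free:
            jobid = self._free.pop()
        elif self._next < self.capacity:
            jobid = self._next
            self._next += 1
        else:
            raise RuntimeError("all job ids are in use")
        self._live.add(jobid)
        return jobid

    def release(self, jobid):
        self._live.remove(jobid)  # raises KeyError if the id is not live
        self._free.append(jobid)
```

With recycling, the total number of ids handed out over time is unbounded, while the number alive at once stays capped, which matches the distinction the commit message draws.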
Ralph Castain
fb1ecb7a45 Fix orted termination so we get the #@# relay out before we exit ourselves.
Minor change in the way we respond to job info requests - needed for coming change.

This commit was SVN r20698.
2009-03-03 13:38:29 +00:00
Jeff Squyres
d5eddc7541 Some minor fixups / patches from Bert Wesarg.
This commit was SVN r20697.
2009-03-03 13:09:19 +00:00
Jeff Squyres
f81d357c53 Free a little memory. Thanks for the patch from Bert Wesarg.
This commit was SVN r20694.
2009-03-03 12:33:43 +00:00
Jeff Squyres
f8daa60b1b Fix typo noted by Bert Wesarg.
This commit was SVN r20693.
2009-03-03 12:16:57 +00:00
George Bosilca
02de7846f8 Correctly tag the help message.
This commit was SVN r20683.
2009-03-02 22:10:45 +00:00
Josh Hursey
6d79a0398d Fix a bounds check that prevented some vpid resolution in certain launch scenarios.
Traced back to r20629.

This commit was SVN r20675.

The following SVN revision numbers were found above:
  r20629 --> open-mpi/ompi@dcff523244
2009-03-02 18:26:48 +00:00
Ralph Castain
c7fda41d2a Only remove children from the local child list when the job completes so we update the status on all procs in the job and can properly terminate the job.
Correct an error in a debugging output

This commit was SVN r20669.
2009-03-01 20:12:20 +00:00
Ralph Castain
47cfccbb49 Update a couple of tests
This commit was SVN r20668.
2009-03-01 15:32:32 +00:00
Ralph Castain
15171e4ba8 Remove completed children from the local list of child processes so that we properly track our number of children. Otherwise, we can artificially believe we have exceeded system limits on the number of local children.
This commit was SVN r20667.
2009-03-01 15:31:27 +00:00
Ralph Castain
f0fcaf8b32 For some reason, the buffer gets trashed, so for now, let's process and then relay...until I can figure out the race condition that is causing the problem.
This commit was SVN r20665.
2009-03-01 01:24:02 +00:00
Ralph Castain
c2ff8dc5ce Fix notifier base functions to match revised notifier.h framework APIs
This commit was SVN r20663.
2009-02-28 23:46:18 +00:00
Ralph Castain
11979c100a Silence pointless compiler warning
This commit was SVN r20661.
2009-02-28 15:35:48 +00:00
Tim Mattox
57be80c983 First pass at integrating the CIFTS/FTB support as
a notifier module.
The Notifier framework was extended slightly to
convey more information about each event notice.
This works with the FTB v0.5 API.

To compile with FTB support, use --with-ftb=/path/to/ftb/install

CIFTS == Coordinated Infrastructure for Fault Tolerant Systems
FTB == Fault Tolerance Backplane
see http://wiki.mcs.anl.gov/cifts/index.php

This commit was SVN r20655.
2009-02-27 22:53:43 +00:00
Ralph Castain
7e5dc8f2be Ensure that we turn off stdin read event when ctrl-c terminating a program
This commit was SVN r20654.
2009-02-27 15:01:28 +00:00
Ralph Castain
b8ffa302da Separate abnormal job termination from abnormal orted termination so we can continue to use xcast for orted cmds, but can know to turn off reading of stdin as the job is being terminated.
This commit was SVN r20650.
2009-02-27 10:16:25 +00:00
Ralph Castain
4f75f6e443 Fix a bug where we were not stopping the read event on stdin if the write to stdin of the target process was backing up.
Ensure we stop reading stdin if we are abnormally terminating - no point in doing so since the job is being terminated.

This commit was SVN r20649.
2009-02-27 09:31:34 +00:00