1
1

1955 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
b661f160ba Add new "command" notifier component. This component allows forking
any arbitrary command as a notifier, potentially allowing just about
anything to be a notifier.  This component forks a child during
orte_init() to avoid forking problems with some OS-bypass networks.

The following MCA parameters are available:

notifier_command_cmd:
  Default: /sbin/initlog -f $s -n "Open MPI" -s "$S: $m (errorcode: $e)"
  Command to execute, with substitution.  $s = integer severity; $S =
  string severity; $e = integer error code; $m = string message

notifier_command_timeout:
  Default: 30
  Timeout (in seconds) of the command

This commit was SVN r21076.
2009-04-27 13:40:36 +00:00
Jeff Squyres
e5103e1f3d Actually, make than an enum instead of a #define
This commit was SVN r21075.
2009-04-27 12:50:53 +00:00
Jeff Squyres
40990c1982 Add a NOTICE notifier severity
This commit was SVN r21074.
2009-04-27 12:47:54 +00:00
Jeff Squyres
df54a00b1e Minor comment fix
This commit was SVN r21073.
2009-04-27 12:47:30 +00:00
Shiqing Fan
4067d739b4 Add a new CMake module for finding Windows CCP libraries, so that we can remove the library files in the source tree.
This commit was SVN r21071.
2009-04-24 16:51:31 +00:00
Shiqing Fan
d8f61695cd Remove a few .ompi_ignore, and add configure scripts, so that the source files of these components will be put into the tarball.
This commit was SVN r21070.
2009-04-24 16:49:01 +00:00
Shiqing Fan
3d4e0472d6 Add windows support files into the tarball, including .windows, CMakeLists.txt files, and CMake modules. Thanks to Jeff for testing it on Linux.
This commit was SVN r21069.
2009-04-24 16:39:33 +00:00
Jeff Squyres
61b2f0b9ea Fix typo preventing compilation (!)
This commit was SVN r21016.
2009-04-15 16:50:41 +00:00
Ralph Castain
96d1ff506e Be silent when the lifeline is lost - we don't want to hear about it
This commit was SVN r21011.
2009-04-15 01:34:04 +00:00
Ralph Castain
8b0a470543 Continue work to cleanup user options for slave launch
This commit was SVN r21003.
2009-04-14 20:05:51 +00:00
Ralph Castain
a952dca062 Fix a bug where we created the correct path to the file, but didn't use it
This commit was SVN r20990.
2009-04-14 14:17:43 +00:00
Ralph Castain
9c39a3edd7 Enable the passing of MCA params to dynamically spawned jobs. This creates a new info_key "ompi_param" that allows a user to specify MCA params for a dynamically spawned job.
We currently apply all of the MCA params in the parent job to the child. This commit allows a user to specify additional params for the child job, and to override any pre-existing params with the new value so they can better control behavior of the child job.

This commit was SVN r20989.
2009-04-14 14:15:49 +00:00
Ralph Castain
9c2f17eb01 Cleanup the nidmap lookup functions and add some comments explaining how we handle the nid, job, and pmap arrays. This fixes a problem we have less-than-full participation in a comm_spawn, causing holes to exist in the pmap array.
Update the slave spawn tests to properly indicate participation as being solely MPI_COMM_SELF.

This commit was SVN r20961.
2009-04-09 02:48:33 +00:00
Rainer Keller
9b7ab92de9 - Per mail and diff from Ken Matney:
Allow multiple retries to open file as well, for ALPS to supply the file.

This commit was SVN r20932.
2009-04-02 17:46:08 +00:00
Shiqing Fan
7a7c4bcb4b fix a type cast for windows.
This commit was SVN r20928.
2009-04-02 08:45:48 +00:00
Terry Dontje
4b43911c6a Remove superfluous spaces in manpages that were causing catman to
generate mangled windex files.  Made ompi-top.1 and ompi-iof.1 build
by default.  Also, added the orte-top synonym to the ompi-top manpage.

This commit was SVN r20915.
2009-04-01 14:40:27 +00:00
Aurelien Bouteiller
ccc8aa5784 Fix a segfault caused by making copies of the pointer to an array that is realloced meanwhile. The base pointer can change its address while the copy still tries to access pages that are not ours anymore.
As a safeguard, good coding style should never access directly opal_pointer_array_t->addr or opal_value_array_t->bytes_array. I found another instance of the same bug somewhere else and will commit a separate patch for it. 

This commit fixes ticket #1858 and solves user case http://www.open-mpi.org/community/lists/devel/2009/03/5731.php .

Aurelien

This commit was SVN r20903.
2009-03-31 15:41:55 +00:00
Jeff Squyres
49b60029e6 * Set svn:ignore
* Fix filename in Makefile.am

This commit was SVN r20868.
2009-03-25 13:32:55 +00:00
Ralph Castain
d0b50a2b9b Eliminate an annoying "not found" message when a job abnormally terminates. In this case, we can get a race condition where the job object has been removed, but updates are continuing to flow into the system.
This commit was SVN r20857.
2009-03-24 18:06:49 +00:00
Ralph Castain
9564ab8cc3 Resolve a user-reported problem where the ifislocal check would return false, but the node name is the same as the HNP's node name because we are on the same node.
This commit was SVN r20856.
2009-03-24 15:23:22 +00:00
George Bosilca
faca1aeeb9 Support for password structures without a specified shell or a NULL/empty shell.
This commit was SVN r20853.
2009-03-24 13:26:57 +00:00
Rainer Keller
bff1b2a22b - Finally add the missing opal/util/output.h
for the OPAL_OUTPUT_VERBOSE macro.
 - ompi/errhandler/errhandler_predefined.h:
   Well, just the missing fwd declarations...

This commit was SVN r20820.
2009-03-17 22:37:15 +00:00
Rainer Keller
64dcd85ba1 - This one was missing
This commit was SVN r20818.
2009-03-17 22:02:51 +00:00
Rainer Keller
6f808d9b05 Preparation work for another commit (after RFC):
- This patch solely _adds_ required headers and is rather localized
   The next patch (after RFC) heavily removes headers (based on script)
 - ompi/communicator/communicator.h: For sources that use
   ompi_mpi_comm_world, don't require them to include "mpi.h"
 - ompi/debuggers/ompi_common_dll.c: mca_topo_base_comm_1_0_0_t needs
   #include "ompi/mca/topo/topo.h"
 - ompi/errhandler/errhandler_predefined.h:
   ompi/communicator/communicator.h depends on this header file!
   To prevent recursion just have fwd declarations.
   #include "ompi/types.h" for fwd declarations of the main structs.
 - ompi/mca/btl/btl.h: #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/mpool/base/mpool_base_tree.c: We use ompi_free_list_t and
   ompi_rb_tree_t, so have the proper classes
 - ompi/mca/op/op.h:
   Op is pretty self-contained: Nobody up to now has done
   #include "opal/class/opal_object.h"
 - ompi/mca/osc/pt2pt/osc_pt2pt_replyreq.h:
   #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/pml/base/base.h:
   We use opal_lists  
 - ompi/mca/pml/dr/pml_dr_vfrag.h:
   #include "opal/types.h" for ompi_ptr_t
 - ompi/mca/pml/ob1/pml_ob1_hdr.h:
   #include "ompi/mca/btl/btl.h" for mca_btl_base_segment_t
 - opal/dss/dss_unpack.c:
   #include "opal/types.h"
 - opal/mca/base/base.h:
   #include "opal/util/cmd_line.h" for opal_cmd_line_t
 - orte/mca/oob/tcp/oob_tcp.c:
   #include "opal/types.h" for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp.h:
   #include "opal/threads/threads.h" for opal_thread_t
 - orte/mca/oob/tcp/oob_tcp_msg.c:
   #include "opal/types.h" 
 - orte/mca/oob/tcp/oob_tcp_peer.c:
   #include "opal/types.h"  for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp_send.c:
   #include "opal/types.h" 
 - orte/mca/plm/base/plm_base_proxy.c:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT
 - orte/mca/rml/base/rml_base_receive.c:
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/mca/rml/oob/rml_oob_recv.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/mca/rml/oob/rml_oob_send.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/runtime/orte_data_server.c
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/runtime/orte_globals.h:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT

 Tested on Linux/x86-64

This commit was SVN r20817.
2009-03-17 21:34:30 +00:00
Jeff Squyres
b5c38f74b0 Always tie the child stdin to /dev/null.
This commit was SVN r20796.
2009-03-17 03:17:50 +00:00
Jeff Squyres
27bacbee3c Per discussion on #1833, we should ''always'' dup /dev/null into ssh's
stdin.

This commit was SVN r20790.
2009-03-16 19:12:48 +00:00
George Bosilca
02bee12de8 Small cleanup.
This commit was SVN r20781.
2009-03-14 23:11:24 +00:00
Brian Barrett
716b505789 Fix for #1832. The stdin for the forked ssh wasn't being closed, so it was
"eating" all the stdin, leaving nothing for mpirun to forward on to the child
process

This commit was SVN r20776.
2009-03-13 22:34:56 +00:00
Rainer Keller
d8cf4c0fec - Get pgcc on XT to complain less:
In case we use memcmp, strlen, strup and friends include <string.h>
   Also several constants.h are not included directly
 - Let's have mca_topo_base_cart_create  return ompi-errors in
   ompi/mca/topo/base/topo_base_cart_create.c

This commit was SVN r20773.
2009-03-13 02:10:32 +00:00
Rolf vandeVaart
2b365d7d90 Fix so it builds on Solaris.
This commit was SVN r20758.
2009-03-10 18:38:42 +00:00
Jeff Squyres
4e53885f73 Fix a compiler warning and ensure that "sent" is initialized to 0.
This commit was SVN r20756.
2009-03-09 15:37:04 +00:00
Jeff Squyres
8b5e6c0425 Because I could. :-)
Relevant MCA params:

 * notifier_twitter_username: Twitter username
 * notifier_twitter_password: Twitter password

This commit was SVN r20750.
2009-03-06 22:02:17 +00:00
Jeff Squyres
2373bc36e2 Add the "smtp" notifier component. It uses libesmtp
(http://www.stafford.uklinux.net/libesmtp/) via the --with-esmtp(=DIR)
configure option.  Several MCA parameters must be set in order to use
this component:

 * notifier_smtp_server: SMTP server IP address or name; must be supplied
 * notifier_smtp_port: port to talk to on the server; defaults to 25
 * notifier_smtp_to: comma-delimited list of email addresses to send
   the mail to; must be supplied
 * notifier_smtp_from_name: free-form "name" who the mail is from;
   defaults to "Open MPI Notifier"
 * notifier_smtp_from_addr: email address from the mail is from; must
   be supplied
 * notifier_smtp_subject: subject of the mail; defaults to "Open MPI
   notifier"
 * notifier_smtp_body_prefix: prefix of the body of the mail; defaults
   to a sensible value
 * notifier_smtp_body_suffix: suffix of the body of the mail; defaults
   to a sensible value

Also libesmtp supports SMTP AUTH protocols, this component does not.
If people want/need those kinds of features, they're relatively easy
to add -- I just didn't bother [yet] before I knew if anyone cared.

This commit was SVN r20749.
2009-03-06 21:59:19 +00:00
Jeff Squyres
c17616c332 Change the ordering slightly; don't save anything until we know all
went well.

This commit was SVN r20748.
2009-03-06 21:49:38 +00:00
Shiqing Fan
ddc82f3831 Correct the output global variables.
This commit was SVN r20745.
2009-03-06 15:31:12 +00:00
Rainer Keller
ec0ed48718 - Revert r20739
This commit was SVN r20742.

The following SVN revision numbers were found above:
  r20739 --> open-mpi/ompi@781caee0b6
2009-03-05 21:56:03 +00:00
Rainer Keller
a94438343b - Revert r20740
This commit was SVN r20741.

The following SVN revision numbers were found above:
  r20740 --> open-mpi/ompi@2a70618a77
2009-03-05 21:50:47 +00:00
Rainer Keller
2a70618a77 - Second patch, as discussed in Louisville.
Replace short macros in orte/util/name_fns.h
   to the actual fct. call.

 - Compiles on linux/x86-64

This commit was SVN r20740.
2009-03-05 21:14:18 +00:00
Rainer Keller
781caee0b6 - First of two or three patches, in orte/util/proc_info.h:
Adapt orte_process_info to orte_proc_info, and
   change orte_proc_info() to orte_proc_info_init().
 - Compiled on linux-x86-64
 - Discussed with Ralph

This commit was SVN r20739.
2009-03-05 20:36:44 +00:00
Rainer Keller
fd28b392bf - An intrusive commit yet again (sorry): with the separation we
get bitten by header depending on having already included
   the corresponding [opal|orte|ompi]_config.h header.
   When separating, things like [OPAL|ORTE|OMPI]_DECLSPEC
   are missed.

   Script to add the corresponding header in front of all following
   (taking care of possible #ifdef HAVE_...)

 - Including some minor cleanups to
   - ompi/group/group.h -- include _after_ #ifndef OMPI_GROUP_H
   - ompi/mca/btl/btl.h -- nclude _after_ #ifndef MCA_BTL_H
   - ompi/mca/crcp/bkmrk/crcp_bkmrk_btl.c -- still no need for
     orte/util/output.h
   - ompi/mca/pml/dr/pml_dr_recvreq.c -- no need for mpool.h
   - ompi/mca/btl/btl.h -- reorder to fit
   - ompi/mca/bml/bml.h -- reorder to fit
   - ompi/runtime/ompi_mpi_finalize.c -- reorder to fit
   - ompi/request/request.h -- additionally need ompi/constants.h

 - Tested on linux/x86-64

This commit was SVN r20720.
2009-03-04 15:35:54 +00:00
Rainer Keller
84408f2fb7 - A follow-up to the commit:
As orte/mca/routed/base/base.h does not require opal_bitmap.h
   Include it in the C-files, based on routed/base/base.h...

This commit was SVN r20709.
2009-03-03 22:36:58 +00:00
George Bosilca
af9c2e10a3 Really cycle when we have several IP addresses.
This commit was SVN r20705.
2009-03-03 19:29:03 +00:00
Ralph Castain
f11931306a Modify the accounting system to recycle jobids. Properly recover resources from nodes and jobs upon completion. Adjustments in several places were required to deal with sparsely populated job, node, and proc arrays as a result of this change.
Correct an error wrt how jobids were being computed. Needed to ensure that the job family field was not overrun as we increment jobids for comm_spawn.

Update the slurm plm module so it uses the new slurm termination procedure (brings trunk back into alignment with 1.3 branch).

Update the slurmd ess component so it doesn't get selected if we are running a singleton inside of a slurm allocation.

Cleanup HNP init by moving some code that had been in orte_globals.c for historical reasons into the ess hnp module, and removing the call to that code from the ess_base_std_prolog


NOTE: this change allows orte to support an infinite aggregate number of comm_spawn's, with up to 64k being alive at any one instant. HOWEVER, the MPI layer currently does -not- support re-use of jobids. I did some prototype coding to revise the ompi_proc_t structures, but the BTLs are caching their own data, and there was no readily apparent way to update it. Thus, attempts to spawn more than the 64k limit will abort to avoid causing the MPI layer to hang.

This commit was SVN r20700.
2009-03-03 16:39:13 +00:00
Ralph Castain
c7fda41d2a Only remove children from the local child list when the job completes so we update the status on all procs in the job and can properly terminate the job.
Correct an error in a debugging output

This commit was SVN r20669.
2009-03-01 20:12:20 +00:00
Ralph Castain
15171e4ba8 Remove completed children from the local list of child processes so that we properly track our number of children. Otherwise, we can artificially believe we have exceeded system limits on the number of local children.
This commit was SVN r20667.
2009-03-01 15:31:27 +00:00
Ralph Castain
c2ff8dc5ce Fix notifier base functions to match revised notifier.h framework APIs
This commit was SVN r20663.
2009-02-28 23:46:18 +00:00
Ralph Castain
11979c100a Silence pointless compiler warning
This commit was SVN r20661.
2009-02-28 15:35:48 +00:00
Tim Mattox
57be80c983 First pass at integrating the CIFTS/FTB support as
a notifier module.
The Notifier framework was extended slightly to
convey more information about each event notice.
This works with the FTB v0.5 API.

To compile with FTB support, use --with-ftb=/path/to/ftb/install

CIFTS == Coordinated Infrastructure for Fault Tolerant Systems
FTB == Fault Tolerance Backplane
see http://wiki.mcs.anl.gov/cifts/index.php

This commit was SVN r20655.
2009-02-27 22:53:43 +00:00
Ralph Castain
b8ffa302da Separate abnormal job termination from abnormal orted termination so we can continue to use xcast for orted cmds, but can know to turn off reading of stdin as the job is being terminated.
This commit was SVN r20650.
2009-02-27 10:16:25 +00:00
Ralph Castain
4f75f6e443 Fix a bug where we were not stopping the read event on stdin if the write to stdin of the target process was backing up.
Ensure we stop reading stdin if we are abnormally terminating - no point in doing so since the job is being terminated.

This commit was SVN r20649.
2009-02-27 09:31:34 +00:00