2277 Commits

Author SHA1 Message Date
Ralph Castain
8b0a470543 Continue work to cleanup user options for slave launch
This commit was SVN r21003.
2009-04-14 20:05:51 +00:00
Ralph Castain
9fd834268c Ensure we exit with a non-zero status if terminated by user signal
This commit was SVN r20991.
2009-04-14 15:58:54 +00:00
Ralph Castain
a952dca062 Fix a bug where we created the correct path to the file, but didn't use it
This commit was SVN r20990.
2009-04-14 14:17:43 +00:00
Ralph Castain
9c39a3edd7 Enable the passing of MCA params to dynamically spawned jobs. This creates a new info_key "ompi_param" that allows a user to specify MCA params for a dynamically spawned job.
We currently apply all of the MCA params in the parent job to the child. This commit allows a user to specify additional params for the child job, and to override any pre-existing params with the new value so they can better control behavior of the child job.

This commit was SVN r20989.
2009-04-14 14:15:49 +00:00
Ralph Castain
9f7c605166 More cleanup of pointer array usage
This commit was SVN r20981.
2009-04-13 19:06:54 +00:00
Shiqing Fan
1b97fe90fd Type casts mainly for Windows.
This commit was SVN r20967.
2009-04-09 13:34:55 +00:00
Ralph Castain
2c4e7bd5a2 Remove unused var
This commit was SVN r20966.
2009-04-09 13:18:18 +00:00
Ralph Castain
b4df8bcf85 Missed comment...
This commit was SVN r20964.
2009-04-09 03:00:57 +00:00
Ralph Castain
e9bc000f63 Correctly account for holes in the job map due to cleanup as jobs terminate
This commit was SVN r20963.
2009-04-09 02:59:23 +00:00
Ralph Castain
9c2f17eb01 Cleanup the nidmap lookup functions and add some comments explaining how we handle the nid, job, and pmap arrays. This fixes a problem when we have less-than-full participation in a comm_spawn, causing holes to exist in the pmap array.
Update the slave spawn tests to properly indicate participation as being solely MPI_COMM_SELF.

This commit was SVN r20961.
2009-04-09 02:48:33 +00:00
Brian Barrett
7d0c6b68dc allow trunk to compile on red storm
This commit was SVN r20960.
2009-04-08 20:53:54 +00:00
Rainer Keller
9b7ab92de9 - Per mail and diff from Ken Matney:
Allow multiple retries to open file as well, for ALPS to supply the file.

This commit was SVN r20932.
2009-04-02 17:46:08 +00:00
Shiqing Fan
7a7c4bcb4b fix a type cast for windows.
This commit was SVN r20928.
2009-04-02 08:45:48 +00:00
Terry Dontje
4b43911c6a Remove superfluous spaces in manpages that were causing catman to
generate mangled windex files.  Made ompi-top.1 and ompi-iof.1 build
by default.  Also, added the orte-top synonym to the ompi-top manpage.

This commit was SVN r20915.
2009-04-01 14:40:27 +00:00
George Bosilca
7ed4e4f9e8 Using absolute addresses leads to getting the data from strange places
if the opal_buffer_t gets reallocated (and it does). As in all cases
the data in the beginning of the buffer is the one we need, using
relative addresses fixes the problem.

This commit was SVN r20904.
2009-03-31 16:23:27 +00:00
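
The relative-address idea in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual patch: byte_buffer_t, buffer_append, and buffer_at are hypothetical names standing in for opal_buffer_t and its helpers. The point is that callers keep an offset and resolve it against the current base only at the moment of use, so a move caused by realloc cannot leave them with a stale address.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; a stand-in, not opal_buffer_t. */
typedef struct {
    char  *base;      /* start of storage; may move when the buffer grows */
    size_t used;      /* bytes written so far */
    size_t capacity;  /* bytes allocated */
} byte_buffer_t;

/* Append len bytes and return the OFFSET of the copy, or (size_t)-1 on failure. */
static size_t buffer_append(byte_buffer_t *buf, const void *src, size_t len)
{
    assert(buf != NULL && src != NULL);
    if (buf->used + len > buf->capacity) {
        size_t new_cap = buf->capacity ? buf->capacity : 16;
        while (new_cap < buf->used + len) {
            new_cap *= 2;
        }
        char *moved = realloc(buf->base, new_cap);  /* base may change here */
        if (moved == NULL) {
            return (size_t)-1;
        }
        buf->base = moved;
        buf->capacity = new_cap;
    }
    size_t offset = buf->used;
    memcpy(buf->base + offset, src, len);
    buf->used += len;
    return offset;
}

/* Resolve an offset against the current base at the moment of use. */
static char *buffer_at(const byte_buffer_t *buf, size_t offset)
{
    assert(buf != NULL && offset < buf->used);
    return buf->base + offset;
}
```

A caller that saved the result of buffer_at() across later appends would reintroduce the bug; saving the offset returned by buffer_append() does not.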
Aurelien Bouteiller
ccc8aa5784 Fix a segfault caused by making copies of the pointer to an array that is realloced meanwhile. The base pointer can change its address while the copy still tries to access pages that are not ours anymore.
As a safeguard, well-written code should never directly access opal_pointer_array_t->addr or opal_value_array_t->bytes_array. I found another instance of the same bug somewhere else and will commit a separate patch for it.

This commit fixes ticket #1858 and solves user case http://www.open-mpi.org/community/lists/devel/2009/03/5731.php .

Aurelien

This commit was SVN r20903.
2009-03-31 15:41:55 +00:00
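
The safeguard recommended in the commit above, going through accessors instead of holding a copy of the storage pointer, might look like the sketch below. ptr_array_t, ptr_array_set, and ptr_array_get are hypothetical names used only for illustration; they are not the opal_pointer_array_t API. Every read goes through the struct, so it always sees the base address that is current after any growth.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable pointer array; a stand-in, not opal_pointer_array_t. */
typedef struct {
    void **addr;  /* storage; may move whenever the array grows */
    size_t size;  /* number of slots allocated */
} ptr_array_t;

/* Store item at index, growing the storage if needed. Returns 0 on success. */
static int ptr_array_set(ptr_array_t *arr, size_t index, void *item)
{
    assert(arr != NULL);
    if (index >= arr->size) {
        size_t new_size = index + 1;
        void **moved = realloc(arr->addr, new_size * sizeof(void *));
        if (moved == NULL) {
            return -1;
        }
        /* New slots start out empty so unused indices read back as NULL. */
        for (size_t i = arr->size; i < new_size; i++) {
            moved[i] = NULL;
        }
        arr->addr = moved;
        arr->size = new_size;
    }
    arr->addr[index] = item;
    return 0;
}

/* Read through the struct on every call, so a moved base is always seen. */
static void *ptr_array_get(const ptr_array_t *arr, size_t index)
{
    assert(arr != NULL);
    return index < arr->size ? arr->addr[index] : NULL;
}
```

The failure described in the commit corresponds to saving arr->addr in a local variable, calling something that grows the array, and then indexing through the saved copy.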
Jeff Squyres
49b60029e6 * Set svn:ignore
* Fix filename in Makefile.am

This commit was SVN r20868.
2009-03-25 13:32:55 +00:00
Shiqing Fan
36a813415d When building from a tarball, there will be Linux-generated files that cannot be used on Windows, so exclude them and use the ones generated by CMake.
This commit was SVN r20858.
2009-03-24 18:10:57 +00:00
Ralph Castain
d0b50a2b9b Eliminate an annoying "not found" message when a job abnormally terminates. In this case, we can get a race condition where the job object has been removed, but updates are continuing to flow into the system.
This commit was SVN r20857.
2009-03-24 18:06:49 +00:00
Ralph Castain
9564ab8cc3 Resolve a user-reported problem where the ifislocal check would return false, but the node name is the same as the HNP's node name because we are on the same node.
This commit was SVN r20856.
2009-03-24 15:23:22 +00:00
George Bosilca
faca1aeeb9 Support for password structures without a specified shell or a NULL/empty shell.
This commit was SVN r20853.
2009-03-24 13:26:57 +00:00
Rainer Keller
be66cc2279 - We're using uint16_t, uint32_t, and friends,
so #include <stdint.h> if we have it...

This commit was SVN r20835.
2009-03-21 01:26:27 +00:00
Rainer Keller
bff1b2a22b - Finally add the missing opal/util/output.h
for the OPAL_OUTPUT_VERBOSE macro.
 - ompi/errhandler/errhandler_predefined.h:
   Well, just the missing fwd declarations...

This commit was SVN r20820.
2009-03-17 22:37:15 +00:00
Rainer Keller
64dcd85ba1 - This one was missing
This commit was SVN r20818.
2009-03-17 22:02:51 +00:00
Rainer Keller
6f808d9b05 Preparation work for another commit (after RFC):
- This patch solely _adds_ required headers and is rather localized
   The next patch (after RFC) heavily removes headers (based on script)
 - ompi/communicator/communicator.h: For sources that use
   ompi_mpi_comm_world, don't require them to include "mpi.h"
 - ompi/debuggers/ompi_common_dll.c: mca_topo_base_comm_1_0_0_t needs
   #include "ompi/mca/topo/topo.h"
 - ompi/errhandler/errhandler_predefined.h:
   ompi/communicator/communicator.h depends on this header file!
   To prevent recursion just have fwd declarations.
   #include "ompi/types.h" for fwd declarations of the main structs.
 - ompi/mca/btl/btl.h: #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/mpool/base/mpool_base_tree.c: We use ompi_free_list_t and
   ompi_rb_tree_t, so have the proper classes
 - ompi/mca/op/op.h:
   Op is pretty self-contained: Nobody up to now has done
   #include "opal/class/opal_object.h"
 - ompi/mca/osc/pt2pt/osc_pt2pt_replyreq.h:
   #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/pml/base/base.h:
   We use opal_lists  
 - ompi/mca/pml/dr/pml_dr_vfrag.h:
   #include "opal/types.h" for ompi_ptr_t
 - ompi/mca/pml/ob1/pml_ob1_hdr.h:
   #include "ompi/mca/btl/btl.h" for mca_btl_base_segment_t
 - opal/dss/dss_unpack.c:
   #include "opal/types.h"
 - opal/mca/base/base.h:
   #include "opal/util/cmd_line.h" for opal_cmd_line_t
 - orte/mca/oob/tcp/oob_tcp.c:
   #include "opal/types.h" for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp.h:
   #include "opal/threads/threads.h" for opal_thread_t
 - orte/mca/oob/tcp/oob_tcp_msg.c:
   #include "opal/types.h" 
 - orte/mca/oob/tcp/oob_tcp_peer.c:
   #include "opal/types.h"  for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp_send.c:
   #include "opal/types.h" 
 - orte/mca/plm/base/plm_base_proxy.c:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT
 - orte/mca/rml/base/rml_base_receive.c:
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/mca/rml/oob/rml_oob_recv.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/mca/rml/oob/rml_oob_send.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/runtime/orte_data_server.c
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/runtime/orte_globals.h:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT

 Tested on Linux/x86-64

This commit was SVN r20817.
2009-03-17 21:34:30 +00:00
Ralph Castain
4af623076d Add a test for hanging in a loop over mpi_reduce
This commit was SVN r20798.
2009-03-17 13:57:23 +00:00
Jeff Squyres
b5c38f74b0 Always tie the child stdin to /dev/null.
This commit was SVN r20796.
2009-03-17 03:17:50 +00:00
Rainer Keller
6a72c0f4d1 - As long as a header declares _DECLSPEC functionality
it should include the corresponding _config.h header file.

   Tested on Linux/x86-64

This commit was SVN r20795.
2009-03-17 01:45:19 +00:00
Jeff Squyres
27bacbee3c Per discussion on #1833, we should ''always'' dup /dev/null into ssh's
stdin.

This commit was SVN r20790.
2009-03-16 19:12:48 +00:00
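
The approach named in the commit above, dup'ing /dev/null onto the child's stdin, can be sketched with plain POSIX calls. This is an illustration only, not the code from the commit: tie_stdin_to_devnull and child_sees_eof_on_stdin are hypothetical helpers, and the child here merely probes its stdin where a real launcher would exec ssh.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Point the calling process's stdin at /dev/null. Returns 0 on success. */
static int tie_stdin_to_devnull(void)
{
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    if (fd != STDIN_FILENO) {
        if (dup2(fd, STDIN_FILENO) < 0) {
            close(fd);
            return -1;
        }
        close(fd);  /* stdin now refers to /dev/null; drop the extra descriptor */
    }
    return 0;
}

/*
 * Fork a child whose stdin is tied to /dev/null and report what it observed.
 * Returns 1 if the child saw immediate end-of-file, 0 if it read data,
 * and -1 on any error.
 */
static int child_sees_eof_on_stdin(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        char byte;
        if (tie_stdin_to_devnull() != 0) {
            _exit(2);
        }
        /* A real launcher would exec here; this sketch only probes stdin. */
        ssize_t n = read(STDIN_FILENO, &byte, 1);
        _exit(n == 0 ? 0 : (n > 0 ? 1 : 2));
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 1:  return 0;
    default: return -1;
    }
}
```

Because the child's stdin is /dev/null, it cannot consume input that the parent intends to forward elsewhere.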
George Bosilca
02bee12de8 Small cleanup.
This commit was SVN r20781.
2009-03-14 23:11:24 +00:00
Brian Barrett
716b505789 Fix for #1832. The stdin for the forked ssh wasn't being closed, so it was
"eating" all the stdin, leaving nothing for mpirun to forward on to the child
process

This commit was SVN r20776.
2009-03-13 22:34:56 +00:00
Rainer Keller
d8cf4c0fec - Get pgcc on XT to complain less:
In case we use memcmp, strlen, strdup and friends include <string.h>
   Also several constants.h are not included directly
 - Let's have mca_topo_base_cart_create  return ompi-errors in
   ompi/mca/topo/base/topo_base_cart_create.c

This commit was SVN r20773.
2009-03-13 02:10:32 +00:00
Rainer Keller
0b59a59129 - Rather have 0xff instead of 0Xff...
This commit was SVN r20769.
2009-03-12 22:17:42 +00:00
Terry Dontje
d97e398b49 Clarify the error message for out of pipes condition.
This commit was SVN r20762.
2009-03-11 17:48:46 +00:00
Rolf vandeVaart
4d7071dadb Add some explanation of the -x flag as we have
gotten questions on its use.

This commit was SVN r20760.
2009-03-11 13:25:03 +00:00
Rolf vandeVaart
2b365d7d90 Fix so it builds on Solaris.
This commit was SVN r20758.
2009-03-10 18:38:42 +00:00
Jeff Squyres
4e53885f73 Fix a compiler warning and ensure that "sent" is initialized to 0.
This commit was SVN r20756.
2009-03-09 15:37:04 +00:00
Jeff Squyres
8b5e6c0425 Because I could. :-)
Relevant MCA params:

 * notifier_twitter_username: Twitter username
 * notifier_twitter_password: Twitter password

This commit was SVN r20750.
2009-03-06 22:02:17 +00:00
Jeff Squyres
2373bc36e2 Add the "smtp" notifier component. It uses libesmtp
(http://www.stafford.uklinux.net/libesmtp/) via the --with-esmtp(=DIR)
configure option.  Several MCA parameters must be set in order to use
this component:

 * notifier_smtp_server: SMTP server IP address or name; must be supplied
 * notifier_smtp_port: port to talk to on the server; defaults to 25
 * notifier_smtp_to: comma-delimited list of email addresses to send
   the mail to; must be supplied
 * notifier_smtp_from_name: free-form "name" who the mail is from;
   defaults to "Open MPI Notifier"
 * notifier_smtp_from_addr: email address the mail is from; must
   be supplied
 * notifier_smtp_subject: subject of the mail; defaults to "Open MPI
   notifier"
 * notifier_smtp_body_prefix: prefix of the body of the mail; defaults
   to a sensible value
 * notifier_smtp_body_suffix: suffix of the body of the mail; defaults
   to a sensible value

Although libesmtp supports SMTP AUTH protocols, this component does not.
If people want/need those kinds of features, they're relatively easy
to add -- I just didn't bother [yet] before I knew if anyone cared.

This commit was SVN r20749.
2009-03-06 21:59:19 +00:00
Jeff Squyres
c17616c332 Change the ordering slightly; don't save anything until we know all
went well.

This commit was SVN r20748.
2009-03-06 21:49:38 +00:00
Shiqing Fan
ddc82f3831 Correct the output global variables.
This commit was SVN r20745.
2009-03-06 15:31:12 +00:00
Shiqing Fan
a8cb7d2ab1 Fix an array overflow, detected by compiling with C++ compilers. This fix is mainly for the Windows build.
This commit was SVN r20744.
2009-03-06 13:27:38 +00:00
Rainer Keller
ec0ed48718 - Revert r20739
This commit was SVN r20742.

The following SVN revision numbers were found above:
  r20739 --> open-mpi/ompi@781caee0b6
2009-03-05 21:56:03 +00:00
Rainer Keller
a94438343b - Revert r20740
This commit was SVN r20741.

The following SVN revision numbers were found above:
  r20740 --> open-mpi/ompi@2a70618a77
2009-03-05 21:50:47 +00:00
Rainer Keller
2a70618a77 - Second patch, as discussed in Louisville.
Replace short macros in orte/util/name_fns.h
   with the actual function call.

 - Compiles on linux/x86-64

This commit was SVN r20740.
2009-03-05 21:14:18 +00:00
Rainer Keller
781caee0b6 - First of two or three patches, in orte/util/proc_info.h:
Adapt orte_process_info to orte_proc_info, and
   change orte_proc_info() to orte_proc_info_init().
 - Compiled on linux-x86-64
 - Discussed with Ralph

This commit was SVN r20739.
2009-03-05 20:36:44 +00:00
Rainer Keller
fd28b392bf - An intrusive commit yet again (sorry): with the separation we
get bitten by headers depending on having already included
   the corresponding [opal|orte|ompi]_config.h header.
   When separating, things like [OPAL|ORTE|OMPI]_DECLSPEC
   are missed.

   Script to add the corresponding header in front of all following
   (taking care of possible #ifdef HAVE_...)

 - Including some minor cleanups to
   - ompi/group/group.h -- include _after_ #ifndef OMPI_GROUP_H
   - ompi/mca/btl/btl.h -- include _after_ #ifndef MCA_BTL_H
   - ompi/mca/crcp/bkmrk/crcp_bkmrk_btl.c -- still no need for
     orte/util/output.h
   - ompi/mca/pml/dr/pml_dr_recvreq.c -- no need for mpool.h
   - ompi/mca/btl/btl.h -- reorder to fit
   - ompi/mca/bml/bml.h -- reorder to fit
   - ompi/runtime/ompi_mpi_finalize.c -- reorder to fit
   - ompi/request/request.h -- additionally need ompi/constants.h

 - Tested on linux/x86-64

This commit was SVN r20720.
2009-03-04 15:35:54 +00:00
Rainer Keller
84408f2fb7 - A follow-up to the commit:
As orte/mca/routed/base/base.h does not require opal_bitmap.h
   Include it in the C-files, based on routed/base/base.h...

This commit was SVN r20709.
2009-03-03 22:36:58 +00:00
George Bosilca
af9c2e10a3 Really cycle when we have several IP addresses.
This commit was SVN r20705.
2009-03-03 19:29:03 +00:00
Ralph Castain
f11931306a Modify the accounting system to recycle jobids. Properly recover resources from nodes and jobs upon completion. Adjustments in several places were required to deal with sparsely populated job, node, and proc arrays as a result of this change.
Correct an error wrt how jobids were being computed. Needed to ensure that the job family field was not overrun as we increment jobids for comm_spawn.

Update the slurm plm module so it uses the new slurm termination procedure (brings trunk back into alignment with 1.3 branch).

Update the slurmd ess component so it doesn't get selected if we are running a singleton inside of a slurm allocation.

Cleanup HNP init by moving some code that had been in orte_globals.c for historical reasons into the ess hnp module, and removing the call to that code from the ess_base_std_prolog


NOTE: this change allows orte to support an infinite aggregate number of comm_spawn's, with up to 64k being alive at any one instant. HOWEVER, the MPI layer currently does -not- support re-use of jobids. I did some prototype coding to revise the ompi_proc_t structures, but the BTLs are caching their own data, and there was no readily apparent way to update it. Thus, attempts to spawn more than the 64k limit will abort to avoid causing the MPI layer to hang.

This commit was SVN r20700.
2009-03-03 16:39:13 +00:00