Commit graph

12883 commits

Author SHA1 Message Date
Ralph Castain
f0af389910 Enable comm_spawn of slave processes, currently only active for the rsh, slurm, and tm environments. Establish support for local rsh environments in the plm/base so that rsh of local slaves can be done by any environment that supports it. Create new orte_rsh_agent param so users can specify rsh agent from outside of rsh plm, and sym link that to the old plm_rsh_agent and pls_rsh_agent options.
Modify the orte-bootproxy to pass prefix for the remote slave to support hetero/hybrid scenarios

This commit was SVN r20492.
2009-02-09 20:44:44 +00:00
Ralph Castain
631d7d2a85 Handle cases where daemon uri has quote marks around it
This commit was SVN r20491.
2009-02-09 20:40:17 +00:00
Ralph Castain
890eb9c0ce Init variable
This commit was SVN r20490.
2009-02-09 20:39:48 +00:00
Ralph Castain
4286b7adb9 Deal with unknown return address for ompi-top option
This commit was SVN r20489.
2009-02-09 20:39:05 +00:00
Ralph Castain
cab5095ce8 Init variable
This commit was SVN r20488.
2009-02-09 20:38:15 +00:00
Ralph Castain
5e0be012e0 Minor adjustments to lanl platform files, addition of slave configuration
This commit was SVN r20487.
2009-02-09 20:37:30 +00:00
Rainer Keller
43d2094908 - svn propedit svn:ignore the Makefile.in
This commit was SVN r20483.
2009-02-09 14:57:49 +00:00
Ralph Castain
eaa57e29b6 Revert r20480 as this breaks the trunk. The dpm.h include file has defines for OMPI_RML tags that are required for wireup.
This commit was SVN r20482.

The following SVN revision numbers were found above:
  r20480 --> open-mpi/ompi@62282fefe5
2009-02-09 14:14:45 +00:00
Rainer Keller
62282fefe5 - Get rid of #include "ompi/mca/dpm/dpm.h"
This commit was SVN r20480.
2009-02-09 02:56:10 +00:00
Rainer Keller
64aa384745 - Fix comment to reflect current code.
This commit was SVN r20479.
2009-02-09 00:52:57 +00:00
Ralph Castain
5bfd1f3fd0 Ensure we have a correct, non-zero exit status when daemons or procs abort or fail to launch
This commit was SVN r20478.
2009-02-07 00:57:17 +00:00
Ralph Castain
8924e00e4c Ensure we don't segfault if we don't know which proc failed
This commit was SVN r20474.
2009-02-06 20:04:36 +00:00
Jeff Squyres
f68d2b00d8 Fix one more place where the old name was left over.
This commit was SVN r20473.
2009-02-06 19:21:50 +00:00
Terry Dontje
64ace9ec12 convert bzero calls to memset to remove warnings.
This commit was SVN r20471.
2009-02-06 19:08:22 +00:00
Ralph Castain
0750103d6c Teach the routed modules that local slave processes are direct routes to/from their master daemon.
This commit was SVN r20467.
2009-02-06 15:41:53 +00:00
Ralph Castain
13749673ed Enable spawn of local slave processes - plm module implementation to follow
This commit was SVN r20466.
2009-02-06 15:31:33 +00:00
Ralph Castain
f8cd188367 Make the orte_pmap_t an object so it can be properly initialized. Adjust the construct function to properly indicate invalid node/local ranks
This commit was SVN r20465.
2009-02-06 15:29:33 +00:00
Ralph Castain
e2a8f45fba Update the nidmap functions to include a new lookup_jmap entry, and to initialize the nidmap and pidmap for startup.
Have the singleton ess module use the new capability.

Adjust a comment in ess_base_put

This commit was SVN r20464.
2009-02-06 15:28:32 +00:00
Ralph Castain
c5b637418b Ensure that the various grpcomm modules use a common data set and packing order for modex operations so that jobs using different grpcomm modules can still perform connect/accept.
Have dynamic grpcomm operations update the nidmap/pidmap to support additional features.

This commit was SVN r20463.
2009-02-06 15:25:06 +00:00
Jeff Squyres
aae930e58b s/__n/converted_n/ -- according to C99, symbols that begin with "__"
are the domain of the compiler.

This commit was SVN r20462.
2009-02-06 01:04:50 +00:00
Jeff Squyres
dfb2d92b37 s/ID/id/ - both work, but if I don't make this change, I'll wonder if
we remembered to use strcasecmp() every time I see this entry in the
file... (we did, but I just don't want to have to keep remembering
that ;-) )

This commit was SVN r20461.
2009-02-06 01:02:25 +00:00
Ralph Castain
a6f9c1f2b1 Allocate the slots for use in the xgrid plm
This commit was SVN r20460.
2009-02-06 00:55:14 +00:00
Jeff Squyres
656d8578d0 * Rename (new) MCA parameter to
btl_openib_connect_rdmacm_reject_causes_connect_error (yes, it's
   still long -- on purpose :-) )
 * Add INI file parameter rdmacm_reject_causes_connect_error
 * Now only treat CONNECT_ERROR events as a REJECT if:
   * It's on a connection where we were expecting a REJECT, ''and''
   * The MCA parameter is true ''or'' the INI parameter for this
     device is true
 * Set the INI parameter to true for the NE020

This commit was SVN r20459.
2009-02-06 00:51:04 +00:00
Jeff Squyres
ffc5d8877f Fix a problem where we're accidentally initializing the wrong
errhandler (should be initializing _errors_throw_exceptions, not
_are_fatal).  This bug was not a huge tragedy because the only real
problem is that _are_fatal has the wrong string name with it (because
MPI::Init fixes up the _errors_throw_exceptions later).

This commit was SVN r20458.
2009-02-05 21:36:10 +00:00
Jeff Squyres
50b1fd1392 Per the big discussion on the OpenFabrics list a while ago, some
versions of the NE driver will report the OUI while others will report
the PCI ID.  We'll put in the Intel values when we get them (may not
be for a few more weeks).

This commit was SVN r20457.
2009-02-05 21:19:45 +00:00
Jeff Squyres
66d0a02f90 Fix a problem for some iWARP drivers that don't handle RDMA CM REJECT
properly at all.  NetEffect's current driver (OFED 1.4.0) will return
a CONNECT_ERROR event to the initiator rather than the REJECTED event.
Doh!  Additionally -- unfortunately -- NetEffect's vendor_id and
vendor_part_id are reported as 0 in OFED 1.4.0, so we can't
automatically detect these cards and work around the problem.  So all
we can do is add a new MCA parameter
(btl_openib_connect_rdmacm_ignore_connect_errors -- yes, it's long on
purpose ;-) ) that says that if we get a CONNECT_ERROR, basically
treat it exactly as a REJECT for the WRONG_DIRECTION reason (which is
a "good" reject).  This allows OMPI to function with NetEffect/Intel
cards on OFED 1.4.0.

Note that NetEffect has been bought by Intel; I'm waiting for
information from them to update the ini file for their new OUI/PCI
ID's and/or new vendor_part_id values.

This commit was SVN r20454.
2009-02-05 18:45:59 +00:00
Shiqing Fan
8086bb1a1b SIGSTOP and SIGTSTP are not supported on Windows. But they have to be defined anyway, although they are not used for Windows.
This commit was SVN r20453.
2009-02-05 17:02:34 +00:00
Jeff Squyres
08c35ca135 Somehow this mca param registration code got duplicated; remove one of
them

This commit was SVN r20452.
2009-02-05 16:52:30 +00:00
George Bosilca
36d496066b Correctly deal with the whole array.
This commit was SVN r20451.
2009-02-05 16:44:43 +00:00
Shiqing Fan
ff7ca43dd1 Update two configuration files for windows build.
This commit was SVN r20450.
2009-02-05 16:39:40 +00:00
Shiqing Fan
a20254c8a5 A few type casts, making the MS compiler silent.
This commit was SVN r20449.
2009-02-05 16:37:44 +00:00
Shiqing Fan
7d2d6b16b1 A fix for windows mainly, adding BEGIN/END_C_DECLS pairs.
This commit was SVN r20448.
2009-02-05 16:35:58 +00:00
George Bosilca
2c00133fdc Silence a possible casting warning.
This commit was SVN r20447.
2009-02-05 16:18:39 +00:00
Jeff Squyres
90c28810f4 Fix CID 1122: comm->c_name is a char array (not a pointer), so
comparing it to NULL is not useful.

This commit was SVN r20444.
2009-02-05 15:31:10 +00:00
Jeff Squyres
67a5374a61 Re CID 1180: Actually, it would be better to also print something in
the case of an error, too...

This commit was SVN r20443.
2009-02-05 15:26:44 +00:00
Jeff Squyres
598e530de9 Fix CID 1180: ensure to check the output from snprintf, since we pass
it to write().

This commit was SVN r20442.
2009-02-05 15:24:48 +00:00
George Bosilca
ee6ff2372e Fix the compilation for Windows.
This commit was SVN r20441.
2009-02-05 13:55:26 +00:00
Jeff Squyres
eaeed0402c Can just use the built-in _SCRIPTS suffix.
This commit was SVN r20440.
2009-02-05 12:21:56 +00:00
Lenny Verkhovsky
5ae9fd9865 fixing r20436
This commit was SVN r20439.

The following SVN revision numbers were found above:
  r20436 --> open-mpi/ompi@0d447511a5
2009-02-05 09:45:34 +00:00
Ralph Castain
6292b797e9 Add a new ESS module for use by local slave processes - only active when specifically selected
This commit was SVN r20438.
2009-02-05 06:07:48 +00:00
Ralph Castain
b7e6bafada Add a new routed module for local slave processes to use - only active when specifically selected
This commit was SVN r20437.
2009-02-05 06:07:04 +00:00
Ralph Castain
0d447511a5 Add a shellscript for daemon-less launch of local slave processes. No manpage as this is totally for internal use only.
This commit was SVN r20436.
2009-02-05 06:05:28 +00:00
Jeff Squyres
73ea7a9aa5 Fix CIDs 1211, 1212, 1214: fix error checking in MPI_REDUCE_LOCAL.
This commit was SVN r20435.
2009-02-05 02:18:03 +00:00
Jeff Squyres
a58d0d1a27 Fix CID 1219: ensure that "found" is initialized.
This commit was SVN r20434.
2009-02-05 01:57:20 +00:00
George Bosilca
4804ee60a7 It barely compiles ...
This commit was SVN r20433.
2009-02-05 00:14:28 +00:00
Jeff Squyres
7a3b011f45 Really fix the quoting this time. Really.
This commit was SVN r20430.
2009-02-04 23:04:21 +00:00
Ralph Castain
df3446faf1 Procs don't need to check for other job families to update routes - now that the direct routing module is gone, they always route through their daemons anyway, so save a couple of unnecessary steps.
This commit was SVN r20429.
2009-02-04 22:49:57 +00:00
Ralph Castain
dbba261451 Commit missing change in #define so r20427 doesn't break trunk
This commit was SVN r20428.

The following SVN revision numbers were found above:
  r20427 --> open-mpi/ompi@b100513022
2009-02-04 22:37:24 +00:00
Ralph Castain
b100513022 Add a few new MPI_Info options to the dpm - documentation to follow.
Fix a mistake in the dpm that hardcoded the update of routes to the HNP. This needs to be done by the individual routing modules so they can take whatever action is required - which will usually include updating the HNP, but might not...and might include additional steps. New routing modules are coming that violated this assumption, so it had to be moved back into init_routes.

All current routed modules know what to do - anyone with routed modules not in the current trunk may need to adjust them (see any of the current routed modules for examples of what to do).

This commit was SVN r20427.
2009-02-04 22:30:23 +00:00
Ralph Castain
e694c0dac6 Get the various grpcomm modules to all inter-operate cleanly with the "hier" module
This commit was SVN r20426.
2009-02-04 22:26:35 +00:00