Ralph Castain
f0af389910
Enable comm_spawn of slave processes, currently only active for the rsh, slurm, and tm environments. Establish support for local rsh environments in the plm/base so that rsh of local slaves can be done by any environment that supports it. Create new orte_rsh_agent param so users can specify rsh agent from outside of rsh plm, and sym link that to the old plm_rsh_agent and pls_rsh_agent options.
...
Modify the orte-bootproxy to pass prefix for the remote slave to support hetero/hybrid scenarios
This commit was SVN r20492.
2009-02-09 20:44:44 +00:00
Ralph Castain
631d7d2a85
Handle cases where daemon uri has quote marks around it
...
This commit was SVN r20491.
2009-02-09 20:40:17 +00:00
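The log entry for r20491 does not say how the quoted daemon URI is handled. As a rough illustration only, one way to tolerate a quoted value is to strip a single pair of matching quote marks before parsing. The helper name `strip_quotes` is hypothetical and is not taken from the Open MPI tree.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption, not Open MPI code): remove one pair
 * of matching quote marks that wraps the whole string, editing in place. */
static char *strip_quotes(char *s)
{
    size_t len;

    if (s == NULL) {
        return NULL;
    }
    len = strlen(s);
    if (len >= 2 && (s[0] == '"' || s[0] == '\'') && s[len - 1] == s[0]) {
        memmove(s, s + 1, len - 2);  /* shift the contents over the opening quote */
        s[len - 2] = '\0';           /* terminate before the closing quote */
    }
    return s;
}
```

Strings without a matching pair of outer quotes are returned unchanged, so the helper is safe to call on every URI.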
Ralph Castain
890eb9c0ce
Init variable
...
This commit was SVN r20490.
2009-02-09 20:39:48 +00:00
Ralph Castain
4286b7adb9
Deal with unknown return address for ompi-top option
...
This commit was SVN r20489.
2009-02-09 20:39:05 +00:00
Ralph Castain
cab5095ce8
Init variable
...
This commit was SVN r20488.
2009-02-09 20:38:15 +00:00
Ralph Castain
5e0be012e0
Minor adjustments to lanl platform files, addition of slave configuration
...
This commit was SVN r20487.
2009-02-09 20:37:30 +00:00
Rainer Keller
43d2094908
- svn propedit svn:ignore the Makefile.in
...
This commit was SVN r20483.
2009-02-09 14:57:49 +00:00
Ralph Castain
eaa57e29b6
Revert r20480 as this breaks the trunk. The dpm.h include file has defines for OMPI_RML tags that are required for wireup.
...
This commit was SVN r20482.
The following SVN revision numbers were found above:
r20480 --> open-mpi/ompi@62282fefe5
2009-02-09 14:14:45 +00:00
Rainer Keller
62282fefe5
- Get rid of #include "ompi/mca/dpm/dpm.h"
...
This commit was SVN r20480.
2009-02-09 02:56:10 +00:00
Rainer Keller
64aa384745
- Fix comment to reflect current code.
...
This commit was SVN r20479.
2009-02-09 00:52:57 +00:00
Ralph Castain
5bfd1f3fd0
Ensure we have a correct, non-zero exit status when daemons or procs abort or fail to launch
...
This commit was SVN r20478.
2009-02-07 00:57:17 +00:00
Ralph Castain
8924e00e4c
Ensure we don't segfault if we don't know which proc failed
...
This commit was SVN r20474.
2009-02-06 20:04:36 +00:00
Jeff Squyres
f68d2b00d8
Fix one more place where the old name was left over.
...
This commit was SVN r20473.
2009-02-06 19:21:50 +00:00
Terry Dontje
64ace9ec12
convert bzero calls to memset to remove warnings.
...
This commit was SVN r20471.
2009-02-06 19:08:22 +00:00
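The kind of conversion described in r20471 can be sketched as follows. The struct and function names are made up for illustration; only the `bzero` to `memset` substitution comes from the log entry.

```c
#include <string.h>

/* Hypothetical record type, used only for this illustration. */
struct peer_record {
    int  rank;
    char host[32];
};

/* Before: bzero(rec, sizeof(*rec));  (legacy BSD call from <strings.h>) */
static void peer_record_clear(struct peer_record *rec)
{
    memset(rec, 0, sizeof(*rec));  /* ISO C equivalent: zero every byte */
}
```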
Ralph Castain
0750103d6c
Teach the routed modules that local slave processes are direct routes to/from their master daemon.
...
This commit was SVN r20467.
2009-02-06 15:41:53 +00:00
Ralph Castain
13749673ed
Enable spawn of local slave processes - plm module implementation to follow
...
This commit was SVN r20466.
2009-02-06 15:31:33 +00:00
Ralph Castain
f8cd188367
Make the orte_pmap_t an object so it can be properly initialized. Adjust the construct function to properly indicate invalid node/local ranks
...
This commit was SVN r20465.
2009-02-06 15:29:33 +00:00
Ralph Castain
e2a8f45fba
Update the nidmap functions to include a new lookup_jmap entry, and to initialize the nidmap and pidmap for startup.
...
Have the singleton ess module use the new capability.
Adjust a comment in ess_base_put
This commit was SVN r20464.
2009-02-06 15:28:32 +00:00
Ralph Castain
c5b637418b
Ensure that the various grpcomm modules use a common data set and packing order for modex operations so that jobs using different grpcomm modules can still perform connect/accept.
...
Have dynamic grpcomm operations update the nidmap/pidmap to support additional features.
This commit was SVN r20463.
2009-02-06 15:25:06 +00:00
Jeff Squyres
aae930e58b
s/__n/converted_n/ -- according to C99, symbols that begin with "__"
are the domain of the compiler.
This commit was SVN r20462.
2009-02-06 01:04:50 +00:00
Jeff Squyres
dfb2d92b37
s/ID/id/ - both work, but if I don't make this change, I'll wonder if
we remembered to use strcasecmp() every time I see this entry in the
file... (we did, but I just don't want to have to keep remembering
that ;-) )
This commit was SVN r20461.
2009-02-06 01:02:25 +00:00
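The remark in r20461 that both spellings work relies on case-insensitive key matching. A minimal sketch of such a comparison is below; the function name and the keys used with it are assumptions for illustration, since the log entry does not name the file entry involved.

```c
#include <stdbool.h>
#include <strings.h>  /* strcasecmp() is POSIX, not ISO C */

/* Hypothetical helper: true when the two keys match, ignoring case. */
static bool key_matches(const char *key, const char *expected)
{
    return strcasecmp(key, expected) == 0;
}
```

With this kind of comparison a key written as `vendor_ID` and one written as `vendor_id` are treated as the same entry.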
Ralph Castain
a6f9c1f2b1
Allocate the slots for use in the xgrid plm
...
This commit was SVN r20460.
2009-02-06 00:55:14 +00:00
Jeff Squyres
656d8578d0
* Rename (new) MCA parameter to
  btl_openib_connect_rdmacm_reject_causes_connect_error (yes, it's
  still long -- on purpose :-) )
* Add INI file parameter rdmacm_reject_causes_connect_error
* Now only treat CONNECT_ERROR events as a REJECT if:
  * It's on a connection where we were expecting a REJECT, ''and''
  * The MCA parameter is true ''or'' the INI parameter for this
    device is true
* Set the INI parameter to true for the NE020
This commit was SVN r20459.
2009-02-06 00:51:04 +00:00
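The rule stated in r20459 for when a CONNECT_ERROR is handled as a REJECT reduces to a single boolean expression. The sketch below uses hypothetical function and argument names; the real openib BTL code is not shown in this log.

```c
#include <stdbool.h>

/* Hypothetical predicate (names are assumptions):
 *   expecting_reject - a REJECT was expected on this connection
 *   mca_param_set    - btl_openib_connect_rdmacm_reject_causes_connect_error is true
 *   ini_param_set    - rdmacm_reject_causes_connect_error is true for this device */
static bool treat_connect_error_as_reject(bool expecting_reject,
                                          bool mca_param_set,
                                          bool ini_param_set)
{
    return expecting_reject && (mca_param_set || ini_param_set);
}
```

Neither parameter has any effect on a connection where no REJECT was expected.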
Jeff Squyres
ffc5d8877f
Fix a problem where we're accidentally initializing the wrong
errhandler (should be initializing _errors_throw_exceptions, not
_are_fatal). This bug was not a huge tragedy because the only real
problem is that _are_fatal has the wrong string name with it (because
MPI::Init fixes up the _errors_throw_exceptions later).
This commit was SVN r20458.
2009-02-05 21:36:10 +00:00
Jeff Squyres
50b1fd1392
Per the big discussion on the OpenFabrics list a while ago, some
versions of the NE driver will report the OUI while others will report
the PCI ID. We'll put in the Intel values when we get them (may not
be for a few more weeks).
This commit was SVN r20457.
2009-02-05 21:19:45 +00:00
Jeff Squyres
66d0a02f90
Fix a problem for some iWARP drivers that don't handle RDMA CM REJECT
properly at all. NetEffect's current driver (OFED 1.4.0) will return
a CONNECT_ERROR event to the initiator rather than the REJECTED event.
Doh! Additionally -- unfortunately -- NetEffect's vendor_id and
vendor_part_id are reported as 0 in OFED 1.4.0, so we can't
automatically detect these cards and work around the problem. So all
we can do is add a new MCA parameter
(btl_openib_connect_rdmacm_ignore_connect_errors -- yes, it's long on
purpose ;-) ) that says that if we get a CONNECT_ERROR, basically
treat it exactly as a REJECT for the WRONG_DIRECTION reason (which is
a "good" reject). This allows OMPI to function with NetEffect/Intel
cards on OFED 1.4.0.
Note that NetEffect has been bought by Intel; I'm waiting for
information from them to update the ini file for their new OUI/PCI
ID's and/or new vendor_part_id values.
This commit was SVN r20454.
2009-02-05 18:45:59 +00:00
Shiqing Fan
8086bb1a1b
SIGSTOP and SIGTSTP are not supported on Windows. But they have to be defined anyway, although they are not used for Windows.
...
This commit was SVN r20453.
2009-02-05 17:02:34 +00:00
Jeff Squyres
08c35ca135
Somehow this mca param registration code got duplicated; remove one of
them
This commit was SVN r20452.
2009-02-05 16:52:30 +00:00
George Bosilca
36d496066b
Correctly deal with the whole array.
...
This commit was SVN r20451.
2009-02-05 16:44:43 +00:00
Shiqing Fan
ff7ca43dd1
Update two configuration files for windows build.
...
This commit was SVN r20450.
2009-02-05 16:39:40 +00:00
Shiqing Fan
a20254c8a5
A few type casts, making the MS compiler silent.
...
This commit was SVN r20449.
2009-02-05 16:37:44 +00:00
Shiqing Fan
7d2d6b16b1
A fix for windows mainly, adding BEGIN/END_C_DECLS pairs.
...
This commit was SVN r20448.
2009-02-05 16:35:58 +00:00
George Bosilca
2c00133fdc
Silence a possible casting warning.
...
This commit was SVN r20447.
2009-02-05 16:18:39 +00:00
Jeff Squyres
90c28810f4
Fix CID 1122: comm->c_name is a char array (not a pointer), so
comparing it to NULL is not useful.
This commit was SVN r20444.
2009-02-05 15:31:10 +00:00
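The issue behind CID 1122 can be shown with a reduced example. The struct below is a stand-in, not the real communicator type, and testing the first character is only one plausible replacement check; the log entry does not say what the fix actually did.

```c
#include <stdbool.h>

#define DEMO_MAX_NAME 64  /* assumed size, for illustration only */

/* Stand-in for a communicator with an embedded name buffer. */
struct demo_comm {
    char c_name[DEMO_MAX_NAME];  /* array member: its address is never NULL */
};

static bool demo_comm_has_name(const struct demo_comm *comm)
{
    /* "comm->c_name != NULL" would always be true for an array member,
     * so inspect the contents instead. */
    return comm->c_name[0] != '\0';
}
```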
Jeff Squyres
67a5374a61
Re CID 1180: Actually, it would be better to also print something in
the case of an error, too...
This commit was SVN r20443.
2009-02-05 15:26:44 +00:00
Jeff Squyres
598e530de9
Fix CID 1180: ensure to check the output from snprintf, since we pass
it to write().
This commit was SVN r20442.
2009-02-05 15:24:48 +00:00
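The pattern named in CID 1180 (check the snprintf result before handing it to write()) might look like the sketch below. The function names and the message format are assumptions for illustration and are not taken from the Open MPI source.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: format "rank <n>\n" into buf.
 * Returns the number of bytes to write, or -1 on error or truncation. */
static int format_rank_line(char *buf, size_t size, int rank)
{
    int n = snprintf(buf, size, "rank %d\n", rank);

    if (n < 0 || (size_t)n >= size) {
        return -1;  /* encoding error, or output did not fit in buf */
    }
    return n;
}

/* Write the line to fd only when formatting succeeded. */
static int write_rank_line(int fd, int rank)
{
    char buf[32];
    int n = format_rank_line(buf, sizeof(buf), rank);

    if (n < 0) {
        return -1;
    }
    return write(fd, buf, (size_t)n) == (ssize_t)n ? 0 : -1;
}
```

Without the check, a negative or oversized return value from snprintf would be passed to write() as the byte count.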
George Bosilca
ee6ff2372e
Fix the compilation for Windows.
...
This commit was SVN r20441.
2009-02-05 13:55:26 +00:00
Jeff Squyres
eaeed0402c
Can just use the built-in _SCRIPTS suffix.
...
This commit was SVN r20440.
2009-02-05 12:21:56 +00:00
Lenny Verkhovsky
5ae9fd9865
fixing r20436
...
This commit was SVN r20439.
The following SVN revision numbers were found above:
r20436 --> open-mpi/ompi@0d447511a5
2009-02-05 09:45:34 +00:00
Ralph Castain
6292b797e9
Add a new ESS module for use by local slave processes - only active when specifically selected
...
This commit was SVN r20438.
2009-02-05 06:07:48 +00:00
Ralph Castain
b7e6bafada
Add a new routed module for local slave processes to use - only active when specifically selected
...
This commit was SVN r20437.
2009-02-05 06:07:04 +00:00
Ralph Castain
0d447511a5
Add a shellscript for daemon-less launch of local slave processes. No manpage as this is totally for internal use only.
...
This commit was SVN r20436.
2009-02-05 06:05:28 +00:00
Jeff Squyres
73ea7a9aa5
Fix CIDs 1211, 1212, 1214: fix error checking in MPI_REDUCE_LOCAL.
...
This commit was SVN r20435.
2009-02-05 02:18:03 +00:00
Jeff Squyres
a58d0d1a27
Fix CID 1219: ensure that "found" is initialized.
...
This commit was SVN r20434.
2009-02-05 01:57:20 +00:00
George Bosilca
4804ee60a7
It barely compiles ...
...
This commit was SVN r20433.
2009-02-05 00:14:28 +00:00
Jeff Squyres
7a3b011f45
Really fix the quoting this time. Really.
...
This commit was SVN r20430.
2009-02-04 23:04:21 +00:00
Ralph Castain
df3446faf1
Procs don't need to check for other job families to update routes - now that the direct routing module is gone, they always route through their daemons anyway, so save a couple of unnecessary steps.
...
This commit was SVN r20429.
2009-02-04 22:49:57 +00:00
Ralph Castain
dbba261451
Commit missing change in #define so r20427 doesn't break trunk
...
This commit was SVN r20428.
The following SVN revision numbers were found above:
r20427 --> open-mpi/ompi@b100513022
2009-02-04 22:37:24 +00:00
Ralph Castain
b100513022
Add a few new MPI_Info options to the dpm - documentation to follow.
...
Fix a mistake in the dpm that hardcoded the update of routes to the HNP. This needs to be done by the individual routing modules so they can take whatever action is required - which will usually include updating the HNP, but might not...and might include additional steps. New routing modules are coming that violated this assumption, so it had to be moved back into init_routes.
All current routed modules know what to do - anyone with routed modules not in the current trunk may need to adjust them (see any of the current routed modules for examples of what to do).
This commit was SVN r20427.
2009-02-04 22:30:23 +00:00
Ralph Castain
e694c0dac6
Get the various grpcomm modules to all inter-operate cleanly with the "hier" module
...
This commit was SVN r20426.
2009-02-04 22:26:35 +00:00