1
1

16 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
c146c4969b Revert part of open-mpi/ompi@c1bbbb5e2f to restore the usock component, thus fixing show_help aggregation.
Fixes #1467

Restore debugger attach operations

Fixes #1225
2016-03-18 21:49:04 -07:00
Ralph Castain
c1bbbb5e2f Remove the last involvement of the OOB system from the MPI layer, remove the no-longer-needed usock/oob component, and have procs no longer open the RML, OOB, ROUTED, and GRPCOMM frameworks as PMIx now provides all required app-mpirun cmds 2015-09-15 13:08:35 -07:00
Ralph Castain
cf6137b530 Integrate PMIx 1.0 with OMPI.
Bring Slurm PMI-1 component online
Bring the s2 component online

Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways.

Bring the OMPI pubsub/pmi component online

Get comm_spawn working again

Ensure we always provide a cpuset, even if it is NULL

pmix/cray: adjust cray pmix component for pmix

Make changes so cray pmix can work within the integrated
ompi/pmix framework.

Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet

Cleanup comm_spawn - procs now starting, error in connect_accept

Complete integration
2015-08-29 16:04:10 -07:00
Ralph Castain
1f8de276de Consolidate all the QOS changes into one clean commit 2015-05-06 19:48:42 -07:00
Ralph Castain
d07dc362d5 Ensure we can authenticate when crossing security domains by including all available credentials, and letting the receiver use the highest priority one they have in common. 2015-03-28 20:34:26 -07:00
Ralph Castain
1b24536941 Allow for different security domains. Let the initiator of the connection determine the method to be used - if the receiver cannot support it, then that's an error that will cause the connection attempt to fail. 2015-03-25 13:22:01 -07:00
Ralph Castain
f4ff791335 Close oob/usock connections upon exec 2014-12-13 20:24:09 -08:00
Ralph Castain
d6d69e2b13 Get the direct routed component to work with both TCP and USOCK OOB components. We previously had setup the direct component so it would only support direct-launched applications. Thus, all routes went direct between processes. However, if the job had been launched by mpirun, this made no sense - what you wanted instead was to have each app proc talk directly to its daemon, but have the daemons all directly connect to each other.
So we need all the routing code for dealing with cross-job communications, lifelines, etc. The HNP will be directly connected to all daemons as they must callback at startup, and so we need to track those children correctly so we know when it is okay to terminate.

We still have to support direct launch, though, as this is the only component we can use in that scenario. So if the app doesn't have daemon URI info, then it must fall back to directly connecting to everything.
2014-12-07 09:11:48 -08:00
Gilles Gouaillardet
a6744b8177 fix misc memory leaks specific to the master 2014-11-25 13:52:10 +09:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
Ralph Castain
e49ca05f11 Remove unused variable
This commit was SVN r32651.
2014-08-31 03:11:50 +00:00
Ralph Castain
5cdbc00136 Re-enable the usock oob component. Ensure the TCP component promotes messages for other procs to the OOB base so that other components have a chance to send the relay. Seems to be passing MTT, so let's see how it works for others.
This commit was SVN r32650.
2014-08-30 19:33:46 +00:00
Ralph Castain
5520d6971b We do have to track the origin of messages sent over usock as the daemon does route them back down, and we need to get the "sender" info correct. Also do a better job of dealing with simultaneous connections to avoid binding to a used socket.
Refs trac:4280

This commit was SVN r30781.

The following Trac tickets were found above:
  Ticket 4280 --> https://svn.open-mpi.org/trac/ompi/ticket/4280
2014-02-20 17:27:05 +00:00
Ralph Castain
d42f4be8a4 Add unix socket component to OOB - no longer require active network for local operations. Demonstrate inter-transport crossover.
VERY tentatively schedule this for 1.7.5 - only to be applied if we see no troubles AND the branch is ready in advance.

cmr=v1.7.5:reviewer=rhc:subject=Add unix socket component to OOB

This commit was SVN r30742.
2014-02-16 20:54:12 +00:00
Ralph Castain
657796f9e0 Revert r30327 - turns out it isn't quite right just yet. :-(
Closes trac:4138

This commit was SVN r30328.

The following SVN revision numbers were found above:
  r30327 --> open-mpi/ompi@87d5f86025

The following Trac tickets were found above:
  Ticket 4138 --> https://svn.open-mpi.org/trac/ompi/ticket/4138
2014-01-18 23:38:39 +00:00
Ralph Castain
87d5f86025 Enable use of unix domain sockets for local OOB communications, thereby removing the requirement for an active network interface when running strictly on a single node. Update the overall OOB system to support cross-transport movement of messages so that the OOB can move a received message to another transport for transmission.
cmr=v1.7.5:reviewer=jsquyres:subject=Enable use of unix domain sockets for local OOB communications

This commit was SVN r30327.
2014-01-18 21:36:49 +00:00