1
1

3474 Коммитов

Автор SHA1 Сообщение Дата
Elena
af38a762a2 these changes fix direct routed component under mpirun; oob tcp and oob ud are working with direct routed component, but usock doesn't work with direct routed component yet. 2014-12-05 12:38:59 +02:00
Ralph Castain
c4fd6d1cde Fix typo 2014-12-04 12:24:35 -08:00
Ralph Castain
c4002a8485 Further cleanups on the LSF integration - the affinity file is apparently always present, but simply empty if affinity wasn't set. 2014-12-04 12:24:35 -08:00
Ralph Castain
c88f181efe Fix singleton comm-spawn, yet again. The new grpcomm collectives require a complete knowledge of every active proc in the system in case they participate in a collective. So ensure we pass the required job info when we spawn new daemons, and construct the necessary connections to allow grpcomm to operate. 2014-12-03 18:11:17 -08:00
Howard Pritchard
c67afadcfc Merge pull request #289 from hppritcha/topic/remove_pmi
Topic/remove pmi
2014-12-03 16:58:35 -07:00
Jeff Squyres
a3af7d6dbb Revert "lsf configury: add dependent libraries for static linking"
This reverts commit 56cfa90ddaaf9939926ff54b5f2f34e32254681e.
2014-12-03 13:32:56 -08:00
Jeff Squyres
92c2ff91ec Revert "Cleanup static build requirements by adding the wrapper flags back to the component configure.m4's. Minor cleanup of the lsf configure logic."
This reverts commit open-mpi/ompi@32bf0e7b7e.
2014-12-03 13:15:20 -08:00
Ralph Castain
54c955c92d Fix a race condition that only appears to be affecting certain setups. The pmix.finalize function closes the file descriptor to the server, which then triggers the errhandler callback. Since the errmgr is about to be unloaded, it might be getting hit. 2014-12-03 12:19:00 -08:00
Howard Pritchard
666344a081 orte/mca/common/alps: fix configure file
Fix configure file for alps to actually check for
alps being available.

Also include stdio.h explicitly in common_alps.c
2014-12-03 09:44:18 -07:00
Howard Pritchard
ec38aa3732 orte/mca/common: add missing Makefile.am 2014-12-03 09:44:18 -07:00
Howard Pritchard
191fe0f949 alps configury changes
Clean up the orte_check_alps.m4.  There was a little of
unnecesary stuff for handling cle 5, since it wasn't actually
doing the right thing, which would be to use pkg-config to
find dependencies both for dynamic and static linking.

Decouple the searching for alps libs, etc. from cray pmi.

Switch the alps ess  and alps odls components' config files
to use the ALPS m4 macro.

alps configury fixes

Improve a check for detecting CLE release.
Improve an error message.
2014-12-03 09:44:17 -07:00
Howard Pritchard
d749077e1e odls/alps: make sure PMI env. variables set up
Add call to orte_odls_alps_get_rdma_creds in the
local proc launch step to obtain the Cray Rdma
credentials from the apshepherd, and to set
the PMI env. variables expected by uGNI BTL, etc.
2014-12-03 09:44:17 -07:00
Howard Pritchard
e0487e7702 orte/common/alps: add an alps common lib to orte
Add an alps common lib to orte.  Add a function
to determine whether or not a process is in a
PAGG container.

Note: we need a better naming convention for
common libs, since right now they use a "flat"
naming convention.
2014-12-03 09:44:17 -07:00
Howard Pritchard
a753c3ece0 ess/alps: add initial alps ess component
Note this alps ess component has nothing to do
with the old CNOS alps component used on
Cray Seastar/Portals3 (Cray XT) systems.

To work properly, changes need to be made to the
open method of the ess/pmi component to keep it
from selecting, and thus initializing, the opal/pmix/cray
component.
2014-12-03 09:44:17 -07:00
Ralph Castain
32bf0e7b7e Cleanup static build requirements by adding the wrapper flags back to the component configure.m4's. Minor cleanup of the lsf configure logic. 2014-12-03 07:14:06 -08:00
Ralph Castain
cb15cc06e1 Minor changes per Jeff's request on PR for 1.8.4 2014-12-02 19:54:10 -08:00
Ralph Castain
6294ed991b Fix singletons - still working on singleton comm_spawn 2014-12-02 14:12:24 -08:00
Ralph Castain
14cdb04327 Revise the ess/pmi selection logic as all APPs must select it, and no daemons. Cleanup some of the mca param levels in ess so we don't printout the topology quite as easily. 2014-12-01 21:19:11 -08:00
Jeff Squyres
56cfa90dda lsf configury: add dependent libraries for static linking
Ensure to add the LSF dependent libraries and LD flags for the wrapper
compiler static linking case.
2014-12-01 14:59:10 -08:00
Ralph Castain
f92ccaf0f9 Add missing var declarations 2014-12-01 09:36:28 -08:00
Ralph Castain
960ef34988 Ensure the LSF ras adds the hosts to the allocation. Correctly handle the semi-colon vs comma situation in hwloc slot_lists 2014-11-30 14:37:37 -08:00
Ralph Castain
3f9d9ae8b6 Provide tighter LSF integration by correctly handling scenarios where the user has asked LSF to assign bindings. Fix a couple of typos in lex parser definitions. Tell hostfile parser to ignore binding designations in hostfiles. Add an attribute to indicate that cpusets were provided as physical cpu ids.
Once validated, a version of this will be backported to the v1.8.4 release.
2014-11-30 11:50:31 -08:00
Elena
b17ea23ce0 fixes for direct routed component under mpirun 2014-11-26 13:36:49 +02:00
Nadezhda Kogteva
8dd21c7736 OOB UD: fix case when multiple oob components were specified in command line (checking of uri). 2014-11-25 11:48:11 +02:00
Gilles Gouaillardet
578fe41788 fix hangs introduced by previous commit a6744b81777ab8247908350bd15cca49bedf5208 2014-11-25 17:50:44 +09:00
Gilles Gouaillardet
a6744b8177 fix misc memory leaks specific to the master 2014-11-25 13:52:10 +09:00
Ralph Castain
48f702827e First part of memory leak cleanups from Gilles 2014-11-24 16:53:33 -08:00
Ralph Castain
2e00e335b9 Add missing header to tarball. Remove stale opal_unignore 2014-11-21 17:35:11 -08:00
Howard Pritchard
6e807c4e8a odls/alps: minor config cleanup 2014-11-21 11:09:28 -07:00
Nadezhda Kogteva
05b2eb1270 OOB UD: opal_ignore removed from oob ud component: component is compilable. Added support of new RML API, support of opal_buffer as input data. Added usage of routed component. 2014-11-20 10:20:35 +02:00
Howard Pritchard
9425ebefae Be more selective about closing fd's for alps/odls
Be more selective about closing fd's for the alps odls
component.  Don't close fd's of pipes set up by the
apshepherd for providing RDMA credentials, etc.

Add an entry to the help file in case
alps_app_lli_pipes returns an error.
2014-11-19 11:21:30 -07:00
Howard Pritchard
34c156759e fix some compiler warnings in ras/alps 2014-11-18 11:32:37 -07:00
Howard Pritchard
4df3447d96 fix compare_nodes bug in alps ras component
There was an obvious bug in the alps/ras component compare_nodes method
which resulted in the function always evaluating the nodes
as being equivalent.
2014-11-18 11:15:02 -07:00
Howard Pritchard
ff362c16ce add/update copyrights for alps odls component 2014-11-18 10:16:11 -07:00
Howard Pritchard
dc98b62070 add initial support for an alps odls component
It turns out that the support for Open MPI apps on
Cray was hanging on a thin thread of support when
using the mpirun job launcher.  It just happened that
with a certain set of configuration options things would
work.  This is bound to backfire at some point.

To fix this weakness, as well as to allow for mpirun launched
jobs to benefit from many of the advanced placement features
provided by the Cray Linux Environment (as opposed to the hwloc
only default env of orte), a new odls alps component is introduced.
2014-11-17 14:00:09 -07:00
Ralph Castain
d9ceb5aea4 Fix C++ builds by removing no-longer-needed type declaration 2014-11-14 11:44:24 -08:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
Ralph Castain
2a90788724 Support physical processor ids in rankfile 2014-11-10 14:00:40 -08:00
Ralph Castain
8c837d3cb3 Doh - if we can't output an entire block, then we need to adjust the number of bytes remaining to be output or else we will output duplicate bytes when next we are able to write. 2014-11-07 13:13:13 -08:00
Elena
03fc809bc9 This commit contains new dstore component sm which is used for communication between pmix server and clients at the same node via shared memory. 2014-11-06 16:01:19 +02:00
Ralph Castain
738c3e1d72 Ensure that mpirun correctly selects the HNP ess component without attempting to init the PMI subsystem as mpirun won't be supported anyway, so let's avoid the error message. Also, daemons launched by the plm/slurm component must use the ess/slurm module as we cannot trust the Slurm PMI_Init functions to correctly tell us when PMI support is available. 2014-11-03 21:35:42 -08:00
Ralph Castain
6fbc68c830 Update the grpcomm direct component's priority so it sits at the bottom of the list, as it should now that the other components are active. Cleanup up the signature print function a touch to make it more readable. Remove the unneeded xcast functions in brks and rcd components as we will just fall thru to using the "direct" one 2014-11-03 14:43:17 -08:00
Gilles Gouaillardet
652ecdb888 oob/tcp: always include a missing header file
improve open-mpi/ompi@c9d1e16a9e
2014-10-29 13:39:23 +09:00
Gilles Gouaillardet
c9d1e16a9e oob/tcp: include a missing header file
warning can be seen under cygwin without the missing header file
2014-10-28 13:56:25 +09:00
Ralph Castain
64fae47d85 Ensure that the proxy "pull" of output gets directed to the requesting tool, which is no longer just the sender since the HNP may be making the request on behalf of someone else 2014-10-23 10:21:21 -07:00
Ralph Castain
526682e2f9 Add the ability for a tool that requests spawn of a job to also request forwarding of all output to the tool. The tool is responsible for its own call to push its stdin to the new job. The push request can come -after- the job is started, but the pull request has to be done during the spawn procedure or else output can be lost. 2014-10-23 08:16:49 -07:00
Ralph Castain
894acb0aa8 configury: new OPAL_SET_MCA_PREFIX/ORTE_SET_MCA_CMD_LINE_ID macros
These two macros set the MCA prefix and MCA cmd line id,
   respectively.  Specifically, MCA parameters will be named
   PREFIX<foo> in the environment, and the cmd line will use
   -ID foo bar.

   These macros must be called during configure.ac and a value
   supplied. In the case of Open MPI, the values given are
   PREFIX=OMPI_MCA_ and ID=mca.

   Other projects (such as ORCM) will call these macros with
   their own unique values.  For example, ORCM uses PREFIX=ORCM_MCA_
   and ID=omca

   This scheme is necessary to allow running Open MPI applications under
   systems that use their own versions of ORTE and OPAL.  For example,
   when running OMPI applications under ORCM, we need the MCA params passed
   to the ORCM daemons to be separated from those recognized by the OMPI application.
2014-10-22 18:57:40 -07:00
Ralph Castain
b6aa691e0a Fix incorrect implementation of new MCA param mca_base_env_list - it was not picking up envars and forwarding them, but only worked if you explicitly set a value for the envar. Ensure it works for both direct and indirect launch modes. Remove stale code as this replaced orte_forward_envars. Ensure it doesn't get passed to the ORTE daemons. 2014-10-16 12:58:56 -07:00
Ralph Castain
4fc4a8346b Fix a couple of minor issues. Ensure usock isn't used if the session dirs aren't setup. Protect an oddball case where orte_xml_fp is NULL. 2014-10-09 20:58:46 -07:00
Elena
e319c95267 fixes for grpcomm rcd/brucks algorithms 2014-10-09 06:12:26 +02:00