Ralph Castain
3ae3b96c17
Fix master compilation - a buried header dependency must have been removed.
2015-02-10 07:22:10 -08:00
Ralph Castain
a3275aa867
Once again, fix the blasted singleton comm_spawn
2015-02-05 17:34:25 -08:00
Jeff Squyres
0dbbffb753
pmix_base_frame: use the "= { 0 }" initializer
...
Per open-mpi/ompi#381, convert the specific initialization of opal_pmix
to use the generic "= { 0 }" initializer. This form can be used to
initialize any type when the intent is just to zero out / assign
*some* value.
2015-02-05 17:51:06 -05:00
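
As an illustration of the "= { 0 }" initializer described in 0dbbffb753, here is a minimal sketch. The struct below is a hypothetical stand-in (the real opal_pmix type is not shown in this log); the point is only that the same initializer zeroes every member regardless of the struct's layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a module struct; names are assumptions,
 * not the actual opal_pmix definition. */
typedef struct {
    int  (*init)(void);
    int  (*finalize)(void);
    void  *cache;
    int    init_count;
} example_module_t;

/* "= { 0 }" sets the first member to 0 (a null pointer here) and
 * zero-initializes all remaining members. */
example_module_t example_module = { 0 };

/* Returns 1 when every member of the module is still zero/NULL. */
int example_module_is_zeroed(const example_module_t *m)
{
    assert(m != NULL);
    return m->init == NULL && m->finalize == NULL &&
           m->cache == NULL && m->init_count == 0;
}
```

Some compilers may warn about missing braces or missing field initializers for this form, depending on the warning flags in use.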
Jeff Squyres
621af3aa07
pmix_base: fix global opal_pmix symbol for static linking on OS X
...
OS X has weirdness when static linking. If a symbol is not
initialized, it is put into the common block section, and Weird Things
happen (linking fails when trying to use that global symbol).
If you initialize the variable, it goes into a different section (and
linking to it will work).
This link (that might go stale someday) has some information about OS
X linker scope and treatment of symbol definitions:
https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/MachOTopics/1-Articles/executing_files.html#//apple_ref/doc/uid/TP40001829-98432-TPXREF120
Fixes #375.
2015-02-04 12:12:31 -05:00
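
The distinction described in 621af3aa07 can be sketched in C as follows. The variable names are hypothetical. Whether the first form is actually emitted as a common symbol depends on the toolchain and its flags (many current compilers default to -fno-common), so treat this as an illustration of the two source forms rather than a reproduction of the linker failure.

```c
/* Tentative definition: no initializer. Some toolchains emit this as a
 * "common" symbol, which is the case the commit message describes. */
int example_uninitialized_global;

/* Explicit initializer: emitted as an ordinary (non-common) definition. */
int example_initialized_global = 0;

/* Both objects start at zero either way; only the symbol kind differs. */
int example_globals_sum(void)
{
    return example_uninitialized_global + example_initialized_global;
}
```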
Ralph Castain
294ebc907a
Fix singleton operations so they can work inside a slurm environment
2015-01-27 09:29:42 -06:00
Ralph Castain
ba25e8a0ce
Fix singletons
2015-01-27 09:29:42 -06:00
Ralph Castain
028b00154d
Complete implementation of the schizo framework to support OMPI component
2015-01-27 09:29:42 -06:00
Gilles Gouaillardet
9e9261e90a
pmix: correctly set locality flags in proc_flags
...
do not use opal_process_info.cpuset which is not
set at that time.
2014-12-26 15:37:08 +09:00
Howard Pritchard
91b0d03bf2
pmix/cray: remove dead code
2014-12-19 13:08:23 -08:00
Ralph Castain
573a574a3c
Remove an unused dstore type that was redundant with another one. Define a corresponding PMIX_NODE_ID type (contains the vpid of the daemon hosting the proc) and ensure that the PMIx server includes that info in its process map
2014-12-15 12:11:13 -08:00
Ralph Castain
9658256a98
Restore the passing of the complete job map to the local proc on first get_attr so the info can be used by the MPI layer without continual calls back to the server. We'll find a more memory efficient method later.
2014-12-13 18:44:09 -08:00
Howard Pritchard
c75dccede1
pmix/cray: remove finalize call from comp close
...
The finalize call in component close method is
no longer being matched by an equivalent init call,
so remove this call in the close method.
2014-12-03 09:44:18 -07:00
Ralph Castain
d9b23c1054
Increment the init_count in the Slurm pmix components so they correctly respond to calls to pmix.initialized
2014-12-02 20:20:29 -08:00
Gilles Gouaillardet
578fe41788
fix hangs introduced by previous commit a6744b8177
2014-11-25 17:50:44 +09:00
Howard Pritchard
a632b632ca
better way to tell if a process is in a Cray PAGG
...
Use a more reliable way to tell if a process is
1) in a Cray PAGG
2) is actually considered an application process on
a compute node (not for example, a process in a PAGG
on a mom node).
2014-11-12 12:56:15 -07:00
Howard Pritchard
72bb4a2eee
make cray pmi compile again
...
Commit @80f07b65 resulted in changes that
caused the cray pmi component to no longer compile.
This commit fixes that issue.
2014-11-12 12:33:30 -07:00
Artem Polyakov
fce08a3db3
Fix SLURM PMI2 component. set s2_nrank to the relative position of a process inside the node
...
(not relative position of a node inside the allocation).
2014-11-12 16:26:35 +06:00
Ralph Castain
780c93ee57
Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
...
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
Gilles Gouaillardet
80f07b65f1
pmix: correctly split pmi messages
...
Thanks to @elenash for all the reviews
2014-11-11 17:16:00 +09:00
elenash
2687637071
Merge pull request #263 from elenash/master
...
dstore sm component implementing shared memory database for pmix client/server communication
2014-11-07 07:56:55 +03:00
Howard Pritchard
b389895c66
fix make dist for pmix/cray
...
An include file was left out of the "sources" list, which prevented
building for cray from the dist tarball.
2014-11-06 15:10:51 -07:00
Elena
03fc809bc9
This commit contains a new dstore component, sm, which is used for communication between the pmix server and clients on the same node via shared memory.
2014-11-06 16:01:19 +02:00
Gilles Gouaillardet
ca0b969991
pmix: fix a return status in native_get_attr
2014-10-30 15:26:23 +09:00
Gilles Gouaillardet
8c556bbc66
pmix: fix alignment issue
2014-10-29 13:19:23 +09:00
Ralph Castain
4f0c1ae8d9
Continue cleanup of the PMI config code. Eliminate the multiple calls to check for pmi1 and pmi2 - we must check it only once to get the pmix components to build only in the correct situations. Ensure we set the wrapper flags so we handle static builds correctly.
2014-10-27 20:37:33 -07:00
Gilles Gouaillardet
248acbbc3b
pmix/slurm: correctly set locality of the local ranks as "not found"
2014-10-23 17:02:07 +09:00
Gilles Gouaillardet
7508c6f3ad
pmix: correctly handle NULL OPAL_BYTE_OBJECT object
2014-10-22 17:15:21 +09:00
Nadezhda Kogteva
2bce929330
MTL MXM cleanup: unnecessary OMPI_MTL_MXM_CONNECT_ON_FIRST_COMM variable removed
2014-10-20 10:29:47 +03:00
Ralph Castain
b6aa691e0a
Fix incorrect implementation of new MCA param mca_base_env_list - it was not picking up envars and forwarding them, but only worked if you explicitly set a value for the envar. Ensure it works for both direct and indirect launch modes. Remove stale code as this replaced orte_forward_envars. Ensure it doesn't get passed to the ORTE daemons.
2014-10-16 12:58:56 -07:00
Gilles Gouaillardet
27dcca0bb2
pmi/s1: fix large keys
...
do not overwrite the PMI key when pushing a message that does
not fit within 255 bytes
2014-10-16 13:29:32 +09:00
Gilles Gouaillardet
5c81658d58
pmix: fix big endian arch
...
use the appropriate 64-bit type; otherwise data gets incorrectly
truncated on big-endian architectures
2014-10-15 17:17:09 +09:00
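
One common way this kind of truncation arises is sketched below; it is an assumption about the general pattern, not a reconstruction of the code changed in 5c81658d58. Copying a 64-bit value through a 32-bit destination happens to keep the low half on little-endian hosts, so small values look correct there, while on big-endian hosts the same copy picks up the high half.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Buggy pattern: copies only the first sizeof(uint32_t) bytes of a
 * 64-bit value. Little endian yields the low 32 bits; big endian
 * yields the high 32 bits. */
uint32_t example_read_narrow(uint64_t value)
{
    uint32_t out;
    memcpy(&out, &value, sizeof(out));
    return out;
}

/* Fixed pattern: the destination type matches the stored width. */
uint64_t example_read_full(uint64_t value)
{
    uint64_t out;
    assert(sizeof(out) == sizeof(value));
    memcpy(&out, &value, sizeof(out));
    return out;
}
```

For a small value such as 42, example_read_narrow returns 42 on a little-endian host and 0 on a big-endian one, while example_read_full returns 42 on both.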
Elena
c905fe9b78
pmix: removed pmix_base_direct modex mca parameter, renamed orte_full_modex_cutoff and ompi_hostname_cutoff to direct_modex_cutoff
2014-10-09 06:15:31 +02:00
Gilles Gouaillardet
5c5453b8b1
pmix: fix test in native_get_attr
2014-10-03 11:54:08 +09:00
Ralph Castain
9e35f80ab6
Don't multiply define WANT_PMI_SUPPORT and friends. Turns out they weren't being used anywhere anyway, so no point in defining them at all
...
This commit was SVN r32822.
2014-09-30 20:43:25 +00:00
Howard Pritchard
8da51fab81
cray pmi equivalent to commit 5eb65b24
...
This commit was SVN r32820.
2014-09-30 19:25:00 +00:00
Ralph Castain
8d0b4f222a
The pmix.get functions should not be returning "success" if the requested info isn't found. Fix the macros and the component functions so they correctly return "not found" in that situation, and set the data regions and size to NULL and 0, respectively.
...
This commit was SVN r32818.
2014-09-30 18:03:12 +00:00
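
The return convention described in 8d0b4f222a can be sketched as follows. All names, the status codes, and the one-entry table are hypothetical; only the behavior is taken from the commit message: on a miss, report "not found" and set the data pointer and size to NULL and 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes, not the OPAL constants. */
#define EXAMPLE_SUCCESS         0
#define EXAMPLE_ERR_NOT_FOUND (-1)

typedef struct {
    const char *key;
    const void *data;
    size_t      size;
} example_entry_t;

/* Tiny illustrative store with a single entry ("7" plus its NUL byte). */
static const example_entry_t example_table[] = {
    { "rank", "7", 2 },
};

/* Look up key; on a miss, clear the outputs and return "not found". */
int example_get(const char *key, const void **data, size_t *size)
{
    size_t i;
    assert(key != NULL && data != NULL && size != NULL);
    for (i = 0; i < sizeof(example_table) / sizeof(example_table[0]); i++) {
        if (strcmp(example_table[i].key, key) == 0) {
            *data = example_table[i].data;
            *size = example_table[i].size;
            return EXAMPLE_SUCCESS;
        }
    }
    *data = NULL;
    *size = 0;
    return EXAMPLE_ERR_NOT_FOUND;
}
```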
Howard Pritchard
201d4ec3ad
fix setting of PMIX_NODE_RANK in cray pmix comp.
...
Per discussions with pmix folks, it was determined that
the way the cray pmi pmix component was computing the
PMIX_NODE_RANK attribute for a process was incorrect.
This commit fixes the problem.
This commit was SVN r32810.
2014-09-29 16:55:31 +00:00
Howard Pritchard
1508a01325
Fixes to enable mpirun to work again on Cray
...
The ess pmi module was not handling aprun launched
daemons. All daemons were thinking they were vpid 1.
Also, turns out that on cray systems using MOM nodes
for launched jobs, just detecting whether or not a
process is in a PAGG container is not sufficient.
Crank up the priority of the alps PLM component in the
event that the configure detected the presence of both
slurm and alps.
Have the ESS pmi component open the pmix framework and
select a pmix component.
This commit was SVN r32773.
2014-09-23 15:37:26 +00:00
Howard Pritchard
820b34e5d2
Fix bad cut/paste for commit c19e7369
...
This commit was SVN r32712.
2014-09-11 21:00:04 +00:00
Howard Pritchard
d07c5674a3
Fix potential double free in cray pmi cray_fini
...
This commit was SVN r32711.
2014-09-11 20:30:40 +00:00
Ralph Castain
a7c5b77d70
Just because the openib BTL can't reach a process doesn't mean it is a job-ending error. If we have other methods for reaching the process (e.g., sm for a local proc), then that's okay. If there is no method for reaching a proc, then that's an error - but the BML will report that situation.
...
The question of whether or not the openib BTL supports loopback is a separate question. It may be more appropriate to make the modex be PMIX_GLOBAL for cases where openib can support loopback so someone can run without a shared memory component. I'll leave that decision to the IB vendors.
This commit was SVN r32702.
2014-09-10 17:02:16 +00:00
Ralph Castain
6323b226c7
Bring over some updates from the PMIx branch - mostly just minor cleanups. Make the direct grpcomm component no longer be the default. For now, we seem to be having problems with non-blocking fence operations, so make them not be the default under any scenario (e.g., when sm is the only btl in operation).
...
This commit was SVN r32673.
2014-09-06 19:19:44 +00:00
Howard Pritchard
fe2ea1f0fb
fix handling of OPAL_DSTORE_LOCALITY and ref cnt
...
This commit was SVN r32671.
2014-09-05 21:36:19 +00:00
Ralph Castain
41c6058153
Bring over changes to MXM from pmix branch:
...
MTL MXM: establish endpoint connection on the first communication when direct_modex used
This commit was SVN r32668.
2014-09-03 18:22:11 +00:00
Ralph Castain
5cdbc00136
Re-enable the usock oob component. Ensure the TCP component promotes messages for other procs to the OOB base so that other components have a chance to send the relay. Seems to be passing MTT, so let's see how it works for others.
...
This commit was SVN r32650.
2014-08-30 19:33:46 +00:00
Ralph Castain
9ac75451ff
Nathan had requested this before as he needs to know the #procs in the job to optimize the UGNI btl. Add the fetch for that data - the native pmix component already provides it, but ensure the Slurm PMI-1 support does too. If not found, fall back to the non-optimized number
...
This commit was SVN r32648.
2014-08-29 22:53:35 +00:00
Ralph Castain
f865ef61ab
Need local_size returned by the Slurm components
...
This commit was SVN r32646.
2014-08-29 22:23:27 +00:00
Howard Pritchard
9a2891f2d6
handle PMIX_LOCAL_SIZE attr arg in cray pmix
...
This commit was SVN r32645.
2014-08-29 21:18:02 +00:00
Ralph Castain
730e28349e
Some minor uninitialized variable cleanups
...
This commit was SVN r32629.
2014-08-29 02:21:13 +00:00
Gilles Gouaillardet
d743da18bf
pmix: fix process name parsing on 32-bit systems
...
opal_process_name_t is a uint64_t, which is not equivalent to
an unsigned long on 32-bit systems.
It is now parsed as an unsigned long long.
This commit was SVN r32592.
2014-08-25 03:08:02 +00:00
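
A minimal sketch of the parsing issue in d743da18bf, with a hypothetical helper name. On systems where unsigned long is 32 bits, strtoul cannot represent a full 64-bit name, whereas unsigned long long is guaranteed to be at least 64 bits wide, so strtoull is the safer choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse decimal text into a uint64_t using strtoull.
 * Returns 0 on success, -1 on range error, empty input,
 * or trailing characters after the number. */
int example_parse_u64(const char *text, uint64_t *out)
{
    char *end = NULL;
    unsigned long long value;

    assert(text != NULL && out != NULL);
    errno = 0;
    value = strtoull(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0') {
        return -1;
    }
    *out = (uint64_t)value;
    return 0;
}
```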