Commit graph

16654 commits

Author SHA1 Message Date
Ralph Castain
951d72692c Reverse the #if direction so we report daemon failure to the errmgr - otherwise, we just hang if a daemon fails to start.
Reviewed with Josh.

This commit was SVN r25366.
2011-10-25 19:09:52 +00:00
Nathan Hjelm
433cfa3665 use single copy for some sends
This commit was SVN r25365.
2011-10-25 18:38:42 +00:00
Mike Dubman
9ffeeb69d9 fix help message
This commit was SVN r25364.
2011-10-25 14:02:43 +00:00
Samuel Gutierrez
c646c93eec remove unneeded flags from cray xe6 platform file.
This commit was SVN r25363.
2011-10-24 18:42:43 +00:00
Samuel Gutierrez
663f4546f5 fix define typo in psm mtl.
This commit was SVN r25362.
2011-10-24 18:38:12 +00:00
Ralph Castain
c55cba55a7 Totally trivial spelling fix
This commit was SVN r25361.
2011-10-24 14:06:33 +00:00
Ralph Castain
a7cbc25658 Minor cleanups - check hwloc returns everywhere. Thanks to Chris Yeoh for pointing this out.
This commit was SVN r25360.
2011-10-24 14:05:26 +00:00
Ralph Castain
955d8e7d46 Allow apps to use pmi when launched by mpirun, if desired, without affecting daemons
This commit was SVN r25359.
2011-10-23 15:57:13 +00:00
Nathan Hjelm
e8af0d8589 don't use alps paffinity
This commit was SVN r25358.
2011-10-21 22:52:03 +00:00
Abhishek Kulkarni
46952e9008 Fix C/R functionality in trunk. Intra-node checkpointing of a job now works as expected.
Signed-off-by: Abhishek Kulkarni <adkulkar@osl.iu.edu>

This commit was SVN r25357.
2011-10-21 22:07:35 +00:00
Samuel Gutierrez
949364d2d6 update LANL Cray XE6 platform files to include PMI support.
This commit was SVN r25356.
2011-10-21 21:05:23 +00:00
Nathan Hjelm
7b1172b346 need a terminating character in the decoded string
This commit was SVN r25355.
2011-10-21 16:46:28 +00:00
Nathan Hjelm
fb19f56965 Cray doesn't define PMI2_SUCCESS
This commit was SVN r25354.
2011-10-21 16:34:22 +00:00
Nathan Hjelm
cd257ac707 fixed typo in pmi grpcomm
This commit was SVN r25353.
2011-10-21 16:28:36 +00:00
Nathan Hjelm
cd68dbe2b8 only try to build vader if xpmem is installed. unignore vader
This commit was SVN r25352.
2011-10-21 15:45:05 +00:00
Shiqing Fan
5711414eb7 Fix Windows build
This commit was SVN r25351.
2011-10-21 14:46:58 +00:00
Ralph Castain
53ef085567 Fix a minor issue seen by Jeff in specific failure pathway
This commit was SVN r25350.
2011-10-21 14:44:48 +00:00
Jeff Squyres
cbafea8f69 Add a DEPENDENCIES line so that if you edit something down in the
hwloc tree, it'll get picked up by the component (and therefore by
libopen-pal).

Thanks to Terry for finding the problem.

This commit was SVN r25349.
2011-10-21 11:39:52 +00:00
Ralph Castain
3e72fccacf Cray's PMI implementation is quite different from slurm's - they extended PMI-1 by adding some, but not all, of the PMI-2 APIs. So you can't just switch to using PMI-2 functions as it isn't a complete implementation. Instead, you have to selectively figure out which ones they have in PMI-2, and use any missing ones from PMI-1. What fun.
Modify the configure logic and the PMI components to accommodate Cray's approach. Refactor the PMI error reporting code so it resides in only one place. Cray actually decided -not- to define the PMI-2 error codes, so we have to use the PMI-1 codes instead. More fun.

This commit was SVN r25348.
2011-10-21 04:54:38 +00:00
Ralph Castain
e2adc8fa3a Ignore until Nathan can fix - probably configure problem
This commit was SVN r25347.
2011-10-21 03:43:01 +00:00
Ralph Castain
5947f61b86 Remove windows reference for now
This commit was SVN r25346.
2011-10-21 01:19:03 +00:00
Nathan Hjelm
414677a082 default to no xpmem support
This commit was SVN r25345.
2011-10-20 22:13:45 +00:00
Nathan Hjelm
ce29170968 update lanl xe6 platform files for vader
This commit was SVN r25344.
2011-10-20 21:50:53 +00:00
Nathan Hjelm
808a73a5c5 removed erroneous add of .deps
This commit was SVN r25343.
2011-10-20 21:41:51 +00:00
Nathan Hjelm
3dbaaf6879 initial commit of vader (xpmem) btl
This commit was SVN r25342.
2011-10-20 21:39:44 +00:00
Nathan Hjelm
586403f052 more pmi return code wtf
This commit was SVN r25337.
2011-10-20 17:53:04 +00:00
Nathan Hjelm
beb8d8ce32 pmi return code wtf
This commit was SVN r25336.
2011-10-20 17:51:24 +00:00
Ralph Castain
43e35486a4 Correct flag type - thanks George!
This commit was SVN r25335.
2011-10-20 04:00:13 +00:00
Nathan Hjelm
e1e8837992 add a uintptr_t to the seg_key union
This commit was SVN r25334.
2011-10-19 21:48:52 +00:00
George Bosilca
78751b3b2d Put back the OMPI errors after the ORTE one.
This commit was SVN r25333.
2011-10-19 20:57:13 +00:00
Ralph Castain
84713d5a84 Fix singletons again - must have been broken for a very long time, which only shows how little anyone cares about this capability.
This commit was SVN r25332.
2011-10-19 20:19:08 +00:00
Ralph Castain
b44f8d4b28 Complete implementation of the ess.proc_get_locality API. Up to this point, the API was only capable of telling if the specified proc was sharing a node with you. However, the returned value was capable of telling you much more detailed info - e.g., if the proc shares a socket, a cache, or numa node. We just didn't have the data to provide that detail.
Use hwloc to obtain the cpuset for each process during mpi_init, and share that info in the modex. As it arrives, use a new opal_hwloc_base utility function to parse the value against the local proc's cpuset and determine where they overlap. Cache the value in the pmap object as it may be referenced multiple times.

Thus, the return value from orte_ess.proc_get_locality is a 16-bit bitmask that describes the resources being shared with you. This bitmask can be tested using the macros in opal/mca/paffinity/paffinity.h

Locality is available for all procs, whether launched via mpirun or directly with an external launcher such as slurm or aprun.

This commit was SVN r25331.
2011-10-19 20:18:14 +00:00
George Bosilca
1bc5da0911 These are supposed to be OPAL level errors.
This commit was SVN r25326.
2011-10-19 14:22:09 +00:00
Ralph Castain
72a4b0bd8a Fix constants
This commit was SVN r25325.
2011-10-19 14:14:58 +00:00
George Bosilca
a5f24bcdcf The error here is meaningless.
This commit was SVN r25324.
2011-10-19 13:04:46 +00:00
George Bosilca
efd88e10d7 Clean up the error codes. Get rid of all the useless ones, and
mark the distinction between ORTE and OMPI errors.

This commit was SVN r25323.
2011-10-19 03:51:53 +00:00
Ralph Castain
2958f3de34 Add some clarifying comments and a small efficiency improvement
This commit was SVN r25322.
2011-10-18 18:30:43 +00:00
Ralph Castain
b771114086 Fix the fix :-)
If the errmgr is going to try and hold the orted until all routes and children are gone, then the exit cmd must do the same. Otherwise, the orted exits immediately without waiting for routes to be dismantled, which is why we don't see the connections close.

Also clean up some diagnostics and add some debug output to see more clearly what's going on.

This commit was SVN r25321.
2011-10-18 17:56:37 +00:00
Nathan Hjelm
adf950f4ab LANL: don't use per-peer receive queues on rr-class
This commit was SVN r25320.
2011-10-18 16:45:44 +00:00
Nathan Hjelm
9155f1ba1f LANL: up cq size
This commit was SVN r25319.
2011-10-18 16:40:35 +00:00
Nathan Hjelm
e16559983e LANL: match tlcc QP settings with tlcc2
This commit was SVN r25318.
2011-10-18 16:32:05 +00:00
Nathan Hjelm
607d387088 LANL: use only shared receive queues on tlcc
This commit was SVN r25317.
2011-10-18 16:23:46 +00:00
Nathan Hjelm
90c55c5b35 LANL: use pmi on tlcc
This commit was SVN r25316.
2011-10-18 16:12:14 +00:00
Ralph Castain
ae8e556d14 Okay, once again let's fix the vpid calculator. Identified problem with prior commit (some rmaps components already place their procs in the jdata->procs array, and others don't), so account for those variations.
This commit was SVN r25315.
2011-10-18 15:50:11 +00:00
George Bosilca
749b63c09d Provide a generic fix for the termination issue instead of r25248. The
termination condition is to be checked at the daemon/HNP level not down
in the routing.

This commit was SVN r25313.

The following SVN revision numbers were found above:
  r25248 --> open-mpi/ompi@b42ccc89b8
2011-10-18 03:07:37 +00:00
George Bosilca
c453614f8b A more meaningful name for this function (mpi_proc_complete_init
instead of ompi_proc_set_arch). Change the comment to reflect the
real behavior of the function.

This commit was SVN r25312.
2011-10-18 02:54:38 +00:00
George Bosilca
f28890fbb7 Revert r25302 as it breaks the --bynode option.
This commit was SVN r25311.

The following SVN revision numbers were found above:
  r25302 --> open-mpi/ompi@d7a8553179
2011-10-18 02:48:17 +00:00
Ralph Castain
0bf4f48aa3 Don't need priority in this framework
This commit was SVN r25308.
2011-10-17 22:39:15 +00:00
Ralph Castain
2fdd9c6dea Ensure mpirun doesn't pick this component
This commit was SVN r25307.
2011-10-17 22:28:28 +00:00
Nathan Hjelm
ad9005820f fixed typo in last commit
This commit was SVN r25306.
2011-10-17 21:35:22 +00:00