Brian Barrett
84aeb6a6a5
Update request alloc to use free list get instead of free list wait.
...
This commit was SVN r28729.
2013-07-05 20:24:43 +00:00
Brian Barrett
ea9cee73c1
Per RFC, remove darwin backtrace, since OS X 10.5 and later support the
...
execinfo() interface (which has been the default for OMPI to use on Darwin)
This commit was SVN r28727.
2013-07-05 19:06:27 +00:00
George Bosilca
dc9352faf6
Remove some unused variables.
...
This commit was SVN r28726.
2013-07-05 13:31:54 +00:00
George Bosilca
8b01c3da33
Slightly reorder the code.
...
This commit was SVN r28725.
2013-07-05 13:29:29 +00:00
Jeff Squyres
b417095639
Do not destroy the sub-communicator until we have freed its attributes,
...
per the reason cited in the comment in the code.
This commit was SVN r28724.
2013-07-05 12:15:03 +00:00
George Bosilca
483ed8da8c
Remove an unused variable resulting from the removal of the last parameter of
...
the OMPI_FREE_LIST_GET macro.
This commit was SVN r28723.
2013-07-04 09:19:00 +00:00
George Bosilca
c9e5ab9ed1
Our macros for the OMPI-level free list had one extra argument, a possible return
...
value to signal that the operation of retrieving the element from the free list
failed. However in this case the returned pointer was set to NULL as well, so the
error code was redundant. Moreover, this was a continuous source of warnings when
the picky mode is on.
The attached patch removes the rc argument from the OMPI_FREE_LIST_GET and
OMPI_FREE_LIST_WAIT macros, and switches to checking whether the item is NULL instead of
using the return code.
This commit was SVN r28722.
2013-07-04 08:34:37 +00:00
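The idea behind r28722 can be illustrated with a minimal sketch. All names below (`sketch_free_list_t`, `SKETCH_FREE_LIST_GET`, `sketch_alloc_request`) are hypothetical and do not reproduce the real OMPI macros; the sketch only shows a get operation that reports failure through a NULL item rather than through a separate return-code argument.

```c
#include <stddef.h>

/* Hypothetical singly linked free list; NOT the OMPI implementation. */
typedef struct sketch_item {
    struct sketch_item *next;
} sketch_item_t;

typedef struct {
    sketch_item_t *head;
} sketch_free_list_t;

/* Pop one element. 'item' is set to NULL when the list is empty.
 * There is no rc argument: NULL is the only failure signal. */
#define SKETCH_FREE_LIST_GET(fl, item)          \
    do {                                        \
        (item) = (fl)->head;                    \
        if (NULL != (item)) {                   \
            (fl)->head = (item)->next;          \
            (item)->next = NULL;                \
        }                                       \
    } while (0)

/* Caller-side pattern: test the pointer instead of a return code. */
static int sketch_alloc_request(sketch_free_list_t *fl, sketch_item_t **out)
{
    sketch_item_t *item;

    SKETCH_FREE_LIST_GET(fl, item);
    if (NULL == item) {
        return -1;  /* list exhausted */
    }
    *out = item;
    return 0;
}
```

Dropping the extra argument also removes the "variable set but not used" class of warnings in callers that only ever looked at the pointer.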
Ralph Castain
21c8041a40
Update libevent 2021 component so it also only warns once when detecting reentrant behavior
...
This commit was SVN r28721.
2013-07-04 04:41:04 +00:00
Ralph Castain
62378209f0
Even if we don't find the default hostfile, and nothing else was provided, then use all the known nodes.
...
cmr:v1.7.3:#3653:reviewer=jsquyres
cmr:v1.6.6:#3654:reviewer=jsquyres
This commit was SVN r28718.
2013-07-03 22:31:32 +00:00
Ralph Castain
443a6802b9
If the default hostfile is empty, we need to pick up all the known nodes, not just the head node.
...
cmr:v1.7.3:reviewer=jsquyres
cmr:v1.6.6:reviewer=jsquyres
This commit was SVN r28717.
2013-07-03 22:25:51 +00:00
Ralph Castain
bd65937bf3
If we enable ipv6, we resolve a host's addresses and check them all against our local interfaces to determine if the given host is us. However, if we don't enable ipv6, we only checked the first address returned. This can cause us to incorrectly identify a hostname as "not us".
...
Make --disable-ipv6 behave the same as --enable-ipv6 by checking all the returned addresses.
This commit was SVN r28716.
2013-07-03 21:41:36 +00:00
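The behaviour described in r28716 (check every resolved address, not only the first) might look roughly like the IPv4-only sketch below. The function names are hypothetical, and the code uses the standard `getaddrinfo` and `getifaddrs` calls rather than whatever helpers the actual patch touched.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for getifaddrs() on glibc in strict modes */
#endif
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <ifaddrs.h>

/* Return 1 if IPv4 address 'a' is assigned to one of the interfaces in 'ifs'. */
static int sketch_is_local_ipv4(const struct in_addr *a, const struct ifaddrs *ifs)
{
    const struct ifaddrs *i;

    for (i = ifs; i != NULL; i = i->ifa_next) {
        const struct sockaddr_in *sin;
        if (i->ifa_addr == NULL || i->ifa_addr->sa_family != AF_INET) {
            continue;
        }
        sin = (const struct sockaddr_in *)i->ifa_addr;
        if (sin->sin_addr.s_addr == a->s_addr) {
            return 1;
        }
    }
    return 0;
}

/* Return 1 if ANY address 'name' resolves to is local, 0 if none is,
 * and -1 if resolution or interface enumeration fails. */
static int sketch_host_is_local(const char *name)
{
    struct addrinfo hints, *res = NULL, *r;
    struct ifaddrs *ifs = NULL;
    int found = 0;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(name, NULL, &hints, &res) != 0) {
        return -1;
    }
    if (getifaddrs(&ifs) != 0) {
        freeaddrinfo(res);
        return -1;
    }
    /* Walk the whole result list instead of stopping at res itself. */
    for (r = res; r != NULL && !found; r = r->ai_next) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)r->ai_addr;
        found = sketch_is_local_ipv4(&sin->sin_addr, ifs);
    }
    freeifaddrs(ifs);
    freeaddrinfo(res);
    return found;
}
```

A multi-homed host whose first DNS record points at a different interface than the one the local node knows about is the case the loop is meant to cover.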
Brian Barrett
d3b49535b5
Only allow communication from the same user, since we don't have job-level
...
protection.
This commit was SVN r28715.
2013-07-03 17:29:02 +00:00
Ralph Castain
45fad1ddcc
We really should be closing the event framework when told to do so.
...
cmr:v1.7.3,reviewer=jsquyres
This commit was SVN r28714.
2013-07-03 16:57:14 +00:00
Jeff Squyres
d1ce64f049
Fix some "malloc of 0 bytes" warnings
...
This commit was SVN r28713.
2013-07-03 12:05:33 +00:00
Ralph Castain
9166a8cc95
Per telecon today, add a flag so we only warn once about reentrant libevent loops - this will allow developers to better diagnose the problem as we won't swamp filesystems with warning messages.
...
This commit was SVN r28712.
2013-07-03 04:51:36 +00:00
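A warn-once guard of the kind r28712 describes can be sketched as follows. The flag and function names are hypothetical, the message text is invented for illustration, and the sketch is not thread-safe.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag: set the first time the warning is emitted. */
static bool sketch_warned_reentrant = false;

/* Returns 1 if the warning was printed, 0 if it was suppressed
 * because it had already been shown once. */
static int sketch_warn_reentrant_once(void)
{
    if (sketch_warned_reentrant) {
        return 0;
    }
    sketch_warned_reentrant = true;
    fprintf(stderr, "warning: reentrant event loop detected\n");
    return 1;
}
```

Every later detection returns early, so a loop that re-enters thousands of times still produces a single line of output.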
Ralph Castain
243f9ef586
Set ignores
...
This commit was SVN r28711.
2013-07-03 04:47:33 +00:00
Jeff Squyres
ad16bcd6d1
Followup from Justin Bronder: Looks like I spoke too soon. The
...
sandbox team has informed me that they are getting rid of SANDBOX_PID
in the future and that using SANDBOX_ON would be preferred.
This commit was SVN r28708.
2013-07-03 01:38:26 +00:00
Brian Barrett
81efd0e3cf
Properly shut down Portals collective component
...
This commit was SVN r28707.
2013-07-02 22:07:27 +00:00
Brian Barrett
133dafd3dc
First take at Barrier and Ibarrier, both of which seem to work.
...
This commit was SVN r28706.
2013-07-02 21:42:10 +00:00
Brian Barrett
c4577723ed
fix misuse of param api
...
This commit was SVN r28705.
2013-07-02 21:41:42 +00:00
Brian Barrett
c9a8217af6
Portals 4 doesn't have a BTL, need to default to MTL, rather than finding some stupid slow BTL. This selection logic sucks.
...
This commit was SVN r28704.
2013-07-02 21:18:04 +00:00
Brian Barrett
e4698f5cd4
Shell of the Portals 4 collectives component
...
This commit was SVN r28703.
2013-07-02 15:23:55 +00:00
Jeff Squyres
fea15ec34e
Add memory hooks override for Gentoo sandbox v2.5, too. Thanks to
...
Justin Bronder for the patch.
This commit was SVN r28702.
2013-07-02 12:34:51 +00:00
George Bosilca
fe012cdc2b
Use the converted value instead of calling the macro again.
...
This commit was SVN r28701.
2013-07-02 11:33:18 +00:00
Joshua Ladd
5d2d5e958c
Deleting garbage I accidentally committed. Thanks, Nathan!
...
This commit was SVN r28698.
2013-07-01 22:50:54 +00:00
Joshua Ladd
e2b53dcf10
Adding the ompi_check_libhcoll.m4 file
...
This commit was SVN r28695.
2013-07-01 22:45:36 +00:00
Joshua Ladd
d7a50343bf
Per the details and schedule outlined in the attached RFC, Mellanox Technologies would like to CMR the new 'coll/hcoll' component. This component enables Mellanox Technologies' latest HPC middleware offering - 'Hcoll'. 'Hcoll' is a high-performance, standalone collectives library with support for truly asynchronous, non-blocking, hierarchical collectives via hardware offload on supporting Mellanox HCAs (ConnectX-3 and above.) To build the component, libhcoll must first be installed on your system, then you must configure OMPI with the configure flag: '--with-hcoll=/path/to/libhcoll'. Subsequent to installing, you may select the 'coll/hcoll' component at runtime as you would any other coll component, e.g. '-mca coll hcoll,tuned,libnbc'. This has been reviewed by Josh Ladd and should be added to cmr:v1.7:reviewer=jladd
...
This commit was SVN r28694.
2013-07-01 22:39:43 +00:00
George Bosilca
ae190246df
Oops, thanks Jeff for noticing.
...
This commit was SVN r28693.
2013-07-01 17:51:52 +00:00
George Bosilca
e665cda6c2
Add the empty basic component where the function pointer from the
...
base will be copied over. Without such a decoy component the
entire framework will not function correctly.
This commit was SVN r28692.
2013-07-01 17:47:44 +00:00
George Bosilca
dc1e68c3c1
Remove the item from the list before releasing it.
...
This commit was SVN r28691.
2013-07-01 16:54:48 +00:00
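The ordering fixed in r28691 (unlink first, release second) is sketched below on a plain doubly linked list. The types and functions are hypothetical stand-ins, not the OPAL list or object API.

```c
#include <stdlib.h>

/* Hypothetical list types for illustration only. */
typedef struct sketch_node {
    struct sketch_node *prev;
    struct sketch_node *next;
} sketch_node_t;

typedef struct {
    sketch_node_t *head;
    size_t length;
} sketch_list_t;

/* Insert n at the front of l. */
static void sketch_list_push(sketch_list_t *l, sketch_node_t *n)
{
    n->prev = NULL;
    n->next = l->head;
    if (l->head != NULL) {
        l->head->prev = n;
    }
    l->head = n;
    l->length++;
}

/* Unlink n from l BEFORE freeing it, so the list never holds a
 * pointer to released memory. */
static void sketch_list_remove_and_release(sketch_list_t *l, sketch_node_t *n)
{
    if (n->prev != NULL) {
        n->prev->next = n->next;
    } else {
        l->head = n->next;
    }
    if (n->next != NULL) {
        n->next->prev = n->prev;
    }
    n->prev = n->next = NULL;
    l->length--;
    free(n);
}
```

Releasing first and unlinking afterwards would read `n->prev` and `n->next` from freed memory, which is undefined behaviour.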
George Bosilca
702e669636
Remove a [very] annoying warning.
...
This commit was SVN r28690.
2013-07-01 16:49:13 +00:00
George Bosilca
a5bda43cfc
Small typo.
...
This commit was SVN r28689.
2013-07-01 16:48:45 +00:00
George Bosilca
5fae72b9aa
Add the MPI 2.2 MPI_Dist_graph functionality.
...
This patch reshapes the way we deal with topologies completely. Where
our topologies were mainly storage components (they were not capable
of creating the new communicator), the new version is built around a
[possibly] common representation (in mca/topo/topo.h), but the functions
to attach and retrieve the topological information are specific to each
component. As a result the ompi_create_cart and ompi_create_graph functions
become useless and have been removed.
In addition to adding the internal infrastructure to manage the topology
information, it updates the MPI interface and the debugger support, and
provides all Fortran interfaces.
This commit was SVN r28687.
2013-07-01 12:40:08 +00:00
George Bosilca
b82abf6bef
Silence a compiler warning.
...
This commit was SVN r28686.
2013-07-01 11:40:42 +00:00
Rolf vandeVaart
adda653fc1
Fix two bugs from previous commit.
...
This commit was SVN r28684.
2013-06-28 16:32:51 +00:00
Rolf vandeVaart
850d325f32
Adjust how search is done for dynamic load of library. CUDA only.
...
This commit was SVN r28683.
2013-06-27 22:13:25 +00:00
Ralph Castain
446e33a5d8
There are cases where we want to use the novm state machine, but the backend node topology differs from that where mpirun is executing. In those cases, we can wind up thinking we are oversubscribed because the head node has fewer cores than the compute nodes.
...
To resolve this situation, add the ability to specify a backend topology file that mpirun shall use for its mapping operations. Create a new "set_topology" function in opal hwloc to support it.
This commit was SVN r28682.
2013-06-27 03:04:50 +00:00
Jeff Squyres
75e4b92edd
Sync to v1.7 NEWS bullets
...
This commit was SVN r28681.
2013-06-26 19:47:01 +00:00
Ralph Castain
7331dd9534
Apparently, the alps configury has not been checked since we added the RTE abstraction code. Fix it now.
...
This commit was SVN r28673.
2013-06-26 07:03:54 +00:00
Ralph Castain
e8340b6339
There is no convention out there as to how OEMs handle PMI2 functions. Some put them in their own -lpmi2 library, and some don't. Some have split the PMI2 definitions into a pmi2.h and keep the PMI-1 definitions in a separate pmi.h, and some don't.
...
Try to handle cases more generally so at least Slurm and Cray can co-exist in peace.
This commit was SVN r28672.
2013-06-26 00:43:26 +00:00
Ralph Castain
fa943dc6ff
Clean up a few things in the revised PMI configury - we know slurm has both pmi and pmi2 libs, so just auto-detect the presence of them if the user directed us to build with pmi support.
...
Also clean up some changed names in the alps code.
This commit was SVN r28670.
2013-06-24 02:41:40 +00:00
Jeff Squyres
e3d0782788
Move the assignment after the bozo check.
...
This commit was SVN r28669.
2013-06-22 12:38:32 +00:00
Jeff Squyres
dd25421d48
Convert strcpy() to strncpy(), and just to be extra-super paranoid,
...
use memset(0) for extra bonus points.
This commit was SVN r28668.
2013-06-22 12:21:18 +00:00
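The defensive copy pattern mentioned in r28668 can be sketched like this. The helper name and its return convention are assumptions made for the example; the commit itself does not show the affected code.

```c
#include <string.h>

/* Copy src into dst (capacity dst_len bytes) and always NUL-terminate.
 * Returns 0 if src fit completely, 1 if it was truncated. */
static int sketch_bounded_copy(char *dst, size_t dst_len, const char *src)
{
    if (dst_len == 0) {
        return 1;
    }
    /* Zero the whole buffer first, so the terminator is guaranteed
     * even when strncpy() copies dst_len - 1 bytes without adding one. */
    memset(dst, 0, dst_len);
    strncpy(dst, src, dst_len - 1);
    return strlen(src) >= dst_len ? 1 : 0;
}
```

`strncpy()` alone does not terminate the destination when the source is at least as long as the limit, which is why the copy is capped at `dst_len - 1` on a zeroed buffer.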
Rolf vandeVaart
5ebb74bee3
Fix case where amount of data sent is less than expected. Otherwise, we will get a hang when running the RGET protocol.
...
Reviewed by hjelm,bosilca.
This commit was SVN r28667.
2013-06-21 18:35:16 +00:00
Joshua Ladd
0b5c1f2ea8
Add 'generic' support for PMI2 (previously, we checked for PMI2 only on Cray systems.) If your resource manager (e.g. SLURM) has support for PMI2, then the --with-pmi configure flag will enable its usage. If you don't have PMI2, then you will fall back to regular old PMI1. This patch was submitted by Ralph Castain and reviewed and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd
...
This commit was SVN r28666.
2013-06-21 15:28:14 +00:00
Nathan Hjelm
299d5b3dd7
Fix two debugger attach bugs.
...
- orte_debugger_init_after_spawn was not being called for debuggers that
use the MPIR_attach_fifo to co-locate debugger daemons.
- MPIR_Breakpoint was not getting called if a debugger reattached. Add
a job state (ORTE_JOB_STATE_DEBUGGER_DETACH) to reset mpir_breakpoint_fired
to false when a debugger detaches to ensure MPIR_Breakpoint is called if
another debugger attaches. Tested with STAT 2.0/launchmon 1.0.
cmr:v1.7
This commit was SVN r28665.
2013-06-20 16:18:05 +00:00
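The second fix in r28665 amounts to resetting a "fired" flag when a debugger detaches. A minimal sketch of that state handling follows; the function names and the call counter are hypothetical, and the real code drives this through the ORTE job state machine rather than direct calls.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the flag named in the commit message. */
static bool sketch_breakpoint_fired = false;
static int  sketch_breakpoint_calls = 0;   /* counts simulated breakpoint hits */

/* Called when a debugger attaches: hit the breakpoint at most once
 * per attach session. */
static void sketch_on_debugger_attach(void)
{
    if (!sketch_breakpoint_fired) {
        sketch_breakpoint_fired = true;
        sketch_breakpoint_calls++;
    }
}

/* Called when the debugger detaches: clear the flag so that the
 * next attach hits the breakpoint again. */
static void sketch_on_debugger_detach(void)
{
    sketch_breakpoint_fired = false;
}
```

Without the reset in the detach handler, a second attach would find the flag still set and skip the breakpoint, which matches the bug the commit describes.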
Jeff Squyres
b9ca8e3cd1
Tweaked the help message a bit (this is the end result of iterating on
...
the message in email between Mike, Ralph, Jeff).
Add this to CMR #3642 and #3643.
This commit was SVN r28662.
2013-06-20 13:19:23 +00:00
Jeff Squyres
84a4a2b18d
Sync with v1.6 bullets
...
This commit was SVN r28661.
2013-06-20 12:34:40 +00:00
Ralph Castain
13665bffe8
Per an off-list discussion, it appears possible for a system to report failure when executing getpwuid. There are several reasons for this error to occur, most notably if the system uses a network-based authentication protocol (e.g., NIS) and that system gets overwhelmed when we launch on a lot of nodes.
...
There is no good way to recover from this scenario, and from past experience, using the user's name in the session directory (as opposed to the uid) is very helpful when things go wrong. So print a help message when this happens (it is extremely rare, but has happened at least once now) and return an error.
cmr:v1.7.3,reviewer=jsquyres
cmr:v1.6.5,reviewer=jsquyres
This commit was SVN r28658.
2013-06-20 04:30:42 +00:00
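The handling described in r28658 (report the failure and return an error instead of continuing without a user name) could be sketched as below. The function name, message text, and return values are assumptions for illustration; the actual commit prints an Open MPI help message.

```c
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Look up the user name for uid.
 * Returns 0 and fills 'name' on success. On failure prints a
 * diagnostic, leaves 'name' empty, and returns -1. */
static int sketch_user_name(uid_t uid, char *name, size_t len)
{
    struct passwd *pw;

    if (len == 0) {
        return -1;
    }
    name[0] = '\0';

    pw = getpwuid(uid);
    if (pw == NULL) {
        /* Can happen when a network directory service (e.g. NIS)
         * is overloaded or the uid has no entry. */
        fprintf(stderr, "getpwuid failed for uid %lu\n", (unsigned long)uid);
        return -1;
    }
    strncpy(name, pw->pw_name, len - 1);
    name[len - 1] = '\0';
    return 0;
}
```

Treating the NULL return as a hard error keeps the session directory name predictable, at the cost of aborting in the rare case the lookup fails.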
Jeff Squyres
2e5c18195b
We want to ignore this MPI extension in the general case -- it's just
...
an example (and outputs stuff to stdout!).
This commit was SVN r28654.
2013-06-19 16:01:45 +00:00