Joshua Ladd
e2b53dcf10
Adding the ompi_check_libhcoll.m4 file
...
This commit was SVN r28695.
2013-07-01 22:45:36 +00:00
Joshua Ladd
d7a50343bf
Per the details and schedule outlined in the attached RFC, Mellanox Technologies would like to CMR the new 'coll/hcoll' component. This component enables Mellanox Technologies' latest HPC middleware offering - 'Hcoll'. 'Hcoll' is a high-performance, standalone collectives library with support for truly asynchronous, non-blocking, hierarchical collectives via hardware offload on supporting Mellanox HCAs (ConnectX-3 and above.) To build the component, libhcoll must first be installed on your system, then you must configure OMPI with the configure flag: '--with-hcoll=/path/to/libhcoll'. Subsequent to installing, you may select the 'coll/hcoll' component at runtime as you would any other coll component, e.g. '-mca coll hcoll,tuned,libnbc'. This has been reviewed by Josh Ladd and should be added to cmr:v1.7:reviewer=jladd
...
This commit was SVN r28694.
2013-07-01 22:39:43 +00:00
George Bosilca
ae190246df
Oops, thanks Jeff for noticing.
...
This commit was SVN r28693.
2013-07-01 17:51:52 +00:00
George Bosilca
e665cda6c2
Add the empty basic component where the function pointer from the
...
base will be copied over. Without such a decoy component the
entire framework will not function correctly.
This commit was SVN r28692.
2013-07-01 17:47:44 +00:00
George Bosilca
dc1e68c3c1
Remove the item from the list before releasing it.
...
This commit was SVN r28691.
2013-07-01 16:54:48 +00:00
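The ordering fix described in r28691 (unlink first, release second) can be illustrated with a minimal sketch. The `struct node`, `struct list`, and `list_*` names below are hypothetical and are not the project's list API; the point is only that an item should be detached from its neighbours before its memory is freed, so the list never holds a dangling pointer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical doubly-linked list with a sentinel head (illustration only). */
struct node {
    struct node *prev;
    struct node *next;
    int value;
};

struct list {
    struct node head; /* sentinel: head.next is the first item */
};

void list_init(struct list *l)
{
    l->head.prev = &l->head;
    l->head.next = &l->head;
}

void list_append(struct list *l, struct node *n)
{
    n->prev = l->head.prev;
    n->next = &l->head;
    l->head.prev->next = n;
    l->head.prev = n;
}

size_t list_size(const struct list *l)
{
    size_t count = 0;
    for (const struct node *n = l->head.next; n != &l->head; n = n->next) {
        ++count;
    }
    return count;
}

/* Unlink the item from its neighbours first, then release it.
 * Freeing before unlinking would leave prev/next pointing at freed memory. */
void list_remove_and_free(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = NULL;
    n->next = NULL;
    free(n);
}
```

After `list_remove_and_free()` returns, walking the list touches only live nodes, which is the property the commit subject asks for.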
George Bosilca
702e669636
Remove a [very] annoying warning.
...
This commit was SVN r28690.
2013-07-01 16:49:13 +00:00
George Bosilca
a5bda43cfc
Small typo.
...
This commit was SVN r28689.
2013-07-01 16:48:45 +00:00
George Bosilca
5fae72b9aa
Add the MPI 2.2 MPI_Dist_graph functionality.
...
This patch reshapes the way we deal with topologies completely. Where
our topologies were mainly storage components (they were not capable
of creating the new communicator), the new version is built around a
[possibly] common representation (in mca/topo/topo.h), but the functions
to attach and retrieve the topological information are specific to each
component. As a result the ompi_create_cart and ompi_create_graph functions
become useless and have been removed.
In addition to adding the internal infrastructure to manage the topology
information, it updates the MPI interface and the debugger support, and
provides all Fortran interfaces.
This commit was SVN r28687.
2013-07-01 12:40:08 +00:00
George Bosilca
b82abf6bef
Silence a compiler warning.
...
This commit was SVN r28686.
2013-07-01 11:40:42 +00:00
Rolf vandeVaart
adda653fc1
Fix two bugs from previous commit.
...
This commit was SVN r28684.
2013-06-28 16:32:51 +00:00
Rolf vandeVaart
850d325f32
Adjust how search is done for dynamic load of library. CUDA only.
...
This commit was SVN r28683.
2013-06-27 22:13:25 +00:00
Ralph Castain
446e33a5d8
There are cases where we want to use the novm state machine, but the backend node topology differs from that where mpirun is executing. In those cases, we can wind up thinking we are oversubscribed because the head node has fewer cores than the compute nodes.
...
To resolve this situation, add the ability to specify a backend topology file that mpirun shall use for its mapping operations. Create a new "set_topology" function in opal hwloc to support it.
This commit was SVN r28682.
2013-06-27 03:04:50 +00:00
Jeff Squyres
75e4b92edd
Sync to v1.7 NEWS bullets
...
This commit was SVN r28681.
2013-06-26 19:47:01 +00:00
Ralph Castain
7331dd9534
Apparently, the alps configury has not been checked since we added the RTE abstraction code. Fix it now.
...
This commit was SVN r28673.
2013-06-26 07:03:54 +00:00
Ralph Castain
e8340b6339
There is no convention out there as to how OEMs handle PMI2 functions. Some put them in their own -lpmi2 library, and some don't. Some have split the PMI2 definitions into a pmi2.h and keep the PMI-1 definitions in a separate pmi.h, and some don't.
...
Try to handle cases more generally so at least Slurm and Cray can co-exist in peace.
This commit was SVN r28672.
2013-06-26 00:43:26 +00:00
Ralph Castain
fa943dc6ff
Clean up a few things in the revised PMI configury - we know slurm has both pmi and pmi2 libs, so just auto-detect the presence of them if the user directed us to build with pmi support.
...
Also clean up some changed names in the alps code
This commit was SVN r28670.
2013-06-24 02:41:40 +00:00
Jeff Squyres
e3d0782788
Move the assignment after the bozo check.
...
This commit was SVN r28669.
2013-06-22 12:38:32 +00:00
Jeff Squyres
dd25421d48
Convert strcpy() to strncpy(), and just to be extra-super paranoid,
...
use memset(0) for extra bonus points.
This commit was SVN r28668.
2013-06-22 12:21:18 +00:00
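The pattern named in r28668 (zero the destination, then use a bounded copy) might look like the sketch below. The helper name `bounded_copy` and its parameters are assumptions made for illustration; the commit does not show the actual call site. Zeroing the buffer first and copying at most `dst_len - 1` bytes guarantees the result is NUL-terminated even when the source is truncated, which plain `strncpy()` does not guarantee.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into a fixed-size buffer of dst_len bytes.
 * The memset() ensures every byte not overwritten by strncpy() is 0,
 * so the final byte always acts as the terminator. */
void bounded_copy(char *dst, size_t dst_len, const char *src)
{
    if (NULL == dst || 0 == dst_len) {
        return;
    }
    memset(dst, 0, dst_len);
    if (NULL != src) {
        /* Leave the last byte untouched so the string stays terminated. */
        strncpy(dst, src, dst_len - 1);
    }
}
```

With an 8-byte buffer, a source longer than 7 characters is silently truncated to its first 7 characters; callers that must detect truncation would need an additional length check.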
Rolf vandeVaart
5ebb74bee3
Fix case where the amount of data sent is less than expected. Otherwise, we will get a hang when running the RGET protocol.
...
Reviewed by hjelm,bosilca.
This commit was SVN r28667.
2013-06-21 18:35:16 +00:00
Joshua Ladd
0b5c1f2ea8
Add 'generic' support for PMI2 (previously, we checked for PMI2 only on Cray systems.) If your resource manager (e.g. SLURM) has support for PMI2, then the --with-pmi configure flag will enable its usage. If you don't have PMI2, then you will fall back to regular old PMI1. This patch was submitted by Ralph Castain and reviewed and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd
...
This commit was SVN r28666.
2013-06-21 15:28:14 +00:00
Nathan Hjelm
299d5b3dd7
Fix two debugger attach bugs.
...
- orte_debugger_init_after_spawn was not being called for debuggers that
use the MPIR_attach_fifo to co-locate debugger daemons.
- MPIR_Breakpoint was not getting called if a debugger reattached. Add
a job state (ORTE_JOB_STATE_DEBUGGER_DETACH) to reset mpir_breakpoint_fired
to false when a debugger detaches to ensure MPIR_Breakpoint is called if
another debugger attaches. Tested with STAT 2.0/launchmon 1.0.
cmr:v1.7
This commit was SVN r28665.
2013-06-20 16:18:05 +00:00
Jeff Squyres
b9ca8e3cd1
Tweaked the help message a bit (this is the end result of iterating on
...
the message in email between Mike, Ralph, Jeff).
Add this to CMR #3642 and #3643 .
This commit was SVN r28662.
2013-06-20 13:19:23 +00:00
Jeff Squyres
84a4a2b18d
Sync with v1.6 bullets
...
This commit was SVN r28661.
2013-06-20 12:34:40 +00:00
Ralph Castain
13665bffe8
Per an off-list discussion, it appears possible for a system to report failure when executing getpwuid. There are several reasons for this error to occur, most notably if the system uses a network-based authentication protocol (e.g., NIS) and that system gets overwhelmed when we launch on a lot of nodes.
...
There is no good way to recover from this scenario, and from past experience, using the user's name in the session directory (as opposed to the uid) is very helpful when things go wrong. So print a help message when this happens (it is extremely rare, but has happened at least once now) and return an error.
cmr:v1.7.3,reviewer=jsquyres
cmr:v1.6.5,reviewer=jsquyres
This commit was SVN r28658.
2013-06-20 04:30:42 +00:00
Jeff Squyres
2e5c18195b
We want to ignore this MPI extension in the general case -- it's just
...
an example (and outputs stuff to stdout!).
This commit was SVN r28654.
2013-06-19 16:01:45 +00:00
Ralph Castain
a51a0a8c48
Fix uninitialized var
...
This commit was SVN r28652.
2013-06-18 22:41:47 +00:00
Mike Dubman
d1c82994be
fix: detect threading model to take appropriate flow in mxm
...
This commit was SVN r28648.
2013-06-16 08:40:06 +00:00
George Bosilca
f5a55ccb39
Various cleanups.
...
This commit was SVN r28647.
2013-06-15 16:23:11 +00:00
George Bosilca
a6c3477e89
Remove useless include.
...
This commit was SVN r28646.
2013-06-15 16:07:30 +00:00
George Bosilca
b4ebc417a1
Correctly register the component MCA parameters.
...
Few cleanups in the includes.
This commit was SVN r28645.
2013-06-15 16:05:09 +00:00
Jeff Squyres
b5d269f5c0
Sync with 1.6.5 bullets
...
This commit was SVN r28641.
2013-06-14 16:37:17 +00:00
Jeff Squyres
cc8839de2f
Bring over v1.6.5 bullets.
...
This commit was SVN r28637.
2013-06-14 15:16:06 +00:00
Jeff Squyres
a0b27f5b28
Better comment than what was submitted in r28614.
...
This commit was SVN r28631.
The following SVN revision numbers were found above:
r28614 --> open-mpi/ompi@9556310bd0
2013-06-13 20:52:44 +00:00
Nathan Hjelm
8924140916
Per RFC: use a better hash algorithm for the opal_hash_table_*_ptr functions.
...
Chose the crc32 function present in opal/util/crc.c as the hash function. The
performance should be sufficient for most cases. If not we can always change
the function again.
This commit was SVN r28629.
2013-06-13 17:11:04 +00:00
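As a rough illustration of the idea in r28629 (hash the bytes of a key with a CRC and reduce the result to a bucket index), here is a minimal sketch. It uses the standard reflected CRC-32 polynomial computed bit by bit; this is an assumption for illustration and is not necessarily the variant implemented in opal/util/crc.c. The names `sketch_crc32` and `sketch_bucket_for_key` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). Table-free and slow,
 * but short enough to read; real code would normally use a lookup table. */
uint32_t sketch_crc32(const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *) data;
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; ++i) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u) ? ((crc >> 1) ^ 0xEDB88320u) : (crc >> 1);
        }
    }
    return ~crc;
}

/* Hypothetical bucket selection: hash the key bytes, then reduce the
 * hash by the table size. num_buckets must be non-zero. */
size_t sketch_bucket_for_key(const void *key, size_t key_len,
                             size_t num_buckets)
{
    return (size_t) sketch_crc32(key, key_len) % num_buckets;
}
```

A CRC spreads keys that differ in only a few bytes across buckets far better than summing the bytes would, at the cost of more work per lookup; that trade-off is what the commit message refers to when it says the performance "should be sufficient for most cases".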
Nathan Hjelm
518d1fe200
Fix two typos that prevented alps direct launch from working
...
This commit was SVN r28628.
2013-06-13 17:04:08 +00:00
Matthias Jurenz
ebf441ba4b
Changes to VT: Fixed infinite recursion bug if the verbosity level (env. VT_VERBOSE) is greater than or equal to 2
...
This commit was SVN r28624.
2013-06-13 07:33:22 +00:00
Jeff Squyres
34fb0712c4
Per https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/256 , we need
...
to set *flag=1 when source == MPI_PROC_NULL.
cmr:v1.7.2:reviewer=dgoodell
cmr:v1.6.5:reviewer=dgoodell
This commit was SVN r28621.
2013-06-12 21:38:07 +00:00
Joshua Ladd
46362d2761
Stomps compiler warnings in HCA min-dist calculation. This should be added to cmr:v1.7:reviewer=jladd
...
This commit was SVN r28620.
2013-06-12 16:25:25 +00:00
Nathan Hjelm
db9bce0926
Add destructors for MPI_T error codes
...
This commit was SVN r28618.
2013-06-12 14:58:14 +00:00
Matthias Jurenz
ba9bc238ee
attempt to fix #3627 : Pass all configure options from the OMPI top-level configure to the OTF sub-configure
...
This commit was SVN r28616.
2013-06-12 13:39:21 +00:00
Mike Dubman
9556310bd0
cosmetic: add comment with rationale for malloc.h include
...
This commit was SVN r28614.
2013-06-12 05:58:32 +00:00
Nathan Hjelm
9b1f32bf12
BTL: add flags for signaled BTL operations
...
As per discussion in the June 2013 developer meeting these
flags will be used by the PML in the future to request
asynchronous progress on an operation. The naming was chosen
to reflect that a BTL supports this mode (MCA_BTL_FLAG_SIGNALED)
and that a descriptor should "signal" the remote side to wake
up and progress the message (MCA_BTL_DES_FLAG_SIGNAL).
Future commits will update OB1 to take advantage of this
feature when performing the RDMA get or RDMA rendezvous
protocols.
This commit was SVN r28612.
2013-06-11 21:52:20 +00:00
Jeff Squyres
bf7f9b1f41
Fix minor typo in man page.
...
This commit was SVN r28606.
2013-06-10 13:44:48 +00:00
Mike Dubman
d18b3ae1a7
fix malloc deprecation error with gcc 4.6.3 on ubuntu/fedora
...
This commit was SVN r28605.
2013-06-09 18:13:16 +00:00
Jeff Squyres
d081e767b5
Make the man page rules output more like AM's silent rules
...
This commit was SVN r28604.
2013-06-08 12:33:52 +00:00
George Bosilca
d789423d34
Typo.
...
This commit was SVN r28603.
2013-06-08 10:44:02 +00:00
Tom Naughton
d86c3ce669
+ remove autogenerated 'install-sh'
...
This commit was SVN r28602.
2013-06-07 20:40:24 +00:00
Rolf vandeVaart
62ab008017
Fix SEGV caused by missing CUDA initialization.
...
This commit was SVN r28601.
2013-06-07 18:31:36 +00:00
Rolf vandeVaart
1230029aa1
The debug messages were swapped. Fixed.
...
This commit was SVN r28600.
2013-06-07 17:23:41 +00:00
Vishwanath Venkatesan
0b727f84da
Avoid malloc of zero bytes by adding a check before the allocation.
...
This commit was SVN r28597.
2013-06-06 14:08:57 +00:00
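The check described in r28597 could be sketched as follows. The helper name `alloc_ints` and its return convention are assumptions for illustration; the commit does not show the affected code. The motivation is that the result of `malloc(0)` is implementation-defined (it may be NULL or a unique pointer), so a NULL return cannot be reliably interpreted as an out-of-memory error unless the zero-size case is handled separately.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for count ints.
 * Returns 0 on success (including count == 0, where *out stays NULL)
 * and -1 if the allocation fails. */
int alloc_ints(size_t count, int **out)
{
    *out = NULL;
    if (0 == count) {
        return 0; /* nothing to allocate; skip malloc entirely */
    }
    *out = malloc(count * sizeof(int));
    return (NULL == *out) ? -1 : 0;
}
```

Callers can then treat a return value of -1 as a genuine allocation failure, and a NULL pointer with a return value of 0 as "no data".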