Rolf vandeVaart
850d325f32
Adjust how search is done for dynamic load of library. CUDA only.
...
This commit was SVN r28683.
2013-06-27 22:13:25 +00:00
Ralph Castain
446e33a5d8
There are cases where we want to use the novm state machine, but the backend node topology differs from that where mpirun is executing. In those cases, we can wind up thinking we are oversubscribed because the head node has fewer cores than the compute nodes.
...
To resolve this situation, add the ability to specify a backend topology file that mpirun shall use for its mapping operations. Create a new "set_topology" function in opal hwloc to support it.
This commit was SVN r28682.
2013-06-27 03:04:50 +00:00
Jeff Squyres
75e4b92edd
Sync to v1.7 NEWS bullets
...
This commit was SVN r28681.
2013-06-26 19:47:01 +00:00
Ralph Castain
7331dd9534
Apparently, the alps configury has not been checked since we added the RTE abstraction code. Fix it now.
...
This commit was SVN r28673.
2013-06-26 07:03:54 +00:00
Ralph Castain
e8340b6339
There is no convention out there as to how OEMs handle PMI2 functions. Some put them in their own -lpmi2 library, and some don't. Some have split the PMI2 definitions into a pmi2.h and keep the PMI-1 definitions in a separate pmi.h, and some don't.
...
Try to handle cases more generally so at least Slurm and Cray can co-exist in peace.
This commit was SVN r28672.
2013-06-26 00:43:26 +00:00
Ralph Castain
fa943dc6ff
Cleanup a few things in the revised PMI configury - we know slurm has both pmi and pmi2 libs, so just auto-detect the presence of them if the user directed us to build with pmi support.
...
Also clean up some changed names in the alps code
This commit was SVN r28670.
2013-06-24 02:41:40 +00:00
Jeff Squyres
e3d0782788
Move the assignment after the bozo check.
...
This commit was SVN r28669.
2013-06-22 12:38:32 +00:00
Jeff Squyres
dd25421d48
Convert strcpy() to strncpy(), and just to be extra-super paranoid,
...
use memset(0) for extra bonus points.
This commit was SVN r28668.
2013-06-22 12:21:18 +00:00
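The strcpy() to strncpy() conversion described in the entry above can be sketched roughly as follows. This is only an illustration of the general pattern, not the code from r28668: the helper name `bounded_copy` and its signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the Open MPI tree): copy src into a
 * fixed-size buffer without overrunning it. */
static void bounded_copy(char *dst, size_t dst_len, const char *src)
{
    assert(NULL != dst && dst_len > 0);

    /* Zero the whole buffer first so every byte is defined. */
    memset(dst, 0, dst_len);

    if (NULL != src) {
        /* Copy at most dst_len - 1 bytes; the final byte stays '\0',
         * so the result is always NUL-terminated. */
        strncpy(dst, src, dst_len - 1);
    }
}
```

Called as `bounded_copy(buf, sizeof(buf), name)`, an over-long `name` is silently truncated instead of overflowing `buf`.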
Rolf vandeVaart
5ebb74bee3
Fix case where amount of data sent is less than expected. Otherwise, we will get a hang when running the RGET protocol.
...
Reviewed by hjelm,bosilca.
This commit was SVN r28667.
2013-06-21 18:35:16 +00:00
Joshua Ladd
0b5c1f2ea8
Add 'generic' support for PMI2 (previously, we checked for PMI2 only on Cray systems.) If your resource manager (e.g. SLURM) has support for PMI2, then the --with-pmi configure flag will enable its usage. If you don't have PMI2, then you will fall back to regular old PMI1. This patch was submitted by Ralph Castain and reviewed and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd
...
This commit was SVN r28666.
2013-06-21 15:28:14 +00:00
Nathan Hjelm
299d5b3dd7
Fix two debugger attach bugs.
...
- orte_debugger_init_after_spawn was not being called for debuggers that
use the MPIR_attach_fifo to co-locate debugger daemons.
- MPIR_Breakpoint was not getting called if a debugger reattached. Add
a job state (ORTE_JOB_STATE_DEBUGGER_DETACH) to reset mpir_breakpoint_fired
to false when a debugger detaches to ensure MPIR_Breakpoint is called if
another debugger attaches. Tested with STAT 2.0/launchmon 1.0.
cmr:v1.7
This commit was SVN r28665.
2013-06-20 16:18:05 +00:00
Jeff Squyres
b9ca8e3cd1
Tweaked the help message a bit (this is the end result of iterating on
...
the message in email between Mike, Ralph, Jeff).
Add this to CMR #3642 and #3643 .
This commit was SVN r28662.
2013-06-20 13:19:23 +00:00
Jeff Squyres
84a4a2b18d
Sync with v1.6 bullets
...
This commit was SVN r28661.
2013-06-20 12:34:40 +00:00
Ralph Castain
13665bffe8
Per an off-list discussion, it appears possible for a system to report failure when executing getpwuid. There are several reasons for this error to occur, most notably if the system uses a network-based authentication protocol (e.g., NIS) and that system gets overwhelmed when we launch on a lot of nodes.
...
There is no good way to recover from this scenario, and from past experience, using the user's name in the session directory (as opposed to the uid) is very helpful when things go wrong. So print a help message when this happens (it is extremely rare, but has happened at least once now) and return an error.
cmr:v1.7.3,reviewer=jsquyres
cmr:v1.6.5,reviewer=jsquyres
This commit was SVN r28658.
2013-06-20 04:30:42 +00:00
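A minimal sketch of the behavior described in the entry above: check the result of getpwuid(), print a message, and return an error rather than continuing. The function name `lookup_user_name`, the message text, and the -1 return value are assumptions for illustration; the actual commit uses Open MPI's help-message machinery, which is not reproduced here.

```c
#include <assert.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fetch the login name for uid into buf.
 * Returns 0 on success, -1 if the lookup fails. */
static int lookup_user_name(uid_t uid, char *buf, size_t len)
{
    assert(NULL != buf && len > 0);

    struct passwd *pw = getpwuid(uid);
    if (NULL == pw || NULL == pw->pw_name) {
        /* The lookup can fail, e.g. when a network-based service such
         * as NIS is overloaded; report it and let the caller abort. */
        fprintf(stderr, "getpwuid failed for uid %lu\n",
                (unsigned long) uid);
        return -1;
    }

    /* snprintf always NUL-terminates when len > 0. */
    snprintf(buf, len, "%s", pw->pw_name);
    return 0;
}
```

A caller would typically use `lookup_user_name(getuid(), name, sizeof(name))` and propagate a nonzero return instead of falling back to the numeric uid.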
Jeff Squyres
2e5c18195b
We want to ignore this MPI extension in the general case -- it's just
...
an example (and outputs stuff to stdout!).
This commit was SVN r28654.
2013-06-19 16:01:45 +00:00
Ralph Castain
a51a0a8c48
Fix uninitialized var
...
This commit was SVN r28652.
2013-06-18 22:41:47 +00:00
Mike Dubman
d1c82994be
fix: detect threading model to take appropriate flow in mxm
...
This commit was SVN r28648.
2013-06-16 08:40:06 +00:00
George Bosilca
f5a55ccb39
Various cleanups.
...
This commit was SVN r28647.
2013-06-15 16:23:11 +00:00
George Bosilca
a6c3477e89
Remove useless include.
...
This commit was SVN r28646.
2013-06-15 16:07:30 +00:00
George Bosilca
b4ebc417a1
Correctly register the component MCA parameters.
...
Few cleanups in the includes.
This commit was SVN r28645.
2013-06-15 16:05:09 +00:00
Jeff Squyres
b5d269f5c0
Sync with 1.6.5 bullets
...
This commit was SVN r28641.
2013-06-14 16:37:17 +00:00
Jeff Squyres
cc8839de2f
Bring over v1.6.5 bullets.
...
This commit was SVN r28637.
2013-06-14 15:16:06 +00:00
Jeff Squyres
a0b27f5b28
Better comment than what was submitted in r28614.
...
This commit was SVN r28631.
The following SVN revision numbers were found above:
r28614 --> open-mpi/ompi@9556310bd0
2013-06-13 20:52:44 +00:00
Nathan Hjelm
8924140916
Per RFC: use a better hash algorithm for the opal_hash_table_*_ptr functions.
...
Chose the crc32 function present in opal/util/crc.c as the hash function. The
performance should be sufficient for most cases. If not we can always change
the function again.
This commit was SVN r28629.
2013-06-13 17:11:04 +00:00
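To illustrate the idea in the entry above of hashing an arbitrary-length key with a CRC, here is a small sketch. It assumes the common reflected CRC-32 (polynomial 0xEDB88320), which may not match the variant implemented in opal/util/crc.c, and the names `crc32_bytes` and `bucket_for_key` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 over len bytes of key (assumed variant: reflected,
 * polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF). */
static uint32_t crc32_bytes(const void *key, size_t len)
{
    assert(NULL != key || 0 == len);

    const unsigned char *p = (const unsigned char *) key;
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; ++i) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Map a key to a bucket index; mask is (table size - 1) for a
 * power-of-two table size. */
static size_t bucket_for_key(const void *key, size_t len, size_t mask)
{
    return (size_t) crc32_bytes(key, len) & mask;
}
```

A bitwise loop is slower than a table-driven CRC; it is used here only to keep the sketch short.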
Nathan Hjelm
518d1fe200
Fix two typos that prevented alps direct launch from working
...
This commit was SVN r28628.
2013-06-13 17:04:08 +00:00
Matthias Jurenz
ebf441ba4b
Changes to VT: Fixed infinite recursion bug if the verbosity level (env. VT_VERBOSE) is higher than or equal to 2
...
This commit was SVN r28624.
2013-06-13 07:33:22 +00:00
Jeff Squyres
34fb0712c4
Per https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/256 , we need
...
to set *flag=1 when source == MPI_PROC_NULL.
cmr:v1.7.2:reviewer=dgoodell
cmr:v1.6.5:reviewer=dgoodell
This commit was SVN r28621.
2013-06-12 21:38:07 +00:00
Joshua Ladd
46362d2761
Stomps compiler warnings in HCA min-dist calculation. This should be added to cmr:v1.7:reviewer=jladd
...
This commit was SVN r28620.
2013-06-12 16:25:25 +00:00
Nathan Hjelm
db9bce0926
Add destructors for MPI_T error codes
...
This commit was SVN r28618.
2013-06-12 14:58:14 +00:00
Matthias Jurenz
ba9bc238ee
attempt to fix #3627 : Pass all configure options from the OMPI top-level configure to the OTF sub-configure
...
This commit was SVN r28616.
2013-06-12 13:39:21 +00:00
Mike Dubman
9556310bd0
cosmetic: add comment with rationale for malloc.h include
...
This commit was SVN r28614.
2013-06-12 05:58:32 +00:00
Nathan Hjelm
9b1f32bf12
BTL: add flags for signaled BTL operations
...
As per discussion in the June 2013 developer meeting these
flags will be used by the PML in the future to request
asynchronous progress on an operation. The naming was chosen
to reflect that a BTL supports this mode (MCA_BTL_FLAG_SIGNALED)
and that a descriptor should "signal" the remote side to wake
up and progress the message (MCA_BTL_DES_FLAG_SIGNAL).
Future commits will update OB1 to take advantage of this
feature when performing the RDMA get or RDMA rendezvous
protocols.
This commit was SVN r28612.
2013-06-11 21:52:20 +00:00
Jeff Squyres
bf7f9b1f41
Fix minor typo in man page.
...
This commit was SVN r28606.
2013-06-10 13:44:48 +00:00
Mike Dubman
d18b3ae1a7
fix malloc deprecation error with gcc 4.6.3 on ubuntu/fedora
...
This commit was SVN r28605.
2013-06-09 18:13:16 +00:00
Jeff Squyres
d081e767b5
Make the man page rules output more like AM's silent rules
...
This commit was SVN r28604.
2013-06-08 12:33:52 +00:00
George Bosilca
d789423d34
Typo.
...
This commit was SVN r28603.
2013-06-08 10:44:02 +00:00
Tom Naughton
d86c3ce669
+ remove autogenerated 'install-sh'
...
This commit was SVN r28602.
2013-06-07 20:40:24 +00:00
Rolf vandeVaart
62ab008017
Fix SEGV caused by missing CUDA initialization.
...
This commit was SVN r28601.
2013-06-07 18:31:36 +00:00
Rolf vandeVaart
1230029aa1
The debug messages were swapped. Fixed.
...
This commit was SVN r28600.
2013-06-07 17:23:41 +00:00
Vishwanath Venkatesan
0b727f84da
Avoid malloc of zero bytes by adding a check.
...
This commit was SVN r28597.
2013-06-06 14:08:57 +00:00
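The zero-byte check mentioned in the entry above might look something like the following. This is a generic sketch, not the code from r28597; `alloc_int_array` and its return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate count ints into *out.
 * Returns 0 on success (including count == 0), -1 on failure. */
static int alloc_int_array(size_t count, int **out)
{
    assert(NULL != out);

    *out = NULL;
    if (0 == count) {
        /* malloc(0) may return NULL or a unique pointer depending on
         * the implementation, so skip the call entirely. */
        return 0;
    }

    *out = (int *) malloc(count * sizeof(int));
    return (NULL == *out) ? -1 : 0;
}
```

With the check in place, a NULL result can only mean "nothing requested" when the function returns 0, so callers do not confuse an empty request with an out-of-memory condition.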
Edgar Gabriel
2d4655a05a
Logic has been revised compared to the previous implementation.
...
This commit was SVN r28594.
2013-06-05 23:47:42 +00:00
Edgar Gabriel
03c1db7a3a
fix the calculation of the UNIFORM flag.
...
This commit was SVN r28593.
2013-06-05 23:18:50 +00:00
Vishwanath Venkatesan
7d6a05982a
Removing the gather_array based on the flag UNIFORM FVIEW for read all operations (dynamic/static),
...
+ Disabling Timing data extraction by default in dynamic write all
This commit was SVN r28592.
2013-06-05 21:35:37 +00:00
Vishwanath Venkatesan
55878674d7
1. Removing the allgather_array based on the flag UNIFORM FVIEW. This is not really an optimization.
...
2. Fixing some of the debug printf's; these are outdated.
This commit was SVN r28591.
2013-06-05 21:30:15 +00:00
Jeff Squyres
713e3aa3db
Refs trac:3626: that ticket specifically refers to the v1.6 branch; this
...
commit is the trunk version of what is needed for #3626 .
Add the "ignore_device" field to the INI file. This allows us to
specifically list devices that should be ignored by the openib BTL
(such as the Intel Phi, at least as of May 2013 -- see #3626 ).
Also add the Intel Phi to the ini file, and set its ignore_device=1.
Finally, add the concept of counting intentionally ignored verbs
devices. Devices are ignored for one of two reasons:
* If the number of allowed ports on that device is 0 (i.e., if
if_include/if_exclude was set such that we're intentionally
ignoring this device).
* If the INI ignore_device field for this device is set to 1.
Once we have the count of devices that were intentionally ignored,
only show the "Hey, there's verbs devices that you're not using!"
show_help message if there are devices that were ''unintentionally''
ignored.
This commit was SVN r28589.
The following Trac tickets were found above:
Ticket 3626 --> https://svn.open-mpi.org/trac/ompi/ticket/3626
2013-06-05 12:12:09 +00:00
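The counting rule described in the entry above can be sketched as follows. The struct layout and function names are hypothetical and do not reflect the openib BTL's actual data structures; only the two "intentionally ignored" conditions and the warning rule come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device summary. */
struct verbs_device {
    int  allowed_ports;      /* 0 if if_include/if_exclude removed all ports */
    bool ini_ignore_device;  /* INI file has ignore_device=1 for this device */
    bool in_use;             /* the BTL is actually using this device */
};

/* A device is intentionally ignored if it has no allowed ports or if
 * the INI file says to ignore it. */
static bool is_intentionally_ignored(const struct verbs_device *d)
{
    return 0 == d->allowed_ports || d->ini_ignore_device;
}

static size_t count_intentionally_ignored(const struct verbs_device *devs,
                                          size_t n)
{
    assert(NULL != devs || 0 == n);

    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (is_intentionally_ignored(&devs[i])) {
            ++count;
        }
    }
    return count;
}

/* Warn only if some unused device was not intentionally ignored. */
static bool should_warn_unused(const struct verbs_device *devs, size_t n)
{
    size_t used = 0;
    for (size_t i = 0; i < n; ++i) {
        if (devs[i].in_use && !is_intentionally_ignored(&devs[i])) {
            ++used;
        }
    }
    return n > used + count_intentionally_ignored(devs, n);
}
```

With one device excluded by port filtering, one ignored via the INI file, and one in use, `should_warn_unused` returns false, so no help message would be shown.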
Jeff Squyres
3019b7a3f8
Oops! Remove duplicate registration.
...
This commit was SVN r28588.
2013-06-05 11:55:19 +00:00
Jeff Squyres
1de00b17ad
Properly check the return status from registering the MCA params.
...
This commit was SVN r28587.
2013-06-05 11:53:18 +00:00
Nathan Hjelm
e48bd9809e
Add useful messages for MPI_T error codes
...
This commit was SVN r28584.
2013-06-04 23:18:44 +00:00
Jeff Squyres
d692aba672
Remove the DR PML. It was abandoned long ago. It had a nice life,
...
a few papers, and now a decent demise with respect.
This commit was SVN r28582.
2013-06-04 19:36:16 +00:00
Joshua Ladd
61ffb47573
Minor fix for the min-dist mapping algorithm: we need to call 'get_nbobjs_by_type' first, before we get the sorted list of nodes - we need to add node objects and fill them in the summary object for the current topology. This patch was submitted by Elena Elkina and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd
...
This commit was SVN r28578.
2013-05-31 15:19:59 +00:00