Commit graph

18278 commits

Author SHA1 Message Date
Ralph Castain
e8340b6339 There is no convention out there as to how OEMs handle PMI2 functions. Some put them in their own -lpmi2 library, and some don't. Some have split the PMI2 definitions into a pmi2.h and keep the PMI-1 definitions in a separate pmi.h, and some don't.
Try to handle cases more generally so at least Slurm and Cray can co-exist in peace.

This commit was SVN r28672.
2013-06-26 00:43:26 +00:00
Ralph Castain
fa943dc6ff Clean up a few things in the revised PMI configury - we know slurm has both pmi and pmi2 libs, so just auto-detect the presence of them if the user directed us to build with pmi support.
Also clean up some changed names in the alps code

This commit was SVN r28670.
2013-06-24 02:41:40 +00:00
Jeff Squyres
e3d0782788 Move the assignment after the bozo check.
This commit was SVN r28669.
2013-06-22 12:38:32 +00:00
Jeff Squyres
dd25421d48 Convert strcpy() to strncpy(), and just to be extra-super paranoid,
use memset(0) for extra bonus points.

This commit was SVN r28668.
2013-06-22 12:21:18 +00:00
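The strcpy()-to-strncpy() change described in dd25421d48 can be illustrated with a minimal sketch. The helper name `bounded_copy` and its signature are assumptions made for this example and are not taken from the Open MPI tree; the point is only that zeroing the buffer first and copying at most size - 1 bytes always leaves a terminated string.

```c
#include <string.h>

/* Hypothetical helper (not from the Open MPI sources): copy src into a
 * fixed-size buffer without overrunning it. */
void bounded_copy(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || dst_size == 0) {
        return;
    }
    /* Zero the whole buffer first so the result is always terminated. */
    memset(dst, 0, dst_size);
    if (src != NULL) {
        /* Copy at most dst_size - 1 bytes; the final byte stays '\0'. */
        strncpy(dst, src, dst_size - 1);
    }
}
```

A source string longer than the buffer is silently truncated, which is usually acceptable for names and labels but should be checked explicitly where truncation would change meaning.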
Rolf vandeVaart
5ebb74bee3 Fix case where amount of data sent is less than expected. Otherwise, we will get a hang when running the RGET protocol.
Reviewed by hjelm,bosilca.

This commit was SVN r28667.
2013-06-21 18:35:16 +00:00
Joshua Ladd
0b5c1f2ea8 Add 'generic' support for PMI2 (previously, we checked for PMI2 only on Cray systems). If your resource manager (e.g. SLURM) has support for PMI2, then the --with-pmi configure flag will enable its usage. If you don't have PMI2, then you will fall back to regular old PMI1. This patch was submitted by Ralph Castain and reviewed and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd
This commit was SVN r28666.
2013-06-21 15:28:14 +00:00
Nathan Hjelm
299d5b3dd7 Fix two debugger attach bugs.
 - orte_debugger_init_after_spawn was not being called for debuggers that
   use the MPIR_attach_fifo to co-locate debugger daemons.
 - MPIR_Breakpoint was not getting called if a debugger reattached. Add
   a job state (ORTE_JOB_STATE_DEBUGGER_DETACH) to reset mpir_breakpoint_fired
   to false when a debugger detaches to ensure MPIR_Breakpoint is called if
   another debugger attaches. Tested with STAT 2.0/launchmon 1.0.

cmr:v1.7

This commit was SVN r28665.
2013-06-20 16:18:05 +00:00
Jeff Squyres
b9ca8e3cd1 Tweaked the help message a bit (this is the end result of iterating on
the message in email between Mike, Ralph, Jeff).

Add this to CMR #3642 and #3643.

This commit was SVN r28662.
2013-06-20 13:19:23 +00:00
Jeff Squyres
84a4a2b18d Sync with v1.6 bullets
This commit was SVN r28661.
2013-06-20 12:34:40 +00:00
Ralph Castain
13665bffe8 Per an off-list discussion, it appears possible for a system to report failure when executing getpwuid. There are several reasons for this error to occur, most notably if the system uses a network-based authentication protocol (e.g., NIS) and that system gets overwhelmed when we launch on a lot of nodes.
There is no good way to recover from this scenario, and from past experience, using the user's name in the session directory (as opposed to the uid) is very helpful when things go wrong. So print a help message when this happens (it is extremely rare, but has happened at least once now) and return an error.

cmr:v1.7.3,reviewer=jsquyres
cmr:v1.6.5,reviewer=jsquyres

This commit was SVN r28658.
2013-06-20 04:30:42 +00:00
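The behaviour described in 13665bffe8 (print a message and return an error when getpwuid fails, rather than silently continuing) might look roughly like the sketch below. The function name `lookup_user_name`, the message text, and the return codes are assumptions for illustration; the actual Open MPI code uses its own help-message machinery and error constants.

```c
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill `name` with the login name of the current uid.
 * Returns 0 on success, -1 if the lookup fails. */
int lookup_user_name(char *name, size_t name_size)
{
    struct passwd *pw;

    if (name == NULL || name_size == 0) {
        return -1;
    }
    pw = getpwuid(getuid());
    if (pw == NULL || pw->pw_name == NULL) {
        /* No fallback to the numeric uid: report the problem and fail. */
        fprintf(stderr,
                "getpwuid failed for uid %lu; cannot determine user name\n",
                (unsigned long) getuid());
        return -1;
    }
    memset(name, 0, name_size);
    strncpy(name, pw->pw_name, name_size - 1);
    return 0;
}
```

Returning an error instead of substituting the uid keeps the session directory naming consistent, which matches the rationale given in the commit message.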
Jeff Squyres
2e5c18195b We want to ignore this MPI extension in the general case -- it's just
an example (and outputs stuff to stdout!).

This commit was SVN r28654.
2013-06-19 16:01:45 +00:00
Ralph Castain
a51a0a8c48 Fix uninitialized var
This commit was SVN r28652.
2013-06-18 22:41:47 +00:00
Mike Dubman
d1c82994be fix: detect threading model to take appropriate flow in mxm
This commit was SVN r28648.
2013-06-16 08:40:06 +00:00
George Bosilca
f5a55ccb39 Various cleanups.
This commit was SVN r28647.
2013-06-15 16:23:11 +00:00
George Bosilca
a6c3477e89 Remove useless include.
This commit was SVN r28646.
2013-06-15 16:07:30 +00:00
George Bosilca
b4ebc417a1 Correctly register the component MCA parameters.
A few cleanups in the includes.

This commit was SVN r28645.
2013-06-15 16:05:09 +00:00
Jeff Squyres
b5d269f5c0 Sync with 1.6.5 bullets
This commit was SVN r28641.
2013-06-14 16:37:17 +00:00
Jeff Squyres
cc8839de2f Bring over v1.6.5 bullets.
This commit was SVN r28637.
2013-06-14 15:16:06 +00:00
Jeff Squyres
a0b27f5b28 Better comment than what was submitted in r28614.
This commit was SVN r28631.

The following SVN revision numbers were found above:
  r28614 --> open-mpi/ompi@9556310bd0
2013-06-13 20:52:44 +00:00
Nathan Hjelm
8924140916 Per RFC: use a better hash algorithm for the opal_hash_table_*_ptr functions.
Chose the crc32 function present in opal/util/crc.c as the hash function. The
performance should be sufficient for most cases. If not we can always change
the function again.

This commit was SVN r28629.
2013-06-13 17:11:04 +00:00
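As a rough illustration of the idea in 8924140916, the sketch below hashes an arbitrary byte-string key with CRC-32 and reduces the result to a bucket index. The CRC routine here is a plain bitwise implementation written for this example; it is not the function in opal/util/crc.c, and the names `sketch_crc32` and `sketch_bucket_for_key` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). This is a
 * stand-in for illustration, not the routine in opal/util/crc.c. */
uint32_t sketch_crc32(const void *data, size_t len)
{
    const unsigned char *bytes = (const unsigned char *) data;
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    for (i = 0; i < len; ++i) {
        crc ^= bytes[i];
        for (bit = 0; bit < 8; ++bit) {
            /* Shift right; XOR in the polynomial if the low bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical bucket selection for a key given as a byte string.
 * num_buckets must be non-zero. */
size_t sketch_bucket_for_key(const void *key, size_t key_len,
                             size_t num_buckets)
{
    return (size_t) sketch_crc32(key, key_len) % num_buckets;
}
```

A table-driven CRC would be faster than this bit-by-bit loop; the bitwise form is used here only because it is short enough to check by eye.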
Nathan Hjelm
518d1fe200 Fix two typos that prevented alps direct launch from working
This commit was SVN r28628.
2013-06-13 17:04:08 +00:00
Matthias Jurenz
ebf441ba4b Changes to VT: Fixed infinite recursion bug if the verbosity level (env. VT_VERBOSE) is greater than or equal to 2
This commit was SVN r28624.
2013-06-13 07:33:22 +00:00
Jeff Squyres
34fb0712c4 Per https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/256, we need
to set *flag=1 when source == MPI_PROC_NULL.

cmr:v1.7.2:reviewer=dgoodell
cmr:v1.6.5:reviewer=dgoodell

This commit was SVN r28621.
2013-06-12 21:38:07 +00:00
Joshua Ladd
46362d2761 Stomps compiler warnings in HCA min-dist calculation. This should be added to cmr:v1.7:reviewer=jladd
This commit was SVN r28620.
2013-06-12 16:25:25 +00:00
Nathan Hjelm
db9bce0926 Add destructors for MPI_T error codes
This commit was SVN r28618.
2013-06-12 14:58:14 +00:00
Matthias Jurenz
ba9bc238ee attempt to fix #3627: Pass all configure options from the OMPI top-level configure to the OTF sub-configure
This commit was SVN r28616.
2013-06-12 13:39:21 +00:00
Mike Dubman
9556310bd0 cosmetic: add comment with rationale for malloc.h include
This commit was SVN r28614.
2013-06-12 05:58:32 +00:00
Nathan Hjelm
9b1f32bf12 BTL: add flags for signaled BTL operations
As per discussion in the June 2013 developer meeting these
flags will be used by the PML in the future to request
asynchronous progress on an operation. The naming was chosen
to reflect that a BTL supports this mode (MCA_BTL_FLAG_SIGNALED)
and that a descriptor should "signal" the remote side to wake
up and progress the message (MCA_BTL_DES_FLAG_SIGNAL).

Future commits will update OB1 to take advantage of this
feature when performing the RDMA get or RDMA rendezvous
protocols.

This commit was SVN r28612.
2013-06-11 21:52:20 +00:00
Jeff Squyres
bf7f9b1f41 Fix minor typo in man page.
This commit was SVN r28606.
2013-06-10 13:44:48 +00:00
Mike Dubman
d18b3ae1a7 fix malloc deprecation error with gcc 4.6.3 on ubuntu/fedora
This commit was SVN r28605.
2013-06-09 18:13:16 +00:00
Jeff Squyres
d081e767b5 Make the man page rules output more like AM's silent rules
This commit was SVN r28604.
2013-06-08 12:33:52 +00:00
George Bosilca
d789423d34 Typo.
This commit was SVN r28603.
2013-06-08 10:44:02 +00:00
Tom Naughton
d86c3ce669 + remove autogenerated 'install-sh'
This commit was SVN r28602.
2013-06-07 20:40:24 +00:00
Rolf vandeVaart
62ab008017 Fix SEGV because missing CUDA initialization.
This commit was SVN r28601.
2013-06-07 18:31:36 +00:00
Rolf vandeVaart
1230029aa1 The debug messages were swapped. Fixed.
This commit was SVN r28600.
2013-06-07 17:23:41 +00:00
Vishwanath Venkatesan
0b727f84da Avoid malloc of zero bytes: add a check and skip the allocation.
This commit was SVN r28597.
2013-06-06 14:08:57 +00:00
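The check mentioned in 0b727f84da can be sketched as follows. The helper `alloc_int_array` is hypothetical and only shows the general pattern of guarding against a zero-sized request, since the C standard lets malloc(0) return either NULL or a unique pointer.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `count` ints, or return NULL without
 * calling malloc when there is nothing to allocate. */
int *alloc_int_array(size_t count)
{
    if (count == 0) {
        return NULL;  /* nothing to allocate; not an error */
    }
    if (count > SIZE_MAX / sizeof(int)) {
        return NULL;  /* the byte count would overflow size_t */
    }
    return (int *) malloc(count * sizeof(int));
}
```

With this pattern the caller has to treat a NULL result for a zero count as "empty" rather than as an out-of-memory condition.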
Edgar Gabriel
2d4655a05a Logic has been revised compared to the previous implementation.
This commit was SVN r28594.
2013-06-05 23:47:42 +00:00
Edgar Gabriel
03c1db7a3a fix the calculation of the UNIFORM flag.
This commit was SVN r28593.
2013-06-05 23:18:50 +00:00
Vishwanath Venkatesan
7d6a05982a Removing the gather_array based on the flag UNIFORM FVIEW for read all operations (dynamic/static),
+ Disabling Timing data extraction by default in dynamic write all

This commit was SVN r28592.
2013-06-05 21:35:37 +00:00
Vishwanath Venkatesan
55878674d7 1. Removing the allgather_array based on the flag UNIFORM FVIEW. This is not really an optimization.
2. Fixing some of the debug printfs; these are outdated.

This commit was SVN r28591.
2013-06-05 21:30:15 +00:00
Jeff Squyres
713e3aa3db Refs trac:3626: that ticket specifically refers to the v1.6 branch; this
commit is the trunk version of what is needed for #3626.

Add the "ignore_device" field to the INI file.  This allows us to
specifically list devices that should be ignored by the openib BTL
(such as the Intel Phi, at least as of May 2013 -- see #3626).  

Also add the Intel Phi to the ini file, and set its ignore_device=1.

Finally, add the concept of counting intentionally ignored verbs
devices.  Devices are ignored for one of two reasons:

 * If the number of allowed ports on that device is 0 (i.e., if
   if_include/if_exclude was set such that we're intentionally
   ignoring this device).
 * If the INI ignore_device field for this device is set to 1.

Once we have the count of devices that were intentionally ignored,
only show the "Hey, there's verbs devices that you're not using!"
show_help message if there are devices that were ''unintentionally''
ignored.

This commit was SVN r28589.

The following Trac tickets were found above:
  Ticket 3626 --> https://svn.open-mpi.org/trac/ompi/ticket/3626
2013-06-05 12:12:09 +00:00
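The counting rule described in 713e3aa3db (warn only when some unused device was not ignored on purpose) can be sketched like this. The struct, its field names, and the function name are assumptions for illustration and do not correspond to the openib BTL's real data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device summary; field names are illustrative only. */
struct device_info {
    int allowed_ports;   /* ports left after if_include/if_exclude */
    bool ignore_device;  /* INI ignore_device field set to 1 */
    bool in_use;         /* the BTL ended up using this device */
};

/* Returns true if at least one device is unused without having been
 * ignored intentionally, i.e. the help message should be shown. */
bool should_warn_unused_devices(const struct device_info *devs, size_t n)
{
    size_t unused = 0;
    size_t intentionally_ignored = 0;
    size_t i;

    for (i = 0; i < n; ++i) {
        if (devs[i].in_use) {
            continue;
        }
        ++unused;
        /* The two intentional reasons listed in the commit message. */
        if (devs[i].allowed_ports == 0 || devs[i].ignore_device) {
            ++intentionally_ignored;
        }
    }
    return unused > intentionally_ignored;
}
```

Keeping the two counters separate mirrors the commit's wording: the warning depends on the difference between devices that went unused and devices that were deliberately skipped.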
Jeff Squyres
3019b7a3f8 Oops! Remove duplicate registration.
This commit was SVN r28588.
2013-06-05 11:55:19 +00:00
Jeff Squyres
1de00b17ad Properly check the return status from registering the MCA params.
This commit was SVN r28587.
2013-06-05 11:53:18 +00:00
Nathan Hjelm
e48bd9809e Add useful messages for MPI_T error codes
This commit was SVN r28584.
2013-06-04 23:18:44 +00:00
Jeff Squyres
d692aba672 Remove the DR PML. It was abandoned long ago. It had a nice life,
a few papers, and now a decent demise with respect.

This commit was SVN r28582.
2013-06-04 19:36:16 +00:00
Joshua Ladd
61ffb47573 Minor fix for the min-dist mapping algorithm: we need to call 'get_nbobjs_by_type' first, before we get the sorted list of nodes - we need to add node objects and fill them in the summary object for the current topology. This patch was submitted by Elena Elkina and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd
This commit was SVN r28578.
2013-05-31 15:19:59 +00:00
Jeff Squyres
d1dc4da292 Fix typo (the debugger might not be TotalView).
This commit was SVN r28577.
2013-05-31 00:39:05 +00:00
Edgar Gabriel
87b3782b7f arghh, copy-and-paste error, status->_ucount has to be set to 0 not max_data for count=0.
This commit was SVN r28576.
2013-05-30 22:00:29 +00:00
Edgar Gabriel
9daec82f17 - make a fileview of 0 bytes work in ompio
- fixes the bug reported in ticket 3619 (which is already closed) also for ompio

This commit was SVN r28575.
2013-05-30 21:33:13 +00:00
Nathan Hjelm
e61a1aa865 Update LANL XE-6 platform files
This commit was SVN r28574.
2013-05-30 18:33:27 +00:00