Commit Graph

18183 Commits

Author SHA1 Message Date
Jeff Squyres
456df1c9f7 Remove redundant opal_output() messages from the module; the called
functions will now show_help() their own error messages if something
goes wrong (per r28470).

This commit was SVN r28471.

The following SVN revision numbers were found above:
  r28470 --> open-mpi/ompi@2ff95a7739
2013-05-10 15:12:07 +00:00
Jeff Squyres
2ff95a7739 Proper show_help error messages for LAMA.
This commit was SVN r28470.
2013-05-10 15:06:25 +00:00
Ralph Castain
353c77e659 Re-enable udcm for test purposes - appears to be working, but needs broader exposure to MTT
This commit was SVN r28468.
2013-05-09 16:10:29 +00:00
Tom Naughton
26acf8adb1 + revert part of changeset r28456 that slightly modified behavior of
".ompi_ignore" to ignore entire framework and avoided adding "ignored"
  frameworks to the autogenerated "frameworks.h" header file.

  This change restores previous behavior. 

This commit was SVN r28466.

The following SVN revision numbers were found above:
  r28456 --> open-mpi/ompi@0a950009be
2013-05-08 18:10:36 +00:00
Jeff Squyres
cad1d920b2 Check to ensure that we have struct ifreq.ifr_mtu before we try to use
it, because although Solaris has SIOCGIFMTU, it curiously does not
have ifreq.ifr_mtu.

This commit was SVN r28460.
2013-05-07 13:51:50 +00:00
Jeff Squyres
4b9b3a81ff Update the list of post-1.5.2 r numbers from hwloc that we have
committed here.

This commit was SVN r28458.
2013-05-07 01:22:06 +00:00
Jeff Squyres
ee0cdf86fd Fix issue raised by Stefan Friedel: remove an extraneous -L that is
added by hwloc's embedding so that it doesn't appear in
libhwloc_embedded.la (and therefore propagate all the way up to
libmpi.la). 

Committed upstream in hwloc SVN r5588.

This commit was SVN r28457.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r5588
2013-05-07 01:21:18 +00:00
Tom Naughton
0a950009be Updates to autogen.pl:
+  Add ifdef guard to project's autogenerated "frameworks.h" header file,
   e.g., "opal/include/opal/frameworks.h" would have "OPAL_FRAMEWORKS_H".

+ Avoid adding "ignored" frameworks to the autogenerated "frameworks.h"
  header file.

+ Avoid adding non-MCA projects to the autogenerated 'mca_project_list',
  which maintains existing support for "projects" with new MCA framework 
  enhancements.  Moves this down to mca_run_global().

+ Add small loop at end to add projects with a "config/" subdir
  to the list of includes for 'autoreconf'.

This commit was SVN r28456.
2013-05-06 23:05:18 +00:00
Ralph Castain
f15fe5045e Ensure that debugger connect can occur by getting the rml contact info updated before calling init_after_spawn
cmr:v1.7.3,reviewer=jsquyres

This commit was SVN r28455.
2013-05-06 22:00:45 +00:00
Ralph Castain
c52b94af8b Revert r28453 and r28452 - wrong fix
This commit was SVN r28454.

The following SVN revision numbers were found above:
  r28452 --> open-mpi/ompi@756ee4b5e0
  r28453 --> open-mpi/ompi@6da24143a2
2013-05-06 21:52:17 +00:00
Ralph Castain
6da24143a2 Minor performance improvement
This commit was SVN r28453.
2013-05-06 20:27:16 +00:00
Ralph Castain
756ee4b5e0 Update the rml_uri for each proc so debuggers can attach
This commit was SVN r28452.
2013-05-06 20:18:14 +00:00
Ralph Castain
707d0e653a Must use equal and not & comparison for mapping directives
This commit was SVN r28451.
2013-05-06 15:07:12 +00:00
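
The difference between an equality test and a bitwise test can be sketched as follows. This is only an illustration, assuming the mapping directives are enumerated values rather than single-bit flags, which the commit message does not state. The names `MAP_BYSLOT`, `MAP_BYNODE`, `MAP_BYSOCKET` and both helper functions are hypothetical and are not taken from the ORTE sources.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical directive values: consecutive integers, not one bit each. */
enum { MAP_BYSLOT = 1, MAP_BYNODE = 2, MAP_BYSOCKET = 3 };

/* Bitwise test: wrong for enumerated values, because 3 & 1 is nonzero,
 * so MAP_BYSOCKET is mistaken for MAP_BYSLOT. */
static bool is_byslot_bitwise(uint16_t policy)
{
    return (policy & MAP_BYSLOT) != 0;
}

/* Equality test: matches exactly one directive value. */
static bool is_byslot_equal(uint16_t policy)
{
    return policy == MAP_BYSLOT;
}
```

With these definitions, `is_byslot_bitwise(MAP_BYSOCKET)` is true while `is_byslot_equal(MAP_BYSOCKET)` is false, which is the kind of false positive an `&` comparison can produce.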
Ralph Castain
a0a6412545 Do a little cleanup on abnormal termination procedure - don't keep submitting forced exit events (one will do), no need to reset the abnormal termination pipe event in orterun, etc.
This commit was SVN r28450.
2013-05-05 17:39:45 +00:00
Ralph Castain
12969cec81 Update orte_progress_threads configure option - no longer need to test for --enable-event-threads
This commit was SVN r28449.
2013-05-05 14:48:35 +00:00
Ralph Castain
850dbe77ec Update platform files
This commit was SVN r28448.
2013-05-05 14:35:13 +00:00
Ralph Castain
ae68a953f4 Sigh - one more place
This commit was SVN r28447.
2013-05-05 00:25:14 +00:00
Ralph Castain
fb2a694587 Fix print
This commit was SVN r28446.
2013-05-04 22:37:34 +00:00
Ralph Castain
27e3e382d5 No need for ORTE tools to use orte progress thread
This commit was SVN r28445.
2013-05-04 21:13:20 +00:00
Nathan Hjelm
422331b4da btl/openib: fix unconnected datagram connection method (udcm)
The primary issue with udcm is that the immediate data in message
acks were often bogus. This caused the sender to keep trying even
though a message was received and acked. The fix is to use the
source LID and QP to determine which message is being acked. In
most cases this should work well since only one message will be
in flight to any peer.

This commit was SVN r28444.
2013-05-03 17:11:38 +00:00
Ralph Castain
527ea1d090 Per the RFC, always enable libevent thread support.
This commit was SVN r28443.
2013-05-03 15:39:05 +00:00
Jeff Squyres
c8258c06e2 In coll_sm, we alloc a huge chunk of shared memory, divvy it into lots
of individual regions (each region is a multiple of page size in
length), and each process claims its own regions by binding them to its
local memory.  Each process would end up membining something like 16
individual regions in the overall shmem segment.

There were two errors in this code relating to the memory affinity
pinning.  Some combination of these two errors would lead to kernel
panics (!) on my RHEL 6.2 x86_64 machines when used with mmap'ed
shared memory (not posix or sysv shared memory, curiously enough):

1. The shared memory segment is initially divided into two regions:
control and data.  The control starts at the beginning of the shmem
segment, the data starts after that.  The data portion, unfortunately,
was ''not'' aligned to a page.  So all the multiple-of-page-size
regions that we divvy up were also not aligned on page boundaries.  And
therefore all the regions we tried to membind were not on page
boundaries.

The solution was to ensure that the data portion started on a page
boundary.  Then all of the individual regions were on page boundaries,
too.

That being said, in my tests, Linux mbind() fails gracefully when the
address is not on a page boundary.  So I'm not sure how this worked at
all / led to a kernel panic...

2. There was some bad pointer math that resulted in membinding regions
larger than they should have been, resulting in region overlaps.
There were definitely overlaps between regions in the same process;
it's likely that there were overlaps between regions of multiple
processes, too -- I'm not sure (and don't care to figure out :-) ).

The solution was to fix the pointer math so that each region membinds
only itself and no neighboring/overlapping regions.

cmr:v1.7.2:reviewer=samuel

This commit was SVN r28442.
2013-05-03 12:49:35 +00:00
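
The two fixes described above (start the data portion on a page boundary, and compute each region's extent with correct byte arithmetic) can be sketched roughly as below. This is not the coll_sm code: `align_up` and `region_start` are hypothetical helpers, and the sketch assumes the page size is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of page_size.
 * Assumes page_size is a nonzero power of two. */
static size_t align_up(size_t offset, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (offset + page_size - 1) & ~(page_size - 1);
}

/* Start of region i inside the data portion.  Using a byte pointer
 * keeps the arithmetic in bytes, so region i covers exactly
 * [start, start + region_size) and does not overlap region i + 1. */
static unsigned char *region_start(unsigned char *data_base,
                                   size_t region_size, size_t i)
{
    return data_base + i * region_size;
}
```

If the control portion ends at byte 100 and pages are 4096 bytes, `align_up(100, 4096)` places the data portion at offset 4096. Every region that is a multiple of the page size then also begins on a page boundary.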
Alex Mikheev
9e2fdc7d56 - correction of r28440
This commit was SVN r28441.

The following SVN revision numbers were found above:
  r28440 --> open-mpi/ompi@93ce233530
2013-05-02 12:52:58 +00:00
Alex Mikheev
93ce233530 - btl_openib: changed default SRQ settings:
    - increase number of wqe to minimize number of RNRs
    - it is better to have high watermark and post relatively small number of wqes
    - increased TX queue size

This commit was SVN r28440.
2013-05-02 12:46:35 +00:00
Jeff Squyres
52fd270663 Implement MPI-2.2 functionality of deleting attributes on
MPI_COMM_SELF in reverse order during MPI_FINALIZE (well, actually,
''all'' attributes are now deleted in reverse order whenever a
communicator is destructed).

Also revamped a few things in the MPI attribute implementation:

 * use a One Big Lock philosophy for making the implementation thread
   safe (vs. the pair of locks we were using before).  One Big Lock is
   quite a bit simpler and has fewer corner cases; the code for
   attributes is still complicated, but is definitely less complex
   than it used to be.
 * The COPY_ATTR_CALLBACKS and DELETE_ATTR_CALLBACKS macros no longer
   return; they simply set a value if something went wrong.  Then we
   check this value after the macros complete.  This simplifies
   unlocking, etc.
 * Added write barriers right before releasing locks to ensure memory
   consistency.
 * Fixed a bunch of typos in comments, and some indenting.

Many thanks to KAWASHIMA Takahiro who contributed the original patch
for attribute destruction ordering, and who helped test/debug/evolve
the patch to its final form.

Fixes trac:3123.

cmr:v1.7.2:reviewer=bosilca

This commit was SVN r28439.

The following Trac tickets were found above:
  Ticket 3123 --> https://svn.open-mpi.org/trac/ompi/ticket/3123
2013-05-02 12:32:21 +00:00
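
The combination of reverse-order deletion, a single lock, and callbacks that record an error instead of returning early might look like the sketch below. It is an illustration only, not the Open MPI attribute code: `attr_lock`, `delete_all_reverse`, `record_cb` and the fixed-size `order` array are all hypothetical, and stopping at the first failing callback is an assumption of the sketch.

```c
#include <pthread.h>
#include <stddef.h>

typedef int (*delete_fn)(int keyval);

/* One lock protects all attribute state in this sketch. */
static pthread_mutex_t attr_lock = PTHREAD_MUTEX_INITIALIZER;

/* Invoke cb on each keyval from last to first.  A failure is stored
 * in rc rather than returned immediately, so there is exactly one
 * unlock on every path. */
static int delete_all_reverse(const int *keyvals, size_t count, delete_fn cb)
{
    int rc = 0;
    pthread_mutex_lock(&attr_lock);
    for (size_t i = count; i > 0 && rc == 0; --i) {
        rc = cb(keyvals[i - 1]);
    }
    pthread_mutex_unlock(&attr_lock);
    return rc;
}

/* Demo callback: records the order in which keyvals are deleted. */
static int order[8];
static size_t n_called = 0;

static int record_cb(int keyval)
{
    if (n_called < sizeof(order) / sizeof(order[0])) {
        order[n_called] = keyval;
    }
    ++n_called;
    return 0;
}
```

Calling `delete_all_reverse` on keyvals stored in the order 10, 20, 30 invokes the callback with 30, then 20, then 10.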
Jeff Squyres
42a9a4c62c After examining a '''lot''' of MTT output with Ralph, fix the cause of
many, many MTT timeouts when running jobs under SLURM: send the right
command at the end to cause remote orteds to shut down.

This commit was SVN r28438.
2013-05-02 00:23:53 +00:00
Nathan Hjelm
4990412d0b undo accidental commit
This commit was SVN r28436.
2013-05-01 16:12:10 +00:00
Nathan Hjelm
d3727680a5 import
This commit was SVN r28435.
2013-05-01 16:01:48 +00:00
Alex Mikheev
f76680fbd0 - btl_openib: fix total registered memory calculation for ConnectIB and Ofed 2.0
This commit was SVN r28432.
2013-05-01 13:39:29 +00:00
George Bosilca
2331000d63 Correctly handle the invalid status for null and inactive
requests. This patch fixes trac:3475.

CMR v1.6, v1.7

This commit was SVN r28431.

The following Trac tickets were found above:
  Ticket 3475 --> https://svn.open-mpi.org/trac/ompi/ticket/3475
2013-05-01 12:55:24 +00:00
Jeff Squyres
eeb1d83c1d Don't assign the status if MPI_STATUS_IGNORE is passed in. Thanks to
Lisandro Dalcin for finding the issue.

This commit was SVN r28430.
2013-05-01 12:32:58 +00:00
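
The guard described in this commit can be illustrated without MPI headers. In the sketch below, `struct status`, `STATUS_IGNORE` and `set_status` are hypothetical stand-ins for `MPI_Status`, `MPI_STATUS_IGNORE` and the internal assignment. The real sentinel value and the real code path are not shown in the commit message.

```c
#include <stddef.h>

/* Hypothetical stand-in for MPI_Status. */
struct status {
    int source;
    int tag;
};

/* Hypothetical stand-in for MPI_STATUS_IGNORE: a sentinel pointer
 * that the caller passes when it does not want a status back. */
#define STATUS_IGNORE ((struct status *) NULL)

/* Write the status only when the caller supplied real storage. */
static void set_status(struct status *out, int source, int tag)
{
    if (out == STATUS_IGNORE) {
        return; /* nothing to fill in; writing here would be invalid */
    }
    out->source = source;
    out->tag = tag;
}
```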
Jeff Squyres
d92a8e01f8 Use the _SAFE list traversal macro so that we can remove each item
from the list (just for good measure), and then free() it (without
using _SAFE, we were accessing memory that was just free()'d to get to
the next item).  Also be a little more thorough -- DESTRUCT the list
when we're all done.

This commit was SVN r28429.
2013-05-01 12:26:16 +00:00
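
The use-after-free that a _SAFE style traversal avoids can be sketched with a plain singly linked list. The list type and the `push` and `free_all` helpers below are hypothetical and stand in for the project's list class and macro. The point is only that the next pointer is read before the current item is freed.

```c
#include <stdlib.h>

struct node {
    struct node *next;
    int value;
};

/* Prepend a value; returns the new head (or the old head if
 * allocation fails). */
static struct node *push(struct node *head, int value)
{
    struct node *n = malloc(sizeof(*n));
    if (n == NULL) {
        return head;
    }
    n->value = value;
    n->next = head;
    return n;
}

/* Free every node and reset the head.  The next pointer is saved
 * before free(), so freed memory is never dereferenced.
 * Returns the number of nodes freed. */
static size_t free_all(struct node **head)
{
    size_t freed = 0;
    struct node *item = *head;
    while (item != NULL) {
        struct node *next = item->next; /* read before freeing item */
        free(item);
        item = next;
        ++freed;
    }
    *head = NULL;
    return freed;
}
```

A loop that instead advances with `item = item->next` after `free(item)` reads memory that was just released, which is the bug the commit describes.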
George Bosilca
1169ebdff8 Indentation.
This commit was SVN r28426.
2013-04-30 23:26:23 +00:00
George Bosilca
8b0335380a Fix the error messages to reference the correct function.
This commit was SVN r28425.
2013-04-30 23:26:03 +00:00
George Bosilca
6a75c84fa8 Remove useless define.
This commit was SVN r28424.
2013-04-30 23:24:59 +00:00
George Bosilca
92aeefebac The constructor and destructor are not publicly visible functions.
Fix the indentation.

This commit was SVN r28423.
2013-04-30 23:23:57 +00:00
Nathan Hjelm
75cc04faa6 Fix typo in check for mpi_leave_pinned vs mpi_leave_pinned_pipeline.
This commit was SVN r28421.
2013-04-30 20:08:32 +00:00
Ralph Castain
9de82aba55 Revert r28417 - given the non-standard way vprotocol is implemented, I see no way to use the framework verbosity here. Best to just leave it alone as those who use it know what they need to do to get debug output
This commit was SVN r28418.

The following SVN revision numbers were found above:
  r28417 --> open-mpi/ompi@b00de5be8b
2013-04-30 16:37:17 +00:00
Nathan Hjelm
b00de5be8b vprotocol: remove the old output and use the framework output
This commit was SVN r28417.
2013-04-30 15:21:42 +00:00
Ralph Castain
ceb4061214 Fix BTL_VERBOSE - when the MCA param change was committed, it left the base verbosity variable declared so things compiled. Sadly, the verbosity was now being set to a new variable, so debug never was output.
This commit was SVN r28414.
2013-04-30 01:15:52 +00:00
Nathan Hjelm
f384263de7 btl/openib: fix typo
This commit was SVN r28413.
2013-04-29 22:21:25 +00:00
Ralph Castain
4c0dcb1aa2 Update ignores and remove build product
This commit was SVN r28412.
2013-04-29 19:02:03 +00:00
Ralph Castain
5d7a93c032 Add the ability to use an external version of libevent. Clearly not recommended at this time. I've verified that it works in limited scenarios, but more thorough testing and performance impacts need to be assessed.
Interesting how many includes had to be fixed here and there to fill in missing dependencies :-)

This commit was SVN r28411.
2013-04-29 17:02:37 +00:00
Ralph Castain
3052acd968 Fix minor typo
This commit was SVN r28410.
2013-04-29 17:02:11 +00:00
Ralph Castain
bd83de0b7f Fix an obvious typo - it was set to default to true when instantiated.
This commit was SVN r28407.
2013-04-27 00:12:10 +00:00
Ralph Castain
700034cda3 Update platform files
This commit was SVN r28406.
2013-04-27 00:09:58 +00:00
Ralph Castain
8996ecb128 Add missing include
This commit was SVN r28405.
2013-04-27 00:09:36 +00:00
Ralph Castain
3818e88365 Remove and ignore build products
This commit was SVN r28404.
2013-04-27 00:07:18 +00:00
Jeff Squyres
c9c6ced1c9 Use some handy shell scripts from W Spector to s/ierr/ierror/ in the
mpi module.

This commit was SVN r28403.
2013-04-26 22:07:42 +00:00
Jeff Squyres
f55cea1a5b If there are no BTLs, do ''not'' actually shut down the fd listener,
because a) it may still be needed to shut down the CPCs, and b) it
will be shut down during component_close().

This commit was SVN r28402.
2013-04-26 15:31:50 +00:00