1
1
Граф коммитов

14368 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
ed1dbabc0c Remove the last vestiges of mpi_portable_platform.h.in
This commit was SVN r22789.
2010-03-05 21:21:03 +00:00
Jeff Squyres
95d7e08a66 More more discussion and testing has occurred off-ticket.
Short version: there is a bug in OS X/Snow Leopard, but there is also
a bug in Open MPI.  Fixing the bug in Open MPI is both trivial (a
1-line change) and avoids the bug in OS X.  We'll file an OS X bug
report upstream with Apple, but it should no longer affect us here in
OMPI.

Fixes trac:2039.

More details:

Some background first: 

 1. IPv4 sockets can only accept incoming IPv4 connections.  However,
    IPv6 sockets can be configured to accept ''only'' incoming IPv6
    connection, or ''both'' incoming IPv4 and IPv6 connections.  An
    IPv6 socket attribute sets which listening behavior is used.
 1. IPv4 and IPv6 have different port namespaces.  Hence, it is
    permissable to bind a v4 socket to port X ''and'' also bind a v6
    socket to that same port X on the same interface (assuming that
    the v6 socket is only accepting incoming v6 connections).
    Incoming v4 connections to port X on the interface should get
    matched to the listening v4 socket; incoming v6 connections should
    get matched to the listening v6 socket.
 1. When v6 sockets accept ''both'' incoming v4 and v6 connections, it
    should claim port X in both namespaces.
 1. Linux's default behavior is to only allow one listening socket to
    be bound to a given port (i.e., ''either'' a v6 or v4 socket to be
    bound to a single port X -- not both).  A v6 socket can listen for
    both v4 and v6 incoming connections on that port, but still --
    only one socket will be bound to that port.
 1. Snow Leopard's default behavior is to share ports -- i.e., let
    both a v4 and a v6 listening socket to be bound to port X
    (assuming that the v6 socket is only accepting incoming v6
    connections).

The TCP BTL creates a listening socket for each address family.
Hence, it creates a v4 listening socket on INADDR_ANY and a v6
listening socket on the v6 equivalent of INADDR_ANY.  OMPI then
iteratively tries to find ports to listen on within the range of
[mca_btl_tcp_port_min, mca_btl_tcp_port_min + mca_btl_tcp_port_range).

On Linux, the v4 socket will be bound to port X and the v6 socket will
likely be bound to port Y (where X != Y).  On Snow Leopard, the v4
socket will be bound to port X and the v6 socket may ''also'' be bound
to port X.  Since the namespaces are separate, this shouldn't be a
problem.

However, Open MPI was accidentally setting the v6 listening behavior
to accept ''both'' v4 and v6 incoming connections.  This is a trivial
thing to fix -- change a 0 to a 1 in the code.  On Linux, this issue
didn't matter because the v4 and v6 sockets were on different ports.
So even though the v6 socket ''would'' have accepted incoming v4
connections, that never happened because OMPI would direct v4
connections to the v4 port.

But on Snow Leopard, the v4 and v6 listening ports could end up
sharing the same port number.  As mentioned above, this ''shouldn't''
have been a problem, but it looks like Snow Leopard has the following
bugs:

 * If a v4 socket is already bound to port X, we're pretty sure that a
   v6 socket with the "accept both v4 and v6 incoming connections"
   listening behavior should not be able to claim port X (because
   there's already a v4 socket listening on X).  However, Snow Leopard
   would allow binding a v4 socket to port X, and then allow a v6
   socket configured to allow incoming v4 and v6 connections to
   ''also'' be bound to port X.
 * After binding the v6 socket to port X, Snow Leopard then lets
   ''another'' v4 socket ''also'' get bound to port X.  Hence, there's
   now '''three''' sockets all listening on port X.

This obviously led to mis-matched TCP connections, and things went
downhill from there.

That being said, Snow Leopard doesn't exhibit this behavior if v6
sockets only allow incoming v6 connections.  And technically, that is
exactly the behavior we want (we want v6 sockets to only accept
incoming v6 connections).  So if we just change the flag to make our
v6 listening socket us this behavior, the problem on OS X goes away.

That's what this commit does -- it changes a 0 to a 1, indicating
"only let this v6 socket allow incoming v6 connections."

That was simple, wasn't it?

This commit was SVN r22788.

The following Trac tickets were found above:
  Ticket 2039 --> https://svn.open-mpi.org/trac/ompi/ticket/2039
2010-03-05 17:37:57 +00:00
Ralph Castain
2a0f7e95ee Don't double account for the killed local proc - only adjust num_local_procs when the proc actually dies.
This commit was SVN r22787.
2010-03-05 13:53:18 +00:00
Ralph Castain
b2e24693c4 Check the return status when we forward stdin and remove the recipient when they are no longer alive
This commit was SVN r22786.
2010-03-05 13:41:28 +00:00
Ralph Castain
577eef1491 Pretty-print the recvd command for debug purposes
This commit was SVN r22785.
2010-03-05 13:38:20 +00:00
Ralph Castain
cdae19cf7b Add a convenience macro to make a job family
This commit was SVN r22784.
2010-03-05 13:35:09 +00:00
Ralph Castain
f2c65dc70f Ensure that the errmgr does not take action if the process was terminated by a "kill_procs" command as this can lead to circular logic.
Cleanup the kill_procs command by removing a no-longer-used param. We update the process state when the proc actually exits.

This commit was SVN r22783.
2010-03-05 13:22:12 +00:00
Ralph Castain
ef6c432e22 Fix a nasty bug where we would hang if an application trapped signals such as SIGTERM - a permissible thing to do. In such cases, we removed the process from the waitpid system and then sent it a SIGTERM. If the application trapped that and attempted to cleanly terminate, it would send us a sync message - and the daemon would then add it back to its local child list, causing both the daemon and the process to hang.
In this revision, we let the process terminate/exit however it can, and then pick it up via the usual waitpid.

This commit was SVN r22781.
2010-03-05 04:14:56 +00:00
Matthias Jurenz
75d71239d1 Fixed bug in parsing nm-file:
Do not trigger a parse error if address is out of range. Ignore symbol instead.

This commit was SVN r22778.
2010-03-04 16:03:53 +00:00
Shiqing Fan
db747e4390 Remove the old timing parameter but using orte_timing instead. Thanks for Rainer.
This commit was SVN r22775.
2010-03-04 15:00:03 +00:00
Shiqing Fan
4c1fc87502 Set the compile flags for F77 on Windows more correctly.
This commit was SVN r22774.
2010-03-04 11:41:42 +00:00
Matthias Jurenz
5b9515225d - fixed stack shutdown if maximum number of buffer flushes was reached
- fixed potential stack underflow in vtfilter which might be cause a segmentation fault

This commit was SVN r22773.
2010-03-04 08:08:20 +00:00
Iain Bason
18d9e96301 Fixed two problems:
1. The code that looks at btl_tcp_if_exclude before doing a
   modex_send uses strcmp rather than strncmp. That means that
   "lo0" gets sent even though "lo" is excluded.

2. The code that determines whether a particular local TCP
   interface can connect to a particular remote interface doesn't
   check for loopback interfaces. With this fix, users can now
   enable "lo" and be assured that it will only be used for intra-
   node communication.

This commit was SVN r22762.
2010-03-03 15:51:15 +00:00
George Bosilca
ec7fcf3f91 While building the profiling interface, ignore the
I/O functions if support for I/O is not requested.

This commit was SVN r22761.
2010-03-02 21:05:04 +00:00
Ralph Castain
c88fe1ea54 Create a new mca parameter to control creation of session directories. Defaults to true so that the current behavior of always creating them is preserved. If set to false (0), then don't create session directories. Helps in those environments where session directories are a problem.
Tell the sm btl that it cannot run if no session directories were created.

This commit was SVN r22756.
2010-03-02 15:18:33 +00:00
Ralph Castain
cd1efbb41e Try and do a better job of cleanup in abnormal termination. Ensure the daemons whack session directories prior to disabling signal traps. Ensure that the HNP and daemons all cleanup when they are doing an internal abort.
This commit was SVN r22755.
2010-03-02 14:51:23 +00:00
Ralph Castain
b692645772 Remote daemons should -always- whack any lingering session directories when exiting
This commit was SVN r22749.
2010-03-02 05:28:53 +00:00
Ralph Castain
69fe5ca69b Correctly compute bynode mapping, even in the presence of a $#$%#@^$ rankfile
This commit was SVN r22748.
2010-03-02 05:21:42 +00:00
Ralph Castain
bef06d52bc Silence compiler warning
This commit was SVN r22747.
2010-03-01 21:04:26 +00:00
Ralph Castain
5514d9c673 Fix the stupid rankfile mapper again, hopefully not breaking everything else to accommodate it. Looks like the round-robin mappers still work, at least...
This commit was SVN r22746.
2010-03-01 20:40:47 +00:00
Matthias Jurenz
5f368a094f Restored support for Automake's silent rules
This commit was SVN r22741.
2010-03-01 13:10:27 +00:00
Matthias Jurenz
157942809c Use more portable 'nm' command instead of the BFD library to collect symbol information for instrumentation with the GNU, Intel, and PathScale
This commit was SVN r22737.
2010-03-01 12:20:41 +00:00
Nadia Derbey
3f56f9e688 Fix typo in evutil.h
This commit was SVN r22730.
2010-03-01 07:55:08 +00:00
Ralph Castain
96590b9fad Filter multicast messages to avoid cross-job confusion
This commit was SVN r22729.
2010-02-28 18:22:56 +00:00
Ralph Castain
359dc5cad3 Complete the app_idx change by cleaning up warnings in mappers
This commit was SVN r22728.
2010-02-27 18:14:27 +00:00
Ralph Castain
2541aa98ab Change the app_idx type to uint32_t to support users who use large numbers of app_contexts. Set it up as a new typedef so we can change it later without as much effort.
This commit was SVN r22727.
2010-02-27 17:37:34 +00:00
Ralph Castain
f4c3cceb5e Get the function prototypes to match so we eliminate an annoying warning
This commit was SVN r22726.
2010-02-27 16:41:16 +00:00
Ralph Castain
6c0d7940c7 Add a new MCA param (and corresponding mpirun cmd line option) to output the debugger proctable info after launch. The output is just the job map with the process pid included, so you get a node-by-node list of the process ranks on that node and thier pids.
Works for initial launch and comm_spawn. xml and non-xml output is available

This commit was SVN r22725.
2010-02-27 08:32:25 +00:00
Ralph Castain
8c7f3a0c44 Silence warnings by correctly identifying when we are on a Mac
This commit was SVN r22724.
2010-02-27 08:15:49 +00:00
Jeff Squyres
b0eaebf46f Add Intel's OUI.
This commit was SVN r22723.
2010-02-26 19:54:16 +00:00
Rolf vandeVaart
2715141f6d Fix minor bug in the way we handle btl_tcp_if_include list.
This commit was SVN r22722.
2010-02-26 18:08:04 +00:00
Shiqing Fan
4a3f42d159 Correctly initialize the CCP command line buffer.
This commit was SVN r22721.
2010-02-26 15:53:00 +00:00
Shiqing Fan
e1c009932b Add a few more fortran compile flags, and enable dynamic build for f77 library now.
This commit was SVN r22720.
2010-02-26 07:55:32 +00:00
Ralph Castain
c6448587fe It is okay to not select an rmcast module
This commit was SVN r22719.
2010-02-26 02:39:04 +00:00
Jeff Squyres
2e91de0bdd This has bugged me for a long, long time: rename btl_openib_iwarp.* ->
btl_openib_ip.*.  The routines in these files are not specific to
iwarp -- they are specific to IP interfaces used with IBV devices
(even IB or IBoE/RoCEE/whatever devices).

This commit was SVN r22718.
2010-02-25 21:04:09 +00:00
Jeff Squyres
a4a81698c2 Mostly a patch from Vasily/Mellanox to fix multi-port and 32/64 bit
issues with iwarp.c.  These fixes are needed for IBoE / ROCEE /
whateveritscalledtoday.  I added a few minor changes to his base
patch.

This commit was SVN r22717.
2010-02-25 20:57:05 +00:00
Jeff Squyres
f6f6a06dff Update svn:ignore
This commit was SVN r22716.
2010-02-25 20:39:08 +00:00
Jeff Squyres
2de58f4091 A better fix for a timestamp issue: make sure that various
libtool.m4's are not newer than aclocal.m4.  They "usually weren't",
but if you had a slow filesystem, it could be possible that libtool.m4
would be newer than aclocal.m4, and Bad Things would happen during
"make" (i.e., running configure again).

This commit was SVN r22715.
2010-02-25 20:25:52 +00:00
Eugene Loh
316892b49f Fix spelling of "degradation".
This commit was SVN r22714.
2010-02-25 19:41:59 +00:00
Pavel Shamis
9fbfe6b1c0 The fix resolves the bug #2307. QP creation may fail, since the calculation for _reserved_ does not check for QP type. As result the max_recv_wr may get wrong value . Needs to go to both cmr:v1.4.2 and cmr:v1.5.0
This commit was SVN r22713.
2010-02-25 11:15:20 +00:00
Ralph Castain
4007630b34 Set the ignore properties to ignore generated files
Shame on you, Jeff... :-)

This commit was SVN r22712.
2010-02-25 06:09:15 +00:00
Ralph Castain
b89a21f0fa Grrr....cleanup the new module
This commit was SVN r22711.
2010-02-25 06:08:04 +00:00
Ralph Castain
8954700845 No, we don't have a .windows file...
This commit was SVN r22710.
2010-02-25 02:18:54 +00:00
Ralph Castain
18c7aaff08 Update the grpcomm framework to be more thread-friendly.
Modify the orte configure options to specify --enable-multicast such that it directs components to build or not instead of littering the code base with #if's. Remove those #if's where they used to occur.

Add a new grpcomm "mcast" module to support multicast operations. Still some work required to properly perform daemon collectives for comm_spawn operations. New module only builds when --enable-multicast is provided, and when specifically selected.

This commit was SVN r22709.
2010-02-25 01:11:29 +00:00
Jeff Squyres
dd4945c194 New part ID's from Chelsio and Intel. May still get more from
Chelsio. 

This commit was SVN r22708.
2010-02-24 20:39:40 +00:00
Ralph Castain
44044b9a33 Enable multicast on a couple of different platforms
This commit was SVN r22707.
2010-02-24 20:38:49 +00:00
Iain Bason
7445b23e0d Fixed a minor typo.
This commit was SVN r22706.
2010-02-24 19:05:19 +00:00
Jeff Squyres
5ed31baa38 Add news blurb about the pkg-config files.
This commit was SVN r22705.
2010-02-24 18:47:10 +00:00
Jeff Squyres
af6f1f4b00 Add pkg-config(1) config files to Open MPI. Additionally, fix a minor
bug: libmpi_f90 had libmpi.la in its LIBADD instead of libmpi_f77.la.

Fixes trac:2244.

This commit was SVN r22704.

The following Trac tickets were found above:
  Ticket 2244 --> https://svn.open-mpi.org/trac/ompi/ticket/2244
2010-02-24 18:46:06 +00:00
Jeff Squyres
3d04940921 Update NEWS and README to describe the ABI changes and our version
numbering schemes.

This commit was SVN r22703.
2010-02-24 17:24:42 +00:00