1
1

100 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
George Bosilca
de1078a71b Thanks to Alex Margolin for pointing out this relique.
This commit was SVN r26121.
2012-03-09 14:01:45 +00:00
Jeff Squyres
4c0d24ff9a Improve the help messages for tcp_btl_if_[in|ex]clude
This commit was SVN r25940.
2012-02-16 15:59:18 +00:00
Jeff Squyres
1cbfb53801 r24976 wasn't quite right -- you now actually get a warning if you
specify btl_tcp_if_include because btl_tcp_if_exclude is defaulted to
the loopback devices.

This commit does a few things:

 * Introduce a new OPAL MCA base function:
   mca_base_param_check_exclusive_string().  It checks to see that the
   ''user'' does not set two MCA parameters that are mutually
   exclusive by checking the source of those MCS param values.
 * Use the above function in many BTLs (and the OOB TCP) to ensure
   that <foo>_if_include and <foo>_if_exclude are not both specified
   ''by the user''.
 * Re-arrange many of these BTLs to move their MCA registration code
   into a separate component_register() function (vs. the
   component_open() function).

This code has been nominally reviewed and checked by Ralph, George,
Terry, and Shiqing.

This commit was SVN r25043.

The following SVN revision numbers were found above:
  r24976 --> open-mpi/ompi@8f4ac54336
2011-08-10 17:24:36 +00:00
Jeff Squyres
8f4ac54336 Fixes trac:2838: add a warning message and disqualify the TCP BTL if both
btl_tcp_if_include and btl_tcp_if_exclude are specified. 

This commit was SVN r24976.

The following Trac tickets were found above:
  Ticket 2838 --> https://svn.open-mpi.org/trac/ompi/ticket/2838
2011-08-01 23:30:33 +00:00
Jeff Squyres
b113b1a382 Add the btl_tcp_if_seq MCA parameter. From the help string:
If specified, a comma-delimited list of TCP interfaces.  Interfaces
  will be assigned, one to each MPI process, in a round-robin fashion
  on each server.  For example, if the list is "eth0,eth1" and four
  MPI processes are run on a single server, then local ranks 0 and 2
  will use eth0 and local ranks 1 and 3 will use eth1.

This feature is only useful for environments with virtual ethernet
interfaces on the same network.  For example, if eth0 and eth1 are
virtual interfaces to the same NIC on the same subnet, and if the NIC
provides different hardware resources to eth0 and eth1 (not just
different kernel resources), some HOL blocking and congestion issues
can be eased in a modest fashion.

This commit was SVN r24181.
2010-12-16 00:54:32 +00:00
Ralph Castain
9ea2b196ce Convert the opal_event framework to use direct function calls instead of hiding functions behind function pointers. Eliminate the opal_object_t abstraction of libevent's event struct so it can be directly passed to the libevent functions.
Note: the ompi_check_libfca.m4 file had to be modified to avoid it stomping on global CPPFLAGS and the like. The file was also relocated to the ompi/config directory as it pertains solely to an ompi-layer component.

Forgive the mid-day configure change, but I know Shiqing is working the windows issues and don't want to cause him unnecessary redo work.

This commit was SVN r23966.
2010-10-28 15:22:46 +00:00
Ralph Castain
86c7365e8e Clean up a few initialization issues - don't think these are impacting the shared memory situation as it didn't fix the problem.
Setup the event API to support multiple bases in preparation for splitting the OMPI and ORTE events. Holding here pending shared memory resolution.

This commit was SVN r23943.
2010-10-26 02:41:42 +00:00
Ralph Castain
fceabb2498 Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.

Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.

Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.

I have gone thru the code base and converted all the libevent calls I could find. However, I cannot compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:

1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)

2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.

There is also the issue of the new libevent "no recursion" behavior. As I described on a recent email, we will have to discuss this and figure out what, if anything, we need to do.

This commit was SVN r23925.
2010-10-24 18:35:54 +00:00
Jeff Squyres
c8bb7537e7 Remove include/opal/sys/cache.h -- its only purpose in life was to
#define CACHE_LINE_SIZE to 128.  This name has a conflict on NetBSD,
and it seems kinda odd to have a header file that ''only'' defines a
single value.  Also, we'll soon be raising hwloc to be a first-class
item, so having this file around seemed kinda weird.

Therefore, I replaced CACHE_LINE_SIZE with opal_cache_line_size, an
int (in opal/runtime/opal_init.c and opal/runtime/opal.h) on the
rationale that we can fill this in at runtime with hwloc info (trunk
and v1.5/beyond, only).  The only place we ''needed'' a compile-time
CACHE_LINE_SIZE was in the BTL SM (for struct padding), so I made a
new BTL_SM_ preprocessor macro with the old CACHE_LINE_SIZE value
(128).  That use isn't suitable for run-time hwloc information,
anyway.

This commit was SVN r23349.
2010-07-06 14:33:36 +00:00
Shiqing Fan
857f1669e2 Solve a few compilation problems on Windows.
This commit was SVN r23193.
2010-05-21 14:30:15 +00:00
Abhishek Kulkarni
afbe3e99c6 * Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with
(OMPI_ERR_* = OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
 SOS-encoded error. The OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
 back the native error code.

* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
  (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
  decode 'ret' to get the native error code.

This commit was SVN r23162.
2010-05-17 23:08:56 +00:00
Jeff Squyres
95d7e08a66 More more discussion and testing has occurred off-ticket.
Short version: there is a bug in OS X/Snow Leopard, but there is also
a bug in Open MPI.  Fixing the bug in Open MPI is both trivial (a
1-line change) and avoids the bug in OS X.  We'll file an OS X bug
report upstream with Apple, but it should no longer affect us here in
OMPI.

Fixes trac:2039.

More details:

Some background first: 

 1. IPv4 sockets can only accept incoming IPv4 connections.  However,
    IPv6 sockets can be configured to accept ''only'' incoming IPv6
    connection, or ''both'' incoming IPv4 and IPv6 connections.  An
    IPv6 socket attribute sets which listening behavior is used.
 1. IPv4 and IPv6 have different port namespaces.  Hence, it is
    permissable to bind a v4 socket to port X ''and'' also bind a v6
    socket to that same port X on the same interface (assuming that
    the v6 socket is only accepting incoming v6 connections).
    Incoming v4 connections to port X on the interface should get
    matched to the listening v4 socket; incoming v6 connections should
    get matched to the listening v6 socket.
 1. When v6 sockets accept ''both'' incoming v4 and v6 connections, it
    should claim port X in both namespaces.
 1. Linux's default behavior is to only allow one listening socket to
    be bound to a given port (i.e., ''either'' a v6 or v4 socket to be
    bound to a single port X -- not both).  A v6 socket can listen for
    both v4 and v6 incoming connections on that port, but still --
    only one socket will be bound to that port.
 1. Snow Leopard's default behavior is to share ports -- i.e., let
    both a v4 and a v6 listening socket to be bound to port X
    (assuming that the v6 socket is only accepting incoming v6
    connections).

The TCP BTL creates a listening socket for each address family.
Hence, it creates a v4 listening socket on INADDR_ANY and a v6
listening socket on the v6 equivalent of INADDR_ANY.  OMPI then
iteratively tries to find ports to listen on within the range of
[mca_btl_tcp_port_min, mca_btl_tcp_port_min + mca_btl_tcp_port_range).

On Linux, the v4 socket will be bound to port X and the v6 socket will
likely be bound to port Y (where X != Y).  On Snow Leopard, the v4
socket will be bound to port X and the v6 socket may ''also'' be bound
to port X.  Since the namespaces are separate, this shouldn't be a
problem.

However, Open MPI was accidentally setting the v6 listening behavior
to accept ''both'' v4 and v6 incoming connections.  This is a trivial
thing to fix -- change a 0 to a 1 in the code.  On Linux, this issue
didn't matter because the v4 and v6 sockets were on different ports.
So even though the v6 socket ''would'' have accepted incoming v4
connections, that never happened because OMPI would direct v4
connections to the v4 port.

But on Snow Leopard, the v4 and v6 listening ports could end up
sharing the same port number.  As mentioned above, this ''shouldn't''
have been a problem, but it looks like Snow Leopard has the following
bugs:

 * If a v4 socket is already bound to port X, we're pretty sure that a
   v6 socket with the "accept both v4 and v6 incoming connections"
   listening behavior should not be able to claim port X (because
   there's already a v4 socket listening on X).  However, Snow Leopard
   would allow binding a v4 socket to port X, and then allow a v6
   socket configured to allow incoming v4 and v6 connections to
   ''also'' be bound to port X.
 * After binding the v6 socket to port X, Snow Leopard then lets
   ''another'' v4 socket ''also'' get bound to port X.  Hence, there's
   now '''three''' sockets all listening on port X.

This obviously led to mis-matched TCP connections, and things went
downhill from there.

That being said, Snow Leopard doesn't exhibit this behavior if v6
sockets only allow incoming v6 connections.  And technically, that is
exactly the behavior we want (we want v6 sockets to only accept
incoming v6 connections).  So if we just change the flag to make our
v6 listening socket us this behavior, the problem on OS X goes away.

That's what this commit does -- it changes a 0 to a 1, indicating
"only let this v6 socket allow incoming v6 connections."

That was simple, wasn't it?

This commit was SVN r22788.

The following Trac tickets were found above:
  Ticket 2039 --> https://svn.open-mpi.org/trac/ompi/ticket/2039
2010-03-05 17:37:57 +00:00
Iain Bason
18d9e96301 Fixed two problems:
1. The code that looks at btl_tcp_if_exclude before doing a
   modex_send uses strcmp rather than strncmp. That means that
   "lo0" gets sent even though "lo" is excluded.

2. The code that determines whether a particular local TCP
   interface can connect to a particular remote interface doesn't
   check for loopback interfaces. With this fix, users can now
   enable "lo" and be assured that it will only be used for intra-
   node communication.

This commit was SVN r22762.
2010-03-03 15:51:15 +00:00
Jeff Squyres
4a40be650e Improve the MCA param help messages for btl_tcp_if_in|exclude.
This commit was SVN r21968.
2009-09-15 17:19:57 +00:00
George Bosilca
2143424eb5 The MCA parameter should always be taken into account, independent on
how many networks are available on the node.

This commit was SVN r21652.
2009-07-13 19:40:00 +00:00
George Bosilca
311e27b42f Pretty print an error message when the specified range of ports (for both
IPv4 and IPv6) is outside the legal boundaries. This fixes trac:1869.

This commit was SVN r21612.

The following Trac tickets were found above:
  Ticket 1869 --> https://svn.open-mpi.org/trac/ompi/ticket/1869
2009-07-07 17:52:30 +00:00
George Bosilca
4038834dfb Convert the port number in network order before binding the socket.
Thanks to Mariusz Mamonski (mamonski@man.poznan.pl) for the bug
report and patch.

This commit was SVN r21610.
2009-07-07 17:21:28 +00:00
Shiqing Fan
0b56a8a4d5 Enable IPv6 on Windows by default, and fix two type casts for IPv6 operations.
This commit was SVN r21586.
2009-07-02 14:41:03 +00:00
Shiqing Fan
e46bf10efd Correctly include win32 util header.
This commit was SVN r21343.
2009-06-01 19:16:00 +00:00
Shiqing Fan
b3c077e112 Microsoft doesn't provide inet_pton and inet_ntop APIs on Windows XP, but only on Windows Vista and 2008. So add a stand alone version of inet_pton and inet_ntop functions from ISC.
This commit was SVN r21295.
2009-05-27 14:32:30 +00:00
Rainer Keller
d5cb1b4f97 - Shown by __opal_attribute_format__, probably macro
was converted from opal_output_verbose

This commit was SVN r21237.
2009-05-14 13:57:37 +00:00
Jeff Squyres
f7f5603d5c * No need for warning aggregation here; orte_show_help() does that
for us already.
 * Slightly clarify the error message strings; now they match the new
   error strings for btl_openib_ipaddr_in|exclude

This commit was SVN r21197.
2009-05-09 12:29:06 +00:00
Jeff Squyres
c166678cf2 Gah -- forgot to test the case where btl_tcp_if_include wasn't
specified.  :-(

This commit was SVN r21191.
2009-05-07 21:50:02 +00:00
Jeff Squyres
97c9610b7f Note to self: test error messages before committing.
This commit was SVN r21190.
2009-05-07 20:31:26 +00:00
Jeff Squyres
4779520787 Add the ability to btl_tcp_if_include and btl_tcp_if_exclude based on
subnet specifications (in addition to interface names).  These
parameters now take a comma-delimited list of interfaces names and/or
a.b.c.d/x specifications (only IPv4 currently supported for subnet
specifications).  For example:

  mpirun --mca btl_tcp_if_include 10.10.30.0/8,eth0

This commit was SVN r21189.
2009-05-07 18:53:33 +00:00
Greg Koenig
60485ff95f This is a very large change to rename several #define values from
OMPI_* to OPAL_*.  This allows opal layer to be used more independent
from the whole of ompi.

NOTE: 9 "svn mv" operations immediately follow this commit.

This commit was SVN r21180.
2009-05-06 20:11:28 +00:00
Rainer Keller
221fb9dbca ... Delayed due to notifier commits earlier this day ...
- Delete unnecessary header files using
   contrib/check_unnecessary_headers.sh after applying
   patches, that include headers, being "lost" due to
   inclusion in one of the now deleted headers...

   In total 817 files are touched.
   In ompi/mpi/c/ header files are moved up into the actual c-file,
   where necessary (these are the only additional #include),
   otherwise it is only deletions of #include (apart from the above
   additions required due to notifier...)

 - To get different MCAs (OpenIB, TM, ALPS), an earlier version was
   successfully compiled (yesterday) on:
   Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
   Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
   Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled

This commit was SVN r21096.
2009-04-29 01:32:14 +00:00
George Bosilca
64075cd54d Get rid of the bitmap header file.
This commit was SVN r20972.
2009-04-10 16:44:37 +00:00
Rainer Keller
9dea63d63a - Last of intrusive commits (promised)... err for now.
Anyway, this is blocking the move: do not include pml.h
   if not really needed, aka none of the following used:
     mca_pml
     MCA_PML_CALL
     OMPI_ANY_TAG
     OMPI_ANY_SOURCE
     OMPI_PROC_NULL

 - Notable exceptions (deleting in one header->adding):
   - ompi/mca/mtl/psm/
   - ompi/mca/osc/rdma/
   - ompi/mca/btl/openib/btl_openib_endpoint.c depended on
     pml_base_sendreq.h

 - Tested on Linux/x86-64, this time including make check
   (thanks Jeff and Ralph)

This commit was SVN r20725.
2009-03-04 17:06:51 +00:00
Rainer Keller
4c0e8e1e69 - Header orte/mca/oob/base/base.h is probably the wrong one to include
anyhow -- if oob functionality is neededm then orte/mca/oob/oob.h

   Nevertheless compiles fine with -Wimplicit-function-declaration   

This commit was SVN r20641.
2009-02-26 04:20:03 +00:00
Rainer Keller
04567d3af0 - Header orte/mca/errmgr/errmgr.h is not needed.
Once again compiles fine with -Wimplicit-function-declaration   

This commit was SVN r20640.
2009-02-26 04:05:30 +00:00
Rainer Keller
d81443cc5a - On the way to get the BTLs split out and lessen dependency on orte:
Often, orte/util/show_help.h is included, although no functionality
   is required -- instead, most often opal_output.h, or               
   orte/mca/rml/rml_types.h                                           
   Please see orte_show_help_replacement.sh commited next.            

 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   actually showed two *missing* #include "orte/util/show_help.h"     
   in orte/mca/odls/base/odls_base_default_fns.c and                  
   in orte/tools/orte-top/orte-top.c                                  
   Manually added these.                                              

   Let's have MTT the last word.

This commit was SVN r20557.
2009-02-14 02:26:12 +00:00
Jeff Squyres
d1c6f3f89a * Fix a truckload of Cisco copyrights to be the same as the rest of
the code base.
 * Fix a few misspellings in other copyrights.

This commit was SVN r20241.
2009-01-11 02:30:00 +00:00
Rolf vandeVaart
76f8ce01cf Need to add sppp to list of default excluded interfaces
to support Sun M9000 server.

This commit was SVN r19988.
2008-11-12 20:30:14 +00:00
Rolf vandeVaart
cad49da72d Fix the tcp btl so it makes use of the btl_tcp_if_include and btl_tcp_if_exclude
parameters on the connecting side also.  Also move define of IF_NAMESIZE
into if.h file.  And lastly, add one verbose debug message which may be
useful if we run into other issues like this.

This commit fixes trac:1573.

This commit was SVN r19932.

The following Trac tickets were found above:
  Ticket 1573 --> https://svn.open-mpi.org/trac/ompi/ticket/1573
2008-11-05 18:45:42 +00:00
George Bosilca
d37706f6f8 Reset the variables to NULL after releasing the memory.
This commit was SVN r19912.
2008-11-04 16:49:46 +00:00
Terry Dontje
b5a30af20e Correct double free of event on tcp_events list. Refs trac:1629.
This commit was SVN r19890.

The following Trac tickets were found above:
  Ticket 1629 --> https://svn.open-mpi.org/trac/ompi/ticket/1629
2008-11-03 19:03:57 +00:00
Shiqing Fan
04ee20a880 - Mainly type casts. Microsoft VC++ compiler is too strict.
This commit was SVN r19517.
2008-09-08 15:39:30 +00:00
Jeff Squyres
0af7ac53f2 Fixes trac:1392, #1400
* add "register" function to mca_base_component_t
   * converted coll:basic and paffinity:linux and paffinity:solaris to
     use this function
   * we'll convert the rest over time (I'll file a ticket once all
     this is committed)
 * add 32 bytes of "reserved" space to the end of mca_base_component_t
   and mca_base_component_data_2_0_0_t to make future upgrades
   [slightly] easier
   * new mca_base_component_t size: 196 bytes
   * new mca_base_component_data_2_0_0_t size: 36 bytes
 * MCA base version bumped to v2.0
   * '''We now refuse to load components that are not MCA v2.0.x'''
 * all MCA frameworks versions bumped to v2.0
 * be a little more explicit about version numbers in the MCA base
   * add big comment in mca.h about versioning philosophy

This commit was SVN r19073.

The following Trac tickets were found above:
  Ticket 1392 --> https://svn.open-mpi.org/trac/ompi/ticket/1392
2008-07-28 22:40:57 +00:00
Ralph Castain
9613b3176c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP.
After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach.

I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive.

This commit was SVN r18619.
2008-06-09 14:53:58 +00:00
Jeff Squyres
e7ecd56bd2 This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.

= ORTE Job-Level Output Messages =

Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):

 * orte_output(): (and corresponding friends ORTE_OUTPUT,
   orte_output_verbose, etc.)  This function sends the output directly
   to the HNP for processing as part of a job-specific output
   channel.  It supports all the same outputs as opal_output()
   (syslog, file, stdout, stderr), but for stdout/stderr, the output
   is sent to the HNP for processing and output.  More on this below.
 * orte_show_help(): This function is a drop-in-replacement for
   opal_show_help(), with two differences in functionality:
   1. the rendered text help message output is sent to the HNP for
      display (rather than outputting directly into the process' stderr
      stream)
   1. the HNP detects duplicate help messages and does not display them
      (so that you don't see the same error message N times, once from
      each of your N MPI processes); instead, it counts "new" instances
      of the help message and displays a message every ~5 seconds when
      there are new ones ("I got X new copies of the help message...")

opal_show_help and opal_output still exist, but they only output in
the current process.  The intent for the new orte_* functions is that
they can apply job-level intelligence to the output.  As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not thei opal_* functions.

=== New code ===

For ORTE and OMPI programmers, here's what you need to do differently
in new code:

 * Do not include opal/util/show_help.h or opal/util/output.h.
   Instead, include orte/util/output.h (this one header file has
   declarations for both the orte_output() series of functions and
   orte_show_help()).
 * Effectively s/opal_output/orte_output/gi throughout your code.
   Note that orte_output_open() takes a slightly different argument
   list (as a way to pass data to the filtering stream -- see below),
   so you if explicitly call opal_output_open(), you'll need to
   slightly adapt to the new signature of orte_output_open().
 * Literally s/opal_show_help/orte_show_help/.  The function signature
   is identical.

=== Notes ===

 * orte_output'ing to stream 0 will do similar to what
   opal_output'ing did, so leaving a hard-coded "0" as the first
   argument is safe.
 * For systems that do not use ORTE's RML or the HNP, the effect of
   orte_output_* and orte_show_help will be identical to their opal
   counterparts (the additional information passed to
   orte_output_open() will be lost!).  Indeed, the orte_* functions
   simply become trivial wrappers to their opal_* counterparts.  Note
   that we have not tested this; the code is simple but it is quite
   possible that we mucked something up.

= Filter Framework =

Messages sent view the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr.  The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations.  The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc.  This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).

Filtering is not active by default.  Filter components must be
specifically requested, such as:

{{{
$ mpirun --mca filter xml ...
}}}

There can only be one filter component active.

= New MCA Parameters =

The new functionality described above introduces two new MCA
parameters:

 * '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
   help messages will be aggregated, as described above.  If set to 0,
   all help messages will be displayed, even if they are duplicates
   (i.e., the original behavior).
 * '''orte_base_show_output_recursions''': An MCA parameter to help
   debug one of the known issues, described below.  It is likely that
   this MCA parameter will disappear before v1.3 final.

= Known Issues =

 * The XML filter component is not complete.  The current output from
   this component is preliminary and not real XML.  A bit more work
   needs to be done to configure.m4 search for an appropriate XML
   library/link it in/use it at run time.
 * There are possible recursion loops in the orte_output() and
   orte_show_help() functions -- e.g., if RML send calls orte_output()
   or orte_show_help().  We have some ideas how to fix these, but
   figured that it was ok to commit before feature freeze with known
   issues.  The code currently contains sub-optimal workarounds so
   that this will not be a problem, but it would be good to actually
   solve the problem rather than have hackish workarounds before v1.3 final.

This commit was SVN r18434.
2008-05-13 20:00:55 +00:00
Ralph Castain
d70e2e8c2b Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer

This commit was SVN r17632.
2008-02-28 01:57:57 +00:00
George Bosilca
6310ce955c The first patch related to the Active Message stuff. So far, here is what we have:
- the registration array is now global instead of one by BTL.
- each framework have to declare the entries in the registration array reserved. Then
  it have to define the internal way of sharing (or not) these entries between all
  components. As an example, the PML will not share as there is only one active PML
  at any moment, while the BTLs will have to. The tag is 8 bits long, the first 3
  are reserved for the framework while the remaining 5 are use internally by each
  framework.
- The registration function is optional. If a BTL do not provide such function,
  nothing happens. However, in the case where such function is provided in the BTL
  structure, it will be called by the BML, when a tag is registered.

Now, it's time for the second step... Converting OB1 from a switch based PML to an
active message one.

This commit was SVN r17140.
2008-01-15 05:32:53 +00:00
George Bosilca
48f5a26e8c Cast to keep VC happy (quiet).
This commit was SVN r17054.
2008-01-04 23:13:32 +00:00
Gleb Natapov
8b511b969d Introduce a new BTL parameter btl_rndv_eager_limit which determines size of a
first fragment of rendezvous protocol. Remove no longer used btl_min_send_size
parameter.

This commit was SVN r16969.
2007-12-16 08:35:17 +00:00
Jeff Squyres
80e9730100 Per http://www.open-mpi.org/community/lists/devel/2007/12/2698.php and
this thread:
http://www.open-mpi.org/community/lists/devel/2007/12/2807.php, set
TCP's exclusivity to LOW+100 and SCTP's exclusivity to LOW.

This commit was SVN r16942.
2007-12-12 15:55:37 +00:00
Rich Graham
27a748e7eb change all instances of ompi_free_list_init to ompi_free_list_init_new. Header
and payload data are specified separately at this stage.

This commit was SVN r16633.
2007-11-01 23:38:50 +00:00
George Bosilca
d67c0eefb4 Remove a compilation warning about using uninitialized variables.
This commit was SVN r16589.
2007-10-26 20:15:28 +00:00
George Bosilca
b1b5cb6453 Looks like SO_REUSEPORT it's not defined on some platforms. Switch
to the conventional SO_REUSEADDR instead.

This commit was SVN r16588.
2007-10-26 19:56:21 +00:00