Commit graph

3396 commits

Author SHA1 Message Date
Shiqing Fan
b8555448b5 Remove the unnecessary/duplicated unistd.h.
This commit was SVN r22346.
2009-12-28 16:22:16 +00:00
Shiqing Fan
d0f85beaf3 Correctly include those header files.
This commit was SVN r22344.
2009-12-28 16:13:06 +00:00
Shiqing Fan
90e3092ce5 Fix a type cast.
This commit was SVN r22343.
2009-12-28 16:12:46 +00:00
Shiqing Fan
a2d00d4ab8 Exclude a pml component that is not necessary for Windows.
This commit was SVN r22342.
2009-12-28 16:12:28 +00:00
George Bosilca
e127b20038 Correct a type in the name of the help string.
This commit was SVN r22336.
2009-12-21 19:13:25 +00:00
Vasily Filipov
897b7c0aa8 Fix orte_show_help message type error.
This commit was SVN r22321.
2009-12-16 14:11:43 +00:00
Vasily Filipov
e73274f9a9 Disabling SRQ limit event for devices that don't support this feature.
This commit was SVN r22320.
2009-12-16 14:05:35 +00:00
Vasily Filipov
87e71b26fe Jeff Squyres fixes
This commit was SVN r22319.
2009-12-16 10:23:58 +00:00
George Bosilca
b3d3a8e7b3 Remove useless lines.
This commit was SVN r22316.
2009-12-15 23:55:14 +00:00
George Bosilca
b85c3ca081 Enable support for the INRIA
knem (http://runtime.bordeaux.inria.fr/knem/) kernel device. This
is part of Ma Teng's work on Open MPI.

This commit was SVN r22315.
2009-12-15 23:34:09 +00:00
Vasily Filipov
c036c6ef95 Adding support for on-demand SRQ pre-post (receive wqe allocation)
This commit was SVN r22313.
2009-12-15 15:52:10 +00:00
Vasily Filipov
354bfe527f Improving support for non homogeneous OpenFabrics network configurations
This commit was SVN r22312.
2009-12-15 14:25:07 +00:00
Pavel Shamis
4d02aea54c Enabling, by default, RDMACM connection manager for RDMAoE devices
This commit was SVN r22311.
2009-12-15 13:52:19 +00:00
Jeff Squyres
4f68dfb03c Remove some dead code (thanks to George for pointing it out).
This commit was SVN r22309.
2009-12-14 21:20:41 +00:00
Christopher Yeoh
848bf0f5cd Fixes deadlock in osc rdma module
See #2102 for details

This commit was SVN r22299.
2009-12-14 01:52:57 +00:00
Christopher Yeoh
d5253aa0f1 Fixes multithread race which causes corruption of no_credits_pending_frags
list in the ib btl. See #2128 for details 

This commit was SVN r22298.
2009-12-14 01:41:45 +00:00
Eugene Loh
8177d91835 Minor change so that if the number of shared-memory FIFOs is greater
than can be used (e.g., number of on-node peers), that no additional
room is set aside for those FIFOs that will never be created.  This
makes it easier to have dedicated FIFOs:  just set btl_sm_num_fifos
to be very large rather than setting it to be the local number of
procs.  In practice, we ask for extra headroom anyhow, so this change
generally won't matter.

This commit was SVN r22291.
2009-12-10 19:28:39 +00:00
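The sizing rule described in 8177d91835 can be sketched in C as a simple clamp. This is only an illustration of the idea as read from the commit message: the function name `sketch_usable_fifos` and its parameters are hypothetical and do not come from the sm BTL source.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given the requested FIFO count (for example a very
 * large btl_sm_num_fifos) and the number of on-node peers, return how many
 * FIFOs are worth reserving room for. FIFOs beyond the peer count would
 * never be created, so no space is set aside for them.
 */
size_t sketch_usable_fifos(size_t requested, size_t n_local_peers)
{
    assert(requested > 0);  /* assumption: at least one FIFO is requested */
    return requested < n_local_peers ? requested : n_local_peers;
}
```

With this kind of clamp, asking for a huge FIFO count simply yields one FIFO per on-node peer.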
George Bosilca
76222eb869 Get rid of the useless mca_pml_base_endpoint_t and replace it by
[the well known and widely used!] mca_pml_endpoint_t.

This commit was SVN r22277.
2009-12-08 17:29:54 +00:00
Pavel Shamis
b024aee10c Removing unused lists from mca_btl_openib_qp_info_t. The lists were moved to device.
This commit was SVN r22271.
2009-12-07 17:42:09 +00:00
George Bosilca
f0303a8b25 Indentation.
This commit was SVN r22254.
2009-12-02 22:03:52 +00:00
Pavel Shamis
7d46985096 Removing unneeded spaces
This commit was SVN r22246.
2009-12-01 11:15:40 +00:00
Pavel Shamis
75a48f4b3c Bugfix for possible race in rdmacm_destroy_dummy_qp
This commit was SVN r22245.
2009-12-01 08:09:43 +00:00
Shiqing Fan
7cf427c39b Include the missing thread header, which is needed when building with --enable-progress-thread.
This commit was SVN r22239.
2009-11-27 14:49:24 +00:00
Brian Barrett
b57b8c5b3f Clean up request handling in the I/O framework to be more consistent with
other request-using frameworks.

 - Rather than having mpi/c/* functions allocate requests explicitly,
   pass the MPI_Request* down to the I/O component and have it 
   perform the allocation.
 - While the I/O base provides a base request which can be used,
   it is not required and all request management occurs within
   the component.
 - Push progress management into the component, rather than having it
   happen in the base.  Progress functions are now easily registered,
   and not all (ie, the one existing) components use progress functions
   in any rational way.

ROMIO switched to generalized requests instead of MPIO_Requests many
moons ago, and Open MPI now uses ROMIO's generalized requests, so there
is no reason to wrap those requests (which are OMPI requests) in another
level of request.

Now the file function passes the MPI_Request* to the ROMIO component,
which passes it to the underlying ROMIO function, which calls 
MPI_Grequest_start to create an OMPI request, which is what gets set
as the request to the user.  Much cleaner.

This patch has two motivations.  One, a whole heck of a lot of code
just got removed, and request handling is now much cleaner for I/O
components.  Two, by adding support for Argonne's proposed generalized
request extensions, we can allow ROMIO to provide async I/O through
generalized requests, which we couldn't rationally do in the old
setup due to the crazy request completion rules.

This commit was SVN r22235.
2009-11-26 05:13:43 +00:00
Brian Barrett
8075640ef1 The tests are MPI programs and are built using mpicc, so including
OMPI headers won't work

This commit was SVN r22233.
2009-11-25 18:06:15 +00:00
Rainer Keller
276b813f48 - Output according to their type.
This commit was SVN r22206.
2009-11-09 14:28:15 +00:00
Rainer Keller
366bd96c88 - Allow working without the xt-catamount module on Jaguar,
reducing the number of components that until now needed to be
   deselected.

This commit was SVN r22205.
2009-11-09 14:26:24 +00:00
Eugene Loh
88c0921c5e Corrected the usage of "rc" in mca_btl_sm_component_progress.
The return code for this function should be the number of events
received.

This commit was SVN r22191.
2009-11-04 03:10:35 +00:00
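A minimal C sketch of the convention that 88c0921c5e restores: a progress function returns the number of events it handled, and per-event status codes stay in a separate variable. The names `sketch_handle` and `sketch_progress` are hypothetical, and stopping at the first failing event is an assumption made for the example, not behavior taken from `mca_btl_sm_component_progress`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-event handler: 0 on success, negative on error. */
int sketch_handle(int item)
{
    return item >= 0 ? 0 : -1;
}

/*
 * Handle up to n pending items and return how many were handled.
 * "rc" only carries the status of a single event; it is never returned,
 * so callers always receive an event count.
 */
int sketch_progress(const int *items, size_t n)
{
    int nevents = 0;

    assert(items != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        int rc = sketch_handle(items[i]);
        if (rc != 0) {
            break;  /* assumption: stop early, but still report the count */
        }
        nevents++;
    }
    return nevents;
}
```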
Jeff Squyres
ab00aea1ff Per http://www.open-mpi.org/community/lists/devel/2009/10/7025.php,
use the new Automake "silent rules" if available.

If you are using an Automake prior to v1.11, you won't see the new 
silent rules -- it will automatically default back to the "verbose" 
rules.

Note, too, that even with these changes, you can enable the verbose 
"make all" output in one of two ways:

1. Add "V=1" to your "make" command line

{{{
shell$ make all V=1
}}}

2. Add "--disable-silent-rules" to your "configure" command line:

{{{
shell$ ./configure --disable-silent-rules ...
}}}

The one down side of using the silent rules by default is that we'll 
get less diagnostic information when users send their build logs.  I 
think we should update the web page to request that users send build 
logs of "make V=1", but I'm guessing that not everyone will do it.

Note that I did ''not'' silent-ize the libltdl build (which is a dozen
or so files in the beginning of the build) because we wholly import
libltdl at autogen time.  I therefore didn't want to patch libltdl
(further) after importing it a) to remain as forward-compatible as
possible, and b) patching the imported libltdl build system might be
tricky in terms of timestamps / dependencies.  So those dozen-or-so
files will still be "verbose", but the rest of the files in OMPI will
be "silent".

This commit was SVN r22189.
2009-11-04 02:07:02 +00:00
Eugene Loh
1a44fc478d In sm_btl_first_time_init(), when we figure the size of the shared
area, we cap the size at LONG_MAX.  But we are figuring out how much
we need.  So, if that amount exceeds LONG_MAX, we should return an
"out of resource" error code.

This commit was SVN r22172.
2009-10-29 23:06:32 +00:00
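The behavior described in 1a44fc478d can be sketched in C as follows. The function name, the return codes, and the size formula are hypothetical stand-ins; the real code lives in `sm_btl_first_time_init()` and uses Open MPI's own error constants. The point of the sketch is only that a requirement larger than LONG_MAX is reported as an error rather than silently capped.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical return codes for this sketch. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_OUT_OF_RESOURCE = -2 };

/*
 * Compute the bytes needed when n_procs processes each need per_proc bytes.
 * If the total does not fit in a long, return an error and leave *out
 * untouched instead of capping the value at LONG_MAX.
 */
int sketch_required_size(size_t n_procs, size_t per_proc, long *out)
{
    assert(out != NULL);

    /* Compare by division so the multiplication itself cannot overflow. */
    if (per_proc != 0 && n_procs > (size_t)LONG_MAX / per_proc) {
        return SKETCH_ERR_OUT_OF_RESOURCE;
    }
    *out = (long)(n_procs * per_proc);
    return SKETCH_SUCCESS;
}
```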
Rainer Keller
5be03b8fc0 - Patch r22148 overwrites the already defined LDFLAGS, losing e.g. -L...
Needs to be moved to cmr:v1.3

This commit was SVN r22152.

The following SVN revision numbers were found above:
  r22148 --> open-mpi/ompi@a6c1fe888f
2009-10-28 14:25:10 +00:00
Jeff Squyres
a6c1fe888f We also need .so versioning of the OMPI "common" components since they
are installed as standalone libraries in $libdir.

This commit was SVN r22148.
2009-10-27 20:58:34 +00:00
Aurelien Bouteiller
59156cd92a Fix gcc 4.3 warning berserk about non-literal string format.
This commit was SVN r22147.
2009-10-27 20:45:02 +00:00
George Bosilca
3a2f071018 If the user asked for dynamic rules but "forgot" to provide them, nicely
complain and switch back to the default behavior (fixed rules).

This commit was SVN r22109.
2009-10-19 17:58:47 +00:00
Jeff Squyres
9afe50d886 Update Cisco copyrights for consistency
This commit was SVN r22072.
2009-10-07 22:02:32 +00:00
Jeff Squyres
0d1e177453 Remove 2 extraneous ORTE_ERROR_LOGs and 1 extraneous opal_output.
This commit was SVN r22071.
2009-10-07 20:12:37 +00:00
Jeff Squyres
d56b8d9183 Fix CID 1369: minor memory leak.
This commit was SVN r22067.
2009-10-07 19:40:00 +00:00
Jeff Squyres
de59a24593 Fix CID 1384. Also remove some opal_output(0,...)'s in favor of
ORTE_ERROR_LOG.

This commit was SVN r22066.
2009-10-07 18:58:58 +00:00
Jeff Squyres
ec71acf7ca Fix CID 1385: fix an over-aggressive use of close, munmap, etc. in the
error case.  Also check for MAP_FAILED (instead of -1) from mmap().

This commit was SVN r22065.
2009-10-07 18:43:37 +00:00
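As an illustration of the MAP_FAILED check mentioned in ec71acf7ca, here is a small C sketch. The helper names `sketch_map` and `sketch_unmap` are hypothetical, and the use of an anonymous private mapping (MAP_ANONYMOUS, available on Linux and most BSD-derived systems) is an assumption chosen to keep the example self-contained; the actual fix concerns file-backed shared memory.

```c
/* Ask glibc to expose MAP_ANONYMOUS under strict -std= modes. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes of zero-filled memory.
 * Returns the mapping, or NULL if mmap() fails.
 */
void *sketch_map(size_t len)
{
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* mmap() reports failure with MAP_FAILED; do not compare against -1. */
    if (addr == MAP_FAILED) {
        return NULL;
    }
    return addr;
}

/* Release a mapping obtained from sketch_map(); returns 0 on success. */
int sketch_unmap(void *addr, size_t len)
{
    assert(addr != NULL);
    return munmap(addr, len);
}
```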
Jeff Squyres
5ec86e5fe5 Fix CID 1386: fd can't be valid here, so don't bother to close/unlink.
This commit was SVN r22064.
2009-10-07 18:30:26 +00:00
Jeff Squyres
0f8ac9223f Refs trac:2023, #2027.
This commit does a bunch of things:

 * Address all remaining code review items from CMR #2023:

   * Defer mmap setup to be lazy; only set it up the first time we
     invoke a collective.  In this way, we don't penalize apps that
     make lots of communicators but don't invoke collectives on them
     (per #2027).
   * Remove the extra assignments of mca_coll_sm_one (fixing a
     convertor count setup that was the real problem).
   * Remove another extra/unnecessary assignment.
   * Increase libevent polling frequency when using the RML to
     bootstrap mmap'ed memory.
   * Fix a minor procs-related memory leak in btl_sm.
 * Commit a datatype fix that George and I discovered along the way to
   fixing the coll sm.
 * Improve error messages when mmap fails, potentially trying to
   de-alloc any allocated memory when that happens.
 * Fix a previously-unnoticed confusion between extent and true_extent
   in coll sm reduce.

This commit was SVN r22049.

The following Trac tickets were found above:
  Ticket 2023 --> https://svn.open-mpi.org/trac/ompi/ticket/2023
2009-10-02 17:13:56 +00:00
George Bosilca
16c6370b73 A little bit of cleanup, the main logic is still the same.
This commit was SVN r22043.
2009-10-01 14:05:25 +00:00
Shiqing Fan
21f6a1cb7c Update the corresponding part of mmap for Windows.
This commit was SVN r22038.
2009-09-30 14:50:17 +00:00
Shiqing Fan
96e9ffa016 Fix a type cast.
This commit was SVN r22034.
2009-09-30 14:02:47 +00:00
Jeff Squyres
152bc14079 Rename the help file to be consistent with others; add it to the Makefile.am.
This commit was SVN r22005.
2009-09-23 20:28:49 +00:00
Jeff Squyres
ef338602ef Arrgh -- effectively revert r21997. We ''do'' need that header file...
This commit was SVN r21998.

The following SVN revision numbers were found above:
  r21997 --> open-mpi/ompi@bf5f14ab32
2009-09-22 21:19:38 +00:00
Jeff Squyres
bf5f14ab32 Remove some debugging stuff.
This commit was SVN r21997.
2009-09-22 19:39:01 +00:00
Jeff Squyres
bb69bf22c0 Fix dumb logic in common sm setup that determines which nodes are
local and who has the lowest name.  

This commit was SVN r21994.
2009-09-22 17:54:43 +00:00
Jeff Squyres
b91e7ba91f This is no longer necessary.
This commit was SVN r21991.
2009-09-22 15:01:00 +00:00
Jeff Squyres
1ef988c3d9 A slight optimization: no longer call sched_yield() when polling for
shmem progress (or the Windows equiv).  Instead, poll hard on the
condition, but periodically call opal_progress().  This allows
badly-formed apps (e.g., the ibm test communicator/bsend_free) to
actually complete.

To be clear, there are far too many apps out there that assume that
MPI collectives will actually progress the rest of MPI.  I don't like
putting in a feature to enable broken apps, but I have a dim
recollection of this issue coming up before (apps "hanging" when
testing the sm coll because they assumed that calling collectives
would trigger other MPI progress).  Rather than have people claim that
OMPI is broken, I prefer to put in this "workaround".  :-(

Indeed, the bsend_free test ''may'' be coded that way for exactly that
reason...?  I don't remember offhand...

This commit was SVN r21984.
2009-09-21 22:20:44 +00:00
Jeff Squyres
64e3689a52 Grr -- test ''before'' committing! Sorry for all the noise folks;
this one really fixes the problem.  One more optimization coming later
(separately).

This commit was SVN r21983.
2009-09-21 21:32:26 +00:00
Jeff Squyres
bc43b6a085 Arrgh -- there was an extra assignment in there. Additionally, clean
it up a little to drive the point home that the lowest named proc goes
into array position [0].

This commit was SVN r21982.
2009-09-21 21:15:32 +00:00
Jeff Squyres
f9dfa03fde Fix a potential ordering issue with the names and RML exchange during
sm coll setup.

This commit was SVN r21981.
2009-09-21 21:10:45 +00:00
Josh Hursey
7ac8d89f12 Since r21967 converted the mpool sm module into a real module, it broke some of the C/R logic in the ft_event function (actually it wouldn't build after that patch).
This commit fixes the ft_event logic so that it uses the normal destroy functionality instead of the workaround with the component that was previously there. All in all it made for cleaner code, which is always good.

If r21967 moves to v1.3, this patch will need to be moved as well.

This commit was SVN r21972.

The following SVN revision numbers were found above:
  r21967 --> open-mpi/ompi@533633b8cb
2009-09-17 14:45:17 +00:00
Josh Hursey
59143be39d Fix a minor C/R bug related to cleaning up session directories when sm is present.
Before this, we would restore the topmost old session directory. This commit makes sure that we remove it when we are done with it.

This commit was SVN r21971.
2009-09-17 14:43:06 +00:00
Edgar Gabriel
9abeaad6e2 so here is what happens:
in the v1.2 series the cid's could never go above the max. allowed for a
particular pml. Because of that, pml_add_comm never checked for the cid, and
in fact pml_add_comm was called in comm_set, which is *before* we knew the
cid.

in the v1.3 series (and trunk) we check now the cid to detect overflow, and
because of that pml_add_comm has been moved *after* the cid allocation
routine, namely into the comm_activate routine.

in the v1.2 series, the comm_activate contained a synchronization step of the
old communicator in order to prevent incoming fragments on the new
communicator, with the main problem being that the allreduce in the
communicator allocation finished at different times on different processes,
and thus, this scenario could and did really occur.

in the v1.3 series, the comm_activate does not contain the synchronization
step anymore, since we introduced the new queue for fragments with unknown
cid. The problem is however, that whether a fragment is known or not is
decided by using ompi_comm_lookup(), which will return something useful as
soon as the cid allocation finished, even before pml_add_comm has been
called. So there is a small time gap where we will not post a message into
queue for unknown cid's, but we can also not look up the process structure
belonging to the rank in that comm ( that is in pml_ob1_match_recv_frag or
something like that). 


The current fix reintroduces the synchronization step in comm_activate, and
ensures that no fragment can be received for a new communicator before the
synchronization occurs, and thus before comm_nextcid() and pml_add_comm have
been called. It seems to be the safest and easiest way for now. Welcome back, v1.2.

This commit was SVN r21970.
2009-09-17 14:37:02 +00:00
Jeff Squyres
4a40be650e Improve the MCA param help messages for btl_tcp_if_in|exclude.
This commit was SVN r21968.
2009-09-15 17:19:57 +00:00
Jeff Squyres
533633b8cb Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
 * Clean up concept of mpool module (I think that code was written way
   back when the concept of "modules" was fuzzy)
 * Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
   fix potential segv's when mmap'ed regions were at different
   addresses in different processes (thanks Tim!).
 * Change sm coll to no longer use mpool as its main source of shmem;
   rather, just mmap its own segment (because it's fixed size --
   there was nothing to be gained by using mpool; shedding the use of
   mpool saved a lot of complexity in the sm coll setup).  This
   effectively made Tim's fixes moot (because now everything is an
   offset into the mmap that is computed locally; there are no global
   pointers).  :-)
 * Slightly updated common/sm to allow making mmap's for a specific
   set of procs (vs. ''all'' procs in the process).  This potentially
   allows for same-host-inter-proc mmaps -- yay!
 * Fixed many, many things in the coll sm (particularly in reduce):
   * Fixed handling of MPI_IN_PLACE in reduce and allreduce
   * Fixed handling of non-contiguous datatypes in reduce
   * Changed the order of reductions to go from process (n-1)'s data
     to process 0's data, because that's how all other OMPI coll
     components work
   * Fixed lots of usage of ddt functions
   * When using a non-contiguous datatype, if the root process is not
     (n-1), now we use a 2nd convertor to copy from shmem to the rbuf
     (saves a memory copy vs. what was done before)
   * Lots and lots of little cleanups, clarifications, and minor
     optimizations (although still more could be done -- e.g., I think
     the use of write memory barriers is fairly sub-optimal; they
     could be ganged together at the root, for example)

I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.

This commit was SVN r21967.

The following Trac tickets were found above:
  Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 00:25:21 +00:00
Lenny Verkhovsky
796b765952 Fixed finding the minimum distance to ibv_device,
thanks to Pasha.

This commit was SVN r21916.
2009-08-31 07:54:22 +00:00
Nysal Jan
f53f286456 Set up the convertor once during add_procs() instead of on every request
This commit was SVN r21873.
2009-08-24 18:50:39 +00:00
Brian Barrett
07d49e982b hdr_ctx is a uint16, so can have CIDs in range of 0 ... 2^16 - 1. I think
someone (me?) must have done 2^(16 - 1) instead.  Ooops.

This commit was SVN r21869.
2009-08-22 05:21:01 +00:00
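The arithmetic slip described in 07d49e982b is easy to see in a short C sketch. The helper names are hypothetical; only the 16-bit width of `hdr_ctx` comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value an unsigned field of `bits` bits can hold: 2^bits - 1. */
uint32_t sketch_max_for_bits(unsigned bits)
{
    assert(bits > 0 && bits < 32);
    return (UINT32_C(1) << bits) - 1;
}

/* The mistaken variant, with the "- 1" inside the exponent: 2^(bits - 1). */
uint32_t sketch_mistaken_max(unsigned bits)
{
    assert(bits > 0 && bits < 32);
    return UINT32_C(1) << (bits - 1);
}
```

For a 16-bit field the first helper gives 65535 (the same as UINT16_MAX), while the mistaken variant gives 32768, which would rule out roughly half of the usable CID range.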
George Bosilca
5145efdc47 This typo lived way too long ...
This commit was SVN r21864.
2009-08-21 15:23:11 +00:00
Rainer Keller
8e1b23779f - Replace combinations of
#if defined (c_plusplus)
          defined (__cplusplus)
   followed by
      extern "C" {
   and the closing counterpart by BEGIN_C_DECLS and END_C_DECLS.

   Notable exceptions are:
    - opal/include/opal_config_bottom.h:
      This is our generated code, that itself defines BEGIN_C_DECL and
      END_C_DECL
    - ompi/mpi/cxx/mpicxx.h:
      Here we do not include opal_config_bottom.h:                                 
    - Belongs to external code:                                                    
      opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.c        
      opal/mca/backtrace/darwin/MoreBacktrace/MoreDebugging/MoreBacktrace.h        
    - opal/include/opal/prefetch.h:
      Has C++ specific macros that are protected:                                  

    - Had #if ... } #endif  _and_ END_C_DECLS (aka end up with 2x
      END_C_DECLS)
      ompi/mca/btl/openib/btl_openib.h
    - opal/event/event.h has #ifdef __cplusplus as BEGIN_C_DECLS...
    - opal/win32/ompi_process.h: had extern "C"\n {...
      opal/win32/ompi_process.h: ditto
    - ompi/mca/btl/pcie/btl_pcie_lex.l: needed to add *_C_DECLS
      ompi/mpi/f90/test/align_c.c: ditto
    - ompi/debuggers/msgq_interface.h: used #ifdef __cplusplus
    - ompi/mpi/f90/xml/common-C.xsl: Amend

   Tested on linux using --with-openib and --with-mx

   The following do not contain either opal_config.h, orte_config.h or
   ompi_config.h
   (but possibly other header files, that include one of the above):
      ompi/mca/bml/r2/bml_r2_ft.h
      ompi/mca/btl/gm/btl_gm_endpoint.h
      ompi/mca/btl/gm/btl_gm_proc.h
      ompi/mca/btl/mx/btl_mx_endpoint.h
      ompi/mca/btl/ofud/btl_ofud_endpoint.h
      ompi/mca/btl/ofud/btl_ofud_frag.h
      ompi/mca/btl/ofud/btl_ofud_proc.h
      ompi/mca/btl/openib/btl_openib_mca.h
      ompi/mca/btl/portals/btl_portals_endpoint.h
      ompi/mca/btl/portals/btl_portals_frag.h
      ompi/mca/btl/sctp/btl_sctp_endpoint.h
      ompi/mca/btl/sctp/btl_sctp_proc.h
      ompi/mca/btl/tcp/btl_tcp_endpoint.h
      ompi/mca/btl/tcp/btl_tcp_ft.h
      ompi/mca/btl/tcp/btl_tcp_proc.h
      ompi/mca/btl/template/btl_template_endpoint.h
      ompi/mca/btl/template/btl_template_proc.h
      ompi/mca/btl/udapl/btl_udapl_eager_rdma.h
      ompi/mca/btl/udapl/btl_udapl_endpoint.h
      ompi/mca/btl/udapl/btl_udapl_mca.h
      ompi/mca/btl/udapl/btl_udapl_proc.h
      ompi/mca/mtl/mx/mtl_mx_endpoint.h
      ompi/mca/mtl/mx/mtl_mx.h
      ompi/mca/mtl/psm/mtl_psm_endpoint.h
      ompi/mca/mtl/psm/mtl_psm.h
      ompi/mca/pml/cm/pml_cm_component.h
      ompi/mca/pml/csum/pml_csum_comm.h
      ompi/mca/pml/dr/pml_dr_comm.h
      ompi/mca/pml/dr/pml_dr_component.h
      ompi/mca/pml/dr/pml_dr_endpoint.h
      ompi/mca/pml/dr/pml_dr_recvfrag.h
      ompi/mca/pml/example/pml_example.h
      ompi/mca/pml/ob1/pml_ob1_comm.h
      ompi/mca/pml/ob1/pml_ob1_component.h
      ompi/mca/pml/ob1/pml_ob1_endpoint.h
      ompi/mca/pml/ob1/pml_ob1_rdmafrag.h
      ompi/mca/pml/ob1/pml_ob1_recvfrag.h
      ompi/mca/pml/v/pml_v_output.h
      opal/include/opal/prefetch.h
      opal/mca/timer/aix/timer_aix.h
      opal/util/qsort.h
      test/support/components.h

This commit was SVN r21855.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2009-08-20 11:42:18 +00:00
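For readers unfamiliar with the macros that 8e1b23779f standardizes on, the following C sketch shows the usual pattern. The macro definitions here are an assumption about their general shape (the real ones are generated into Open MPI's configuration header), and `sketch_div` is a hypothetical function used only to have something to declare.

```c
#include <assert.h>

/* Assumed definitions, guarded in case a real header already provides them. */
#ifndef BEGIN_C_DECLS
#  if defined(c_plusplus) || defined(__cplusplus)
#    define BEGIN_C_DECLS extern "C" {
#    define END_C_DECLS   }
#  else
#    define BEGIN_C_DECLS /* expands to nothing in C */
#    define END_C_DECLS   /* expands to nothing in C */
#  endif
#endif

/* What a header would contain: declarations wrapped by the macro pair. */
BEGIN_C_DECLS

int sketch_div(int num, int den);

END_C_DECLS

/* What the matching .c file would contain. */
int sketch_div(int num, int den)
{
    assert(den != 0);
    return num / den;
}
```

A C++ translation unit including such a header sees the declarations inside `extern "C" { ... }`, while a C compiler sees them unchanged.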
Ralph Castain
270f0ffe18 Improve the performance of the csum pml module by not performing checksums on data when sending between procs on the same node.
Thanks to Nysal for this improvement!

This commit was SVN r21848.
2009-08-20 04:33:03 +00:00
Rainer Keller
567e5c4342 - As described in RFC,
http://www.open-mpi.org/community/lists/devel/2009/08/6618.php
   lower the default priority of PML/cm to allow _defined_ behaviour
   for systems, where both MTLs and BTLs are available (Portals and MX).

   Keep the previous behaviour of favoring in case of PSM.
   Still, the user may select --mca pml cm for apps where applicable.

This commit was SVN r21834.
2009-08-18 19:12:43 +00:00
George Bosilca
23e8ce91ba Rework the selection logic for the tuned collectives. All supported collectives
now are able to use the dynamic rules. Moreover, these rules are loaded only once,
and stored at the component level. All communicators are able to use these rules
(not only MPI_COMM_WORLD as until now).
A lot of minor corrections, memory management issues and reduction in the amount
of memory used by the tuned collectives.

This commit was SVN r21825.
2009-08-14 21:06:23 +00:00
Ralph Castain
ded58ae483 Silence some compiler warnings about print statements
This commit was SVN r21814.
2009-08-13 13:45:38 +00:00
Rainer Keller
02a39a208d - Patch r18658 introduced NUMA awareness and memory affinity for
BTL/sm. This static variable needlessly ends up in the .so file.
   init_maffinity is called once from sm_btl_first_time_init.

   Checked with lennyve, static here is not necessary.

This commit was SVN r21813.

The following SVN revision numbers were found above:
  r18658 --> open-mpi/ompi@f4811d6c4d
2009-08-13 13:08:39 +00:00
Avneesh Pant
261d34db3a Endpoint options port and outsl only appear post version 0x0107 so conditionally compile them in.
This commit was SVN r21812.
2009-08-12 19:59:15 +00:00
Ralph Castain
0c73aa6a97 Fix a couple of errors that are preventing this module from building in MTT.
NOTE: there are still two errors that I cannot fix - will send those to devel list

This commit was SVN r21809.
2009-08-12 13:18:04 +00:00
Shiqing Fan
bce2f44154 Update related .windows files with proper compiling properties, in order to have a successful DSO build.
This commit was SVN r21805.
2009-08-12 08:55:58 +00:00
Pavel Shamis
31a88b149a Fixing thread deadlock flow in openib btl (mpi-thread enabled mode)
This commit was SVN r21793.
2009-08-11 10:43:52 +00:00
George Bosilca
51b2cfe40d This header is required to compile the FT.
This commit was SVN r21792.
2009-08-11 05:21:27 +00:00
Rainer Keller
76469ea64a - Change the property of a few files, that obviously
don't need to be svn:executable...

This commit was SVN r21786.
2009-08-11 01:40:00 +00:00
Rainer Keller
6050020c54 - Use OMPI_SUCCESS.
Fails to compile in environments with --disable-mpi

This commit was SVN r21785.
2009-08-10 17:46:25 +00:00
George Bosilca
9c2b993589 Complete r21778 by adding the missing headers.
This commit was SVN r21784.

The following SVN revision numbers were found above:
  r21778 --> open-mpi/ompi@e4d52b16b5
2009-08-10 17:07:43 +00:00
Terry Dontje
e4d52b16b5 Add in eager limit checks in pmls.
This commit was SVN r21778.
2009-08-10 12:46:20 +00:00
Donald Kerr
de6a7f57b0 fix #1984; only decrement send request req_state when not equal to zero
This commit was SVN r21775.
2009-08-07 14:58:50 +00:00
Steve Wise
ed39853f41 add new device ids for the Chelsio T3 RNIC
This commit was SVN r21774.
2009-08-07 14:14:08 +00:00
Rolf vandeVaart
c82e468ede Undo revision r21767 - sorry folks
This commit was SVN r21769.

The following SVN revision numbers were found above:
  r21767 --> open-mpi/ompi@41f38110ff
2009-08-05 22:23:26 +00:00
Rolf vandeVaart
41f38110ff HCA failover support in openib BTL
This commit was SVN r21767.
2009-08-05 21:53:02 +00:00
George Bosilca
cf8bd2142a Various cleanups and typos.
This commit was SVN r21765.
2009-08-05 03:12:33 +00:00
Rainer Keller
1bd94f2d98 - When calling ompi_mtl_portals_finalize, when pml/ob1 is used
(aka w/o  --mca pml cm), make sure PtlEQGet will actually work
   on ompi_mtl_portals.ptl_eq_h -- do so without adding code to 
   ompi_mtl_portals_progress.

   Otherwise we abort() with
[nid09979:32503] ompi_mtl_portals_finalize: Going to call ompi_mtl_portals_progress
[nid09979:32503]  Error returned from PtlEQGet.  Error code - 14
[nid09979:32502] Signal: Aborted (6)
[nid09979:32502] Signal code:  (-6) 

This commit was SVN r21761.
2009-08-04 22:48:07 +00:00
George Bosilca
98bdf5d17b Remove a compiler warning about missing braces around
initializer.

This commit was SVN r21760.
2009-08-04 21:41:14 +00:00
George Bosilca
32416761ce Make the open/close symmetric with regards to the local variable
construction/destruction.

This commit was SVN r21753.
2009-08-03 16:45:18 +00:00
Avneesh Pant
af09e7678c Convert a few opal_output() calls to instead use orte_show_help() as well as do some minor cosmetic changes dealing with tab spacing and c-blocks being enclosed with \{\}. There was also a long standing bug with the PSM mtl if the number of hardware contexts on adapter were less than the number of cores on a node (The default case is they are the same hence no issues were reported). For completeness we take care of this case as well but it requires us to tell PSM how many local processes are running on a node and the local rank of the process on a node so it can allocate the available hardware contexts appropriately.
This commit was SVN r21745.
2009-07-30 02:55:20 +00:00
George Bosilca
0bf381e931 This patch tries to solve an issue on Leopard. The supposedly global
variables that are not initialized and are declared in a file that
doesn't export any globally visible function are marked as
non-initialized constants, i.e. uninitialized common symbols. For some
obscure reason, they get removed from the object files on Mac OS X.

So far I have found two solutions to this problem. One requires the addition
of "-c" to the linker command; the second one (corresponding to this
patch) forces them to become common initialized symbols.

This commit was SVN r21739.
2009-07-28 17:06:16 +00:00
Avneesh Pant
38e48d4e2f Add support for MCA parameters for PSM MTL to specify IB unit, port, IB service level and PSM debug level to use. Also specify in the openib btl params file that QLogic hardware supports a max inlined messages size of 0 only.
This commit was SVN r21734.
2009-07-24 20:09:39 +00:00
Edgar Gabriel
9a369d0fc1 - we accidentally decreased the counter for the number of dynamic
communicators twice, once in dpm.disconnect_wait, and once in
comm_free. The second location seems to be the right place for that (since a
communicator could be freed, and not disconnected), remove the instance in
disconnect_wait.

- add some error messages in case something goes wrong.

This commit was SVN r21720.
2009-07-20 19:54:24 +00:00
Terry Dontje
d432c9fdbc Add asserts to catch when btl_eager_limit is smaller than the pml headers.
This commit was SVN r21707.
2009-07-17 14:54:18 +00:00
George Bosilca
8275120656 Get rid of the ompi_convertor.h header file. Replace all references to ompi_convertor
by opal_convertor.
Cleanup the pcie BTL.

This commit was SVN r21703.
2009-07-16 19:13:30 +00:00
George Bosilca
2b57fc3835 Add the missing headers.
This commit was SVN r21701.
2009-07-16 18:42:14 +00:00
George Bosilca
3e971e61f3 The system headers are supposed to be protected by #ifdef and not by #if.
This commit was SVN r21700.
2009-07-16 18:27:33 +00:00
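One way to see why 3e971e61f3 prefers `#ifdef` for guarding system headers is the C sketch below. The macro `SKETCH_HAVE_STRING_H` and the function `sketch_len` are hypothetical, and defining the macro without a value is an assumption made to show the difference: `#if SKETCH_HAVE_STRING_H` would fail to preprocess in that case, whereas `#ifdef` only asks whether the macro exists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature macro, defined but with no value. */
#define SKETCH_HAVE_STRING_H

/* #ifdef works whether or not the macro carries a value. */
#ifdef SKETCH_HAVE_STRING_H
#include <string.h>
#endif

/* Length of a C string, using strlen() when the header is available. */
size_t sketch_len(const char *s)
{
    assert(s != NULL);
#ifdef SKETCH_HAVE_STRING_H
    return strlen(s);
#else
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
#endif
}
```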
George Bosilca
d07ffedc54 No opal datatype functions in the BTL. The datatype attached to the
convertor is an ompi_datatype_t so calling the ompi level functions
is the way to go.

This commit was SVN r21698.
2009-07-16 18:25:08 +00:00
Jeff Squyres
9dc7f884b2 Fix yet another compile error from the great DDT split (r21641). Sigh.
This commit was SVN r21697.

The following SVN revision numbers were found above:
  r21641 --> open-mpi/ompi@6c5532072a
2009-07-16 18:08:03 +00:00
George Bosilca
e1383027e1 Correct a comment and cleanup/reorder the code.
This commit was SVN r21696.
2009-07-16 17:41:32 +00:00
Ralph Castain
e75d9b8296 Use orte_notifier to alert sys admins to checksum violations in the csum pml.
Add ability to store the RM's jobid string to tag the notifier message so that the sys admin knows what job had the problem.

This commit was SVN r21687.
2009-07-15 19:43:26 +00:00
Rainer Keller
8243831d76 - Get OpenIB BTL to work with old libibverbs installation
Tested on smoky.

This commit was SVN r21685.
2009-07-15 16:12:47 +00:00
Ralph Castain
dbac602be5 Add support for the add-host and add-hostfile MPI Info keys to allow Comm_spawn users to add new hosts to those already known by mpirun.
Requires full testing once comm_spawn is fixed (Edgar is working that now).

This commit was SVN r21664.
2009-07-14 14:34:11 +00:00
George Bosilca
2143424eb5 The MCA parameter should always be taken into account, independent of
how many networks are available on the node.

This commit was SVN r21652.
2009-07-13 19:40:00 +00:00
Josh Hursey
8d9d2ba7d1 Fix the datatype usage in CRCP Bkmrk. as a result of the great datatype shift in r21641
This commit was SVN r21650.

The following SVN revision numbers were found above:
  r21641 --> open-mpi/ompi@6c5532072a
2009-07-13 17:54:26 +00:00
Rainer Keller
6c5532072a - Split the datatype engine into two parts: an MPI specific part in
   OMPI and a language agnostic part in OPAL. The convertor is completely
   moved into OPAL.  This offers several benefits as described in RFC
   http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
   namely:
    - Fewer basic types (int* and float* types, boolean and wchar)
    - Fixing naming scheme to ompi-nomenclature.
    - Usability outside of the ompi-layer.
 - Due to the fixed nature of simple opal types, their information is
   completely known at compile time and therefore constified
 - With fewer datatypes (22), the actual sizes of bit-field types may be
   reduced from 64 to 32 bits, allowing reorganizing the opal_datatype
   structure, eliminating holes and keeping data required in convertor
   (upon send/recv) in one cacheline...
   This has implications to the convertor-datastructure and other parts
   of the code.
 - Several performance tests have been run, the netpipe latency does not
   change with this patch on Linux/x86-64 on the smoky cluster.
 - Extensive tests have been done to verify correctness (no new
   regressions) using:
   1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and
      ompi-ddt:
      a. running both trunk and ompi-ddt resulted in no differences
         (except for MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB do now run
         correctly).
      b. with --enable-memchecker and running under valgrind (one buglet
         when run with static found in test-suite, committed)
   2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
      all passed (except for the dynamic/ tests failed!! as trunk/MTT)
   3. compilation and usage of HDF5 tests on Jaguar using PGI and
      PathScale compilers.
   4. compilation and usage on Scicortex.
 - Please note, that for the heterogeneous case, (-m32 compiled
   binaries/ompi), neither ompi-trunk, nor ompi-ddt branch would
   successfully launch.

This commit was SVN r21641.
2009-07-13 04:56:31 +00:00
Brian Barrett
2f3c0b4fcf Drain pipe from service thread to main thread during shutdown. By this
point, the event engine has been shut down until btl finalization is
done, so opal_progress in the wait loop is not an option - we have
to drain from inside the btl.

Clean up the looping structure for the finalize routine

Update copyrights.

This commit was SVN r21620.
2009-07-09 22:13:10 +00:00
Brian Barrett
ac34b1de69 RDMA CM doesn't retry if a packet is dropped, just times out during route
discovery, which results in a timeout and we don't recover.  Instead,
try to recover a couple of times by retrying.

This commit was SVN r21619.
2009-07-09 22:10:06 +00:00
George Bosilca
311e27b42f Pretty print an error message when the specified range of ports (for both
IPv4 and IPv6) is outside the legal boundaries. This fixes trac:1869.

This commit was SVN r21612.

The following Trac tickets were found above:
  Ticket 1869 --> https://svn.open-mpi.org/trac/ompi/ticket/1869
2009-07-07 17:52:30 +00:00
George Bosilca
4038834dfb Convert the port number in network order before binding the socket.
Thanks to Mariusz Mamonski (mamonski@man.poznan.pl) for the bug
report and patch.
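
A minimal sketch of the byte-order conversion described above, assuming a POSIX sockets environment. The helper name `sketch_bind_addr` is hypothetical and is not taken from the Open MPI sources; it only shows the port being passed through `htons()` before the address is handed to `bind()`.

```c
#include <arpa/inet.h>   /* htons, htonl */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_ANY */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>  /* AF_INET */

/* Hypothetical helper: build an IPv4 wildcard address for bind().
 * The port arrives in host byte order and is stored in network order. */
struct sockaddr_in sketch_bind_addr(uint16_t port_host_order)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port_host_order);  /* convert before bind() */
    return addr;
}
```

On a big-endian host `htons()` is a no-op, which is why a missing conversion tends to go unnoticed until the code runs on a little-endian machine.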

This commit was SVN r21610.
2009-07-07 17:21:28 +00:00
Jeff Squyres
92e40cb20a Enable the coll sync component to barrier before each 1000th collective.
This commit was SVN r21594.
2009-07-02 20:16:45 +00:00
Brian Barrett
3b410b0200 Increase context ref count and push on list before calling rdma_resolve_addr,
in case the event returns before rdma_resolve_addr returns.

This commit was SVN r21588.
2009-07-02 16:12:19 +00:00
Shiqing Fan
0b56a8a4d5 Enable IPv6 on Windows by default, and fix two type casts for IPv6 operations.
This commit was SVN r21586.
2009-07-02 14:41:03 +00:00
Jeff Squyres
cad12fda5f * Remove an extra blank line from the help file
* Add the help file to the Makefile.am so that it gets installed

This commit was SVN r21567.
2009-06-30 18:58:09 +00:00
Eugene Loh
fcd9fabae9 Minor cleanup in the sm BTL. http://www.open-mpi.org/community/lists/devel/2009/06/6363.php
This commit was SVN r21556.
2009-06-27 23:42:09 +00:00
Nysal Jan
5b13fd004a Bump up default MTU for eHCA 2. Improves peak unidirectional bandwidth by around 14%
This commit was SVN r21553.
2009-06-27 07:39:30 +00:00
Ralph Castain
ee18838e2f Remove svn conflict lines due to commit r21551 in the sm btl. I #if 0'd out the offending line that caused the conflict just in case it was the correct one. However, this now compiles cleanly, minus the following warnings that I wasn't sure which way to resolve:
btl_sm.c: In function ‘mca_btl_sm_sendi’:
btl_sm.c:734: warning: comparison between signed and unsigned
btl_sm.c: In function ‘mca_btl_sm_send’:
btl_sm.c:812: warning: comparison between signed and unsigned

This commit was SVN r21552.

The following SVN revision numbers were found above:
  r21551 --> open-mpi/ompi@bd995d26b4
2009-06-27 01:39:15 +00:00
Eugene Loh
bd995d26b4 Try to improve flow control in the sm BTL:
- poll FIFO occasionally even if just sending messages
- retry pending sends more often
  - just before trying a new send
  - as part of mca_btl_sm_component_progress
Maintain two new mca_btl_sm_component variables, num_outstanding_frags
and num_pending_sends, to keep overhead low.


Drain only one message fragment from the FIFO per btl_sm_component_progress
call (rather than drain until empty, which in retrospect everyone considers
to have been a mistake).

This commit was SVN r21551.
2009-06-27 00:12:56 +00:00
George Bosilca
24e74922ce Not yet the right time for this to be there.
This commit was SVN r21540.
2009-06-26 16:00:55 +00:00
Lenny Verkhovsky
7f8dc7c8b8 Fix for r21524: correct the misspelling of HAVE_IBV_FORK_INIT
This commit was SVN r21533.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r21524
2009-06-25 17:45:38 +00:00
Terry Dontje
efac1b73fb Surround use of want_fork_support structure field with an ifdef instead of a C conditional.
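
A small sketch of why an `#ifdef` can be needed where a plain C conditional is not enough. It assumes that the field is only declared when a configure-time macro is set; the commit message does not say so. The macro `SKETCH_HAVE_FORK_FIELD` and the struct are hypothetical stand-ins.

```c
/* Hypothetical feature macro; a real build would get it from configure. */
/* #define SKETCH_HAVE_FORK_FIELD 1 */

struct sketch_component {
    int verbose;
#ifdef SKETCH_HAVE_FORK_FIELD
    int want_fork_support;   /* only exists when the feature was detected */
#endif
};

/* Returns 1 if fork support was requested, 0 otherwise. */
int sketch_wants_fork(const struct sketch_component *c)
{
#ifdef SKETCH_HAVE_FORK_FIELD
    return c->want_fork_support != 0;
#else
    /* The field is not compiled in, so a runtime `if` that names it
     * would fail to compile; the preprocessor has to remove the use. */
    (void) c;
    return 0;
#endif
}
```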
This commit was SVN r21526.
2009-06-25 12:03:30 +00:00
Nysal Jan
938599cb2d Fix build failure with latest IBM XL C/C++ v10.1 compiler. Also this seems like cleaner code.
This commit was SVN r21497.
2009-06-23 14:08:04 +00:00
George Bosilca
7f24b41051 This function doesn't have to be globally visible.
This commit was SVN r21494.
2009-06-22 17:14:54 +00:00
Jeff Squyres
c39998db17 Also show the "you might not have enough registered memory" warning
message earlier in the openib BTL startup sequence

This commit was SVN r21469.
2009-06-18 12:24:39 +00:00
Rolf vandeVaart
36a560506c Fix error message to match code default.
This commit was SVN r21452.
2009-06-16 20:59:53 +00:00
Rainer Keller
5e6061af02 - Few fixes and comments
This commit was SVN r21443.
2009-06-15 21:12:04 +00:00
Ralph Castain
44bb265a52 Add a new MPI_Info key to preposition OMPI libraries - implementation underway, but this just defines and passes the new key
This commit was SVN r21425.
2009-06-12 17:53:13 +00:00
Jeff Squyres
814a8f5e0f * Fix #1916: endian problems in iwarp wireup on big endian machines
(now works on both big and little endian machines)
 * Be a little more flexible when looking for active devices in
   btl_openib_component.c
 * Add device name and port number to lots of verbose and help
   messages
 * Add a bunch of verbose messages to give insight into what is
   occurring during all the CPC wireups

This commit was SVN r21418.
2009-06-11 17:30:30 +00:00
Ralph Castain
4881cd0df3 Revert the prior change out from the individual .h files - the problem was in the Makefile.am's, causing the make dist to fail.
This commit was SVN r21414.
2009-06-11 03:15:47 +00:00
Ralph Castain
91ab2b3e4f Specify complete path to included header files so it compiles in all environments
This commit was SVN r21412.
2009-06-11 02:46:30 +00:00
Avneesh Pant
295c0c36bc Add new QLogic adapter to device parameter file.
This commit was SVN r21409.
2009-06-10 23:37:38 +00:00
Terry Dontje
fa9f356c8c This commit fixes trac:1931 by separating out structures and defines used by
the debugger plugins into files suffixed by _dbg.h.

This commit was SVN r21404.

The following Trac tickets were found above:
  Ticket 1931 --> https://svn.open-mpi.org/trac/ompi/ticket/1931
2009-06-10 14:57:28 +00:00
George Bosilca
f9a510fd8a There is no need for an atomic read if we are not in a threaded case.
This commit was SVN r21394.
2009-06-08 23:55:52 +00:00
Ralph Castain
4d3aa5a8a4 Once again, into the breach!
Yes, friends, our favorite PCIE BTL has resurfaced as mgmt vacillates over its existence. This is an updated version that actually mostly works, in its final stages of debugging.

Some generalization still remains to be done...

This commit was SVN r21358.
2009-06-02 22:26:36 +00:00
Shiqing Fan
e46bf10efd Correctly include win32 util header.
This commit was SVN r21343.
2009-06-01 19:16:00 +00:00
Rainer Keller
b572dc3591 - As discussed revert r21330, Fortran-configure info should
   not end up in OPAL
 - Will post an updated patch for the OMPI_ALIGNMENT_ parts (within C).

This commit was SVN r21342.

The following SVN revision numbers were found above:
  r21330 --> open-mpi/ompi@95596d1814
2009-06-01 19:02:34 +00:00
Rainer Keller
95596d1814 - Move alignment and size output generated by configure-tests
   into the OPAL namespace, eliminating cases like opal/util/arch.c
   testing for ompi_fortran_logical_t.
   As this is processor- and compiler-related information
   (e.g. does the compiler/architecture support REAL*16)
   this should have been on the OPAL layer.
 - Unifies f77 code using MPI_Flogical instead of opal_fortran_logical_t

 - Tested locally (Linux/x86-64) with mpich and intel testsuite
   but would like to get this weekend's MTT output


 - PLEASE NOTE: configure-internal macro-names and
   ompi_cv_ variables have not been changed, so that
   external platform (not in contrib/) files still work.

This commit was SVN r21330.
2009-05-30 15:54:29 +00:00
Nysal Jan
85773de539 Move 'multi-frag receive' fixes to csum PML
This commit was SVN r21323.
2009-05-29 08:07:33 +00:00
Rainer Keller
9ef87898e0 (originally, did not want to do this during business hours, but
well..)
 - As Jeff suggested, for m4 macros, don't use _ OPAL, but
   rather OPAL_ prefix
 - Set the variable before AC_SUBST, so that replacement happens
   in f77 header-file, too.

This commit was SVN r21316.
2009-05-28 20:28:43 +00:00
Jeff Squyres
f960f2d944 Fix compiler warning
This commit was SVN r21312.
2009-05-28 13:34:48 +00:00
George Bosilca
2f9765926e This is the real commit, the previous one was just a test ... to make
sure Jeff is reading all commits ;)

This commit was SVN r21308.
2009-05-28 00:49:40 +00:00
George Bosilca
63118c9beb Do not allow redefinition of MPI_MAX_DATAREP_STRING.
This commit was SVN r21307.
2009-05-28 00:37:19 +00:00
George Bosilca
be320ca959 Don't add the offset to all segments, only the first one should be affected. Thanks
to Roberto Ammendola for this bug report and patch.

This commit was SVN r21300.
2009-05-27 16:12:18 +00:00
Edgar Gabriel
d93def71ea second part of the 'running out of cids problem', this time focusing on what
happens when hierarch is used. Two major items:
 - modify the comm_activate step to take an additional argument, indicating
   whether the new communicator has to go through the collective selection
   step. This is not required sometimes (e.g. when a process calls
   MPI_COMM_SPLIT with color=MPI_UNDEFINED), and contributed significantly to
   the exhaustion of cids.
 - when freeing a communicator, check whether we can reuse the block of cids
   assigned to that comm. This only works if the current front of the cid
   assignment (cid_block_start) is right after the block of cids assigned to this
   comm.
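
The block-reuse rule in the second item can be sketched as follows. This is an illustration under stated assumptions, not the Open MPI implementation: the block size `SKETCH_BLOCK` and both function names are hypothetical, and only the variable name `cid_block_start` comes from the message above.

```c
#include <assert.h>

#define SKETCH_BLOCK 8     /* hypothetical number of cids per block */

int cid_block_start = 0;   /* next unassigned cid (the allocation front) */

/* Hand out a block of SKETCH_BLOCK consecutive cids; return its first cid. */
int sketch_alloc_block(void)
{
    int first = cid_block_start;

    cid_block_start += SKETCH_BLOCK;
    return first;
}

/* Release a block. Reuse is only possible when the block sits directly
 * below the allocation front; otherwise its cids stay consumed.
 * Returns 1 if the front was moved back, 0 otherwise. */
int sketch_free_block(int first)
{
    assert(first >= 0 && first + SKETCH_BLOCK <= cid_block_start);
    if (first + SKETCH_BLOCK == cid_block_start) {
        cid_block_start = first;
        return 1;
    }
    return 0;
}
```

Freeing blocks in reverse order of allocation reclaims all of them; freeing an older block while a newer one is still live reclaims nothing, which matches the "only works if" caveat.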

Fixes trac:1904
Fixes trac:1926

This commit was SVN r21296.

The following Trac tickets were found above:
  Ticket 1904 --> https://svn.open-mpi.org/trac/ompi/ticket/1904
  Ticket 1926 --> https://svn.open-mpi.org/trac/ompi/ticket/1926
2009-05-27 15:21:07 +00:00
Shiqing Fan
b3c077e112 Microsoft doesn't provide inet_pton and inet_ntop APIs on Windows XP, but only on Windows Vista and 2008. So add a stand alone version of inet_pton and inet_ntop functions from ISC.
This commit was SVN r21295.
2009-05-27 14:32:30 +00:00
Rainer Keller
fbb2834977 - Missed string.h to get rid of warnings...
This commit was SVN r21265.
2009-05-22 23:47:49 +00:00
Rainer Keller
225a1d6d8e - For memcpy and memset need string.h
This commit was SVN r21259.
2009-05-21 22:36:06 +00:00
George Bosilca
26342508de It is supposed not to be set until we detect our local rank ...
This commit was SVN r21256.
2009-05-20 17:41:13 +00:00
George Bosilca
7bd97ac17b Try to deal with the ticket #1831. I think we might reach a case where the
shm_fifos values are only partially updated, and this leads to wrong values
for the offset. Moving the write barrier at the right place, plus forcing
some read barriers might help.

In addition I get rid of the sm_offset array which is completely useless.

This commit was SVN r21253.
2009-05-19 22:50:44 +00:00
Jeff Squyres
8dc83b0865 Use the newer ORTE_NAME_PRINT instead of ORTE_NAME_ARGS
This commit was SVN r21250.
2009-05-18 14:38:13 +00:00
Jeff Squyres
17761f60b9 Clean up a few compiler warnings
This commit was SVN r21243.
2009-05-14 15:40:38 +00:00
Jeff Squyres
efd229b56b Clean up bunches of compiler warnings
This commit was SVN r21242.
2009-05-14 15:39:53 +00:00
Rainer Keller
b2f8095ba7 - Update to fix in r21234: as discussed on devel@,
for printing size_t use "%lu" and cast to (unsigned long).
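
A minimal sketch of the convention mentioned above, for printf implementations that may lack the "z" length modifier. The helper name `sketch_format_size` is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a size_t without relying on "%zu": cast the value to
 * unsigned long and print it with "%lu". */
int sketch_format_size(char *buf, size_t buflen, size_t value)
{
    return snprintf(buf, buflen, "%lu", (unsigned long) value);
}
```

One trade-off: on platforms where unsigned long is narrower than size_t (64-bit Windows, for example), values that do not fit are truncated by the cast.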

This commit was SVN r21238.

The following SVN revision numbers were found above:
  r21234 --> open-mpi/ompi@22b6177fb9
2009-05-14 14:10:22 +00:00
Rainer Keller
d5cb1b4f97 - Shown by __opal_attribute_format__, probably macro
was converted from opal_output_verbose

This commit was SVN r21237.
2009-05-14 13:57:37 +00:00
Pavel Shamis
14eb11c18a Removing unused and duplicated XRC code.
I just found that we have 2 places where we call for XRC domain
creation. First one in init_one_device() and second one in prepare_device_for_use().
They have absolutely identical code, but the call in init_one_device() is useless
because at this stage we don't know about QP configuration and we don't know if we need
XRC at all. So I am removing the duplicated code from init_one_device().

This commit was SVN r21235.
2009-05-14 13:15:00 +00:00
Rainer Keller
22b6177fb9 - Use the "z" length modifier for size_t arguments for printf.
This commit was SVN r21234.
2009-05-14 00:52:20 +00:00
Rainer Keller
73fd329cbd - Add the proper __opal_attribute_format__(__printf__...) to
declarations.
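
A sketch of what such a format attribute looks like on a declaration, assuming a GCC-compatible compiler. The macro `SKETCH_PRINTF_LIKE` and the function `sketch_output` are hypothetical; they only model the idea behind __opal_attribute_format__ and are not its definition.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper macro: expands to the GCC format attribute when
 * available and to nothing elsewhere. */
#if defined(__GNUC__)
#define SKETCH_PRINTF_LIKE(fmt_idx, first_arg) \
    __attribute__((__format__(__printf__, fmt_idx, first_arg)))
#else
#define SKETCH_PRINTF_LIKE(fmt_idx, first_arg)
#endif

/* The declaration carries the attribute: argument 3 is the format
 * string and the variadic arguments start at position 4. */
int sketch_output(char *buf, size_t len, const char *fmt, ...)
    SKETCH_PRINTF_LIKE(3, 4);

int sketch_output(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}
```

With the attribute in place, a call such as `sketch_output(buf, len, "%s", 42)` draws a compile-time warning under `-Wformat`.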

This commit was SVN r21226.
2009-05-14 00:10:59 +00:00
Jeff Squyres
f7f5603d5c * No need for warning aggregation here; orte_show_help() does that
for us already.
 * Slightly clarify the error message strings; now they match the new
   error strings for btl_openib_ipaddr_in|exclude

This commit was SVN r21197.
2009-05-09 12:29:06 +00:00
Jeff Squyres
cd6c6e6206 Add some error checking into the btl_openib_ipaddr_in|exclude
calculations, and fix a few small memory leaks in that same logic.

This commit was SVN r21196.
2009-05-09 12:28:09 +00:00
Jeff Squyres
c166678cf2 Gah -- forgot to test the case where btl_tcp_if_include wasn't
specified.  :-(

This commit was SVN r21191.
2009-05-07 21:50:02 +00:00
Jeff Squyres
97c9610b7f Note to self: test error messages before committing.
This commit was SVN r21190.
2009-05-07 20:31:26 +00:00
Jeff Squyres
4779520787 Add the ability to btl_tcp_if_include and btl_tcp_if_exclude based on
subnet specifications (in addition to interface names).  These
parameters now take a comma-delimited list of interfaces names and/or
a.b.c.d/x specifications (only IPv4 currently supported for subnet
specifications).  For example:

  mpirun --mca btl_tcp_if_include 10.10.30.0/8,eth0
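
A rough sketch of how an a.b.c.d/x specification can be matched against an interface address, assuming POSIX `inet_pton()`. The function `sketch_subnet_match` is hypothetical and is not the code added by this commit; it only illustrates the prefix comparison.

```c
#include <arpa/inet.h>   /* inet_pton, ntohl */
#include <netinet/in.h>  /* struct in_addr, INET_ADDRSTRLEN */
#include <stdint.h>
#include <stdio.h>       /* sscanf */
#include <sys/socket.h>  /* AF_INET */

/* Returns 1 if IPv4 address `addr` falls inside `spec` ("a.b.c.d/x"),
 * 0 if it does not, and -1 if either string cannot be parsed. */
int sketch_subnet_match(const char *spec, const char *addr)
{
    char net[INET_ADDRSTRLEN];
    unsigned prefix;
    struct in_addr n, a;
    uint32_t mask;

    if (sscanf(spec, "%15[0-9.]/%u", net, &prefix) != 2 || prefix > 32) {
        return -1;
    }
    if (inet_pton(AF_INET, net, &n) != 1 ||
        inet_pton(AF_INET, addr, &a) != 1) {
        return -1;
    }
    /* Build the netmask in host order; a shift by 32 is undefined, so a
     * prefix length of 0 is handled separately. */
    mask = (prefix == 0) ? 0 : (0xFFFFFFFFu << (32 - prefix));
    return (ntohl(n.s_addr) & mask) == (ntohl(a.s_addr) & mask);
}
```

In this sketch only the first `x` bits take part in the comparison, so `10.10.30.0/8` matches any address that starts with `10.`. An entry such as `eth0` does not parse as a subnet and would be treated as an interface name by the caller.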

This commit was SVN r21189.
2009-05-07 18:53:33 +00:00
Jeff Squyres
af8f0438ef Make a better error message when the TCP BTL fails to connect
This commit was SVN r21184.
2009-05-07 14:09:35 +00:00
Greg Koenig
60485ff95f This is a very large change to rename several #define values from
OMPI_* to OPAL_*.  This allows the opal layer to be used more independently
of the rest of ompi.

NOTE: 9 "svn mv" operations immediately follow this commit.

This commit was SVN r21180.
2009-05-06 20:11:28 +00:00
Shiqing Fan
cd565923d3 Completely remove ltdl support for Windows build.
This commit was SVN r21170.
2009-05-05 18:59:13 +00:00
Ralph Castain
f6b95c4ee0 Remove unused var
This commit was SVN r21164.
2009-05-05 15:12:10 +00:00
George Bosilca
039fed1973 Fix Coverity CID #264.
This commit was SVN r21162.
2009-05-05 13:54:55 +00:00
George Bosilca
db096d7d3a Fix Coverity CID #304.
This commit was SVN r21159.
2009-05-05 13:47:47 +00:00
George Bosilca
b4334cff2e Cleanup.
This commit was SVN r21158.
2009-05-05 13:42:28 +00:00
George Bosilca
271eb11f28 Remove an unused statically defined function.
This commit was SVN r21157.
2009-05-05 13:23:49 +00:00
Rainer Keller
9bcc47b05c - Similar to the previous commit, pass ONE character less into
   ompi_info_get, to not stand the chance to overwrite any of the
   buffers (on the stack).

This commit was SVN r21155.
2009-05-05 13:06:28 +00:00
Rainer Keller
250c3d0ddd - Fix Coverity CID 527
   malloc buffer for ompi_info_get one character larger for the NUL-termination
   See comment in ompi/mpi/c/info_get.c or MPI-2.1 p289
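
The off-by-one being fixed can be sketched with a stand-in for an MPI_Info_get-style call. Both functions below are hypothetical and are not the MPI or Open MPI API; they only show that a callee which copies up to `valuelen` characters and then appends a NUL needs a buffer of `valuelen + 1` bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for an MPI_Info_get-style call (hypothetical): copies at
 * most `valuelen` characters and then writes a terminating NUL, so it
 * may touch valuelen + 1 bytes of `value`. */
void sketch_info_get(const char *stored, int valuelen, char *value)
{
    size_t n = strlen(stored);

    if (n > (size_t) valuelen) {
        n = (size_t) valuelen;
    }
    memcpy(value, stored, n);
    value[n] = '\0';
}

/* Caller side: allocate one byte more than the length passed in.
 * The caller owns the returned buffer and must free() it. */
char *sketch_fetch(const char *stored, int valuelen)
{
    char *buf;

    assert(valuelen >= 0);
    buf = malloc((size_t) valuelen + 1);   /* +1 for the NUL */
    if (buf != NULL) {
        sketch_info_get(stored, valuelen, buf);
    }
    return buf;
}
```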

This commit was SVN r21154.
2009-05-05 13:05:20 +00:00
Rainer Keller
9736af1191 - Fix Coverity CID 182:
   Well, well, just do not "call" ompi_comm_rank twice but rather
   reuse variable...

 - Fix Coverity CID 1262:
   Using uninitialized value "(statuses[err_index]).MPI_ERROR"
   Sure, these statuses are only initialized after ompi_request_wait_all,
   so introduce a short-circuit label to jump to...

This commit was SVN r21153.
2009-05-05 12:28:51 +00:00
Brian Barrett
77cf736f48 Make max_contextid field match same type as cid in communicator.
Refs trac:1904

This commit was SVN r21141.

The following Trac tickets were found above:
  Ticket 1904 --> https://svn.open-mpi.org/trac/ompi/ticket/1904
2009-05-01 21:11:59 +00:00
Ralph Castain
f6da7d86a2 Propagate Brian's change so we abort if we run out of CIDs to the csum module
This commit was SVN r21137.
2009-05-01 15:09:44 +00:00
Brian Barrett
7f898d4e2b * Make rdma the default. Somehow, the code didn't match what was supposed
to happen
* Properly error out (rather than cause buffer overflow) in case where
  the datatype packed description is larger than our control fragments.
  This still isn't standards conforming, but at least we know what
  happened.
* Expose win_set_name to external libraries (like the osc modules)
* Set default window name to the CID of the communicator it's using
  for communication

Refs trac:1905

This commit was SVN r21134.

The following Trac tickets were found above:
  Ticket 1905 --> https://svn.open-mpi.org/trac/ompi/ticket/1905
2009-04-30 22:36:09 +00:00
Brian Barrett
736debcffc Check during communicator creation that we didn't get assigned a CID we can't handle, so that the code aborts instead of hanging.
Refs trac:1904

This commit was SVN r21133.

The following Trac tickets were found above:
  Ticket 1904 --> https://svn.open-mpi.org/trac/ompi/ticket/1904
2009-04-30 19:23:57 +00:00
Josh Hursey
13a3453e35 more copyright fixes - sorry
This commit was SVN r21129.
2009-04-30 16:41:50 +00:00
Josh Hursey
1f42065950 Make sure the CRCPW wrapper does not try to reference a NULL value in MPI_Finalize(), due to the ordering of pml_finalize and comm_del.
Some of the PML interfaces are noops in BKMRK. Allow the CRCPW to detect and skip the call to these functions.

This commit was SVN r21126.
2009-04-30 16:36:50 +00:00
Jeff Squyres
c01079e1f3 Fix some handling in error cases (avoid uninitialized variables,
etc.).  Yay compiler warnings for noticing these.  :-)

This commit was SVN r21114.
2009-04-29 14:40:12 +00:00
Rainer Keller
1ef32928fd - get MTL up and running:
   Nothing notable, except mtl_base_datatype.h -- Undo change from r21096:
   Yes, we should not include datatype_internal.h, but we did and we have to:
   we dereference desc, and get an incomplete type, otherwise.

This commit was SVN r21103.

The following SVN revision numbers were found above:
  r21096 --> open-mpi/ompi@221fb9dbca
2009-04-29 08:04:16 +00:00
Rainer Keller
221fb9dbca ... Delayed due to notifier commits earlier this day ...
 - Delete unnecessary header files using
   contrib/check_unnecessary_headers.sh after applying
   patches, that include headers, being "lost" due to
   inclusion in one of the now deleted headers...

   In total 817 files are touched.
   In ompi/mpi/c/ header files are moved up into the actual c-file,
   where necessary (these are the only additional #include),
   otherwise it is only deletions of #include (apart from the above
   additions required due to notifier...)

 - To get different MCAs (OpenIB, TM, ALPS), an earlier version was
   successfully compiled (yesterday) on:
   Linux locally using intel-11, gcc-4.3.2 and gcc-SVN + warnings enabled
   Smoky cluster (x86-64 running Linux) using PGI-8.0.2 + warnings enabled
   Lens cluster (x86-64 running Linux) using Pathscale-3.2 + warnings enabled

This commit was SVN r21096.
2009-04-29 01:32:14 +00:00
Rainer Keller
6c1cce8761 - For the upcoming header cleanup commit,
   several header files (previously included by header-files)
   now have to be moved "upward".
   This is mainly system headers such as string.h, stdio.h and for
   networking, but also some orte headers.

This commit was SVN r21095.
2009-04-29 00:49:23 +00:00
Josh Hursey
aae5d3c825 Switch up the conditional statement, add OPAL_LIKELY.
Thanks to Jeff for the suggestion.

This commit was SVN r21082.
2009-04-27 19:08:05 +00:00
Josh Hursey
2e1fbc4c75 Fix a copy problem when using a drained message that has a NULL buffer. This often happens when draining Barrier messages, which are sometimes point-to-point messages with count=0 and a NULL buffer.
This fixes a bug that can happen when checkpointing while one process is in such a routine. Previously a warning was thrown.

This commit was SVN r21080.
2009-04-27 18:39:45 +00:00
Shiqing Fan
3d4e0472d6 Add windows support files into the tarball, including .windows, CMakeLists.txt files, and CMake modules. Thanks to Jeff for testing it on Linux.
This commit was SVN r21069.
2009-04-24 16:39:33 +00:00
Jeff Squyres
8704d33f4e Revamp one of the credits system checks and make it smaller, simpler,
and mo'bettah.  Put in lengthy comments explaining what's going on.
We might still want to tweak this some more, but we can no longer get
IMB-EXT to hang with this new code anymore (e.g., even without eager
RDMA -- we discovered after the fact that the code in the v1.3.2
release will hang if eager RDMA is disabled).

Fixes trac:1890.  Really.

This commit was SVN r21061.

The following Trac tickets were found above:
  Ticket 1890 --> https://svn.open-mpi.org/trac/ompi/ticket/1890
2009-04-23 20:04:39 +00:00
Ethan Mallove
0f4c18472c Remove unused {{{malloc.h}}} include.
(see http://www.open-mpi.org/community/lists/devel/2009/04/5845.php)

This commit was SVN r21049.
2009-04-21 17:35:58 +00:00
Jeff Squyres
9d94fa2308 Fixes trac:1890: we weren't checking the return status of
_endpoint_post_send(), which could result in an infinite loop (see the
comment in the code).

This is part one of a proper fix; it's suitable for the v1.3 tree and
for an immediate release.  Pasha and I plan to spend a little more
time and clean up this stuff properly, but it does not need to be
included in v1.3.2.

This commit was SVN r21047.

The following Trac tickets were found above:
  Ticket 1890 --> https://svn.open-mpi.org/trac/ompi/ticket/1890
2009-04-21 16:57:17 +00:00
Nysal Jan
5353236a53 Reduce the size of FIN header by 8 bytes. This is done by rearranging the fields to reduce the amount of compiler padding
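
The effect of field ordering on compiler padding can be sketched with two layouts of the same members. The structs and field names below are hypothetical and are not the Open MPI FIN header.

```c
#include <stdint.h>

/* Hypothetical header with narrow and wide members interleaved. */
struct sketch_hdr_padded {
    uint8_t  type;     /* padding is typically inserted after this */
    uint64_t cookie;
    uint8_t  flags;    /* ... and after this */
    uint32_t length;
};

/* Same members, widest first, so the narrow ones share one tail slot. */
struct sketch_hdr_reordered {
    uint64_t cookie;
    uint32_t length;
    uint8_t  type;
    uint8_t  flags;
};

/* Bytes saved by the reordering on the current ABI. */
unsigned long sketch_bytes_saved(void)
{
    return (unsigned long) (sizeof(struct sketch_hdr_padded)
                            - sizeof(struct sketch_hdr_reordered));
}
```

On common 64-bit ABIs where uint64_t is 8-byte aligned, the first layout occupies 24 bytes and the second 16; other ABIs may give different numbers, but ordering members by decreasing alignment never makes the struct larger.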
This commit was SVN r21046.
2009-04-21 14:41:51 +00:00
Jeff Squyres
c4b307d57e Fix an incorrect comment.
This commit was SVN r21045.
2009-04-21 13:38:43 +00:00
Jeff Squyres
6f78420900 Correctly check for the case where we didn't find the mpool that we
were looking for.  This makes the openib btl fail a little more
gracefully (for example) if you specify a bogus value to
btl_openib_mpool.

Thanks to Roberto Ammendola for identifying the exact issue.

This commit was SVN r21044.
2009-04-21 12:40:50 +00:00
Ralph Castain
2b0b9dd227 Sync to change in a tmp branch
This commit was SVN r21015.
2009-04-15 13:09:51 +00:00
Ralph Castain
46d6c6d516 Sync the csum module with the recent ob1 changes
This commit was SVN r21002.
2009-04-14 18:40:54 +00:00
Nysal Jan
221447ef17 Fix checksum mismatch on Big-endian systems when heterogeneous mode is enabled
This commit was SVN r21001.
2009-04-14 17:21:38 +00:00
Ralph Castain
9c39a3edd7 Enable the passing of MCA params to dynamically spawned jobs. This creates a new info_key "ompi_param" that allows a user to specify MCA params for a dynamically spawned job.
We currently apply all of the MCA params in the parent job to the child. This commit allows a user to specify additional params for the child job, and to override any pre-existing params with the new value so they can better control behavior of the child job.

This commit was SVN r20989.
2009-04-14 14:15:49 +00:00
Shiqing Fan
339948928d Only include unistd.h, if we have it.
This commit was SVN r20988.
2009-04-14 13:39:06 +00:00
Nysal Jan
697f1837f4 Move fix for ticket #1875 to csum PML
This commit was SVN r20986.
2009-04-14 10:44:29 +00:00
Jeff Squyres
778c8c86d2 Make mpool fail-to-unregister-freed-memory errors be fatal. Try to
make that routine a bit more safe, too (ensure to not call malloc and
friends if from_alloc==true).

This commit was SVN r20984.
2009-04-14 00:54:20 +00:00
George Bosilca
b5deb228f3 Allow the BTL to release the descriptor. In fact the only thing the PML
needs is to be involved in the RMA completion process, which is ensured
by the MCA_BTL_DES_SEND_ALWAYS_CALLBACK flag. Fixes trac:1875.

This commit was SVN r20983.

The following Trac tickets were found above:
  Ticket 1875 --> https://svn.open-mpi.org/trac/ompi/ticket/1875
2009-04-13 23:41:50 +00:00
Jeff Squyres
8de80e9297 mca_mpool_base_mem_cb_array is no longer used anywhere in the code
base.  Remove it.

This commit was SVN r20982.
2009-04-13 23:41:45 +00:00
George Bosilca
64075cd54d Get rid of the bitmap header file.
This commit was SVN r20972.
2009-04-10 16:44:37 +00:00
George Bosilca
527540aeb1 Rename req_bytes_delivered to req_bytes_expected for the receive
requests to really reflect what this field means.

This commit was SVN r20971.
2009-04-10 16:36:20 +00:00
George Bosilca
c148d33eb5 Play nicely with the reference count on the ompi_proc structure.
This commit was SVN r20970.
2009-04-10 16:32:02 +00:00
Nysal Jan
1decf8bf36 Move ob1 FIN/ACK fixes to csum PML
This commit was SVN r20954.
2009-04-08 10:43:35 +00:00
George Bosilca
dfc7cea329 Fix the deadlock issues on the osu_bw. The problem is that the PML is
event driven, and if there are no events generated by the BTLs ... well
nothing happens (i.e. there is no progress at the PML level and all
pending fragments remain pending). By forcing the BTL to trigger the
callbacks for all ACK and FIN, we give more opportunities to the PML
to do real progress, but we pay this in terms of performance.

This commit was SVN r20953.
2009-04-07 16:56:37 +00:00
George Bosilca
44ce610b8b Add a comment to highlight the fact that this function re-appends the
FIN message to the pending list when the send fails. Therefore, any
upper level function is not required to add it.
Make sure we don't send the FIN twice.

This commit was SVN r20952.
2009-04-07 16:48:58 +00:00
George Bosilca
ccb79b963f This is the other half of the commit r20946 as I messed them up between
two of my testing machines. The fix requires both commits!

This commit was SVN r20947.

The following SVN revision numbers were found above:
  r20946 --> open-mpi/ompi@e2bb4c9b8f
2009-04-06 21:49:52 +00:00
George Bosilca
e2bb4c9b8f Correct the handling of the pckt_pending list. The problem was that
we returned the packet before copying the values out. With this change
it seems to work at least on two architectures (even with the
mpool size set back to 0).

This commit was SVN r20946.
2009-04-06 21:45:08 +00:00
Eugene Loh
c4adfd1806 Increase mpool_sm_min_size default to 64M so osu_bw will run.
This commit was SVN r20944.
2009-04-06 18:37:00 +00:00
Nysal Jan
5032f59edf Fix checksum computation in the buffered send code
This commit was SVN r20935.
2009-04-03 07:09:24 +00:00
Ralph Castain
ba1a98c398 Fix a warning message by pointing to the correct header
This commit was SVN r20930.
2009-04-02 13:54:59 +00:00
Nysal Jan
e561a6c43a Add missing checksum calculation. This fixes a checksum mismatch failure while using TCP BTL
This commit was SVN r20927.
2009-04-01 20:48:35 +00:00
Jeff Squyres
fc8993ba87 mpool name is the first show_help param, not the last.
This commit was SVN r20925.
2009-04-01 18:42:01 +00:00
Jeff Squyres
0d52271cd6 Per http://www.open-mpi.org/community/lists/announce/2009/03/0029.php
and https://svn.open-mpi.org/trac/ompi/ticket/1853, mallopt() hints do
not always work -- it is possible for memory to be returned to the OS
and therefore OMPI's registration cache becomes invalid.

This commit removes all use of mallopt() and uses a different way to
integrate ptmalloc2 than we have done in the past.  In particular, we
use almost exactly the same technique as MX:

 * Remove all uses of mallopt, to include the opal/memory mallopt
   component.
 * Name-shift all of OMPI's internal ptmalloc2 public symbols (e.g.,
   malloc -> opal_memory_ptmalloc2_malloc).
 * At run-time, use the existing glibc allocator malloc hook function
   pointers to fully hijack the glibc allocator with our own
   name-shifted ptmalloc2.
 * Make the decision whether to hijack the glibc allocator ''at run
   time'' (vs. at link time, as previous ptmalloc2 integration
   attempts have done).  Look at the OMPI_MCA_mpi_leave_pinned
   and OMPI_MCA_mpi_leave_pinned_pipeline environment variables and
   the existence of /sys/class/infiniband to determine if we should
   install the hooks or not.
 * As an added bonus, we can now tell if libopen-pal is linked
   statically or dynamically, and if we're linked statically, we
   assume that munmap intercept support doesn't work.

See the opal/mca/memory/ptmalloc2/README-open-mpi.txt file for all the
gory details about the implementation.

Fixes trac:1853.

This commit was SVN r20921.

The following Trac tickets were found above:
  Ticket 1853 --> https://svn.open-mpi.org/trac/ompi/ticket/1853
2009-04-01 17:52:16 +00:00
George Bosilca
8221975490 We don't need the opal_bitmap definitions here.
This commit was SVN r20918.
2009-04-01 15:25:10 +00:00
Terry Dontje
4b43911c6a Remove superfluous spaces in manpages that were causing catman to
generate mangled windex files.  Made ompi-top.1 and ompi-iof.1 build
by default.  Also, added the orte-top synonym to the ompi-top manpage.

This commit was SVN r20915.
2009-04-01 14:40:27 +00:00
Nysal Jan
aff903f39c Don't print this message by default
This commit was SVN r20914.
2009-04-01 14:31:21 +00:00
George Bosilca
c5b1bdd57c Correctly deal with the error case. The problem is tricky: the MPI standard doesn't allow
MPI_ERR_IN_STATUS to be returned from any functions that return only one completed request
(a few exceptions here: wait_some and wait_all and the test versions). As we use a wait_all
in these send_receive functions we should convert the MPI_ERR_IN_STATUS to the real
error, i.e. the one coming from the MPI_ERROR field in the status corresponding to the
failed request.
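
The conversion described above can be sketched without an MPI installation by using stand-in types. Everything prefixed `SK_` or `sk_` is hypothetical, including the numeric values, and this is not the code from the commit.

```c
#include <assert.h>

/* Stand-ins for the MPI constants and status type (hypothetical values). */
enum { SK_SUCCESS = 0, SK_ERR_TRUNCATE = 15, SK_ERR_IN_STATUS = 18 };
typedef struct { int error; } sk_status_t;

/* Map the aggregate return code of a wait_all-style call to the error
 * of the first failed request, so that a caller expecting one concrete
 * error code does not receive SK_ERR_IN_STATUS. */
int sketch_resolve_error(int rc, const sk_status_t *statuses, int count)
{
    int i;

    assert(count >= 0);
    if (rc != SK_ERR_IN_STATUS) {
        return rc;
    }
    for (i = 0; i < count; ++i) {
        if (statuses[i].error != SK_SUCCESS) {
            return statuses[i].error;
        }
    }
    return rc;   /* inconsistent input: no status carried an error */
}
```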

This commit was SVN r20907.
2009-03-31 23:44:59 +00:00
George Bosilca
12ce14ec8c A possible patch for the SM problems. I moved the synchronization
after each process creates its FIFOs but before they access the
peer's FIFOs. Second, replace a one way synchronization by a real
barrier, so we know that every process is really where we expect
them to be.

This commit was SVN r20906.
2009-03-31 21:46:27 +00:00
Patrick Geoffray
278d508fa4 Missing send immediate function (NULL) in btl module definition.
Fixes trac:1859.

This commit was SVN r20905.

The following Trac tickets were found above:
  Ticket 1859 --> https://svn.open-mpi.org/trac/ompi/ticket/1859
2009-03-31 18:44:10 +00:00
George Bosilca
2b1804b0f4 Remove useless header files.
This commit was SVN r20897.
2009-03-30 23:09:18 +00:00
Jeff Squyres
b95a3d0eb9 * Remove an extraneous OBJ_CONSTRUCT
 * Ensure we don't try to do opal_list_get_next() on an item we just
   deleted
 * Set myaddrs = NULL when we're done with it, just for good measure

Once this is ported to OMPI v1.3 branch, it fixes
https://bugs.openfabrics.org/show_bug.cgi?id=1579.

This commit was SVN r20896.
2009-03-30 20:24:31 +00:00
Ralph Castain
cba3708893 Cleanup debugging output, remove an unnecessary re-compute of the checksum
This commit was SVN r20895.
2009-03-30 17:09:32 +00:00
Ralph Castain
d5e6104035 Continue to cleanup the csum pml module. Some minor corrections and debug output added.
This commit was SVN r20894.
2009-03-29 23:27:06 +00:00
Donald Kerr
47dc1bd493 fix #1828; rework the private data connection establishment process; reviewed by terry d.
This commit was SVN r20889.
2009-03-26 17:54:44 +00:00
Donald Kerr
27ed29a0a1 wrap linux specific steps in __linux__ define
This commit was SVN r20888.
2009-03-26 15:09:01 +00:00
Rainer Keller
6ad07dbffa - First have the _config.h header,
then include system headers based on what's defined.

This commit was SVN r20886.
2009-03-26 12:56:22 +00:00
Donald Kerr
a1ba2e2164 add Sun vendor_id to Tavor Infinihost settings
This commit was SVN r20883.
2009-03-25 19:13:55 +00:00
Donald Kerr
cd7940e208 add vendor_id
This commit was SVN r20882.
2009-03-25 17:57:24 +00:00
Pavel Shamis
d25b7203a2 Adding send_immediate (sendi) implementation to openib btl.
This commit was SVN r20881.
2009-03-25 16:53:26 +00:00
Pavel Shamis
8888c9831c Prevent segfault for case when we release SRQ before srq_create.
This commit was SVN r20872.
2009-03-25 14:18:05 +00:00
Ralph Castain
f72e3ba9f9 Update the PML base send init macro to take a converter_flag field (discussed with George).
Update the csum pml module - still not quite right, but closer.

Modify the LANL platform files to keep pace.

This commit was SVN r20859.
2009-03-24 19:12:53 +00:00
Ralph Castain
d88df53a86 A touch more cleanup. Also, bring over the peruse cleanups from r20844
This commit was SVN r20849.

The following SVN revision numbers were found above:
  r20844 --> open-mpi/ompi@daba352af4
2009-03-24 01:36:31 +00:00
Ralph Castain
78323fd6b2 Minor cleanups to compile without warnings
This commit was SVN r20848.
2009-03-24 00:54:16 +00:00
Ralph Castain
75ca19d1d1 Turn off a function that hasn't been added to the code base yet...
This commit was SVN r20847.
2009-03-23 23:56:11 +00:00
Ralph Castain
17f51a0389 Add a new PML module that acts as a "mini-dr" - when requested, it performs a dr-like checksum on messages for BTLs that require it, as specified by MCA params.
Add two new configure options that specify:

1. when to add padding to the openib control header - this *only* happens when the configure option is specified

2. when to use the dr-like checksum as opposed to the memcpy checksum. Not selectable at runtime - to eliminate performance impacts, this is a configure-only option

Also removed an unused checksum version from opal/util/crc.h.

The new component still needs a little cleanup and some sync with recent ob1 bug fixes. It was created as a separate module to avoid performance hits in ob1 itself, though most of the code is duplicative. The component is only selectable by either specifying it directly, or configuring with the dr-like checksum -and- setting -mca pml_csum_enable_checksum 1.

Modify the LANL platform files to take advantage of the new module.

This commit was SVN r20846.
2009-03-23 23:52:05 +00:00
Ralph Castain
fb2b41d40a Give up on the pcie BTL and blow it away. The drivers for this initial implementation have been too customized by IBM - too hard to re-integrate the code.
Maybe someday, someone with enough interest/time can start over...

This commit was SVN r20845.
2009-03-23 23:27:57 +00:00
George Bosilca
daba352af4 As the request is not yet updated (i.e. _MATCHED cannot be called as we don't yet know the
expected length of the message) we should use the source and tag from the message header
instead of the value from the status structure attached to the request.

Modified file: pml_ob1_recvreq.c

This commit was SVN r20844.
2009-03-23 20:25:53 +00:00
Jeff Squyres
804eb94f5f Fix the MCA param name to use the non-deprecated name.
This commit was SVN r20832.
2009-03-20 01:44:40 +00:00
Aurelien Bouteiller
fa9b6e729b Fix missing file in Makefile.am and the "CREATE FAILURE".
This commit was SVN r20821.
2009-03-18 13:42:48 +00:00
Rainer Keller
6f808d9b05 Preparation work for another commit (after RFC):
- This patch solely _adds_ required headers and is rather localized
   The next patch (after RFC) heavily removes headers (based on script)
 - ompi/communicator/communicator.h: For sources that use
   ompi_mpi_comm_world, don't require them to include "mpi.h"
 - ompi/debuggers/ompi_common_dll.c: mca_topo_base_comm_1_0_0_t needs
   #include "ompi/mca/topo/topo.h"
 - ompi/errhandler/errhandler_predefined.h:
   ompi/communicator/communicator.h depends on this header file!
   To prevent recursion just have fwd declarations.
   #include "ompi/types.h" for fwd declarations of the main structs.
 - ompi/mca/btl/btl.h: #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/mpool/base/mpool_base_tree.c: We use ompi_free_list_t and
   ompi_rb_tree_t, so have the proper classes
 - ompi/mca/op/op.h:
   Op is pretty self-contained: Nobody up to now has done
   #include "opal/class/opal_object.h"
 - ompi/mca/osc/pt2pt/osc_pt2pt_replyreq.h:
   #include "opal/types.h" for ompi_ptr_t 
 - ompi/mca/pml/base/base.h:
   We use opal_lists  
 - ompi/mca/pml/dr/pml_dr_vfrag.h:
   #include "opal/types.h" for ompi_ptr_t
 - ompi/mca/pml/ob1/pml_ob1_hdr.h:
   #include "ompi/mca/btl/btl.h" for mca_btl_base_segment_t
 - opal/dss/dss_unpack.c:
   #include "opal/types.h"
 - opal/mca/base/base.h:
   #include "opal/util/cmd_line.h" for opal_cmd_line_t
 - orte/mca/oob/tcp/oob_tcp.c:
   #include "opal/types.h" for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp.h:
   #include "opal/threads/threads.h" for opal_thread_t
 - orte/mca/oob/tcp/oob_tcp_msg.c:
   #include "opal/types.h" 
 - orte/mca/oob/tcp/oob_tcp_peer.c:
   #include "opal/types.h"  for opal_socklen_t
 - orte/mca/oob/tcp/oob_tcp_send.c:
   #include "opal/types.h" 
 - orte/mca/plm/base/plm_base_proxy.c:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT
 - orte/mca/rml/base/rml_base_receive.c:
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/mca/rml/oob/rml_oob_recv.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/mca/rml/oob/rml_oob_send.c:
   #include "opal/types.h" for ompi_iov_base_ptr_t
 - orte/runtime/orte_data_server.c
   #include "opal/util/output.h" for OPAL_OUTPUT_VERBOSE
 - orte/runtime/orte_globals.h:
   #include "orte/util/name_fns.h" for ORTE_NAME_PRINT

 Tested on Linux/x86-64

This commit was SVN r20817.
2009-03-17 21:34:30 +00:00
Rainer Keller
b9b84a9c29 - ompi/mca/mpool/rdma/mpool_rdma_module.c: At this level
without mpi.h we have no notion of MPI_SUCCESS...
 - ompi/mca/btl/sm/btl_sm.h: ptrdiff_t needs stddef.h 
 - ompi/mca/mpool/base/: If we use opal_pointer_array_t,
   better include the class header.

This commit was SVN r20816.
2009-03-17 20:21:36 +00:00
Aurelien Bouteiller
3cd5a0d833 Support for the MPI event logger improving event logging perfs.
This commit was SVN r20804.
2009-03-17 17:35:28 +00:00
Pavel Shamis
c6d038a8e8 Adding vendor_error code to error report.
This commit was SVN r20803.
2009-03-17 15:47:34 +00:00
Pavel Shamis
5afa2988f1 Updating RNR/IB timeout for openib btl
This commit was SVN r20801.
2009-03-17 15:03:06 +00:00
Donald Kerr
d29a5e57c1 remove superfluous define
This commit was SVN r20785.
2009-03-16 02:24:01 +00:00
George Bosilca
a9be1b1dde Set the mem_node to a more meaningful value, as suggested by Ake Sandgren.
This commit was SVN r20780.
2009-03-14 22:08:26 +00:00
Eugene Loh
64f52b0168 Clean up in response to code review on CMR 1825:
minor changes in comments and edge-case handling.

This commit was SVN r20774.
2009-03-13 18:11:41 +00:00
Rainer Keller
d8cf4c0fec - Get pgcc on XT to complain less:
In case we use memcmp, strlen, strdup and friends, include <string.h>
   Also several constants.h are not included directly
 - Let's have mca_topo_base_cart_create  return ompi-errors in
   ompi/mca/topo/base/topo_base_cart_create.c

This commit was SVN r20773.
2009-03-13 02:10:32 +00:00
Donald Kerr
ef55aae401 fix #1829 : udapl btl support for relaxed ordering
This commit was SVN r20772.
2009-03-13 01:01:00 +00:00
Rainer Keller
6fca443a71 - No, we don't want to have a notion of an MPI_Comm in this layer
We want ompi_communicator_t instead, rrrrrr.

This commit was SVN r20770.
2009-03-12 22:38:14 +00:00
George Bosilca
b29da4744f Amazing that there is only one compiler complaining about this ...
This commit was SVN r20768.
2009-03-12 21:30:08 +00:00
Rainer Keller
74b3acd4bd - No need to declare struct mca_mpool_base_resources_t;
Already in
   #include "ompi/mca/mpool/mpool.h"

This commit was SVN r20767.
2009-03-12 20:27:16 +00:00
Rainer Keller
296a6fb275 - So much fun along the way:
we normally don't do opal/include/opal/...
   Just use the std. opal/...

This commit was SVN r20766.
2009-03-12 19:21:11 +00:00
Rainer Keller
29b1b205fd - Remove two headers (and actually include rml.h) prior to test of
removal script...

This commit was SVN r20765.
2009-03-12 17:58:39 +00:00
Jeff Squyres
14ee1b7ba2 Refs trac:1826: remove barriers before all non-rooted collective ops.
This commit was SVN r20763.

The following Trac tickets were found above:
  Ticket 1826 --> https://svn.open-mpi.org/trac/ompi/ticket/1826
2009-03-12 02:23:08 +00:00
Ralph Castain
a8002c0f04 Remove missing files from Makefile.am so make dist will succeed
This commit was SVN r20743.
2009-03-06 02:57:51 +00:00
Rainer Keller
ec0ed48718 - Revert r20739
This commit was SVN r20742.

The following SVN revision numbers were found above:
  r20739 --> open-mpi/ompi@781caee0b6
2009-03-05 21:56:03 +00:00
Rainer Keller
a94438343b - Revert r20740
This commit was SVN r20741.

The following SVN revision numbers were found above:
  r20740 --> open-mpi/ompi@2a70618a77
2009-03-05 21:50:47 +00:00
Rainer Keller
2a70618a77 - Second patch, as discussed in Louisville.
Replace short macros in orte/util/name_fns.h
   with the actual function calls.

 - Compiles on linux/x86-64

This commit was SVN r20740.
2009-03-05 21:14:18 +00:00
Rainer Keller
781caee0b6 - First of two or three patches, in orte/util/proc_info.h:
Adapt orte_process_info to orte_proc_info, and
   change orte_proc_info() to orte_proc_info_init().
 - Compiled on linux-x86-64
 - Discussed with Ralph

This commit was SVN r20739.
2009-03-05 20:36:44 +00:00
Shiqing Fan
99b415a7e0 On Windows, the mca_common_* libraries should be installed in bin; otherwise the libraries that depend on them, e.g. a shared build of mca_btl_sm, cannot be loaded at runtime. This commit fixes the problem.
This commit was SVN r20735.
2009-03-05 14:57:35 +00:00
Ralph Castain
20b81ff634 Add the PCIE BTL. This won't actually work yet - still need to work through issues with system header files, generalize specification of resources, etc. - but it won't build unless specifically directed to do so. Meantime, any more changes that impact these areas of the code base can be reflected here rather than having to be dealt with later.
This commit was SVN r20734.
2009-03-05 02:40:25 +00:00
Rainer Keller
9dea63d63a - Last of intrusive commits (promised)... err for now.
Anyway, this is blocking the move: do not include pml.h
   if not really needed, aka none of the following used:
     mca_pml
     MCA_PML_CALL
     OMPI_ANY_TAG
     OMPI_ANY_SOURCE
     OMPI_PROC_NULL

 - Notable exceptions (deleting in one header->adding):
   - ompi/mca/mtl/psm/
   - ompi/mca/osc/rdma/
   - ompi/mca/btl/openib/btl_openib_endpoint.c depended on
     pml_base_sendreq.h

 - Tested on Linux/x86-64, this time including make check
   (thanks Jeff and Ralph)

This commit was SVN r20725.
2009-03-04 17:06:51 +00:00
Josh Hursey
b62bc63f76 Fix some compiler warnings. I was using the ompi_predefined_* types instead of the base classes.
This commit was SVN r20722.
2009-03-04 16:16:13 +00:00
Rainer Keller
fd28b392bf - An intrusive commit yet again (sorry): with the separation we
get bitten by headers depending on having already included
   the corresponding [opal|orte|ompi]_config.h header.
   When separating, things like [OPAL|ORTE|OMPI]_DECLSPEC
   are missed.

   Script to add the corresponding header in front of all following
   (taking care of possible #ifdef HAVE_...)

 - Including some minor cleanups to
   - ompi/group/group.h -- include _after_ #ifndef OMPI_GROUP_H
   - ompi/mca/btl/btl.h -- include _after_ #ifndef MCA_BTL_H
   - ompi/mca/crcp/bkmrk/crcp_bkmrk_btl.c -- still no need for
     orte/util/output.h
   - ompi/mca/pml/dr/pml_dr_recvreq.c -- no need for mpool.h
   - ompi/mca/btl/btl.h -- reorder to fit
   - ompi/mca/bml/bml.h -- reorder to fit
   - ompi/runtime/ompi_mpi_finalize.c -- reorder to fit
   - ompi/request/request.h -- additionally need ompi/constants.h

 - Tested on linux/x86-64

This commit was SVN r20720.
2009-03-04 15:35:54 +00:00
Eugene Loh
efe8c3a283 Initialize reuse_old_request properly at the beginning of each loop iteration in pml_ob1_start.c.
This commit was SVN r20712.
2009-03-04 06:58:36 +00:00
Rainer Keller
811f2bd9b4 - As discussed on RFC, move the ompi_bitmap to the
opal layer.
   Add a check against a maximum (actually get rid of ifs internally to
   opal_bitmap.c) -- the functionality to set the current maximum size
   opal_bitmap_set_max_size() is currently only used in attribute.c
   to set the maximum OMPI_FORTRAN_HANDLE_MAX...

   Tested on linux/x86-64 with intel-tests with all_tests_no_perf_f
   run with 6 procs.
   Let's look into MTT as well...
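
A minimal sketch of a bitmap with a configurable maximum size, to illustrate the check described in this message. It assumes a fixed backing store for brevity; the names sk_bitmap_t, sk_bitmap_init, sk_bitmap_set_max_size, sk_bitmap_set_bit and sk_bitmap_is_set are hypothetical and are not the opal_bitmap API.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define SK_BITMAP_BYTES 32  /* fixed backing store for this sketch */

typedef struct {
    unsigned char bits[SK_BITMAP_BYTES];
    size_t max_size;  /* bit indices >= max_size are rejected */
} sk_bitmap_t;

static size_t sk_bitmap_capacity(void)
{
    return (size_t)SK_BITMAP_BYTES * CHAR_BIT;
}

void sk_bitmap_init(sk_bitmap_t *bm)
{
    memset(bm->bits, 0, sizeof bm->bits);
    bm->max_size = sk_bitmap_capacity();
}

/* Lower (or restore) the maximum; returns 0 on success, -1 if too large. */
int sk_bitmap_set_max_size(sk_bitmap_t *bm, size_t max_size)
{
    if (max_size > sk_bitmap_capacity()) {
        return -1;
    }
    bm->max_size = max_size;
    return 0;
}

/* Set one bit; returns 0 on success, -1 if the index exceeds the maximum. */
int sk_bitmap_set_bit(sk_bitmap_t *bm, size_t bit)
{
    if (bit >= bm->max_size) {
        return -1;
    }
    bm->bits[bit / CHAR_BIT] |= (unsigned char)(1u << (bit % CHAR_BIT));
    return 0;
}

/* Returns 1 if the bit is set, 0 if it is clear or out of range. */
int sk_bitmap_is_set(const sk_bitmap_t *bm, size_t bit)
{
    if (bit >= bm->max_size) {
        return 0;
    }
    return (bm->bits[bit / CHAR_BIT] >> (bit % CHAR_BIT)) & 1;
}
```

A caller that hands out handles from a bounded range (as the message describes for the Fortran handle maximum) would set the maximum once and treat a -1 from sk_bitmap_set_bit as "out of handles".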

This commit was SVN r20708.
2009-03-03 22:25:13 +00:00
Rich Graham
7ef1550267 add an index to indicate which socket group I belong to.
This commit was SVN r20672.
2009-03-02 14:39:54 +00:00
Rich Graham
daf7673aff gather socket information - not debugged.
This commit was SVN r20670.
2009-03-02 10:58:12 +00:00
Rainer Keller
02416033ad - Get rid of warning on function declarations:
First "static inline", then the type

This commit was SVN r20657.
2009-02-28 14:15:34 +00:00
Tim Mattox
57be80c983 First pass at integrating the CIFTS/FTB support as
a notifier module.
The Notifier framework was extended slightly to
convey more information about each event notice.
This works with the FTB v0.5 API.

To compile with FTB support, use --with-ftb=/path/to/ftb/install

CIFTS == Coordinated Infrastructure for Fault Tolerant Systems
FTB == Fault Tolerance Backplane
see http://wiki.mcs.anl.gov/cifts/index.php

This commit was SVN r20655.
2009-02-27 22:53:43 +00:00
George Bosilca
e181ba50c9 Stop valgrind from complaining about a few uninitialized bytes in the PML
headers. This feature is enabled only in debug mode when the heterogeneous
support is enabled.

This commit was SVN r20648.
2009-02-27 05:24:06 +00:00
Eugene Loh
ffb35a1b6c Exposed mca_btl_sm_sendi() to the PML so that it will be used. Reviewed the code.
Added a few comments and changed the return code after the FIFO write to be SUCCESS,
even if the FIFO write indicated an error.  Such an error would only mean that the
FIFO was full, but the FIFO-write operation would still be queued.  Therefore, the
PML should think of this as successful.

This commit was SVN r20644.
2009-02-26 18:10:50 +00:00
Josh Hursey
e46c512ee7 Fix a couple of missing headers resulting from recent cleanup
This commit was SVN r20643.
2009-02-26 16:56:56 +00:00
Rainer Keller
4c0e8e1e69 - Header orte/mca/oob/base/base.h is probably the wrong one to include
anyhow -- if oob functionality is needed, then use orte/mca/oob/oob.h

   Nevertheless compiles fine with -Wimplicit-function-declaration   

This commit was SVN r20641.
2009-02-26 04:20:03 +00:00
Rainer Keller
04567d3af0 - Header orte/mca/errmgr/errmgr.h is not needed.
Once again compiles fine with -Wimplicit-function-declaration   

This commit was SVN r20640.
2009-02-26 04:05:30 +00:00
Rainer Keller
96e1b9b747 - Header orte/mca/rml/rml.h is not needed if there is no occurrence of orte_rml
or ORTE_RML.
   Like the others, this compiles fine with -Wimplicit-function-declaration

This commit was SVN r20639.
2009-02-26 03:52:31 +00:00
Rainer Keller
224d89a353 - There sure is no local stdio.h header file.
Take the system header file...

This commit was SVN r20637.
2009-02-26 02:17:29 +00:00
Rainer Keller
b9f9cd8174 - Missed an occurrence of ompi/info/info.h
This commit was SVN r20636.
2009-02-26 02:15:40 +00:00
Rainer Keller
985648086d - Header ompi/info/info.h is not needed here.
This commit was SVN r20635.
2009-02-26 02:00:39 +00:00
Rainer Keller
b356e90fa1 - Get rid of include orte/util/proc_info.h, if not needed
Only proc_info.h-internal include file is opal/dss/dss_types.h
 - In one case (orte/util/hnp_contact.c) had to add proc_info.h again.
 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   works fine, no errors.

   Again, let's have MTT the last word.

This commit was SVN r20631.
2009-02-25 03:38:00 +00:00
Terry Dontje
0178b6c45f Added padding to predefined handle structures to maintain library
version-to-version compatibility.

This commit was SVN r20627.
2009-02-24 17:17:33 +00:00
Shiqing Fan
2148220ce4 Update the share libs dependency for windows build.
This commit was SVN r20625.
2009-02-23 17:49:46 +00:00
Josh Hursey
cde4ab5c32 Forgot another btl_base_close per r20617
Things should be working fine now with openib.

This commit was SVN r20618.

The following SVN revision numbers were found above:
  r20617 --> open-mpi/ompi@d460264c79
2009-02-22 15:24:38 +00:00
Josh Hursey
d460264c79 Fix C/R support in response to r20586. This commit changed the way that bml/r2 finalized, so the C/R support needed to be updated otherwise the BTLs were not properly handled on restart.
This commit was SVN r20617.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
  r20586 --> open-mpi/ompi@14a83a6bbc
2009-02-21 13:42:17 +00:00
Eugene Loh
463f11f993 Improve shared-memory allocation:
* compute mmap-file size more wisely and pass requested size to allocator
* change MCA parameters:
  - get rid of mpool_sm_per_peer_size
  - get rid of mpool_sm_max_size
  - set default mpool_sm_min_size to 0
* no longer pad sm allocations to page boundaries
* have sm_btl_first_time_init check return codes on free-list creations

Have mca_btl_sm_prepare_src() check to see if it can allocate an EAGER fragment
rather than a MAX fragment if the smaller size works.

Remove ompi/class/ompi_[circular_buffer_]fifo.h and references thereto.

Remove opal/util/pow2.[c|h] and references thereto.

This commit was SVN r20614.
2009-02-20 19:51:57 +00:00
Rainer Keller
02599446d0 - Occurrences of ORTE_PROC_MY_NAME require orte/runtime/orte_globals.h
This commit was SVN r20607.
2009-02-20 03:16:13 +00:00
Rainer Keller
32b7189995 - Make usage of BTL_OUTPUT
This commit was SVN r20606.
2009-02-20 03:05:14 +00:00
George Bosilca
97a2296fdd Correct the GET protocol. Thanks to Mike Dubman for finding the problem and
testing my patch.

This commit was SVN r20591.
2009-02-19 16:00:15 +00:00
Jeff Squyres
b8259ba500 Remove unused variable. Thanks for the heads-up, Ralph!
This commit was SVN r20587.
2009-02-19 13:59:38 +00:00
Jeff Squyres
14a83a6bbc Clean up the BML shutdown. Reviewed by George.
This commit was SVN r20586.
2009-02-19 13:17:01 +00:00
Jeff Squyres
3742c3550c Add "sync" collective component. This component is totally
deactivated by default.  It is activated by setting either of the
following two MCA parameters to values greater than 0:

 * coll_sync_barrier_before
 * coll_sync_barrier_after

If coll_sync_barrier_before is >0, then the sync coll component will
insert itself before the underlying collective operations and invoke a
barrier before every Nth collective operation
(N == coll_sync_barrier_before).  Similar for coll_sync_barrier_after.
Note that N is a _per communicator_ value; not global to the MPI
process.

If both are 0 (which is the default), this component returns NULL for
the comm query, meaning that it is not inserted into the coll module
stack.

The intent of this component is to provide a workaround for
applications with large numbers of collectives of short messages that
can cause unbounded unexpected messages.  Specifically, it is possible
for some iterative collective communication patterns to cause
unbounded unexpected messages.  Forcing a barrier before or after
every Nth collective operation would prevent that behavior by forcing
applications to synchronize (and thereby consume any outstanding
unexpected messages caused by collectives on the same communicator).

Open MPI still needs to bound unexpected messages resource consumption
at the receiver, but this is a viable workaround for at least some
symptoms of the problem.

Additionally, there has been anecdotal evidence of some applications
that "perform better" when they put barriers after other collective
operations.  This could be due to many factors -- including shortening
the unexpected message queue.  Putting this component in Open MPI
allows people to try this with their own applications and give real
world feedback on this kind of behavior.
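
The "barrier before every Nth collective" rule can be sketched as a small per-communicator counter. This is a simplified illustration, not the component's code; sk_sync_state_t and sk_sync_barrier_due are hypothetical names, and the before_every field stands in for the value of coll_sync_barrier_before.

```c
/* Per-communicator state for the sketch (hypothetical names). */
typedef struct {
    int before_every;  /* N; 0 or less means the feature is off */
    int calls_seen;    /* collectives seen since the last barrier */
} sk_sync_state_t;

/*
 * Called once per collective operation on a communicator. Returns 1 if
 * a barrier should be invoked before this operation, 0 otherwise.
 * With N == 3 the result over successive calls is 0, 0, 1, 0, 0, 1, ...
 */
int sk_sync_barrier_due(sk_sync_state_t *s)
{
    if (s->before_every <= 0) {
        return 0;
    }
    s->calls_seen++;
    if (s->calls_seen < s->before_every) {
        return 0;
    }
    s->calls_seen = 0;
    return 1;
}
```

Because the counter lives in per-communicator state, two communicators with the same N reach their barriers independently, matching the "per communicator" note in the message. At run time the parameter would be set through the usual MCA mechanism, e.g. `mpirun --mca coll_sync_barrier_before 10 ./app` (the value 10 and ./app are placeholders).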

This commit was SVN r20584.
2009-02-18 23:32:44 +00:00
Jeff Squyres
563e989b6d Use a bit more friendly language. :-)
This commit was SVN r20583.
2009-02-18 22:12:42 +00:00
George Bosilca
15b60941f3 Cast the req to an opal_list_item_t*
This commit was SVN r20581.
2009-02-18 02:33:37 +00:00
George Bosilca
21f8eba620 There was nothing in item to be added to any list. Instead add
the request that we just removed.

This commit was SVN r20580.
2009-02-18 02:15:57 +00:00
George Bosilca
1b1ed0da37 Always set the frag to NULL.
This commit was SVN r20579.
2009-02-18 02:15:09 +00:00
Eugene Loh
5bbf5ba7d7 First putback of some sm BTL latency optimizations:
* The main thing done here is to convert from multiple FIFOs/queues per
  receiver (each receiver has one FIFO for each sender) to a single FIFO/queue
  per receiver (all senders sharing the same FIFO for a given receiver).
* This requires rewriting the FIFO support, so that
  ompi/class/ompi_[circular_buffer_]fifo.h is no longer used and FIFO
  support is instead in btl_sm.h.
* The number of FIFOs per receiver is actually an MCA tunable parameter,
  but it appears that 1 or possibly 2 FIFOs (even for 112 local processes)
  per receiver is sufficient.
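
A minimal sketch of the "one FIFO per receiver, shared by all senders" layout described in the list above. It is single-threaded and omits the locking or atomics a real shared-memory queue needs; the names sk_msg_t, sk_fifo_t, sk_fifo_push and sk_fifo_pop are hypothetical and are not taken from btl_sm.h.

```c
#include <stddef.h>

#define SK_FIFO_SLOTS 4  /* arbitrary capacity for the sketch */

/* Each entry records its sender, since all senders share one queue. */
typedef struct {
    int sender;
    int payload;
} sk_msg_t;

/* One circular queue per receiver. */
typedef struct {
    sk_msg_t slots[SK_FIFO_SLOTS];
    size_t head;   /* index of the next entry to read */
    size_t count;  /* number of queued entries */
} sk_fifo_t;

/* Any sender enqueues here; returns 1 on success, 0 if the FIFO is full. */
int sk_fifo_push(sk_fifo_t *f, int sender, int payload)
{
    if (f->count == SK_FIFO_SLOTS) {
        return 0;
    }
    size_t tail = (f->head + f->count) % SK_FIFO_SLOTS;
    f->slots[tail].sender = sender;
    f->slots[tail].payload = payload;
    f->count++;
    return 1;
}

/* The receiver dequeues in arrival order; returns 0 if the FIFO is empty. */
int sk_fifo_pop(sk_fifo_t *f, sk_msg_t *out)
{
    if (f->count == 0) {
        return 0;
    }
    *out = f->slots[f->head];
    f->head = (f->head + 1) % SK_FIFO_SLOTS;
    f->count--;
    return 1;
}
```

With per-sender FIFOs the receiver has to poll one queue per peer; with a shared queue it polls a single one and reads the sender from the entry, which is the trade-off the message describes.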

This commit was SVN r20578.
2009-02-17 15:58:15 +00:00
Jeff Squyres
265ac096e8 Restore a few #include's
This commit was SVN r20559.
2009-02-14 15:21:28 +00:00
Rainer Keller
d81443cc5a - On the way to get the BTLs split out and lessen dependency on orte:
Often, orte/util/show_help.h is included, although no functionality
   is required -- instead, most often opal_output.h, or
   orte/mca/rml/rml_types.h
   Please see orte_show_help_replacement.sh committed next.

 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   actually showed two *missing* #include "orte/util/show_help.h"
   in orte/mca/odls/base/odls_base_default_fns.c and
   in orte/tools/orte-top/orte-top.c
   Manually added these.

   Let's have MTT the last word.

This commit was SVN r20557.
2009-02-14 02:26:12 +00:00
Jeff Squyres
8b29e27ead Some minor valgrind-inspired cleanups: fix some memory leaks
This commit was SVN r20543.
2009-02-13 03:45:32 +00:00
Jeff Squyres
91415c2996 Some minor valgrind-inspired cleanups: fix some memory leaks
This commit was SVN r20542.
2009-02-13 03:45:11 +00:00
Jeff Squyres
c83ef674e3 Some minor valgrind-inspired cleanups: fix some memory leaks.
Also took the opportunity to convert the rdma mpool to use the MCA
register function.

This commit was SVN r20541.
2009-02-13 03:44:29 +00:00