1
1
Граф коммитов

76 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
1b564f62bd Revert "Merge pull request #275 from hjelmn/btlmod"
This reverts commit ccaecf0fd6, reversing
changes made to 6a19bf85dd.
2014-11-19 23:22:43 -07:00
Nathan Hjelm
29e4e1c90a Rename the OSC "rdma" component to pt2p to better reflect that it does not actually use btl rdma 2014-11-19 11:33:03 -07:00
Nathan Hjelm
eed7b45db5 osc/rdma: fix issue identified by Berk Hess
osc/rdma uses counters to determine if all messages have been received
before exiting synchronization calls. The problem is that the active
target counter is always increasing (never zeroed). If over 2^31-1
messages are sent this causes the counter to overflow (in itself this
isn't an error). This causes test/wait to return before the communication
is complete. There is an additional error in the use of the fragment
flush function. If PSCW synchronization is in use this function CAN NOT
be called unless a post message has arrived.

Relevant mailing list thread: http://www.open-mpi.org/community/lists/devel/2014/10/16016.php

This commit fixes both issues. Tested against MTT and issue reproducer.

Closes #224.
2014-10-07 11:45:22 -06:00
Gilles Gouaillardet
5f1e0f284a Fix compilation when --enable-hetorogeneous
This commit was SVN r32410.
2014-08-04 10:35:08 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose.  UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00
Nathan Hjelm
7f20868179 osc/rdma: ensure matching of post/start calls
The post and start window calls are supposed to be matching. The code
did not check to see that an incoming post matched with the start call.
This commit fixes the bug by placing the post on a pending list that
will be checked by the next call to start.

cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r32017.
2014-06-17 15:23:06 +00:00
Nathan Hjelm
927098d567 osc/rdma: fix hang when accumulating with MPI_REPLACE
The replace callback did not increment the incoming frag counter. This
leads to a hang during synchronization. This commit adds the increment
and also puts the request on the garbage collection list to fix a leak.

This fixes a hang found when running the mpich test suite.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32016.
2014-06-17 14:53:29 +00:00
Nathan Hjelm
7f6de57653 osc/rdma: fix accumulate fragment size calculation
The wrong type was used when calculating the amount of space needed
for an accumulate fragment. Fixed the calculation and took the
opportunity to eliminate the get_acc header as it is identical to the
acc header.

This fixes trac:4719 and #4718

Tracking these fixes for 1.8.2 in this CMR.

Throwing this to Brad for review as he is the one who ran into the issue.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32015.

The following Trac tickets were found above:
  Ticket 4719 --> https://svn.open-mpi.org/trac/ompi/ticket/4719
2014-06-17 14:53:24 +00:00
Nathan Hjelm
a31bfbeb2c osc/rdma: fix typo in get accumulate path
There was a typo in the ompi_osc_gacc_long_start that was causing a
segmentation fault when executing long get accumulate operations.

cmr=v1.8.1:reviewer=jsquyres

This commit was SVN r31353.
2014-04-08 21:54:52 +00:00
Nathan Hjelm
fdf4c3b900 osc/rdma: really fix active message support
The last fix prevented a hang but had some cases where the results were
wrong. Fixed. Tested with armci, openmpi/ibm, openmpi/onesided.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31284.
2014-03-28 22:06:16 +00:00
Nathan Hjelm
ee7a1478ee osc/rdma: fix test/wait hang
There are differences between how active and passive messages are
accounted for in this component. Active message counts on the sender
side are set to zero before the control message is sent so we do not
have to add one to the expected number of messages or we end up
double counting the control message. This commit should fix that error.

Fixes regression in one-sided/test_rma1

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31281.
2014-03-28 20:49:20 +00:00
Nathan Hjelm
0d703759f6 osc/rdma: fix possible error when encountering accumulate lock contention
It is possible to get into a situation where a small accumulate operation
can not be completed because a large accumulate operation holds the lock.
In this case we may return from wait/flush/etc before the operation is
complete. To handle this case increment the expected incoming fragment
count when queuing an accumulate operation and increment the incoming
fragment count after processing the accumulate operation.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31224.
2014-03-25 21:00:43 +00:00
Nathan Hjelm
d681eb4655 osc/rdma: fix warnings introduced by r31204
cmr=v1.8:ticket=trac:4449

This commit was SVN r31221.

The following SVN revision numbers were found above:
  r31204 --> open-mpi/ompi@949abe45cd

The following Trac tickets were found above:
  Ticket 4449 --> https://svn.open-mpi.org/trac/ompi/ticket/4449
2014-03-25 21:00:19 +00:00
Nathan Hjelm
949abe45cd osc: fix datatype related issues in the one-sided code
This commit fixes two issues:

 - osc/rdma: The target side of an accumulate was using the target datatype
   in the receive to the packed buffer. This was conflicting with the way
   the reduction is done into the target buffer. Changed the receive to use
   the primitive datatype.

 - osc/base: The copy table was completely wrong. Fixed the table to match
   the underlying datatypes (which are opal not ompi datatypes).

 - osc/base: There is a problem using the optimized description. Fall back
   on using the non-optimized description until we can understand what is
   going wrong.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31204.
2014-03-25 15:28:48 +00:00
Nathan Hjelm
bc55276844 osc/rdma: fix bug in the active message code that could cause erroneous
results

The code to handle completion messages did not correctly increment the
number of expected messages. This could cause wait to return before all
incoming messages are complete.

I also added a check to ensure that start returns an error if we are in
a passive access epoch.

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31203.
2014-03-25 15:28:36 +00:00
Nathan Hjelm
0ed44f2fdb osc/rdma: add support for datatypes with large descriptions
This commit adds large datatype description support to the osc/rdma
component. Support is provided by an additional send/recv of the datatype
description if the description does not fit in an eager buffer. The
code is designed to require minimal new code and not for speed. We
consider this code path to be a slow path.

Refs trac:1905

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31197.

The following Trac tickets were found above:
  Ticket 1905 --> https://svn.open-mpi.org/trac/ompi/ticket/1905
2014-03-24 18:57:29 +00:00
Nathan Hjelm
e70809e169 osc/rdma: fix the spelling of incoming
cmr=v1.7.5:ticket=trac:4379

This commit was SVN r31050.

The following Trac tickets were found above:
  Ticket 4379 --> https://svn.open-mpi.org/trac/ompi/ticket/4379
2014-03-12 21:43:23 +00:00
Nathan Hjelm
1fc9a55d08 osc/rdma: do not use MPI_SOURCE to determine the peer in an send operation.
This fixes a bug in r31029 which removes the use of the pml base request
(also not a good way since cm doesn't use the base request). We now allocate
a data structure (ugh) to determine the needed information. Tested with
mtt/onesided.

cmr=v1.7.5:ticket=trac:4379

This commit was SVN r31044.

The following SVN revision numbers were found above:
  r31029 --> open-mpi/ompi@29e00f9161

The following Trac tickets were found above:
  Ticket 4379 --> https://svn.open-mpi.org/trac/ompi/ticket/4379
2014-03-12 17:14:11 +00:00
Nathan Hjelm
29e00f9161 osc/rdma: fix issues with mpi_leave_pinned when using rdma capable btls
It seems we can't release accumulate buffers in completion callbacks
because the btls don't release registration resources until after the
callback has fired. The fix is to keep track of the unused buffers and
free them later. This should resolve issues when running IMB-EXT and
IMB-RMA.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31029.
2014-03-12 14:39:03 +00:00
Nathan Hjelm
cbb531ed13 osc/rdma: use OPAL_ALIGN macro
cmr=v1.7.5:ticket=trac:4357

This commit was SVN r30975.

The following Trac tickets were found above:
  Ticket 4357 --> https://svn.open-mpi.org/trac/ompi/ticket/4357
2014-03-10 18:57:20 +00:00
Nathan Hjelm
5df8cd75a9 osc/rdma: ensure fragment headers and the packed datatype are 8-byte aligned.
The datatype unpacking code assumes that the packed datatype buffer has the
same alignment as an OPAL_PTRDIFF_TYPE. This was not enforced by the rdma
one-sided component. I changed the ordering and sized of various osc/rdma
headers to ensure their sizes are a multiple of 8-bytes and modified the
fragment allocation call to ensure all headers are 8-byte aligned. While
not the cleanest way to handle this situation it should resolve the issue.

Fixes trac:4315

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30974.

The following Trac tickets were found above:
  Ticket 4315 --> https://svn.open-mpi.org/trac/ompi/ticket/4315
2014-03-10 18:11:22 +00:00
Nathan Hjelm
85515f2587 osc/rdma: silence warning
cmr=v1.7.5:ticket=trac:4355

This commit was SVN r30970.

The following Trac tickets were found above:
  Ticket 4355 --> https://svn.open-mpi.org/trac/ompi/ticket/4355
2014-03-10 16:11:25 +00:00
Yossi Etigin
b04a2339c5 Fix segmentation fault when osc_rdma is used with pml_cm: osc_rdma
assumes the send request is derived from mca_pml_base_send_request_t,
but this is not true for pml cm, so we end up freeing invalid pointer.
 We cannot take the data pointer from the pml send request, so we pass 
the allocated buffer pointer in req_complete_cb_data, and put the 
osc_rdma_module pointer in that buffer as well.
 Previously, osc_pt2pt was used with pml_cm which didn't have this 
problem.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30967.
2014-03-10 15:21:37 +00:00
Nathan Hjelm
5a4037df4f osc/rdma: fix typo in rdma osc component.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30931.
2014-03-04 16:57:56 +00:00
Nathan Hjelm
acbd6032f9 Helps to include the correct header.
cmr=v1.7.5:ticket=trac:4304

This commit was SVN r30821.

The following Trac tickets were found above:
  Ticket 4304 --> https://svn.open-mpi.org/trac/ompi/ticket/4304
2014-02-25 19:14:48 +00:00
Nathan Hjelm
5edacac301 osc/rdma: add missing include
cmr=v1.7.5:ticket=trac:4304

This commit was SVN r30820.

The following Trac tickets were found above:
  Ticket 4304 --> https://svn.open-mpi.org/trac/ompi/ticket/4304
2014-02-25 19:11:19 +00:00
Ralph Castain
49d938de29 Merge one-sided updates to the trunk - written by Brian Barrett and Nathan Hjelmn
cmr=v1.7.5:reviewer=hjelmn:subject=Update one-sided to MPI-3

This commit was SVN r30816.
2014-02-25 17:36:43 +00:00
Brian Barrett
16a1166884 Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a
configure-time dynamic allocation of flags.  The net result for platforms
which only support BTL-based communication is a reduction of 8*nprocs bytes
per process.  Platforms which support both MTLs and BTLs will not see
a space reduction, but will now be able to safely run both the MTL and BTL
side-by-side, which will prove useful.

This commit was SVN r29100.
2013-08-30 16:54:55 +00:00
Ralph Castain
5d1fa4fa0e Silence warnings:
osc_pt2pt_data_move.c: In function 'ompi_osc_pt2pt_sendreq_recv_accum_long_cb':
osc_pt2pt_data_move.c:643:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
osc_rdma_data_move.c: In function 'ompi_osc_rdma_control_send_cb':
osc_rdma_data_move.c:1312:37: warning: variable 'header' set but not used [-Wunused-but-set-variable]

This commit was SVN r29092.
2013-08-29 20:56:36 +00:00
George Bosilca
dc9352faf6 Remove some unused variables.
This commit was SVN r28726.
2013-07-05 13:31:54 +00:00
Nathan Hjelm
9d4a26f47d Update OMPI frameworks to use the MCA framework system.
Notes:
  - This commit also eliminates the need for an available components list in use
    in several frameworks. None of the code in question was making use of the
    priority field of the priority component list item so these extra lists were
    removed.
  - Cleaned up selection code in several frameworks to sort lists using opal_list_sort.
  - Cleans up the ompi/orte-info functions. Expose the functions that construct the
    list of params so they can be used elsewhere.

patches for mtl/portals4 from brian

missed a few output variables in openib

This commit was SVN r28241.
2013-03-27 21:17:31 +00:00
Nathan Hjelm
249066e06d Timeout! Per RFC update the BTL interface to hide segment keys. All BTLs (with the exception of wv), all relevant PMLs, and osc/rdma have been updated for the new interface.
This commit was SVN r26626.
2012-06-21 17:09:12 +00:00
Nathan Hjelm
8962ce25b0 fixed some compiler errors caused by seg_key changes. osc/rdma may need to be updated to use btls that use 128 bit segment keys
This commit was SVN r25448.
2011-11-06 20:19:14 +00:00
Terry Dontje
fbda6aaf89 Fixes trac:2532 issues with 32-bit binaries
This commit was SVN r24891.

The following Trac tickets were found above:
  Ticket 2532 --> https://svn.open-mpi.org/trac/ompi/ticket/2532
2011-07-13 16:38:03 +00:00
Shiqing Fan
1ed0f40d35 Fix a few type casts on Windows.
This commit was SVN r24857.
2011-07-06 08:08:53 +00:00
Brian Barrett
a4b2bd903b * Implement long-ago discussed RFC to add a callback data pointer in the
request completion callback
* Use the completion callback pointer to remove all need for opal_progress
  calls in the one-sided layer

This commit was SVN r24848.
2011-06-30 20:05:16 +00:00
Eugene Loh
2770a12beb Continue clean up of thread options started in r22841, 22842, and 22849.
No need for any CMRs to 1.5... that was already done in CMR 2728.

This commit was SVN r24545.

The following SVN revision numbers were found above:
  r22841 --> open-mpi/ompi@b400b84162
2011-03-18 21:36:35 +00:00
Rolf vandeVaart
91c1ee86d7 Fix for fix of fix for handling misalignment when sending
onesided multifrag.

This fixes trac:2532.

This commit was SVN r23760.

The following Trac tickets were found above:
  Ticket 2532 --> https://svn.open-mpi.org/trac/ompi/ticket/2532
2010-09-16 18:58:11 +00:00
Rolf vandeVaart
47940f2aa0 Fix the fix (r23649) for ticket 2532. We were neglecting to
update the remain_len field for the buffer.

This really fixes ticket #2532.

This commit was SVN r23706.

The following SVN revision numbers were found above:
  r23649 --> open-mpi/ompi@f42c2a737f
2010-09-01 14:12:08 +00:00
Ethan Mallove
f42c2a737f Fixes trac:2532 - "MPI_Put can result in SIGBUS on SPARC"
Reviewed by Rolf V and Brian B

This commit was SVN r23649.

The following Trac tickets were found above:
  Ticket 2532 --> https://svn.open-mpi.org/trac/ompi/ticket/2532
2010-08-24 18:10:43 +00:00
Rainer Keller
6c5532072a - Split the datatype engine into two parts: an MPI specific part in
OMPI
   and a language agnostic part in OPAL. The convertor is completely
   moved into OPAL.  This offers several benefits as described in RFC
   http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
   namely:
    - Fewer basic types (int* and float* types, boolean and wchar
    - Fixing naming scheme to ompi-nomenclature.
    - Usability outside of the ompi-layer.
 - Due to the fixed nature of simple opal types, their information is
   completely
   known at compile time and therefore constified
 - With fewer datatypes (22), the actual sizes of bit-field types may be
   reduced
   from 64 to 32 bits, allowing reorganizing the opal_datatype
   structure, eliminating holes and keeping data required in convertor
   (upon send/recv) in one cacheline...
   This has implications to the convertor-datastructure and other parts
   of the code.
 - Several performance tests have been run, the netpipe latency does not
   change with
   this patch on Linux/x86-64 on the smoky cluster.
 - Extensive tests have been done to verify correctness (no new
   regressions) using:
   1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and
    ompi-ddt:
    a. running both trunk and ompi-ddt resulted in no differences
       (except for MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB do now run
       correctly).
    b. with --enable-memchecker and running under valgrind (one buglet
       when run with static found in test-suite, commited)
   2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
      all passed (except for the dynamic/ tests failed!! as trunk/MTT)
   3. compilation and usage of HDF5 tests on Jaguar using PGI and
      PathScale compilers.
   4. compilation and usage on Scicortex.
 - Please note, that for the heterogeneous case, (-m32 compiled
   binaries/ompi), neither
   ompi-trunk, nor ompi-ddt branch would successfully launch.

This commit was SVN r21641.
2009-07-13 04:56:31 +00:00
Greg Koenig
60485ff95f This is a very large change to rename several #define values from
OMPI_* to OPAL_*.  This allows opal layer to be used more independent
from the whole of ompi.

NOTE: 9 "svn mv" operations immediately follow this commit.

This commit was SVN r21180.
2009-05-06 20:11:28 +00:00
Brian Barrett
7f898d4e2b * Make rdma the default. Somehow, the code didn't match what was supposed
to happen
* Properly error out (rather than cause buffer overflow) in case where
  the datatype packed description is larger than our control fragments.
  This still isn't standards conforming, but at least we know what
  happened.
* Expose win_set_name to external libraries (like the osc modules)
* Set default window name to the CID of the communcator it's using
  for communication

Refs trac:1905

This commit was SVN r21134.

The following Trac tickets were found above:
  Ticket 1905 --> https://svn.open-mpi.org/trac/ompi/ticket/1905
2009-04-30 22:36:09 +00:00
Rainer Keller
9dea63d63a - Last of intrusive commits (promised)... err for now.
Anyway, this is blocking the move: do not include pml.h
   if not really needed, aka none of the following used:
     mca_pml
     MCA_PML_CALL
     OMPI_ANY_TAG
     OMPI_ANY_SOURCE
     OMPI_PROC_NULL

 - Notable exceptions (deleting in one header->adding):
   - ompi/mca/mtl/psm/
   - ompi/mca/osc/rdma/
   - ompi/mca/btl/openib/btl_openib_endpoint.c depended on
     pml_base_sendreq.h

 - Tested on Linux/x86-64, this time including make check
   (thanks Jeff and Ralph)

This commit was SVN r20725.
2009-03-04 17:06:51 +00:00
Terry Dontje
0178b6c45f Added padding to predefined handle structures to maintain library version to
version compatibility.

This commit was SVN r20627.
2009-02-24 17:17:33 +00:00
Rainer Keller
d81443cc5a - On the way to get the BTLs split out and lessen dependency on orte:
Often, orte/util/show_help.h is included, although no functionality
   is required -- instead, most often opal_output.h, or               
   orte/mca/rml/rml_types.h                                           
   Please see orte_show_help_replacement.sh commited next.            

 - Local compilation (Linux/x86_64) w/ -Wimplicit-function-declaration
   actually showed two *missing* #include "orte/util/show_help.h"     
   in orte/mca/odls/base/odls_base_default_fns.c and                  
   in orte/tools/orte-top/orte-top.c                                  
   Manually added these.                                              

   Let's have MTT the last word.

This commit was SVN r20557.
2009-02-14 02:26:12 +00:00
Brian Barrett
cfc400eb57 * Enable eager sending for Accumulate
* If the accumulate is local, make it short-circuit the request path.  Accumulate requires local
  ops due to its window rules, so this is likely to help a bunch (on the codes I"m messing
  with at least)
* Due a better job at flushing everything that can go out on the wire in a resource constrained problem
* Move some debugging values around to make large problems somewhat easier to deal with

This commit was SVN r20277.
2009-01-14 20:15:15 +00:00
Brian Barrett
e1f40c6a71 Fixes to make the rdma osc component work again:
* Don't overwrite the des_flags field, removing the
    all important always callback field
  * Fix up return status of bml_base_send, since
    the rest of the code expects OMPI_SUCCESS or
    an error code

This commit was SVN r20178.
2009-01-01 23:48:29 +00:00
George Bosilca
00d24bf8ab Scalability patch, or slim-fast effect #1. All BML structures just
got a whole lot smaller, decreasing the memory footprint of the
running application. How much it's a good question. Here is a
breakdown:

- in mca_bml_base_endpoint_t: 3 *size_t + 1 * uint32_t
- in mca_bml_base_btl_t: 1 * int + 1 * double - 1 * float
                         + 6 * size_t + 9 * (void*)

The decrease in mca_bml_base_endpoint_t is for each peer and the
decrease in mca_bml_base_btl_t is for each BTL for each peer.
So, if we consider the most convenient case where there is only
one network between all peers, this decrease the memory foot print
per peer by
9*size_t + 9*(void*) + 2 * int32_t + 1 * double - 1 * float.
On a 64 bits machine this will be 156 bytes per peer.

Now we access all these fields directly from the underlying BTL
structure, and as this structure is common to multiple BML endpoint,
we are a lot more cache friendly. Even if this do not improve the
latency, it makes the SM performance graph a lot smoother.

This commit was SVN r19659.
2008-09-30 21:02:37 +00:00
Ralph Castain
9613b3176c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP.
After much work by Jeff and myself, and quite a lot of discussion, it has become clear that we simply cannot resolve the infinite loops caused by RML-involved subsystems calling orte_output. The original rationale for the change to orte_output has also been reduced by shifting the output of XML-formatted vs human readable messages to an alternative approach.

I have globally replaced the orte_output/ORTE_OUTPUT calls in the code base, as well as the corresponding .h file name. I have test compiled and run this on the various environments within my reach, so hopefully this will prove minimally disruptive.

This commit was SVN r18619.
2008-06-09 14:53:58 +00:00