1
1

623 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
033894b493 Merge pull request #541 from hjelmn/c99_components
C99 component initialization
2015-04-20 10:45:39 -06:00
Nathan Hjelm
d251fa1525 pml/ob1: fix heterogenous build
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-20 09:27:00 -06:00
Nathan Hjelm
df75d0382f ompi: use C99 subobject naming for component initialization
This commit helps future-proof ompi components by initializing each
component member by name.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-18 10:29:58 -06:00
adrianreber
714d9aa67e Merge pull request #348 from adrianreber/topic/orte_cr_continue_like_restart
Topic/orte cr continue like restart
2015-03-12 14:54:02 +01:00
Adrian Reber
c08e234af7 FT: fix compilation using --with-ft (5/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

With the changes introduced in the previous patches in this series
some goto constructs for cleanup are no longer necessary and removed.
2015-03-11 14:23:33 +01:00
Adrian Reber
1c5a8df724 FT: fix compilation using --with-ft (2/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

The FT code used barrier mechanisms which have been removed
with aec5cd08bd8c33677276612b899b48618d271efa. This patch replaces
all those different barriers with opal_pmix.fence(NULL, 0);
I am not sure this is completely correct but at least a starting
point for a review.
2015-03-11 14:23:33 +01:00
Adrian Reber
f45dd069bd FT: fix compilation using --with-ft (1/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

This first patch moves orte_cr_continue_like_restart from ORTE
to opal_cr_continue_like_restart in OPAL. This only leaves three
calls from OPAL to ORTE in the FT code. As it is not yet 100%
clear how to handle these calls the code orte_sstore.set_attr()
has been #ifdef'd out for now.
2015-03-11 14:23:33 +01:00
Nathan Hjelm
3d32dbd793 btl/openib: cuda: fix CUDA-aware support with async copy
This commit should resolve an issue seen with CUDA-aware support. The
problem came in with BTL 3.0. Before 3.0 the size of the copy was
stored in the incoming segment's des_remote_count field. This field
does not exist in BTL 3.0 so I stored the value in the
des_segment_count field. This caused problems with the cuda support
code. To fix the issue the endpoint pointer is now stored in the in
fragment's endpoint pointer which free's up the segment's des_cbdata
pointer for storing the transfer size.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-03-10 14:38:12 -06:00
Rolf vandeVaart
30e9dd5066 Look in extra rdma array to find bml. This is needed with recent BML changes. Only affects CUDA-aware code. 2015-02-27 09:02:21 -05:00
George Bosilca
3fd8dc099d Revert "This function is now useless."
This reverts commit 0871c5c4895628ca85c3dadad37d63a05810ac06.
2015-02-26 17:54:46 -05:00
George Bosilca
7f90cedf23 Revert "Fix the logic for computing the different weights for each BTLs. This"
This reverts commit de118609eccfc9231f8c508fdc5edbed93d395b4.
2015-02-26 17:54:31 -05:00
George Bosilca
f3b58006c8 Merge branch 'master' of github.com:open-mpi/ompi 2015-02-25 12:01:35 -05:00
Jeff Squyres
c3381150de ob1: fix another PERUSE compile error 2015-02-25 05:53:12 -08:00
Nathan Hjelm
0ac2f08460 pml/ob1: fix peruse compile error
Fixes #416
2015-02-24 15:39:46 -07:00
Nathan Hjelm
5f1254d710 Update code base to use the new opal_free_list_t
Use of the old ompi_free_list_t and ompi_free_list_item_t is
deprecated. These classes will be removed in a future commit.

This commit updates the entire code base to use opal_free_list_t and
opal_free_list_item_t.

Notes:

OMPI_FREE_LIST_*_MT -> opal_free_list_* (uses opal_using_threads ())

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-24 10:05:45 -07:00
Howard Pritchard
bf89131f9e add owner files to opa/ompi/orte mca directories
This commit adds an owner file in each of the component directories
for each framework.  This allows for a simple script to parse
the contents of the files and generate, among other things, tables
to be used on the project's wiki page.  Currently there are two
"fields" in the file, an owner and a status.  A tool to parse
the files and generate tables for the wiki page will be added
in a subsequent commit.
2015-02-22 15:10:23 -07:00
George Bosilca
0871c5c489 This function is now useless. 2015-02-21 16:38:17 -05:00
George Bosilca
de118609ec Fix the logic for computing the different weights for each BTLs. This
removes the call to qsort, as the BTLs are already sorted based on
their respective bandwidth.
2015-02-21 16:37:18 -05:00
Rolf vandeVaart
dbd0064713 Fix bug in CUDA-aware and GDR introduced by refactoring 2015-02-18 17:44:28 -05:00
Nathan Hjelm
3847025540 pml/ob1: when using btl_get try to register the entire region before attempting to break the get into multiple rdma fragments
A little background. Historically ob1 always registered the entire memory
region when the RGET protocol was in use. This changed when Mellanox
added support to fragment RGET using the btl_prepare_dst function. Now
that the BTL layer has changed to split out the limits of get/put there
is explicit fragmentation code in ob1. Before this commit the registration
was still done per RGET fragment.

This commit will attempt to register the entire region before creating
RGET fragments. If the registration is successfull then all RGET
fragments will use this registration otherwise they will each attempt
to register their own segment of the receive buffer. If that fails
enough times each fragment will give up and fall back on send/recv.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-13 11:46:37 -07:00
Nathan Hjelm
c4a0e02261 pml/ob1: update for BTL 3.0 interface
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-13 11:46:37 -07:00
George Bosilca
df0512550e The extent of the datatype is irrelevant for deciding to do an immediate
send as long as we have to pack.
2015-01-19 02:23:12 -05:00
Gilles Gouaillardet
d14daf40d0 ob1: correctly handle types in which size > extent
do not send inline if extent*count *OR* size*count are greater than 256
2015-01-19 14:07:23 +09:00
Howard Pritchard
3fc7b389ff initial async progress changes for gni 2014-12-24 11:50:23 -07:00
Nathan Hjelm
1b564f62bd Revert "Merge pull request #275 from hjelmn/btlmod"
This reverts commit ccaecf0fd6c862877e6a1e2643f95fa956c87769, reversing
changes made to 6a19bf85dde5306f559f09952cf3919d97f52502.
2014-11-19 23:22:43 -07:00
Nathan Hjelm
0110603782 ob1 warning fix 2014-11-19 11:33:04 -07:00
Nathan Hjelm
24427639b6 Fix ob1 warnings 2014-11-19 11:33:03 -07:00
Nathan Hjelm
271818f887 pml/ob1: bug fixes and adjustments for changes in btl_sendi behavior 2014-11-19 11:33:03 -07:00
Nathan Hjelm
ee2b111011 Update PML for latest BTL update 2014-11-19 11:33:02 -07:00
Nathan Hjelm
c61e017177 pml: updates to reflect member changes in mca_btl_base_descriptor_t
and mca_btl_base_module_t structures
2014-11-19 11:33:02 -07:00
Nathan Hjelm
5936411a07 pml/ob1: when using btl_get try to register the entire region before
attempting to break the get into multiple rdma fragments

A little background. Historically ob1 always registered the entire memory
region when the RGET protocol was in use. This changed when Mellanox
added support to fragment RGET using the btl_prepare_dst function. Now
that the BTL layer has changed to split out the limits of get/put there
is explicit fragmentation code in ob1. Before this commit the registration
was still done per RGET fragment.

This commit will attempt to register the entire region before creating
RGET fragments. If the registration is successfull then all RGET
fragments will use this registration otherwise they will each attempt
to register their own segment of the receive buffer. If that fails
enough times each fragment will give up and fall back on send/recv.
2014-11-19 11:33:02 -07:00
Nathan Hjelm
b75bb8aea7 Update pml for btl changes 2014-11-19 11:33:02 -07:00
Jeff Squyres
7a5b2e9b13 ob1: change an OPAL_UNLIKELY to OPAL_LIKELY
Per
924d39e415 (commitcomment-8378266),
this OPAN_UNLIKELY should really be OPAL_LIKELY.
2014-10-31 03:22:55 -07:00
George Bosilca
924d39e415 Always OBJ_DESTRUCT the send request. 2014-10-30 01:28:50 -04:00
Gilles Gouaillardet
ed93c8787d ob1: add a destructor to mca_pml_ob1_recv_request_t
opal_mutex_t must be OBJ_DESTRUCTed in order to avoid
a memory leak (pthread_mutex_init allocates memory under
Cygwin, so pthread_mutex_destroy is mandatory)

Thanks to Marco Atzeri for reporting this issue
2014-10-29 13:30:29 +09:00
Jeff Squyres
c22e1ae33b configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros
These two macros set the prefix for the OPAL and ORTE libraries,
respectively.  Specifically, the OPAL library will be named
libPREFIXopen-pal.la and the ORTE library will be named
libPREFIXopen-rte.la.

These macros must be called, even if the prefix argument is empty.

The intent is that Open MPI will call these macros with an empty
prefix, but other projects (such as ORCM) will call these macros with
a non-empty prefix.  For example, ORCM libraries can be named
liborcm-open-pal.la and liborcm-open-rte.la.

This scheme is necessary to allow running Open MPI applications under
systems that use their own versions of ORTE and OPAL.  For example,
when running MPI applications under ORTE, if the ORTE and OPAL
libraries between OMPI and ORCM are not identical (which, because they
are released at different times, are likely to be different), we need
to ensure that the OMPI applications link against their ORTE and OPAL
libraries, but the ORCM executables link against their ORTE and OPAL
libraries.
2014-10-22 10:32:19 -07:00
George Bosilca
cee2a4e5c8 Missing alloca.h. Thanks Paul for catching this.
This commit was SVN r32388.
2014-08-01 03:28:23 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose.  UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00
Nathan Hjelm
f960e4273e Fix typo in r32196
The wrong descriptor field was used when calculating the size received when
using the RDMA rendevous protcol.

This commit was SVN r32232.

The following SVN revision numbers were found above:
  r32196 --> open-mpi/ompi@a14e0f10d4
2014-07-14 21:00:53 +00:00
Gilles Gouaillardet
77184b5c4c Fix a cornercase with MPI_PROC_NULL persistent requests
Handle OMPI_REQUEST_NOOP in MPI_Startall rather than PML

cmr=v1.8.2:reviewer=bosilca:ticket=4764

This commit was SVN r32213.

The following Trac tickets were found above:
  Ticket 4764 --> https://svn.open-mpi.org/trac/ompi/ticket/4764
2014-07-11 04:37:01 +00:00
Nathan Hjelm
1b9621eeb0 Fix typo in r32196
This commit was SVN r32202.

The following SVN revision numbers were found above:
  r32196 --> open-mpi/ompi@a14e0f10d4
2014-07-10 18:43:49 +00:00
Nathan Hjelm
a14e0f10d4 Per RFC: Remove des_src and des_dst members from the
mca_btl_base_segment_t and replace them with des_local and des_remote

This change also updates the BTL version to 3.0.0. This commit does
not represent the final version of BTL 3.0.0. More changes are coming.

In making this change I updated all of the BTLs as well as BTL user's
to use the new structure members. Please evaluate your component to
ensure the changes are correct.

RFC text:

This is the first of several BTL interface changes I am proposing for
the 1.9/2.0 release series.

What: Change naming of btl descriptor members. I propose we change
des_src and des_dst (and their associated counts) to be des_local and
des_remote. For receive callbacks the des_local member will be used to
communicate the segment information to the callback. The proposed change
will include updating all of the doxygen in btl.h as well as updating
all BTLs and BTL users to use the new naming scheme.

Why: My btl usage makes use of both put and get operations on the same
descriptor. With the current naming scheme I need to ensure that there
is consistency beteen the segments described in des_src and des_dst
depending on whether a put or get operation is executed. Additionally,
the current naming prevents BTLs that do not require prepare/RMA matched
operations (do not set MCA_BTL_FLAGS_RDMA_MATCHED) from executing
multiple simultaneous put AND get operations. At the moment the
descriptor can only be used with one or the other. The naming change
makes it easier for BTL users to setup/modify descriptors for RMA
operations as the local segment and remote segment are always in the
same member field. The only issue I forsee with this change is that it
will require a little more work to move BTL fixes to the 1.8 release
series.

This commit was SVN r32196.
2014-07-10 16:31:15 +00:00
Gilles Gouaillardet
8d3bea2771 Fix the cornercase with MPI_PROC_NULL persistent requests.
This corner case is now handled in the pml so the same code
is invoked for both MPI_Start and MPI_Startall.
This also correctly report an error if MPI_Startall is invoked twice
on a MPI_PROC_NULL persistent request.

This commit was SVN r32139.
2014-07-04 04:58:52 +00:00
George Bosilca
843ef1fcb0 ompi_mpi_abort had one extra argument that was never used. Clean it up.
This commit was SVN r32124.
2014-07-03 00:34:44 +00:00
George Bosilca
fd0e1b7261 If we detect an error on a request that has been already released
at the MPI level, we should call abort on MPI_COMM_WORLD.

Fixes ticket #1943.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31982.
2014-06-10 16:24:13 +00:00
Jeff Squyres
b0a6e42f45 pml ob1: use the pre-computed size from the free lists
Based on a suggestion from George on #31806, use the pre-computed
sizes rather than duplicating the computation math (which may change
someday in the future).

cmr=v1.8.2:ticket=trac:4647

This commit was SVN r31841.

The following Trac tickets were found above:
  Ticket 4647 --> https://svn.open-mpi.org/trac/ompi/ticket/4647
2014-05-20 20:32:25 +00:00
George Bosilca
db9660264e Update the error message to pinpoint the right location.
Thanks Tim.

This commit was SVN r31839.
2014-05-20 20:08:42 +00:00
George Bosilca
685f051557 Move the allocator initialization from open to init. This clean
a memory leak. Similar changes shuld be applied to all the 
other PML that are copies of OB1. This patch is related to
#4653.

This commit was SVN r31838.
2014-05-20 19:34:18 +00:00
Nathan Hjelm
a1d5ce0893 pml/ob1: as per past RFC bring the inline send optimization to
MPI_Isend.

I filed an RFC for this optimization some time back. It is a
relatively simple optimization. If the data associated with an
MPI_Isend can be put on the wire without allocating an MPI_Request
then do so. In this case we can legally return omp_request_empty
which will correctly indicate that the request is complete and that is
was not cancelled (these are the only requirements on send requests).

cmr=v1.8.3:reviewer=bosilca

This commit was SVN r31828.
2014-05-19 19:34:59 +00:00
Gilles Gouaillardet
2b89aac15b Fix a typo in MCA_PML_OB1_RECV_REQUEST_UNPACK
cmr=v1.8.2:reviewer=rhc

This commit was SVN r31817.
2014-05-19 11:00:13 +00:00