Commit graph

334 commits

Author SHA1 Message Date
Howard Pritchard
928bf977b2 Merge pull request #361 from hppritcha/topic/psm_cancel
mtl/psm: fix problem with cancel sends
2015-01-28 13:58:29 -07:00
Howard Pritchard
4637b49bf4 Merge pull request #362 from hppritcha/topic/mtl_grammar_fix
mtl: minor grammar fix in comments
2015-01-28 05:35:24 -07:00
Howard Pritchard
e177dfc226 mtl: minor grammar fix in comments 2015-01-28 04:51:42 -07:00
Howard Pritchard
4643110c5e mtl/psm: fix problem with cancel sends
incorporate patch from @afriedle-intel to fix
problem with psm mtl cancel of sends.

Sorry for the delay in getting to this.

Fixes 347
2015-01-27 20:02:05 -07:00
Yohann Burette
a741c44035 mtl/ofi: fix compiler warnings. 2015-01-27 11:14:40 -08:00
Yohann Burette
a4c1faae37 mtl/ofi: Add OFI provider option.
The user can now specify which OFI provider to use with the MTL.
e.g. --mca mtl ofi --mca mtl_ofi_provider psm
2015-01-26 08:38:11 -08:00
Yohann Burette
3c06fd77db mtl/ofi: remove unneeded FI_REMOTE_COMPLETE flag. 2015-01-23 10:55:03 -08:00
Yohann Burette
b88708bf68 mtl/ofi: use fi_ep_bind(). 2015-01-23 10:50:10 -08:00
Jeff Squyres
9cc60b9e12 ofi mtl: update to new libfabric constant name 2015-01-15 07:12:39 -08:00
Yohann Burette
bc93e04604 Fixed code around fi_av_insert(). 2015-01-14 08:43:57 -08:00
Yohann Burette
f01dd429df Reset pointer to NULL to prevent double-freeing. 2015-01-05 17:01:37 -08:00
Yohann Burette
1e24da90fe Fix fi_av_insert return code test. 2015-01-05 17:01:37 -08:00
Yohann Burette
5944c294ad Add return code testing for fi_mr_reg. 2015-01-05 17:01:37 -08:00
Jeff Squyres
c621d1e622 libfabric: don't LIBADD the common library in the static case
Adding the libfabric common library in the --disable-dlopen case will
result in duplicate symbols.
2014-12-18 11:04:08 -08:00
Jeff Squyres
d6f059f538 configury: add some descriptive output messages in configure
Ensure that the ofi MTL and the usnic BTL have good descriptive output
messages in configure.
2014-12-17 13:36:01 -08:00
Jeff Squyres
4dcb92ab0b ofi: remove use of non-existent macros 2014-12-17 13:36:01 -08:00
Jeff Squyres
f3be0a5882 ofi: ensure that null_addr is initialized to NULL
And when null_addr is freed, set it back to NULL so that we don't try
to free it again in the error: label.
2014-12-16 17:32:15 -08:00
Jeff Squyres
8c7b6d266e ofi: add "unused" attribute to rc to prevent compiler warning 2014-12-16 17:30:46 -08:00
Yohann Burette
58a7a1e4ac Adding an Open Fabrics Interfaces (OFI) MTL.
This MTL implementation uses the OFIWG libfabric's tag messaging capabilities.
2014-12-16 15:43:39 -08:00
Andrew Friedley
e7bcad0c13 Remove unused variable.
Reported by @adrianreber, this patch removes an unused variable in the
PSM MTL, silencing a compiler warning.
2014-11-21 07:51:44 -08:00
Andrew Friedley
b97cda7fd9 PSM MTL: Don't connect procs already connected
PSM has issues when calling psm_ep_connect() more than once for a
specific peer.  Use the psm_ep_connect mask argument to avoid connecting
to processes that are already connected.

OMPI ticket #268.
2014-11-12 15:52:02 -08:00
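The masking idea described in commit b97cda7fd9 can be sketched roughly as follows. This is an illustrative Python sketch of the logic only, not the PSM MTL code: `build_connect_mask` and the `already_connected` set are hypothetical names, and the convention that a zero mask entry means "skip this peer" is an assumption made for the example.

```python
def build_connect_mask(peer_ids, already_connected):
    """Return one mask entry per peer: 1 = attempt to connect, 0 = skip.

    Hypothetical helper. `already_connected` is an assumed set of peer IDs
    for which a connection has already been established.
    """
    return [0 if peer in already_connected else 1 for peer in peer_ids]


# Example: peers 11 and 13 were connected earlier, so only 10 and 12
# are selected for a new connection attempt.
peers = [10, 11, 12, 13]
connected = {11, 13}
print(build_connect_mask(peers, connected))
```

With a mask like this, a connect call is never issued twice for the same peer, which is the situation the commit message says PSM handles poorly.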
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
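As a rough illustration of the "two-field struct of uint32_t's" mentioned in commit 780c93ee57, the layout can be modelled with Python's `ctypes`. The class name `ProcessName` and the field names `jobid` and `vpid` are assumptions chosen for the sketch; the commit message itself does not name the fields.

```python
import ctypes


class ProcessName(ctypes.Structure):
    """Illustrative model of a process name made of two 32-bit unsigned fields.

    Field names are assumed for this example.
    """
    _fields_ = [
        ("jobid", ctypes.c_uint32),
        ("vpid", ctypes.c_uint32),
    ]


name = ProcessName(jobid=42, vpid=7)
# Two uint32 fields with no padding occupy 8 bytes in total.
print(ctypes.sizeof(ProcessName), name.jobid, name.vpid)
```

Because every layer sees the same fixed-size layout, a value like this can be passed between layers directly rather than copied field by field.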
Andrew Friedley
273135dbc7 Don't open PSM context when run on single node
When running many ranks on a single node using PSM, it's possible to
exhaust the network hardware contexts (there are 16).  This patch checks
if only a single node is being used. If so, the 'ipath' component of PSM
is disabled and no hardware contexts are opened.
2014-11-03 07:18:16 -08:00
Nadezhda Kogteva
2bce929330 MTL MXM cleanup: unnecessary OMPI_MTL_MXM_CONNECT_ON_FIRST_COMM variable removed 2014-10-20 10:29:47 +03:00
Vasily Filipov
a215a4831d MTL/MXM: disable "bulk_connect" by default. 2014-10-13 09:47:56 +03:00
Ralph Castain
41c6058153 Bring over changes to MXM from pmix branch:
MTL MXM: establish endpoint connection on the first communication when direct_modex used

This commit was SVN r32668.
2014-09-03 18:22:11 +00:00
Todd Kordenbrock
6a3225d800 Fix invalid symbols left by the PMIx merge.
This commit was SVN r32597.
2014-08-25 16:30:26 +00:00
Ralph Castain
aec5cd08bd Per the PMIx RFC:
WHAT:    Merge the PMIx branch into the devel repo, creating a new
               OPAL “pmix” framework to abstract PMI support for all RTEs.
               Replace the ORTE daemon-level collectives with a new PMIx
               server and update the ORTE grpcomm framework to support
               server-to-server collectives

WHY:      We’ve had problems dealing with variations in PMI implementations,
               and need to extend the existing PMI definitions to meet exascale
               requirements.

WHEN:   Mon, Aug 25

WHERE:  https://github.com/rhc54/ompi-svn-mirror.git

Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.

All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.

Accordingly, we have:

* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.

* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.

* Replaced the current global collective id with a signature based on the names of the participating procs. This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint.

* removed the prior OMPI/OPAL modex code

* added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform.

* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand

This commit was SVN r32570.
2014-08-21 18:56:47 +00:00
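The signature-based collective identification described in commit aec5cd08bd can be sketched as follows. This is a minimal Python model of the stated rule (one active collective per unique combination of procs), not the ORTE grpcomm implementation. `CollectiveTracker` is a hypothetical name, process names are represented here as plain tuples, and making the signature independent of participant order is an assumption.

```python
class CollectiveTracker:
    """Tracks active collectives, keyed by the set of participating procs."""

    def __init__(self):
        self._active = set()

    @staticmethod
    def signature(proc_names):
        # Assumed: the signature does not depend on the order of participants.
        return tuple(sorted(proc_names))

    def start(self, proc_names):
        sig = self.signature(proc_names)
        if sig in self._active:
            raise RuntimeError("a collective is already active for these procs")
        self._active.add(sig)
        return sig

    def finish(self, sig):
        self._active.discard(sig)


tracker = CollectiveTracker()
a = tracker.start([(1, 0), (1, 1)])   # procs 0 and 1
b = tracker.start([(1, 0), (1, 2)])   # proc 0 again, but a different combination
tracker.finish(a)
tracker.finish(b)
```

In the usage above, proc `(1, 0)` takes part in two collectives at once, which the rule allows because the two combinations of procs differ.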
Alina Sklarevich
a914c68356 MTL MXM: fix check-help-string.pl errors and warnings.
This commit was SVN r32533.
2014-08-14 07:46:56 +00:00
Vasily Filipov
5ca2fffa44 MTL/MXM: call for ompi_proc_world instead of ompi_comm_size during del_procs.
This commit was SVN r32504.
2014-08-11 11:52:23 +00:00
Mike Dubman
3c8a4d7d2d mxm: opal refactoring voices
http://www.open-mpi.org/community/lists/devel/2014/08/15590.php

This commit was SVN r32486.
2014-08-10 04:35:56 +00:00
Nathan Hjelm
0f15afa4d9 Fix typo in psm mtl
This commit was SVN r32332.
2014-07-28 22:00:03 +00:00
Ryan Grant
caa10a5faf Portals fixes after latest move
This commit was SVN r32330.
2014-07-28 19:25:03 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose.  UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00
Mike Dubman
da8df859b3 MXM: use bulk connection establishment API
fixed by Vasily, reviewed by Yossi/Miked

cmr=v1.8.2:reviwer=ompi-rm1.8

This commit was SVN r32256.
2014-07-17 08:35:55 +00:00
Mike Dubman
e342a11c2e opal envlist mca: implement Jeff's quibbles
fixed by Elena, reviewed by Miked

This commit was SVN r32216.
2014-07-11 07:23:20 +00:00
Joshua Ladd
057370364d Opal: Add a new MCA variable type "version_string". Also add a
new flag to ompi_info that allows a user to print all MCA variables of a specific type.  

 --type version_string

This command will print all MCA variables of type version_string.

This feature was developed by Elena Shipunova and was reviewed by Josh Ladd.

This commit was SVN r32166.
2014-07-09 01:37:23 +00:00
Mike Dubman
b51a42aeca MXM: fix mxm cleanup, should be called for any compat API
fixed by miked, reviewed by yossi

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31984.
2014-06-12 15:46:38 +00:00
Alina Sklarevich
7b8ad47e93 MXM: fix env variable name to hint for thread usage in mxm
reviewed by MikeD
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31968.
2014-06-09 06:40:32 +00:00
Alina Sklarevich
f8a664f5ec MXM: generate the jobid only for MXM versions under v2.0.
reviewed by miked
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31910.
2014-06-01 13:29:24 +00:00
Mike Dubman
fad1063980 MXM: fix warning
reviewed by Yossi

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31855.
2014-05-21 07:50:05 +00:00
Yossi Etigin
6aa5680059 Revert r30966.
cmr=v1.8.1:reviewer=ompi-gk1.8

This commit was SVN r31593.

The following SVN revision numbers were found above:
  r30966 --> open-mpi/ompi@280e96c99a
2014-05-01 22:17:09 +00:00
Nathan Hjelm
3e5388eaa6 mtl/psm: do not limit PSM to 8191 context ids
The old default context id maximum was committed to the trunk in
2006. After some discussion with Intel it appears this is restricting
the mtl to an arbitrarily small number of communicators. Increasing the
default to allow up to 2^16 - 1 context ids.

Refs trac:4574

cmr=v1.8.2

This commit was SVN r31574.

The following Trac tickets were found above:
  Ticket 4574 --> https://svn.open-mpi.org/trac/ompi/ticket/4574
2014-04-30 22:10:15 +00:00
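For scale, the two limits named in commit 3e5388eaa6 can be compared with a short calculation. The constant names below are invented for this sketch; only the values 8191 and 2^16 - 1 come from the commit message.

```python
OLD_MAX_CONTEXT_ID = 8191        # old default named in the commit message
NEW_MAX_CONTEXT_ID = 2**16 - 1   # new default named in the commit message

# 8191 is 2**13 - 1, so the old limit fits in 13 bits and the new one in 16.
print(OLD_MAX_CONTEXT_ID.bit_length(), NEW_MAX_CONTEXT_ID.bit_length())
print(NEW_MAX_CONTEXT_ID)
```

The new default spans three more bits than the old one, so the ceiling rises roughly eightfold.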
Mike Dubman
a4990de055 mca: track external lib version (runtime/compiletime) for mca component
based on thread: http://www.open-mpi.org/community/lists/devel/2014/04/14505.php

Create mca parameter to track runtime/compiletime ext lib version for component.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31487.
2014-04-22 18:02:26 +00:00
Mike Dubman
6f057e57ba MXM: enable on demand mapping for only MPI mxm context
fixed by Devender, reviewed by Yossi

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31463.
2014-04-20 09:15:37 +00:00
Ryan Grant
ca0a7b1a9a Correct typo in r31332, mtl_portals_enpoint.h -> mtl_portals_endpoint.h
This commit was SVN r31338.

The following SVN revision numbers were found above:
  r31332 --> open-mpi/ompi@b12ee27b3d
2014-04-08 14:41:51 +00:00
Ralph Castain
b12ee27b3d Add missing files - thanks to Mr. Anonymous for reporting them as missing from the 1.8 tarball
cmr=v1.8.1:reviewer=jsquyres:subject=add missing portals4 files

This commit was SVN r31332.
2014-04-08 02:55:14 +00:00
Alina Sklarevich
5cbf085dc2 mtl mxm: silence a warning.
in ompi_mtl_mxm_add_procs, define the ep_index variable only
for an older version of mxm.

submitted by Alina, reviewed by Mike.
cmr=v1.8:reviewer=ompi-rm1.8

This commit was SVN r31245.
2014-03-27 08:39:51 +00:00
Ralph Castain
e4efd5675f Per telecon, add comment indicating this needs to be fixed
Refs trac:4354

This commit was SVN r30991.

The following Trac tickets were found above:
  Ticket 4354 --> https://svn.open-mpi.org/trac/ompi/ticket/4354
2014-03-11 15:57:11 +00:00
Yossi Etigin
280e96c99a In mtl_mxm, don't disconnect from a proc with refcount > 1.
This will keep the connection until mxm endpoint is destroyed.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30966.
2014-03-09 08:35:44 +00:00