1
1

5227 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
1b564f62bd Revert "Merge pull request #275 from hjelmn/btlmod"
This reverts commit ccaecf0fd6c862877e6a1e2643f95fa956c87769, reversing
changes made to 6a19bf85dde5306f559f09952cf3919d97f52502.
2014-11-19 23:22:43 -07:00
Nathan Hjelm
0d413fb73f Revert "Remove stale file reference"
This reverts commit 4c8fa17234bcaab07460fb12fbf826ceee4f8df2.
2014-11-19 23:16:16 -07:00
Ralph Castain
4c8fa17234 Remove stale file reference 2014-11-19 18:32:19 -08:00
Nathan Hjelm
5a0a48c3c4 osc: remove lingering rdma component files 2014-11-19 12:11:54 -07:00
Nathan Hjelm
1a5349ec79 ompi ignore bfo until it is updated for new btl interface 2014-11-19 11:33:04 -07:00
Nathan Hjelm
8f1a44e60e bml/r2: add all rdma btls even if another btl has higher exclusivity
Background: In order to support atomics each btl needs to provide support
for communicating with self unless the btl module can guarantee global
atomicity. Before this commit bml/r2 discarded any BTL with lower
exclusivity than an existing send btl. This would cause the BML to
discard any btl other than self.

The new behavior is as follows:

 - If an exisiting send btl has higher exclusivity then the btl will not be
   added to the send btl list for the endpoint.

 - If a btl provides RDMA support then it is always added to the rdma btl
   list.

 - bml_btl weight for send btls is now calculated across all send btls.

 - bml_btl weight for rdma btls is now calculated across all rdma btls.

With this change self should still win as the only send btl for loopback
without disqualifying other btls (ugni, openib) for atomic operations.
2014-11-19 11:33:04 -07:00
Nathan Hjelm
22625b005b osc/pt2pt: threading fixes and code cleanup 2014-11-19 11:33:04 -07:00
Nathan Hjelm
0110603782 ob1 warning fix 2014-11-19 11:33:04 -07:00
Nathan Hjelm
45d1fac8af ugni thread safety fixes 2014-11-19 11:33:03 -07:00
Nathan Hjelm
29e4e1c90a Rename the OSC "rdma" component to pt2p to better reflect that it does not actually use btl rdma 2014-11-19 11:33:03 -07:00
Nathan Hjelm
24427639b6 Fix ob1 warnings 2014-11-19 11:33:03 -07:00
Nathan Hjelm
271818f887 pml/ob1: bug fixes and adjustments for changes in btl_sendi behavior 2014-11-19 11:33:03 -07:00
Nathan Hjelm
ee2b111011 Update PML for latest BTL update 2014-11-19 11:33:02 -07:00
Nathan Hjelm
49ff5a79d0 Update BML for the latest BTL update 2014-11-19 11:33:02 -07:00
Nathan Hjelm
c61e017177 pml: updates to reflect member changes in mca_btl_base_descriptor_t
and mca_btl_base_module_t structures
2014-11-19 11:33:02 -07:00
Nathan Hjelm
5936411a07 pml/ob1: when using btl_get try to register the entire region before
attempting to break the get into multiple rdma fragments

A little background. Historically ob1 always registered the entire memory
region when the RGET protocol was in use. This changed when Mellanox
added support to fragment RGET using the btl_prepare_dst function. Now
that the BTL layer has changed to split out the limits of get/put there
is explicit fragmentation code in ob1. Before this commit the registration
was still done per RGET fragment.

This commit will attempt to register the entire region before creating
RGET fragments. If the registration is successfull then all RGET
fragments will use this registration otherwise they will each attempt
to register their own segment of the receive buffer. If that fails
enough times each fragment will give up and fall back on send/recv.
2014-11-19 11:33:02 -07:00
Nathan Hjelm
b75bb8aea7 Update pml for btl changes 2014-11-19 11:33:02 -07:00
Nathan Hjelm
66bd698eaf Update BML for BTL interface changes 2014-11-19 11:33:02 -07:00
Andrew Friedley
b97cda7fd9 PSM MTL: Don't connect procs already connected
PSM has issues when trying calling psm_ep_connect() more than once for a
specific peer.  Use the psm_ep_connect mask argument to avoid connecting
to processes that are already connected.

OMPI ticket #268.
2014-11-12 15:52:02 -08:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
Gilles Gouaillardet
df6115aac4 topo/base: fix uninitialized variable
this commit fixes a bug introduced by commit open-mpi/ompi@e7c59e3adb
2014-11-10 13:06:50 +09:00
bosilca
e7c59e3adb Merge pull request #227 from ggouaillardet/rfc/coll_basic_neighbor
RFC/coll basic neighbor
2014-11-07 11:33:25 -05:00
Gilles Gouaillardet
64c18686b7 fix ompi_request_wait vs ompi_request_wait_all and
MPI_STATUS_IGNORE vs MPI_STATUSES_IGNORE
2014-11-04 12:16:30 +09:00
Andrew Friedley
273135dbc7 Don't open PSM context when run on single node
When running many ranks on a single node using PSM, it's possible to
exhaust the network hardware contexts (there are 16).  This patch checks
if only a single node is being used. If so, the 'ipath' component of PSM
is disabled and no hardware contexts are opened.
2014-11-03 07:18:16 -08:00
Ralph Castain
616f0894ce Add missing parens on values being passed to OPAL_THREAD_ADD32 2014-10-31 19:11:48 -07:00
Jeff Squyres
7a5b2e9b13 ob1: change an OPAL_UNLIKELY to OPAL_LIKELY
Per
924d39e415 (commitcomment-8378266),
this OPAN_UNLIKELY should really be OPAL_LIKELY.
2014-10-31 03:22:55 -07:00
Nathan Hjelm
672d96704c osc/rdma: fix regression introduced by eed7b45db59b356f6257f3b019e77643f14f84ce
The ompi_osc_signal_outgoing was moved from ompi_osc_rdma_frag_start to frag_send
which gave correct results for the bug reproducer but hangs with simple OSC
tests. Moved the ompi_osc_signal_outgoing back and it now passes all tests.

Closes #256
2014-10-30 23:16:11 -06:00
George Bosilca
924d39e415 Always OBJ_DESTRUCT the send request. 2014-10-30 01:28:50 -04:00
Gilles Gouaillardet
ed93c8787d ob1: add a destructor to mca_pml_ob1_recv_request_t
opal_mutex_t must be OBJ_DESTRUCTed in order to avoid
a memory leak (pthread_mutex_init allocates memory under
Cygwin, so pthread_mutex_destroy is mandatory)

Thanks to Marco Atzeri for reporting this issue
2014-10-29 13:30:29 +09:00
Nathan Hjelm
23dd3af946 osc/rdma: use unsigned types for all counters
Some of the counters used by the "rdma" one-sided component are intended
to overflow. Since overflow behavior is undefined for signed integers in
C it is safer to use unsigned integers here.
2014-10-22 15:36:15 -06:00
Jeff Squyres
c22e1ae33b configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros
These two macros set the prefix for the OPAL and ORTE libraries,
respectively.  Specifically, the OPAL library will be named
libPREFIXopen-pal.la and the ORTE library will be named
libPREFIXopen-rte.la.

These macros must be called, even if the prefix argument is empty.

The intent is that Open MPI will call these macros with an empty
prefix, but other projects (such as ORCM) will call these macros with
a non-empty prefix.  For example, ORCM libraries can be named
liborcm-open-pal.la and liborcm-open-rte.la.

This scheme is necessary to allow running Open MPI applications under
systems that use their own versions of ORTE and OPAL.  For example,
when running MPI applications under ORTE, if the ORTE and OPAL
libraries between OMPI and ORCM are not identical (which, because they
are released at different times, are likely to be different), we need
to ensure that the OMPI applications link against their ORTE and OPAL
libraries, but the ORCM executables link against their ORTE and OPAL
libraries.
2014-10-22 10:32:19 -07:00
Jeff Squyres
01fd96bfa5 Revert "Provide a mechanism by which an upstream project can rename
the OPAL and ORTE libraries. This is required by projects such as ORCM
that have their own ORTE and OPAL libraries in order to avoid library
confusion. By renaming their version of the libraries, the OMPI
applications can correctly dynamically load the correct one for their
build."

This reverts commit 63f619f8719fb853d76130d667f228b0a523bd60.
2014-10-22 10:32:11 -07:00
yosefe
b4f569b4d4 yalla: address comments on #246 by @jsquires 2014-10-22 10:42:56 +03:00
yosefe
ce7c748e51 Add new PML yalla, which uses mxm directly to reduce overhead.
http://starwars.wikia.com/wiki/Ubed_Yalla
2014-10-21 16:08:24 +03:00
Nadezhda Kogteva
2bce929330 MTL MXM cleanup: unnecessary OMPI_MTL_MXM_CONNECT_ON_FIRST_COMM variable removed 2014-10-20 10:29:47 +03:00
bosilca
d819939841 Merge pull request #233 from ggouaillardet/rfc/coll_module_disable
Provide a symmetric behavior for the activation/deactivation of collective modules.
2014-10-16 09:22:04 -04:00
George Bosilca
7541c03b4c Mark all instances where atomic operations are used but their return value is unnecessary 2014-10-15 21:47:32 -04:00
Edgar Gabriel
0219c87039 set the fs_ptr to NULL in case of an error, to avoid a malicious free on file_close. 2014-10-14 13:09:06 -05:00
Gilles Gouaillardet
e3f74aca1c Correctly mote the pointer back by the true_lb.
Fixes #231
2014-10-14 16:26:54 +09:00
Gilles Gouaillardet
0f983d5a4f add a disable function for coll module 2014-10-14 14:46:36 +09:00
Devendar Bureddy
7a6b4c36b0 HCOLL: Update the proc structure dereference
Update the proc structure dereference to reflect the new opal_proc_t
super field
2014-10-13 20:49:19 +03:00
Devendar Bureddy
b8d2a15be9 HCOLL: by default off 2014-10-13 20:49:09 +03:00
Vasily Filipov
a215a4831d MTL/MXM: disable "bulk_connect" by default. 2014-10-13 09:47:56 +03:00
Ralph Castain
63f619f871 Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build. 2014-10-10 11:39:08 -07:00
Gilles Gouaillardet
8eb2d62919 coll/sm: fix an other memory leak 2014-10-10 19:54:45 +09:00
Gilles Gouaillardet
27e4389259 * comment on communicator creation in mca_topo_base_dist_graph_create(...)
* use accesors to retrieve topo info
2014-10-10 16:07:20 +09:00
Gilles Gouaillardet
5d44a30111 coll/sm: fix minor memory leaks
port 4488.1.patch attached in #196 to master
2014-10-10 14:21:34 +09:00
Gilles Gouaillardet
76204dfafe coll/basic: fix segmentation fault in neighborhood collectives if the degree
of the topology is higher than the communicator size

It is possible to have a topology degree higher than the size of the communicator.
For example, a periodic cartesian communicator on MPI_COMM_SELF. This will leave
the neighborhood collectives with a request buffer that is too small.

This commits introduces a semantic change :
from now, c_topo must be set before invoking coll_select
2014-10-10 11:56:04 +09:00
Gilles Gouaillardet
2f67f29b85 Revert "coll/basic: fix segmentation fault in neighborhood collectives if the degree"
This reverts commit 9c788ff9400960823637dc0eebb3a8640414fa05.
2014-10-10 11:29:06 +09:00
Elena
c905fe9b78 pmix: removed pmix_base_direct modex mca parameter, renamed orte_full_modex_cutoff and ompi_hostname_cutoff to direct_modex_cutoff 2014-10-09 06:15:31 +02:00