1
1
Граф коммитов

5358 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
e0927895db Grrr...how many files did they forget? 2015-01-06 19:40:18 -08:00
Ralph Castain
84c41429e9 Add missing file 2015-01-06 18:41:11 -08:00
Nathan Hjelm
e68ed2876c osc/pt2pt: threading fixes and code cleanup 2015-01-06 13:39:16 -07:00
Nathan Hjelm
9eba7b9d35 Rename the OSC "rdma" component to pt2p to better reflect that it does not actually use btl rdma 2015-01-06 13:38:55 -07:00
Howard Pritchard
ec632001b1 Merge pull request #329 from ggouaillardet/topic/romio_refresh
refresh ROMIO based on v3.2a2-84-gef1cf14
2015-01-06 10:27:20 -07:00
Gilles Gouaillardet
0914de9eae refresh ROMIO based on v3.2a2-84-gef1cf14 2015-01-06 19:43:58 +09:00
Yohann Burette
f01dd429df Reset pointer to NULL to prevent double-freeing. 2015-01-05 17:01:37 -08:00
Yohann Burette
1e24da90fe Fix fi_av_insert return code test. 2015-01-05 17:01:37 -08:00
Yohann Burette
5944c294ad Add return code testing for fi_mr_reg. 2015-01-05 17:01:37 -08:00
Howard Pritchard
c857cc926c Merge pull request #327 from hppritcha/topic/async_progress
Topic/async progress
2015-01-05 16:20:44 -07:00
Devendar Bureddy
e732152304 HCOLL: Fix hcoll supported datatype checks corretcly 2015-01-02 21:18:12 +02:00
Howard Pritchard
3fc7b389ff initial async progress changes for gni 2014-12-24 11:50:23 -07:00
Gilles Gouaillardet
b9349d2eb9 coll/libnbc: fix MPI_Ireduce_scatter for single task communicator
when MPI_IN_PLACE is not used.

that commit fixes a regression introduced
open-mpi/ompi@49e79a9ade
2014-12-24 12:12:58 +09:00
Devendar Bureddy
e398ad6619 HCOLL: Fix OMPI to HCOLL predefined datatypes, Ops mapping 2014-12-23 22:30:29 +02:00
Jeff Squyres
c621d1e622 libfabric: don't LIBADD the common library in the static case
Adding the libfabric common library in the --disable-dlopen case will
result in duplicate symbols.
2014-12-18 11:04:08 -08:00
Rolf vandeVaart
3ec9685ee0 Add missing file to sources. Without this, tarball build does not work 2014-12-18 07:17:28 -08:00
Jeff Squyres
d6f059f538 configury: add some descriptive output messages in configure
Ensure that the ofi MTL and the usnic BTL have good descriptive output
messages in configure.
2014-12-17 13:36:01 -08:00
Jeff Squyres
4dcb92ab0b ofi: remove use of non-existent macros 2014-12-17 13:36:01 -08:00
Jeff Squyres
f3be0a5882 ofi: ensure that null_addr is initialized to NULL
And when null_addr is freed, set it back to NULL so that we don't try
to free it again in the error: label.
2014-12-16 17:32:15 -08:00
Jeff Squyres
8c7b6d266e ofi: add "unused" attribute to rc to prevent compiler warning 2014-12-16 17:30:46 -08:00
Yohann Burette
58a7a1e4ac Adding an Open Fabrics Interfaces (OFI) MTL.
This MTL implementation uses the OFIWG libfabric's tag messaging capabilities.
2014-12-16 15:43:39 -08:00
Mangala Jyothi Bhaskar
68d78fd718 Aggregator selection logic Part 2 and reorganized Part1 2014-12-16 15:48:40 -06:00
Mangala Jyothi Bhaskar
2bd52cc410 Initialize req variable to fix a warning 2014-12-16 13:24:28 -06:00
Edgar Gabriel
7e41e0e62b fix a segfault in the two-phase I/O algorithm for fileviews of 0 byte size. 2014-12-01 15:59:00 -06:00
yosefe
3f152733bf Add yalla to the list of default PMLs 2014-12-01 13:11:28 +02:00
Edgar Gabriel
0758d7570e part 1 of the fix to get rid of the missing symbols that prevent the sub-modules to be loaded. 2014-11-29 20:01:36 -06:00
Ralph Castain
48f702827e First part of memory leak cleanups from Gilles 2014-11-24 16:53:33 -08:00
George Bosilca
fb6ecdfd18 Fix few typos. 2014-11-24 01:48:09 -05:00
Andrew Friedley
e7bcad0c13 Remove unused variable.
Reported by @adrianreber, this patch removes an unused variable in the
PSM MTL, silencing a compiler warning.
2014-11-21 07:51:44 -08:00
George Bosilca
d622db783d Based on https://github.com/open-mpi/ompi/pull/262, we should use
true_lb while computing the lower bound.
2014-11-21 19:16:05 +09:00
Gilles Gouaillardet
705147e98b coll/tuned: fix allgather bruck algorithm 2014-11-21 19:16:05 +09:00
Nathan Hjelm
1b564f62bd Revert "Merge pull request #275 from hjelmn/btlmod"
This reverts commit ccaecf0fd6, reversing
changes made to 6a19bf85dd.
2014-11-19 23:22:43 -07:00
Nathan Hjelm
0d413fb73f Revert "Remove stale file reference"
This reverts commit 4c8fa17234.
2014-11-19 23:16:16 -07:00
Ralph Castain
4c8fa17234 Remove stale file reference 2014-11-19 18:32:19 -08:00
Nathan Hjelm
5a0a48c3c4 osc: remove lingering rdma component files 2014-11-19 12:11:54 -07:00
Nathan Hjelm
1a5349ec79 ompi ignore bfo until it is updated for new btl interface 2014-11-19 11:33:04 -07:00
Nathan Hjelm
8f1a44e60e bml/r2: add all rdma btls even if another btl has higher exclusivity
Background: In order to support atomics each btl needs to provide support
for communicating with self unless the btl module can guarantee global
atomicity. Before this commit bml/r2 discarded any BTL with lower
exclusivity than an existing send btl. This would cause the BML to
discard any btl other than self.

The new behavior is as follows:

 - If an exisiting send btl has higher exclusivity then the btl will not be
   added to the send btl list for the endpoint.

 - If a btl provides RDMA support then it is always added to the rdma btl
   list.

 - bml_btl weight for send btls is now calculated across all send btls.

 - bml_btl weight for rdma btls is now calculated across all rdma btls.

With this change self should still win as the only send btl for loopback
without disqualifying other btls (ugni, openib) for atomic operations.
2014-11-19 11:33:04 -07:00
Nathan Hjelm
22625b005b osc/pt2pt: threading fixes and code cleanup 2014-11-19 11:33:04 -07:00
Nathan Hjelm
0110603782 ob1 warning fix 2014-11-19 11:33:04 -07:00
Nathan Hjelm
45d1fac8af ugni thread safety fixes 2014-11-19 11:33:03 -07:00
Nathan Hjelm
29e4e1c90a Rename the OSC "rdma" component to pt2p to better reflect that it does not actually use btl rdma 2014-11-19 11:33:03 -07:00
Nathan Hjelm
24427639b6 Fix ob1 warnings 2014-11-19 11:33:03 -07:00
Nathan Hjelm
271818f887 pml/ob1: bug fixes and adjustments for changes in btl_sendi behavior 2014-11-19 11:33:03 -07:00
Nathan Hjelm
ee2b111011 Update PML for latest BTL update 2014-11-19 11:33:02 -07:00
Nathan Hjelm
49ff5a79d0 Update BML for the latest BTL update 2014-11-19 11:33:02 -07:00
Nathan Hjelm
c61e017177 pml: updates to reflect member changes in mca_btl_base_descriptor_t
and mca_btl_base_module_t structures
2014-11-19 11:33:02 -07:00
Nathan Hjelm
5936411a07 pml/ob1: when using btl_get try to register the entire region before
attempting to break the get into multiple rdma fragments

A little background. Historically ob1 always registered the entire memory
region when the RGET protocol was in use. This changed when Mellanox
added support to fragment RGET using the btl_prepare_dst function. Now
that the BTL layer has changed to split out the limits of get/put there
is explicit fragmentation code in ob1. Before this commit the registration
was still done per RGET fragment.

This commit will attempt to register the entire region before creating
RGET fragments. If the registration is successfull then all RGET
fragments will use this registration otherwise they will each attempt
to register their own segment of the receive buffer. If that fails
enough times each fragment will give up and fall back on send/recv.
2014-11-19 11:33:02 -07:00
Nathan Hjelm
b75bb8aea7 Update pml for btl changes 2014-11-19 11:33:02 -07:00
Nathan Hjelm
66bd698eaf Update BML for BTL interface changes 2014-11-19 11:33:02 -07:00
Andrew Friedley
b97cda7fd9 PSM MTL: Don't connect procs already connected
PSM has issues when trying calling psm_ep_connect() more than once for a
specific peer.  Use the psm_ep_connect mask argument to avoid connecting
to processes that are already connected.

OMPI ticket #268.
2014-11-12 15:52:02 -08:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
Gilles Gouaillardet
df6115aac4 topo/base: fix uninitialized variable
this commit fixes a bug introduced by commit open-mpi/ompi@e7c59e3adb
2014-11-10 13:06:50 +09:00
bosilca
e7c59e3adb Merge pull request #227 from ggouaillardet/rfc/coll_basic_neighbor
RFC/coll basic neighbor
2014-11-07 11:33:25 -05:00
Gilles Gouaillardet
64c18686b7 fix ompi_request_wait vs ompi_request_wait_all and
MPI_STATUS_IGNORE vs MPI_STATUSES_IGNORE
2014-11-04 12:16:30 +09:00
Andrew Friedley
273135dbc7 Don't open PSM context when run on single node
When running many ranks on a single node using PSM, it's possible to
exhaust the network hardware contexts (there are 16).  This patch checks
if only a single node is being used. If so, the 'ipath' component of PSM
is disabled and no hardware contexts are opened.
2014-11-03 07:18:16 -08:00
Ralph Castain
616f0894ce Add missing parens on values being passed to OPAL_THREAD_ADD32 2014-10-31 19:11:48 -07:00
Jeff Squyres
7a5b2e9b13 ob1: change an OPAL_UNLIKELY to OPAL_LIKELY
Per
924d39e415 (commitcomment-8378266),
this OPAN_UNLIKELY should really be OPAL_LIKELY.
2014-10-31 03:22:55 -07:00
Nathan Hjelm
672d96704c osc/rdma: fix regression introduced by eed7b45db5
The ompi_osc_signal_outgoing was moved from ompi_osc_rdma_frag_start to frag_send
which gave correct results for the bug reproducer but hangs with simple OSC
tests. Moved the ompi_osc_signal_outgoing back and it now passes all tests.

Closes #256
2014-10-30 23:16:11 -06:00
George Bosilca
924d39e415 Always OBJ_DESTRUCT the send request. 2014-10-30 01:28:50 -04:00
Gilles Gouaillardet
ed93c8787d ob1: add a destructor to mca_pml_ob1_recv_request_t
opal_mutex_t must be OBJ_DESTRUCTed in order to avoid
a memory leak (pthread_mutex_init allocates memory under
Cygwin, so pthread_mutex_destroy is mandatory)

Thanks to Marco Atzeri for reporting this issue
2014-10-29 13:30:29 +09:00
Nathan Hjelm
23dd3af946 osc/rdma: use unsigned types for all counters
Some of the counters used by the "rdma" one-sided component are intended
to overflow. Since overflow behavior is undefined for signed integers in
C it is safer to use unsigned integers here.
2014-10-22 15:36:15 -06:00
Jeff Squyres
c22e1ae33b configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros
These two macros set the prefix for the OPAL and ORTE libraries,
respectively.  Specifically, the OPAL library will be named
libPREFIXopen-pal.la and the ORTE library will be named
libPREFIXopen-rte.la.

These macros must be called, even if the prefix argument is empty.

The intent is that Open MPI will call these macros with an empty
prefix, but other projects (such as ORCM) will call these macros with
a non-empty prefix.  For example, ORCM libraries can be named
liborcm-open-pal.la and liborcm-open-rte.la.

This scheme is necessary to allow running Open MPI applications under
systems that use their own versions of ORTE and OPAL.  For example,
when running MPI applications under ORTE, if the ORTE and OPAL
libraries between OMPI and ORCM are not identical (which, because they
are released at different times, are likely to be different), we need
to ensure that the OMPI applications link against their ORTE and OPAL
libraries, but the ORCM executables link against their ORTE and OPAL
libraries.
2014-10-22 10:32:19 -07:00
Jeff Squyres
01fd96bfa5 Revert "Provide a mechanism by which an upstream project can rename
the OPAL and ORTE libraries. This is required by projects such as ORCM
that have their own ORTE and OPAL libraries in order to avoid library
confusion. By renaming their version of the libraries, the OMPI
applications can correctly dynamically load the correct one for their
build."

This reverts commit 63f619f871.
2014-10-22 10:32:11 -07:00
yosefe
b4f569b4d4 yalla: address comments on #246 by @jsquires 2014-10-22 10:42:56 +03:00
yosefe
ce7c748e51 Add new PML yalla, which uses mxm directly to reduce overhead.
http://starwars.wikia.com/wiki/Ubed_Yalla
2014-10-21 16:08:24 +03:00
Nadezhda Kogteva
2bce929330 MTL MXM cleanup: unnecessary OMPI_MTL_MXM_CONNECT_ON_FIRST_COMM variable removed 2014-10-20 10:29:47 +03:00
bosilca
d819939841 Merge pull request #233 from ggouaillardet/rfc/coll_module_disable
Provide a symmetric behavior for the activation/deactivation of collective modules.
2014-10-16 09:22:04 -04:00
George Bosilca
7541c03b4c Mark all instances where atomic operations are used but their return value is unnecessary 2014-10-15 21:47:32 -04:00
Edgar Gabriel
0219c87039 set the fs_ptr to NULL in case of an error, to avoid a malicious free on file_close. 2014-10-14 13:09:06 -05:00
Gilles Gouaillardet
e3f74aca1c Correctly mote the pointer back by the true_lb.
Fixes #231
2014-10-14 16:26:54 +09:00
Gilles Gouaillardet
0f983d5a4f add a disable function for coll module 2014-10-14 14:46:36 +09:00
Devendar Bureddy
7a6b4c36b0 HCOLL: Update the proc structure dereference
Update the proc structure dereference to reflect the new opal_proc_t
super field
2014-10-13 20:49:19 +03:00
Devendar Bureddy
b8d2a15be9 HCOLL: by default off 2014-10-13 20:49:09 +03:00
Vasily Filipov
a215a4831d MTL/MXM: disable "bulk_connect" by default. 2014-10-13 09:47:56 +03:00
Ralph Castain
63f619f871 Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build. 2014-10-10 11:39:08 -07:00
Gilles Gouaillardet
8eb2d62919 coll/sm: fix an other memory leak 2014-10-10 19:54:45 +09:00
Gilles Gouaillardet
27e4389259 * comment on communicator creation in mca_topo_base_dist_graph_create(...)
* use accesors to retrieve topo info
2014-10-10 16:07:20 +09:00
Gilles Gouaillardet
5d44a30111 coll/sm: fix minor memory leaks
port 4488.1.patch attached in #196 to master
2014-10-10 14:21:34 +09:00
Gilles Gouaillardet
76204dfafe coll/basic: fix segmentation fault in neighborhood collectives if the degree
of the topology is higher than the communicator size

It is possible to have a topology degree higher than the size of the communicator.
For example, a periodic cartesian communicator on MPI_COMM_SELF. This will leave
the neighborhood collectives with a request buffer that is too small.

This commits introduces a semantic change :
from now, c_topo must be set before invoking coll_select
2014-10-10 11:56:04 +09:00
Gilles Gouaillardet
2f67f29b85 Revert "coll/basic: fix segmentation fault in neighborhood collectives if the degree"
This reverts commit 9c788ff940.
2014-10-10 11:29:06 +09:00
Elena
c905fe9b78 pmix: removed pmix_base_direct modex mca parameter, renamed orte_full_modex_cutoff and ompi_hostname_cutoff to direct_modex_cutoff 2014-10-09 06:15:31 +02:00
Nathan Hjelm
eed7b45db5 osc/rdma: fix issue identified by Berk Hess
osc/rdma uses counters to determine if all messages have been received
before exiting synchronization calls. The problem is that the active
target counter is always increasing (never zeroed). If over 2^31-1
messages are sent this causes the counter to overflow (in itself this
isn't an error). This causes test/wait to return before the communication
is complete. There is an additional error in the use of the fragment
flush function. If PSCW synchronization is in use this function CAN NOT
be called unless a post message has arrived.

Relevant mailing list thread: http://www.open-mpi.org/community/lists/devel/2014/10/16016.php

This commit fixes both issues. Tested against MTT and issue reproducer.

Closes #224.
2014-10-07 11:45:22 -06:00
Jeff Squyres
8468424f45 distscript: remove configure.params and autogen.subdirs kruft
Remove configure.params support: configure.params hasn't been used in
years.

Also remove autogen.subdirs support; those should really be handled by
their respective Makefile.am's.
2014-10-02 11:32:54 -07:00
Howard Pritchard
bb65835816 Fix iallgather problem with intercommunicators
A problem was found with the libnbc MPI_Iallgather
routine when using intercommunicators.  Special
thanks to Takahiro Kawashima(Fujitsu) for the patch
and a test case.  Verified master fails without the
patch and the test passes with the patch applied.

fixes #219
2014-10-02 11:45:17 -06:00
Jeff Squyres
72704441a2 URLs: update URLs for GitHub 2014-10-01 14:44:09 -07:00
Howard Pritchard
0f74467264 switch to ompi_mpi_thread_provided for ts check
Use ompi_mpi_thread_provided rather than opal_using_threads macro
to check whether MPI_THREAD_MULTIPLE is being used.

This commit was SVN r32815.
2014-09-29 22:20:35 +00:00
Howard Pritchard
7069f2361a disqualify coll ml for MPI_THREAD_MULTIPLE
This commit was SVN r32814.
2014-09-29 21:02:15 +00:00
Ralph Castain
eb95d6f892 ompi_info_get_bool returns "success" if the value isn't found, setting "flag" to false, but doesn't set the value of the param itself. So if you don't specify "blocking_fence" in MPI_Info, then the "blocking_fence" flag wasn't being set.
Initialize the blocking_fence flag to false as the code logic indicates that it should only be set if someone provides that flag.

Thanks to Lisandro Dalcin for reporting it

cmr=v1.8.4:reviewer=hjelmn

This commit was SVN r32812.
2014-09-29 17:21:28 +00:00
George Bosilca
49e79a9ade Fix the case of a single process.
This commit was SVN r32807.
2014-09-28 22:06:39 +00:00
Nathan Hjelm
9c788ff940 coll/basic: fix segmentation fault in neighborhood collectives if the degree
of the topology is higher than the communicator size

It is possible to have a topology degree higher than the size of the communicator.
For example, a periodic cartesian communicator on MPI_COMM_SELF. This will leave
the neighborhood collectives with a request buffer that is too small. This commit
adds a call that will dynamically increase the size of the request buffer if it
is too small.

A better fix would be to create the topology *before* calling the coll_select
routine on a communicator. This will take some discussion and the solution will
not likely be ready anytime soon.

Thanks to Lisandro Dalcin for reporting this.

Original thread: http://www.open-mpi.org/community/lists/devel/2014/08/15713.php

cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32796.
2014-09-25 17:43:29 +00:00
Edgar Gabriel
05c34946f7 implementation of non-blocking read/write operations through aio
functions for the posix module. Som interface changes for the fbtl were
necessary for that.

This commit was SVN r32777.
2014-09-23 21:27:57 +00:00
Rolf vandeVaart
5c73101a72 Fix typo.
This commit was SVN r32755.
2014-09-18 13:58:54 +00:00
Vasily Filipov
c7c63fe73e COLL/TUNED: alltoall - return previous default values of algorithm choosing decision thresholds (were changed by r32735)
reviewed by miked
    cmr=v1.8.3:reviewer=ompi-rm1.8

This commit was SVN r32753.

The following SVN revision numbers were found above:
  r32735 --> open-mpi/ompi@5fecf65daf
2014-09-18 08:07:51 +00:00
Rolf vandeVaart
8db1f89dd1 Small change to allow CUDA-aware to work with non-reduction nonblocking collectives.
Only used when CUDA-aware feature compiled in.

This commit was SVN r32750.
2014-09-17 16:55:01 +00:00
Vasily Filipov
ff10b25e7d warnings (caused by commit r32735) fix.
reviewed by miked
    cmr=v1.8.3:reviewer=ompi-rm1.8 

This commit was SVN r32740.

The following SVN revision numbers were found above:
  r32735 --> open-mpi/ompi@5fecf65daf
2014-09-16 06:33:49 +00:00
Vasily Filipov
5fecf65daf OMPI/COLL/Tuned: add command line params for thresholds to decide if small/intermediate MSGs alltoall algorithm will be used.
cmr=v1.8.3:reviewer=miked

This commit was SVN r32735.
2014-09-15 12:34:21 +00:00
Mangala Jyothi Bhaskar
dc05b709a7 it is ok to not have a sharedfp component selected, as long as no
sharedfp functionality is being used. Return an error however if no
sharedfp component is selected and the applications calls a
file_read/write_shared function.

This commit was SVN r32718.
2014-09-12 21:15:58 +00:00
Edgar Gabriel
597177cd8b silence a warning regarding the return value of the fbtl's.
This commit was SVN r32717.
2014-09-12 18:01:30 +00:00
Mangala Jyothi Bhaskar
cd78a3a026 Fixed offset data type in communication
This commit was SVN r32710.
2014-09-11 14:51:30 +00:00
Mangala Jyothi Bhaskar
4ff21d6178 Fixed offset data type in communication
This commit was SVN r32709.
2014-09-11 14:51:07 +00:00
Mangala Jyothi Bhaskar
6e5f2c8ae8 Fixed offset data type in communication
This commit was SVN r32708.
2014-09-11 14:50:30 +00:00
Edgar Gabriel
4ccc0f5ea2 the length of the iov array should be limited to IOV_MAX, which is defined in limits.h
This commit was SVN r32706.
2014-09-10 21:59:45 +00:00
Edgar Gabriel
cc46b65a5e the fbtl interfaces should really be an ssize_t not a size_t, since the return
value could be negative, which is allowed for ssize_t, but not for size_t.

This commit was SVN r32700.
2014-09-10 15:01:54 +00:00
Edgar Gabriel
599cb7b351 update the pvfs2 fbtl to return the number of bytes generated.
This commit was SVN r32699.
2014-09-10 13:32:06 +00:00
Edgar Gabriel
3a5f4f72da make the zero byte read/write scenarios work without the contiguous flag.
This commit was SVN r32690.
2014-09-09 16:26:14 +00:00
Edgar Gabriel
6a607caed8 fix some zero byte allocation scenarios.
This commit was SVN r32689.
2014-09-09 16:25:44 +00:00
Edgar Gabriel
ed02927767 - do not set the contiguous memory option in the collective operations. It
should not be stored on the file handle anyway, since it is not a property of
the file.
- protect a realloc for zero byte scenarios.

This commit was SVN r32678.
2014-09-07 18:09:43 +00:00
Edgar Gabriel
0d425e2f74 resetting the counter for the iov array has to happen outside of the if statement.
This commit was SVN r32677.
2014-09-07 16:30:56 +00:00
Edgar Gabriel
0f59ce6591 use the fbtl return value as originally intended, namely to retrieve the
number of bytes written and read. Status contains now the actual number of
bytes written for individual operations. For collective operations, this is
unfortunately not possible.

This commit was SVN r32674.
2014-09-07 15:14:57 +00:00
Ralph Castain
41c6058153 Bring over changes to MXM from pmix branch:
MTL MXM: establish endpoint connection on the first communication when direct_modex used

This commit was SVN r32668.
2014-09-03 18:22:11 +00:00
Gilles Gouaillardet
edfbeba7bf coll/ml: better error handling
when CHECK_AND_RECYCLE detects an error, a message is displayed
if the error occurs on an intrinsic communicator, then abort
the program (instead of trying to free the communicator)

cmr=v1.8.3:reviewer=hjelmn

This commit was SVN r32659.
2014-09-01 10:00:49 +00:00
Ralph Castain
b554cd7d86 Turn off the coll/ml component if --without-hwloc was given
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32621.
2014-08-27 20:25:39 +00:00
Edgar Gabriel
46de730059 fix a typo
This commit was SVN r32603.
2014-08-25 20:53:19 +00:00
Edgar Gabriel
52eac0146d cleanup of the fbtl interfaces: remove the *sorted optimization flag, since it
was not used anyway in the last two years. Simplifies the code significantly.

This commit was SVN r32602.
2014-08-25 18:04:24 +00:00
Todd Kordenbrock
6a3225d800 Fix invalid symbols left by the PMIx merge.
This commit was SVN r32597.
2014-08-25 16:30:26 +00:00
Ralph Castain
ac0c584eb7 Add missing file
This commit was SVN r32588.
2014-08-23 04:31:35 +00:00
Ralph Castain
b1a7375192 Fix the "unreachable" message so it outputs the correct hostname for the remote proc. Cleanup some of the pmix stuff when running corner cases of errors
This commit was SVN r32584.
2014-08-22 19:20:45 +00:00
Vishwanath Venkatesan
b176787d0f Remove unwanted spaces + Test commit
This commit was SVN r32576.
2014-08-22 05:11:17 +00:00
Edgar Gabriel
9987135da0 add initial support for non-blocking read and write operations.
This commit was SVN r32571.
2014-08-22 01:34:19 +00:00
Ralph Castain
aec5cd08bd Per the PMIx RFC:
WHAT:    Merge the PMIx branch into the devel repo, creating a new
               OPAL “lmix” framework to abstract PMI support for all RTEs.
               Replace the ORTE daemon-level collectives with a new PMIx
               server and update the ORTE grpcomm framework to support
               server-to-server collectives

WHY:      We’ve had problems dealing with variations in PMI implementations,
               and need to extend the existing PMI definitions to meet exascale
               requirements.

WHEN:   Mon, Aug 25

WHERE:  https://github.com/rhc54/ompi-svn-mirror.git

Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.

All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.

Accordingly, we have:

* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.

* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.

* Replaced the current global collective id with a signature based on the names of the participating procs. The allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint

* removed the prior OMPI/OPAL modex code

* added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform.

* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand

This commit was SVN r32570.
2014-08-21 18:56:47 +00:00
Mangala Jyothi Bhaskar
a5973c3f8c revamp of the aggregator selection logic, part 1.
This commit was SVN r32557.
2014-08-20 19:28:04 +00:00
Rolf vandeVaart
8709071819 Fix missing help file.
This commit was SVN r32550.
2014-08-18 21:52:31 +00:00
Edgar Gabriel
fabad95b8e - extend the explicit offset patch to collective explicit offset operations as
well
- minor restructuring to support the shared file pointer operations correctly
  for explicit offsets

This commit was SVN r32538.
2014-08-15 14:03:29 +00:00
Edgar Gabriel
d773dc8aa5 make arbitrary sequences of explicit and implicit offset operations work properly.
This commit was SVN r32537.
2014-08-15 01:49:43 +00:00
Edgar Gabriel
da1b6c2e87 some code reorganization in preparation for non-blocking read and write
operations.

This commit was SVN r32534.
2014-08-14 20:17:58 +00:00
Alina Sklarevich
a914c68356 MTL MXM: fix check-help-string.pl errors and warnings.
This commit was SVN r32533.
2014-08-14 07:46:56 +00:00
Edgar Gabriel
e401b68ca5 fix the zero byte fileview problem reported by Mohamad on the mailinglist
This commit was SVN r32529.
2014-08-13 23:44:43 +00:00
Vasily Filipov
5ca2fffa44 MTL/MXM: call for ompi_proc_world instead of ompi_comm_size during del_procs.
This commit was SVN r32504.
2014-08-11 11:52:23 +00:00
Gilles Gouaillardet
cf9e144f05 silence warnings
gcc 3.4.3 on solaris 10 issues some warnings

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32500.
2014-08-11 07:36:46 +00:00
Gilles Gouaillardet
22cb8a1834 check-help-strings cleanup
This commit was SVN r32497.
2014-08-11 03:27:45 +00:00
Gilles Gouaillardet
f24699623f check-help-strings cleanup
This commit was SVN r32495.
2014-08-11 03:25:22 +00:00
Gilles Gouaillardet
b565e69b86 check-help-strings cleanup
This commit was SVN r32491.
2014-08-11 03:19:57 +00:00
Mike Dubman
3c8a4d7d2d mxm: opal refactoring voices
http://www.open-mpi.org/community/lists/devel/2014/08/15590.php

This commit was SVN r32486.
2014-08-10 04:35:56 +00:00
Mike Dubman
0f60c34a9f fca: adopt opal API refactoring, fix warning.
based on http://www.open-mpi.org/community/lists/devel/2014/08/15558.php

This commit was SVN r32484.
2014-08-09 15:50:51 +00:00
Ralph Castain
e95187514c Update the proc structure dereference to reflect the new opal_proc_t super field
This commit was SVN r32462.
2014-08-08 16:12:49 +00:00
Jeff Squyres
132375f07f helpfiles: fix filenames referenced by calls to show_help()
This commit was SVN r32453.
2014-08-08 13:34:15 +00:00
Jeff Squyres
80a7309462 helpfiles: remove empty helpfiles
This commit was SVN r32452.
2014-08-08 13:33:47 +00:00
Jeff Squyres
c6d9bf906e configury: ensure wrapper static LIBS is filled properly
In core library portions of the configury (e.g., top-level
configure.ac itself), we were calling AC_CHECK_LIB and
OPAL_CHECK_FUNC_LIB to check for various libraries.

'''SIDENOTE:''' It turns out that modern Autoconf has AC_SEARCH_LIBS,
which does just about exactly what OPAL_CHECK_FUNC_LIB does.  So this
commit effectively replaces OPAL_CHECK_FUNC_LIB with AC_SEARCH_LIBS.

However, we never bothered to add these found libraries to the wrapper
compiler list of libraries used for static linking (doh!).  We've been
getting lucky for quite a while that components were adding the same
libraries to their wrapper compiler LIBS list.  

This is problematic, however, if we don't build some of these
components.  For example, Paul Hargrove noticed that if he configured
with --disable-shared --enable-static --disable-io-romio, ROMIO was no
longer adding some libraries to the wrapper LIBS list -- libraries
that just happened to also be needed by core OPAL/ORTE/OMPI layers.

The solution is not to use AC_CHECK_LIB or OPAL_CHECK_FUNC_LIB, but
use a pair of new macros:

 * OPAL_SEARCH_LIBS_CORE: a wrapper around AC_SEARCH_LIBS.  If we add
   something to $LIBS, then also add it to the wrapper list of static
   libraries.  This is the main piece of functionality that was
   wrong/missing.
 * OPAL_SEARCH_LIBS_COMPONENT: similar to OPAL_SEARCH_LIBS_CORE, but
   instead of directly adding it to the wrapper list of static
   libaries, add it to <framework>_<component>_LIBS (which eventually
   gets slurped up into the wrapper list of static libraries.  See the
   lengthy comment in config/opal_setup_wrappers.m4 near the beginning
   of OPAL_SETUP_WRAPPER_INIT() for a more detailed explanation).
   Most components did this correctly already, but one or two weren't
   right, so I implemented this second macro quite similar to the
   first and put it everywhere we already used AC_SEARCH_LIBS or
   OPAL_CHECK_FUNC_LIB.

This needs to soak for a day or two on the trunk before moving to the
v1.8 branch.

Refs trac:4834

cmr=v1.8.2:reviewer=ggouaillardet

This commit was SVN r32447.

The following Trac tickets were found above:
  Ticket 4834 --> https://svn.open-mpi.org/trac/ompi/ticket/4834
2014-08-07 23:54:45 +00:00
Ralph Castain
1e93e85403 Cleanup some autoconf messages - thanks to Paul Hargrove for noting them
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32429.
2014-08-05 14:48:42 +00:00
Ralph Castain
5f244e8b19 Use opal_getpagesize to get the proper page size
Refs trac:4826

This commit was SVN r32422.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-04 20:23:00 +00:00
Ralph Castain
2ceaa8a1dd Per patch from Thomas, remove orte abstraction breaks
This commit was SVN r32413.
2014-08-04 17:07:50 +00:00
Gilles Gouaillardet
5f1e0f284a Fix compilation when --enable-hetorogeneous
This commit was SVN r32410.
2014-08-04 10:35:08 +00:00
Gilles Gouaillardet
5b1ae87c76 coll/ml: fix ML_ERROR/printf parameters
cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32409.
2014-08-04 04:05:59 +00:00
Gilles Gouaillardet
f7b13d1126 Fix missing ampersand.
also replase the OMPI_CAST_RTE_NAME macro with
an inline function if OPAL_ENABLE_DEBUG, so we can
get warnings from the compiler if ampersand is missing.

Thanks to Paul Hargrove for reporting the bugs

This commit was SVN r32408.
2014-08-04 02:52:56 +00:00
Ralph Castain
61bf7af9d2 Per Paul Hargrove's suggestion, create an opal_pagesize function to abstract the various ways of obtaining that value. Rather than creating a separate file for only that one function, put it in a convenient place that is at least somewhat related.
Refs trac:4826

This commit was SVN r32407.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-02 18:38:16 +00:00
Ralph Castain
daeb9b6c4f Some more cleanups. Remove direct references to ORTE by changing OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun, orted, tools) set the OPAL proc structure fields so OPAL knows what is going on and uses the correct print functions (still need to fix the problem for non-MPI apps). Properly return uint32_t from the opal utilities instead of int32_t as that is what the ORTE process name fields contain.
Thanks to Gilles for pointing out some of the discrepancies.

This commit was SVN r32398.
2014-08-01 14:44:11 +00:00
Gilles Gouaillardet
cd8fa75f87 coll/ml: align on page size as returned by sysconf
Thanks to Paul Hargrove for pointing into the right direction

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32393.
2014-08-01 08:10:12 +00:00
George Bosilca
cee2a4e5c8 Missing alloca.h. Thanks Paul for catching this.
This commit was SVN r32388.
2014-08-01 03:28:23 +00:00
Ralph Castain
309d75dadc Add missing ampersand - function call required a pointer, not the name itself
This commit was SVN r32357.
2014-07-30 14:48:20 +00:00
Gilles Gouaillardet
b95537376f bcol/basesmuma: fix parameter order
Ref: #4815

This commit was SVN r32353.
2014-07-30 05:38:53 +00:00
Nathan Hjelm
0a32ea87e7 Remove unneeded EXTRA_DIST
This commit was SVN r32343.
2014-07-29 16:54:11 +00:00
George Bosilca
815d7bc846 Fix the example to use the new 2_2 topo component.
This commit was SVN r32338.
2014-07-29 07:00:01 +00:00
George Bosilca
78eae6108a Add an example on how to handle topologies.
This commit was SVN r32337.
2014-07-29 05:05:59 +00:00
Nathan Hjelm
1407c1f501 Remove RML code from common/sm
The only user of this code was coll/sm. I implemented a basic replacement
for the removed code. This gets the trunk compiling again with
--disable-dlopen.

This commit was SVN r32333.
2014-07-28 22:00:12 +00:00
Nathan Hjelm
0f15afa4d9 Fix typo in psm mtl
This commit was SVN r32332.
2014-07-28 22:00:03 +00:00
Nathan Hjelm
603ba71b0d Ignore components broken by the BTL move.
common/ofacm is only used by the iboffload code in ompi. This code does
not currently work so it is safe to ignore these components until it is
fixed.

This commit was SVN r32331.
2014-07-28 21:24:18 +00:00
Ryan Grant
caa10a5faf Portals fixes after latest move
This commit was SVN r32330.
2014-07-28 19:25:03 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose.  UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00
Jeff Squyres
0ab4eaa7d3 usnic: revert r32315 because the BTL move to opal is ongoing
Let's not make the move to OPAL any harder than it has to be; this
commit can wait until after the BTL move.

This commit was SVN r32316.

The following SVN revision numbers were found above:
  r32315 --> open-mpi/ompi@7b7ed8ed97
2014-07-25 14:17:20 +00:00
Jeff Squyres
7b7ed8ed97 usnic: minor cleanup / consolidation
CMR'ing just to (try to) keep the differences between trunk and v1.8
branch (somewhat) small.

Reviewed by Dave Goodell

cmr=v1.8.3:reviewer=ompi-rm1.8

This commit was SVN r32315.
2014-07-25 14:11:54 +00:00
Jeff Squyres
6ae45b34fc usnic: check connectivity on first communication to a peer
Previously, we were only checking connectivity upon first ''send'' to
a peer.  But this ignores the case where the first communication to a
peer is actually an ACK -- i.e., we successfully received something
from the peer and we need to send an ACK back.  So we need to verify
that the ACK will actually get there.

Specifically, certain asymmetric routing cases can lead to a hang if
we don't check the connectivity in both directions.  E.g., if the
sender is able to get traffic to the receiver, but the receiver is
unable to get traffic back to the sender because it made a different
routing decision than the sender.

In this case, the connectivity checker from the sender could succeed
(because the connectivity checker will ACK along the same path in
which the ping was received), but sending a BTL ACK could fail
(because the BTL ACK will be sent back along the path chosen by the
graph algorithm, which, in an erroneous asymmetric routing scenario,
may be different/wrong).

Hence, we want to trigger the connectivity checker at the first
communication from A->B, which may either be a BTL send or an ACK.

Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32309.
2014-07-24 21:32:56 +00:00
Rolf vandeVaart
9bc8fbaefd Create new error message so we can better pinpoint where an error occurs.
This commit was SVN r32303.
2014-07-24 15:18:55 +00:00
Rolf vandeVaart
3f703afb97 Fix CUDA registration where we run out of memory being allocated.
This commit was SVN r32297.
2014-07-23 21:10:17 +00:00
Todd Kordenbrock
42a871efd4 This commit fixes trac:4662 - "Portals4/MTL hangs in c_get_accumulate test".
- Portals4/OSC was unable to acquire an exclusive lock due to an invalid
local address in the atomic operation.  This caused the reported hang.
- After fixing the hang, the test continued to fail because
ompi_datatype_is_contiguous_memory_layout() reports that MPI_EMPTY (the
origin datatype) is noncontiguous and Portals4/OSC does not support
noncontiguous datatypes at this time.  However, in this case the origin
count is zero so contiguous/noncontiguous is irrelevant.  Now we skip
the contiguous check if the count is zero.

cmr=v1.8.3:reviewer=regrant:subject=Fix for "Portals4/MTL hangs in c_get_accumulate test"

This commit was SVN r32295.

The following Trac tickets were found above:
  Ticket 4662 --> https://svn.open-mpi.org/trac/ompi/ticket/4662
2014-07-23 19:13:07 +00:00
Edgar Gabriel
d4f83ab929 clean up of the MCA parameters of the fcoll framework. Most parameters are now
set/retrieved in ompio instead of the fcoll components.

This commit was SVN r32294.
2014-07-23 19:03:14 +00:00
Jeff Squyres
ac2621debf usnic: show_help if we can't create the connectivity map file
QA ran across the case where the user can't write to the target
directory for the connectivity map file.  In this case, we silently
continued.  They requested that we at least warn in this case.

Fixes Cisco bug CSCup62821

Reviewed by Dave Goodell

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32283.
2014-07-22 20:50:59 +00:00
Devendar Bureddy
74852b4d21 HCOLL: fix misplaced hcoll_init return value check.
cmr=v1.8.2:reviewer=jladd

This commit was SVN r32282.
2014-07-22 18:47:34 +00:00
Rolf vandeVaart
63d6a08283 Fix set-but-unused-warning noticed by jsquyres.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32281.
2014-07-22 18:37:40 +00:00
Howard Pritchard
828a4a29b7 Subject: fix regression in ugni btl eager get path
Description:
This mod fixes a regression in the ugni btl eager get
path introduced in changeset 32196.
References:4800
Closes:4800

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32264.
2014-07-22 15:42:11 +00:00
Rolf vandeVaart
8778418da2 Remove some debug #ifdefs (oops). Other lock support.
This commit was SVN r32263.
2014-07-22 02:09:06 +00:00
Rolf vandeVaart
1a61dd3078 Add some more locks where needed.
This commit was SVN r32262.
2014-07-22 00:29:57 +00:00
Rolf vandeVaart
7897d2a828 Improve verbose message which says which device:ports are being used. Also move where message is generated.
This commit was SVN r32261.
2014-07-21 20:38:52 +00:00
Jeff Squyres
da18eb1b8b common/verbs: fix usnic detection
The logic was mishandling the case of a newer kernel and an older
libusnic_verbs.  Simplify usnic_transport() to return constants in the
2 known cases (not a usNIC device and the TRANSPORT_USNIC_UDP case),
and call the magic probe in all other cases.

Reviewed-by: Dave Goodell <dgoodell@cisco.com>

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32260.
2014-07-21 19:52:29 +00:00
Jeff Squyres
b6075ea775 usnic: explicitly handle case when both endpoints are NULL
If we don't explicitly declare that (a == NULL && b == NULL) is
equivalent to qsort, we could end up with wonky sorting order.  I.e.,
it's *possible* that some NULLs could end up in the middle of the
array.

Regardless of whether it will ever happen in practice, it makes the
code more clear to also handle the "both are NULL" case.

Also fix the 2-spacing indents.

Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32259.
2014-07-21 16:22:48 +00:00
Rolf vandeVaart
947a4e14b4 Add a lock and clean up handling of some error conditions,
This commit was SVN r32258.
2014-07-17 19:33:10 +00:00
Mike Dubman
da8df859b3 MXM: use builk connection establishment API
fixed by Vasily, reviewed by Yossi/Miked

cmr=v1.8.2:reviwer=ompi-rm1.8

This commit was SVN r32256.
2014-07-17 08:35:55 +00:00
Rolf vandeVaart
26e3282a18 One more minor movement for easier reading. No functional change.
This commit was SVN r32252.
2014-07-16 20:59:07 +00:00
Rolf vandeVaart
c332ca75ff Change function name for clarity.
This commit was SVN r32251.
2014-07-16 20:46:10 +00:00
Rolf vandeVaart
a2dd4ca226 Remove hack that is no longer needed.
This commit was SVN r32250.
2014-07-16 14:00:17 +00:00
Rolf vandeVaart
61821adf2f Fix self deadlock bugs.
This commit was SVN r32249.
2014-07-15 20:50:41 +00:00
Ralph Castain
6c5e592785 Revert r32222, r32210, and r32203 as they created a problem when daemon collectives did not involve app procs on every node. Instead, modify the ompi/mca/rte/orte/rte_orte.h to add a new function that allows apps to request new daemon collective ids for use in barrier and modex operations. This will only appear in ORTE-based installations, but it is only being used by a couple of researchers at the moment.
Update the orte/test/mpi/coll_test.c test to show the revised example.

This commit was SVN r32234.

The following SVN revision numbers were found above:
  r32203 --> open-mpi/ompi@a523dba41d
  r32210 --> open-mpi/ompi@2ce11ed5c4
  r32222 --> open-mpi/ompi@d55f16db50
2014-07-15 03:48:00 +00:00
Nathan Hjelm
f960e4273e Fix typo in r32196
The wrong descriptor field was used when calculating the size received when
using the RDMA rendevous protcol.

This commit was SVN r32232.

The following SVN revision numbers were found above:
  r32196 --> open-mpi/ompi@a14e0f10d4
2014-07-14 21:00:53 +00:00
Ralph Castain
3d1b32a2c6 Silence warning
cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32231.
2014-07-14 19:27:30 +00:00
Mike Dubman
e342a11c2e opal envlist mca: implement Jeff`s quibbles
fixed by Elena, reviewed by Miked

This commit was SVN r32216.
2014-07-11 07:23:20 +00:00
Gilles Gouaillardet
77184b5c4c Fix a cornercase with MPI_PROC_NULL persistent requests
Handle OMPI_REQUEST_NOOP in MPI_Startall rather than PML

cmr=v1.8.2:reviewer=bosilca:ticket=4764

This commit was SVN r32213.

The following Trac tickets were found above:
  Ticket 4764 --> https://svn.open-mpi.org/trac/ompi/ticket/4764
2014-07-11 04:37:01 +00:00
Gilles Gouaillardet
d3ff5d77e1 scif: Fix compile error related to r32196
This commit was SVN r32212.

The following SVN revision numbers were found above:
  r32196 --> open-mpi/ompi@a14e0f10d4
2014-07-11 04:32:25 +00:00
Jeff Squyres
7384ee9e44 usnic: handle NULL endpoints in connectivity map
The connectivity map output routine needs to handle the case where
entries in the endpoints array are NULL (e.g., if one process has 2
endpoints and another process has only 1 endpoint).

Fixes Cisco bug CSCup83649.

cmr=v1.8.2

This commit was SVN r32211.
2014-07-11 00:43:45 +00:00
Ralph Castain
a523dba41d NOTE: this modifies the MPI-RTE interface
We have been getting several requests for new collectives that need to be inserted in various places of the MPI layer, all in support of either checkpoint/restart or various research efforts. Until now, this would require that the collective id's be generated at launch. which required modification
s to ORTE and other places. We chose not to make collectives reusable as the race conditions associated with resetting collective counters are daunti
ng.

This commit extends the collective system to allow self-generation of collective id's that the daemons need to support, thereby allowing developers to request any number of collectives for their work. There is one restriction: RTE collectives must occur at the process level - i.e., we don't curren
tly have a way of tagging the collective to a specific thread. From the comment in the code:

 * In order to allow scalable
 * generation of collective id's, they are formed as:
 *
 * top 32-bits are the jobid of the procs involved in
 * the collective. For collectives across multiple jobs
 * (e.g., in a connect_accept), the daemon jobid will
 * be used as the id will be issued by mpirun. This
 * won't cause problems because daemons don't use the
 * collective_id
 *
 * bottom 32-bits are a rolling counter that recycles
 * when the max is hit. The daemon will cleanup each
 * collective upon completion, so this means a job can
 * never have more than 2**32 collectives going on at
 * a time. If someone needs more than that - they've got
 * a problem.
 *
 * Note that this means (for now) that RTE-level collectives
 * cannot be done by individual threads - they must be
 * done at the overall process level. This is required as
 * there is no guaranteed ordering for the collective id's,
 * and all the participants must agree on the id of the
 * collective they are executing. So if thread A on one
 * process asks for a collective id before thread B does,
 * but B asks before A on another process, the collectives will
 * be mixed and not result in the expected behavior. We may
 * find a way to relax this requirement in the future by
 * adding a thread context id to the jobid field (maybe taking the
 * lower 16-bits of that field).

This commit includes a test program (orte/test/mpi/coll_test.c) that cycles 100 times across barrier and modex collectives.

This commit was SVN r32203.
2014-07-10 18:53:12 +00:00
Nathan Hjelm
1b9621eeb0 Fix typo in r32196
This commit was SVN r32202.

The following SVN revision numbers were found above:
  r32196 --> open-mpi/ompi@a14e0f10d4
2014-07-10 18:43:49 +00:00
Nathan Hjelm
32ab6f850e osc/rdma: fix warning
cmr=v1.8.2:reviewer=rhc

This commit was SVN r32201.
2014-07-10 18:42:55 +00:00
Jeff Squyres
3c4674484d usnic: Fix compile errors related to r32196
This commit was SVN r32198.

The following SVN revision numbers were found above:
  r32196 --> open-mpi/ompi@a14e0f10d4
2014-07-10 17:18:03 +00:00
Nathan Hjelm
a14e0f10d4 Per RFC: Remove des_src and des_dst members from the
mca_btl_base_segment_t and replace them with des_local and des_remote

This change also updates the BTL version to 3.0.0. This commit does
not represent the final version of BTL 3.0.0. More changes are coming.

In making this change I updated all of the BTLs as well as BTL user's
to use the new structure members. Please evaluate your component to
ensure the changes are correct.

RFC text:

This is the first of several BTL interface changes I am proposing for
the 1.9/2.0 release series.

What: Change naming of btl descriptor members. I propose we change
des_src and des_dst (and their associated counts) to be des_local and
des_remote. For receive callbacks the des_local member will be used to
communicate the segment information to the callback. The proposed change
will include updating all of the doxygen in btl.h as well as updating
all BTLs and BTL users to use the new naming scheme.

Why: My btl usage makes use of both put and get operations on the same
descriptor. With the current naming scheme I need to ensure that there
is consistency beteen the segments described in des_src and des_dst
depending on whether a put or get operation is executed. Additionally,
the current naming prevents BTLs that do not require prepare/RMA matched
operations (do not set MCA_BTL_FLAGS_RDMA_MATCHED) from executing
multiple simultaneous put AND get operations. At the moment the
descriptor can only be used with one or the other. The naming change
makes it easier for BTL users to setup/modify descriptors for RMA
operations as the local segment and remote segment are always in the
same member field. The only issue I forsee with this change is that it
will require a little more work to move BTL fixes to the 1.8 release
series.

This commit was SVN r32196.
2014-07-10 16:31:15 +00:00
Howard Pritchard
0bc7405e07 Subject: fix name conflict when both ugni and scif installed on system
Description: This mod fixes two name conflicts between the ugni and scif btls.
References:4771
Closes:4771

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32183.
2014-07-09 19:33:58 +00:00
Nathan Hjelm
56ad231b7c coll/ml: temporarily disable binding check
This commit was SVN r32178.
2014-07-09 14:39:49 +00:00
Joshua Ladd
057370364d Opal: Add a new MCA variable type "version_string". Also add a
new flag to ompi_info that allows a user to print all MCA variables of a specific type.  

 --type version_string

This command will print all MCA variables of type version_string.

This feature was developed by Elena Shipunova and was reviewed by Josh Ladd.

This commit was SVN r32166.
2014-07-09 01:37:23 +00:00
Nathan Hjelm
b6abe68972 osc/rdma: check for more types of window access violations
This commit adds a check to see if the target is in an access epoch. If
not we return OMPI_ERR_RMA_SYNC. This fixes test_start3 in the onesided
test suite. The cost of this extra check is 1 byte/peer for the boolean
flag indicating that the peer is in an access epoch.

I also fixed a problem where mupliple unexpected post messages are not
correctly handled.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32160.
2014-07-08 21:11:12 +00:00
Jeff Squyres
d63cf04d2e btl_usnic_map.c: Arrgh! Forgot to svn add this file.
cmr=v1.8.2:ticket=trac:4773

This commit was SVN r32159.

The following Trac tickets were found above:
  Ticket 4773 --> https://svn.open-mpi.org/trac/ompi/ticket/4773
2014-07-08 20:09:31 +00:00
Jeff Squyres
1e17ab461b usnic: add btl_usnic_connectivity_map MCA param to output link information
If the btl_usnic_connectivity_map MCA param is set to a non-NULL
value, then each MPI process will output a file named
<prefix>-<hostname>.pid<pid>.job<jobid>.mcwrank<MCW rank>.txt.  Its
contents will detail which usNIC device(s) (and therefore which
link(s)) are being used to communicate with each peer MPI process.

Here is a sample output file (named
mpi005.pid26071.job1640759297.mcwrank0.txt):

{{{
device=usnic_0,interface=eth4,ip=10.10.0.5/16,mac=24:57:20:05:20:00,mtu=9000
device=usnic_1,interface=eth5,ip=10.2.0.5/16,mac=24:57:20:05:21:00,mtu=9000
device=usnic_2,interface=eth6,ip=10.3.0.5/16,mac=24:57:20:05:50:00,mtu=9000
peer=1,hostname=mpi006,device=usnic_0@peer_ip=10.10.0.6/16@peer_mac=24:57:20:06:20:00,device=usnic_1@peer_ip=10.2.0.6/16@peer_mac=24:57:20:06:21:00,device=usnic_2@peer_ip=10.3.0.6/16@peer_mac=24:57:20:06:50:00
peer=2,hostname=mpi007,device=usnic_0@peer_ip=10.10.0.7/16@peer_mac=24:57:20:07:20:00,device=usnic_1@peer_ip=10.2.0.7/16@peer_mac=24:57:20:07:21:00,device=usnic_2@peer_ip=10.3.0.7/16@peer_mac=24:57:20:07:50:00
peer=3,hostname=mpi008,device=usnic_0@peer_ip=10.10.0.8/16@peer_mac=24:57:20:08:20:00,device=usnic_1@peer_ip=10.2.0.8/16@peer_mac=24:57:20:08:21:00,device=usnic_2@peer_ip=10.3.0.8/16@peer_mac=24:57:20:08:50:00
}}}

Reviewed by Reese Faucette

cmr=v1.8.2

This commit was SVN r32156.
2014-07-08 19:14:46 +00:00
Nathan Hjelm
309a6cf951 coll/ml: set n_resources to 0 when destructing an lmngr
Also keep track of the allocation base so we free the correct pointer
when cleaning up.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r32151.
2014-07-07 15:11:26 +00:00
Gilles Gouaillardet
8d3bea2771 Fix the cornercase with MPI_PROC_NULL persistent requests.
This corner case is now handled in the pml so the same code
is invoked for both MPI_Start and MPI_Startall.
This also correctly report an error if MPI_Startall is invoked twice
on a MPI_PROC_NULL persistent request.

This commit was SVN r32139.
2014-07-04 04:58:52 +00:00
Edgar Gabriel
a16e4c5bf9 As discussed during the Open MPI meeting, make ompio the default parallel I/O
library on the trunk in order to expose it to more testing.

This commit was SVN r32138.
2014-07-03 20:04:58 +00:00
Jeff Squyres
e022dd30bc usnic: EHOSTUNREACH means there is no route
ibv_create_ah() can also return EHOSTUNREACH, which means that there
is no route to the peer.  Treat that as a non-fatal warning.

Reviewed by Reese Faucette.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32135.
2014-07-03 17:19:30 +00:00
Jeff Squyres
81edddff61 usnic: make this show_help message like the others
There's no need for the port number (since usNIC has no port numbers),
and make the wording the same as other help messages.

Reviewed by Reese Faucette.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32134.
2014-07-03 17:17:34 +00:00
George Bosilca
843ef1fcb0 ompi_mpi_abort had one extra argument that was never used. Clean it up.
This commit was SVN r32124.
2014-07-03 00:34:44 +00:00
George Bosilca
2883adcdf3 Remove useless variables.
This commit was SVN r32123.
2014-07-03 00:30:54 +00:00
Gilles Gouaillardet
8a2a0293fd fix sort_devs_by_distance in btl/openib
no need to #include <math.h> ...

cmr=v1.8.2:reviewer=miked:ticket=4759

This commit was SVN r32121.

The following Trac tickets were found above:
  Ticket 4759 --> https://svn.open-mpi.org/trac/ompi/ticket/4759
2014-07-02 08:08:10 +00:00
Gilles Gouaillardet
134eee1c4f fix sort_devs_by_distance in btl/openib
The distances as returned by hwloc_get_whole_distance_matrix_by_type are typ float.
This patch handle all distances as float.

cmr=v1.8.2:reviewer=miked

This commit was SVN r32120.
2014-07-02 07:56:40 +00:00
Ryan Grant
a1d312343b This commit fixes trac:4681 - ibm c_fence_lock hangs
cmr=v1.8.2:reviewer=tkordenbrock:subject=Portals4/MTL hanging fix

This commit was SVN r32113.

The following Trac tickets were found above:
  Ticket 4681 --> https://svn.open-mpi.org/trac/ompi/ticket/4681
2014-07-01 17:03:03 +00:00
Ryan Grant
5cb8cc856c Refs trac:4682 - This commit fixes c_flush test failure in the ibm test suite for Portals 4 OSC
cmr=v1.8.2:reviewer=tkordenbrock:subject=Move r32112 to v1.8.2 branch

This commit was SVN r32112.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r32112

The following Trac tickets were found above:
  Ticket 4682 --> https://svn.open-mpi.org/trac/ompi/ticket/4682
2014-07-01 16:26:16 +00:00
Mike Dubman
ce6d5b8cd7 HCOLL: make it OFF by default
fixed by miked, reviewed by Alex

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32101.
2014-06-28 18:45:03 +00:00
Dave Goodell
c104604387 common/verbs: update usnic transport probe
RHEL 7 has shipped with kernel support for the RDMA_TRANSPORT_USNIC
enum, but ''not'' the RDMA_TRANSPORT_USNIC_UDP enum.  This means that
when you install usNIC drivers from cisco.com, the kernel will report
IBV_TRANSPORT_USNIC, even though the transport is actually using UDP.

Therefore, we have to modify the logic in common/verbs to do the
additional magic probe if the device reports either an
IBV_TRANSPORT_IWARP or IBV_TRANSPORT_USNIC (because both of those might
be lies -- do the probe to figure out the real transport).

The code changed by this patch is fairly trivial; it simply moves the
logic of the magic probe to its own short function, and then calls that
short function in both the IBV_TRANSPORT_(IWARP|USNIC) cases.  It looks
longer because several lengthy comments were also updated.

Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32098.
2014-06-27 18:43:32 +00:00
Devendar Bureddy
228772ae81 hcoll gatherv support
cmr=v1.8.2:reviewer=jladd

This commit was SVN r32097.
2014-06-26 18:14:41 +00:00
George Bosilca
99561c5cc1 If the enable fails don't give up, but instead keep going with
the other collective modules. If we endup without some of the
collective the code will raise an error anyway.

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32096.
2014-06-26 15:52:45 +00:00
Jeff Squyres
8e52ba423f finalize/disconnect: add explicit comment about why we use an RTE barrier
Based on extensive discussions before/at the June 2014 developer's
meeting, put a lengthy comment explaining a second reason why we
''must'' use an RTE barrier during MPI_FINALIZE and
MPI_COMM_DISCONNECT (i.e., unreliable transports).  Slightly explain
more the original reason why we do this, too (BTLs can lie/buffer a
message without actually injecting it on the network). 

This commit was SVN r32095.
2014-06-26 14:31:40 +00:00
Dave Goodell
f6bb853409 usnic: properly check src iface in route queries
rtnetlink doesn't check the source address when determining whether to
return route info for a query.  So we need to check that the OIF matches
the OIF of the source interface name.  Without this check, OMPI might
pair a local interface which does not have a route to a particular
remote interface.

Fixes Cisco bug CSCup55797.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32090.
2014-06-25 22:39:02 +00:00
Ralph Castain
f3cb124e50 Revert r32082 and r32070 - the developer's conference has decided to go a different direction on the threaded progress effort. This will involve some degree of prototyping to understand the tradeoffs prior to making a final design decision, and so we'll hold off on the final change until that is completed.
This commit was SVN r32089.

The following SVN revision numbers were found above:
  r32070 --> open-mpi/ompi@12d92d0c22
  r32082 --> open-mpi/ompi@aa6438ef7a
2014-06-25 20:43:28 +00:00
Adrian Reber
4b25e92194 get the FT code to compile again by adding/removing #includes
This commit was SVN r32086.
2014-06-25 18:42:17 +00:00
Rolf vandeVaart
aa6438ef7a Remove OMPI_USE_PROGRESS_THREADS that was missed.
This commit was SVN r32082.
2014-06-25 14:42:21 +00:00
Gilles Gouaillardet
fae7adf8ee Remove legacy FCA_IS_LOCAL_PROCESS macro
and use OPAL_PROC_ON_LOCAL_NODE instead

cmr=v1.8.2:reviewer=rhc

This commit was SVN r32079.
2014-06-25 02:37:53 +00:00
Ralph Castain
f70b4a33ec Per the developer conference, let's be a little nicer during MPI_Finalize and ease up on the cpu by inserting usleep into the loop over opal_progress while waiting for the RTE barrier to complete. This is a non-performant area of the code, and while most codes may call finalize at close-to-similar times, there are some that may choose to have one or more procs continue to perform some work prior to finalizing.
So save a little power while we are waiting.

cmr=v1.8.2:reviewer=jladd:subject=save power during finalize

This commit was SVN r32077.
2014-06-24 21:59:50 +00:00
Jeff Squyres
bce33635a7 sctp: remove from trunk
At the developer meeting today, the question was raised as to whether
the SCTP BTL was maintained any more.  I emailed Alan Wagner to see if
he had any interest/resources to continue to maintain the SCTP BTL.
He indicated that he unfortunately had any resources to maintain it;
it would be fine to remove the SCTP BTL from the trunk.

So long, SCTP BTL... fare thee well...

This commit was SVN r32075.
2014-06-24 21:23:09 +00:00
Jeff Squyres
69fa331cc2 openib/ugni: output verbose message when a BTL is ignored due to THREAD_MULTIPLE
usnic and portals4 already do this.

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32074.
2014-06-24 21:13:17 +00:00
Jeff Squyres
d7a2d964f0 usnic: use the correct output stream name
cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r32073.
2014-06-24 18:13:49 +00:00
Ralph Castain
12d92d0c22 Per the OMPI developer conference, remove the last vestiges of OMPI_USE_PROGRESS_THREADS
This commit was SVN r32070.
2014-06-24 17:05:11 +00:00
Jeff Squyres
011db6974e usnic: refactor usnic_add_procs() into 2 distinct parts
1: find/create procs, and create associated endpoint for each
2: resolve peer addresses

The 2nd part is done as a separate loop so that the address lookups
can be parallelized.

The overall result is to split usnic_add_procs() into two smaller,
simpler parts.

cmr=v1.8.2:ticket=trac:4734

This commit was SVN r32062.

The following Trac tickets were found above:
  Ticket 4734 --> https://svn.open-mpi.org/trac/ompi/ticket/4734
2014-06-20 20:58:36 +00:00
Jeff Squyres
1ea7bad5a0 usnic: behave better when ibv_create_ah() fails
When ibv_create_ah() fails due to an address resolution failure, it
really only means that we can't reach that one peer -- so we should
just ignore that one peer.  If ibv_create_ah() fails for some other
reason, then give up on the entire usnic_X device.

Change the show_help() message that is displayed when ibv_create_ah()
fails due to address resolution failure; indicate that it's likely a
routing problem.  Also opal_output_verbose() the same info, since
show_help() is de-duplicated (and this particular show_help() message
can be squelched).

Fixes Cisco bugs CSCup35851 and CSCup35872.

cmr=v1.8.2:ticket=trac:4734

This commit was SVN r32061.

The following Trac tickets were found above:
  Ticket 4734 --> https://svn.open-mpi.org/trac/ompi/ticket/4734
2014-06-20 20:53:50 +00:00
Jeff Squyres
baeae72370 usnic: remove "device" and "port" language from show_help() messages
Move away from verbs-specific terms "device" and "port" in the usnic
BTL help messages.  Replace them with "usNIC interface" (since usNIC
has no concept of a port).

cmr=v1.8.2:ticket=trac:4734

This commit was SVN r32029.

The following Trac tickets were found above:
  Ticket 4734 --> https://svn.open-mpi.org/trac/ompi/ticket/4734
2014-06-18 15:20:50 +00:00
Jeff Squyres
9e32cb9b60 usnic: move some defines into btl_usnic_util.h
Move MACLEN and IPV4LEN into _util.h and rename them to be MACSTRLEN
and IPV4STRLEN, respectively.

cmr=v1.8.2:ticket=trac:4734

This commit was SVN r32028.

The following Trac tickets were found above:
  Ticket 4734 --> https://svn.open-mpi.org/trac/ompi/ticket/4734
2014-06-18 15:19:36 +00:00
Jeff Squyres
7395b65531 usnic: remove some debugging verbose output
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32027.
2014-06-18 14:14:34 +00:00
Nathan Hjelm
fd21b244ce osc/rdma: better name for lookup function
cmr=v1.8.2:ticket=trac:4732:reviewer=ompi-rm1.8

This commit was SVN r32021.

The following Trac tickets were found above:
  Ticket 4732 --> https://svn.open-mpi.org/trac/ompi/ticket/4732
2014-06-17 19:49:17 +00:00
Nathan Hjelm
390f8f52b4 osc/rdma: clean up group process matching a bit
cmr=v1.8.2:ticket=trac:4732:reviewer=dgoodell

This commit was SVN r32018.

The following Trac tickets were found above:
  Ticket 4732 --> https://svn.open-mpi.org/trac/ompi/ticket/4732
2014-06-17 17:48:30 +00:00
Nathan Hjelm
7f20868179 osc/rdma: ensure matching of post/start calls
The post and start window calls are supposed to be matching. The code
did not check to see that an incoming post matched with the start call.
This commit fixes the bug by placing the post on a pending list that
will be checked by the next call to start.

cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r32017.
2014-06-17 15:23:06 +00:00
Nathan Hjelm
927098d567 osc/rdma: fix hang when accumulating with MPI_REPLACE
The replace callback did not increment the incoming frag counter. This
leads to a hang during synchronization. This commit adds the increment
and also puts the request on the garbage collection list to fix a leak.

This fixes a hang found when running the mpich test suite.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32016.
2014-06-17 14:53:29 +00:00
Nathan Hjelm
7f6de57653 osc/rdma: fix accumulate fragment size calculation
The wrong type was used when calculating the amount of space needed
for an accumulate fragment. Fixed the calculation and took the
opportunity to eliminate the get_acc header as it is identical to the
acc header.

This fixes trac:4719 and #4718

Tracking these fixes for 1.8.2 in this CMR.

Throwing this to Brad for review as he is the one who ran into the issue.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32015.

The following Trac tickets were found above:
  Ticket 4719 --> https://svn.open-mpi.org/trac/ompi/ticket/4719
2014-06-17 14:53:24 +00:00
Gilles Gouaillardet
e9ed9def02 Fix MPI_Alltoallv in coll/tuned
This changeset :
- always call the low/level implementation for :
  * MPI_Alltoallv
  * MPI_Neighbor_alltoallv
  * MPI_Alltoallw
  * MPI_Neighbor_alltoallv
- fix mca_coll_tuned_alltoallv_intra_basic_inplace
  so zero size types are correctly handled

cmr=v1.8.2:reviewer=bosilca:ticket=4715

This commit was SVN r32013.

The following Trac tickets were found above:
  Ticket 4715 --> https://svn.open-mpi.org/trac/ompi/ticket/4715
2014-06-17 06:11:34 +00:00
Nathan Hjelm
2f96f16416 osc/rdma: ensure eager sends are active before checking for sync errors
in self optimization

This addresses an issue found with the MPICH pscw_ordering test. Eager sends
were not yet active (which is ok for the standard path) but not ok for the
self optimization. Fixed by waiting for all post messages before checking
the sync state.

Fixes trac:4724

Tracking the 1.8.2 issue in this CMR.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32012.

The following Trac tickets were found above:
  Ticket 4724 --> https://svn.open-mpi.org/trac/ompi/ticket/4724
2014-06-17 04:53:47 +00:00
Nathan Hjelm
41f0059f1e osc/sm: use an unsigned long when calculating the total segment size
Brad correctly pointed out that the total window size should not be an
int. Changed it to an unsigned long.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32010.
2014-06-17 04:33:43 +00:00
Nathan Hjelm
6ec9c6c422 osc/sm: return ompi_request_empty for all request ops
Only one field is valid for RMA requests: MPI_ERROR. This field is set
to the correct value in ompi_request_empty so there is no reason to
allocate and keep track of osc/sm requests because they are always
complete on return. Since we are no longer using the osc/sm request
structure or free list they are now removed.

Closes trac:4723

Tracking this issue with the CMR. Brad, can you verify the issue is indeed fixed.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32009.

The following Trac tickets were found above:
  Ticket 4723 --> https://svn.open-mpi.org/trac/ompi/ticket/4723
2014-06-17 04:27:02 +00:00
George Bosilca
542e4996a7 Cleanup the utilities functions in tuned.
This commit was SVN r31987.
2014-06-13 16:04:45 +00:00
Gilles Gouaillardet
50256c62c5 Fix MPI_Alltoallv in coll/tuned.
Correctly handle the corner case in MPI_Alltoallv when
some tasks have no data to transfer and some other tasks
do have data to transfer.

This test case is covered in ibm/collective/alltoallv_somezeros
from the ompi-tests repo.

cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31985.
2014-06-13 06:03:23 +00:00
Mike Dubman
b51a42aeca MXM: fix mxm cleanup, should be called for any compat API
fixe by miked, reviewed by yossi

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31984.
2014-06-12 15:46:38 +00:00
George Bosilca
fd0e1b7261 If we detect an error on a request that has been already released
at the MPI level, we should call abort on MPI_COMM_WORLD.

Fixes ticket #1943.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31982.
2014-06-10 16:24:13 +00:00
Alina Sklarevich
7b8ad47e93 MXM: fix env variable name to hint for thread usage in mxm
reviewed by MikeD
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31968.
2014-06-09 06:40:32 +00:00
Jeff Squyres
82d4e33510 usnic: let connectivity client timeout if agent never appears
This would be a really, really weird case if it ever happens (i.e.,
you have usnics but the agent process failed somewhere in MPI_INIT
such that the agent never appears), but having an infinite loop
doesn't seem like a good idea.

(does not need to go to v1.8 because v1.8 still uses RML for
communication for the connectivity checker)

This commit was SVN r31932.
2014-06-02 18:14:35 +00:00
Jeff Squyres
af04d60098 usnic: do not enable the connecitivty agent if there are no usnics
Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31931.
2014-06-02 18:09:55 +00:00
Gilles Gouaillardet
d1bcd103ac btl/openib : correctly handle eager rdma buffers in mca_btl_openib_del_procs
if eager rdma is used, endpoint reference_count is greater than one.
this commit is a temporary fix that OBJ_RELEASE the endpoint as much as needed.
thought this is likely correct, it can be suboptimal and hence needs to be reviewed

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r31922.
2014-06-02 02:23:52 +00:00
Ralph Castain
9305756276 Per RFC:
http://www.open-mpi.org/community/lists/devel/2014/05/14838.php

Remove stale component

This commit was SVN r31917.
2014-06-01 17:03:03 +00:00
Ralph Castain
8736a1c138 Per RFC:
http://www.open-mpi.org/community/lists/devel/2014/05/14822.php

Revamp the ORTE global data structures to reduce memory footprint and add new features. Add ability to control/set cpu frequency, though this can only be done if the sys admin has setup the system to support it (or you run as root).

This commit was SVN r31916.
2014-06-01 16:14:10 +00:00
Ralph Castain
cf2c7381d0 Replace the PML barrier with an RTE barrier for now until we can come up with a better solution for connectionless BTLs.
Refs trac:4643

This commit was SVN r31915.

The following Trac tickets were found above:
  Ticket 4643 --> https://svn.open-mpi.org/trac/ompi/ticket/4643
2014-06-01 16:08:56 +00:00
Alina Sklarevich
f8a664f5ec MXM: generate the jobid only for MXM versions under v2.0.
reviewed by miked
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31910.
2014-06-01 13:29:24 +00:00