Commit graph

8041 commits

Author SHA1 Message Date
Nathan Hjelm
81dc3a5db9 Merge pull request #335 from hjelmn/osc_updates
Osc updates
2015-01-07 11:16:55 -06:00
Ralph Castain
e0927895db Grrr...how many files did they forget? 2015-01-06 19:40:18 -08:00
Ralph Castain
84c41429e9 Add missing file 2015-01-06 18:41:11 -08:00
Nathan Hjelm
e68ed2876c osc/pt2pt: threading fixes and code cleanup 2015-01-06 13:39:16 -07:00
Nathan Hjelm
3d79806805 add more internal RMA error codes 2015-01-06 13:39:04 -07:00
Nathan Hjelm
9eba7b9d35 Rename the OSC "rdma" component to pt2pt to better reflect that it does not actually use btl rdma 2015-01-06 13:38:55 -07:00
Jeff Squyres
cab1379dfb Fortran: only emit real16 and complex32 if supported
This is the master version of @ggouaillardet's patch from
open-mpi/ompi-release#148 (there was a minor conflict to fix and
several fuzzings of line numbers).
2015-01-06 09:47:26 -08:00
Howard Pritchard
ec632001b1 Merge pull request #329 from ggouaillardet/topic/romio_refresh
refresh ROMIO based on v3.2a2-84-gef1cf14
2015-01-06 10:27:20 -07:00
Gilles Gouaillardet
0914de9eae refresh ROMIO based on v3.2a2-84-gef1cf14 2015-01-06 19:43:58 +09:00
Yohann Burette
f01dd429df Reset pointer to NULL to prevent double-freeing. 2015-01-05 17:01:37 -08:00
Yohann Burette
1e24da90fe Fix fi_av_insert return code test. 2015-01-05 17:01:37 -08:00
Yohann Burette
5944c294ad Add return code testing for fi_mr_reg. 2015-01-05 17:01:37 -08:00
Howard Pritchard
c857cc926c Merge pull request #327 from hppritcha/topic/async_progress
Topic/async progress
2015-01-05 16:20:44 -07:00
Jeff Squyres
ce2008aa88 man pages: update non-blocking send descriptions
As noted by Alexander Pozdneev, non-blocking sends are now able to
*access* buffers in pending non-blocking send operations; the buffers
just can't be *modified*.
2015-01-05 15:44:27 -05:00
Devendar Bureddy
e732152304 HCOLL: Fix hcoll supported datatype checks correctly 2015-01-02 21:18:12 +02:00
Gilles Gouaillardet
e8d084e6b9 fix ABI fix
Fix an undeleted line in open-mpi/ompi@24df0ed039
Thanks to Nick Papior Andersen for pointing this out.
2014-12-28 18:07:51 +09:00
Gilles Gouaillardet
24df0ed039 MPI_Comm_split_type: fix ABI compatibility
ABI compatibility was previously broken in
open-mpi/ompi@3deda3dc82
2014-12-25 19:43:58 +09:00
Howard Pritchard
3fc7b389ff initial async progress changes for gni 2014-12-24 11:50:23 -07:00
Howard Pritchard
65c4f8d18e Merge pull request #326 from zerothi/master
Enabled COMM_TYPE_SPLIT dependent on locality
2014-12-24 09:13:04 -07:00
Nick Papior Andersen
3deda3dc82 Added several new COMM_TYPE_<> splits
Using the underlying hardware identification to split
communicators based on locality has been enabled using
the MPI_Comm_Split_Type function.

Currently implemented splits are:
  HWTHREAD
  CORE
  L1CACHE
  L2CACHE
  L3CACHE
  SOCKET
  NUMA
  NODE
  BOARD
  HOST
  CU
  CLUSTER

However only NODE is defined in the standard which is why the
remaining splits are referred to using the OMPI_ prefix instead
of the standard MPI_ prefix.

I have tested this using --without-hwloc and --with-hwloc=<path>
which both give the same output.

NOTE: I think something fishy is going on in the locality operators.
In my test-program I couldn't get the correct split on these requests:
  NUMA, SOCKET, L3CACHE
where I suspected a full communicator but only got one.
2014-12-24 11:21:35 +00:00
Gilles Gouaillardet
b9349d2eb9 coll/libnbc: fix MPI_Ireduce_scatter for single task communicator
when MPI_IN_PLACE is not used.

that commit fixes a regression introduced in
open-mpi/ompi@49e79a9ade
2014-12-24 12:12:58 +09:00
Devendar Bureddy
e398ad6619 HCOLL: Fix OMPI to HCOLL predefined datatypes, Ops mapping 2014-12-23 22:30:29 +02:00
Jeff Squyres
9144517ad4 man: update a bunch of attribute-related man pages
Per discussion starting
http://www.open-mpi.org/community/lists/users/2014/12/26018.php, at
least note that OMPI does not allow adding or deleting attributes in
an attribute copy or delete callback (or any of its children) on the
same object on which the callback was invoked.
2014-12-19 11:45:58 -08:00
Jeff Squyres
c621d1e622 libfabric: don't LIBADD the common library in the static case
Adding the libfabric common library in the --disable-dlopen case will
result in duplicate symbols.
2014-12-18 11:04:08 -08:00
Rolf vandeVaart
3ec9685ee0 Add missing file to sources. Without this, tarball build does not work 2014-12-18 07:17:28 -08:00
George Bosilca
4d55ae838d Prevent deadlocks on recursive calls (deleting communicators with
attributes from an attribute callback).
2014-12-17 23:12:33 -05:00
Jeff Squyres
d6f059f538 configury: add some descriptive output messages in configure
Ensure that the ofi MTL and the usnic BTL have good descriptive output
messages in configure.
2014-12-17 13:36:01 -08:00
Jeff Squyres
4dcb92ab0b ofi: remove use of non-existent macros 2014-12-17 13:36:01 -08:00
Jeff Squyres
9d1d34c0c0 Fortran: do not dist mpif-h/sizeof_f.f90; it is generated 2014-12-17 10:24:31 -08:00
Gilles Gouaillardet
27aec2ef5b configury: disable f08 fortran bindings if the compiler does
not support c_funloc with TS 29113 subclause 8.1 aka
removed restrictions on ISO_C_BINDING module procedures.
2014-12-17 17:35:45 +09:00
Jeff Squyres
f3be0a5882 ofi: ensure that null_addr is initialized to NULL
And when null_addr is freed, set it back to NULL so that we don't try
to free it again in the error: label.
2014-12-16 17:32:15 -08:00
Jeff Squyres
8c7b6d266e ofi: add "unused" attribute to rc to prevent compiler warning 2014-12-16 17:30:46 -08:00
Yohann Burette
58a7a1e4ac Adding an Open Fabrics Interfaces (OFI) MTL.
This MTL implementation uses the OFIWG libfabric's tag messaging capabilities.
2014-12-16 15:43:39 -08:00
Mangala Jyothi Bhaskar
68d78fd718 Aggregator selection logic Part 2 and reorganized Part1 2014-12-16 15:48:40 -06:00
Mangala Jyothi Bhaskar
2bd52cc410 Initialize req variable to fix a warning 2014-12-16 13:24:28 -06:00
Jeff Squyres
1b63129de3 fortran: ensure to specify the shared library version 2014-12-16 11:16:46 -08:00
Artem Polyakov
01601f3284 Merge pull request #305 from artpol84/timing
Timing framework improvement
2014-12-16 15:13:48 +06:00
George Bosilca
3430714989 Correctly propagate the requested level of thread support during the
component init calls.
2014-12-13 02:36:21 -05:00
Artem Polyakov
8ffad75a0a Introduce timing interval measurement facility in timing framework 2014-12-10 16:47:49 +06:00
Ralph Castain
06e49d0e92 Per contribution from Pascal Deveze of Bull: move opal_set_using_threads earlier in MPI_Init (before datatype init) so the value gets set in time to be properly used. 2014-12-09 00:37:57 -08:00
Jeff Squyres
a71b5dd5c7 debuggers: update warning messages when types not found
Fixes #302.
2014-12-04 03:01:51 -08:00
Jeff Squyres
1dd68d48a8 MPI_Wtime.3: give further explanation about high-res timers 2014-12-03 17:07:42 -08:00
Nadezhda Kogteva
315a240899 Timing framework: pack timing scripts to tarball always 2014-12-02 12:22:46 +02:00
Edgar Gabriel
7e41e0e62b fix a segfault in the two-phase I/O algorithm for fileviews of 0 byte size. 2014-12-01 15:59:00 -06:00
yosefe
3f152733bf Add yalla to the list of default PMLs 2014-12-01 13:11:28 +02:00
Edgar Gabriel
0758d7570e part 1 of the fix to get rid of the missing symbols that prevent the sub-modules to be loaded. 2014-11-29 20:01:36 -06:00
George Bosilca
dee243c58d ompi_proc_finalize has an interesting side effect. A proc is
inserted in the ompi_proc_list as soon as it is created and it
is removed only upon the call to the destructor. In ompi_proc_finalize
we loop over all procs in ompi_proc_finalize and release them once.
However, as a proc is not removed from this list right away, we
decrease the ref count for each proc until it reaches zero and the
proc is finally removed. Thus, we cannot clean the BML/BTL after
the call to ompi_proc_finalize.
A quick fix is to delay the call to ompi_proc_finalize until all
other frameworks have been finalized, and then the behavior
depicted above will give the expected outcome.
2014-11-28 18:26:36 -05:00
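One reading of the side effect described in this message, as a small C model. The types and functions here (`proc_t`, `proc_release`, `proc_finalize`) are hypothetical stand-ins, not the Open MPI object system. The model assumes that finalize keeps releasing whichever proc is still on the list, and that a proc leaves the list only when its count reaches zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted proc on a global list. */
typedef struct {
    int refcount;
    int on_list;      /* 1 while the proc is still in the list */
} proc_t;

/* Drop one reference; the "destructor" removes the proc from the list
 * only when the count reaches zero. */
void proc_release(proc_t *p)
{
    assert(p->refcount > 0);
    if (0 == --p->refcount) {
        p->on_list = 0;
    }
}

/* Keep releasing the first proc still on the list until the list is
 * empty. Returns the total number of releases performed. */
int proc_finalize(proc_t *procs, size_t n)
{
    int releases = 0;
    for (;;) {
        proc_t *head = NULL;
        for (size_t i = 0; i < n; ++i) {
            if (procs[i].on_list) {
                head = &procs[i];
                break;
            }
        }
        if (NULL == head) {
            return releases;
        }
        proc_release(head);
        ++releases;
    }
}
```

In this model a proc that other frameworks still reference (say, a count of 3) is driven all the way to zero by finalize. That is consistent with the message's conclusion that finalize has to run after those frameworks are torn down.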
Nadezhda Kogteva
45ed55afd7 Adding of missed time measurement scripts in tarball 2014-11-28 12:15:30 +02:00
George Bosilca
43901fa15a Merge branch 'master' of github.com:open-mpi/ompi 2014-11-24 22:54:41 -05:00
Ralph Castain
48f702827e First part of memory leak cleanups from Gilles 2014-11-24 16:53:33 -08:00
George Bosilca
fb6ecdfd18 Fix few typos. 2014-11-24 01:48:09 -05:00
George Bosilca
d4edd097c0 Allow for native timer (cycle level) integration
for MPI_Wtime and MPI_Wtick.
2014-11-24 00:45:14 -05:00
Andrew Friedley
e7bcad0c13 Remove unused variable.
Reported by @adrianreber, this patch removes an unused variable in the
PSM MTL, silencing a compiler warning.
2014-11-21 07:51:44 -08:00
George Bosilca
d622db783d Based on https://github.com/open-mpi/ompi/pull/262, we should use
true_lb while computing the lower bound.
2014-11-21 19:16:05 +09:00
Gilles Gouaillardet
705147e98b coll/tuned: fix allgather bruck algorithm 2014-11-21 19:16:05 +09:00
Nathan Hjelm
1b564f62bd Revert "Merge pull request #275 from hjelmn/btlmod"
This reverts commit ccaecf0fd6, reversing
changes made to 6a19bf85dd.
2014-11-19 23:22:43 -07:00
Nathan Hjelm
0d413fb73f Revert "Remove stale file reference"
This reverts commit 4c8fa17234.
2014-11-19 23:16:16 -07:00
Ralph Castain
4c8fa17234 Remove stale file reference 2014-11-19 18:32:19 -08:00
Nathan Hjelm
5a0a48c3c4 osc: remove lingering rdma component files 2014-11-19 12:11:54 -07:00
Nathan Hjelm
1a5349ec79 ompi ignore bfo until it is updated for new btl interface 2014-11-19 11:33:04 -07:00
Nathan Hjelm
8f1a44e60e bml/r2: add all rdma btls even if another btl has higher exclusivity
Background: In order to support atomics each btl needs to provide support
for communicating with self unless the btl module can guarantee global
atomicity. Before this commit bml/r2 discarded any BTL with lower
exclusivity than an existing send btl. This would cause the BML to
discard any btl other than self.

The new behavior is as follows:

 - If an existing send btl has higher exclusivity then the btl will not be
   added to the send btl list for the endpoint.

 - If a btl provides RDMA support then it is always added to the rdma btl
   list.

 - bml_btl weight for send btls is now calculated across all send btls.

 - bml_btl weight for rdma btls is now calculated across all rdma btls.

With this change self should still win as the only send btl for loopback
without disqualifying other btls (ugni, openib) for atomic operations.
2014-11-19 11:33:04 -07:00
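The two list rules above can be sketched in C as follows. This is a simplified illustration, not the bml/r2 code: the struct names, the fixed-size arrays, and the exclusivity values used in the example are invented. It also assumes BTLs are offered in descending exclusivity order and leaves out the weight calculation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BTLS 8

/* Hypothetical, simplified view of a BTL module. */
typedef struct {
    const char *name;
    int exclusivity;
    bool has_rdma;
} btl_info_t;

/* Hypothetical per-endpoint send and rdma lists. */
typedef struct {
    size_t n_send;
    size_t n_rdma;
    const btl_info_t *send[MAX_BTLS];
    const btl_info_t *rdma[MAX_BTLS];
} endpoint_lists_t;

/* Apply the two list rules from the commit message to one BTL. */
void endpoint_add_btl(endpoint_lists_t *ep, const btl_info_t *btl)
{
    bool outranked = false;

    assert(ep->n_send < MAX_BTLS && ep->n_rdma < MAX_BTLS);

    /* Rule 1: stay off the send list if an existing send BTL has
     * higher exclusivity. */
    for (size_t i = 0; i < ep->n_send; ++i) {
        if (ep->send[i]->exclusivity > btl->exclusivity) {
            outranked = true;
            break;
        }
    }
    if (!outranked) {
        ep->send[ep->n_send++] = btl;
    }

    /* Rule 2: an RDMA-capable BTL always joins the rdma list. */
    if (btl->has_rdma) {
        ep->rdma[ep->n_rdma++] = btl;
    }
}
```

Adding a high-exclusivity `self` entry first and an RDMA-capable `ugni` entry second leaves `self` as the only send BTL while `ugni` still lands on the rdma list, which matches the loopback outcome the message describes.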
Nathan Hjelm
22625b005b osc/pt2pt: threading fixes and code cleanup 2014-11-19 11:33:04 -07:00
Nathan Hjelm
60648e4231 add more internal RMA error codes 2014-11-19 11:33:04 -07:00
Nathan Hjelm
0110603782 ob1 warning fix 2014-11-19 11:33:04 -07:00
Nathan Hjelm
45d1fac8af ugni thread safety fixes 2014-11-19 11:33:03 -07:00
Nathan Hjelm
29e4e1c90a Rename the OSC "rdma" component to pt2pt to better reflect that it does not actually use btl rdma 2014-11-19 11:33:03 -07:00
Nathan Hjelm
24427639b6 Fix ob1 warnings 2014-11-19 11:33:03 -07:00
Nathan Hjelm
271818f887 pml/ob1: bug fixes and adjustments for changes in btl_sendi behavior 2014-11-19 11:33:03 -07:00
Nathan Hjelm
ee2b111011 Update PML for latest BTL update 2014-11-19 11:33:02 -07:00
Nathan Hjelm
49ff5a79d0 Update BML for the latest BTL update 2014-11-19 11:33:02 -07:00
Nathan Hjelm
c61e017177 pml: updates to reflect member changes in mca_btl_base_descriptor_t
and mca_btl_base_module_t structures
2014-11-19 11:33:02 -07:00
Nathan Hjelm
5936411a07 pml/ob1: when using btl_get try to register the entire region before
attempting to break the get into multiple rdma fragments

A little background. Historically ob1 always registered the entire memory
region when the RGET protocol was in use. This changed when Mellanox
added support to fragment RGET using the btl_prepare_dst function. Now
that the BTL layer has changed to split out the limits of get/put there
is explicit fragmentation code in ob1. Before this commit the registration
was still done per RGET fragment.

This commit will attempt to register the entire region before creating
RGET fragments. If the registration is successful then all RGET
fragments will use this registration otherwise they will each attempt
to register their own segment of the receive buffer. If that fails
enough times each fragment will give up and fall back on send/recv.
2014-11-19 11:33:02 -07:00
Nathan Hjelm
b75bb8aea7 Update pml for btl changes 2014-11-19 11:33:02 -07:00
Nathan Hjelm
66bd698eaf Update BML for BTL interface changes 2014-11-19 11:33:02 -07:00
Andrew Friedley
b97cda7fd9 PSM MTL: Don't connect procs already connected
PSM has issues when calling psm_ep_connect() more than once for a
specific peer.  Use the psm_ep_connect mask argument to avoid connecting
to processes that are already connected.

OMPI ticket #268.
2014-11-12 15:52:02 -08:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
Jeff Squyres
a904a2deeb OpenMPI.3in: remove trailing blank lines; no content changes 2014-11-10 08:38:24 -08:00
Gilles Gouaillardet
df6115aac4 topo/base: fix uninitialized variable
this commit fixes a bug introduced by commit open-mpi/ompi@e7c59e3adb
2014-11-10 13:06:50 +09:00
bosilca
e7c59e3adb Merge pull request #227 from ggouaillardet/rfc/coll_basic_neighbor
RFC/coll basic neighbor
2014-11-07 11:33:25 -05:00
Ralph Castain
a4c0019153 Remove the no-longer-used variables from the opal_hash_table_t definition, and their reference in the ompi debugger code. 2014-11-03 21:35:42 -08:00
Gilles Gouaillardet
64c18686b7 fix ompi_request_wait vs ompi_request_wait_all and
MPI_STATUS_IGNORE vs MPI_STATUSES_IGNORE
2014-11-04 12:16:30 +09:00
Andrew Friedley
273135dbc7 Don't open PSM context when run on single node
When running many ranks on a single node using PSM, it's possible to
exhaust the network hardware contexts (there are 16).  This patch checks
if only a single node is being used. If so, the 'ipath' component of PSM
is disabled and no hardware contexts are opened.
2014-11-03 07:18:16 -08:00
Ralph Castain
616f0894ce Add missing parens on values being passed to OPAL_THREAD_ADD32 2014-10-31 19:11:48 -07:00
Jeff Squyres
7a5b2e9b13 ob1: change an OPAL_UNLIKELY to OPAL_LIKELY
Per
924d39e415 (commitcomment-8378266),
this OPAL_UNLIKELY should really be OPAL_LIKELY.
2014-10-31 03:22:55 -07:00
Nathan Hjelm
672d96704c osc/rdma: fix regression introduced by eed7b45db5
The ompi_osc_signal_outgoing was moved from ompi_osc_rdma_frag_start to frag_send
which gave correct results for the bug reproducer but hangs with simple OSC
tests. Moved the ompi_osc_signal_outgoing back and it now passes all tests.

Closes #256
2014-10-30 23:16:11 -06:00
George Bosilca
924d39e415 Always OBJ_DESTRUCT the send request. 2014-10-30 01:28:50 -04:00
Gilles Gouaillardet
ed93c8787d ob1: add a destructor to mca_pml_ob1_recv_request_t
opal_mutex_t must be OBJ_DESTRUCTed in order to avoid
a memory leak (pthread_mutex_init allocates memory under
Cygwin, so pthread_mutex_destroy is mandatory)

Thanks to Marco Atzeri for reporting this issue
2014-10-29 13:30:29 +09:00
Gilles Gouaillardet
6af465f12d wrappers: add the $(EXEEXT) extension to the installed symbolic links 2014-10-28 16:43:36 +09:00
Gilles Gouaillardet
eef7590e58 wrappers: add the $(EXEEXT) extension to the installed symbolic links 2014-10-28 16:42:51 +09:00
Gilles Gouaillardet
a16c1e4418 mpiJava: call mca_base_var_register *after* MPI_Init
Thanks to Takahiro Kawashima and Siegmar Gross for pointing out this issue
2014-10-27 14:41:54 +09:00
Jeff Squyres
37c2b9cf30 Merge pull request #241 from cniethammer/master
Add missing Fortran binding for Win_allocate.
2014-10-24 08:54:28 -04:00
Jeff Squyres
96c655ec67 man: add $COPYRIGHT$ token and emacs mode to all man pages 2014-10-23 17:21:54 -04:00
Jeff Squyres
ec7808cd27 man: remove stale Java bindings from MPI man pages
Fixes #244
2014-10-23 16:43:41 -04:00
Nathan Hjelm
23dd3af946 osc/rdma: use unsigned types for all counters
Some of the counters used by the "rdma" one-sided component are intended
to overflow. Since overflow behavior is undefined for signed integers in
C it is safer to use unsigned integers here.
2014-10-22 15:36:15 -06:00
Ralph Castain
2ec59acac4 Silence a slew of warnings when --enable-memchecker is given. Reviewed by Jeff 2014-10-22 13:59:08 -07:00
Jeff Squyres
c22e1ae33b configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros
These two macros set the prefix for the OPAL and ORTE libraries,
respectively.  Specifically, the OPAL library will be named
libPREFIXopen-pal.la and the ORTE library will be named
libPREFIXopen-rte.la.

These macros must be called, even if the prefix argument is empty.

The intent is that Open MPI will call these macros with an empty
prefix, but other projects (such as ORCM) will call these macros with
a non-empty prefix.  For example, ORCM libraries can be named
liborcm-open-pal.la and liborcm-open-rte.la.

This scheme is necessary to allow running Open MPI applications under
systems that use their own versions of ORTE and OPAL.  For example,
when running MPI applications under ORTE, if the ORTE and OPAL
libraries between OMPI and ORCM are not identical (which, because they
are released at different times, are likely to be different), we need
to ensure that the OMPI applications link against their ORTE and OPAL
libraries, but the ORCM executables link against their ORTE and OPAL
libraries.
2014-10-22 10:32:19 -07:00
Jeff Squyres
01fd96bfa5 Revert "Provide a mechanism by which an upstream project can rename
the OPAL and ORTE libraries. This is required by projects such as ORCM
that have their own ORTE and OPAL libraries in order to avoid library
confusion. By renaming their version of the libraries, the OMPI
applications can correctly dynamically load the correct one for their
build."

This reverts commit 63f619f871.
2014-10-22 10:32:11 -07:00
yosefe
b4f569b4d4 yalla: address comments on #246 by @jsquires 2014-10-22 10:42:56 +03:00
yosefe
ce7c748e51 Add new PML yalla, which uses mxm directly to reduce overhead.
http://starwars.wikia.com/wiki/Ubed_Yalla
2014-10-21 16:08:24 +03:00
Jeff Squyres
952be15d7f MPI_Ibarrier.3in: add missing man page
Add MPI_Ibarrier.3in to reference MPI_Barrier.3, and update
MPI_Barrier.3in to include bindings for MPI_Ibarrier.  Slightly update
the text to be inclusive of the non-blocking case.

Fixes #242.
2014-10-20 05:26:53 -07:00
Nadezhda Kogteva
2bce929330 MTL MXM cleanup: unnecessary OMPI_MTL_MXM_CONNECT_ON_FIRST_COMM variable removed 2014-10-20 10:29:47 +03:00
Christoph Niethammer
9020a1c1f6 Add missing Fortran binding for Win_allocate. 2014-10-17 13:22:54 +02:00
bosilca
d819939841 Merge pull request #233 from ggouaillardet/rfc/coll_module_disable
Provide a symmetric behavior for the activation/deactivation of collective modules.
2014-10-16 09:22:04 -04:00
Gilles Gouaillardet
b5aea782ce Revert "Fix heterogeneous support"
Per the discussion at http://www.open-mpi.org/community/lists/devel/2014/10/16050.php

This reverts commit c9c5d4011b.
2014-10-16 12:24:38 +09:00
George Bosilca
7541c03b4c Mark all instances where atomic operations are used but their return value is unnecessary 2014-10-15 21:47:32 -04:00
Gilles Gouaillardet
c9c5d4011b Fix heterogeneous support
* redefine orte_process_name_t so it can be converted
  between host and network format as an opal_identifier_t
  aka uint64_t by the OPAL layer.
* correctly send OPAL_DSTORE_ARCH key
2014-10-15 17:19:13 +09:00
Edgar Gabriel
0219c87039 set the fs_ptr to NULL in case of an error, to avoid an invalid free on file_close. 2014-10-14 13:09:06 -05:00
Gilles Gouaillardet
e3f74aca1c Correctly move the pointer back by the true_lb.
Fixes #231
2014-10-14 16:26:54 +09:00
Gilles Gouaillardet
0f983d5a4f add a disable function for coll module 2014-10-14 14:46:36 +09:00
Devendar Bureddy
7a6b4c36b0 HCOLL: Update the proc structure dereference
Update the proc structure dereference to reflect the new opal_proc_t
super field
2014-10-13 20:49:19 +03:00
Devendar Bureddy
b8d2a15be9 HCOLL: by default off 2014-10-13 20:49:09 +03:00
Vasily Filipov
a215a4831d MTL/MXM: disable "bulk_connect" by default. 2014-10-13 09:47:56 +03:00
Ralph Castain
63f619f871 Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build. 2014-10-10 11:39:08 -07:00
Gilles Gouaillardet
8eb2d62919 coll/sm: fix another memory leak 2014-10-10 19:54:45 +09:00
Gilles Gouaillardet
27e4389259 * comment on communicator creation in mca_topo_base_dist_graph_create(...)
* use accessors to retrieve topo info
2014-10-10 16:07:20 +09:00
Gilles Gouaillardet
5d44a30111 coll/sm: fix minor memory leaks
port 4488.1.patch attached in #196 to master
2014-10-10 14:21:34 +09:00
Gilles Gouaillardet
76204dfafe coll/basic: fix segmentation fault in neighborhood collectives if the degree
of the topology is higher than the communicator size

It is possible to have a topology degree higher than the size of the communicator.
For example, a periodic cartesian communicator on MPI_COMM_SELF. This will leave
the neighborhood collectives with a request buffer that is too small.

This commit introduces a semantic change:
from now on, c_topo must be set before invoking coll_select
2014-10-10 11:56:04 +09:00
Gilles Gouaillardet
2f67f29b85 Revert "coll/basic: fix segmentation fault in neighborhood collectives if the degree"
This reverts commit 9c788ff940.
2014-10-10 11:29:06 +09:00
Elena
c905fe9b78 pmix: removed pmix_base_direct modex mca parameter, renamed orte_full_modex_cutoff and ompi_hostname_cutoff to direct_modex_cutoff 2014-10-09 06:15:31 +02:00
Nathan Hjelm
eed7b45db5 osc/rdma: fix issue identified by Berk Hess
osc/rdma uses counters to determine if all messages have been received
before exiting synchronization calls. The problem is that the active
target counter is always increasing (never zeroed). If over 2^31-1
messages are sent this causes the counter to overflow (in itself this
isn't an error). This causes test/wait to return before the communication
is complete. There is an additional error in the use of the fragment
flush function. If PSCW synchronization is in use this function CAN NOT
be called unless a post message has arrived.

Relevant mailing list thread: http://www.open-mpi.org/community/lists/devel/2014/10/16016.php

This commit fixes both issues. Tested against MTT and issue reproducer.

Closes #224.
2014-10-07 11:45:22 -06:00
Ralph Castain
fd6a044b7f Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages.
Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.
2014-10-03 16:02:57 -06:00
Jeff Squyres
413e775dbf version configury: make dist now works
Update the VERSION file scheme:

* Remove "want_repo_rev".
* Add "tarball_version".

All values are now always included (major, minor, release, greek,
repo_rev).  However, configure.ac now runs "opal_get_version.sh
... --tarball", which will return the value of tarball_version (if it
is non-empty) or the "full" version string (i.e.,
"major.minor.releasegreek").
2014-10-02 11:32:54 -07:00
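The selection rule in this message (use tarball_version if it is non-empty, otherwise the full version string) is implemented by a shell script in the tree. The C sketch below mirrors only that rule. The struct, the function name, and the sample values are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the VERSION file fields named in the commit. */
typedef struct {
    int major;
    int minor;
    int release;
    const char *greek;            /* e.g. "a1", or "" */
    const char *tarball_version;  /* "" when unset */
} version_info_t;

/* Write the version string a tarball would carry into buf.
 * Returns the snprintf result. */
int tarball_version_string(const version_info_t *v, char *buf, size_t len)
{
    assert(v != NULL && buf != NULL);

    /* A non-empty tarball_version overrides everything else. */
    if (v->tarball_version != NULL && strlen(v->tarball_version) > 0) {
        return snprintf(buf, len, "%s", v->tarball_version);
    }
    /* Otherwise fall back to "major.minor.releasegreek". */
    return snprintf(buf, len, "%d.%d.%d%s",
                    v->major, v->minor, v->release, v->greek);
}
```

With the invented fields major 1, minor 9, release 0, greek "a1" and an empty tarball_version, the buffer holds `1.9.0a1`. Setting tarball_version to any non-empty string makes that string the result instead.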
Jeff Squyres
8468424f45 distscript: remove configure.params and autogen.subdirs kruft
Remove configure.params support: configure.params hasn't been used in
years.

Also remove autogen.subdirs support; those should really be handled by
their respective Makefile.am's.
2014-10-02 11:32:54 -07:00
Howard Pritchard
bb65835816 Fix iallgather problem with intercommunicators
A problem was found with the libnbc MPI_Iallgather
routine when using intercommunicators.  Special
thanks to Takahiro Kawashima(Fujitsu) for the patch
and a test case.  Verified master fails without the
patch and the test passes with the patch applied.

fixes #219
2014-10-02 11:45:17 -06:00
Ralph Castain
3263f721b6 Strip crlf line endings 2014-10-02 08:37:18 -07:00
Jeff Squyres
72704441a2 URLs: update URLs for GitHub 2014-10-01 14:44:09 -07:00
Ralph Castain
69328c30f5 Simplify the check for abort_print_stack by removing stale #ifdefined
cmr=v1.8.4:reviewer=jsquyres

This commit was SVN r32821.
2014-09-30 19:38:29 +00:00
Howard Pritchard
0f74467264 switch to ompi_mpi_thread_provided for ts check
Use ompi_mpi_thread_provided rather than opal_using_threads macro
to check whether MPI_THREAD_MULTIPLE is being used.

This commit was SVN r32815.
2014-09-29 22:20:35 +00:00
Howard Pritchard
7069f2361a disqualify coll ml for MPI_THREAD_MULTIPLE
This commit was SVN r32814.
2014-09-29 21:02:15 +00:00
Ralph Castain
eb95d6f892 ompi_info_get_bool returns "success" if the value isn't found, setting "flag" to false, but doesn't set the value of the param itself. So if you don't specify "blocking_fence" in MPI_Info, then the "blocking_fence" flag wasn't being set.
Initialize the blocking_fence flag to false as the code logic indicates that it should only be set if someone provides that flag.

Thanks to Lisandro Dalcin for reporting it

cmr=v1.8.4:reviewer=hjelmn

This commit was SVN r32812.
2014-09-29 17:21:28 +00:00
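The pitfall described here is a lookup that reports success without writing its output parameter. A minimal C sketch follows. `info_entry_t`, `info_get_bool`, and `want_blocking_fence` are hypothetical stand-ins that mimic the contract stated in the message; they are not the real ompi_info_get_bool signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical key/value entry standing in for an MPI_Info object. */
typedef struct {
    const char *key;
    bool value;
} info_entry_t;

/* Mimics the contract described above: always returns 0 ("success");
 * *found reports whether the key exists; *value is written only when
 * the key is present. */
int info_get_bool(const info_entry_t *entries, size_t n,
                  const char *key, bool *value, bool *found)
{
    *found = false;
    for (size_t i = 0; i < n; ++i) {
        if (0 == strcmp(entries[i].key, key)) {
            *value = entries[i].value;
            *found = true;
            break;
        }
    }
    return 0;
}

/* Caller side: default the option before the lookup, so a missing key
 * leaves a defined value. */
bool want_blocking_fence(const info_entry_t *entries, size_t n)
{
    bool blocking_fence = false;   /* the fix: explicit default */
    bool found = false;
    int rc = info_get_bool(entries, n, "blocking_fence",
                           &blocking_fence, &found);

    assert(0 == rc);
    (void)rc;
    (void)found;
    return blocking_fence;
}
```

If `blocking_fence` were left uninitialized, a lookup on an info object without that key would return success and the caller would read an indeterminate value.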
George Bosilca
49e79a9ade Fix the case of a single process.
This commit was SVN r32807.
2014-09-28 22:06:39 +00:00
Jeff Squyres
318e3b426a fortran: workaround Absoft linker issue
MTT found that the addition of the MPI_SIZEOF interfaces to mpif.h was
causing a linker error with the Absoft compiler.  Absoft is working on
a fix, but we can work around the issue for now.  See comment in
Makefile.am in this commit for a lengthy explanation.

Refs trac:4917

This commit was SVN r32797.

The following Trac tickets were found above:
  Ticket 4917 --> https://svn.open-mpi.org/trac/ompi/ticket/4917
2014-09-25 21:07:46 +00:00
Nathan Hjelm
9c788ff940 coll/basic: fix segmentation fault in neighborhood collectives if the degree
of the topology is higher than the communicator size

It is possible to have a topology degree higher than the size of the communicator.
For example, a periodic cartesian communicator on MPI_COMM_SELF. This will leave
the neighborhood collectives with a request buffer that is too small. This commit
adds a call that will dynamically increase the size of the request buffer if it
is too small.

A better fix would be to create the topology *before* calling the coll_select
routine on a communicator. This will take some discussion and the solution will
not likely be ready anytime soon.

Thanks to Lisandro Dalcin for reporting this.

Original thread: http://www.open-mpi.org/community/lists/devel/2014/08/15713.php

cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32796.
2014-09-25 17:43:29 +00:00
Jeff Squyres
d13034d0b0 fortran: add configury to check for storage_size()
gfortran 4.8 does not support storage_size() on all relevant types
that we need.  So add a configure test to check and see if the
compiler's storage_size() intrinsic supports enough types for us to do
MPI_SIZEOF.

Also remove an accidentally redundant check for fortran INTERFACE.

Refs trac:4917

This commit was SVN r32790.

The following Trac tickets were found above:
  Ticket 4917 --> https://svn.open-mpi.org/trac/ompi/ticket/4917
2014-09-25 00:17:29 +00:00
Jeff Squyres
c9ea7f2732 fortran: ensure that sizeof_f08.h is built before mpi-f08.lo
mpi-f08.F90 includes sizeof_f08.h, so we need to add a Makefile
dependency to ensure that sizeof_f08.h is built first.

Refs trac:4917

This commit was SVN r32789.

The following Trac tickets were found above:
  Ticket 4917 --> https://svn.open-mpi.org/trac/ompi/ticket/4917
2014-09-24 23:59:18 +00:00
Ralph Castain
4024c8af9e Have to include the mpisync directory so the Makefile.in gets built - just don't build the binary and install it if timing isn't enabled
This commit was SVN r32781.
2014-09-24 01:18:21 +00:00
Edgar Gabriel
05c34946f7 implementation of non-blocking read/write operations through aio
functions for the posix module. Some interface changes for the fbtl were
necessary for that.

This commit was SVN r32777.
2014-09-23 21:27:57 +00:00
Artem Polyakov
f2e586980b Fix timing framework:
1. Fixes according to (http://www.open-mpi.org/community/lists/devel/2014/09/15869.php)
2. Force mpisync:rank0 to gather results. Now sync info is written by rank0 to the output file.
3. Improve mpirun_prof: 1) adapt to the environment (SLURM/TORQUE); 2) recognize some noteset-related mpirun options.

This commit was SVN r32772.
2014-09-23 12:59:54 +00:00
Ralph Castain
9c20940190 Remove mpif-sizeof.h during distclean
This commit was SVN r32771.
2014-09-21 14:26:19 +00:00
Ralph Castain
70896550bf Per input from Artem, update the copyrights on these files, ensuring to include all the licensing info for the files brought over from the mpiperf project.
This commit was SVN r32770.
2014-09-20 14:54:24 +00:00
Jeff Squyres
7f419dc5b6 fortran: set CLEANFILES properly
CLEANFILES was previously set; we need to use += to add to it.

refs trac:4917

This commit was SVN r32769.

The following Trac tickets were found above:
  Ticket 4917 --> https://svn.open-mpi.org/trac/ompi/ticket/4917
2014-09-20 10:43:49 +00:00
Jeff Squyres
040611556f fortran: don't complain about script args if we're not building fortran
refs trac:4917

This commit was SVN r32766.

The following Trac tickets were found above:
  Ticket 4917 --> https://svn.open-mpi.org/trac/ompi/ticket/4917
2014-09-20 01:22:40 +00:00
Jeff Squyres
d7eaca83fa Fortran: Fix MPI_SIZEOF. What a disaster. :-(
What started as a simple ticket ended up reaching all the way up to the
MPI Forum.

It turns out that we are supposed to have MPI_SIZEOF for all Fortran
interfaces: mpif.h, the mpi module, and the mpi_f08 module.

It further turns out that to properly support MPI_SIZEOF, your Fortran
compiler *has* to support the INTERFACE keyword and ISO_FORTRAN_ENV.  We
can't use "ignore TKR" functionality, because the whole point of
MPI_SIZEOF is that the implementation knows what type was passed to it
("ignore TKR" functionality, by definition, throws that information
away).  Hence, we have to have an MPI_SIZEOF interface+implementation
for all intrinsic types, kinds, and ranks.

This commit therefore adds a perl script that generates both the
interfaces and implementations for MPI_SIZEOF in each of mpif.h, the
mpi module, and mpi_f08 module (yay consolidation!).

The perl script uses the results of some new configure tests:

* check if the Fortran compiler supports the INTERFACE keyword
* check if the Fortran compiler supports ISO_FORTRAN_ENV
* find the max array rank (i.e., dimension) that the compiler supports

If the Fortran compiler supports both INTERFACE and ISO_FORTRAN_ENV,
then we'll build the MPI_SIZEOF interfaces.  If not, we'll skip
MPI_SIZEOF in mpif.h and the mpi module.  Note that we won't build the
mpi_f08 module -- to include the MPI_SIZEOF interfaces -- if the
Fortran compiler doesn't support INTERFACE, ISO_FORTRAN_ENV, and a
whole bunch of other modern Fortran stuff.

Since MPI_SIZEOF interfaces are now generated by the perl script, this
commit also removes all the old MPI_SIZEOF implementations (which were
laden with a zillion #if blocks).

cmr=v1.8.3

This commit was SVN r32764.
2014-09-19 13:44:52 +00:00
Rolf vandeVaart
5c73101a72 Fix typo.
This commit was SVN r32755.
2014-09-18 13:58:54 +00:00
Vasily Filipov
c7c63fe73e COLL/TUNED: alltoall - return previous default values of algorithm choosing decision thresholds (were changed by r32735)
reviewed by miked
    cmr=v1.8.3:reviewer=ompi-rm1.8

This commit was SVN r32753.

The following SVN revision numbers were found above:
  r32735 --> open-mpi/ompi@5fecf65daf
2014-09-18 08:07:51 +00:00
Jeff Squyres
0f29f222f2 fortran: remove 2 unused files
As noted in the comments of these files, they aren't used.  Instead,
the Fortran interfaces for WTICK/WTIME just BIND(C) invoke the
back-end C functions (yay BIND(C)!).  Hence, there's no need to keep
these old wrapper files around any more.

cmr=v1.8.3

This commit was SVN r32751.
2014-09-17 21:49:24 +00:00
Rolf vandeVaart
8db1f89dd1 Small change to allow CUDA-aware to work with non-reduction nonblocking collectives.
Only used when CUDA-aware feature compiled in.

This commit was SVN r32750.
2014-09-17 16:55:01 +00:00
Ralph Castain
d50c8ba65f Per patch from Gilles, cleanup some errors that surface when building with PGI. Verified by Tetsuya, reviewed okay by Jeff.
RM-approved

cmr=v1.8.3:reviewer=ompi-gk1.8

This commit was SVN r32745.
2014-09-16 19:07:02 +00:00
Vasily Filipov
ff10b25e7d warnings (caused by commit r32735) fix.
reviewed by miked
    cmr=v1.8.3:reviewer=ompi-rm1.8 

This commit was SVN r32740.

The following SVN revision numbers were found above:
  r32735 --> open-mpi/ompi@5fecf65daf
2014-09-16 06:33:49 +00:00
Ralph Castain
dfb952fa78 [Contribution from Artem - moved it to svn from git for him]
Replace our old, clunky timing setup with a much nicer one that is only available if configured with --enable-timing. Add a tool for profiling clock differences between the nodes so you can get more precise timing measurements. I'll ask Artem to update the Github wiki with full instructions on how to use this setup.

This commit was SVN r32738.
2014-09-15 18:00:46 +00:00
Vasily Filipov
5fecf65daf OMPI/COLL/Tuned: add command line params for thresholds to decide if small/intermediate MSGs alltoall algorithm will be used.
cmr=v1.8.3:reviewer=miked

This commit was SVN r32735.
2014-09-15 12:34:21 +00:00
Mangala Jyothi Bhaskar
dc05b709a7 it is ok to not have a sharedfp component selected, as long as no
sharedfp functionality is being used. Return an error however if no
sharedfp component is selected and the application calls a
file_read/write_shared function.

This commit was SVN r32718.
2014-09-12 21:15:58 +00:00
Edgar Gabriel
597177cd8b silence a warning regarding the return value of the fbtl's.
This commit was SVN r32717.
2014-09-12 18:01:30 +00:00
Mangala Jyothi Bhaskar
cd78a3a026 Fixed offset data type in communication
This commit was SVN r32710.
2014-09-11 14:51:30 +00:00
Mangala Jyothi Bhaskar
4ff21d6178 Fixed offset data type in communication
This commit was SVN r32709.
2014-09-11 14:51:07 +00:00
Mangala Jyothi Bhaskar
6e5f2c8ae8 Fixed offset data type in communication
This commit was SVN r32708.
2014-09-11 14:50:30 +00:00
Edgar Gabriel
4ccc0f5ea2 the length of the iov array should be limited to IOV_MAX, which is defined in limits.h
This commit was SVN r32706.
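
A minimal sketch of the batching idea behind this change, assuming a caller that submits iovec entries through readv()/writev()-style calls. The helper names (demo_iov_batch, demo_iov_calls) and the 1024 fallback are illustrative assumptions, not code from the fbtl components.

```c
#include <assert.h>
#include <limits.h>   /* IOV_MAX, where the platform exposes it */
#include <stddef.h>

/* Assumption: fall back to 1024 if <limits.h> did not expose IOV_MAX
 * under the feature-test macros in effect. */
#ifndef IOV_MAX
#define IOV_MAX 1024
#endif

/* Number of iovec entries to hand to a single readv()/writev() call. */
static size_t demo_iov_batch(size_t remaining)
{
    assert(IOV_MAX > 0);  /* a zero batch size would stall the loop below */
    return remaining < (size_t)IOV_MAX ? remaining : (size_t)IOV_MAX;
}

/* Number of calls needed to submit `total` iovec entries in batches. */
static size_t demo_iov_calls(size_t total)
{
    size_t calls = 0;
    while (total > 0) {
        total -= demo_iov_batch(total);
        calls++;
    }
    return calls;
}
```

A real caller would advance its iovec pointer by the batch size after each call and also handle partial transfers, which this sketch leaves out.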
2014-09-10 21:59:45 +00:00
Edgar Gabriel
cc46b65a5e the fbtl interfaces should really be an ssize_t not a size_t, since the return
value could be negative, which is allowed for ssize_t, but not for size_t.
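
The difference can be sketched as follows. The function names are hypothetical stand-ins for an fbtl-style routine, and the sketch only assumes the usual POSIX convention of returning a byte count on success and -1 on error.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>  /* ssize_t (POSIX) */

/* Hypothetical stand-in for an fbtl read/write routine: returns the number
 * of bytes transferred, or -1 on error. */
static ssize_t demo_transfer(size_t nbytes, int simulate_error)
{
    if (simulate_error) {
        return -1;
    }
    return (ssize_t)nbytes;
}

/* Sum the results of several transfers; return -1 on the first error.
 * fail_index selects which transfer fails (pass n or larger for none). */
static ssize_t demo_transfer_all(const size_t *sizes, size_t n,
                                 size_t fail_index)
{
    ssize_t total = 0;
    assert(sizes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        ssize_t rc = demo_transfer(sizes[i], i == fail_index);
        if (rc < 0) {
            return rc;  /* detectable only because rc is signed */
        }
        total += rc;
    }
    return total;
}

/* What a caller sees if the same -1 is stored in a size_t: it wraps to the
 * largest size_t value, so a "result < 0" check can never fire. */
static size_t demo_error_as_size_t(void)
{
    return (size_t)demo_transfer(8, 1);
}
```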

This commit was SVN r32700.
2014-09-10 15:01:54 +00:00
Edgar Gabriel
599cb7b351 update the pvfs2 fbtl to return the number of bytes generated.
This commit was SVN r32699.
2014-09-10 13:32:06 +00:00
Gilles Gouaillardet
e71452d73a Revert r32696
This commit was SVN r32697.

The following SVN revision numbers were found above:
  r32696 --> open-mpi/ompi@e4c3500166
2014-09-10 04:35:47 +00:00
Gilles Gouaillardet
e4c3500166 Fix MPI_Status_set_elements[_x] for non predefined datatypes
Fixes trac:4896

cmr=v1.8.3:reviewer=bosilca

This commit was SVN r32696.

The following Trac tickets were found above:
  Ticket 4896 --> https://svn.open-mpi.org/trac/ompi/ticket/4896
2014-09-10 02:41:29 +00:00
Edgar Gabriel
3a5f4f72da make the zero byte read/write scenarios work without the contiguous flag.
This commit was SVN r32690.
2014-09-09 16:26:14 +00:00
Edgar Gabriel
6a607caed8 fix some zero byte allocation scenarios.
This commit was SVN r32689.
2014-09-09 16:25:44 +00:00
Edgar Gabriel
ed02927767 - do not set the contiguous memory option in the collective operations. It
should not be stored on the file handle anyway, since it is not a property of
the file.
- protect a realloc for zero byte scenarios.

This commit was SVN r32678.
2014-09-07 18:09:43 +00:00
Edgar Gabriel
0d425e2f74 resetting the counter for the iov array has to happen outside of the if statement.
This commit was SVN r32677.
2014-09-07 16:30:56 +00:00
Edgar Gabriel
0f59ce6591 use the fbtl return value as originally intended, namely to retrieve the
number of bytes written and read. Status contains now the actual number of
bytes written for individual operations. For collective operations, this is
unfortunately not possible.

This commit was SVN r32674.
2014-09-07 15:14:57 +00:00
Ralph Castain
41c6058153 Bring over changes to MXM from pmix branch:
MTL MXM: establish endpoint connection on the first communication when direct_modex used

This commit was SVN r32668.
2014-09-03 18:22:11 +00:00
Gilles Gouaillardet
edfbeba7bf coll/ml: better error handling
when CHECK_AND_RECYCLE detects an error, a message is displayed
if the error occurs on an intrinsic communicator, then abort
the program (instead of trying to free the communicator)

cmr=v1.8.3:reviewer=hjelmn

This commit was SVN r32659.
2014-09-01 10:00:49 +00:00
Ralph Castain
aae1bb4f44 Silence warning
This commit was SVN r32657.
2014-08-31 08:10:35 +00:00
Jeff Squyres
f4238d65a5 fortran: also provide PMPI variants for MPI_Alloc_mem_cptr
r32622 was the first half of the fix -- we need the PMPI variants as well.

Refs trac:4882

This commit was SVN r32627.

The following SVN revision numbers were found above:
  r32622 --> open-mpi/ompi@cf0f734a98

The following Trac tickets were found above:
  Ticket 4882 --> https://svn.open-mpi.org/trac/ompi/ticket/4882
2014-08-28 23:47:38 +00:00
Gilles Gouaillardet
cf0f734a98 Fortran: add mpi_alloc_mem_cptr like bindings when configured with --without-weak-symbols
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32622.
2014-08-28 09:34:54 +00:00
Ralph Castain
b554cd7d86 Turn off the coll/ml component if --without-hwloc was given
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32621.
2014-08-27 20:25:39 +00:00
Jeff Squyres
d85527701a Fix MPI_COMM_SPLIT_TYPE with MPI_UNDEFINED
Thanks to Lisandro Dalcin for identifying the problem.

Fixes trac:4876

Submitted by George Bosilca, reviewed by Jeff Squyres.

cmr=v1.8.3:reviewer=ompi-rm1.8

This commit was SVN r32615.

The following Trac tickets were found above:
  Ticket 4876 --> https://svn.open-mpi.org/trac/ompi/ticket/4876
2014-08-27 12:17:33 +00:00
Gilles Gouaillardet
7e3784e0b7 MPI_Type_create_indexed_block.3: fix a typo in the man page
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32614.
2014-08-27 03:48:03 +00:00
George Bosilca
8de93982d5 Correctly build the args for the hindexed_block datatype.
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32613.
2014-08-27 03:45:07 +00:00
Edgar Gabriel
46de730059 fix a typo
This commit was SVN r32603.
2014-08-25 20:53:19 +00:00
Edgar Gabriel
52eac0146d cleanup of the fbtl interfaces: remove the *sorted optimization flag, since it
was not used anyway in the last two years. Simplifies the code significantly.

This commit was SVN r32602.
2014-08-25 18:04:24 +00:00
Todd Kordenbrock
6a3225d800 Fix invalid symbols left by the PMIx merge.
This commit was SVN r32597.
2014-08-25 16:30:26 +00:00
Jeff Squyres
e8eb07ad87 ompi_common_dll.c: the topo mtc union offset must be saved
Since the union contains pointers -- not instances -- we need to save
the mtc offset to get to the pointers later.

This commit was SVN r32591.
2014-08-23 15:42:44 +00:00
Ralph Castain
ac0c584eb7 Add missing file
This commit was SVN r32588.
2014-08-23 04:31:35 +00:00
Ralph Castain
b1a7375192 Fix the "unreachable" message so it outputs the correct hostname for the remote proc. Cleanup some of the pmix stuff when running corner cases of errors
This commit was SVN r32584.
2014-08-22 19:20:45 +00:00
Vishwanath Venkatesan
b176787d0f Remove unwanted spaces + Test commit
This commit was SVN r32576.
2014-08-22 05:11:17 +00:00
Edgar Gabriel
9987135da0 add initial support for non-blocking read and write operations.
This commit was SVN r32571.
2014-08-22 01:34:19 +00:00
Ralph Castain
aec5cd08bd Per the PMIx RFC:
WHAT:    Merge the PMIx branch into the devel repo, creating a new
               OPAL “pmix” framework to abstract PMI support for all RTEs.
               Replace the ORTE daemon-level collectives with a new PMIx
               server and update the ORTE grpcomm framework to support
               server-to-server collectives

WHY:      We’ve had problems dealing with variations in PMI implementations,
               and need to extend the existing PMI definitions to meet exascale
               requirements.

WHEN:   Mon, Aug 25

WHERE:  https://github.com/rhc54/ompi-svn-mirror.git

Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.

All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.

Accordingly, we have:

* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.

* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.

* Replaced the current global collective id with a signature based on the names of the participating procs. This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint.

* removed the prior OMPI/OPAL modex code

* added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform.

* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand

This commit was SVN r32570.
2014-08-21 18:56:47 +00:00
Mangala Jyothi Bhaskar
a5973c3f8c revamp of the aggregator selection logic, part 1.
This commit was SVN r32557.
2014-08-20 19:28:04 +00:00
Rolf vandeVaart
8709071819 Fix missing help file.
This commit was SVN r32550.
2014-08-18 21:52:31 +00:00
Jeff Squyres
0a398c155f opal MCA params: Move (and adapt) help message to opal help file
This commit was SVN r32547.
2014-08-16 11:54:41 +00:00
Edgar Gabriel
fabad95b8e - extend the explicit offset patch to collective explicit offset operations as
well
- minor restructuring to support the shared file pointer operations correctly
  for explicit offsets

This commit was SVN r32538.
2014-08-15 14:03:29 +00:00
Edgar Gabriel
d773dc8aa5 make arbitrary sequences of explicit and implicit offset operations work properly.
This commit was SVN r32537.
2014-08-15 01:49:43 +00:00
Jeff Squyres
3e78f7878c fortran: add missing bindings for WIN_SYNC, WIN_LOCK_ALL,
WIN_UNLOCK_ALL

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32535.
2014-08-14 20:27:30 +00:00
Edgar Gabriel
da1b6c2e87 some code reorganization in preparation for non-blocking read and write
operations.

This commit was SVN r32534.
2014-08-14 20:17:58 +00:00
Alina Sklarevich
a914c68356 MTL MXM: fix check-help-string.pl errors and warnings.
This commit was SVN r32533.
2014-08-14 07:46:56 +00:00
Edgar Gabriel
e401b68ca5 fix the zero byte fileview problem reported by Mohamad on the mailing list
This commit was SVN r32529.
2014-08-13 23:44:43 +00:00
Vasily Filipov
5ca2fffa44 MTL/MXM: call for ompi_proc_world instead of ompi_comm_size during del_procs.
This commit was SVN r32504.
2014-08-11 11:52:23 +00:00
Gilles Gouaillardet
cf9e144f05 silence warnings
gcc 3.4.3 on solaris 10 issues some warnings

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32500.
2014-08-11 07:36:46 +00:00
Gilles Gouaillardet
22cb8a1834 check-help-strings cleanup
This commit was SVN r32497.
2014-08-11 03:27:45 +00:00
Gilles Gouaillardet
f24699623f check-help-strings cleanup
This commit was SVN r32495.
2014-08-11 03:25:22 +00:00
Gilles Gouaillardet
b565e69b86 check-help-strings cleanup
This commit was SVN r32491.
2014-08-11 03:19:57 +00:00
Mike Dubman
3c8a4d7d2d mxm: opal refactoring voices
http://www.open-mpi.org/community/lists/devel/2014/08/15590.php

This commit was SVN r32486.
2014-08-10 04:35:56 +00:00
Mike Dubman
0f60c34a9f fca: adopt opal API refactoring, fix warning.
based on http://www.open-mpi.org/community/lists/devel/2014/08/15558.php

This commit was SVN r32484.
2014-08-09 15:50:51 +00:00
Jeff Squyres
ca0ccc5321 headers: remove trailing commas in enum lists
Per http://www.open-mpi.org/community/lists/devel/2014/08/15576.php,
trailing commas are not valid in enum lists in C++ until C++11.
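
For illustration, a header-style enum written so that it is accepted by C compilers and by C++ compilers older than C++11. The enum and helper below are hypothetical and not taken from the Open MPI headers.

```c
#include <assert.h>

/* Hypothetical enum: the last enumerator has no comma after it, which keeps
 * the declaration valid for C++ compilers that predate C++11. */
typedef enum {
    DEMO_MODE_READ,
    DEMO_MODE_WRITE,
    DEMO_MODE_MAX      /* sentinel: number of real modes */
} demo_mode_t;

/* Returns nonzero if `mode` names one of the real modes. */
static int demo_mode_is_valid(int mode)
{
    return mode >= DEMO_MODE_READ && mode < DEMO_MODE_MAX;
}
```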

cmr=v1.8.2:reviewer=rhc

This commit was SVN r32482.
2014-08-09 12:04:17 +00:00
Ralph Castain
70da69a4f3 Cleanup and ignore Intel compiler build products
This commit was SVN r32463.
2014-08-08 16:13:43 +00:00
Ralph Castain
e95187514c Update the proc structure dereference to reflect the new opal_proc_t super field
This commit was SVN r32462.
2014-08-08 16:12:49 +00:00
Jeff Squyres
132375f07f helpfiles: fix filenames referenced by calls to show_help()
This commit was SVN r32453.
2014-08-08 13:34:15 +00:00
Jeff Squyres
80a7309462 helpfiles: remove empty helpfiles
This commit was SVN r32452.
2014-08-08 13:33:47 +00:00
Jeff Squyres
c6d9bf906e configury: ensure wrapper static LIBS is filled properly
In core library portions of the configury (e.g., top-level
configure.ac itself), we were calling AC_CHECK_LIB and
OPAL_CHECK_FUNC_LIB to check for various libraries.

'''SIDENOTE:''' It turns out that modern Autoconf has AC_SEARCH_LIBS,
which does just about exactly what OPAL_CHECK_FUNC_LIB does.  So this
commit effectively replaces OPAL_CHECK_FUNC_LIB with AC_SEARCH_LIBS.

However, we never bothered to add these found libraries to the wrapper
compiler list of libraries used for static linking (doh!).  We've been
getting lucky for quite a while that components were adding the same
libraries to their wrapper compiler LIBS list.  

This is problematic, however, if we don't build some of these
components.  For example, Paul Hargrove noticed that if he configured
with --disable-shared --enable-static --disable-io-romio, ROMIO was no
longer adding some libraries to the wrapper LIBS list -- libraries
that just happened to also be needed by core OPAL/ORTE/OMPI layers.

The solution is not to use AC_CHECK_LIB or OPAL_CHECK_FUNC_LIB, but
use a pair of new macros:

 * OPAL_SEARCH_LIBS_CORE: a wrapper around AC_SEARCH_LIBS.  If we add
   something to $LIBS, then also add it to the wrapper list of static
   libraries.  This is the main piece of functionality that was
   wrong/missing.
 * OPAL_SEARCH_LIBS_COMPONENT: similar to OPAL_SEARCH_LIBS_CORE, but
   instead of directly adding it to the wrapper list of static
   libraries, add it to <framework>_<component>_LIBS (which eventually
   gets slurped up into the wrapper list of static libraries.  See the
   lengthy comment in config/opal_setup_wrappers.m4 near the beginning
   of OPAL_SETUP_WRAPPER_INIT() for a more detailed explanation).
   Most components did this correctly already, but one or two weren't
   right, so I implemented this second macro quite similar to the
   first and put it everywhere we already used AC_SEARCH_LIBS or
   OPAL_CHECK_FUNC_LIB.

This needs to soak for a day or two on the trunk before moving to the
v1.8 branch.

Refs trac:4834

cmr=v1.8.2:reviewer=ggouaillardet

This commit was SVN r32447.

The following Trac tickets were found above:
  Ticket 4834 --> https://svn.open-mpi.org/trac/ompi/ticket/4834
2014-08-07 23:54:45 +00:00
Ralph Castain
1e93e85403 Cleanup some autoconf messages - thanks to Paul Hargrove for noting them
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32429.
2014-08-05 14:48:42 +00:00
Ralph Castain
5f244e8b19 Use opal_getpagesize to get the proper page size
Refs trac:4826

This commit was SVN r32422.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-04 20:23:00 +00:00
Ralph Castain
2ceaa8a1dd Per patch from Thomas, remove orte abstraction breaks
This commit was SVN r32413.
2014-08-04 17:07:50 +00:00
Gilles Gouaillardet
5f1e0f284a Fix compilation when --enable-heterogeneous
This commit was SVN r32410.
2014-08-04 10:35:08 +00:00
Gilles Gouaillardet
5b1ae87c76 coll/ml: fix ML_ERROR/printf parameters
cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32409.
2014-08-04 04:05:59 +00:00
Gilles Gouaillardet
f7b13d1126 Fix missing ampersand.
also replace the OMPI_CAST_RTE_NAME macro with
an inline function if OPAL_ENABLE_DEBUG, so we can
get warnings from the compiler if ampersand is missing.

Thanks to Paul Hargrove for reporting the bugs

This commit was SVN r32408.
2014-08-04 02:52:56 +00:00
Ralph Castain
61bf7af9d2 Per Paul Hargrove's suggestion, create an opal_pagesize function to abstract the various ways of obtaining that value. Rather than creating a separate file for only that one function, put it in a convenient place that is at least somewhat related.
Refs trac:4826

This commit was SVN r32407.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-02 18:38:16 +00:00
Ralph Castain
daeb9b6c4f Some more cleanups. Remove direct references to ORTE by changing OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun, orted, tools) set the OPAL proc structure fields so OPAL knows what is going on and uses the correct print functions (still need to fix the problem for non-MPI apps). Properly return uint32_t from the opal utilities instead of int32_t as that is what the ORTE process name fields contain.
Thanks to Gilles for pointing out some of the discrepancies.

This commit was SVN r32398.
2014-08-01 14:44:11 +00:00
Gilles Gouaillardet
cd8fa75f87 coll/ml: align on page size as returned by sysconf
Thanks to Paul Hargrove for pointing in the right direction

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32393.
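
A minimal sketch of page-size alignment using sysconf(), assuming a POSIX system. The helper names and the 4096 fallback are illustrative assumptions and are not the coll/ml or OPAL code.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>  /* sysconf, _SC_PAGESIZE */

/* Hypothetical helper: query the page size at run time instead of
 * hard-coding it. */
static size_t demo_pagesize(void)
{
    long sz = sysconf(_SC_PAGESIZE);
    /* Assumption: fall back to 4096 if the query fails. */
    return sz > 0 ? (size_t)sz : (size_t)4096;
}

/* Round `len` up to the next multiple of the page size.
 * Overflow for lengths near the maximum size_t is not handled here. */
static size_t demo_align_to_page(size_t len)
{
    size_t page = demo_pagesize();
    assert(page > 0);
    return ((len + page - 1) / page) * page;
}
```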
2014-08-01 08:10:12 +00:00
George Bosilca
cee2a4e5c8 Missing alloca.h. Thanks Paul for catching this.
This commit was SVN r32388.
2014-08-01 03:28:23 +00:00
Jeff Squyres
537aa674a5 fortran: remove a duplicate listing of this file
This fixes some duplicate symbols, once the .o files for the modules
were restored into the library (some compilers need the .o files, some
don't (!)).

Also, remove trailing whitespace.  :-)

This commit was SVN r32386.
2014-08-01 00:43:04 +00:00
Jeff Squyres
abbcde6cb9 fortran: add the .o's back into libmpi_usempif08
Thanks to Paul Hargrove for the massive hint to find this.

This commit was SVN r32385.
2014-07-31 23:41:47 +00:00
George Bosilca
9b2fcd898e No more ORTE specifics in this file.
This commit was SVN r32384.
2014-07-31 22:34:16 +00:00
George Bosilca
f39abb9e69 Reverting r32355: a number of processes is not a notion that a low level
communication library should use to initialize itself.

Ralph will champion this change back with an RFC if there is a realistic
need/use case from the community.

This commit was SVN r32361.

The following SVN revision numbers were found above:
  r32355 --> open-mpi/ompi@c903917f47
2014-07-30 20:11:35 +00:00
Ralph Castain
309d75dadc Add missing ampersand - function call required a pointer, not the name itself
This commit was SVN r32357.
2014-07-30 14:48:20 +00:00
Ralph Castain
c903917f47 Expose the num_procs information to the opal layer as the info is needed in several BTLs
This commit was SVN r32355.
2014-07-30 09:33:41 +00:00
Gilles Gouaillardet
b95537376f bcol/basesmuma: fix parameter order
Ref: #4815

This commit was SVN r32353.
2014-07-30 05:38:53 +00:00
Nathan Hjelm
0a32ea87e7 Remove unneeded EXTRA_DIST
This commit was SVN r32343.
2014-07-29 16:54:11 +00:00
George Bosilca
815d7bc846 Fix the example to use the new 2_2 topo component.
This commit was SVN r32338.
2014-07-29 07:00:01 +00:00
George Bosilca
78eae6108a Add an example on how to handle topologies.
This commit was SVN r32337.
2014-07-29 05:05:59 +00:00
Nathan Hjelm
1407c1f501 Remove RML code from common/sm
The only user of this code was coll/sm. I implemented a basic replacement
for the removed code. This gets the trunk compiling again with
--disable-dlopen.

This commit was SVN r32333.
2014-07-28 22:00:12 +00:00
Nathan Hjelm
0f15afa4d9 Fix typo in psm mtl
This commit was SVN r32332.
2014-07-28 22:00:03 +00:00
Nathan Hjelm
603ba71b0d Ignore components broken by the BTL move.
common/ofacm is only used by the iboffload code in ompi. This code does
not currently work so it is safe to ignore these components until it is
fixed.

This commit was SVN r32331.
2014-07-28 21:24:18 +00:00
Ryan Grant
caa10a5faf Portals fixes after latest move
This commit was SVN r32330.
2014-07-28 19:25:03 +00:00
Ralph Castain
bcade48e27 Move the opal_process_info initialization to the right place - all that info is known (for ourselves only) immediately after rte_init.
This commit was SVN r32329.
2014-07-28 19:19:35 +00:00
George Bosilca
a3feb627cf Move some of the ompi_process_info down in OPAL.
This commit was SVN r32324.
2014-07-26 21:43:34 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose.  UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00
Jeff Squyres
0ab4eaa7d3 usnic: revert r32315 because the BTL move to opal is ongoing
Let's not make the move to OPAL any harder than it has to be; this
commit can wait until after the BTL move.

This commit was SVN r32316.

The following SVN revision numbers were found above:
  r32315 --> open-mpi/ompi@7b7ed8ed97
2014-07-25 14:17:20 +00:00
Jeff Squyres
7b7ed8ed97 usnic: minor cleanup / consolidation
CMR'ing just to (try to) keep the differences between trunk and v1.8
branch (somewhat) small.

Reviewed by Dave Goodell

cmr=v1.8.3:reviewer=ompi-rm1.8

This commit was SVN r32315.
2014-07-25 14:11:54 +00:00
Jeff Squyres
6ae45b34fc usnic: check connectivity on first communication to a peer
Previously, we were only checking connectivity upon first ''send'' to
a peer.  But this ignores the case where the first communication to a
peer is actually an ACK -- i.e., we successfully received something
from the peer and we need to send an ACK back.  So we need to verify
that the ACK will actually get there.

Specifically, certain asymmetric routing cases can lead to a hang if
we don't check the connectivity in both directions.  E.g., if the
sender is able to get traffic to the receiver, but the receiver is
unable to get traffic back to the sender because it made a different
routing decision than the sender.

In this case, the connectivity checker from the sender could succeed
(because the connectivity checker will ACK along the same path in
which the ping was received), but sending a BTL ACK could fail
(because the BTL ACK will be sent back along the path chosen by the
graph algorithm, which, in an erroneous asymmetric routing scenario,
may be different/wrong).

Hence, we want to trigger the connectivity checker at the first
communication from A->B, which may either be a BTL send or an ACK.

Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32309.
2014-07-24 21:32:56 +00:00
Bert Wesarg
3e34812a0d Changes to VT/OTF:
Ensure that target directories exists before creating symlinks.

cmr=v1.8.2:reviewer=jsquyres

Thanks to Jeff for stepping up as a reviewer.

This commit was SVN r32305.
2014-07-24 19:06:43 +00:00
Rolf vandeVaart
9bc8fbaefd Create new error message so we can better pinpoint where an error occurs.
This commit was SVN r32303.
2014-07-24 15:18:55 +00:00
Rolf vandeVaart
3f703afb97 Fix CUDA registration where we run out of memory being allocated.
This commit was SVN r32297.
2014-07-23 21:10:17 +00:00
Todd Kordenbrock
42a871efd4 This commit fixes trac:4662 - "Portals4/MTL hangs in c_get_accumulate test".
- Portals4/OSC was unable to acquire an exclusive lock due to an invalid
local address in the atomic operation.  This caused the reported hang.
- After fixing the hang, the test continued to fail because
ompi_datatype_is_contiguous_memory_layout() reports that MPI_EMPTY (the
origin datatype) is noncontiguous and Portals4/OSC does not support
noncontiguous datatypes at this time.  However, in this case the origin
count is zero so contiguous/noncontiguous is irrelevant.  Now we skip
the contiguous check if the count is zero.

cmr=v1.8.3:reviewer=regrant:subject=Fix for "Portals4/MTL hangs in c_get_accumulate test"

This commit was SVN r32295.

The following Trac tickets were found above:
  Ticket 4662 --> https://svn.open-mpi.org/trac/ompi/ticket/4662
2014-07-23 19:13:07 +00:00
Edgar Gabriel
d4f83ab929 clean up of the MCA parameters of the fcoll framework. Most parameters are now
set/retrieved in ompio instead of the fcoll components.

This commit was SVN r32294.
2014-07-23 19:03:14 +00:00
Jeff Squyres
8e80480cbc fortran: don't use optional params in the ompi/pompi interfaces
Fix a copy-n-paste error: the ompi/pompi interfaces should not have
optional ierror arguments.  Optional ierror arguments are only used in
the MPI_<foo> interfaces.  The ompi/pompi interfaces are the actual
underlying routines (in C, incidentally, which is why they're declared
as BIND(C)), and do not have optional ierror arguments.

Also fix a typo in the BIND(C) name for pompi_win_shared_query_f().

cmr=v1.8.2:reviewer=ggouaillardet

This commit was SVN r32287.
2014-07-22 21:51:29 +00:00
Jeff Squyres
4da3c85b54 fortran: revert Absoft-based fixes
Revert r32246, r32254, and r32255 -- they were fixing side-effects of
the real bug.  Real fix coming after this one.

This commit was SVN r32286.

The following SVN revision numbers were found above:
  r32246 --> open-mpi/ompi@08d2a1a48d
  r32254 --> open-mpi/ompi@232d4dbb7b
2014-07-22 21:49:22 +00:00
Jeff Squyres
ac2621debf usnic: show_help if we can't create the connectivity map file
QA ran across the case where the user can't write to the target
directory for the connectivity map file.  In this case, we silently
continued.  They requested that we at least warn in this case.

Fixes Cisco bug CSCup62821

Reviewed by Dave Goodell

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32283.
2014-07-22 20:50:59 +00:00
Devendar Bureddy
74852b4d21 HCOLL: fix misplaced hcoll_init return value check.
cmr=v1.8.2:reviewer=jladd

This commit was SVN r32282.
2014-07-22 18:47:34 +00:00
Rolf vandeVaart
63d6a08283 Fix set-but-unused-warning noticed by jsquyres.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32281.
2014-07-22 18:37:40 +00:00
Howard Pritchard
828a4a29b7 Subject: fix regression in ugni btl eager get path
Description:
This mod fixes a regression in the ugni btl eager get
path introduced in changeset 32196.
References:4800
Closes:4800

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32264.
2014-07-22 15:42:11 +00:00
Rolf vandeVaart
8778418da2 Remove some debug #ifdefs (oops). Other lock support.
This commit was SVN r32263.
2014-07-22 02:09:06 +00:00
Rolf vandeVaart
1a61dd3078 Add some more locks where needed.
This commit was SVN r32262.
2014-07-22 00:29:57 +00:00
Rolf vandeVaart
7897d2a828 Improve verbose message which says which device:ports are being used. Also move where message is generated.
This commit was SVN r32261.
2014-07-21 20:38:52 +00:00
Jeff Squyres
da18eb1b8b common/verbs: fix usnic detection
The logic was mishandling the case of a newer kernel and an older
libusnic_verbs.  Simplify usnic_transport() to return constants in the
2 known cases (not a usNIC device and the TRANSPORT_USNIC_UDP case),
and call the magic probe in all other cases.

Reviewed-by: Dave Goodell <dgoodell@cisco.com>

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32260.
2014-07-21 19:52:29 +00:00
Jeff Squyres
b6075ea775 usnic: explicitly handle case when both endpoints are NULL
If we don't explicitly tell qsort that (a == NULL && b == NULL) compares
as equal, we could end up with wonky sorting order.  I.e.,
it's *possible* that some NULLs could end up in the middle of the
array.

Regardless of whether it will ever happen in practice, it makes the
code more clear to also handle the "both are NULL" case.
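
A sketch of such a comparator, assuming an array of endpoint pointers in which NULL entries should sort to the end. The endpoint type and its priority field are hypothetical, not the usnic BTL's real structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an endpoint; not the usnic BTL's real type. */
typedef struct {
    int priority;
} demo_endpoint_t;

/* qsort comparator for an array of demo_endpoint_t pointers.
 * NULL entries sort after all non-NULL entries, and two NULLs compare as
 * equal so that the ordering stays consistent. */
static int demo_compare_endpoints(const void *pa, const void *pb)
{
    const demo_endpoint_t *a = *(const demo_endpoint_t *const *)pa;
    const demo_endpoint_t *b = *(const demo_endpoint_t *const *)pb;

    if (a == NULL && b == NULL) {
        return 0;
    }
    if (a == NULL) {
        return 1;
    }
    if (b == NULL) {
        return -1;
    }
    return (a->priority > b->priority) - (a->priority < b->priority);
}

/* Sort in place; every NULL entry ends up at the tail of the array. */
static void demo_sort_endpoints(demo_endpoint_t **eps, size_t n)
{
    assert(eps != NULL || n == 0);
    if (n > 0) {
        qsort(eps, n, sizeof(eps[0]), demo_compare_endpoints);
    }
}
```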

Also fix the 2-spacing indents.

Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32259.
2014-07-21 16:22:48 +00:00
Rolf vandeVaart
947a4e14b4 Add a lock and clean up handling of some error conditions.
This commit was SVN r32258.
2014-07-17 19:33:10 +00:00
Mike Dubman
da8df859b3 MXM: use bulk connection establishment API
fixed by Vasily, reviewed by Yossi/Miked

cmr=v1.8.2:reviwer=ompi-rm1.8

This commit was SVN r32256.
2014-07-17 08:35:55 +00:00
Gilles Gouaillardet
a7bfd6e766 fortran: fix compile issue with ABSoft compilers
one more fix ...

cmr=v1.8.2:ticket=trac:4792

This commit was SVN r32255.

The following Trac tickets were found above:
  Ticket 4792 --> https://svn.open-mpi.org/trac/ompi/ticket/4792
2014-07-17 07:38:13 +00:00
Gilles Gouaillardet
232d4dbb7b fortran: fix compile issue with ABSoft compilers
Simplify and fix the r32246

cmr=v1.8.2:ticket=trac:4792

This commit was SVN r32254.

The following SVN revision numbers were found above:
  r32246 --> open-mpi/ompi@08d2a1a48d

The following Trac tickets were found above:
  Ticket 4792 --> https://svn.open-mpi.org/trac/ompi/ticket/4792
2014-07-17 06:00:32 +00:00
Rolf vandeVaart
26e3282a18 One more minor movement for easier reading. No functional change.
This commit was SVN r32252.
2014-07-16 20:59:07 +00:00
Rolf vandeVaart
c332ca75ff Change function name for clarity.
This commit was SVN r32251.
2014-07-16 20:46:10 +00:00
Rolf vandeVaart
a2dd4ca226 Remove hack that is no longer needed.
This commit was SVN r32250.
2014-07-16 14:00:17 +00:00
Rolf vandeVaart
61821adf2f Fix self deadlock bugs.
This commit was SVN r32249.
2014-07-15 20:50:41 +00:00
Gilles Gouaillardet
08d2a1a48d fortran: fix compile issue with ABSoft compilers
ABSoft compilers cannot compile a fortran subroutine
with the BIND(C, NAME="name") modifier *and* argument(s)
with the OPTIONAL modifier

This patch detects this unsupported feature and uses
ad hoc wrappers if it is missing

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32246.
2014-07-15 10:41:11 +00:00
Gilles Gouaillardet
fd7bfc2221 mpi: remove automatically generated file from the tarball
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32244.
2014-07-15 07:52:13 +00:00
Gilles Gouaillardet
fd71d81e5d vt: remove automatically generated files from the tarball
cmr=v1.8.2:reviewer=jurenz

This commit was SVN r32243.
2014-07-15 07:51:19 +00:00
Ralph Castain
6c5e592785 Revert r32222, r32210, and r32203 as they created a problem when daemon collectives did not involve app procs on every node. Instead, modify the ompi/mca/rte/orte/rte_orte.h to add a new function that allows apps to request new daemon collective ids for use in barrier and modex operations. This will only appear in ORTE-based installations, but it is only being used by a couple of researchers at the moment.
Update the orte/test/mpi/coll_test.c test to show the revised example.

This commit was SVN r32234.

The following SVN revision numbers were found above:
  r32203 --> open-mpi/ompi@a523dba41d
  r32210 --> open-mpi/ompi@2ce11ed5c4
  r32222 --> open-mpi/ompi@d55f16db50
2014-07-15 03:48:00 +00:00
Nathan Hjelm
f960e4273e Fix typo in r32196
The wrong descriptor field was used when calculating the size received when
using the RDMA rendezvous protocol.

This commit was SVN r32232.

The following SVN revision numbers were found above:
  r32196 --> open-mpi/ompi@a14e0f10d4
2014-07-14 21:00:53 +00:00
Ralph Castain
3d1b32a2c6 Silence warning
cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32231.
2014-07-14 19:27:30 +00:00
Matthias Jurenz
2d01dd04d4 Reverted r32225. r29732 already fixes the automake issue in the trunk
This commit was SVN r32230.

The following SVN revision numbers were found above:
  r29732 --> open-mpi/ompi@3923ee89ec
  r32225 --> open-mpi/ompi@0db23b0210
2014-07-14 11:21:38 +00:00
Mike Dubman
0db23b0210 BUILD: support new automake
new automake requires the subdir-objects option, to resolve this:

09:43:37 automake: warning: possible forward-incompatibility.
09:43:37 automake: At least a source file is in a subdirectory, but the 'subdir-objects'
09:43:37 automake: automake option hasn't been enabled.  For now, the corresponding output
09:43:37 automake: object file(s) will be placed in the top-level directory.  However,
09:43:37 automake: this behaviour will change in future Automake versions: they will
09:43:37 automake: unconditionally cause object files to be placed in the same subdirectory
09:43:37 automake: of the corresponding sources.
09:43:37 automake: You are advised to start using 'subdir-objects' option throughout your
09:43:37 automake: project, to avoid future incompatibilities.
09:43:37 tools/otfmerge/Makefile.common:13: warning: source file '$(OTFMERGESRCDIR)/otfmerge.c' is in a subdirectory,
09:43:37 tools/otfmerge/Makefile.common:13: but option 'subdir-objects' is disabled

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32225.
2014-07-12 12:38:15 +00:00
Mike Dubman
e342a11c2e opal envlist mca: implement Jeff's quibbles
fixed by Elena, reviewed by Miked

This commit was SVN r32216.
2014-07-11 07:23:20 +00:00
Gilles Gouaillardet
ee9098e2ee Fix use-mpi-tkr/Makefile.am
Always include in the tarball (aka 'make dist'):
 - mpi-f90-interfaces.h
 - mpi-f90-cptr-interfaces.F90

cmr=v1.8.2:ticket=trac:4736

This commit was SVN r32215.

The following Trac tickets were found above:
  Ticket 4736 --> https://svn.open-mpi.org/trac/ompi/ticket/4736
2014-07-11 05:05:19 +00:00
Gilles Gouaillardet
77184b5c4c Fix a cornercase with MPI_PROC_NULL persistent requests
Handle OMPI_REQUEST_NOOP in MPI_Startall rather than PML

cmr=v1.8.2:reviewer=bosilca:ticket=4764

This commit was SVN r32213.

The following Trac tickets were found above:
  Ticket 4764 --> https://svn.open-mpi.org/trac/ompi/ticket/4764
2014-07-11 04:37:01 +00:00
Gilles Gouaillardet
d3ff5d77e1 scif: Fix compile error related to r32196
This commit was SVN r32212.

The following SVN revision numbers were found above:
  r32196 --> open-mpi/ompi@a14e0f10d4
2014-07-11 04:32:25 +00:00
Jeff Squyres
7384ee9e44 usnic: handle NULL endpoints in connectivity map
The connectivity map output routine needs to handle the case where
entries in the endpoints array are NULL (e.g., if one process has 2
endpoints and another process has only 1 endpoint).

Fixes Cisco bug CSCup83649.

cmr=v1.8.2

This commit was SVN r32211.
2014-07-11 00:43:45 +00:00
Jeff Squyres
0089ac20af Fortran: put type(c_ptr) interfaces in a separate file in the TKR mpi module
Older gfortran compilers (e.g., the gfortran that ships in RHEL5) do
not support ISO_C_BINDING, and therefore do not support the
TYPE(C_PTR) type.  As such, they cannot support the overloaded
interfaces for MPI_WIN_ALLOCATE_SHARED and MPI_SHARED_QUERY that are
mandated in MPI-3.

So we separate those interfaces out into a separate .F90 file that is
#include'd in the tkr mpi.F90 file.  In this separate .F90 file, we
use an #if to determine whether the compiler supports ISO_C_BINDING or
not.

Also re-jiggered the order of testing in ompi_setup_mpi_fortran.m4: we
now need to test whether the compiler supports ISO_C_BINDING even when
we're only building the mpi module (not strictly when we're building
the mpi_f08 module).

Finally, tweaked the use-mpi-tkr/Makefile.am to:

* Add some proper dependencies for mpi.F90
* Allow the general AM compilation to be used instead of supplying a
  specific rule for compiling mpi.F90

cmr=v1.8.2:ticket=trac:4736

This commit was SVN r32204.

The following Trac tickets were found above:
  Ticket 4736 --> https://svn.open-mpi.org/trac/ompi/ticket/4736
2014-07-10 19:10:03 +00:00
Ralph Castain
a523dba41d NOTE: this modifies the MPI-RTE interface
We have been getting several requests for new collectives that need to be inserted in various places of the MPI layer, all in support of either checkpoint/restart or various research efforts. Until now, this would require that the collective id's be generated at launch, which required modifications to ORTE and other places. We chose not to make collectives reusable as the race conditions associated with resetting collective counters are daunting.

This commit extends the collective system to allow self-generation of collective id's that the daemons need to support, thereby allowing developers to request any number of collectives for their work. There is one restriction: RTE collectives must occur at the process level - i.e., we don't currently have a way of tagging the collective to a specific thread. From the comment in the code:

 * In order to allow scalable
 * generation of collective id's, they are formed as:
 *
 * top 32-bits are the jobid of the procs involved in
 * the collective. For collectives across multiple jobs
 * (e.g., in a connect_accept), the daemon jobid will
 * be used as the id will be issued by mpirun. This
 * won't cause problems because daemons don't use the
 * collective_id
 *
 * bottom 32-bits are a rolling counter that recycles
 * when the max is hit. The daemon will cleanup each
 * collective upon completion, so this means a job can
 * never have more than 2**32 collectives going on at
 * a time. If someone needs more than that - they've got
 * a problem.
 *
 * Note that this means (for now) that RTE-level collectives
 * cannot be done by individual threads - they must be
 * done at the overall process level. This is required as
 * there is no guaranteed ordering for the collective id's,
 * and all the participants must agree on the id of the
 * collective they are executing. So if thread A on one
 * process asks for a collective id before thread B does,
 * but B asks before A on another process, the collectives will
 * be mixed and not result in the expected behavior. We may
 * find a way to relax this requirement in the future by
 * adding a thread context id to the jobid field (maybe taking the
 * lower 16-bits of that field).

This commit includes a test program (orte/test/mpi/coll_test.c) that cycles 100 times across barrier and modex collectives.

This commit was SVN r32203.
2014-07-10 18:53:12 +00:00
Nathan Hjelm
1b9621eeb0 Fix typo in r32196
This commit was SVN r32202.

The following SVN revision numbers were found above:
  r32196 --> open-mpi/ompi@a14e0f10d4
2014-07-10 18:43:49 +00:00
Nathan Hjelm
32ab6f850e osc/rdma: fix warning
cmr=v1.8.2:reviewer=rhc

This commit was SVN r32201.
2014-07-10 18:42:55 +00:00
Jeff Squyres
3c4674484d usnic: Fix compile errors related to r32196
This commit was SVN r32198.

The following SVN revision numbers were found above:
  r32196 --> open-mpi/ompi@a14e0f10d4
2014-07-10 17:18:03 +00:00
Nathan Hjelm
a14e0f10d4 Per RFC: Remove des_src and des_dst members from the
mca_btl_base_segment_t and replace them with des_local and des_remote

This change also updates the BTL version to 3.0.0. This commit does
not represent the final version of BTL 3.0.0. More changes are coming.

In making this change I updated all of the BTLs as well as BTL users
to use the new structure members. Please evaluate your component to
ensure the changes are correct.

RFC text:

This is the first of several BTL interface changes I am proposing for
the 1.9/2.0 release series.

What: Change naming of btl descriptor members. I propose we change
des_src and des_dst (and their associated counts) to be des_local and
des_remote. For receive callbacks the des_local member will be used to
communicate the segment information to the callback. The proposed change
will include updating all of the doxygen in btl.h as well as updating
all BTLs and BTL users to use the new naming scheme.

Why: My btl usage makes use of both put and get operations on the same
descriptor. With the current naming scheme I need to ensure that there
is consistency between the segments described in des_src and des_dst
depending on whether a put or get operation is executed. Additionally,
the current naming prevents BTLs that do not require prepare/RMA matched
operations (do not set MCA_BTL_FLAGS_RDMA_MATCHED) from executing
multiple simultaneous put AND get operations. At the moment the
descriptor can only be used with one or the other. The naming change
makes it easier for BTL users to setup/modify descriptors for RMA
operations as the local segment and remote segment are always in the
same member field. The only issue I foresee with this change is that it
will require a little more work to move BTL fixes to the 1.8 release
series.

This commit was SVN r32196.
2014-07-10 16:31:15 +00:00
George Bosilca
2861419661 Correct trivial typos in man files and FUNC_NAME variables.
Patch provided by Fujitsu (Kawashima, Takahiro)

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32190.
2014-07-10 01:47:23 +00:00
Ralph Castain
796f57f709 Protect against problems if someone passes us thru a pipe and then abnormally terminates the pipe early
This commit was SVN r32189.
2014-07-09 22:41:53 +00:00
Howard Pritchard
0bc7405e07 Subject: fix name conflict when both ugni and scif installed on system
Description: This mod fixes two name conflicts between the ugni and scif btls.
References:4771
Closes:4771

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32183.
2014-07-09 19:33:58 +00:00
Nathan Hjelm
56ad231b7c coll/ml: temporarily disable binding check
This commit was SVN r32178.
2014-07-09 14:39:49 +00:00
Joshua Ladd
057370364d Opal: Add a new MCA variable type "version_string". Also add a
new flag to ompi_info that allows a user to print all MCA variables of a specific type.  

 --type version_string

This command will print all MCA variables of type version_string.

This feature was developed by Elena Shipunova and was reviewed by Josh Ladd.

This commit was SVN r32166.
2014-07-09 01:37:23 +00:00
Jeff Squyres
5081f958a6 fortran: fix MPI_Win_allocate_shared and MPI_Win_shared_query
Several problems with MPI_Win_allocate_shared and MPI_Win_shared_query
were discovered in a code review.  This commit fixes them:

* Add _cptr versions of both subroutines in mpif-h, use-mpi-tkr, and
  use-mpi-ignore-tkr directories
* Fix case of PMPI weak symbols for both C implementations
* Add MPI and PMPI f08 implementations of both subroutines (there is
  no _cptr version in the mpi_f08 module)
* Fixed _f08 suffix on the f08 module of both subroutines

cmr=v1.8.2:ticket=trac:4736

This commit was SVN r32162.

The following Trac tickets were found above:
  Ticket 4736 --> https://svn.open-mpi.org/trac/ompi/ticket/4736
2014-07-09 00:10:04 +00:00
Nathan Hjelm
1eb6ac5e80 mpit: update the return code check for mca_base_var_get
mca_base_var_get now can return OPAL_ERR_NOT_FOUND if a variable no
longer exists. This commit updates the return code check to ensure
the correct MPI_T error code is returned to the user.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32161.
2014-07-08 21:17:47 +00:00
Nathan Hjelm
b6abe68972 osc/rdma: check for more types of window access violations
This commit adds a check to see if the target is in an access epoch. If
not we return OMPI_ERR_RMA_SYNC. This fixes test_start3 in the onesided
test suite. The cost of this extra check is 1 byte/peer for the boolean
flag indicating that the peer is in an access epoch.

I also fixed a problem where multiple unexpected post messages are not
correctly handled.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32160.
2014-07-08 21:11:12 +00:00
Jeff Squyres
d63cf04d2e btl_usnic_map.c: Arrgh! Forgot to svn add this file.
cmr=v1.8.2:ticket=trac:4773

This commit was SVN r32159.

The following Trac tickets were found above:
  Ticket 4773 --> https://svn.open-mpi.org/trac/ompi/ticket/4773
2014-07-08 20:09:31 +00:00
Jeff Squyres
1e17ab461b usnic: add btl_usnic_connectivity_map MCA param to output link information
If the btl_usnic_connectivity_map MCA param is set to a non-NULL
value, then each MPI process will output a file named
<prefix>-<hostname>.pid<pid>.job<jobid>.mcwrank<MCW rank>.txt.  Its
contents will detail which usNIC device(s) (and therefore which
link(s)) are being used to communicate with each peer MPI process.

Here is a sample output file (named
mpi005.pid26071.job1640759297.mcwrank0.txt):

{{{
device=usnic_0,interface=eth4,ip=10.10.0.5/16,mac=24:57:20:05:20:00,mtu=9000
device=usnic_1,interface=eth5,ip=10.2.0.5/16,mac=24:57:20:05:21:00,mtu=9000
device=usnic_2,interface=eth6,ip=10.3.0.5/16,mac=24:57:20:05:50:00,mtu=9000
peer=1,hostname=mpi006,device=usnic_0@peer_ip=10.10.0.6/16@peer_mac=24:57:20:06:20:00,device=usnic_1@peer_ip=10.2.0.6/16@peer_mac=24:57:20:06:21:00,device=usnic_2@peer_ip=10.3.0.6/16@peer_mac=24:57:20:06:50:00
peer=2,hostname=mpi007,device=usnic_0@peer_ip=10.10.0.7/16@peer_mac=24:57:20:07:20:00,device=usnic_1@peer_ip=10.2.0.7/16@peer_mac=24:57:20:07:21:00,device=usnic_2@peer_ip=10.3.0.7/16@peer_mac=24:57:20:07:50:00
peer=3,hostname=mpi008,device=usnic_0@peer_ip=10.10.0.8/16@peer_mac=24:57:20:08:20:00,device=usnic_1@peer_ip=10.2.0.8/16@peer_mac=24:57:20:08:21:00,device=usnic_2@peer_ip=10.3.0.8/16@peer_mac=24:57:20:08:50:00
}}}
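
As an illustration of the file naming pattern described above, here is a small sketch that formats such a name. The function name `map_filename` and its parameter types are assumptions, not the usnic BTL's actual code.

```c
#include <assert.h>
#include <stdio.h>

/* Format <prefix>-<hostname>.pid<pid>.job<jobid>.mcwrank<MCW rank>.txt
 * into buf. Returns the length of the name, or -1 if it does not fit.
 * Hypothetical helper; parameter types are chosen for portability. */
static int map_filename(char *buf, size_t len, const char *prefix,
                        const char *hostname, long pid,
                        unsigned long jobid, int mcw_rank)
{
    assert(buf != NULL && prefix != NULL && hostname != NULL);

    int n = snprintf(buf, len, "%s-%s.pid%ld.job%lu.mcwrank%d.txt",
                     prefix, hostname, pid, jobid, mcw_rank);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    if (n < 0 || (size_t)n >= len) {
        return -1;
    }
    return n;
}
```

Checking the snprintf return value against the buffer size avoids silently writing to a truncated file name.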

Reviewed by Reese Faucette

cmr=v1.8.2

This commit was SVN r32156.
2014-07-08 19:14:46 +00:00
Nathan Hjelm
309a6cf951 coll/ml: set n_resources to 0 when destructing an lmngr
Also keep track of the allocation base so we free the correct pointer
when cleaning up.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r32151.
2014-07-07 15:11:26 +00:00
Gilles Gouaillardet
bd72628a9d Cleanup : fix the cornercase with MPI_PROC_NULL persistent requests.
This commit was SVN r32140.
2014-07-04 07:20:44 +00:00
Gilles Gouaillardet
8d3bea2771 Fix the cornercase with MPI_PROC_NULL persistent requests.
This corner case is now handled in the pml so the same code
is invoked for both MPI_Start and MPI_Startall.
This also correctly reports an error if MPI_Startall is invoked twice
on an MPI_PROC_NULL persistent request.

This commit was SVN r32139.
2014-07-04 04:58:52 +00:00
Edgar Gabriel
a16e4c5bf9 As discussed during the Open MPI meeting, make ompio the default parallel I/O
library on the trunk in order to expose it to more testing.

This commit was SVN r32138.
2014-07-03 20:04:58 +00:00
Jeff Squyres
e022dd30bc usnic: EHOSTUNREACH means there is no route
ibv_create_ah() can also return EHOSTUNREACH, which means that there
is no route to the peer.  Treat that as a non-fatal warning.

Reviewed by Reese Faucette.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32135.
2014-07-03 17:19:30 +00:00
Jeff Squyres
81edddff61 usnic: make this show_help message like the others
There's no need for the port number (since usNIC has no port numbers),
and make the wording the same as other help messages.

Reviewed by Reese Faucette.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32134.
2014-07-03 17:17:34 +00:00
Jeff Squyres
852af8b834 ompi_mpi_abort: fix corner cases, simplify logic
I recently found a case where ompi_mpi_abort() segv's:

{{{
$ mpirun --mca btl non_existent_btl_name ...
}}}

In this case, the BML init fails because we have no paths to any
peers.  It calls ompi_mpi_abort(), but this is before ompi_comm_self
has been setup.  ompi_mpi_abort() assumes that if the comm parameter
is != NULL, it can be used.  But since we aborted so early in
MPI_INIT, that's a false assumption.

(note that this isn't happening on v1.8 because the check for
INIT/FINALIZE in ompi_mpi_abort() is a little different.  Hence: this
is a trunk issue -- at least for now)

When fixing this problem, I noticed a few other problems in ompi_mpi_abort():

* the group access was incorrect (it didn't use accessor functions)
* it wasn't clear that ORTE's ompi_rte_abort_peers() returns
  NOT_IMPLEMENTED and falls through down to ompi_rte_abort()
* the check for my proc in the communicator was a little more
  complicated than necessary
* the logic for checking for aborts early in MPI_INIT wasn't right
* some comments were stale
* the hostname output in error messages would be NULL if MPI_FINALIZE
  had been invoked
* it was possible to abort, but still exit with a 0 status

This commit fixes all of the above problems, and makes the logic a
little more straightforward.  Thanks to Ralph Castain and George
Bosilca for the assists with this patch.

This commit was SVN r32125.
2014-07-03 02:38:27 +00:00
George Bosilca
843ef1fcb0 ompi_mpi_abort had one extra argument that was never used. Clean it up.
This commit was SVN r32124.
2014-07-03 00:34:44 +00:00
George Bosilca
2883adcdf3 Remove useless variables.
This commit was SVN r32123.
2014-07-03 00:30:54 +00:00
Gilles Gouaillardet
8a2a0293fd fix sort_devs_by_distance in btl/openib
no need to #include <math.h> ...

cmr=v1.8.2:reviewer=miked:ticket=4759

This commit was SVN r32121.

The following Trac tickets were found above:
  Ticket 4759 --> https://svn.open-mpi.org/trac/ompi/ticket/4759
2014-07-02 08:08:10 +00:00
Gilles Gouaillardet
134eee1c4f fix sort_devs_by_distance in btl/openib
The distances as returned by hwloc_get_whole_distance_matrix_by_type are of type float.
This patch handles all distances as float.
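
One way a float/int mismatch can bite when sorting by distance is sketched below. This is not the openib code; `dev_distance_t` and the function names are hypothetical, and the truncation pitfall shown is an assumed example of the kind of problem that keeping distances as float avoids.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record pairing a device index with its distance. */
typedef struct {
    int   dev_index;
    float distance;
} dev_distance_t;

/* Compare by distance without converting the difference to int:
 * (int)(a - b) would truncate a difference such as 0.25f to 0 and
 * wrongly report the two devices as equally distant. */
static int compare_distance(const void *pa, const void *pb)
{
    const dev_distance_t *a = pa;
    const dev_distance_t *b = pb;
    return (a->distance > b->distance) - (a->distance < b->distance);
}

/* Sort devices in place, nearest first. */
static void sort_by_distance(dev_distance_t *devs, size_t n)
{
    assert(devs != NULL || n == 0);
    if (n == 0) return;
    qsort(devs, n, sizeof(devs[0]), compare_distance);
}
```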

cmr=v1.8.2:reviewer=miked

This commit was SVN r32120.
2014-07-02 07:56:40 +00:00
Ryan Grant
a1d312343b This commit fixes trac:4681 - ibm c_fence_lock hangs
cmr=v1.8.2:reviewer=tkordenbrock:subject=Portals4/MTL hanging fix

This commit was SVN r32113.

The following Trac tickets were found above:
  Ticket 4681 --> https://svn.open-mpi.org/trac/ompi/ticket/4681
2014-07-01 17:03:03 +00:00
Ryan Grant
5cb8cc856c Refs trac:4682 - This commit fixes c_flush test failure in the ibm test suite for Portals 4 OSC
cmr=v1.8.2:reviewer=tkordenbrock:subject=Move r32112 to v1.8.2 branch

This commit was SVN r32112.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r32112

The following Trac tickets were found above:
  Ticket 4682 --> https://svn.open-mpi.org/trac/ompi/ticket/4682
2014-07-01 16:26:16 +00:00
Mike Dubman
ce6d5b8cd7 HCOLL: make it OFF by default
fixed by miked, reviewed by Alex

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32101.
2014-06-28 18:45:03 +00:00
Dave Goodell
c104604387 common/verbs: update usnic transport probe
RHEL 7 has shipped with kernel support for the RDMA_TRANSPORT_USNIC
enum, but ''not'' the RDMA_TRANSPORT_USNIC_UDP enum.  This means that
when you install usNIC drivers from cisco.com, the kernel will report
IBV_TRANSPORT_USNIC, even though the transport is actually using UDP.

Therefore, we have to modify the logic in common/verbs to do the
additional magic probe if the device reports either an
IBV_TRANSPORT_IWARP or IBV_TRANSPORT_USNIC (because both of those might
be lies -- do the probe to figure out the real transport).

The code changed by this patch is fairly trivial; it simply moves the
logic of the magic probe to its own short function, and then calls that
short function in both the IBV_TRANSPORT_(IWARP|USNIC) cases.  It looks
longer because several lengthy comments were also updated.
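
The decision described above (probe whenever the reported transport might be a lie) can be sketched like this. The enum and function names are hypothetical stand-ins, not the verbs constants or the common/verbs functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the transport values named in the message. */
typedef enum {
    REPORTED_IB,
    REPORTED_IWARP,
    REPORTED_USNIC,
    REPORTED_USNIC_UDP
} reported_transport_t;

/* Return true when the reported transport cannot be trusted and the
 * additional probe is needed to find the real transport. Per the text,
 * that is the case for both the IWARP and the USNIC reports. */
static bool needs_magic_probe(reported_transport_t reported)
{
    assert(reported >= REPORTED_IB && reported <= REPORTED_USNIC_UDP);

    switch (reported) {
    case REPORTED_IWARP:
    case REPORTED_USNIC:
        return true;
    default:
        return false;
    }
}
```

Keeping the check in one short function lets both cases share it, which matches the refactoring the message describes.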

Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32098.
2014-06-27 18:43:32 +00:00
Devendar Bureddy
228772ae81 hcoll gatherv support
cmr=v1.8.2:reviewer=jladd

This commit was SVN r32097.
2014-06-26 18:14:41 +00:00
George Bosilca
99561c5cc1 If the enable fails, don't give up, but instead keep going with
the other collective modules. If we end up without some of the
collectives, the code will raise an error anyway.

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32096.
2014-06-26 15:52:45 +00:00
Jeff Squyres
8e52ba423f finalize/disconnect: add explicit comment about why we use an RTE barrier
Based on extensive discussions before/at the June 2014 developer's
meeting, put a lengthy comment explaining a second reason why we
''must'' use an RTE barrier during MPI_FINALIZE and
MPI_COMM_DISCONNECT (i.e., unreliable transports).  Also explain the
original reason why we do this in slightly more detail (BTLs can
lie/buffer a message without actually injecting it on the network).

This commit was SVN r32095.
2014-06-26 14:31:40 +00:00
Dave Goodell
f6bb853409 usnic: properly check src iface in route queries
rtnetlink doesn't check the source address when determining whether to
return route info for a query.  So we need to check that the OIF matches
the OIF of the source interface name.  Without this check, OMPI might
pair a local interface which does not have a route to a particular
remote interface.

Fixes Cisco bug CSCup55797.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32090.
2014-06-25 22:39:02 +00:00
Ralph Castain
f3cb124e50 Revert r32082 and r32070 - the developer's conference has decided to go a different direction on the threaded progress effort. This will involve some degree of prototyping to understand the tradeoffs prior to making a final design decision, and so we'll hold off on the final change until that is completed.
This commit was SVN r32089.

The following SVN revision numbers were found above:
  r32070 --> open-mpi/ompi@12d92d0c22
  r32082 --> open-mpi/ompi@aa6438ef7a
2014-06-25 20:43:28 +00:00
Adrian Reber
4b25e92194 get the FT code to compile again by adding/removing #includes
This commit was SVN r32086.
2014-06-25 18:42:17 +00:00
Rolf vandeVaart
aa6438ef7a Remove OMPI_USE_PROGRESS_THREADS that was missed.
This commit was SVN r32082.
2014-06-25 14:42:21 +00:00
Gilles Gouaillardet
fae7adf8ee Remove legacy FCA_IS_LOCAL_PROCESS macro
and use OPAL_PROC_ON_LOCAL_NODE instead

cmr=v1.8.2:reviewer=rhc

This commit was SVN r32079.
2014-06-25 02:37:53 +00:00
Ralph Castain
f70b4a33ec Per the developer conference, let's be a little nicer during MPI_Finalize and ease up on the cpu by inserting usleep into the loop over opal_progress while waiting for the RTE barrier to complete. This is a non-performant area of the code, and while most codes may call finalize at close-to-similar times, there are some that may choose to have one or more procs continue to perform some work prior to finalizing.
So save a little power while we are waiting.
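
A minimal sketch of this kind of wait loop, under stated assumptions: `progress_fn` is a hypothetical callback standing in for the progress engine, and the sketch uses POSIX `nanosleep` rather than the `usleep` call mentioned in the message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical progress callback: advances pending work and returns
 * true once the barrier has completed. */
typedef bool (*progress_fn)(void *ctx);

/* Poll until the barrier completes, sleeping briefly between polls so
 * the waiting process does not spin at full CPU. Returns the number of
 * polls performed. */
static unsigned long wait_for_barrier(progress_fn progress, void *ctx,
                                      long sleep_usec)
{
    assert(progress != NULL && sleep_usec >= 0 && sleep_usec < 1000000);

    struct timespec pause = { 0, sleep_usec * 1000L };
    unsigned long polls = 1;

    while (!progress(ctx)) {
        nanosleep(&pause, NULL);   /* yield the CPU instead of busy-waiting */
        polls++;
    }
    return polls;
}

/* Example callback: reports completion once *ctx has counted down to 0. */
static bool countdown_done(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) (*remaining)--;
    return *remaining == 0;
}
```

The sleep trades a small amount of added latency for lower CPU use, which is acceptable here because finalize is not a performance-critical path.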

cmr=v1.8.2:reviewer=jladd:subject=save power during finalize

This commit was SVN r32077.
2014-06-24 21:59:50 +00:00
Jeff Squyres
bce33635a7 sctp: remove from trunk
At the developer meeting today, the question was raised as to whether
the SCTP BTL was maintained any more.  I emailed Alan Wagner to see if
he had any interest/resources to continue to maintain the SCTP BTL.
He indicated that he unfortunately did not have any resources to maintain it;
it would be fine to remove the SCTP BTL from the trunk.

So long, SCTP BTL... fare thee well...

This commit was SVN r32075.
2014-06-24 21:23:09 +00:00
Jeff Squyres
69fa331cc2 openib/ugni: output verbose message when a BTL is ignored due to THREAD_MULTIPLE
usnic and portals4 already do this.

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32074.
2014-06-24 21:13:17 +00:00
Jeff Squyres
d7a2d964f0 usnic: use the correct output stream name
cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r32073.
2014-06-24 18:13:49 +00:00
Jeff Squyres
fb9d063be2 Fortran: include the type functions (eq/ne) in libmpi_usempif08
This file has to be pre-emptively compiled to generate the module, but
then it also has to be included in libmpi_usempif08.

cmr=v1.8.2:ticket=trac:4736

This commit was SVN r32071.

The following Trac tickets were found above:
  Ticket 4736 --> https://svn.open-mpi.org/trac/ompi/ticket/4736
2014-06-24 17:48:15 +00:00
Ralph Castain
12d92d0c22 Per the OMPI developer conference, remove the last vestiges of OMPI_USE_PROGRESS_THREADS
This commit was SVN r32070.
2014-06-24 17:05:11 +00:00
Gilles Gouaillardet
926e29c972 Fortran: add ompi/mpi/fortran/use-mpi-f08/mpi-f08-sizeof.F90 to the dist tarball.
cmr=v1.8.2:ticket=trac:4736

This commit was SVN r32065.

The following Trac tickets were found above:
  Ticket 4736 --> https://svn.open-mpi.org/trac/ompi/ticket/4736
2014-06-23 04:14:28 +00:00
Gilles Gouaillardet
d1f5d9f675 Fortran: fix OMPI_GENERATE_F77_BINDINGS macro invocation
Some parameters were omitted and compilation failed if
configured with --disable-weak-symbols

cmr=v1.8.2:ticket=trac:4736

This commit was SVN r32064.

The following Trac tickets were found above:
  Ticket 4736 --> https://svn.open-mpi.org/trac/ompi/ticket/4736
2014-06-23 02:10:35 +00:00
Jeff Squyres
011db6974e usnic: refactor usnic_add_procs() into 2 distinct parts
1: find/create procs, and create associated endpoint for each
2: resolve peer addresses

The 2nd part is done as a separate loop so that the address lookups
can be parallelized.

The overall result is to split usnic_add_procs() into two smaller,
simpler parts.

cmr=v1.8.2:ticket=trac:4734

This commit was SVN r32062.

The following Trac tickets were found above:
  Ticket 4734 --> https://svn.open-mpi.org/trac/ompi/ticket/4734
2014-06-20 20:58:36 +00:00
Jeff Squyres
1ea7bad5a0 usnic: behave better when ibv_create_ah() fails
When ibv_create_ah() fails due to an address resolution failure, it
really only means that we can't reach that one peer -- so we should
just ignore that one peer.  If ibv_create_ah() fails for some other
reason, then give up on the entire usnic_X device.

Change the show_help() message that is displayed when ibv_create_ah()
fails due to address resolution failure; indicate that it's likely a
routing problem.  Also opal_output_verbose() the same info, since
show_help() is de-duplicated (and this particular show_help() message
can be squelched).

Fixes Cisco bugs CSCup35851 and CSCup35872.

cmr=v1.8.2:ticket=trac:4734

This commit was SVN r32061.

The following Trac tickets were found above:
  Ticket 4734 --> https://svn.open-mpi.org/trac/ompi/ticket/4734
2014-06-20 20:53:50 +00:00
Jeff Squyres
395078da00 Fortran: fix two type mistakes
Use the appropriate modules, don't use mpif-config.h.

cmr=v1.8.2:ticket=trac:4736

This commit was SVN r32052.

The following Trac tickets were found above:
  Ticket 4736 --> https://svn.open-mpi.org/trac/ompi/ticket/4736
2014-06-19 20:25:09 +00:00
Jeff Squyres
8935e0a5e0 Fortran use-mpi-tkr: remove real*16 and complex*32 (for now)
There is more comprehensive work regarding MPI_SIZEOF coming, but the
Fortran working group in the MPI Forum is debating this internally,
and I'm still doing more testing to get a final solution.  So for the
moment, just remove real*16 and complex*32 support so that it compiles
properly with older compilers (that do not support real*16 and
complex*32).

This commit was SVN r32048.
2014-06-19 18:12:53 +00:00
Jeff Squyres
b375808928 Fortran: add files accidentally skipped in r32042
cmr=v1.8.2:ticket=trac:4736

This commit was SVN r32046.

The following SVN revision numbers were found above:
  r32042 --> open-mpi/ompi@fa764c1567

The following Trac tickets were found above:
  Ticket 4736 --> https://svn.open-mpi.org/trac/ompi/ticket/4736
2014-06-19 13:53:27 +00:00
Jeff Squyres
134c527f18 Fortran: Move all f08-related modules out of fortran/base
Move them all to fortran/use-mpi-f08, since that's the only directory
that uses them (the use-mpi-f08-desc directory has been disabled).

cmr=v1.8.2:ticket=trac:4736

This commit was SVN r32045.

The following Trac tickets were found above:
  Ticket 4736 --> https://svn.open-mpi.org/trac/ompi/ticket/4736
2014-06-19 13:44:08 +00:00
Jeff Squyres
fa764c1567 Fortran: add missing implementation of win_allocate_shared and win_shared_query
Thanks to Michael Rachner for pointing out the issue.

cmr=v1.8.2:ticket=trac:4736

This commit was SVN r32042.

The following Trac tickets were found above:
  Ticket 4736 --> https://svn.open-mpi.org/trac/ompi/ticket/4736
2014-06-19 13:38:25 +00:00
Jeff Squyres
2cbda4fe6d Fortran: fix a few ierr->ierror mistakes that crept in
Thanks to Walter Spector for raising the issue on the users list.

Refs trac:3582

cmr=v1.8.2:ticket=trac:4736

This commit was SVN r32041.

The following Trac tickets were found above:
  Ticket 3582 --> https://svn.open-mpi.org/trac/ompi/ticket/3582
  Ticket 4736 --> https://svn.open-mpi.org/trac/ompi/ticket/4736
2014-06-19 13:37:22 +00:00
Jeff Squyres
555073630e Fortran: remove the scripts from the use-mpi-tkr implementation
This is part one of several Fortran improvements and fixes.  This
first part removes the now-defunct scripts that are used to generate
the .f90 files in the use-mpi-tkr implementation, and just commits the
output from those scripts.  This makes long-term maintenance of the
use-mpi-tkr implementation simpler.

cmr=v1.8.2:reviewer=jsquyres:subject=Various Fortran fixes/improvements

This commit was SVN r32040.
2014-06-19 13:35:30 +00:00
Jeff Squyres
baeae72370 usnic: remove "device" and "port" language from show_help() messages
Move away from verbs-specific terms "device" and "port" in the usnic
BTL help messages.  Replace them with "usNIC interface" (since usNIC
has no concept of a port).

cmr=v1.8.2:ticket=trac:4734

This commit was SVN r32029.

The following Trac tickets were found above:
  Ticket 4734 --> https://svn.open-mpi.org/trac/ompi/ticket/4734
2014-06-18 15:20:50 +00:00
Jeff Squyres
9e32cb9b60 usnic: move some defines into btl_usnic_util.h
Move MACLEN and IPV4LEN into _util.h and rename them to be MACSTRLEN
and IPV4STRLEN, respectively.

cmr=v1.8.2:ticket=trac:4734

This commit was SVN r32028.

The following Trac tickets were found above:
  Ticket 4734 --> https://svn.open-mpi.org/trac/ompi/ticket/4734
2014-06-18 15:19:36 +00:00
Jeff Squyres
7395b65531 usnic: remove some debugging verbose output
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32027.
2014-06-18 14:14:34 +00:00
Nathan Hjelm
fd21b244ce osc/rdma: better name for lookup function
cmr=v1.8.2:ticket=trac:4732:reviewer=ompi-rm1.8

This commit was SVN r32021.

The following Trac tickets were found above:
  Ticket 4732 --> https://svn.open-mpi.org/trac/ompi/ticket/4732
2014-06-17 19:49:17 +00:00
Nathan Hjelm
390f8f52b4 osc/rdma: clean up group process matching a bit
cmr=v1.8.2:ticket=trac:4732:reviewer=dgoodell

This commit was SVN r32018.

The following Trac tickets were found above:
  Ticket 4732 --> https://svn.open-mpi.org/trac/ompi/ticket/4732
2014-06-17 17:48:30 +00:00
Nathan Hjelm
7f20868179 osc/rdma: ensure matching of post/start calls
The post and start window calls are supposed to be matching. The code
did not check to see that an incoming post matched with the start call.
This commit fixes the bug by placing the post on a pending list that
will be checked by the next call to start.

cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r32017.
2014-06-17 15:23:06 +00:00
Nathan Hjelm
927098d567 osc/rdma: fix hang when accumulating with MPI_REPLACE
The replace callback did not increment the incoming frag counter. This
leads to a hang during synchronization. This commit adds the increment
and also puts the request on the garbage collection list to fix a leak.

This fixes a hang found when running the mpich test suite.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32016.
2014-06-17 14:53:29 +00:00
Nathan Hjelm
7f6de57653 osc/rdma: fix accumulate fragment size calculation
The wrong type was used when calculating the amount of space needed
for an accumulate fragment. Fixed the calculation and took the
opportunity to eliminate the get_acc header as it is identical to the
acc header.

This fixes trac:4719 and #4718

Tracking these fixes for 1.8.2 in this CMR.

Throwing this to Brad for review as he is the one who ran into the issue.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32015.

The following Trac tickets were found above:
  Ticket 4719 --> https://svn.open-mpi.org/trac/ompi/ticket/4719
2014-06-17 14:53:24 +00:00
Gilles Gouaillardet
e9ed9def02 Fix MPI_Alltoallv in coll/tuned
This changeset:
- always call the low-level implementation for:
  * MPI_Alltoallv
  * MPI_Neighbor_alltoallv
  * MPI_Alltoallw
  * MPI_Neighbor_alltoallw
- fix mca_coll_tuned_alltoallv_intra_basic_inplace
  so zero-size types are correctly handled

cmr=v1.8.2:reviewer=bosilca:ticket=4715

This commit was SVN r32013.

The following Trac tickets were found above:
  Ticket 4715 --> https://svn.open-mpi.org/trac/ompi/ticket/4715
2014-06-17 06:11:34 +00:00
Nathan Hjelm
2f96f16416 osc/rdma: ensure eager sends are active before checking for sync errors
in self optimization

This addresses an issue found with the MPICH pscw_ordering test. Eager sends
were not yet active, which is OK for the standard path but not OK for the
self optimization. Fixed by waiting for all post messages before checking
the sync state.

Fixes trac:4724

Tracking the 1.8.2 issue in this CMR.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32012.

The following Trac tickets were found above:
  Ticket 4724 --> https://svn.open-mpi.org/trac/ompi/ticket/4724
2014-06-17 04:53:47 +00:00
Nathan Hjelm
37ae430424 rma: fix locking/unlocking of MPI_PROC_NULL
It is valid to lock/unlock MPI_PROC_NULL. It probably isn't worth tracking
whether MPI_PROC_NULL is locked for MPI_PROC_NULL RMA operations so this
is probably the permanent solution.

Closes trac:4720

Tracking the 1.8.2 issue with this CMR.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32011.

The following Trac tickets were found above:
  Ticket 4720 --> https://svn.open-mpi.org/trac/ompi/ticket/4720
2014-06-17 04:41:49 +00:00
Nathan Hjelm
41f0059f1e osc/sm: use an unsigned long when calculating the total segment size
Brad correctly pointed out that the total window size should not be an
int. Changed it to an unsigned long.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32010.
2014-06-17 04:33:43 +00:00
Nathan Hjelm
6ec9c6c422 osc/sm: return ompi_request_empty for all request ops
Only one field is valid for RMA requests: MPI_ERROR. This field is set
to the correct value in ompi_request_empty so there is no reason to
allocate and keep track of osc/sm requests because they are always
complete on return. Since we are no longer using the osc/sm request
structure or free list they are now removed.

Closes trac:4723

Tracking this issue with the CMR. Brad, can you verify the issue is indeed fixed?

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32009.

The following Trac tickets were found above:
  Ticket 4723 --> https://svn.open-mpi.org/trac/ompi/ticket/4723
2014-06-17 04:27:02 +00:00
George Bosilca
84193fff6d More comprehensible error messages.
This commit was SVN r32007.
2014-06-16 20:23:16 +00:00
George Bosilca
542e4996a7 Clean up the utility functions in tuned.
This commit was SVN r31987.
2014-06-13 16:04:45 +00:00
Gilles Gouaillardet
50256c62c5 Fix MPI_Alltoallv in coll/tuned.
Correctly handle the corner case in MPI_Alltoallv when
some tasks have no data to transfer and some other tasks
do have data to transfer.

This test case is covered in ibm/collective/alltoallv_somezeros
from the ompi-tests repo.

cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31985.
2014-06-13 06:03:23 +00:00
Mike Dubman
b51a42aeca MXM: fix mxm cleanup, should be called for any compat API
fixed by miked, reviewed by yossi

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31984.
2014-06-12 15:46:38 +00:00
George Bosilca
fd0e1b7261 If we detect an error on a request that has been already released
at the MPI level, we should call abort on MPI_COMM_WORLD.

Fixes ticket #1943.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31982.
2014-06-10 16:24:13 +00:00
Alina Sklarevich
7b8ad47e93 MXM: fix env variable name to hint for thread usage in mxm
reviewed by MikeD
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31968.
2014-06-09 06:40:32 +00:00
George Bosilca
f5ebd2faeb Fix the Fortran issue identified by Akan Sang Loon. The dist graph
is really special as the weights can be one of the following three
values (NULL, EMPTY or some legal value). As such, we need a complex
if to correctly convert the Fortran value to the corresponding C
value. Thus, always defining the c_ array is the simplest and most
straightforward approach.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31955.
2014-06-05 17:10:48 +00:00
Ralph Castain
248a4b100f Per Artem, we don't know our VPID at the time of getting the initial timing mark, so just get it if timing is requested
This commit was SVN r31951.
2014-06-04 16:28:41 +00:00