1
1
Граф коммитов

7851 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
a71b5dd5c7 debuggers: update warning messages when types not found
Fixes #302.
2014-12-04 03:01:51 -08:00
Jeff Squyres
1dd68d48a8 MPI_Wtime.3: give further explanation about high-res timers 2014-12-03 17:07:42 -08:00
Nadezhda Kogteva
315a240899 Timing framework: pack timing scripts to tarball always 2014-12-02 12:22:46 +02:00
Edgar Gabriel
7e41e0e62b fix a segfault in the two-phase I/O algorithm for fileviews of 0 byte size. 2014-12-01 15:59:00 -06:00
yosefe
3f152733bf Add yalla to the list of default PMLs 2014-12-01 13:11:28 +02:00
Edgar Gabriel
0758d7570e part 1 of the fix to get rid of the missing symbols that prevent the sub-modules to be loaded. 2014-11-29 20:01:36 -06:00
George Bosilca
dee243c58d ompi_proc_finalize has an interesting side effect. A proc is
inserted in the ompi_proc_list as soon as it is created and it
is removed only upon the call to the destructor. In ompi_proc_finalize
we loop over all procs in ompi_proc_finalize and release them once.
However, as a proc is not removed from this list right away, we
decrease the ref count for each proc until it reach zero and the
proc is finally removed. Thus, we cannot clean the BML/BTL after
the call the ompi_proc_finalize.
A quick fix is to delay the call to ompi_proc_finalize until all
other frameworks have been finalized, and then the behavior
depicted above will give the expected outcome.
2014-11-28 18:26:36 -05:00
Nadezhda Kogteva
45ed55afd7 Adding of missed time measurement scripts in tarball 2014-11-28 12:15:30 +02:00
George Bosilca
43901fa15a Merge branch 'master' of github.com:open-mpi/ompi 2014-11-24 22:54:41 -05:00
Ralph Castain
48f702827e First part of memory leak cleanups from Gilles 2014-11-24 16:53:33 -08:00
George Bosilca
fb6ecdfd18 Fix few typos. 2014-11-24 01:48:09 -05:00
George Bosilca
d4edd097c0 Allow for native timer (cycle level) integration
for MPI_Wtime and MPI_Wtick.
2014-11-24 00:45:14 -05:00
Andrew Friedley
e7bcad0c13 Remove unused variable.
Reported by @adrianreber, this patch removes an unused variable in the
PSM MTL, silencing a compiler warning.
2014-11-21 07:51:44 -08:00
George Bosilca
d622db783d Based on https://github.com/open-mpi/ompi/pull/262, we should use
true_lb while computing the lower bound.
2014-11-21 19:16:05 +09:00
Gilles Gouaillardet
705147e98b coll/tuned: fix allgather bruck algorithm 2014-11-21 19:16:05 +09:00
Nathan Hjelm
1b564f62bd Revert "Merge pull request #275 from hjelmn/btlmod"
This reverts commit ccaecf0fd6, reversing
changes made to 6a19bf85dd.
2014-11-19 23:22:43 -07:00
Nathan Hjelm
0d413fb73f Revert "Remove stale file reference"
This reverts commit 4c8fa17234.
2014-11-19 23:16:16 -07:00
Ralph Castain
4c8fa17234 Remove stale file reference 2014-11-19 18:32:19 -08:00
Nathan Hjelm
5a0a48c3c4 osc: remove lingering rdma component files 2014-11-19 12:11:54 -07:00
Nathan Hjelm
1a5349ec79 ompi ignore bfo until it is updated for new btl interface 2014-11-19 11:33:04 -07:00
Nathan Hjelm
8f1a44e60e bml/r2: add all rdma btls even if another btl has higher exclusivity
Background: In order to support atomics each btl needs to provide support
for communicating with self unless the btl module can guarantee global
atomicity. Before this commit bml/r2 discarded any BTL with lower
exclusivity than an existing send btl. This would cause the BML to
discard any btl other than self.

The new behavior is as follows:

 - If an exisiting send btl has higher exclusivity then the btl will not be
   added to the send btl list for the endpoint.

 - If a btl provides RDMA support then it is always added to the rdma btl
   list.

 - bml_btl weight for send btls is now calculated across all send btls.

 - bml_btl weight for rdma btls is now calculated across all rdma btls.

With this change self should still win as the only send btl for loopback
without disqualifying other btls (ugni, openib) for atomic operations.
2014-11-19 11:33:04 -07:00
Nathan Hjelm
22625b005b osc/pt2pt: threading fixes and code cleanup 2014-11-19 11:33:04 -07:00
Nathan Hjelm
60648e4231 add more internal RMA error codes 2014-11-19 11:33:04 -07:00
Nathan Hjelm
0110603782 ob1 warning fix 2014-11-19 11:33:04 -07:00
Nathan Hjelm
45d1fac8af ugni thread safety fixes 2014-11-19 11:33:03 -07:00
Nathan Hjelm
29e4e1c90a Rename the OSC "rdma" component to pt2p to better reflect that it does not actually use btl rdma 2014-11-19 11:33:03 -07:00
Nathan Hjelm
24427639b6 Fix ob1 warnings 2014-11-19 11:33:03 -07:00
Nathan Hjelm
271818f887 pml/ob1: bug fixes and adjustments for changes in btl_sendi behavior 2014-11-19 11:33:03 -07:00
Nathan Hjelm
ee2b111011 Update PML for latest BTL update 2014-11-19 11:33:02 -07:00
Nathan Hjelm
49ff5a79d0 Update BML for the latest BTL update 2014-11-19 11:33:02 -07:00
Nathan Hjelm
c61e017177 pml: updates to reflect member changes in mca_btl_base_descriptor_t
and mca_btl_base_module_t structures
2014-11-19 11:33:02 -07:00
Nathan Hjelm
5936411a07 pml/ob1: when using btl_get try to register the entire region before
attempting to break the get into multiple rdma fragments

A little background. Historically ob1 always registered the entire memory
region when the RGET protocol was in use. This changed when Mellanox
added support to fragment RGET using the btl_prepare_dst function. Now
that the BTL layer has changed to split out the limits of get/put there
is explicit fragmentation code in ob1. Before this commit the registration
was still done per RGET fragment.

This commit will attempt to register the entire region before creating
RGET fragments. If the registration is successfull then all RGET
fragments will use this registration otherwise they will each attempt
to register their own segment of the receive buffer. If that fails
enough times each fragment will give up and fall back on send/recv.
2014-11-19 11:33:02 -07:00
Nathan Hjelm
b75bb8aea7 Update pml for btl changes 2014-11-19 11:33:02 -07:00
Nathan Hjelm
66bd698eaf Update BML for BTL interface changes 2014-11-19 11:33:02 -07:00
Andrew Friedley
b97cda7fd9 PSM MTL: Don't connect procs already connected
PSM has issues when trying calling psm_ep_connect() more than once for a
specific peer.  Use the psm_ep_connect mask argument to avoid connecting
to processes that are already connected.

OMPI ticket #268.
2014-11-12 15:52:02 -08:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
Jeff Squyres
a904a2deeb OpenMPI.3in: remove trailing blank lines; no content changes 2014-11-10 08:38:24 -08:00
Gilles Gouaillardet
df6115aac4 topo/base: fix uninitialized variable
this commit fixes a bug introduced by commit open-mpi/ompi@e7c59e3adb
2014-11-10 13:06:50 +09:00
bosilca
e7c59e3adb Merge pull request #227 from ggouaillardet/rfc/coll_basic_neighbor
RFC/coll basic neighbor
2014-11-07 11:33:25 -05:00
Ralph Castain
a4c0019153 Remove the no-longer-used variables from the opal_hash_table_t definition, and their reference in the ompi debugger code. 2014-11-03 21:35:42 -08:00
Gilles Gouaillardet
64c18686b7 fix ompi_request_wait vs ompi_request_wait_all and
MPI_STATUS_IGNORE vs MPI_STATUSES_IGNORE
2014-11-04 12:16:30 +09:00
Andrew Friedley
273135dbc7 Don't open PSM context when run on single node
When running many ranks on a single node using PSM, it's possible to
exhaust the network hardware contexts (there are 16).  This patch checks
if only a single node is being used. If so, the 'ipath' component of PSM
is disabled and no hardware contexts are opened.
2014-11-03 07:18:16 -08:00
Ralph Castain
616f0894ce Add missing parens on values being passed to OPAL_THREAD_ADD32 2014-10-31 19:11:48 -07:00
Jeff Squyres
7a5b2e9b13 ob1: change an OPAL_UNLIKELY to OPAL_LIKELY
Per
924d39e415 (commitcomment-8378266),
this OPAN_UNLIKELY should really be OPAL_LIKELY.
2014-10-31 03:22:55 -07:00
Nathan Hjelm
672d96704c osc/rdma: fix regression introduced by eed7b45db5
The ompi_osc_signal_outgoing was moved from ompi_osc_rdma_frag_start to frag_send
which gave correct results for the bug reproducer but hangs with simple OSC
tests. Moved the ompi_osc_signal_outgoing back and it now passes all tests.

Closes #256
2014-10-30 23:16:11 -06:00
George Bosilca
924d39e415 Always OBJ_DESTRUCT the send request. 2014-10-30 01:28:50 -04:00
Gilles Gouaillardet
ed93c8787d ob1: add a destructor to mca_pml_ob1_recv_request_t
opal_mutex_t must be OBJ_DESTRUCTed in order to avoid
a memory leak (pthread_mutex_init allocates memory under
Cygwin, so pthread_mutex_destroy is mandatory)

Thanks to Marco Atzeri for reporting this issue
2014-10-29 13:30:29 +09:00
Gilles Gouaillardet
6af465f12d wrappers: add the $(EXEEXT) extension to the installed symbolic links 2014-10-28 16:43:36 +09:00
Gilles Gouaillardet
eef7590e58 wrappers: add the $(EXEEXT) extension to the installed symbolic links 2014-10-28 16:42:51 +09:00
Gilles Gouaillardet
a16c1e4418 mpiJava: call mca_base_var_register *after* MPI_Init
Thanks to Takahiro Kawashima and Siegmar Gross for pointing this issue
2014-10-27 14:41:54 +09:00
Jeff Squyres
37c2b9cf30 Merge pull request #241 from cniethammer/master
Add missing Fortran binding for Win_allocate.
2014-10-24 08:54:28 -04:00
Jeff Squyres
96c655ec67 man: add $COPYRIGHT$ token and emacs mode to all man pages 2014-10-23 17:21:54 -04:00
Jeff Squyres
ec7808cd27 man: remove stale Java bindings from MPI man pages
Fixes #244
2014-10-23 16:43:41 -04:00
Nathan Hjelm
23dd3af946 osc/rdma: use unsigned types for all counters
Some of the counters used by the "rdma" one-sided component are intended
to overflow. Since overflow behavior is undefined for signed integers in
C it is safer to use unsigned integers here.
2014-10-22 15:36:15 -06:00
Ralph Castain
2ec59acac4 Silence a slew of warnings when --enable-memchecker is given. Reviewed by Jeff 2014-10-22 13:59:08 -07:00
Jeff Squyres
c22e1ae33b configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros
These two macros set the prefix for the OPAL and ORTE libraries,
respectively.  Specifically, the OPAL library will be named
libPREFIXopen-pal.la and the ORTE library will be named
libPREFIXopen-rte.la.

These macros must be called, even if the prefix argument is empty.

The intent is that Open MPI will call these macros with an empty
prefix, but other projects (such as ORCM) will call these macros with
a non-empty prefix.  For example, ORCM libraries can be named
liborcm-open-pal.la and liborcm-open-rte.la.

This scheme is necessary to allow running Open MPI applications under
systems that use their own versions of ORTE and OPAL.  For example,
when running MPI applications under ORTE, if the ORTE and OPAL
libraries between OMPI and ORCM are not identical (which, because they
are released at different times, are likely to be different), we need
to ensure that the OMPI applications link against their ORTE and OPAL
libraries, but the ORCM executables link against their ORTE and OPAL
libraries.
2014-10-22 10:32:19 -07:00
Jeff Squyres
01fd96bfa5 Revert "Provide a mechanism by which an upstream project can rename
the OPAL and ORTE libraries. This is required by projects such as ORCM
that have their own ORTE and OPAL libraries in order to avoid library
confusion. By renaming their version of the libraries, the OMPI
applications can correctly dynamically load the correct one for their
build."

This reverts commit 63f619f871.
2014-10-22 10:32:11 -07:00
yosefe
b4f569b4d4 yalla: address comments on #246 by @jsquires 2014-10-22 10:42:56 +03:00
yosefe
ce7c748e51 Add new PML yalla, which uses mxm directly to reduce overhead.
http://starwars.wikia.com/wiki/Ubed_Yalla
2014-10-21 16:08:24 +03:00
Jeff Squyres
952be15d7f MPI_Ibarrier.3in: add missing man page
Add MPI_Ibarrier.3in to reference MPI_Barrier.3, and update
MPI_Barrier.3in to include bindings for MPI_Ibarrier.  Slightly update
the text to be inclusive of the non-blocking case.

Fixes #242.
2014-10-20 05:26:53 -07:00
Nadezhda Kogteva
2bce929330 MTL MXM cleanup: unnecessary OMPI_MTL_MXM_CONNECT_ON_FIRST_COMM variable removed 2014-10-20 10:29:47 +03:00
Christoph Niethammer
9020a1c1f6 Add missing Fortran binding for Win_allocate. 2014-10-17 13:22:54 +02:00
bosilca
d819939841 Merge pull request #233 from ggouaillardet/rfc/coll_module_disable
Provide a symmetric behavior for the activation/deactivation of collective modules.
2014-10-16 09:22:04 -04:00
Gilles Gouaillardet
b5aea782ce Revert "Fix heterogeneous support"
Per the discussion at http://www.open-mpi.org/community/lists/devel/2014/10/16050.php

This reverts commit c9c5d4011b.
2014-10-16 12:24:38 +09:00
George Bosilca
7541c03b4c Mark all instances where atomic operations are used but their return value is unnecessary 2014-10-15 21:47:32 -04:00
Gilles Gouaillardet
c9c5d4011b Fix heterogeneous support
* redefine orte_process_name_t so it can be converted
  between host and network format as an opal_identifier_t
  aka uint64_t by the OPAL layer.
* correctly send OPAL_DSTORE_ARCH key
2014-10-15 17:19:13 +09:00
Edgar Gabriel
0219c87039 set the fs_ptr to NULL in case of an error, to avoid a malicious free on file_close. 2014-10-14 13:09:06 -05:00
Gilles Gouaillardet
e3f74aca1c Correctly mote the pointer back by the true_lb.
Fixes #231
2014-10-14 16:26:54 +09:00
Gilles Gouaillardet
0f983d5a4f add a disable function for coll module 2014-10-14 14:46:36 +09:00
Devendar Bureddy
7a6b4c36b0 HCOLL: Update the proc structure dereference
Update the proc structure dereference to reflect the new opal_proc_t
super field
2014-10-13 20:49:19 +03:00
Devendar Bureddy
b8d2a15be9 HCOLL: by default off 2014-10-13 20:49:09 +03:00
Vasily Filipov
a215a4831d MTL/MXM: disable "bulk_connect" by default. 2014-10-13 09:47:56 +03:00
Ralph Castain
63f619f871 Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build. 2014-10-10 11:39:08 -07:00
Gilles Gouaillardet
8eb2d62919 coll/sm: fix an other memory leak 2014-10-10 19:54:45 +09:00
Gilles Gouaillardet
27e4389259 * comment on communicator creation in mca_topo_base_dist_graph_create(...)
* use accesors to retrieve topo info
2014-10-10 16:07:20 +09:00
Gilles Gouaillardet
5d44a30111 coll/sm: fix minor memory leaks
port 4488.1.patch attached in #196 to master
2014-10-10 14:21:34 +09:00
Gilles Gouaillardet
76204dfafe coll/basic: fix segmentation fault in neighborhood collectives if the degree
of the topology is higher than the communicator size

It is possible to have a topology degree higher than the size of the communicator.
For example, a periodic cartesian communicator on MPI_COMM_SELF. This will leave
the neighborhood collectives with a request buffer that is too small.

This commits introduces a semantic change :
from now, c_topo must be set before invoking coll_select
2014-10-10 11:56:04 +09:00
Gilles Gouaillardet
2f67f29b85 Revert "coll/basic: fix segmentation fault in neighborhood collectives if the degree"
This reverts commit 9c788ff940.
2014-10-10 11:29:06 +09:00
Elena
c905fe9b78 pmix: removed pmix_base_direct modex mca parameter, renamed orte_full_modex_cutoff and ompi_hostname_cutoff to direct_modex_cutoff 2014-10-09 06:15:31 +02:00
Nathan Hjelm
eed7b45db5 osc/rdma: fix issue identified by Berk Hess
osc/rdma uses counters to determine if all messages have been received
before exiting synchronization calls. The problem is that the active
target counter is always increasing (never zeroed). If over 2^31-1
messages are sent this causes the counter to overflow (in itself this
isn't an error). This causes test/wait to return before the communication
is complete. There is an additional error in the use of the fragment
flush function. If PSCW synchronization is in use this function CAN NOT
be called unless a post message has arrived.

Relevant mailing list thread: http://www.open-mpi.org/community/lists/devel/2014/10/16016.php

This commit fixes both issues. Tested against MTT and issue reproducer.

Closes #224.
2014-10-07 11:45:22 -06:00
Ralph Castain
fd6a044b7f Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages.
Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.
2014-10-03 16:02:57 -06:00
Jeff Squyres
413e775dbf version configury: make dist now works
Update the VERSION file scheme:

* Remove "want_repo_rev".
* Add "tarball_version".

All values are now always included (major, minor, release, greek,
repo_rev).  However, configure.ac now runs "opal_get_version.sh
... --tarball", which will return the value of tarball_version (if it
is non-empty) or the "full" version string (i.e.,
"major.minor.releasegreek").
2014-10-02 11:32:54 -07:00
Jeff Squyres
8468424f45 distscript: remove configure.params and autogen.subdirs kruft
Remove configure.params support: configure.params hasn't been used in
years.

Also remove autogen.subdirs support; those should really be handled by
their respective Makefile.am's.
2014-10-02 11:32:54 -07:00
Howard Pritchard
bb65835816 Fix iallgather problem with intercommunicators
A problem was found with the libnbc MPI_Iallgather
routine when using intercommunicators.  Special
thanks to Takahiro Kawashima(Fujitsu) for the patch
and a test case.  Verified master fails without the
patch and the test passes with the patch applied.

fixes #219
2014-10-02 11:45:17 -06:00
Ralph Castain
3263f721b6 Strip crlf line endings 2014-10-02 08:37:18 -07:00
Jeff Squyres
72704441a2 URLs: update URLs for GitHub 2014-10-01 14:44:09 -07:00
Ralph Castain
69328c30f5 Simplify the check for abort_print_stack by removing stale #ifdefined
cmr=v1.8.4:reviewer=jsquyres

This commit was SVN r32821.
2014-09-30 19:38:29 +00:00
Howard Pritchard
0f74467264 switch to ompi_mpi_thread_provided for ts check
Use ompi_mpi_thread_provided rather than opal_using_threads macro
to check whether MPI_THREAD_MULTIPLE is being used.

This commit was SVN r32815.
2014-09-29 22:20:35 +00:00
Howard Pritchard
7069f2361a disqualify coll ml for MPI_THREAD_MULTIPLE
This commit was SVN r32814.
2014-09-29 21:02:15 +00:00
Ralph Castain
eb95d6f892 ompi_info_get_bool returns "success" if the value isn't found, setting "flag" to false, but doesn't set the value of the param itself. So if you don't specify "blocking_fence" in MPI_Info, then the "blocking_fence" flag wasn't being set.
Initialize the blocking_fence flag to false as the code logic indicates that it should only be set if someone provides that flag.

Thanks to Lisandro Dalcin for reporting it

cmr=v1.8.4:reviewer=hjelmn

This commit was SVN r32812.
2014-09-29 17:21:28 +00:00
George Bosilca
49e79a9ade Fix the case of a single process.
This commit was SVN r32807.
2014-09-28 22:06:39 +00:00
Jeff Squyres
318e3b426a fortran: workaround Absoft linker issue
MTT found that the addition of the MPI_SIZEOF interfaces to mpif.h was
causing a linker error with the Absoft compiler.  Absoft is working on
a fix, but we can workaround the issue for now.  See comment in
Makefile.am in this commit for a lengthy explanation.

Refs trac:4917

This commit was SVN r32797.

The following Trac tickets were found above:
  Ticket 4917 --> https://svn.open-mpi.org/trac/ompi/ticket/4917
2014-09-25 21:07:46 +00:00
Nathan Hjelm
9c788ff940 coll/basic: fix segmentation fault in neighborhood collectives if the degree
of the topology is higher than the communicator size

It is possible to have a topology degree higher than the size of the communicator.
For example, a periodic cartesian communicator on MPI_COMM_SELF. This will leave
the neighborhood collectives with a request buffer that is too small. This commit
adds a call that will dynamically increase the size of the request buffer if it
is too small.

A better fix would be to create the topology *before* calling the coll_select
routine on a communicator. This will take some discussion and the solution will
not likely be ready anytime soon.

Thanks to Lisandro Dalcin for reporting this.

Original thread: http://www.open-mpi.org/community/lists/devel/2014/08/15713.php

cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32796.
2014-09-25 17:43:29 +00:00
Jeff Squyres
d13034d0b0 fortran: add configury to check for storage_size()
gfortran 4.8 does not support storage_size() on all relevant types
that we need.  So add a configure test to check and see if the
compiler's storage_size() intrinsic supports enough types for us to do
MPI_SIZEOF.

Also remove an accidentally redundant check for fortran INTERFACE.

Refs trac:4917

This commit was SVN r32790.

The following Trac tickets were found above:
  Ticket 4917 --> https://svn.open-mpi.org/trac/ompi/ticket/4917
2014-09-25 00:17:29 +00:00
Jeff Squyres
c9ea7f2732 fortran: ensure that sizeof_f08.h is built before mpi-f08.lo
mpi-f08.F90 includes sizeof_f08.h, so we need to add a Makefile
dependency to ensure that sizeof_f08.h is built first.

Refs trac:4917

This commit was SVN r32789.

The following Trac tickets were found above:
  Ticket 4917 --> https://svn.open-mpi.org/trac/ompi/ticket/4917
2014-09-24 23:59:18 +00:00
Ralph Castain
4024c8af9e Have to include the mpisync directory so the Makefile.in gets built - just don't build the binary and install it if timing isn't enabled
This commit was SVN r32781.
2014-09-24 01:18:21 +00:00
Edgar Gabriel
05c34946f7 implementation of non-blocking read/write operations through aio
functions for the posix module. Som interface changes for the fbtl were
necessary for that.

This commit was SVN r32777.
2014-09-23 21:27:57 +00:00
Artem Polyakov
f2e586980b Fix timing framework:
1. Fixes according to (http://www.open-mpi.org/community/lists/devel/2014/09/15869.php)
2. Force mpisync:rank0 to gather results. Now sync info is written by rank0 to the output file.
3. Improve mpirun_prof: 1) adopt to the environment (SLURM/TORQUE); 2) recognize some noteset-related mpirun options.

This commit was SVN r32772.
2014-09-23 12:59:54 +00:00
Ralph Castain
9c20940190 Remove mpif-sizeof.h during distclean
This commit was SVN r32771.
2014-09-21 14:26:19 +00:00
Ralph Castain
70896550bf Per input from Artem, update the copyrights on these files, ensuring to include all the licensing info for the files broght over from the mpiperf project.
This commit was SVN r32770.
2014-09-20 14:54:24 +00:00
Jeff Squyres
7f419dc5b6 fortran: set CLEANFILES properly
CLEANFILES was previously set; we need to use += to add to it.

refs trac:4917

This commit was SVN r32769.

The following Trac tickets were found above:
  Ticket 4917 --> https://svn.open-mpi.org/trac/ompi/ticket/4917
2014-09-20 10:43:49 +00:00
Jeff Squyres
040611556f fortran: don't complain about script args if we're not building fortran
refs trac:4917

This commit was SVN r32766.

The following Trac tickets were found above:
  Ticket 4917 --> https://svn.open-mpi.org/trac/ompi/ticket/4917
2014-09-20 01:22:40 +00:00
Jeff Squyres
d7eaca83fa Fortran: Fix MPI_SIZEOF. What a disaster. :-(
What started as a simple ticket ended up reaching the way up to the
MPI Forum.
    
It turns out that we are supposed to have MPI_SIZEOF for all Fortran
interfaces: mpif.h, the mpi module, and the mpi_f08 module.
    
It further turns out that to properly support MPI_SIZEOF, your Fortran
compiler *has* support the INTERFACE keyword and ISO_FORTRAN_ENV.  We
can't use "ignore TKR" functionality, because the whole point of
MPI_SIZEOF is that the implementation knows what type was passed to it
("ignore TKR" functionality, by definition, throws that information
away).  Hence, we have to have an MPI_SIZEOF interface+implementation
for all intrinsic types, kinds, and ranks.

This commit therefore adds a perl script that generates both the
interfaces and implementations for MPI_SIZEOF in each of mpif.h, the
mpi module, and mpi_f08 module (yay consolidation!).

The perl script uses the results of some new configure tests:

* check if the Fortran compiler supports the INTERFACE keyword
* check if the Fortran compiler supports ISO_FORTRAN_ENV
* find the max array rank (i.e., dimension) that the compiler supports

If the Fortran compiler supports both INTERFACE and ISO_FORTRAN_ENV,
then we'll build the MPI_SIZEOF interfaces.  If not, we'll skip
MPI_SIZEOF in mpif.h and the mpi module.  Note that we won't build the
mpi_f08 module -- to include the MPI_SIZEOF interfaces -- if the
Fortran compiler doesn't support INTERFACE, ISO_FORTRAN_ENV, and a
whole bunch of ther modern Fortran stuff.

Since MPI_SIZEOF interfaces are now generated by the perl script, this
commit also removes all the old MPI_SIZEOF implementations (which were
laden with a zillion #if blocks).

cmr=v1.8.3

This commit was SVN r32764.
2014-09-19 13:44:52 +00:00
Rolf vandeVaart
5c73101a72 Fix typo.
This commit was SVN r32755.
2014-09-18 13:58:54 +00:00
Vasily Filipov
c7c63fe73e COLL/TUNED: alltoall - return previous default values of algorithm choosing decision thresholds (were changed by r32735)
reviewed by miked
    cmr=v1.8.3:reviewer=ompi-rm1.8

This commit was SVN r32753.

The following SVN revision numbers were found above:
  r32735 --> open-mpi/ompi@5fecf65daf
2014-09-18 08:07:51 +00:00
Jeff Squyres
0f29f222f2 fortran: remove 2 unused files
As noted in the comments of these files, they aren't used.  Instead,
the Fortran interfaces for WTICK/WTIME just BIND(C) invoke the
back-end C functions (yay BIND(C)!).  Hence, there's no need to keep
these old wrapper files around any more.

cmr=v1.8.3

This commit was SVN r32751.
2014-09-17 21:49:24 +00:00
Rolf vandeVaart
8db1f89dd1 Small change to allow CUDA-aware to work with non-reduction nonblocking collectives.
Only used when CUDA-aware feature compiled in.

This commit was SVN r32750.
2014-09-17 16:55:01 +00:00
Ralph Castain
d50c8ba65f Per patch from Gilles, cleanup some errors that surface when building with PGI. Verified by Tetsuya, reviewed okay by Jeff.
RM-approved

cmr=v1.8.3:reviewer=ompi-gk1.8

This commit was SVN r32745.
2014-09-16 19:07:02 +00:00
Vasily Filipov
ff10b25e7d warnings (caused by commit r32735) fix.
reviewed by miked
    cmr=v1.8.3:reviewer=ompi-rm1.8 

This commit was SVN r32740.

The following SVN revision numbers were found above:
  r32735 --> open-mpi/ompi@5fecf65daf
2014-09-16 06:33:49 +00:00
Ralph Castain
dfb952fa78 [Contribution from Artem - moved it to svn from git for him]
Replace our old, clunky timing setup with a much nicer one that is only available if configured with --enable-timing. Add a tool for profiling clock differences between the nodes so you can get more precise timing measurements. I'll ask Artem to update the Github wiki with full instructions on how to use this setup.

This commit was SVN r32738.
2014-09-15 18:00:46 +00:00
Vasily Filipov
5fecf65daf OMPI/COLL/Tuned: add command line params for thresholds to decide if small/intermediate MSGs alltoall algorithm will be used.
cmr=v1.8.3:reviewer=miked

This commit was SVN r32735.
2014-09-15 12:34:21 +00:00
Mangala Jyothi Bhaskar
dc05b709a7 it is ok to not have a sharedfp component selected, as long as no
sharedfp functionality is being used. Return an error however if no
sharedfp component is selected and the applications calls a
file_read/write_shared function.

This commit was SVN r32718.
2014-09-12 21:15:58 +00:00
Edgar Gabriel
597177cd8b silence a warning regarding the return value of the fbtl's.
This commit was SVN r32717.
2014-09-12 18:01:30 +00:00
Mangala Jyothi Bhaskar
cd78a3a026 Fixed offset data type in communication
This commit was SVN r32710.
2014-09-11 14:51:30 +00:00
Mangala Jyothi Bhaskar
4ff21d6178 Fixed offset data type in communication
This commit was SVN r32709.
2014-09-11 14:51:07 +00:00
Mangala Jyothi Bhaskar
6e5f2c8ae8 Fixed offset data type in communication
This commit was SVN r32708.
2014-09-11 14:50:30 +00:00
Edgar Gabriel
4ccc0f5ea2 the length of the iov array should be limited to IOV_MAX, which is defined in limits.h
This commit was SVN r32706.
2014-09-10 21:59:45 +00:00
Edgar Gabriel
cc46b65a5e the fbtl interfaces should really be an ssize_t not a size_t, since the return
value could be negative, which is allowed for ssize_t, but not for size_t.

This commit was SVN r32700.
2014-09-10 15:01:54 +00:00
Edgar Gabriel
599cb7b351 update the pvfs2 fbtl to return the number of bytes generated.
This commit was SVN r32699.
2014-09-10 13:32:06 +00:00
Gilles Gouaillardet
e71452d73a Revert r32696
This commit was SVN r32697.

The following SVN revision numbers were found above:
  r32696 --> open-mpi/ompi@e4c3500166
2014-09-10 04:35:47 +00:00
Gilles Gouaillardet
e4c3500166 Fix MPI_Status_set_elements[_x] for non predefined datatypes
Fixes trac:4896

cmr=v1.8.3:reviewer=bosilca

This commit was SVN r32696.

The following Trac tickets were found above:
  Ticket 4896 --> https://svn.open-mpi.org/trac/ompi/ticket/4896
2014-09-10 02:41:29 +00:00
Edgar Gabriel
3a5f4f72da make the zero byte read/write scenarios work without the contiguous flag.
This commit was SVN r32690.
2014-09-09 16:26:14 +00:00
Edgar Gabriel
6a607caed8 fix some zero byte allocation scenarios.
This commit was SVN r32689.
2014-09-09 16:25:44 +00:00
Edgar Gabriel
ed02927767 - do not set the contiguous memory option in the collective operations. It
should not be stored on the file handle anyway, since it is not a property of
the file.
- protect a realloc for zero byte scenarios.

This commit was SVN r32678.
2014-09-07 18:09:43 +00:00
Edgar Gabriel
0d425e2f74 resetting the counter for the iov array has to happen outside of the if statement.
This commit was SVN r32677.
2014-09-07 16:30:56 +00:00
Edgar Gabriel
0f59ce6591 use the fbtl return value as originally intended, namely to retrieve the
number of bytes written and read. Status contains now the actual number of
bytes written for individual operations. For collective operations, this is
unfortunately not possible.

This commit was SVN r32674.
2014-09-07 15:14:57 +00:00
Ralph Castain
41c6058153 Bring over changes to MXM from pmix branch:
MTL MXM: establish endpoint connection on the first communication when direct_modex used

This commit was SVN r32668.
2014-09-03 18:22:11 +00:00
Gilles Gouaillardet
edfbeba7bf coll/ml: better error handling
when CHECK_AND_RECYCLE detects an error, a message is displayed
if the error occurs on an intrinsic communicator, then abort
the program (instead of trying to free the communicator)

cmr=v1.8.3:reviewer=hjelmn

This commit was SVN r32659.
2014-09-01 10:00:49 +00:00
Ralph Castain
aae1bb4f44 Silence warning
This commit was SVN r32657.
2014-08-31 08:10:35 +00:00
Jeff Squyres
f4238d65a5 fortran: also provide PMPI variants for MPI_Alloc_mem_cptr
r32622 was the first half of the fix -- we need the PMPI variants as well.

Refs trac:4882

This commit was SVN r32627.

The following SVN revision numbers were found above:
  r32622 --> open-mpi/ompi@cf0f734a98

The following Trac tickets were found above:
  Ticket 4882 --> https://svn.open-mpi.org/trac/ompi/ticket/4882
2014-08-28 23:47:38 +00:00
Gilles Gouaillardet
cf0f734a98 Fortran: add mpi_alloc_mem_cptr like bindings when configured with --without-weak-symbols
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32622.
2014-08-28 09:34:54 +00:00
Ralph Castain
b554cd7d86 Turn off the coll/ml component if --without-hwloc was given
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32621.
2014-08-27 20:25:39 +00:00
Jeff Squyres
d85527701a Fix MPI_COMM_SPLIT_TYPE with MPI_UNDEFINED
Thanks to Lisandro Dalcin for identifying the problem.

Fixes trac:4876

Submitted by George Boscila, reviewed by Jeff Squyres.

cmr=v1.8.3:reviewer=ompi-rm1.8

This commit was SVN r32615.

The following Trac tickets were found above:
  Ticket 4876 --> https://svn.open-mpi.org/trac/ompi/ticket/4876
2014-08-27 12:17:33 +00:00
Gilles Gouaillardet
7e3784e0b7 MPI_Type_create_indexed_block.3: fix a typo in the man page
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32614.
2014-08-27 03:48:03 +00:00
George Bosilca
8de93982d5 Correctly build the args for the hindexed_block datatype.
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32613.
2014-08-27 03:45:07 +00:00
Edgar Gabriel
46de730059 fix a typo
This commit was SVN r32603.
2014-08-25 20:53:19 +00:00
Edgar Gabriel
52eac0146d cleanup of the fbtl interfaces: remove the *sorted optimization flag, since it
was not used anyway in the last two years. Simplifies the code significantly.

This commit was SVN r32602.
2014-08-25 18:04:24 +00:00
Todd Kordenbrock
6a3225d800 Fix invalid symbols left by the PMIx merge.
This commit was SVN r32597.
2014-08-25 16:30:26 +00:00
Jeff Squyres
e8eb07ad87 ompi_common_dll.c: the topo mtc union offset must be saved
Since the union contains pointers -- not instances -- we need to save
the mtc offset to get to the pointers later.

This commit was SVN r32591.
2014-08-23 15:42:44 +00:00
Ralph Castain
ac0c584eb7 Add missing file
This commit was SVN r32588.
2014-08-23 04:31:35 +00:00
Ralph Castain
b1a7375192 Fix the "unreachable" message so it outputs the correct hostname for the remote proc. Cleanup some of the pmix stuff when running corner cases of errors
This commit was SVN r32584.
2014-08-22 19:20:45 +00:00
Vishwanath Venkatesan
b176787d0f Remove unwanted spaces + Test commit
This commit was SVN r32576.
2014-08-22 05:11:17 +00:00
Edgar Gabriel
9987135da0 add initial support for non-blocking read and write operations.
This commit was SVN r32571.
2014-08-22 01:34:19 +00:00
Ralph Castain
aec5cd08bd Per the PMIx RFC:
WHAT:    Merge the PMIx branch into the devel repo, creating a new
               OPAL “lmix” framework to abstract PMI support for all RTEs.
               Replace the ORTE daemon-level collectives with a new PMIx
               server and update the ORTE grpcomm framework to support
               server-to-server collectives

WHY:      We’ve had problems dealing with variations in PMI implementations,
               and need to extend the existing PMI definitions to meet exascale
               requirements.

WHEN:   Mon, Aug 25

WHERE:  https://github.com/rhc54/ompi-svn-mirror.git

Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.

All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.

Accordingly, we have:

* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.

* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.

* Replaced the current global collective id with a signature based on the names of the participating procs. The allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint

* removed the prior OMPI/OPAL modex code

* added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform.

* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand

This commit was SVN r32570.
2014-08-21 18:56:47 +00:00
Mangala Jyothi Bhaskar
a5973c3f8c revamp of the aggregator selection logic, part 1.
This commit was SVN r32557.
2014-08-20 19:28:04 +00:00
Rolf vandeVaart
8709071819 Fix missing help file.
This commit was SVN r32550.
2014-08-18 21:52:31 +00:00
Jeff Squyres
0a398c155f opal MCA params: Move (and adapt) help message to opal help file
This commit was SVN r32547.
2014-08-16 11:54:41 +00:00
Edgar Gabriel
fabad95b8e - extend the explicit offset patch to collective explicit offset operations as
well
- minor restructuring to support the shared file pointer operations correctly
  for explicit offsets

This commit was SVN r32538.
2014-08-15 14:03:29 +00:00
Edgar Gabriel
d773dc8aa5 make arbitrary sequences of explicit and implicit offset operations work properly.
This commit was SVN r32537.
2014-08-15 01:49:43 +00:00
Jeff Squyres
3e78f7878c fortran: add missing bindings for WIN_SYNC, WIN_LOCK_ALL,
WIN_UNLOCK_ALL

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32535.
2014-08-14 20:27:30 +00:00
Edgar Gabriel
da1b6c2e87 some code reorganization in preparation for non-blocking read and write
operations.

This commit was SVN r32534.
2014-08-14 20:17:58 +00:00
Alina Sklarevich
a914c68356 MTL MXM: fix check-help-string.pl errors and warnings.
This commit was SVN r32533.
2014-08-14 07:46:56 +00:00
Edgar Gabriel
e401b68ca5 fix the zero byte fileview problem reported by Mohamad on the mailinglist
This commit was SVN r32529.
2014-08-13 23:44:43 +00:00
Vasily Filipov
5ca2fffa44 MTL/MXM: call for ompi_proc_world instead of ompi_comm_size during del_procs.
This commit was SVN r32504.
2014-08-11 11:52:23 +00:00
Gilles Gouaillardet
cf9e144f05 silence warnings
gcc 3.4.3 on solaris 10 issues some warnings

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32500.
2014-08-11 07:36:46 +00:00
Gilles Gouaillardet
22cb8a1834 check-help-strings cleanup
This commit was SVN r32497.
2014-08-11 03:27:45 +00:00
Gilles Gouaillardet
f24699623f check-help-strings cleanup
This commit was SVN r32495.
2014-08-11 03:25:22 +00:00
Gilles Gouaillardet
b565e69b86 check-help-strings cleanup
This commit was SVN r32491.
2014-08-11 03:19:57 +00:00
Mike Dubman
3c8a4d7d2d mxm: opal refactoring voices
http://www.open-mpi.org/community/lists/devel/2014/08/15590.php

This commit was SVN r32486.
2014-08-10 04:35:56 +00:00
Mike Dubman
0f60c34a9f fca: adopt opal API refactoring, fix warning.
based on http://www.open-mpi.org/community/lists/devel/2014/08/15558.php

This commit was SVN r32484.
2014-08-09 15:50:51 +00:00
Jeff Squyres
ca0ccc5321 headers: remove trailing commas in enum lists
Per http://www.open-mpi.org/community/lists/devel/2014/08/15576.php,
trailing commas are not valid in enum lists in C++ until C++11.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r32482.
2014-08-09 12:04:17 +00:00
Ralph Castain
70da69a4f3 Cleanup and ignore Intel compiler build products
This commit was SVN r32463.
2014-08-08 16:13:43 +00:00
Ralph Castain
e95187514c Update the proc structure dereference to reflect the new opal_proc_t super field
This commit was SVN r32462.
2014-08-08 16:12:49 +00:00
Jeff Squyres
132375f07f helpfiles: fix filenames referenced by calls to show_help()
This commit was SVN r32453.
2014-08-08 13:34:15 +00:00
Jeff Squyres
80a7309462 helpfiles: remove empty helpfiles
This commit was SVN r32452.
2014-08-08 13:33:47 +00:00
Jeff Squyres
c6d9bf906e configury: ensure wrapper static LIBS is filled properly
In core library portions of the configury (e.g., top-level
configure.ac itself), we were calling AC_CHECK_LIB and
OPAL_CHECK_FUNC_LIB to check for various libraries.

'''SIDENOTE:''' It turns out that modern Autoconf has AC_SEARCH_LIBS,
which does just about exactly what OPAL_CHECK_FUNC_LIB does.  So this
commit effectively replaces OPAL_CHECK_FUNC_LIB with AC_SEARCH_LIBS.

However, we never bothered to add these found libraries to the wrapper
compiler list of libraries used for static linking (doh!).  We've been
getting lucky for quite a while that components were adding the same
libraries to their wrapper compiler LIBS list.  

This is problematic, however, if we don't build some of these
components.  For example, Paul Hargrove noticed that if he configured
with --disable-shared --enable-static --disable-io-romio, ROMIO was no
longer adding some libraries to the wrapper LIBS list -- libraries
that just happened to also be needed by core OPAL/ORTE/OMPI layers.

The solution is not to use AC_CHECK_LIB or OPAL_CHECK_FUNC_LIB, but
use a pair of new macros:

 * OPAL_SEARCH_LIBS_CORE: a wrapper around AC_SEARCH_LIBS.  If we add
   something to $LIBS, then also add it to the wrapper list of static
   libraries.  This is the main piece of functionality that was
   wrong/missing.
 * OPAL_SEARCH_LIBS_COMPONENT: similar to OPAL_SEARCH_LIBS_CORE, but
   instead of directly adding it to the wrapper list of static
   libaries, add it to <framework>_<component>_LIBS (which eventually
   gets slurped up into the wrapper list of static libraries.  See the
   lengthy comment in config/opal_setup_wrappers.m4 near the beginning
   of OPAL_SETUP_WRAPPER_INIT() for a more detailed explanation).
   Most components did this correctly already, but one or two weren't
   right, so I implemented this second macro quite similar to the
   first and put it everywhere we already used AC_SEARCH_LIBS or
   OPAL_CHECK_FUNC_LIB.

This needs to soak for a day or two on the trunk before moving to the
v1.8 branch.

Refs trac:4834

cmr=v1.8.2:reviewer=ggouaillardet

This commit was SVN r32447.

The following Trac tickets were found above:
  Ticket 4834 --> https://svn.open-mpi.org/trac/ompi/ticket/4834
2014-08-07 23:54:45 +00:00
Ralph Castain
1e93e85403 Cleanup some autoconf messages - thanks to Paul Hargrove for noting them
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32429.
2014-08-05 14:48:42 +00:00
Ralph Castain
5f244e8b19 Use opal_getpagesize to get the proper page size
Refs trac:4826

This commit was SVN r32422.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-04 20:23:00 +00:00
Ralph Castain
2ceaa8a1dd Per patch from Thomas, remove orte abstraction breaks
This commit was SVN r32413.
2014-08-04 17:07:50 +00:00
Gilles Gouaillardet
5f1e0f284a Fix compilation when --enable-hetorogeneous
This commit was SVN r32410.
2014-08-04 10:35:08 +00:00
Gilles Gouaillardet
5b1ae87c76 coll/ml: fix ML_ERROR/printf parameters
cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32409.
2014-08-04 04:05:59 +00:00
Gilles Gouaillardet
f7b13d1126 Fix missing ampersand.
also replase the OMPI_CAST_RTE_NAME macro with
an inline function if OPAL_ENABLE_DEBUG, so we can
get warnings from the compiler if ampersand is missing.

Thanks to Paul Hargrove for reporting the bugs

This commit was SVN r32408.
2014-08-04 02:52:56 +00:00
Ralph Castain
61bf7af9d2 Per Paul Hargrove's suggestion, create an opal_pagesize function to abstract the various ways of obtaining that value. Rather than creating a separate file for only that one function, put it in a convenient place that is at least somewhat related.
Refs trac:4826

This commit was SVN r32407.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-02 18:38:16 +00:00
Ralph Castain
daeb9b6c4f Some more cleanups. Remove direct references to ORTE by changing OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun, orted, tools) set the OPAL proc structure fields so OPAL knows what is going on and uses the correct print functions (still need to fix the problem for non-MPI apps). Properly return uint32_t from the opal utilities instead of int32_t as that is what the ORTE process name fields contain.
Thanks to Gilles for pointing out some of the discrepancies.

This commit was SVN r32398.
2014-08-01 14:44:11 +00:00
Gilles Gouaillardet
cd8fa75f87 coll/ml: align on page size as returned by sysconf
Thanks to Paul Hargrove for pointing into the right direction

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32393.
2014-08-01 08:10:12 +00:00
George Bosilca
cee2a4e5c8 Missing alloca.h. Thanks Paul for catching this.
This commit was SVN r32388.
2014-08-01 03:28:23 +00:00
Jeff Squyres
537aa674a5 fortran: remove a duplicate listing of this file
This fixes some duplicate symbols, once the .o files for the modules
were restored into the library (some compilers need the .o files, some
don't (!)).

Also, remove trailing whitespace.  :-)

This commit was SVN r32386.
2014-08-01 00:43:04 +00:00
Jeff Squyres
abbcde6cb9 fortran: add the .o's back into libmpi_usempif08
Thanks to Paul Hargrove for the massive hint to find this.

This commit was SVN r32385.
2014-07-31 23:41:47 +00:00
George Bosilca
9b2fcd898e No more ORTE specifics in this file.
This commit was SVN r32384.
2014-07-31 22:34:16 +00:00
George Bosilca
f39abb9e69 Reverting r32355: a number of processes is not a notion that a low level
communication library should use to initialize itself.

Ralph will champion this change back with an RFC if there is a realistic
need/use case from the community.

This commit was SVN r32361.

The following SVN revision numbers were found above:
  r32355 --> open-mpi/ompi@c903917f47
2014-07-30 20:11:35 +00:00
Ralph Castain
309d75dadc Add missing ampersand - function call required a pointer, not the name itself
This commit was SVN r32357.
2014-07-30 14:48:20 +00:00
Ralph Castain
c903917f47 Expose the num_procs information to the opal layer as the info is needed in several BTLs
This commit was SVN r32355.
2014-07-30 09:33:41 +00:00
Gilles Gouaillardet
b95537376f bcol/basesmuma: fix parameter order
Ref: #4815

This commit was SVN r32353.
2014-07-30 05:38:53 +00:00
Nathan Hjelm
0a32ea87e7 Remove unneeded EXTRA_DIST
This commit was SVN r32343.
2014-07-29 16:54:11 +00:00
George Bosilca
815d7bc846 Fix the example to use the new 2_2 topo component.
This commit was SVN r32338.
2014-07-29 07:00:01 +00:00
George Bosilca
78eae6108a Add an example on how to handle topologies.
This commit was SVN r32337.
2014-07-29 05:05:59 +00:00
Nathan Hjelm
1407c1f501 Remove RML code from common/sm
The only user of this code was coll/sm. I implemented a basic replacement
for the removed code. This gets the trunk compiling again with
--disable-dlopen.

This commit was SVN r32333.
2014-07-28 22:00:12 +00:00
Nathan Hjelm
0f15afa4d9 Fix typo in psm mtl
This commit was SVN r32332.
2014-07-28 22:00:03 +00:00
Nathan Hjelm
603ba71b0d Ignore components broken by the BTL move.
common/ofacm is only used by the iboffload code in ompi. This code does
not currently work so it is safe to ignore these components until it is
fixed.

This commit was SVN r32331.
2014-07-28 21:24:18 +00:00
Ryan Grant
caa10a5faf Portals fixes after latest move
This commit was SVN r32330.
2014-07-28 19:25:03 +00:00
Ralph Castain
bcade48e27 Move the opal_process_info initialization to the right place - all that info is known (for ourselves only) immediately after rte_init.
This commit was SVN r32329.
2014-07-28 19:19:35 +00:00
George Bosilca
a3feb627cf Move some of the ompi_process_info down in OPAL.
This commit was SVN r32324.
2014-07-26 21:43:34 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose.  UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00
Jeff Squyres
0ab4eaa7d3 usnic: revert r32315 because the BTL move to opal is ongoing
Let's not make the move to OPAL any harder than it has to be; this
commit can wait until after the BTL move.

This commit was SVN r32316.

The following SVN revision numbers were found above:
  r32315 --> open-mpi/ompi@7b7ed8ed97
2014-07-25 14:17:20 +00:00
Jeff Squyres
7b7ed8ed97 usnic: minor cleanup / consolidation
CMR'ing just to (try to) keep the differences between trunk and v1.8
branch (somewhat) small.

Reviewed by Dave Goodell

cmr=v1.8.3:reviewer=ompi-rm1.8

This commit was SVN r32315.
2014-07-25 14:11:54 +00:00
Jeff Squyres
6ae45b34fc usnic: check connectivity on first communication to a peer
Previously, we were only checking connectivity upon first ''send'' to
a peer.  But this ignores the case where the first communication to a
peer is actually an ACK -- i.e., we successfully received something
from the peer and we need to send an ACK back.  So we need to verify
that the ACK will actually get there.

Specifically, certain asymmetric routing cases can lead to a hang if
we don't check the connectivity in both directions.  E.g., if the
sender is able to get traffic to the receiver, but the receiver is
unable to get traffic back to the sender because it made a different
routing decision than the sender.

In this case, the connectivity checker from the sender could succeed
(because the connectivity checker will ACK along the same path in
which the ping was received), but sending a BTL ACK could fail
(because the BTL ACK will be sent back along the path chosen by the
graph algorithm, which, in an erroneous asymmetric routing scenario,
may be different/wrong).

Hence, we want to trigger the connectivity checker at the first
communication from A->B, which may either be a BTL send or an ACK.

Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32309.
2014-07-24 21:32:56 +00:00
Bert Wesarg
3e34812a0d Changes to VT/OTF:
Ensure that target directories exists before creating symlinks.

cmr=v1.8.2:reviewer=jsquyres

Thanks Jeff to step up as an reviewer.

This commit was SVN r32305.
2014-07-24 19:06:43 +00:00
Rolf vandeVaart
9bc8fbaefd Create new error message so we can better pinpoint where an error occurs.
This commit was SVN r32303.
2014-07-24 15:18:55 +00:00
Rolf vandeVaart
3f703afb97 Fix CUDA registration where we run out of memory being allocated.
This commit was SVN r32297.
2014-07-23 21:10:17 +00:00
Todd Kordenbrock
42a871efd4 This commit fixes trac:4662 - "Portals4/MTL hangs in c_get_accumulate test".
- Portals4/OSC was unable to acquire an exclusive lock due to an invalid
local address in the atomic operation.  This caused the reported hang.
- After fixing the hang, the test continued to fail because
ompi_datatype_is_contiguous_memory_layout() reports that MPI_EMPTY (the
origin datatype) is noncontiguous and Portals4/OSC does not support
noncontiguous datatypes at this time.  However, in this case the origin
count is zero so contiguous/noncontiguous is irrelevant.  Now we skip
the contiguous check if the count is zero.

cmr=v1.8.3:reviewer=regrant:subject=Fix for "Portals4/MTL hangs in c_get_accumulate test"

This commit was SVN r32295.

The following Trac tickets were found above:
  Ticket 4662 --> https://svn.open-mpi.org/trac/ompi/ticket/4662
2014-07-23 19:13:07 +00:00