Commit graph

7186 commits

Author SHA1 Message Date
Pavel Shamis
3a683419c5 Fixing broken dependency between ML/BCOLS
This is hot-fix patch for the issue reported by Ralph. 
In future we plan to restructure ml data structure layout.

Tested by Nathan.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30619.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-07 19:15:45 +00:00
Jeff Squyres
6f8e76df7e Revert r30539 and r30540; using the sqrt() to limit the computation is
just plain wrong (i.e., it gives wrong answers).  

When time permits, perhaps we can put in a better algorithm for
MPI_DIMS_CREATE (Andreas Schäfer mentioned that nnodes can now be on
the order of millions, and the current algorithm is... inefficient, at
best).

This commit was SVN r30606.

The following SVN revision numbers were found above:
  r30539 --> open-mpi/ompi@fb67d98867
  r30540 --> open-mpi/ompi@4417ed2133
2014-02-07 13:46:48 +00:00
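The commit above does not say exactly how the sqrt() bound produced wrong answers. A minimal sketch of one common pitfall, assuming the algorithm factors the process count by trial division: stopping at sqrt(n) is only safe if the leftover cofactor is kept as a prime factor. The function names here are hypothetical and are not taken from the Open MPI source.

```python
def prime_factors(n):
    """Prime factors of n (with multiplicity) by trial division up to sqrt(n)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    # Whatever remains above 1 is a prime larger than sqrt of the original n.
    if n > 1:
        factors.append(n)
    return factors


def prime_factors_sqrt_only(n):
    """Illustrative buggy variant: identical, but drops the leftover cofactor."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    return factors
```

For n = 14, the first function returns both 2 and 7, while the second loses 7 because 7 is greater than sqrt(14).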
Ralph Castain
74d3393a4f Revert r30600, r30602-30604 as the first one broke the tarball and the others couldn't fix it
This commit was SVN r30605.

The following SVN revision numbers were found above:
  r30600 --> open-mpi/ompi@7d2c4cb468
  r30602 --> open-mpi/ompi@9e751a0302
  r30604 --> open-mpi/ompi@3012c280cf

Revision number ranges (suitable for "git log"):
  r30602-30604 --> open-mpi/ompi@9e751a03^..3012c280
2014-02-07 04:38:06 +00:00
Ralph Castain
3012c280cf I surrender - this code is just too interbred with other components for me to clean up, so turn it off for now
This commit was SVN r30604.
2014-02-07 04:16:21 +00:00
Ralph Castain
3954311bac We have rules about not cross-integrating components, even across frameworks - please follow them.
This commit was SVN r30603.
2014-02-07 03:46:45 +00:00
Ralph Castain
9e751a0302 You absolutely, positively *cannot* include a header file from a component in the base functions!
This commit was SVN r30602.
2014-02-07 03:27:06 +00:00
Nathan Hjelm
a06e491c2c ob1: large buffered sends were broken by the ob1 optimizations. fix them
The problem was caused by the static request optimization. The buffered send case
is much like the isend case in that the request structure may be needed after
MPI_Bsend completes. Fix this case by calling isend and freeing the resulting
request.

cmr=v1.7.5:ticket=trac:4149

This commit was SVN r30601.

The following Trac tickets were found above:
  Ticket 4149 --> https://svn.open-mpi.org/trac/ompi/ticket/4149
2014-02-07 00:12:36 +00:00
Jeff Squyres
7d2c4cb468 There are a few ml-related bugs outstanding, and Nathan is looking into
them, but it's going to take a little time (at least one day).  So
Nathan says it's ok to .ompi_ignore coll ml until he's able to fix it.

This commit was SVN r30600.
2014-02-06 23:51:03 +00:00
Nathan Hjelm
3902cf66f1 ob1: OBJ_CONSTRUCT the convertor in the send_inline optimization.
This change does not appear to increase the small message latency of ping-pong
benchmarks and fixes an issue found by our ibm datatype tests.

Fixes trac:4232

cmr=v1.7.5:ticket=trac:4149

This commit was SVN r30598.

The following Trac tickets were found above:
  Ticket 4149 --> https://svn.open-mpi.org/trac/ompi/ticket/4149
  Ticket 4232 --> https://svn.open-mpi.org/trac/ompi/ticket/4232
2014-02-06 21:27:42 +00:00
Nathan Hjelm
a41cb1f086 Remove duplicate definition of xpmem_apid_t
cmr=v1.7.5:ticket=trac:4216

This commit was SVN r30589.

The following Trac tickets were found above:
  Ticket 4216 --> https://svn.open-mpi.org/trac/ompi/ticket/4216
2014-02-06 20:38:20 +00:00
Jeff Squyres
12a4d1a27f Minor update to r30430: put the variables at the top of the function
instead of making an inner block.

Refs trac:4185

This commit was SVN r30588.

The following SVN revision numbers were found above:
  r30430 --> open-mpi/ompi@ea3cb1e110

The following Trac tickets were found above:
  Ticket 4185 --> https://svn.open-mpi.org/trac/ompi/ticket/4185
2014-02-06 18:37:19 +00:00
Jeff Squyres
fad3cbf639 Revert r30571.
This commit was SVN r30587.

The following SVN revision numbers were found above:
  r30571 --> open-mpi/ompi@081b679881
2014-02-06 18:35:30 +00:00
Mike Dubman
081b679881 OMPI: add call to del_procs
fixed by AlexM, reviewed by miked
cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30571.
2014-02-06 08:38:32 +00:00
George Bosilca
6ee06b7fda No exit down into a BTL.
This commit was SVN r30566.
2014-02-05 15:04:01 +00:00
Ralph Castain
1326ed704f Per the RFC discussed here:
http://www.open-mpi.org/community/lists/devel/2014/01/13789.php

add support for async modex when requested.

cmr=v1.7.5:reviewer=jsquyres:subject=Add async modex support

This commit was SVN r30565.
2014-02-05 14:39:27 +00:00
Joshua Ladd
1dbd8688db This fixes a long-standing bug in the OpenIB BTL's MCA param initialization. Only caught if BTL_OPENIB_FAILOVER_ENABLED. Thanks to Jeff for spotting. This should be added to:
cmr=v1.7.4:reviewer=jsquyres
cmr=v1.6.6

This commit was SVN r30558.
2014-02-04 20:01:39 +00:00
Jeff Squyres
d9786c42f7 Addendum to r30531:
 * Fix some comments
 * Fix some spacing in the non-verbose "make" output
 * Make javadoc non-verbose output like other non-verbose output
 * Remove the use of JAVA_CLASS_FILES; it wasn't correct anyway (it
   both derived names from JAVA_SRC_FILES ''and'' used mpi/*.class, so
   many files were listed twice)
 * Move the generation of javadoc files to "make" time (vs. "make
   install" time) by putting the "doc" subdirectory in BUILT_SOURCES
 * Make doc dependent upon mpi/MPI.class, not mpi.jar -- we only need
   the classes to exist, not the final jarfile.
 * Make jdoc-install dependent upon a real build artifact (the doc
   dir), not an artificial name that will never exist (jdoc)
 * Separate the removal of the doc (and mpi) subdirectories during
   "make clean" off into the clean-local target, because CLEANFILES
   can really only have ''files'' added to it.

These changes also fix parallel builds.

cmr=v1.7.5:ticket=trac:4214

This commit was SVN r30547.

The following SVN revision numbers were found above:
  r30531 --> open-mpi/ompi@6ca8e68e4b

The following Trac tickets were found above:
  Ticket 4214 --> https://svn.open-mpi.org/trac/ompi/ticket/4214
2014-02-03 22:32:45 +00:00
Jeff Squyres
fa02bba7c5 Remove a bunch of extra whitespace.
Thanks to Andreas Schäfer for the original patch.

This commit was SVN r30541.
2014-02-03 19:30:43 +00:00
Jeff Squyres
4417ed2133 Gah; I missed the #include in r30539.
cmr=v1.7.5:ticket=trac:4217

This commit was SVN r30540.

The following SVN revision numbers were found above:
  r30539 --> open-mpi/ompi@fb67d98867

The following Trac tickets were found above:
  Ticket 4217 --> https://svn.open-mpi.org/trac/ompi/ticket/4217
2014-02-03 19:28:07 +00:00
Jeff Squyres
fb67d98867 Suggestion from Andreas Schäfer: we really only need sqrt(freeprocs)
primes.  This considerably reduces the computational load when
freeprocs is large.

cmr=v1.7.5:reviewer=hjelmn:subject=MPI_Dims_create optimization

This commit was SVN r30539.
2014-02-03 19:21:04 +00:00
Nathan Hjelm
12f0bf9488 basesmuma: missed a couple of MB references
cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30538.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-03 18:19:53 +00:00
Nathan Hjelm
84320f3815 btl/vader: fix compilation with SGI xpmem and add some debugging to
component_init.

cmr=v1.7.5:ticker=#4053

This commit was SVN r30535.
2014-02-03 17:42:40 +00:00
Nathan Hjelm
64321acc22 basesmuma: do not call MB directly
opal does not always define MB. It is recommended that opal_atomic_[rw]mb is
called instead. We will need to address the cases where these functions are
no-ops on weak-memory ordered cpus.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30534.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-03 17:01:57 +00:00
Nathan Hjelm
c2b061cc84 basesmuma: clean up code
Several changes are contained in this commit:

 - Clean up tabs and trailing whitespaces

 - Use consistent indentation in changed files

 - Remove unused code. None of the removed code will ever have been
   used in a trunk build.

 - Clean up the smcm code quite a bit

 - Do not fflush stderr and use opal_output instead of fprintf.

These changes have been tested on Cray XE-6 and PSM systems.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30533.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-03 17:01:46 +00:00
Jeff Squyres
1e952808ef r30519 (and the associated CMR #4209) left out fixing
MPI_SUBARRAYS_SUPPORTED and MPI_ASYNC_PROTECTS_NONBLOCKING in the F08
descriptor prototype.

This commit fixes the F08 descriptor prototype in the same way as
r30519 did for the non-F08-descriptor implementation.

Thanks to Mike Dubman for finding the issue.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30532.

The following SVN revision numbers were found above:
  r30519 --> open-mpi/ompi@caaab7e8a3
2014-02-03 16:55:33 +00:00
Jeff Squyres
6ca8e68e4b Install Java API docs into $(docdir)
Fixes trac:4054

cmr=v1.7.5:reviewer=osvegis

This commit was SVN r30531.

The following Trac tickets were found above:
  Ticket 4054 --> https://svn.open-mpi.org/trac/ompi/ticket/4054
2014-02-03 16:10:08 +00:00
Christoph Niethammer
4f23d8214c Fixed incorrect calculation of reallocated memory in mca_bml_r2_del_btl.
This commit was SVN r30529.
2014-02-03 08:43:59 +00:00
Nathan Hjelm
1ae39753dc bcol/basesmuma: check the return code of bcol_basesmuma_smcm_allgather_connection.
Fixes a segmentation fault found by the bogus intercomm_create test.

cmr=v1.7.4:review=manjugv

This commit was SVN r30527.
2014-01-31 22:20:25 +00:00
Jeff Squyres
1a9cdcc8ff Restore version numbers to "ompi_info --all" output.
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30523.
2014-01-31 16:20:46 +00:00
Jeff Squyres
caaab7e8a3 Fix Fortran declarations of MPI_SUBARRAYS_SUPPORTED and MPI_ASYNC_PROTECTS_NONBLOCKING
Ensure that these two flags are in all of mpif.h, the mpi module, and
the mpi_f08 module.  Thanks to Rolf Rabenseifner for pointing out the
issue.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30519.
2014-01-31 15:22:12 +00:00
Adrian Reber
7de34ea201 SNAPC/CRCP/SSTORE: remove compiler warnings
This commit was SVN r30488.
2014-01-29 20:52:00 +00:00
Adrian Reber
fa1036f38c SSTORE/CRCP: use ORTE_WAIT_FOR_COMPLETION with non-blocking receives
During the commits to make the C/R code compile again the
blocking receive calls were replaced by non-blocking
which broke the code. This patch uses ORTE_WAIT_FOR_COMPLETION()
to wait until the non-blocking calls have finished.

This commit was SVN r30486.
2014-01-29 20:30:35 +00:00
Hadi Montakhabi
7bf4c425ff Fix: making sure the file type is not overwritten by the last queried component
This commit was SVN r30478.
2014-01-29 19:21:03 +00:00
Nathan Hjelm
afae924e29 coll/ml: fix some warnings and the spelling of indices
This commit fixes one warning that should have caused coll/ml to segfault
on reduce. The fix should be correct but we will continue to investigate.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30477.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-01-29 18:44:21 +00:00
Nathan Hjelm
700e97cf6a btl/vader: add support for SGI's implementation of xpmem and add support
for 32-bit architectures.

This commit also modifies _OMPI_CHECK_HEADER to use AC_CHECK_HEADERS instead
of AC_CHECK_HEADER. This allows components to check for multiple headers
instead of just one. The new semantics of the header check in OMPI_CHECK_PACKAGE
are to return success if at least one of the specified headers exists. The new
semantics will not break current usage.

cmr=v1.7.5:ticket=trac:4053

This commit was SVN r30476.

The following Trac tickets were found above:
  Ticket 4053 --> https://svn.open-mpi.org/trac/ompi/ticket/4053
2014-01-29 18:35:47 +00:00
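The "success if at least one header exists" semantics described in the commit above can be sketched as follows. This is only an illustration of the rule, not the m4 macro itself; `check_headers` and the `header_exists` callback are hypothetical names.

```python
def check_headers(headers, header_exists):
    """Return True if at least one of the candidate headers is available.

    headers       -- list of candidate header names
    header_exists -- callable that reports whether one header can be found
                     (a stand-in for the real configure-time probe)
    """
    return any(header_exists(h) for h in headers)
```

A caller that passes a single header gets the same result as before, which matches the commit's note that the new semantics do not break current usage.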
Jeff Squyres
3fa9d36aba Per http://www.open-mpi.org/community/lists/devel/2014/01/13938.php,
Orion Poplawski noticed that we should not be installing mpio.h.

cmr=v1.7.4:reviewer=hjelmn:subject=do not install mpio.h

This commit was SVN r30465.
2014-01-28 21:46:26 +00:00
George Bosilca
bde9619386 Various minor cleanups.
This commit was SVN r30431.
2014-01-26 17:27:12 +00:00
George Bosilca
ea3cb1e110 Don't forget to call del_procs.
This commit was SVN r30430.
2014-01-26 17:26:40 +00:00
George Bosilca
d265981c55 Don't always retain the proc, do it only for new procs. This enforces a strict policy in the BML: it has one and only one ref on each proc.
This commit was SVN r30429.
2014-01-26 17:26:04 +00:00
Ralph Castain
b32556e6dc Fixes trac:4143
After IM with Nathan, apply patch from ticket after verification by Paul Hargrove that it fixes the problem on non-x86 32-bit platforms

Verified by Paul, RM-approved

cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r30411.

The following Trac tickets were found above:
  Ticket 4143 --> https://svn.open-mpi.org/trac/ompi/ticket/4143
2014-01-24 17:56:52 +00:00
Nathan Hjelm
2435057a57 ignore the iboffload component for now.
This commit was SVN r30398.
2014-01-23 16:06:21 +00:00
Rolf vandeVaart
9f3bf4747d Provide option to have synchronous copy be asynchronous with a wait. For now,
this has to be selected at runtime.  Also fix up some error messages to have
node name in them.

This commit was SVN r30396.
2014-01-23 15:47:20 +00:00
Jeff Squyres
9fee7c2b4d According to a report from Adam Moody, there is a compile error with
ROMIO and Lustre 2.4.0.  It has been solved upstream already; here's
the ticket:

    http://trac.mpich.org/projects/mpich/ticket/1973

And here's the commit that fixed it:

    a0c4278f14

OMPI does not have the other code referred to in that git commit (in
ad_lustre_hints.c).

Thanks to Adam Moody for reporting the issue.

cmr=v1.7.4:reviewer=hjelmn:subject=Fix ROMIO compile error w/ Lustre 2.4

This commit was SVN r30393.
2014-01-23 14:15:35 +00:00
Christoph Niethammer
86776daf75 Fixed typo in opal output message.
This commit was SVN r30392.
2014-01-23 08:37:40 +00:00
Mike Dubman
071838bb0a HCOLL: call hcoll_finalize and hcoll progress unregister in case of hcoll module query failures
fixed by Elena, reviewed by Val/Miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30390.
2014-01-23 07:29:23 +00:00
Jeff Squyres
772afc760e Shift .h files from one Makefile.am to another to enable "make dist"
cmr=v1.7.4:ticket=4162

This commit was SVN r30384.

The following Trac tickets were found above:
  Ticket 4162 --> https://svn.open-mpi.org/trac/ompi/ticket/4162
2014-01-23 02:00:05 +00:00
Jeff Squyres
2281a682ba Remove old cruft from the Makefile.am
The dist graph functions are on the trunk and have long-since been
added to the relevant lists.

cmr=v1.7.5:ticket=4163

This commit was SVN r30382.

The following Trac tickets were found above:
  Ticket 4163 --> https://svn.open-mpi.org/trac/ompi/ticket/4163
2014-01-23 01:33:44 +00:00
Jeff Squyres
7515d2caa9 Add Emacs mode at the top of the file
cmr=v1.7.5:ticket=4163

This commit was SVN r30381.

The following Trac tickets were found above:
  Ticket 4163 --> https://svn.open-mpi.org/trac/ompi/ticket/4163
2014-01-23 01:32:26 +00:00
Jeff Squyres
d910522ff6 Remove placeholder text file.
cmr=v1.7.5:subject=Rollup of Fortran fixes for 1.7.5

This commit was SVN r30380.
2014-01-23 01:30:59 +00:00
Jeff Squyres
aa0ceaa78b Move common code to ompi/mpi/fortran/base.
The attribute and conversion callback subroutine interfaces
are used by all 3 modules, and belong in the fortran/base directory,
not the directory of a specific module.

Also clean up some comments.

cmr=v1.7.4:ticket=4162

This commit was SVN r30378.

The following Trac tickets were found above:
  Ticket 4162 --> https://svn.open-mpi.org/trac/ompi/ticket/4162
2014-01-23 01:28:04 +00:00
Jeff Squyres
19617394f0 Add profiling versions of dist_graph functions into the library
Also fix the interfaces that have logical parameters (the
non-profiling versions were added/fixed a long time ago; it looks like
the profiling versions were inadvertently skipped).

cmr=v1.7.4:ticket=4162

This commit was SVN r30377.

The following Trac tickets were found above:
  Ticket 4162 --> https://svn.open-mpi.org/trac/ompi/ticket/4162
2014-01-23 01:24:54 +00:00
Jeff Squyres
5aa75d0ed9 Add missing pmpi interfaces for neighbor routines
Somehow these interfaces were missed when the neighbor routines were added.

cmr=v1.7.4:ticket=4162

This commit was SVN r30376.

The following Trac tickets were found above:
  Ticket 4162 --> https://svn.open-mpi.org/trac/ompi/ticket/4162
2014-01-23 01:23:31 +00:00
Jeff Squyres
fe76eac8ab Revert part of SVN r30273: remove "protected" from special Fortran sentinels
r30273 made the use of the Fortran "protected" keyword be
compiler-specific (i.e., configure/macro-ized it).  But it
inadvertently added the use of "protected" to some sentinel constants
that should not be protected (e.g., MPI_STATUS_IGNORE).

This commit reverts the addition of "protected" to the constants that
should not be protected.

cmr=v1.7.4:subject=Rollup of Fortran fixes for v1.7.4

This commit was SVN r30375.

The following SVN revision numbers were found above:
  r30273 --> open-mpi/ompi@5f17bc3c2c
2014-01-23 01:21:42 +00:00
Ralph Castain
06e6a06f3e Cleanup a couple of abstraction breaks found by Thomas Naughton
This commit was SVN r30371.
2014-01-22 21:36:24 +00:00
Hadi Montakhabi
8af6b8b4e4 add support for PLFS filesystem
This commit was SVN r30370.
2014-01-22 21:16:15 +00:00
Nathan Hjelm
7ba8bd81fa coll/ml: remove debug fprintfs
cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30367.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-01-22 17:21:05 +00:00
Nathan Hjelm
82d996fb76 coll/ml: cleanup some merge related errors
cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30366.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-01-22 16:48:09 +00:00
Nathan Hjelm
ff4c9c808a btl/ugni: fix leak in new sendi function.
cmr=v1.7.5:ticket=trac:4151

This commit was SVN r30365.

The following Trac tickets were found above:
  Ticket 4151 --> https://svn.open-mpi.org/trac/ompi/ticket/4151
2014-01-22 16:32:07 +00:00
Nathan Hjelm
66b69da394 Fix a bug in the ob1 optimizations that can cause a segfault.
btl sendi functions currently can not handle the descriptor being NULL. The
send inline optimization was assuming (incorrectly) that NULL was ok.

cmr=v1.7.5:ticket=trac:4149

This commit was SVN r30364.

The following Trac tickets were found above:
  Ticket 4149 --> https://svn.open-mpi.org/trac/ompi/ticket/4149
2014-01-22 16:31:58 +00:00
Nathan Hjelm
1a021b8f2d coll/ml: add support for blocking and non-blocking allreduce, reduce, and
allgather.

The new collectives provide a significant performance increase over tuned for
small and medium messages. We are initially setting the priority lower than
tuned until this has had some time to soak in the trunk. Please set
coll_ml_priority to 90 for MTT runs.

Credit for this work goes to Manjunath Gorentla Venkata (ORNL), Pavel Shamis (ORNL),
and Nathan Hjelm (LANL).

Commit details (for reference):

Import ORNL's collectives for MPI_Allreduce, MPI_Reduce, and MPI_Allgather.

We need to take the basesmuma header into account when calculating the
ptpcoll small message thresholds. Add a define to bcol.h indicating the
maximum header size so we can take the header into account while not
making ptpcoll dependent on information from basesmuma.

This resolves an issue with allreduce where ptpcoll overwrites the
header of the next buffer in the basesmuma bank.

Fix reduce and make a sequential collective launcher in coll_ml_inlines.h

The root calculation for reduce was wrong for any root != 0. There are
four possibilities for the root:

 - The root is not the current process but is in the current hierarchy. In
   this case the root is the index of the global root as specified in the
   root vector.

 - The root is not the current process and is not in the next level of the
   hierarchy. In this case 0 must be the local root since this process will
   never communicate with the real root.

 - The root is not the current process but will be in next level of the
   hierarchy. In this case the current process must be the root.

 - I am the root. The root is my index.

Tested with IMB which rotates the root on every call to MPI_Reduce. Consider
IMB the reproducer for the issue this commit solves.

Make the bcast algorithm decision an enumerated variable

Resolve various assert failures when destructing coll ml requests.

Two issues:

 - Always reset the request to be invalid before returning it to the
   free list. This will avoid an assert in ompi_request_t's destructor.
   OMPI_REQUEST_FINI does this (and also releases the fortran handle
   index).

 - Never explicitly construct or destruct the superclass of an opal
   object. This screws up the class function tables and will cause
   either an assert failure or a segmentation fault when destructing
   coll ml requests.

Cleanup allgather.

I removed the duplicate non-blocking and blocking functions and modeled
the cleanup after what I found in allreduce. Also cleaned up the code
somewhat.

Don't bother copying from the send to the receive buffer in
bcol_basesmuma_allreduce_intra_fanin_fanout if the pointers are the
same.

This eliminates a warning about memcpy and aliasing and avoids an
unnecessary call to memcpy.

Always call CHECK_AND_RELEASE on memsync collectives.

There was a call to OBJ_RELEASE on the collective communicator but
because CHECK_AND_RECYCLE was never called there was no matching call
to OBJ_RELEASE. This caused coll ml to leak communicators.

Make allreduce use the sequential collective launcher in coll_ml_inlines.h

Just launch the next collective in the component progress.

I am a little unsure about this patch. There appears to be some sort
of race between collectives that causes buffer exhaustion in some cases
(IMB Allreduce is a reproducer). Changing progress to only launch the
next bcol seems to resolve the issue but might not be the best fix.

Note that I see little to no performance penalty for this change.

Fix allreduce when there are extra sources.

There was an issue with the buffer offset calculation when there are
extra sources. In the case of extra sources == 1 the offset was set
to buffer_size (just past the header of the next buffer). I adjusted
the buffer size to take into account the maximum header size (see the
earlier commit that added this) and simplified the offset calculation.

Make reduce/allreduce non-blocking. This is required for MPI_Comm_idup
to work correctly.

This has been tested with various layouts using the ibm testsuite and
imb and appears to have the same performance as the old blocking version.

Fix allgather for non-contiguous layouts and simplify parsing the
topology.

Some things in this patch:

 - There were several comments to the effect that level 0 of the
   hierarchy MUST contain all of the ranks. At least one function
   made this assumption but it was not true. I changed the sbgp
   components and the coll ml initialization code to enforce this
   requirement.

 - Ensure that hierarchy level 0 has the ranks in the correct
   scatter gather order. This removes the need for a separate
   sort list and fixes the offset calculation for allgather.

 - There were several passes over the hierarchy to determine
   properties of the hierarchy. I eliminated these extra passes
   and the memory allocation associated with them and calculate the
   tree properties on the fly. The same DFS recursion also handles
   the re-order of level 0.

All these changes have been verified with MPI_Allreduce, MPI_Reduce, and
MPI_Allgather. All functions now pass all IBM/Open MPI, and IMB tests.

coll/ml: correct pointer usage for MPI_BOTTOM

Since contiguous datatypes are copied via memcpy (bypassing the convertor) we
need to adjust for the lb of the datatype. This corrects problems found testing
code that uses MPI_BOTTOM (NULL) as the send pointer.

Add fallback collectives for allreduce and reduce.

cmr=v1.7.5:reviewer=pasha

This commit was SVN r30363.
2014-01-22 15:39:19 +00:00
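The header-size point in the commit above (ptpcoll overwriting the header of the next buffer in the basesmuma bank) comes down to subtracting the maximum header size from the buffer size before deciding what counts as a small message. A minimal sketch of that arithmetic, with hypothetical names and sizes that are not taken from bcol.h:

```python
def small_message_threshold(buffer_size, max_header_size):
    """Largest payload that fits in one bank buffer alongside its header."""
    if max_header_size >= buffer_size:
        raise ValueError("header leaves no room for payload")
    return buffer_size - max_header_size


def fits_in_buffer(msg_len, buffer_size, max_header_size):
    """True if a payload of msg_len bytes stays inside its own buffer.

    A payload longer than the threshold would spill past the end of the
    buffer and into the header of the next buffer in the bank.
    """
    return msg_len <= small_message_threshold(buffer_size, max_header_size)
```

Keeping the maximum header size as a shared constant lets one component apply the limit without depending on another component's internals, which is the motivation the commit gives.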
Jeff Squyres
be0e557d3c Revert r30164: it was just the wrong thing to do.
Fixes trac:4155.

This commit was SVN r30360.

The following SVN revision numbers were found above:
  r30164 --> open-mpi/ompi@ca84ffdbd4

The following Trac tickets were found above:
  Ticket 4155 --> https://svn.open-mpi.org/trac/ompi/ticket/4155
2014-01-22 00:51:03 +00:00
Nathan Hjelm
c9c335544e btl/ugni: fix a typo in r30353
cmr=v1.7.5:ticket=trac:4151

This commit was SVN r30354.

The following SVN revision numbers were found above:
  r30353 --> open-mpi/ompi@aa3fea55b2

The following Trac tickets were found above:
  Ticket 4151 --> https://svn.open-mpi.org/trac/ompi/ticket/4151
2014-01-21 21:02:28 +00:00
Nathan Hjelm
aa3fea55b2 btl/ugni: re-add a sendi function to exploit the new optimization in
ob1.

Also update LANL platform files to use the latest version of ugni.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r30353.
2014-01-21 20:53:35 +00:00
Nathan Hjelm
2b57f4227e ob1: optimize blocking send and receive paths
Per RFC. There are two optimizations in this commit:

 - Allocate requests for blocking sends and receives on the stack. This
   bypasses the request free list and saves two atomics on the critical path.
   This change improves the small message ping-pong by 50-200ns on both AMD
   and Intel CPUs.

 - For small messages try to use the btl sendi function before initializing a
   send request. If the sendi fails or the btl does not have a sendi function
   silently fall back on the standard send path.

cmr=v1.7.5:reviewer=brbarret

This commit was SVN r30343.
2014-01-21 15:16:21 +00:00
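The second optimization in the commit above is a "try the fast path, fall back silently" pattern. A minimal sketch of that control flow follows; the `Btl` class, the `inline_limit` parameter, and the return strings are hypothetical and only stand in for the real ob1 and BTL structures.

```python
class Btl:
    """Hypothetical transport with an optional immediate-send hook (sendi)."""

    def __init__(self, sendi=None):
        self.sendi = sendi          # callable returning True on success, or None
        self.standard_sends = 0

    def send(self, payload):
        """Standard path: in the real code this builds a full send request."""
        self.standard_sends += 1
        return "standard"


def send_small(btl, payload, inline_limit=64):
    """Try sendi for small messages; otherwise use the standard send path."""
    if btl.sendi is not None and len(payload) <= inline_limit:
        if btl.sendi(payload):
            return "inline"
    # No sendi, message too large, or sendi failed: fall back silently.
    return btl.send(payload)
```

The caller never sees the failed fast-path attempt; it only observes which path completed the send.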
George Bosilca
7e1593ef80 Prevent integer overflow in datatype creation. Patch based on
Gilles Gouaillardet solution attached to ticket #4145.

Closes trac:4145.
cmr=v1.7.4:reviewer=ompi-rm1.7
cmr=v1.6.6:reviewer=ompi-rm1.6

This commit was SVN r30342.

The following Trac tickets were found above:
  Ticket 4145 --> https://svn.open-mpi.org/trac/ompi/ticket/4145
2014-01-21 14:44:00 +00:00
Mike Dubman
b8550a55a7 HCOLL: many fixes
 - Adds coll_hcoll_np mca parameter similar to that of fca component (defaults to 32). Those who use hcoll should be aware that from now on communicators with fewer than 32 procs will run w/o hcoll by default.
 - Resolves fallback issue in case libhcoll runs out of allowed contexts. The solution is moving hcoll_context_create from comm_enable to comm_query. In short, comm_enable should never return OMPI_ERROR in the coll component with the highest priority (hcoll). Otherwise the ompi coll_base_select will unselect the coll function pointers and module references, leaving the communicator w/o coll pointers. This will cause a failure. The same behavior can be reproduced even with tuned if one hardcodes a "return OMPI_ERROR" into its module_enable function.
 - Additionally, removed all the dead code under #if 0; removed unused variables (path for library, active_modules list) and classes (module list wrapper).

Fixed by Val, Reviewed by Devendar/Josh/Miked

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30341.
2014-01-21 12:19:47 +00:00
Ralph Castain
2cf4862b49 Cleanup warnings for use of void* - requires intermediate cast to uintptr_t. Thanks to Paul Hargrove for reporting it
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30333.
2014-01-20 15:44:45 +00:00
Edgar Gabriel
be5d5834c5 fix the problem identified by a user on the mailing list with MPI_MODE_EXCL
cmr=v1.7.4:reviewer=vvenkatesan:subject=fix a problem when opening a file with MODE_EXCL

This commit was SVN r30324.
2014-01-18 16:06:27 +00:00
Nathan Hjelm
c88626510c Fix a merge issue with new ROMIO and fix obvious ROMIO bug.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30319.
2014-01-18 00:29:16 +00:00
Hadi Montakhabi
8c14411289 f_cc_size is contiguous chunk size, not the stripe width. There is no stripe_width in the file handle structure.
This commit was SVN r30314.
2014-01-17 18:35:55 +00:00
Mike Dubman
2af0f878bc remove bml_init call, called from btl add_proc.
Refs trac:3763

This commit was SVN r30310.

The following Trac tickets were found above:
  Ticket 3763 --> https://svn.open-mpi.org/trac/ompi/ticket/3763
2014-01-17 16:52:20 +00:00
Mike Dubman
b7750ccbf4 OSHMEM: bml initialization is moved into ompi_init
it fixes race of mca_var segfault in finalization of shmem

based on this thread:
http://www.open-mpi.org/community/lists/devel/2014/01/13778.php

Refs trac:3763

fixed by Igor, reviewed by Brian

This commit was SVN r30304.

The following Trac tickets were found above:
  Ticket 3763 --> https://svn.open-mpi.org/trac/ompi/ticket/3763
2014-01-17 06:09:29 +00:00
Nathan Hjelm
f2a73fcdbd udreg: free huge page allocations correctly
This commit fixes an error path that occurs when huge page allocations are
enabled. In this case we allocate a huge page and try to register it but fail.
We then were calling free on the opal object. Fix this by calling the proper free
function.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30289.
2014-01-14 16:26:06 +00:00
Nathan Hjelm
f9d2032705 vader: ensure fast box data is aligned on 4-byte boundaries
This commit fixes a bus error on Solaris/Sparc.

Closes trac:4111

cmr=v1.7.5:ticket=trac:4053

This commit was SVN r30288.

The following Trac tickets were found above:
  Ticket 4053 --> https://svn.open-mpi.org/trac/ompi/ticket/4053
  Ticket 4111 --> https://svn.open-mpi.org/trac/ompi/ticket/4111
2014-01-14 16:04:52 +00:00
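Aligning data on a 4-byte boundary, as the commit above describes for vader fast boxes, usually means rounding an offset up to the next multiple of 4. A minimal sketch of that rounding, assuming a power-of-two alignment; `align_up` is a hypothetical helper, not a function from the vader source.

```python
def align_up(offset, alignment=4):
    """Round offset up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Adding alignment - 1 and masking off the low bits rounds up.
    return (offset + alignment - 1) & ~(alignment - 1)
```

Offsets that are already aligned are returned unchanged, so applying the helper twice has no further effect.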
Rolf vandeVaart
e75afb2b82 Fix bug in distance computation code when deciding which devices to use on a NUMA node.
Also add a verbose flag so one can see what devices are selected as well as another flag to override
locality information and use all devices on the node.  

This commit was SVN r30287.
2014-01-14 15:41:56 +00:00
Nathan Hjelm
da1316ca6e vader: don't OBJ_RELEASE endpoint rcaches.
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30284.
2014-01-13 23:44:34 +00:00
Jeff Squyres
fbd70d7798 George correctly pointed out that there's no need for this test: it
effectively exists elsewhere in the code already.

This commit was SVN r30277.
2014-01-13 22:26:22 +00:00
Jeff Squyres
20d6391734 Patch submitted by Paul Hargrove to fix NetBSD compile with -laio.
NetBSD puts the AIO functions in -lrt, vs. the usual libc.  So we
need the fbtl/posix configure.m4 to test for -lrt properly.

Reviewed by Jeff Squyres.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Fix NetBSD use of -laio

This commit was SVN r30274.
2014-01-13 18:49:39 +00:00
Jeff Squyres
5f17bc3c2c Make the use of PROTECTED in the mpi_f08 module be optional.
Add a configure test to see if the Fortran compiler supports the
PROTECTED keyword.  If it does, use in mpi-f08-types.F90 (via a macro
defined in configure-fortran-output-bottom.h).

This is needed to support the PGI 9 Fortran compiler, which does not
support the PROTECTED keyword.

Note that regardless of whether we want to support the PGI 9 Fortran
compiler + mpi_f08, we need to correctly detect whether PROTECTED
works or not, and then use that determination as a criterion for
building the mpi_f08 module.  Previously, mpi-f08-types.F90 used
PROTECTED unconditionally, and we didn't test for it in configure.  So
if a compiler (e.g., PGI 9) supported everything else but didn't
support PROTECTED, it would try to compile the mpi_f08 stuff and choke
on the use of PROTECTED.

Refs trac:4093

This commit was SVN r30273.

The following Trac tickets were found above:
  Ticket 4093 --> https://svn.open-mpi.org/trac/ompi/ticket/4093
2014-01-13 18:35:42 +00:00
Ralph Castain
e7710873a1 Open/close the RTE framework
cmr=v1.7.4:reviewer=hjelmn

This commit was SVN r30270.
2014-01-13 17:43:24 +00:00
Jeff Squyres
40939df16c Add two predefined MPI object padding tests:
1. Canary compile-time test: this is compiled whenever you compile
    the entire OMPI tree.  It's a noinst standalone library comprised
    of a single .c file, so no one will notice its addition, and it
    doesn't get linked/installed to any real build products.  If we
    are out of padding space on any predefined MPI object type, it
    will fail to compile.  This will alert/annoy a human, who will be
    able to fix the real problem.
2. Added a "make check" test that will print out the amount of
    predefined padding left on all the MPI object types.
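
A minimal sketch of how such a compile-time canary could work, assuming a hypothetical object layout. The names `demo_object_t`, `DEMO_PREDEFINED_SIZE`, and `demo_padding_left` are illustrative only and are not taken from the Open MPI tree:

```c
#include <stddef.h>

/* Hypothetical space reserved for each predefined object (an assumption). */
#define DEMO_PREDEFINED_SIZE 64

/* Hypothetical object; the real MPI object structs are much larger. */
typedef struct {
    int   id;
    void *payload;
} demo_object_t;

/* Canary: the array size becomes -1, which is a compile error, as soon as
 * the object outgrows the space reserved for it. */
typedef char demo_padding_canary[
    (sizeof(demo_object_t) <= DEMO_PREDEFINED_SIZE) ? 1 : -1];

/* What a "make check"-style test could report: bytes of padding left. */
size_t demo_padding_left(void)
{
    return DEMO_PREDEFINED_SIZE - sizeof(demo_object_t);
}
```

The negative-array-size trick is used here because it works with pre-C11 compilers; whether the actual test uses this or another mechanism is not stated in the commit message.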

This commit was SVN r30268.
2014-01-13 16:39:39 +00:00
Yossi Etigin
7564e2c13f Fix a recursion in mxm send flow which happens when mpi starts a new send from the context of send completion callback.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30265.
2014-01-12 17:47:03 +00:00
Yossi Etigin
9504969f7d fix communicator double-free from pt2pt component, caused by r29938.
cmr=v1.7.5:reviewer=brbarret

This commit was SVN r30264.

The following SVN revision numbers were found above:
  r29938 --> open-mpi/ompi@ecfb122c97
2014-01-12 17:38:14 +00:00
Ralph Castain
286ff6d552 For large scale systems, we would like to avoid doing a full modex during MPI_Init so that launch will scale a little better. At the moment, our options are somewhat limited as only a few BTLs don't immediately call modex_recv on all procs during startup. However, for those situations where someone can take advantage of it, add the ability to do a "modex on demand" retrieval of data from remote procs when we launch via mpirun.
NOTE: launch performance will be absolutely awful if you do this with BTLs that aren't configured to modex_recv on first message!

Even with "modex on demand", we still have to do a barrier in place of the modex - we simply don't move any data around, which does reduce the time impact. The barrier is required to ensure that the other proc has in fact registered all its BTL info and therefore is prepared to hand over a complete data package. Otherwise, you may not get the info you need. In addition, the shared memory BTL can fail to properly rendezvous as it expects the barrier to be in place.

This behavior will *only* take effect under the following conditions:

1. launched via mpirun

2. #procs is greater than ompi_hostname_cutoff, which defaults to UINT32_MAX

3. mca param rte_orte_direct_modex is set to 1. At the moment, we are having problems getting this param to register properly, so only the first two conditions are in effect. Still, the bottom line is you have to *want* this behavior to get it.

The planned next evolution of this will be to make the direct modex be non-blocking - this will require two fixes:

1. if the remote proc doesn't have the required info, then let it delay its response until it does. This means we need a way for the MPI layer to tell the RTE "I am done entering modex data".

2. adjust the SM rendezvous logic to loop until the required file has been created

Creating a placeholder to bring this over to 1.7.5 when ready.

cmr=v1.7.5:reviewer=hjelmn:subject=Enable direct modex at scale

This commit was SVN r30259.
2014-01-11 17:36:06 +00:00
Jeff Squyres
69ecf1670c Remove even more dead Fortran configury.
This configure option was only relevant when we were generating TKR
"use mpi" interfaces for MPI subroutines with choice buffers.  Now
that we aren't, the only interface that needs to accept a choice
buffer is MPI_SIZEOF (which we have to provide).  

And since there's now only several dozen interfaces in the "mpi" TKR
module, there's no reason to not generate ''all'' possible array rank
values (when there were thousands of interfaces, generating 4-vs-7
array ranks per interface per type was a big deal).  The default used
to be 4; now we can just hard-code it to 7, the max possible value for
Fortran 2003 (I think the max was raised ?to 11? in F2008, but let's
not go there for now).

cmr=v1.7.5:reviewer=dgoodell:subject=Remove even more dead Fortran configury

This commit was SVN r30257.
2014-01-11 14:06:59 +00:00
Jeff Squyres
b0ffdb3ae5 As noted by Paul Hargrove, older PGI compilers support ''some'' of
BIND(C), but not ''all'' of it.  So expand our configure checks to
look for multiple different forms of BIND(C):

 * ISO_C_BINDING
 * SUBROUTINE ... BIND(C)
 * TYPE, BIND(C)
 * TYPE(foo), BIND(C, name="bar")

If the compiler supports all of these, then declare that we support
BIND(C), and the rest of the mpi_f08 checks can continue.  If we miss
any one of those, don't bother continuing -- we won't build the
mpi_f08 module.

Also push the results of all of these tests down to ompi_info so that
they can be reported easily (e.g., "Hey, why doesn't my OMPI
installation have the mpi_f08 module?").

cmr=v1.7.4:reviewer=jsquyres:subject=Expand Fortran BIND(C) configure checks

This commit was SVN r30247.
2014-01-10 23:44:55 +00:00
Jeff Squyres
751aa195e9 Similar to r30244, make the libmpi_usempif08 Fortran library also
LIBADD libmpi.la

cmr=v1.7.4:reviewer=brbarret:subject=Add libmpi to libmpi_usempif08_LIBADD

This commit was SVN r30245.

The following SVN revision numbers were found above:
  r30244 --> open-mpi/ompi@7015343951
2014-01-10 21:33:10 +00:00
Jeff Squyres
7015343951 Make the Fortran libraries also LIBADD libmpi.la (libmpi_usempi for
TKR LIBADDs libmpi_mpifh; there is no library for libmpi_usempi ignore
TKR).

Refs trac:4085

This commit was SVN r30244.

The following Trac tickets were found above:
  Ticket 4085 --> https://svn.open-mpi.org/trac/ompi/ticket/4085
2014-01-10 21:30:58 +00:00
Jeff Squyres
34ae50a0ed Fix int <--> pointer casting by adding intermediate cast through (intptr_t)
Reviewed by Dave Goodell
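
A minimal sketch of the casting pattern described above, assuming the goal is to carry a small integer in a pointer-sized slot. The helper names are hypothetical and do not come from the usnic BTL:

```c
#include <stdint.h>

/* Going through intptr_t avoids the "cast to pointer from integer of
 * different size" warning on platforms where int and void* differ in width. */
void *demo_int_to_ptr(int value)
{
    return (void *)(intptr_t)value;
}

/* The reverse direction uses the same intermediate type. */
int demo_ptr_to_int(void *ptr)
{
    return (int)(intptr_t)ptr;
}
```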

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Add intptr_t casting in usnic btl

This commit was SVN r30243.
2014-01-10 20:42:53 +00:00
Nathan Hjelm
5259ab213f Fix one more error path in udreg. In this case we hit the maximum size
of the udreg cache and get a different error code back.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30242.
2014-01-10 19:27:32 +00:00
Ralph Castain
9566650458 Per Marco, don't define a "min" function if one is already defined to avoid conflict with cygwin reserved word
This commit was SVN r30241.
2014-01-10 18:03:25 +00:00
Ralph Castain
880943dc10 Per Marco, rename "interface" to "tcp_interface" to avoid cygwin reserved word
This commit was SVN r30240.
2014-01-10 18:02:22 +00:00
Ralph Castain
c7a94a57d7 Per Marco, rename ERROR tags to exit_ERROR to avoid cygwin reserved name issues.
Refs trac:4085

This commit was SVN r30239.

The following Trac tickets were found above:
  Ticket 4085 --> https://svn.open-mpi.org/trac/ompi/ticket/4085
2014-01-10 18:00:49 +00:00
Jeff Squyres
350d989c00 Fix OpenBSD warnings where <malloc.h> is available and usable, but not
intended to be used and emits a compile-time warning.

Thanks to Paul Hargrove for identifying the issue.

cmr=v1.7.4:reviewer=hjelmn:subject=remove/replace malloc.h

This commit was SVN r30231.
2014-01-10 17:20:49 +00:00
Jeff Squyres
53a3defde9 s/CACHE_LINE_SIZE/BASESMUMA_CACHE_LINE_SIZE/g to avoid a system macro
name clash on some BSDs.

cmr=v1.7.4:reviewer=pasha

This commit was SVN r30230.
2014-01-10 16:48:43 +00:00
Edgar Gabriel
217e61e345 add proper typecasts to intptr_t to avoid warnings on 32-bit systems.
This commit was SVN r30229.
2014-01-10 16:19:04 +00:00
Jeff Squyres
212e07a1e9 Don't instantiate+init variables in a switch block.
Avoid compiler warning about (unnecessarily) initializing 2 variables
during instantiation at the top of a switch block (but outside of any
case statements): just declare the variables at the top of the outer
block.  They're already safely initialized, so don't worry about
initializing them in the instantiation.
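
A minimal sketch of the resulting shape, assuming a hypothetical helper (`demo_scale` is not a function from the tree). Both locals are declared before the switch and assigned inside each case:

```c
/* An initializer placed between "switch (...) {" and the first case label
 * is never executed, because control jumps straight to a label. Declaring
 * the locals in the enclosing block avoids that warning. */
int demo_scale(int kind, int value)
{
    int factor;
    int result;

    switch (kind) {
    case 0:
        factor = 2;
        result = value * factor;
        break;
    case 1:
        factor = 10;
        result = value * factor;
        break;
    default:
        result = value;
        break;
    }
    return result;
}
```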

Reviewed by Dave Goodell.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Don't instantiate+init variables in a switch block

This commit was SVN r30228.
2014-01-10 15:39:16 +00:00
Mike Dubman
110c99af4f sharing negative tag space between libNBC and HCOLL
fixed by devendar, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30224.
2014-01-10 12:51:34 +00:00
Nathan Hjelm
52c231df3e ob1 does not check the return code of mpool_register. This can cause the
ob1 dummy registration to actually be used when using udreg. Fix this by
always setting reg to NULL when mpool/udreg's register function fails.
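
A minimal sketch of the idea, assuming a simplified, hypothetical register function (`demo_register` and `demo_reg_t` are illustrative and do not match the mpool interface). The out-parameter is cleared on every failure path, so a caller that ignores the return code still sees NULL rather than a stale value:

```c
#include <stddef.h>

/* Hypothetical registration record. */
typedef struct {
    void  *base;
    size_t size;
} demo_reg_t;

/* Single static slot, only to keep the sketch self-contained. */
static demo_reg_t demo_slot;

/* Returns 0 on success. On failure, returns -1 and sets *reg to NULL. */
int demo_register(void *base, size_t size, demo_reg_t **reg)
{
    if (NULL == base || 0 == size) {
        *reg = NULL;        /* never leave a dummy registration behind */
        return -1;
    }
    demo_slot.base = base;
    demo_slot.size = size;
    *reg = &demo_slot;
    return 0;
}
```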

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30214.
2014-01-10 00:46:16 +00:00
Jeff Squyres
b0b17c62aa Protect against orte_proc_applied_binding being NULL.
It is now possible for orte_proc_applied_binding to be NULL (e.g., if
you mpirun --bind-to none), so we need to ensure we don't pass it down
to opal_hwloc_base_cset2*str().

Also, take the opportunity to de-duplicate some strings that are used
in multiple places.

Refs trac:4073

This commit was SVN r30204.

The following Trac tickets were found above:
  Ticket 4073 --> https://svn.open-mpi.org/trac/ompi/ticket/4073
2014-01-09 23:38:34 +00:00
Jeff Squyres
115025b8dd Ensure that the usnic BTL is only built on 64 bit Linux platforms.
Reviewed by Dave Goodell.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Ensure the usnic BTL only builds on 64 bit Linux

This commit was SVN r30199.
2014-01-09 22:17:01 +00:00
Brian Barrett
013e0ec771 * Add multi-device support to the Portals 4 btl.
* Remove use of the Portals 4 proc tag for the btl, as it's causing more
problems than it's worth.

This commit was SVN r30191.
2014-01-09 20:01:42 +00:00
Ralph Castain
f179f2086b Do a better job of reporting bindings - if someone gives a spec that binds us to all processors, then we are effectively unbound and should report it clearly instead of outputting a long line of B's.
cmr=v1.7.4:reviewer=jsquyres:subject=Do a better job of reporting bindings

This commit was SVN r30179.
2014-01-09 16:16:16 +00:00
Nathan Hjelm
bb01fc2938 Add missing MCA variable enumerator sentinel.
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30178.
2014-01-09 15:28:42 +00:00
Alina Sklarevich
2869ff1782 mxm: fixes for compilation warnings.
removed set but not used variables and a variable that is unused.

reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30176.
2014-01-09 15:15:14 +00:00
Mike Dubman
0fae2caef3 Create a comm keyval for hcoll component with delete callback function.
Set comm attribute with keyval.
Wait for pending hcoll module tasks in the comm delete callback, where the PML
is still valid on the communicator. Safely destroy the hcoll context in the
hcoll module destructor.

Author: Devendar Bureddy 
reviewed by miked

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30175.
2014-01-09 11:27:24 +00:00
Nathan Hjelm
10ecd80c8c Fix typo in udreg mpool that could cause us to try to use an invalid
registration. This was causing transaction errors on Aries systems.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r30174.
2014-01-09 05:56:29 +00:00
Ralph Castain
2453843972 Add missing include - thanks to Paul Hargrove for spotting it
cmr=v1.7.4:reviewer=jsquyres:subject=add missing include in bcol

This commit was SVN r30171.
2014-01-09 03:57:55 +00:00
Jeff Squyres
776f6144af Part 2/companion to r30169: remove Fortran TKR interfaces for MPI
subroutines with choice buffers.

Refs trac:4065

This commit was SVN r30170.

The following SVN revision numbers were found above:
  r30169 --> open-mpi/ompi@759ee33fd4

The following Trac tickets were found above:
  Ticket 4065 --> https://svn.open-mpi.org/trac/ompi/ticket/4065
2014-01-09 02:23:20 +00:00
Jeff Squyres
759ee33fd4 Per thread starting here:
http://www.open-mpi.org/community/lists/users/2014/01/23327.php

Revert the Fortran mpi module default size to "small", meaning that we
won't provide interfaces for MPI subroutines that take a choice buffer
any more.  The short version is that MPI-3 p610:34-41 disallows it.

This commit simply removes all these subroutines from the build
process (i.e., remove them from nodist_libmpi_usempi_la_SOURCES).
Since MPI-3 actually forbids providing these interfaces, I'll do a
second commit to actually remove all the scripts and associated
Makefile.am junk.

cmr=v1.7.4:reviewer=dgoodell:subject=Remove choice buffer interfaces from Fortran mpi module

This commit was SVN r30169.
2014-01-09 01:33:13 +00:00
Jeff Squyres
ca84ffdbd4 Need to have BIND(C) on the callback interfaces. Reviewed/confirmed
by Tobias Burnus.

Refs trac:4058.

This commit was SVN r30164.

The following Trac tickets were found above:
  Ticket 4058 --> https://svn.open-mpi.org/trac/ompi/ticket/4058
2014-01-08 23:12:41 +00:00
Jeff Squyres
9d41632eba Change the MCA level to 2 (from 5) on the rationale that it may be
needed for correctness.  The if_include/if_exclude are level 1, and
the TCP port range params are level 2; this parameter seems to be on
par with the TCP port range params.

Refs trac:4019

This commit was SVN r30161.

The following Trac tickets were found above:
  Ticket 4019 --> https://svn.open-mpi.org/trac/ompi/ticket/4019
2014-01-08 19:04:26 +00:00
Jeff Squyres
8c871c2db6 Fix some compiler warnings:
* Remove some set-but-not-used variables
 * Make a convenience function return void (we weren't using the
   return code, anyway)
 * Mark a function as inline (it was supposed to be inline anyway)

Reviewed by Dave Goodell.

cmr=v1.7.5:reviewer=ompi-rm1.7:subject=Fix usnic BTL compiler warnings

This commit was SVN r30160.
2014-01-08 16:57:14 +00:00
Ralph Castain
cb31187bbe Correct tcp_not_use_nodelay option processing - change in mca param system incorrectly reversed the original parameter
Thanks to Tetsuya Mishima for detecting it!

cmr=v1.7.4:reviewer=jsquyres:subject=Correct tcp_not_use_nodelay option processing

This commit was SVN r30157.
2014-01-08 15:12:50 +00:00
Mike Dubman
43d6a30693 Fix problems of:
- HCOLL close without init
- Call hcoll progress after comm finalize
- mpirun default for coll_hcoll_enable is 1

fixed by Igor, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30156.
2014-01-08 10:55:25 +00:00
Jeff Squyres
36cca10042 Thanks to a reminder from Tobias Burnus, commit support for the
upcoming GCC/gfortran 4.9's ignore TKR interface.

This was originally committed in a side mercurial repo, but I sadly
completely forgot about it until Tobias reminded me.

cmr=v1.7.5:reviewer=dgoodell:subject=Add support for gfortran 4.9 Fortran ignore TKR

This commit was SVN r30152.
2014-01-08 03:46:27 +00:00
Ralph Castain
e2ca265f40 Per 1/7/2014 telecon: Add an MCA param to turn on all warnings for missing excluded interfaces.
Refs trac:4019

This commit was SVN r30146.

The following Trac tickets were found above:
  Ticket 4019 --> https://svn.open-mpi.org/trac/ompi/ticket/4019
2014-01-08 00:21:25 +00:00
Jeff Squyres
13b29cff2c This commit complements/completes r30140. r30140 made all the
configury/Makefile.am changes; this commit renames the internal
installdirs.h framework struct field names to match the configury macro
names:

 * pkgdatadir -> ompidatadir
 * pkglibdir -> ompilibdir
 * pkgincludedir -> ompiincludedir

This commit was SVN r30145.

The following SVN revision numbers were found above:
  r30140 --> open-mpi/ompi@8b778903d8
2014-01-07 23:36:33 +00:00
Brian Barrett
7d472ad5a5 Improve some comments
This commit was SVN r30144.
2014-01-07 23:35:04 +00:00
Jeff Squyres
50d20ade82 Fix compiler warnings: remove unused variables
This commit was SVN r30143.
2014-01-07 23:21:47 +00:00
Jeff Squyres
8349e122e8 Fix compiler warning (signed/unsigned comparison)
This commit was SVN r30142.
2014-01-07 23:18:55 +00:00
Brian Barrett
afde8370b3 Pull both calls to get into one function, and wrap with the appropriate
reference count if flow control is enabled.

This commit was SVN r30141.
2014-01-07 23:15:09 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Tom Naughton
c01db6faca fix typo in btl:vader for OMPI_LOCAL_RANK_INVALID
This commit was SVN r30139.
2014-01-07 21:42:51 +00:00
Brian Barrett
dbcc53bc6f Fix a threading issue
Remove some unneeded UNLIKELYs

This commit was SVN r30138.
2014-01-07 19:41:39 +00:00
Rolf vandeVaart
b3edca19df Add braces per coding convention and design review.
This commit was SVN r30137.
2014-01-07 17:30:37 +00:00
Jeff Squyres
8bf4ad9030 Refs trac:4301
Complements r30073: tighten up the string parsing of the vendor parts
ID MCA param a bit.  Also fix a small memory leak: be sure to free the
array of uint32_t's parsed out of the MCA param.

This commit was SVN r30128.

The following SVN revision numbers were found above:
  r30073 --> open-mpi/ompi@6003702a51

The following Trac tickets were found above:
  Ticket 4301 --> https://svn.open-mpi.org/trac/ompi/ticket/4301
2014-01-06 22:16:04 +00:00
Nathan Hjelm
e627c91227 btl/vader: add support for traditional shared memory.
This commit adds support for placing the send memory segment in a
traditional shared memory segment when XPMEM is not available. The
current default is to reserve 4MB for shared memory on each process.
The latest benchmarks show vader performing better than sm on both
Intel and AMD CPUs.

For large messages vader will now use CMA if it is available (and
XPMEM is not).

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30123.
2014-01-06 19:51:44 +00:00
Nathan Hjelm
5c8ea3a251 btl/openib: Move free list memory allocation to add_procs
Per RFC which expired two weeks ago:

We are planning to make a change to Open MPI to always set up the btls. This
means the btl init will be called even if add_procs is never called for that
btl. In the openib btl, free list fragments are currently allocated in btl_init.
To avoid wasting that memory, this commit moves that final device setup to
the add_procs function. This includes allocating free lists and starting the
async event thread.

At this time this change is safe since we have a barrier after add_procs in
MPI_Init. If this changes we will need to re-think some of the initialization
since we might have the possibility of a connection request before add_procs
is called.

Tested with Mellanox ConnectX2 and QLogic HCAs.

Commit also cleans up tabs in btl_openib_async.c.

cmr=v1.7.5:reviewer=miked

This commit was SVN r30122.
2014-01-06 19:51:30 +00:00
Matthias Jurenz
03c5791104 Changes to VT/OTF:
Fixed compiler warnings seen with the Clang compiler.

This commit was SVN r30121.
2014-01-06 14:03:01 +00:00
Brian Barrett
d4bb1cbbad * Start working on thread safety of Portals 4 MTL
* Only call flowctl_add_procs if there's a new proc in the add_procs call

This commit was SVN r30110.
2014-01-02 22:37:01 +00:00
Brian Barrett
e811a8a9cb Make the Portals 4 collective component disable itself when there's not a
Portals 4 point-to-point (MTL or BTL) component in use

This commit was SVN r30109.
2014-01-02 22:35:37 +00:00
Oscar Vega-Gisbert
c9b7ea6d1a Comm.reduceLocal: add missing offset artifact
This commit was SVN r30108.
2014-01-02 21:57:48 +00:00
George Bosilca
fb0f7d7fa5 Fix the issue with the topologies attached to a communicator.
This commit was SVN r30107.
2014-01-02 17:38:09 +00:00
Ralph Castain
871f4e519c Silence warning
Refs trac:4040

This commit was SVN r30105.

The following Trac tickets were found above:
  Ticket 4040 --> https://svn.open-mpi.org/trac/ompi/ticket/4040
2014-01-02 16:05:54 +00:00
Oscar Vega-Gisbert
795131fc59 javadoc: remove old references to offset
This commit was SVN r30102.
2014-01-01 21:40:27 +00:00
Rolf vandeVaart
c47e06463d Adjust CUDA related crossover value.
This commit was SVN r30100.
2013-12-30 18:39:11 +00:00
Rolf vandeVaart
e7f430d9ac Add empty line that was inadvertently removed in message.
This commit was SVN r30099.
2013-12-30 18:38:07 +00:00
George Bosilca
947c180d7f Create a finalize function to provide an opportunity to the mpool
base to release the internal structures.

This commit was SVN r30098.
2013-12-29 11:45:46 +00:00
Ralph Castain
652f7a120f Add Mellanox device IDs that were included in prior releases, but somehow missing again here
cmr=v1.7.4:reviewer=miked

This commit was SVN r30095.
2013-12-26 17:47:05 +00:00
Ralph Castain
62378a64c8 As Jeff pointed out, the reqd flag should only turn off the show_help - still enter the rest of the code block
Refs trac:4019

This commit was SVN r30091.

The following Trac tickets were found above:
  Ticket 4019 --> https://svn.open-mpi.org/trac/ompi/ticket/4019
2013-12-26 15:02:41 +00:00
Ralph Castain
a8a91b374e Update component-level selection comments to match latest revisions
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30087.
2013-12-25 19:12:43 +00:00
Jeff Squyres
12d23e9c92 Left out valid end-of-string comparison in r30073.
Refs trac:4031

This commit was SVN r30074.

The following SVN revision numbers were found above:
  r30073 --> open-mpi/ompi@6003702a51

The following Trac tickets were found above:
  Ticket 4031 --> https://svn.open-mpi.org/trac/ompi/ticket/4031
2013-12-24 12:07:56 +00:00
Jeff Squyres
6003702a51 Minor improvements to the usnic BTL:
1. Fix ompi_info memory leak in usnic BTL: do not allocate memory in
    the component register function, because ompi_info only calls the
    component register function and then dlclose's the component -- it
    does not call component finalize.  Instead, defer parsing the MCA
    param (and alloc'ing memory) until the component init function so
    that any allocated memory can be freed in the component close
    function.
2. Also add a new check to ensure that we actually have some part
    numbers to check.  Add a show_help message if we don't find any
    vendor part IDs to check.
3. Add a verbose output if usnic disqualifies itself from selection
    because THREAD_MULTIPLE was specified.

cmr=v1.7.5:reviewer=dgoodell

This commit was SVN r30073.
2013-12-24 11:57:35 +00:00
Jeff Squyres
365ce2cd03 Fix minor MPI thread memory leak / fix valgrind still-reachable warning.
cmr=v1.7.5:reviewer=brbarret:subject=Fix minor MPI thread memory leak

This commit was SVN r30072.
2013-12-24 11:05:51 +00:00
Jeff Squyres
bceaa347b1 Label what the GAP_TEST macro does. Print more meaningful output as
to what the test is doing (i.e., checking for gaps between struct fields).

This commit was SVN r30070.
2013-12-24 11:03:24 +00:00
Ralph Castain
6a432ca092 Per patch from Ashley Pittman, correct the name of the struct within which the code is looking for "mtc".
cmr=v1.7.4:reviewer=bosilca:subject=Correct name of struct

This commit was SVN r30061.
2013-12-23 21:32:16 +00:00
Ralph Castain
9eebb79d54 Cleanup a loop that couldn't possibly execute as the outer loop index was being reused by the inner loops, leaving the index at the cutoff point after the first iteration
cmr=v1.7.4:reviewer=edgar:subject=Cleanup loop in sharedfp

This commit was SVN r30059.
2013-12-23 18:34:34 +00:00
Nathan Hjelm
3be4536d9b Cleanup various leaks in ompi_info reported by valgrind.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30058.
2013-12-23 17:47:43 +00:00
Mike Dubman
80f4e02e0a Several changes:
- Modifications to coll/hcoll component related to the changes in the libhcoll API. 
  Now, hcoll_destroy_context accepts one more parameter that indicates if the context was
  really destroyed as a result of the call. 
  This new "non-blocking" context destruction fixes hang discovered in IMB with mcast enabled. 
- Clean up all the left contexts (if any) on the comm_world destruction. 

fixed by Val, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30055.
2013-12-23 06:57:12 +00:00
Jeff Squyres
1448522d15 In an MPI_IBCAST, we cannot shortcut if there's only 1 process.
cmr=v1.7.4:reviewer=brbarret:subject=Fix IBCAST for COMM_SELF

This commit was SVN r30054.
2013-12-22 22:55:58 +00:00
Jeff Squyres
71ec6c1617 Remove unnecessary "mpi.h"; move opal headers to the top.
This commit was SVN r30053.
2013-12-22 20:38:43 +00:00
George Bosilca
b324884375 This might explain the current difficulties with the mapping...
This commit was SVN r30047.
2013-12-21 23:26:13 +00:00
George Bosilca
38cbaeaa82 Try to impose a little bit of consistency on how we parse lists of
modules by enforcing the use of OPAL list accessors.

This commit was SVN r30045.
2013-12-21 23:23:33 +00:00
Ralph Castain
042ed95e4e Remove an annoying warning. If the user excludes a non-existent interface, there is no reason to warn - the interface may simply not exist on that node.
cmr=v1.7.4:reviewer=jsquyres:subject=Remove an annoying warning

This commit was SVN r30042.
2013-12-21 01:51:11 +00:00
Adrian Reber
53a70fe87f Trying to get the C/R code to compile again. (send_*_nb)
This patch changes all send/send_buffer occurrences in the C/R code
to send_nb/send_buffer_nb.
The new code compiles but does not work.

Changes from V1:
* #ifdef out the code (so it is preserved for later re-design)
* marked the broken C/R code with ENABLE_FT_FIXED

Changes from V2:
* just replace the blocking calls with the non-blocking calls
* all #ifdef's introduced in V1 are gone
* send_* returns error code or ORTE_SUCCESS (not the number of bytes)

This commit was SVN r30036.
2013-12-20 21:58:28 +00:00
Adrian Reber
a3813d37c7 Trying to get the C/R code to compile again. (recv_*_nb)
This patch changes all recv/recv_buffer occurrences in the C/R code
to recv_nb/recv_buffer_nb.
The old code is still there but disabled using ifdefs (ENABLE_FT_FIXED).
The new code compiles but does not work.

Changes from V1:
* #ifdef out the code (so it is preserved for later re-design)
* marked the broken C/R code with ENABLE_FT_FIXED

Changes from V2:
* only #ifdef out the code where the behaviour is changed
  (used to be blocking; now non-blocking)

This commit was SVN r30035.
2013-12-20 21:05:40 +00:00
Rolf vandeVaart
695d854cd8 Fix return value.
This commit was SVN r30034.
2013-12-20 20:57:04 +00:00
Ralph Castain
31248c0985 Correctly add support for the "env" MPI_Info key during comm_spawn, update the "map-by", "rank-by", and "bind-to" Info key behaviors to match the new mapping/ranking/binding system, and update all docs and comments to match.
Fix comm_spawn on a single host - with the new default mapping scheme, we were incorrectly computing the number of procs to put on the node.

Refs trac:4003

This commit was SVN r30033.

The following Trac tickets were found above:
  Ticket 4003 --> https://svn.open-mpi.org/trac/ompi/ticket/4003
2013-12-20 20:42:39 +00:00
Rolf vandeVaart
4cd1958deb Fix so we do not get warnings when running on a system without CUDA software installed but with CUDA-aware support compiled in.
This commit was SVN r30032.
2013-12-20 20:39:25 +00:00
Ralph Castain
0098f9f51a Remove remaining stale references
Refs trac:4006

This commit was SVN r30027.

The following Trac tickets were found above:
  Ticket 4006 --> https://svn.open-mpi.org/trac/ompi/ticket/4006
2013-12-20 17:48:28 +00:00
Dave Goodell
bd901a68ed usnic: fix 'fls' warnings+errors
The old version caused compilation errors on Solaris.  Thanks to Paul
Hargrove for testing and reporting the bug:

  http://www.open-mpi.org/community/lists/devel/2013/12/13520.php

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30025.
2013-12-20 17:37:22 +00:00
George Bosilca
7178492dd5 Correctly initialize and finalize all the datatype classes. No memory leaks
remain in the datatype engine.

This commit was SVN r30019.
2013-12-20 15:57:10 +00:00
Jeff Squyres
4739850931 As reported by Paul Hargrove
(http://www.open-mpi.org/community/lists/devel/2013/12/13521.php),
OpenBSD-5 #define's MIN and MAX, so we need to #undef them.
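
A minimal sketch of the guard pattern, assuming the project supplies its own definitions after dropping the system's. The macro bodies and the `demo_clamp` helper are illustrative and may differ from what the tree actually defines:

```c
/* Some system headers already define MIN and MAX (the report above cites
 * OpenBSD-5); remove any earlier definition before providing our own. */
#ifdef MIN
#undef MIN
#endif
#define MIN(a, b) (((a) < (b)) ? (a) : (b))

#ifdef MAX
#undef MAX
#endif
#define MAX(a, b) (((a) > (b)) ? (a) : (b))

/* Hypothetical user of both macros: clamp value into [low, high]. */
int demo_clamp(int value, int low, int high)
{
    return MAX(low, MIN(value, high));
}
```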

cmr=v1.7.4:reviewer=rhc:subject=undef MIN and MAX for OpenBSD-5

This commit was SVN r30007.
2013-12-20 11:40:59 +00:00
Ralph Castain
6959ba5577 Add missing include file.
Thanks to Paul Hargrove for spotting it.

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29998.
2013-12-19 23:39:21 +00:00
Dave Goodell
0c6b292442 romio: pick "infinitely stale" fix from upstream
Some NFS scenarios can result in an infinite ESTALE return, which will
hang ROMIO.  This commit causes ROMIO to error out after a large number
of retries instead of spinning forever.
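
A minimal sketch of a bounded ESTALE retry loop, under stated assumptions. The retry cap, the callback type, and all `demo_` names are hypothetical; the actual ROMIO change is in the MPICH commit referenced below:

```c
#include <errno.h>

/* Hypothetical retry cap; the value ROMIO uses is not stated here. */
#define DEMO_MAX_STALE_RETRIES 1000

/* Hypothetical operation: returns 0 on success, or -1 with errno set. */
typedef int (*demo_io_op_t)(void *ctx);

/* Retry op while it fails with ESTALE, but give up after the cap instead
 * of spinning forever. Returns 0 on success, -1 on failure. */
int demo_retry_on_estale(demo_io_op_t op, void *ctx)
{
    int attempt;

    for (attempt = 0; attempt < DEMO_MAX_STALE_RETRIES; ++attempt) {
        if (op(ctx) == 0) {
            return 0;
        }
        if (errno != ESTALE) {
            return -1;      /* a different error: do not retry */
        }
    }
    errno = ESTALE;         /* still stale after every retry */
    return -1;
}

/* Demonstration operation: always reports a stale handle and counts
 * how many times it was called. */
int demo_always_stale(void *ctx)
{
    int *calls = (int *)ctx;

    ++*calls;
    errno = ESTALE;
    return -1;
}
```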

This is MPICH commit b250d338:

http://git.mpich.org/mpich.git/commit/b250d338e66667a8a1071a5f73a4151fd59f83b2

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r29993.
2013-12-19 22:55:26 +00:00
Ralph Castain
b745078535 Support user-provided envars for comm_spawn using info key "env"
Thanks to Tom Fogal for the request

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29990.
2013-12-19 20:59:50 +00:00
Jeff Squyres
3a14adef63 Remove the comments around these assignments; otherwise, we won't get
function pointers set to the _map functions, and we get segv's in MTT
testing (e.g., the C++ suite, which actually calls MPI_Cart_map and
MPI_Graph_map).

cmr=v1.7.4:reviewer=bosilca:subject=Fix topo _map function pointer assignments

This commit was SVN r29988.
2013-12-19 20:41:32 +00:00
Yossi Etigin
6ab4aba9e6 Fix missing include of show_help.h in mtl mxm.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29987.
2013-12-19 19:37:21 +00:00
Tom Naughton
3aefca32b0 + update rte db_fetch comments with change from r29931
This commit was SVN r29971.

The following SVN revision numbers were found above:
  r29931 --> open-mpi/ompi@0995a6f3b9
2013-12-19 01:16:58 +00:00
Jeff Squyres
bb59b07321 Remove CFLAGS setting that was really only intended for the v1.6
branch (it's not necessary on trunk/v1.7 because they require C99,
which allows variadic macros).

Also fix another compiler warning (using %p to print a (void*)).

Submitted by Jeff, reviewed by Dave.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=two usnic BTL fixes

This commit was SVN r29966.
2013-12-19 00:19:05 +00:00
Nathan Hjelm
b9765a380f Update NEWS with new MPI-3 features and a note about the new ROMIO
version.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r29965.
2013-12-19 00:16:07 +00:00
Jeff Squyres
515fd00411 CSCul95082: DMAR faults during mtt testing
usnic_channel_finalize() was deregistering recv buffers before
destroying the QP to which they were posted. The QP needs to be
destroyed first so that the NIC does not attempt to write to
deregistered memory, causing the DMAR messages.

Submitted by Reese, reviewed by Jeff.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29963.
2013-12-19 00:01:35 +00:00
Ralph Castain
77553f72be Per this email thread:
http://www.open-mpi.org/community/lists/devel/2013/12/13412.php

fix the backtrace function to avoid async issues. Thanks to Takahiro Kawashima for the patch

This commit was SVN r29955.
2013-12-18 17:57:37 +00:00
Jeff Squyres
2665c91b2a Fixes trac:3958: use the right type name (mca_topo_base_module_t) in the
debugger code (not mca_topo_base_module_2_1_0_t).

I checked: we do a similar thing for coll in the communicator struct
(i.e., leave the version number off the module struct).  I confess to
not remembering ''why'' we leave the version number off, but it seems
to be consistent this way...

cmr=v1.7.4:reviewer=bosilca:subject=fix debugger type symbol lookup for mca_topo_base_module_t

This commit was SVN r29953.

The following Trac tickets were found above:
  Ticket 3958 --> https://svn.open-mpi.org/trac/ompi/ticket/3958
2013-12-18 15:17:15 +00:00
Yossi Etigin
ecfb122c97 Fix segfault in osc pt2pt completion handler, when the request is canceled during finalization.
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29938.
2013-12-17 17:30:14 +00:00
Ralph Castain
0995a6f3b9 Revert r29917 and replace it with a fix that resolves the thread deadlock while retaining the desired debug info. In an earlier commit, we had changed the modex accordingly:
* automatically retrieve the hostname (and all RTE info) for all procs during MPI_Init if nprocs < cutoff

* if nprocs > cutoff, retrieve the hostname (and all RTE info) for a proc upon the first call to modex_recv for that proc. This would provide the hostname for debugging purposes as we only report errors on messages, and so we must have called modex_recv to get the endpoint info

* BTLs are not to call modex_recv until they need the endpoint info for first message - i.e., not during add_procs so we don't call it for every process in the job, but only those with whom we communicate

My understanding is that only some BTLs have been modified to meet that third requirement, but those include the Cray ones where jobs are big enough that launch times were becoming an issue. Other BTLs would hopefully be modified as time went on and interest in using them at scale arose. Meantime, those BTLs would call modex_recv on every proc, and we would therefore be no worse than the prior behavior.

This commit revises the MPI-RTE interface to pass the ompi_proc_t instead of the ompi_process_name_t for the proc so that the hostname can be easily inserted. I have advised the ORNL folks of the change.

cmr=v1.7.4:reviewer=jsquyres:subject=Fix thread deadlock

This commit was SVN r29931.

The following SVN revision numbers were found above:
  r29917 --> open-mpi/ompi@1a972e2c9d
2013-12-17 03:26:00 +00:00
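The modex policy described in 0995a6f3b9 (r29931) could be sketched roughly as below. This is only an illustration under stated assumptions: `proc_info`, `fetch_rte_info`, `init_modex`, and this simplified `modex_recv` are hypothetical stand-ins, not the Open MPI functions, and treating `nprocs == cutoff` as the lazy case is a guess, since the commit message only covers `<` and `>`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process record; the real ompi_proc_t is much richer. */
struct proc_info {
    int have_hostname;   /* 1 once hostname and RTE info were retrieved */
};

/* Stand-in for the (expensive) retrieval of hostname and RTE info. */
static void fetch_rte_info(struct proc_info *p)
{
    p->have_hostname = 1;
}

/* Called once during init: small jobs fetch everything eagerly,
 * large jobs defer the work until a peer is actually contacted. */
void init_modex(struct proc_info *procs, size_t nprocs, size_t cutoff)
{
    assert(procs != NULL);
    if (nprocs >= cutoff) {
        return;                      /* lazy mode: fetch nothing now */
    }
    for (size_t i = 0; i < nprocs; ++i) {
        fetch_rte_info(&procs[i]);
    }
}

/* Called when endpoint info for a peer is first needed. */
void modex_recv(struct proc_info *peer)
{
    assert(peer != NULL);
    if (!peer->have_hostname) {
        fetch_rte_info(peer);        /* first contact: fetch on demand */
    }
    /* the endpoint info lookup would follow here */
}
```

In lazy mode only the peers that are actually contacted pay the retrieval cost, which is the scaling benefit the commit message describes for large jobs.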
Adrian Reber
b42aad44a3 Trying to get the C/R code to compile again. This patch
includes various fixes all over the C/R code which are
hard to group like the other patches.

Changes from V1:
* explain why mca_base_component_distill_checkpoint_ready no longer works
* compare return result of opal functions with OPAL_* values

Changes from V2:
* use orte_rml_oob_ft_event() instead of referencing through the modules
* properly protect variable (thanks to --enable-picky)

This commit was SVN r29922.
2013-12-16 15:35:28 +00:00
George Bosilca
efb32da1e0 There is no need for this include.
This commit was SVN r29918.
2013-12-15 17:04:45 +00:00
George Bosilca
1a972e2c9d Don't be greedy, just do what we asked for.
This commit was SVN r29917.
2013-12-15 16:54:01 +00:00
George Bosilca
430a13719f Only if OMPI_BTL_SM_HAVE_CMA is set to 1.
This commit was SVN r29916.
2013-12-15 16:49:27 +00:00
Jeff Squyres
0ab48ad0d2 Fix some annoying flex warnings that have been there for years.
Many thanks to Tom Fogal for the initial patch.

cmr=v1.7.4:reviewer=rhc:subject=Fix annoying flex warnings

This commit was SVN r29904.
2013-12-14 00:36:12 +00:00
Rolf vandeVaart
b955dbd6d9 Fix various items discovered by review of ticket #3951.
This commit was SVN r29900.
2013-12-13 21:25:07 +00:00
Jeff Squyres
f4afa4fd1f Add missing include, exposed in "external libevent" work.
Refs trac:3694

This commit was SVN r29898.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-13 21:21:30 +00:00
Jeff Squyres
bcfe2156d5 Bring over m4 quoting fix from v1.7 branch (in r29894) that was
discovered when removing some components.

This commit was SVN r29895.

The following SVN revision numbers were found above:
  r29894 --> open-mpi/ompi@58ed00296c
2013-12-13 20:27:33 +00:00
Brian Barrett
121ca26c59 Per discussion at Developer's Meeting, remove Solaris threads support. Solaris
will just fall back to pthreads, which should be no problem.

This commit was SVN r29893.
2013-12-13 20:07:11 +00:00
Ralph Castain
f763be26c4 Closes trac:2433. Check for hetero architecture and disqualify sm connections if that is found as the sm btl currently doesn't support hetero operations.
cmr=v1.7.4:reviewer=brbarret:subject=Disqualify sm btl for hetero procs

This commit was SVN r29882.

The following Trac tickets were found above:
  Ticket 2433 --> https://svn.open-mpi.org/trac/ompi/ticket/2433
2013-12-13 15:23:33 +00:00
Mike Dubman
fb3f94a16e remove debug print
Refs trac:3969

This commit was SVN r29876.

The following Trac tickets were found above:
  Ticket 3969 --> https://svn.open-mpi.org/trac/ompi/ticket/3969
2013-12-13 06:08:44 +00:00
Mike Dubman
21be95c9b5 Initialize sm global variables in mca_btl_sm_component_open(), because they are destructed in mca_btl_sm_component_close(), and init() function might not be called or fail.
For example, mca_btl_sm.knem_fd remained 0, and mca_btl_sm_component_close() ended up closing fd 0, which belongs to someone else.

fixed by Yossi, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29875.
2013-12-13 06:01:24 +00:00
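The failure mode fixed in 21be95c9b5 (r29875) is a common one with file descriptors, and a minimal sketch of the pattern follows. Only the field name `knem_fd` comes from the commit message; `struct sm_state`, the two function names, and the choice of `-1` as the "unset" sentinel are assumptions made for illustration.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical stand-in for the sm BTL's global state. */
struct sm_state {
    int knem_fd;
};

/* Component open: give every global a safe "unset" value, because
 * init() may never run, or may fail before assigning it. */
void sm_component_open(struct sm_state *s)
{
    assert(s != NULL);
    s->knem_fd = -1;    /* -1 is never a valid descriptor; 0 is */
}

/* Component close: only close a descriptor that was really opened.
 * Returns 1 if a descriptor was closed, 0 if there was nothing to do. */
int sm_component_close(struct sm_state *s)
{
    assert(s != NULL);
    if (s->knem_fd < 0) {
        return 0;
    }
    close(s->knem_fd);
    s->knem_fd = -1;
    return 1;
}
```

With a zero-initialized global, the close path cannot tell "never opened" apart from "descriptor 0", which is how fd 0 ended up being closed.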
Jeff Squyres
bac67e0d81 Per discussion @Chicago OMPI dev meeting Dec 2013: remove all MX support.
This commit was SVN r29873.
2013-12-12 18:54:47 +00:00
Nathan Hjelm
3262080391 Cleanup udcm structures to avoid issues with nesting structures with
flexible members.

UDCM is ready to go for 1.7.4 with this patch.

cmr=v1.7.4:ticket=3940

This commit was SVN r29861.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-12 05:24:37 +00:00
Nathan Hjelm
e0e94a6029 Fix warning caused by typo in r29815
This commit was SVN r29860.

The following SVN revision numbers were found above:
  r29815 --> open-mpi/ompi@d556b60b21
2013-12-11 21:45:39 +00:00
Nathan Hjelm
6ab69c758b Fix warnings in udcm.
cmr=v1.7.4:reviewer=rhc:ticket=3940

This commit was SVN r29859.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-11 21:40:06 +00:00
Rolf vandeVaart
3ae88f8a24 Ensure no fork support with GDR. CUDA-aware code only.
This commit was SVN r29854.
2013-12-10 18:08:53 +00:00
Rolf vandeVaart
1cc55f305f Add extra check for GDR. Adjust some names and replace opal_output with opal_show_help.
This commit was SVN r29853.
2013-12-10 16:04:08 +00:00
Jeff Squyres
0f61bb651e Technically, the PORT_ACTIVE is not a "bad" event.
Note that this event should never happen within a single OMPI job,
because OMPI will ignore usnic ports that are down.  The PORT_ACTIVE
event should only occur if a port ''was'' down and is now ''up''.  But
what the heck -- if we ever do get this event, it is harmless -- just
ignore it.

This commit was SVN r29852.
2013-12-09 20:45:55 +00:00
Edgar Gabriel
c253c2eec6 fix the condition for the lazy open of shared filepointers.
This commit was SVN r29850.
2013-12-09 19:37:21 +00:00
Mike Dubman
9a65e0d8c6 cosmetic fixes for hcol autotools
Refs: #3694

This commit was SVN r29841.
2013-12-08 09:45:13 +00:00
Mike Dubman
2e124454b4 cosmetic fix to remove redundant -lfca
use CPP extra flags var which is propagated to coll/fca and scoll/fca
Refs: #3694

This commit was SVN r29832.
2013-12-07 15:00:54 +00:00
Jeff Squyres
3bd9c603ff Clean up variables used in configure with OPAL_VAR_SCOPE.
This is helpful in the work for #3694: ensure that many places that
eventually end up in configure don't overly-pollute the global shell
variable space (because debugging accidental shell variable pollution
can be a real pain).

Refs trac:3694

This commit was SVN r29830.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-06 23:40:34 +00:00
Rolf vandeVaart
d556b60b21 Change some CUDA configure code and macro names per review request by jsquyres in ticket #3880.
Functionally, nothing changes.

This commit was SVN r29815.
2013-12-06 14:35:10 +00:00
Nathan Hjelm
231ebb09c9 Update romio configury to remove a warning message.
cmr=v1.7.4:ticket=3158

This commit was SVN r29811.

The following Trac tickets were found above:
  Ticket 3158 --> https://svn.open-mpi.org/trac/ompi/ticket/3158
2013-12-06 00:12:35 +00:00
Dave Goodell
da26226e3c usnic: add some extra debug-build sanity checks
On the off chance that the PML is twiddling fields that it really
shouldn't be...

Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29804.
2013-12-05 00:28:11 +00:00
Oscar Vega-Gisbert
97fc83e29e Remove references to pinning support
This commit was SVN r29802.
2013-12-04 22:40:26 +00:00
Jeff Squyres
ba018b3603 Protect the container_of #define.
MOFED apparently has a /usr/include/infiniband/verbs.h that also
defines a (slightly different but fully compatible) container_of
macro.  So put proper #ifndef protection around our definition of
container_of.

Thanks to Rolf vandeVaart for pointing out the issue.

Reviewed by Dave Goodell.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29799.
2013-12-04 14:24:56 +00:00
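The guard described in ba018b3603 (r29799) might look like the sketch below. The macro body shown is a common portable formulation and not necessarily the exact definition used in Open MPI or in MOFED's verbs.h; `list_item`, `segment`, and `segment_from_link` are hypothetical names used only to exercise the macro.

```c
#include <assert.h>
#include <stddef.h>

/* Define container_of only if no earlier header already did so. */
#ifndef container_of
#define container_of(ptr, type, member) \
    ((type *) ((char *) (ptr) - offsetof(type, member)))
#endif

/* Hypothetical types for illustration. */
struct list_item {
    struct list_item *next;
};

struct segment {
    int id;
    struct list_item link;   /* embedded member */
};

/* Recover the enclosing segment from a pointer to its embedded link. */
struct segment *segment_from_link(struct list_item *item)
{
    assert(item != NULL);
    return container_of(item, struct segment, link);
}
```

Because the two definitions are compatible, it does not matter which header is included first; the `#ifndef` only prevents a redefinition warning or error.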
Yossi Etigin
a913b00f89 mtl mxm: update configuration parsing api to mxm 2.1, drop
older version support (1.0 and 1.1), and cleanup the code.

reviewed by miked.

cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29797.
2013-12-04 09:11:55 +00:00
Jeff Squyres
c74c1e86d3 Per suggestion from Paul Kapinos, report in BTL verbosity if a device
is skipped because it is too far away.

(see thread starting here:
http://www.open-mpi.org/community/lists/devel/2013/06/12470.php)

This commit was SVN r29790.
2013-12-03 22:44:11 +00:00
Rolf vandeVaart
218c05a4d1 Make sure synchronous copies are complete before moving the data.
This commit was SVN r29789.
2013-12-03 21:20:14 +00:00
Rolf vandeVaart
ab77435d9b Fix the CUDA-aware case where we are not sending any GPU data.
This commit was SVN r29788.
2013-12-03 20:25:58 +00:00
Devendar Bureddy
4554770ee4 hcol fixes
cmr=v1.7.4:reviewer=jladd

This commit was SVN r29787.
2013-12-03 20:21:40 +00:00
Nathan Hjelm
fe327d9859 udcm: cleanup code and improve the ack handling
Originally udcm acks used the immediate data to indicate which message was
being acknowledged. This data was (mysteriously) junk when using QLogic HCAs so I
updated udcm to use the source info (slid, qp, etc) to determine which message was being
acked. This works as long as we don't have two messages simultaneously in flight
to a particular peer and then lose the first of the two messages. The chances of this
happening are tiny. To fix this case I updated the udcm message header to include
a pointer to the in flight message. This pointer is then sent back to the sending
process to ack receipt.

cmr=v1.7.4:ticket=trac:3940

This commit was SVN r29775.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-02 20:18:46 +00:00
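The final ack scheme described in fe327d9859 (r29775), where the header carries a pointer to the in-flight message and the receiver echoes it back, can be sketched as follows. All type and function names here are hypothetical, and the real udcm header layout is not shown in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record the sender keeps for each unacknowledged message. */
struct inflight_msg {
    int acked;
};

/* Hypothetical message header. The cookie is the sender's own pointer,
 * stored as an integer; the receiver treats it as opaque. */
struct msg_hdr {
    uint64_t sender_cookie;
};

/* Sender side: stamp the outgoing header with the in-flight record. */
void hdr_set_cookie(struct msg_hdr *hdr, struct inflight_msg *msg)
{
    hdr->sender_cookie = (uint64_t) (uintptr_t) msg;
}

/* Receiver side: build the ack by echoing the cookie unchanged. */
struct msg_hdr make_ack(const struct msg_hdr *received)
{
    struct msg_hdr ack = { received->sender_cookie };
    return ack;
}

/* Sender side: map an incoming ack back to exactly one message. */
struct inflight_msg *handle_ack(const struct msg_hdr *ack)
{
    struct inflight_msg *msg =
        (struct inflight_msg *) (uintptr_t) ack->sender_cookie;
    if (msg != NULL) {
        msg->acked = 1;
    }
    return msg;
}
```

Unlike matching on source info alone, the echoed pointer stays unambiguous when two messages to the same peer are in flight and the first one is lost.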
Jeff Squyres
3a7af4ab40 Fix another clang warning: sendreq is undefined if proc==NULL.
cmr=v1.7.4:reviewer=hjelmn:subject=fix ob1 undefined sendreq value

This commit was SVN r29774.
2013-12-02 19:44:42 +00:00
Jeff Squyres
16c63c5bbe Fix conditional: don't just check the constant (thanks to clang for an
excellent warning message!)

cmr=v1.7.4:reviewer=hjelmn:subject=fix MCA_BASE_VAR_SOURCE_OVERRIDE test

This commit was SVN r29773.
2013-12-02 19:41:59 +00:00
Nathan Hjelm
fb0b0442c4 openib/connect: re-enable xrc support in the openib btl
This commit updates the udcm cpc to support xrc. The steps followed by udcm
mimic those in the removed xoob cpc. This update has been tested with both XRC
and RC.

Mellanox, this is intended to go into 1.7.4. Please review carefully and let
me know if there are any issues.

cmr=v1.7.4:reviewer=miked

This commit was SVN r29767.
2013-11-27 22:28:04 +00:00
George Bosilca
cb24277737 Restrict the usage of MPI_Type_extent only to receiving processes
(aka the root). This commit is based on a patch provided by Pierre 
Jolivet.
Fix all the output to match the failing MPI call.

This commit was SVN r29761.
2013-11-27 12:09:31 +00:00
Rolf vandeVaart
aa98b0333b Call function from function table. Discovered during static build.
This commit was SVN r29755.
2013-11-25 22:46:07 +00:00
Matthias Jurenz
90ebdd920f Changes to VT:
- added preprocessor conditional for vt_cupti_events_enabled
	  (fixes compile error when CUDA-RT wrappers are enabled and CUPTI is disabled (as reported at: https://svn.open-mpi.org/trac/ompi/changeset/29752 by Jörg Bornschein))

This commit was SVN r29754.
2013-11-25 12:58:43 +00:00
Ralph Castain
ac9820c46f Link against common cuda library
Thanks to Jorg Bornschein for pointing it out

cmr=v1.7.4:reviewer=rolfv

This commit was SVN r29750.
2013-11-24 17:06:51 +00:00
George Bosilca
68268377af Fix an error message for the igather and the usage of the extent on
non-root processes for the iscatter. Thanks to Pierre
Jolivet for the bug report and the patch.

This commit was SVN r29736.
2013-11-23 00:59:22 +00:00
Matthias Jurenz
3923ee89ec Changes to VT/OTF:
Fixed warnings about the need of the 'subdir-objects' option when using Automake v1.14.
Due to a bug in Automake (see http://debbugs.gnu.org/cgi/bugreport.cgi?bug=13928) the 'subdir-objects' option cannot be enabled.
To get around this problem, external source files are symlinked into the current build directory (as done in ompi/mpi/c/profile) to lead Automake to believe that all source files are in the same directory.

This commit was SVN r29732.
2013-11-22 12:37:31 +00:00
Edgar Gabriel
4f425872be fix the streams used in opal_output in the sharedfp components.
This commit was SVN r29726.
2013-11-21 16:11:49 +00:00
Devendar Bureddy
4a311ae9fd continue search sorted openib device list if no btls found with nearest HCA.
cmr=v1.7.4:reviewer=jladd

This commit was SVN r29725.
2013-11-20 22:23:12 +00:00
Nathan Hjelm
24a7e7aa34 Add support for the udreg registration cache and dynamics on XE/XK/XC.
To support the new mpool two changes were made to the mpool infrastructure:

 1) Added an mpool flag to indicate that an mpool does not need the memory
    hooks to use the leave pinned protocols. This flag is checked in the
    mpool lookup.

 2) Add a mpool context to the base registration. This new member is used
    by the udreg mpool to store the udreg context associated with the
    particular registration. The new member will not break the ABI
    compatibility as the new member is only currently used by the udreg
    mpool.

Dynamics support for Cray systems makes use of the global rank provided by
orte to give the ugni library a unique rank for each process. Dynamics
support is not available under direct-launch (srun.)

cmr=v1.7.4

This commit was SVN r29719.
2013-11-18 04:58:37 +00:00
Jeff Squyres
5206e877be Help decrease conflicts between SVN trunk and Cisco git branch of OMPI v1.6 branch
This commit was SVN r29715.
2013-11-15 21:35:56 +00:00
Jeff Squyres
e6ed7c9f4d Avoid trivial "don't mix declarations and code" compiler warning
This commit was SVN r29714.
2013-11-15 21:31:10 +00:00
Oscar Vega-Gisbert
da84609091 Update Javadoc
This commit was SVN r29713.
2013-11-14 22:18:45 +00:00
Rolf vandeVaart
92e6aaa808 Adjust a default value. Adjust some levels of verbosity and one more debug message.
This commit was SVN r29712.
2013-11-14 21:47:27 +00:00
Oscar Vega-Gisbert
73fc5d2b3a Update Javadoc
This commit was SVN r29710.
2013-11-14 21:19:16 +00:00
Ralph Castain
7480beb7f0 Per request from Nathan, add an offset value to the job struct so we can construct a "global rank" that spans multiple jobs during dynamic launch operations. Store a new ORTE_DB_GLOBAL_RANK value for each process in the database, and ensure that we share our own value during connect_accept so both sides can see it.
This isn't being used yet - just enabling Nathan to do what he needs.

***** NOTE: any use of the OMPI_DB_GLOBAL_RANK database key must be protected by #ifdef OMPI_DB_GLOBAL_RANK as not all RTE's will define this key. *****

This commit was SVN r29708.
2013-11-14 17:01:43 +00:00
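The note in 7480beb7f0 (r29708) asks that any use of `OMPI_DB_GLOBAL_RANK` be wrapped in `#ifdef`. A rough sketch of such a guard is shown below. The function name is hypothetical, and computing the global rank as job offset plus rank within the job is an assumption based on the commit's wording; the real code would fetch the stored value from the database rather than compute it.

```c
#include <assert.h>
#include <stdint.h>

/* An RTE that provides the key would define OMPI_DB_GLOBAL_RANK in its
 * own headers. It is deliberately left undefined in this sketch. */

/* Pick a rank that is unique across jobs when the RTE supports it,
 * and fall back to the per-job rank otherwise. */
uint32_t rank_for_unique_id(uint32_t job_offset, uint32_t rank_in_job)
{
#ifdef OMPI_DB_GLOBAL_RANK
    /* Assumed formula; stands in for a database lookup of the key. */
    return job_offset + rank_in_job;
#else
    (void) job_offset;          /* unused without global-rank support */
    return rank_in_job;
#endif
}
```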
Ralph Castain
22e30a680d Given that the oob and xoob cpc's are no longer operable and haven't been since the OOB update, remove them to avoid confusion
cmr:v1.7.4:reviewer=hjelmn:subject=Remove stale cpcs from openib

This commit was SVN r29703.
2013-11-14 04:16:53 +00:00
Nathan Hjelm
6b3cf0c1ba Merge branch 'romio_refresh'
This commit was SVN r29695.
2013-11-13 21:02:55 +00:00
Rolf vandeVaart
4964a5e98b Per this RFC from October 8, 2013 and as discussed in the telecon.
http://www.open-mpi.org/community/lists/devel/2013/10/13072.php

Add support for pinning GPU Direct RDMA in openib BTL for better small message latency of GPU buffers. 
Note that none of this is compiled in unless CUDA-aware support is requested.

This commit was SVN r29680.
2013-11-13 13:22:39 +00:00
Jeff Squyres
684dc2f849 Don't use the hard-coded name libmpi.so -- instead, use
libmpi.<OPAL_DYN_LIB_SUFFIX>, where OPAL_DYN_LIB_SUFFIX was determined
by configure.

Thanks to Ömer Demirel for reporting the issue.

Refs trac:3905.

This commit was SVN r29676.

The following Trac tickets were found above:
  Ticket 3905 --> https://svn.open-mpi.org/trac/ompi/ticket/3905
2013-11-13 03:25:18 +00:00
Jeff Squyres
98ff91cfeb Refs trac:3091
Gah!  The "device" variable isn't used at all in this loop (my eye
glossed over the next line and thought that "device" was used in the
free() statement, but it's actually "devices" -- not "device").

This commit was SVN r29665.

The following Trac tickets were found above:
  Ticket 3091 --> https://svn.open-mpi.org/trac/ompi/ticket/3091
2013-11-12 23:01:04 +00:00
Jeff Squyres
7cb31111a6 Refs trac:3901
Feedback from Dave's review.

This commit was SVN r29664.

The following Trac tickets were found above:
  Ticket 3901 --> https://svn.open-mpi.org/trac/ompi/ticket/3901
2013-11-12 22:51:20 +00:00
Ralph Castain
762400d559 Silence warning
Refs trac:3898

This commit was SVN r29659.

The following Trac tickets were found above:
  Ticket 3898 --> https://svn.open-mpi.org/trac/ompi/ticket/3898
2013-11-11 22:53:09 +00:00
Jeff Squyres
5a940f5ee7 Arrgh -- remove debugging printf.
This commit was SVN r29657.
2013-11-11 22:44:28 +00:00
Jeff Squyres
e20217eccc Expand the "btl_usnic" MPI_T enumeration to have strings of the form:
<usnic device name>,<eth device>,<ip address>/<CIDR prefix>

For example:

   usnic_0,eth4,10.1.0.15/16

This is just handy for mapping the usnic_X device back to the IP
network to which it corresponds.

This commit was SVN r29656.
2013-11-11 22:25:30 +00:00
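The string layout introduced in e20217eccc (r29656) can be reproduced with a small formatting helper. The sketch below is not the usnic BTL code: `format_usnic_enum` and its parameters are hypothetical, and only the output format and the sample values come from the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<usnic device name>,<eth device>,<ip address>/<CIDR prefix>".
 * Returns 0 on success, -1 if the result did not fit in the buffer. */
int format_usnic_enum(char *out, size_t len,
                      const char *usnic_dev, const char *eth_dev,
                      const char *ip_addr, unsigned cidr_prefix)
{
    assert(out != NULL && len > 0);
    int n = snprintf(out, len, "%s,%s,%s/%u",
                     usnic_dev, eth_dev, ip_addr, cidr_prefix);
    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}
```

Called with `"usnic_0"`, `"eth4"`, `"10.1.0.15"`, and `16`, the helper yields the example string from the commit message, `usnic_0,eth4,10.1.0.15/16`.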
Nathan Hjelm
6a331275d8 Set transfers as active before starting them.
cmr=v1.7.4:ticket=trac:3898

This commit was SVN r29654.

The following Trac tickets were found above:
  Ticket 3898 --> https://svn.open-mpi.org/trac/ompi/ticket/3898
2013-11-11 21:50:54 +00:00
Nathan Hjelm
3d3c29ae96 btl/scif: do not return resource busy if we started a connection attempt.
Resolves a hang when using scif for shared memory transfers. This is a
simple change and doesn't require a review.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29653.
2013-11-11 19:36:34 +00:00
Nathan Hjelm
b5ce72cc15 Set the modex as active before starting it. This resolves a hang in
MPI_Init() on comm-spawned processes.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r29652.
2013-11-11 19:33:32 +00:00
Rolf vandeVaart
a6df7bc33a Fix issues reported in ticket #3877. Also added additional comments.
This commit was SVN r29641.
2013-11-07 20:44:47 +00:00
Rolf vandeVaart
2cf7c40ee5 Minor adjustments to error messages due to review of #3880.
This commit was SVN r29640.
2013-11-07 20:21:21 +00:00
Rolf vandeVaart
3290cde630 Various minor changes to bring smcuda up to date with sm.
This commit was SVN r29639.
2013-11-07 19:45:56 +00:00
Dave Goodell
82db913490 usnic: fix module_recv_buffers perf regression
Cisco v1.6 git commit 913ec6c and upstream trunk r29593 (segfault fix)
introduced a performance regression by inadvertently disabling the
`module_recv_buffers` functionality.  With those changes in place, the
`btl_usnic_recv.c` logic would end up mallocing a buffer that should
have otherwise come from a `module_recv_buffers` pool.  It also resulted
in a small, bounded memory leak (128 buffers at each power-of-two size
interval).

The new version just places the buffer after the free list item with a
flexible array member.  I bumped the pool to allocate all 128 elements
up front because the deferred allocation was modestly impacting IMB
Sendrecv performance at a few sizes.

Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29631.

The following SVN revision numbers were found above:
  r29593 --> open-mpi/ompi@1ed9b8ff43
2013-11-07 01:27:31 +00:00
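The layout described in 82db913490 (r29631), a buffer placed directly after the free list item via a flexible array member with the whole pool allocated up front, could look roughly like this. The type and function names are hypothetical, and a plain singly linked list stands in for Open MPI's free list class.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical free-list item: the payload lives in the same
 * allocation, immediately after the bookkeeping fields. */
struct recv_buf_item {
    struct recv_buf_item *next;
    size_t capacity;
    unsigned char buf[];        /* flexible array member */
};

/* Release every item in the pool. */
void recv_buf_pool_destroy(struct recv_buf_item *head)
{
    while (head != NULL) {
        struct recv_buf_item *next = head->next;
        free(head);
        head = next;
    }
}

/* Allocate all items up front so the receive path never calls malloc.
 * Returns the list head, or NULL if any allocation failed. */
struct recv_buf_item *recv_buf_pool_create(size_t count, size_t capacity)
{
    struct recv_buf_item *head = NULL;
    assert(capacity > 0);
    for (size_t i = 0; i < count; ++i) {
        /* One malloc covers the header plus 'capacity' payload bytes. */
        struct recv_buf_item *item = malloc(sizeof(*item) + capacity);
        if (item == NULL) {
            recv_buf_pool_destroy(head);
            return NULL;
        }
        item->capacity = capacity;
        item->next = head;
        head = item;
    }
    return head;
}

/* Count the items currently in the pool. */
size_t recv_buf_pool_count(const struct recv_buf_item *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next) {
        ++n;
    }
    return n;
}
```

Keeping header and payload in one allocation means returning the item to the pool also returns its buffer, so there is no separate buffer that can leak.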
Vishwanath Venkatesan
d37a5faa20 Need not do aggregator selection for one process case
So adding a check for this corner case!

This commit was SVN r29622.
2013-11-06 21:05:26 +00:00
Brian Barrett
6d7a1fbb82 Move opal_portable_platform.h to opal/include/opal, which is where it really
should have been all along and fix one place that uses the file

Update opal_portable_platform.h with changes to mpi_portable_platform.h made 
in r29608.

Make mpi_portable_platform.h a symlink to opal_portable_platform.h, so that
they won't get out of sync.  I'd like to remove mpi_portable_platform.h, but
we don't automatically add -I${includedir}/openmpi/ to make that sane from
a header include point of view, so that's future work.

This commit was SVN r29618.

The following SVN revision numbers were found above:
  r29608 --> open-mpi/ompi@b71bd51cdd
2013-11-06 17:12:26 +00:00
Brian Barrett
cf8de1ef0f Minor indent cleanup in init_query()
Only use Portals on communicators with more than one rank
Fix computation of number of children when using the hypercube tree

This commit was SVN r29616.
2013-11-06 15:21:09 +00:00
Jeff Squyres
e28261898d Per discussion on the devel list, rename the btl_usnic_devices MPI_T
state pvar to be btl_usnic (i.e., the best suggestion so far).

See http://www.open-mpi.org/community/lists/devel/2013/11/13188.php
for more detail.

This commit was SVN r29614.
2013-11-06 06:19:03 +00:00
Brian Barrett
9780043456 re-apply r29608 and fix the broken configure test that broke worse with the
patch.  See ticket #3885, comment 10 for an explanation of why calling
_STRINGIFY on something that's not a numerical constant is always a bad idea.

This commit was SVN r29613.

The following SVN revision numbers were found above:
  r29608 --> open-mpi/ompi@b71bd51cdd
2013-11-05 22:41:10 +00:00
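The ticket comment referenced in 9780043456 (r29613) is not reproduced in this log, so the following only illustrates one well-known pitfall of stringifying macros and is not necessarily the reasoning given there. The macro and function names are hypothetical. A two-level stringify macro yields the expanded token text, not a computed value, so anything other than a plain numeric constant produces a string that does not look like a number.

```c
#include <assert.h>
#include <string.h>

/* Classic two-level stringification: the outer macro expands its
 * argument first, the inner one turns the resulting tokens into text. */
#define STRINGIFY_RAW(x) #x
#define STRINGIFY(x) STRINGIFY_RAW(x)

#define VERSION_CONSTANT 4          /* plain numeric constant */
#define VERSION_EXPRESSION (2 + 2)  /* same value, but an expression */

/* Yields "4": the token text happens to match the value. */
const char *stringify_constant(void)
{
    return STRINGIFY(VERSION_CONSTANT);
}

/* Yields "(2 + 2)": the preprocessor never evaluates the expression. */
const char *stringify_expression(void)
{
    return STRINGIFY(VERSION_EXPRESSION);
}
```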