1
1
Граф коммитов

858 Коммитов

Автор SHA1 Сообщение Дата
Vasily Filipov
5fecf65daf OMPI/COLL/Tuned: add command line params for thresholds to decide if small/intermediate MSGs alltoall algorithm will be used.
cmr=v1.8.3:reviewer=miked

This commit was SVN r32735.
2014-09-15 12:34:21 +00:00
Gilles Gouaillardet
edfbeba7bf coll/ml: better error handling
when CHECK_AND_RECYCLE detects an error, a message is displayed
if the error occurs on an intrinsic communicator, then abort
the program (instead of trying to free the communicator)

cmr=v1.8.3:reviewer=hjelmn

This commit was SVN r32659.
2014-09-01 10:00:49 +00:00
Ralph Castain
b554cd7d86 Turn off the coll/ml component if --without-hwloc was given
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32621.
2014-08-27 20:25:39 +00:00
Rolf vandeVaart
8709071819 Fix missing help file.
This commit was SVN r32550.
2014-08-18 21:52:31 +00:00
Gilles Gouaillardet
22cb8a1834 check-help-strings cleanup
This commit was SVN r32497.
2014-08-11 03:27:45 +00:00
Mike Dubman
0f60c34a9f fca: adopt opal API refactoring, fix warning.
based on http://www.open-mpi.org/community/lists/devel/2014/08/15558.php

This commit was SVN r32484.
2014-08-09 15:50:51 +00:00
Ralph Castain
e95187514c Update the proc structure dereference to reflect the new opal_proc_t super field
This commit was SVN r32462.
2014-08-08 16:12:49 +00:00
Jeff Squyres
132375f07f helpfiles: fix filenames referenced by calls to show_help()
This commit was SVN r32453.
2014-08-08 13:34:15 +00:00
Ralph Castain
5f244e8b19 Use opal_getpagesize to get the proper page size
Refs trac:4826

This commit was SVN r32422.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-04 20:23:00 +00:00
Gilles Gouaillardet
5b1ae87c76 coll/ml: fix ML_ERROR/printf parameters
cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32409.
2014-08-04 04:05:59 +00:00
Ralph Castain
daeb9b6c4f Some more cleanups. Remove direct references to ORTE by changing OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun, orted, tools) set the OPAL proc structure fields so OPAL knows what is going on and uses the correct print functions (still need to fix the problem for non-MPI apps). Properly return uint32_t from the opal utilities instead of int32_t as that is what the ORTE process name fields contain.
Thanks to Gilles for pointing out some of the discrepancies.

This commit was SVN r32398.
2014-08-01 14:44:11 +00:00
Gilles Gouaillardet
cd8fa75f87 coll/ml: align on page size as returned by sysconf
Thanks to Paul Hargrove for pointing into the right direction

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32393.
2014-08-01 08:10:12 +00:00
Nathan Hjelm
1407c1f501 Remove RML code from common/sm
The only user of this code was coll/sm. I implemented a basic replacement
for the removed code. This gets the trunk compiling again with
--disable-dlopen.

This commit was SVN r32333.
2014-07-28 22:00:12 +00:00
Ryan Grant
caa10a5faf Portals fixes after latest move
This commit was SVN r32330.
2014-07-28 19:25:03 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose.  UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00
Devendar Bureddy
74852b4d21 HCOLL: fix misplaced hcoll_init return value check.
cmr=v1.8.2:reviewer=jladd

This commit was SVN r32282.
2014-07-22 18:47:34 +00:00
Mike Dubman
e342a11c2e opal envlist mca: implement Jeff`s quibbles
fixed by Elena, reviewed by Miked

This commit was SVN r32216.
2014-07-11 07:23:20 +00:00
Nathan Hjelm
56ad231b7c coll/ml: temporarily disable binding check
This commit was SVN r32178.
2014-07-09 14:39:49 +00:00
Joshua Ladd
057370364d Opal: Add a new MCA variable type "version_string". Also add a
new flag to ompi_info that allows a user to print all MCA variables of a specific type.  

 --type version_string

This command will print all MCA variables of type version_string.

This feature was developed by Elena Shipunova and was reviewed by Josh Ladd.

This commit was SVN r32166.
2014-07-09 01:37:23 +00:00
Nathan Hjelm
309a6cf951 coll/ml: set n_resources to 0 when destructing an lmngr
Also keep track of the allocation base so we free the correct pointer
when cleaning up.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r32151.
2014-07-07 15:11:26 +00:00
George Bosilca
843ef1fcb0 ompi_mpi_abort had one extra argument that was never used. Clean it up.
This commit was SVN r32124.
2014-07-03 00:34:44 +00:00
Mike Dubman
ce6d5b8cd7 HCOLL: make it OFF by default
fixed by miked, reviewed by Alex

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32101.
2014-06-28 18:45:03 +00:00
Devendar Bureddy
228772ae81 hcoll gatherv support
cmr=v1.8.2:reviewer=jladd

This commit was SVN r32097.
2014-06-26 18:14:41 +00:00
George Bosilca
99561c5cc1 If the enable fails don't give up, but instead keep going with
the other collective modules. If we endup without some of the
collective the code will raise an error anyway.

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32096.
2014-06-26 15:52:45 +00:00
Gilles Gouaillardet
fae7adf8ee Remove legacy FCA_IS_LOCAL_PROCESS macro
and use OPAL_PROC_ON_LOCAL_NODE instead

cmr=v1.8.2:reviewer=rhc

This commit was SVN r32079.
2014-06-25 02:37:53 +00:00
Gilles Gouaillardet
e9ed9def02 Fix MPI_Alltoallv in coll/tuned
This changeset :
- always call the low/level implementation for :
  * MPI_Alltoallv
  * MPI_Neighbor_alltoallv
  * MPI_Alltoallw
  * MPI_Neighbor_alltoallv
- fix mca_coll_tuned_alltoallv_intra_basic_inplace
  so zero size types are correctly handled

cmr=v1.8.2:reviewer=bosilca:ticket=4715

This commit was SVN r32013.

The following Trac tickets were found above:
  Ticket 4715 --> https://svn.open-mpi.org/trac/ompi/ticket/4715
2014-06-17 06:11:34 +00:00
George Bosilca
542e4996a7 Cleanup the utilities functions in tuned.
This commit was SVN r31987.
2014-06-13 16:04:45 +00:00
Gilles Gouaillardet
50256c62c5 Fix MPI_Alltoallv in coll/tuned.
Correctly handle the corner case in MPI_Alltoallv when
some tasks have no data to transfer and some other tasks
do have data to transfer.

This test case is covered in ibm/collective/alltoallv_somezeros
from the ompi-tests repo.

cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31985.
2014-06-13 06:03:23 +00:00
Rolf vandeVaart
489664b4a9 Remove debug code.
This commit was SVN r31901.
2014-05-28 16:06:16 +00:00
Rolf vandeVaart
570e313c9b Add collective module to handle CUDA aware buffers and reductions. Per RFC sent last week.
Reviewed by bosilca.

This commit was SVN r31894.
2014-05-27 21:24:43 +00:00
Mike Dubman
7b05b5c4c2 HCOLL: use proper parameter in progress unregister
fixed by Nadezhda, reviewed by Elena/MikeD

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31891.
2014-05-26 09:03:30 +00:00
Gilles Gouaillardet
baf532087a Fix mca_coll_basic_alltoallw_inter()
Avoid sending/receiving zero size messages in order to be compliant
with the top-level modification

cmr=v1.8.2:ticket=4651:reviewer=bosilca

This commit was SVN r31836.

The following Trac tickets were found above:
  Ticket 4651 --> https://svn.open-mpi.org/trac/ompi/ticket/4651
2014-05-20 09:22:57 +00:00
Gilles Gouaillardet
8bafe06c57 Fixes *alltoall* collectives at top level
This commit :
 - Correctly retrieve the communicator size when
   checking memory and parameters
 - Ensure (sendtype,sendcount) and (recvtype,recvcount)
   matches and return with MPI_ERR_TRUNCATE otherwise
 - Return with MPI_SUCCESS without invoking the low level
   if no data is going to be transferred
 - Fixes trac:4506

cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31815.

The following Trac tickets were found above:
  Ticket 4506 --> https://svn.open-mpi.org/trac/ompi/ticket/4506
2014-05-19 07:46:07 +00:00
Mike Dubman
cadc1485ff HCOLL: register memory release hook to avoid races
fixed by Devender, reviewed by Miked

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31809.
2014-05-17 19:49:43 +00:00
Nathan Hjelm
c32d84154a coll/ml: fix leaks and close all the framework opened
It is essential to call mca_base_framework_close for every framework
that is opened. coll/ml was not doing this so neither bcol nor sbgp
were getting cleaned up. This commit fixes this omission.

Also fixed a leak caused by calling OBJ_DESTRUCT for something created
with OBJ_NEW. With these changes coll/ml appears to be valgrind clean.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31743.
2014-05-13 21:22:12 +00:00
Gilles Gouaillardet
e3df77548d Fix memory leak when releasing a communicator created by
MPI_Cart_Create/MPI_Graph_create/MPI_Dist_Graph

Fixes trac:4581

This commit was SVN r31716.

The following Trac tickets were found above:
  Ticket 4581 --> https://svn.open-mpi.org/trac/ompi/ticket/4581
2014-05-13 04:49:23 +00:00
Ralph Castain
11faab1091 The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees.
This commit was SVN r31679.
2014-05-08 02:01:35 +00:00
Ralph Castain
a8e2d6c3a6 The bulk of the remaining renaming changes, in one final glorious "blob". Thanks to Jeff for some help chasing down a few spots. Per chat with Jeff, we decided to cleanup a few things that were historical in nature:
top_ompi_srcdir  ->  OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR

We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.

Only thing left is ompilibdir being treated similar to what we dif for srcdir/builddir. Coming soon.

This commit was SVN r31678.
2014-05-07 21:48:53 +00:00
Devendar Bureddy
dfaac7d29d Do not call into hcoll progress after MPI_Finalize
Reviewed by Mike
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31639.
2014-05-05 22:46:39 +00:00
Jeff Squyres
64c1228b55 Roll back r31519 and r31521: George convinced us that these approaches
weren't right.

This commit was SVN r31528.

The following SVN revision numbers were found above:
  r31519 --> open-mpi/ompi@b449c750b7
  r31521 --> open-mpi/ompi@e243805ed8
2014-04-24 20:27:03 +00:00
Jeff Squyres
e243805ed8 coll tuned alltoallv: correctly handle 0-sized messages with MPI_IN_PLACE
Patch from Gilles Gouaillardet on #4517 to fix handling 0-sized
messages in coll tuned with MPI_ALLTOALLV and MPI_IN_PLACE.

Reviewed by Jeff Squyres.

Fixes trac:4517

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31521.

The following Trac tickets were found above:
  Ticket 4517 --> https://svn.open-mpi.org/trac/ompi/ticket/4517
2014-04-24 16:55:53 +00:00
Jeff Squyres
b449c750b7 coll basic: correctly handle alltoall[vw] 0-sized messages
Patch from Gilles Gouaillardet on #4506 to correctly handle 0-sized
messages in coll/basic MPI_Alltoallv and MPI_Alltoallw.

Reviewed by Jeff Squyres.

Fixes trac:4506.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31519.

The following Trac tickets were found above:
  Ticket 4506 --> https://svn.open-mpi.org/trac/ompi/ticket/4506
2014-04-24 16:25:43 +00:00
Jeff Squyres
e9b694f1d8 coll_base_comm_unselect.c: fix memory leaks
Ensure to also OBJ_RELEASE the neightbor and ineighbor modules.

Fixes trac:4444 (this patch is from that ticket).

This commit was SVN r31516.

The following Trac tickets were found above:
  Ticket 4444 --> https://svn.open-mpi.org/trac/ompi/ticket/4444
2014-04-24 15:53:06 +00:00
Mike Dubman
a4990de055 mca: track external lib version (runtime/compiletime) for mca component
based on thread: http://www.open-mpi.org/community/lists/devel/2014/04/14505.php

Create mca parameter to track runtime/compiletime ext lib version for component.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31487.
2014-04-22 18:02:26 +00:00
George Bosilca
6a65d27bcc Print the 3rd buffer for the MPI_Op.
This commit was SVN r31471.
2014-04-21 23:29:30 +00:00
Nathan Hjelm
e125bbe347 coll/ml: clean out apparently stale code
The file coll_ml_ibarrier.c wasn't included in coll/ml's Makefile.am
and the setup code from coll_ml_hier_algorithms_ibarrier.c was not
being called. It looks like this code is stale and has long since been
replaced by the code in coll_ml_barrier.c

Once all these little CMRs are approved I may make it into one roll-up
CMR to make it easier on the RM.

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31418.
2014-04-16 22:43:43 +00:00
Nathan Hjelm
484a3f6147 coll/ml: fix issues identified by the clang static analyser and fix
a segmentation fault in the reduce cleanup

Some of the changes address false warnings produced by scan-build. I
added asserts and changed some malloc calls to calloc to silence these
warnings.

The was one issue in cleanup for reduce since the component_functions
member is changed by the allreduce call. There may be other issues
with how this code works but releasing the allocated
component_functions after setting up the static functions addresses
the primary issue (SIGSEGV).

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31417.
2014-04-16 22:43:35 +00:00
Jeff Squyres
6521dcc4f1 Trivial defensive programming/style update: use {}, even for 1-line blocks.
This commit was SVN r31361.
2014-04-09 16:28:31 +00:00
George Bosilca
95a4f219ea This commit fixes some of the Coverity reported warnings. I addressed
some of the collective modules, the shared memory and the profiling
interface. I left out VT, dynamic fcoll and seq rmaps.

cmr=v1.8.1:reviewer=jsquyres:subject=silence Coverity reported warnings

This commit was SVN r31309.
2014-04-06 18:23:49 +00:00
Nathan Hjelm
71bdb8c439 coll/ml: fix some warnings identified by clang
cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31285.
2014-03-28 22:31:41 +00:00
Nathan Hjelm
459431622b Revert "coll/ml: there is no reason not to enable coll/ml when a process in not"
Discussed this with Manju and we decided to back this one out until a later time.

This reverts commit r31188 and closes trac:4435

This commit was SVN r31282.

The following SVN revision numbers were found above:
  r31188 --> open-mpi/ompi@f1dd589092

The following Trac tickets were found above:
  Ticket 4435 --> https://svn.open-mpi.org/trac/ompi/ticket/4435
2014-03-28 21:16:34 +00:00
Manjunath Gorentla Venkata
28609d3ac2 Clean wanring in sbgp and coll ml
This commit was SVN r31280.
2014-03-28 19:53:36 +00:00
Manjunath Gorentla Venkata
8c849ee991 coll/ml : Replace longer error message with opal_show_help; thanks Jeff for identifying those
This commit was SVN r31279.
2014-03-28 19:25:54 +00:00
Nathan Hjelm
a9fb4976d5 coll/ml: more fixes
There were a couple of issues with the memory leak fixes and several more verbose
issues. This fixes those issues.

cmr=v1.8.1:ticket=trac:4473

This commit was SVN r31273.

The following Trac tickets were found above:
  Ticket 4473 --> https://svn.open-mpi.org/trac/ompi/ticket/4473
2014-03-28 18:31:28 +00:00
Nathan Hjelm
bd3b550c6d coll/ml: fix leaks
Thanks to ggouaillardet for finding and fixing these issues.

Closes trac:4460

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31264.

The following Trac tickets were found above:
  Ticket 4460 --> https://svn.open-mpi.org/trac/ompi/ticket/4460
2014-03-27 23:25:31 +00:00
Nathan Hjelm
0cccb2fb59 coll/ml: reduce noise from coll/ml error messages
The error doesn't prevent the user from running so there is no reason
to display it unless the user requested it (through coll_ml_verbose).

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31242.
2014-03-26 22:50:06 +00:00
Nathan Hjelm
15a8c9d7b8 coll/ml: addendum to r31189. increment the bcol_index
cmr=v1.8:ticket=trac:4436

This commit was SVN r31193.

The following SVN revision numbers were found above:
  r31189 --> open-mpi/ompi@c7d830f4b9

The following Trac tickets were found above:
  Ticket 4436 --> https://svn.open-mpi.org/trac/ompi/ticket/4436
2014-03-21 22:03:56 +00:00
Nathan Hjelm
c7d830f4b9 coll/ml: improve the buffer size calculation and ensure the bcol_index in
a hierarchy actually matches a bcol that is in use.

There was a bug in one of the paths to calculate the ml buffer size. I fixed
the bug and squashed all the paths together to avoid further issues (the
result was correct in another path that calculated the same value).

Additionally, the i_hier was being used as the bcol_index. This is not
correct in a couple of cases so I added a variable to keep track of the
real bcol_index.

cmr=v1.8:reviewer=pasha

This commit was SVN r31189.
2014-03-21 21:54:28 +00:00
Nathan Hjelm
f1dd589092 coll/ml: there is no reason not to enable coll/ml when a process in not
bound.

This case is correctly handled by coll/ml so remove the check that diables
coll/ml in the not bound case.

cmr=v1.8:reviewer=manjugv

This commit was SVN r31188.
2014-03-21 21:54:21 +00:00
Nathan Hjelm
08bbdcbf61 coll/ml: fix leaks in coll/ml resources
This patch fixes two leaks:

 - Fix typo in fallback collective code that caused coll/ml to retain
   the ibcast module twice but only release it once. One of those ibcast
   saves was supposed to be bcast.

 - Do not check for module initialization in the module destructor. It
   is possible to destruct a module that is partially setup.

cmr=v1.8:reviewer=manjugv

This commit was SVN r31187.
2014-03-21 21:54:14 +00:00
Nathan Hjelm
e764d3bebc coll/ml: really remove the asserts in the barrier setup
cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r31136.
2014-03-18 22:04:50 +00:00
Nathan Hjelm
e030443d45 coll/ml: further improve the hierarchy discovery to handle the case where a
sbgp module fails to group any processes on any nodes.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31131.
2014-03-18 21:26:24 +00:00
Nathan Hjelm
8b2d723fd4 coll/ml: fix valgrind warning about reading uninitialed value
This isn't causing any errors that I know about but it does fix an
annoying valgrind warning. Simple fix, no review required.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r31130.
2014-03-18 21:26:17 +00:00
Nathan Hjelm
d9c8bf3785 coll/ml: move error messages to verbose output
There are situations where coll/ml does not initialize properly. These will
eventually need to be fixed but in the meantime it is better to not always
print an error message because the collective framework can still fall back
on another collective module. This commit reduces the verbose output.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31129.
2014-03-18 21:26:10 +00:00
Nathan Hjelm
97d7315dd2 coll/ml: do not assert if a barrier algorithm is not available
It is usually not a good idea to assert when something is not implemented
or something goes wrong. Replace asserts with debug output and return.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31128.
2014-03-18 21:26:04 +00:00
Jeff Squyres
5efd961149 Remove unnecessary \n's in ML_VERBOSE and ML_ERROR.
Also fixed spelling: IS_NOT_RECHABLE -> IS_NOT_REACHABLE.

Also mark a few places where opal_show_help() should have been used;
Manju will take care of these.

This commit was SVN r31104.
2014-03-18 12:24:32 +00:00
Nathan Hjelm
3f469d08e7 coll/ml: increase the number of allowed processes in a local reduce and
add checks to see if the bcol module can support allreduce.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31096.
2014-03-17 23:10:19 +00:00
Nathan Hjelm
f92579dce5 coll/ml: fix a case not correctly handled by r31071
In r31071 I modified the logic to not increment the hierarchy level if
no processes were selected by that sbgp. That fixed a problem seen on
systems where we don't support process binding. The problem is there
is a case where we actually did select processes yet the number of
selected processes is 0. We need to increment the hierarchy in this case
as well.

This should fix the segmentation fault found by recent MTT runs. Once
this is committed to 1.7.5 remove the .ompi_ignore's from coll/ml and
bcol/ptpcoll. Tested with ompi-tests/ibm.

cmr=v1.7.5:reviewer=rhc

This commit was SVN r31081.

The following SVN revision numbers were found above:
  r31071 --> open-mpi/ompi@1911d97044
2014-03-15 22:37:28 +00:00
Jeff Squyres
34d92315ae Remove extraneous "while(0)".
Oops.

cmr=v1.7.5:ticket=trac:4395

This commit was SVN r31075.

The following Trac tickets were found above:
  Ticket 4395 --> https://svn.open-mpi.org/trac/ompi/ticket/4395
2014-03-14 20:41:54 +00:00
Jeff Squyres
036db91f3d For the love of all that is holy, do not put 1MB arrays on the stack.
This was causing JVMs to run out of stack space, and all manner of
badness ensued.

Instead, use the heap -- that's what it's there for.

cmr=v1.7.5:reviewer=rhc:subject=make coll/ml use the heap for large debug array

This commit was SVN r31073.
2014-03-14 20:39:39 +00:00
Nathan Hjelm
1911d97044 coll/ml: fix assertion failure that occurs when level 0 of the hierarchy
fails to select any processes on any nodes.

Also modified basesmsocket to only print debugging info to the framework
output.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31071.
2014-03-14 19:39:00 +00:00
Nathan Hjelm
61f30d992a coll/ml: reduce has some issues when using non-contiguous datatypes. until
these issues are resolved disable coll/ml reduce.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31030.
2014-03-12 14:39:16 +00:00
Jeff Squyres
da87b506bd Remove warnings identified by clang 3.4
* Remove unused static functions
 * Remove unused static variables

cmr=v1.8:reviewer=hjelmn

This commit was SVN r31023.
2014-03-12 13:17:54 +00:00
Mike Dubman
a14dda491e OSHMEM: various fixes
- -check-shmem-params is OFF by default. It checks OSHMEM API params and will abort on bad input
- hcoll do not save fallback coll pointers for unsupported collectives.

fixed by Val, Roman, reviewed by Miked/Igor

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30995.
2014-03-11 17:27:33 +00:00
Nathan Hjelm
da2a68f669 coll/ml: fix bcast buffer size calculation
cmr=v1.7.5:reviewer=manjugv

This commit was SVN r30963.
2014-03-07 21:00:08 +00:00
Nathan Hjelm
0af741810c coll/ml: do not access group proc pointers directly. use ompi_comm_peer_lookup instead.
Resolves an issue seen with --enable-sparse-groups.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r30945.
2014-03-05 22:57:21 +00:00
Ralph Castain
29a7eda280 Remove executable property
This commit was SVN r30791.
2014-02-21 17:27:47 +00:00
Mike Dubman
608269ed72 fca: support relocation of fca packages to opal_prefix/../fca
reviewed by AlexM
cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30728.
2014-02-14 14:49:41 +00:00
Pavel Shamis
3a683419c5 Fixing broken dependency between ML/BCOLS
This is hot-fix patch for the issue reported by Ralph. 
In future we plan to restructure ml data structure layout.

Tested by Nathan.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30619.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-07 19:15:45 +00:00
Ralph Castain
74d3393a4f Revert r30600, r30602-30604 as the first one broke the tarball and the others couldn't fix it
This commit was SVN r30605.

The following SVN revision numbers were found above:
  r30600 --> open-mpi/ompi@7d2c4cb468
  r30602 --> open-mpi/ompi@9e751a0302
  r30604 --> open-mpi/ompi@3012c280cf

Revision number ranges (suitable for "git log"):
  r30602-30604 --> open-mpi/ompi@9e751a03^..3012c280
2014-02-07 04:38:06 +00:00
Jeff Squyres
7d2c4cb468 There's a few ml-related bugs outstanding, and Nathan is looking into
them, but it's going to take a little time (at least one day).  So
Nathan says it's ok to .ompi_ignore coll ml until he's able to fix it.

This commit was SVN r30600.
2014-02-06 23:51:03 +00:00
Nathan Hjelm
c2b061cc84 basesmuma: clean up code
Several changes are contained in this commit:

 - Clean up tabs and trailing whitespaces

 - Use consistent indentation in changed files

 - Remove unused code. None of the removed code will ever have been
   used in a trunk build.

 - Clean up the smcm code quite a bit

 - Do not fflush stderr and use opal_output instead of fprintf.

These changes have been tested on Cray XE-6 and PSM systems.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30533.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-03 17:01:46 +00:00
Nathan Hjelm
afae924e29 coll/ml: fix some warnings and the spelling of indices
This commit fixes one warning that should have caused coll/ml to segfault
on reduce. The fix should be correct but we will continue to investigate.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30477.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-01-29 18:44:21 +00:00
Ralph Castain
b32556e6dc Fixes trac:4143
After IM with Nathan, apply patch from ticket after verification by Paul Hargrove that it fixes the problem on non-x86 32-bit platforms

Verified by Paul, RM-approved

cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r30411.

The following Trac tickets were found above:
  Ticket 4143 --> https://svn.open-mpi.org/trac/ompi/ticket/4143
2014-01-24 17:56:52 +00:00
Mike Dubman
071838bb0a HCOLL: call hcoll_finalize and hcoll progress unregister in case of hcoll module query failures
fixed by Elena, reviewed by Val/Miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30390.
2014-01-23 07:29:23 +00:00
Nathan Hjelm
7ba8bd81fa coll/ml: remove debug fprintfs
cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30367.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-01-22 17:21:05 +00:00
Nathan Hjelm
82d996fb76 coll/ml: cleanup some merge related errors
cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30366.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-01-22 16:48:09 +00:00
Nathan Hjelm
1a021b8f2d coll/ml: add support for blocking and non-blocking allreduce, reduce, and
allgather.

The new collectives provide a signifigant performance increase over tuned for
small and medium messages. We are initially setting the priority lower than
tuned until this has had some time to soak in the trunk. Please set
coll_ml_priority to 90 for MTT runs.

Credit for this work goes to Manjunath Gorentla Venkata (ORNL), Pavel Shamis (ORNL),
and Nathan Hjelm (LANL).

Commit details (for reference):

Import ORNL's collectives for MPI_Allreduce, MPI_Reduce, and MPI_Allgather.

We need to take the basesmuma header into account when calculating the
ptpcoll small message thresholds. Add a define to bcol.h indicating the
maximum header size so we can take the header into account while not
making ptpcoll dependent on information from basesmuma.

This resolves an issue with allreduce where ptpcoll overwrites the
header of the next buffer in the basesmuma bank.

Fix reduce and make a sequential collective launcher in coll_ml_inlines.h

The root calculation for reduce was wrong for any root != 0. There are
four possibilities for the root:

 - The root is not the current process but is in the current hierarchy. In
   this case the root is the index of the global root as specified in the
   root vector.

 - The root is not the current process and is not in the next level of the
   hierarchy. In this case 0 must be the local root since this process will
   never communicate with the real root.

 - The root is not the current process but will be in next level of the
   hierarchy. In this case the current process must be the root.

 - I am the root. The root is my index.

Tested with IMB which rotates the root on every call to MPI_Reduce. Consider
IMB the reproducer for the issue this commit solves.

Make the bcast algorithm decision an enumerated variable

Resolve various asset failures when destructing coll ml requests.

Two issues:

 - Always reset the request to be invalid before returning it to the
   free list. This will avoid an asset in ompi_request_t's destructor.
   OMPI_REQUEST_FINI does this (and also releases the fortran handle
   index).

 - Never explicitly construct or destruct the superclass of an opal
   object. This screws up the class function tables and will cause
   either an assert failure or a segmentation fault when destructing
   coll ml requests.

Cleanup allgather.

I removed the duplicate non-blocking and blocking functions and modeled
the cleanup after what I found in allreduce. Also cleaned up the code
somewhat.

Don't bother copying from the send to the recieve buffer in
bcol_basesmuma_allreduce_intra_fanin_fanout if the pointers are the
same.

The eliminates a warning about memcpy and aliasing and avoids an
unnecessary call to memcpy.

Alwasy call CHECK_AND_RELEASE on memsync collectives.

There was a call to OBJ_RELEASE on the collective communicator but
because CHECK_AND_RECYLCE was never called there was not matching call
to OBJ_RELEASE. This caused coll ml to leak communicators.

Make allreduce use the sequential collective launcher in coll_ml_inlines.h

Just launch the next collective in the component progress.

I am a little unsure about this patch. There appears to be some sort
of race between collectives that causes buffer exhaustion in some cases
(IMB Allreduce is a reproducer). Changing progress to only launch the
next bcol seems to resolve the issue but might not be the best fix.

Note that I see little-no performance penalty for this change.

Fix allreduce when there are extra sources.

There was an issue with the buffer offset calculation when there are
extra sources. In the case of extra sources == 1 the offset was set
to buffer_size (just past the header of the next buffer). I adjusted
the buffer size to take into accoun the maximum header size (see the
earlier commit that added this) and simplified the offset calculation.

Make reduce/allreduce non-blocking. This is required for MPI_Comm_idup
to work correctly.

This has been tested with various layouts using the ibm testsuite and
imb and appears to have the same performance as the old blocking version.

Fix allgather for non-contiguous layouts and simplify parsing the
topology.

Some things in this patch:

 - There were several comments to the effect that level 0 of the
   hierarchy MUST contain all of the ranks. At least one function
   made this assumption but it was not true. I changed the sbgp
   components and the coll ml initization code to enforce this
   requirement.

 - Ensure that hierarchy level 0 has the ranks in the correct
   scatter gather order. This removes the need for a separate
   sort list and fixes the offset calculation for allgather.

 - There were several passes over the hierarchy to determine
   properties of the hierarchy. I eliminated these extra passes
   and the memory allocation associated with them and calculate the
   tree properties on the fly. The same DFS recursion also handles
   the re-order of level 0.

All these changes have been verified with MPI_Allreduce, MPI_Reduce, and
MPI_Allgather. All functions now pass all IBM/Open MPI, and IMB tests.

coll/ml: correct pointer usage for MPI_BOTTOM

Since contiguous datatypes are copied via memcpy (bypassing the convertor) we
need to adjust for the lb of the datatype. This corrects problems found testing
code that uses MPI_BOTTOM (NULL) as the send pointer.

Add fallback collectives for allreduce and reduce.

cmr=v1.7.5:reviewer=pasha

This commit was SVN r30363.
2014-01-22 15:39:19 +00:00
Mike Dubman
b8550a55a7 HCOLL: many fixes
Adds coll_hcoll_np mca parameter similar to that of fca component (defaults to 32). Those who use hcoll be aware that from now on the communicators less than 32 procs will run w/o hcoll by default. - Resolves fallback issue in case libhcoll runs out of allowed contexts. The solution is moving hcoll_context_create from comm_enable to comm_query. Shortly, comm_enable should never return OMPI_ERROR in the coll component with highest priority (hcoll). Otherwise the ompi coll_base_select will unselect the coll funtion pointers and module references leaving the communicator w/o coll pointer. This will cause the fail. Same behavior can be reproduced even with tuned if one would hardcore some "return OMPI_ERROR" into it's module_enable funtion. - Additionally, removed all the dead code under #if 0; removed unused variables (path for library, active_modules list) and classes (module list wrapper)

Fixed by Val, Reviewed by Devendar/Josh/Miked

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30341.
2014-01-21 12:19:47 +00:00
Ralph Castain
9566650458 Per Marco, don't define a "min" function if one is already defined to avoid conflict with cygwin reserved word
This commit was SVN r30241.
2014-01-10 18:03:25 +00:00
Ralph Castain
c7a94a57d7 Per Marco, rename ERROR tags to exit_ERROR to avoid cygwin reserved name issues.
Refs trac:4085

This commit was SVN r30239.

The following Trac tickets were found above:
  Ticket 4085 --> https://svn.open-mpi.org/trac/ompi/ticket/4085
2014-01-10 18:00:49 +00:00
Mike Dubman
110c99af4f sharing negative tag space between libNBC and HCOLL
fixed by devendar, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30224.
2014-01-10 12:51:34 +00:00
Nathan Hjelm
bb01fc2938 Add missing MCA variable enumerator sentinel.
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30178.
2014-01-09 15:28:42 +00:00
Mike Dubman
0fae2caef3 Create a comm keyval for hcoll component with delete callback function.
Set comm attribute with keyval.
Wait for pending hcoll module tasks in comm delete callback where PML
still valid on the communicator. safely destroy hcoll context during
hcoll module destructor.

Author: Devendar Bureddy 
reviewed by miked

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30175.
2014-01-09 11:27:24 +00:00
Mike Dubman
43d6a30693 Fix problems of:
- HCOLL close without init
- Call hcoll progress after comm finalize
- mpirun default for coll_hcoll_enable is 1

fixed by Igor, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30156.
2014-01-08 10:55:25 +00:00
Jeff Squyres
13b29cff2c This commit compliements/completes r30140. r30140 made all the
configury/Makefile.am changes; this commit renames the internal
installdirs.h framework struct field names to match the configry macro
names:

 * pkgdatdir ->	ompidatadir
 * pkglibdir -> ompilibdir
 * pkgincludedir -> ompiincludedir

This commit was SVN r30145.

The following SVN revision numbers were found above:
  r30140 --> open-mpi/ompi@8b778903d8
2014-01-07 23:36:33 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Brian Barrett
e811a8a9cb Make the Portals 4 collective component disable itself when there's not a
Portals 4 point-to-point (MTL or BTL) component in use

This commit was SVN r30109.
2014-01-02 22:35:37 +00:00
Mike Dubman
80f4e02e0a Several changes:
- Modifications to coll/hcoll component related to the changes in the libhcoll API. 
  Now, hcoll_destroy_context accepts one more parameter that indicates if the context was
  really destroyed as a result of the call. 
  This new "non-blocking" context destruction fixes hang discovered in IMB with mcast enabled. 
- Clean up all the left contexts (if any) on the comm_world destruction. 

fixed by Val, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30055.
2013-12-23 06:57:12 +00:00
Jeff Squyres
71ec6c1617 Remove unnecessary "mpi.h"; move opal headers to the top.
This commit was SVN r30053.
2013-12-22 20:38:43 +00:00
Jeff Squyres
0ab48ad0d2 Fix some annoying flex warnings that have been there for years.
Many thanks to Tom Fogal for the initial patch.

cmr=v1.7.4:reviewer=rhc:subject=Fix annoying flex warnings

This commit was SVN r29904.
2013-12-14 00:36:12 +00:00
Mike Dubman
9a65e0d8c6 cosmetic fixed fpr hcol autotools
Refs: #3694

This commit was SVN r29841.
2013-12-08 09:45:13 +00:00
Mike Dubman
2e124454b4 cosmitic fix to remove redundant -lfca
use CPP extra flags var which propagated to coll/fca and scoll/fca
Refs: #3694

This commit was SVN r29832.
2013-12-07 15:00:54 +00:00
Devendar Bureddy
4554770ee4 hcol fixes
cmr=v1.7.4:reviewer=jladd

This commit was SVN r29787.
2013-12-03 20:21:40 +00:00
George Bosilca
cb24277737 Restrict the usage of MPI_Type_extent only to receiving processes
(aka the root). This commit is based on a patch provided by Pierre 
Jolivet.
Fix all the output to match the failing MPI call.

This commit was SVN r29761.
2013-11-27 12:09:31 +00:00
George Bosilca
68268377af Fix an error message for the igather and the usage of the extent on
non non-root processes for the iscatter. Thanks to Pierre
Jolivet for the bug report and the patch.

This commit was SVN r29736.
2013-11-23 00:59:22 +00:00
Nathan Hjelm
24a7e7aa34 Add support for the udreg registration cache and dynamics on XE/XK/XC.
To support the new mpool two changes were made to the mpool infrastructure:

 1) Added an mpool flag to indicate that an mpool does not need the memory
    hooks to use the leave pinned protocols. This flag is checked in the
    mpool lookup.

 2) Add a mpool context to the base registration. This new member is used
    by the udreg mpool to store the udreg context associated with the
    particular registration. The new member will not break the ABI
    compatibility as the new member is only currently used by the udreg
    mpool.

Dynamics support for Cray systems makes use of the global rank provided by
orte to give the ugni library a unique rank for each process. Dynamics
support is not available under direct-launch (srun.)

cmr=v1.7.4

This commit was SVN r29719.
2013-11-18 04:58:37 +00:00
Brian Barrett
cf8de1ef0f Minor indent cleanup in init_query()
Only use Portals on communicators with more than one rank
Fix computation of number of children when using the hypercube tree

This commit was SVN r29616.
2013-11-06 15:21:09 +00:00
Nathan Hjelm
c71125acfd Using MPI_* functions in iallreduce can cause comm-spawned processes to
crash. Update libnbc's iallreduce function to use ompi_* functions
instead.

cmr=v1.7.4:reviewer=brbarret

This commit was SVN r29582.
2013-11-01 16:45:54 +00:00
Nathan Hjelm
a31e617d17 Remove outdated comments in coll_basic_reduce_scatter.c.
Refs trac:1559

This commit was SVN r29566.

The following Trac tickets were found above:
  Ticket 1559 --> https://svn.open-mpi.org/trac/ompi/ticket/1559
2013-10-30 16:20:20 +00:00
Nathan Hjelm
167d5613db Do not do arithmetic with void * in basic neighborhood alltoall[vw].
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29558.
2013-10-29 20:02:13 +00:00
Nathan Hjelm
b202bb0d63 Fix the recursive halfing algorithms for reduce scatter in both basic
and tuned to correctly handle 0 recvcounts.

Tested with the reproducer from #1550.

Refs trac:1559

This commit was SVN r29542.

The following Trac tickets were found above:
  Ticket 1559 --> https://svn.open-mpi.org/trac/ompi/ticket/1559
2013-10-28 19:06:38 +00:00
Mike Dubman
d27cffedb9 expand tabs to 4 spaces
cd ompi/mca/coll/fca
for i in *.[ch]; do expand -t 4 $i > koko && mv koko $i; done
Refs: #3799

This commit was SVN r29472.
2013-10-22 17:05:55 +00:00
Jeff Squyres
6714890244 paffinity.h is gone and won't be coming back.
This commit was SVN r29467.
2013-10-22 15:59:00 +00:00
Mike Dubman
5a7dff2d15 fix icc warning
fixed by Dinar, reviewed by miked
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29428.
2013-10-12 18:04:28 +00:00
Nathan Hjelm
6232ef3bfb At coll_select time we can not check whether the communicator has a
virtual topology. Remove code checking for a virtual topology until
this flag is set before coll_select.

This commit was SVN r29344.
2013-10-03 03:37:46 +00:00
Nathan Hjelm
7bedf62dd8 Add basic algorithms for the remaining non-blocking collectives.
The algorithms are intended for MPI-3.0 compliance and are not
optimized. We should aim to add better algorithms in the future through
cheetah.

MPI_Iallreduce and MPI_Igatherv on intercommunicators are required for
MPI_Comm_idup support.

cmr=v1.7.4:reviewer=brbarret:ticket=trac:2715

This commit was SVN r29333.

The following Trac tickets were found above:
  Ticket 2715 --> https://svn.open-mpi.org/trac/ompi/ticket/2715
2013-10-02 14:26:23 +00:00
Mike Dubman
19748e6957 fix race condition which can happen on finalize
1. Change in rte api implementation: now comm_world used to do p2p. 
This allows to not worry about other comms being destroyed.

2. added a notification mechanism with a help of which runtime can say libhcoll that RTE api can not be used any longer. 
pass a pointer to a flag, and its size to libhcoll.
The flag changes when the RTE is no longer available. 
Currently this flag is just ompi_mpi_finalized global bool value.

cmr=v1.7.3:reviewer=jladd

This commit was SVN r29331.
2013-10-02 13:38:47 +00:00
Nathan Hjelm
4f12406436 Don't check for neighborhood collective routines on non-virtual topology communicators
This commit was SVN r29319.
2013-10-01 19:59:18 +00:00
Mike Dubman
9bf7578ff2 fix memory corruption
cmr:v1.7.3:reviewer=ompi-rm1.7

This commit was SVN r29293.
2013-09-30 06:18:12 +00:00
Nathan Hjelm
c5596548b2 MPI-3: Add support for neighborhood collectives
Blocking versions are simple linear algorithms implemented in coll/basic. Non-
blocking versions are from libnbc 1.1.1. All algorithms have been tested with
simple test cases.

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29265.
2013-09-26 21:55:08 +00:00
Mike Dubman
7c6ff00da5 Add caching of FCA communicators
developed by Dinar, reviewed by miked/yossi.
cmr:v1.7.3:reviewer=jsquyres:subject=add caching of FCA communicators.

This commit was SVN r29256.
2013-09-26 17:48:07 +00:00
Joshua Ladd
82e092db1b Adding interface changes in hcoll component to support non-blocking collectives in libhcoll. This was added by Elena Elkina and reviewed by Josh Ladd.
cmr:v1.7.3:reviewer=jladd:subject=Add support for non-blocking collectives in hcoll

This commit was SVN r29244.
2013-09-25 16:14:59 +00:00
Mike Dubman
2a5c342587 Modifications that are necessary in order to meet latest libhcoll API.
cmr:v1.7.3:reviewer=jladd

This commit was SVN r29202.
2013-09-18 12:22:02 +00:00
George Bosilca
5f686a90d0 Fix several issues regarding MPI_IN_PLACE and different flavors
of MPI_Alltoall.
 - add support for MPI_IN_PLACE in the self collective component.
 - fix the extent usage in the tuned collective component.
 - correctly use the peer counts instead of local - add support for MPI_IN_PLACE in the self collective component.
 - fix the extent usage in the tuned collective component.
 - correctly use the peer counts instead of local.
Thanks to Fujitsu for the patch.

This commit was SVN r29187.
2013-09-17 11:35:18 +00:00
Brian Barrett
16a1166884 Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a
configure-time dynamic allocation of flags.  The net result for platforms
which only support BTL-based communication is a reduction of 8*nprocs bytes
per process.  Platforms which support both MTLs and BTLs will not see
a space reduction, but will now be able to safely run both the MTL and BTL
side-by-side, which will prove useful.

This commit was SVN r29100.
2013-08-30 16:54:55 +00:00
Nathan Hjelm
f5495ace48 coll/ml: update the coll_ml_enable_fragmentation variable to support the
option to autodetect whether fragmentation should be enabled

cmr=v1.7.3:ticket=trac:3717

This commit was SVN r29065.

The following Trac tickets were found above:
  Ticket 3717 --> https://svn.open-mpi.org/trac/ompi/ticket/3717
2013-08-27 16:36:54 +00:00
Ralph Castain
c74c54e18d Cleanup uninitialized warnings
This commit was SVN r29033.
2013-08-16 21:23:09 +00:00
Nathan Hjelm
6c75699068 coll/ml: fix typo in assert that could cause an abort in debug builds.
cmr=v1.7.3:reviewer=manjugv

This commit was SVN r29024.
2013-08-12 14:31:44 +00:00
Nathan Hjelm
47320713bb coll/ml: do not register variables in open and fix a bug in the coll/ml parser
cmr=v1.7.3:reviewer=pasha

This commit was SVN r29010.
2013-08-09 17:55:30 +00:00
Brian Barrett
2cc947513b * Fix some compile errors
* Need to subtract 1 off the size so that we stay in the bit length requirements

This commit was SVN r28997.
2013-08-05 18:49:48 +00:00
Nathan Hjelm
e4f105ffb3 revert change that shouldn't have been part of r28952
This commit was SVN r28953.

The following SVN revision numbers were found above:
  r28952 --> open-mpi/ompi@cb90a4a7fc
2013-07-25 20:23:55 +00:00
Nathan Hjelm
cb90a4a7fc Add simple algorithms to support MPI_IN_PLACE for MPI_Alltoall, MPI_Alltoallv, and MPI_Alltoallw.
Working on faster algorithms for tuned that will come at a later time.

cmr=v1.7.3:ticket=trac:2965

This commit was SVN r28952.

The following Trac tickets were found above:
  Ticket 2965 --> https://svn.open-mpi.org/trac/ompi/ticket/2965
2013-07-25 19:19:41 +00:00
Jeff Squyres
baa3182794 Per RFC
(http://www.open-mpi.org/community/lists/devel/2013/07/12534.php),
remove a bunch of dead code.

This commit was SVN r28756.
2013-07-11 17:34:28 +00:00
Joshua Ladd
16beaa3878 This fixes the nasty configure.m4 hack that was added long ago and not removed. My fault for not catching earlier. I've also removed the '.ompi_ignore' in coll/hcoll. Throwing this to Nathan for review. Upon successful review, this should be added to cmr:v1.7:reviewer=hjelmn
This commit was SVN r28753.
2013-07-11 09:55:46 +00:00
Jeff Squyres
28dac8010b The hcoll component configure.m4 commits multiple sins, and breaks
many builds.  I am temporarily .ompi_ignore'ing this component until
it can be fixed by its owner.

 * It calls AC_MSG_ERROR, which configure.m4 scripts are ''never''
   supposed to do.  If you don't want to build, then call $2.
 * All static and --disable-dlopen builds are broken; they fall afoul
   of whatever test configure.m4 is doing and therefore error out of
   configure entirely (vs. simply disabling the hcoll component).
 * There appear to be multiple shell scripting errors in the
   configure.m4.  Here's the output of "./configure --disable-dlopen":
{{{
--- MCA component coll:hcoll (m4 configuration macro)
checking for MCA component coll:hcoll compile mode... static
checking --with-hcoll value... simple ok (unspecified)
./configure: line 421: test: basic: integer expression expected
configure: error: Can not use coll/hcoll and coll/ml (static build)
   simultaneously. You have two options:
                1. Use static build & disable ml with:
   --enable-mpi-no-build=coll-ml
                2. Use dso build for ML & disable ml at runtime: -mca
   coll self
./configure: line 310: return: basic: numeric argument required
./configure: line 320: exit: basic: numeric argument required
}}}

Finally, all of these configure.m4 errors aside, I don't understand
why there is a ''compile-time'' exclusion between the hcoll and ml
components.  Why isn't this a ''run-time'' decision?  Having what
seems to be an unnecessary compile-time exclusion goes against the
general Open MPI philosophy.

Note: Open MPI 1.7 is also broken in all the same ways.  I suggest
that the RM's .ompi_ignore hcoll over there, too.

Mellanox: please fix.

This commit was SVN r28748.
2013-07-10 16:03:15 +00:00
Aurelien Bouteiller
e1066143a4 rename ompi_free_list operations to _mt, as per discussions at last face to face meeting
This commit was SVN r28734.
2013-07-08 22:07:52 +00:00
Brian Barrett
ecbbf888d3 * Update Portals 4 MTL's multi-md code to be a bit cleaner (no if statements
in the path) and not create MDs due to boundary crossing
* Add the same logic to the Coll component

This commit was SVN r28733.
2013-07-08 21:27:37 +00:00
Brian Barrett
84aeb6a6a5 Update request alloc to use free list get instead of free list wait.
This commit was SVN r28729.
2013-07-05 20:24:43 +00:00
George Bosilca
c9e5ab9ed1 Our macros for the OMPI-level free list had one extra argument, a possible return
value to signal that the operation of retrieving the element from the free list
failed. However in this case the returned pointer was set to NULL as well, so the
error code was redundant. Moreover, this was a continuous source of warnings when
the picky mode is on.

The attached parch remove the rc argument from the OMPI_FREE_LIST_GET and
OMPI_FREE_LIST_WAIT macros, and change to check if the item is NULL instead of
using the return code.

This commit was SVN r28722.
2013-07-04 08:34:37 +00:00
Brian Barrett
d3b49535b5 Only allow communication from the same user, since we don't have job-level
protection.

This commit was SVN r28715.
2013-07-03 17:29:02 +00:00
Brian Barrett
81efd0e3cf Properly shut down Portals collective component
This commit was SVN r28707.
2013-07-02 22:07:27 +00:00
Brian Barrett
133dafd3dc First take at Barrier and Ibarrier, both of which seem to work.
This commit was SVN r28706.
2013-07-02 21:42:10 +00:00
Brian Barrett
e4698f5cd4 Shell of the Portals 4 collectives componetn
This commit was SVN r28703.
2013-07-02 15:23:55 +00:00
Joshua Ladd
5d2d5e958c Deleting garbage I accidentally committed. Thanks, Nathan\!
This commit was SVN r28698.
2013-07-01 22:50:54 +00:00
Joshua Ladd
d7a50343bf Per the details and schedule outlined in the attached RFC, Mellanox Technologies would like to CMR the new 'coll/hcoll' component. This component enables Mellanox Technologies' latest HPC middleware offering - 'Hcoll'. 'Hcoll' is a high-performance, standalone collectives library with support for truly asynchronous, non-blocking, hierarchical collectives via hardware offload on supporting Mellanox HCAs (ConnectX-3 and above.) To build the component, libhcoll must first be installed on your system, then you must configure OMPI with the configure flag: '--with-hcoll=/path/to/libhcoll'. Subsequent to installing, you may select the 'coll/hcoll' component at runtime as you would any other coll component, e.g. '-mca coll hcoll,tuned,libnbc'. This has been reviewed by Josh Ladd and should be added to cmr:v1.7:reviewer=jladd
This commit was SVN r28694.
2013-07-01 22:39:43 +00:00
Jeff Squyres
c8258c06e2 In coll_sm, we alloc a huge chunk of shared memory, divvy it into lots
of individual regions (each region is a multiple of page size in
length), and each process claims its own regions by binding it to its
local memory.  Each process would end up membining something like 16
individual regions in the overall shmem segment.

There were two errors in this code relating to the memory affinity
pinning.  Some combination of these two errors would lead to kernel
panics (!) on my RHEL 6.2 x86_64 machines when used with mmap'ed
shared memory (not posix or sysv shared memory, curiously enough):

1. The shared memory segment is initially divided into two regions:
control and data.  The control starts at the beginning of the shmem
segment, the data starts after that.  The data portion, unfortunately,
was ''not'' aligned to a page.  So all the multiple-of-page-size
regions that we divvy up were also not alined on page boundaries.  And
therefore all the regions we tried to membind were not on page
boundaries.

The solution was to ensure that the data portion started on a page
boundary.  Then all of the individual regions were on page boundaries,
too.

That being said, in my tests, Linux mbind() fails gracefully when the
address is not on a page boundary.  So I'm not sure how this worked at
all / led to a kernel panic...

2. There was some bad pointer math that resulted in membinding regions
larger than they should have been, resulting in region overlaps.
There were definitely overlaps between regions in the same process;
it's likely that there were overlaps between regions of multiple
processes, too -- I'm not sure (and don't care to figure out :-) ).

The solution was to fix the pointer math so that each region membinds
exactly only itself and no neighboring/overlapping regions.

cmr:v1.7.2:reviewer=samuel

This commit was SVN r28442.
2013-05-03 12:49:35 +00:00
Nathan Hjelm
9d4a26f47d Update OMPI frameworks to use the MCA framework system.
Notes:
  - This commit also eliminates the need for an available components list in use
    in several frameworks. None of the code in question was making use of the
    priority field of the priority component list item so these extra lists were
    removed.
  - Cleaned up selection code in several frameworks to sort lists using opal_list_sort.
  - Cleans up the ompi/orte-info functions. Expose the functions that construct the
    list of params so they can be used elsewhere.

patches for mtl/portals4 from brian

missed a few output variables in openib

This commit was SVN r28241.
2013-03-27 21:17:31 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file can not be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few component actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
Ralph Castain
a4b6fb241f Remove all remaining vestiges of the Windows integration
This commit was SVN r28137.
2013-02-28 17:31:47 +00:00