1
1
Граф коммитов

85 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
0f983d5a4f add a disable function for coll module 2014-10-14 14:46:36 +09:00
Howard Pritchard
0f74467264 switch to ompi_mpi_thread_provided for ts check
Use ompi_mpi_thread_provided rather than opal_using_threads macro
to check whether MPI_THREAD_MULTIPLE is being used.

This commit was SVN r32815.
2014-09-29 22:20:35 +00:00
Howard Pritchard
7069f2361a disqualify coll ml for MPI_THREAD_MULTIPLE
This commit was SVN r32814.
2014-09-29 21:02:15 +00:00
Gilles Gouaillardet
edfbeba7bf coll/ml: better error handling
when CHECK_AND_RECYCLE detects an error, a message is displayed
if the error occurs on an intrinsic communicator, then abort
the program (instead of trying to free the communicator)

cmr=v1.8.3:reviewer=hjelmn

This commit was SVN r32659.
2014-09-01 10:00:49 +00:00
Ralph Castain
b554cd7d86 Turn off the coll/ml component if --without-hwloc was given
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32621.
2014-08-27 20:25:39 +00:00
Ralph Castain
5f244e8b19 Use opal_getpagesize to get the proper page size
Refs trac:4826

This commit was SVN r32422.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-04 20:23:00 +00:00
Gilles Gouaillardet
5b1ae87c76 coll/ml: fix ML_ERROR/printf parameters
cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32409.
2014-08-04 04:05:59 +00:00
Gilles Gouaillardet
cd8fa75f87 coll/ml: align on page size as returned by sysconf
Thanks to Paul Hargrove for pointing into the right direction

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32393.
2014-08-01 08:10:12 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose.  UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00
Nathan Hjelm
56ad231b7c coll/ml: temporarily disable binding check
This commit was SVN r32178.
2014-07-09 14:39:49 +00:00
Nathan Hjelm
309a6cf951 coll/ml: set n_resources to 0 when destructing an lmngr
Also keep track of the allocation base so we free the correct pointer
when cleaning up.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r32151.
2014-07-07 15:11:26 +00:00
George Bosilca
843ef1fcb0 ompi_mpi_abort had one extra argument that was never used. Clean it up.
This commit was SVN r32124.
2014-07-03 00:34:44 +00:00
Nathan Hjelm
c32d84154a coll/ml: fix leaks and close all the framework opened
It is essential to call mca_base_framework_close for every framework
that is opened. coll/ml was not doing this so neither bcol nor sbgp
were getting cleaned up. This commit fixes this omission.

Also fixed a leak caused by calling OBJ_DESTRUCT for something created
with OBJ_NEW. With these changes coll/ml appears to be valgrind clean.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31743.
2014-05-13 21:22:12 +00:00
Ralph Castain
11faab1091 The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees.
This commit was SVN r31679.
2014-05-08 02:01:35 +00:00
Nathan Hjelm
e125bbe347 coll/ml: clean out apparently stale code
The file coll_ml_ibarrier.c wasn't included in coll/ml's Makefile.am
and the setup code from coll_ml_hier_algorithms_ibarrier.c was not
being called. It looks like this code is stale and has long since been
replaced by the code in coll_ml_barrier.c

Once all these little CMRs are approved I may make it into one roll-up
CMR to make it easier on the RM.

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31418.
2014-04-16 22:43:43 +00:00
Nathan Hjelm
484a3f6147 coll/ml: fix issues identified by the clang static analyser and fix
a segmentation fault in the reduce cleanup

Some of the changes address false warnings produced by scan-build. I
added asserts and changed some malloc calls to calloc to silence these
warnings.

The was one issue in cleanup for reduce since the component_functions
member is changed by the allreduce call. There may be other issues
with how this code works but releasing the allocated
component_functions after setting up the static functions addresses
the primary issue (SIGSEGV).

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31417.
2014-04-16 22:43:35 +00:00
Nathan Hjelm
71bdb8c439 coll/ml: fix some warnings identified by clang
cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31285.
2014-03-28 22:31:41 +00:00
Nathan Hjelm
459431622b Revert "coll/ml: there is no reason not to enable coll/ml when a process in not"
Discussed this with Manju and we decided to back this one out until a later time.

This reverts commit r31188 and closes trac:4435

This commit was SVN r31282.

The following SVN revision numbers were found above:
  r31188 --> open-mpi/ompi@f1dd589092

The following Trac tickets were found above:
  Ticket 4435 --> https://svn.open-mpi.org/trac/ompi/ticket/4435
2014-03-28 21:16:34 +00:00
Manjunath Gorentla Venkata
28609d3ac2 Clean wanring in sbgp and coll ml
This commit was SVN r31280.
2014-03-28 19:53:36 +00:00
Manjunath Gorentla Venkata
8c849ee991 coll/ml : Replace longer error message with opal_show_help; thanks Jeff for identifying those
This commit was SVN r31279.
2014-03-28 19:25:54 +00:00
Nathan Hjelm
a9fb4976d5 coll/ml: more fixes
There were a couple of issues with the memory leak fixes and several more verbose
issues. This fixes those issues.

cmr=v1.8.1:ticket=trac:4473

This commit was SVN r31273.

The following Trac tickets were found above:
  Ticket 4473 --> https://svn.open-mpi.org/trac/ompi/ticket/4473
2014-03-28 18:31:28 +00:00
Nathan Hjelm
bd3b550c6d coll/ml: fix leaks
Thanks to ggouaillardet for finding and fixing these issues.

Closes trac:4460

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31264.

The following Trac tickets were found above:
  Ticket 4460 --> https://svn.open-mpi.org/trac/ompi/ticket/4460
2014-03-27 23:25:31 +00:00
Nathan Hjelm
0cccb2fb59 coll/ml: reduce noise from coll/ml error messages
The error doesn't prevent the user from running so there is no reason
to display it unless the user requested it (through coll_ml_verbose).

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31242.
2014-03-26 22:50:06 +00:00
Nathan Hjelm
15a8c9d7b8 coll/ml: addendum to r31189. increment the bcol_index
cmr=v1.8:ticket=trac:4436

This commit was SVN r31193.

The following SVN revision numbers were found above:
  r31189 --> open-mpi/ompi@c7d830f4b9

The following Trac tickets were found above:
  Ticket 4436 --> https://svn.open-mpi.org/trac/ompi/ticket/4436
2014-03-21 22:03:56 +00:00
Nathan Hjelm
c7d830f4b9 coll/ml: improve the buffer size calculation and ensure the bcol_index in
a hierarchy actually matches a bcol that is in use.

There was a bug in one of the paths to calculate the ml buffer size. I fixed
the bug and squashed all the paths together to avoid further issues (the
result was correct in another path that calculated the same value).

Additionally, the i_hier was being used as the bcol_index. This is not
correct in a couple of cases so I added a variable to keep track of the
real bcol_index.

cmr=v1.8:reviewer=pasha

This commit was SVN r31189.
2014-03-21 21:54:28 +00:00
Nathan Hjelm
f1dd589092 coll/ml: there is no reason not to enable coll/ml when a process in not
bound.

This case is correctly handled by coll/ml so remove the check that diables
coll/ml in the not bound case.

cmr=v1.8:reviewer=manjugv

This commit was SVN r31188.
2014-03-21 21:54:21 +00:00
Nathan Hjelm
08bbdcbf61 coll/ml: fix leaks in coll/ml resources
This patch fixes two leaks:

 - Fix typo in fallback collective code that caused coll/ml to retain
   the ibcast module twice but only release it once. One of those ibcast
   saves was supposed to be bcast.

 - Do not check for module initialization in the module destructor. It
   is possible to destruct a module that is partially setup.

cmr=v1.8:reviewer=manjugv

This commit was SVN r31187.
2014-03-21 21:54:14 +00:00
Nathan Hjelm
e764d3bebc coll/ml: really remove the asserts in the barrier setup
cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r31136.
2014-03-18 22:04:50 +00:00
Nathan Hjelm
e030443d45 coll/ml: further improve the hierarchy discovery to handle the case where a
sbgp module fails to group any processes on any nodes.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31131.
2014-03-18 21:26:24 +00:00
Nathan Hjelm
8b2d723fd4 coll/ml: fix valgrind warning about reading uninitialed value
This isn't causing any errors that I know about but it does fix an
annoying valgrind warning. Simple fix, no review required.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r31130.
2014-03-18 21:26:17 +00:00
Nathan Hjelm
d9c8bf3785 coll/ml: move error messages to verbose output
There are situations where coll/ml does not initialize properly. These will
eventually need to be fixed but in the meantime it is better to not always
print an error message because the collective framework can still fall back
on another collective module. This commit reduces the verbose output.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31129.
2014-03-18 21:26:10 +00:00
Nathan Hjelm
97d7315dd2 coll/ml: do not assert if a barrier algorithm is not available
It is usually not a good idea to assert when something is not implemented
or something goes wrong. Replace asserts with debug output and return.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31128.
2014-03-18 21:26:04 +00:00
Jeff Squyres
5efd961149 Remove unnecessary \n's in ML_VERBOSE and ML_ERROR.
Also fixed spelling: IS_NOT_RECHABLE -> IS_NOT_REACHABLE.

Also mark a few places where opal_show_help() should have been used;
Manju will take care of these.

This commit was SVN r31104.
2014-03-18 12:24:32 +00:00
Nathan Hjelm
3f469d08e7 coll/ml: increase the number of allowed processes in a local reduce and
add checks to see if the bcol module can support allreduce.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31096.
2014-03-17 23:10:19 +00:00
Nathan Hjelm
f92579dce5 coll/ml: fix a case not correctly handled by r31071
In r31071 I modified the logic to not increment the hierarchy level if
no processes were selected by that sbgp. That fixed a problem seen on
systems where we don't support process binding. The problem is there
is a case where we actually did select processes yet the number of
selected processes is 0. We need to increment the hierarchy in this case
as well.

This should fix the segmentation fault found by recent MTT runs. Once
this is committed to 1.7.5 remove the .ompi_ignore's from coll/ml and
bcol/ptpcoll. Tested with ompi-tests/ibm.

cmr=v1.7.5:reviewer=rhc

This commit was SVN r31081.

The following SVN revision numbers were found above:
  r31071 --> open-mpi/ompi@1911d97044
2014-03-15 22:37:28 +00:00
Jeff Squyres
34d92315ae Remove extraneous "while(0)".
Oops.

cmr=v1.7.5:ticket=trac:4395

This commit was SVN r31075.

The following Trac tickets were found above:
  Ticket 4395 --> https://svn.open-mpi.org/trac/ompi/ticket/4395
2014-03-14 20:41:54 +00:00
Jeff Squyres
036db91f3d For the love of all that is holy, do not put 1MB arrays on the stack.
This was causing JVMs to run out of stack space, and all manner of
badness ensued.

Instead, use the heap -- that's what it's there for.

cmr=v1.7.5:reviewer=rhc:subject=make coll/ml use the heap for large debug array

This commit was SVN r31073.
2014-03-14 20:39:39 +00:00
Nathan Hjelm
1911d97044 coll/ml: fix assertion failure that occurs when level 0 of the hierarchy
fails to select any processes on any nodes.

Also modified basesmsocket to only print debugging info to the framework
output.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31071.
2014-03-14 19:39:00 +00:00
Nathan Hjelm
61f30d992a coll/ml: reduce has some issues when using non-contiguous datatypes. until
these issues are resolved disable coll/ml reduce.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31030.
2014-03-12 14:39:16 +00:00
Jeff Squyres
da87b506bd Remove warnings identified by clang 3.4
* Remove unused static functions
 * Remove unused static variables

cmr=v1.8:reviewer=hjelmn

This commit was SVN r31023.
2014-03-12 13:17:54 +00:00
Nathan Hjelm
da2a68f669 coll/ml: fix bcast buffer size calculation
cmr=v1.7.5:reviewer=manjugv

This commit was SVN r30963.
2014-03-07 21:00:08 +00:00
Nathan Hjelm
0af741810c coll/ml: do not access group proc pointers directly. use ompi_comm_peer_lookup instead.
Resolves an issue seen with --enable-sparse-groups.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r30945.
2014-03-05 22:57:21 +00:00
Ralph Castain
29a7eda280 Remove executable property
This commit was SVN r30791.
2014-02-21 17:27:47 +00:00
Pavel Shamis
3a683419c5 Fixing broken dependency between ML/BCOLS
This is hot-fix patch for the issue reported by Ralph. 
In future we plan to restructure ml data structure layout.

Tested by Nathan.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30619.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-07 19:15:45 +00:00
Ralph Castain
74d3393a4f Revert r30600, r30602-30604 as the first one broke the tarball and the others couldn't fix it
This commit was SVN r30605.

The following SVN revision numbers were found above:
  r30600 --> open-mpi/ompi@7d2c4cb468
  r30602 --> open-mpi/ompi@9e751a0302
  r30604 --> open-mpi/ompi@3012c280cf

Revision number ranges (suitable for "git log"):
  r30602-30604 --> open-mpi/ompi@9e751a03^..3012c280
2014-02-07 04:38:06 +00:00
Jeff Squyres
7d2c4cb468 There's a few ml-related bugs outstanding, and Nathan is looking into
them, but it's going to take a little time (at least one day).  So
Nathan says it's ok to .ompi_ignore coll ml until he's able to fix it.

This commit was SVN r30600.
2014-02-06 23:51:03 +00:00
Nathan Hjelm
c2b061cc84 basesmuma: clean up code
Several changes are contained in this commit:

 - Clean up tabs and trailing whitespaces

 - Use consistent indentation in changed files

 - Remove unused code. None of the removed code will ever have been
   used in a trunk build.

 - Clean up the smcm code quite a bit

 - Do not fflush stderr and use opal_output instead of fprintf.

These changes have been tested on Cray XE-6 and PSM systems.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30533.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-02-03 17:01:46 +00:00
Nathan Hjelm
afae924e29 coll/ml: fix some warnings and the spelling of indices
This commit fixes one warning that should have caused coll/ml to segfault
on reduce. The fix should be correct but we will continue to investigate.

cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30477.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-01-29 18:44:21 +00:00
Ralph Castain
b32556e6dc Fixes trac:4143
After IM with Nathan, apply patch from ticket after verification by Paul Hargrove that it fixes the problem on non-x86 32-bit platforms

Verified by Paul, RM-approved

cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r30411.

The following Trac tickets were found above:
  Ticket 4143 --> https://svn.open-mpi.org/trac/ompi/ticket/4143
2014-01-24 17:56:52 +00:00
Nathan Hjelm
7ba8bd81fa coll/ml: remove debug fprintfs
cmr=v1.7.5:ticket=trac:4158

This commit was SVN r30367.

The following Trac tickets were found above:
  Ticket 4158 --> https://svn.open-mpi.org/trac/ompi/ticket/4158
2014-01-22 17:21:05 +00:00