Commit graph

1148 commits

Author SHA1 Message Date
Pascal Deveze
df59d6cdd4 coll-portals4: Correct and simplify how the data are cut in segment_nb segments (bcast) 2016-07-21 15:58:09 +02:00
Pascal Deveze
274f8d608c coll-portals4: Change output format and change variable names (minor changes). 2016-07-21 11:06:45 +02:00
Gilles Gouaillardet
14624506df coll/libnbc: do not exchange data between roots in ompi_coll_libnbc_ireduce_scatter_inter()
this is now useless since the scatter is done via the local communicator
2016-07-11 17:18:30 +09:00
Gilles Gouaillardet
a55d57406b coll/base: fix non zero lower bound datatype handling in mca_coll_base_alltoallv_intra_basic_inplace() 2016-07-08 16:55:26 +09:00
Gilles Gouaillardet
7b8094aac1 coll/base: silence misc warning
as reported by Coverity with CIDs 1363349-1363362

Offset temporary buffer when a non zero lower bound datatype is used.

Thanks Hristo Iliev for the report

(cherry picked from commit 0e393195d9)
2016-07-08 13:06:26 +09:00
Gilles Gouaillardet
678d08647b coll/libnbc: various fixes
- correctly handle non commutative operators
 - correctly handle non zero lower bound ddt
 - correctly handle ddt with size > extent
 - revamp NBC_Sched_op so it takes two buffers and matches ompi_op_reduce semantics
 - various fixes for inter communicators

Thanks Yuki Matsumoto for the report
2016-07-07 15:55:49 +09:00
Gilles Gouaillardet
3e559a14a9 coll/inter: fix non standard ddt handling
- correctly handle non zero lower bound ddt
 - correctly handle ddt with size > extent

Thanks Yuki Matsumoto for the report
2016-07-07 15:49:59 +09:00
Gilles Gouaillardet
488d037d51 coll/basic: fix non standard ddt handling
- correctly handle non zero lower bound ddt
 - correctly handle ddt with size > extent

Thanks Yuki Matsumoto for the report
2016-07-07 15:49:53 +09:00
Gilles Gouaillardet
c06fb04a9a coll/base: fix non zero lower bound ddt handling in ompi_coll_base_reduce_intra_basic_linear()
Thanks Yuki Matsumoto for the report
2016-07-07 15:49:48 +09:00
Joshua Hursey
0a09f8bc51 coll/hcoll: Protect module destruct when not fully initialized
* If hcoll is given a negative priority, but not enabled=0 then
   the module is constructed, but then destructed before calling
   its query(). So the previous pointers are not initialized.
   If we try to OBJ_RELEASE them in a debug build an assert will fire.
   This commit adds some protection against that and initializes
   the _module pointers to NULL.
2016-07-01 13:41:27 -05:00
Joshua Hursey
59f304b9e9 coll/base: neg. priority cleanup, verbose output improvements
* Print a verbose message if the component was disqualified because of
   a negative priority.
 * If a disqualified component provided a module, release it.
 * Display list of selected components in priority order
   - During the process of volunteering collective functions for a
     communicator, print the component name and priority. This will
     cause the verbose messages to be displayed in reverse priority
     order (lowest priority first, up to highest). This is helpful
     when determining which collective components are active in which
     order for a given communicator.
     To see the messages you need the following MCA parameter set to 9
     or higher: `-mca coll_base_verbose 9`
 * Adjust verbose for commonly needed verbose output from 10 to 9 to
   make it easier to access this information.
2016-07-01 13:41:27 -05:00
George Bosilca
9c4f56be4b Fix the coll_base_sendrecv function. 2016-06-18 18:23:51 +02:00
Gilles Gouaillardet
80e362de52 coll/base: fix memory free in ompi_coll_base_allreduce_intra_recursivedoubling err handler
Fix CID 1362630

Fixes open-mpi/ompi@0e393195d9
2016-06-09 13:12:25 +09:00
Gilles Gouaillardet
ead7efef3f coll/basic: silence CID 1362614 in mca_coll_basic_allreduce_inter() 2016-06-09 09:40:19 +09:00
Gilles Gouaillardet
ad2e1a5ae9 coll/base: silence CID 1362613 in ompi_coll_base_alltoall_intra_basic_linear() 2016-06-09 09:40:05 +09:00
Gilles Gouaillardet
80b267af1c coll/base: silence CID 1362601 in ompi_coll_base_sendrecv_zero() 2016-06-09 09:37:31 +09:00
Gilles Gouaillardet
0e393195d9 coll/base: fix [all]reduce with non zero lower bound datatypes
Offset temporary buffer when a non zero lower bound datatype is used.

Thanks Hristo Iliev for the report
2016-06-08 16:48:00 +09:00
Gilles Gouaillardet
c976559877 coll/basic: fix log basic bcast
The log basic bcast was completely broken. The rank 0 gets the
hibit set to -1, so it always returned an error.
2016-06-06 11:01:51 +09:00
George Bosilca
9376b0340b Fix the basic barrier.
The log basic barrier was completely broken. The rank 0 gets the
hibit set to 0, so it always returned an error.
2016-06-03 23:46:25 -04:00
George Bosilca
d577e12dd0 Fix comment. 2016-06-03 00:57:31 +09:00
George Bosilca
223d75595d Give a boost to MPI_Barrier.
Based on current implementation it is faster to use a blocking
send than the non-blocking version. Switch the exchange function
used in the barrier to use the blocking version combined with
the non-blocking version of the receive.
2016-06-02 11:45:25 +09:00
Gilles Gouaillardet
5f565dfec3 configury: clean the flex generated .c files 2016-06-01 11:13:31 +09:00
Valentin Petrov
5ff6372886 coll/hcoll: bugfix: initialize req_type field
If left uninitialized then segfault is possible in MPI_Waitall in
    the case the field by chance equals OMPI_REQUEST_GEN.
2016-05-25 15:38:01 +03:00
bosilca
b90c83840f Refactor the request completion (#1422)
* Remodel the request.
Added the wait sync primitive and integrate it into the PML and MTL
infrastructure. The multi-threaded requests are now significantly
less heavy and less noisy (only the threads associated with completed
requests are signaled).

* Fix the condition to release the request.
2016-05-24 18:20:51 -05:00
Gilles Gouaillardet
0a19337371 coll/base: return MPI_ERR_UNSUPPORTED_OPERATION when coll_base_*_two_procs algo is used on a communicator that does not have exactly two tasks
Thanks Dave Love for the report
2016-05-09 14:18:40 +09:00
Gilles Gouaillardet
6c9d65c0ca coll/libnbc: fix MPI_Ireduce_scatter_block for one task communicator
Thanks Lisandro Dalcin for the report

Fixes open-mpi/ompi#248
2016-05-06 09:43:29 +09:00
Joshua Ladd
4771c9ece6 Merge pull request #1617 from jladd-mlnx/topic/disable-hcoll-barrier-in-finalize-ompi-trunk
HCOLL: fix hang in hcoll barrier called from finalize for MXM/yalla
2016-05-04 10:12:34 -04:00
Todd Kordenbrock
3498bed650 Merge pull request #1555 from shawone/check_reduce_ret
coll-portals4: check return value from reduce kary tree functions
2016-05-03 10:17:23 -05:00
Devendar Bureddy
cafd55f18c HCOLL: fix hang in hcoll barrier called from finalize for MXM/yalla
tear down

HCOLL barrier may not complete if HCOLL progress is not called periodically,
which is the case during HCOLL teardown in finalize.
(cherry picked from commit 793244d75dd94d1d5e0243bcccf6d04318750f3f)
2016-05-03 00:49:57 +03:00
Valentin Petrov
21f1c572c0 Adds mapping to hcoll complex dte 2016-04-19 14:14:28 +03:00
Nicolas Chevalier
c86d4035d2 coll-portals4: check return value from reduce kary tree functions 2016-04-18 12:02:30 +00:00
George Bosilca
004c0cc05b Fix issues identified by @derbeyn. 2016-03-29 15:50:32 -04:00
George Bosilca
57eadb0dd6 Fix for Coverity CID 1357152.
Or at least that was the origin of the issue. It turns out
we were freeing the wrong buffer (but as it only happens in the
case of an error we never noticed).
2016-03-24 00:53:30 -04:00
George Bosilca
4b38b6bd0c Fix multiple issues with the collective requests.
This patch addresses most (if not all) @derbeyn concerns
expressed on #1015. I added checks for the requests allocation
in all functions, ompi_coll_base_free_reqs is called with the
right number of requests, I removed the unnecessary basic_module_comm_t
and use the base_module_comm_t instead, I remove all uses of the
COLL_BASE_BCAST_USE_BLOCKING define, and other minor fixes.
2016-03-23 18:35:41 -04:00
Nathan Hjelm
c8b077f232 coll/ml: fix coverity issues
Fix CID 715744 (#1 of 1): Logically dead code (DEADCODE):
Fix CID 715745 (#1 of 1): Logically dead code (DEADCODE):

The free of scratch_num in either place is defensive programming. Instead of removing the free the conditional around the free has been removed to quiet the warning.

Fix CID 715753 (#1 of 1): Dereference after null check (FORWARD_NULL):
Fix CID 715778 (#1 of 1): Dereference before null check (REVERSE_INULL):

Fixed the conditional to check for collective_alg != NULL instead of collective_alg->functions != NULL.

Fix CID 715749 (#1 of 4): Explicit null dereferenced (FORWARD_NULL):

Updated code to ensure that none of the parse functions are reached with a non-NULL value.

Fix CID 715746 (#1 of 1): Logically dead code (DEADCODE):

Removed dead code.

Fix CID 715768 (#1 of 1): Resource leak (RESOURCE_LEAK):
Fix CID 715769 (#2 of 2): Resource leak (RESOURCE_LEAK):
Fix CID 715772 (#1 of 1): Resource leak (RESOURCE_LEAK):

Move free calls to before error checks to cleanup leak in error paths.

Fix CID 741334 (#1 of 1): Explicit null dereferenced (FORWARD_NULL):

Added a check to ensure temp is not dereferenced if it is NULL.

Fix CID 1196605 (#1 of 1): Bad bit shift operation (BAD_SHIFT):

Fixed overflow in calculation by replacing int mask with 1ul.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-18 10:11:16 -06:00
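
The BAD_SHIFT fix in the commit above replaces an `int` mask with `1ul`. A minimal sketch of why that matters, using a hypothetical helper `bit_mask` that is not taken from coll/ml: shifting the `int` constant `1` by 31 or more bits overflows where `int` is 32 bits wide, whereas `1ul` keeps the shift in `unsigned long` arithmetic.

```c
#include <limits.h>

/* Hypothetical helper, not from the Open MPI sources.
 * Builds a one-bit mask in unsigned long arithmetic. Writing the
 * constant as 1 (an int) instead of 1ul would overflow for
 * shift >= 31 on platforms where int is 32 bits wide. */
unsigned long bit_mask(unsigned int shift)
{
    /* Refuse shifts that do not fit in unsigned long. */
    if (shift >= sizeof(unsigned long) * CHAR_BIT) {
        return 0ul;
    }
    return 1ul << shift;
}
```

The range check is an addition for the sketch; the commit message only mentions the change of constant type.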
Nathan Hjelm
2f4e5325aa coll/base: fix coverity issues
Fix CID 1325868 (#1 of 1): Dereference after null check (FORWARD_NULL):
Fix CID 1325869 (#1-2 of 2): Dereference after null check (FORWARD_NULL):

Here reqs can indeed be NULL. Added a check to
ompi_coll_base_free_reqs to prevent dereferencing NULL pointer.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-18 09:31:43 -06:00
Gilles Gouaillardet
fbed6df4a3 coll/base: fix a typo
typo was introduced in open-mpi/ompi@c98e97a46e
2016-03-11 14:18:03 +09:00
Aurélien Bouteiller
c98e97a46e Do not return MPI_ERR_PENDING from collectives. 2016-03-09 16:13:34 -05:00
Joshua Ladd
69e3c6f289 Merge pull request #1321 from jladd-mlnx/topic/add-allgatherv-reduce
Adding entry points for Allgatherv, iAllgatherv, Reduce, and iReduce.
2016-01-25 20:46:52 -05:00
Valentin Petrov
5e2a2c0755 Bugfix for coll/hcoll: coll_request must be set to ACTIVE when allocated
If the state of the request is not set to OMPI_REQUEST_ACTIVE
       then MPI_Test would immediately signal such request completed
       while hcoll may still be working on it.

Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-01-23 03:23:59 +02:00
Joshua Ladd
e398bf6f3a Adding entry points for Allgatherv, iAllgatherv, Reduce, and iReduce.
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-01-23 03:09:29 +02:00
Jeff Squyres
60ffe713b8 common syms: whitelist bison-generated common symbols
Bison generates some common symbols that we can't do anything about,
so whitelist them.
2016-01-16 03:53:14 -08:00
Artem Polyakov
2abb2972ac Fix Mellanox copyrights with respect to the following PRs:
* https://github.com/open-mpi/ompi/pull/1184
* https://github.com/open-mpi/ompi/pull/1188
* https://github.com/open-mpi/ompi/pull/1197
* https://github.com/open-mpi/ompi/pull/1202
* https://github.com/open-mpi/ompi/pull/1210
* https://github.com/open-mpi/ompi/pull/1216
* https://github.com/open-mpi/ompi/pull/1236
* https://github.com/open-mpi/ompi/pull/1237
* https://github.com/open-mpi/ompi/pull/1248
* https://github.com/open-mpi/ompi/pull/1260
* https://github.com/open-mpi/ompi/pull/1264
2015-12-30 00:12:19 +06:00
Gilles Gouaillardet
cebde2a753 coll/tuned: add missing #include "opal/util/output.h"
Thanks Marco Atzeri for contributing the original patch
2015-12-24 14:41:17 +09:00
Gilles Gouaillardet
77f199d1d7 coll/fca: add missing #include <alloca.h> 2015-12-24 14:33:58 +09:00
Ralph Castain
ac6289dca6 Cleanup the warnings from the ompi layer when compiling optimized under Mac OSX
Cleanup per George's comments
2015-12-17 17:39:15 -08:00
igor.ivanov@itseez.com
0a9956927a ompi/coll: Fix warnings in fca components
warning: assignment from incompatible pointer type
2015-12-16 16:22:16 +02:00
igor.ivanov@itseez.com
8f45d83d46 ompi/coll: Fix warnings in hcoll component
warning: assignment from incompatible pointer type
2015-12-16 14:52:29 +02:00
Nathan Hjelm
9d659465b7 Merge pull request #1210 from artpol84/icbarrier_fix
Fix NBC iBarrier for inter-communicators.
2015-12-14 13:52:38 -08:00
Artem Polyakov
2d0919dbdc Fix NBC iGatherv for inter-communicators.
We need to use remote size to form a schedule.
2015-12-14 12:19:10 +06:00
Artem Polyakov
fc17deca43 Fix NBC iBarrier for inter-communicators.
Remove send of the extra message. This bug was triggered by the
MPICH/coll/nbicbarrier test. In this test a series of communicators
are created.
This extra message was received after the original communicator was destroyed
and queued into non_existing_communicator_pending. When a new, completely
unrelated communicator with the same id as the original was created, this message
was pushed into the frags_cant_match queue and caused a seq number skew and a hang.
2015-12-12 13:27:31 +06:00
Gilles Gouaillardet
3a3b13ea12 coll/base: fix an integer overflow in ompi_coll_base_reduce_generic
Refs open-mpi/ompi#1198
2015-12-11 13:55:59 +09:00
Gilles Gouaillardet
37c978f5e9 coll/libnbc: correctly handle changed types.
this fixes open-mpi/ompi@d816d1c194
thanks Jeff for the review
2015-12-07 10:13:43 +09:00
George Bosilca
3a9664ac9d Fix Coverity CIDs 1341584-1341589. 2015-12-06 14:06:36 -05:00
George Bosilca
688108cf7f Patch submitted by @ggouaillardet on ticket #1091. 2015-12-02 20:42:18 -05:00
George Bosilca
4d00c59b2e Cleanup the memory handling for temporary buffers in
some of the collective modules. Added a new function
opal_datatype_span, to compute the memory span of
count number of datatype, excluding the gaps in the
beginning and at the end. If a memory allocation is
made using the returned value, the gap (also returned)
should be removed from the allocated pointer.
2015-12-02 20:42:18 -05:00
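
The span computation described in the commit above can be sketched roughly as follows. This is an illustration under stated assumptions: `struct toy_datatype` and `toy_span` are hypothetical names, and the layout fields are a simplified model, not the real OPAL datatype structure.

```c
#include <stddef.h>

/* Toy description of a datatype layout; an assumption made for this
 * sketch, not the real OPAL datatype. All offsets are in bytes. */
struct toy_datatype {
    ptrdiff_t lb;       /* lower bound */
    ptrdiff_t ub;       /* upper bound; extent = ub - lb */
    ptrdiff_t true_lb;  /* offset of the first byte of actual data */
    ptrdiff_t true_ub;  /* offset one past the last byte of actual data */
};

/* Bytes needed to hold `count` elements, excluding the gaps at the
 * beginning and at the end. The leading gap is returned via `gap`. */
ptrdiff_t toy_span(const struct toy_datatype *dt, size_t count, ptrdiff_t *gap)
{
    if (0 == count) {
        *gap = 0;
        return 0;
    }
    *gap = dt->true_lb;
    ptrdiff_t extent = dt->ub - dt->lb;
    ptrdiff_t true_extent = dt->true_ub - dt->true_lb;
    /* count - 1 full extents plus the data of the last element. */
    return true_extent + (ptrdiff_t)(count - 1) * extent;
}
```

A caller would allocate the returned number of bytes and hand `allocated - gap` to code that addresses elements relative to the datatype's origin, which is the "gap should be removed from the allocated pointer" step in the message.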
Ryan Grant
324534b191 Merge pull request #1161 from tkordenbrock/topic/add.triggered.scatter
coll-portals4: add scatter and iscatter implementations that use Portals4 triggered operations
2015-11-30 16:53:47 -07:00
Todd Kordenbrock
4721b70dd5 coll-portals4: add scatter and iscatter implementations that use Portals4 triggered operations
This commit adds implementations of scatter and iscatter using
Portals4 triggered operations.  Currently, the only algorithm
is linear.
2015-11-30 15:07:18 -06:00
Todd Kordenbrock
f6f525e0d8 coll-portals4: remove unneeded code from gather
This commit removes two pieces of unneeded code from gather.  First
it removes destroy_tree() calls from linear_top(), because the
linear algorithm does not create a tree, so there is no need to
destroy it.  Second it removes unpack_bytes from the gather request
because it was calculated but never used.
2015-11-30 10:38:51 -06:00
Gilles Gouaillardet
d816d1c194 coll/libnbc: use PMPI_* and internal ompi_* instead of MPI_* 2015-11-20 13:46:19 +09:00
Ryan Grant
f60c506c68 Merge pull request #999 from tkordenbrock/topic/add.triggered.gather
coll-portals4: add gather and igather implementations that use Portals4 triggered operations
2015-10-20 14:59:09 -06:00
Gilles Gouaillardet
0f23037775 coll/base: fix memory allocation in mca_coll_base_alltoall_intra_basic_inplace 2015-10-19 16:47:59 +09:00
Jeff Squyres
62351f442a help: remove stale help messages and files
Found by contrib/check-help-strings.pl.
2015-10-13 16:50:20 -04:00
Todd Kordenbrock
7c738fb657 coll-portals4: add gather and igather implementations that use Portals4 triggered operations
This commit adds implementations of gather and igather using
Portals4 triggered operations.  The default algorithm is linear,
but binomial can be selected using an MCA parameter -
coll_portals4_use_binomial_gather_algorithm.
2015-10-13 11:26:35 -05:00
Nathan Hjelm
d8dc5292ed Merge pull request #1002 from hjelmn/ompi_coverity
ompi: fix coverity issues
2015-10-09 12:27:41 -06:00
Nathan Hjelm
4cb42f8264 ompi: fix coverity issues
Fixes CID 715741: Logically dead code

Verified. Removed dead code.

Fixes CID 1320878: Resource leak

Free proc_list before returning.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-10-09 08:41:27 -06:00
Gilles Gouaillardet
e946c82847 Revert "coll/basic: fix segmentation fault in neighborhood collectives if the degree"
This partially reverts commit open-mpi/ompi@76204dfafe.
2015-10-08 12:00:41 -04:00
Gilles Gouaillardet
99cca2cfd3 Revert "* comment on communicator creation in mca_topo_base_dist_graph_create(...)"
This partially reverts commit open-mpi/ompi@27e4389259.
2015-10-08 12:00:41 -04:00
George Bosilca
a8bdd8f668 Don't lose the pointer to the request array. Patch provided by
@ggouaillardet.
2015-10-08 12:00:41 -04:00
George Bosilca
88492a1e12 Consistently use the request array for all modules (single array stored
in the base).
Correctly deal with persistent requests (they must be always freed when
they are stored in the request array associated with the communicator).
Always use MPI_STATUS_IGNORE for single request waiting functions.
2015-10-08 12:00:41 -04:00
George Bosilca
01b32caf98 Update the basic module to dynamically allocate the right
number of requests.

Remove unnecessary fields. We don't need these fields.
2015-10-08 12:00:41 -04:00
George Bosilca
a324602174 Never allocate a temporary array for the requests. Instead rely on the
module_data to hold one with the largest necessary size. This array is
only allocated when needed, and it is released upon communicator
destruction.
2015-10-08 12:00:41 -04:00
Todd Kordenbrock
f33b0c1cdf coll-portals4: allreduce: remove extra %d from error message. 2015-10-08 07:57:33 -05:00
Devendar Bureddy
72f98ccf6c HCOLL: Enable alltoall interface 2015-10-07 08:00:04 +03:00
Gilles Gouaillardet
de8de65b07 coll/tuned: remove unused prototypes from coll_tuned.h 2015-10-06 09:07:48 +09:00
Mike Dubman
e8d7373b14 COLL/FCA: revert to prev barrier if called from finalize
FCA barrier may not complete if FCA progress is not called periodically.
PMI/PMI2 API that can be used in rte barrier has no provision for calling
external progress function.

So it is possible that during finalize some ranks will be stuck
in fca barrier while others are in PMI barrier.
2015-10-04 09:40:19 +03:00
Devendar Bureddy
243b75aa80 HCOLL: Add alltoallv interface 2015-10-02 01:51:33 +03:00
Nathan Hjelm
12bd300c40 Merge pull request #929 from hjelmn/add_procs
Update add_procs support
2015-09-28 17:29:13 -06:00
Todd Kordenbrock
3e63a3458c portals4: add support for dynamic add_procs() to all Portals4 components
In the default mode of operation, the Portals4 components support
dynamic add_procs().

The Portals4 components have two alternate modes (flow control and
logical-to-physical) that require knowledge of all procs at startup.
In these modes, mtl-portals4 sets the MCA_MTL_BASE_FLAG_REQUIRE_WORLD
flag and btl-portals4 sets the MCA_BTL_FLAGS_SINGLE_ADD_PROCS flag
to tell the PML that we need all the procs in one add_procs() call.
2015-09-24 22:12:57 -05:00
Nathan Hjelm
30f8d0b038 coll/libnbc: fix coverity errors
Fix CID 1196812: Resource Leak

dsts array was leaked on error.

Fix CID 710565: Copy-paste error

The line in question (nbc:513) is indeed a copy-paste error.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 16:14:49 -06:00
Rolf vandeVaart
2c51faa58d Fix warnings due to missing const 2015-09-21 14:18:44 -04:00
Gilles Gouaillardet
a611274704 pml: fix commit open-mpi/ompi@6e6a3e965c
do not use the const modifier for allocator nor recv buffers
2015-09-18 09:54:18 +09:00
Gilles Gouaillardet
a1627feaf7 coll/ml, bcol: fix prototypes (e.g. use the const modifier) 2015-09-11 13:20:44 +09:00
Nathan Hjelm
a41889112c Remove calls to ompi_group_peer_lookup in coll/sm and coll/fca
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Gilles Gouaillardet
e01bac962f coll: do not cast away the const modifier when this is not necessary
update the coll framework and mpi c bindings
2015-09-09 09:18:57 +09:00
Jeff Squyres
bc9e5652ff whitespace: purge whitespace at end of lines
Generated by running "./contrib/whitespace-purge.sh".
2015-09-08 09:47:17 -07:00
Gilles Gouaillardet
c404e98dce coll/ml: silence warnings (incorrect callback prototype) 2015-09-07 14:56:49 +09:00
Gilles Gouaillardet
56f8a7b840 coll/ml: declare a global variable as static to avoid an uninitialized common symbol. 2015-09-07 14:56:03 +09:00
rhc54
665b30376a Merge pull request #868 from rhc54/topic/hwloc
Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given
2015-09-04 17:58:07 -07:00
Ralph Castain
d97bc29102 Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given 2015-09-04 16:54:40 -07:00
Pavel Shamis (Pasha)
32c69630ad ML: Replace opal ignore with a zero priority
The priority is set to 0 by default. As a result, component open reports
an error and the component is not loaded (no resources allocated).
2015-09-04 11:28:47 -04:00
Gilles Gouaillardet
1a238d3a4f configury: fix fca detection
* do not add -I/.../include/fca -I /.../include/fca_core to CPPFLAGS
 * allow configure --with-fca
 * search fca libs in both DIR/lib and DIR/lib64
 * fix the description of the --with-fca option
2015-08-13 11:09:15 +09:00
Gilles Gouaillardet
df98a73131 configury: fix hcoll detection
* do not add -I/.../include/hcoll -I /.../include/hcoll/api to CPPFLAGS
 * allow configure --with-hcoll
 * search hcoll libs in both DIR/lib and DIR/lib64
 * fix the description of the --with-hcoll option
2015-08-13 11:08:56 +09:00
Nathan Hjelm
d42e0968b1 coll/libnbc: rewrite parts of libnbc
This commit rewrites parts of libnbc to fix issues identified by
coverity and myself. The changes are as follows:

 - libnbc function would return invalid error codes (internal to
   libnbc) to the mpi layer. These codes names are of the form
   NBC_. They do not match up with the error codes expected by the mpi
   layer. I purged the use of all these error codes with the exception
   of NBC_OK and NBC_CONTINUE in progress. These codes are used to
   identify when a request handle is complete.

 - Handles and schedules were leaked by all collective routines on
   error. A new routine was added to return a collective handle
   (NBC_Return_handle).

 - Temporary buffers containing in/out neighbors for neighborhood
   collectives were always leaked.

 - Neighborhood collectives contained code to handle MPI_IN_PLACE which
   is never a valid input for the send or receive buffer. Stripped this
   code out.

 - Files were inconsistently named. Most are nbc_isomething.c but one
   was named coll_libnbc_ireduce_scatter_block.c.

 - Made the NBC_Schedule "structure" an object so it can be
   retained/released. This may enable the use of schedule caching at a
   later time. More testing will be needed to ensure the caching code
   works. If it doesn't the code should be stripped out completely.

 - Added code to simplify the common case of scheduling send/recv +
   barrier.

 - Code cleanup for readability.

The code now passes the clang static analyzer.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-10 11:53:25 -06:00
Jeff Squyres
a0ebbee6ef libnbc: __FUNCTION__ -> __func__ fixes 2015-08-05 05:27:23 -07:00
Gilles Gouaillardet
318a1a40a4 coll/libnbc: ireduce_scatter_block
silence malloc(0) warning reported by Lisandro
2015-07-27 16:23:08 +09:00
Nathan Hjelm
ee36d813dc Merge pull request #657 from hjelmn/c99
more c99 updates
2015-06-25 11:21:09 -06:00
Howard Pritchard
f45914db9b Merge pull request #670 from hppritcha/topic/ownership_update
ownership: update ownership files
2015-06-25 11:02:45 -06:00
Nathan Hjelm
4d92c9989e more c99 updates
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-06-25 10:14:13 -06:00
Howard Pritchard
e49a37c034 ownership: update ownership files
per discussions at OMPI devel workshop

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-06-25 10:04:42 -06:00
Jeff Squyres
0bb3fd0a10 coll hierarch: remove last stale file 2015-06-25 08:40:50 -07:00
Ralph Castain
869041f770 Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Gilles Gouaillardet
0bd765eddd fix NBC_Copy for legitimate zero size messages
this fixes a regression from open-mpi/ompi@9a70765f27
2015-06-22 09:51:25 +09:00
Todd Kordenbrock
a274d2795c coll-portals4: implement collective operations using Portals4 triggered operations
This commit implements the reduce, allreduce, barrier and bcast
collective operations using Portals4 triggered operations.
2015-06-02 11:41:19 -05:00
rhc54
b59fa14004 Merge pull request #583 from rhc54/topic/mallocwarnings
Silence malloc(0) warnings reported by Lisandro
2015-05-12 13:37:38 -07:00
Ralph Castain
9a70765f27 Silence malloc(0) warnings reported by Lisandro 2015-05-12 12:38:58 -07:00
George Bosilca
78f5f0f8a9 Show the name of the collective that failed to get initialized. 2015-05-11 15:10:37 -04:00
Ralph Castain
6e95bcd583 Fix typo in oob_tcp.c when IPV6 enabled. Cleanup a few other warnings, including a type in coll_sm that prevented that component from registering its MCA params! 2015-05-07 21:05:08 -07:00
Gilles Gouaillardet
9d56b85b55 initialize common symbols from ompi 2015-05-08 10:11:58 +09:00
Devendar Bureddy
88eb1fa936 HCOLL: refactoring hcoll_init 2015-05-04 22:03:36 +03:00
Rolf vandeVaart
91a8ec52ca Fix possible uninitialized warnings 2015-04-28 16:25:35 -04:00
Nathan Hjelm
033894b493 Merge pull request #541 from hjelmn/c99_components
C99 component initialization
2015-04-20 10:45:39 -06:00
Devendar Bureddy
19f5a3eff4 HCOLL: skip hcoll if enable_mpi_threads is true
reasons:
    1) default OCOMS is not configured with --enable-ocoms-multi-threads
    2) locking overheads
2015-04-20 19:39:49 +03:00
Devendar Bureddy
dd8e9fa176 HCOLL: enable by default 2015-04-20 19:39:30 +03:00
Nathan Hjelm
df75d0382f ompi: use C99 subobject naming for component initialization
This commit helps future-proof ompi components by initializing each
component member by name.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-18 10:29:58 -06:00
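
Initializing each member by name, as the commit above describes, uses C99 designated initializers. A small sketch, assuming a made-up `struct toy_component` whose fields do not mirror the real MCA component structures:

```c
/* Hypothetical component descriptor; the field names are invented for
 * illustration and do not mirror the real MCA structures. */
struct toy_component {
    const char *name;
    int major_version;
    int minor_version;
    int (*open)(void);
};

int toy_open(void)
{
    return 0;
}

/* Each member is initialized by name, so adding or reordering fields
 * in the struct does not silently shift values into the wrong member.
 * Members that are not named (minor_version here) are zero-initialized. */
struct toy_component toy_component_instance = {
    .name = "toy",
    .major_version = 2,
    .open = toy_open,
};
```

With positional initialization, the same change to the struct layout would compile but could assign values to the wrong fields, which is the future-proofing concern the message refers to.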
Nathan Hjelm
3436f2917d Merge pull request #449 from hjelmn/mca_base_update
mca/base update
2015-04-16 08:41:48 -06:00
Devendar Bureddy
6ddc7ac35c HCOLL: Fix assertion
hcoll context may not be destroyed if it is cached.
2015-04-01 20:33:28 +03:00
Nathan Hjelm
b68d66bb9b MCA: Add the project/project version to the MCA base component
This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.

All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-03-27 10:59:04 -06:00
Devendar Bureddy
71c28cea65 HCOLL: hcoll dte fixes
- hcoll currently does not support datatypes with gaps around them (i.e. dtsize !=
dtextent)
    - check for user defined Ops.
2015-03-25 16:04:11 +02:00
Gilles Gouaillardet
6de973daae coll/sm: remove unused value
as reported by Coverity with CID 1269962
2015-03-09 17:31:32 +09:00
Gilles Gouaillardet
757b40e56a coll/tuned: remove dead code
as reported by Coverity with CID 1271638
that looks like a multiple paste error ...
2015-03-06 15:02:56 +09:00
Gilles Gouaillardet
71ac1331f1 coll/tuned: remove unused variables 2015-02-27 17:26:48 +09:00
Gilles Gouaillardet
b179a17018 coll/base: add function prototypes 2015-02-27 17:26:36 +09:00
Gilles Gouaillardet
ce2020d255 coll/base: fix error reporting
and silence CID 1271639
2015-02-27 17:04:26 +09:00
George Bosilca
ced44e12da Update copyright. 2015-02-26 15:54:58 -05:00
George Bosilca
47e6e15e02 Typo in a rebase. 2015-02-26 15:54:19 -05:00
George Bosilca
d126c2e6f8 Fix few COVERITY reported issues. 2015-02-26 15:53:42 -05:00
George Bosilca
44d590b8fd Fix a small problem with the handling of requests in MPI_Alltoall. 2015-02-26 15:52:44 -05:00
George Bosilca
3f757bc8cb Add a constructor for mca_coll_base_comm_t. 2015-02-26 15:52:36 -05:00
George Bosilca
d6e69ecab3 Do not preallocate any requests. They are instead automatically
preallocated on the first collective that needs them.
Remove the ompi_coll_tuned_preallocate_memory_comm_size_limit MCA
parameter.
2015-02-26 15:52:27 -05:00
George Bosilca
0445670bb9 Fix the automatic handling of communicator associated requests.
If the array doesn't exist, or if its size is not adequate then
we reallocate it. Otherwise just keep using the same array of requests.
2015-02-26 15:52:18 -05:00
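
The reuse-or-reallocate policy in the commit above might look roughly like this. It is a sketch only: `struct toy_req_cache` and `toy_get_reqs` are hypothetical names, and the cache stores plain `void *` slots rather than real request handles.

```c
#include <stdlib.h>

/* Hypothetical per-communicator cache; stands in for whatever the
 * collective base module actually stores. */
struct toy_req_cache {
    void **reqs;   /* cached array of request slots, or NULL */
    size_t size;   /* number of slots in reqs */
};

/* Return an array with at least `needed` slots (`needed` is assumed
 * to be non-zero). The cached array is reused when it is large enough
 * and reallocated otherwise. Returns NULL on allocation failure,
 * leaving the cache untouched. */
void **toy_get_reqs(struct toy_req_cache *cache, size_t needed)
{
    if (NULL != cache->reqs && cache->size >= needed) {
        return cache->reqs;
    }
    void **grown = realloc(cache->reqs, needed * sizeof(*grown));
    if (NULL == grown) {
        return NULL;
    }
    cache->reqs = grown;
    cache->size = needed;
    return cache->reqs;
}
```

The array only ever grows, so a communicator that runs many small collectives after one large one keeps using the same allocation.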
George Bosilca
67d01bd8cd Redirect most of the basic module functions to base. 2015-02-26 15:52:00 -05:00
George Bosilca
211f05fb09 Complete the dismantle of the tuned module. 2015-02-26 15:50:55 -05:00
George Bosilca
aa019e239e Rename the base header file containing the prototypes of the collective
functions.
2015-02-26 15:50:29 -05:00
George Bosilca
8fbcdf685d Split the tuned framework in two. Move all the functions down in the
base, so that they can now be used by all modules. Keep the decision
functions in tuned.
2015-02-26 15:46:13 -05:00
George Bosilca
004f65a865 Fix issue with the error reporting as suggested by Gilles. 2015-02-26 13:01:13 -05:00
Todd Kordenbrock
c73e4fd98b coll-portals4: fix incomplete free list conversion 2015-02-26 10:53:45 -06:00
Gilles Gouaillardet
05140df1e6 coll/tuned: regression fix
fix the regression introduced in open-mpi/ompi@004160f8da
2015-02-26 13:58:06 +09:00
Nysal Jan K.A
ded408f485 Fix a crash while closing libnbc
If the free list initialization fails in libnbc_open()
mca_coll_libnbc_component.active_requests remain uninitialized,
resulting in a crash while closing the component
2015-02-25 17:26:28 +05:30
Jeff Squyres
a85a392896 Merge pull request #422 from jsquyres/topic/coverity-fixes
Some Coverity fixes
2015-02-24 17:00:10 -05:00
Jeff Squyres
1c3cf068a4 nbc ireduce_scatter: ensure to check the correct return code
This was CID 1196644 and 1196621
2015-02-24 15:24:11 -05:00
Jeff Squyres
e9980654a8 nbc ireduce_scatter_block: ensure to check the correct return code
This was CID 1196643 and 1196615
2015-02-24 15:24:11 -05:00
Jeff Squyres
b35eb6fe10 nbc ireduce_scatter_block: ensure to check the correct return code
This was CID 709594 and 709592
2015-02-24 15:24:10 -05:00
Jeff Squyres
b0acef6f2d nbc_ireduce_scatter: ensure to check the proper return code
This was CID 709229 and 709224.
2015-02-24 15:24:10 -05:00
Jeff Squyres
1cf197d771 coll_basic_barrier: guard against opal_hibit() returning -1
This was CID 1196606 and 1196607
2015-02-24 15:24:08 -05:00
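
The guard mentioned in the commit above concerns a "highest set bit" helper that yields -1 when no bit is set, which is the case for rank 0. A minimal sketch, where `toy_hibit` and `toy_first_level` are hypothetical stand-ins and not the real `opal_hibit()` or barrier code:

```c
/* Hypothetical stand-in for a "highest set bit" helper; it is not the
 * real opal_hibit() and takes a simpler argument list.
 * Returns the index of the highest set bit, or -1 when value is 0. */
int toy_hibit(unsigned int value)
{
    int index = -1;
    while (0 != value) {
        value >>= 1;
        ++index;
    }
    return index;
}

/* Guarded use: rank 0 has no set bit, so map -1 to a safe default
 * instead of using it as a shift count or loop bound. */
int toy_first_level(unsigned int rank)
{
    int hibit = toy_hibit(rank);
    return (hibit < 0) ? 0 : hibit + 1;
}
```

Without the check, a caller that shifts by the result or treats a negative value as an error would misbehave for rank 0 only, which is easy to miss in testing.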
Nathan Hjelm
5f1254d710 Update code base to use the new opal_free_list_t
Use of the old ompi_free_list_t and ompi_free_list_item_t is
deprecated. These classes will be removed in a future commit.

This commit updates the entire code base to use opal_free_list_t and
opal_free_list_item_t.

Notes:

OMPI_FREE_LIST_*_MT -> opal_free_list_* (uses opal_using_threads ())

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-24 10:05:45 -07:00
Howard Pritchard
c9e81b54fb Merge pull request #412 from hppritcha/topic/owner_files
add owner files to opal/ompi/orte mca directories
2015-02-23 09:48:20 -07:00
Howard Pritchard
61fb62499a hcoll belongs to MLNX and is active 2015-02-23 09:14:03 -07:00
Gilles Gouaillardet
004160f8da coll/tuned: silence CID 1269934 2015-02-23 13:45:23 +09:00
Howard Pritchard
bf89131f9e add owner files to opal/ompi/orte mca directories
This commit adds an owner file in each of the component directories
for each framework.  This allows for a simple script to parse
the contents of the files and generate, among other things, tables
to be used on the project's wiki page.  Currently there are two
"fields" in the file, an owner and a status.  A tool to parse
the files and generate tables for the wiki page will be added
in a subsequent commit.
2015-02-22 15:10:23 -07:00
Igor Ivanov
010dce307a Fix set of coverity issues
List of CIDs (scan.coverity.com):
oshmem:
1269787, 1269907, 1270161, 1270162, 1270977, 1270978
ompi:
1270170, 1270172, 1270173

Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-02-20 17:45:46 +04:00
Gilles Gouaillardet
bda8058f39 coll/tuned: fix memory leaks and misc issues
as reported by Coverity with CIDs
70132, 70265, 70267, 70268, 70322, 70400, 70580, 70615,
1269934, 1269944, 1269968, 1269982, 1269983
2015-02-18 16:29:42 +09:00
Jeff Squyres
27a783b1c3 coll_sm_barrier: remove dead code
This was CID 1269978.
2015-02-12 10:24:02 -08:00
Rolf vandeVaart
1f749b0224 Bump priority of coll cuda component so it is higher than self.
Otherwise, we get some odd interactions with coll self in CUDA-aware
builds.
2015-02-12 12:29:03 -05:00
Jeff Squyres
a1c521f968 We're .opal_ignore these days, not .ompi_ignore. :-) 2015-02-03 13:56:53 -08:00
Jeff Squyres
30f05bc966 Makefiles: remove unused macros 2015-01-31 04:51:25 -08:00
George Bosilca
7adf74c617 As discussed on the devel mailing list in
http://www.open-mpi.org/community/lists/devel/2015/01/16820.php,
coll ML has two pending issues: a deadlock and a critical performance
issue on every communicator creation. After confirmation over IM from
Pasha, the ML collective module will be disabled until it is
fixed. Token to Pasha.
2015-01-27 16:27:12 -05:00
Jeff Squyres
2d5b92157f hierarch: with Edgar's blessing, remove the coll hierarch module 2015-01-27 13:25:27 -06:00
Devendar Bureddy
036e687d9c HCOLL: Do not block hcoll progress in finalize 2015-01-27 17:01:00 +02:00
Gilles Gouaillardet
8c1698ae4a coll/libnbc: enhance fix for MPI_Ireduce_scatter on single task communicator
this improves open-mpi/ompi@b9349d2eb9
2015-01-09 13:44:01 +09:00
Devendar Bureddy
e732152304 HCOLL: Fix hcoll supported datatype checks correctly 2015-01-02 21:18:12 +02:00
Gilles Gouaillardet
b9349d2eb9 coll/libnbc: fix MPI_Ireduce_scatter for single task communicator
when MPI_IN_PLACE is not used.

This commit fixes a regression introduced by
open-mpi/ompi@49e79a9ade
2014-12-24 12:12:58 +09:00
Devendar Bureddy
e398ad6619 HCOLL: Fix OMPI to HCOLL predefined datatypes, Ops mapping 2014-12-23 22:30:29 +02:00
Rolf vandeVaart
3ec9685ee0 Add missing file to sources. Without this, tarball build does not work 2014-12-18 07:17:28 -08:00
George Bosilca
d622db783d Based on https://github.com/open-mpi/ompi/pull/262, we should use
true_lb while computing the lower bound.
2014-11-21 19:16:05 +09:00
Gilles Gouaillardet
705147e98b coll/tuned: fix allgather bruck algorithm 2014-11-21 19:16:05 +09:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
bosilca
e7c59e3adb Merge pull request #227 from ggouaillardet/rfc/coll_basic_neighbor
RFC/coll basic neighbor
2014-11-07 11:33:25 -05:00
Gilles Gouaillardet
64c18686b7 fix ompi_request_wait vs ompi_request_wait_all and
MPI_STATUS_IGNORE vs MPI_STATUSES_IGNORE
2014-11-04 12:16:30 +09:00
Jeff Squyres
c22e1ae33b configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros
These two macros set the prefix for the OPAL and ORTE libraries,
respectively.  Specifically, the OPAL library will be named
libPREFIXopen-pal.la and the ORTE library will be named
libPREFIXopen-rte.la.

These macros must be called, even if the prefix argument is empty.

The intent is that Open MPI will call these macros with an empty
prefix, but other projects (such as ORCM) will call these macros with
a non-empty prefix.  For example, ORCM libraries can be named
liborcm-open-pal.la and liborcm-open-rte.la.

This scheme is necessary to allow running Open MPI applications under
systems that use their own versions of ORTE and OPAL.  For example,
when running MPI applications under ORTE, if the ORTE and OPAL
libraries between OMPI and ORCM are not identical (which, because they
are released at different times, are likely to be different), we need
to ensure that the OMPI applications link against their ORTE and OPAL
libraries, but the ORCM executables link against their ORTE and OPAL
libraries.
2014-10-22 10:32:19 -07:00
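The naming scheme described in this commit message can be illustrated with a minimal sketch. The helper `prefixed_library_names` is hypothetical and is not part of the Open MPI build system; it only reproduces the `libPREFIXopen-pal.la` / `libPREFIXopen-rte.la` pattern stated above, assuming the prefix is inserted verbatim (so the ORCM example corresponds to the prefix `orcm-`).

```python
def prefixed_library_names(prefix=""):
    """Compose library names following the libPREFIXopen-pal.la pattern.

    Hypothetical helper for illustration only; the real logic lives in the
    OPAL_SET_LIB_PREFIX / ORTE_SET_LIB_PREFIX configure macros.
    """
    return {
        "opal": f"lib{prefix}open-pal.la",
        "orte": f"lib{prefix}open-rte.la",
    }


if __name__ == "__main__":
    # Open MPI itself is described as passing an empty prefix.
    print(prefixed_library_names())
    # Another project (ORCM in the commit message) passes a non-empty prefix.
    print(prefixed_library_names("orcm-"))
```

With distinct prefixes, the two projects' libraries get distinct file names, which is what lets each set of executables link against its own OPAL and ORTE.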
bosilca
d819939841 Merge pull request #233 from ggouaillardet/rfc/coll_module_disable
Provide a symmetric behavior for the activation/deactivation of collective modules.
2014-10-16 09:22:04 -04:00
George Bosilca
7541c03b4c Mark all instances where atomic operations are used but their return value is unnecessary 2014-10-15 21:47:32 -04:00
Gilles Gouaillardet
e3f74aca1c Correctly move the pointer back by the true_lb.
Fixes #231
2014-10-14 16:26:54 +09:00
Gilles Gouaillardet
0f983d5a4f add a disable function for coll module 2014-10-14 14:46:36 +09:00
Devendar Bureddy
7a6b4c36b0 HCOLL: Update the proc structure dereference
Update the proc structure dereference to reflect the new opal_proc_t
super field
2014-10-13 20:49:19 +03:00
Devendar Bureddy
b8d2a15be9 HCOLL: by default off 2014-10-13 20:49:09 +03:00
Gilles Gouaillardet
8eb2d62919 coll/sm: fix another memory leak 2014-10-10 19:54:45 +09:00
Gilles Gouaillardet
27e4389259 * comment on communicator creation in mca_topo_base_dist_graph_create(...)
* use accessors to retrieve topo info
2014-10-10 16:07:20 +09:00
Gilles Gouaillardet
5d44a30111 coll/sm: fix minor memory leaks
port 4488.1.patch attached in #196 to master
2014-10-10 14:21:34 +09:00
Gilles Gouaillardet
76204dfafe coll/basic: fix segmentation fault in neighborhood collectives if the degree
of the topology is higher than the communicator size

It is possible to have a topology degree higher than the size of the communicator.
For example, a periodic cartesian communicator on MPI_COMM_SELF. This will leave
the neighborhood collectives with a request buffer that is too small.

This commit introduces a semantic change:
from now on, c_topo must be set before invoking coll_select
2014-10-10 11:56:04 +09:00
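The failure mode described in this commit message comes down to simple arithmetic, sketched below. This is not the coll/basic code. It assumes one send and one receive request per neighbor and a request buffer sized from the communicator size (`requests_per_rank * comm_size`); both sizing rules and all function names are hypothetical. The one fact taken from MPI itself is that a Cartesian topology has `2 * ndims` neighbors per process.

```python
def cart_neighbor_count(ndims):
    """Neighbors per process in a Cartesian topology: two per dimension."""
    if ndims < 0:
        raise ValueError("ndims must be non-negative")
    return 2 * ndims


def requests_needed(ndims):
    """Assumption: one receive and one send request per neighbor."""
    return 2 * cart_neighbor_count(ndims)


def buffer_too_small(comm_size, ndims, requests_per_rank=2):
    """Hypothetical sizing rule: the buffer is allocated from the communicator size."""
    allocated = requests_per_rank * comm_size
    return requests_needed(ndims) > allocated


if __name__ == "__main__":
    # Periodic 2-D Cartesian communicator built on a single process
    # (the MPI_COMM_SELF case from the commit message).
    print(buffer_too_small(comm_size=1, ndims=2))
    # A larger communicator with the same topology has room to spare.
    print(buffer_too_small(comm_size=16, ndims=2))
```

Setting the topology before module selection, as the commit requires, means the degree is known when the buffer is sized.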
Gilles Gouaillardet
2f67f29b85 Revert "coll/basic: fix segmentation fault in neighborhood collectives if the degree"
This reverts commit 9c788ff940.
2014-10-10 11:29:06 +09:00
Howard Pritchard
bb65835816 Fix iallgather problem with intercommunicators
A problem was found with the libnbc MPI_Iallgather
routine when using intercommunicators.  Special
thanks to Takahiro Kawashima(Fujitsu) for the patch
and a test case.  Verified master fails without the
patch and the test passes with the patch applied.

fixes #219
2014-10-02 11:45:17 -06:00
Howard Pritchard
0f74467264 switch to ompi_mpi_thread_provided for ts check
Use ompi_mpi_thread_provided rather than opal_using_threads macro
to check whether MPI_THREAD_MULTIPLE is being used.

This commit was SVN r32815.
2014-09-29 22:20:35 +00:00
Howard Pritchard
7069f2361a disqualify coll ml for MPI_THREAD_MULTIPLE
This commit was SVN r32814.
2014-09-29 21:02:15 +00:00
George Bosilca
49e79a9ade Fix the case of a single process.
This commit was SVN r32807.
2014-09-28 22:06:39 +00:00
Nathan Hjelm
9c788ff940 coll/basic: fix segmentation fault in neighborhood collectives if the degree
of the topology is higher than the communicator size

It is possible to have a topology degree higher than the size of the communicator.
For example, a periodic cartesian communicator on MPI_COMM_SELF. This will leave
the neighborhood collectives with a request buffer that is too small. This commit
adds a call that will dynamically increase the size of the request buffer if it
is too small.

A better fix would be to create the topology *before* calling the coll_select
routine on a communicator. This will take some discussion and the solution will
not likely be ready anytime soon.

Thanks to Lisandro Dalcin for reporting this.

Original thread: http://www.open-mpi.org/community/lists/devel/2014/08/15713.php

cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32796.
2014-09-25 17:43:29 +00:00
Rolf vandeVaart
5c73101a72 Fix typo.
This commit was SVN r32755.
2014-09-18 13:58:54 +00:00
Vasily Filipov
c7c63fe73e COLL/TUNED: alltoall - restore the previous default values of the algorithm-selection thresholds (changed by r32735)
reviewed by miked
    cmr=v1.8.3:reviewer=ompi-rm1.8

This commit was SVN r32753.

The following SVN revision numbers were found above:
  r32735 --> open-mpi/ompi@5fecf65daf
2014-09-18 08:07:51 +00:00
Rolf vandeVaart
8db1f89dd1 Small change to allow CUDA-aware to work with non-reduction nonblocking collectives.
Only used when CUDA-aware feature compiled in.

This commit was SVN r32750.
2014-09-17 16:55:01 +00:00
Vasily Filipov
ff10b25e7d warnings (caused by commit r32735) fix.
reviewed by miked
    cmr=v1.8.3:reviewer=ompi-rm1.8 

This commit was SVN r32740.

The following SVN revision numbers were found above:
  r32735 --> open-mpi/ompi@5fecf65daf
2014-09-16 06:33:49 +00:00
Vasily Filipov
5fecf65daf OMPI/COLL/Tuned: add command line params for thresholds to decide if small/intermediate MSGs alltoall algorithm will be used.
cmr=v1.8.3:reviewer=miked

This commit was SVN r32735.
2014-09-15 12:34:21 +00:00
Gilles Gouaillardet
edfbeba7bf coll/ml: better error handling
when CHECK_AND_RECYCLE detects an error, a message is displayed
if the error occurs on an intrinsic communicator, then abort
the program (instead of trying to free the communicator)

cmr=v1.8.3:reviewer=hjelmn

This commit was SVN r32659.
2014-09-01 10:00:49 +00:00
Ralph Castain
b554cd7d86 Turn off the coll/ml component if --without-hwloc was given
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32621.
2014-08-27 20:25:39 +00:00
Rolf vandeVaart
8709071819 Fix missing help file.
This commit was SVN r32550.
2014-08-18 21:52:31 +00:00
Gilles Gouaillardet
22cb8a1834 check-help-strings cleanup
This commit was SVN r32497.
2014-08-11 03:27:45 +00:00
Mike Dubman
0f60c34a9f fca: adopt opal API refactoring, fix warning.
based on http://www.open-mpi.org/community/lists/devel/2014/08/15558.php

This commit was SVN r32484.
2014-08-09 15:50:51 +00:00
Ralph Castain
e95187514c Update the proc structure dereference to reflect the new opal_proc_t super field
This commit was SVN r32462.
2014-08-08 16:12:49 +00:00
Jeff Squyres
132375f07f helpfiles: fix filenames referenced by calls to show_help()
This commit was SVN r32453.
2014-08-08 13:34:15 +00:00
Ralph Castain
5f244e8b19 Use opal_getpagesize to get the proper page size
Refs trac:4826

This commit was SVN r32422.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-04 20:23:00 +00:00
Gilles Gouaillardet
5b1ae87c76 coll/ml: fix ML_ERROR/printf parameters
cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32409.
2014-08-04 04:05:59 +00:00
Ralph Castain
daeb9b6c4f Some more cleanups. Remove direct references to ORTE by changing OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun, orted, tools) set the OPAL proc structure fields so OPAL knows what is going on and uses the correct print functions (still need to fix the problem for non-MPI apps). Properly return uint32_t from the opal utilities instead of int32_t as that is what the ORTE process name fields contain.
Thanks to Gilles for pointing out some of the discrepancies.

This commit was SVN r32398.
2014-08-01 14:44:11 +00:00
Gilles Gouaillardet
cd8fa75f87 coll/ml: align on page size as returned by sysconf
Thanks to Paul Hargrove for pointing in the right direction

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32393.
2014-08-01 08:10:12 +00:00
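As a rough illustration of aligning on the page size reported by `sysconf`, the sketch below queries the page size at run time and rounds a byte count up to the next page boundary. It is a Python analogue, not the coll/ml code: `system_page_size` and `align_up` are hypothetical names, and `os.sysconf("SC_PAGESIZE")` is assumed to be available (it is on POSIX systems).

```python
import os


def system_page_size():
    """Ask the OS for the page size instead of hard-coding a value such as 4096."""
    return os.sysconf("SC_PAGESIZE")


def align_up(nbytes, alignment):
    """Round nbytes up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or (alignment & (alignment - 1)) != 0:
        raise ValueError("alignment must be a positive power of two")
    return (nbytes + alignment - 1) & ~(alignment - 1)


if __name__ == "__main__":
    page = system_page_size()
    print(page, align_up(1, page), align_up(page + 1, page))
```

Querying the value at run time matters on platforms whose page size differs from the common 4 KiB default.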
Nathan Hjelm
1407c1f501 Remove RML code from common/sm
The only user of this code was coll/sm. I implemented a basic replacement
for the removed code. This gets the trunk compiling again with
--disable-dlopen.

This commit was SVN r32333.
2014-07-28 22:00:12 +00:00
Ryan Grant
caa10a5faf Portals fixes after latest move
This commit was SVN r32330.
2014-07-28 19:25:03 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose.  UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00
Devendar Bureddy
74852b4d21 HCOLL: fix misplaced hcoll_init return value check.
cmr=v1.8.2:reviewer=jladd

This commit was SVN r32282.
2014-07-22 18:47:34 +00:00
Mike Dubman
e342a11c2e opal envlist mca: implement Jeff's quibbles
fixed by Elena, reviewed by Miked

This commit was SVN r32216.
2014-07-11 07:23:20 +00:00
Nathan Hjelm
56ad231b7c coll/ml: temporarily disable binding check
This commit was SVN r32178.
2014-07-09 14:39:49 +00:00
Joshua Ladd
057370364d Opal: Add a new MCA variable type "version_string". Also add a
new flag to ompi_info that allows a user to print all MCA variables of a specific type.  

 --type version_string

This command will print all MCA variables of type version_string.

This feature was developed by Elena Shipunova and was reviewed by Josh Ladd.

This commit was SVN r32166.
2014-07-09 01:37:23 +00:00
Nathan Hjelm
309a6cf951 coll/ml: set n_resources to 0 when destructing an lmngr
Also keep track of the allocation base so we free the correct pointer
when cleaning up.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r32151.
2014-07-07 15:11:26 +00:00
George Bosilca
843ef1fcb0 ompi_mpi_abort had one extra argument that was never used. Clean it up.
This commit was SVN r32124.
2014-07-03 00:34:44 +00:00
Mike Dubman
ce6d5b8cd7 HCOLL: make it OFF by default
fixed by miked, reviewed by Alex

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32101.
2014-06-28 18:45:03 +00:00
Devendar Bureddy
228772ae81 hcoll gatherv support
cmr=v1.8.2:reviewer=jladd

This commit was SVN r32097.
2014-06-26 18:14:41 +00:00
George Bosilca
99561c5cc1 If the enable fails don't give up, but instead keep going with
the other collective modules. If we end up without some of the
collectives the code will raise an error anyway.

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32096.
2014-06-26 15:52:45 +00:00
Gilles Gouaillardet
fae7adf8ee Remove legacy FCA_IS_LOCAL_PROCESS macro
and use OPAL_PROC_ON_LOCAL_NODE instead

cmr=v1.8.2:reviewer=rhc

This commit was SVN r32079.
2014-06-25 02:37:53 +00:00
Gilles Gouaillardet
e9ed9def02 Fix MPI_Alltoallv in coll/tuned
This changeset :
- always call the low-level implementation for:
  * MPI_Alltoallv
  * MPI_Neighbor_alltoallv
  * MPI_Alltoallw
  * MPI_Neighbor_alltoallv
- fix mca_coll_tuned_alltoallv_intra_basic_inplace
  so zero size types are correctly handled

cmr=v1.8.2:reviewer=bosilca:ticket=4715

This commit was SVN r32013.

The following Trac tickets were found above:
  Ticket 4715 --> https://svn.open-mpi.org/trac/ompi/ticket/4715
2014-06-17 06:11:34 +00:00
George Bosilca
542e4996a7 Cleanup the utilities functions in tuned.
This commit was SVN r31987.
2014-06-13 16:04:45 +00:00
Gilles Gouaillardet
50256c62c5 Fix MPI_Alltoallv in coll/tuned.
Correctly handle the corner case in MPI_Alltoallv when
some tasks have no data to transfer and some other tasks
do have data to transfer.

This test case is covered in ibm/collective/alltoallv_somezeros
from the ompi-tests repo.

cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31985.
2014-06-13 06:03:23 +00:00
Rolf vandeVaart
489664b4a9 Remove debug code.
This commit was SVN r31901.
2014-05-28 16:06:16 +00:00
Rolf vandeVaart
570e313c9b Add collective module to handle CUDA aware buffers and reductions. Per RFC sent last week.
Reviewed by bosilca.

This commit was SVN r31894.
2014-05-27 21:24:43 +00:00
Mike Dubman
7b05b5c4c2 HCOLL: use proper parameter in progress unregister
fixed by Nadezhda, reviewed by Elena/MikeD

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31891.
2014-05-26 09:03:30 +00:00
Gilles Gouaillardet
baf532087a Fix mca_coll_basic_alltoallw_inter()
Avoid sending/receiving zero size messages in order to be compliant
with the top-level modification

cmr=v1.8.2:ticket=4651:reviewer=bosilca

This commit was SVN r31836.

The following Trac tickets were found above:
  Ticket 4651 --> https://svn.open-mpi.org/trac/ompi/ticket/4651
2014-05-20 09:22:57 +00:00
Gilles Gouaillardet
8bafe06c57 Fixes *alltoall* collectives at top level
This commit :
 - Correctly retrieve the communicator size when
   checking memory and parameters
 - Ensure (sendtype,sendcount) and (recvtype,recvcount)
   match, and return MPI_ERR_TRUNCATE otherwise
 - Return with MPI_SUCCESS without invoking the low level
   if no data is going to be transferred
 - Fixes trac:4506

cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31815.

The following Trac tickets were found above:
  Ticket 4506 --> https://svn.open-mpi.org/trac/ompi/ticket/4506
2014-05-19 07:46:07 +00:00
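The top-level checks listed in this commit message can be sketched as a small decision function. This is only an illustration under stated assumptions: "matching" is taken to mean that the send and receive sides describe the same number of bytes, `MPI_IN_PLACE` is not modeled, and the `Outcome` enum and `check_alltoall` are hypothetical names, not Open MPI API.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "MPI_SUCCESS"                # nothing to transfer, return early
    ERR_TRUNCATE = "MPI_ERR_TRUNCATE"      # send and receive sizes disagree
    CALL_LOW_LEVEL = "call the low-level implementation"


def check_alltoall(sendcount, sendtype_size, recvcount, recvtype_size):
    """Decide what the top level does before invoking a coll module.

    Assumption: (type, count) pairs match when they describe equal byte counts.
    """
    sent_bytes = sendcount * sendtype_size
    recv_bytes = recvcount * recvtype_size
    if sent_bytes != recv_bytes:
        return Outcome.ERR_TRUNCATE
    if sent_bytes == 0:
        return Outcome.SUCCESS
    return Outcome.CALL_LOW_LEVEL


if __name__ == "__main__":
    print(check_alltoall(4, 8, 8, 4))   # 32 bytes on both sides
    print(check_alltoall(0, 8, 0, 4))   # no data to transfer
    print(check_alltoall(4, 8, 4, 4))   # 32 bytes sent, 16 expected
```

Returning early for empty transfers is what allows the lower-level modules to assume they never see zero-size messages.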
Mike Dubman
cadc1485ff HCOLL: register memory release hook to avoid races
fixed by Devendar, reviewed by Miked

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31809.
2014-05-17 19:49:43 +00:00
Nathan Hjelm
c32d84154a coll/ml: fix leaks and close all opened frameworks
It is essential to call mca_base_framework_close for every framework
that is opened. coll/ml was not doing this so neither bcol nor sbgp
were getting cleaned up. This commit fixes this omission.

Also fixed a leak caused by calling OBJ_DESTRUCT for something created
with OBJ_NEW. With these changes coll/ml appears to be valgrind clean.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31743.
2014-05-13 21:22:12 +00:00
Gilles Gouaillardet
e3df77548d Fix memory leak when releasing a communicator created by
MPI_Cart_Create/MPI_Graph_create/MPI_Dist_Graph

Fixes trac:4581

This commit was SVN r31716.

The following Trac tickets were found above:
  Ticket 4581 --> https://svn.open-mpi.org/trac/ompi/ticket/4581
2014-05-13 04:49:23 +00:00
Ralph Castain
11faab1091 The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees.
This commit was SVN r31679.
2014-05-08 02:01:35 +00:00
Ralph Castain
a8e2d6c3a6 The bulk of the remaining renaming changes, in one final glorious "blob". Thanks to Jeff for some help chasing down a few spots. Per chat with Jeff, we decided to cleanup a few things that were historical in nature:
top_ompi_srcdir  ->  OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR

We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.

Only thing left is ompilibdir being treated similarly to what we did for srcdir/builddir. Coming soon.

This commit was SVN r31678.
2014-05-07 21:48:53 +00:00
Devendar Bureddy
dfaac7d29d Do not call into hcoll progress after MPI_Finalize
Reviewed by Mike
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31639.
2014-05-05 22:46:39 +00:00
Jeff Squyres
64c1228b55 Roll back r31519 and r31521: George convinced us that these approaches
weren't right.

This commit was SVN r31528.

The following SVN revision numbers were found above:
  r31519 --> open-mpi/ompi@b449c750b7
  r31521 --> open-mpi/ompi@e243805ed8
2014-04-24 20:27:03 +00:00
Jeff Squyres
e243805ed8 coll tuned alltoallv: correctly handle 0-sized messages with MPI_IN_PLACE
Patch from Gilles Gouaillardet on #4517 to fix handling 0-sized
messages in coll tuned with MPI_ALLTOALLV and MPI_IN_PLACE.

Reviewed by Jeff Squyres.

Fixes trac:4517

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31521.

The following Trac tickets were found above:
  Ticket 4517 --> https://svn.open-mpi.org/trac/ompi/ticket/4517
2014-04-24 16:55:53 +00:00
Jeff Squyres
b449c750b7 coll basic: correctly handle alltoall[vw] 0-sized messages
Patch from Gilles Gouaillardet on #4506 to correctly handle 0-sized
messages in coll/basic MPI_Alltoallv and MPI_Alltoallw.

Reviewed by Jeff Squyres.

Fixes trac:4506.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31519.

The following Trac tickets were found above:
  Ticket 4506 --> https://svn.open-mpi.org/trac/ompi/ticket/4506
2014-04-24 16:25:43 +00:00
Jeff Squyres
e9b694f1d8 coll_base_comm_unselect.c: fix memory leaks
Ensure to also OBJ_RELEASE the neighbor and ineighbor modules.

Fixes trac:4444 (this patch is from that ticket).

This commit was SVN r31516.

The following Trac tickets were found above:
  Ticket 4444 --> https://svn.open-mpi.org/trac/ompi/ticket/4444
2014-04-24 15:53:06 +00:00
Mike Dubman
a4990de055 mca: track external lib version (runtime/compiletime) for mca component
based on thread: http://www.open-mpi.org/community/lists/devel/2014/04/14505.php

Create mca parameter to track runtime/compiletime ext lib version for component.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31487.
2014-04-22 18:02:26 +00:00
George Bosilca
6a65d27bcc Print the 3rd buffer for the MPI_Op.
This commit was SVN r31471.
2014-04-21 23:29:30 +00:00
Nathan Hjelm
e125bbe347 coll/ml: clean out apparently stale code
The file coll_ml_ibarrier.c wasn't included in coll/ml's Makefile.am
and the setup code from coll_ml_hier_algorithms_ibarrier.c was not
being called. It looks like this code is stale and has long since been
replaced by the code in coll_ml_barrier.c

Once all these little CMRs are approved I may make it into one roll-up
CMR to make it easier on the RM.

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31418.
2014-04-16 22:43:43 +00:00
Nathan Hjelm
484a3f6147 coll/ml: fix issues identified by the clang static analyser and fix
a segmentation fault in the reduce cleanup

Some of the changes address false warnings produced by scan-build. I
added asserts and changed some malloc calls to calloc to silence these
warnings.

There was one issue in cleanup for reduce since the component_functions
member is changed by the allreduce call. There may be other issues
with how this code works but releasing the allocated
component_functions after setting up the static functions addresses
the primary issue (SIGSEGV).

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31417.
2014-04-16 22:43:35 +00:00
Jeff Squyres
6521dcc4f1 Trivial defensive programming/style update: use {}, even for 1-line blocks.
This commit was SVN r31361.
2014-04-09 16:28:31 +00:00
George Bosilca
95a4f219ea This commit fixes some of the Coverity reported warnings. I addressed
some of the collective modules, the shared memory and the profiling
interface. I left out VT, dynamic fcoll and seq rmaps.

cmr=v1.8.1:reviewer=jsquyres:subject=silence Coverity reported warnings

This commit was SVN r31309.
2014-04-06 18:23:49 +00:00
Nathan Hjelm
71bdb8c439 coll/ml: fix some warnings identified by clang
cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31285.
2014-03-28 22:31:41 +00:00
Nathan Hjelm
459431622b Revert "coll/ml: there is no reason not to enable coll/ml when a process is not"
Discussed this with Manju and we decided to back this one out until a later time.

This reverts commit r31188 and closes trac:4435

This commit was SVN r31282.

The following SVN revision numbers were found above:
  r31188 --> open-mpi/ompi@f1dd589092

The following Trac tickets were found above:
  Ticket 4435 --> https://svn.open-mpi.org/trac/ompi/ticket/4435
2014-03-28 21:16:34 +00:00
Manjunath Gorentla Venkata
28609d3ac2 Clean warning in sbgp and coll ml
This commit was SVN r31280.
2014-03-28 19:53:36 +00:00
Manjunath Gorentla Venkata
8c849ee991 coll/ml : Replace longer error message with opal_show_help; thanks Jeff for identifying those
This commit was SVN r31279.
2014-03-28 19:25:54 +00:00
Nathan Hjelm
a9fb4976d5 coll/ml: more fixes
There were a couple of issues with the memory leak fixes and several more verbose
issues. This fixes those issues.

cmr=v1.8.1:ticket=trac:4473

This commit was SVN r31273.

The following Trac tickets were found above:
  Ticket 4473 --> https://svn.open-mpi.org/trac/ompi/ticket/4473
2014-03-28 18:31:28 +00:00
Nathan Hjelm
bd3b550c6d coll/ml: fix leaks
Thanks to ggouaillardet for finding and fixing these issues.

Closes trac:4460

cmr=v1.8.1:reviewer=manjugv

This commit was SVN r31264.

The following Trac tickets were found above:
  Ticket 4460 --> https://svn.open-mpi.org/trac/ompi/ticket/4460
2014-03-27 23:25:31 +00:00
Nathan Hjelm
0cccb2fb59 coll/ml: reduce noise from coll/ml error messages
The error doesn't prevent the user from running so there is no reason
to display it unless the user requested it (through coll_ml_verbose).

cmr=v1.8:reviewer=jsquyres

This commit was SVN r31242.
2014-03-26 22:50:06 +00:00
Nathan Hjelm
15a8c9d7b8 coll/ml: addendum to r31189. increment the bcol_index
cmr=v1.8:ticket=trac:4436

This commit was SVN r31193.

The following SVN revision numbers were found above:
  r31189 --> open-mpi/ompi@c7d830f4b9

The following Trac tickets were found above:
  Ticket 4436 --> https://svn.open-mpi.org/trac/ompi/ticket/4436
2014-03-21 22:03:56 +00:00
Nathan Hjelm
c7d830f4b9 coll/ml: improve the buffer size calculation and ensure the bcol_index in
a hierarchy actually matches a bcol that is in use.

There was a bug in one of the paths to calculate the ml buffer size. I fixed
the bug and squashed all the paths together to avoid further issues (the
result was correct in another path that calculated the same value).

Additionally, the i_hier was being used as the bcol_index. This is not
correct in a couple of cases so I added a variable to keep track of the
real bcol_index.

cmr=v1.8:reviewer=pasha

This commit was SVN r31189.
2014-03-21 21:54:28 +00:00
Nathan Hjelm
f1dd589092 coll/ml: there is no reason not to enable coll/ml when a process is not
bound.

This case is correctly handled by coll/ml so remove the check that disables
coll/ml in the not bound case.

cmr=v1.8:reviewer=manjugv

This commit was SVN r31188.
2014-03-21 21:54:21 +00:00
Nathan Hjelm
08bbdcbf61 coll/ml: fix leaks in coll/ml resources
This patch fixes two leaks:

 - Fix typo in fallback collective code that caused coll/ml to retain
   the ibcast module twice but only release it once. One of those ibcast
   saves was supposed to be bcast.

 - Do not check for module initialization in the module destructor. It
   is possible to destruct a module that is partially setup.

cmr=v1.8:reviewer=manjugv

This commit was SVN r31187.
2014-03-21 21:54:14 +00:00