1
1

5963 Коммитов

Автор SHA1 Сообщение Дата
Artem Polyakov
fc17deca43 Fix NBC iBarrier for inter-communicators.
Remove send of the extra message. This bug hase triggered on
MPICH/coll/nbicbarrier test. In this test a series of communicators
are created.
This extre-message was reseived after original communicator was destroyed
and queued into non_existing_communicator_pending. When new completely
unrelated communicator with the same id as original was created this message
was pushed into the frags_cant_match queue and caused seq numbers skew and hang.
2015-12-12 13:27:31 +06:00
Gilles Gouaillardet
3a3b13ea12 coll/base: fix an integer overflow in ompi_coll_base_reduce_generic
Refs open-mpi/ompi#1198
2015-12-11 13:55:59 +09:00
Alina Sklarevich
3ffd8dcd20 PML UCX: fix typo (following 7becc54d). 2015-12-10 13:51:10 +02:00
Nathan Hjelm
dae3746d2f Merge pull request #1190 from kawashima-fj/pr/sm-win-test-fix
osc/sm: Fix a bug that `MPI_WIN_TEST` does not update `flag` to 0
2015-12-08 06:39:16 -07:00
KAWASHIMA Takahiro
9c7b6a4352 osc/sm: Fix a bug that MPI_WIN_TEST does not update flag to 0.
`MPI_WIN_TEST` must update the `flag` parameter to 0 when not all
origin processes called `MPI_WIN_COMPLETE`. But sm OSC doesn't.
If the caller initialize the `flag` argument to a non-0 value,
the caller will receive the non-0 `flag` value.
2015-12-08 19:23:21 +09:00
Gilles Gouaillardet
59a361b781 ompio: correctly handle zero f_cc_size in mca_io_ompio_simple_grouping 2015-12-08 17:00:11 +09:00
Nathan Hjelm
f68c315188 pml/ob1: add missing ompi_request_wait_completion for buffered sends
This commit adds a call to ompi_request_wait_completion for buffered
sends. Without this line it is possible to get into a state where the
data is never sent.

Fixes open-mpi/ompi#1185

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-12-07 22:28:07 -07:00
Gilles Gouaillardet
bfe8e03d9d fcoll/two_phase: use ompi_mpi_abort instead of PMPI_Abort
Thanks Jeff for the review
2015-12-07 11:34:36 +09:00
Gilles Gouaillardet
37c978f5e9 coll/libnbc: correctly handle changed types.
this fixes open-mpi/ompi@d816d1c194
thanks Jeff for the review
2015-12-07 10:13:43 +09:00
George Bosilca
3a9664ac9d Fix Coverity CIDs 1341584-1341589. 2015-12-06 14:06:36 -05:00
bosilca
8fee96c086 Merge pull request #1091 from bosilca/topic/datatype_span
Cleanup the temporary memory allocation in collectives
2015-12-03 19:25:04 -05:00
Gilles Gouaillardet
a5440ade5f topo/treematch: do not invoke hwloc_topology_{init,load}
* this is not necessary
 * this overwrites existing topology, that could be different if hwloc_base_topo_file is used
2015-12-03 11:24:32 +09:00
George Bosilca
688108cf7f Patch submitted by @ggouaillardet on ticket #1091. 2015-12-02 20:42:18 -05:00
George Bosilca
4d00c59b2e Cleanup the memory handling for temporary buffers in
some of the collective modules. Added a new function
opan_datatype_span, to compute the memory span of
count number of datatype, excluding the gaps in the
beginning and at the end. If a memory allocation is
made using the returned value, the gap (also returned)
should be removed from the allocated pointer.
2015-12-02 20:42:18 -05:00
Jeff Squyres
15325c8094 op/x86: change the owner to Ralph
Cisco no longer cares about this component, but Intel might.
Transferring ownership to Ralph.
2015-12-01 15:08:07 -08:00
Nathan Hjelm
5334d22a37 ompi/group: release ompi_proc_t's at group destruction
This commit changes the way ompi_proc_t's are retained/released by
ompi_group_t's. Before this change ompi_proc_t's were retained once
for the group and then once for each retain of a group. This method
adds unnecessary overhead (need to traverse the group list each time
the group is retained) and causes problems when using an async
add_procs.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-30 23:03:47 -07:00
Ryan Grant
324534b191 Merge pull request #1161 from tkordenbrock/topic/add.triggered.scatter
coll-portals4: add scatter and iscatter implementations that use Portals4 triggered operations
2015-11-30 16:53:47 -07:00
Todd Kordenbrock
4721b70dd5 coll-portals4: add scatter and iscatter implementations that use Portals4 triggered operations
This commit adds implementations of scatter and iscatter using
Portals4 triggered operations.  Currently, the only algorithm
is linear.
2015-11-30 15:07:18 -06:00
Todd Kordenbrock
f6f525e0d8 coll-portals4: remove unneeded code from gather
This commit removes two pieces of unneeded code from gather.  First
it removes destroy_tree() calls from linear_top(), because the
linear algorithm does not create a tree, so there is no need to
destroy it.  Second it removes unpack_bytes from the gather request
because it was calculated but never used.
2015-11-30 10:38:51 -06:00
Gilles Gouaillardet
80f02518ff topo/base: correctly free the topo object in mca_topo_base_dist_graph_create_adjacent 2015-11-30 15:33:59 +09:00
Ryan Grant
81d482dca6 Merge pull request #1137 from francois-wellenreiter/trig_mtl_rdv
MTL portals4 : improve the rendez-vous protocol using PtlTriggeredGet…
2015-11-24 17:31:31 -07:00
Ryan Grant
219581e87e Merge pull request #1090 from tkordenbrock/topic/check.for.invalid.handles.in.finalize
mtl-portals4: test for valid handle before releasing resources
2015-11-20 07:54:44 -06:00
Mike Dubman
c544620a7c Merge pull request #1138 from igor-ivanov/pr/yalla-valgrind
yalla: fix valgrind error due to uninitialized status field.
2015-11-20 07:19:11 -05:00
Gilles Gouaillardet
002c7b8b3a fcoll/two_phase: use PMPI_* insted of MPI_* 2015-11-20 13:46:19 +09:00
Gilles Gouaillardet
561e7f6647 vprotocol/pessimist: use internal ompi_* insted of MPI_* 2015-11-20 13:46:19 +09:00
Gilles Gouaillardet
025fd8a9fc osc: use PMPI_* insted of MPI_* 2015-11-20 13:46:19 +09:00
Gilles Gouaillardet
d816d1c194 coll/libnbc: use PMPI_* and internal ompi_* insted of MPI_* 2015-11-20 13:46:19 +09:00
yosefe
3bb1270715 yalla: fix valgrind error due to uninitialized status field. 2015-11-19 10:59:31 +02:00
Francois WELLENREITER
9126ea5e82 MTL portals4 : improve the rendez-vous protocol using PtlTriggeredGet operation 2015-11-19 09:52:53 +01:00
Edgar Gabriel
9e5ade4e8b argh, a debugging sleep statement got into the source code. 2015-11-16 13:26:57 -06:00
Edgar Gabriel
dbfbcdecd5 make adjustments for the default settings of grouping parameters and the default contiguous group size option.
minor bug fix in the simple grouping strategy.
2015-11-16 08:17:27 -06:00
Edgar Gabriel
27628774c7 add a new option for a simple aggregator selection which has zero communication costs. 2015-11-16 08:17:26 -06:00
Edgar Gabriel
66c1ea5fcb change the default value of the grouping option. Also add new grouping option which skips the refinement step in the aggregator selection. 2015-11-16 08:17:23 -06:00
Edgar Gabriel
e8e117503d reduce the communication volume during MPI_File_set_view 2015-11-16 08:17:22 -06:00
yohann
005400a937 mtl/ofi: Make sure the resources are managed by the provider. 2015-11-13 16:16:58 -08:00
Nathan Hjelm
9ef0821856 osc/rdma: fix some threading bugs
There were two bugs in osc/rdma when using threads:

 - Deadlock is ompi_osc_rdma_start_atomic. This occurs because
   ompi_osc_rdma_frag_alloc is called with the module lock. To fix the
   issue the module lock is now recursive. In the future I will add a
   new lock to protect just the current rdma fragment.

 - Do not drop the lock in ompi_osc_rdma_frag_alloc when calling
   ompi_osc_rdma_frag_complete. Not only is it not needed but dropping
   the lock at this point can cause a competing thread to mess up the
   state.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-12 20:25:57 -07:00
Yossi
b750b72a81 Merge pull request #1127 from yosefe/topic/pml-ucx-implement-cancel
pml_ucx: implement cancel, and add small optimizations.
2015-11-12 10:50:48 +02:00
yosefe
7becc54d67 pml_ucx: fix typo. 2015-11-12 09:57:41 +02:00
Todd Kordenbrock
b9630f802b Merge pull request #1120 from francois-wellenreiter/mtl_min_mdbind
mtl-portals4 : remove useless PtlMDBind PtlMDRelease calls for rendez-vous messages
2015-11-10 14:34:19 -06:00
yosefe
d66b01d380 pml_ucx: implement cancel, and add small optimizations. 2015-11-10 17:40:06 +02:00
Gilles Gouaillardet
d6ff25b9a2 pml/monitoring: initialize common symbols 2015-11-10 13:58:54 +09:00
Jeff Squyres
e89ecac83c bml r2: fix exclusivity comparison
Fixes open-mpi/ompi#1106
2015-11-06 13:26:32 -08:00
Francois WELLENREITER
b301b49a40 MTL portals4 : remove useless PtlMDBind PtlMDRelease calls for rendez-vous messages 2015-11-06 15:55:44 +01:00
yosefe
45c3d04857 pml_ucx: fix request construct/destruct.
We should invoke OBJ_CONTRUCT/OBJ_DESTRUCT only on regular requests
(which are embedded inside UCX requests) and for the completed request.
Persistent requests are already constructed/destructed by the free list.
This fixes an assertion in ompi_request_destruct.
2015-11-04 11:03:46 +02:00
Todd Kordenbrock
cefe50cf54 mtl-portals4: test for valid handle before releasing resources
During component finalize, mtl-portals4 would blindly release
resources without testing if the handle was valid.  This was OK,
but resource allocation is now delayed until add_procs().  If
mtl-portals4 is deselected, it will be finalized without
add_procs() ever being called.  This commit ensures that invalid
handles are not released.
2015-11-02 21:01:14 -06:00
George Bosilca
5c60e76669 Fix Coverity CIDs 1338021, 1338020, 1338019, 1338018. 2015-11-02 17:38:51 -05:00
George Bosilca
b77c203068 Add more comments and restore the progress, flags, max tag, and max
context_id from the original PML.
2015-10-31 17:13:35 -04:00
George Bosilca
3efd494972 Make sure the monitoring infrastructure works well with the
new dynamic add_procs.
2015-10-31 17:13:35 -04:00
Guillaume Papauré
86714ad91e change pml_monitoring_messages_count and pml_monitoring_messages_size pvars to use the start/stop features 2015-10-31 17:13:35 -04:00
George Bosilca
a43c2ce529 Fully integrate the monitoring with the MPI_T PVAR.
Writing to the pml_monitoring_flush variable will set the filename of
the output file.
Stopping a session for the pml_monitoring_flush will force the
generation of the nobitoring output file (as long as the filename
is not NULL).
To reset the monitoring, une has to bind the pml_monitoring_flush to a
session.
2015-10-31 17:13:35 -04:00