1
1
Граф коммитов

8857 Коммитов

Автор SHA1 Сообщение Дата
Mike Dubman
c544620a7c Merge pull request #1138 from igor-ivanov/pr/yalla-valgrind
yalla: fix valgrind error due to uninitialized status field.
2015-11-20 07:19:11 -05:00
Gilles Gouaillardet
002c7b8b3a fcoll/two_phase: use PMPI_* insted of MPI_* 2015-11-20 13:46:19 +09:00
Gilles Gouaillardet
561e7f6647 vprotocol/pessimist: use internal ompi_* insted of MPI_* 2015-11-20 13:46:19 +09:00
Gilles Gouaillardet
025fd8a9fc osc: use PMPI_* insted of MPI_* 2015-11-20 13:46:19 +09:00
Gilles Gouaillardet
d816d1c194 coll/libnbc: use PMPI_* and internal ompi_* insted of MPI_* 2015-11-20 13:46:19 +09:00
yosefe
3bb1270715 yalla: fix valgrind error due to uninitialized status field. 2015-11-19 10:59:31 +02:00
Francois WELLENREITER
9126ea5e82 MTL portals4 : improve the rendez-vous protocol using PtlTriggeredGet operation 2015-11-19 09:52:53 +01:00
Edgar Gabriel
9e5ade4e8b argh, a debugging sleep statement got into the source code. 2015-11-16 13:26:57 -06:00
Edgar Gabriel
dbfbcdecd5 make adjustments for the default settings of grouping parameters and the default contiguous group size option.
minor bug fix in the simple grouping strategy.
2015-11-16 08:17:27 -06:00
Edgar Gabriel
27628774c7 add a new option for a simple aggregator selection which has zero communication costs. 2015-11-16 08:17:26 -06:00
Edgar Gabriel
66c1ea5fcb change the default value of the grouping option. Also add new grouping option which skips the refinement step in the aggregator selection. 2015-11-16 08:17:23 -06:00
Edgar Gabriel
e8e117503d reduce the communication volume during MPI_File_set_view 2015-11-16 08:17:22 -06:00
yohann
005400a937 mtl/ofi: Make sure the resources are managed by the provider. 2015-11-13 16:16:58 -08:00
Nathan Hjelm
9ef0821856 osc/rdma: fix some threading bugs
There were two bugs in osc/rdma when using threads:

 - Deadlock is ompi_osc_rdma_start_atomic. This occurs because
   ompi_osc_rdma_frag_alloc is called with the module lock. To fix the
   issue the module lock is now recursive. In the future I will add a
   new lock to protect just the current rdma fragment.

 - Do not drop the lock in ompi_osc_rdma_frag_alloc when calling
   ompi_osc_rdma_frag_complete. Not only is it not needed but dropping
   the lock at this point can cause a competing thread to mess up the
   state.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-11-12 20:25:57 -07:00
Yossi
b750b72a81 Merge pull request #1127 from yosefe/topic/pml-ucx-implement-cancel
pml_ucx: implement cancel, and add small optimizations.
2015-11-12 10:50:48 +02:00
yosefe
7becc54d67 pml_ucx: fix typo. 2015-11-12 09:57:41 +02:00
Todd Kordenbrock
b9630f802b Merge pull request #1120 from francois-wellenreiter/mtl_min_mdbind
mtl-portals4 : remove useless PtlMDBind PtlMDRelease calls for rendez-vous messages
2015-11-10 14:34:19 -06:00
yosefe
d66b01d380 pml_ucx: implement cancel, and add small optimizations. 2015-11-10 17:40:06 +02:00
Gilles Gouaillardet
d6ff25b9a2 pml/monitoring: initialize common symbols 2015-11-10 13:58:54 +09:00
Jeff Squyres
e89ecac83c bml r2: fix exclusivity comparison
Fixes open-mpi/ompi#1106
2015-11-06 13:26:32 -08:00
Francois WELLENREITER
b301b49a40 MTL portals4 : remove useless PtlMDBind PtlMDRelease calls for rendez-vous messages 2015-11-06 15:55:44 +01:00
Ralph Castain
bfdf08ae86 Fix intercomm_create by ensuring that both sides know how to translate jobid to/from nspace
Return something just to ensure that pack is happy
2015-11-06 02:19:45 -08:00
Nathan Hjelm
fda5daf453 Merge pull request #1096 from kawashima-fj/pr/fortran-var-type-fix
Fix Fortran variable types
2015-11-05 14:27:40 -07:00
Nathan Hjelm
acf3cb9b9b Merge pull request #1095 from kawashima-fj/pr/trivial-fixes
Some trivial fixes
2015-11-04 09:45:59 -07:00
yosefe
45c3d04857 pml_ucx: fix request construct/destruct.
We should invoke OBJ_CONTRUCT/OBJ_DESTRUCT only on regular requests
(which are embedded inside UCX requests) and for the completed request.
Persistent requests are already constructed/destructed by the free list.
This fixes an assertion in ompi_request_destruct.
2015-11-04 11:03:46 +02:00
KAWASHIMA Takahiro
c09f9f05d3 mpi/tool: Fix an incorrect type cast.
This bug caused an invalid result value on `MPI_T_cvar_read`
on big-endian machines or for large (>=2Gi) cvar values.
2015-11-04 11:28:43 +09:00
KAWASHIMA Takahiro
384f4b51d1 fortran: Fix: missing dimension(*) in (I)NEIGHBOR_ALLTOALLW. 2015-11-04 10:38:25 +09:00
KAWASHIMA Takahiro
1092eabfab fortran: Update comment.
The structure was changed in commit 9c77c6b.
2015-11-04 10:38:25 +09:00
KAWASHIMA Takahiro
107c0073dd fortran: Fix: MPI_UNWEIGHTED and MPI_WEIGHTS_EMPTY should be arrays.
Without this modification, gfortran throw the following error
if these variables are used for `MPI_DIST_GRAPH_CREATE_ADJACENT` or
`MPI_DIST_GRAPH_CREATE_ADJACENT`.

  Error: There is no specific subroutine for the generic
  'mpi_dist_graph_create_adjacent' at (1)
2015-11-04 10:38:25 +09:00
KAWASHIMA Takahiro
d5e1f40a1e fortran: Fix: info should be an integer parameter. 2015-11-04 10:38:24 +09:00
KAWASHIMA Takahiro
9bf93810d7 fortran: Fix: array dimension of MPI_ARGVS_NULL.
`MPI_ARGVS_NULL` should be a two-dimensional array.
Without this modification, gfortran throw the following error
if `MPI_ARGVS_NULL` is used for `MPI_COMM_SPAWN_MULTIPLE`.

  Error: There is no specific subroutine for the generic
  'mpi_comm_spawn_multiple' at (1)
2015-11-04 10:38:24 +09:00
George Bosilca
b14212f142 Fix Coverity issue 1338059. 2015-11-02 22:51:52 -05:00
Todd Kordenbrock
cefe50cf54 mtl-portals4: test for valid handle before releasing resources
During component finalize, mtl-portals4 would blindly release
resources without testing if the handle was valid.  This was OK,
but resource allocation is now delayed until add_procs().  If
mtl-portals4 is deselected, it will be finalized without
add_procs() ever being called.  This commit ensures that invalid
handles are not released.
2015-11-02 21:01:14 -06:00
George Bosilca
5c60e76669 Fix Coverity CIDs 1338021, 1338020, 1338019, 1338018. 2015-11-02 17:38:51 -05:00
bosilca
f1a5362f94 Merge pull request #1072 from bosilca/topic/resized
Fix for the subarray and darray type creation issue.
2015-11-01 21:17:03 -05:00
George Bosilca
b77c203068 Add more comments and restore the progress, flags, max tag, and max
context_id from the original PML.
2015-10-31 17:13:35 -04:00
George Bosilca
3efd494972 Make sure the monitoring infrastructure works well with the
new dynamic add_procs.
2015-10-31 17:13:35 -04:00
Guillaume Papauré
86714ad91e change pml_monitoring_messages_count and pml_monitoring_messages_size pvars to use the start/stop features 2015-10-31 17:13:35 -04:00
George Bosilca
a43c2ce529 Fully integrate the monitoring with the MPI_T PVAR.
Writing to the pml_monitoring_flush variable will set the filename of
the output file.
Stopping a session for the pml_monitoring_flush will force the
generation of the nobitoring output file (as long as the filename
is not NULL).
To reset the monitoring, une has to bind the pml_monitoring_flush to a
session.
2015-10-31 17:13:35 -04:00
George Bosilca
646a662721 Use the new group interface and add const to the PML send functions. 2015-10-31 17:13:35 -04:00
George Bosilca
5224a7ce4d Allow the pvar to be written by invoking the associated callback.
Use a PVAR to generate the monitoring dump of the information into a
file.

Use the PVAR to instruct the PML monitoring when to do the dump.
2015-10-31 17:13:35 -04:00
George Bosilca
df167f4177 Rewrite the close logic to be more clean and cleaner. 2015-10-31 17:13:35 -04:00
George Bosilca
c801ffde86 Use MPI_T variables to handle the flush in a more MPI-blessed way.
Code cleanup.

Update the monitoring test to use MPI_T variables.
2015-10-31 17:13:35 -04:00
George Bosilca
4f88c82500 Fix a convertion problem and add a comment about the lack of component
retain in the new component infrastructure.

Clean Makefile.am to fix "make distcheck".

Update the gitignore rules.
2015-10-31 17:13:35 -04:00
George Bosilca
80343a0d39 add ability to querry pml monitorinting results with MPI Tools interface
using performance variables "pml_monitoring_messages_count" and
"pml_monitoring_messages_size"

Per Brice suggestion make all data count and message length be
uint64_t.
2015-10-31 17:13:35 -04:00
George Bosilca
a47d69202f Add a monitoring PML. This PML track all data exchanges by the processes
counting or not the collective traffic as a separate entity. The need
for such a PML is simply because the PMPI interface doesn't allow us to
identify the collective generated traffic.
2015-10-31 17:13:35 -04:00
Rolf vandeVaart
578385ca78 Merge pull request #1079 from rolfv/pr/cuda-require-41
Make CUDA 4.1 a requirement for CUDA-aware support
2015-10-29 12:56:22 -04:00
Nathan Hjelm
b1e3936261 Merge pull request #1078 from rolfv/pr/disable-osc-rdma-for-cuda
Disable the use of osc rdma when we detect a GPU buffer
2015-10-29 10:03:28 -06:00
Rolf vandeVaart
f2ff6e03ab Make CUDA 4.1 a requirement for CUDA-aware support.
Remove all related preprocessor conditionals.
2015-10-29 11:24:02 -04:00
Matias Cabral
8ebcac1b2c Merge pull request #1075 from matcabral/psm2_symbol_rename
Updated psm2 mtl with new externally exposed symbols of psm2.so 
Fixes open-mpi/ompi#1018
Fixes open-mpi/ompi#1021
2015-10-28 13:55:45 -07:00
Rolf vandeVaart
87a4cc6118 Disable the use of osc rdma when we detect a GPU buffer as it is not supported in that component.
This forces a failover to the osc pt2pt component. Fixes #1012
2015-10-28 14:47:45 -04:00
yosefe
ae738d0434 pml_ucx: add pmi fence in del_procs 2015-10-28 18:34:36 +02:00
Matias A Cabral
ed16d8e1cc Updated psm2 mtl with new externally exposed symbols of psm2.so
Fixes open-mpi/ompi#1018
Fixes open-mpi/ompi#1021
2015-10-28 09:12:33 -07:00
yosefe
41b6230be3 pml_ucx: fix debug macros, and initialize mpi request properly. 2015-10-28 10:59:25 +02:00
Ralph Castain
267ca8fcd3 Cleanup the PMIx direct modex support. Add an MCA parameter pmix_base_async_modex that will cause the async modex to be used when set to 1. Default it to 0 for now
to continue current default behavior.

Also add an MCA param pmix_base_collect_data to direct that the blocking fence shall return all data to each process. Obviously, this param has no effect if async_
modex is used.
2015-10-27 17:31:56 -07:00
yohann
8bf1c95cdc mtl/ofi: Remove unused help messages. 2015-10-27 09:38:04 -07:00
Nathan Hjelm
69d403d42b Merge pull request #1054 from hjelmn/add_procs_threading
add_procs: add threading protection for dynamic add_procs
2015-10-27 09:27:13 -06:00
George Bosilca
679dc9b437 Fix the subarray and darray type creation. Include a
small patch provided by Gilles.
2015-10-26 23:44:26 -04:00
yohann
a111d66f0f mtl/ofi: Change hints to FI_PROGRESS_MANUAL. 2015-10-26 15:32:30 -07:00
yohann
fde8b89ceb mtl/ofi: Use OFI's representation of ANY_SRC instead of NULL. 2015-10-26 14:38:41 -07:00
yohann
4246de4508 mtl/ofi: Treat error correctly. 2015-10-26 14:38:33 -07:00
George Bosilca
2622b9d3a1 Fix minor issues in the treematch topo
based on a patch provided by Guillaume.
2015-10-25 21:38:59 -04:00
Gilles Gouaillardet
1105634ca1 mpi_f08: fix MPI_WIN_{ATTACH,DETACH} bindings
fixes INTENT from open-mpi/ompi@9600e2bc63

Thanks Jeff for pointing this !
2015-10-26 10:02:45 +09:00
George Bosilca
4ac247b1da Minor updated on the validity checks for the alltoall collectives. 2015-10-24 15:25:28 -04:00
Jeff Squyres
140cf90e3e osc_rdma: minor compiler warning stomp 2015-10-23 06:21:56 -07:00
Ralph Castain
4c12022a50 Silence a couple of warnings from valgrind and compilers. Since some pmix components may return success with a NULL value from a "get", check for that situation before attempting to unload the data. Preset the hostname before calling modex_recv to get it so unload properly checks for NULL. Cast a returned value to the correct ompi_proc_t pointer 2015-10-22 20:56:02 -07:00
Nathan Hjelm
9dad35b467 Merge pull request #1061 from hjelmn/osc_fixes
one-sided fixes
2015-10-22 18:23:19 -06:00
Nathan Hjelm
63e744ffc6 osc/rdma: use only a single btl registration for local state
This commit fixes a bug that can occur on Cray Gemini networks. If
multiple registrations are used for the local state then we looks the
atomicity guarantees. To avoid issues like this use only a single
registration handle for all local state on a node.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-22 15:51:19 -06:00
Nathan Hjelm
f690fc8fd5 osc/pt2pt: fix warnings
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-22 15:50:40 -06:00
Nathan Hjelm
e716866e0c Merge pull request #1057 from hjelmn/binding_fix
win: fix erroneous argument check
2015-10-22 15:15:43 -06:00
Jeff Squyres
86270e7613 MPI_File_open: add note about allowable chars in filenames
Thanks to @nasailja for the original text suggestion.
2015-10-22 11:56:53 -07:00
Nathan Hjelm
e4219aa692 Merge pull request #1059 from hjelmn/osc_fixes
osc/rdma: bug fixes
2015-10-22 11:25:51 -06:00
Nathan Hjelm
97c9732bad osc/rdma: bug fixes
This commit fixes the following:

 - CIDs 1328491, 1328492: Dead code caused by typos in a prior
   commit.

 - Fix the calculation of dynamic memory regions. This was causes
   incorrect RMA range errors when accessing the last partial page of
   an attachment.

 - Fix a SEGV when using dynamic memory windows with local state (all
   processes on the same node).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-22 09:49:38 -06:00
yohann
889c76634e mtl/ofi: Increase priority. 2015-10-22 08:39:36 -07:00
Nathan Hjelm
6ae57647ab win: fix erroneous argument check
When using dynamic memory windows the displacement becomes a
pointer. Since the high bit may be set on valid pointers on some
platforms the check for disp > 0 is invalid. This commit adds the
window flavor to ompi_win_t and disables the displacement check when
operating on dynamic memory windows.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-22 09:33:26 -06:00
Howard Pritchard
ce8e241922 Merge pull request #1055 from nrgraham23/java_warnings_fix
Fix Java related warnings
2015-10-22 08:17:45 -06:00
Nathan Hjelm
b2fa2a9bef Merge pull request #1056 from hjelmn/osc_fixes
osc/pt2pt: reset all_sync sync object before sending complete messages
2015-10-21 19:40:28 -06:00
Nathan Hjelm
864f88a2a3 osc/pt2pt: reset all_sync sync object before sending complete messages
This commit fixes a bug that occurs when a post message comes in when
sending complete messages or while waiting for all outgoing messages
to flush. In that case the post message might get incorrecly
associated with the ending sync object.

References open-mpi/ompi#1012

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-21 18:30:08 -06:00
Nathaniel Graham
c4d70ab425 Fix Java related warnings
This commit fixes java related warnings.

Fixes #881

Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2015-10-21 17:14:25 -07:00
Nathan Hjelm
08e267b811 add_procs: add threading protection for dynamic add_procs
This commit add protection to the group, ob1, and bml endpoint lookup
code. For ob1 and the bml a lock has been added. For performance
reasons the lock is only held if a bml or ob1 endpoint does not
exist. ompi_group_dense_lookup no uses opal_atomic_cmpset to ensure
the proc is only retained by the thread that actually updates the
group.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-21 16:13:41 -06:00
Nathan Hjelm
386991d590 Merge pull request #1052 from hjelmn/osc_rdma_fixes
osc/rdma: use standard verbosity levels
2015-10-21 13:21:50 -06:00
Nathan Hjelm
9476c7bbca osc/rdma: use standard verbosity levels
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-21 12:31:41 -06:00
yohann
abe5002ee9 mtl/ofi: remove threading and progress hints. 2015-10-21 10:25:08 -07:00
yosefe
cc76db8d39 ucx: reduce components priority to 5. 2015-10-21 17:38:25 +03:00
Gilles Gouaillardet
a0782e1c7e mpi: MPI_Neighbor_all* and MPI_Ineighbor_all* do not work with
inter communicators (fail with MPI_ERR_COMM) or non process topologies
communicators (fail with MPI_ERR_TOPOLOGY)
2015-10-21 16:21:19 +09:00
Mike Dubman
4ea13f10f6 Merge pull request #1008 from alex-mikheev/topic/ucx_support
UCX support for ompi and oshmem
2015-10-21 09:33:33 +03:00
Gilles Gouaillardet
3b0b929883 ompi: MPI_IN_PLACE is not a valid argument of MPI_Neighbor_all* and MPI_Ineighbor_all* 2015-10-21 14:46:35 +09:00
Gilles Gouaillardet
256976a108 mpi: MPI_IN_PLACE is not a valid argument of MPI_All* and MPI_Iall* with an inter communicator 2015-10-21 14:46:28 +09:00
Gilles Gouaillardet
9b31530d5c man: misc fix of revamp of MPI_File_* and MPI_Register_datarep man pages
that fixes commit open-mpi/ompi@b17c89c1e6 :
- remove unnecessary empty lines
- add USE mpi in MPI_Register_datarep

Thanks Jeff for noticing this
2015-10-21 09:41:01 +09:00
Nathan Hjelm
763744a32c Merge pull request #1046 from hjelmn/osc_rdma_fixes
osc/rdma: bug fixes
2015-10-20 16:44:38 -06:00
Nathan Hjelm
b8ee05d352 osc/rdma: bug fixes
This commit fixes several bugs in the osc/rdma component:

 - Complete aggregated requests immediately. Completion of RMA
   requests indicates local completion anyway. This fixes a hang in
   the c_reqops test.

 - Correctly mark Rget_accumulate requests.

 - Set the local base flag correctly on the local peer.

 - Clear or set the no locks flag on the window if the value is
   changed by MPI_Win_set_info.

 - Actually update the target when using MPI_OP_REPLACE.

Fixes open-mpi/ompi#1010

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-20 15:27:15 -06:00
Ryan Grant
f60c506c68 Merge pull request #999 from tkordenbrock/topic/add.triggered.gather
coll-portals4: add gather and igather implementations that use Portals4 triggered operations
2015-10-20 14:59:09 -06:00
yosefe
a313588337 ompi: Add UCX PML. 2015-10-20 19:46:06 +03:00
yosefe
502dc8aaa4 add pml-specific field in OMPI datatype.
PML UCX will use it to cache a handle for UCX datatype.
2015-10-20 19:46:06 +03:00
Jeff Squyres
630d6bf800 Merge pull request #1038 from kawashima-fj/pr/man-correction
man: Various manpage corrections
2015-10-20 06:40:05 -04:00
KAWASHIMA Takahiro
7ab464fbb4 Revert "man: Remove unnecessary spaces in front of parameters."
This reverts commit 3253a30ab2.

Because Gilles' b17c89c1 committed a few hours ago has the same change,
my RP branch had a conflict.
2015-10-20 15:32:45 +09:00
KAWASHIMA Takahiro
373a94a3f1 man: Revert my MPI_File_iwrite_shared manpage change.
This reverts commit 2226cdb3da
and                 d9c93c9f5d.

Because Gilles' b17c89c1 committed a few hours ago has the same change,
my RP branch had a conflict.
2015-10-20 14:41:22 +09:00
Gilles Gouaillardet
2bd77ed4f9 mpi: fail with MPI_ERR_INTERN if MPI_IN_PLACE is used with MPI_I*alltoall*
currently, MPI fails with MPI_ERR_ARG. This is counter intuitive since
MPI_IN_PLACE is a legitimate parameter. MPI_IN_PLACE might not be correctly
implemented by all the non blocking modules (libnbc, ...) so fail with
MPI_ERR_INTERN for the time being.
2015-10-20 14:12:33 +09:00
Gilles Gouaillardet
b17c89c1e6 man: revamp MPI_File_* and MPI_Register_datarep man pages
- suggest USE mpi instead of INCLUDE 'mpif.h'
- fix indentation
Thanks Jeff for pointing this issue.
2015-10-20 13:12:12 +09:00
KAWASHIMA Takahiro
d9c93c9f5d man: Add const that is removed accidentally in 2226cdb. 2015-10-20 08:49:10 +09:00
Nathan Hjelm
9602484568 Merge pull request #1040 from hjelmn/mtl_priority
Change how cm's priority is calculated
2015-10-19 14:18:36 -06:00
Nathan Hjelm
53f6b57c0a pml/cm: use the priority of the mtl component
This commit changes the priority of mtl components to be relative to
pml/ob1 and updates the mtl interface to expose this priority. cm now
sets its own priority based on the priority of the selected mtl
component.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:42 -06:00
Nathan Hjelm
8b5810f7f7 mca/base: add priority output to mca_base_select
The mca_base_select function uses returned priorities to select the
best component/module. This priority may be of use to the caller so
pass that information back in an optional argument. If the priority is
not needed pass NULL.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:41 -06:00
Nathan Hjelm
bedd80214e pml/ob1: remove priority check
This commit removes code that checks the ob1 priority vs the previous
priority. The previous priority is meaningless here and may only cause
ob1 to disable itself when it shouldn't.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:41 -06:00
Nathan Hjelm
2fd176ac7f cm: fix selection priority
This patch removes a priority check that disables cm if the previous
pml had higher priority. The check was incorrect as coded and is
unnecessary as we finalize all but one pml anyway.

Fixes open-mpi/ompi#1035

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:26 -06:00
Gilles Gouaillardet
0f23037775 coll/base: fix memory allocation in mca_coll_base_alltoall_intra_basic_inplace 2015-10-19 16:47:59 +09:00
KAWASHIMA Takahiro
7fef70c9b8 man: Add MPI_DIST_GRAPH in MPI_TOPO_TEST manpage.
`MPI_DIST_GRAPH` was added in MPI-2.2.
2015-10-19 15:24:12 +09:00
KAWASHIMA Takahiro
d3a29e364c man: Change 'MPI ADDRESS KIND' to 'MPI_ADDRESS_KIND'.
(underscores instead of spaces)
2015-10-19 15:11:06 +09:00
KAWASHIMA Takahiro
dcd14103d5 man: Remove unnecessary spaces in Fortran syntax.
Similar lines of other routines have no space.
2015-10-19 15:04:50 +09:00
KAWASHIMA Takahiro
34c3b5d74d man: Correct the kind of ADDRESS parameter of MPI_GET_ADDRESS. 2015-10-19 15:01:30 +09:00
KAWASHIMA Takahiro
3253a30ab2 man: Remove unnecessary spaces in front of parameters. 2015-10-19 14:48:23 +09:00
KAWASHIMA Takahiro
2226cdb3da man: Correct the routine name of MPI_FILE_IWRITE_SHARED. 2015-10-19 14:44:49 +09:00
KAWASHIMA Takahiro
bffc7b6c8f man: Add man of MPI_Message_{c2f,f2c} and MPI_Op_commutative.
These routines were added in MPI-2.2 but were missing in OMPI man pages.
2015-10-19 13:49:40 +09:00
KAWASHIMA Takahiro
9942d5a933 man: MPI_IBARRIER has two output parameters. 2015-10-19 13:46:03 +09:00
KAWASHIMA Takahiro
953c95e9bb man: Update description of MPI_IN_PLACE of MPI_Exscan.
MPI-2.2 added MPI_IN_PLACE support for MPI_Exscan.
2015-10-19 13:46:03 +09:00
KAWASHIMA Takahiro
5a3b8b34cd man: Remove outdated description. MPI-2.2 is ratified. 2015-10-19 13:46:03 +09:00
KAWASHIMA Takahiro
1261b115e4 man: Fix incorrect nroff markups. 2015-10-19 13:46:03 +09:00
KAWASHIMA Takahiro
9b96209ac5 man: Fix incorrect C++ binding descriptions. 2015-10-19 13:46:03 +09:00
KAWASHIMA Takahiro
ffce87328d man: MPI_Get_version now returns 3.1 instead of 2.1. 2015-10-19 13:46:03 +09:00
KAWASHIMA Takahiro
80a0b30be8 man: Correct wrong argument order. 2015-10-19 13:46:03 +09:00
KAWASHIMA Takahiro
4bbe86b171 man: Fix a typo of an argument. 2015-10-19 13:46:03 +09:00
KAWASHIMA Takahiro
519ddd9ae9 man: Insert missing error classes & Fix incorrect error codes. 2015-10-19 13:46:02 +09:00
Nathan Hjelm
7dac5d36e5 bump fortran mpi version
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-16 11:04:17 -06:00
Nathan Hjelm
467ff3a450 Merge pull request #1024 from hjelmn/mpi_31
bump mpi version to 3.1
2015-10-16 09:35:52 -06:00
Edgar Gabriel
e9be9a1772 Merge pull request #1032 from edgargabriel/pr/macos-sharedfp-sm-fix
limit the number of bytes used for the semaphore name depending on pl…
2015-10-15 15:09:44 -05:00
Edgar Gabriel
f97655f28e make sure the iov buffer is initialized to zero, otherwise bad things can happen for 0-byte contributions on a process. 2015-10-15 12:46:01 -05:00
Jeff Squyres
40b4d5d74d help-mpi-api.txt: remove now-stale help messages 2015-10-15 12:39:16 -04:00
Jeff Squyres
338257a2f4 man: update man pages for Init*/Finalize*
Update language surrounding initialization and finalization in
MPI_Init[_thread], MPI_Initialized, MPI_Finalize, and MPI_Finalized.
2015-10-15 12:39:16 -04:00
Jeff Squyres
f5ad90c920 init/finalize: extensions
Proposed extensions for Open MPI:

- If MPI_INITLIZED is invoked and MPI is only partially initialized,
  wait until MPI is fully initialized before returning.
- If MPI_FINALIZED is invoked and MPI is only partially finalized,
  wait until MPI is fully finalized before returning.
- If the ompi_mpix_allow_multi_init MCA param is true, allow MPI_INIT
  and MPI_INIT_THREAD to be invoked multiple times without error (MPI
  will be safely initialized only the first time it is invoked).
2015-10-15 12:39:15 -04:00
Edgar Gabriel
0177918a08 limit the number of bytes used for the semaphore name depending on platform (31 bytes for MacOS, 252 for Linux) 2015-10-15 11:13:45 -05:00
Nathan Hjelm
341b60dd57 Merge pull request #1029 from kawashima-fj/pr/ob1-fin-memory-leak
pml/ob1: Fix a memory leak regarding pending FIN control messages.
2015-10-15 07:55:52 -06:00
KAWASHIMA Takahiro
66a8bc9e45 fortran/mpif-h: Insert missing weak symbols & Fix incorrect symbol names. 2015-10-15 11:58:41 +09:00
KAWASHIMA Takahiro
4e56505202 pml/ob1: Fix a memory leak regarding pending FIN control messages.
Once a FIN control message is appended to the pending list,
the ob1 PML attempts to send the FIN again in the                               `mca_pml_ob1_process_pending_packets` function.
But if the PML failed to sent the FIN again, the `mca_pml_ob1_send_fin`
function creates a new `mca_pml_ob1_pckt_pending_t` object and the
old object is not retured to the free list.
2015-10-15 11:15:03 +09:00
Jeff Squyres
aceb1ebb47 Merge pull request #1026 from hjelmn/static_mutex
opal static mutex initializers
2015-10-14 22:10:51 -04:00
Jeff Squyres
2b9c9f3093 Fortran: add missing MPI_AINT in mpi_f08 module 2015-10-14 17:32:01 -07:00
Jeff Squyres
8307330e8a Merge pull request #989 from jsquyres/pr/friendly-message-when-dynamics-disabled
Print friendly message when dynamics disabled
2015-10-14 19:52:52 -04:00
Nathan Hjelm
7f7ff8d851 mpit: use opal static mutex initializer
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-14 16:08:42 -06:00
Jeff Squyres
889d80a659 mxm/yalla: disable MPI dynamic process functionality
Disable the MPI dynamic process functionality when these components
are selected to be used.
2015-10-14 13:42:56 -07:00
Jeff Squyres
ac25505e03 mpi: infrastructure to gracefully disable MPI dyn procs
Add ompi_mpi_dynamics_disable() function to disable MPI dynamic
process functionality (i.e., such that if MPI_COMM_SPAWN/etc. are
invoked, you'll get a show_help error explaining that MPI dynamic
process functionality is disabled in this environment -- instead of a
potentially-cryptic network or hardware error).

Fixes #984
2015-10-14 13:42:56 -07:00
Nathan Hjelm
e11f014c6e osc/rdma: fix segmentation fault when running 1 ppn
This commit fixes an issue identified by @rolfv. The local peer was
not being correctly initialized when running with a single process on
a node.

This fixes open-mpi/ompi#1010

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-14 12:40:52 -06:00
Nathan Hjelm
06dd9ec317 bump mpi version to 3.1
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-14 09:52:30 -06:00
Jeff Squyres
5d97d7b5d5 Merge pull request #1017 from jsquyres/pr/fix-cr-exits
dynamics: fix OPAL_CR_EXIT_LIBRARY()
2015-10-14 05:45:05 -04:00
Jeff Squyres
62351f442a help: remove stale help messages and files
Found by contrib/check-help-strings.pl.
2015-10-13 16:50:20 -04:00
Jeff Squyres
a4adee5329 dynamics: fix OPAL_CR_EXIT_LIBRARY()
Noticed that these were wrong will working on a different pull
request.  Submit these fixes indepdent of other changes, just to keep
things separated.
2015-10-13 10:57:33 -07:00
Todd Kordenbrock
7c738fb657 coll-portals4: add gather and igather implementations that use Portals4 triggered operations
This commit adds implementations of gather and igather using
Portals4 triggered operations.  The default algorithm is linear,
but binomial can be selected using an MCA parameter -
coll_portals4_use_binomial_gather_algorithm.
2015-10-13 11:26:35 -05:00
Jeff Squyres
71dfba9ed3 Merge pull request #845 from ggouaillardet/topic/pmpi_vs_mpi
Remove --enable-mpi-profile configure option (i.e., always build PMPI bindings)
2015-10-13 12:13:10 -04:00
Jeff Squyres
9045d6de00 proc.c: fix some compiler warnings
Eliminate unused variables and fix a signed/unsigned comparison issue.
2015-10-13 09:34:18 -04:00
Gilles Gouaillardet
3e469662ad trim man pages if no c++/f08/fortran 2015-10-13 10:21:42 +09:00
Gilles Gouaillardet
66c30b2721 Add Fortran 2008 syntax to the manpages 2015-10-13 09:21:45 +09:00
Gilles Gouaillardet
291a464efb configury: remove the --enable-mpi-profiling option
and directly call the PMPI_* symbols from C and Fortran bindings
2015-10-13 08:52:35 +09:00
Gilles Gouaillardet
40b57ff347 fortran: only generate the correct symbol based on the compiler mangling. 2015-10-13 08:52:03 +09:00
Gilles Gouaillardet
53b952dc2b oshmem: invoke the C PMPI_* subroutines instead of the MPI_* ones
when profiling is built.
This prevents oshmem subroutines from being wrapped twice by third
party tools (e.g. once in oshmem and once in MPI)
see discussion starting at http://www.open-mpi.org/community/lists/devel/2015/08/17842.php

Thanks to Bert Wesarg for bringing this to our attention
2015-10-13 08:52:03 +09:00
Gilles Gouaillardet
16d65a2762 fortran/mpif-h: invoke the C PMPI_* subroutines instead of the MPI_* ones
when profiling is built.
This prevents Fortran subroutines from being wrapped twice by third
party tools (e.g. once in Fortran and once in C)
see discussion starting at http://www.open-mpi.org/community/lists/devel/2015/08/17842.php
2015-10-13 08:52:02 +09:00
Nathan Hjelm
d8dc5292ed Merge pull request #1002 from hjelmn/ompi_coverity
ompi: fix coverity issues
2015-10-09 12:27:41 -06:00
bosilca
1310acc83f Merge pull request #912 from bosilca/topic/coll_requests
This patch fixes the issues identified by @ggouaillardet in the IBM tests (collectives and topologies). It also improves the memory usage of OMPI, as a communicator without collective communications will never allocate the array of requests needed to coordinate the basic collective algorithms. This ticket replaced #790.
2015-10-09 11:27:07 -04:00
Nathan Hjelm
4cb42f8264 ompi: fix coverity issues
Fixes CID 715741: Logically dead code

Verified. Removed dead code.

Fixes CID 1320878: Resource leak

Free proc_list before returning.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-10-09 08:41:27 -06:00
Todd Kordenbrock
141b20d991 osc-portals4: Initialize datatype in MPI_Get_accumulate and MPI_Rget_accumulate
Fix code paths that didn't convert the MPI datatype to the
corresponding Portals4 datatype.

Thanks to Nicolas Chevalier (@shawone) for finding this bug and
submitting a patch.
2015-10-08 12:17:19 -05:00
Gilles Gouaillardet
e946c82847 Revert "coll/basic: fix segmentation fault in neighborhood collectives if the degree"
This partially reverts commit open-mpi/ompi@76204dfafe.
2015-10-08 12:00:41 -04:00
Gilles Gouaillardet
99cca2cfd3 Revert "* comment on communicator creation in mca_topo_base_dist_graph_create(...)"
This partially reverts commit open-mpi/ompi@27e4389259.
2015-10-08 12:00:41 -04:00
George Bosilca
a8bdd8f668 Don't lose the pointer to the request array. Patch provided by
@ggouaillardet.
2015-10-08 12:00:41 -04:00
George Bosilca
88492a1e12 Consistently use the request array for all modules (single array stored
in the base).
Correctly deal with persistent requests (they must be always freed when
they are stored in the request array associated with the communicator).
Always use MPI_STATUS_IGNORE for single request waiting functions.
2015-10-08 12:00:41 -04:00
George Bosilca
01b32caf98 Update the basic module to dynamically allocate the right
number of requests.

Remove unnecessary fields.We don't need these fields.
2015-10-08 12:00:41 -04:00
George Bosilca
a324602174 Never allocate a temporary array for the requests. Instead rely on the
module_data to hold one with the largest necessary size. This array is
only allocated when needed, and it is released upon communicator
destruction.
2015-10-08 12:00:41 -04:00
Ryan Grant
8134ba76f1 Merge pull request #998 from tkordenbrock/topic/fix.incorrect.ompi_proc.cast
Looks good to me.

mtl-portals4: fix bug in the Portals4 get_peer family
2015-10-08 08:38:16 -06:00
Todd Kordenbrock
88d79efd9f mtl-portals4: fix bug in the Portals4 get_peer family
The Portals4 get_peer family incorrectly cast the ompi_proc_t to
ptl_process_t and returned that as the peer.  The ptl_process_t is
actually found in the endpoint array.  This commit fixes the
Portals4 get_peer family to return the dereferenced endpoint
pointer.
2015-10-08 07:57:48 -05:00
Todd Kordenbrock
f33b0c1cdf coll-portals4: allreduce: remove extra %d from error message. 2015-10-08 07:57:33 -05:00
Devendar Bureddy
72f98ccf6c HCOLL: Enable alltoall interface 2015-10-07 08:00:04 +03:00
Nathan Hjelm
c124fd4a0b Merge pull request #977 from hjelmn/ompi_win_free
win: free windows in ompi_win_finalize
2015-10-06 10:11:28 -06:00
Nathan Hjelm
d7205f90f1 win: free windows in ompi_win_finalize
This commit frees any outstanding windows at ompi_win_finalize. If
ompi_debug_show_handle_leaks is set a warning message is printed out
indicating that a window is still allocated.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-06 09:25:51 -06:00
Edagr Gabriel
8af80cd02c update the interfaces of the sharedfp addproc component to match the changes made in the const commit. 2015-10-06 07:54:38 -05:00
Gilles Gouaillardet
de8de65b07 coll/tuned: remove unused prototypes from coll_tuned.h 2015-10-06 09:07:48 +09:00
Mike Dubman
e8d7373b14 COLL/FCA: revert to prev barrier if called from finalize
FCA barrier may not complete if FCA progress is not called periodically.
PMI/PMI2 API that can be used in rte barrier has no provision for calling
external progress function.

So it is possible that during finalize some ranks will be stuck
in fca barrier while others are in PMI barrier.
2015-10-04 09:40:19 +03:00
Mike Dubman
5bebed45eb OMPI: set "in finalize" indicator in finalize flow 2015-10-04 09:39:37 +03:00
Nathan Hjelm
5122327727 fcoll/two_phase: fix new coverity errors
Fix CID 1325467: use after free

Remove extra free of aggregator_list.

Fix CID 1325466: resource leak

Fix typo in prior coverity fix.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-10-02 21:38:31 -06:00
Nathan Hjelm
c99a8a55ba Merge pull request #967 from hjelmn/libnbc_fix
op: allow user operations in ompi_3buff_op_reduce
2015-10-02 18:48:32 -06:00
Nathan Hjelm
57d3b83297 op: allow user operations in ompi_3buff_op_reduce
This commit allows user operations to be used in the
ompi_3buff_op_reduce function. This fixes an issue identified in:

http://www.open-mpi.org/community/lists/devel/2014/04/14586.php

and

http://www.open-mpi.org/community/lists/users/2015/10/27769.php

The fix is to copy source1 into the target then call the user op
function with source2 and target.

Fixes #966

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-02 10:35:21 -06:00
Nathan Hjelm
eb79edff33 Merge pull request #963 from hjelmn/ompi_coverity
fcoll/two_phase: fix coverity errors
2015-10-02 10:22:24 -06:00
Devendar Bureddy
243b75aa80 HCOLL: Add alltoallv interface 2015-10-02 01:51:33 +03:00
Nathan Hjelm
95b95e19af fcoll/dynamic: fix coverity errors
Fixes CID 72320: Explicit NULL dereferenced

On error it is possible that the blocklen_per_process array is
NULL. Change the NULL check before the free to check for non-NULL on
the array not the array element. Also clean up allocation of this
array to use calloc instead of malloc + setting each element to NULL.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-01 14:38:09 -06:00
Nathan Hjelm
09df7aa205 fcoll/two_phase: fix coverity errors
Fixes CIDs 72300, 72344, 1196764-1196768, 72300: Resource leaks

Mulitple allocated arrays are going out of scope at the end of
mca_fcoll_two_phase_file_write_all. Free these arrays. Also removed
the extraneous NULL checks since free (NULL) is safe in C.

Change returns to goto exit where the allocated resources are freed.

Fixes CIDs 72285-72292, 72297, 72298: Resource leaks

Change all appropriate return statements to goto exit to ensure that
all resources are freed. Also removed the NULL checks since free
(NULL) is safe in C.

Fixes CIDs 72295, 72296: Resource leaks

Moved free of requests and recv_types to after exit label. This will
ensure these are freed on error.

Also added a loop and statement to free send_buf which is going out of
scope at the end of the function.

Fixes CIDs 72336-72240, 735197, 735198: Resource leaks

Moved the exit label before to before the resources are released and
changed all appropriate return statements to goto exit. Also removed
extraneous NULL checks because free (NULL) is safe in C.

Fixes CIDs 72341, 72343, 1196805-1196809: Resource leaks

Free all resources after exit label and change return statements to
goto exit to ensure all resources are freed on error.

Fixes CID 1269973: Unused value

Check return code of ompi_request_wait_all. If it fails jump to the
exit.

Fixes CID 714119: Dereference before NULL check

Wrong value checked in conditional.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-01 14:38:09 -06:00
Nathan Hjelm
9d9450a054 Merge pull request #958 from hjelmn/man_pages
ompi/man: fix typos in formatting
2015-09-30 07:30:59 -06:00
Nathan Hjelm
fbaa79835f ompi/man: fix typos in formatting
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-29 23:32:44 -06:00
Nathan Hjelm
5fd9c35957 osc/rdma: fix incorrect assert
This commit fixes MTT failures in debug builds.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-29 15:37:40 -06:00
Nathan Hjelm
7b8ec48c68 osc/rdma: fix typos inarguments to btl_atomic_[f]op
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-29 08:09:00 -06:00
Gilles Gouaillardet
57ecce4e0f ompi_proc_complete_init: always reset u16ptr
if a key is not found, u16ptr is set to NULL and following
opal_value_unload calls might fail
2015-09-29 11:41:51 +09:00
Nathan Hjelm
12bd300c40 Merge pull request #929 from hjelmn/add_procs
Update add_procs support
2015-09-28 17:29:13 -06:00
Nathan Hjelm
6b83fa2f58 ompi/comm: fix coverity errors
Fixes CID 1323841: Logically dead code

Wrong value in conditional. Should be newcomp not newcomm.

Fixes CID 1269762: Explicit null dereference

ompi_group_incl could return an error and not set local_group. Add a
check to ensure ompi_group_incl succeeded before incrementing the proc
count.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-28 15:55:19 -06:00
Nathan Hjelm
6611c000c9 Fix coverity warnings
Fix CID 1315271: Constant expression result

The intent of this conditional is to not produce a peruse event for
probe or mprobe requests. Coverity is correct that the expression is
always true. Changed the || to && to fix. Also moved the conditional
within an OMPI_WANT_PERUSE to ensure the conditional is not evaluated
if peruse is disabled.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-28 15:35:25 -06:00
rhc54
73449e969e Merge pull request #949 from rhc54/topic/nmclean
Cleanup the code a bit by simply adding our nspace to the top of the …
2015-09-28 10:44:22 -07:00
Ralph Castain
a4a3dfd480 Cleanup the code a bit by simply adding our nspace to the top of the list of jobid <-> nspace correlations. Add two new APIs to opal_pmix for registering new jobid/nspace pairs and retrieving an nspace given a jobid - these are required to support connect/accept. No impact on the PMIx library. 2015-09-28 08:50:13 -07:00
Gilles Gouaillardet
97b9d12c58 man: fix a typo in MPI_Ibarrier C prototype
Thanks Harald Servat for reporting this
2015-09-28 16:54:20 +09:00
Gilles Gouaillardet
5e15c20cf8 ompi/info: silence a warning in ompi_info_set_value_enum 2015-09-28 16:42:54 +09:00
Gilles Gouaillardet
f241475db9 ompi: initialize ompi_proc_list common symbol 2015-09-28 10:09:27 +09:00
Nathan Hjelm
20d5c07638 Fix CID 1312113: Logically dead code
Removed logically dead code.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-27 09:46:06 -06:00
bosilca
0a3c54ed61 Merge pull request #942 from bosilca/topic/global_request
Fix for "Random errors on MPI_COMPARE_AND_SWAP with pt2pt OSC of Open MPI master" (#933)
2015-09-27 16:56:29 +02:00
Nathan Hjelm
552e1b59a5 osc/rdma: fix coverity issues
Fixes CID 1324730, 1327429, 1324728, 1196633, 1324731, 1324727, and
1196632: Logically dead code

OMPI_OSC_RDMA_REQUEST_ALLOC can never return a NULL request. Removed
unnecessary NULL checks.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-26 12:45:14 -06:00
Nathan Hjelm
ebf19ac5eb osc/pt2pt: fix coveity issues
Fixed CID 1269712, 1269709, 1269706, 1269703, 1269694: Logically dead code

Remove extra NULL check as OMPI_OSC_PT2PT_REQUEST_ALLOC can never set the
request to NULL.

Fixes CID 1269668: Unchecked return value

False positive. Add (void) to indicate we do not care about the return code
from opal_hash_table_get_uint32.

Fixes CID 1324726: Free of address-of expression

Do not free lock if it was not allocated.

Fixes CID 1269658: Free of address-of expression

Never will happen but because op is always a built-in op there is no
reason to retain/release it anyway.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-26 11:18:22 -06:00
George Bosilca
01d8e23ccc Fix the random errors related to the recursive sends and receives
identified by Fujitsu.
2015-09-26 00:44:51 +02:00
Nathan Hjelm
f84716fcd0 Merge pull request #941 from hjelmn/osc_pt2pt_fix
osc/pt2pt: fix heterogenous build
2015-09-25 08:07:09 -06:00
Nathan Hjelm
ae7f47e04d osc/pt2pt: fix heterogenous build
Fixes #940

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-25 00:15:02 -06:00
Todd Kordenbrock
3e63a3458c portals4: add support for dynamic add_procs() to all Portals4 components
In the default mode of operation, the Portals4 components support
dynamic add_procs().

The Portals4 components have two alternate modes (flow control and
logical-to-physical) that require knowledge of all procs at startup.
In these modes, mtl-portals4 sets the MCA_MTL_BASE_FLAG_REQUIRE_WORLD
flag and btl-portals4 sets the MCA_BTL_FLAGS_SINGLE_ADD_PROCS flag
to tell the PML that we need all the procs in one add_procs() call.
2015-09-24 22:12:57 -05:00
Nathan Hjelm
248212276d osc/sm: fix remaining coverity issues
Fixes CID 1324870: Memory - illegal accesses (USE_AFTER_FREE)

Free osc module after calling destruct on the lock.

Fixes CID 1324868: Integer handling issues (OVERFLOW_BEFORE_WIDEN)
Fixes CID 1324867: Integer handling issues (OVERFLOW_BEFORE_WIDEN)

Explicitly cast to uint64_t to ensure the widen happens before an overflow
can occur.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-24 15:55:01 -06:00
Nathan Hjelm
4ab4c4d7e9 Merge pull request #930 from hjelmn/ompi_coverity
coll/libnbc: fix coverity errors
2015-09-23 17:07:34 -06:00
Nathan Hjelm
54a4061d88 Add support for detecting when dynamic add_procs is not possible
This commit adds support to the pml, mtl, and btl frameworks for
components to indicate at runtime that they do not support the new
dynamic add_procs behavior. At the high end the lack of dynamic
add_procs support is signalled by the pml using the new pml_flags
member to the pml module structure. If the
MCA_PML_BASE_FLAG_REQUIRE_WORLD flag is set MPI_Init will generate the
ompi_proc_t array passed to add_proc from ompi_proc_world () instead
of ompi_proc_get_allocated ().

Both cm and ob1 have been updated to detect if the underlying mtl and
btl components support dynamic add_procs.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 16:22:05 -06:00
Nathan Hjelm
2c89c7f47d ompi/proc: add function to get all allocated procs
This commit adds two new functions:

 - ompi_proc_get_allocated - Returns all procs in the current job that
   have already been allocated. This is used in init/finalize to
   determine which procs to pass to add_procs/del_procs.

 - ompi_proc_world_size - returns the number of processes in
   MPI_COMM_WORLD. This may be removed in favor of callers just
   looking at ompi_process_info.

The behavior of ompi_proc_world has been restored to return
ompi_proc_t's for all processes in the current job. The use of this
function is discouraged.

Code that was using ompi_proc_world() has been updated to make use of
the new functions to avoid the memory overhead of ompi_comm_world ().

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 16:22:05 -06:00
Nathan Hjelm
30f8d0b038 coll/libnbc: fix coverity errors
Fix CID 1196812: Resource Leak

dsts array was leaked on error.

Fix CID 710565: Copy-paste error

The line in question (nbc:513) is indeed a copy-paste error.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 16:14:49 -06:00
William Throwe
80bb41a079 ROMIO configure looks for lstat in wrong header
ROMIO configure looks for lstat in wrong header

The ROMIO configure script checks for a declaration of lstat in
unistd.h, but, at least on the Linux machines I checked, lstat is in
sys/stat.h.  (The detection failure led to a linker error when building
ROMIO as part of OpenMPI on one of my admittedly strangely configured
machines, somehow.)  It appears from the man page that either location
is possible, so check both.

(cherry picked from mpich/mpich@7b8bd055df)

Signed-off-by: Rob Latham <robl@mcs.anl.gov>
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 11:56:53 -06:00
Nathan Hjelm
db74fa9d0f bml/r2: fix memory leak
The add_procs change made some assumptions in the bml/r2 add_procs
wrong. This lead to del_procs never being called. I removed the logic
that checks the ompi_proc_t reference count and removed an unnecessary
allocation. The allocation only makes sense if we pass more than a
single proc at a time to the btl del_procs.

This commit also ensures that the btl del_procs is called if the
endpoint is in the btl_rdma array but not the btl_send array.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 10:45:13 -06:00
Nathan Hjelm
ee5810813b osc/pt2pt: fix regression in pscw sync on 0 size groups
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 17:09:00 -06:00
Nathan Hjelm
f6920aa916 osc/rdma: check for usable btls during query
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 17:08:28 -06:00
Nathan Hjelm
903762e194 osc/sm: fix pscw synchronization
The osc/sm component was using a simple counter to determine if all
expected posts had arrived to start a PSCW access epoch. This is
incorrect as a post may arrive from a peer that isn't part of the
current start group. There are many ways this could have been fixed.
This commit adds an n^2 bitmap. When a process posts it sets a bit in
the bitmap associated with the access rank to indicate the post is
complete. The access rank checks for and clears the bits associated
with all the processes in the start group.

The bitmap requires comm_size ^ 2 bits of space. This should be
managable as most nodes have relatively small numbers of processes. If
this changes another algorigthm can be implemented.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 16:00:27 -06:00
Nathan Hjelm
5553dba0c4 Merge pull request #919 from hjelmn/accumulate_ops
ompi/win: save value of accumulate_ops info key on window
2015-09-22 10:50:50 -06:00
Nathan Hjelm
036395dc0f osc/pt2pt: fix typos
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 10:30:01 -06:00
Nathan Hjelm
974061c38f osc: fixed issues identified by coverity
Fix CID 1324733: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324734: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324735: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324736: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324737: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324751: Memory - illegal accesses  (USE_AFTER_FREE)
Fix CID 1324750: (USE_AFTER_FREE)
Fix CID 1324749: Memory - corruptions  (USE_AFTER_FREE)
Fix CID 1324748: Memory - illegal accesses  (USE_AFTER_FREE)
Fix CID 1324747: (USE_AFTER_FREE)
Fix CID 1324746: Memory - corruptions  (USE_AFTER_FREE)

Add missing return on an error path.

Fix CID 1324745: Code maintainability issues  (UNUSED_VALUE)

Ignore return code from barrier. It was not being used anyway.

Fix CID 1324738: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324741: Null pointer dereferences  (REVERSE_INULL)

module->selected_btl can not be NULL in osc/rdma during normal
operation. Removed the unnecessary NULL check.

Fix CID 1324752: Memory - illegal accesses  (USE_AFTER_FREE)

Move ompi_osc_pt2pt_module_lock_remove to before the lock is freed.

Fix CID 1324744: Uninitialized variables  (UNINIT)
Fix CID 1324743: Uninitialized variables  (UNINIT)

This array is not used unitialized but there is no reason not to use
calloc here to silence the warning.

The following CID is a false positive: 1324742. I will mark it such in
coverity.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 09:23:39 -06:00
bosilca
733328aa4d Merge pull request #916 from rolfv/pr/fix-coll-cuda-const-warnings
Fix warnings due to missing const
2015-09-22 16:34:40 +02:00
igor-ivanov
a9fc53cf20 Merge pull request #923 from igor-ivanov/pr/mpisync
ompi/tools: Add O(logN) algorithm for data collection
2015-09-22 16:11:25 +03:00
Igor Ivanov
53be890c03 ompi/tools: Add O(logN) algorithm for data collection
Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-09-22 15:21:37 +03:00
Nathan Hjelm
6751409c32 ompi/win: save value of accumulate_ops info key on window
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-21 16:37:29 -06:00
Nathan Hjelm
d6724f2828 ompi: add missing man pages
This commit adds man pages for the MPI_Win_allocate and MPI_Win_allocated_shared
MPI-3 functions. The man page for MPI_Win_create has also been updated to
indicate support for the same_size and same_disp_unit info keys

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-21 16:21:28 -06:00
Rolf vandeVaart
2c51faa58d Fix warnings due to missing const 2015-09-21 14:18:44 -04:00
Mike Dubman
23c41a0320 Merge pull request #908 from igor-ivanov/pr/oshmem-check
Recovering oshmem functionality
2015-09-21 19:50:24 +03:00
Nathan Hjelm
60c2b0df48 Merge pull request #903 from hjelmn/new_osc_rdma
osc/rdma: add true RDMA one-sided component
2015-09-21 10:29:11 -06:00
Nathan Hjelm
88100ad670 Merge pull request #902 from hjelmn/new_osc
osc/pt2pt: reduce memory footprint of windows
2015-09-21 10:28:41 -06:00
Edgar Gabriel
01fcfb08fe do not set the contigous flag in two_phase_file_read_all. This optimization
needs some more debugging for the two_phase component, and is disabled
for two_phase_file_write_all as well.
2015-09-18 09:30:50 -05:00
Edgar Gabriel
3734a38370 this file should have been part of the previous commit. for removeing io_ompio_nbc.[ch] 2015-09-18 09:28:25 -05:00
Edgar Gabriel
cf46a6bd4d remove the io_ompio_nbc.[ch] files, they are not used anymore at this point in time. 2015-09-18 09:26:25 -05:00
Gilles Gouaillardet
a611274704 pml: fix commit open-mpi/ompi@6e6a3e965c
do not use the const modifier for allocator nor recv buffers
2015-09-18 09:54:18 +09:00
Jeff Squyres
567c9e3a5b mtl_ofi_component.c: add missing argv.h header 2015-09-17 10:05:05 -07:00
Igor Ivanov
4b8d9b8eff oshmem/proc: Refactor proc component
Most functionality of oshmem_proc duplicates ompi_proc. In addition
to that, Current logic does not allow to do oshmem initialization
w/o ompi startup.
So this refactoring allows to  avoid code duplication, decrease used
memory and make oshmem support easier.
Now oshmem_proc is transparent ompi_proc structure, that can be
extended by oshmem specific data.

Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-09-17 18:49:00 +03:00
Igor Ivanov
11f61790ee ompi/proc: Extend ompi_proc_t structure with padding to support oshmem data
Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-09-17 18:48:59 +03:00
Nathan Hjelm
dfbe584c92 ompi/group: fix typos in add_procs changes
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-17 09:21:32 -06:00
Nathan Hjelm
d8df9d414d osc/rdma: add true RDMA one-sided component
This commit adds support for performing one-sided operations over
supported hardware (currently Infiniband and Cray Gemini/Aries). This
component is still undergoing active development.

Current features:

 - Use network atomic operations (fadd, cswap) for implementing
   locking and PSCW synchronization.

 - Aggregate small contiguous puts.

 - Reduced memory footprint by storing window data (pointer, keys,
   etc) at the lowest rank on each node. The data is fetched as each
   process needs to communicate with a new peer. This is a trade-off
   between the performance of the first operation on a peer and the
   memory utilization of a window.

TODO:

 - Add support for the accumulate_ops info key. If it is known that
   the same op or same op/no op is used it may be possible to use
   hardware atomics for fetch-and-op and compare-and-swap.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-16 15:01:33 -06:00
Nathan Hjelm
fd42343ff0 osc/pt2pt: reduce memory footprint of window
This commit updates osc/pt2pt to allocate peer object as they are
needed rather than all at once. Additionally, to help improve the
memory footprint a new synchronization structure has been added.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-16 13:01:56 -06:00
Nathan Hjelm
c84c05bab7 ompi/comm: fix comm_[i]dup on intracommunicators
The behavior of ompi_comm_set was changed to get the remote size from
the remote group. This broke how ompi_comm_[i]dup were using
ompi_comm_set. In order to adapt to the new behavior these functions
now pass NULL for the remote group if the communicator is not an
inter-communicator.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-16 10:31:18 -06:00
George Bosilca
02624bd0b6 Fix all treematch issues idenfied by Coverity. 2015-09-15 23:49:11 -04:00
George Bosilca
6ab5f68fc3 indentation. 2015-09-15 22:46:13 -04:00
rhc54
5597416fe0 Merge pull request #897 from rhc54/topic/oob
Remove the last involvement of the OOB system from the MPI layer
2015-09-15 14:40:21 -07:00
Jeff Squyres
7cb546a221 core: yow; this should absolutely not be in the repo! 2015-09-15 16:15:04 -04:00
Ralph Castain
c1bbbb5e2f Remove the last involvement of the OOB system from the MPI layer, remove the no-longer-needed usock/oob component, and have procs no longer open the RML, OOB, ROUTED, and GRPCOMM frameworks as PMIx now provides all required app-mpirun cmds 2015-09-15 13:08:35 -07:00
Nathan Hjelm
9c45c63143 ompi/dpm: fix typo in dynamic communicator detection
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-15 12:42:58 -06:00
Nathan Hjelm
6379178046 ompi/comm: fix bug in ompi_comm_set
This commit updates the behavior of ompi_comm_set to explicitly take
either local/remote group(s) OR local/remote array(s). If array(s) are
in use the sizes will be taken from the appropriate group(s).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-15 11:37:44 -06:00
Nathan Hjelm
f29b65aa14 ompi/proc: fix typos CID 1323840
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-11 21:02:30 -06:00
Ralph Castain
fbcf819d2e Remove unnecessary include 2015-09-11 15:53:00 -07:00
Nathan Hjelm
f798c909d1 Merge pull request #883 from hjelmn/comm_split_update
ompi/comm: improve comm_split_type scalability
2015-09-11 16:35:34 -06:00
Ralph Castain
b60b03d613 It is okay not to get the hostname - we don't require that it be provided 2015-09-11 13:01:20 -07:00
Nathan Hjelm
c45789a222 ompi/comm: improve comm_split_type scalability
This commit includes two changes. First, the locality code has been
factored out to improve readability and maintainability. Second,
instead of looking up each proc using ompi_group_peer_lookup the code
now uses ompi_group_peer_lookup_existing. The code falls back on modex
if a proc doesn't exist. This will prevent MPI_Comm_split_type from
allocating ompi_proc_t's for every process in the job.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-11 13:53:48 -06:00
Nathan Hjelm
1868b5937c Merge pull request #889 from hjelmn/sentinel_update
Use the low instead of the high bit to indicate a proc is a sentinel
2015-09-11 12:30:27 -06:00
rhc54
c31093ff19 Merge pull request #890 from rhc54/topic/fixpmi
Revert "Revert "Fix the handling of cpusets so we get the correct cpu…
2015-09-11 09:25:24 -07:00
Nathan Hjelm
898a0a038c bml/r2: fix coverity CID 1323765
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-11 09:39:10 -06:00
Nathan Hjelm
64c8f124fc Use the low instead of the high bit to indicate a proc is a sentinel
The assumption that the high bit is not in use in pointers on any of our
supported platforms was incorrect. A better assumption is that all
ompi_proc_t pointers will be at least 2-byte aligned. This allows us
to use the low bit. To do this we drop the highest bit of the
opal_process_name_t jobid (hope this is ok) and use the low bit to
indicate the proc is really a sentinel.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-11 09:32:02 -06:00