Commit graph

8766 commits

Author SHA1 Message Date
Ryan Grant
f60c506c68 Merge pull request #999 from tkordenbrock/topic/add.triggered.gather
coll-portals4: add gather and igather implementations that use Portals4 triggered operations
2015-10-20 14:59:09 -06:00
yosefe
a313588337 ompi: Add UCX PML. 2015-10-20 19:46:06 +03:00
yosefe
502dc8aaa4 add pml-specific field in OMPI datatype.
PML UCX will use it to cache a handle for UCX datatype.
2015-10-20 19:46:06 +03:00
Jeff Squyres
630d6bf800 Merge pull request #1038 from kawashima-fj/pr/man-correction
man: Various manpage corrections
2015-10-20 06:40:05 -04:00
KAWASHIMA Takahiro
7ab464fbb4 Revert "man: Remove unnecessary spaces in front of parameters."
This reverts commit 3253a30ab2.

Because Gilles' b17c89c1 committed a few hours ago has the same change,
my PR branch had a conflict.
2015-10-20 15:32:45 +09:00
KAWASHIMA Takahiro
373a94a3f1 man: Revert my MPI_File_iwrite_shared manpage change.
This reverts commits 2226cdb3da and d9c93c9f5d.

Because Gilles' b17c89c1 committed a few hours ago has the same change,
my PR branch had a conflict.
2015-10-20 14:41:22 +09:00
Gilles Gouaillardet
2bd77ed4f9 mpi: fail with MPI_ERR_INTERN if MPI_IN_PLACE is used with MPI_I*alltoall*
Currently, MPI fails with MPI_ERR_ARG. This is counterintuitive since
MPI_IN_PLACE is a legitimate parameter. MPI_IN_PLACE might not be correctly
implemented by all the non blocking modules (libnbc, ...) so fail with
MPI_ERR_INTERN for the time being.
2015-10-20 14:12:33 +09:00
Gilles Gouaillardet
b17c89c1e6 man: revamp MPI_File_* and MPI_Register_datarep man pages
- suggest USE mpi instead of INCLUDE 'mpif.h'
- fix indentation
Thanks to Jeff for pointing out this issue.
2015-10-20 13:12:12 +09:00
KAWASHIMA Takahiro
d9c93c9f5d man: Add const that is removed accidentally in 2226cdb. 2015-10-20 08:49:10 +09:00
Nathan Hjelm
9602484568 Merge pull request #1040 from hjelmn/mtl_priority
Change how cm's priority is calculated
2015-10-19 14:18:36 -06:00
Nathan Hjelm
53f6b57c0a pml/cm: use the priority of the mtl component
This commit changes the priority of mtl components to be relative to
pml/ob1 and updates the mtl interface to expose this priority. cm now
sets its own priority based on the priority of the selected mtl
component.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:42 -06:00
Nathan Hjelm
8b5810f7f7 mca/base: add priority output to mca_base_select
The mca_base_select function uses returned priorities to select the
best component/module. This priority may be of use to the caller so
pass that information back in an optional argument. If the priority is
not needed pass NULL.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:41 -06:00
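
The optional output argument described in this commit can be sketched in plain C. This is a minimal illustration under stated assumptions, not the actual `mca_base_select` code: `struct component` and `select_best` are hypothetical names invented here, and the real function takes different arguments.

```c
#include <stddef.h>

/* Hypothetical component record; not Open MPI's real structure. */
struct component {
    const char *name;
    int priority;
};

/*
 * Return the highest-priority component, or NULL if the list is empty.
 * If priority_out is non-NULL, the winning priority is stored there.
 * Callers that do not need the priority pass NULL.
 */
const struct component *
select_best(const struct component *list, size_t count, int *priority_out)
{
    const struct component *best = NULL;

    for (size_t i = 0; i < count; ++i) {
        if (best == NULL || list[i].priority > best->priority) {
            best = &list[i];
        }
    }
    if (best != NULL && priority_out != NULL) {
        *priority_out = best->priority;
    }
    return best;
}
```

Making the extra argument optional means existing callers only need to append NULL, while a caller such as a PML that derives its own priority from the selected component can read the value back.
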
Nathan Hjelm
bedd80214e pml/ob1: remove priority check
This commit removes code that checks the ob1 priority vs the previous
priority. The previous priority is meaningless here and may only cause
ob1 to disable itself when it shouldn't.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:41 -06:00
Nathan Hjelm
2fd176ac7f cm: fix selection priority
This patch removes a priority check that disables cm if the previous
pml had higher priority. The check was incorrect as coded and is
unnecessary as we finalize all but one pml anyway.

Fixes open-mpi/ompi#1035

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-19 12:32:26 -06:00
Gilles Gouaillardet
0f23037775 coll/base: fix memory allocation in mca_coll_base_alltoall_intra_basic_inplace 2015-10-19 16:47:59 +09:00
KAWASHIMA Takahiro
7fef70c9b8 man: Add MPI_DIST_GRAPH in MPI_TOPO_TEST manpage.
`MPI_DIST_GRAPH` was added in MPI-2.2.
2015-10-19 15:24:12 +09:00
KAWASHIMA Takahiro
d3a29e364c man: Change 'MPI ADDRESS KIND' to 'MPI_ADDRESS_KIND'.
(underscores instead of spaces)
2015-10-19 15:11:06 +09:00
KAWASHIMA Takahiro
dcd14103d5 man: Remove unnecessary spaces in Fortran syntax.
Similar lines of other routines have no space.
2015-10-19 15:04:50 +09:00
KAWASHIMA Takahiro
34c3b5d74d man: Correct the kind of ADDRESS parameter of MPI_GET_ADDRESS. 2015-10-19 15:01:30 +09:00
KAWASHIMA Takahiro
3253a30ab2 man: Remove unnecessary spaces in front of parameters. 2015-10-19 14:48:23 +09:00
KAWASHIMA Takahiro
2226cdb3da man: Correct the routine name of MPI_FILE_IWRITE_SHARED. 2015-10-19 14:44:49 +09:00
KAWASHIMA Takahiro
bffc7b6c8f man: Add man of MPI_Message_{c2f,f2c} and MPI_Op_commutative.
These routines were added in MPI-2.2 but were missing in OMPI man pages.
2015-10-19 13:49:40 +09:00
KAWASHIMA Takahiro
9942d5a933 man: MPI_IBARRIER has two output parameters. 2015-10-19 13:46:03 +09:00
KAWASHIMA Takahiro
953c95e9bb man: Update description of MPI_IN_PLACE of MPI_Exscan.
MPI-2.2 added MPI_IN_PLACE support for MPI_Exscan.
2015-10-19 13:46:03 +09:00
KAWASHIMA Takahiro
5a3b8b34cd man: Remove outdated description. MPI-2.2 is ratified. 2015-10-19 13:46:03 +09:00
KAWASHIMA Takahiro
1261b115e4 man: Fix incorrect nroff markups. 2015-10-19 13:46:03 +09:00
KAWASHIMA Takahiro
9b96209ac5 man: Fix incorrect C++ binding descriptions. 2015-10-19 13:46:03 +09:00
KAWASHIMA Takahiro
ffce87328d man: MPI_Get_version now returns 3.1 instead of 2.1. 2015-10-19 13:46:03 +09:00
KAWASHIMA Takahiro
80a0b30be8 man: Correct wrong argument order. 2015-10-19 13:46:03 +09:00
KAWASHIMA Takahiro
4bbe86b171 man: Fix a typo of an argument. 2015-10-19 13:46:03 +09:00
KAWASHIMA Takahiro
519ddd9ae9 man: Insert missing error classes & Fix incorrect error codes. 2015-10-19 13:46:02 +09:00
Nathan Hjelm
7dac5d36e5 bump fortran mpi version
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-16 11:04:17 -06:00
Nathan Hjelm
467ff3a450 Merge pull request #1024 from hjelmn/mpi_31
bump mpi version to 3.1
2015-10-16 09:35:52 -06:00
Edgar Gabriel
e9be9a1772 Merge pull request #1032 from edgargabriel/pr/macos-sharedfp-sm-fix
limit the number of bytes used for the semaphore name depending on pl…
2015-10-15 15:09:44 -05:00
Edgar Gabriel
f97655f28e make sure the iov buffer is initialized to zero, otherwise bad things can happen for 0-byte contributions on a process. 2015-10-15 12:46:01 -05:00
Jeff Squyres
40b4d5d74d help-mpi-api.txt: remove now-stale help messages 2015-10-15 12:39:16 -04:00
Jeff Squyres
338257a2f4 man: update man pages for Init*/Finalize*
Update language surrounding initialization and finalization in
MPI_Init[_thread], MPI_Initialized, MPI_Finalize, and MPI_Finalized.
2015-10-15 12:39:16 -04:00
Jeff Squyres
f5ad90c920 init/finalize: extensions
Proposed extensions for Open MPI:

- If MPI_INITIALIZED is invoked and MPI is only partially initialized,
  wait until MPI is fully initialized before returning.
- If MPI_FINALIZED is invoked and MPI is only partially finalized,
  wait until MPI is fully finalized before returning.
- If the ompi_mpix_allow_multi_init MCA param is true, allow MPI_INIT
  and MPI_INIT_THREAD to be invoked multiple times without error (MPI
  will be safely initialized only the first time it is invoked).
2015-10-15 12:39:15 -04:00
Edgar Gabriel
0177918a08 limit the number of bytes used for the semaphore name depending on platform (31 bytes for MacOS, 252 for Linux) 2015-10-15 11:13:45 -05:00
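
A platform-dependent name limit like the one in this commit might be applied as follows. This is only a sketch: `SEM_NAME_LIMIT` and `make_sem_name` are hypothetical names, the two figures are simply the ones quoted in the commit message, and the `/id_prefix` naming scheme is an assumption rather than what the sharedfp component actually generates.

```c
#include <stdio.h>
#include <stddef.h>

#if defined(__APPLE__)
#define SEM_NAME_LIMIT 31   /* figure quoted in the commit message for MacOS */
#else
#define SEM_NAME_LIMIT 252  /* figure quoted in the commit message for Linux */
#endif

/*
 * Build a semaphore name of the form "/<id>_<prefix>", truncated so that it
 * never exceeds SEM_NAME_LIMIT bytes (not counting the terminating NUL).
 * The id comes first so that truncation drops the tail of the prefix rather
 * than the part that distinguishes one semaphore from another.
 * Returns the length of the stored name, or 0 on error.
 */
size_t make_sem_name(char *out, size_t out_size, const char *prefix, unsigned id)
{
    if (out == NULL || prefix == NULL || out_size == 0) {
        return 0;
    }

    size_t cap = out_size - 1;
    if (cap > SEM_NAME_LIMIT) {
        cap = SEM_NAME_LIMIT;
    }

    /* snprintf writes at most cap characters plus the NUL terminator. */
    int n = snprintf(out, cap + 1, "/%u_%s", id, prefix);
    if (n < 0) {
        return 0;
    }
    return (size_t)n > cap ? cap : (size_t)n;
}
```
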
Nathan Hjelm
341b60dd57 Merge pull request #1029 from kawashima-fj/pr/ob1-fin-memory-leak
pml/ob1: Fix a memory leak regarding pending FIN control messages.
2015-10-15 07:55:52 -06:00
KAWASHIMA Takahiro
66a8bc9e45 fortran/mpif-h: Insert missing weak symbols & Fix incorrect symbol names. 2015-10-15 11:58:41 +09:00
KAWASHIMA Takahiro
4e56505202 pml/ob1: Fix a memory leak regarding pending FIN control messages.
Once a FIN control message is appended to the pending list,
the ob1 PML attempts to send the FIN again in the `mca_pml_ob1_process_pending_packets` function.
But if the PML failed to send the FIN again, the `mca_pml_ob1_send_fin`
function creates a new `mca_pml_ob1_pckt_pending_t` object and the
old object is not returned to the free list.
2015-10-15 11:15:03 +09:00
Jeff Squyres
aceb1ebb47 Merge pull request #1026 from hjelmn/static_mutex
opal static mutex initializers
2015-10-14 22:10:51 -04:00
Jeff Squyres
2b9c9f3093 Fortran: add missing MPI_AINT in mpi_f08 module 2015-10-14 17:32:01 -07:00
Jeff Squyres
8307330e8a Merge pull request #989 from jsquyres/pr/friendly-message-when-dynamics-disabled
Print friendly message when dynamics disabled
2015-10-14 19:52:52 -04:00
Nathan Hjelm
7f7ff8d851 mpit: use opal static mutex initializer
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-14 16:08:42 -06:00
Jeff Squyres
889d80a659 mxm/yalla: disable MPI dynamic process functionality
Disable the MPI dynamic process functionality when these components
are selected to be used.
2015-10-14 13:42:56 -07:00
Jeff Squyres
ac25505e03 mpi: infrastructure to gracefully disable MPI dyn procs
Add ompi_mpi_dynamics_disable() function to disable MPI dynamic
process functionality (i.e., such that if MPI_COMM_SPAWN/etc. are
invoked, you'll get a show_help error explaining that MPI dynamic
process functionality is disabled in this environment -- instead of a
potentially-cryptic network or hardware error).

Fixes #984
2015-10-14 13:42:56 -07:00
Nathan Hjelm
e11f014c6e osc/rdma: fix segmentation fault when running 1 ppn
This commit fixes an issue identified by @rolfv. The local peer was
not being correctly initialized when running with a single process on
a node.

This fixes open-mpi/ompi#1010

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-14 12:40:52 -06:00
Nathan Hjelm
06dd9ec317 bump mpi version to 3.1
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-14 09:52:30 -06:00
Jeff Squyres
5d97d7b5d5 Merge pull request #1017 from jsquyres/pr/fix-cr-exits
dynamics: fix OPAL_CR_EXIT_LIBRARY()
2015-10-14 05:45:05 -04:00
Jeff Squyres
62351f442a help: remove stale help messages and files
Found by contrib/check-help-strings.pl.
2015-10-13 16:50:20 -04:00
Jeff Squyres
a4adee5329 dynamics: fix OPAL_CR_EXIT_LIBRARY()
Noticed that these were wrong while working on a different pull
request.  Submit these fixes independent of other changes, just to keep
things separated.
2015-10-13 10:57:33 -07:00
Todd Kordenbrock
7c738fb657 coll-portals4: add gather and igather implementations that use Portals4 triggered operations
This commit adds implementations of gather and igather using
Portals4 triggered operations.  The default algorithm is linear,
but binomial can be selected using an MCA parameter -
coll_portals4_use_binomial_gather_algorithm.
2015-10-13 11:26:35 -05:00
Jeff Squyres
71dfba9ed3 Merge pull request #845 from ggouaillardet/topic/pmpi_vs_mpi
Remove --enable-mpi-profile configure option (i.e., always build PMPI bindings)
2015-10-13 12:13:10 -04:00
Jeff Squyres
9045d6de00 proc.c: fix some compiler warnings
Eliminate unused variables and fix a signed/unsigned comparison issue.
2015-10-13 09:34:18 -04:00
Gilles Gouaillardet
3e469662ad trim man pages if no c++/f08/fortran 2015-10-13 10:21:42 +09:00
Gilles Gouaillardet
66c30b2721 Add Fortran 2008 syntax to the manpages 2015-10-13 09:21:45 +09:00
Gilles Gouaillardet
291a464efb configury: remove the --enable-mpi-profiling option
and directly call the PMPI_* symbols from C and Fortran bindings
2015-10-13 08:52:35 +09:00
Gilles Gouaillardet
40b57ff347 fortran: only generate the correct symbol based on the compiler mangling. 2015-10-13 08:52:03 +09:00
Gilles Gouaillardet
53b952dc2b oshmem: invoke the C PMPI_* subroutines instead of the MPI_* ones
when profiling is built.
This prevents oshmem subroutines from being wrapped twice by third
party tools (e.g. once in oshmem and once in MPI)
see discussion starting at http://www.open-mpi.org/community/lists/devel/2015/08/17842.php

Thanks to Bert Wesarg for bringing this to our attention
2015-10-13 08:52:03 +09:00
Gilles Gouaillardet
16d65a2762 fortran/mpif-h: invoke the C PMPI_* subroutines instead of the MPI_* ones
when profiling is built.
This prevents Fortran subroutines from being wrapped twice by third
party tools (e.g. once in Fortran and once in C)
see discussion starting at http://www.open-mpi.org/community/lists/devel/2015/08/17842.php
2015-10-13 08:52:02 +09:00
Nathan Hjelm
d8dc5292ed Merge pull request #1002 from hjelmn/ompi_coverity
ompi: fix coverity issues
2015-10-09 12:27:41 -06:00
bosilca
1310acc83f Merge pull request #912 from bosilca/topic/coll_requests
This patch fixes the issues identified by @ggouaillardet in the IBM tests (collectives and topologies). It also improves the memory usage of OMPI, as a communicator without collective communications will never allocate the array of requests needed to coordinate the basic collective algorithms. This ticket replaced #790.
2015-10-09 11:27:07 -04:00
Nathan Hjelm
4cb42f8264 ompi: fix coverity issues
Fixes CID 715741: Logically dead code

Verified. Removed dead code.

Fixes CID 1320878: Resource leak

Free proc_list before returning.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-10-09 08:41:27 -06:00
Todd Kordenbrock
141b20d991 osc-portals4: Initialize datatype in MPI_Get_accumulate and MPI_Rget_accumulate
Fix code paths that didn't convert the MPI datatype to the
corresponding Portals4 datatype.

Thanks to Nicolas Chevalier (@shawone) for finding this bug and
submitting a patch.
2015-10-08 12:17:19 -05:00
Gilles Gouaillardet
e946c82847 Revert "coll/basic: fix segmentation fault in neighborhood collectives if the degree"
This partially reverts commit open-mpi/ompi@76204dfafe.
2015-10-08 12:00:41 -04:00
Gilles Gouaillardet
99cca2cfd3 Revert "* comment on communicator creation in mca_topo_base_dist_graph_create(...)"
This partially reverts commit open-mpi/ompi@27e4389259.
2015-10-08 12:00:41 -04:00
George Bosilca
a8bdd8f668 Don't lose the pointer to the request array. Patch provided by
@ggouaillardet.
2015-10-08 12:00:41 -04:00
George Bosilca
88492a1e12 Consistently use the request array for all modules (single array stored
in the base).
Correctly deal with persistent requests (they must be always freed when
they are stored in the request array associated with the communicator).
Always use MPI_STATUS_IGNORE for single request waiting functions.
2015-10-08 12:00:41 -04:00
George Bosilca
01b32caf98 Update the basic module to dynamically allocate the right
number of requests.

Remove unnecessary fields. We don't need these fields.
2015-10-08 12:00:41 -04:00
George Bosilca
a324602174 Never allocate a temporary array for the requests. Instead rely on the
module_data to hold one with the largest necessary size. This array is
only allocated when needed, and it is released upon communicator
destruction.
2015-10-08 12:00:41 -04:00
Ryan Grant
8134ba76f1 Merge pull request #998 from tkordenbrock/topic/fix.incorrect.ompi_proc.cast
Looks good to me.

mtl-portals4: fix bug in the Portals4 get_peer family
2015-10-08 08:38:16 -06:00
Todd Kordenbrock
88d79efd9f mtl-portals4: fix bug in the Portals4 get_peer family
The Portals4 get_peer family incorrectly cast the ompi_proc_t to
ptl_process_t and returned that as the peer.  The ptl_process_t is
actually found in the endpoint array.  This commit fixes the
Portals4 get_peer family to return the dereferenced endpoint
pointer.
2015-10-08 07:57:48 -05:00
Todd Kordenbrock
f33b0c1cdf coll-portals4: allreduce: remove extra %d from error message. 2015-10-08 07:57:33 -05:00
Devendar Bureddy
72f98ccf6c HCOLL: Enable alltoall interface 2015-10-07 08:00:04 +03:00
Nathan Hjelm
c124fd4a0b Merge pull request #977 from hjelmn/ompi_win_free
win: free windows in ompi_win_finalize
2015-10-06 10:11:28 -06:00
Nathan Hjelm
d7205f90f1 win: free windows in ompi_win_finalize
This commit frees any outstanding windows at ompi_win_finalize. If
ompi_debug_show_handle_leaks is set a warning message is printed out
indicating that a window is still allocated.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-06 09:25:51 -06:00
Edgar Gabriel
8af80cd02c update the interfaces of the sharedfp addproc component to match the changes made in the const commit. 2015-10-06 07:54:38 -05:00
Gilles Gouaillardet
de8de65b07 coll/tuned: remove unused prototypes from coll_tuned.h 2015-10-06 09:07:48 +09:00
Mike Dubman
e8d7373b14 COLL/FCA: revert to prev barrier if called from finalize
FCA barrier may not complete if FCA progress is not called periodically.
PMI/PMI2 API that can be used in rte barrier has no provision for calling
external progress function.

So it is possible that during finalize some ranks will be stuck
in fca barrier while others are in PMI barrier.
2015-10-04 09:40:19 +03:00
Mike Dubman
5bebed45eb OMPI: set "in finalize" indicator in finalize flow 2015-10-04 09:39:37 +03:00
Nathan Hjelm
5122327727 fcoll/two_phase: fix new coverity errors
Fix CID 1325467: use after free

Remove extra free of aggregator_list.

Fix CID 1325466: resource leak

Fix typo in prior coverity fix.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-10-02 21:38:31 -06:00
Nathan Hjelm
c99a8a55ba Merge pull request #967 from hjelmn/libnbc_fix
op: allow user operations in ompi_3buff_op_reduce
2015-10-02 18:48:32 -06:00
Nathan Hjelm
57d3b83297 op: allow user operations in ompi_3buff_op_reduce
This commit allows user operations to be used in the
ompi_3buff_op_reduce function. This fixes an issue identified in:

http://www.open-mpi.org/community/lists/devel/2014/04/14586.php

and

http://www.open-mpi.org/community/lists/users/2015/10/27769.php

The fix is to copy source1 into the target then call the user op
function with source2 and target.

Fixes #966

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-02 10:35:21 -06:00
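
The fix described in this commit (copy source1 into the target, then call the user op with source2 and target) can be sketched as below. This is an illustration only: `user_op_fn` is a simplified stand-in for a two-buffer user callback without the datatype handle that a real MPI user function receives, and `reduce_3buff` and `int_sum` are hypothetical names, not the signature of `ompi_3buff_op_reduce`.

```c
#include <stddef.h>
#include <string.h>

/* Two-buffer callback that combines invec into inoutvec element by element.
 * Simplified stand-in for a user-defined reduction function. */
typedef void user_op_fn(const void *invec, void *inoutvec, int count);

/*
 * Three-buffer reduction built from a two-buffer user op:
 * copy source1 into target, then fold source2 into target.
 * Neither source buffer is modified.
 */
void reduce_3buff(user_op_fn *op, const void *source1, const void *source2,
                  void *target, int count, size_t elem_size)
{
    memcpy(target, source1, (size_t)count * elem_size);
    op(source2, target, count);
}

/* Example user op: element-wise integer sum. */
void int_sum(const void *invec, void *inoutvec, int count)
{
    const int *in = invec;
    int *inout = inoutvec;

    for (int i = 0; i < count; ++i) {
        inout[i] += in[i];
    }
}
```

A user op only knows how to combine an input buffer into an in/out buffer, so the copy is what turns it into a three-buffer operation without touching either source.
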
Nathan Hjelm
eb79edff33 Merge pull request #963 from hjelmn/ompi_coverity
fcoll/two_phase: fix coverity errors
2015-10-02 10:22:24 -06:00
Devendar Bureddy
243b75aa80 HCOLL: Add alltoallv interface 2015-10-02 01:51:33 +03:00
Nathan Hjelm
95b95e19af fcoll/dynamic: fix coverity errors
Fixes CID 72320: Explicit NULL dereferenced

On error it is possible that the blocklen_per_process array is
NULL. Change the NULL check before the free to check for non-NULL on
the array not the array element. Also clean up allocation of this
array to use calloc instead of malloc + setting each element to NULL.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-01 14:38:09 -06:00
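
The two changes named in this commit (guard on the array rather than on an element, and calloc in place of malloc plus a NULL-filling loop) might look like this in isolation. The helper names are hypothetical and the element type is an assumption; only the array name `blocklen_per_process` comes from the commit message.

```c
#include <stdlib.h>

/*
 * Allocate an array of nprocs per-process pointers, all initially NULL.
 * calloc zero-fills the memory, which replaces malloc plus a loop of NULL
 * stores (this relies on NULL being all-bits-zero, as on mainstream
 * platforms).
 */
int **alloc_blocklen_per_process(size_t nprocs)
{
    return calloc(nprocs, sizeof(int *));
}

/*
 * Release the array and everything it points to. The guard tests the array
 * itself, so a failed allocation (arr == NULL) is never dereferenced.
 */
void free_blocklen_per_process(int **arr, size_t nprocs)
{
    if (arr != NULL) {
        for (size_t i = 0; i < nprocs; ++i) {
            free(arr[i]);   /* free(NULL) is a no-op */
        }
        free(arr);
    }
}
```
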
Nathan Hjelm
09df7aa205 fcoll/two_phase: fix coverity errors
Fixes CIDs 72300, 72344, 1196764-1196768, 72300: Resource leaks

Multiple allocated arrays are going out of scope at the end of
mca_fcoll_two_phase_file_write_all. Free these arrays. Also removed
the extraneous NULL checks since free (NULL) is safe in C.

Change returns to goto exit where the allocated resources are freed.

Fixes CIDs 72285-72292, 72297, 72298: Resource leaks

Change all appropriate return statements to goto exit to ensure that
all resources are freed. Also removed the NULL checks since free
(NULL) is safe in C.

Fixes CIDs 72295, 72296: Resource leaks

Moved free of requests and recv_types to after exit label. This will
ensure these are freed on error.

Also added a loop and statement to free send_buf which is going out of
scope at the end of the function.

Fixes CIDs 72336-72240, 735197, 735198: Resource leaks

Moved the exit label to before the resources are released and
changed all appropriate return statements to goto exit. Also removed
extraneous NULL checks because free (NULL) is safe in C.

Fixes CIDs 72341, 72343, 1196805-1196809: Resource leaks

Free all resources after exit label and change return statements to
goto exit to ensure all resources are freed on error.

Fixes CID 1269973: Unused value

Check return code of ompi_request_wait_all. If it fails jump to the
exit.

Fixes CID 714119: Dereference before NULL check

Wrong value checked in conditional.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-01 14:38:09 -06:00
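
The cleanup pattern this commit applies throughout (a single exit label, returns changed to goto exit, unconditional free) can be sketched as follows. The function and the `fail_at` knob are hypothetical and exist only so the error paths can be exercised; this is not the body of `mca_fcoll_two_phase_file_write_all`.

```c
#include <stdlib.h>

/*
 * Returns 0 on success, -1 on failure. fail_at is a hypothetical knob that
 * forces an error after the first (1) or second (2) allocation.
 */
int write_all_sketch(size_t n, int fail_at)
{
    int ret = -1;
    int *displs = NULL;
    int *lengths = NULL;

    displs = malloc(n * sizeof *displs);
    if (displs == NULL || fail_at == 1) {
        goto exit;              /* was: return, which leaked displs */
    }

    lengths = malloc(n * sizeof *lengths);
    if (lengths == NULL || fail_at == 2) {
        goto exit;              /* was: return, which leaked both arrays */
    }

    /* ... the real work would go here ... */
    ret = 0;

exit:
    /* Reached on every path. free(NULL) is a no-op, so no NULL checks. */
    free(lengths);
    free(displs);
    return ret;
}
```

Initializing every pointer to NULL at the top is what makes the unconditional frees safe, whichever step failed.
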
Nathan Hjelm
9d9450a054 Merge pull request #958 from hjelmn/man_pages
ompi/man: fix typos in formatting
2015-09-30 07:30:59 -06:00
Nathan Hjelm
fbaa79835f ompi/man: fix typos in formatting
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-29 23:32:44 -06:00
Nathan Hjelm
5fd9c35957 osc/rdma: fix incorrect assert
This commit fixes MTT failures in debug builds.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-29 15:37:40 -06:00
Nathan Hjelm
7b8ec48c68 osc/rdma: fix typos in arguments to btl_atomic_[f]op
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-29 08:09:00 -06:00
Gilles Gouaillardet
57ecce4e0f ompi_proc_complete_init: always reset u16ptr
if a key is not found, u16ptr is set to NULL and following
opal_value_unload calls might fail
2015-09-29 11:41:51 +09:00
Nathan Hjelm
12bd300c40 Merge pull request #929 from hjelmn/add_procs
Update add_procs support
2015-09-28 17:29:13 -06:00
Nathan Hjelm
6b83fa2f58 ompi/comm: fix coverity errors
Fixes CID 1323841: Logically dead code

Wrong value in conditional. Should be newcomp not newcomm.

Fixes CID 1269762: Explicit null dereference

ompi_group_incl could return an error and not set local_group. Add a
check to ensure ompi_group_incl succeeded before incrementing the proc
count.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-28 15:55:19 -06:00
Nathan Hjelm
6611c000c9 Fix coverity warnings
Fix CID 1315271: Constant expression result

The intent of this conditional is to not produce a peruse event for
probe or mprobe requests. Coverity is correct that the expression is
always true. Changed the || to && to fix. Also moved the conditional
within an OMPI_WANT_PERUSE to ensure the conditional is not evaluated
if peruse is disabled.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-28 15:35:25 -06:00
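
The constant-expression defect described here is easy to reproduce in isolation. In the sketch below the enum and function names are hypothetical; only the `||` versus `&&` distinction is taken from the commit message.

```c
#include <stdbool.h>

/* Hypothetical request kinds, for illustration only. */
enum req_type { REQ_RECV, REQ_PROBE, REQ_MPROBE };

/* Buggy form: every value differs from at least one of two distinct
 * constants, so this returns true for all request types. */
bool wants_event_buggy(enum req_type t)
{
    return t != REQ_PROBE || t != REQ_MPROBE;
}

/* Fixed form: true only when the type is neither probe nor mprobe. */
bool wants_event_fixed(enum req_type t)
{
    return t != REQ_PROBE && t != REQ_MPROBE;
}
```

Some compilers warn about the buggy form as a tautological comparison, which is the same observation Coverity reported.
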
rhc54
73449e969e Merge pull request #949 from rhc54/topic/nmclean
Cleanup the code a bit by simply adding our nspace to the top of the …
2015-09-28 10:44:22 -07:00
Ralph Castain
a4a3dfd480 Cleanup the code a bit by simply adding our nspace to the top of the list of jobid <-> nspace correlations. Add two new APIs to opal_pmix for registering new jobid/nspace pairs and retrieving an nspace given a jobid - these are required to support connect/accept. No impact on the PMIx library. 2015-09-28 08:50:13 -07:00
Gilles Gouaillardet
97b9d12c58 man: fix a typo in MPI_Ibarrier C prototype
Thanks Harald Servat for reporting this
2015-09-28 16:54:20 +09:00
Gilles Gouaillardet
5e15c20cf8 ompi/info: silence a warning in ompi_info_set_value_enum 2015-09-28 16:42:54 +09:00
Gilles Gouaillardet
f241475db9 ompi: initialize ompi_proc_list common symbol 2015-09-28 10:09:27 +09:00
Nathan Hjelm
20d5c07638 Fix CID 1312113: Logically dead code
Removed logically dead code.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-27 09:46:06 -06:00
bosilca
0a3c54ed61 Merge pull request #942 from bosilca/topic/global_request
Fix for "Random errors on MPI_COMPARE_AND_SWAP with pt2pt OSC of Open MPI master" (#933)
2015-09-27 16:56:29 +02:00
Nathan Hjelm
552e1b59a5 osc/rdma: fix coverity issues
Fixes CID 1324730, 1327429, 1324728, 1196633, 1324731, 1324727, and
1196632: Logically dead code

OMPI_OSC_RDMA_REQUEST_ALLOC can never return a NULL request. Removed
unnecessary NULL checks.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-26 12:45:14 -06:00
Nathan Hjelm
ebf19ac5eb osc/pt2pt: fix coverity issues
Fixed CID 1269712, 1269709, 1269706, 1269703, 1269694: Logically dead code

Remove extra NULL check as OMPI_OSC_PT2PT_REQUEST_ALLOC can never set the
request to NULL.

Fixes CID 1269668: Unchecked return value

False positive. Add (void) to indicate we do not care about the return code
from opal_hash_table_get_uint32.

Fixes CID 1324726: Free of address-of expression

Do not free lock if it was not allocated.

Fixes CID 1269658: Free of address-of expression

This will never happen, but because op is always a built-in op there is no
reason to retain/release it anyway.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-26 11:18:22 -06:00
George Bosilca
01d8e23ccc Fix the random errors related to the recursive sends and receives
identified by Fujitsu.
2015-09-26 00:44:51 +02:00
Nathan Hjelm
f84716fcd0 Merge pull request #941 from hjelmn/osc_pt2pt_fix
osc/pt2pt: fix heterogeneous build
2015-09-25 08:07:09 -06:00
Nathan Hjelm
ae7f47e04d osc/pt2pt: fix heterogeneous build
Fixes #940

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-25 00:15:02 -06:00
Todd Kordenbrock
3e63a3458c portals4: add support for dynamic add_procs() to all Portals4 components
In the default mode of operation, the Portals4 components support
dynamic add_procs().

The Portals4 components have two alternate modes (flow control and
logical-to-physical) that require knowledge of all procs at startup.
In these modes, mtl-portals4 sets the MCA_MTL_BASE_FLAG_REQUIRE_WORLD
flag and btl-portals4 sets the MCA_BTL_FLAGS_SINGLE_ADD_PROCS flag
to tell the PML that we need all the procs in one add_procs() call.
2015-09-24 22:12:57 -05:00
Nathan Hjelm
248212276d osc/sm: fix remaining coverity issues
Fixes CID 1324870: Memory - illegal accesses (USE_AFTER_FREE)

Free osc module after calling destruct on the lock.

Fixes CID 1324868: Integer handling issues (OVERFLOW_BEFORE_WIDEN)
Fixes CID 1324867: Integer handling issues (OVERFLOW_BEFORE_WIDEN)

Explicitly cast to uint64_t to ensure the widen happens before an overflow
can occur.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-24 15:55:01 -06:00
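
The OVERFLOW_BEFORE_WIDEN class of defect and the cast that fixes it can be shown with two small functions. The names and the rank-times-segment-size computation are hypothetical; the commit message does not say which expressions in osc/sm were affected.

```c
#include <stdint.h>

/* Buggy: with a 32-bit int the multiply is done in 32 bits and wraps
 * before the result is widened to 64 bits for the return. */
uint64_t offset_buggy(uint32_t rank, uint32_t segment_size)
{
    return rank * segment_size;
}

/* Fixed: casting one operand first makes the multiply happen in 64 bits. */
uint64_t offset_fixed(uint32_t rank, uint32_t segment_size)
{
    return (uint64_t)rank * segment_size;
}
```

The cast has to be applied to an operand, not to the finished product: `(uint64_t)(rank * segment_size)` would still multiply in 32 bits.
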
Nathan Hjelm
4ab4c4d7e9 Merge pull request #930 from hjelmn/ompi_coverity
coll/libnbc: fix coverity errors
2015-09-23 17:07:34 -06:00
Nathan Hjelm
54a4061d88 Add support for detecting when dynamic add_procs is not possible
This commit adds support to the pml, mtl, and btl frameworks for
components to indicate at runtime that they do not support the new
dynamic add_procs behavior. At the high end the lack of dynamic
add_procs support is signalled by the pml using the new pml_flags
member to the pml module structure. If the
MCA_PML_BASE_FLAG_REQUIRE_WORLD flag is set MPI_Init will generate the
ompi_proc_t array passed to add_proc from ompi_proc_world () instead
of ompi_proc_get_allocated ().

Both cm and ob1 have been updated to detect if the underlying mtl and
btl components support dynamic add_procs.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 16:22:05 -06:00
Nathan Hjelm
2c89c7f47d ompi/proc: add function to get all allocated procs
This commit adds two new functions:

 - ompi_proc_get_allocated - Returns all procs in the current job that
   have already been allocated. This is used in init/finalize to
   determine which procs to pass to add_procs/del_procs.

 - ompi_proc_world_size - returns the number of processes in
   MPI_COMM_WORLD. This may be removed in favor of callers just
   looking at ompi_process_info.

The behavior of ompi_proc_world has been restored to return
ompi_proc_t's for all processes in the current job. The use of this
function is discouraged.

Code that was using ompi_proc_world() has been updated to make use of
the new functions to avoid the memory overhead of ompi_comm_world ().

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 16:22:05 -06:00
Nathan Hjelm
30f8d0b038 coll/libnbc: fix coverity errors
Fix CID 1196812: Resource Leak

dsts array was leaked on error.

Fix CID 710565: Copy-paste error

The line in question (nbc:513) is indeed a copy-paste error.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 16:14:49 -06:00
William Throwe
80bb41a079 ROMIO configure looks for lstat in wrong header

The ROMIO configure script checks for a declaration of lstat in
unistd.h, but, at least on the Linux machines I checked, lstat is in
sys/stat.h.  (The detection failure led to a linker error when building
ROMIO as part of OpenMPI on one of my admittedly strangely configured
machines, somehow.)  It appears from the man page that either location
is possible, so check both.

(cherry picked from mpich/mpich@7b8bd055df)

Signed-off-by: Rob Latham <robl@mcs.anl.gov>
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 11:56:53 -06:00
Nathan Hjelm
db74fa9d0f bml/r2: fix memory leak
The add_procs change made some assumptions in the bml/r2 add_procs
wrong. This led to del_procs never being called. I removed the logic
that checks the ompi_proc_t reference count and removed an unnecessary
allocation. The allocation only makes sense if we pass more than a
single proc at a time to the btl del_procs.

This commit also ensures that the btl del_procs is called if the
endpoint is in the btl_rdma array but not the btl_send array.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 10:45:13 -06:00
Nathan Hjelm
ee5810813b osc/pt2pt: fix regression in pscw sync on 0 size groups
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 17:09:00 -06:00
Nathan Hjelm
f6920aa916 osc/rdma: check for usable btls during query
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 17:08:28 -06:00
Nathan Hjelm
903762e194 osc/sm: fix pscw synchronization
The osc/sm component was using a simple counter to determine if all
expected posts had arrived to start a PSCW access epoch. This is
incorrect as a post may arrive from a peer that isn't part of the
current start group. There are many ways this could have been fixed.
This commit adds an n^2 bitmap. When a process posts it sets a bit in
the bitmap associated with the access rank to indicate the post is
complete. The access rank checks for and clears the bits associated
with all the processes in the start group.

The bitmap requires comm_size ^ 2 bits of space. This should be
manageable as most nodes have relatively small numbers of processes. If
this changes another algorithm can be implemented.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 16:00:27 -06:00
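
The bitmap scheme described in this commit can be sketched as a single-process model. Everything here is an assumption made for illustration: the names are hypothetical, the check is all-or-nothing rather than a wait loop, rows are rounded up to whole 64-bit words (so slightly more than comm_size ^ 2 bits are used), and the atomics that a shared-memory implementation would need are left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* comm_size x comm_size bits: row = access rank, column = posting rank. */
struct post_bitmap {
    int comm_size;
    uint64_t *words;
};

static size_t words_per_row(int comm_size)
{
    return ((size_t)comm_size + 63) / 64;
}

int post_bitmap_init(struct post_bitmap *bm, int comm_size)
{
    bm->comm_size = comm_size;
    bm->words = calloc((size_t)comm_size * words_per_row(comm_size),
                       sizeof(uint64_t));
    return bm->words != NULL ? 0 : -1;
}

void post_bitmap_free(struct post_bitmap *bm)
{
    free(bm->words);
    bm->words = NULL;
}

/* Called on behalf of poster: record a post aimed at access_rank. */
void post_bitmap_post(struct post_bitmap *bm, int access_rank, int poster)
{
    uint64_t *row = bm->words + (size_t)access_rank * words_per_row(bm->comm_size);
    row[poster / 64] |= UINT64_C(1) << (poster % 64);
}

/*
 * Called on behalf of access_rank: if every rank in group has posted, clear
 * those bits and return true. Otherwise leave the bitmap unchanged and
 * return false. Bits set by ranks outside the group are ignored.
 */
bool post_bitmap_try_start(struct post_bitmap *bm, int access_rank,
                           const int *group, int group_size)
{
    uint64_t *row = bm->words + (size_t)access_rank * words_per_row(bm->comm_size);

    for (int i = 0; i < group_size; ++i) {
        if ((row[group[i] / 64] & (UINT64_C(1) << (group[i] % 64))) == 0) {
            return false;
        }
    }
    for (int i = 0; i < group_size; ++i) {
        row[group[i] / 64] &= ~(UINT64_C(1) << (group[i] % 64));
    }
    return true;
}
```

Unlike a plain counter, the bitmap records who posted, so a post from a rank outside the current start group cannot be mistaken for one of the expected posts and stays recorded for a later epoch.
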
Nathan Hjelm
5553dba0c4 Merge pull request #919 from hjelmn/accumulate_ops
ompi/win: save value of accumulate_ops info key on window
2015-09-22 10:50:50 -06:00
Nathan Hjelm
036395dc0f osc/pt2pt: fix typos
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 10:30:01 -06:00
Nathan Hjelm
974061c38f osc: fixed issues identified by coverity
Fix CID 1324733: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324734: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324735: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324736: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324737: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324751: Memory - illegal accesses  (USE_AFTER_FREE)
Fix CID 1324750: (USE_AFTER_FREE)
Fix CID 1324749: Memory - corruptions  (USE_AFTER_FREE)
Fix CID 1324748: Memory - illegal accesses  (USE_AFTER_FREE)
Fix CID 1324747: (USE_AFTER_FREE)
Fix CID 1324746: Memory - corruptions  (USE_AFTER_FREE)

Add missing return on an error path.

Fix CID 1324745: Code maintainability issues  (UNUSED_VALUE)

Ignore return code from barrier. It was not being used anyway.

Fix CID 1324738: Null pointer dereferences  (FORWARD_NULL)
Fix CID 1324741: Null pointer dereferences  (REVERSE_INULL)

module->selected_btl can not be NULL in osc/rdma during normal
operation. Removed the unnecessary NULL check.

Fix CID 1324752: Memory - illegal accesses  (USE_AFTER_FREE)

Move ompi_osc_pt2pt_module_lock_remove to before the lock is freed.

Fix CID 1324744: Uninitialized variables  (UNINIT)
Fix CID 1324743: Uninitialized variables  (UNINIT)

This array is not used uninitialized but there is no reason not to use
calloc here to silence the warning.

The following CID is a false positive: 1324742. I will mark it such in
coverity.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-22 09:23:39 -06:00
bosilca
733328aa4d Merge pull request #916 from rolfv/pr/fix-coll-cuda-const-warnings
Fix warnings due to missing const
2015-09-22 16:34:40 +02:00
igor-ivanov
a9fc53cf20 Merge pull request #923 from igor-ivanov/pr/mpisync
ompi/tools: Add O(logN) algorithm for data collection
2015-09-22 16:11:25 +03:00
Igor Ivanov
53be890c03 ompi/tools: Add O(logN) algorithm for data collection
Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-09-22 15:21:37 +03:00
Nathan Hjelm
6751409c32 ompi/win: save value of accumulate_ops info key on window
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-21 16:37:29 -06:00
Nathan Hjelm
d6724f2828 ompi: add missing man pages
This commit adds man pages for the MPI_Win_allocate and MPI_Win_allocated_shared
MPI-3 functions. The man page for MPI_Win_create has also been updated to
indicate support for the same_size and same_disp_unit info keys

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-21 16:21:28 -06:00
Rolf vandeVaart
2c51faa58d Fix warnings due to missing const 2015-09-21 14:18:44 -04:00
Mike Dubman
23c41a0320 Merge pull request #908 from igor-ivanov/pr/oshmem-check
Recovering oshmem functionality
2015-09-21 19:50:24 +03:00
Nathan Hjelm
60c2b0df48 Merge pull request #903 from hjelmn/new_osc_rdma
osc/rdma: add true RDMA one-sided component
2015-09-21 10:29:11 -06:00
Nathan Hjelm
88100ad670 Merge pull request #902 from hjelmn/new_osc
osc/pt2pt: reduce memory footprint of windows
2015-09-21 10:28:41 -06:00
Edgar Gabriel
01fcfb08fe do not set the contiguous flag in two_phase_file_read_all. This optimization
needs some more debugging for the two_phase component, and is disabled
for two_phase_file_write_all as well.
2015-09-18 09:30:50 -05:00
Edgar Gabriel
3734a38370 this file should have been part of the previous commit for removing io_ompio_nbc.[ch] 2015-09-18 09:28:25 -05:00
Edgar Gabriel
cf46a6bd4d remove the io_ompio_nbc.[ch] files, they are not used anymore at this point in time. 2015-09-18 09:26:25 -05:00
Gilles Gouaillardet
a611274704 pml: fix commit open-mpi/ompi@6e6a3e965c
do not use the const modifier for allocator nor recv buffers
2015-09-18 09:54:18 +09:00
Jeff Squyres
567c9e3a5b mtl_ofi_component.c: add missing argv.h header 2015-09-17 10:05:05 -07:00
Igor Ivanov
4b8d9b8eff oshmem/proc: Refactor proc component
Most of the functionality of oshmem_proc duplicates ompi_proc. In addition,
the current logic does not allow oshmem initialization without ompi
startup.
This refactoring avoids the code duplication, reduces memory use, and
makes oshmem easier to support.
oshmem_proc is now a transparent ompi_proc structure that can be
extended with oshmem-specific data.

Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-09-17 18:49:00 +03:00
Igor Ivanov
11f61790ee ompi/proc: Extend ompi_proc_t structure with padding to support oshmem data
Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
2015-09-17 18:48:59 +03:00
Nathan Hjelm
dfbe584c92 ompi/group: fix typos in add_procs changes
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-17 09:21:32 -06:00
Nathan Hjelm
d8df9d414d osc/rdma: add true RDMA one-sided component
This commit adds support for performing one-sided operations over
supported hardware (currently Infiniband and Cray Gemini/Aries). This
component is still undergoing active development.

Current features:

 - Use network atomic operations (fadd, cswap) for implementing
   locking and PSCW synchronization.

 - Aggregate small contiguous puts.

 - Reduced memory footprint by storing window data (pointer, keys,
   etc) at the lowest rank on each node. The data is fetched as each
   process needs to communicate with a new peer. This is a trade-off
   between the performance of the first operation on a peer and the
   memory utilization of a window.

TODO:

 - Add support for the accumulate_ops info key. If it is known that
   the same op or same op/no op is used it may be possible to use
   hardware atomics for fetch-and-op and compare-and-swap.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-16 15:01:33 -06:00
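The first feature listed in the commit above (locking built from fetch-and-add and compare-and-swap) can be sketched with ordinary C11 atomics. This is only an illustration under stated assumptions: it uses local `stdatomic.h` operations in place of network atomics, and every `demo_` name and the lock-word layout are hypothetical, not taken from the osc/rdma source.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word: bit 32 marks an exclusive holder, the low
 * 32 bits count shared holders. Not the layout used by osc/rdma. */
typedef _Atomic uint64_t demo_lock_t;
#define DEMO_EXCL 0x100000000ull

/* Exclusive lock: compare-and-swap succeeds only if the word is 0,
 * i.e. there are no shared or exclusive holders. */
bool demo_try_lock_exclusive(demo_lock_t *lock)
{
    uint64_t expected = 0;
    return atomic_compare_exchange_strong(lock, &expected, DEMO_EXCL);
}

void demo_unlock_exclusive(demo_lock_t *lock)
{
    atomic_fetch_sub(lock, DEMO_EXCL);
}

/* Shared lock: fetch-and-add registers the caller, then the previous
 * value tells us whether an exclusive holder was present. If so,
 * undo the increment and report failure. */
bool demo_try_lock_shared(demo_lock_t *lock)
{
    uint64_t prev = atomic_fetch_add(lock, 1);
    if (prev & DEMO_EXCL) {
        atomic_fetch_sub(lock, 1);
        return false;
    }
    return true;
}

void demo_unlock_shared(demo_lock_t *lock)
{
    atomic_fetch_sub(lock, 1);
}
```

Both calls are try-locks; a caller would retry on failure. Over a network the same two primitives would target a lock word in a remote window rather than local memory.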
Nathan Hjelm
fd42343ff0 osc/pt2pt: reduce memory footprint of window
This commit updates osc/pt2pt to allocate peer objects as they are
needed rather than all at once. Additionally, to help improve the
memory footprint a new synchronization structure has been added.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-16 13:01:56 -06:00
Nathan Hjelm
c84c05bab7 ompi/comm: fix comm_[i]dup on intracommunicators
The behavior of ompi_comm_set was changed to get the remote size from
the remote group. This broke how ompi_comm_[i]dup were using
ompi_comm_set. In order to adapt to the new behavior these functions
now pass NULL for the remote group if the communicator is not an
inter-communicator.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-16 10:31:18 -06:00
George Bosilca
02624bd0b6 Fix all treematch issues identified by Coverity. 2015-09-15 23:49:11 -04:00
George Bosilca
6ab5f68fc3 indentation. 2015-09-15 22:46:13 -04:00
rhc54
5597416fe0 Merge pull request #897 from rhc54/topic/oob
Remove the last involvement of the OOB system from the MPI layer
2015-09-15 14:40:21 -07:00
Jeff Squyres
7cb546a221 core: yow; this should absolutely not be in the repo! 2015-09-15 16:15:04 -04:00
Ralph Castain
c1bbbb5e2f Remove the last involvement of the OOB system from the MPI layer, remove the no-longer-needed usock/oob component, and have procs no longer open the RML, OOB, ROUTED, and GRPCOMM frameworks as PMIx now provides all required app-mpirun cmds 2015-09-15 13:08:35 -07:00
Nathan Hjelm
9c45c63143 ompi/dpm: fix typo in dynamic communicator detection
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-15 12:42:58 -06:00
Nathan Hjelm
6379178046 ompi/comm: fix bug in ompi_comm_set
This commit updates the behavior of ompi_comm_set to explicitly take
either local/remote group(s) OR local/remote array(s). If array(s) are
in use the sizes will be taken from the appropriate group(s).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-15 11:37:44 -06:00
Nathan Hjelm
f29b65aa14 ompi/proc: fix typos CID 1323840
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-11 21:02:30 -06:00
Ralph Castain
fbcf819d2e Remove unnecessary include 2015-09-11 15:53:00 -07:00
Nathan Hjelm
f798c909d1 Merge pull request #883 from hjelmn/comm_split_update
ompi/comm: improve comm_split_type scalability
2015-09-11 16:35:34 -06:00
Ralph Castain
b60b03d613 It is okay not to get the hostname - we don't require that it be provided 2015-09-11 13:01:20 -07:00
Nathan Hjelm
c45789a222 ompi/comm: improve comm_split_type scalability
This commit includes two changes. First, the locality code has been
factored out to improve readability and maintainability. Second,
instead of looking up each proc using ompi_group_peer_lookup the code
now uses ompi_group_peer_lookup_existing. The code falls back on modex
if a proc doesn't exist. This will prevent MPI_Comm_split_type from
allocating ompi_proc_t's for every process in the job.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-11 13:53:48 -06:00
Nathan Hjelm
1868b5937c Merge pull request #889 from hjelmn/sentinel_update
Use the low instead of the high bit to indicate a proc is a sentinel
2015-09-11 12:30:27 -06:00
rhc54
c31093ff19 Merge pull request #890 from rhc54/topic/fixpmi
Revert "Revert "Fix the handling of cpusets so we get the correct cpu…
2015-09-11 09:25:24 -07:00
Nathan Hjelm
898a0a038c bml/r2: fix coverity CID 1323765
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-11 09:39:10 -06:00
Nathan Hjelm
64c8f124fc Use the low instead of the high bit to indicate a proc is a sentinel
The assumption that the high bit is not in use in pointers on any of our
supported platforms was incorrect. A better assumption is that all
ompi_proc_t pointers will be at least 2-byte aligned. This allows us
to use the low bit. To do this we drop the highest bit of the
opal_process_name_t jobid (hope this is ok) and use the low bit to
indicate the proc is really a sentinel.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-11 09:32:02 -06:00
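The low-bit tagging described in the commit above can be sketched as follows. This is a minimal illustration, assuming a 64-bit `uintptr_t`; the `demo_` types and helpers are hypothetical stand-ins and do not reproduce the real `opal_process_name_t` or `ompi_proc_t` definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the process name and proc structures. */
typedef struct { uint32_t jobid; uint32_t vpid; } demo_name_t;
typedef struct { demo_name_t name; } demo_proc_t;

/* The packing below needs 63 payload bits plus the tag bit. */
_Static_assert(sizeof(uintptr_t) >= 8, "sketch assumes 64-bit pointers");

/* A real proc pointer is at least 2-byte aligned, so its low bit is 0.
 * A set low bit therefore marks the slot as a sentinel. */
bool demo_is_sentinel(const void *slot)
{
    return ((uintptr_t) slot & 0x1) != 0;
}

/* Pack jobid (top bit dropped) and vpid into 63 bits, shift left by
 * one, and set the low bit as the sentinel tag. */
void *demo_name_to_sentinel(demo_name_t name)
{
    uintptr_t packed = ((uintptr_t) (name.jobid & 0x7fffffffu) << 32) | name.vpid;
    return (void *) ((packed << 1) | 0x1);
}

/* Reverse the packing: drop the tag bit, then split the fields. */
demo_name_t demo_sentinel_to_name(const void *slot)
{
    uintptr_t packed = (uintptr_t) slot >> 1;
    demo_name_t name = { (uint32_t) (packed >> 32),
                         (uint32_t) (packed & 0xffffffffu) };
    return name;
}
```

A group slot can then hold either a real pointer or an encoded name, and `demo_is_sentinel()` distinguishes the two without extra storage. The cost is the one jobid bit that the commit message mentions giving up.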
Ralph Castain
dc5796b8a1 Revert "Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local""
Fix the locality computation by correctly computing the vpid of the local peer

This reverts commit open-mpi/ompi@6a8fad49e5.
2015-09-11 08:29:51 -07:00
Ralph Castain
6a8fad49e5 Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local"
This reverts commit f94f3cda21.
2015-09-11 02:01:25 -07:00
Gilles Gouaillardet
a1627feaf7 coll/ml, bcol: fix prototypes (e.g. use the const modifier) 2015-09-11 13:20:44 +09:00
Gilles Gouaillardet
638a59adf3 fix compilation in heterogeneous mode
use OPAL_PMIX_GLOBAL instead of PMIX_GLOBAL
2015-09-11 09:23:21 +09:00
Nathan Hjelm
ad3a2ef6cc silence warnings introduced by add_procs merge
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 16:33:52 -06:00
Ralph Castain
f94f3cda21 Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local 2015-09-10 10:25:30 -07:00
Nathan Hjelm
ed005f2a61 ompi/dpm: improve scalability of ompi_dpm_mark_dyncomm
This commit removes the use of ompi_group_peer_lookup in the
ompi_dpm_mark_dyncomm function. The function now uses
ompi_group_get_proc_name which does not allocate an ompi_proc_t if one
does not already exist.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 10:50:58 -06:00
Nathan Hjelm
987e865c99 mtl/psm2: add support for dynamic add_procs
Add an accessor for the proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL]
member of the ompi_proc_t structure. This accessor calls add_procs
with the ompi_proc_t if the member is NULL.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
8df9b1d40d mtl/psm: add support for dynamic add_procs
Add an accessor for the proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL]
member of the ompi_proc_t structure. This accessor calls add_procs
with the ompi_proc_t if the member is NULL. Tested on an infinipath
system with InfiniPath_QLE7340 HCAs.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
0a0e6d8eef ompi/group: clean up union/difference code
Updated the union/difference code to remove an extra n^2 translation
of ranks. This comes at the cost of extra memory but greatly
simplifies the code.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
5b7943db78 ompi/group: do not allocate ompi_proc_t's on group union/difference
This commit modifies the ompi_group_t union/difference code to compare/copy the
raw group values. This will either be an ompi_proc_t or a sentinel value. This
commit also adds helper functions to convert between opal process names and
sentinel values.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
d8b0a6efda Remove use of ompi_comm_peer_lookup in osc/sm
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Nathan Hjelm
a41889112c Remove calls to ompi_group_peer_lookup in coll/sm and coll/fca
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Nathan Hjelm
0bf06de3f1 group|comm: add initial support for group sentinel values
This commit modifies ompi's process list group object to support a
sentinel value for nonexistent ompi_proc_t objects. The sentinel was
chosen to be the negative of the opal_process_name_t of the associated
ompi_proc_t. This takes advantage of the fact that on most (all?)
systems the top bit of a user-space pointer is never set. If this
changes then a new sentinel will be needed.

In addition this commit modifies the way ompi_mpi_comm_world is
initialized to fill in the group with sentinel values if the number of
processes exceeds the new add_procs behavior cutoff.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Nathan Hjelm
408da16d50 ompi/proc: add proc hash table for ompi_proc_t objects
This commit adds an opal hash table to keep track of mapping between
process identifiers and ompi_proc_t's. This hash table is used by the
ompi_proc_by_name() function to look up (in O(1) time) a given
process. This can be used by a BTL or other component to get an
ompi_proc_t when handling an incoming message from an as yet unknown
peer.

Additionally, this commit adds a new MCA variable to control the new
add_procs behavior: mpi_add_procs_cutoff. If the number of ranks in
the job falls below the threshold an ompi_proc_t is created for
every process. If the number of ranks is above the threshold then an
ompi_proc_t is only created for the local rank. The code needed to
generate additional ompi_proc_t's for a communicator is not yet
complete.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Nathan Hjelm
b4a0d40915 pml/ob1: Add support for dynamically calling add_procs
This commit contains the following changes:

 - pml/ob1: use the bml accessor function when requesting a bml
   endpoint. this will ensure that bml endpoints are only created when
   needed. for example, a bml endpoint is not requested and not
   allocated when receiving an eager message from a peer.

 - pml/ob1: change the pml_procs array in the ob1 communicator to a
   proc pointer array. at the cost of a single level of extra
   redirection this will allow us to allocate pml procs on demand.

 - pml/ob1: add an accessor function to access the pml proc structure
   for a given peer. this function will allocate the proc if it
   doesn't already exist.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
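The pointer-array-plus-accessor pattern described in the pml/ob1 commit above can be sketched like this. It is a simplified illustration only: the `demo_` structures and functions are hypothetical and much smaller than the real ob1 communicator and proc types.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-peer state, standing in for a pml proc structure. */
typedef struct { int rank; int messages_seen; } demo_peer_t;

/* The communicator stores pointers rather than the structures
 * themselves, so an unused peer costs only one pointer. */
typedef struct {
    demo_peer_t **peers;
    int size;
    int allocated;   /* number of peers actually created so far */
} demo_comm_t;

int demo_comm_init(demo_comm_t *comm, int size)
{
    if (size <= 0) {
        return -1;
    }
    /* calloc leaves every slot NULL on common platforms */
    comm->peers = calloc((size_t) size, sizeof(*comm->peers));
    comm->size = size;
    comm->allocated = 0;
    return (NULL != comm->peers) ? 0 : -1;
}

/* Accessor: return the peer for a rank, allocating it on first use. */
demo_peer_t *demo_comm_peer(demo_comm_t *comm, int rank)
{
    if (rank < 0 || rank >= comm->size) {
        return NULL;
    }
    if (NULL == comm->peers[rank]) {
        demo_peer_t *peer = calloc(1, sizeof(*peer));
        if (NULL == peer) {
            return NULL;
        }
        peer->rank = rank;
        comm->peers[rank] = peer;
        comm->allocated++;
    }
    return comm->peers[rank];
}

void demo_comm_fini(demo_comm_t *comm)
{
    for (int i = 0; i < comm->size; ++i) {
        free(comm->peers[i]);
    }
    free(comm->peers);
    comm->peers = NULL;
    comm->size = 0;
}
```

Every access goes through `demo_comm_peer()`, which is the extra level of indirection the commit message refers to. In exchange, a communicator that only ever talks to a few peers never allocates state for the rest.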
Nathan Hjelm
6fa6513003 bml: Add support for dynamically calling add_procs
This commit contains the following changes:

 - bml: add a function to add a single process. this function is
   intended to remove the need to maintain a opal_bitmap_t as it is
   irrelevant for a single proc. BTLs will need to be updated to
   either 1) ignore the return code from opal_bitmap_set_bit or 2) not
   call the function if the reachability bitmap is NULL.

 - bml: add an inline accessor function for getting the bml endpoint
   for a peer proc. this function will either 1) return the cached bml
   endpoint, or 2) create the endpoint and call add_proc with all
   available BTL modules.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Ralph Castain
b79cffc73b Protect ourselves - if the active pmix component doesn't have some optional functions, then gracefully decline to perform the operation OR use a required alternative (e.g., fence in place of disconnect)
This fixes the Slurm pmi2 support - still something wrong in pmi1
2015-09-09 02:29:00 -07:00
Gilles Gouaillardet
fe351f6801 io: do not cast away the const modifier when this is not necessary
update the io framework and mpi c bindings
2015-09-09 09:18:58 +09:00
Gilles Gouaillardet
e01bac962f coll: do not cast away the const modifier when this is not necessary
update the coll framework and mpi c bindings
2015-09-09 09:18:57 +09:00
Gilles Gouaillardet
6e6a3e965c pml: do not cast away the const modifier when this is not necessary
update the pml framework and mpi c bindings
2015-09-09 09:18:57 +09:00
Gilles Gouaillardet
43ef261d46 topo: do not cast away the const modifier when this is not necessary
update the topo framework and mpi c bindings
2015-09-09 09:18:57 +09:00
rhc54
3a446c9797 Merge pull request #876 from rhc54/topic/hnp
Fix segfault upon job error
2015-09-08 15:10:51 -07:00
Ralph Castain
459f169e06 Fix segfault upon job error
Silence some unnecessary error-logs
2015-09-08 14:03:06 -07:00
Ralph Castain
ae7156cabb Stop a segfault in the test by correctly passing all the argv during spawn 2015-09-08 13:42:46 -07:00
Jeff Squyres
bc9e5652ff whitespace: purge whitespace at end of lines
Generated by running "./contrib/whitespace-purge.sh".
2015-09-08 09:47:17 -07:00
Edgar Gabriel
c83e6ad0c8 fix Coverity warnings 1322865 and 72136 2015-09-08 09:15:57 -05:00
Ralph Castain
e6add86e4f Deal with connect/accept between two jobs from different mpirun's. Somewhat optimize connect/accept by using MPI bcast to distribute the participants instead of another PMIx lookup. Cleanup some Coverity issues. 2015-09-07 09:19:24 -07:00
Gilles Gouaillardet
c404e98dce coll/ml: silence warnings (incorrect callback prototype) 2015-09-07 14:56:49 +09:00
Gilles Gouaillardet
56f8a7b840 coll/ml: declare a global variable as static to avoid an uninitialized common symbol. 2015-09-07 14:56:03 +09:00
Ralph Castain
37c3ed68e7 Cleanup connect/disconnect and bring comm_spawn back online! 2015-09-06 10:27:39 -07:00
Jeff Squyres
794ee4a604 treematch: remove stale test
This test was accidentally left over from
open-mpi/ompi@d97bc29102 that prevented
the treematch component from building.
2015-09-05 05:02:30 -07:00
rhc54
665b30376a Merge pull request #868 from rhc54/topic/hwloc
Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given
2015-09-04 17:58:07 -07:00
Ralph Castain
d97bc29102 Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given 2015-09-04 16:54:40 -07:00
rhc54
d45ccda813 Merge pull request #866 from rhc54/topic/updatepmix
Update PMIx support
2015-09-04 11:09:36 -07:00
Ralph Castain
f6948c2bb4 Sync with PMIx master 43e45c3. Get multi-node publish/lookup/unpublish working 2015-09-04 10:07:17 -07:00
Pavel Shamis / Pasha
c3446f363b Merge pull request #859 from shamisp/topic/ml_soft_disable
ML: Replace opal ignore with a zero priority
2015-09-04 12:37:37 -04:00
Pavel Shamis (Pasha)
32c69630ad ML: Replace opal ignore with a zero priority
The priority is set to 0 by default. As a result component open reports
an error and the component is not loaded (no resources allocated).
2015-09-04 11:28:47 -04:00
yohann
404393b9d7 mtl/ofi: Minor code cleanup. 2015-09-03 15:04:55 -07:00
yohann
a8cac09769 mtl/ofi: Renamed macro to prevent clash with FI_ namespace. 2015-09-03 14:42:45 -07:00
yohann
7adb9b7ab4 mtl/ofi: Handle -FI_EAGAIN on send and recv operations. 2015-09-03 10:47:00 -07:00
Edgar Gabriel
c9710660af Merge pull request #863 from edgargabriel/topic/fcoll-static-cleanup
Topic/fcoll static cleanup
2015-09-03 11:21:02 -05:00
Edgar Gabriel
a96a15a83c re-enable the contiguous buffer optimization similarly to the dynamic component. Passes all hdf5 tests and our own test suite.
2015-09-03 10:13:03 -05:00
Edgar Gabriel
8007effc93 code cleanup for static component, similarly to the dynamic one 2015-09-03 10:12:45 -05:00
Jeff Squyres
6d9faf07e5 Merge pull request #858 from jsquyres/pr/fortran-use-only
fortran configury: test for USE...ONLY support
2015-09-03 10:19:48 -04:00
Edgar Gabriel
ac3a01c39c Silence Coverity warnings 1321702, 1321701, 1321700, 72331, 72330, 72327, 72326, 72325, 2015-09-03 09:10:25 -05:00
Jeff Squyres
66dda00f06 fortran configury: test for USE...ONLY support
As of v15.7, the PGI Fortran compiler does not properly support how
Open MPI uses the "USE ... ONLY" Fortran syntax to include modules
with conflicting symbol definitions (interestingly, pgfortran only has
a problem with this when compiling with -g).

In short, OMPI uses "USE :: module_aaa, ONLY: foo" and "USE ::
module_bbb, ONLY: bar" to use modules aaa and bbb, even though they
contain conflicting definitions for some symbols.  However, the use of
the ONLY clause should preclude the inclusion of the conflicting
symbols -- as the word implies, it should direct the compiler to
*only* use the symbols identified by the clause (i.e., foo and bar, in
this example).

This commit adds a configure test for this capability.  If the
compiler fails to build a simple test that mimics this behavior, then
disable the mpi_f08 bindings.

Fixes open-mpi/ompi#857
2015-09-02 15:55:24 -07:00
Ralph Castain
a772b46c15 Bring the MPI_Publish and friends online 2015-09-02 12:04:07 -07:00
Edgar Gabriel
e95d01be97 Merge pull request #847 from edgargabriel/topic/fcoll-dynamic-cleanup
Topic/fcoll dynamic cleanup
2015-09-01 16:10:55 -05:00
Nathan Hjelm
2a8cc5e637 osc/pt2pt: remove outstanding lock only after lock/flush ack received
fixes #840

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-01 10:54:47 -06:00
Edgar Gabriel
82efc23e8d clean up indenting and tabs/spaces of fcoll_static_file_read/write_all 2015-09-01 09:39:33 -05:00
Edgar Gabriel
a1778406d6 Re-enable the contiguous buffer optimization to the read_all and the write_all routines.
After long debugging, I found last week the reason this optimization originally broke
some hdf5 tests. We now pass the hdf5 test suite with the optimization being actively used.
2015-09-01 09:29:07 -05:00
Edgar Gabriel
c2c44b11dc Code cleanup for dynamic read_all and write_all
Specifically:
 - reduce the number of realloc's and malloc's by moving
   some arrays out of the cycle loop, if we know that their
   size is not changing
 - store the rank of the aggregator in a separate variable to avoid
   continuous dereferencing
 - change the wait_all logic in write_all to use a fixed number of requests
   (even if they are MPI_REQUEST_NULL)
 - fix the timing to consider the two initial allgather and the one
   allgatherv operation to be a part of it
 - add more comments.
2015-09-01 09:29:07 -05:00
Edgar Gabriel
cf1e4e0d35 step 0: clean up indenting and space vs. tabs 2015-09-01 09:29:07 -05:00
Jeff Squyres
596557e61b Fortran: update a comment
Split the list of subroutines into cases #1 and #2, just for clarity.
2015-08-31 03:10:09 -07:00
Gilles Gouaillardet
21642a2407 osc: do not cast away the const modifier when this is not necessary
update the osc framework and mpi c bindings
2015-08-31 10:34:05 +09:00
Gilles Gouaillardet
21b1e7f8c5 mpi conformance: fix prototypes
- MPI_Compare_and_swap
- MPI_Fetch_and_op
- MPI_Raccumulate
- MPI_Win_detach

Thanks to Michael Knobloch and Takahiro Kawashima for bringing this
to our attention
2015-08-31 10:34:05 +09:00
Ralph Castain
0d5814b5ca Cleanup Coverity issues 2015-08-29 21:19:27 -07:00
Ralph Castain
cf6137b530 Integrate PMIx 1.0 with OMPI.
Bring Slurm PMI-1 component online
Bring the s2 component online

Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways.

Bring the OMPI pubsub/pmi component online

Get comm_spawn working again

Ensure we always provide a cpuset, even if it is NULL

pmix/cray: adjust cray pmix component for pmix

Make changes so cray pmix can work within the integrated
ompi/pmix framework.

Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet

Cleanup comm_spawn - procs now starting, error in connect_accept

Complete integration
2015-08-29 16:04:10 -07:00
Jeff Squyres
d17497b4af Merge pull request #835 from Zhiming-Wang/master
Correct the wrong "Name Binding" of functions
2015-08-28 06:38:10 -04:00
Zhi Ming Wang
c8d4751ae6 Correct the wrong "Name Binding" of functions 2015-08-28 03:28:09 -04:00
Jeff Squyres
556c32e1d1 ompi_mpi_abort.c: use _exit(), not exit()
In an abort situation, just bail out immediately -- don't try to
invoke any atexit()/on_exit()-registered functions.

This is similar rationale to
open-mpi/ompi@17846411c3.
2015-08-27 17:08:25 -07:00
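The difference between `exit()` and `_exit()` that motivates the commit above can be observed with a small POSIX sketch. It is not taken from `ompi_mpi_abort.c`; the function and handler names are hypothetical. A child process registers an `atexit()` handler that writes one byte to a pipe, then terminates either way, and the parent counts what arrives.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int handler_fd = -1;   /* write end of the pipe, set in the child */

/* Stand-in for any cleanup hook registered with atexit(). */
static void cleanup_handler(void)
{
    const char mark = 'x';
    if (write(handler_fd, &mark, 1) != 1) {
        /* nothing useful to do in this sketch */
    }
}

/* Fork a child that registers cleanup_handler() and terminates with
 * exit() or _exit(). Returns how many bytes the handler wrote
 * (1 if it ran, 0 if it was skipped), or -1 on error. */
int handlers_run_on_termination(int use_underscore_exit)
{
    int fds[2];
    if (pipe(fds) != 0) {
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (0 == pid) {
        /* child */
        close(fds[0]);
        handler_fd = fds[1];
        if (atexit(cleanup_handler) != 0) {
            _exit(2);
        }
        if (use_underscore_exit) {
            _exit(1);   /* terminates immediately, handler is skipped */
        }
        exit(1);        /* runs atexit() handlers first */
    }

    /* parent: read until the child writes a byte or closes the pipe */
    close(fds[1]);
    char buf[8];
    ssize_t n = read(fds[0], buf, sizeof(buf));
    close(fds[0]);

    int status;
    waitpid(pid, &status, 0);
    return (int) n;
}
```

Calling the function with `0` exercises `exit()` and with `1` exercises `_exit()`. In an abort path, skipping the handlers avoids running cleanup code while the process may be in an inconsistent state.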
Edgar Gabriel
f214ccf499 fix the merge algorithm in the individual sharedfp component, which could
lead to file inconsistency in case of identical timestamps
Also fixes a potential buffer size problem.
2015-08-26 11:22:54 -05:00
Edgar Gabriel
423114e168 minor formatting fix. 2015-08-26 11:20:46 -05:00
Nathan Hjelm
f451876058 Merge pull request #825 from hjelmn/white_space_purge
periodic trailing whitespace purge
2015-08-25 19:23:52 -06:00
Jeff Squyres
1fdc5a5e57 Merge pull request #832 from jsquyres/pr/fortran-sizeof-fix
fortran sizeof fixes
2015-08-25 10:57:53 -04:00
Todd Kordenbrock
25c48b96bb Merge pull request #819 from tkordenbrock/allow-atomics-upto-max_fetch_atomic_size
osc-portals4: allow atomic ops on datatypes that are max_fetch_atomic_size bytes in length
2015-08-25 09:25:27 -05:00
Edgar Gabriel
70078175ee fix Coverity warning 72107 2015-08-25 09:23:37 -05:00
Edgar Gabriel
a73f9470e0 fix Coverity warning 1269829 2015-08-25 09:22:48 -05:00
Jeff Squyres
2cfdeff38d Fortran: these lines should not be commented out 2015-08-25 07:13:52 -07:00
Jeff Squyres
42a761e052 Fortran: remove dead Makefile.am code 2015-08-25 07:13:34 -07:00
Edgar Gabriel
6f2e8d2073 last night's Coverity fix introduced a new Coverity complaint. This commit tries to fix the new complaint from Coverity. 2015-08-25 08:46:38 -05:00
Edgar Gabriel
db2d37ad93 correctly free some arrays in case of an error. This fixes a whole bunch of Coverity warnings. 2015-08-24 14:13:37 -05:00
Nathan Hjelm
156ce6af21 periodic whitespace purge
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-24 09:32:33 -06:00
Edgar Gabriel
58bd0c76b8 fix Coverity warning CID 1317091 (properly freeing variables in case of an error) 2015-08-24 08:40:10 -05:00
Howard Pritchard
eb25c006eb Merge pull request #822 from nrgraham23/java_garbage_collection_bugfix
Java garbage collection bugfix
2015-08-22 14:58:01 -06:00
Jeff Squyres
0f3a3e52ba gen-mpi-sizeof: minor style change
Suggested by Paul Hargrove.
2015-08-22 03:07:44 -07:00
Jeff Squyres
9f345bd22f fortran: moar fixes for the Fortran MPI_SIZEOF debacle
Ensure to define ompi/pompi versions for platforms that don't have
weak symbols.  Also make fortran/mpif-h/profile build a separate
sizeof library, just like fortran/mpif-h does.
2015-08-21 14:35:18 -07:00
Jeff Squyres
ede9fc17b0 gen-mpi-sizeof.pl: don't generate sub for headers
We only need the dummy subroutine when we're generating the body of a
file -- not when we're generating headers.
2015-08-20 14:24:45 -07:00
Jeff Squyres
edf485716e gen-mpi-sizeof.pl: restore execute permission
Somehow the "x" bit got reset in the last commit.
2015-08-20 13:31:02 -07:00
--quiet
d5763a8288 fortran sizeof: ensure mpi_sizeof*f90 is not empty
Per http://www.open-mpi.org/community/lists/devel/2015/08/17775.php,
some compilers don't like it when there's a .f90 file that only
contains comments (and no actual Fortran code).  So if OMPI determines
that the Fortran compiler does not support enough Fortran mojo to
support MPI_SIZEOF, generate at least one dummy Fortran subroutine
that can be compiled in an otherwise barren Fortran landscape that is
devoid of life and hope.
2015-08-20 13:01:14 -07:00
Nathaniel Graham
97422de7a8 Code cleanup
Removing the ArrayList import which is no longer needed.
2015-08-20 12:47:01 -06:00
--quiet
1e9227765a ofi mtl: also link in mtl_ofi_LIBS in the static case 2015-08-20 10:40:46 -07:00
Edgar Gabriel
4be20b119f bring the addproc component up to date with the fileview changes 2015-08-20 09:30:58 -05:00
Edgar Gabriel
8b84da5e35 bring the lockedfile component up to date with the fileview changes. 2015-08-20 09:26:30 -05:00
Nathaniel Graham
d363b5d832 Java garbage collection bugfix
This pull request adds an arraylist of type Buffer to
the Request class.  Whenever a request object is created
that has associated buffers, the buffers should be added
to this array list so the java garbage collector does
not dispose of the buffers prematurely.

This is a more robust expansion on the idea first proposed by
@ggouaillardet

Fixes #369

Signed-off-by: Nathaniel Graham <ngraham@lanl.gov>
2015-08-19 17:45:26 -06:00
Edgar Gabriel
b0461f8d3c the back pointer from the ompio_file structure to the ompi_file_t structure
has to be set earlier in case the user disables the lazy_open option.
2015-08-19 17:11:42 -05:00
Edgar Gabriel
096fe78d73 the offset provided to the read_at/write_at routines has to be a multiple of the etype. 2015-08-19 17:11:42 -05:00
Edgar Gabriel
7e370948c1 first cut on the fileview for shared filepointers fix. 2015-08-19 17:11:42 -05:00
yohann
bcc10fbcd4 mtl/ofi: remove redundant code. 2015-08-19 13:13:59 -07:00
Yossi Itigin
f9e2ede47f Merge pull request #816 from yosefe/topic/yalla-fix-on-demand-map
yalla: fix passing on-demand mapping config to mxm.
2015-08-19 17:25:30 +03:00