Ralph Castain
916f98a3ee
Rename an HWLOC member of a union in the diff.h file to avoid a naming conflict with an external library. HWLOC did nothing wrong; rather, the name being used is so close to a type name that other folks have a tendency to #define it as well. We could argue with those folks that what they are doing is incorrect, but it is easier to make a slight change and resolve the problem.
...
This commit was SVN r32675.
2014-09-07 15:42:05 +00:00
Edgar Gabriel
0f59ce6591
Use the fbtl return value as originally intended, namely to retrieve the number of bytes written and read.
...
Status now contains the actual number of bytes written for individual operations. For collective operations, this is unfortunately not possible.
This commit was SVN r32674.
2014-09-07 15:14:57 +00:00
Ralph Castain
6323b226c7
Bring over some updates from the PMIx branch - mostly just minor cleanups. Make the direct grpcomm component no longer be the default. For now, we seem to be having problems with non-blocking fence operations, so make them not be the default under any scenario (e.g., when sm is the only btl in operation).
...
This commit was SVN r32673.
2014-09-06 19:19:44 +00:00
Ralph Castain
f1a33b6476
Use the accessor function to get the jobid and vpid
...
This commit was SVN r32672.
2014-09-06 19:18:21 +00:00
Howard Pritchard
fe2ea1f0fb
fix handling of OPAL_DSTORE_LOCALITY and ref cnt
...
This commit was SVN r32671.
2014-09-05 21:36:19 +00:00
Gilles Gouaillardet
f0108f881f
oshmem: silence warning
...
ensure OSHMEM_PROFILING is #define'd even if profiling is not supported
cmr=v1.8.3:reviewer=miked
This commit was SVN r32670.
2014-09-05 08:37:29 +00:00
Ralph Castain
ec51cbab9f
We are failing to use the system dirname function because we are not correctly flagging that we found it. Modify opal_search_libs_core to set an "opal_have_foo" flag to indicate that we found the specified function, and then modify the have_dirname check to look for it.
...
cmr=v1.8.3:reviewer=jsquyres
This commit was SVN r32669.
2014-09-04 16:10:38 +00:00
Ralph Castain
41c6058153
Bring over changes to MXM from pmix branch:
...
MTL MXM: establish endpoint connection on the first communication when direct_modex used
This commit was SVN r32668.
2014-09-03 18:22:11 +00:00
Ralph Castain
a51d1d7a97
find_last_path_separator returns NULL if the filename doesn't contain a path separator in it - i.e., it's just a local file. So protect the loop to avoid a segfault
...
cmr=v1.8.3:reviewer=rolfv
This commit was SVN r32667.
2014-09-03 18:13:42 +00:00
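The NULL guard described in r32667 can be sketched as follows. This is a minimal illustration, not the code from the commit: `last_separator` is a hypothetical stand-in for `find_last_path_separator`, `dir_part` and `copy_prefix` are invented names, and only '/' is treated as a separator.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for find_last_path_separator: returns a pointer
 * to the last '/' in name, or NULL when name contains no separator. */
static const char *last_separator(const char *name)
{
    return strrchr(name, '/');
}

/* Returns a newly allocated copy of the first len bytes of s. */
static char *copy_prefix(const char *s, size_t len)
{
    char *out = malloc(len + 1);
    if (NULL == out) {
        return NULL;
    }
    memcpy(out, s, len);
    out[len] = '\0';
    return out;
}

/* Returns the directory part of name; the caller frees the result.
 * A bare file name has no separator, so "." is returned instead of
 * walking backwards from a NULL pointer. */
static char *dir_part(const char *name)
{
    const char *sep = last_separator(name);

    if (NULL == sep) {
        return copy_prefix(".", 1);   /* local file: nothing to strip */
    }
    /* Collapse a run of separators, e.g. "a//b" -> "a". */
    while (sep > name && '/' == *(sep - 1)) {
        --sep;
    }
    if (sep == name) {
        return copy_prefix("/", 1);   /* the file lives in the root */
    }
    return copy_prefix(name, (size_t)(sep - name));
}
```

Without the `NULL == sep` check, the `while` loop would dereference an invalid pointer for a name such as "file", which is the kind of segfault the commit message describes.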
Ralph Castain
94ffca4901
Correct the cutoff point for full modex operation as it is based on the number of nodes in the system, not the number of procs in the signature.
...
This commit was SVN r32666.
2014-09-03 17:28:12 +00:00
Ralph Castain
3fed455bbc
If something goes wrong in add_procs, let's not segfault during finalize
...
This commit was SVN r32665.
2014-09-03 17:27:31 +00:00
Ralph Castain
2bfb18e004
Resolve some race conditions when async pmix modex modes are invoked. Since calls to "get" data can come both locally and remotely before data for a given proc has actually been received, we have to track all requests that cannot be immediately fulfilled and provide the data once it has been received.
...
This commit was SVN r32664.
2014-09-02 20:04:17 +00:00
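The idea in r32664 of parking "get" requests until the data arrives can be sketched as follows. This is a single-threaded illustration under stated assumptions, not the PMIx code: every name here (`request_get`, `data_arrived`, `count_cb`, the fixed-size tables) is hypothetical, and a real implementation would serialize access, for example by running both paths in one event loop.

```c
#include <stddef.h>

#define MAX_PROCS   8
#define MAX_PENDING 16

/* Callback invoked once the requested data is available. */
typedef void (*get_cb_t)(int proc, const char *data, void *ctx);

typedef struct {
    int      used;
    int      proc;
    get_cb_t cb;
    void    *ctx;
} pending_t;

static const char *store[MAX_PROCS];     /* NULL until data for proc arrives */
static pending_t   pending[MAX_PENDING]; /* requests that could not be met yet */

/* Answer at once if the data is present; otherwise park the request.
 * Returns 0 if answered or parked, -1 on a bad proc or a full table. */
static int request_get(int proc, get_cb_t cb, void *ctx)
{
    if (proc < 0 || proc >= MAX_PROCS) {
        return -1;
    }
    if (NULL != store[proc]) {
        cb(proc, store[proc], ctx);
        return 0;
    }
    for (int i = 0; i < MAX_PENDING; ++i) {
        if (!pending[i].used) {
            pending[i].used = 1;
            pending[i].proc = proc;
            pending[i].cb   = cb;
            pending[i].ctx  = ctx;
            return 0;
        }
    }
    return -1;
}

/* Record newly received data and release every request waiting on it. */
static void data_arrived(int proc, const char *data)
{
    if (proc < 0 || proc >= MAX_PROCS) {
        return;
    }
    store[proc] = data;
    for (int i = 0; i < MAX_PENDING; ++i) {
        if (pending[i].used && pending[i].proc == proc) {
            pending[i].used = 0;
            pending[i].cb(proc, data, pending[i].ctx);
        }
    }
}

/* Example callback: counts how many requests have been fulfilled. */
static void count_cb(int proc, const char *data, void *ctx)
{
    (void)proc;
    (void)data;
    ++*(int *)ctx;
}
```

A request made before the data exists is answered later from `data_arrived`; a request made afterwards is answered immediately, so local and remote callers see the same behavior regardless of arrival order.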
Ralph Castain
b372cd02d0
Ensure the hwloc headers get installed when --with-devel-headers is given
...
This commit was SVN r32663.
2014-09-02 19:58:25 +00:00
Ralph Castain
4d186e6402
Properly protect the MCA parameters being registered by the OOB/TCP component when IPv6 is enabled
...
cmr=v1.8.3:reviewer=jsquyres
This commit was SVN r32662.
2014-09-02 14:53:00 +00:00
Ralph Castain
1c4870357b
Per patch submitted by J. Randall, add missing library to LSF integration
...
cmr=v1.8.3:reviewer=rhc
This commit was SVN r32661.
2014-09-02 00:38:07 +00:00
Ralph Castain
f2b26bde4c
Resolve a race condition that could cause us to hang during abnormal terminations due to multi-counting num_terminated
...
This commit was SVN r32660.
2014-09-02 00:32:52 +00:00
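One common way to avoid the multi-counting mentioned in r32660 is to make the termination update idempotent per process. The sketch below illustrates that general technique only; it is an assumption about the shape of such a fix, not the change made in the commit, and `proc_t`, `job_t`, and `mark_terminated` are hypothetical names.

```c
#include <stdbool.h>

typedef struct {
    bool terminated;      /* set the first time this proc is counted */
} proc_t;

typedef struct {
    int num_procs;
    int num_terminated;
} job_t;

/* Count proc as terminated at most once, no matter how many code paths
 * report it. Returns true when every proc in the job has been counted. */
static bool mark_terminated(job_t *job, proc_t *proc)
{
    if (!proc->terminated) {
        proc->terminated = true;
        ++job->num_terminated;
    }
    return job->num_terminated == job->num_procs;
}
```

If two paths both report the same process during an abnormal termination, the second call leaves the counter untouched, so the counter can neither overshoot `num_procs` nor reach it while another process is still running.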
Gilles Gouaillardet
edfbeba7bf
coll/ml: better error handling
...
when CHECK_AND_RECYCLE detects an error, a message is displayed
if the error occurs on an intrinsic communicator, then abort
the program (instead of trying to free the communicator)
cmr=v1.8.3:reviewer=hjelmn
This commit was SVN r32659.
2014-09-01 10:00:49 +00:00
Gilles Gouaillardet
c2bcda518f
oshmem: shpalloc returns the errcode as described in OpenSHMEM 1.1 api
...
cmr=v1.8.3:reviewer=jladd
This commit was SVN r32658.
2014-09-01 08:14:13 +00:00
Ralph Castain
aae1bb4f44
Silence warning
...
This commit was SVN r32657.
2014-08-31 08:10:35 +00:00
Ralph Castain
d13fb37ef9
Add array types to opal_value_t
...
This commit was SVN r32656.
2014-08-31 08:07:03 +00:00
Ralph Castain
9500939042
Fix abstraction violation
...
This commit was SVN r32655.
2014-08-31 08:06:35 +00:00
Mike Dubman
4497dada00
build: add "-verok" flag to ignore autotools version check and continue anyway.
...
This commit was SVN r32654.
2014-08-31 07:27:34 +00:00
MPI Team
f5905c7111
Update git/hg ignore files
...
This commit was SVN r32653.
2014-08-31 05:00:20 +00:00
Ralph Castain
60eb7124ab
Upgrade to hwloc 1.9.1
...
This commit was SVN r32652.
2014-08-31 03:13:06 +00:00
Ralph Castain
e49ca05f11
Remove unused variable
...
This commit was SVN r32651.
2014-08-31 03:11:50 +00:00
Ralph Castain
5cdbc00136
Re-enable the usock oob component. Ensure the TCP component promotes messages for other procs to the OOB base so that other components have a chance to send the relay. Seems to be passing MTT, so let's see how it works for others.
...
This commit was SVN r32650.
2014-08-30 19:33:46 +00:00
Ralph Castain
a2085a5916
Fix the PSM transport key generator to match prior releases
...
This commit was SVN r32649.
2014-08-30 00:48:25 +00:00
Ralph Castain
9ac75451ff
Nathan had requested this before as he needs to know the #procs in the job to optimize the UGNI btl. Add the fetch for that data - the native pmix component already provides it, but ensure the Slurm PMI-1 support does too. If not found, fall back to the non-optimized number
...
This commit was SVN r32648.
2014-08-29 22:53:35 +00:00
Ralph Castain
cb0739dfd4
Update the regex to resolve a bug
...
This commit was SVN r32647.
2014-08-29 22:24:20 +00:00
Ralph Castain
f865ef61ab
Need local_size returned by the Slurm components
...
This commit was SVN r32646.
2014-08-29 22:23:27 +00:00
Howard Pritchard
9a2891f2d6
handle PMIX_LOCAL_SIZE attr arg in cray pmix
...
This commit was SVN r32645.
2014-08-29 21:18:02 +00:00
Ralph Castain
8faabed2cd
Add some further initialization and protection for zero-byte messages
...
This commit was SVN r32644.
2014-08-29 17:24:55 +00:00
Ralph Castain
2b225e3776
Clean up a race condition in marking waitpid_fired. We should always mark it as fired when we enter the wait_local_proc routine, and also mark the proc as no longer alive if iof_complete has also been found. If other places in the code also update those flags, there is no harm done.
...
This commit was SVN r32643.
2014-08-29 17:03:31 +00:00
Gilles Gouaillardet
6916bfc368
btl/openib: fix use of mca_btl_openib_component.default_recv_qps
...
- do not have mca_btl_openib_component.default_recv_qps point to the stack
- do not reset mca_btl_openib_component.default_recv_qps in btl_openib_component_open
cmr=v1.8.3:reviewer=miked
This commit was SVN r32642.
2014-08-29 04:41:34 +00:00
Gilles Gouaillardet
b8a2e90f2d
btl/openib: fix a typo
...
cmr=v1.8.3:reviewer=miked
This commit was SVN r32639.
2014-08-29 04:23:42 +00:00
Ralph Castain
730e28349e
Some minor uninitialized variable cleanups
...
This commit was SVN r32629.
2014-08-29 02:21:13 +00:00
Jeff Squyres
733316372b
usnic: remove suggestion of enabling no-drop in the fabric
...
Reviewed by Reese Faucette
cmr=v1.8.3:reviewer=ompi-rm1.8
This commit was SVN r32628.
2014-08-28 23:56:56 +00:00
Jeff Squyres
f4238d65a5
fortran: also provide PMPI variants for MPI_Alloc_mem_cptr
...
r32622 was the first half of the fix -- we need the PMPI variants as well.
Refs trac:4882
This commit was SVN r32627.
The following SVN revision numbers were found above:
r32622 --> open-mpi/ompi@cf0f734a98
The following Trac tickets were found above:
Ticket 4882 --> https://svn.open-mpi.org/trac/ompi/ticket/4882
2014-08-28 23:47:38 +00:00
Howard Pritchard
2a12fd833d
Fix compile problem from pmix merge
...
This commit was SVN r32626.
2014-08-28 22:14:12 +00:00
Howard Pritchard
51c73f116b
switch check for ugni to use pkg-config
...
deprecate with-ugni in lanl/cray_xe6 platform file
This commit was SVN r32625.
2014-08-28 22:03:41 +00:00
Ralph Castain
fafdbeec0c
Cleanup and enable the new daemon collective modules for more scalable operations. Thanks to Nadezhda Kogteva (Mellanox) for doing them.
...
This commit was SVN r32624.
2014-08-28 20:35:35 +00:00
Jeff Squyres
008454a4a8
make_dist_tarball: check for a dirty source tree
...
Have the make_dist_tarball script check to ensure that the source tree
is clean before continuing. This ensures that we don't accidentally
build a distribution tarball with something that is not committed in
the repo.
There is a --dirtyok option to override this check, and if you access
this script via the "make_tarball" link, --dirtyok is added to the
default set of options.
cmr=v1.8.3:reviewer=rhc
This commit was SVN r32623.
2014-08-28 13:26:33 +00:00
Gilles Gouaillardet
cf0f734a98
Fortran: add mpi_alloc_mem_cptr like bindings when configured with --without-weak-symbols
...
cmr=v1.8.3:reviewer=jsquyres
This commit was SVN r32622.
2014-08-28 09:34:54 +00:00
Ralph Castain
b554cd7d86
Turn off the coll/ml component if --without-hwloc was given
...
cmr=v1.8.3:reviewer=jsquyres
This commit was SVN r32621.
2014-08-27 20:25:39 +00:00
Ralph Castain
731a878ff3
Add a bunch of debug to help track down the problem, and eventually find another place where comparison of signatures was incorrectly performed - use the dss compare operation to be consistent and safe
...
This commit was SVN r32620.
2014-08-27 19:52:20 +00:00
Ralph Castain
5fb7c7d23b
Don't explicitly add the hostname to the data fetch when we already cached a remote blob
...
This commit was SVN r32619.
2014-08-27 16:18:05 +00:00
Ralph Castain
3c24770bce
Protect debug printing on backend nodes
...
This commit was SVN r32618.
2014-08-27 16:17:28 +00:00
Ralph Castain
b87b69e977
Ensure the nodes get added to the job map on the remote nodes, add some debug to grpcomm daemon array construction
...
This commit was SVN r32617.
2014-08-27 16:16:46 +00:00
Ralph Castain
842aaf6167
Correctly end mapping oversubscribed nodes round-robin byslot
...
cmr=v1.8.3:reviewer=rhc
This commit was SVN r32616.
2014-08-27 16:15:18 +00:00
Jeff Squyres
d85527701a
Fix MPI_COMM_SPLIT_TYPE with MPI_UNDEFINED
...
Thanks to Lisandro Dalcin for identifying the problem.
Fixes trac:4876
Submitted by George Bosilca, reviewed by Jeff Squyres.
cmr=v1.8.3:reviewer=ompi-rm1.8
This commit was SVN r32615.
The following Trac tickets were found above:
Ticket 4876 --> https://svn.open-mpi.org/trac/ompi/ticket/4876
2014-08-27 12:17:33 +00:00