Ralph Castain
dfb952fa78
[Contribution from Artem - moved it to svn from git for him]
...
Replace our old, clunky timing setup with a much nicer one that is only available if configured with --enable-timing. Add a tool for profiling clock differences between the nodes so you can get more precise timing measurements. I'll ask Artem to update the GitHub wiki with full instructions on how to use this setup.
This commit was SVN r32738.
2014-09-15 18:00:46 +00:00
Vasily Filipov
e26af91a64
BTL/OPENIB: set "max_lmc" param to be "1" and not "all available values" by default.
...
cmr=v1.8.3:reviewer=miked
This commit was SVN r32736.
2014-09-15 13:56:41 +00:00
Alex Mikheev
31d0724a08
OMPI: btl openib: fix detection of max registerable memory
...
Deal with the case when the mlx4 module is loaded but the device
is not present
cmr=v1.8.3:reviewer=miked
This commit was SVN r32734.
2014-09-15 12:17:23 +00:00
Ralph Castain
fad4384463
Not sure how we could get to this point without having already detected the error, but just to be safe - check for end-of-array and return if error.
...
Refs trac:4897
This commit was SVN r32731.
The following Trac tickets were found above:
Ticket 4897 --> https://svn.open-mpi.org/trac/ompi/ticket/4897
2014-09-13 02:23:30 +00:00
Jeff Squyres
66aeadacff
opal_search_libs: correctly AC_DEFINE results of search
...
1. It is not sufficient to put the result of m4_toupper() in a
variable and use that variable as the variable name in
AC_DEFINE_UNQUOTED. Instead, just use m4_toupper() directly in
AC_DEFINE_UNQUOTED. Also, save the result value in a "permanent"
variable that isn't erased, just in case autoconf decides to be lazy
about instantiating the body of AC_DEFINE_UNQUOTED and move it later
(this is probably overkill :-) ).
2. Use the OMPI Way of always defining macros (to 0 or 1). Then also
slightly change the logic in util/basename.c to just check
OPAL_HAVE_DIRNAME (because it will always be defined).
Refs trac:4894
This commit was SVN r32723.
The following Trac tickets were found above:
Ticket 4894 --> https://svn.open-mpi.org/trac/ompi/ticket/4894
2014-09-13 00:28:30 +00:00
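As a rough illustration of the "always define macros to 0 or 1" convention described in the commit above, the sketch below shows why code then has to test the macro with #if rather than #ifdef. The macro is defined by hand here only so the example compiles on its own (in the real tree it would come from configure), and both helper functions are hypothetical names, not taken from util/basename.c.

```c
#include <assert.h>

/* Assumption: configure always defines OPAL_HAVE_DIRNAME, to 0 or 1.
 * Defined by hand here (as "not found") so the sketch is self-contained. */
#define OPAL_HAVE_DIRNAME 0

/* Hypothetical helper: checks the VALUE of the macro. */
static int uses_system_dirname_if(void)
{
#if OPAL_HAVE_DIRNAME
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: checks only whether the macro EXISTS, which is
 * always true under this convention, even when the value is 0. */
static int uses_system_dirname_ifdef(void)
{
#ifdef OPAL_HAVE_DIRNAME
    return 1;
#else
    return 0;
#endif
}
```

With the macro defined to 0, the #if variant selects the fallback path while the #ifdef variant wrongly selects the system path, which is the kind of mismatch that checking the value avoids.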
Ralph Castain
7269dae2da
Per patch from Samuel Thibault, silence warning from Clang
...
This commit was SVN r32720.
2014-09-12 22:22:11 +00:00
Ralph Castain
0445052a1c
Check for multiple declarations of a given MCA param and error out if detected as that can create an ambiguous definition of the param value.
...
Refs trac:4897
This commit was SVN r32719.
The following Trac tickets were found above:
Ticket 4897 --> https://svn.open-mpi.org/trac/ompi/ticket/4897
2014-09-12 22:21:30 +00:00
Jeff Squyres
d244b7b860
mca_base_var: fix possibility of unaligned variable assignments
...
Add a debugging check that ensures that the registered storage is
aligned appropriately for the type that is specified.
When we know that the storage is properly aligned, we can cast the
mbv_storage to the appropriate type and then simply do the assignment.
We used to do this assignment via a union, but clang's
-fsanitize=alignment complained about this.
This commit was SVN r32716.
2014-09-11 23:02:49 +00:00
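The check-then-cast approach described in the commit above might look roughly like the following. This is a minimal sketch, not the mca_base_var code: is_aligned and store_int are hypothetical names, and the storage pointer stands in for mbv_storage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if ptr is a multiple of the given alignment. */
static bool is_aligned(const void *ptr, size_t align)
{
    return ((uintptr_t) ptr % align) == 0;
}

/* Hypothetical stand-in for assigning an int value into registered
 * storage. Returns 0 on success, -1 if the storage is misaligned. */
static int store_int(void *storage, int value)
{
    /* Debugging-style check: refuse storage that is not aligned for int. */
    if (!is_aligned(storage, _Alignof(int))) {
        return -1;
    }
    /* Storage is known to be aligned, so a plain cast and assignment
     * is safe and no union is needed. */
    *(int *) storage = value;
    return 0;
}
```

The sketch returns an error code for misaligned storage; a debug build could just as well use an assertion at that point.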
Ralph Castain
1f2c5863f0
Revert r32675 in favor of a different solution proposed by Brice
...
This commit was SVN r32715.
The following SVN revision numbers were found above:
r32675 --> open-mpi/ompi@916f98a3ee
2014-09-11 21:58:48 +00:00
Howard Pritchard
e43715574a
remove ignored restrict return type qualifier
...
The restrict qualifier on the return type of mca_btl_vader_reserve_fbox
is ignored by the GNU compiler. With newer gcc, the warning appears only
when -Wignored-qualifiers is set, but older variants of gcc were reported
to generate numerous warnings about this ignored qualifier while vader is
being compiled.
The warning reported by gcc is:
btl_vader_fbox.h:53:47: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
static inline mca_btl_vader_fbox_t * restrict mca_btl_vader_reserve_fbox (struct mca_btl_base_endpoint_t *ep, const size_t size)
This commit was SVN r32714.
2014-09-11 21:12:41 +00:00
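To illustrate the point of the commit above: a qualifier on a function's return type applies to the returned pointer value itself, so the compiler discards it, whereas restrict on parameters still carries meaning. The sketch below uses hypothetical names (fbox_t, reserve_fbox, copy_fbox) and is not the vader code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for mca_btl_vader_fbox_t. */
typedef struct fbox {
    unsigned char data[64];
} fbox_t;

/* Declaring this as `fbox_t * restrict reserve_fbox(...)` would qualify
 * the returned value itself; gcc ignores that and can warn about it with
 * -Wignored-qualifiers. The unqualified form below behaves identically. */
static inline fbox_t *reserve_fbox(fbox_t *pool, size_t index)
{
    return &pool[index];
}

/* restrict is still meaningful on parameters: the caller promises that
 * dst and src do not alias each other. */
static inline void copy_fbox(fbox_t * restrict dst, const fbox_t * restrict src)
{
    *dst = *src;
}
```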
Rolf vandeVaart
9a2bab0e27
Add support for detecting CUDA managed memory. Disabled for now.
...
This commit was SVN r32713.
2014-09-11 21:07:17 +00:00
Howard Pritchard
820b34e5d2
Fix bad cut/paste for commit c19e7369
...
This commit was SVN r32712.
2014-09-11 21:00:04 +00:00
Howard Pritchard
d07c5674a3
Fix potential double free in cray pmi cray_fini
...
This commit was SVN r32711.
2014-09-11 20:30:40 +00:00
Ralph Castain
cb2ad98f57
Silence an unused function warning
...
This commit was SVN r32704.
2014-09-10 17:36:34 +00:00
Ralph Castain
a7c5b77d70
Just because the openib BTL can't reach a process doesn't mean it is a job-ending error. If we have other methods for reaching the process (e.g., sm for a local proc), then that's okay. If there is no method for reaching a proc, then that's an error - but the BML will report that situation.
...
The question of whether or not the openib BTL supports loopback is a separate question. It may be more appropriate to make the modex be PMIX_GLOBAL for cases where openib can support loopback so someone can run without a shared memory component. I'll leave that decision to the IB vendors.
This commit was SVN r32702.
2014-09-10 17:02:16 +00:00
Ralph Castain
93948f0c4e
Resolve alignment issues when unpacking buffers
...
cmr=v1.8.3:reviewer=jsquyres
This commit was SVN r32698.
2014-09-10 10:19:16 +00:00
Ralph Castain
e671620ac7
Per request from Jeff: tune up the help messages for binding options
...
Refs trac:4898
This commit was SVN r32691.
The following Trac tickets were found above:
Ticket 4898 --> https://svn.open-mpi.org/trac/ompi/ticket/4898
2014-09-09 22:39:22 +00:00
Ralph Castain
4207b4c4ad
Improve the --bind-to help message to better indicate the default options under various values of np. Remove the warning message if the user doesn't specify a binding policy and we are overloaded
...
cmr=v1.8.3:reviewer=jsquyres
This commit was SVN r32687.
2014-09-08 21:03:51 +00:00
Ralph Castain
4df1aa63f7
Since we've run into the situation where someone puts a script wrapper around a launcher such as srun, we need to always protect MCA cmd line params with quotes. This means we also need to protect the backend from quotes coming into the system as part of a value, or else the parser gets confused.
...
So add a new function for wrapping MCA arguments, and tell the backend parser to ignore/remove leading/trailing quotes.
cmr=v1.8.3:reviewer=jsquyres
This commit was SVN r32686.
2014-09-08 20:38:46 +00:00
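The two halves of the commit above (quote MCA values on the way out, strip quotes on the way in) could be sketched as follows. Both function names are hypothetical, and the choice to strip only one matching pair of quotes is an assumption made for the example, not a description of the actual parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write value surrounded by double quotes into out.
 * Returns 0 on success, -1 if the result does not fit in outlen bytes. */
static int wrap_in_quotes(const char *value, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "\"%s\"", value);
    return (n >= 0 && (size_t) n < outlen) ? 0 : -1;
}

/* Hypothetical helper: remove one matching pair of leading/trailing
 * quotes (single or double) in place. Unquoted values are left alone. */
static char *strip_quotes(char *value)
{
    size_t len = strlen(value);
    if (len >= 2 && (value[0] == '"' || value[0] == '\'') &&
        value[len - 1] == value[0]) {
        value[len - 1] = '\0';               /* drop the trailing quote */
        memmove(value, value + 1, len - 1);  /* shift left, including NUL */
    }
    return value;
}
```

Quoting keeps a value containing spaces intact when a wrapper script re-expands the command line, and the backend then sees the original value again after stripping.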
Ralph Castain
5649841e26
Provide missing include file - generates errors when used with Intel compilers
...
This commit was SVN r32685.
2014-09-08 19:04:40 +00:00
Ralph Castain
e32d541c8d
Bring over a slight modification to the opal_init_test routine
...
This commit was SVN r32676.
2014-09-07 15:46:53 +00:00
Ralph Castain
916f98a3ee
Rename an HWLOC member of a union in the diff.h file to avoid a naming conflict with an external library - it isn't that HWLOC did something wrong, but rather that the name being used is so close to a type name that other folks have a tendency to #define it as well. We could argue with those folks that what they are doing is incorrect, but it is just easier to make a slight change and resolve the problem.
...
This commit was SVN r32675.
2014-09-07 15:42:05 +00:00
Ralph Castain
6323b226c7
Bring over some updates from the PMIx branch - mostly just minor cleanups. Make the direct grpcomm component no longer be the default. For now, we seem to be having problems with non-blocking fence operations, so make them not be the default under any scenario (e.g., when sm is the only btl in operation).
...
This commit was SVN r32673.
2014-09-06 19:19:44 +00:00
Ralph Castain
f1a33b6476
Use the accessor function to get the jobid and vpid
...
This commit was SVN r32672.
2014-09-06 19:18:21 +00:00
Howard Pritchard
fe2ea1f0fb
fix handling of OPAL_DSTORE_LOCALITY and ref cnt
...
This commit was SVN r32671.
2014-09-05 21:36:19 +00:00
Ralph Castain
ec51cbab9f
We are failing to use the system dirname function because we are not correctly flagging that we found it. Modify opal_search_libs_core to set an "opal_have_foo" flag to indicate that we found the specified function, and then modify the have_dirname check to look for it.
...
cmr=v1.8.3:reviewer=jsquyres
This commit was SVN r32669.
2014-09-04 16:10:38 +00:00
Ralph Castain
41c6058153
Bring over changes to MXM from pmix branch:
...
MTL MXM: establish endpoint connection on the first communication when direct_modex used
This commit was SVN r32668.
2014-09-03 18:22:11 +00:00
Ralph Castain
a51d1d7a97
find_last_path_separator returns NULL if the filename doesn't contain a path separator in it - i.e., it's just a local file. So protect the loop to avoid a segfault
...
cmr=v1.8.3:reviewer=rolfv
This commit was SVN r32667.
2014-09-03 18:13:42 +00:00
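The guard described in the commit above can be illustrated with a small stand-in. Here last_separator plays the role of find_last_path_separator (implemented with strrchr and a '/' separator purely for the example), and dir_length is a hypothetical caller that must cope with a NULL result.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for find_last_path_separator: returns a pointer
 * to the last '/' in filename, or NULL if there is none. */
static const char *last_separator(const char *filename)
{
    return strrchr(filename, '/');
}

/* Hypothetical caller: length of the directory part of filename.
 * A bare local file name has no directory part, so return 0 instead
 * of dereferencing or doing arithmetic on a NULL pointer. */
static size_t dir_length(const char *filename)
{
    const char *sep = last_separator(filename);
    if (NULL == sep) {
        return 0;
    }
    return (size_t) (sep - filename);
}
```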
Ralph Castain
3fed455bbc
If something goes wrong in add_procs, let's not segfault during finalize
...
This commit was SVN r32665.
2014-09-03 17:27:31 +00:00
Ralph Castain
b372cd02d0
Ensure the hwloc headers get installed when --with-devel-headers is given
...
This commit was SVN r32663.
2014-09-02 19:58:25 +00:00
Ralph Castain
d13fb37ef9
Add array types to opal_value_t
...
This commit was SVN r32656.
2014-08-31 08:07:03 +00:00
Ralph Castain
9500939042
Fix abstraction violation
...
This commit was SVN r32655.
2014-08-31 08:06:35 +00:00
Ralph Castain
60eb7124ab
Upgrade to hwloc 1.9.1
...
This commit was SVN r32652.
2014-08-31 03:13:06 +00:00
Ralph Castain
5cdbc00136
Re-enable the usock oob component. Ensure the TCP component promotes messages for other procs to the OOB base so that other components have a chance to send the relay. Seems to be passing MTT, so let's see how it works for others.
...
This commit was SVN r32650.
2014-08-30 19:33:46 +00:00
Ralph Castain
9ac75451ff
Nathan had requested this before as he needs to know the #procs in the job to optimize the UGNI btl. Add the fetch for that data - the native pmix component already provides it, but ensure the Slurm PMI-1 support does too. If not found, fall back to the non-optimized number
...
This commit was SVN r32648.
2014-08-29 22:53:35 +00:00
Ralph Castain
f865ef61ab
Need local_size returned by the Slurm components
...
This commit was SVN r32646.
2014-08-29 22:23:27 +00:00
Howard Pritchard
9a2891f2d6
handle PMIX_LOCAL_SIZE attr arg in cray pmix
...
This commit was SVN r32645.
2014-08-29 21:18:02 +00:00
Ralph Castain
8faabed2cd
Add some further initialization and protection for zero-byte messages
...
This commit was SVN r32644.
2014-08-29 17:24:55 +00:00
Gilles Gouaillardet
6916bfc368
btl/openib: fix use of mca_btl_openib_component.default_recv_qps
...
- do not have mca_btl_openib_component.default_recv_qps point to the stack
- do not reset mca_btl_openib_component.default_recv_qps in btl_openib_component_open
cmr=v1.8.3:reviewer=miked
This commit was SVN r32642.
2014-08-29 04:41:34 +00:00
Gilles Gouaillardet
b8a2e90f2d
btl/openib: fix a typo
...
cmr=v1.8.3:reviewer=miked
This commit was SVN r32639.
2014-08-29 04:23:42 +00:00
Ralph Castain
730e28349e
Some minor uninitialized variable cleanups
...
This commit was SVN r32629.
2014-08-29 02:21:13 +00:00
Jeff Squyres
733316372b
usnic: remove suggestion of enabling no-drop in the fabric
...
Reviewed by Reese Faucette
cmr=v1.8.3:reviewer=ompi-rm1.8
This commit was SVN r32628.
2014-08-28 23:56:56 +00:00
Howard Pritchard
2a12fd833d
Fix compile problem from pmix merge
...
This commit was SVN r32626.
2014-08-28 22:14:12 +00:00
Gilles Gouaillardet
d743da18bf
pmix: fix process name parsing on 32-bit systems
...
opal_process_name_t is a uint64_t, which is not equivalent to
an unsigned long on 32-bit systems.
It is now parsed as an unsigned long long.
This commit was SVN r32592.
2014-08-25 03:08:02 +00:00
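A minimal sketch of the portability issue in the commit above: unsigned long may be only 32 bits wide, while unsigned long long is at least 64 bits everywhere, so it can always hold a uint64_t. The helper name parse_u64 and the use of strtoull are assumptions for illustration; the commit does not say which parsing call the pmix code uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal string into a uint64_t.
 * Returns 0 on success, -1 on empty input, trailing junk, or overflow. */
static int parse_u64(const char *text, uint64_t *out)
{
    char *end = NULL;
    unsigned long long v;

    errno = 0;
    /* strtoull returns unsigned long long, which is at least 64 bits
     * even where unsigned long is only 32 bits. */
    v = strtoull(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0') {
        return -1;
    }
    *out = (uint64_t) v;
    return 0;
}
```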
Ralph Castain
f00af81c1d
Little more cleanup under the abort cases cited by Gilles. All seem to be working now
...
This commit was SVN r32585.
2014-08-22 19:57:57 +00:00
Ralph Castain
b1a7375192
Fix the "unreachable" message so it outputs the correct hostname for the remote proc. Clean up some of the pmix stuff when running corner cases of errors
...
This commit was SVN r32584.
2014-08-22 19:20:45 +00:00
Joshua Ladd
97abb7c727
Backing out the new Opal Hash table until the legal issues are addressed by H.P.
...
Refs trac:4872
This commit was SVN r32583.
The following Trac tickets were found above:
Ticket 4872 --> https://svn.open-mpi.org/trac/ompi/ticket/4872
2014-08-22 19:10:09 +00:00
Ralph Castain
6ff2a60829
Handle the non-blocking fence case correctly, and ensure we always at least pass back the hostname of the process whose info is being requested so that the ompi_proc_t can correctly initialize it when we are in a non-blocking fence with np < cutoff scenario
...
This commit was SVN r32578.
2014-08-22 14:26:24 +00:00
Ralph Castain
8f1b9b463e
Fix shared memory operations - need to pass the local topology and cpusets of all local peers so we can properly compute relative locality for them. Also need to set default locality to "on node" in case where cpusets are not passed because procs are not bound.
...
This commit was SVN r32577.
2014-08-22 05:17:51 +00:00
Jeff Squyres
b0dfb9f401
usnic: avoid a possible race condition
...
Per #4874, code review revealed a possible race condition in the
module struct and the connectivity agent. Move the setup of the
connectivity agent listener until the module struct has been fully
set up.
This commit was SVN r32573.
2014-08-22 02:34:24 +00:00