1
1
Граф коммитов

2913 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
27dcca0bb2 pmi/s1: fix large keys
do not overwrite the PMI key when pushing a message that does
not fit within 255 bytes
2014-10-16 13:29:32 +09:00
Gilles Gouaillardet
b5aea782ce Revert "Fix heterogeneous support"
Per the discussion at http://www.open-mpi.org/community/lists/devel/2014/10/16050.php

This reverts commit c9c5d4011b.
2014-10-16 12:24:38 +09:00
George Bosilca
63ba754f3f Remove unnecessary includes from the datatype 2014-10-15 21:49:32 -04:00
George Bosilca
7541c03b4c Mark all instances where atomic operations are used but their return value is unnecessary 2014-10-15 21:47:32 -04:00
Jeff Squyres
dc66e197cc var: fix segv in deprecated file var show_help()
Ensure to include the new variable filename in the show_help() output
when we load a deprecated MCA param from a file.

Fixes #236
2014-10-15 08:07:31 -07:00
Jeff Squyres
51027a6635 usnic: fix minor typo
Change harmless-but-weird comma to semicolon.  Found during code
review.
2014-10-15 05:32:36 -07:00
Gilles Gouaillardet
c9c5d4011b Fix heterogeneous support
* redefine orte_process_name_t so it can be converted
  between host and network format as an opal_identifier_t
  aka uint64_t by the OPAL layer.
* correctly send OPAL_DSTORE_ARCH key
2014-10-15 17:19:13 +09:00
Gilles Gouaillardet
5c81658d58 pmix: fix big endian arch
use the appropriate 64 bits type otherwise data gets incorrectly
truncated on big endian arch
2014-10-15 17:17:09 +09:00
Ralph Castain
3ef94a0675 Per email thread on devel list:
Revert "OPAL: drop dead with core on bad flow. rarely happens with helloworld on large scale."

This reverts commit 86f1d5af3e.

Will be reconsidered via RFC as it represents a significant change in behavior
2014-10-12 21:13:42 -07:00
Ralph Castain
4d27eb70f2 Extend the dstore framework to include a new "update_handle" API so the attributes of an existing handle can be changed. We can't just open a new handle as the upper layers won't know where to find the info. :-( 2014-10-10 12:40:32 -07:00
Ralph Castain
1ae34da5e5 Add an attributes parameter to the dstore.open function so we can pass directives to the active storage component. This can, for example, include the backing file info for a new shared memory segment. 2014-10-10 12:13:25 -07:00
Ralph Castain
63f619f871 Provide a mechanism by which an upstream project can rename the OPAL and ORTE libraries. This is required by projects such as ORCM that have their own ORTE and OPAL libraries in order to avoid library confusion. By renaming their version of the libraries, the OMPI applications can correctly dynamically load the correct one for their build. 2014-10-10 11:39:08 -07:00
Nathan Hjelm
a31cf3b740 btl/vader: missing include 2014-10-09 13:57:21 -06:00
Nathan Hjelm
9e0c07e4ce btl/ugni: improve the handling of eager get fragments when the btl runs out
of preregistered buffers

Before this change eager gets we retried on each progress loop. This commit
modifies the protocol to only retry eager gets when another eager get has
completed. This commit also cleans up some callback code that is no longer
needed.
2014-10-09 13:57:21 -06:00
Howard Pritchard
ebc368d26b remove GNI_RDMAMODE_FENCE bit in GNI_PostRdma
The GNI_RDMAMODE_FENCE bit was a left over from
async progress work that is not needed at this point
in the gni BTL.  Removing the bit also allows
for the removal of the GNI_CDM_MODE_BTE_SINGLE_CHANNEL
bit from the GNI_CdmCreate call.
2014-10-09 12:41:19 -06:00
Ralph Castain
ce8e33447f Silence warning 2014-10-09 10:45:25 -07:00
Joshua Ladd
1cabd73522 Adding a new OPAL hash table routine. Please read the algorithm description in opal/class/opal_hash_table.c for more precise details on the design and implementation. This algorithm was contributed by David Linden of H.P. in partnership with Mellanox Technologies. This contribution achieves two objectives:
1. It's actually hashing now, whereas the old OPAL hash table was not. Thus, it is a bug fix for and, as such, should be included in the 1.8 series.

2. It is dynamic and can grow and shrink the number of buckets in accordance with job size, whereas the old OPAL hash table had a fixed number of buckets which resulted in poor retrieval performance at large scale.

This scheme has been deployed in the field on very large H.P./Mellanox systems and has been demonstrated to significantly decrease job start-up time (~ 20% improvement) when launching applications directly with srun in SLURM environments. However, neither SLURM nor direct launch are prerequisites to take advantage of this change as any entity that utilizes OPAL hash table objects can benefit (at least partially) from this contribution.
2014-10-09 17:24:23 +02:00
Elena
c905fe9b78 pmix: removed pmix_base_direct modex mca parameter, renamed orte_full_modex_cutoff and ompi_hostname_cutoff to direct_modex_cutoff 2014-10-09 06:15:31 +02:00
Howard Pritchard
9947758d98 initial thread safety for ugni btl
This commit adds initial ugni thread safety support.
With this commit, sun thread tests (excepting MPI-2 RMA)
pass with various process counts and threads/process.
Also osu_latency_mt passes.
2014-10-08 10:13:22 -06:00
Jeff Squyres
a422d893b8 memchecker: per RFC, use calloc for OBJ_NEW
With --enable-memchecker builds, use calloc(3) for OBJ_NEW instead of
malloc(3).  This cuts down on a lot of valgrind/memory checker false
positive output.

Also make a minor change in the valgrind configure.m4; have it assign
0xf to a char.  The prior assignment (of 0xff) was warning about an
overflow.  This didn't really matter, but we might as well make the
test not have a gratuitious warning in it.
2014-10-07 09:55:54 -07:00
Mike Dubman
86f1d5af3e OPAL: drop dead with core on bad flow. rarely happens with helloworld on large scale. 2014-10-07 14:07:41 +03:00
Ralph Castain
fd6a044b7f Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages.
Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.
2014-10-03 16:02:57 -06:00
Howard Pritchard
5428301c81 Remove catamount timer support
With the 1.9 release, support for catamount is being
dropped. Hence,  removing catamount timer support.
2014-10-03 14:53:09 -06:00
rolfv
697b18db63 Making async copy the default 2014-10-03 06:42:18 -07:00
Gilles Gouaillardet
5c5453b8b1 pmix: fix test in native_get_attr 2014-10-03 11:54:08 +09:00
Jeff Squyres
413e775dbf version configury: make dist now works
Update the VERSION file scheme:

* Remove "want_repo_rev".
* Add "tarball_version".

All values are now always included (major, minor, release, greek,
repo_rev).  However, configure.ac now runs "opal_get_version.sh
... --tarball", which will return the value of tarball_version (if it
is non-empty) or the "full" version string (i.e.,
"major.minor.releasegreek").
2014-10-02 11:32:54 -07:00
Jeff Squyres
8468424f45 distscript: remove configure.params and autogen.subdirs kruft
Remove configure.params support: configure.params hasn't been used in
years.

Also remove autogen.subdirs support; those should really be handled by
their respective Makefile.am's.
2014-10-02 11:32:54 -07:00
Jeff Squyres
54544f64b3 wrappers: update URLs for GitHub 2014-10-01 14:37:17 -07:00
Ralph Castain
9e35f80ab6 Don't multiply define WANT_PMI_SUPPORT and friends. Turns out they weren't being used anywhere anyway, so no point in defining them at all
This commit was SVN r32822.
2014-09-30 20:43:25 +00:00
Howard Pritchard
8da51fab81 cray pmi equivalent to commit 5eb65b24
This commit was SVN r32820.
2014-09-30 19:25:00 +00:00
Ralph Castain
8d0b4f222a The pmix.get functions should not be returning "success" if the requested info isn't found. Fix the macros and the component functions so they correctly return "not found" in that situation, and set the data regions and size to NULL and 0, respectively.
This commit was SVN r32818.
2014-09-30 18:03:12 +00:00
Jeff Squyres
d4e2809531 version: always use all 3 version numbers
In all previous releases, the version number would be "A.B.C" unless C
was 0, in which case it would be "A.B".  This commit changes that
scheme to always be "A.B.C", even if C==0.

Hence, v1.9.0 will be the first release where this new scheme is evident.

This commit was SVN r32816.
2014-09-30 15:54:18 +00:00
Howard Pritchard
1df933ea27 remove ompi/runtime/params.h include in ugni btl
This commit was SVN r32813.
2014-09-29 19:26:33 +00:00
Howard Pritchard
201d4ec3ad fix setting of PMIX_NODE_RANK in cray pmix comp.
Per discussions with pmix folks, it was determined that
the way the cray pmi pmix component was computing the
PMIX_NODE_RANK attribute for a process was incorrect.
This commit fixes the problem.

This commit was SVN r32810.
2014-09-29 16:55:31 +00:00
Rolf vandeVaart
399dc3db43 Code to check for managed memory. Configure support also.
This commit was SVN r32801.
2014-09-26 16:24:45 +00:00
Rolf vandeVaart
35858f837a Revert r32713. Have different code for this.
This commit was SVN r32800.

The following SVN revision numbers were found above:
  r32713 --> open-mpi/ompi@9a2bab0e27
2014-09-26 14:56:18 +00:00
Nathan Hjelm
e0eb1f2e73 btl/vader: make vader registration lookup/caching thread safe
This commit was SVN r32798.
2014-09-25 22:24:06 +00:00
George Bosilca
53e012ae97 Fix typo.
This commit was SVN r32795.
2014-09-25 17:18:27 +00:00
Nathan Hjelm
aba87f3776 btl/vader:silence warning
This commit was SVN r32788.
2014-09-24 22:10:23 +00:00
Nathan Hjelm
79881ca892 btl/vader: prevent double-destruction of endpoints and move endpoint teardown code into destructor
This commit was SVN r32779.
2014-09-23 21:51:15 +00:00
Nathan Hjelm
2d8fba0861 btl/vader: silence warning
This commit was SVN r32778.
2014-09-23 21:33:45 +00:00
Nathan Hjelm
8bd3160432 btl/vader: fix several typos in vader update
This commit was SVN r32775.
2014-09-23 20:25:36 +00:00
Nathan Hjelm
12bfd13150 btl/vader: improve performance for both single and multiple threads
This is a large update that does the following:

 - Only allocate fast boxes for a peer if a send count threshold
   has been reached (default: 16). This will greatly reduce the memory
   usage with large numbers of local peers.

 - Improve performance by limiting the number of fast boxes that can
   be allocated per peer (default: 32). This will reduce the amount
   of time spent polling for fast box messages.

 - Provide new MCA variables to configure the size, maximum count,
   and send count thresholds for fast boxes allocations.

 - Updated buffer design to increase the range of message sizes that
   can be sent with a fast box.

 - Add thread protection around fast box allocation (locks). When
   spin locks are available this should be updated to use spin locks.

 - Various fixes and cleanup.

This commit was SVN r32774.
2014-09-23 18:11:22 +00:00
Howard Pritchard
1508a01325 Fixes to enable mpirun to work again on Cray
The ess pmi module was not handling aprun launched
daemons.  All daemons were thinking they were vpid 1.

Also, turns out that on cray systems using MOM nodes
for launched jobs, just detecting whether or not a
process is in a PAGG container is not sufficient.

Crank up the priority of the alps PLM component in the
event that the configure detected the presence of both
slurm and alps.

Have the ESS pmi component open the pmix framework and
select a pmix component.

This commit was SVN r32773.
2014-09-23 15:37:26 +00:00
Artem Polyakov
f2e586980b Fix timing framework:
1. Fixes according to (http://www.open-mpi.org/community/lists/devel/2014/09/15869.php)
2. Force mpisync:rank0 to gather results. Now sync info is written by rank0 to the output file.
3. Improve mpirun_prof: 1) adopt to the environment (SLURM/TORQUE); 2) recognize some noteset-related mpirun options.

This commit was SVN r32772.
2014-09-23 12:59:54 +00:00
Ralph Castain
70896550bf Per input from Artem, update the copyrights on these files, ensuring to include all the licensing info for the files broght over from the mpiperf project.
This commit was SVN r32770.
2014-09-20 14:54:24 +00:00
Artem Polyakov
70587d1804 Remove outdated OPAL parameter "opal_pmi_version". Now PMI selection is handled by PMIx MCA.
This commit was SVN r32767.
2014-09-20 02:30:23 +00:00
Ralph Castain
dfb952fa78 [Contribution from Artem - moved it to svn from git for him]
Replace our old, clunky timing setup with a much nicer one that is only available if configured with --enable-timing. Add a tool for profiling clock differences between the nodes so you can get more precise timing measurements. I'll ask Artem to update the Github wiki with full instructions on how to use this setup.

This commit was SVN r32738.
2014-09-15 18:00:46 +00:00
Vasily Filipov
e26af91a64 BTL/OPENIB: set "max_lmc" param to be "1" and not "all available values" by default.
cmr=v1.8.3:reviewer=miked 

This commit was SVN r32736.
2014-09-15 13:56:41 +00:00
Alex Mikheev
31d0724a08 OMPI: btl openib: fix detection of max registarable memory
Deal with the case when mlx4 module is loaded but device
is not present

cmr=v1.8.3:reviewer=miked

This commit was SVN r32734.
2014-09-15 12:17:23 +00:00
Ralph Castain
fad4384463 Not sure how we could get to this point without having already detected the error, but just to be safe - check for end-of-array and return if error.
Refs trac:4897

This commit was SVN r32731.

The following Trac tickets were found above:
  Ticket 4897 --> https://svn.open-mpi.org/trac/ompi/ticket/4897
2014-09-13 02:23:30 +00:00
Jeff Squyres
66aeadacff opal_search_libs: correctly AC_DEFINE results of search
1. It is not sufficient to put the result of m4_toupper() in a
variable and use that variable as the variable name in
AC_DEFINE_UNQUOTED.  Instead, just use m4_toupper() directly in
AC_DEFINE_UNQUOTED.  Also, save the result value in a "permanent"
variable that isn't erased, just in case autoconf decides to be lazy
about instantiating the body AC_DEFINE_UNQUOTED and move it later
(this is probably overkill :-) ).
1. Use the OMPI Way of always defining macros (to 0 or 1).  Then also
slightly change the logic in util/basename.c to just check
OPAL_HAVE_DIRNAME (because it will always be defined).

Refs trac:4894

This commit was SVN r32723.

The following Trac tickets were found above:
  Ticket 4894 --> https://svn.open-mpi.org/trac/ompi/ticket/4894
2014-09-13 00:28:30 +00:00
Ralph Castain
7269dae2da Per patch from Samuel Thibault, silence warning from Clang
This commit was SVN r32720.
2014-09-12 22:22:11 +00:00
Ralph Castain
0445052a1c Check for multiple declarations of a given MCA param and error out if detected as that can create an ambiguous definition of the param value.
Refs trac:4897

This commit was SVN r32719.

The following Trac tickets were found above:
  Ticket 4897 --> https://svn.open-mpi.org/trac/ompi/ticket/4897
2014-09-12 22:21:30 +00:00
Jeff Squyres
d244b7b860 mca_base_var: fix possibilty of unaligned variable assignments
Add a debugging check that ensures that the registered storage is
aligned appropriately for the type that is specified.

When we know that the storage is properly aligned, we can cast the
mbv_storage to the appropriate type and then simply do the assignment.
We used to do this assignment via a union, but clang's
-fsanitizer=alignment complained about this.

This commit was SVN r32716.
2014-09-11 23:02:49 +00:00
Ralph Castain
1f2c5863f0 Revert r32675 in favor of a different solution proposed by Brice
This commit was SVN r32715.

The following SVN revision numbers were found above:
  r32675 --> open-mpi/ompi@916f98a3ee
2014-09-11 21:58:48 +00:00
Howard Pritchard
e43715574a remove ignored restrct return type qualifier
The use of restrict in the return type qualifier for mca_btl_vader_reserve_fbox
is being ignored by gnu compiler.  for newer gcc, one sees this warning only
with -Wignored-qualifiers set, but for older variants of gcc it was reported
that numerous warning messages about this ignored qualifier were being
generated as vader is being compiled.

The warning reported by gcc is

btl_vader_fbox.h:53:47: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]
 static inline mca_btl_vader_fbox_t * restrict mca_btl_vader_reserve_fbox (struct mca_btl_base_endpoint_t *ep, const size_t size)

This commit was SVN r32714.
2014-09-11 21:12:41 +00:00
Rolf vandeVaart
9a2bab0e27 Add support for detecting CUDA managed memory. Disabled for now.
This commit was SVN r32713.
2014-09-11 21:07:17 +00:00
Howard Pritchard
820b34e5d2 Fix bad cut/paste for commit c19e7369
This commit was SVN r32712.
2014-09-11 21:00:04 +00:00
Howard Pritchard
d07c5674a3 Fix potential double free in cray pmi cray_fini
This commit was SVN r32711.
2014-09-11 20:30:40 +00:00
Ralph Castain
cb2ad98f57 Silence an unused function warning
This commit was SVN r32704.
2014-09-10 17:36:34 +00:00
Ralph Castain
a7c5b77d70 Just because the openib BTL can't reach a process doesn't mean it is a job-ending error. If we have other methods for reaching the process (e.g., sm for a local proc), then that's okay. If there is no method for reaching a proc, then that's an error - but the BML will report that situation.
The question of whether or not the openib BTL supports loopback is a separate question. It may be more appropriate to make the modex be PMIX_GLOBAL for cases where openib can support loopback so someone can run without a shared memory component. I'll leave that decision to the IB vendors.

This commit was SVN r32702.
2014-09-10 17:02:16 +00:00
Ralph Castain
93948f0c4e Resolve alignment issues when unpacking buffers
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32698.
2014-09-10 10:19:16 +00:00
Ralph Castain
e671620ac7 Per request from Jeff: tune up the help messages for binding options
Refs trac:4898

This commit was SVN r32691.

The following Trac tickets were found above:
  Ticket 4898 --> https://svn.open-mpi.org/trac/ompi/ticket/4898
2014-09-09 22:39:22 +00:00
Ralph Castain
4207b4c4ad Improve the --bind-to help message to better indicate the default options under various values of np. Remove the warning message if the user doesn't specify a binding policy and we are overloaded
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32687.
2014-09-08 21:03:51 +00:00
Ralph Castain
4df1aa63f7 Since we've run into the situation where someone puts a script wrapper around a launcher such as srun, we need to always protect MCA cmd line params with quotes. This means we also need to protect the backend from quotes coming into the system as part of a value, or else the parser gets confused.
So add a new function for wrapping MCA arguments, and tell the backend parser to ignore/remove leading/trailing quotes.

cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32686.
2014-09-08 20:38:46 +00:00
Ralph Castain
5649841e26 Provide missing include file - generates errors when used with Intel compilers
This commit was SVN r32685.
2014-09-08 19:04:40 +00:00
Ralph Castain
e32d541c8d Bring over a slight modification to the opal_init_test routine
This commit was SVN r32676.
2014-09-07 15:46:53 +00:00
Ralph Castain
916f98a3ee Rename an HWLOC member of a union in the diff.h file to avoid a naming conflict with an external library - it isn't that HWLOC did something wrong, but rather that the name being used is so close to a type name that other folks has a tendency to #define it as well. We could argue with those folks that what they are doing is incorrect, but it is just easier to make a slight change and resolve the problem.
This commit was SVN r32675.
2014-09-07 15:42:05 +00:00
Ralph Castain
6323b226c7 Bring over some updates from the PMIx branch - mostly just minor cleanups. Make the direct grpcomm component no longer be the default. For now, we seem to be having problems with non-blocking fence operations, so make them not be the default under any scenario (e.g., when sm is the only btl in operation).
This commit was SVN r32673.
2014-09-06 19:19:44 +00:00
Ralph Castain
f1a33b6476 Use the accessor function to get the jobid and vpid
This commit was SVN r32672.
2014-09-06 19:18:21 +00:00
Howard Pritchard
fe2ea1f0fb fix handling of OPAL_DSTORE_LOCALITY and ref cnt
This commit was SVN r32671.
2014-09-05 21:36:19 +00:00
Ralph Castain
ec51cbab9f We are failing to use the system dirname function because we are not correctly flagging that we found it. Modify opal_search_libs_core to set an "opal_have_foo" flag to indicate that we found the specified function, and then modify the have_dirname check to look for it.
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32669.
2014-09-04 16:10:38 +00:00
Ralph Castain
41c6058153 Bring over changes to MXM from pmix branch:
MTL MXM: establish endpoint connection on the first communication when direct_modex used

This commit was SVN r32668.
2014-09-03 18:22:11 +00:00
Ralph Castain
a51d1d7a97 find_last_path_separator returns NULL if the filename doesn't contain a path separator in it - i.e., it's just a local file. So protect the loop to avoid a segfault
cmr=v1.8.3:reviewer=rolfv

This commit was SVN r32667.
2014-09-03 18:13:42 +00:00
Ralph Castain
3fed455bbc If something goes wrong in add_procs, let's not segfault during finalize
This commit was SVN r32665.
2014-09-03 17:27:31 +00:00
Ralph Castain
b372cd02d0 Ensure the hwloc headers get installed when --with-devel-headers is given
This commit was SVN r32663.
2014-09-02 19:58:25 +00:00
Ralph Castain
d13fb37ef9 Add array types to opal_value_t
This commit was SVN r32656.
2014-08-31 08:07:03 +00:00
Ralph Castain
9500939042 Fix abstraction violation
This commit was SVN r32655.
2014-08-31 08:06:35 +00:00
Ralph Castain
60eb7124ab Upgrade to hwloc 1.9.1
This commit was SVN r32652.
2014-08-31 03:13:06 +00:00
Ralph Castain
5cdbc00136 Re-enable the usock oob component. Ensure the TCP component promotes messages for other procs to the OOB base so that other components have a chance to send the relay. Seems to be passing MTT, so let's see how it works for others.
This commit was SVN r32650.
2014-08-30 19:33:46 +00:00
Ralph Castain
9ac75451ff Nathan had requested this before as he needs to know the #procs in the job to optimize the UGNI btl. Add the fetch for that data - the native pmix component already provides it, but ensure the Slurm PMI-1 support does too. If not found, fall back to the non-optimized number
This commit was SVN r32648.
2014-08-29 22:53:35 +00:00
Ralph Castain
f865ef61ab Need local_size returned by the Slurm components
This commit was SVN r32646.
2014-08-29 22:23:27 +00:00
Howard Pritchard
9a2891f2d6 handle PMIX_LOCAL_SIZE attr arg in cray pmix
This commit was SVN r32645.
2014-08-29 21:18:02 +00:00
Ralph Castain
8faabed2cd Add some further initialization and protection for zero-byte messages
This commit was SVN r32644.
2014-08-29 17:24:55 +00:00
Gilles Gouaillardet
6916bfc368 btl/openib: fix use of mca_btl_openib_component.default_recv_qps
- do not have mca_btl_openib_component.default_recv_qps point to the stack
- do not reset mca_btl_openib_component.default_recv_qps in btl_openib_component_open

cmr=v1.8.3:reviewer=miked

This commit was SVN r32642.
2014-08-29 04:41:34 +00:00
Gilles Gouaillardet
b8a2e90f2d btl/openib: fix a typo
cmr=v1.8.3:reviewer=miked

This commit was SVN r32639.
2014-08-29 04:23:42 +00:00
Ralph Castain
730e28349e Some minor uninitialized variable cleanups
This commit was SVN r32629.
2014-08-29 02:21:13 +00:00
Jeff Squyres
733316372b usnic: remove suggestion of enabling no-drop in the fabric
Reviewed by Reese Faucette

cmr=v1.8.3:reviewer=ompi-rm1.8

This commit was SVN r32628.
2014-08-28 23:56:56 +00:00
Howard Pritchard
2a12fd833d Fix compile problem from pmix merge
This commit was SVN r32626.
2014-08-28 22:14:12 +00:00
Gilles Gouaillardet
d743da18bf pmix: fix process name parsing on 32 bits systems
opal_process_name_t is an uint64_t which is not equivalent to
an unsigned long on 32 bits systems.
this is now parsed as an unsigned long long.

This commit was SVN r32592.
2014-08-25 03:08:02 +00:00
Ralph Castain
f00af81c1d Little more cleanup under the abort cases cited by Gilles. All seem to be working now
This commit was SVN r32585.
2014-08-22 19:57:57 +00:00
Ralph Castain
b1a7375192 Fix the "unreachable" message so it outputs the correct hostname for the remote proc. Cleanup some of the pmix stuff when running corner cases of errors
This commit was SVN r32584.
2014-08-22 19:20:45 +00:00
Joshua Ladd
97abb7c727 Backing out the new Opal Hash table until the legal issues are address by H.P.
Refs trac:4872

This commit was SVN r32583.

The following Trac tickets were found above:
  Ticket 4872 --> https://svn.open-mpi.org/trac/ompi/ticket/4872
2014-08-22 19:10:09 +00:00
Ralph Castain
6ff2a60829 Handle the non-blocking fence case correctly, and ensure we always at least pass back the hostname of the process whose info is being requested so that the ompi_proc_t can correctly initialize it when we are in a non-blocking fence with np < cutoff scenario
This commit was SVN r32578.
2014-08-22 14:26:24 +00:00
Ralph Castain
8f1b9b463e Fix shared memory operations - need to pass the local topology and cpusets of all local peers so we can properly compute relative locality for them. Also need to set default locality to "on node" in case where cpusets are not passed because procs are not bound.
This commit was SVN r32577.
2014-08-22 05:17:51 +00:00
Jeff Squyres
b0dfb9f401 usnic: avoid a possible race condition
Per #4874, code review revealed a possible race condition in the
module struct and the connectivity agent.  Move the setup of the
connectivity agent listener until the module struct has been fully
setup.

This commit was SVN r32573.
2014-08-22 02:34:24 +00:00
Jeff Squyres
a896f90712 btl_base_select: fix faulty/incorrect show_help message
When no components were able to be found, btl_base_select() was
showing the wrong help message -- one that indicated that a specific
component could not be found.  And it left off a string argument, so
the end of the help message was garbage.

This commit creates a new help message for this case and updates the
show_help call to use the new message.

This commit was SVN r32572.
2014-08-22 01:53:38 +00:00
Ralph Castain
aec5cd08bd Per the PMIx RFC:
WHAT:    Merge the PMIx branch into the devel repo, creating a new
               OPAL “lmix” framework to abstract PMI support for all RTEs.
               Replace the ORTE daemon-level collectives with a new PMIx
               server and update the ORTE grpcomm framework to support
               server-to-server collectives

WHY:      We’ve had problems dealing with variations in PMI implementations,
               and need to extend the existing PMI definitions to meet exascale
               requirements.

WHEN:   Mon, Aug 25

WHERE:  https://github.com/rhc54/ompi-svn-mirror.git

Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.

All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.

Accordingly, we have:

* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.

* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.

* Replaced the current global collective id with a signature based on the names of the participating procs. The allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint

* removed the prior OMPI/OPAL modex code

* added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform.

* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand

This commit was SVN r32570.
2014-08-21 18:56:47 +00:00
Mike Dubman
c3beb0472e openib/btl: better detect max reg memory. OFED has no runtime versioning API :(
based on http://www.open-mpi.org/community/lists/users/2014/08/25048.php

reviewed by AlexM
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32569.
2014-08-21 12:12:43 +00:00
Gilles Gouaillardet
cfc0773c8c btl/scif: use safe syntax
Thanks to Ashley Pittman for pointing the modex should now be zero'ed

cmr=v1.8.2:ticket=trac:4871

This commit was SVN r32568.

The following Trac tickets were found above:
  Ticket 4871 --> https://svn.open-mpi.org/trac/ompi/ticket/4871
2014-08-21 10:32:05 +00:00
Mike Dubman
acd5a9acac udcm: psn should be 24 bit, new OFED actually checks and fails if it is not 24 bit.
This commit was SVN r32567.
2014-08-21 08:49:43 +00:00
Gilles Gouaillardet
59caebe3ea new opal hash table
Decrease the hash table size when an element is removed

cmr=v1.8.2:ticket=trac:4872

This commit was SVN r32566.

The following Trac tickets were found above:
  Ticket 4872 --> https://svn.open-mpi.org/trac/ompi/ticket/4872
2014-08-21 06:51:11 +00:00
Ralph Castain
ea94659bd9 Silence warning
Refs trac:4872

This commit was SVN r32565.

The following Trac tickets were found above:
  Ticket 4872 --> https://svn.open-mpi.org/trac/ompi/ticket/4872
2014-08-20 22:27:03 +00:00
Joshua Ladd
84d0cc27a2 Adding a new OPAL hash table routine. Contributed by David Linden of H.P. in partnership
with Mellanox Technologies. This should be added to 


cmr=v1.8.2:subject=New OPAL hash table:reviewer=rhc

This commit was SVN r32564.
2014-08-20 21:40:28 +00:00
Gilles Gouaillardet
3c1944054e btl/scif: use safe syntax
PGI compilers 2013 and older do not support the following syntax :
mca_btl_scif_modex_t modex = {.port_id = mca_btl_scif_module.port_id};
so split it on two lines

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32555.
2014-08-20 02:48:47 +00:00
Jeff Squyres
0a398c155f opal MCA params: Move (and adapt) help message to opal help file
This commit was SVN r32547.
2014-08-16 11:54:41 +00:00
Jeff Squyres
ac7c907f8d usnic: ensure to have a safe destruction of an opal_list_item_t
It turns out that we ''can'' get to the endpoint destructor with the
endpoint still on the "endpoints needing ACKs" list.  So if it's on
the list, remove it first, and then DESTRUCT the opal_list_item_t.

This prevents an assert() fail in debug builds.  We'd like to let this
soak over the weekend.

cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r32546.
2014-08-15 21:52:36 +00:00
Jeff Squyres
1cdcb7290b usnic: no need to check before calling this function
This function is intentionally always safe to call -- no need for a
double redundant check.

This commit was SVN r32545.
2014-08-15 21:39:29 +00:00
Jeff Squyres
082ab15d19 usnic: increase the listen() backlog size
Rarely -- but it happens -- the connectivity client gets ECONNREFUSED
because the connectivity agent listen() backlog is too small.  Rather
than put in a loop on the client side, take the simple way out for
now: increase the backlog size to an arbitrarily-large number.

Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32543.
2014-08-15 19:12:18 +00:00
Jeff Squyres
9373d6420e usnic: when a module is finalized, "unlisten" the connectivity checker
Instead of waiting to destroy the connectivity agent during component
shutdown, have the module shutdown send an "unlisten" command to the
cagent that will tell it to stop listening on a given interface.

This commit was SVN r32536.
2014-08-15 00:52:43 +00:00
Jeff Squyres
6b592d3016 usnic: convert some BTL_ERRORs to more descriptive show_help messages
1. After we receive N abnormally-short messages (meaning: corrupted),
print a show_help message about it.  N defaults to 25.  N can be set
to 0 disable the message via btl_usnic_max_short_packets.
1. If we receive a completion error for something other than a
receive, display a show_help message.

Reviewed by Dave Goodell.

CMR'ing to v1.8.3, but it will require a custom patch because of the
OMPI->OPAL BTL move.

cmr=v1.8.3

This commit was SVN r32522.
2014-08-13 15:01:20 +00:00
Mike Dubman
5b90af601c btl/openib: add missing definition for ConnectX3 card
This commit was SVN r32521.
2014-08-13 13:56:34 +00:00
Howard Pritchard
4d6d4f46b0 switch udreg config macro to use pkg-config
This commit was SVN r32516.
2014-08-12 21:30:06 +00:00
Rolf vandeVaart
37dc9477d0 As requested in RFC and discussed at weekly meeting, change default setting of ibv_fork_init() to off.
Link to RFC: http://www.open-mpi.org/community/lists/devel/2014/07/15393.php

cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32514.
2014-08-12 20:57:11 +00:00
Rolf vandeVaart
fb835a3b04 Needs to be a plus.
This commit was SVN r32513.
2014-08-12 20:39:00 +00:00
Rolf vandeVaart
c53c981506 Fix initialization and cleanup code for CUDA-aware code. Eliminates all resource leaks.
This commit was SVN r32512.
2014-08-12 19:41:46 +00:00
Gilles Gouaillardet
b5d5388c6c silence warning on solaris 10
use HAVE_STRINGS_H to protect <strings.h> include

cmr=v1.8.2:reviewer=rhc:ticket=trac:4853

This commit was SVN r32510.

The following Trac tickets were found above:
  Ticket 4853 --> https://svn.open-mpi.org/trac/ompi/ticket/4853
2014-08-12 04:27:49 +00:00
Howard Pritchard
5bbcf5f7fb remove ompi comm related constructs from ugni btl
There were remaining references to size of MPI_COMM_WORLD,
etc. in ugni btl which prevented building of opal library
following btl move to opal.

Eventually another mechanism will need to be found for
providing hints to BTLs about how to setup internal
resources relevant to max. number possible endpoints, etc.

Conflicts:
	opal/mca/btl/ugni/btl_ugni_component.c

This commit was SVN r32507.
2014-08-11 16:15:39 +00:00
George Bosilca
de7191132d Remove few warnings.
This commit was SVN r32506.
2014-08-11 13:34:44 +00:00
Gilles Gouaillardet
623945466e silence a warning on Solaris
cmr=v1.8.2:reviewer=bgoglin

This commit was SVN r32503.
2014-08-11 11:00:55 +00:00
Gilles Gouaillardet
c28918c5cf silence warning on solaris 10
on solaris 10, bzero is declared in strings.h and not in string.h

cmr=v1.8.2:reviewer=rhc

This commit was SVN r32502.
2014-08-11 08:18:43 +00:00
Gilles Gouaillardet
f24699623f check-help-strings cleanup
This commit was SVN r32495.
2014-08-11 03:25:22 +00:00
Gilles Gouaillardet
b565e69b86 check-help-strings cleanup
This commit was SVN r32491.
2014-08-11 03:19:57 +00:00
Jeff Squyres
ca0ccc5321 headers: remove trailing commas in enum lists
Per http://www.open-mpi.org/community/lists/devel/2014/08/15576.php,
trailing commas are not valid in enum lists in C++ until C++11.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r32482.
2014-08-09 12:04:17 +00:00
George Bosilca
beec6b4b4b Remove a lost #include.
This commit was SVN r32478.
2014-08-08 23:42:40 +00:00
Howard Pritchard
1e02bb056f openib btl check-help-strings cleanup
This commit was SVN r32470.
2014-08-08 20:40:18 +00:00
Jeff Squyres
65767aff68 usnic: remove errant OMPI header file
This commit was SVN r32469.
2014-08-08 20:34:50 +00:00
Rolf vandeVaart
876232e6d0 Fix previous checkin.
This commit was SVN r32468.
2014-08-08 18:58:25 +00:00
George Bosilca
5df7451429 Fix the #include for the common sm.
This commit was SVN r32467.
2014-08-08 18:30:27 +00:00
Howard Pritchard
a1f6ecf1e6 initial fixes for ugni btl move to opal
This commit was SVN r32466.
2014-08-08 18:02:46 +00:00
Jeff Squyres
323b9f346c usnic: update connectivity checker help message
Show an example of using the btl_usnic_connectivity_map option.  Also,
mention that another reason for the "total connectivity failure" may
be due to asymmetric / unexpected routing.

Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32465.
2014-08-08 17:18:29 +00:00
Rolf vandeVaart
ac16af0bff Small fix for case where no libcuda.so.1 is found.
This commit was SVN r32461.
2014-08-08 16:05:19 +00:00
Ralph Castain
644089ed0b Avoid alignment issues in opal_value_t copies
cmr=v1.8.2:reviewer=rhc

This commit was SVN r32459.
2014-08-08 15:57:03 +00:00
Rolf vandeVaart
909cfa35aa Fix some error messages.
This commit was SVN r32458.
2014-08-08 14:33:46 +00:00
Jeff Squyres
040989edde btl_sm_component.c: update outdated topic for call to show_help
This commit was SVN r32456.
2014-08-08 13:35:52 +00:00
Jeff Squyres
2d2534a1bc btl_openib_component.c: remove inline logic and use 2 calls to show_help
The contrib/check-help-strings.pl gets confused if the topic is an
inline logic check, so separate it into two calls to show_help.

This commit was SVN r32455.
2014-08-08 13:35:29 +00:00
Jeff Squyres
b0897031f0 btl vader: removed unused helpfile
vader does not invoke show_help() at all -- all the topics in this
helpfile were copied from the original sm btl helpfile.

This commit was SVN r32454.
2014-08-08 13:35:05 +00:00
Jeff Squyres
eefa17026d windows: effectively revert r32449
The _strdup usage in opal/util/basename looks like it was a product of
Windows compatibility (see r11336), which we don't care about any
more.  Further, opal/win32/win_compat.h, which we sitll maintain for
cygwin compatibility, #define's strdup to _strdup (which is what
Microsoft wants you to use).  

So this old _strdup in opal/util/basename.c (and its corresponding
check in configure.ac) should just be removed.

This commit was SVN r32450.

The following SVN revision numbers were found above:
  r11336 --> open-mpi/ompi@a28b025150
  r32449 --> open-mpi/ompi@d5a3448b8b
2014-08-08 11:36:45 +00:00
Gilles Gouaillardet
d5a3448b8b Fix missing prototype for _strdup
_strdup is not part of any include file i could find on Solaris 10.
manually add the _strdup prototype if needed.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32449.
2014-08-08 02:51:56 +00:00
Jeff Squyres
c6d9bf906e configury: ensure wrapper static LIBS is filled properly
In core library portions of the configury (e.g., top-level
configure.ac itself), we were calling AC_CHECK_LIB and
OPAL_CHECK_FUNC_LIB to check for various libraries.

'''SIDENOTE:''' It turns out that modern Autoconf has AC_SEARCH_LIBS,
which does just about exactly what OPAL_CHECK_FUNC_LIB does.  So this
commit effectively replaces OPAL_CHECK_FUNC_LIB with AC_SEARCH_LIBS.

However, we never bothered to add these found libraries to the wrapper
compiler list of libraries used for static linking (doh!).  We've been
getting lucky for quite a while that components were adding the same
libraries to their wrapper compiler LIBS list.  

This is problematic, however, if we don't build some of these
components.  For example, Paul Hargrove noticed that if he configured
with --disable-shared --enable-static --disable-io-romio, ROMIO was no
longer adding some libraries to the wrapper LIBS list -- libraries
that just happened to also be needed by core OPAL/ORTE/OMPI layers.

The solution is not to use AC_CHECK_LIB or OPAL_CHECK_FUNC_LIB, but
use a pair of new macros:

 * OPAL_SEARCH_LIBS_CORE: a wrapper around AC_SEARCH_LIBS.  If we add
   something to $LIBS, then also add it to the wrapper list of static
   libraries.  This is the main piece of functionality that was
   wrong/missing.
 * OPAL_SEARCH_LIBS_COMPONENT: similar to OPAL_SEARCH_LIBS_CORE, but
   instead of directly adding it to the wrapper list of static
   libaries, add it to <framework>_<component>_LIBS (which eventually
   gets slurped up into the wrapper list of static libraries.  See the
   lengthy comment in config/opal_setup_wrappers.m4 near the beginning
   of OPAL_SETUP_WRAPPER_INIT() for a more detailed explanation).
   Most components did this correctly already, but one or two weren't
   right, so I implemented this second macro quite similar to the
   first and put it everywhere we already used AC_SEARCH_LIBS or
   OPAL_CHECK_FUNC_LIB.

This needs to soak for a day or two on the trunk before moving to the
v1.8 branch.

Refs trac:4834

cmr=v1.8.2:reviewer=ggouaillardet

This commit was SVN r32447.

The following Trac tickets were found above:
  Ticket 4834 --> https://svn.open-mpi.org/trac/ompi/ticket/4834
2014-08-07 23:54:45 +00:00
Jeff Squyres
6bf28a6940 usnic: update help messages
These messages were committed in the v1.8 branch in r32341, but were
never committed to the trunk (because we were waiting for the OPAL BTL
move).  This commit brings the trunk and v1.8 help messages in line
with each other.

This commit was SVN r32445.

The following SVN revision numbers were found above:
  r32341 --> open-mpi/ompi@5e752b4aba
2014-08-07 20:50:29 +00:00
Jeff Squyres
70f5a10128 usnic: fix typo from r32438
This commit was SVN r32440.

The following SVN revision numbers were found above:
  r32438 --> open-mpi/ompi@d2e31ac647
2014-08-06 19:29:46 +00:00
Jeff Squyres
d2e31ac647 usnic: Fix connectivity checker pointer mismatch
Ensure that the connectivity checker agent only uses pointers from the
client that is the same process as the agent.

Not necessary for the v1.8 branch -- this is a trunk/v1.9-only problem.

This commit was SVN r32438.
2014-08-05 23:07:01 +00:00
Jeff Squyres
34897cee9f usnic: unify teardown between trunk and v1.8 branches
Make the del_procs, module finalize, and endpoint destructors be the
same between trunk and v1.8, with one exception: the very beginning of
v1.8 module_finalize calls del_procs for each proc to simulate/pretend
the trunk/v1.9 PML behavior of calling del_procs before module_finalize.

This commit was SVN r32437.
2014-08-05 22:31:55 +00:00
Jeff Squyres
1a8d72119f usnic: Fix configure.ac typo
This commit was SVN r32436.
2014-08-05 22:31:07 +00:00
Ralph Castain
846326af7f Silence warning
This commit was SVN r32433.
2014-08-05 15:33:12 +00:00
Ralph Castain
6b1a508595 We haven't supported SPARC 8 in a long time, so remove the atomics reference. Thanks to Paul Hargrove for bringing it to our attention.
cmr=v1.8.3:reviewer=hjelmn

This commit was SVN r32430.
2014-08-05 14:49:34 +00:00
Ralph Castain
1e93e85403 Cleanup some autoconf messages - thanks to Paul Hargrove for noting them
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32429.
2014-08-05 14:48:42 +00:00
Gilles Gouaillardet
ef1fdbb5d7 Use opal_getpagesize to get the proper page size
Refs trac:4826

This commit was SVN r32428.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-05 05:37:44 +00:00
Gilles Gouaillardet
e8bf030d93 Use opal_getpagesize to get the proper page size
Refs trac:4826

This commit was SVN r32427.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-05 05:35:57 +00:00
Gilles Gouaillardet
f7ede30c46 Use opal_getpagesize to get the proper page size
Refs trac:4826

This commit was SVN r32426.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-05 05:30:03 +00:00
Gilles Gouaillardet
3c2e75c6b7 Fix OPAL_PROCESS_NAME_xTOy for heterogeneous support
This commit was SVN r32425.
2014-08-05 05:22:50 +00:00
Dave Goodell
13b104bdef usnic: fix endpoint destruction on the trunk
Fixes an assertion failure in --enable-debug builds and SEGVs in normal
builds.

I'm not 100% sure I like this model, but it at least seems to be
consistent.  Some variation on this scheme will need to be adapted to
the trunk, where usnic_del_procs() is called by the PML instead of
internally in usnic_finalize().

A related bug (but with different mechanics) is #4832.

This commit was SVN r32424.
2014-08-04 21:30:21 +00:00
Dave Goodell
490c484f8c usnic: fix uninitialized param to accept(2)
This commit was SVN r32423.
2014-08-04 21:30:08 +00:00
Ralph Castain
e1c09cec2c Fixes trac:4377
Per patch from aryzhikh, initialize a copule of fields in the openib ini struct

Hard to understand why this has sat there for so long...

cmr=v1.8.2:reviewer=rhc:subject=initialize a copule of fields in the openib ini struct

This commit was SVN r32417.

The following Trac tickets were found above:
  Ticket 4377 --> https://svn.open-mpi.org/trac/ompi/ticket/4377
2014-08-04 19:41:52 +00:00
Dave Goodell
61a9b49d5b usnic: fix usnic breakage in ORCM repo
This commit was SVN r32416.
2014-08-04 19:34:55 +00:00
Howard Pritchard
4beab705aa different way to fix opal_config compile problem
This commit was SVN r32415.
2014-08-04 17:37:12 +00:00
Howard Pritchard
686eb40ba2 Revert "get file to compile with gcc 4.8.1"
This reverts commit 041a192502ebc87e22cd002dd20c67478831d451.

This commit was SVN r32414.
2014-08-04 17:37:08 +00:00
Howard Pritchard
2580b0db79 get file to compile with gcc 4.8.1
This commit was SVN r32412.
2014-08-04 16:40:47 +00:00
Ralph Castain
50586a3dde Per patch from Paul Hargrove, disable an unused function in libevent to restore OpenBSD support
Reviewed by rhc, RM-approved

cmr=v1.8.2:reviewer=ompi-gk1.8

This commit was SVN r32411.
2014-08-04 13:29:35 +00:00
Ralph Castain
61bf7af9d2 Per Paul Hargrove's suggestion, create an opal_pagesize function to abstract the various ways of obtaining that value. Rather than creating a separate file for only that one function, put it in a convenient place that is at least somewhat related.
Refs trac:4826

This commit was SVN r32407.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-02 18:38:16 +00:00
Ralph Castain
a347b19dc1 Add missing include
This commit was SVN r32406.
2014-08-01 18:49:37 +00:00
George Bosilca
1e37b67e5d No more assert in the proc destructor.
This commit was SVN r32401.
2014-08-01 16:36:23 +00:00
Ralph Castain
daeb9b6c4f Some more cleanups. Remove direct references to ORTE by changing OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun, orted, tools) set the OPAL proc structure fields so OPAL knows what is going on and uses the correct print functions (still need to fix the problem for non-MPI apps). Properly return uint32_t from the opal utilities instead of int32_t as that is what the ORTE process name fields contain.
Thanks to Gilles for pointing out some of the discrepancies.

This commit was SVN r32398.
2014-08-01 14:44:11 +00:00
George Bosilca
3ec865558d Dont miss the Os X atomics on "make dist".
This commit was SVN r32390.
2014-08-01 03:35:38 +00:00
Jeff Squyres
ff4717b727 usnic: cagent now checks that incoming pings are expected
Previously, the connectivity agent was pretty dumb: it took whatever
pings it got and ACKed them.  Then we added an agent check to ensured
that the ping actually came from the source interface that it said it
came from.  Now we add another check such that when a ping is received
on interface X that corresponds to usnic module Y, we ensure that the
source interface of the ping is on the all_endpoints list for module Y
(i.e., module Y expects to be able to talk to that peer interface).

This detects cases where peers have come to different conclusions
about which interfaces should be used to communicate (which is bad!).
This usually reflects a network misconfiguration.

Fixes CSCuq05389.

This commit was SVN r32383.
2014-07-31 22:30:20 +00:00
George Bosilca
97de458cd7 Fix.
This commit was SVN r32382.
2014-07-31 21:54:07 +00:00
Jeff Squyres
a694e46560 tcp btl: remove the btl_tcp_if_seq MCA param
No one was using this functionality, anyway.

This commit was SVN r32381.
2014-07-31 20:16:10 +00:00
George Bosilca
daa076995a orte_rmaps_numa_node_t -> opal_rmaps_numa_node_t
This commit was SVN r32380.
2014-07-31 19:58:47 +00:00
Ralph Castain
c366554048 Cleanup the MCA param registration - the MPI_T interface is allowed to modify the value, so don't read the value until we are ready to use it. Discussed with Nathan, but will ask his review prior to porting it to 1.8.2
Refs trac:4816

This commit was SVN r32379.

The following Trac tickets were found above:
  Ticket 4816 --> https://svn.open-mpi.org/trac/ompi/ticket/4816
2014-07-31 16:30:08 +00:00
Rolf vandeVaart
cd729fe173 Improve error message.
This commit was SVN r32378.
2014-07-31 15:34:38 +00:00
Nathan Hjelm
85b78f838e Fix mpool/udreg compilation
This commit was SVN r32377.
2014-07-31 15:29:20 +00:00
Nathan Hjelm
6b7c4e72d5 Fix another ompi->opal naming issue
This commit was SVN r32376.
2014-07-31 15:01:19 +00:00
Rolf vandeVaart
97da541abf Fix hostname string usage in error messages.
This commit was SVN r32375.
2014-07-31 14:47:51 +00:00
Gilles Gouaillardet
6520f0578a Silence some warnings in btl/scif and btl/openib
This commit was SVN r32371.
2014-07-31 09:55:11 +00:00
Ralph Castain
db89071dc2 Cleanup the moved component's Makefile.am to use the opal instead of ompi directories
This commit was SVN r32370.
2014-07-31 04:41:04 +00:00
Jeff Squyres
959bdace3c usnic: check that connectivity pings came from where they said they came from
Ensure that incoming "ping" messages came from the IP address that
they think they came from.  If they don't, drop them (because it is
probably routing error), which will likely eventually cause the
connectivity checker to timeout, and therefore cause the job to abort.

This commit was SVN r32368.
2014-07-30 21:03:56 +00:00
Jeff Squyres
20349da03b usnic: minor cleanup
This commit was SVN r32367.
2014-07-30 20:56:49 +00:00
Jeff Squyres
f1fb4970a5 usnic: remove all trailing whitespace
Style cleanup only; no code changes.

This commit was SVN r32366.
2014-07-30 20:56:15 +00:00
Jeff Squyres
a6c6aa2815 usnic: remove compatibility with v1.6 series
Update compat.h to only handle compatability between v1.7/v1.8 and
v1.9/2.0 (i.e., the current trunk).  Remove what seems to be the last
vestiages of OMPI/ORTE pollution in the now-OPAL-ized usnic BTL.

Currently use a hard-coded constant for the MCW size (i.e.,
MPI_COMM_WORLD size) for some initialization values in the v1.9/2.0
series; still need to figure out something better there.

This commit was SVN r32365.
2014-07-30 20:55:26 +00:00
Jeff Squyres
aed8b43db2 usnic: remove last vestiges of ORTE process names
These were missed when the BTLs moved to OPAL.

This commit was SVN r32364.
2014-07-30 20:53:41 +00:00
Jeff Squyres
d195b8caf4 usnic: fix a bunch of OMPI_* and ORTE_* constant usage
Somehow a bunch of OMPI_* and ORTE_* constants didn't get renamed in
the BTL move to OPAL.  This commit fixes that.

This commit was SVN r32363.
2014-07-30 20:52:54 +00:00
Jeff Squyres
2447c8479f usnic: do not call ompi_rte_abort()
Adapt to moving down to OPAL: find a PML-registered error callback,
and use that when we don't have a module context and we need to abort.

Failing that, just call exit().

This commit was SVN r32362.
2014-07-30 20:52:06 +00:00
George Bosilca
f39abb9e69 Reverting r32355: a number of processes is not a notion that a low level
communication library should use to initialize itself.

Ralph will champion this change back with an RFC if there is a realistic
need/use case from the community.

This commit was SVN r32361.

The following SVN revision numbers were found above:
  r32355 --> open-mpi/ompi@c903917f47
2014-07-30 20:11:35 +00:00
Nathan Hjelm
a32d93ec20 mca/base: make clang static analyzer happy
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32360.
2014-07-30 17:45:28 +00:00
Nathan Hjelm
97fad1dd95 mca/base: ensure component version parameters get deregistered when the
component gets dlclosed

cmr=v1.8.2:reviewer=rhc

This commit was SVN r32359.
2014-07-30 17:45:23 +00:00
Rolf vandeVaart
01c2c1c78c Increase default minimum sm pool size. Discussed in this RFC:
http://www.open-mpi.org/community/lists/devel/2014/07/15274.php

cmr=v1.8.3:reviewer=bosilca

This commit was SVN r32356.
2014-07-30 14:01:08 +00:00
Ralph Castain
c903917f47 Expose the num_procs information to the opal layer as the info is needed in several BTLs
This commit was SVN r32355.
2014-07-30 09:33:41 +00:00
Gilles Gouaillardet
b2dbde2be4 btl/scif : fix compilation
btl/scif could not compile after it was moved down into opal.
this commit fixes the compilation issue *only*

This commit was SVN r32351.
2014-07-30 04:26:50 +00:00
Joshua Ladd
da286d9a84 Fixing the OpenIB receive queue selection logic. Refs trac:4816
This commit was SVN r32350.

The following Trac tickets were found above:
  Ticket 4816 --> https://svn.open-mpi.org/trac/ompi/ticket/4816
2014-07-30 02:17:46 +00:00
Jeff Squyres
10a58992af usnic: add --with-usnic configure switch
If --with-usnic is specified and we can't build the usnic BTL, abort.
If --without-usnic is specified, gracefully skip building the usnic
BTL.  If neither is specified, do the OMPI-default behavior: try to
configure/build the usnic BTL, and if we can't, skip it.
    
Fixes CSCuq13889.

This commit was SVN r32349.
2014-07-30 02:11:04 +00:00
Joshua Ladd
7a694cb95e Reverting r32346. Will do it the correct way and update the patch in Refs trac:32346
This commit was SVN r32348.

The following SVN revision numbers were found above:
  r32346 --> open-mpi/ompi@5be6ff07d5

The following Trac tickets were found above:
  Ticket 32346 --> https://svn.open-mpi.org/trac/ompi/ticket/32346
2014-07-29 23:23:44 +00:00
Joshua Ladd
5be6ff07d5 This fixes the OpenIB BTL receive queue selection logic in the trunk. Custom patch for 1.8.2 is provided in Refs trac:4816
This commit was SVN r32346.

The following Trac tickets were found above:
  Ticket 4816 --> https://svn.open-mpi.org/trac/ompi/ticket/4816
2014-07-29 21:42:20 +00:00
Nathan Hjelm
b8c3b01643 Fix some missed ompi->opal renamings
This commit was SVN r32345.
2014-07-29 18:59:59 +00:00
Nathan Hjelm
b10a29fd96 btl/vader: Fix typo in add_procs introduced by the btl move
This commit was SVN r32342.
2014-07-29 16:07:59 +00:00
Rolf vandeVaart
e39c673dd2 Fix typo.
This commit was SVN r32339.
2014-07-29 13:21:24 +00:00
Ralph Castain
d674e22433 Remove stale include as header no longer exists. Add missing header
This commit was SVN r32336.
2014-07-29 01:24:27 +00:00
Nathan Hjelm
0e47441333 Remove unused files
This commit was SVN r32335.
2014-07-28 22:01:16 +00:00
Nathan Hjelm
dffbcb3803 User opal_process_info.cpuset to determine if the process is bound
This commit was SVN r32334.
2014-07-28 22:00:18 +00:00