1
1
Граф коммитов

2965 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
1e93e85403 Cleanup some autoconf messages - thanks to Paul Hargrove for noting them
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32429.
2014-08-05 14:48:42 +00:00
Gilles Gouaillardet
ef1fdbb5d7 Use opal_getpagesize to get the proper page size
Refs trac:4826

This commit was SVN r32428.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-05 05:37:44 +00:00
Gilles Gouaillardet
e8bf030d93 Use opal_getpagesize to get the proper page size
Refs trac:4826

This commit was SVN r32427.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-05 05:35:57 +00:00
Gilles Gouaillardet
f7ede30c46 Use opal_getpagesize to get the proper page size
Refs trac:4826

This commit was SVN r32426.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-05 05:30:03 +00:00
Gilles Gouaillardet
3c2e75c6b7 Fix OPAL_PROCESS_NAME_xTOy for heterogeneous support
This commit was SVN r32425.
2014-08-05 05:22:50 +00:00
Dave Goodell
13b104bdef usnic: fix endpoint destruction on the trunk
Fixes an assertion failure in --enable-debug builds and SEGVs in normal
builds.

I'm not 100% sure I like this model, but it at least seems to be
consistent.  Some variation on this scheme will need to be adapted to
the trunk, where usnic_del_procs() is called by the PML instead of
internally in usnic_finalize().

A related bug (but with different mechanics) is #4832.

This commit was SVN r32424.
2014-08-04 21:30:21 +00:00
Dave Goodell
490c484f8c usnic: fix uninitialized param to accept(2)
This commit was SVN r32423.
2014-08-04 21:30:08 +00:00
Ralph Castain
e1c09cec2c Fixes trac:4377
Per patch from aryzhikh, initialize a copule of fields in the openib ini struct

Hard to understand why this has sat there for so long...

cmr=v1.8.2:reviewer=rhc:subject=initialize a copule of fields in the openib ini struct

This commit was SVN r32417.

The following Trac tickets were found above:
  Ticket 4377 --> https://svn.open-mpi.org/trac/ompi/ticket/4377
2014-08-04 19:41:52 +00:00
Dave Goodell
61a9b49d5b usnic: fix usnic breakage in ORCM repo
This commit was SVN r32416.
2014-08-04 19:34:55 +00:00
Howard Pritchard
4beab705aa different way to fix opal_config compile problem
This commit was SVN r32415.
2014-08-04 17:37:12 +00:00
Howard Pritchard
686eb40ba2 Revert "get file to compile with gcc 4.8.1"
This reverts commit 041a192502ebc87e22cd002dd20c67478831d451.

This commit was SVN r32414.
2014-08-04 17:37:08 +00:00
Howard Pritchard
2580b0db79 get file to compile with gcc 4.8.1
This commit was SVN r32412.
2014-08-04 16:40:47 +00:00
Ralph Castain
50586a3dde Per patch from Paul Hargrove, disable an unused function in libevent to restore OpenBSD support
Reviewed by rhc, RM-approved

cmr=v1.8.2:reviewer=ompi-gk1.8

This commit was SVN r32411.
2014-08-04 13:29:35 +00:00
Ralph Castain
61bf7af9d2 Per Paul Hargrove's suggestion, create an opal_pagesize function to abstract the various ways of obtaining that value. Rather than creating a separate file for only that one function, put it in a convenient place that is at least somewhat related.
Refs trac:4826

This commit was SVN r32407.

The following Trac tickets were found above:
  Ticket 4826 --> https://svn.open-mpi.org/trac/ompi/ticket/4826
2014-08-02 18:38:16 +00:00
Ralph Castain
a347b19dc1 Add missing include
This commit was SVN r32406.
2014-08-01 18:49:37 +00:00
George Bosilca
1e37b67e5d No more assert in the proc destructor.
This commit was SVN r32401.
2014-08-01 16:36:23 +00:00
Ralph Castain
daeb9b6c4f Some more cleanups. Remove direct references to ORTE by changing OMPI_CAST_ORTE_NAME -> OMPI_CAST_RTE_NAME. Ensure that ORTE tools (mpirun, orted, tools) set the OPAL proc structure fields so OPAL knows what is going on and uses the correct print functions (still need to fix the problem for non-MPI apps). Properly return uint32_t from the opal utilities instead of int32_t as that is what the ORTE process name fields contain.
Thanks to Gilles for pointing out some of the discrepancies.

This commit was SVN r32398.
2014-08-01 14:44:11 +00:00
George Bosilca
3ec865558d Dont miss the Os X atomics on "make dist".
This commit was SVN r32390.
2014-08-01 03:35:38 +00:00
Jeff Squyres
ff4717b727 usnic: cagent now checks that incoming pings are expected
Previously, the connectivity agent was pretty dumb: it took whatever
pings it got and ACKed them.  Then we added an agent check to ensured
that the ping actually came from the source interface that it said it
came from.  Now we add another check such that when a ping is received
on interface X that corresponds to usnic module Y, we ensure that the
source interface of the ping is on the all_endpoints list for module Y
(i.e., module Y expects to be able to talk to that peer interface).

This detects cases where peers have come to different conclusions
about which interfaces should be used to communicate (which is bad!).
This usually reflects a network misconfiguration.

Fixes CSCuq05389.

This commit was SVN r32383.
2014-07-31 22:30:20 +00:00
George Bosilca
97de458cd7 Fix.
This commit was SVN r32382.
2014-07-31 21:54:07 +00:00
Jeff Squyres
a694e46560 tcp btl: remove the btl_tcp_if_seq MCA param
No one was using this functionality, anyway.

This commit was SVN r32381.
2014-07-31 20:16:10 +00:00
George Bosilca
daa076995a orte_rmaps_numa_node_t -> opal_rmaps_numa_node_t
This commit was SVN r32380.
2014-07-31 19:58:47 +00:00
Ralph Castain
c366554048 Cleanup the MCA param registration - the MPI_T interface is allowed to modify the value, so don't read the value until we are ready to use it. Discussed with Nathan, but will ask his review prior to porting it to 1.8.2
Refs trac:4816

This commit was SVN r32379.

The following Trac tickets were found above:
  Ticket 4816 --> https://svn.open-mpi.org/trac/ompi/ticket/4816
2014-07-31 16:30:08 +00:00
Rolf vandeVaart
cd729fe173 Improve error message.
This commit was SVN r32378.
2014-07-31 15:34:38 +00:00
Nathan Hjelm
85b78f838e Fix mpool/udreg compilation
This commit was SVN r32377.
2014-07-31 15:29:20 +00:00
Nathan Hjelm
6b7c4e72d5 Fix another ompi->opal naming issue
This commit was SVN r32376.
2014-07-31 15:01:19 +00:00
Rolf vandeVaart
97da541abf Fix hostname string usage in error messages.
This commit was SVN r32375.
2014-07-31 14:47:51 +00:00
Gilles Gouaillardet
6520f0578a Silence some warnings in btl/scif and btl/openib
This commit was SVN r32371.
2014-07-31 09:55:11 +00:00
Ralph Castain
db89071dc2 Cleanup the moved component's Makefile.am to use the opal instead of ompi directories
This commit was SVN r32370.
2014-07-31 04:41:04 +00:00
Jeff Squyres
959bdace3c usnic: check that connectivity pings came from where they said they came from
Ensure that incoming "ping" messages came from the IP address that
they think they came from.  If they don't, drop them (because it is
probably routing error), which will likely eventually cause the
connectivity checker to timeout, and therefore cause the job to abort.

This commit was SVN r32368.
2014-07-30 21:03:56 +00:00
Jeff Squyres
20349da03b usnic: minor cleanup
This commit was SVN r32367.
2014-07-30 20:56:49 +00:00
Jeff Squyres
f1fb4970a5 usnic: remove all trailing whitespace
Style cleanup only; no code changes.

This commit was SVN r32366.
2014-07-30 20:56:15 +00:00
Jeff Squyres
a6c6aa2815 usnic: remove compatibility with v1.6 series
Update compat.h to only handle compatability between v1.7/v1.8 and
v1.9/2.0 (i.e., the current trunk).  Remove what seems to be the last
vestiages of OMPI/ORTE pollution in the now-OPAL-ized usnic BTL.

Currently use a hard-coded constant for the MCW size (i.e.,
MPI_COMM_WORLD size) for some initialization values in the v1.9/2.0
series; still need to figure out something better there.

This commit was SVN r32365.
2014-07-30 20:55:26 +00:00
Jeff Squyres
aed8b43db2 usnic: remove last vestiges of ORTE process names
These were missed when the BTLs moved to OPAL.

This commit was SVN r32364.
2014-07-30 20:53:41 +00:00
Jeff Squyres
d195b8caf4 usnic: fix a bunch of OMPI_* and ORTE_* constant usage
Somehow a bunch of OMPI_* and ORTE_* constants didn't get renamed in
the BTL move to OPAL.  This commit fixes that.

This commit was SVN r32363.
2014-07-30 20:52:54 +00:00
Jeff Squyres
2447c8479f usnic: do not call ompi_rte_abort()
Adapt to moving down to OPAL: find a PML-registered error callback,
and use that when we don't have a module context and we need to abort.

Failing that, just call exit().

This commit was SVN r32362.
2014-07-30 20:52:06 +00:00
George Bosilca
f39abb9e69 Reverting r32355: a number of processes is not a notion that a low level
communication library should use to initialize itself.

Ralph will champion this change back with an RFC if there is a realistic
need/use case from the community.

This commit was SVN r32361.

The following SVN revision numbers were found above:
  r32355 --> open-mpi/ompi@c903917f47
2014-07-30 20:11:35 +00:00
Nathan Hjelm
a32d93ec20 mca/base: make clang static analyzer happy
cmr=v1.8.3:reviewer=jsquyres

This commit was SVN r32360.
2014-07-30 17:45:28 +00:00
Nathan Hjelm
97fad1dd95 mca/base: ensure component version parameters get deregistered when the
component gets dlclosed

cmr=v1.8.2:reviewer=rhc

This commit was SVN r32359.
2014-07-30 17:45:23 +00:00
Rolf vandeVaart
01c2c1c78c Increase default minimum sm pool size. Discussed in this RFC:
http://www.open-mpi.org/community/lists/devel/2014/07/15274.php

cmr=v1.8.3:reviewer=bosilca

This commit was SVN r32356.
2014-07-30 14:01:08 +00:00
Ralph Castain
c903917f47 Expose the num_procs information to the opal layer as the info is needed in several BTLs
This commit was SVN r32355.
2014-07-30 09:33:41 +00:00
Gilles Gouaillardet
b2dbde2be4 btl/scif : fix compilation
btl/scif could not compile after it was moved down into opal.
this commit fixes the compilation issue *only*

This commit was SVN r32351.
2014-07-30 04:26:50 +00:00
Joshua Ladd
da286d9a84 Fixing the OpenIB receive queue selection logic. Refs trac:4816
This commit was SVN r32350.

The following Trac tickets were found above:
  Ticket 4816 --> https://svn.open-mpi.org/trac/ompi/ticket/4816
2014-07-30 02:17:46 +00:00
Jeff Squyres
10a58992af usnic: add --with-usnic configure switch
If --with-usnic is specified and we can't build the usnic BTL, abort.
If --without-usnic is specified, gracefully skip building the usnic
BTL.  If neither is specified, do the OMPI-default behavior: try to
configure/build the usnic BTL, and if we can't, skip it.
    
Fixes CSCuq13889.

This commit was SVN r32349.
2014-07-30 02:11:04 +00:00
Joshua Ladd
7a694cb95e Reverting r32346. Will do it the correct way and update the patch in Refs trac:32346
This commit was SVN r32348.

The following SVN revision numbers were found above:
  r32346 --> open-mpi/ompi@5be6ff07d5

The following Trac tickets were found above:
  Ticket 32346 --> https://svn.open-mpi.org/trac/ompi/ticket/32346
2014-07-29 23:23:44 +00:00
Joshua Ladd
5be6ff07d5 This fixes the OpenIB BTL receive queue selection logic in the trunk. Custom patch for 1.8.2 is provided in Refs trac:4816
This commit was SVN r32346.

The following Trac tickets were found above:
  Ticket 4816 --> https://svn.open-mpi.org/trac/ompi/ticket/4816
2014-07-29 21:42:20 +00:00
Nathan Hjelm
b8c3b01643 Fix some missed ompi->opal renamings
This commit was SVN r32345.
2014-07-29 18:59:59 +00:00
Nathan Hjelm
b10a29fd96 btl/vader: Fix typo in add_procs introduced by the btl move
This commit was SVN r32342.
2014-07-29 16:07:59 +00:00
Rolf vandeVaart
e39c673dd2 Fix typo.
This commit was SVN r32339.
2014-07-29 13:21:24 +00:00
Ralph Castain
d674e22433 Remove stale include as header no longer exists. Add missing header
This commit was SVN r32336.
2014-07-29 01:24:27 +00:00
Nathan Hjelm
0e47441333 Remove unused files
This commit was SVN r32335.
2014-07-28 22:01:16 +00:00
Nathan Hjelm
dffbcb3803 User opal_process_info.cpuset to determine if the process is bound
This commit was SVN r32334.
2014-07-28 22:00:18 +00:00
Nathan Hjelm
1407c1f501 Remove RML code from common/sm
The only user of this code was coll/sm. I implemented a basic replacement
for the removed code. This gets the trunk compiling again with
--disable-dlopen.

This commit was SVN r32333.
2014-07-28 22:00:12 +00:00
Nathan Hjelm
603ba71b0d Ignore components broken by the BTL move.
common/ofacm is only used by the iboffload code in ompi. This code does
not currently work so it is safe to ignore these components until it is
fixed.

This commit was SVN r32331.
2014-07-28 21:24:18 +00:00
Ryan Grant
caa10a5faf Portals fixes after latest move
This commit was SVN r32330.
2014-07-28 19:25:03 +00:00
George Bosilca
68c4ecbe06 Even less ompi usages down here.
This commit was SVN r32328.
2014-07-26 22:21:08 +00:00
George Bosilca
3d5578d6df No vestiges of the RTE or OMPI left.
This commit was SVN r32327.
2014-07-26 22:19:49 +00:00
George Bosilca
4d3d421b06 Cleanup (not tested).
This commit was SVN r32326.
2014-07-26 22:19:07 +00:00
George Bosilca
f217661ee0 Use opal_process_info whenever possible. Some other minor cleanups.
This commit was SVN r32325.
2014-07-26 21:48:23 +00:00
George Bosilca
a3feb627cf Move some of the ompi_process_info down in OPAL.
This commit was SVN r32324.
2014-07-26 21:43:34 +00:00
George Bosilca
85b89795ff Rename all the win32 files to use the correct layer name.
This commit was SVN r32323.
2014-07-26 20:17:09 +00:00
George Bosilca
de2cb71790 Rename the help file.
This commit was SVN r32322.
2014-07-26 20:16:28 +00:00
Ralph Castain
76d82b885f Correctly dereference the thread object
This commit was SVN r32321.
2014-07-26 17:01:27 +00:00
Ralph Castain
fbed15072c Add missing rename of file
This commit was SVN r32318.
2014-07-26 01:21:07 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose.  UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00
Ralph Castain
408474d2d8 Remove some CRLFs so git quits changing them
This commit was SVN r32278.
2014-07-22 16:56:24 +00:00
Ralph Castain
c1bb5b68d0 It is possible to have a "standard" progress thread, so simplify the usage of the opal_progress_thread code.
This commit was SVN r32277.
2014-07-22 16:55:23 +00:00
George Bosilca
0b72cdabfd This is basically a revert of r32236. What looked as a good patch
turned out to be a nightmare, as the pointers to the classes are
located in shared libraries memory areas, and are not accesible
after the shared library is unloaded. Thus, OPAL cannot cleanup
the left-over classes from the other shared libraries.

This commit was SVN r32248.

The following SVN revision numbers were found above:
  r32236 --> open-mpi/ompi@59017433e1
2014-07-15 18:21:01 +00:00
George Bosilca
7c21491858 Fix the indentation. Allow for deregistration of OPAL params.
This commit was SVN r32242.
2014-07-15 05:20:26 +00:00
George Bosilca
2f6bc76dc1 Be symmetric to the opal_init.
This commit was SVN r32241.
2014-07-15 05:05:26 +00:00
George Bosilca
ca76fb69df Start with a sane value.
This commit was SVN r32240.
2014-07-15 05:04:55 +00:00
George Bosilca
77c74e8872 Don't iniaitlize twice the event framework (it is already initialized
the init_tool).

This commit was SVN r32239.
2014-07-15 05:04:29 +00:00
George Bosilca
ed3d98a76d Up to strlen and not to sizeof. This is guaranteed to work as
in the worst case we just forced a \0 at the end of the string.

This commit was SVN r32238.
2014-07-15 05:03:06 +00:00
George Bosilca
a648fcdeb0 Upon close reset the search_dirs.
This commit was SVN r32237.
2014-07-15 05:02:19 +00:00
George Bosilca
59017433e1 Allow for class allocation/deallocation.
This commit was SVN r32236.
2014-07-15 05:01:36 +00:00
Mike Dubman
0db23b0210 BUILD: support new automake
new automake requires subdirs-object directive, to resolve this:

09:43:37 automake: warning: possible forward-incompatibility.
09:43:37 automake: At least a source file is in a subdirectory, but the 'subdir-objects'
09:43:37 automake: automake option hasn't been enabled.  For now, the corresponding output
09:43:37 automake: object file(s) will be placed in the top-level directory.  However,
09:43:37 automake: this behaviour will change in future Automake versions: they will
09:43:37 automake: unconditionally cause object files to be placed in the same subdirectory
09:43:37 automake: of the corresponding sources.
09:43:37 automake: You are advised to start using 'subdir-objects' option throughout your
09:43:37 automake: project, to avoid future incompatibilities.
09:43:37 tools/otfmerge/Makefile.common:13: warning: source file '$(OTFMERGESRCDIR)/otfmerge.c' is in a subdirectory,
09:43:37 tools/otfmerge/Makefile.common:13: but option 'subdir-objects' is disabled

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32225.
2014-07-12 12:38:15 +00:00
Mike Dubman
e342a11c2e opal envlist mca: implement Jeff`s quibbles
fixed by Elena, reviewed by Miked

This commit was SVN r32216.
2014-07-11 07:23:20 +00:00
Ralph Castain
60da1456d9 Silence unused var warning
This commit was SVN r32187.
2014-07-09 22:37:22 +00:00
Ralph Castain
354a1d07a6 Silence warning about set-but-unused var
This commit was SVN r32186.
2014-07-09 22:37:05 +00:00
Mike Dubman
6efc9a7329 opal mca: externalize delimiter for base_env_list mca parameters
fixed by Elena, reviewed by MikeD

This commit was SVN r32182.
2014-07-09 18:55:49 +00:00
Nathan Hjelm
8ab9b628b5 Add missing file to tarball
This commit was SVN r32181.
2014-07-09 18:01:05 +00:00
Joshua Ladd
801e2cb544 Fix error and warning messages after reverting
the mca_base_env_list to being semicolon delimited.

This commit was SVN r32179.
2014-07-09 14:46:19 +00:00
Mike Dubman
32aeba4bdf opal: revert env_list separator to ";" from "+"
This commit was SVN r32172.
2014-07-09 05:38:51 +00:00
Joshua Ladd
057370364d Opal: Add a new MCA variable type "version_string". Also add a
new flag to ompi_info that allows a user to print all MCA variables of a specific type.  

 --type version_string

This command will print all MCA variables of type version_string.

This feature was developed by Elena Shipunova and was reviewed by Josh Ladd.

This commit was SVN r32166.
2014-07-09 01:37:23 +00:00
Joshua Ladd
30da6d3a17 Opal: add a new MCA parameter that allows the user to specify a list of environment variables. This parameter will become the standard mechanism by which environment variables are set for OMPI applications replacing the -x option.
mpirun ... -x env_foo1=val1 -x env_foo2 -x env_foo3=val3  should now be expressed as

mpirun ... -mca mca_base_env_list env_foo1=val1+env_foo2+env_foo3=val3. 

The motivation for doing this is so that a list of environment variables may be set via standard MCA mechanisms such as mca parameter files, amca lists, etc. 

This feature was developed by Elena Shipunova and was reviewed by Josh Ladd.

This commit was SVN r32163.
2014-07-09 00:38:25 +00:00
Jeff Squyres
86f747c627 mca_base_var: use ERR_NOT_FOUND when var is inactive
After discussion with Nathan, change the ERR_VALUE_OUT_OF_BOUNDS to be
ERR_NOT_FOUND, for two reasons:

1. It's consistent with other uses of ERR_NOT_FOUND in the code.
1. In this case, we could have just looked up a variable that is
   basically a "hole" -- e.g., var indexes 0, 1, 2 are valid, and 4,
   5, and 5 are valid, but 3 is invalid (e.g., 3 was de-registered --
   remember that MPI_T explicitly does not allow re-using indexes).
   So returning ERR_OUT_OF_BOUNDS seems weird -- returning
   ERR_NOT_FOUND seems a bit more natural.

cmr=v1.8.2:ticket=trac:4587

This commit was SVN r32158.

The following Trac tickets were found above:
  Ticket 4587 --> https://svn.open-mpi.org/trac/ompi/ticket/4587
2014-07-08 20:00:18 +00:00
Ralph Castain
1d5f8297c8 In case someone wants to use the pstat framework, we need to have the usual open/close functions that were lost in the conversion to the new framework system
This commit was SVN r32150.
2014-07-07 13:53:29 +00:00
Ralph Castain
b85471ac13 Add support for "double" to the DSS
This commit was SVN r32149.
2014-07-07 13:51:58 +00:00
Ralph Castain
832fa4a028 Ensure that the progress thread tracker properly cleans up the blocking event, if set. Also, use the blocking event to help wake up the progress thread for quick shutdown as some threads can be blocked in a long-running call to select.
This commit was SVN r32141.
2014-07-04 14:55:51 +00:00
Ralph Castain
f6d4b4c11b As discussed at the OMPI developer's meeting, add functions to start, stop, and restart libevent-driven progress threads. Critical NOTE: if you don't have a file descriptor event defined for your progress thread, it will spin hard! Accordingly, the "start progress thread" function has a boolean parameter you can use to request that the function automatically create one for you.
This commit was SVN r32137.
2014-07-03 18:56:46 +00:00
Nathan Hjelm
563eaf0726 Fix support for Cray alps
The alps ras and plm components were broken by recent changes in ORTE. This
commit resolves those issues.

Changes:

 - Define PMI2_SUCCESS if it isn't defined. This fixes a problem with Cray's
   PMI implementation which does not define (for some reason) PMI2_SUCCESS. We
   had previously just used PMI_SUCCESS.

 - Add missing definition and a typo in pml_alps_module.

 - launch_id is no longer available in the orte_node_t structure. Use the
   attribute lookup to get the value.

 - Do not use an O(n^2) sorting algorithm when putting alps nodes in order. Use
   opal_list_sort instead (O(nlogn)).

This commit was SVN r32076.
2014-06-24 21:29:04 +00:00
Ralph Castain
20535bca19 Reorder the var release so a debugger can still see the var name that caused a segfault, thus helping to identify the var in question
cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32068.
2014-06-24 13:51:31 +00:00
Ralph Castain
2aade28259 Protect against NULL return from malloc
This commit was SVN r32056.
2014-06-19 20:57:56 +00:00
Ralph Castain
1a53d541ab Cleanup memory leak
cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r32051.
2014-06-19 18:56:57 +00:00
Ralph Castain
b5a2ceaa7c Minor cleanup to node_stat packing routine
This commit was SVN r32049.
2014-06-19 18:46:27 +00:00
Ralph Castain
3f04d50cb0 Per the ticket, resolve our handling of overload conditions to provide a more consistent response. If we are overloaded (i.e., attempting to bind more processes to a location than the number of cpus under that location), then we consider the following conditions:
(a) default binding policy is in effect. In this case, we will emit a
warning and default to not binding unless the user provided the
"oversubscribe" or "overload" modifier to the "bind-to" option.

(b) user-specified binding policy is in effect. In this case, we will
error out unless the user provided the "oversubscribe" or "overload"
modifier to the "bind-to" option as we cannot meet the directive.

Either "bind-to" modifier (oversubscribe or overload) will be accepted for
now - in 1.9, we will deprecate the "overload" term in favor of
"oversubscribe".

Also added the ability to accept a --bind-to modifier without specifying the binding policy itself so a user can specify overload-allowed with the default policy.

Closes trac:4345

cmr=v1.8.2:reviewer=rhc:subject=resolve handling of overload conditions

This commit was SVN r32005.

The following Trac tickets were found above:
  Ticket 4345 --> https://svn.open-mpi.org/trac/ompi/ticket/4345
2014-06-14 15:38:32 +00:00
Ralph Castain
811f3d0665 Remove unused vars and actually test the indexed array member in the bitmap class
This commit was SVN r32004.
2014-06-14 03:18:24 +00:00
George Bosilca
13252206ad Correctly handle the case where we restrict the size to INT_MAX.
This commit was SVN r32003.
2014-06-13 21:32:46 +00:00
George Bosilca
fbe69808f2 A faster implementation of the OPAL_BITMAP. The corresponding
test has also been updated.

This commit was SVN r32001.
2014-06-13 21:15:35 +00:00
Ralph Castain
06dbfa3098 Make the cpus-per-proc equivalent a little more intuitive:
* allow users to specify just a modifier for map-by instead of requiring that they also specify a policy. Thus, we now accept --map-by :pe=3 as indicating that we should use the default mapping policy, but bind 3 cpus/proc.

* if users specify a pe's/proc but no policy, default to --map-by NUMA to ensure we have access to multiple cpus for the request. This won't guarantee we have access to enough to meet the request, but gives us a chance. In addition, we know that binding a proc to multiple cpus will work best if those cpus are all in the same NUMA, so this provides some degree of optimized behavior.

Per a request from Jeff, define "oversubscribe" for binding as a synonym for the "overload" modifier.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31967.
2014-06-08 20:26:59 +00:00
Ralph Castain
f197af530c Add print support for more of the opal_value_t values
This commit was SVN r31961.
2014-06-06 17:28:00 +00:00
Ralph Castain
ae5eec3e2f Minor tweaks to the pack/unpack routines
This commit was SVN r31954.
2014-06-05 14:49:37 +00:00
Ralph Castain
8736a1c138 Per RFC:
http://www.open-mpi.org/community/lists/devel/2014/05/14822.php

Revamp the ORTE global data structures to reduce memory footprint and add new features. Add ability to control/set cpu frequency, though this can only be done if the sys admin has setup the system to support it (or you run as root).

This commit was SVN r31916.
2014-06-01 16:14:10 +00:00
Ralph Castain
304fe06e1b Don't spew warnings if PMI_FD isn't present - it isn't an error.
This commit was SVN r31911.
2014-06-01 13:40:12 +00:00
Mike Dubman
e85e0e926d PMI: hotfix for PMI1 support
Based on thread: http://www.open-mpi.org/community/lists/devel/2014/06/index.php

This commit was SVN r31909.
2014-06-01 06:43:05 +00:00
Ralph Castain
1107f9099e Per the RFC issued here:
http://www.open-mpi.org/community/lists/devel/2014/05/14827.php

Refactor PMI support

This commit was SVN r31907.
2014-06-01 04:28:17 +00:00
George Bosilca
6290f6cc58 Correctly deal with partially converted datatypes both
during the unpack and during the positioning.

Fixes trac:4610.

This commit was SVN r31904.

The following Trac tickets were found above:
  Ticket 4610 --> https://svn.open-mpi.org/trac/ompi/ticket/4610
2014-05-29 21:53:44 +00:00
Ralph Castain
7480a680e3 Turns out that valgrind has issues with strings that don't fall on nice 4-byte alignments with some optimized builds. So make the default security credential a nice 7-characters (so we get 8-bytes with the NULL terminator) to make valgrind shut up.
Thanks to Walter Spector for reporting this.

cmr=v1.8.2:reviewer=jsquyres:subject=align the default sec credential

This commit was SVN r31873.
2014-05-22 00:36:25 +00:00
Jeff Squyres
e532ab3020 atomic_impl.h: fix trivial typos in comments
cmr=v1.8.2:ticket=trac:4664

This commit was SVN r31858.

The following Trac tickets were found above:
  Ticket 4664 --> https://svn.open-mpi.org/trac/ompi/ticket/4664
2014-05-21 17:18:18 +00:00
George Bosilca
7ca7b54718 Removes few unreacheable warnings. This is somehow related to the
ticket #2645.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31845.
2014-05-21 00:06:04 +00:00
Nathan Hjelm
1d1cef76df opal: fix leaks
Two leaks are fixed by this commit:

 - opal_dss.lookup_data_type returns an allocated string. Free it.

 - opal_ifaddrtokindex was leaking a struct addrinfo. Ensure that is
   released before returning.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31777.
2014-05-15 15:59:41 +00:00
Ralph Castain
7b4afe7efe Revert r31765 as it causes segfaults in all orte tools. We aren't getting all the events off of the event base in those tools prior to opal calling event_base_free, and that means there are stale memory references that libevent is trying to release.
Either we need to ensure that all events are deleted (which may prove difficult), or find another way to release the base memory.

This commit was SVN r31775.

The following SVN revision numbers were found above:
  r31765 --> open-mpi/ompi@75fb6dbba9
2014-05-15 14:02:16 +00:00
Nathan Hjelm
0ef6baffd3 Fix bug in r31764
Need to remove the items of the list to avoid an assert in debug builds.

cmr=v1.8.2:ticket=trac:4628

This commit was SVN r31769.

The following SVN revision numbers were found above:
  r31764 --> open-mpi/ompi@13fd6ae774

The following Trac tickets were found above:
  Ticket 4628 --> https://svn.open-mpi.org/trac/ompi/ticket/4628
2014-05-14 23:45:50 +00:00
Nathan Hjelm
75fb6dbba9 opal/event: release the opal event context when closing the event base
This fixes a series of leaks identified by valgrind. Since
opal_event_base_open was creating the event base it follows that
opal_event_base_close should release the event base.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31765.
2014-05-14 21:15:27 +00:00
Nathan Hjelm
13fd6ae774 opal_free_list: destruct free lists items when destructing the free
list

This commit updates the behavior of opal_free_list_t to match the
behavior of ompi_free_list_t. opal_free_list_t constructed items
placed on the free list but never destructed them.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31764.
2014-05-14 21:15:19 +00:00
Rolf vandeVaart
71998ad67b Change one argument to use const. Matches better how function is called.
This commit was SVN r31741.
2014-05-13 18:55:41 +00:00
Ralph Castain
5602156a1c Use the correct abstraction layer name for the data dirs
This commit was SVN r31684.
2014-05-08 14:32:24 +00:00
Jeff Squyres
81afb4e18a hwloc: commit minor bug fix from hwloc git
Bring down 3aa0ed6 from the hwloc v1.7 branch: Stevens says we should
GETFD before we SETFD, so we do

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31683.
2014-05-08 14:29:10 +00:00
Ralph Castain
11faab1091 The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees.
This commit was SVN r31679.
2014-05-08 02:01:35 +00:00
Ralph Castain
a8e2d6c3a6 The bulk of the remaining renaming changes, in one final glorious "blob". Thanks to Jeff for some help chasing down a few spots. Per chat with Jeff, we decided to cleanup a few things that were historical in nature:
top_ompi_srcdir  ->  OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR

We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.

Only thing left is ompilibdir being treated similar to what we dif for srcdir/builddir. Coming soon.

This commit was SVN r31678.
2014-05-07 21:48:53 +00:00
Ralph Castain
74983c9002 Continue the renaming, fix ompi_show_subsubtitle
This commit was SVN r31674.
2014-05-07 15:45:47 +00:00
Ralph Castain
f4c31cae9b Per RFC, another round in the renaming game - nearly complete
This commit was SVN r31668.
2014-05-07 03:01:47 +00:00
Ralph Castain
a54dbb17d2 Per RFC, continue renaming project
This commit was SVN r31667.
2014-05-07 01:00:06 +00:00
Ralph Castain
2b7a3ae601 Per RFC, continue pecking away at the build system renaming
OMPI_CONFIG_SUBDIR  -> OPAL_CONFIG_SUBDIR
   OMPI_CONFIG_SUBDIR_ARGS  ->  OPAL_CONFIG_SUBDIR_ARGS

This commit was SVN r31647.
2014-05-06 16:27:38 +00:00
Ralph Castain
390d29733f Per RFC, continue renaming of build tools:
ompi_c_vendor  ->  opal_c_vendor
ompi_cv_c_compiler_vendor  ->  opal_cv_c_compiler_vendor

This commit was SVN r31646.
2014-05-06 15:01:34 +00:00
Ralph Castain
a1ae20fddb Per RFC: OMPI_CFLAGS_BEFORE_PICKY -> OPAL_CFLAGS_BEFORE_PICKY
- This line, and those below, will be ignored--

M    opal/mca/event/libevent2021/configure.m4
M    opal/mca/hwloc/hwloc172/configure.m4
M    configure.ac
M    config/opal_setup_libltdl.m4
M    config/opal_check_visibility.m4
M    config/opal_setup_cc.m4

This commit was SVN r31637.
2014-05-05 22:22:33 +00:00
George Bosilca
8a8218349a Datatype with count of zero do not generate traffic in the type map, nor
have any impact on the type signature.

Fixes trac:4597.
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31635.

The following Trac tickets were found above:
  Ticket 4597 --> https://svn.open-mpi.org/trac/ompi/ticket/4597
2014-05-05 21:49:18 +00:00
Ralph Castain
4def94900a Per RFC: OMPI_INSTALL_BINARIES -> OPAL_INSTALL_BINARIES
This commit was SVN r31634.
2014-05-05 21:43:05 +00:00
Ralph Castain
f9d892b7a4 As Nathan pointed out, C99 reserves all _foo identifiers, so rename _WORD_MASK as OPAL_CRC_WORD_MASK
This commit was SVN r31615.
2014-05-02 17:21:28 +00:00
Ralph Castain
87b6c1cf99 Add some protection to ensure that dstore doesn't segfault if uninitialized
This commit was SVN r31606.
2014-05-02 16:39:45 +00:00
Ralph Castain
e20dae536c Last step under current RFC: OMPI_CHECK_WITHDIR -> OPAL_CHECK_WITHDIR
This commit was SVN r31585.
2014-05-01 15:38:07 +00:00
Ralph Castain
e11eb15518 Next step of RFC: OMPI_CHECK_FUNC_LIB -> OPAL_CHECK_FUNC_LIB
This commit was SVN r31583.
2014-05-01 14:57:43 +00:00
Ralph Castain
3b64c603b4 First stage of RFC to rename OMPI_foo build system support: change OMPI_CHECK_PACKAGE -> OPAL_CHECK_PACKAGE
This commit was SVN r31582.
2014-05-01 14:24:56 +00:00
Nathan Hjelm
a28012b29d Fix MPI_T issues identified by friendly users.
Several fixes:

 - I was allowing an MPI_T_cvar_handle to be created for an invalid
   variable. Fixed this by checking if the variable is valid in
   mca_base_var_get.

 - Use a better error code when the caller tries to create an unbound
   pvar handle for a bound variable.

 - Return the verbosity level in MPI_T_cvar_get_info.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31576.
2014-04-30 22:10:30 +00:00
Ralph Castain
087b84b0ef Add some further debug to the dstore framework. When doing comm_spawn, we have to exchange any provided cpu bitmaps to ensure both sides compute the same locality, else various mpi frameworks can go bonkers.
This commit was SVN r31572.
2014-04-30 19:29:00 +00:00
Ralph Castain
b3f636d169 Correct typo in header definition
This commit was SVN r31558.
2014-04-29 23:41:49 +00:00
Ralph Castain
c4c9bc1573 As per the RFC:
http://www.open-mpi.org/community/lists/devel/2014/04/14496.php

Revamp the opal database framework, including renaming it to "dstore" to reflect that it isn't a "database". Move the "db" framework to ORTE for now, soon to move to ORCM

This commit was SVN r31557.
2014-04-29 21:49:23 +00:00
Jeff Squyres
790cdb5cc7 Sigh. It helps when you commit the right version of the finished
code.

This commit fixes minor errors in the incorrectly-committed r31513
(new fd close-on-exec convenience function).

Refs trac:4550

This commit was SVN r31514.

The following SVN revision numbers were found above:
  r31513 --> open-mpi/ompi@e1655ae68d

The following Trac tickets were found above:
  Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
2014-04-24 13:20:32 +00:00
Jeff Squyres
e1655ae68d opal/util/fd.c: add new convenience function for setting FD_CLOEXEC
Paul Hargrove pointed out that Stevens tells us that we should
FD_GETFL before FD_SETFL.  And so we shall.

Make a new convenience function to do this (opal_fd_set_cloexec()),
just so that we don't have to litter this 2-step process throughout
the code.

Refs trac:4550

This commit was SVN r31513.

The following Trac tickets were found above:
  Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
2014-04-24 13:04:49 +00:00
Jeff Squyres
67bb0c261a hwloc: ensure that an internal fd is marked as close-on-exec
Make sure that an internal, long-lived hwloc fd is marked as
close-on-exec so that children don't inherit it.  This patch is
committed upstream in the hwloc master and v1.9 branches as 7489287
and b654e19, respectively.  The patch applied here is the exact same
logic, but the surrounding code changed slightly since the hwloc v1.7
series, so the patch doesn't apply cleanly.

Refs trac:4550

This commit was SVN r31511.

The following Trac tickets were found above:
  Ticket 4550 --> https://svn.open-mpi.org/trac/ompi/ticket/4550
2014-04-23 21:36:38 +00:00
Mike Dubman
a4990de055 mca: track external lib version (runtime/compiletime) for mca component
based on thread: http://www.open-mpi.org/community/lists/devel/2014/04/14505.php

Create mca parameter to track runtime/compiletime ext lib version for component.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31487.
2014-04-22 18:02:26 +00:00
Ryan Grant
6d577a2663 Update to atomics selection, fix octal issue and rename defines
This commit was SVN r31396.
2014-04-15 15:40:30 +00:00
Ryan Grant
e67ca81dca Fixing atomics selection issue, to be CMR'd after it passes the nightly tests
This commit was SVN r31393.
2014-04-15 13:17:04 +00:00
George Bosilca
1cddb2aae8 Correctly save the displacement for the case where the convertor is not
completed. As we need to have the right displacement at the beginning
of the next call, we should save the position relative to the beginning
of the buffer and not to the last loop.

This commit was SVN r31387.
2014-04-14 04:46:58 +00:00
George Bosilca
373f97cdae Remove a non-necessary label.
This commit was SVN r31380.
2014-04-14 00:20:30 +00:00
George Bosilca
7853f82567 Reshape all the packing/unpacking functions to use the same skeleton. Rewrite the
generic_unpacking to take advantage of the same capabilitites.

This commit was SVN r31370.
2014-04-11 20:06:56 +00:00
Ralph Castain
7c4fa3446c Per the telecon, revert r31302 for now pending an RFC review on the idea of setting app proc envar's using an MCA param
This commit was SVN r31345.

The following SVN revision numbers were found above:
  r31302 --> open-mpi/ompi@6a1b78e26b
2014-04-08 15:47:12 +00:00
Jeff Squyres
ddb44eb81f compress/crs: Remove empty help files.
Aside from comments, these help files are empty -- so just remove
them.  Also remove a few inclusions of show_help.h, since it isn't
used in those source files.

This commit was SVN r31328.
2014-04-07 15:43:40 +00:00
Jeff Squyres
53090bb5f7 mca base: Remove unused help file.
The contents of this file were superceded by mca-help-var.txt.
Indeed, this file was already removed from Makefile.am; it looks like
it was accidentally left in SVN.

This commit was SVN r31327.
2014-04-07 15:42:46 +00:00
Jeff Squyres
54c1a1d9b8 mca base: Remove unused help message.
This commit was SVN r31326.
2014-04-07 15:42:07 +00:00
Jeff Squyres
24e222f49d opal: Remove unused help message.
Help messages about deprecated variables are now provided by the MCA
var system.

This commit was SVN r31325.
2014-04-07 15:41:43 +00:00
Jeff Squyres
a4940a2a37 crs self: Remove unused help message.
Also:

* fix formatting error at top (make the emacs format line begin
  with a comment character)
* Remove blank lines between topics and replace them with empty
  comment lines (so that the help messages are not displayed with a
  trailing blank line)

This commit was SVN r31324.
2014-04-07 15:41:24 +00:00
Jeff Squyres
4eee2eff88 crs: Remove empty help text file
This commit was SVN r31323.
2014-04-07 15:40:50 +00:00
Jeff Squyres
82e104719a hwloc/rmaps base: Add missing help message.
Also, add missing ORTE_ERROR_LOG in the other case where this error
message is used (i.e., ORTE_ERROR_LOG was used in the one place, so
let's also use it in the other place).

This commit was SVN r31321.
2014-04-07 15:39:54 +00:00
Jeff Squyres
37d4c22912 hwloc base: Remove unused help messages.
These type of help messages are now displayed by the MCA var system
itself (via enumerated values).

This commit was SVN r31320.
2014-04-07 15:39:29 +00:00
Jeff Squyres
d47f88fb88 sec base: show_help isn't used in the sec base
This commit was SVN r31319.
2014-04-07 15:38:58 +00:00
Jeff Squyres
dfc447d260 memory linux: Remove unused help message.
This type of help message is now handled by the MCA var system itself
(via the MCA_BASE_VAR_FLAG_ENVIRONMENT_ONLY flag at var_register
time).

This commit was SVN r31318.
2014-04-07 15:38:25 +00:00
Jeff Squyres
4261d0bb9c memory linux: This is a level 3 MCA param
I.e., make it just like all the other MCA params in this component.

cmr=v1.8.1:reviewer=hjelmn

This commit was SVN r31317.
2014-04-07 15:38:00 +00:00
Jeff Squyres
94a5ae16b2 memory linux: show_help.h is not needed in this file
This commit was SVN r31316.
2014-04-07 15:37:16 +00:00
Jeff Squyres
0010601eb6 db sqlite: Add missing help message
This commit was SVN r31315.
2014-04-07 15:36:52 +00:00
Jeff Squyres
08beceba91 db base: Remove now-unused helpfile.
The only help message in this file wasn't being used, so remove the
whole file.

This commit was SVN r31314.
2014-04-07 15:36:29 +00:00
Jeff Squyres
d6f0036bad opal cr: Remove some unused help messages
This commit was SVN r31313.
2014-04-07 15:35:55 +00:00
Jeff Squyres
d763b66ded opal wrappers: Remove unused help message; add missing help message
This commit was SVN r31312.
2014-04-07 15:35:31 +00:00
Mike Dubman
6a1b78e26b opal: add mca param to control ranks env variables
add -mca base_env_list "var1=val1 var2=val2 ..." mca parameter that can be used in mca param files
or with -am app.conf mpirun commandline to set rank env variables with mca mechanism

fixed by Elena, reviewed by Miked

cmr=v1.8.1:reviewer=ompi-rm1.8

This commit was SVN r31302.
2014-04-01 21:14:31 +00:00
Jeff Squyres
173c046617 build: add Automake-like silent/verbose macros for "ln -s ..." operations
Also, since I put some of the macros for these silent/verbose rules up
in the top-level Makefile.man-page-rules file, I renamed it to
Makefile.ompi-rules.

I've had this sitting around for a while; now seems like as good a
time as any to commit it.

This commit was SVN r31271.
2014-03-28 18:24:32 +00:00
Nathan Hjelm
a4fff57720 make the opal progress yield variable settable at any time
The semantics of the variable mpi_yield_when_idle are to call
opal_progress_set_yield_when_idle at MPI_Init. It would be difficult
to modify the old variable to support setting this parameter at
runtime. The fix is to add an additional parameter to opal:
opal_progress_yield_when_idle that directly sets the variable. This
variable is settable anytime and does not affect the semantics of
the old mpi_yield_when_idle variable.

Refs trac:193

cmr=v1.8.1:reviewer=jsquyres

This commit was SVN r31255.

The following Trac tickets were found above:
  Ticket 193 --> https://svn.open-mpi.org/trac/ompi/ticket/193
2014-03-27 15:51:06 +00:00
Mike Dubman
3e81ee9f0d BUILD: fix "make dist" failure on some linux distro with old csh/tcsh
on some linux distro (sles11sp2) csh fails to parse $LS_COLORS and borks with error:
Unknown colorls variable `mh'.

The workaround is to unset LS_COLORS before calling to csh script

reviewed by Jeff

cmr=v1.8:reviewer=ompi-rm1.8

This commit was SVN r31244.
2014-03-27 06:34:00 +00:00
Adrian Reber
62dea2be84 opal-restart: fix variable passing from opal-restart to CRS
opal-restart.c disables crs module selection by setting
crs_base_do_not_select to true using opal_setenv() before
opal_init(). After opal_init() it enables module selection
by changing the variable back to false using opal_setenv().
This does not work anymore as the variables are only read
from the environment during variable registration.
This changes the second opal_setenv() to mca_base_var_set_value()

This commit was SVN r31108.
2014-03-18 15:28:42 +00:00
Ralph Castain
9a8d2d9989 Add a missing "break" statement that otherwise causes the fetch of any string object to return an error
cmr=v1.7.5:reviewer=hjelmn

This commit was SVN r31101.
2014-03-18 01:14:34 +00:00
Nathan Hjelm
7ec19358df MCA/base: document that is is valid for the string_value parameter to
an enumerator's mca_base_var_enum_sfv_fn_t can be NULL.

cmr=v1.7.5:ticket=trac:4398:reviewer=ompi-gk1.7

This commit was SVN r31085.

The following Trac tickets were found above:
  Ticket 4398 --> https://svn.open-mpi.org/trac/ompi/ticket/4398
2014-03-17 18:52:54 +00:00
Nathan Hjelm
b9dfe84b05 Fix segmentation fault in handling of boolean variables in mca_base_var_set_value.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31082.
2014-03-17 14:58:30 +00:00
Ralph Castain
ac421c931d The random number generator changes were incomplete (typo errors) in some places, and is missing the required declspec's for visibility.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31053.
2014-03-12 22:37:27 +00:00
Jeff Squyres
da87b506bd Remove warnings identified by clang 3.4
* Remove unused static functions
 * Remove unused static variables

cmr=v1.8:reviewer=hjelmn

This commit was SVN r31023.
2014-03-12 13:17:54 +00:00
Ralph Castain
9fca25a8dd Catch one more place where we need to use the actual topology instead of opal_hwloc_topology. Thanks to Tetsuya Mishima for the patch
Reviewed okay. RM-approved

cmr=v1.7.5:reviewer=ompi-gk1.7

This commit was SVN r31016.
2014-03-12 00:49:54 +00:00
Ralph Castain
96f67507a5 Improve/expand the database configure logic for future use
This commit was SVN r30984.
2014-03-11 03:01:15 +00:00
Ralph Castain
c6906e8b8c Use correct unpack function
This commit was SVN r30983.
2014-03-11 02:59:34 +00:00
Ralph Castain
081669b440 When pretty-printing binding info, we need to pass the topology down to the routine as the mapper isn't always working with the local topology - otherwise, we get an erroneous help message. Thanks to Tetsuya Mishima for reporting it
cmr=v1.7.5:reviewer=rhc:subject=fix pretty-print of bindings

This commit was SVN r30968.
2014-03-10 15:53:07 +00:00
Adrian Reber
4ca07ae125 re-introduce distill_checkpoint_ready
In the OPAL_ENABLE_FT_CR code path there used to be a variable
'mca_base_component_distill_checkpoint_ready' which got removed.
The FT code was not compiling and while trying to get it to compile
again the old variable was #ifdef'd out. This re-introduces the
variable with a new name 'opal_base_distill_checkpoint_ready'
and enables the code previously #ifdef'd out.

This removes the last hack introduced to get the FT code to compile
again.

This commit was SVN r30928.
2014-03-04 16:14:46 +00:00
Jeff Squyres
3f845edfdd * Prefix the preprocessor macro used to protect the file
* Include opal_stdint.h so that we have uin32_t

cmr=v1.7.5:ticket=trac:4298

This commit was SVN r30890.

The following Trac tickets were found above:
  Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298
2014-02-28 16:56:38 +00:00
Ralph Castain
fea8a52983 Cleanup trailing spaces and use of tab instead of spaces
Refs trac:4298

This commit was SVN r30827.

The following Trac tickets were found above:
  Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298
2014-02-25 23:41:55 +00:00
Joshua Ladd
9ea9bec4ad Addressing Jeff's comments:
1. Changed rng_buff_t --> opal_rng_buff_t
2. All global variables obey the prefix rule
3. Old code has been removed 
4. Found a couple of unnecessary includes

Refs trac:4298

This commit was SVN r30807.

The following Trac tickets were found above:
  Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298
2014-02-24 23:18:35 +00:00
Joshua Ladd
e39d9f4080 Per the RFC schedule, add an additive lagged Fibonacci parallel random number generator to OPAL. In order to use, please add the following header to your code: opal/util/alfg.h. See ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c for an example how to seed with opal_srand and invoke the generator with opal_rand. This should be added to
cmr=v1.7.5:reviewer=rhc:subject=Add an OPAL RNG

This commit was SVN r30801.
2014-02-23 21:41:38 +00:00
Adrian Reber
d7734ac6d8 CRS/CRIU: add code to actually checkpoint a process
This adds the code to actually checkpoint a process using CRIU
with the necessary variables to control the behaviour.

Right now only --np 1 is supported and --mca oob tcp.

Following parameters are supported:

* crs_criu_log: name of the log file
* crs_criu_log_level: verbosity level in the log file
* crs_criu_tcp_established: C/R established TCP connections
* crs_criu_shell_job: C/R shell jobs
* crs_criu_ext_unix_sk: allow external unix connections
* crs_criu_leave_running: leave tasks in running state after checkpoint

This commit was SVN r30772.
2014-02-19 13:30:12 +00:00
Ralph Castain
262c927778 Define a new key and store the process name of the local_rank=0 process on each node so that the MPI layer can retrieve it as desired.
This commit was SVN r30759.
2014-02-18 00:32:58 +00:00
Ralph Castain
d246d190ed Fix typo - thanks to Andreas Schwab for the patch
RM-approved

cmr:v1.7.5:reviewer=ompi-gk1.7

This commit was SVN r30751.
2014-02-17 19:36:16 +00:00
Jeff Squyres
0243353feb There's no reason for these files to be executable; remove the
svn:executable property.

This commit was SVN r30750.
2014-02-17 19:15:58 +00:00
Ralph Castain
c3df744a3b Shift the orte_db_localrank key to the opal level. Add the job and proc-level session directory names to the database using opal_db keys.
This commit was SVN r30746.
2014-02-17 01:40:56 +00:00
Adrian Reber
14ba81d166 Simplification to the CRIU configure.m4 script:
* Remove redundant/unnecessary uses of $2
 * Change a bunch of logic from negative to positive
 * Use OPAL_VAR_SCOPE_PUSH/POP to help reduce env var usage
 * Only use "" in test statements with strings that require sanitization
 * Removed redundant AC_MSG_WARN/ERROR.  There's now only one check at
   the bottom for whether the component is "good" or not.  We'll
   AC_MSG_WARN/ERROR in that one location.

Thanks to Jeff Squyres for this patch.

This commit was SVN r30739.
2014-02-15 21:22:30 +00:00
Ralph Castain
452f73de3d Update the keystone sec module - will use curl to connect to server
This commit was SVN r30704.
2014-02-12 22:06:44 +00:00
Ralph Castain
d0e8aeaee4 Add the time_t datatype to the DSS
This commit was SVN r30700.
2014-02-12 19:37:21 +00:00
Adrian Reber
56c23f22be CRS/SELF: fix compiler warning: variable 'callback_matched' set but not used
This commit was SVN r30667.
2014-02-11 14:45:29 +00:00
Adrian Reber
42d7b014e4 CRS/CRIU: added CRIU as new CRS component
To be able to checkpoint/restart using criu (criu.org) a new
CRS component is added which is based on criu. This first commit
provides the minimal set of functions and configure script options
to enable --with-criu and link against libcriu.so.
No actual checkpoint/restart functionality is yet implemented.
This is only the framework which needs to be filled with the
actual functionality.

This commit was SVN r30666.
2014-02-11 14:43:38 +00:00
Ralph Castain
86e8a147c6 Resolve uninitialized variables on some systems. Thanks to Paul Hargrove for finding the problem and suggesting the patch.
cmr=v1.7.5:reviewer=ompi-gk1.7

This commit was SVN r30656.
2014-02-10 21:17:34 +00:00
Ralph Castain
4e32a82638 If we are binding to hwthreads, then we need to treat hwthreads as cpus to get the mapping right
cmr=v1.7.5:reviewer=jsquyres:subject=set hwthreads to cpus when binding to them

This commit was SVN r30648.
2014-02-09 16:14:38 +00:00
Ralph Castain
ca0c806662 Resolve the problem of binding in inverted topologies - check the relative depth of the map and bind objects in the topology, and let that determine whether we bind downward or upwards.
cmr=v1.7.5:reviewer=jsquyres:subject=Resolve the problem of binding in inverted topologies

This commit was SVN r30643.
2014-02-09 05:30:17 +00:00
Ralph Castain
78e1846b4b Add further clarification regarding new "test" APIs
This commit was SVN r30567.
2014-02-05 15:48:31 +00:00
Jeff Squyres
9ba6c6fe41 Add missing header file
This commit was SVN r30556.
2014-02-04 19:50:02 +00:00
Ralph Castain
230336b6a8 Upgrade the security framework to avoid multiple hits against the global security server. Add support for future case where mpirun assings a global security credential for a given run, though we need to work out how to handle connect-accept from other mpirun's in that case. Remove a bunch of duplicate code in the OOB by consolidating the connection handshake code.
Refs trac:4221

This commit was SVN r30554.

The following Trac tickets were found above:
  Ticket 4221 --> https://svn.open-mpi.org/trac/ompi/ticket/4221
2014-02-04 14:47:04 +00:00
Ralph Castain
5980b7e042 Add a security framework for authenticating connections - we will add LDAP, Kerberos, and Keystone support in the next month. For now, just put a placeholder "basic" module that does the minimum.
Wire the security check into ORTE's OOB handshake, and add a "version" check to ensure that both ends are from the same ORTE version. If not, report the mismatch and refuse the connection

Fixes trac:4171

cmr=v1.7.5:reviewer=jsquyres:subject=Add a security framework for authenticating connections

This commit was SVN r30551.

The following Trac tickets were found above:
  Ticket 4171 --> https://svn.open-mpi.org/trac/ompi/ticket/4171
2014-02-04 01:38:45 +00:00
Ralph Castain
9514858067 As Rolf pointed out, this patch wasn't needed on the trunk - just the 1.7 branch. Sigh
This commit was SVN r30544.
2014-02-03 21:40:56 +00:00
Ralph Castain
4d533c81fb Minor cleanup required when configuring with an external libevent. Thanks to Orion Poplawski for the patch!
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r30543.
2014-02-03 21:03:05 +00:00
Ralph Castain
fab35dbffa Silence some Solaris warnings reported by Paul Hargrove
cmr=v1.7.4:reviewer=jsquyres:subject=Silence some Solaris warnings

This commit was SVN r30542.
2014-02-03 19:46:08 +00:00
Ralph Castain
193cceb483 Okay, since a certain other RM out there made a fuss about being able to lock their daemons to specified cores, offer the same option here. The MCA param orte_daemon_cores can be used to specify which core(s) you want the orte daemons to use. This will have no bearing on the application procs - unbound will remain unbound, and binding directives will be applied to the apps.
Yippee skippee...

This commit was SVN r30513.
2014-01-30 23:50:14 +00:00
Ralph Castain
ad7c84caee Cleanup some stale definitions in the new database components
This commit was SVN r30506.
2014-01-30 19:56:41 +00:00
Jeff Squyres
bc795cd25b Fix typo in comment.
This commit was SVN r30504.
2014-01-30 18:05:34 +00:00
Ralph Castain
83e32aadb7 Add a variant of opal_init/finalize for running unit tests
This commit was SVN r30497.
2014-01-30 11:14:36 +00:00
Ralph Castain
4a8888d377 Hey, having a unit test for opal_db proved useful. Found a few bugs and missing areas of support.
cmr=v1.7.5:reviewer=jsquyres:subject=fix bugs and missing support in opal_db

This commit was SVN r30493.
2014-01-30 00:51:18 +00:00
Rolf vandeVaart
791a3a5ec6 Fix CUDA-aware support with sendi optimization. Need to make sure copy function
is initialized.

This commit was SVN r30437.
2014-01-27 18:35:01 +00:00
George Bosilca
18ae20022a Don't forget to release the bitmaps.
This commit was SVN r30428.
2014-01-26 17:24:38 +00:00
Jeff Squyres
98d67add3c Updates to the README and wrapper compiler man pages for Fortran.
Thanks to Paul Hargrove for spotting these issues.

cmr=v1.7.4:reviewer=rhc:subject=Fortran README+man page updates

This commit was SVN r30414.
2014-01-24 21:00:00 +00:00
Ralph Castain
c6c10f47b8 Label rows with the table name when printing db logs
cmr=v1.7.5:reviewer=rhc

This commit was SVN r30410.
2014-01-24 17:30:12 +00:00
Ralph Castain
883c1a1c57 Fix ia64 operations by correcting a couple of bugs in the ia64 atomics. Thanks to Paul Hargrove for the patch!
Since Paul is the only one of the team with the required hardware to test it, and he has done so, consider this RM-approved.

cmr=v1.7.5:reviewer=ompi-gk1.7

This commit was SVN r30401.
2014-01-24 00:14:37 +00:00
Ralph Castain
a19510187b This is an old patch (r26226) from two years ago that somehow went directly into the 1.6 branch without first entering the trunk. Hence, the problem it fixed still remains in the trunk, and in the 1.7 series as a regression :-(
Thanks to Paul Hargrove for tracking it down.

RM-approved

cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r30397.

The following SVN revision numbers were found above:
  r26226 --> open-mpi/ompi@12781482b9
2014-01-23 15:47:49 +00:00
Jeff Squyres
4204bd0b70 Fix harmless typo in error message.
Submitted by Paul Hargrove, reviewed by Jeff Squyres.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30394.
2014-01-23 14:19:57 +00:00
Jeff Squyres
7768828d2d Addendum to r30298: tweak the wording of the help messages a bit.
Refs trac:4117.  Please use this commit rather than the patch attached to
the ticket; the patch had a few mistakes in the tweaked wording.

This commit was SVN r30362.

The following SVN revision numbers were found above:
  r30298 --> open-mpi/ompi@58479399c3

The following Trac tickets were found above:
  Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
2014-01-22 12:17:14 +00:00
Ralph Castain
9a2dc54311 Apply a patch provided by Paul Hargrove back in Jan 2013 that fixes MIPS assembly issues. This patch was originally reviewed and RM-approved to go into the 1.6 branch (which never happened for logistical reasons), and subsequently the trunk patch was provided. Paul has verified the patch and its application to 1.7.4, so we will consider it reviewed for that purpose.
cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Fix MIPS assembly

This commit was SVN r30348.
2014-01-21 16:42:49 +00:00
Brian Barrett
fe093556f7 Only provide OPAL_THREAD_ADD64 if we have 64 bit atomics
This commit was SVN r30339.
2014-01-20 20:22:38 +00:00
Ralph Castain
26fbb4e77b Necessary constants for postgress module
This commit was SVN r30338.
2014-01-20 19:58:56 +00:00
Ralph Castain
12e4f8a71d Add support for postgres database
This commit was SVN r30337.
2014-01-20 19:56:26 +00:00
Ralph Castain
d2d4eeb2d6 Let the floating point pack/unpack work at arbitrary precision
cmr=v1.7.5:reviewer=rhc

This commit was SVN r30334.
2014-01-20 19:34:15 +00:00
Jeff Squyres
ab31428bd3 opal_path_nfs(): If we get EPERM, just give up.
Also fix the wording in a comment.

This is worth fixing, but not worth holding up 1.7.4.

cmr=v1.7.5:reviewer=rhc

This commit was SVN r30307.
2014-01-17 14:28:12 +00:00
Jeff Squyres
afb33b8de8 Bring down upstream hwloc 438d9ed7457888c63d29778bda56cd27c52a8d51 to
work around buggy NUMA node cpusets (i.e., buggy BIOSs).

Thanks to Jeff Becker for reporting the issue.

Submitted by Brice Goglin, reviewed by Jeff Squyres.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30306.
2014-01-17 13:49:56 +00:00
Ralph Castain
58479399c3 As per RFC and telecon, deprecate cmd line options and their corresponding MCA params for old-style mapping and binding directives
cmr=v1.7.5:reviewer=jsquyres:subject=deprecate old-style mapping and binding directives

This commit was SVN r30298.
2014-01-15 14:48:39 +00:00
Jeff Squyres
e3f818ba87 Remove a sketchy use of asprintf() passing a union by value.
The original code was passing a union by value, and doing odd things
on Solaris/SPARC (where "odd" rhymes with "SIGBUS").  Replace it with
an exploded switch/case block for all the enum values.  Also use the
string literals so that we get compiler checking of the format string
vs. the type of the actual arguments.

cmr=v1.7.4:revier=hjelmn:subject=Fix MCA base var to not pass union by value

This commit was SVN r30276.
2014-01-13 22:24:14 +00:00
Nathan Hjelm
91c4890886 MCA/base/var: Fix integer overflow check and add support for printing new
variable names for deprecated variables.

Closes trac:3270

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30275.

The following Trac tickets were found above:
  Ticket 3270 --> https://svn.open-mpi.org/trac/ompi/ticket/3270
2014-01-13 20:29:56 +00:00
Jeff Squyres
9950471df7 Fixes for opal_path_nfs():
* Fix some typos in macro names.
 * Add case for OS's that have statfs() but no struct statfs (!).
 * Add case for NetBSD with struct statvfs.f_fstypename.

Many thanks to Paul Hargrove who developed the majority of this patch.

Reviewed by Jeff Squyres.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30255.
2014-01-11 01:07:10 +00:00
Jeff Squyres
962a14cf6d Pull upstream hwloc commit 5198d4c0fd6cae12756fb44aed16a2d4a58b1a25
Change the logic in bind.c to only include <malloc.h> if we don't have posix_memalign.
    
In http://www.open-mpi.org/community/lists/devel/2014/01/13619.php,
Paul Hargrove found a compiler warning on OpenBSD where <malloc.h>
exists, but is not intended to be used (and doesn't error out, so
AC_CHECK_HEADERS says its ok).

Reviewed by Brice Goglin.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30234.
2014-01-10 17:37:00 +00:00
Jeff Squyres
023c50e864 Fix typo in macro name (#$%@#$% defined-or-not macros!!)
Refs trac:4079

This commit was SVN r30206.

The following Trac tickets were found above:
  Ticket 4079 --> https://svn.open-mpi.org/trac/ompi/ticket/4079
2014-01-09 23:47:36 +00:00
Ralph Castain
fb9e427320 One last corner case - when encountering an overload condition (e.g., by comm_spawning more procs than we have cores) and we are using the default binding policy, do *not* bind the new procs to anything as this can cause major problems. Instead, let the spawn succeed since the user didn't specifically ask to be bound, and leave the new procs as unbound.
Refs trac:4077

This commit was SVN r30200.

The following Trac tickets were found above:
  Ticket 4077 --> https://svn.open-mpi.org/trac/ompi/ticket/4077
2014-01-09 22:39:34 +00:00
Jeff Squyres
c67c8e8187 Make the use of statfs()/statvfs() be more robust.
As noted by Paul Hargrove, the #if's surrounding the use of statfs()
and statvfs() in opal/util/path.c have apparently gotten stale (e.g.,
modern flavors of *BSD OSs no longer define __BSD).  Changes:

 * Add statfs and statvfs to the AC_CHECK_FUNCS in configure.ac
 * Add a sanity check to ensure that we have at least one of statfs()
   or statvfs().  Add a similar sanity check in opal/util/path.c, just
   as defensive programming.
 * Use AC_CHECK_MEMBERS in configure.ac to check for specific struct
   statfs/struct statvfs members that we use in opal/util/path.c
 * In path.c, add some #includes as listed on the OS man page for
   statfs(2) (OS X 10.8.5/Mountain Lion)
 * The previous code used statvfs() on Solaris and statfs() everywhere
   else.  Attempting to replicate this with behavior-based configure
   testing led to fairly complicted if/else logic, so the new code
   uses whichever of the two are available (i.e., it might actually
   use both -- OS X 10.8.5 and RHEL 6.5 have both statfs() and
   statvfs()).  The rationale here is that we don't really care which
   of the two functions report the answer; we'll take the answer
   regardless of where it comes from.  For example, if one function
   returns a failure and the other does not, we'll use the results
   from the successful function and ignore the failed one.

This new code seems to work on OS X and Linux.  We'll have to see what
happens with MTT and future Paul Hargrove testing...

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=Make statfs/statvfs more robust

This commit was SVN r30198.
2014-01-09 21:28:52 +00:00
Ralph Castain
f179f2086b Do a better job of reporting bindings - if someone gives a spec that binds us to all processors, then we are effectively unbound and should report it clearly instead of outputting a long line of B's.
cmr=v1.7.4:reviewer=jsquyres:subject=Do a better job of reporting bindings

This commit was SVN r30179.
2014-01-09 16:16:16 +00:00
Ralph Castain
2b92fccfd1 Looks like this code was intended to separate Sun's vfs struct from everyone else's, yet the #elif can make it fail on some systems that actually support the capability. So just make it an #else to cover the range of systems we now support and move on.
cmr=v1.7.4:reviewer=jsquyres:subject=correct opal_path_df logic

This commit was SVN r30172.
2014-01-09 04:10:26 +00:00
Jeff Squyres
13b29cff2c This commit compliements/completes r30140. r30140 made all the
configury/Makefile.am changes; this commit renames the internal
installdirs.h framework struct field names to match the configry macro
names:

 * pkgdatdir ->	ompidatadir
 * pkglibdir -> ompilibdir
 * pkgincludedir -> ompiincludedir

This commit was SVN r30145.

The following SVN revision numbers were found above:
  r30140 --> open-mpi/ompi@8b778903d8
2014-01-07 23:36:33 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Jeff Squyres
cbc9ee5894 Refs trac:4038
Revert r30096 and use better precious variable names.

This commit was SVN r30132.

The following SVN revision numbers were found above:
  r30096 --> open-mpi/ompi@e0f6a4ef47

The following Trac tickets were found above:
  Ticket 4038 --> https://svn.open-mpi.org/trac/ompi/ticket/4038
2014-01-07 15:58:40 +00:00
Adrian Reber
d7ab0231e8 Cleanup trailing/heading whitespaces in COMPRESS/CRS MCA
This commit was SVN r30131.
2014-01-07 14:34:45 +00:00
Adrian Reber
015799c724 Fix COMPRESS/CRS MCA for C/R by returning correct value
Right now the C/R code fails because of a change introduced in
opal/mca/compress/base/compress_base_open.c and
pal/mca/crs/base/crs_base_open.c in 2013 with commit

git 734c724ff76d9bf814f3ab0396bcd9ee6fddcd1b
svn r28239

    Update OPAL frameworks to use the MCA framework system.

This commit changed a lot but also the return value of functions from
OPAL_SUCCESS to OPAL_ERR_NOT_AVAILABLE/OPAL_ERR_NOT_AVAILABLE.

This commit lets opal_compress_base_register() and opal_crs_base_open()
always return OPAL_SUCCESS and removes unneeded #includes.

This commit was SVN r30130.

The following SVN revision numbers were found above:
  r28239 --> open-mpi/ompi@365cf48db5
2014-01-07 14:16:54 +00:00
Jeff Squyres
9ba0d19ef1 Corrected patch from Tetsuya Mishima (i.e., a more correct fix than
r30086: make sure that a super item is constructed properly).

Refs trac:4035

This commit was SVN r30090.

The following SVN revision numbers were found above:
  r30086 --> open-mpi/ompi@d1c63f878e

The following Trac tickets were found above:
  Ticket 4035 --> https://svn.open-mpi.org/trac/ompi/ticket/4035
2013-12-26 12:41:39 +00:00
Ralph Castain
a8a91b374e Update component-level selection comments to match latest revisions
cmr=v1.7.4:reviewer=rhc

This commit was SVN r30087.
2013-12-25 19:12:43 +00:00
Ralph Castain
d1c63f878e Init variable to avoid infinite loop issues with PGI compilers
Thanks to Tetsuya Mishima for identifying the problem and providing the patch!

cmr=v1.7.4:reviewer=jsquyres:subject=Fix LAMA mapper for PGI compilers

This commit was SVN r30086.
2013-12-25 16:43:45 +00:00
Ralph Castain
65654325ef Continue the quest for valgrind silence - ensure that the db modules are given a chance to cleanup, and that the hash module releases all of its memory
cmr=v1.7.5:reviewer=jsquyres:subject=Silence db framework valgrind reports

This commit was SVN r30079.
2013-12-24 17:03:45 +00:00
Nathan Hjelm
3be4536d9b Cleanup various leaks in ompi_info reported by valgrind.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30058.
2013-12-23 17:47:43 +00:00
George Bosilca
24879f9def Code cleanup while chasing valgrind complaints.
This commit was SVN r30048.
2013-12-21 23:28:14 +00:00
George Bosilca
5aa0837250 Init all fields (valgrind cleanup).
This commit was SVN r30046.
2013-12-21 23:24:29 +00:00
George Bosilca
38cbaeaa82 Try to impose a little bit of consistency on how we parse lists of
modules by enforcing the use of OPAL list accessors.

This commit was SVN r30045.
2013-12-21 23:23:33 +00:00
Ralph Castain
31248c0985 Correctly add support for the "env" MPI_Info key during comm_spawn, update the "map-by", "rank-by", and "bind-to" Info key behaviors to match the new mapping/ranking/binding system, and update all docs and comments to match.
Fix comm_spawn on a single host - with the new default mapping scheme, we were incorrectly computing the number of procs to put on the node.

Refs trac:4003

This commit was SVN r30033.

The following Trac tickets were found above:
  Ticket 4003 --> https://svn.open-mpi.org/trac/ompi/ticket/4003
2013-12-20 20:42:39 +00:00
George Bosilca
178c340992 rearrange the fields to remove a gap in the datatype.
This commit was SVN r30020.
2013-12-20 15:57:56 +00:00
George Bosilca
7178492dd5 Correctly initialize and finalize all the datatype classes. No memory leaks on the
datatype engine subsists.

This commit was SVN r30019.
2013-12-20 15:57:10 +00:00
Jeff Squyres
802c89680a Protect hwloc/configure/m4's use of some temporary shell variables
Fix problem reported by Paul Hargrove:
http://www.open-mpi.org/community/lists/devel/2013/12/13519.php

cmr=v1.7.4:reviewer=brbarret

This commit was SVN r30013.
2013-12-20 14:48:40 +00:00
Ralph Castain
7cf0fc5578 One more round of sys_limit fixes...sigh
Refs trac:4010

This commit was SVN r30011.

The following Trac tickets were found above:
  Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010
2013-12-20 14:44:51 +00:00
Ralph Castain
e49c16b975 Grrr....use #if instead of #ifdef
Refs trac:4010

This commit was SVN r30010.

The following Trac tickets were found above:
  Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010
2013-12-20 14:24:26 +00:00
Ralph Castain
6e6351959d Check for all the RLIMIT_foo constants that we use, and update the limit checks to use the new #define values. Fix a bug where failure of some might lead to incorrect bracketing.
Refs trac:4010

This commit was SVN r30009.

The following Trac tickets were found above:
  Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010
2013-12-20 14:09:43 +00:00
Jeff Squyres
090ce4187a Fix compiler errors on Solaris, NetBSD, and OpenBSD:
* Per
   http://www.open-mpi.org/community/lists/devel/2013/12/13504.php, 
   protect usage of struct ifreq->ifr_hwaddr
 * Per
   http://www.open-mpi.org/community/lists/devel/2013/12/13503.php,
   avoid #define conflict with the token "if_mtu"
 * Also fix some whitespace and string naming issues in opal/util/if.c

Tested by Paul Hargrove.

Refs trac:4010

This commit was SVN r30006.

The following Trac tickets were found above:
  Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010
2013-12-20 11:17:30 +00:00
Ralph Castain
f15b0c9863 Add protections around the various system limits to protect code on unusual systems
Thanks to Paul Hargrove for reporting it on OpenBSD-5

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30003.
2013-12-20 03:18:07 +00:00
Nathan Hjelm
653babc737 Fix a couple issues with the mca_base_var system:
- Use ->boolval for booleans when creating a string.
 - Solaris has some issue with the ?: used in one of find functions. Use an if instead.
 - Change all instances of index -> vari to avoid issues with redefining index.

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29997.
2013-12-19 23:28:17 +00:00
Nathan Hjelm
ee9cd13b90 Remove opal_recursion_depth_counter and opal_progress_thread_count. These counters add two
atomics in the critical path and are not currently used. We can bring them back if there
turns out to be a good use for them.

cmr=v1.7.4:reviewer=brbarret

This commit was SVN r29994.
2013-12-19 23:15:27 +00:00
Jeff Squyres
e4097c5cc9 George pointed out that r29991 wasn't quite right. This patch is
added on top of r29991 and:

 * Consolidates the _debug variables in opal_datatype_internal.h and
   opal_convertor.h
 * Puts the DO_DEBUG macros back in the .c files, because they are
   slightly different from each other

Refs trac:4004

This commit was SVN r29992.

The following SVN revision numbers were found above:
  r29991 --> open-mpi/ompi@a88e143127

The following Trac tickets were found above:
  Ticket 4004 --> https://svn.open-mpi.org/trac/ompi/ticket/4004
2013-12-19 22:54:27 +00:00
Jeff Squyres
a88e143127 Fixes trac:3989: opal_pack_debug was instantiated as a bool in one file
and extern'ed s an int in another.  This caused a SIBGUS on
Solaris/SPARC.

This commit properly moves the extern to a .h file so that it's the
same in all files.  It also moves the DO_DEBUG to the header file,
because it was defined to the same thing in multiple .c files.

cmr=v1.7.4:reviewer=bosilca:subject=fix SPARC SIGBUS in opal convertor code

This commit was SVN r29991.

The following Trac tickets were found above:
  Ticket 3989 --> https://svn.open-mpi.org/trac/ompi/ticket/3989
2013-12-19 21:38:51 +00:00
Ralph Castain
79af9825ac Update of patch from Takahiro Kawashima
Refs trac:3986

This commit was SVN r29984.

The following Trac tickets were found above:
  Ticket 3986 --> https://svn.open-mpi.org/trac/ompi/ticket/3986
2013-12-19 17:22:37 +00:00
Jeff Squyres
42e3e5cd4b Fixes trac:3990: ensure we don't SIGBUS on SPARC by forcing a memory copy
and preventing access to potentially unaligned data.

Reviewed by Dave Goodell.  Tested by Siegmarr Gross.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=fix SPARC SIGBUS in opal net code

This commit was SVN r29983.

The following Trac tickets were found above:
  Ticket 3990 --> https://svn.open-mpi.org/trac/ompi/ticket/3990
2013-12-19 16:51:34 +00:00
Ralph Castain
55cd65b149 Don't warn about binding (process and/or memory) if the node cannot do it or if we would overload, but it wasn't specifically requested by the user (i.e., it is the result of the default policy). Instead, just don't bind and quietly move along.
Reset topology usage for each node as we bind as multiple nodes may be linked to the same topology object. This will need to be revisited for scale as it does take some non-zero time to reset the usage each iteration. However, storing individual topology objects for every node consumes memory, so it's a tradeoff.

cmr=v1.7.4:reviewer=jsquyres:subject=Eliminate excessive binding/memory warnings

This commit was SVN r29978.
2013-12-19 16:31:45 +00:00
Jeff Squyres
d8c0c919e1 Clarify the comment: if ummunotify is present *and* if we were
compiled with ummunotify support (which is the check that r29720 just
recently added).

This commit was SVN r29961.

The following SVN revision numbers were found above:
  r29720 --> open-mpi/ompi@ae8c826527
2013-12-18 23:39:22 +00:00
Ralph Castain
77553f72be Per this email thread:
http://www.open-mpi.org/community/lists/devel/2013/12/13412.php

fix the backtrace function to avoid async issues. Thanks to Takahiro Kawashima for the patch

This commit was SVN r29955.
2013-12-18 17:57:37 +00:00
Adrian Reber
b42aad44a3 Trying to get the C/R code to compile again. This patch
includes various fixes all over the C/R code which are
hard to group like the other patches.

Changes from V1:
* explain why mca_base_component_distill_checkpoint_ready no longer works
* compare return result of opal functions with OPAL_* values

Changes from V2:
* use orte_rml_oob_ft_event() instead of referencing through the modules
* properly protect variable (thanks to --enable-picky)

This commit was SVN r29922.
2013-12-16 15:35:28 +00:00
George Bosilca
3d72ccf1f4 Don't reset the convertor to the default size and buffer as it
should be already set to the right value. This fixes a problem
identified by Guillaume Gouaillardet, where using a single 
persistent receive leads to leaking the convertor stack memory.

Refs trac:3956
cmr=v1.7.4:reviewer=jsquyres:subject=Correctly handle the convertor internal stack for persistent receives.

This commit was SVN r29920.

The following Trac tickets were found above:
  Ticket 3956 --> https://svn.open-mpi.org/trac/ompi/ticket/3956
2013-12-15 18:16:38 +00:00
Ralph Castain
8b6d117541 Per the OMPI devel conference that changed our default behaviors:
* default to bind-to core 
* map-by slot if np=2
* map-by socket (balance across sockets on each node) if np > 2
* map-by <obj> will imply rank-by <obj> by default (leave default binding as above) 

Fix a bug in the map-by <obj> mapper where we incorrectly compute the #procs to assign if the #slots > #procs

cmr=v1.7.4:reviewer=jsquyres:subject=Update default binding and mapping values

This commit was SVN r29919.
2013-12-15 17:25:54 +00:00
George Bosilca
6189d5968b Make the builtin atomics follow the same convention as every other atomic
support we have ([op]_and_fetch instead of fetch_and_[op]).

This commit was SVN r29915.
2013-12-15 16:48:27 +00:00
Jeff Squyres
0ab48ad0d2 Fix some annoying flex warnings that have been there for years.
Many thanks to Tom Fogal for the initial patch.

cmr=v1.7.4:reviewer=rhc:subject=Fix annoying flex warnings

This commit was SVN r29904.
2013-12-14 00:36:12 +00:00
Jeff Squyres
1bc8f41edb This commit combines 3 somewhat-unrelated things, which unfortunately
got linked together (work on one caused work in the other):

 * Clean up a bunch of VAR_SCOPE issues in configure.  This includes:
   * Using VAR_SCOPE_PUSH and VAR_SCOPE_POP in more places
   * Cleaning up the use of some shell variables (e.g., name them better)
 * Add support for external libevent via
   --with-libevent=<dir-to-libevent-install-tree>, as specifically
   asked for by downstream packagers.
 * Revamp how wrapper compiler RPATH (and RUNPATH) support is done.
   The external libevent work exposed weakenesses in how the original
   RPATH/RUNPATH work was done, so we had to re-do it to be a bit more
   robust.

This work has not yet been tested on Solaris.

Refs trac:3694

This commit was SVN r29899.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-13 21:24:45 +00:00
Brian Barrett
121ca26c59 Per discussion at Develoepr's Meeting, remove Solaris threads support. Solaris
will just fall back to pthreads, which should be no problem.

This commit was SVN r29893.
2013-12-13 20:07:11 +00:00
Brian Barrett
6ef938de3f * Per the Developer's meeting today, restructure the threading in Open MPI a bit
more:
  - Remove OPAL_ENABLE_MULTI_THREADS, since it didn't really do anything
    correctly.  Opal always has threads enabled at this point.
  - Remove OMPI_ENABLE_PROGRESS_THREADS, since this hasn't worked in
    8 years and it has performance issues we'll never be able to
    overcome.  Note that we have plans for re-adding async progress, using
    a hybrid protocol of async and sync sends.
  - OMPI_ENABLE_THREAD_MULTIPLE now determines whether the thread lock
    macros do the check or not.
  - Condition variables are ALWAYS polling right now, which fixes the thread
    live-lock currently found when THREAD_MULTIPLE is turned on.

This commit was SVN r29891.
2013-12-13 19:40:12 +00:00
Jeff Squyres
3bd9c603ff Clean up variables used in configure with OPAL_VAR_SCOPE.
This is helpful in the work for #3694: ensure that many places that
eventually end up in configure don't overly-pollute the global shell
variable space (because debugging accidental shell variable pollution
can be a real pain).

Refs trac:3694

This commit was SVN r29830.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-06 23:40:34 +00:00
Vasily Filipov
ae8c826527 "If" statement wrapping with #if MEMORY_LINUX_UMMUNOTIFY in order to prevent ptmalloc2 hooks disabling in case if OMPI was not configured with ummunotify support.
This commit was SVN r29720.
2013-11-19 07:00:21 +00:00
Jeff Squyres
abeef55a55 Fix a few compiler warnings reported by clang:
* Ensure "cnt" is always initialized
 * Ensure we dont' buffer overflow on strncat() -- need to ensure we
   account for the terminating \0 character
 * hwloc_get_type_depth() returns an int (not unsigned), and
   HWLOC_TYPE_DEPTH_UNKNOWN if it's unknown (which is probably <0, but
   still, might as well check what the official hwloc docs say to
   check for)

cmr=v1.7.4:reviewer=rhc:subject=fix hwloc base compiler warnings

This commit was SVN r29686.
2013-11-13 15:54:01 +00:00
Jeff Squyres
ad51705891 Fix compiler warnings about signed/unsigned comparisons
Change static opal_setlimit() function to return its value in an OUT
parameter and return the usual int error code indicating success or
failure.  

The OUT param and return code need to be separated because the OUT
param is an unsigned type, but opal_setlimit() was returning -1 upon
failure.  Hence, the caller could not know that it had failed because
the return type was previously an unsigned type.

cmr=v1.7.4:reviewer=rhc:subject=Fix opal sys_limits.c signed/unsigned warnings

This commit was SVN r29685.
2013-11-13 15:40:34 +00:00
Jeff Squyres
f1bff698a4 Fix compiler warning: event is unsigned; it can't be negative
cmr=v1.7.4:reviewer=rhc

This commit was SVN r29684.
2013-11-13 15:35:37 +00:00
Jeff Squyres
0749919127 Ensure is_tar is always initialized.
cmr=v1.7.4:reviewer=rhc

This commit was SVN r29683.
2013-11-13 15:34:33 +00:00
Jeff Squyres
750e6f6895 Fix compiler warning.
cmr=v1.7.4:reviewer=hjelmn

This commit was SVN r29682.
2013-11-13 15:33:55 +00:00
Mike Dubman
840e2cb4a2 mindist: cosmetic, use fallback to byslot if unable to read NUMA info, small fix.
fixed by Elena, reviewed by Ralph/Mike
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29679.
2013-11-13 09:26:40 +00:00
Jeff Squyres
71c8b471d0 Add comment: strings in values[] can be free()'d after mca_base_var_enum_create() returns
This commit was SVN r29655.
2013-11-11 22:20:58 +00:00
Ralph Castain
6ef7dc1f42 We previously weren't checking all the bits in locality to ensure we had a complete match - instead, we would report "local" to the specified level if only one bit matched. Ensure that a est for locality tests local to the specified level by checking that *all* bits match.
cmr=v1.7.4:reviewer=hjelmn:subject=Ensure locality is properly tested

This commit was SVN r29643.
2013-11-08 04:21:05 +00:00
Brian Barrett
6d7a1fbb82 Move opal_portable_platform.h to opal/include/opal, which is where it really
should have been all along and fix one place that uses the file

Update opal_portable_platform.h with changes to mpi_portable_platform.h made 
in r29608.

Make mpi_portable_platform.h a symlink to opal_portable_platform.h, so that
they won't get out of sync.  I'd like to remove mpi_portable_platform.h, but
we don't automatically add -I${includedir}/openmpi/ to make that sane from
a header include point of view, so that's future work.

This commit was SVN r29618.

The following SVN revision numbers were found above:
  r29608 --> open-mpi/ompi@b71bd51cdd
2013-11-06 17:12:26 +00:00
Jeff Squyres
01118fcfb9 Add an Open MPI-specific comment here so that we hopefully don't lose
this change the next time we update libevent.

cmr=v1.7.4:ticket=3882

This commit was SVN r29597.

The following Trac tickets were found above:
  Ticket 3882 --> https://svn.open-mpi.org/trac/ompi/ticket/3882
2013-11-05 03:40:00 +00:00
Brian Barrett
a45d5603a3 Fix installation of libevent header files with --with-devel-headers isn't specified
cmr=v1.7.4:reviewer=rhc

This commit was SVN r29588.
2013-11-04 16:54:00 +00:00
Nathan Hjelm
9cd18f926c Add missing OSX builtin define
This commit was SVN r29576.
2013-10-31 02:06:39 +00:00
Nathan Hjelm
b922cd1583 Add support for OSX builtin atomics.
OSX atomic support is disabled by default. Enable with --enable-osx-builtin-atomics.

Fixes trac:2120

This commit was SVN r29568.

The following Trac tickets were found above:
  Ticket 2120 --> https://svn.open-mpi.org/trac/ompi/ticket/2120
2013-10-30 17:48:15 +00:00
Ralph Castain
25385590e6 Silence warning
This commit was SVN r29528.
2013-10-26 19:41:35 +00:00
Ralph Castain
75c306994e Add some debug
This commit was SVN r29523.
2013-10-26 02:26:21 +00:00
Ralph Castain
8c5c7d0db4 Correct a bug in handling of oob_tcp_if_include/exclude addresses by using the kernel index instead of the raw index of the interface.
Refs trac:3696

This commit was SVN r29522.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-10-26 00:47:14 +00:00
Nathan Hjelm
f7428fb6a9 Small fixes for the MCA variable interface.
- Make a copy of enumerator data for default enumerators. This will allow
   the caller to free their data once the enumerator has been created. This
   is a change from just referencing the values array.

 - Make mca_base_pvar_notify check if the pvar is valid before calling the
   notify callback. This fixes a segmentation fault when destroying handles
   after MPI_Finalize().

cmr=v1.7.4:ticket=trac:3861

This commit was SVN r29512.

The following Trac tickets were found above:
  Ticket 3861 --> https://svn.open-mpi.org/trac/ompi/ticket/3861
2013-10-24 19:27:06 +00:00
Jeff Squyres
f45144aed0 Add a little more to the docs for mca_base_var_enum_create().
This commit was SVN r29496.
2013-10-23 22:11:19 +00:00
Dave Goodell
25dd719d4d opal: support __attribute__((__noinline__))
First cut does not attempt any "cross-check".  As we discover compilers
which complain about __noinline__, we will add specific cross checks to
handle those cases.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29488.
2013-10-23 15:52:05 +00:00
Nathan Hjelm
d34a4300b8 Fix various bugs in mca_base_pvar.
Fixes:

 - Segmentation fault when using watermark variables.

 - Segmentation fault when using a handle bound to a no longer valid
   performance variable.

 - Incorrect return codes from MPI_T_pvar_* functions.

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29481.
2013-10-23 15:47:15 +00:00
Ralph Castain
772a376d73 Correct location of elog file
Refs trac:3847

This commit was SVN r29438.

The following Trac tickets were found above:
  Ticket 3847 --> https://svn.open-mpi.org/trac/ompi/ticket/3847
2013-10-14 19:21:45 +00:00
Ralph Castain
24c811805f ****************************************************************
This change contains a non-mandatory modification
       of the MPI-RTE interface. Anyone wishing to support
       coprocessors such as the Xeon Phi may wish to add
       the required definition and underlying support
****************************************************************

Add locality support for coprocessors such as the Intel Xeon Phi.

Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host.

So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following:

1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board

2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions

3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future.

4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algo isn't scalable at the moment - this will hopefully be improved over time.

5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTE's that choose not to provide this support do not have to do anything - the associated code will simply be ignored.

6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set.

cmr:v1.7.4:reviewer=hjelmn

This commit was SVN r29435.
2013-10-14 16:52:58 +00:00
Nathan Hjelm
50b4b92758 hostname may not NULL-terminate the string if the buffer is too small.
Thanks to Kevin M. Hildebrand for catching this.

cmr=v1.7.3:reviewer=jsquyres

This commit was SVN r29412.
2013-10-09 15:49:18 +00:00
Ralph Castain
9902748108 ***** THIS INCLUDES A SMALL CHANGE IN THE MPI-RTE INTERFACE *****
Fix two problems that surfaced when using direct launch under SLURM:

1. locally store our own data because some BTLs want to retrieve 
   it during add_procs rather than use what they have internally

2. cleanup MPI_Abort so it correctly passes the error status all
   the way down to the actual exit. When someone implemented the
   "abort_peers" API, they left out the error status. So we lost
   it at that point and *always* exited with a status of 1. This 
   forces a change to the API to include the status.

cmr:v1.7.3:reviewer=jsquyres:subject=Fix MPI_Abort and modex_recv for direct launch

This commit was SVN r29405.
2013-10-08 18:37:59 +00:00
Ralph Castain
6951976bc4 Update struct member name - this is why we put such things in the trunk before moving them to a branch, especially when coming from outside :-)
Refs trac:3830

This commit was SVN r29390.

The following Trac tickets were found above:
  Ticket 3830 --> https://svn.open-mpi.org/trac/ompi/ticket/3830
2013-10-07 15:43:43 +00:00
Ralph Castain
13cd112fb4 Avoid use of interface in struct because cygwin compilers apparently object (go figure)
This commit was SVN r29388.
2013-10-06 23:55:38 +00:00
Ralph Castain
2d2307b6eb Modify libevent to support cygwin - patch will be pushed upstream
This commit was SVN r29387.
2013-10-06 23:53:31 +00:00
Ralph Castain
2121e9c01b Fix an issue regarding use of PMI when running processes and tools that don't need or want to use it. We build PMI support based on configuration settings and library availability.
However, tools such as mpirun don't need it, and definitely shouldn't be using it. Ditto for procs launched by mpirun.

We used to have a way of dealing with this - we had the PMI component check to see if the process was the HNP or was launched by an HNP. Sadly, moving the OPAL db framework removed
 that ability as OPAL has no notion of HNPs or proc type.

So add a boolean flag to the db_base_select API that allows us to restrict selection to "local" components. This gives the PMI component the ability to reject itself as required. W
e then need to pass that param into the ess_base_std_app call so it can pass it all down.

This commit was SVN r29341.
2013-10-02 19:03:46 +00:00
George Bosilca
43b4d76913 Fix a corner case for a non-contiguous send convertor where the
convertor accepted to be set to a position in the middle of a
predefined datatype. Once set there is was unable to provide the
second part of the datatype. This fix force the convertor to be
aligned on predefined datatypes boundaries for any non-contiguous
send convertor.

This commit was SVN r29285.
2013-09-28 16:46:21 +00:00
Ralph Castain
d565a76814 Do some cleanup of the way we handle modex data. Identify data that needs to be shared with peers in my job vs data that needs to be shared with non-peers - no point in sharing extra data. When we share data with some process(es) from another job, we cannot know in advance what info they have or lack, so we have to share everything just in case. This limits the optimization we can do for things like comm_spawn.
Create a new required key in the OMPI layer for retrieving a "node id" from the database. ALL RTE'S MUST DEFINE THIS KEY. This allows us to compute locality in the MPI layer, which is necessary when we do things like intercomm_create.

cmr:v1.7.4:reviewer=rhc:subject=Cleanup handling of modex data

This commit was SVN r29274.
2013-09-27 00:37:49 +00:00
Ralph Castain
9aeba777fa Ensure we don't enter into an infinite loop looking for the PML modex key if it isn't present. The PMI implementation will load ALL modex keys when the first key is queried, so the hash db component can safely return "not found" if a subsequent key isn't present. The PML modex_recv needs to assume everything is okay if the modex recv fails to return a value.
cmr:v1.7.3:reviewer=jladd:subject=Prevent infinite loop when PML modex not found

This commit was SVN r29243.
2013-09-25 16:04:00 +00:00
Ralph Castain
63da76ad5f Silence warnings about pointer casting
This commit was SVN r29226.
2013-09-22 19:21:29 +00:00
Nathan Hjelm
01839db11b MCA/base: When encounter a duplicate file value don't free the filename.
Stale code.

cmr=v1.7.3:reviewer=rhc

This commit was SVN r29224.
2013-09-21 18:53:36 +00:00
Nathan Hjelm
bc31773523 Fix bug in db/pmi when a stored byte object has a NULL pointer.
cmr=v1.7.3:reviewer=samuel

This commit was SVN r29215.
2013-09-20 15:38:36 +00:00
Ralph Castain
7bc20866fd C standard stipulates that we have to cast the function to another of the same type to avoid unexpected behavior. We aren't using the function in this case, but Nick correctly points out that we should follow the standard regardless.
Refs trac:3755

This commit was SVN r29210.

The following Trac tickets were found above:
  Ticket 3755 --> https://svn.open-mpi.org/trac/ompi/ticket/3755
2013-09-19 18:42:21 +00:00
Ralph Castain
7de493fc02 Silence a warning about an address that can never be NULL - libevent needs to deal with the situation where the user may have compiled the code on a system where this function is present, but executes it on one where it isn't. Thus, a compile-time test isn't adequate.
Pushed upstream.

cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29201.
2013-09-18 02:03:01 +00:00
George Bosilca
55273f1c98 Cleanup spaces, nothing else.
This commit was SVN r29197.
2013-09-18 00:07:58 +00:00
Nathan Hjelm
7929fb9dea Cleanup complex datatypes and update datatypes and operator code to use C99.
This commit changes the underlying opal complex datatypes to match the
C99 types: float _Complex, double _Complex, and long double _Complex. The
fortran and C++ types now are aliases to these basic types instead of
structure types. The operators in ompi/mca/op/base now work on only the
C99 types and the fortran types use these operators if the fortran type
matches a C complex type (this should almost always be the case.)

C99 is not is use in both the datatype and operator code and should make
the code both cleaner and much less fragile.

This commit was SVN r29193.
2013-09-17 17:49:42 +00:00
Ralph Castain
2245ac0e7e Don't error log the return from setup_pmi as it can indicate that the process wasn't launched via srun or its equivalent.
cmr:v1.7.3:reviewer=jladd

This commit was SVN r29180.
2013-09-17 02:26:46 +00:00
Ralph Castain
e01953b440 Per Brice, silence warning on old Linux kernels
Refs trac:3744

This commit was SVN r29179.

The following Trac tickets were found above:
  Ticket 3744 --> https://svn.open-mpi.org/trac/ompi/ticket/3744
2013-09-16 15:43:33 +00:00
Ralph Castain
845e92bc5d Remove the old version of hwloc. Update the new one to reflect the official release dates.
Refs trac:3744

This commit was SVN r29154.

The following Trac tickets were found above:
  Ticket 3744 --> https://svn.open-mpi.org/trac/ompi/ticket/3744
2013-09-10 16:30:13 +00:00
Joshua Ladd
b3f88c4a1d Per the RFC schedule, this commit adds Mellanox OpenSHMEM to the trunk. It does not yet run on OSX or with CM PML for an MTL other than MXM. Mellanox is aware of these issues and is in the process of resolving them. This should be added to \ncmr=v1.7.4:subject=Move OSHMEM to 1.7.4:reviewer=rhc
This commit was SVN r29153.
2013-09-10 15:34:09 +00:00
Ralph Castain
46ed907003 Correctly handle list of cores specified in the rankfile - i.e., a rankfile entry such as:
rank 0=foo slot=0:0-1;1:0,1

cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29152.
2013-09-08 02:04:29 +00:00
Alex Margolin
50a3c01a0f fixed build without thread support
This commit was SVN r29145.
2013-09-06 19:03:19 +00:00
Ralph Castain
0d7fb932f1 Remove build product file
Refs trac:3744

This commit was SVN r29120.

The following Trac tickets were found above:
  Ticket 3744 --> https://svn.open-mpi.org/trac/ompi/ticket/3744
2013-09-04 16:38:22 +00:00
Ralph Castain
6011a4d29c As per the telecon, update hwloc to v1.7.2 so we can add MIC support. Ignore hwloc1.5.2 component for now until this tests out - will remove it then.
cmr:v1.7.4:reviewer=jsquyres

This commit was SVN r29107.
2013-09-03 16:23:42 +00:00
Ralph Castain
7a7cfdd519 A little cleanup - the base function to sort numa lists must return something or you get a warning about non-void function returning without value, so cleanup the return values. Ensure the mindist module actually checks for a return of "error" so it won't segfault, and have it emit a polite message when that happens.
cmr:v1.7.3:reviewer=jladd

This commit was SVN r29089.
2013-08-29 20:01:06 +00:00
Ralph Castain
3516348aad We don't need to report errors in pmi_setup as it is possible that PMI is available, but that we weren't launched under it (e.g., we launched via mpirun).
cmr:v1.7.3:reviewer=hjelmn:subject="Silence unnecessary PMI error msgs"

This commit was SVN r29086.
2013-08-29 16:35:20 +00:00
Joshua Ladd
1802aabf1a Add support for autodetecting a MLNX HCA in the rmaps min distance feature. In this way, .ini files distributed with software stacks need not specify a particular HCA but instead may select the key word auto which will automatically select the discovered device. To use this feature, simply pass the keyword auto instead of a specific device name, --mca rmaps_base_dist_hca auto. If more than one card is installed, the mapper will inform the user of this and, at this point, the user will then need to specify which card via the normal route, e.g. --mca rmaps_base_dist_hca <dev_name>. This should be added to \ncmr=v1.7.4:reviewer=rhc:subject=Autodetect logic for min dist mapping
This commit was SVN r29079.
2013-08-28 16:23:33 +00:00
Nathan Hjelm
77a41e1ca9 ompi_info: mark the variables from disabled components as disabled in
the output of ompi_info.

A variable is disabled if its component will never be selected due to
a component selection parameter (eg. -mca btl self). The old behavior
of ompi_info was to not print these parameters at all. Now we print the
parameters. After some discussion with George it was decided that there
needed to be some way to see what parameters will not be used. This was
the comprimise.

This commit also fixes a bug and a typo in the pvar sytem. The enum_count
value in mca_base_pvar_dump was being used without being set. The full_name
in mca_base_pvar_t was not being used.

cmr=v1.7.3:ticket=trac:3734

This commit was SVN r29078.

The following Trac tickets were found above:
  Ticket 3734 --> https://svn.open-mpi.org/trac/ompi/ticket/3734
2013-08-28 16:03:23 +00:00
Nathan Hjelm
3744c5e0be Also check for /dev/mic/scif when deciding whether to enable the Linux
memory hooks.

The MIC has a /dev/scif device and the host has /dev/mic/scif. I do not
know if this device exists when no MIC is connected.

cmr=v1.7.4:ticket=trac:3733:reviewer=jsquyres

This commit was SVN r29071.

The following Trac tickets were found above:
  Ticket 3733 --> https://svn.open-mpi.org/trac/ompi/ticket/3733
2013-08-27 19:40:02 +00:00
Nathan Hjelm
c699ee7812 Update the ompi_info man page with information about variable levels
and improve the behavior of ompi_info.

This commit changes the default behavior of ompi_info --all when a
level is not specified. Instead of assuming level 1 in this case we
now assume level 9. This change is due to feedback from the community
after the introduction of the --level option.

I also added a new option: --selected-only. This option will limit the
displayed variables to components that can be selected (ie. if there
is a selection parameter set-- btl self,sm)

cmr=v1.7.3:reviewer=jsquyres

This commit was SVN r29070.
2013-08-27 19:11:37 +00:00
Nathan Hjelm
6e1656279e Enable the use of the Linux memory hooks on Intel MIC.
cmr=v1.7.3:reviewer=jsquyres

This commit was SVN r29069.
2013-08-27 18:25:18 +00:00
Nathan Hjelm
2da64eb719 Fix compilation of the MPI tools information interface when profiling
is enabled and fix a bug in the handling of watermark performance
variables.

cmr=v1.7.3:ticket=trac:3725:reviewer=jsquyres

This commit was SVN r29068.

The following Trac tickets were found above:
  Ticket 3725 --> https://svn.open-mpi.org/trac/ompi/ticket/3725
2013-08-27 18:19:18 +00:00
Ralph Castain
a200e4f865 As per the RFC, bring in the ORTE async progress code and the rewrite of OOB:
*** THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE ***

Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro.

***************************************************************************************

I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week.

The code is in  https://bitbucket.org/rhc/ompi-oob2


WHAT:    Rewrite of ORTE OOB

WHY:       Support asynchronous progress and a host of other features

WHEN:    Wed, August 21

SYNOPSIS:
The current OOB has served us well, but a number of limitations have been identified over the years. Specifically:

* it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code)

* we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface.

* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients

* there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort

* only one transport (i.e., component) can be "active"


The revised OOB resolves these problems:

* async progress is used for all application processes, with the progress thread blocking in the event library

* each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on")

* multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC.

* a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions.

* opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object

* NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions

* obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel

* the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport

* routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active

* all blocking send/recv APIs have been removed. Everything operates asynchronously.


KNOWN LIMITATIONS:

* although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline

* the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker

* routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways

* obviously, not every error path has been tested nor necessarily covered

* determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when *all* transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost.

* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways

* the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC

This commit was SVN r29058.
2013-08-22 16:37:40 +00:00
Ralph Castain
611d7f9f6b When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require.
This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times.

Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes:

* upon first request for data, have the OPAL db pmi component fetch and decode *all* the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally

* reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test

* reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued).

Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it

Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time.

This commit was SVN r29040.
2013-08-17 00:49:18 +00:00
Ralph Castain
11a3743b21 Cleanup unitialized var warnings
This commit was SVN r29038.
2013-08-16 21:49:17 +00:00
Ralph Castain
7947cec8fa Cleanup warning
This commit was SVN r29031.
2013-08-16 21:13:40 +00:00
Ralph Castain
8a4c5f4957 Attempt to plug a few memory leaks by ensuring we finalize all things opened during init. However, we are still leaking memory like a sieve in param registration and hwloc.
This commit was SVN r29026.
2013-08-14 02:03:00 +00:00
Ralph Castain
2c286bccca Fix typo - thanks to Michael Schlottke for pointing it out
cmr:v1.7.3:reviewer=brbarret

This commit was SVN r29015.
2013-08-11 18:16:21 +00:00
Nathan Hjelm
524e9b148b MCA/base: add a function to unload a component without closing it for components that have been registered but not opened
This commit was SVN r29012.
2013-08-09 20:16:08 +00:00
Nathan Hjelm
841ed962f6 fix MCA variable and component system leaks
cmr=v1.7.3:reviewer=rhc

This commit was SVN r29011.
2013-08-09 19:50:28 +00:00
George Bosilca
30b910b54d More info in the debug mode.
This commit was SVN r29002.
2013-08-06 09:08:43 +00:00
Nathan Hjelm
be1bd4661c db/pmi: speed up modex by caching pmi data internally
This commit was SVN r29001.
2013-08-05 22:31:50 +00:00
Nathan Hjelm
88cadc552d Make opal/db/pmi use as few PMI keys as possible.
This commit reintroduces key compression into the pmi db. This feature
compresses the keys stored into the component into a small number of
PMI keys by serializing the data and base64 encoding the result. This
will avoid issues with Cray PMI which restricts us to ~ 3 PMI keys per
rank.

This commit was SVN r28993.
2013-08-03 01:06:59 +00:00
Ralph Castain
3c8aa7c296 Don't just hardcode the max length of the PMI name as it could be wrong. PMI2 installations seem to be retaining at least some of the PMI functions, so use the one to get the max name length.
This commit was SVN r28962.
2013-07-30 14:13:15 +00:00
Nathan Hjelm
99adeb7f6e Fix support for complex datatypes when fortran is not available but _Complex is
This commit was SVN r28951.
2013-07-25 19:08:21 +00:00
Nathan Hjelm
ebbb32120a MCA/base: variable system updates
- Use an enumerator to handle bool values.

 - Fix a leak in the variable enumerator.

 - Fix a leak in an orte parameter.

This commit was SVN r28949.
2013-07-25 15:42:01 +00:00
Ralph Castain
41f97931e9 Need to include module-level CPPFLAGS so it can build
This commit was SVN r28947.
2013-07-24 23:07:43 +00:00
Nathan Hjelm
c4c69b4ddf MPI-3: add support for large counts using derived datatypes
Add support for MPI_Count type and MPI_COUNT datatype and add the required
MPI-3 functions MPI_Get_elements_x, MPI_Status_set_elements_x,
MPI_Type_get_extent_x, MPI_Type_get_true_extent_x, and MPI_Type_size_x.
This commit adds only the C bindings. Fortran bindins will be added in
another commit. For now the MPI_Count type is define to have the same size
as MPI_Offset. The type is required to be at least as large as MPI_Offset
and MPI_Aint. The type was initially intended to be a ssize_t (if it was
the same size as a long long) but there were issues compiling romio with
that definition (despite the inclusion of stddef.h).

I updated the datatype engine to use size_t instead of uint32_t to support
large datatypes. This will require some review to make sure that 1) the
changes are beneficial, 2) nothing was broken by the change (I doubt
anything was), and 3) there are no performance regressions due to this
change.

Increase the maximum number of predifined datatypes to support MPI_Count

Put common get_elements code to ompi/datatype/ompi_datatype_get_elements.c

Update MPI_Get_count to reflect changes in MPI-3 (return MPI_UNDEFINED when the count is too large for an int)

This commit was SVN r28932.
2013-07-23 15:35:14 +00:00
Ralph Castain
6c1a140e99 Per request from Nathan, add a "commit" API to the opal db framework. This allows him to aggregate keys to work around the Cray's severe PMI limitations
This commit was SVN r28917.
2013-07-22 22:57:16 +00:00
Jeff Squyres
49b5342130 After talking with Nathan, update some comments/documentation about
the new MCA var and pvar systems.

This commit was SVN r28913.
2013-07-22 20:34:42 +00:00
Nathan Hjelm
61d331d5b5 MCA/base: fix some warnings and an error in the MCA variable system
This commit was SVN r28909.
2013-07-22 17:52:39 +00:00
Brian Barrett
0d8b57211a add missing include
This commit was SVN r28900.
2013-07-21 20:18:17 +00:00
Nathan Hjelm
1e8ba2b8cf fix condition in common/pmi init that c caused pmi to fail if PMI2_Init succeeds
This commit was SVN r28856.
2013-07-19 02:43:42 +00:00
Ralph Castain
4eb0dfa039 This has apparently been wrong for some time! Fix the common/pmi libraries so we build them dynamic so they can be properly linked into the components that use them. Define required library version numbers and so some other cuteness to make it all work.
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r28842.
2013-07-18 18:42:42 +00:00
Ralph Castain
92c6b806b9 Based on a patch submitted by Piotr Lesnicki of Bull, cleanup the PMI2 support. This has not been tested yet on multiple environments (e.g., Cray), so it needs more evaluation prior to moving to the 1.7 branch.
cmr:v1.7.3:reviewer=rhc

This commit was SVN r28837.
2013-07-18 14:46:07 +00:00
Nathan Hjelm
b88509af36 don't close components that failed to register. cmr:v1.7:reviewer=rhc
This commit was SVN r28823.
2013-07-17 19:49:05 +00:00
Nathan Hjelm
456de007a8 ignore unavailable components when registering
This commit was SVN r28802.
2013-07-16 16:02:33 +00:00
Nathan Hjelm
d446675526 MCA: Per-RFC, add support for performance variables
This commit adds an API for registering and querying performance
variables (mca_base_pvar) in the MCA base. The existing MCA variable
system API has been updated to reflect the new API: MCA variable
groups have performance variables, and new types have been added (double,
unsigned long long) to reflect what is required by the MPI_T
interface. Additionally, the MCA variable group code has been split
into its own set of files: mca_base_var_group.[ch].

Details of the new API can be found in doxygen comments in the header:
mca_base_pvar.h.

Other changes to the variable system:

 - Use an opal_hash_table to speed up variable/group lookup.

 - Clean up code associated with MCA variable types.

 - Registered performance variables are printed by ompi_info -a. In the
   future an option should be added to control this behavior.

Changes to OMPI:

 - Added full support for the MPI_T performance variable interface.

This commit was SVN r28800.
2013-07-16 16:02:13 +00:00
Ralph Castain
10ca1c1b04 Turns out that there was exactly ONE place in all of the OMPI code base that still referred to OPAL_TRACE, though a few places retained the include file for no reason. So no point in letting this sit as it is clearly an unused "feature".
This commit was SVN r28789.
2013-07-14 18:57:20 +00:00
Jeff Squyres
14424daf4c Remove auto-generated file
This commit was SVN r28784.
2013-07-13 20:55:09 +00:00
Nathan Hjelm
8f9b7926ec mca/base: fix component selection negation. cmr:v1.7:reviewer=jsquyres
This commit was SVN r28770.
2013-07-12 17:55:20 +00:00
Ralph Castain
b001d31c27 Per RFC, remove libevent 2.0.19 and leave 2.0.21 as the default
This commit was SVN r28767.
2013-07-12 16:37:15 +00:00
Jeff Squyres
9252afdcd9 Updates and tweaks to the documentation of the new MCA parameter
system (written in conjunction with Nathan).

This commit was SVN r28758.
2013-07-11 20:04:51 +00:00
Jeff Squyres
bdb45a2e4f Add an oh-so-slightly faster variant of the hotel "checkin" action
(since this is used in the fast path) for when you ''know'' that there
will be a room available:

 * Don't do the last_unoccupied_room check
 * Return void

This commit was SVN r28757.
2013-07-11 20:00:37 +00:00
Nathan Hjelm
a694bcb6b6 Add support for the MCA variable information level to ompi_info.
Add an option to ompi_info (-l, --level) that takes a number in the
interval (1,9). Only MCA variables up to this level will be printed.
The default level is 1.

Print the level as part of both the parsable and readable output.

This commit was SVN r28750.
2013-07-10 18:52:36 +00:00
Ralph Castain
028f5ee7a6 Cleanup some bitrot from moving the db framework to opal and from the new mca param system
This commit was SVN r28741.
2013-07-09 14:37:08 +00:00
Ralph Castain
315da8125d Remove stale headers
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r28732.
2013-07-08 18:26:58 +00:00
Ralph Castain
2ccc0438af On some systems, pthread_kill is actually in the "signals.h" header, so include it
This commit was SVN r28731.
2013-07-08 17:40:38 +00:00
Ralph Castain
eac174e624 For purposes of testing the RFC, make libevent2021 the default for now so it gets tested by MTT
This commit was SVN r28730.
2013-07-05 23:14:22 +00:00
Brian Barrett
ea9cee73c1 Per RFC, remove darwin backtrace, since OS X since 10.5 has supported the
execinfo() interface (which has been the default for OMPI to use on Darwin)

This commit was SVN r28727.
2013-07-05 19:06:27 +00:00
Ralph Castain
21c8041a40 Update libevent 2021 component so it also only warns once when detecting reentrant behavior
This commit was SVN r28721.
2013-07-04 04:41:04 +00:00
Ralph Castain
bd65937bf3 If we enable ipv6, we resolve a hosts addresses and check them all against our local interfaces to determine if the given host is us. However, if we don't enable ipv6, we only checked the first address returned. This can cause us to incorrectly identify a hostname as "not us".
Make -disable-ipv6 behave the same as --enable-ipv6 by checking all the returned addresses.

This commit was SVN r28716.
2013-07-03 21:41:36 +00:00
Ralph Castain
45fad1ddcc We really should be closing the event framework when told to do so.
cmr:v1.7.3,reviewer=jsquyres

This commit was SVN r28714.
2013-07-03 16:57:14 +00:00
Ralph Castain
9166a8cc95 Per telecon today, add a flag so we only warn once about reentrant libevent loops - this will allow developers to better diagnose the problem as we won't swamp filesystems with warning messages.
This commit was SVN r28712.
2013-07-03 04:51:36 +00:00
Jeff Squyres
ad16bcd6d1 Followup from Justin Bronder: Looks like I spoke too soon. The
sandbox team has informed me that they are getting rid of SANDBOX_PID
in the future and that using SANDBOX_ON would be preferred.

This commit was SVN r28708.
2013-07-03 01:38:26 +00:00
Jeff Squyres
fea15ec34e Add memory hooks override for Gentoo sandbox v2.5, too. Thanks to
Justin Bronder for the patch.

This commit was SVN r28702.
2013-07-02 12:34:51 +00:00
George Bosilca
a5bda43cfc Small typo.
This commit was SVN r28689.
2013-07-01 16:48:45 +00:00
Ralph Castain
446e33a5d8 There are cases where we want to use the novm state machine, but the backend node topology differs from that where mpirun is executing. In those cases, we can wind up thinking we are oversubscribed because the head node has fewer cores than the compute nodes.
To resolve this situation, add the ability to specify a backend topology file that mpirun shall use for its mapping operations. Create a new "set_topology" function in opal hwloc to support it.

This commit was SVN r28682.
2013-06-27 03:04:50 +00:00
Jeff Squyres
dd25421d48 Convert strcpy() to strncpy(), and just to be extra-super paranoid,
use memset(0) for extra bonus points.

This commit was SVN r28668.
2013-06-22 12:21:18 +00:00
Joshua Ladd
0b5c1f2ea8 Add 'generic' support for PMI2 (previously, we checked for PMI2 only on Cray systems.) If your resource manager (e.g. SLURM) has support for PMI2, then the --with-pmi configure flag will enable its usage. If you don't have PMI2, then you will fallback to regular old PMI1. This patch was submitted by Ralph Castain and reviewed and pushed by Josh Ladd. This should be added to cmr:v1.7:reviewer=jladd
This commit was SVN r28666.
2013-06-21 15:28:14 +00:00
George Bosilca
f5a55ccb39 Various cleanups.
This commit was SVN r28647.
2013-06-15 16:23:11 +00:00
George Bosilca
a6c3477e89 Remove useless include.
This commit was SVN r28646.
2013-06-15 16:07:30 +00:00
Nathan Hjelm
8924140916 Per RFC: use a better hash algorithm for the opal_hash_table_*_ptr functions.
Chose the crc32 function present in opal/util/crc.c as the hash function. The
performance should be sufficient for most cases. If not we can always change
the function again.

This commit was SVN r28629.
2013-06-13 17:11:04 +00:00
Nathan Hjelm
518d1fe200 Fix two typos that prevented alps direct launch from working
This commit was SVN r28628.
2013-06-13 17:04:08 +00:00
Joshua Ladd
46362d2761 Stomps compiler warnings in HCA min-dist calculation. This should be added to cmr:v1.7:reviewer=jladd
This commit was SVN r28620.
2013-06-12 16:25:25 +00:00
Tom Naughton
d86c3ce669 + remove autogenerated 'install-sh'
This commit was SVN r28602.
2013-06-07 20:40:24 +00:00
Rolf vandeVaart
62ab008017 Fix SEGV because missing CUDA initialization.
This commit was SVN r28601.
2013-06-07 18:31:36 +00:00
Rolf vandeVaart
1230029aa1 The debug messages were swapped. Fixed.
This commit was SVN r28600.
2013-06-07 17:23:41 +00:00
George Bosilca
72877f078f Based on the MPI 3.0 count equal to zero has a clear meaning, no modification
of the original datatype are allowed (not in type map nor extent). Make it
clear in the code.
Allow 0-count cases to the contiguous memory check.

This commit was SVN r28568.
2013-05-29 16:02:54 +00:00
Jeff Squyres
6d173af329 This commit introduces a new "mindist" ORTE RMAPS mapper, as well as
some relevant updates/new functionality in the opal/mca/hwloc and
orte/mca/rmaps bases.  This work was mainly developed by Mellanox,
with a bunch of advice from Ralph Castain, and some minor advice from
Brice Goglin and Jeff Squyres.

Even though this is mainly Mellanox's work, Jeff is committing only
for logistical reasons (he holds the hg+svn combo tree, and can
therefore commit it directly back to SVN).

-----

Implemented distance-based mapping algorithm as a new "mindist"
component in the rmaps framework.  It allows mapping processes by NUMA
due to PCI locality information as reported by the BIOS - from the
closest to device to furthest.

To use this algorithm, specify:

   {{{mpirun --map-by dist:<device_name>}}}

where <device_name> can be mlx5_0, ib0, etc.

There are two modes provided:

 1. bynode: load-balancing across nodes
 1. byslot: go through slots sequentially (i.e., the first nodes are
     more loaded)

These options are regulated by the optional ''span'' modifier; the
command line parameter looks like:

    {{{mpirun --map-by dist:<device_name>,span}}}

So, for example, if there are 2 nodes, each with 8 cores, and we'd
like to run 10 processes, the mindist algorithm will place 8 processes
to the first node and 2 to the second by default. But if you want to
place 5 processes to each node, you can add a span modifier in your
command line to do that.

If there are two NUMA nodes on the node, each with 4 cores, and we run
6 processes, the mindist algorithm will try to find the NUMA closest
to the specified device, and if successful, it will place 4 processes
on that NUMA but leaving the remaining two to the next NUMA node.

You can also specify the number of cpus per MPI process. This option
is handled so that we map as many processes to the closest NUMA as we
can (number of available processors at the NUMA divided by number of
cpus per rank) and then go on with the next closest NUMA.

The default binding option for this mapping is bind-to-numa. It works
if you don't specify any binding policy. But if you specified binding
level that was "lower" than NUMA (i.e hwthread, core, socket) it would
bind to whatever level you specify.

This commit was SVN r28552.
2013-05-22 13:04:40 +00:00
Jeff Squyres
55382c1bf8 Bring over upstream hwloc trunk commit
https://svn.open-mpi.org/trac/hwloc/changeset/5592 to fix the merging
of groups when they are I/O objects.

This commit was SVN r28551.
2013-05-22 12:34:59 +00:00
Nathan Hjelm
721779d7ab Per RFC: remove old MCA parameter system.
This commit was SVN r28541.
2013-05-20 15:36:13 +00:00
Jeff Squyres
089c632cce Remove a bunch of dead code: gcc 4.7 warns of set-but-unused
variables.  So get rid of them.

This commit was SVN r28538.
2013-05-17 21:45:49 +00:00
Ralph Castain
1ec13d530c Allow simple way to request comparison to full address regardless of addr family
This commit was SVN r28519.
2013-05-14 22:08:39 +00:00
Ralph Castain
eb2edb4b2b Silence warning
This commit was SVN r28516.
2013-05-14 22:00:01 +00:00
Rolf vandeVaart
8a8ea9ba1b Fix compile error in optimize build for CUDA-aware code.
This commit was SVN r28512.
2013-05-14 21:07:27 +00:00
Ralph Castain
37088f23d8 When ipv6 disabled, we still have getaddrinfo, so use it when checking common networks for resolving to kindex
This commit was SVN r28496.
2013-05-14 15:54:46 +00:00
Ralph Castain
3fc1bafd82 fix typo
This commit was SVN r28490.
2013-05-14 12:36:45 +00:00
Ralph Castain
f4f07bdb21 Ensure the opal_ifaddrtokindex function considers the full range of address space by using the netmask
This commit was SVN r28487.
2013-05-14 03:37:44 +00:00
Ralph Castain
c33219a51b Extend the bitmap API a bit to provide a test if all bits zero
This commit was SVN r28486.
2013-05-14 03:34:57 +00:00
Jeff Squyres
4d9da92e60 Fixes trac:376: bu default the wrappr compilers will enable rpath support
in generated executables on systems that support it.  Use
--disable-wrapper-rpath to disable this behavior.  See text in
README about --disable-wrapper-rpath for more details.

This commit was SVN r28479.

The following Trac tickets were found above:
  Ticket 376 --> https://svn.open-mpi.org/trac/ompi/ticket/376
2013-05-11 00:49:17 +00:00
Jeff Squyres
cad1d920b2 Check to ensure that we have struct ifreq.ifr_mtu before we try to use
it, because Solaris although has SIOCFIGMTU, it curiously does not
have ifreq.ifr_mtu.

This commit was SVN r28460.
2013-05-07 13:51:50 +00:00
Jeff Squyres
4b9b3a81ff Update the list of post-1.5.2 r numbers from hwloc that we have
committed here.

This commit was SVN r28458.
2013-05-07 01:22:06 +00:00
Jeff Squyres
ee0cdf86fd Fix issue raised by Stefan Friedel: remove an extraneous -L that is
added by hwloc's embedding so that it doesn't appear in
libhwloc_embedded.la (and therefore propogate all the way up to
libmpi.la). 

Committed upstream in hwloc SVN r5588.

This commit was SVN r28457.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r5588
2013-05-07 01:21:18 +00:00
Ralph Castain
527ea1d090 Per the RFC, always enable libevent thread support.
This commit was SVN r28443.
2013-05-03 15:39:05 +00:00