Commit graph

23512 commits

Author SHA1 Message Date
Ralph Castain
b60b03d613 It is okay not to get the hostname - we don't require that it be provided 2015-09-11 13:01:20 -07:00
Nathan Hjelm
1868b5937c Merge pull request #889 from hjelmn/sentinel_update
Use the low instead of the high bit to indicate a proc is a sentinel
2015-09-11 12:30:27 -06:00
rhc54
c31093ff19 Merge pull request #890 from rhc54/topic/fixpmi
Revert "Revert "Fix the handling of cpusets so we get the correct cpu…
2015-09-11 09:25:24 -07:00
Nathan Hjelm
898a0a038c bml/r2: fix coverity CID 1323765
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-11 09:39:10 -06:00
Nathan Hjelm
64c8f124fc Use the low instead of the high bit to indicate a proc is a sentinel
The assumption that the high bit is not in use in pointers on any of our
supported platforms was incorrect. A better assumption is that all
ompi_proc_t pointers will be at least 2-byte aligned. This allows us
to use the low bit. To do this we drop the highest bit of the
opal_process_name_t jobid (hope this is ok) and use the low bit to
indicate the proc is really a sentinel.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-11 09:32:02 -06:00
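The low-bit tagging described in this commit can be sketched as follows. This is a minimal illustration, not the Open MPI code: the `demo_*` names, the 64-bit slot type, and the exact bit layout (31-bit jobid, 32-bit vpid, tag in bit 0) are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ompi_proc_t; only its alignment matters here. */
typedef struct demo_proc { int rank; } demo_proc_t;

/* A group slot holds either a real pointer or a tagged process name. */
typedef uint64_t demo_slot_t;

/* Pointers to objects aligned to 2 bytes or more always have bit 0 clear. */
static inline demo_slot_t demo_slot_from_proc(demo_proc_t *proc)
{
    return (demo_slot_t)(uintptr_t)proc;
}

/* Pack the name, shift it up by one, and set bit 0 as the sentinel tag.
 * The jobid loses its highest bit so that everything fits in 64 bits. */
static inline demo_slot_t demo_slot_from_name(uint32_t jobid, uint32_t vpid)
{
    uint64_t packed = ((uint64_t)(jobid & 0x7fffffffu) << 32) | vpid;
    return (packed << 1) | 1u;
}

static inline bool demo_slot_is_sentinel(demo_slot_t slot)
{
    return (slot & 1u) != 0;
}

/* Recover the name fields from a sentinel slot. */
static inline uint32_t demo_slot_jobid(demo_slot_t slot)
{
    return (uint32_t)(slot >> 33);
}

static inline uint32_t demo_slot_vpid(demo_slot_t slot)
{
    return (uint32_t)((slot >> 1) & 0xffffffffu);
}
```

Checking bit 0 depends only on object alignment, whereas the earlier high-bit scheme depended on how each platform lays out its address space.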
Ralph Castain
dc5796b8a1 Revert "Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local""
Fix the locality computation by correctly computing the vpid of the local peer

This reverts commit open-mpi/ompi@6a8fad49e5.
2015-09-11 08:29:51 -07:00
Ralph Castain
6a8fad49e5 Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local"
This reverts commit f94f3cda21.
2015-09-11 02:01:25 -07:00
rhc54
d4017d5ed4 Merge pull request #888 from rhc54/topic/pmix
Sync to PMIx master
2015-09-11 01:10:13 -07:00
Gilles Gouaillardet
a1627feaf7 coll/ml, bcol: fix prototypes (e.g. use the const modifier) 2015-09-11 13:20:44 +09:00
Ralph Castain
e0a52354d4 Sync to PMIx master at open-mpi/pmix@89680d6663
Includes changes to support BigEndian machines
2015-09-10 20:47:40 -07:00
Gilles Gouaillardet
8f2d3aeb65 oshmem: do not include pml/ob1 headers
this is an abstraction violation and that can cause linker failure
2015-09-11 09:34:10 +09:00
Gilles Gouaillardet
638a59adf3 fix compilation in heterogeneous mode
use OPAL_PMIX_GLOBAL instead of PMIX_GLOBAL
2015-09-11 09:23:21 +09:00
rhc54
a4a20a39df Merge pull request #887 from rhc54/topic/s1
Fix the s1 component so direct launch is supported for SLURM
2015-09-10 17:04:08 -07:00
Ralph Castain
a2a15cea8a Fix the s1 component so direct launch is supported for SLURM 2015-09-10 16:07:37 -07:00
rhc54
3430f154fc Merge pull request #885 from hppritcha/topic/pmix_not_pmix1xx_u16_prob
pmix/~pmix1xx: use u32 for OPAL_PMIX_LOCAL_SIZE
2015-09-10 15:38:54 -07:00
Nathan Hjelm
ad3a2ef6cc silence warnings introduced by add_procs merge
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 16:33:52 -06:00
Nathan Hjelm
2a269b52ee Merge pull request #886 from hjelmn/hwloc_numa_socket
opal/hwloc: fix topology detection when socket is above numa
2015-09-10 15:57:20 -06:00
Nathan Hjelm
899bf548a2 opal/hwloc: fix topology detection when socket is above numa
The OPAL_PROC_ON_* definitions have been changed from values to
flags. This should not cause any problems as these values were already
used as flags throughout the code base. Note, there will be a
difference between localities produced by the new code and the
old. For example, if a machine does not have a level-3 cache but two cores
share a level-1 or level-2 cache, the level-3 bit will not be set
in the locality and OPAL_PROC_ON_LOCAL_L3CACHE will return 0. Before
this change it would have returned 1.

In addition the OPAL_PROC_ON_LOCAL_* macros have been simplified.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 14:17:45 -06:00
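The flag semantics described in this commit can be illustrated with a small sketch. The `DEMO_*` names and bit values below are hypothetical and are not the real `OPAL_PROC_ON_*` definitions; the point is only that each level gets its own bit, set when that level exists and is shared.

```c
/* Hypothetical flag values; the real OPAL_PROC_ON_* constants may differ. */
#define DEMO_ON_NODE    0x01u
#define DEMO_ON_L3CACHE 0x02u
#define DEMO_ON_L2CACHE 0x04u
#define DEMO_ON_L1CACHE 0x08u

/* Each test looks at exactly one bit and ignores all the others. */
#define DEMO_ON_LOCAL_L3CACHE(loc) (((loc) & DEMO_ON_L3CACHE) != 0)
#define DEMO_ON_LOCAL_L2CACHE(loc) (((loc) & DEMO_ON_L2CACHE) != 0)
#define DEMO_ON_LOCAL_L1CACHE(loc) (((loc) & DEMO_ON_L1CACHE) != 0)

/* Build a locality mask for two processes on the same node. A level
 * contributes its bit only if the two processes actually share it. */
static unsigned demo_locality(int share_l3, int share_l2, int share_l1)
{
    unsigned loc = DEMO_ON_NODE;
    if (share_l3) loc |= DEMO_ON_L3CACHE;
    if (share_l2) loc |= DEMO_ON_L2CACHE;
    if (share_l1) loc |= DEMO_ON_L1CACHE;
    return loc;
}
```

With this layout, a machine that has no level-3 cache but shares a level-2 cache yields a mask with the L2 bit set and the L3 bit clear, which matches the behavior change the message describes.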
Jeff Squyres
0fd073b69e Merge pull request #882 from yburette/master
Update AUTHORS list.
2015-09-10 15:14:42 -04:00
Howard Pritchard
2bbf22e2d0 pmix/~pmix1xx: use u32 for OPAL_PMIX_LOCAL_SIZE
Looks like in ess_pmi_module.c u32 is being used
for retrieving OPAL_PMIX_LOCAL_SIZE, while s1/s2/cray
pmix components were storing as u16.

This commit fixes this problem.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-09-10 11:41:39 -07:00
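One way a u16/u32 mismatch like this can surface is through a type-checked getter. The sketch below is an assumption about the general pattern, not the actual PMIx or OPAL value API; all `demo_*` names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical tagged value, loosely modeled on a typed key-value store. */
typedef enum { DEMO_UINT16, DEMO_UINT32 } demo_type_t;

typedef struct {
    demo_type_t type;
    union {
        uint16_t u16;
        uint32_t u32;
    } data;
} demo_value_t;

/* Returns 0 on success, or -1 if the stored type is not u32. */
static int demo_get_u32(const demo_value_t *value, uint32_t *out)
{
    if (value->type != DEMO_UINT32) {
        return -1;
    }
    *out = value->data.u32;
    return 0;
}

/* The producer has to store with the same type the consumer asks for. */
static demo_value_t demo_store_u32(uint32_t v)
{
    demo_value_t value;
    value.type = DEMO_UINT32;
    value.data.u32 = v;
    return value;
}

/* A producer that stores u16 would make demo_get_u32 fail. */
static demo_value_t demo_store_u16(uint16_t v)
{
    demo_value_t value;
    value.type = DEMO_UINT16;
    value.data.u16 = v;
    return value;
}
```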
rhc54
6c0767222d Merge pull request #884 from rhc54/topic/cpuset
Fix the handling of cpusets so we get the correct cpuset for each loc…
2015-09-10 11:28:17 -07:00
Ralph Castain
f94f3cda21 Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local 2015-09-10 10:25:30 -07:00
Nathan Hjelm
6a0c7b85bf Merge pull request #849 from hjelmn/add_procs
New add_procs behavior
2015-09-10 10:51:56 -06:00
Nathan Hjelm
ed005f2a61 ompi/dpm: improve scalability of ompi_dpm_mark_dyncomm
This commit removes the use of ompi_group_peer_lookup in the
ompi_dpm_mark_dyncomm function. The function now uses
ompi_group_get_proc_name which does not allocate an ompi_proc_t if one
does not already exist.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 10:50:58 -06:00
yohann
4dc5cad313 Update AUTHORS list. 2015-09-10 09:47:57 -07:00
Jeff Squyres
2b8b544f2c Merge pull request #880 from Zhiming-Wang/master
Update AUTHORS
2015-09-10 10:57:15 -04:00
Nathan Hjelm
202c6a38e4 scoll/mpi: work around bug in oshmem/proc design
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:56 -06:00
Jeff Squyres
f7d90abf42 usnic: update for new add_procs() downcall behavior 2015-09-10 08:55:55 -06:00
Jeff Squyres
2f2d5ff855 btl.h: update comment for new add_procs behavior 2015-09-10 08:55:55 -06:00
Nathan Hjelm
987e865c99 mtl/psm2: add support for dynamic add_procs
Add an accessor for the proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL]
member of the ompi_proc_t structure. This accessor calls add_procs
with the ompi_proc_t if the member is NULL.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
8df9b1d40d mtl/psm: add support for dynamic add_procs
Add an accessor for the proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL]
member of the ompi_proc_t structure. This accessor calls add_procs
with the ompi_proc_t if the member is NULL. Tested on an infinipath
system with InfiniPath_QLE7340 HCAs.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
0a0e6d8eef ompi/group: clean up union/difference code
Updated the union/difference code to remove an extra n^2 translation
of ranks. This comes at the cost of extra memory but greatly
simplifies the code.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
5b7943db78 ompi/group: do not allocate ompi_proc_t's on group union/difference
This commit modifies the ompi_group_t union/difference code to compare/copy the
raw group values. This will either be an ompi_proc_t or a sentinel value. This
commit also adds helper functions to convert between opal process names and
sentinel values.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
2041aac4e4 btl/openib: add support for dynamic add_procs
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
40067f7ec4 btl/tcp: add support for dynamic add_procs
This commit makes two changes to the tcp btl:

 - If a tcp proc does not exist when handling a new connection create
   a new proc and use it. The current implementation uses the
   opal_proc_by_name() function to get the opal_proc_t then calls
   add_procs on all btl modules. It may be sufficient to just call
   add_procs until an endpoint is created so this may change somewhat.

 - In add_procs add a check for an existing endpoint before creating
   one.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
536aba1172 btl/portals4: add send flag to btl_flags 2015-09-10 08:55:55 -06:00
Nathan Hjelm
d8b0a6efda Remove use of ompi_comm_peer_lookup in osc/sm
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Nathan Hjelm
a41889112c Remove calls to ompi_group_peer_lookup in coll/sm and coll/fca
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Nathan Hjelm
0bf06de3f1 group|comm: add initial support for group sentinel values
This commit modifies ompi's process list group object to support a
sentinel value for non-existent ompi_proc_t objects. The sentinel was
chosen to be the negative of the opal_process_name_t of the associated
ompi_proc_t. This takes advantage of the fact that on most (all?)
systems the top bit of a user-space pointer is never set. If this
changes then a new sentinel will be needed.

In addition this commit modifies the way ompi_mpi_comm_world is
initialized to fill in the group with sentinel values if the number of
processes exceeds the new add_procs behavior cutoff.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Nathan Hjelm
408da16d50 ompi/proc: add proc hash table for ompi_proc_t objects
This commit adds an opal hash table to keep track of mapping between
process identifiers and ompi_proc_t's. This hash table is used by the
ompi_proc_by_name() function to lookup (in O(1) time) a given
process. This can be used by a BTL or other component to get an
ompi_proc_t when handling an incoming message from an as yet unknown
peer.

Additionally, this commit adds a new MCA variable to control the new
add_procs behavior: mpi_add_procs_cutoff. If the number of ranks in
the process falls below the threshold an ompi_proc_t is created for
every process. If the number of ranks is above the threshold then an
ompi_proc_t is only created for the local rank. The code needed to
generate additional ompi_proc_t's for a communicator is not yet
complete.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
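The combination of a name-to-proc hash table and a cutoff can be sketched as below. This is an illustration under assumptions, not the Open MPI implementation: the chained table stands in for the opal hash table, the `demo_*` names are hypothetical, and treating a rank count equal to the cutoff as "below" is a choice made only for this example.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_BUCKETS 64u

/* Hypothetical proc object keyed by a 64-bit process name. */
typedef struct demo_proc {
    uint64_t name;
    struct demo_proc *next;
} demo_proc_t;

typedef struct {
    demo_proc_t *buckets[DEMO_BUCKETS];
    size_t count;
} demo_proc_table_t;

/* Look a proc up by name; optionally create it if it is not yet known. */
static demo_proc_t *demo_proc_by_name(demo_proc_table_t *t, uint64_t name,
                                      int create)
{
    demo_proc_t **head = &t->buckets[name % DEMO_BUCKETS];
    for (demo_proc_t *p = *head; p != NULL; p = p->next) {
        if (p->name == name) {
            return p;
        }
    }
    if (!create) {
        return NULL;
    }
    demo_proc_t *p = malloc(sizeof *p);
    if (p == NULL) {
        return NULL;
    }
    p->name = name;
    p->next = *head;
    *head = p;
    t->count++;
    return p;
}

/* At or below the cutoff every rank gets a proc up front; above it only
 * the local rank does, and the rest are created on first lookup. */
static size_t demo_proc_init(demo_proc_table_t *t, uint64_t my_rank,
                             uint64_t nranks, uint64_t cutoff)
{
    if (nranks <= cutoff) {
        for (uint64_t r = 0; r < nranks; r++) {
            (void)demo_proc_by_name(t, r, 1);
        }
    } else {
        (void)demo_proc_by_name(t, my_rank, 1);
    }
    return t->count;
}

/* Release every proc in the table. */
static void demo_proc_table_free(demo_proc_table_t *t)
{
    for (size_t b = 0; b < DEMO_BUCKETS; b++) {
        while (t->buckets[b] != NULL) {
            demo_proc_t *p = t->buckets[b];
            t->buckets[b] = p->next;
            free(p);
        }
    }
    t->count = 0;
}
```

A component that receives a message from an unknown peer would call the lookup with `create` set, so the per-process cost is paid only for peers that actually communicate.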
Nathan Hjelm
6f8f2325ed btl: btls are now required to set the send flag if supported
This commit updates each non-compliant btl to send the
MCA_BTL_FLAGS_SEND flag in the btl_flags field if send is
supported. This fixes a problem identified after the latest bml/r2
update which explicitly checks for the send flag.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Nathan Hjelm
b4a0d40915 pml/ob1: Add support for dynamically calling add_procs
This commit contains the following changes:

 - pml/ob1: use the bml accessor function when requesting a bml
   endpoint. this will ensure that bml endpoints are only created when
   needed. for example, a bml endpoint is not requested and not
   allocated when receiving an eager message from a peer.

 - pml/ob1: change the pml_procs array in the ob1 communicator to a
   proc pointer array. at the cost of a single level of extra
   redirection this will allow us to allocate pml procs on demand.

 - pml/ob1: add an accessor function to access the pml proc structure
   for a given peer. this function will allocate the proc if it
   doesn't already exist.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
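The pointer-array-plus-accessor pattern from this commit can be sketched roughly as follows. The types and the `demo_peer_lookup` name are hypothetical stand-ins, not the ob1 structures; the sketch only shows how one extra level of indirection allows per-peer state to be allocated on first use.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-peer state kept by the messaging layer. */
typedef struct {
    uint32_t expected_sequence;
} demo_pml_proc_t;

/* The communicator stores pointers, so unused peers cost one NULL each. */
typedef struct {
    demo_pml_proc_t **procs;
    size_t size;
} demo_comm_t;

/* Return the peer's state, allocating it zero-initialized on first access.
 * Returns NULL for an out-of-range rank or on allocation failure. */
static demo_pml_proc_t *demo_peer_lookup(demo_comm_t *comm, size_t rank)
{
    if (rank >= comm->size) {
        return NULL;
    }
    if (comm->procs[rank] == NULL) {
        comm->procs[rank] = calloc(1, sizeof(demo_pml_proc_t));
    }
    return comm->procs[rank];
}
```

All callers go through the accessor rather than indexing the array directly, which is what keeps the lazy allocation invisible to the rest of the code.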
Nathan Hjelm
6fa6513003 bml: Add support for dynamically calling add_procs
This commit contains the following changes:

 - bml: add a function to add a single process. this function is
   intended to remove the need to maintain an opal_bitmap_t as it is
   irrelevant for a single proc. BTLs will need to be updated to
   either 1) ignore the return code from opal_bitmap_set_bit or 2) not
   call the function if the reachability bitmap is NULL.

 - bml: add an inline accessor function for getting the bml endpoint
   for a peer proc. this function will either 1) return the cached bml
   endpoint, or 2) create the endpoint and call add_proc with all
   available BTL modules.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
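The second option for BTLs, skipping the bitmap update when no reachability bitmap is supplied, might look like the sketch below. The bitmap type and functions are simplified hypothetical stand-ins for `opal_bitmap_t` and `opal_bitmap_set_bit`, and the add_procs signature is invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size bitmap standing in for opal_bitmap_t. */
typedef struct {
    uint64_t bits;
} demo_bitmap_t;

/* Returns 0 on success, or -1 if the bit index is out of range. */
static int demo_bitmap_set_bit(demo_bitmap_t *bitmap, size_t bit)
{
    if (bit >= 64) {
        return -1;
    }
    bitmap->bits |= (uint64_t)1 << bit;
    return 0;
}

/* Hypothetical BTL add_procs: marks each peer reachable, but only when
 * the caller passed a bitmap. The single-proc path passes NULL. */
static size_t demo_btl_add_procs(size_t nprocs, demo_bitmap_t *reachable)
{
    size_t added = 0;
    for (size_t i = 0; i < nprocs; i++) {
        /* Endpoint creation for peer i would happen here. */
        if (reachable != NULL) {
            (void)demo_bitmap_set_bit(reachable, i);
        }
        added++;
    }
    return added;
}
```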
Zhiming Wang
88ff560068 Update AUTHORS
Add myself into "AUTHORS".
2015-09-10 21:28:38 +08:00
rhc54
6ddb8e8b9b Merge pull request #878 from rhc54/topic/bige
Sync to latest PMIx master, including some BigEndian fixes
2015-09-09 12:31:22 -07:00
Ralph Castain
4c47c498ac Sync to latest PMIx master
Allow the blocking send and recv to keep trying
2015-09-09 11:48:47 -07:00
Matias Cabral
f360eebfeb Merge pull request #855 from matcabral/btl_openib_mtu
Fix for openib btl mca command line parameter btl_openib_mtu being ignored
2015-09-09 11:22:00 -07:00
Ralph Castain
b79cffc73b Protect ourselves - if the active pmix component doesn't have some optional functions, then gracefully decline to perform the operation OR use a required alternative (e.g., fence in place of disconnect)
This fixes the Slurm pmi2 support - still something wrong in pmi1
2015-09-09 02:29:00 -07:00
Gilles Gouaillardet
7f0ed74d24 pmix1xx: fix CPPFLAGS when DSO are not built 2015-09-09 14:20:12 +09:00
rhc54
f6b6b9a9ca Merge pull request #877 from rhc54/topic/s1s2
Cleanup s1 and s2 components
2015-09-08 19:20:59 -07:00