1
1
Граф коммитов

8404 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
638a59adf3 fix compilation in heterogeneous mode
use OPAL_PMIX_GLOBAL instead of PMIX_GLOBAL
2015-09-11 09:23:21 +09:00
Nathan Hjelm
ad3a2ef6cc silence warnings introduced by add_procs merge
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 16:33:52 -06:00
Ralph Castain
f94f3cda21 Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local 2015-09-10 10:25:30 -07:00
Nathan Hjelm
ed005f2a61 ompi/dpm: improve scalability of ompi_dpm_mark_dyncomm
This commit removes the use of ompi_group_peer_lookup in the
ompi_dpm_mark_dyncomm function. The function now uses
ompi_group_get_proc_name which does not allocate an ompi_proc_t if one
does not already exist.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 10:50:58 -06:00
Nathan Hjelm
987e865c99 mtl/psm2: add support for dynamic add_procs
Add an accessor for the proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL]
member of the ompi_proc_t structure. This accessort calls add_procs
with the ompi_proc_t if the member is NULL.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
8df9b1d40d mtl/psm: add support for dynamic add_procs
Add an accessor for the proc_endpoints[OMPI_PROC_ENDPOINT_TAG_MTL]
member of the ompi_proc_t structure. This accessort calls add_procs
with the ompi_proc_t if the member is NULL. Tested on an infinipath
system with InfiniPath_QLE7340 HCAs.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
0a0e6d8eef ompi/group: clean up union/difference code
Updated the union/difference code to remove an extra n^2 translation
of ranks. This comes at the cost of extra memory but greatly
simplifies the code.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
5b7943db78 ompi/group: do not allocate ompi_proc_t's on group union/difference
This commit modifies the ompi_group_t union/difference code to compare/copy the
raw group values. This will either be a ompi_proc_t or a sentinel value. This
commit also adds helper functions to convert between opal process names and
sentinel values.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
d8b0a6efda Remove use of ompi_comm_peer_lookup in osc/sm
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Nathan Hjelm
a41889112c Remove calls to ompi_group_peer_lookup in coll/sm and coll/fca
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Nathan Hjelm
0bf06de3f1 group|comm: add initial support for group sentinel values
This commit modifies ompi's process list group object to support a
sentinel value for non-existant ompi_proc_t objects. The sentinel was
chosen to be the negative of the opal_process_name_t of the associated
ompi_proc_t. This takes advantage of the fact that on most (all?)
systems the top bit of a user-space pointer is never set. If this
changes then a new sentinel will be needed.

In addition this commit modifies the way ompi_mpi_comm_world is
initialized to fill in the group with sentinel values if the number of
processes exceeds the new add_procs behavior cutoff.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Nathan Hjelm
408da16d50 ompi/proc: add proc hash table for ompi_proc_t objects
This commit adds an opal hash table to keep track of mapping between
process identifiers and ompi_proc_t's. This hash table is used by the
ompi_proc_by_name() function to lookup (in O(1) time) a given
process. This can be used by a BTL or other component to get a
ompi_proc_t when handling an incoming message from an as yet unknown
peer.

Additionally, this commit adds a new MCA variable to control the new
add_procs behavior: mpi_add_procs_cutoff. If the number of ranks in
the process falls below the threshold a ompi_proc_t is created for
every process. If the number of ranks is above the threshold then a
ompi_proc_t is only created for the local rank. The code needed to
generate additional ompi_proc_t's for a communicator is not yet
complete.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Nathan Hjelm
b4a0d40915 pml/ob1: Add support for dynamically calling add_procs
This commit contains the following changes:

 - pml/ob1: use the bml accessor function when requesting a bml
   endpoint. this will ensure that bml endpoints are only created when
   needed. for example, a bml endpoint is not requested and not
   allocated when receiving an eager message from a peer.

 - pml/ob1: change the pml_procs array in the ob1 communicator to a
   proc pointer array. at the cost of a single level of extra
   redirection this will allow us to allocate pml procs on demand.

 - pml/ob1: add an accessor function to access the pml proc structure
   for a given peer. this function will allocate the proc if it
   doesn't already exist.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Nathan Hjelm
6fa6513003 bml: Add support for dynamically calling add_procs
This commit contains the following changes:

 - bml: add a function to add a single process. this function is
   intended to remove the need to maintain a opal_bitmap_t as it is
   irrelevant for a single proc. BTLs will need to be updated to
   either 1) ignore the return code from opal_bitmap_set_bit or not
   call the function if the reachability bitmap is NULL.

 - bml: add an inline accessor function for getting the bml endpoint
   for a peer proc. this function will either 1) return the cached bml
   endpoint, or 2) create the endpoint and call add_proc will all
   available BTL modules.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Ralph Castain
b79cffc73b Protect ourselves - if the active pmix component doesn't have some optional functions, then gracefully decline to perform the operation OR use a required alternative (e.g., fence in place of disconnect)
This fixes the Slurm pmi2 support - still something wrong in pmi1
2015-09-09 02:29:00 -07:00
Gilles Gouaillardet
fe351f6801 io: do not cast way the const modifier when this is not necessary
update the io framework and mpi c bindings
2015-09-09 09:18:58 +09:00
Gilles Gouaillardet
e01bac962f coll: do not cast way the const modifier when this is not necessary
update the coll framework and mpi c bindings
2015-09-09 09:18:57 +09:00
Gilles Gouaillardet
6e6a3e965c pml: do not cast way the const modifier when this is not necessary
update the pml framework and mpi c bindings
2015-09-09 09:18:57 +09:00
Gilles Gouaillardet
43ef261d46 topo: do not cast way the const modifier when this is not necessary
update the topo framework and mpi c bindings
2015-09-09 09:18:57 +09:00
rhc54
3a446c9797 Merge pull request #876 from rhc54/topic/hnp
Fix segfault upon job error
2015-09-08 15:10:51 -07:00
Ralph Castain
459f169e06 Fix segfault upon job error
Silence some unnecessary error-logs
2015-09-08 14:03:06 -07:00
Ralph Castain
ae7156cabb Stop a segfault in the test by correctly passing all the argv during spawn 2015-09-08 13:42:46 -07:00
Jeff Squyres
bc9e5652ff whitespace: purge whitespace at end of lines
Generated by running "./contrib/whitespace-purge.sh".
2015-09-08 09:47:17 -07:00
Edgar Gabriel
c83e6ad0c8 fix coverty warnings 1322865 and 72136 2015-09-08 09:15:57 -05:00
Ralph Castain
e6add86e4f Deal with connect/accept between two jobs from different mpirun's. Somewhat optimize connect/accept by using MPI bcast to distribute the participants instead of another PMIx lookup. Cleanup some Coverity issues. 2015-09-07 09:19:24 -07:00
Gilles Gouaillardet
c404e98dce coll/ml: silence warnings (incorrect callback prototype) 2015-09-07 14:56:49 +09:00
Gilles Gouaillardet
56f8a7b840 coll/ml: declare a global variable as static to avoid an uninitialized common symbol. 2015-09-07 14:56:03 +09:00
Ralph Castain
37c3ed68e7 Cleanup connect/disconnect and bring comm_spawn back online! 2015-09-06 10:27:39 -07:00
Jeff Squyres
794ee4a604 treematch: remove stale test
This test was accidentally left over from
open-mpi/ompi@d97bc29102 that prevented
the treematch component from building.
2015-09-05 05:02:30 -07:00
rhc54
665b30376a Merge pull request #868 from rhc54/topic/hwloc
Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given
2015-09-04 17:58:07 -07:00
Ralph Castain
d97bc29102 Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given 2015-09-04 16:54:40 -07:00
rhc54
d45ccda813 Merge pull request #866 from rhc54/topic/updatepmix
Update PMIx support
2015-09-04 11:09:36 -07:00
Ralph Castain
f6948c2bb4 Sync with PMIx master 43e45c3. Get multi-node publish/lookup/unpublish working 2015-09-04 10:07:17 -07:00
Pavel Shamis / Pasha
c3446f363b Merge pull request #859 from shamisp/topic/ml_soft_disable
ML: Replace opal ignore with a zero priority
2015-09-04 12:37:37 -04:00
Pavel Shamis (Pasha)
32c69630ad ML: Replace opal ignore with a zero priority
The priority set by default to 0. As a result component open reports
an error and the component is not loaded (no resources allocated).
2015-09-04 11:28:47 -04:00
yohann
404393b9d7 mtl/ofi: Minor code cleanup. 2015-09-03 15:04:55 -07:00
yohann
a8cac09769 mtl/ofi: Renamed macro to prevent clash with FI_ namespace. 2015-09-03 14:42:45 -07:00
yohann
7adb9b7ab4 mtl/ofi: Handle -FI_EAGAIN on send and recv operations. 2015-09-03 10:47:00 -07:00
Edgar Gabriel
c9710660af Merge pull request #863 from edgargabriel/topic/fcoll-static-cleanup
Topic/fcoll static cleanup
2015-09-03 11:21:02 -05:00
Edgar Gabriel
a96a15a83c re-enable the contiguous buffer optimization similarly to the dynamic component. Passes all hdf5testsi and our own test suite.
Please enter the commit message for your changes. Lines starting
2015-09-03 10:13:03 -05:00
Edgar Gabriel
8007effc93 code cleanup for static component, similarly to the dynamic one 2015-09-03 10:12:45 -05:00
Jeff Squyres
6d9faf07e5 Merge pull request #858 from jsquyres/pr/fortran-use-only
fortran configiry: test for USE...ONLY support
2015-09-03 10:19:48 -04:00
Edgar Gabriel
ac3a01c39c Silence coverty warnings 1321702, 1321701, 1321700, 72331, 72330, 72327, 72326, 72325, 2015-09-03 09:10:25 -05:00
Jeff Squyres
66dda00f06 fortran configiry: test for USE...ONLY support
As of v15.7, the PGI Fortran compiler does not properly support how
Open MPI uses the "USE ... ONLY" Fortran syntax to include modules
with conflicting symbol definitions (interestingly, pgfortran only has
a problem with this when compiling with -g).

In short, OMPI uses "USE :: module_aaa, ONLY: foo" and "USE ::
module_bbb, ONLY: bar" to use modules aaa and bbb, even though they
contain conflicting definitions for some symbols.  However, the use of
the ONLY clause should preclude the inclusion of the conflicting
symbols -- as the word implies, it should direct the compiler to
*only* use the symbols identified by the clause (i.e., foo and bar, in
this example).

This commit adds a configure test for this capability.  If the
compiler fails to build a simple test that mimics this behavior, then
disable the mpi_f08 bindings.

Fixes open-mpi/ompi#857
2015-09-02 15:55:24 -07:00
Ralph Castain
a772b46c15 Bring the MPI_Publish and friends online 2015-09-02 12:04:07 -07:00
Edgar Gabriel
e95d01be97 Merge pull request #847 from edgargabriel/topic/fcoll-dynamic-cleanup
Topic/fcoll dynamic cleanup
2015-09-01 16:10:55 -05:00
Nathan Hjelm
2a8cc5e637 osc/pt2pt: remove outstanding lock only after lock/flush ack received
fixes #840

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-01 10:54:47 -06:00
Edgar Gabriel
82efc23e8d iclean up indenting and tabs/space of fcoll_static_file_read/write_all 2015-09-01 09:39:33 -05:00
Edgar Gabriel
a1778406d6 Re-enable the contiguous buffer optimization to the read_all and the write_all routines.
After long debugging, I found last week the reason this optimization originally broke
some hdf5 tests. We now pass the hdf5 test suite with the optimization being actively used.
2015-09-01 09:29:07 -05:00
Edgar Gabriel
c2c44b11dc Code cleanup for dynamic read_all and write_all
Specifically:
 - reduce the number of realloc's and malloc's by moving
   some arrays out of the cycle loop, if we know that there
   size is not changing
 - store the rank of the aggregator in a separate variable to avoid
   continuous dereferencing
 - change the wait_all logic in write_all to use a fix number of requests
   (even if they are MPI_REQUEST_NULL)
 - fix the timing to considere the two initial allgather and the one
   allgatherv operation to be a part of it
 - add more comments.
2015-09-01 09:29:07 -05:00