Rolf vandeVaart
7da614c75e
Add ability for user to empty the CUDA IPC registration cache when it is full
2015-09-17 16:42:16 -04:00
Gilles Gouaillardet
975b6fd51b
hwloc: do not count not allowed cores in df_search_cores
2015-09-17 13:10:34 +09:00
Ralph Castain
1b7930ad52
Silence some warnings and address Coverity issues
2015-09-16 07:58:22 -07:00
rhc54
5597416fe0
Merge pull request #897 from rhc54/topic/oob
...
Remove the last involvement of the OOB system from the MPI layer
2015-09-15 14:40:21 -07:00
Ralph Castain
c1bbbb5e2f
Remove the last involvement of the OOB system from the MPI layer, remove the no-longer-needed usock/oob component, and have procs no longer open the RML, OOB, ROUTED, and GRPCOMM frameworks as PMIx now provides all required app-mpirun cmds
2015-09-15 13:08:35 -07:00
Rolf vandeVaart
555f14a479
Merge pull request #893 from rolfv/pr/more-verbose-fix
...
Cleanup handle verbose messages
2015-09-15 15:45:52 -04:00
George Bosilca
0e7e14449f
Typo in the modex_recv.
2015-09-14 18:00:02 -04:00
Rolf vandeVaart
34fe2188cd
Cleanup handle verbose messages
2015-09-14 11:01:25 -04:00
Gilles Gouaillardet
d5af5d106c
btl/sm: mca_btl_sm_sendi: do not set *descriptor when descriptor is NULL
2015-09-14 14:04:40 +09:00
rhc54
33f5e4c766
Merge pull request #892 from rhc54/topic/pmix
...
Fix the no-disconnect test by resolving a segfault on free - opal_dss…
2015-09-11 16:01:42 -07:00
Ralph Castain
22d7c0081a
Fix the no-disconnect test by resolving a segfault on free - opal_dss.unload will return the remaining unpacked portion of a buffer. As such, it cannot return the pointer to that info as it might be partway inside of a malloc'd region. So copy the data out of the buffer.
2015-09-11 13:01:35 -07:00
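The copy-out approach described in 22d7c0081a can be sketched as follows. This is a minimal illustration, not the actual opal_dss code: the `ex_buffer_t` type and `ex_copy_remaining()` helper are hypothetical names. The point it shows is that the unpack pointer may sit partway inside a malloc'd region, so the remaining bytes are copied into a fresh allocation that the caller can safely free.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer layout for illustration only; not the opal_buffer_t definition. */
typedef struct {
    unsigned char *base;       /* start of the malloc'd region */
    unsigned char *unpack_ptr; /* current unpack position, may be past base */
    size_t bytes_used;         /* number of valid bytes counted from base */
} ex_buffer_t;

/*
 * Return a freshly allocated copy of the not-yet-unpacked bytes.
 * Handing back unpack_ptr itself would be unsafe: free() on a pointer
 * into the middle of an allocation is undefined behavior.
 */
static unsigned char *ex_copy_remaining(const ex_buffer_t *buf, size_t *len)
{
    size_t consumed = (size_t)(buf->unpack_ptr - buf->base);
    size_t remaining = buf->bytes_used - consumed;
    unsigned char *copy = NULL;

    *len = remaining;
    if (remaining > 0) {
        copy = malloc(remaining);
        if (copy != NULL) {
            memcpy(copy, buf->unpack_ptr, remaining);
        }
    }
    return copy; /* caller owns this allocation and may free() it */
}
```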
Rolf vandeVaart
90dd1d264b
Fix cuda verbosity messages
2015-09-11 15:44:36 -04:00
Ralph Castain
dc5796b8a1
Revert "Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local""
...
Fix the locality computation by correctly computing the vpid of the local peer
This reverts commit open-mpi/ompi@6a8fad49e5.
2015-09-11 08:29:51 -07:00
Ralph Castain
6a8fad49e5
Revert "Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local"
...
This reverts commit f94f3cda214ab937c46802896fb53b84bec6cc3a.
2015-09-11 02:01:25 -07:00
Ralph Castain
e0a52354d4
Sync to PMIx master at open-mpi/pmix@89680d6663
...
Includes changes to support BigEndian machines
2015-09-10 20:47:40 -07:00
Ralph Castain
a2a15cea8a
Fix the s1 component so direct launch is supported for SLURM
2015-09-10 16:07:37 -07:00
rhc54
3430f154fc
Merge pull request #885 from hppritcha/topic/pmix_not_pmix1xx_u16_prob
...
pmix/~pmix1xx: use u32 for OPAL_PMIX_LOCAL_SIZE
2015-09-10 15:38:54 -07:00
Nathan Hjelm
899bf548a2
opal/hwloc: fix topology detection when socket is above numa
...
The OPAL_PROC_ON_* definitions have been changed from values to
flags. This should not cause any problems as these values were already
used as flags throughout the code base. Note, there will be a
difference between localities produced by the new code and the
old. For example, if a machine does not have a level-3 cache but two cores
share a level-1 or level-2 cache, the level-3 bit will not be set
in the locality and OPAL_PROC_ON_LOCAL_L3CACHE will return 0. Before
this change it would have returned 1.
In addition the OPAL_PROC_ON_LOCAL_* macros have been simplified.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 14:17:45 -06:00
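The flag semantics described in 899bf548a2 can be illustrated with a small sketch. The `LOC_*` names and bit positions below are hypothetical and are not the actual OPAL_PROC_ON_* definitions. The sketch only shows that, with independent flag bits, sharing an L2 cache does not by itself imply that the L3 bit is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical locality bits for illustration; not the real OPAL values. */
enum {
    LOC_ON_NODE   = 1 << 0,
    LOC_ON_SOCKET = 1 << 1,
    LOC_ON_L3     = 1 << 2,
    LOC_ON_L2     = 1 << 3,
    LOC_ON_L1     = 1 << 4,
    LOC_ON_CORE   = 1 << 5
};

/* Flag-style test: true only when that specific bit is present. */
static bool loc_has(uint16_t locality, uint16_t flag)
{
    return (locality & flag) != 0;
}

/* Locality of two peers on a machine with no L3 cache that share an L2. */
static uint16_t loc_example_no_l3(void)
{
    return (uint16_t)(LOC_ON_NODE | LOC_ON_SOCKET | LOC_ON_L2);
}
```

With this encoding, `loc_has(loc_example_no_l3(), LOC_ON_L3)` is false even though the L2 bit is set, which matches the behavior change the commit message calls out.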
Howard Pritchard
2bbf22e2d0
pmix/~pmix1xx: use u32 for OPAL_PMIX_LOCAL_SIZE
...
Looks like in ess_pmi_module.c u32 is being used
for retrieving OPAL_PMIX_LOCAL_SIZE, while s1/s2/cray
pmix components were storing as u16.
This commit fixes this problem.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2015-09-10 11:41:39 -07:00
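One way a u16/u32 mismatch like the one fixed in 2bbf22e2d0 can surface is through a type-checked value store. The sketch below assumes such a store; the `typed_val_t` type and `get_u32()` helper are hypothetical and do not reproduce the PMIx component code. It shows a reader that asks for u32 failing when the writer stored u16.

```c
#include <stdint.h>

/* Hypothetical tagged value for illustration only. */
typedef enum { VAL_UINT16, VAL_UINT32 } val_type_t;

typedef struct {
    val_type_t type;
    union {
        uint16_t u16;
        uint32_t u32;
    } data;
} typed_val_t;

/* Retrieve a u32; returns 0 on success, -1 if the stored type differs. */
static int get_u32(const typed_val_t *v, uint32_t *out)
{
    if (v->type != VAL_UINT32) {
        return -1; /* writer and reader disagree on the width */
    }
    *out = v->data.u32;
    return 0;
}
```

Storing the value as u32 on the writer side, as the commit does, makes both sides agree.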
Ralph Castain
f94f3cda21
Fix the handling of cpusets so we get the correct cpuset for each local peer. Add the ability to indicate that a modex request is "optional" so we don't call the server if we don't find the value. Take advantage of that to allow the MPI layer to decide that the lack of locality info indicates non-local
2015-09-10 10:25:30 -07:00
Jeff Squyres
f7d90abf42
usnic: update for new add_procs() downcall behavior
2015-09-10 08:55:55 -06:00
Jeff Squyres
2f2d5ff855
btl.h: update comment for new add_procs behavior
2015-09-10 08:55:55 -06:00
Nathan Hjelm
2041aac4e4
btl/openib: add support for dynamic add_procs
...
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
40067f7ec4
btl/tcp: add support for dynamic add_procs
...
This commit makes two changes to the tcp btl:
- If a tcp proc does not exist when handling a new connection, create
a new proc and use it. The current implementation uses the
opal_proc_by_name() function to get the opal_proc_t then calls
add_procs on all btl modules. It may be sufficient to just call
add_procs until an endpoint is created so this may change somewhat.
- In add_procs add a check for an existing endpoint before creating
one.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:55 -06:00
Nathan Hjelm
536aba1172
btl/portals4: add send flag to btl_flags
2015-09-10 08:55:55 -06:00
Nathan Hjelm
408da16d50
ompi/proc: add proc hash table for ompi_proc_t objects
...
This commit adds an opal hash table to keep track of mapping between
process identifiers and ompi_proc_t's. This hash table is used by the
ompi_proc_by_name() function to look up (in O(1) time) a given
process. This can be used by a BTL or other component to get an
ompi_proc_t when handling an incoming message from an as yet unknown
peer.
Additionally, this commit adds a new MCA variable to control the new
add_procs behavior: mpi_add_procs_cutoff. If the number of ranks in
the process falls below the threshold an ompi_proc_t is created for
every process. If the number of ranks is above the threshold then an
ompi_proc_t is only created for the local rank. The code needed to
generate additional ompi_proc_t's for a communicator is not yet
complete.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
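The cutoff rule described in 408da16d50 can be sketched as a single decision. The `procs_to_create()` helper is hypothetical and is not part of the Open MPI code base. The commit message does not say what happens when the rank count equals the cutoff, so treating that case as "above" is an assumption made here.

```c
#include <stddef.h>

/*
 * Hypothetical helper: how many proc objects to create at startup.
 * Below the cutoff, one per rank; otherwise only the local rank.
 */
static size_t procs_to_create(size_t nranks, size_t cutoff)
{
    /* Assumption: nranks == cutoff is handled like the "above" case. */
    return (nranks < cutoff) ? nranks : 1;
}
```

For example, with a cutoff of 1024 a 4-rank job would create 4 proc objects up front, while a 4096-rank job would create only the one for the local rank.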
Nathan Hjelm
6f8f2325ed
btl: btls are now required to set the send flag if supported
...
This commit updates each non-compliant btl to set the
MCA_BTL_FLAGS_SEND flag in the btl_flags field if send is
supported. This fixes a problem identified after the latest bml/r2
update which explicitly checks for the send flag.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
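The kind of check that 6f8f2325ed refers to can be sketched as a bit test on the flags field. The `EX_` names, flag values, and struct below are hypothetical stand-ins and are not the definitions from btl.h. The sketch shows why a btl that supports send but leaves the bit clear would be skipped by such a check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values and module type for illustration only. */
#define EX_BTL_FLAGS_SEND 0x0001u
#define EX_BTL_FLAGS_PUT  0x0002u

typedef struct {
    uint32_t btl_flags;
} ex_btl_module_t;

/* True only if the module explicitly advertises send support. */
static bool ex_btl_supports_send(const ex_btl_module_t *btl)
{
    return (btl->btl_flags & EX_BTL_FLAGS_SEND) != 0;
}
```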
Ralph Castain
4c47c498ac
Sync to latest PMIx master
...
Allow the blocking send and recv to keep trying
2015-09-09 11:48:47 -07:00
Matias Cabral
f360eebfeb
Merge pull request #855 from matcabral/btl_openib_mtu
...
Fix for openib btl mca command line parameter btl_openib_mtu being ignored
2015-09-09 11:22:00 -07:00
Gilles Gouaillardet
7f0ed74d24
pmix1xx: fix CPPFLAGS when DSO are not built
2015-09-09 14:20:12 +09:00
rhc54
f6b6b9a9ca
Merge pull request #877 from rhc54/topic/s1s2
...
Cleanup s1 and s2 components
2015-09-08 19:20:59 -07:00
Ralph Castain
1cdb86b8c7
Cleanup s1 and s2 components, and ensure that mpirun and orteds only use non-direct-launch pmix components.
2015-09-08 18:37:09 -07:00
Gilles Gouaillardet
6e6a3e965c
pml: do not cast away the const modifier when this is not necessary
...
update the pml framework and mpi c bindings
2015-09-09 09:18:57 +09:00
rhc54
3a446c9797
Merge pull request #876 from rhc54/topic/hnp
...
Fix segfault upon job error
2015-09-08 15:10:51 -07:00
rhc54
47f437608d
Merge pull request #875 from rhc54/topic/dynamics
...
Stop a segfault in the test by correctly passing all the argv during spawn
2015-09-08 14:35:42 -07:00
Ralph Castain
459f169e06
Fix segfault upon job error
...
Silence some unnecessary error-logs
2015-09-08 14:03:06 -07:00
Ralph Castain
ae7156cabb
Stop a segfault in the test by correctly passing all the argv during spawn
2015-09-08 13:42:46 -07:00
Rolf vandeVaart
188c30a01a
Merge pull request #867 from rolfv/pr/openib-hwloc-verbosity
...
Add some verbosity to help debug hwloc issues
2015-09-08 14:43:35 -04:00
rhc54
8053357fcc
Merge pull request #873 from rhc54/topic/static
...
Add the libs required for PMIx to support static builds (and trim all excess whitespace)
2015-09-08 11:28:47 -07:00
Rolf vandeVaart
2e64a69fa9
Add some verbosity to help debug hwloc issues
2015-09-08 10:50:22 -07:00
Ralph Castain
291afe502f
Add the libs required for PMIx to support static builds
...
Remove unneeded CPPFLAGS
2015-09-08 10:21:06 -07:00
Jeff Squyres
bc9e5652ff
whitespace: purge whitespace at end of lines
...
Generated by running "./contrib/whitespace-purge.sh".
2015-09-08 09:47:17 -07:00
Ralph Castain
e6add86e4f
Deal with connect/accept between two jobs from different mpirun's. Somewhat optimize connect/accept by using MPI bcast to distribute the participants instead of another PMIx lookup. Cleanup some Coverity issues.
2015-09-07 09:19:24 -07:00
Ralph Castain
37c3ed68e7
Cleanup connect/disconnect and bring comm_spawn back online!
2015-09-06 10:27:39 -07:00
Jeff Squyres
f782a7640e
usnic: minor re-order of Makefile.am sources
...
Put the hwloc.c file alphabetically in the list.
2015-09-05 05:02:00 -07:00
rhc54
665b30376a
Merge pull request #868 from rhc54/topic/hwloc
...
Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given
2015-09-04 17:58:07 -07:00
Ralph Castain
d97bc29102
Remove OPAL_HAVE_HWLOC qualifier and error out if --without-hwloc is given
2015-09-04 16:54:40 -07:00
rhc54
d45ccda813
Merge pull request #866 from rhc54/topic/updatepmix
...
Update PMIx support
2015-09-04 11:09:36 -07:00
Ralph Castain
f6948c2bb4
Sync with PMIx master 43e45c3. Get multi-node publish/lookup/unpublish working
2015-09-04 10:07:17 -07:00
Rolf vandeVaart
ebfd00b66e
While debugging user problems, these extra verbosity statements would be helpful
2015-09-03 17:15:39 -04:00