Commit graph

26368 commits

Author SHA1 Message Date
Gilles Gouaillardet
507623d6b1 mpool/hugepage: plug a memory leak on finalize
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:58 +09:00
Gilles Gouaillardet
51021028d6 mpool/base: plug a memory leak on finalize
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:58 +09:00
Gilles Gouaillardet
1daa80d78f mtl/psm2: plug a memory leak in ompi_mtl_psm2_component_open()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 09:28:32 +09:00
Ralph Castain
b343df43a1 Merge pull request #2669 from rhc54/topic/memprobe
Complete the memprobe support.
2017-01-05 12:02:56 -08:00
Ralph Castain
6509f60929 Complete the memprobe support. This provides a new scaling tool called "mpi_memprobe" that samples the memory footprint of the local daemon and the client procs, and then reports the results. The output contains the footprint of the daemon on each node, plus the average footprint of the client procs on that node.
Samples are taken after MPI_Init, and then again after MPI_Barrier. This allows the user to see memory consumption caused by add_procs, as well as any modex contribution from forming connections if pmix_base_async_modex is given.

Using the probe simply involves executing it via mpirun, with however many copies you want per node. Example:

$ mpirun -npernode 2 ./mpi_memprobe
Sampling memory usage after MPI_Init
Data for node rhc001
	Daemon: 12.483398
	Client: 6.514648

Data for node rhc002
	Daemon: 11.865234
	Client: 4.643555

Sampling memory usage after MPI_Barrier
Data for node rhc001
	Daemon: 12.520508
	Client: 6.576660

Data for node rhc002
	Daemon: 11.879883
	Client: 4.703125

Note that the client value on node rhc001 is larger: this node hosts rank=0, which apparently has a larger footprint for a reason not yet identified.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-05 10:32:17 -08:00
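The sample output quoted in commit 6509f60929 can be post-processed to see how much each footprint grows between the two sampling points. The following is a minimal sketch, not part of Open MPI: the function names `parse_memprobe` and `growth` are hypothetical, and the parser assumes the output layout is exactly the one shown in the commit message. The commit does not state the units of the reported values, so the sketch treats them as plain numbers.

```python
import re
from collections import defaultdict

# Output copied from the commit message above (tab-indented value lines).
SAMPLE_OUTPUT = """\
Sampling memory usage after MPI_Init
Data for node rhc001
	Daemon: 12.483398
	Client: 6.514648

Data for node rhc002
	Daemon: 11.865234
	Client: 4.643555

Sampling memory usage after MPI_Barrier
Data for node rhc001
	Daemon: 12.520508
	Client: 6.576660

Data for node rhc002
	Daemon: 11.879883
	Client: 4.703125
"""


def parse_memprobe(text):
    """Return {phase: {node: {"Daemon": float, "Client": float}}}.

    Hypothetical helper: assumes the layout shown in the commit message.
    """
    results = defaultdict(dict)
    phase = node = None
    for raw in text.splitlines():
        line = raw.strip()
        if m := re.match(r"Sampling memory usage after (\S+)", line):
            phase = m.group(1)
        elif m := re.match(r"Data for node (\S+)", line):
            node = m.group(1)
            results[phase][node] = {}
        elif m := re.match(r"(Daemon|Client):\s*([0-9.]+)", line):
            results[phase][node][m.group(1)] = float(m.group(2))
    return dict(results)


def growth(results, before="MPI_Init", after="MPI_Barrier"):
    """Per-node increase of each footprint between two sampling points."""
    return {
        node: {kind: results[after][node][kind] - value
               for kind, value in values.items()}
        for node, values in results[before].items()
    }


if __name__ == "__main__":
    parsed = parse_memprobe(SAMPLE_OUTPUT)
    for node, delta in growth(parsed).items():
        print(node, {kind: round(value, 6) for kind, value in delta.items()})
```

With the values from the commit message, the client footprint on rhc001 grows by about 0.062 between MPI_Init and MPI_Barrier, and the daemon footprint by about 0.037.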
Ralph Castain
b4088c331a Merge pull request #2662 from rhc54/topic/stuff
Variety of cleanups
2017-01-04 10:25:26 -08:00
Ralph Castain
91d714fe93 Add flags to direct PMIx to only use one listener, but without directing which one (tcp or usock) to use. This allows the user to set PMIX_MCA_ptl in their environment to select the transport method.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-04 09:16:44 -08:00
Ralph Castain
f355fb926d Continue cleanup of notifications. Resolve a race condition that can result in an attempt to send a message on a closed socket
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-04 09:16:33 -08:00
Joshua Ladd
57c0c847d0 Merge pull request #2603 from xinzhao3/topic/revert-ucx-mt
Revert "PML/SPML/UCX: add UCX MT support to PML and SPML."
2017-01-04 11:50:37 -05:00
Ralph Castain
5737a45b35 Merge pull request #2658 from rhc54/topic/removal
Remove the bcol, coll/ml, and sbgp code as stale and lacking a maintainer
2017-01-03 20:34:09 -08:00
Ralph Castain
66131b4183 Remove the bcol, coll/ml, and sbgp code as stale and lacking a maintainer
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-03 19:32:48 -08:00
Ralph Castain
dadc6fbaf6 Merge pull request #2448 from thananon/remove_request_lock
Completely removed ompi_request_lock and ompi_request_cond
2017-01-03 19:31:46 -08:00
Jeff Squyres
33d2988985 Merge pull request #2647 from OMGtechy/master
Fixed -Wmisleading-indentation in ad_read_coll.c
2017-01-03 12:24:22 -05:00
Ralph Castain
218aed144d Merge pull request #2654 from rhc54/topic/memory
Remove stale global variables
2017-01-02 15:09:09 -08:00
Ralph Castain
9eab9a1ed3 Remove stale global variables
Revamp the event notification integration to rely on the PMIx event chaining and remove the duplicate chaining in OPAL. This ensures we get system-level events that target non-default handlers.

Restore the hostname entries for MPI-level error messages, but provide an MCA param (orte_hostname_cutoff) to remove them for large clusters where the memory footprint is problematic. Set the default at 1000 nodes in the job (not the allocation).

Begin first cut at memory profiler

Some minor cleanups of memprobe

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-02 14:04:24 -08:00
rhc54
5f68d655d6 Merge pull request #2651 from rhc54/topic/minor
Minor cleanups
2016-12-30 18:52:12 -08:00
Ralph Castain
e8aea2ebfc Minor cleanups
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-30 16:19:42 -08:00
rhc54
56b1e10ac0 Merge pull request #2649 from rhc54/topic/foot2
Update to latest PMIx master
2016-12-30 15:36:03 -08:00
Ralph Castain
08c76a42bb Update to latest PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>

Plug a minor memory leak. Tell the PMIx server not to create a dstore memory region for the daemon job as there is nobody to share it with.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>

Protect users of hwloc membind functions

Signed-off-by: Ralph Castain <rhc@open-mpi.org>

Update PMIx to include NULL string protection

Signed-off-by: Ralph Castain <rhc@open-mpi.org>

Update to PMIx master to include key overwrite protection

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-30 12:44:47 -08:00
rhc54
a16162832b Merge pull request #2648 from rhc54/topic/topo
Only instantiate the HWLOC topology in an MPI process if it actually will be used.
2016-12-29 11:52:08 -08:00
Ralph Castain
fe68f23099 Only instantiate the HWLOC topology in an MPI process if it actually will be used.
There are only five places in the non-daemon code paths where opal_hwloc_topology is currently referenced:

* shared memory BTLs (sm, smcuda). I have added a code path to those components that uses the location string
  instead of the topology itself, if available, thus avoiding instantiating the topology

* openib BTL. This uses the distance matrix. At present, I haven't developed a method
  for replacing that reference. Thus, this component will instantiate the topology

* usnic BTL. Uses the distance matrix.

* treematch TOPO component. Does some complex tree-based algorithm, so it will instantiate
  the topology

* ess base functions. If a process is direct launched and not bound at launch, this
  code attempts to bind it. Thus, procs in this scenario will instantiate the
  topology

Note that instantiating the topology on complex chips such as KNL can consume
megabytes of memory.

Fix pernode binding policy

Properly handle the unbound case

Correct pointer usage

Do not free static error messages!

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-29 10:33:29 -08:00
Ralph Castain
52533f755e Remove debug
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-28 13:24:39 -08:00
Joshua Gerrard
94e87654c6 Fixed -Wmisleading-indentation in ad_read_coll.c
Signed-off-by: Joshua Gerrard <joshuagerrard+ompi-commit@protonmail.com>
2016-12-28 20:14:13 +00:00
rhc54
acbf1cbaef Merge pull request #2646 from rhc54/topic/squeze
Begin to reduce reliance of application procs on the topology tree it…
2016-12-28 10:16:58 -08:00
Ralph Castain
3a2d6a5ab6 Begin to reduce reliance of application procs on the topology tree itself by having the daemon provide more detailed info. In this case, provide the topology description string so that procs can readily determine the number of types of objects on the node, and a "locality" string that describes which objects this process is executing upon. The latter allows a process to compute the objects of overlap between itself and another proc without consulting the topology tree.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-28 09:14:26 -08:00
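Commit 3a2d6a5ab6 describes a "locality" string that lets a process compute the objects it shares with another process without consulting the topology tree. The commit message does not show the string's format, so the sketch below invents one purely for illustration (`type=indices` fields separated by `:`). The format and the names `parse_locality` and `overlap` are assumptions, not the encoding Open MPI actually uses.

```python
def parse_locality(locality):
    """Parse a string in the HYPOTHETICAL format 'socket=0:core=2-3,6'.

    Returns a dict mapping object type to the set of indices the process
    runs on, e.g. {'socket': {0}, 'core': {2, 3, 6}}.
    """
    result = {}
    for field in locality.split(":"):
        name, _, spec = field.partition("=")
        indices = set()
        for part in spec.split(","):
            low, _, high = part.partition("-")
            # A lone index such as '6' is treated as the range 6-6.
            indices.update(range(int(low), int(high or low) + 1))
        result[name] = indices
    return result


def overlap(mine, peer):
    """Return the objects two processes share, keyed by object type.

    Only string parsing and set intersection are needed; no topology
    tree is consulted.
    """
    a, b = parse_locality(mine), parse_locality(peer)
    shared = {}
    for name in a.keys() & b.keys():
        common = a[name] & b[name]
        if common:
            shared[name] = common
    return shared


if __name__ == "__main__":
    # Same socket and L3 cache, different cores.
    print(overlap("socket=0:l3=0:core=0-1", "socket=0:l3=0:core=2-3"))
```

The point of the design is that each process only needs its own string and the peer's string, both of which are small, instead of an instantiated topology.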
rhc54
75be023f90 Merge pull request #2645 from rhc54/topic/maps
Fix mapping directive checks
2016-12-28 03:43:46 -08:00
Ralph Castain
7866bb1119 Add debug, cleanup cpus/rank
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-27 21:25:52 -08:00
Ralph Castain
1e4bffd937 Fix mapping directive checks
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-27 20:42:47 -08:00
Jeff Squyres
31e98401c7 Merge pull request #2636 from jsquyres/pr/fix-cflags-uniqueness
configury: fix OMPI_UNIQUE -> OMPI_FLAGS_UNIQUE
2016-12-27 17:24:06 -05:00
Jeff Squyres
d772fcf8f1 Merge pull request #2509 from OMGtechy/master
Fixed memory leak and some -Werror=unused-result warnings
2016-12-27 17:13:23 -05:00
Jeff Squyres
15c1ee13fb Merge pull request #2624 from jsquyres/pr/one-more-buildrpm-fix
buildrpm.sh: don't use $HOME
2016-12-27 17:02:36 -05:00
Jeff Squyres
fb74c80e4b configury: fix OMPI_UNIQUE -> OMPI_FLAGS_UNIQUE
Looks like we missed one place where we needed to swap OMPI_UNIQUE for
OMPI_FLAGS_UNIQUE.  Thanks to Phil Tooley (@Telemin) for reporting the
issue.

Fixes #2635.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-27 13:36:53 -08:00
rhc54
24000aae84 Merge pull request #2638 from rhc54/topic/pmixcflags
Avoid mangling user-provided CFLAGS by using the new PMIX_FLAGS_UNIQ autoconf script in place of PMIX_UNIQ
2016-12-27 10:13:58 -08:00
Ralph Castain
d3aa3777f3 Per @jsquyres: avoid mangling user-provided CFLAGS by using the new PMIX_FLAGS_UNIQ autoconf script in place of PMIX_UNIQ
Refs #2636

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-27 09:00:59 -08:00
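Commits fb74c80e4b and d3aa3777f3 replace a generic uniqueness macro with a flags-aware one to avoid mangling user-provided CFLAGS. The commit messages do not say how the mangling happened. The sketch below illustrates one plausible failure mode, assumed here for illustration: token-by-token deduplication drops a repeated option that takes an argument. It is not the m4 implementation, and the `TAKES_ARGUMENT` set and both function names are hypothetical.

```python
# ASSUMPTION: options treated as consuming the next token as their argument.
# This list is illustrative only and is not taken from the Open MPI macros.
TAKES_ARGUMENT = {"-I", "-L", "-Xclang", "-framework", "-arch", "-isystem"}


def naive_unique(flags):
    """Drop every repeated token, regardless of what it means."""
    seen, out = set(), []
    for token in flags.split():
        if token not in seen:
            seen.add(token)
            out.append(token)
    return " ".join(out)


def flags_unique(flags):
    """Drop repeats, treating an option and its argument as one unit."""
    tokens = flags.split()
    seen, out = set(), []
    i = 0
    while i < len(tokens):
        if tokens[i] in TAKES_ARGUMENT and i + 1 < len(tokens):
            unit = (tokens[i], tokens[i + 1])
            i += 2
        else:
            unit = (tokens[i],)
            i += 1
        if unit not in seen:
            seen.add(unit)
            out.extend(unit)
    return " ".join(out)


if __name__ == "__main__":
    cflags = "-O2 -Xclang -foo -Xclang -bar -O2"
    print(naive_unique(cflags))  # second -Xclang is lost, -bar is orphaned
    print(flags_unique(cflags))  # both -Xclang pairs survive
```

In the naive version the second `-Xclang` disappears, so `-bar` is no longer passed through to the intended tool; the flags-aware version keeps each option paired with its argument while still removing the duplicate `-O2`.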
Ralph Castain
791f4f1ce3 Adjust debug output for clarity
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-26 14:04:20 -08:00
Gilles Gouaillardet
22db1d36b6 pmix2x: silence misc warnings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-26 13:35:17 +09:00
rhc54
67a08e825e Merge pull request #2632 from rhc54/topic/updates
Transfer some minor cleanups back from the PMIx reference server
2016-12-23 12:49:33 -08:00
Ralph Castain
ef3f748d0d Transfer some minor cleanups back from the PMIx reference server
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-23 08:46:04 -08:00
Nysal Jan K A
19e3be31e5 Merge pull request #2421 from nysal/master
mpit: Fix MPI_T_pvar_get_index
2016-12-22 15:33:51 +05:30
Gilles Gouaillardet
54c84196a6 btl/vader: plug a memory leak
as reported by Coverity with CID 1362691

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-22 16:04:36 +09:00
Nysal Jan K.A
25ba507ada mpit: Fix MPI_T_pvar_get_index
MPI_T_pvar_get_index was returning an incorrect index. The index
was never set correctly while registering the performance variables.
Additionally fix a missing case in the mca_base_var_type_t to MPI
datatype conversion. This type is currently used for control variables
registered by mxm, fca and hcoll components.

Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
2016-12-22 12:30:21 +05:30
Gilles Gouaillardet
773cad6b3e ompi/debugger: fix mqs_version_string()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-22 15:00:47 +09:00
Jeff Squyres
3571c3c5bb hwloc external: minor fixes to 9649c44
- Fix capitalization typos
- Make comment more correct / flow better
- Use AM_CPPFLAGS, not DEFAULT_INCLUDES
- Remove extra "hwloc/" from external hwloc.h specification

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-21 09:06:24 -08:00
Jeff Squyres
6002a8bca5 buildrpm.sh: don't use $HOME
This is news to me: I didn't know that some distros do not set $HOME.
So use "~" instead, and only try to grep ~/.rpmmacros if it exists.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-21 07:42:32 -08:00
Jeff Squyres
678e314c0e opal_configure_options: remove stale option help
--with-libltdl is now added (via AC_ARG_WITH) in
opal/mca/dl/libltdl/configure.m4 -- it no longer belongs up here in
this top-level m4 file.  Plus, the help string in this stale entry is
also stale/incorrect.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-21 07:28:31 -08:00
Boris Karasev
5fb3e0a9b6 rmaps/mindist: fix pmix errors
Fixed the case where only part of the nodes in the allocation
are used by the application processes.

Force PMIx nodemap key to only contain nodes that are actually
used by the application processes.

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2016-12-21 06:42:04 +02:00
Gilles Gouaillardet
9649c44fa0 hwloc: correctly handle --with-hwloc=external
- simply #include "hwloc.h" to use the external hwloc header
- do use the external hwloc header instead of opal/mca/hwloc/hwloc.h

Thanks Orion Poplawski for the report

Fixes open-mpi/ompi#2616

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-21 11:58:10 +09:00
Jeff Squyres
5ecd271934 buildrpm.sh: minor fixes
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-12-20 10:54:37 -08:00
rhc54
75ec38db7d Merge pull request #2609 from rhc54/topic/psrv
Bring across some more patches from the debugger work
2016-12-19 20:38:28 -08:00
Ralph Castain
ea133206ec Sync the internal OMPI component to PMIx master
Update external PMIx v2.x component
Add missing Makefile

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-19 19:14:16 -08:00