
26342 Commits

Author SHA1 Message Date
Gilles Gouaillardet
6b9343a966 plm/rsh: plug a memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:45 +09:00
Gilles Gouaillardet
8ba92d7516 iof/base: plug a memory leak in orte_iof_base_close()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:45 +09:00
Gilles Gouaillardet
e396b17a7f orte/orted: plug a memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:45 +09:00
Gilles Gouaillardet
6b90b03c28 orted/pmix: plug a memory leak in pmix_server_fencenb_fn()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:45 +09:00
Gilles Gouaillardet
7fe6840232 state/hnp: plug a memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:45 +09:00
Gilles Gouaillardet
4d58b8dcae ess/pmi: plug a memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:45 +09:00
Gilles Gouaillardet
c0c5dd8ccc orte: plug a memory leak in orte_rml.recv_cancel
do not invoke orte_rml.recv_cancel after the orte progress thread has gone

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:44 +09:00
Gilles Gouaillardet
17fac4bfd1 grpcomm/base: get rid of the seq_num field of the orte_grpcomm_signature_t struct
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:44 +09:00
Gilles Gouaillardet
fe25f50871 grpcomm/base: plug a memory leak on finalize
manually allocate sequence numbers to be stored into the
orte_grpcomm_base.sig_table hash table, and manually release
them on orte_grpcomm_base_close()

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:44 +09:00
Gilles Gouaillardet
2189c5bcc3 ompi/dpm: plug a memory leak in disconnect_waitall()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:44 +09:00
Gilles Gouaillardet
a988ad24eb orte/runtime: plug a leak in orte_finalize()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:44 +09:00
Gilles Gouaillardet
c2ddb1e2fc mca/base: plug a memory leak
register mca_base_var_enum_value_flag_t so they can be free'd
upon finalize

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:36 +09:00
Gilles Gouaillardet
cf534d0c95 ompi/proc: plug a memory leak in ompi_proc_finalize()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:35 +09:00
Gilles Gouaillardet
6d5cb9fe0d event: plug a leak when closing the event framework
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:35 +09:00
Gilles Gouaillardet
b3a2bdda7b opal/threads: manually invoke thread-specific key destructors on the main thread.
there is no such thing as pthread_join(main_thread), so key destructors
are never invoked on the main thread, which causes valgrind to report
some memory leaks. Manually store and then invoke the key destructors and
make valgrind happy.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:35 +09:00
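The opal/threads commit above describes storing thread-specific key destructors and invoking them by hand on the main thread. A minimal sketch of that idea with plain POSIX threads follows. The names `tracked_key_create`, `run_main_thread_destructors`, and `demo_main_thread_cleanup` are hypothetical and are not taken from the Open MPI source; the fixed-size table is an assumption made to keep the example short.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED_KEYS 16

/* Hypothetical bookkeeping: remember every key together with its destructor. */
struct tracked_key {
    pthread_key_t key;
    void (*destructor)(void *);
};

static struct tracked_key tracked[MAX_TRACKED_KEYS];
static int num_tracked = 0;
static int destructor_calls = 0;

/* Create a key and record its destructor so it can be invoked manually later. */
static int tracked_key_create(pthread_key_t *key, void (*destructor)(void *))
{
    int rc;

    if (num_tracked >= MAX_TRACKED_KEYS) {
        return -1;
    }
    rc = pthread_key_create(key, destructor);
    if (0 != rc) {
        return rc;
    }
    tracked[num_tracked].key = *key;
    tracked[num_tracked].destructor = destructor;
    num_tracked++;
    return 0;
}

/* Meant to be called from the main thread during finalize: the main thread
 * is never joined, so run each recorded destructor on its value by hand. */
static void run_main_thread_destructors(void)
{
    for (int i = 0; i < num_tracked; i++) {
        void *value = pthread_getspecific(tracked[i].key);
        if (NULL != value && NULL != tracked[i].destructor) {
            /* Clear the slot first so the value cannot be destroyed twice. */
            pthread_setspecific(tracked[i].key, NULL);
            tracked[i].destructor(value);
        }
        pthread_key_delete(tracked[i].key);
    }
    num_tracked = 0;
}

/* Example destructor: counts its invocations and frees the value. */
static void count_and_free(void *value)
{
    destructor_calls++;
    free(value);
}

/* Usage sketch: returns the number of destructor calls made so far. */
static int demo_main_thread_cleanup(void)
{
    pthread_key_t key;
    void *buffer = malloc(64);

    assert(NULL != buffer);
    if (0 != tracked_key_create(&key, count_and_free)) {
        free(buffer);
        return -1;
    }
    if (0 != pthread_setspecific(key, buffer)) {
        free(buffer);
        return -1;
    }
    run_main_thread_destructors();
    return destructor_calls;
}
```

Without the manual pass, the 64-byte buffer would still be allocated when the process exits, which is the kind of block a leak checker reports.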
Gilles Gouaillardet
6ef281e163 pmix/base: fix misc memory leaks
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:35 +09:00
Gilles Gouaillardet
0ee5d56ab1 grpcomm/direct: plug a memory leak in barrier_release()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:35 +09:00
Gilles Gouaillardet
a59dfd7b14 sec/munge: plug a memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:35 +09:00
Gilles Gouaillardet
f2d6584189 grpcomm/base: plug misc memory leaks
- add a destructor to orte_grpcomm_caddy_t in order to plug a memory leak

- plug a memory leak in barrier_release()

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:21 +09:00
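The grpcomm/base commit above plugs a leak by adding a destructor to an object type. The general pattern can be sketched in standalone C as below. This is not the OPAL object system and not the real `orte_grpcomm_caddy_t`; `demo_caddy_t` and every function here are hypothetical stand-ins that only illustrate why an object owning heap memory needs a destructor that runs when the last reference is released.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted object that owns a heap buffer. */
typedef struct demo_caddy {
    int refcount;
    char *payload;                          /* owned by the object */
    void (*destruct)(struct demo_caddy *);  /* runs before the object is freed */
} demo_caddy_t;

static int payloads_freed = 0;

/* The destructor releases everything the object owns. Without it,
 * freeing the object alone would leak the payload. */
static void demo_caddy_destruct(demo_caddy_t *c)
{
    if (NULL != c->payload) {
        free(c->payload);
        c->payload = NULL;
        payloads_freed++;
    }
}

static demo_caddy_t *demo_caddy_new(const char *text)
{
    size_t len = strlen(text) + 1;
    demo_caddy_t *c = calloc(1, sizeof(*c));

    if (NULL == c) {
        return NULL;
    }
    c->payload = malloc(len);
    if (NULL == c->payload) {
        free(c);
        return NULL;
    }
    memcpy(c->payload, text, len);
    c->refcount = 1;
    c->destruct = demo_caddy_destruct;
    return c;
}

static void demo_caddy_retain(demo_caddy_t *c)
{
    c->refcount++;
}

/* Drop one reference; destroy the object when the count reaches zero. */
static void demo_caddy_release(demo_caddy_t *c)
{
    if (0 == --c->refcount) {
        if (NULL != c->destruct) {
            c->destruct(c);
        }
        free(c);
    }
}

/* Usage sketch: returns how many payloads the final release freed. */
static int demo_caddy_usage(void)
{
    int before;
    demo_caddy_t *c = demo_caddy_new("barrier");

    assert(NULL != c);
    demo_caddy_retain(c);
    demo_caddy_release(c);      /* refcount 2 -> 1, nothing freed yet */
    before = payloads_freed;
    demo_caddy_release(c);      /* refcount 1 -> 0, destructor runs */
    return payloads_freed - before;
}
```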
Gilles Gouaillardet
c4a47ae9a9 orte/orted: plug misc memory leaks
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:59 +09:00
Gilles Gouaillardet
88535b6200 orte/util: revamp orte_attr_unload() to keep valgrind happy
reorder tests to avoid valgrind complaining about uninitialized variables

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:59 +09:00
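The orte/util commit above says only that tests were reordered so valgrind stops complaining about uninitialized variables. One common shape of that fix, shown here as an assumption rather than as the actual change to orte_attr_unload(), is to check a validity flag before reading a field that may not have been written. The type `demo_attr_t` and both functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical attribute: `value` is only meaningful when `is_set` is true. */
typedef struct {
    bool is_set;
    int value;
} demo_attr_t;

/* Test the flag first. Because && short-circuits, `value` is never read
 * when the attribute is unset, so no uninitialized memory is inspected.
 * Writing `attr->value == wanted && attr->is_set` would read `value`
 * unconditionally and could trigger an uninitialized-value report. */
static bool demo_attr_matches(const demo_attr_t *attr, int wanted)
{
    return attr->is_set && attr->value == wanted;
}

/* Usage sketch: returns 1 if the unset attribute matched, 0 otherwise. */
static int demo_attr_check(void)
{
    demo_attr_t attr;

    attr.is_set = false;    /* `value` is deliberately left unwritten */
    assert(!attr.is_set);
    return demo_attr_matches(&attr, 42) ? 1 : 0;
}
```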
Gilles Gouaillardet
c612499bc1 opal: mca/base: fix a memory leak in the mca_base_var_enum_flag_t destructor
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:59 +09:00
Gilles Gouaillardet
58f2a764f9 ess/hnp: plug memory leaks
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:59 +09:00
Gilles Gouaillardet
24c61b0625 oob/tcp: plug a memory leak in mca_oob_tcp_component_lost_connection()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:59 +09:00
Gilles Gouaillardet
7e5da7382e btl/tcp: plug leaks when closing component
remove tcp_local from the tcp_procs table, and release it

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:59 +09:00
Gilles Gouaillardet
c7d9e62d47 rml/base: plug a memory leak
add a destructor to orte_rml_send_request_t in order
to plug a memory leak

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:59 +09:00
Gilles Gouaillardet
507623d6b1 mpool/hugepage: plug a memory leak on finalize
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:58 +09:00
Gilles Gouaillardet
51021028d6 mpool/base: plug a memory leak on finalize
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:58 +09:00
Gilles Gouaillardet
1daa80d78f mtl/psm2: plug a memory leak in ompi_mtl_psm2_component_open()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 09:28:32 +09:00
Ralph Castain
b343df43a1 Merge pull request #2669 from rhc54/topic/memprobe
Complete the memprobe support.
2017-01-05 12:02:56 -08:00
Ralph Castain
6509f60929 Complete the memprobe support. This provides a new scaling tool called "mpi_memprobe" that samples the memory footprint of the local daemon and the client procs, and then reports the results. The output contains the footprint of the daemon on each node, plus the average footprint of the client procs on that node.
Samples are taken after MPI_Init, and then again after MPI_Barrier. This allows the user to see memory consumption caused by add_procs, as well as any modex contribution from forming connections if pmix_base_async_modex is given.

Using the probe simply involves executing it via mpirun, with however many copies you want per node. Example:

$ mpirun -npernode 2 ./mpi_memprobe
Sampling memory usage after MPI_Init
Data for node rhc001
	Daemon: 12.483398
	Client: 6.514648

Data for node rhc002
	Daemon: 11.865234
	Client: 4.643555

Sampling memory usage after MPI_Barrier
Data for node rhc001
	Daemon: 12.520508
	Client: 6.576660

Data for node rhc002
	Daemon: 11.879883
	Client: 4.703125

Note that the client value on node rhc001 is larger - this is where rank=0 is housed, and apparently it gets a larger footprint for some reason.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-05 10:32:17 -08:00
Ralph Castain
b4088c331a Merge pull request #2662 from rhc54/topic/stuff
Variety of cleanups
2017-01-04 10:25:26 -08:00
Ralph Castain
91d714fe93 Add flags to direct PMIx to only use one listener, but without directing which one (tcp or usock) to use. This allows the user to set PMIX_MCA_ptl in their environment to select the transport method.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-04 09:16:44 -08:00
Ralph Castain
f355fb926d Continue cleanup of notifications. Resolve a race condition that can result in an attempt to send a message on a closed socket
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-04 09:16:33 -08:00
Joshua Ladd
57c0c847d0 Merge pull request #2603 from xinzhao3/topic/revert-ucx-mt
Revert "PML/SPML/UCX: add UCX MT support to PML and SPML."
2017-01-04 11:50:37 -05:00
Ralph Castain
5737a45b35 Merge pull request #2658 from rhc54/topic/removal
Remove the bcol, coll/ml, and sbgp code as stale and lacking a maintainer
2017-01-03 20:34:09 -08:00
Ralph Castain
66131b4183 Remove the bcol, coll/ml, and sbgp code as stale and lacking a maintainer
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-03 19:32:48 -08:00
Ralph Castain
dadc6fbaf6 Merge pull request #2448 from thananon/remove_request_lock
Completely removed ompi_request_lock and ompi_request_cond
2017-01-03 19:31:46 -08:00
Jeff Squyres
33d2988985 Merge pull request #2647 from OMGtechy/master
Fixed -Wmisleading-indentation in ad_read_coll.c
2017-01-03 12:24:22 -05:00
Ralph Castain
218aed144d Merge pull request #2654 from rhc54/topic/memory
Remove stale global variables
2017-01-02 15:09:09 -08:00
Ralph Castain
9eab9a1ed3 Remove stale global variables
Revamp the event notification integration to rely on the PMIx event chaining and remove the duplicate chaining in OPAL. This ensures we get system-level events that target non-default handlers.

Restore the hostname entries for MPI-level error messages, but provide an MCA param (orte_hostname_cutoff) to remove them for large clusters where the memory footprint is problematic. Set the default at 1000 nodes in the job (not the allocation).

Begin first cut at memory profiler

Some minor cleanups of memprobe

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-02 14:04:24 -08:00
rhc54
5f68d655d6 Merge pull request #2651 from rhc54/topic/minor
Minor cleanups
2016-12-30 18:52:12 -08:00
Ralph Castain
e8aea2ebfc Minor cleanups
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-30 16:19:42 -08:00
rhc54
56b1e10ac0 Merge pull request #2649 from rhc54/topic/foot2
Update to latest PMIx master
2016-12-30 15:36:03 -08:00
Ralph Castain
08c76a42bb Update to latest PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>

Plug a minor memory leak. Tell the PMIx server not to create a dstore memory region for the daemon job as there is nobody to share it with.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>

Protect users of hwloc membind functions

Signed-off-by: Ralph Castain <rhc@open-mpi.org>

Update PMIx to include NULL string protection

Signed-off-by: Ralph Castain <rhc@open-mpi.org>

Update to PMIx master to include key overwrite protection

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-30 12:44:47 -08:00
rhc54
a16162832b Merge pull request #2648 from rhc54/topic/topo
Only instantiate the HWLOC topology in an MPI process if it actually will be used.
2016-12-29 11:52:08 -08:00
Ralph Castain
fe68f23099 Only instantiate the HWLOC topology in an MPI process if it actually will be used.
There are only five places in the non-daemon code paths where opal_hwloc_topology is currently referenced:

* shared memory BTLs (sm, smcuda). I have added a code path to those components that uses the location string
  instead of the topology itself, if available, thus avoiding instantiating the topology

* openib BTL. This uses the distance matrix. At present, I haven't developed a method
  for replacing that reference. Thus, this component will instantiate the topology

* usnic BTL. Uses the distance matrix.

* treematch TOPO component. Does some complex tree-based algorithm, so it will instantiate
  the topology

* ess base functions. If a process is direct launched and not bound at launch, this
  code attempts to bind it. Thus, procs in this scenario will instantiate the
  topology

Note that instantiating the topology on complex chips such as KNL can consume
megabytes of memory.

Fix pernode binding policy

Properly handle the unbound case

Correct pointer usage

Do not free static error messages!

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-29 10:33:29 -08:00
Ralph Castain
52533f755e Remove debug
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-12-28 13:24:39 -08:00
Joshua Gerrard
94e87654c6 Fixed -Wmisleading-indentation in ad_read_coll.c
Signed-off-by: Joshua Gerrard <joshuagerrard+ompi-commit@protonmail.com>
2016-12-28 20:14:13 +00:00
rhc54
acbf1cbaef Merge pull request #2646 from rhc54/topic/squeze
Begin to reduce reliance of application procs on the topology tree it…
2016-12-28 10:16:58 -08:00