1
1
Граф коммитов

26409 Коммитов

Автор SHA1 Сообщение Дата
Joshua Ladd
3e23380bba Merge pull request #2675 from artpol84/orte/state/exit_1_fix
orte/odls: Fix ORTE state machine for the non-zero exit case
2017-01-09 12:32:37 -05:00
Ralph Castain
67fce2861b Merge pull request #2685 from rhc54/topic/cov
Resolve Coverity issues
2017-01-07 13:11:40 -08:00
Ralph Castain
84ce7eed2a Merge pull request #2683 from rhc54/topic/nits
Cleanup some configure stuff for static builds
2017-01-07 11:33:20 -08:00
Ralph Castain
e25e69dc2f Resolve Coverity issues
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-07 10:45:52 -08:00
Ralph Castain
822e2680ba Cleanup some configure stuff for static builds - still can't get wrapper extra libs to be recognized
Signed-off-by: Ralph Castain <rhc@open-mpi.org>

pmix2x: minor configure updates

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-01-07 08:37:36 -08:00
Nathan Hjelm
5b70ae3ec0 amd64: save/restore all 64 bits of rbx around cpuid
This commit fixes a bug in the timer check. When -fPIC is used we need
to save/restore ebx. The code copied from patcher was meant for 32-bit
systems and did not work correctly on 64-bit systems. This commit
updates the save/restore to use rbx instead of ebx.

Fixes #2678

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-01-06 18:54:20 -07:00
Joshua Ladd
dc7d2f5b6a Merge pull request #2571 from alex-mikheev/topic/sshmem_prio_fix
oshmem: sshmem: make mmap allocator a default instead of verbs
2017-01-06 17:39:37 -05:00
Joshua Ladd
7fc9f9bbac Merge pull request #2620 from karasevb/fix_rmaps_mindist
rmaps/mindist: fix pmix errors
2017-01-06 17:26:48 -05:00
Ralph Castain
ca16f3f9ed Merge pull request #2676 from rhc54/topic/alps
Minor cleanups to eliminate warnings
2017-01-06 12:43:42 -08:00
Ralph Castain
684e69695f Minor cleanups to eliminate warnings
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-06 08:44:10 -08:00
Ralph Castain
39d880f65d Merge pull request #2673 from rhc54/topic/usock
Raise the priority of the usock component so it gets preferentially picked
2017-01-06 08:24:35 -08:00
Artem Polyakov
3eb6c98542 orte/odls: Fix ORTE state machine for the non-zero exit case
This commit fixes rare race condition that occurs when the process
that is calling `exit(-1)` has delay between fd cleanup and actual
OS-level exit. This may happen if the process has some work to do
`on_exit()`.

**Problem description**:
Consider an application process that has called `exit(nonzero)`, it's
fd's was closed
but it's actual termination at OS level is delayed by some cleanups (eg.
in callbacks registered via `on_exit()`).
Observed sequence of events was the following:

* orted gets stdio disconnection and activating `IOF COMPLETE` state.
* parallel OOB disconnection causes `COMMUNICATION FAILURE` state to be
activated.
* during `COMMUNICATION FAILURE` processing `odls_base_default_wait_local_proc`
is called even though real waitpid wasn't yet called (code mentions that
waitpid might not be called for unspecified reason). Because of that real exit
code is unknown and set to 0. `odls_base_default_wait_local_proc` callback sees
`IOF COMPLETE` flag and in conjunction with 0-exit-code it activates
`WAITPID FIRED` state.
* processing of `WAITPID FIRED` leads to `NORMALLY TERMINATED` to be
activated.
* `NORMALLY TERMINATED` state in particular leads `ORTE_PROC_FLAG_ALIVE` flag
for this proc to be dropped.
* when application process finally exits and `wait_signal_callback` is
launched. It sets real exit code and calls `odls_base_default_wait_local_proc`
again but at this time since the process has `ORTE_PROC_FLAG_ALIVE` flag
dropped `WAITPID FIRED` state is activated (instead of `EXITED WITH NON-ZERO`)
leading to a hang that was observed.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2017-01-06 11:12:55 +02:00
Gilles Gouaillardet
189d7b9480 opal/dss: revamp opal_value_unload() to keep valgrind happy
reorder tests to avoid valgrind complaining about uninitialized variables

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 17:10:39 +09:00
Ralph Castain
444f5fa35d Raise the priority of the usock component so it gets preferentially picked
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-05 22:53:04 -08:00
Gilles Gouaillardet
a1a0e324b3 util/hostfile: plug a memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:45 +09:00
Gilles Gouaillardet
6b9343a966 plm/rsh: plug a memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:45 +09:00
Gilles Gouaillardet
8ba92d7516 iof/base: plug a memory leak in orte_iof_base_close()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:45 +09:00
Gilles Gouaillardet
e396b17a7f orte/orted: plug a memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:45 +09:00
Gilles Gouaillardet
6b90b03c28 orted/pmix: plug a memoy leak in pmix_server_fencenb_fn()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:45 +09:00
Gilles Gouaillardet
7fe6840232 state/hnp: plug a memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:45 +09:00
Gilles Gouaillardet
4d58b8dcae ess/pmi: plug a memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:45 +09:00
Gilles Gouaillardet
c0c5dd8ccc orte: plug a memory leak in orte_rml.recv_cancel
do not invoke orte_rml.recv_cancel after the orte progress thread has gone

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:44 +09:00
Gilles Gouaillardet
17fac4bfd1 grpcomm/base: get rid of the seq_num field of the orte_grpcomm_signature_t struct
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:44 +09:00
Gilles Gouaillardet
fe25f50871 grpcomm/base: plug a memory leak on finalize
manually allocate sequence numbers to be stored into the
orte_grpcomm_base.sig_table hash table, and manually release
them on orte_grpcomm_base_close()

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:44 +09:00
Gilles Gouaillardet
2189c5bcc3 ompi/dpm: plug a memory leak in disconnect_waitall()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:44 +09:00
Gilles Gouaillardet
a988ad24eb orte/runtime: plug a leak in orte_finalize()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 15:38:44 +09:00
Gilles Gouaillardet
c2ddb1e2fc mca/base: plug a memory leak
register mca_base_var_enum_value_flag_t so they can be free'd
upon finalize

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:36 +09:00
Gilles Gouaillardet
cf534d0c95 ompi/proc: plug a memory leak in ompi_proc_finalize()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:35 +09:00
Gilles Gouaillardet
6d5cb9fe0d event: plug a leak when closing the event framework
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:35 +09:00
Gilles Gouaillardet
b3a2bdda7b opal/threads: manually invoke thread-specific key destructors on the main thread.
there is no such thing as pthread_join(main_thread), so key destructors
are never invoked on the main thread, which causes valgrind report
some memory leaks. Manually store and then invoke the key destructors and
make valgrind happy.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:35 +09:00
Gilles Gouaillardet
6ef281e163 pmix/base: fix misc memory leaks
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:35 +09:00
Gilles Gouaillardet
0ee5d56ab1 grpcomm/direct: plug a memory leak in barrier_release()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:35 +09:00
Gilles Gouaillardet
a59dfd7b14 sec/munge: plug a memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:35 +09:00
Gilles Gouaillardet
f2d6584189 grpcomm/base: plug misc memory leaks
- add a destructor to orte_grpcomm_caddy_t in order to plug a memory leak

- plug a memory leak in barrier_release()

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 13:46:21 +09:00
Gilles Gouaillardet
c4a47ae9a9 orte/orted: plug misc memory leaks
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:59 +09:00
Gilles Gouaillardet
88535b6200 orte/util: revamp orte_attr_unload() to keep valgrind happy
reorder tests to avoid valgrind complaining about uninitialized variables

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:59 +09:00
Gilles Gouaillardet
c612499bc1 opal: mca/base: fix a memory leak in the mca_base_var_enum_flag_t destructor
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:59 +09:00
Gilles Gouaillardet
58f2a764f9 ess/hnp: plug memory leaks
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:59 +09:00
Gilles Gouaillardet
24c61b0625 oob/tcp: plug a memory leak in mca_oob_tcp_component_lost_connection()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:59 +09:00
Gilles Gouaillardet
7e5da7382e btl/tcp: plug leaks when closing component
remove tcp_local from the tcp_procs table, and release it

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:59 +09:00
Gilles Gouaillardet
c7d9e62d47 rml/base: plug a memory leak
add a destructor to orte_rml_send_request_t in order
to plug a memory leak

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:59 +09:00
Gilles Gouaillardet
507623d6b1 mpool/hugepage: plug a memory leak on finalize
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:58 +09:00
Gilles Gouaillardet
51021028d6 mpool/base: plug a memory leak on finalize
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 11:35:58 +09:00
Gilles Gouaillardet
1daa80d78f mtl/psm2: plug a memory leak in ompi_mtl_psm2_component_open()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-01-06 09:28:32 +09:00
Ralph Castain
b343df43a1 Merge pull request #2669 from rhc54/topic/memprobe
Complete the memprobe support.
2017-01-05 12:02:56 -08:00
Ralph Castain
6509f60929 Complete the memprobe support. This provides a new scaling tool called "mpi_memprobe" that samples the memory footprint of the local daemon and the client procs, and then reports the results. The output contains the footprint of the daemon on each node, plus the average footprint of the client procs on that node.
Samples are taken after MPI_Init, and then again after MPI_Barrier. This allows the user to see memory consumption caused by add_procs, as well as any modex contribution from forming connections if pmix_base_async_modex is given.

Using the probe simply involves executing it via mpirun, with however many copies you want per node. Example:

$ mpirun -npernode 2 ./mpi_memprobe
Sampling memory usage after MPI_Init
Data for node rhc001
	Daemon: 12.483398
	Client: 6.514648

Data for node rhc002
	Daemon: 11.865234
	Client: 4.643555

Sampling memory usage after MPI_Barrier
Data for node rhc001
	Daemon: 12.520508
	Client: 6.576660

Data for node rhc002
	Daemon: 11.879883
	Client: 4.703125

Note that the client value on node rhc001 is larger - this is where rank=0 is housed, and apparently it gets a larger footprint for some reason.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-05 10:32:17 -08:00
Ralph Castain
b4088c331a Merge pull request #2662 from rhc54/topic/stuff
Variety of cleanups
2017-01-04 10:25:26 -08:00
Ralph Castain
91d714fe93 Add flags to direct PMIx to only use one listener, but without directing which one (tcp or usock) to use. This allows the user to set PMIX_MCA_ptl in their environment to select the transport method.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-04 09:16:44 -08:00
Ralph Castain
f355fb926d Continue cleanup of notifications. Resolve a race condition that can result in attempt to send a message on a closed socket
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-01-04 09:16:33 -08:00
Joshua Ladd
57c0c847d0 Merge pull request #2603 from xinzhao3/topic/revert-ucx-mt
Revert "PML/SPML/UCX: add UCX MT support to PML and SPML."
2017-01-04 11:50:37 -05:00