openmpi

Author	SHA1	Message	Date
rhc54	b2e36f0824	Merge pull request #2493 from rhc54/topic/pmix Update to latest PMIx master	2016-12-01 16:02:25 -08:00
rhc54	88fbdf82c9	Merge pull request #2492 from rhc54/topic/connect Fix IOF when outputting to files - the remote orteds were failing to o…	2016-12-01 16:00:13 -08:00
Ralph Castain	6041467df0	Update to latest PMIx master Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-01 14:47:44 -08:00
Ralph Castain	dd491db21f	Fix IOF when outputting to files: the remote orteds were failing to output stdout/err from their procs. Silence a warning in orted_submit. Protect against a free'd value in an error path when forming oob tcp connections. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-12-01 14:12:47 -08:00
Gilles Gouaillardet	3a76a78bff	btl/openib: plug a memory leak in btl_openib_register_mca_params() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:30 +09:00
Gilles Gouaillardet	c9aeccb84e	opal/if: open the if framework once in opal_init_util. The if framework is no longer opened in opal_if*, which plugs several memory leaks. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:30 +09:00
Gilles Gouaillardet	188b9668e4	ompi/attribute: plug a memory leak in set_value(). OBJ_RELEASE() the previous attribute value, if any. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:30 +09:00
Gilles Gouaillardet	d94e8c97a0	ompi/runtime: release F90 types in ompi_mpi_finalize(). As specified by the standard, F90 types cannot be freed by the end user, but since they are ompi_datatype_dup'ed from predefined datatypes, they have to be explicitly freed at finalize time in order to avoid a memory leak. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:30 +09:00
Gilles Gouaillardet	45732fd764	hwloc/base: fix a memory leak in buffer_cleanup() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:29 +09:00
Gilles Gouaillardet	2739346a18	opal: invoke mca_base_close() in opal_finalize_util() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:29 +09:00
Gilles Gouaillardet	b2aca6c753	ompi/proc: plug a memory leak in ompi_proc_unpack() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:29 +09:00
Gilles Gouaillardet	ae278fd5df	ompi/runtime: plug a memory leak. Declare ompi_mpi_show_mca_params_file as NULL so that MPI_T_Init_thread() can be invoked without leaking memory. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 14:24:29 +09:00
Gilles Gouaillardet	43ee08b20e	ompi/c: remove unused variable in [i]gatherv Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 13:59:25 +09:00
Gilles Gouaillardet	fe4c4e95eb	coll/libnbc: fix MPI_IN_PLACE handling in i{gather,scatter}[v]. MPI_IN_PLACE is only relevant on the root task, so only test for it there. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 13:59:25 +09:00
Gilles Gouaillardet	1a8a276914	coll/libnbc: use zero-size messages in ibarrier and silence a valgrind warning Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 13:59:25 +09:00
Gilles Gouaillardet	2eec6a08b5	coll/base: fix ompi_coll_base_reduce_scatter_intra_nonoverlapping() with MPI_IN_PLACE. Invoke the underlying scatterv with MPI_IN_PLACE when appropriate. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 13:59:24 +09:00
Gilles Gouaillardet	8b7999469b	coll/base: fix MPI_IN_PLACE in ompi_coll_base_reduce_generic(). Avoid copying data onto itself when MPI_IN_PLACE is used. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 13:59:24 +09:00
rhc54	eeec99ac12	Merge pull request #2484 from artpol84/oob/msg_drop orte/oob/tcp: Plug the memory leak.	2016-11-30 20:36:40 -08:00
Gilles Gouaillardet	3f1486a508	pml/ob1: initialize one more field in mca_pml_ob1_recv_request_progress_rget(). Always initialize recvreq->req_rdma_offset to zero. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 13:14:23 +09:00
Jeff Squyres	756d09fd6f	Merge pull request #2457 from OMGtechy/master Fixed -Werror=unused-result warnings in comm_cid.c by adding error checking	2016-11-30 20:41:55 -05:00
Artem Polyakov	58300afff2	orte/oob/tcp: Plug the memory leak. Plug Coverity defect CID 1396541. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2016-12-01 06:48:25 +07:00
rhc54	b6647ce286	Merge pull request #2480 from rhc54/topic/sessiondir Fix session directory cleanup	2016-11-30 12:24:37 -08:00
Gilles Gouaillardet	15098161a3	coll/libnbc: add some comments on how locks are used (no code change). Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-30 17:29:51 +09:00
Ralph Castain	9c596a6f54	We added another layer to the session directory tree (the jobfam layer), but we forgot to include it in the teardown procedure, thus causing us to leave droppings behind. Add the jobfam_session_dir to the teardown, and ensure that all levels are addressed. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-29 20:51:28 -08:00
rhc54	d4c4babcc0	Merge pull request #2479 from rhc54/topic/dbgupdate Do not resend if max_retries is exceeded.	2016-11-29 20:36:29 -08:00
Ralph Castain	47ed214458	Do not resend if max_retries is exceeded. Make a verbose output available to tell us where the intended message was to go. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-29 19:21:16 -08:00
rhc54	d31f173744	Merge pull request #2476 from rhc54/topic/dbgupdate Bring forward the debugger-related changes	2016-11-29 19:10:32 -08:00
Ralph Castain	d5fd635efe	Bring forward the debugger-related changes Refs https://github.com/open-mpi/ompi/pull/2425 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-29 13:15:20 -08:00
Joshua Gerrard	7cf5de12b9	Fixed -Werror=unused-result warnings in comm_cid.c by adding error checking Signed-off-by: Joshua Gerrard <enquiries@joshuagerrard.com>	2016-11-29 21:08:12 +00:00
rhc54	ef5ee73579	Merge pull request #2472 from rhc54/topic/mpiinit Never collect data when doing the fence at the end of MPI_Init	2016-11-29 11:48:50 -08:00
rhc54	650a67b2a8	Merge pull request #2473 from rhc54/topic/oob Silence minor warnings	2016-11-29 11:48:36 -08:00
Ralph Castain	30ff8be9c9	Silence minor warnings Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-29 08:33:22 -08:00
Ralph Castain	114e20ad66	Never collect data when doing the fence at the end of MPI_Init Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-29 08:31:35 -08:00
Jeff Squyres	a6d390fe7b	Merge pull request #2461 from artpol84/oob/msg_drop orte/oob/tcp: Fix message dropping in case of concurrent connection.	2016-11-29 11:23:15 -05:00
Joshua Ladd	a3782718e7	Merge pull request #2469 from vspetrov/hcoll_context_free coll/hcoll: hcoll_context_free	2016-11-29 10:06:53 -05:00
Valentin Petrov	4cdb8ecaad	coll/hcoll: hcoll_context_free. Adds the new API hcoll_context_free, which resolves the issues observed with the ctx cache and group_destroy_notify. Signed-off-by: Valentin Petrov <valentinp@mellanox.com>	2016-11-29 07:33:05 +02:00
Jeff Squyres	34ea3ce25a	Merge pull request #1946 from thananon/romio-add-notes romio: update REFRESH_NOTES to accommodate the random() patch.	2016-11-28 16:37:23 -05:00
Ralph Castain	f7699a7eeb	Silence warnings in a .opal_ignore'd component Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-28 13:18:25 -08:00
KAWASHIMA Takahiro	001095399d	Merge pull request #2466 from kawashima-fj/pr/mca_pml_ob1_comm_proc_t pml/ob1: Reduce per-rank memory footprint slightly	2016-11-28 21:09:38 +09:00
KAWASHIMA Takahiro	9bfca8b274	pml/ob1: Reduce per-rank memory footprint slightly. `struct mca_pml_ob1_comm_proc_t`, which is allocated per connected rank in a communicator, had two padding gaps due to alignment, one after `expected_sequence` and one after `send_sequence`. By changing the order of the members, the size of `mca_pml_ob1_comm_proc_t` is reduced by 8 bytes on 64-bit architectures. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2016-11-28 19:20:48 +09:00
Mike Dubman	53a0c86c16	Merge pull request #2455 from yosefe/topic/ucp-uct-nonblock-mem-reg-api spml_ucx: allow registering the heap in non-blocking mode.	2016-11-27 11:42:09 +02:00
Artem Polyakov	ada93e0c02	orte/oob/tcp: Fix message dropping in case of concurrent connection. The problem was observed for direct modex used with the recursive doubling algorithm (used for collective ID calculation prior to d52a2d081e9598a9ac9a50fb4b013a6d2a72375b), which is pairwise in nature, so counter-connections are highly likely. The following scenario uncovered the issue: * ranks `x` and `y` want to communicate with each other, `x` < `y`; * rank `x` initiates the connection and sends the ack; * rank `y` starts to `connect()` and gets the ack from `x`; * `y` identifies that it has already started connecting and that `y` > `x`, so it rejects the incoming connection; * `x` sees that its connection was rejected in `mca_oob_tcp_peer_recv_connect_ack()` when trying to read the message header using `tcp_peer_recv_blocking()`, which calls `mca_oob_tcp_peer_close()`, which effectively flushes all the messages in the peer->send_queue; * `y` sends the ack to `x` and the connection is established, but all the messages for the peer at `x` have vanished (except the front one in peer->send_msg). This commit introduces a "nack" function that will be used on the `y` side to tell `x` that `y` has priority and that `x`'s connection should be closed. This avoids "guessing" about an unexpectedly closed connection. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2016-11-27 04:58:34 +07:00
Howard Pritchard	7ce3ca25ef	Merge pull request #2458 from hppritcha/topic/pmix_cray_no_dlopen pmix/cray: abort job if using aprun for general case	2016-11-25 14:03:20 -07:00
Mike Dubman	f339632216	Merge pull request #2452 from alex-mikheev/topic/scoll_basic_fixes oshmem: fixes scoll basic barrier and broadcast	2016-11-25 18:03:56 +02:00
Howard Pritchard	eee9f7ae3a	pmix/cray: abort job if using aprun for general case. It turns out that there is an incompatibility between the Cray PMI library and the default configuration for building Open MPI (master). To work around this, we now disable use of aprun for direct launch of Open MPI jobs except under specific conditions. The problem is that there are now (on master) packages getting initialized that do not work properly across a fork operation. As part of a constructor in the Cray PMI library, a fork operation is done to simplify use of shared memory between the processes in a job on the same node. This ends up thoroughly messing up the Open MPI initialization process when dlopen support is enabled. The initialization process gets about half-way through when the PMIX framework is opened and components are loaded, which triggers the Cray PMI constructor and hence the fork operation. There are two workarounds for this: 1) configure Open MPI for Cray XE/XC systems using aprun with the --disable-dlopen option, or 2) set the PMI_NO_FORK environment variable in the shell in which the aprun command is run. Without these measures, an Open MPI job will just hang at job startup in the first attempt to "thread-shift" the PMIx fence_nb operation. Additional hangs occur at shutdown if this problem is worked around, again due to the insertion of a fork operation halfway through the Open MPI initialization procedure. This commit detects whether the conditions that bring out the hang are present, and if so, prints out a message and aborts the job launch. Note that on systems using Slurm, the PMI_NO_FORK environment variable is set as part of the srun job launch, so this issue is avoided on those systems. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-11-25 06:28:19 -07:00
Yossi Itigin	0241a2697d	spml_ucx: allow registering the heap in non-blocking mode. Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2016-11-25 15:09:22 +02:00
Edgar Gabriel	ebcfbbc045	Merge pull request #2456 from edgargabriel/pr/dynamic_gen2_uneven_distro_bug fcoll/dynamic_gen2: fix bug exposed by uneven distribution of data	2016-11-24 16:34:19 -06:00
Edgar Gabriel	b10558c3da	fcoll/dynamic_gen2: fix bug exposed by uneven distribution of data. This fixes a bug, reported in-house, that occurs with this component. It is triggered if the amount of data assigned to different aggregators differs greatly, leading to a different number of internal iterations being required to handle it. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2016-11-24 13:02:19 -06:00
Alex Mikheev	0f83a1fd57	oshmem: scoll: fixes basic barrier, broadcast and alltoall. Add missing fence() call to alltoall and central counter broadcast. Signed-off-by: Alex Mikheev <alexm@mellanox.com>	2016-11-24 16:56:55 +02:00
Gilles Gouaillardet	8fd1c3f0df	opal/util: handle a race condition in opal_os_dirpath_destroy. A file might have been destroyed by another task between readdir() and stat(), so simply ignore stat() failures. This typically occurs when one task is removing the job_session_dir while another task is still removing its proc_session_dir. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-11-24 10:45:48 +09:00
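
The 8-byte saving reported in commit 9bfca8b274 follows from C alignment rules. The sketch below assumes an LP64 target (8-byte pointers, 4-byte `int32_t`). Both struct names and the two pointer members are hypothetical stand-ins, because the log does not give the real member list of `mca_pml_ob1_comm_proc_t`. Only the counter names come from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration only. The counter names echo the
 * commit message; their 32-bit type and the two pointer members
 * (peer_ptr, frag_list) are assumptions, not the real
 * mca_pml_ob1_comm_proc_t definition. Offsets assume 8-byte pointers. */

/* Interleaved order: each 4-byte counter is followed by an 8-byte aligned
 * pointer, so the compiler inserts 4 bytes of padding after each counter. */
struct comm_proc_padded {
    int32_t expected_sequence;   /* offset 0, then 4 bytes padding */
    void   *peer_ptr;            /* offset 8 */
    int32_t send_sequence;       /* offset 16, then 4 bytes padding */
    void   *frag_list;           /* offset 24; total size 32 */
};

/* Reordered: pointers first, then the two counters share one 8-byte slot. */
struct comm_proc_packed {
    void   *peer_ptr;            /* offset 0 */
    void   *frag_list;           /* offset 8 */
    int32_t expected_sequence;   /* offset 16 */
    int32_t send_sequence;       /* offset 20; total size 24 */
};

/* Bytes saved by reordering: 8 on common 64-bit ABIs, 0 on typical 32-bit ones. */
size_t layout_savings(void)
{
    return sizeof(struct comm_proc_padded) - sizeof(struct comm_proc_packed);
}
```

On an x86-64 Linux build, `layout_savings()` returns 8. In a real binary, a tool such as `pahole` can report the same padding holes.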
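
One way to handle the race in commit 8fd1c3f0df is to treat `ENOENT` as "already removed" during a directory walk. The sketch below is simplified and non-recursive and assumes POSIX `opendir()`/`readdir()`/`lstat()`. `remove_dir_tolerant()` is a hypothetical name and does not reproduce OPAL's implementation. The sketch calls `lstat()` rather than `stat()` so that symbolic links are not followed. That choice belongs to this sketch, not to the commit.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not OPAL's opal_os_dirpath_destroy(): removes the
 * regular files directly inside `path` (no recursion) and then `path`
 * itself. Another task may delete an entry between readdir() and lstat(),
 * so ENOENT from lstat(), unlink() or rmdir() is treated as "already gone".
 * Returns 0 on success, -1 if any other error occurred. */
int remove_dir_tolerant(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL) {
        /* The whole directory vanished first: nothing left to do. */
        return (errno == ENOENT) ? 0 : -1;
    }

    int rc = 0;
    struct dirent *ent;
    char full[4096];

    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0) {
            continue;
        }
        int n = snprintf(full, sizeof(full), "%s/%s", path, ent->d_name);
        if (n < 0 || (size_t)n >= sizeof(full)) {
            rc = -1;                      /* path too long for the buffer */
            continue;
        }

        struct stat st;
        if (lstat(full, &st) != 0) {
            if (errno != ENOENT) {
                rc = -1;                  /* a real error, not the race */
            }
            continue;                     /* entry was removed by someone else */
        }
        if (S_ISREG(st.st_mode) && unlink(full) != 0 && errno != ENOENT) {
            rc = -1;
        }
    }
    closedir(dir);

    if (rmdir(path) != 0 && errno != ENOENT) {
        rc = -1;                          /* e.g. ENOTEMPTY if subdirectories remain */
    }
    return rc;
}
```

Calling the function a second time on the same path returns 0. The losing task in the session-directory race sees exactly this outcome.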
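
Commit ada93e0c02 replaces an implicit close with an explicit "nack". The lower rank can then tell a deliberate rejection apart from a failure and does not flush its send queue. The model below is a hypothetical, socket-free sketch of that rule. It assumes the higher rank wins a concurrent connection, as in the scenario above. `peer_conn_t`, `on_incoming_connect()`, `on_nack()` and `on_unexpected_close()` are invented names, not Open MPI functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the rank-based tie-break described in commit
 * ada93e0c02. The types and functions are illustrative only; they are not
 * the orte/oob/tcp API, and no sockets are involved. */

typedef enum { CONN_IDLE, CONN_CONNECTING, CONN_CONNECTED, CONN_CLOSED } conn_state_t;

/* Local state one rank keeps about a single remote peer. */
typedef struct {
    int          local_rank;
    conn_state_t state;        /* state of the link to that peer */
    size_t       queued_msgs;  /* messages waiting in the send queue */
} peer_conn_t;

/* `self` receives a connection request from `remote_rank`.
 * Returns true to accept it, false to answer with a "nack". */
bool on_incoming_connect(peer_conn_t *self, int remote_rank)
{
    if (self->state == CONN_CONNECTING && self->local_rank > remote_rank) {
        /* Both sides are connecting at once and we have priority:
         * keep our own attempt and reject the lower rank's. */
        return false;
    }
    self->state = CONN_CONNECTED;
    return true;
}

/* The lower rank gets an explicit nack: drop only its own attempt and
 * keep the send queue, since the higher rank's connection is on its way. */
void on_nack(peer_conn_t *self)
{
    self->state = CONN_IDLE;
}

/* Behaviour before the fix, for comparison: the rejection looked like an
 * unexpected close, and closing the peer flushed every queued message. */
void on_unexpected_close(peer_conn_t *self)
{
    self->state = CONN_CLOSED;
    self->queued_msgs = 0;
}
```

The walkthrough proceeds as follows:

1. Rank `y` rejects the request from `x`.
2. `x` keeps its queued messages after `on_nack()`.
3. `x` then accepts the connection from `y`.

With `on_unexpected_close()`, the same queue would have been emptied.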


26175 Commits