openmpi

Author	SHA1	Message	Date
Jeff Squyres	9b9cb5fef0	to be squashed: move wait-for-init loop to ompi_mpi_init() Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-06 05:35:19 -07:00
Jeff Squyres	67ba8da76f	ompi_mpi_init: fix race condition There was a race condition in 35438ae9b5: if multiple threads invoked ompi_mpi_init() simultaneously (which could happen from both MPI and OSHMEM), the code did not catch this condition -- Bad Things would happen. Now use an atomic cmp/set to ensure that only one thread is able to advance ompi_mpi_init from NOT_INITIALIZED to INIT_STARTED. Additionally, change the prototype of ompi_mpi_init() so that oshmem_init() can safely invoke ompi_mpi_init() multiple times (as long as MPI_FINALIZE has not started) without displaying an error. If multiple threads invoke oshmem_init() simultaneously, one of them will actually do the initialization, and the rest will loop waiting for it to complete. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-05 18:09:13 -07:00
Jeff Squyres	35438ae9b5	mpi/finalized: revamp INITIALIZED/FINALIZED Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be invoked during an attribute destruction callback (e.g., during the destruction of keyvals on MPI_COMM_SELF during the very beginning of MPI_FINALIZE). In such cases, MPI_FINALIZED must return "false". Prior to this commit, we hung in FINALIZED if it were invoked during a COMM_SELF attribute destruction callback in FINALIZE. See https://github.com/open-mpi/ompi/issues/5084. This commit converts the MPI_INITIALIZED / MPI_FINALIZED infrastructure to use a single enum (ompi_mpi_state, set atomically) to represent the state of MPI: - not initialized - init started - init completed - finalize started - finalize past COMM_SELF destruction - finalize completed The "finalize past COMM_SELF destruction" state is what allows us to return "false" from MPI_FINALIZED before COMM_SELF has been fully destroyed / all attribute callbacks have been invoked. Since this state is checked at nearly every MPI API call (to see if we're outside of the INIT/FINALIZE epoch), care was taken to use atomics to set the ompi_mpi_state value in ompi_mpi_init() and ompi_mpi_finalize(), but performance-critical code paths can simply read the variable without needing to use a slow call to an opal_atomic_*() function. Thanks to @AndrewGaspar for reporting the issue. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-01 13:36:29 -07:00
Sergey Oblomov	319bb376f9	MCA/UCX: branch optimization in cswap call Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-05-30 07:14:43 +03:00
Sergey Oblomov	daad71f036	MCA/UCX: switch/case optimization Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-05-29 22:33:33 +03:00
Sergey Oblomov	6be4066e23	MCA/UCX: cswap call is updated to non-blocking API - minor fixes Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-05-29 20:07:51 +03:00
Sergey Oblomov	b668e19cd1	Merge remote-tracking branch 'wg/master' into topic/amo-non-blocking-ucp Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-05-29 19:08:48 +03:00
Yossi Itigin	976cd5e307	Merge pull request #5186 from hoopoepg/topic/ucx-amo-error-msg MCA/UCX: fixed error messages for incorrect msg size	2018-05-29 15:54:02 +03:00
Sergey Oblomov	0c3ed93ef0	MCA/UCX: added opal progress call to wait request routine - added opal_progress call to wait function to avoid possible [dead]lock issues - wait call is declared as inline - minor fixes Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-05-29 11:34:40 +03:00
Yossi Itigin	705c8a7b9b	Merge pull request #5198 from brminich/shmem_fence OSHMEM/SMPL/UCX: Add real fence support	2018-05-27 11:25:42 +03:00
Artem Polyakov	66e774d959	Merge pull request #4638 from karasevb/oshmem/spec_1.3/c11 oshmem: remove `shmem_put/get` when not the C11 case in accordance with the spec v1.3	2018-05-26 17:29:51 -07:00
Mikhail Brinskii	8e9d401938	OSHMEM/SMPL/UCX: Add real fence support + Add quiet method to SPML, so it can have different implementation with fence. + Use ucp_worker_fence for spml_fence method of UCX SPML Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>	2018-05-25 22:43:06 +03:00
Brian Barrett	9fff40647d	oshmem: disable if no spmls build This patch disables the oshmem layer if there are no SPMLs that will build. With the limited set of SPMLs available to support oshmem, many builds end up installing an oshmem library that we know will not work. There has been a bit of customer confusion over oshmem, hopefully this will lead customers in the right direction. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-05-25 08:48:50 -07:00
Sergey Oblomov	bbaffd3681	MCA/UCX: atomic add/swap are moved to new UCX atomic API Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-05-22 22:23:31 +03:00
Sergey Oblomov	4495da5cb9	MCA/UCX: fixed error messages for incorrect msg size - supported 4 or 8 bytes only Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-05-22 19:53:23 +03:00
Yossi Itigin	66d931b7c4	Merge pull request #5116 from yosefe/topic/ucx-connect-errs ucx: improve error messages during connection establishment	2018-05-02 14:04:24 +03:00
Geoff Paulsen	591b174434	Merge pull request #5003 from sam6258/shmem_free_fix ompi/oshmem: fix shmem_free to perform no-op on null ptr	2018-04-30 12:03:24 -05:00
Yossi Itigin	385f38ab4e	ucx: improve error messages during connection establishment Also, unite common code calling ucp_ep_create() Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-04-30 15:45:05 +03:00
Joshua Ladd	15d5e2937a	Merge pull request #4996 from xinzhao3/topic/shmem-cswap ompi/oshmem: fix cswap bug in mca/atomic/mxm.	2018-04-04 08:28:57 -04:00
Joshua Ladd	e87cb25711	Merge pull request #4982 from xinzhao3/topic/shmem-final ompi/oshmem: fix bug in shmem_finalize.	2018-04-04 08:27:55 -04:00
Scott Miller	a8766adb55	ompi/oshmem: fix shmem_free to perform no-op on null ptr Signed-off-by: Scott Miller <scott.miller1@ibm.com>	2018-04-02 17:12:24 -04:00
Xin Zhao	4aad386c2b	ompi/oshmem: fix bug in shmem_finalize. Signed-off-by: Xin Zhao <xinz@mellanox.com>	2018-04-02 09:07:59 -05:00
Xin Zhao	a5b72cc2e4	ompi/oshmem: fix cswap bug in mca/atomic/mxm. Signed-off-by: Xin Zhao <xinz@mellanox.com>	2018-03-30 03:17:01 -05:00
Xin Zhao	af32c305de	ompi/oshmem: fix bug in shmem_alltoall in mca/scoll/basic. Signed-off-by: Xin Zhao <xinz@mellanox.com>	2018-03-29 14:54:36 -05:00
Artem Polyakov	77ff99e9ee	Merge pull request #4933 from karasevb/timings_update timings: added new timing points	2018-03-25 00:10:49 -07:00
Jeff Squyres	c3adcb05eb	Miscellaneous compiler warnings fixes Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-03-23 11:45:30 -07:00
Boris Karasev	3796307a57	timings: added new timing points Signed-off-by: Boris Karasev <karasev.b@gmail.com>	2018-03-21 05:16:25 +02:00
Alex Mikheev	292d185c30	oshmem: refactor group cache - Use opal hash table instead of list for group lookup. - Code cleanup/refactoring. Group cache is now a part of the proc_group. Signed-off-by: Alex Mikheev <alexm@mellanox.com>	2018-02-22 11:48:06 +02:00
Yossi Itigin	1b1402299a	Merge pull request #4833 from alex-mikheev/topic/oshmem_gcache_grp_msg_fix oshmem: increase group cache size to 1000	2018-02-19 14:39:26 +02:00
Alex Mikheev	03a094b9a8	oshmem: increase group cache size to 1000 and fix typos in help messages Signed-off-by: Alex Mikheev <alexm@mellanox.com>	2018-02-19 11:50:24 +02:00
Alex Mikheev	cca67a69ea	oshmem: scoll: fixes strided alltoall Signed-off-by: Alex Mikheev <alexm@mellanox.com>	2018-02-19 09:41:21 +02:00
Gilles Gouaillardet	88e26c63e0	spml/ucx: fix a double free() issue in mca_spml_ucx_add_procs() error path Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-01-22 13:42:16 +09:00
Joshua Ladd	dbefb35aad	Merge pull request #4635 from karasevb/oshmem/spec_1.3/broadcast oshmem: remove "shmem_broadcast" in accordance with the spec v1.3	2018-01-17 12:11:09 -05:00
Alex Mikheev	ae326546f4	ompi/oshmem: ucx is selected over yalla/ikrit by default Signed-off-by: Alex Mikheev <alexm@mellanox.com>	2018-01-17 15:08:04 +02:00
Yossi Itigin	1193e1eb83	spml_ucx: fix rkey leak Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2017-12-26 20:47:26 +02:00
Boris Karasev	5ea892dc0b	oshmem: remove `shmem_put/get` when not the C11 case in accordance with the spec v1.3 Fixes: https://github.com/open-mpi/ompi/issues/4307 Signed-off-by: Boris Karasev <karasev.b@gmail.com>	2017-12-19 12:40:10 +02:00
Boris Karasev	f6818af1ab	oshmem: remove "shmem_broadcast" in accordance with the spec v1.3 Fixes: https://github.com/open-mpi/ompi/issues/4098 Signed-off-by: Boris Karasev <karasev.b@gmail.com>	2017-12-19 10:41:12 +02:00
Nathan Hjelm	1282e98a01	opal/asm: rename existing arithmetic atomic functions This commit renames the arithmetic atomic operations in opal to indicate that they return the new value not the old value. This naming differentiates these routines from new functions that return the old value. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-11-30 10:41:22 -07:00
Nathan Hjelm	9d0b3fe9f4	opal/asm: remove opal_atomic_bool_cmpset functions This commit eliminates the old opal_atomic_bool_cmpset functions. They have been replaced by the opal_atomic_compare_exchange_strong functions. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-11-30 10:41:22 -07:00
Nathan Hjelm	3ff34af355	opal: rename opal_atomic_cmpset* to opal_atomic_bool_cmpset* This commit renames the atomic compare-and-swap functions to indicate the return value. This is in preparation for adding support for a compare-and-swap that returns the old value. At the same time the return type has been changed to bool. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-10-31 12:47:23 -06:00
Alex Mikheev	7cb7af1685	OSHMEM: add ucx to the list of default spmls Signed-off-by: Alex Mikheev <alexm@mellanox.com>	2017-10-18 10:41:00 +03:00
Alina Sklarevich	c7f5d13550	OSHMEM/CONFIGURE: verbs component - restore the previous build behavior In case where support was requested but not found, stop the build. Signed-off-by: Alina Sklarevich <alinas@mellanox.com>	2017-10-16 11:53:02 +03:00
Alina Sklarevich	3008827f83	OSHMEM/CONFIGURE: Check for the presence of ibv_exp_reg_shared_mr. + The sshmem verbs component will disqualify itself if this verb isn't present on the build host. + In case where support was requested but not found, don't stop the build - continue without this component. Signed-off-by: Alina Sklarevich <alinas@mellanox.com>	2017-10-12 19:57:12 +03:00
Mike Dubman	3d1a7ddd9f	Merge pull request #4271 from karasevb/oshmem/spec oshmem: refactoring the definition of `SHMEM_ALLTOALLS_SYNC_SIZE`	2017-09-27 13:17:37 +03:00
Boris Karasev	7479328937	oshmem: refactoring the definition of `SHMEM_ALLTOALLS_SYNC_SIZE` Signed-off-by: Boris Karasev <karasev.b@gmail.com>	2017-09-26 12:08:55 +03:00
Mike Dubman	4c98e6bde2	Merge pull request #4258 from yosefe/topic/spml-ucx-fix-quiet-typo spml_ucx: fix typo in shmem_quiet() error message.	2017-09-26 11:10:40 +03:00
Boris Karasev	584ff76dea	oshmem: introduced the definition `SHMEM_ALLTOALLS_SYNC_SIZE` In accordance with the OSHMEM spec, this definition must be included in the code. Signed-off-by: Boris Karasev <karasev.b@gmail.com>	2017-09-26 09:12:09 +03:00
Yossi Itigin	3081576124	spml_ucx: fix typo in shmem_quiet() error message. Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2017-09-24 19:20:55 +03:00
Gilles Gouaillardet	b9315edb85	configury: remove the --disable-mpi-io option Fixes open-mpi/ompi#2185 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-09-20 14:39:09 +09:00
Howard Pritchard	bfd5ed6e98	Merge pull request #1910 from hpcraink/pr/shmem_fix_f77 Fix shmem.fh: fails to compile with F77 fixed-form compiled programs...	2017-09-19 14:28:08 -06:00
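The init race fix in 67ba8da76f and the state enum from 35438ae9b5 can be illustrated with a minimal sketch. It assumes plain C11 <stdatomic.h> and POSIX threads instead of Open MPI's opal_atomic_* wrappers, and every sketch_* name is hypothetical rather than an Open MPI identifier. One atomic compare-and-swap lets exactly one thread move the state from NOT_INITIALIZED to INIT_STARTED. Later or concurrent callers wait until INIT_COMPLETED and succeed quietly, and they fail only once finalization has begun. The "plain read" fast path mentioned in 35438ae9b5 is approximated here with a relaxed load.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative copy of the six states listed in 35438ae9b5.
   The SKETCH_* names are hypothetical, not Open MPI identifiers. */
enum {
    SKETCH_NOT_INITIALIZED = 0,
    SKETCH_INIT_STARTED,
    SKETCH_INIT_COMPLETED,
    SKETCH_FINALIZE_STARTED,
    SKETCH_FINALIZE_PAST_COMM_SELF_DESTRUCT,
    SKETCH_FINALIZE_COMPLETED
};

static atomic_int sketch_state = SKETCH_NOT_INITIALIZED;
/* Counts how many times the one-time initialization body actually ran. */
static atomic_int sketch_init_body_runs = 0;

/* Safe to call repeatedly and concurrently. Returns 0 once initialization
   has completed, or -1 if finalization has already started. */
static int sketch_init(void)
{
    int expected = SKETCH_NOT_INITIALIZED;

    /* Only one thread can move NOT_INITIALIZED -> INIT_STARTED. */
    if (atomic_compare_exchange_strong(&sketch_state, &expected,
                                       SKETCH_INIT_STARTED)) {
        atomic_fetch_add(&sketch_init_body_runs, 1); /* stand-in for real setup */
        atomic_store(&sketch_state, SKETCH_INIT_COMPLETED);
        return 0;
    }

    /* The CAS failed, so expected now holds the state this thread observed.
       Wait while the winning thread is still initializing. */
    while (expected == SKETCH_INIT_STARTED) {
        sched_yield();
        expected = atomic_load(&sketch_state);
    }
    return expected == SKETCH_INIT_COMPLETED ? 0 : -1;
}

/* State changes go through atomics; this query path uses a cheap relaxed load.
   It reports "finalized" only after the COMM_SELF destruction step. */
static bool sketch_finalized(void)
{
    int s = atomic_load_explicit(&sketch_state, memory_order_relaxed);
    return s >= SKETCH_FINALIZE_PAST_COMM_SELF_DESTRUCT;
}

/* Moves INIT_COMPLETED -> FINALIZE_STARTED; returns -1 from any other state. */
static int sketch_finalize_begin(void)
{
    int expected = SKETCH_INIT_COMPLETED;
    return atomic_compare_exchange_strong(&sketch_state, &expected,
                                          SKETCH_FINALIZE_STARTED) ? 0 : -1;
}

static void *sketch_worker(void *arg)
{
    *(int *)arg = sketch_init();
    return NULL;
}

/* Starts n threads (1..64) that race into sketch_init();
   returns how many of them reported failure. */
static int sketch_run_concurrent_inits(int n)
{
    pthread_t threads[64];
    int rcs[64];
    int failures = 0;

    assert(n > 0 && n <= 64);
    for (int i = 0; i < n; i++)
        pthread_create(&threads[i], NULL, sketch_worker, &rcs[i]);
    for (int i = 0; i < n; i++) {
        pthread_join(threads[i], NULL);
        if (rcs[i] != 0)
            failures++;
    }
    return failures;
}
```

When several threads race through sketch_run_concurrent_inits, each call should return 0, but the initialization body runs only once. A call to sketch_init() made after sketch_finalize_begin() returns -1. Meanwhile sketch_finalized() keeps reporting false until the state passes the COMM_SELF destruction step.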

530 Commits
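The opal/asm renames in 3ff34af355, 9d0b3fe9f4 and 1282e98a01 separate three return conventions: a boolean cmpset, a compare-and-exchange that hands back the old value, and arithmetic ops that return the new value rather than the old one. A rough sketch with C11 <stdatomic.h> follows. In C11, atomic_fetch_add returns the old value, and atomic_compare_exchange_strong returns a bool and writes the observed value back on failure. The sketch assumes, without verifying, that the opal functions follow the same semantics. The sketch_* helpers are hypothetical stand-ins, not the opal function names or signatures.

```c
#include <assert.h>   /* for the checks in the usage example */
#include <stdatomic.h>
#include <stdbool.h>

/* Fetch-then-op: returns the OLD value (plain C11 semantics). */
static int sketch_fetch_add(atomic_int *addr, int delta)
{
    return atomic_fetch_add(addr, delta);
}

/* Op-then-fetch: returns the NEW value, the convention 1282e98a01
   makes explicit in the renamed opal arithmetic functions. */
static int sketch_add_fetch(atomic_int *addr, int delta)
{
    return atomic_fetch_add(addr, delta) + delta;
}

/* Old-style boolean cmpset: reports only success or failure.
   oldval is a by-value copy, so the observed value is discarded. */
static bool sketch_bool_cmpset(atomic_int *addr, int oldval, int newval)
{
    return atomic_compare_exchange_strong(addr, &oldval, newval);
}

/* compare_exchange style: a failed CAS refreshes 'observed', so the retry
   loop needs no separate reload. Returns the value seen before any update. */
static int sketch_atomic_max(atomic_int *addr, int value)
{
    int observed = atomic_load(addr);
    while (observed < value &&
           !atomic_compare_exchange_strong(addr, &observed, value)) {
        /* 'observed' now holds the current value; the loop re-checks it */
    }
    return observed;
}
```

For example, with an atomic_int x initialized to 5, sketch_fetch_add(&x, 3) returns 5 and leaves x at 8. A following sketch_add_fetch(&x, 2) returns 10.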