openmpi

Author	SHA1	Message	Date
Aravind Gopalakrishnan	109d0569ff	MTL/OFI: Add OFI Scalable Endpoint support OFI MTL supports OFI Scalable Endpoints feature as means to improve multi-threaded application throughput and message rate. Currently the feature is designed to utilize multiple TX/RX contexts exposed by the OFI provider in conjunction with a multi-communicator MPI application model. For more information, refer to README under mtl/ofi. Reviewed-by: Matias Cabral <matias.a.cabral@intel.com> Reviewed-by: Neil Spruit <neil.r.spruit@intel.com> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-12-03 09:56:52 -08:00
matcabral	6a15712df5	MTL/OFI: revert PR 6082 Revert to avoid issues with dynamic processes. Signed-off-by: matcabral <matias.a.cabral@intel.com>	2018-11-30 13:44:39 -08:00
matcabral	5f58453e63	MTL/OFI: Lower priority when all procs are local So far Vader is faster than OFI MTL for doing shared memory. Therefore, let it run by default when all procs are local. Reviewed-by: Spruit, Neil R <neil.r.spruit@intel.com> Reviewed-by: Gopalakrishnan, Aravind <aravind.gopalakrishnan@intel.com> Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>	2018-11-14 11:01:33 -08:00
Aravind Gopalakrishnan	5cf43de445	MTL/OFI: Check threshold number of peers allowed per rank When the provider does not support FI_REMOTE_CQ_DATA, the OFI tag does not have sizeof(int) bits for the rank. Therefore, unexpected behavior will occur when this limit is crossed. Check the max allowed number of ranks during add_procs() and return if there is danger of exceeding this threshold. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-11-01 14:03:00 -07:00
Brian Barrett	e9e4d2a4bc	Handle asprintf errors with opal_asprintf wrapper The Open MPI code base assumed that asprintf always behaved like the FreeBSD variant, where ptr is set to NULL on error. However, the C standard (and Linux) only guarantee that the return code will be -1 on error and leave ptr undefined. Rather than fix all the usage in the code, we use opal_asprintf() wrapper instead, which guarantees the BSD-like behavior of ptr always being set to NULL. In addition to being correct, this will fix many, many warnings in the Open MPI code base. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-10-08 16:43:53 -07:00
Brian Barrett	c5eaa38491	mtl ofi: Change from opt-in to opt-out provider selection Change default provider selection logic for the OFI MTL. The old logic was whitelist-only, so any new HPC NIC provider would have to ask users to do extra work or wait for an OMPI release to be whitelisted. The reason for the logic was to avoid selecting a "generic" provider like sockets or shm that would frequently have worse performance than the optimized BTL options Open MPI supports. With the change, we blacklist the (small, relatively static) list of providers that duplicate internal capabilities. Users can use one of these blacklisted providers in two ways: first, they can explicitly request the provider in the include list (which will override the default exclude list) and second, they can set a new empty exclude list. Since most HPC networks require special libraries and therefore an explicit build of libfabric, it is highly unlikely that this change will cause users to use libfabric when they didn't want to do so. It does, however, solve the whitelisting problem. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-09-27 11:02:18 -07:00
Nathan Hjelm	000f9eed4d	opal: add types for atomic variables This commit updates the entire codebase to use specific opal types for all atomic variables. This is a change from the prior atomic support which required the use of the volatile keyword. This is the first step towards implementing support for C11 atomics as that interface requires the use of types declared with the _Atomic keyword. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-09-14 10:48:55 -06:00
Gilles Gouaillardet	316e4e38f4	mtl/psm2: fix a misc memory leak Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-08-30 10:07:17 +09:00
Aravind Gopalakrishnan	5cbcae79d8	MTL OFI: Ask for FI_THREAD_DOMAIN support when not using MPI_THREAD_MULTIPLE When an application is not using multiple threads to call into MPI, we can safely ask for FI_THREAD_DOMAIN setting from the provider as it should translate to the least amount of locking in provider. Conversely, for applications using THREAD_MULTIPLE, explicitly ask for FI_THREAD_SAFE to prevent race conditions. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-08-23 14:18:32 -07:00
Aravind Gopalakrishnan	ed2343034d	MTL OFI: Fix race condition due to global progress entries array Since progress entries array is globally allocated, it is susceptible to race conditions when using multi-threaded applications. Allocating it on the stack resolves any potential races as it is thread local by default. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-08-09 10:52:28 -07:00
Ralph Castain	1aef0a64aa	Merge pull request #5477 from nrspruit/ns_mtl_send_isend MTL OFI: send/isend split into blocking/non-blocking paths	2018-07-31 13:08:37 -07:00
Spruit, Neil R	7dc8c8ba3f	MTL OFI: send/isend split into blocking/non-blocking paths -Updated blocking send to directly call functionality and set completion events expected to 0 initially. This allows for optimization for providers that support fi_tinject up to larger sizes. This also reduces latency on running the OFI mtl with smaller sizes without requiring calls to progress given fi_tinject is required to complete the messaging before returning and will not create any events in the Completion Queue. -Updated non-blocking send to directly call fi_tsend and avoid calling fi_tinject as the functionality should not wait on completions. This resolves a bug where applications calling MPI_Isend can overrun the TX buffer with small (inject) messages causing a deadlock. In addition this improves performance in message rates by preventing waiting on any size message to complete in non-blocking send messages. -Created common ompi_mtl_ofi_ssend_recv function to post the ssend recv which is common between isend and send code paths. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-07-24 07:54:24 -07:00
Spruit, Neil R	767135c580	MTL OFI: Fix Deadlock in fi_cancel given completion during cancel - If a message for a recv that is being cancelled gets completed after the call to fi_cancel, then the OFI mtl will enter a deadlock state waiting for ofi_req->super.ompi_req->req_status._cancelled which will never happen since the recv was successfully finished. - To resolve this issue, the OFI mtl now checks ofi_req->req_started to see if the request has been started within the loop waiting for the event to be cancelled. If the request is being completed, then the loop is broken and fi_cancel exits setting ofi_req->super.ompi_req->req_status._cancelled = false; Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-07-24 03:12:44 -07:00
Matias Cabral	d996f529c0	MTL OFI: Add support for mem_tag_format OFI providers may reserve some of the upper bits of the tag for internal usage and expose it using mem_tag_format. Check for that and adjust communicator bits as needed. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-07-23 11:39:40 -07:00
Spruit, Neil R	d4f408a7f8	MTL OFI: MTL_OFI_RETRY_UNTIL_DONE support for Resource overflow - Added support in MTL_OFI_RETRY_UNTIL_DONE to handle -FI_EAGAIN from the provider and correctly attempt to progress the OFI Completion queue by calling ompi_mtl_ofi_progress. - If events were pending that blocked OFI operations from being enqueued they will be completed and the OFI operation will be retried once ompi_mtl_ofi_progress has successfully completed. - Updated MTL_OFI_RETRY_UNTIL_DONE to take a RETURN variable instead of requiring the existence of a "ret" variable to pass back the return value from completing the OFI operation. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-07-17 03:00:38 -07:00
Spruit, Neil R	9a17864278	MTL OFI: Redesign sync send with reduced tag bits and quick ack -Updated the design for sync send MPI calls to use 2 protocol bits for denoting "sync_send" or "sync_send_ack". -"Sync_send" is added to the send tag only and is masked out in receives such that it can be read by the original Recv posted in the send/recv operation. -"Sync_send_ack" is sent from the recv callback to the send side. This 0 byte send does not generate a completion entry and instead sends the message and immediately completes the opal completion in the recv. -Tag formats ofi_tag_1 and ofi_tag_2 have been updated to include 2 more tag bits per format type due to the reduced protocol bits required by OMPI. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-07-09 06:50:21 -07:00
Matias A Cabral	e6674556aa	MTL OFI: add support for FI_REMOTE_CQ_DATA. Extend number of supported ranks with providers that support FI_REMOTE_CQ_DATA. Add README file to OFI MTL Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>	2018-06-14 17:17:38 -07:00
Brian Barrett	09e4c40ce9	mtl: remove MXM MTL Remove the MXM MTL, which has been deprecated in preference for the Yalla PML. This was discussed at the last developers meeting and somehow I ended up with the action item to do the removal. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-05-21 14:18:30 -07:00
Nathan Hjelm	f432d07844	mtl: reset ompi_mtl_base_selected_component on framework close Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-05-02 14:53:34 -06:00
Todd Kordenbrock	d646a00cd9	Merge pull request #5054 from tkordenbrock/topic/master/mtl-portals4.finalize.fix master: mtl-portals4: don't call progress() in finalize() if Portals4 was not initialized	2018-04-12 12:12:05 -05:00
Todd Kordenbrock	90659671bc	mtl-portals4: don't call progress() in finalize() if Portals4 was not initialized This commit fixes a segfault in mtl-portals4 finalize(). The segfault occurs if finalize() is called without any calls to add_procs(). This commit resolves the segfault by skipping the progress() loop in finalize() if the Portals was not initialized. Signed-off-by: Todd Kordenbrock (thkgcode@gmail.com)	2018-04-10 14:22:32 -05:00
Spruit, Neil R	e7bff501cd	MTL OFI: Added support for reading multiple CQ events in ofi progress -Updated ompi_mtl_ofi_progress to use an array to read CQ events up to a threshold that can be set by the Open MPI User. -Users can adjust the number of events that can be handled in the ompi_mtl_ofi_progress by setting "--mca mtl_ofi_progress_event_cnt #". -The default value for the number of CQ events that can be read in a single call to ofi progress is 100 which is an average based off workload use-case analysis showing 70-128 as the range of multiple events returned during ofi progress. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-02-15 09:41:14 -05:00
Aravind Gopalakrishnan	fb68726baf	MTL OFI: Allow retries in MTL progress for interrupted syscalls This fixes a regression in sockets provider which could return -EINTR value from fi_cq_read() due to a syscall being interrupted. The error value is currently interpreted as fatal condition. Relax the rule so that we can retry fi_cq_read() operation. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2017-12-20 14:58:49 -08:00
Matias Cabral	2c86b8723d	Merge pull request #4510 from matcabral/mtl_psm2_shadow_vars New flag for MCA parameters that allows a behaving with a default value of "unset".	2017-12-04 12:25:37 -08:00
Howard Pritchard	b160cf6339	Merge pull request #4533 from hppritcha/topic/ofi_mtl_mprobe_fixes mtl/ofi: fix problem with mprobe/mrecv	2017-12-04 09:11:47 -07:00
Nathan Hjelm	1282e98a01	opal/asm: rename existing arithmetic atomic functions This commit renames the arithmetic atomic operations in opal to indicate that they return the new value not the old value. This naming differentiates these routines from new functions that return the old value. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-11-30 10:41:22 -07:00
Nathan Hjelm	9d0b3fe9f4	opal/asm: remove opal_atomic_bool_cmpset functions This commit eliminates the old opal_atomic_bool_cmpset functions. They have been replaced by the opal_atomic_compare_exchange_strong functions. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-11-30 10:41:22 -07:00
Howard Pritchard	cd48eccbae	mtl/ofi: fix problem with mprobe/mrecv At least with some providers (sockets and GNI), the mprobe/mrecv ofi mtl methods were incorrect. For these two providers at least one must supply the original tag and mask bits used with the prior FI_PEEK | FI_CLAIM request that had been used to probe for the message. These providers take a strict interpretation of the following sentence from the libfabric fi_tagged man page: ``` Claimed messages can only be retrieved using a subsequent, paired receive operation with the FI_CLAIM flag set. ``` Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2017-11-24 08:11:18 -07:00
Matias A Cabral	1fad59465f	New flag for MCA parameters that allows a behaving with a default value of "unset". mtl/psm2: Update some shadow mca parameters to use the default "unset". mtl/psm2: Add new shadow parameter to allow specifying the service level. Signed-off-by: Matias A Cabral <matias.a.cabral@intel.com>	2017-11-16 16:28:50 -08:00
Matias Cabral	d1869a725a	Merge pull request #4467 from matcabral/master mtl/ofi: Set data and control progress options default values to FI_PROGRESS_UNSPEC	2017-11-13 07:35:39 -08:00
Jeff Squyres	a8686a6813	mtl ofi: squelch compiler warnings gcc 5.2 complains: ``` mtl_ofi_component.c: In function ‘ompi_mtl_ofi_finalize’: mtl_ofi_component.c:613:5: warning: suggest parentheses around assignment used as truth value [-Wparentheses] if (ret = fi_close((fid_t)ompi_mtl_ofi.fabric)) { ^ ``` Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-11-11 05:07:11 -08:00
Jeff Squyres	5a6ddf42d6	mtl ofi: it is not an error to return no data from fi_getinfo() Before this commit, the presence of usNIC devices -- which will (currently) return no data when fi_getinfo() is queried for tagged matching providers -- would cause an error message to be displayed. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-11-11 05:07:11 -08:00
Jeff Squyres	f910f554f7	mtl ofi: show the positive value of the error The value of ret is negative (e.g., -61), but it is displayed in the help message as `%zd`, which renders as unsigned (i.e., a giant positive value). So make sure to negate the negative value before rendering it (e.g., so we display "61", not "4294967235"). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-11-11 05:07:11 -08:00
Jeff Squyres	e8c13ef286	mtl ofi: fix trivial comment whitespace No code or logic changes. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-11-11 05:07:10 -08:00
Jeff Squyres	bed1930df8	mtl ofi: fix formatting of help message No code or logic changes. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-11-11 05:07:05 -08:00
Matias Cabral	b76bb42ac1	mtl/ofi: Set data and control progress options default values to FI_PROGRESS_UNSPEC so each provider will use its default. Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>	2017-11-08 08:24:33 -08:00
bosilca	63e8a8c608	Merge pull request #4431 from hjelmn/asm_cleanup opal: rename opal_atomic_cmpset* to opal_atomic_bool_cmpset*	2017-11-02 18:45:56 -04:00
Nathan Hjelm	3ff34af355	opal: rename opal_atomic_cmpset* to opal_atomic_bool_cmpset* This commit renames the atomic compare-and-swap functions to indicate the return value. This is in preparation for adding support for a compare-and-swap that returns the old value. At the same time the return type has been changed to bool. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-10-31 12:47:23 -06:00
Aravind Gopalakrishnan	285fc42b4e	Fix OFI MTL to recognize correct CQ empty scenario Currently, the progress function is incorrectly interpreting any error value other than a positive value or -FI_EAVAIL to mean CQ is empty. CQ is empty only if fi_cq_read() call returned -EAGAIN error code. Fix that here. While at it, fix help text output for calls made to OFI API. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2017-10-30 12:13:44 -07:00
Aravind Gopalakrishnan	bea4503f95	Move help text output regarding PSM2_CUDA envvar to component init phase The messages should be printed only in the event of CUDA builds and in the presence of supporting hardware and when PSM2 MTL has actually been selected for use. To this end, move help text output to component init phase. Also use opal_setenv/unsetenv() for safer setting, unsetting of the environment variable and sanitize the help text message. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2017-10-26 16:01:01 -07:00
Matias Cabral	b81bcd4b0d	MTL PSM2: add a thread lock while peeking and completing the psm2 requests. Reviewed-by: Gopalakrishnan, Aravind <aravind.gopalakrishnan@intel.com> Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>	2017-10-20 14:46:48 -07:00
Aravind Gopalakrishnan	f8a2b7f6bf	Use opal_show_help to warn about PSM2_CUDA envvar setting If Open MPI is configured with CUDA, then user also should be using a CUDA build of PSM2 and therefore be setting PSM2_CUDA environment variable to 1 while using CUDA buffers for transfers. If we detect this setting to be missing, force set it. If user wants to use this build for regular (Host buffer) transfers, we allow the option of setting PSM2_CUDA=0, but print a warning message to user that it is not a recommended usage scenario. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2017-09-29 17:04:10 -07:00
yohann	1f8cabc890	mtl/ofi: Fix provider selection. This allows mtl_ofi_provider_include to work with layered providers as well. e.g. --mca mtl_ofi_provider_include "providerX;ofi_rxm" Signed-off-by: yohann <yohann.burette@intel.com>	2017-09-20 16:00:50 -07:00
Aravind Gopalakrishnan	2e83cf15ce	Add support for GPU buffers for PSM2 MTL PSM2 enables support for GPU buffers and CUDA managed memory and it can directly recognize GPU buffers, handle copies between HFIs and GPUs. Therefore, it is not required for OMPI to handle GPU buffers for pt2pt cases. In this patch, we allow the PSM2 MTL to specify when it does not require CUDA convertor support. This allows us to skip CUDA convertor init phases and lets PSM2 handle the memory transfers. This translates to improvements in latency. The patch enables blocking collectives and workloads with GPU contiguous, GPU non-contiguous memory. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2017-09-01 16:59:03 -07:00
Joshua Hursey	e1d079544b	mca: Dynamic components link against project lib * Resolves #3705 * Components should link against the project level library to better support `dlopen` with `RTLD_LOCAL`. * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am` with the appropriate project level library: ``` MCA components in ompi/ $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la MCA components in orte/ $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la MCA components in opal/ $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la MCA components in oshmem/ $(top_builddir)/oshmem/liboshmem.la ``` Note: The changes in this commit were automated by the `libadd_mca_comp_update.py` script in the commit that precedes it. Some components were not included in this change because they are statically built only. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-08-24 11:56:16 -04:00
Howard Pritchard	701a1d0218	mtl/psm2: add pvar support for PSM2 MQ stats Add pvars for PSM2 MQ stats to help in analyzing performance of Omnipath. Tested (modestly) using modified OSU pt2pt benchmarks. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2017-07-14 10:31:35 -06:00
Ryan Grant	0ce8590e7c	Merge pull request #3837 from tkordenbrock/topic/master/get.retry.timeout master: mtl-portals4: add timeout to rendezvous get fragments	2017-07-13 09:59:54 -06:00
Nathan Hjelm	6fb81f20e4	mtl/psm2: create mca variables to shadow PSM2 environment variables This commit enables MCA support for the following PSM2 environment variables: PSM2_DEVICES, PSM2_MEMORY, PSM2_MQ_SENDREQS_MAX, PSM2_MQ_RECVREQS_MAX, PSM2_MQ_RNDV_HFI_THRESH, PSM2_MQ_RNDV_SHM_THRESH, PSM2_RCVTHREAD, PSM2_SHAREDCONTEXTS, PSM2_SHAREDCONTEXTS_MAX, and PSM2_TRACEMASK. These variable can be set by MCA if they are not already set in the environment. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-07-13 09:48:46 -06:00
Todd Kordenbrock	5ecd905358	mtl/portals4: move opal_timer_base_get_usec() out of the fast path Rearrange the receive frag timeout logic to avoid calling opal_timer_base_get_usec() in read_msg(). Instead set it at the first retry. Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>	2017-07-09 22:12:45 -05:00
Todd Kordenbrock	37766d770d	mtl/portals4: if frag retry fails, then fail the entire receive If a frag cannot be retried because the ni_fail_type is other than PTL_NI_DROPPED, then set the return type and jump to callback_error. This sets MPI_ERROR and completes the receive. Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>	2017-07-09 22:12:31 -05:00
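
Commit e9e4d2a4bc in the list above moves the code base to an opal_asprintf() wrapper that always leaves the output pointer NULL on failure. The contract can be illustrated with a minimal sketch. This is not the actual opal_asprintf() implementation: the name `safe_asprintf` is hypothetical, and the sketch is built on standard C99 `vsnprintf` instead of wrapping the system `asprintf`, so that it compiles without platform extensions.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of Open MPI): formats into a freshly
 * malloc'ed string. On any failure *out is NULL and -1 is returned,
 * so callers may test either the return code or the pointer. */
static int safe_asprintf(char **out, const char *fmt, ...)
{
    va_list ap, ap2;

    *out = NULL;                 /* defined value on every error path */

    va_start(ap, fmt);
    va_copy(ap2, ap);            /* the argument list is consumed twice */
    int len = vsnprintf(NULL, 0, fmt, ap);   /* first pass: measure only */
    va_end(ap);

    if (len < 0) {
        va_end(ap2);
        return -1;
    }

    char *buf = malloc((size_t)len + 1);
    if (buf == NULL) {
        va_end(ap2);
        return -1;
    }

    vsnprintf(buf, (size_t)len + 1, fmt, ap2);   /* second pass: format */
    va_end(ap2);

    *out = buf;
    return len;
}
```

A caller would check the result and release the string with `free()`, for example `if (safe_asprintf(&msg, "rank-%d", rank) < 0) { /* msg is NULL here */ }`.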


538 commits