openmpi

Автор	SHA1	Сообщение	Дата
Howard Pritchard	d6d73b7724	mtl/ofi: replace OMPI_UNLIKELY with OPAL version one off patch for v4.0.x. for some reason commit on master didn't have this problem. Signed-off-by: Howard Pritchard <howardp@lanl.gov> (cherry picked from commit 5f3dbdb5c8a94a4f426ecca1a3a91c83035f956c) Note that this commit is actually a cherry-pick from the v4.0.x branch. This is the opposite direction than what we nornmally do: we usually commit to master first and then cherry-pick to the release branches (vs. the other way around). As is probably evident from the original commit message above, through a comedy of errors, this commit was actually applied to the v4.0.x branch first and then cherry-picked back to master (i.e., the problem did exist in the original master commit 3aca4af548a3d781b6b52f89f4d6c7e66d379609, but it was not recongized at the time). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-10-01 09:52:27 -07:00
Michael Heinz	3aca4af548	REF6976 Silent failure of OMPI over OFI with large messages sizes INTERNAL: STL-59403 The OFI (libfabric) MTL does not respect the maximum message size parameter that OFI provides in the fi_info data. This patch adds this missing max_msg_size field to the mca_ofi_module_t structure and adds a length check to the low-level send routines. Change-Id: I05aa71d332f2df897133b30c28bf37d98f061996 Signed-off-by: Michael Heinz <michael.william.heinz@intel.com> Reviewed-by: Adam Goldman <adam.goldman@intel.com> Reviewed-by: Brendan Cunningham <brendan.cunningham@intel.com>	2019-09-23 15:23:48 -04:00
Jeff Squyres	ac54d771ec	mtl/ofi: add a .gitignore Ignore generated files. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-05-01 14:00:00 -07:00
Matias A Cabral	25bdd118ac	MTL_OFI: Changed Recv cancel to be non-blocking Updated the OFI MTL's Recv cancel to be a non-blocking call to match the MPI spec. Given fi_cancel succeeded, then it is expected that the user will wait on the request to read the result of if the cancel has completed. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com	2019-02-14 17:07:20 -05:00
Aravind Gopalakrishnan	6edcc479c4	mtl/ofi: Fix segfault when not using Thread-Grouping feature For the non thread-grouping paths, only the first (0th) OFI context should be used for communication. Otherwise this would access a non existant array item and cause segfault. While at it, clarifiy some content regarding SEPs in README (Credit to Matias Cabral for README edits). Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2019-02-07 11:52:53 -08:00
Jeff Squyres	f5e1a672cc	ofi: revamp OPAL_CHECK_OFI configury Update the OPAL_CHECK_OFI configury macro: - Make it safe to call the macro multiple times: - The checks only execute the first time it is invoked - Subsequent invocations, it just emits a friendly "checking..." message so that configure output is sensible/logical - With the goal of ultimately removing opal/mca/common/ofi, rename the output variables from OPAL_CHECK_OFI to be opal_ofi_{happy\|CPPFLAGS\|LDFLAGS\|LIBS}. - Update btl/ofi, btl/usnic, and mtl/ofi for these new conventions. - Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that causes the macro to be invoked at a fairly random time, which makes configure stdout confusing / hard to grok. - Remove a little left-over kruft in OPAL_CHECK_OFI, too (which resulted in an indenting change, making the change to opal_check_ofi.m4 look larger than it really is). Thanks Alastair McKinstry for the report and initial fix. Thanks Rashika Kheria for the reminder. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 06:29:58 -08:00
Jeff Squyres	aba2571881	mtl/ofi/Makefile.am: down with tabs! Replace all tabs with spaces. No code or logic changes. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 06:29:58 -08:00
Gilles Gouaillardet	945f830f7a	mtl/ofi: fix configury when VPATH is used Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-07 06:29:58 -08:00
Aravind Gopalakrishnan	9cabcfdbba	mtl/ofi: Fix reference to help text object When we exceed the threshold number of contexts created, print appropriate help text Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2019-01-29 15:10:06 -08:00
Brian Barrett	23da9fac23	Merge pull request #6294 from bwbarrett/mtl-ofi-no-device-warning mtl/ofi: Print descriptive error message on modex failure	2019-01-29 08:32:49 -08:00
Brian Barrett	44be7f139a	mtl/ofi: Provide av count hint during initialization Provide the av_attr.count hint (number of addresses that will be inserted into the address vector through the life of the process) at initialization of the address vector. It's ok to be a bit wrong, but some endpoints (RxR) can benefit by not going through the slow growth realloc churn. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2019-01-24 15:47:24 -08:00
Brian Barrett	fe25097194	mtl/ofi: Print descriptive error message on modex failure With MTLs, there's no "other transport" when the remote side does not have an active NIC, so we should print a useful error message when the modex failed (indicating lack of a NIC on the remote side). Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2019-01-21 23:50:31 +00:00
Aravind Gopalakrishnan	37f9aff2a0	mtl/ofi: Add MCA variables to enable SEP and to request number of OFI contexts Moving to a model where we have users actively _enable_ SEP feature for use rather than opening SEP by default if provider supports it. This allows us to not regress (either functionally or for performance reasons) any apps that were working correctly on regular endpoints. Also, providing MCA to specify number of OFI contexts to create and default this value to 1 (Given btl/ofi also creates one by default, this reduces the incidence of a scenario where we allocate all available contexts by default and if btl/ofi asks for one more, then provider breaks as it doesn't support it). While at it, spruce up README on SEP content. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2019-01-14 09:58:36 -08:00
Spruit, Neil R	bef5f50a42	MTL_OFI: Generation of specialized functions at build time -> Added new targets in Makefile.am to call a new build script generate-opt-funcs.pl to generate specialized functions for each .pm file. -> Added new perl module .pm files for send,isend,irecv,iprobe,improbe which are loaded by generate-opt-funcs.pl to create new source files that correspond to the name of the .pm file to be used as part of MTL OFI. -> Added mtl_ofi_opt.pm.template and updated README with details on the specialization features and how to add additional specialization support. -> Added new opt_common/mtl_ofi_opt_common.pm containing common functions for generating the specialized functions used by all other *.pm modules. -> Added new mtl_ofi.h which includes the definitions for the function symbol table for storing the specialized functions along with the definitions for the initialization functions for the corresponding function pointers. -> Based off the OFI provider capabilities the specialized function pointers are assigned at mtl_ofi_component_init to the corresponding MTL OFI function. -> mca_mtl_ofi_module_t has been updated with the symbol table struct which is assigned at component init. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-12-13 00:35:19 -08:00
Aravind Gopalakrishnan	e5e19dfcf7	Fix for SEP when num local procs is greater than available contexts For cases when the number of local processes is greater than the number of available contexts, the SEP initialization phase would calculate the number of contexts to provision for each rank to be 0 and would eventually crash. Fix the issue here by using regular endpoints in the event the number of local processes is more than available contexts. This fixes issue #6182. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-12-12 16:49:04 -08:00
Brian Barrett	6e15128d96	mtl/ofi: Fix crash if no providers found Commit 109d0569ffd introduced a crash when an error occurred before ofi_ctxt was allocated, including when no providers passed the selection logic. Properly check that the pointer is not NULL in the error cleanup code before dereferencing the pointer. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-12-11 15:46:18 -08:00
Aravind Gopalakrishnan	109d0569ff	MTL/OFI: Add OFI Scalable Endpoint support OFI MTL supports OFI Scalable Endpoints feature as means to improve multi-threaded application throughput and message rate. Currently the feature is designed to utilize multiple TX/RX contexts exposed by the OFI provider in conjunction with a multi-communicator MPI application model. For more information, refer to README under mtl/ofi. Reviewed-by: Matias Cabral <matias.a.cabral@intel.com> Reviewed-by: Neil Spruit <neil.r.spruit@intel.com> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-12-03 09:56:52 -08:00
matcabral	6a15712df5	MTL/OFI: revert PR 6082 Revert to avoid issues with dynamic processes. Signed-off-by: matcabral <matias.a.cabral@intel.com>	2018-11-30 13:44:39 -08:00
matcabral	5f58453e63	MTL/OFI: Lower priority when all procs are local So far Vader is faster than OFI MTL for doing shared memory. Therefore, let it run by default when all procs are local. Reviewed-by: Spruit, Neil R <neil.r.spruit@intel.com> Reviewed-by: Gopalakrishnan, Aravind <aravind.gopalakrishnan@intel.com> Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>	2018-11-14 11:01:33 -08:00
Aravind Gopalakrishnan	5cf43de445	MTL/OFI: Check threshold number of peers allowed per rank When the provider does not support FI_REMOTE_CQ_DATA, the OFI tag does not have sizeof(int) bits for the rank. Therefore, unexpected behavior will occur when this limit is crossed. Check the max allowed number of ranks during add_procs() and return if there is danger of exceeding this threshold. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-11-01 14:03:00 -07:00
Brian Barrett	e9e4d2a4bc	Handle asprintf errors with opal_asprintf wrapper The Open MPI code base assumed that asprintf always behaved like the FreeBSD variant, where ptr is set to NULL on error. However, the C standard (and Linux) only guarantee that the return code will be -1 on error and leave ptr undefined. Rather than fix all the usage in the code, we use opal_asprintf() wrapper instead, which guarantees the BSD-like behavior of ptr always being set to NULL. In addition to being correct, this will fix many, many warnings in the Open MPI code base. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-10-08 16:43:53 -07:00
Brian Barrett	c5eaa38491	mtl ofi: Change from opt-in to opt-out provider selection Change default provider selection logic for the OFI MTL. The old logic was whitelist-only, so any new HPC NIC provider would have to ask users to do extra work or wait for an OMPI release to be whitelisted. The reason for the logic was to avoid selecting a "generic" provider like sockets or shm that would frequently have worse performance than the optimized BTL options Open MPI supports. With the change, we blacklist the (small, relatively static) list of providers that duplicate internal capabilities. Users can use one of thse blacklisted providers in two ways: first, they can explicitly request the provider in the include list (which will override the default exclude list) and second, the can set a new empty exclude list. Since most HPC networks require special libraries and therefore an explicit build of libfabric, it is highly unlikely that this change will cause users to use libfabric when they didn't want to do so. It does, however, solve the whitelisting problem. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-09-27 11:02:18 -07:00
Aravind Gopalakrishnan	5cbcae79d8	MTL OFI: Ask for FI_THREAD_DOMAIN support when not using MPI_THREAD_MULTIPLE When an application is not using multiple threads to call into MPI, we can safely ask for FI_THREAD_DOMAIN setting from the provider as it should translate to the least amount of locking in provider. Conversely, for applications using THREAD_MULTIPLE, explicitly ask for FI_THREAD_SAFE to prevent race conditions. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-08-23 14:18:32 -07:00
Aravind Gopalakrishnan	ed2343034d	MTL OFI: Fix race condition due to global progress entries array Since progress entries array is globally allocated, it is susceptible to race conditions when using multi-threaded applications. Allocating it on the stack resolves any potential races as it is thread local by default. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-08-09 10:52:28 -07:00
Ralph Castain	1aef0a64aa	Merge pull request #5477 from nrspruit/ns_mtl_send_isend MTL OFI: send/isend split into blocking/non-blocking paths	2018-07-31 13:08:37 -07:00
Spruit, Neil R	7dc8c8ba3f	MTL OFI: send/isend split into blocking/non-blocking paths -Updated blocking send to directly call functionality and set completion events expected to 0 initally. This allows for optimization for providers that support fi_tinject up to larger sizes. This also reduces latency on running the OFI mtl with smaller sizes without requiring calls to progress given fi_tinject is required to complete the messaging before returning and will not create any events in the Completion Queue. -Updated non-blocking send to directly call fi_tsend and avoid calling fi_tinject as the functionality should not wait on completions. This resolves a bug where applications calling MPI_Isend can overrun the TX buffer with small (inject) messages causing a deadlock. In addition this improves performance in message rates by preventing waiting on any size message to complete in non-blocking send messages. -Created common ompi_mtl_ofi_ssend_recv function to post the ssend recv which is common between isend and send code paths. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-07-24 07:54:24 -07:00
Spruit, Neil R	767135c580	MTL OFI: Fix Deadlock in fi_cancel given completion during cancel - If a message for a recv that is being cancelled gets completed after the call to fi_cancel, then the OFI mtl will enter a deadlock state waiting for ofi_req->super.ompi_req->req_status._cancelled which will never happen since the recv was successfully finished. - To resolve this issue, the OFI mtl now checks ofi_req->req_started to see if the request has been started within the loop waiting for the event to be cancelled. If the request is being completed, then the loop is broken and fi_cancel exits setting ofi_req->super.ompi_req->req_status._cancelled = false; Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-07-24 03:12:44 -07:00
Matias Cabral	d996f529c0	MTL OFI: Add support for mem_tag_format OFI providers may reserve some of the upper bits of the tag for internal usage and expose it using mem_tag_format. Check for that and adjust communicator bits as needed. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-07-23 11:39:40 -07:00
Spruit, Neil R	d4f408a7f8	MTL OFI: MTL_OFI_RETRY_UNTIL_DONE support for Resource overflow - Added support in MTL_OFI_RETRY_UNTIL_DONE to handle -FI_EAGAIN from the provider and correctly attempt to progress the OFI Completion queue by calling ompi_mtl_ofi_progress. - If events were pending that blocked OFI operations from being enqueued they will be completed and the OFI operation will be retried once ompi_mtl_ofi_progress has successfully completed. - Updated MTL_OFI_RETRY_UNTIL_DONE to take a RETURN variable instead of requiring the existance of a "ret" variable to pass back the return value from completing the OFI operation. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-07-17 03:00:38 -07:00
Spruit, Neil R	9a17864278	MTL OFI: Redesign sync send with reduced tag bits and quick ack -Updated the design for sync send MPI calls to use 2 protocol bits for denoting "sync_send" or "sync_send_ack". -"Sync_send" is added to the send tag only and is masked out in receives such that it can be read by the original Recv posted in the send/recv operation. -"Sync_send_ack" is sent from the recv callback to the send side. This 0 byte send does not generate a completion entry and instead sends the message and immediately completes the opal completion in the recv. -Tag formats ofi_tag_1 and ofi_tag_2 have been updated to include 2 more tag bits per format type due to the reduced protocal bits required by OMPI. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-07-09 06:50:21 -07:00
Matias A Cabral	e6674556aa	MTL OFI: add support for FI_REMOTE_CQ_DATA. Extend number of supported ranks with providers that support FI_REMOTE_CQ_DATA. Add README file to OFI MTL Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>	2018-06-14 17:17:38 -07:00
Spruit, Neil R	e7bff501cd	MTL OFI: Added support for reading multiple CQ events in ofi progress -Updated ompi_mtl_ofi_progress to use an array to read CQ events up to a threshold that can be set by the Open MPI User. -Users can adjust the number of events that can be handled in the ompi_mtl_ofi_progress by setting "--mca mtl_ofi_progress_event_cnt #". -The default value for the the number of CQ events that can be read in a single call to ofi progress is 100 which is an average based off workload usecase anaylsis showing 70-128 as the range of multiple events returned during ofi progress. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-02-15 09:41:14 -05:00
Aravind Gopalakrishnan	fb68726baf	MTL OFI: Allow retries in MTL progress for interrupted syscalls This fixes a regression in sockets provider which could return -EINTR value from fi_cq_read() due to a syscall being interrupted. The error value is currently interpreted as fatal condition. Relax the rule so that we can retry fi_cq_read() operation. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2017-12-20 14:58:49 -08:00
Howard Pritchard	cd48eccbae	mtl/ofi: fix problem with mprobe/mrecv At least with some providers (sockets and GNI), the mprobe/mrecv ofi mtl methods were incorrect. For these two providers at least one must supply the original tag and mask bits used with the prior FI_PEEK \| FI_CLAIM request that had been used to probe for the message. These providers take a strict interpretation of the following sentence from the libfabric fi_tagged man page: ``` Claimed messages can only be retrieved using a subsequent, paired receive operation with the FI_CLAIM flag set. ``` Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2017-11-24 08:11:18 -07:00
Matias Cabral	d1869a725a	Merge pull request #4467 from matcabral/master mtl/ofi: Set data and control progress options default values to FI_PROGRESS_UNSPEC	2017-11-13 07:35:39 -08:00
Jeff Squyres	a8686a6813	mtl ofi: squelch compiler warnings gcc 5.2 complains: ``` mtl_ofi_component.c: In function ‘ompi_mtl_ofi_finalize’: mtl_ofi_component.c:613:5: warning: suggest parentheses around assignment used as truth value [-Wparentheses] if (ret = fi_close((fid_t)ompi_mtl_ofi.fabric)) { ^ ``` Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-11-11 05:07:11 -08:00
Jeff Squyres	5a6ddf42d6	mtl ofi: it is not an error to return no data from fi_getinfo() Before this commit, the presence of usNIC devices -- which will (currently) return no data when fi_getinfo() is queried for tagged matching providers -- would cause an error message to be displayed. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-11-11 05:07:11 -08:00
Jeff Squyres	f910f554f7	mtl ofi: show the positive value of the error The value of ret is negative (e.g., -61), but it is displayed in the help message as `%zd`, which renders as unsigned (i.e., a giant positive value). So make sure to negate the negative value before rendering it (e.g., so we display "61", not "4294967235"). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-11-11 05:07:11 -08:00
Jeff Squyres	e8c13ef286	mtl ofi: fix trivial comment whitespace No code or logic changes. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-11-11 05:07:10 -08:00
Jeff Squyres	bed1930df8	mtl ofi: fix formatting of help message No code or logic changes. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-11-11 05:07:05 -08:00
Matias Cabral	b76bb42ac1	mtl/ofi: Set data and control progress options default values to FI_PROGRESS_UNSPEC so each provider will use its default. Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>	2017-11-08 08:24:33 -08:00
Aravind Gopalakrishnan	285fc42b4e	Fix OFI MTL to recognize correct CQ empty scenario Currently, the progress function is incorrectly interpreting any error value other than a positive value or -FI_EAVAIL to mean CQ is empty. CQ is empty only if fi_cq_read() call returned -EAGAIN error code. Fix that here. While at it, fix help text output for calls made to OFI API. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2017-10-30 12:13:44 -07:00
yohann	1f8cabc890	mtl/ofi: Fix provider selection. This allows mtl_ofi_provider_include to work with layered providers as well. e.g. --mca mtl_ofi_provider_include "providerX;ofi_rxm" Signed-off-by: yohann <yohann.burette@intel.com>	2017-09-20 16:00:50 -07:00
Joshua Hursey	e1d079544b	mca: Dynamic components link against project lib * Resolves #3705 * Components should link against the project level library to better support `dlopen` with `RTLD_LOCAL`. * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am` with the appropriate project level library: ``` MCA components in ompi/ $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la MCA components in orte/ $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la MCA components in opal/ $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la MCA components in oshmem/ $(top_builddir)/oshmem/liboshmem.la" ``` Note: The changes in this commit were automated by the script in the commit that proceeds it with the `libadd_mca_comp_update.py` script. Some components were not included in this change because they are statically built only. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-08-24 11:56:16 -04:00
Howard Pritchard	841192645b	common/libfabric: move libfabric to ofi This PR renames the common library for OFI libfabric from libfabric to ofi. There are a number of reasons this is good to do: 1) its shorter and replaces 9 characters with three for function names for what may eventually be a fairly extensive interface 2) OFI is the term used for MTL and RML components that use the OFI libfabric interface 3) A planned OSC component will also use the OFI term. 4) Other HPC libraries that can use OFI libfabric tend to use the term "ofi" internally and also in their configure options relevant to OFI libfabric (i.e. MPICH/CH4, Intel MPI, Sandia SHMEM) There seem to be comments in places in the Open MPI source code that indicate that this common library will be going away. Far from it as we will want to be able to share things like AV objects between OMPI and possibly OSHMEM components that use the OFI libfabric interface. This PR also adds a synonym to the --with-libfabric(-libdir) configury options: --with-ofi and with-ofi-libdir. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2017-04-20 13:07:16 -06:00
Ralph Castain	1e2019ce2a	Revert "Update to sync with OMPI master and cleanup to build" This reverts commit cb55c88a8b7817d5891ff06a447ea190b0e77479.	2016-11-22 15:03:20 -08:00
Ralph Castain	cb55c88a8b	Update to sync with OMPI master and cleanup to build Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-22 14:24:54 -08:00
Howard Pritchard	61d62b6821	mtl/ofi: fix a botched assignment of av_type Well now the av_type is being assigned correctly Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-08-19 17:01:02 -05:00
Howard Pritchard	e46eee3fcb	mtl/ofi: use mca param to set av type Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-08-10 16:10:17 -06:00
Howard Pritchard	22c8743557	mtl/ofi: add some more mca parameters allow for toggling of both control/data progress models. allow for using FI_AV_TABLE or FI_AV_MAP for av type. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-07-28 02:35:09 -06:00

1 2 3

127 Коммитов