This commit rewrites much of the btl/self component to fix a
long-standing memory usage bug. Before this commit the prepare_src
path would always allocate a max send fragment (256 kB). As a result,
a single send caused the rank to allocate 32 useless 256 kB buffers.
This commit makes the following changes:
- Add the MCA_BTL_FLAGS_GET flag by default. No reason not to set it.
- Reduce the eager limit, max send size, buffers per allocation, and
maximum buffer count per fragment size. These changes should have
no noticeable effect on performance but should greatly reduce the
memory usage of the component.
- Implement the sendi function. This should reduce self send latency
somewhat.
- Rewrite prepare_src to never allocate an eager or max send fragment
for contiguous data.
- add_procs needs to return something in the peer array for the proc
self, not just set the reachability bit. It now stores (void *) 1.
- Various cleanups. Removed an unused file.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
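To make the memory cost and the prepare_src change concrete, here is a minimal sketch. Only the 256 kB max send fragment and the factor of 32 come from the message above; the enum names, the fragment classes, and the `choose_frag_class` helper are hypothetical and are not the component's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Figures taken from the commit message: one send made the rank
 * allocate 32 buffers of 256 kB each. */
enum {
    OLD_MAX_SEND_SIZE   = 256 * 1024, /* bytes per max send fragment */
    OLD_FRAGS_PER_ALLOC = 32          /* buffers allocated together */
};

/* Bytes allocated as a side effect of the first send before the fix. */
size_t old_bytes_per_first_send(void)
{
    return (size_t) OLD_FRAGS_PER_ALLOC * OLD_MAX_SEND_SIZE;
}

/* Hypothetical fragment classes, smallest first. */
typedef enum {
    FRAG_DESCRIPTOR_ONLY, /* no payload buffer, points at user data */
    FRAG_EAGER,
    FRAG_MAX_SEND
} frag_class_t;

/* Assumed policy: contiguous data never needs an eager or max send
 * buffer; only data that must be packed gets one. */
frag_class_t choose_frag_class(bool contiguous, size_t reserve,
                               size_t payload, size_t eager_limit)
{
    if (contiguous) {
        return FRAG_DESCRIPTOR_ONLY;
    }
    return (reserve + payload <= eager_limit) ? FRAG_EAGER : FRAG_MAX_SEND;
}
```

With these numbers the old path reserved 8 MiB per rank for a single self send, which is what the reduced limits and the contiguous-data path are meant to avoid.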
MPI_Sizeof-related code has been moved to its own files.
Remove MPI_Sizeof from Fortran interfaces when it cannot be built
(e.g. stock gcc 4.8 on CentOS 7).
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Adding ofi plugin to allow for opening a conduit to use ethernet/fabric.
modified: ../orte/mca/rml/base/rml_base_frame.c
modified: ../orte/mca/rml/base/rml_base_stubs.c
deleted: ../orte/mca/rml/ofi/.opal_ignore
modified: ../orte/mca/rml/ofi/Makefile.am
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
modified: ../orte/test/system/ofi_conduit_stress.c
Removed stale include directive
modified: ../orte/mca/rml/ofi/Makefile.am
The ofi plugin supports multiple providers and identifies them
by ofi_prov_id; renamed the previous conduit_id to ofi_prov_id
modified: ../orte/mca/rml/base/base.h
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_request.h
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
Fixed merge issues, and minor pull-request comments
modified: ../orte/mca/rml/base/base.h
modified: ../orte/mca/rml/base/rml_base_frame.c
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Removed trailing space
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Cleaned up the ofi_conduit_stress.c test
modified: ../orte/test/system/ofi_conduit_stress.c
Cleaned up printing of the provider info during initialisation
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Fixing warnings
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
minor cleanup
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
more cleanup
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Sending only the ethernet address in get_contact_info; the rest will be sent through modex
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Adding error logging on failures
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Handling the OPAL_MODEX_SEND/RECV generically for all ofi providers.
modified: ../orte/mca/rml/ofi/rml_ofi.h
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
modified: ../orte/mca/rml/ofi/rml_ofi_send.c
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Building ofi only for a limited set of users
new file: ../orte/mca/rml/ofi/.opal_ignore
new file: ../orte/mca/rml/ofi/.opal_unignore
Signed-off-by: Anandhi S Jayakumar <anandhi.s.jayakumar@intel.com>
Removing the error logging for now
modified: ../orte/mca/rml/ofi/rml_ofi_component.c
Do not call mpi comm_dup() if MPI failed to initialize. Also do not set
signal handlers in that case.
Small code styling fixes.
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
The class system can be initialized/finalized as many times as we like,
so there is no longer any need to invoke opal_class_finalize() in a destructor.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
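One way to make a registry safe to initialize and finalize repeatedly is to have finalize return it to its pristine state. The sketch below illustrates that idea only; every name in it is hypothetical and it is not the actual opal_class implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical registry state; not the real opal_class internals. */
static void  **registered   = NULL;
static size_t  n_registered = 0;

/* Record one class descriptor; returns false if memory runs out. */
bool class_register(void *cls)
{
    void **grown = realloc(registered, (n_registered + 1) * sizeof(*grown));
    if (grown == NULL) {
        return false;
    }
    registered = grown;
    registered[n_registered++] = cls;
    return true;
}

size_t class_registered_count(void)
{
    return n_registered;
}

/* Release everything and reset to the initial state, so that a later
 * registration cycle starts from scratch. Calling it twice in a row is
 * harmless because free(NULL) is a no-op. */
void class_finalize(void)
{
    free(registered);
    registered   = NULL;
    n_registered = 0;
}
```

Because finalize leaves nothing behind, no process-exit destructor is needed to clean up after the last cycle.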
libnl and libnl-3 are known to conflict with each other, so detect
and abort if these two libs are both used directly (e.g. Open MPI
uses libnl-3) or indirectly (e.g. libibverbs.so might depend on libnl).
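The check described above can be pictured as scanning the names of all linked libraries, direct and indirect, for both flavours. This is a simplified sketch, assuming the list of shared-object names has already been collected; the function names are hypothetical, and the matching relies on the usual file names `libnl.so.*` and `libnl-3.so.*`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Classify a shared-object name: 1 for libnl (v1), 3 for libnl-3,
 * 0 for anything else. */
int libnl_version_of(const char *soname)
{
    if (strstr(soname, "libnl-3.so") != NULL) {
        return 3;
    }
    if (strstr(soname, "libnl.so") != NULL) {
        return 1;
    }
    return 0;
}

/* True when both libnl and libnl-3 appear in the same list, which is
 * the situation the build should detect and abort on. */
bool libnl_conflict(const char *const *sonames, size_t count)
{
    bool saw_v1 = false;
    bool saw_v3 = false;

    for (size_t i = 0; i < count; ++i) {
        int version = libnl_version_of(sonames[i]);
        if (version == 1) {
            saw_v1 = true;
        } else if (version == 3) {
            saw_v3 = true;
        }
    }
    return saw_v1 && saw_v3;
}
```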
recvreq->req_recv.req_base.req_type should always be set before invoking
MCA_PML_OB1_RECV_REQUEST_INIT(recvreq, ...); otherwise, the previous type
might still be set, and you could end up with MCA_PML_REQUEST_IMPROBE when
MCA_PML_REQUEST_RECV is expected.
Thanks Chris Pattison for the report and test case.
Fixes open-mpi/ompi#2275
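The ordering problem can be reproduced with a recycled request object. The sketch below uses hypothetical types and functions, and assumes the init step consults the type field; it is not the ob1 code itself.

```c
#include <stdbool.h>

typedef enum { REQ_TYPE_RECV, REQ_TYPE_IMPROBE } req_type_t;

typedef struct {
    req_type_t req_type; /* survives recycling through a free list */
    bool       is_probe; /* derived from req_type during init */
} recv_request_t;

/* Stand-in for an init step that reads the request type. */
void recv_request_init(recv_request_t *req)
{
    req->is_probe = (req->req_type == REQ_TYPE_IMPROBE);
}

/* Correct order: set the type first, then initialize. */
void start_recv(recv_request_t *req)
{
    req->req_type = REQ_TYPE_RECV;
    recv_request_init(req);
}

/* Buggy order: init sees whatever type the previous user left behind. */
void start_recv_buggy(recv_request_t *req)
{
    recv_request_init(req);
    req->req_type = REQ_TYPE_RECV;
}
```

If the object was last used for an improbe, the buggy order treats a plain receive as a probe, while the correct order does not.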
Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.
* If an error is detected internal to libnbc (e.g., PML truncation error)
this patch makes sure that the request is completed and the `MPI_ERROR`
field is set appropriately.
* Make an attempt to clean up outstanding requests before returning.
- This is a "best attempt" since not all PMLs support canceling requests.
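The two points above can be sketched as a single failure path: cancel what can be cancelled, then mark the collective request complete with the error recorded. All types, names, and the error constant below are hypothetical stand-ins, not libnbc's data structures.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SUCCESS      0
#define SKETCH_ERR_TRUNCATE 1 /* hypothetical error code */
#define SKETCH_MAX_SUBS     4

typedef struct {
    bool active;     /* still outstanding */
    bool cancelable; /* not every transport can cancel a request */
} sub_request_t;

typedef struct {
    bool          complete;
    int           error; /* plays the role of the MPI_ERROR field */
    sub_request_t subs[SKETCH_MAX_SUBS];
    size_t        n_subs;
} coll_request_t;

/* Fail a collective request: best-effort cancel of outstanding
 * sub-requests, then complete the request with the error recorded.
 * Returns how many sub-requests could not be cancelled. */
size_t coll_request_fail(coll_request_t *req, int err)
{
    size_t left = 0;

    for (size_t i = 0; i < req->n_subs; ++i) {
        if (!req->subs[i].active) {
            continue;
        }
        if (req->subs[i].cancelable) {
            req->subs[i].active = false;
        } else {
            ++left; /* best attempt only: this one stays outstanding */
        }
    }
    req->error    = err;
    req->complete = true;
    return left;
}
```

Completing the request even when some sub-requests remain is what lets a caller waiting on it observe the error instead of blocking.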