160 Commits

Author SHA1 Message Date
Jeff Squyres
cb7cc171f9 usnic: update README.txt notes
Update notes about copying the usnic BTL between master and the v1.8
branch.
2015-02-03 15:54:36 -08:00
Jeff Squyres
edf7232e00 usnic: enable building with an external libfabric 2015-02-03 13:46:06 -08:00
Jeff Squyres
bfa54d5d7b usnic: update to match new libfabric 2015-02-03 13:46:06 -08:00
Jeff Squyres
436223959d usnic: update to match new libfabric APIs 2015-01-24 05:49:36 -08:00
Jeff Squyres
65a279019e usnic: fix typo in memchecker usage 2015-01-16 09:42:19 -08:00
Jeff Squyres
d13c14ec82 CSCus22527: fix off-by-one error in checking the number of VFs
Ensure to count *this* process when checking for how many VFs we need
on the local server.

(cherry picked from commit 386c01934e98cb8dcb48ff648ecdfb0c8677baa9)
2015-01-15 11:44:29 -08:00
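The off-by-one fixed in d13c14ec82 can be illustrated with a minimal sketch. The function names and the `vfs_per_proc` parameter below are hypothetical and are not taken from the usnic BTL; the only point shown is that a count of *other* local processes must be incremented by one for the calling process.

```c
#include <stdbool.h>

/* Hypothetical helper: number of VFs required on this server.
 * num_local_peers counts the OTHER processes on the same server,
 * so the calling process has to be added explicitly. */
int vfs_needed(int num_local_peers, int vfs_per_proc)
{
    return (num_local_peers + 1) * vfs_per_proc;
}

/* True when the server has enough VFs for every local process. */
bool have_enough_vfs(int vfs_available, int num_local_peers, int vfs_per_proc)
{
    return vfs_available >= vfs_needed(num_local_peers, vfs_per_proc);
}
```

With three local peers and one VF per process, four VFs are needed; forgetting the `+ 1` would wrongly accept a server with only three.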
Jeff Squyres
e4e5e7dbc0 usnic: ensure to clean up nicely in case of low resources
If there are not enough resources (e.g., low VFs), we can end up
calling finalize_one_channel() on the same channel multiple times.  So
ensure to NULL out fields that we have freed already so that we do not
try to free them a second time.

Fixes CSCus26648.
2015-01-13 14:37:31 -08:00
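The idea behind e4e5e7dbc0 can be sketched as follows. This is not the real finalize_one_channel(); the `struct channel` layout and its fields are assumptions made for illustration. It only shows why resetting each pointer to NULL after freeing it makes a repeated call harmless.

```c
#include <stdlib.h>

/* Hypothetical channel with two heap-allocated resources. */
struct channel {
    void *cq;        /* stand-in for a completion queue */
    void *recv_buf;  /* stand-in for a receive buffer pool */
};

/* Safe to call more than once on the same channel: every pointer is
 * reset to NULL right after it is freed, and free(NULL) is a no-op. */
void finalize_one_channel(struct channel *ch)
{
    free(ch->cq);
    ch->cq = NULL;

    free(ch->recv_buf);
    ch->recv_buf = NULL;
}
```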
Jeff Squyres
d00cede718 usnic: fix if_include/exclude of CIDR-specified networks
Fix the ordering so that we obtain the usnic netmask information
*before* we do the filtering based on CIDR-specified networks.

Also requires upstream Github libfabric commit 3976745.

Fixes CSCus22495.
2015-01-13 12:04:51 -08:00
Jeff Squyres
a220b92cf8 usnic: fix function name in opal_output 2015-01-13 12:04:07 -08:00
Jeff Squyres
5ed688a074 usnic: ensure that we only get "usnic"-named providers
Also, a minor update to a verbose message.
2015-01-12 13:21:22 -08:00
Jeff Squyres
881b1dcf19 usnic: document libfabric abstractions
Handy tips to remember the libfabric abstractions and what they
correspond to in usnic/VIC terms.
2015-01-09 15:21:51 -08:00
Gilles Gouaillardet
194d9f84d3 btl/usnic: move call to check_reg_mem_basics()
Avoid annoying memlock-related messages when there is no usnic device.
2015-01-09 11:37:45 +09:00
Dave Goodell
49069bc661 usnic: fix fi_av_insert (ARP resolution) bugs
We had several problems in the old code:

1. We were specifying an arbitrary timeout (100 ms) and then abandoning
   all remaining pending AV insert operations.  We would then free the
   endpoint buffer that we gave to fi_av_insert(), usually causing
   libfabric's progress thread to write to a freed buffer.

2. We were claiming in a show_help message that the timeout was
   controllable via an MCA parameter.  This commit removes that
   parameter, since there's no good method for us to specify a timeout
   like this to libfabric right now.

3. We also weren't waiting for the correct number of fi_av_insert()
   operations to complete.  We were waiting for nprocs, which is
   accidentally fine for 2 procs on separate hosts, but not for most
   other proc counts.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
2015-01-07 08:25:17 -08:00
Jeff Squyres
c621d1e622 libfabric: don't LIBADD the common library in the static case
Adding the libfabric common library in the --disable-dlopen case will
result in duplicate symbols.
2014-12-18 11:04:08 -08:00
Jeff Squyres
d6f059f538 configury: add some descriptive output messages in configure
Ensure that the ofi MTL and the usnic BTL have good descriptive output
messages in configure.
2014-12-17 13:36:01 -08:00
Jeff Squyres
95da4a5a0e usnic: no longer use opal_using_threads()
Instead, use the flag that is passed in.
2014-12-16 08:49:01 -08:00
Jeff Squyres
cd0a54d76f usnic: short term fix to enable builds on non-libfabric platforms
This isn't quite the Right fix yet, because it doesn't address usnic
for external libfabric builds.  I'll fix that separately / later.
2014-12-09 09:19:26 -08:00
Jeff Squyres
6e24a1eb85 usnic: update for libfabric API change
Use FI_ADDR_UNSPEC for posting a receive from an unspecified source.
2014-12-09 06:06:52 -08:00
Jeff Squyres
9547345b18 usnic: fix show_help message
Rename a few symbols to use libfabric-friendly names.  Fix a show_help
message when fi_av_insert times out.
2014-12-08 11:39:07 -08:00
Jeff Squyres
8e49cc754f usnic: update to latest libfabric API changes 2014-12-08 11:37:37 -08:00
Jeff Squyres
984982790a usnic: convert from verbs to libfabric (yay!)
This commit represents the conversion of the usnic BTL from verbs to
libfabric.

For the moment, libfabric is embedded in Open MPI (currently in the
usnic BTL).  This is because the libfabric API is still changing, and
also has not yet been released.  Ultimately, this embedded copy of
libfabric will likely disappear and the usnic BTL will rely on an
external installation of libfabric.

New configure options:

* --with-libfabric: will cause configure to fail if libfabric support
    cannot be built
* --without-libfabric: will prevent libfabric support from being built
* --with-libfabric=DIR: use an external libfabric installation
* --with-libfabric-libdir=LIBDIR: when paired with --with-libfabric=DIR,
    use LIBDIR for the libfabric installation library dir

The --with-libnl3[-libdir] arguments are now gone.
2014-12-08 11:37:37 -08:00
Nathan Hjelm
1b564f62bd Revert "Merge pull request #275 from hjelmn/btlmod"
This reverts commit ccaecf0fd6c862877e6a1e2643f95fa956c87769, reversing
changes made to 6a19bf85dde5306f559f09952cf3919d97f52502.
2014-11-19 23:22:43 -07:00
Nathan Hjelm
ec33374339 btl: remove des_remote/des_remote_count from the mca_btl_base_descriptor_t
structure

This structure member was originally used to specify the remote segment
for an RDMA operation. Since the new btl interface no longer uses
descriptors for RDMA, this member no longer has a purpose. In addition
to removing these members, the local segment information has been
renamed to des_segments/des_segment_count.
2014-11-19 11:33:02 -07:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
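A two-field process name of the kind described in 780c93ee57 might look like the sketch below. The type name `proc_name_t` and the field names `jobid` and `vpid` are assumptions for illustration, not the definitions from the OPAL headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical process name: two fixed-width fields, no opaque blob. */
typedef struct {
    uint32_t jobid;  /* which job the process belongs to */
    uint32_t vpid;   /* rank of the process within that job */
} proc_name_t;

/* Field-by-field comparison; no byte-wise compare of an opaque type. */
bool proc_name_equal(proc_name_t a, proc_name_t b)
{
    return a.jobid == b.jobid && a.vpid == b.vpid;
}
```

Because every layer shares the same plain struct, a name can be passed between layers by ordinary assignment (`proc_name_t b = a;`) rather than by memcpy into a differently typed buffer.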
Jeff Squyres
ec4268b59c usnic: do not send zero-length modex message
If there are no usnic BTL modules, then just avoid sending any modex
message at all (other BTLs do this; it's safe to do).

The change is smaller than it looks: I added an "if 0 == ..." check at
the top to return immediately if there are no BTL modules.  Then I
removed some now-unnecessary conditionals and un-indented as
appropriate.

Fixes #248
2014-10-22 11:11:58 -07:00
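The early return described in ec4268b59c follows a common pattern, sketched here under assumptions: `build_modex`, `struct modex_msg`, and `bytes_per_module` are hypothetical names, and the real code publishes through the modex API rather than filling in a struct.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical modex payload descriptor. */
struct modex_msg {
    size_t len;  /* payload size in bytes */
};

/* Returns true and fills in msg if a modex message should be sent;
 * returns false, leaving msg untouched, when there is nothing to send. */
bool build_modex(size_t num_modules, size_t bytes_per_module,
                 struct modex_msg *msg)
{
    if (0 == num_modules) {
        return false;  /* no modules: skip the modex entirely */
    }

    /* From here on, at least one module exists, so no further
     * "is there anything to send?" conditionals are needed. */
    msg->len = num_modules * bytes_per_module;
    return true;
}
```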
Jeff Squyres
c22e1ae33b configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros
These two macros set the prefix for the OPAL and ORTE libraries,
respectively.  Specifically, the OPAL library will be named
libPREFIXopen-pal.la and the ORTE library will be named
libPREFIXopen-rte.la.

These macros must be called, even if the prefix argument is empty.

The intent is that Open MPI will call these macros with an empty
prefix, but other projects (such as ORCM) will call these macros with
a non-empty prefix.  For example, ORCM libraries can be named
liborcm-open-pal.la and liborcm-open-rte.la.

This scheme is necessary to allow running Open MPI applications under
systems that use their own versions of ORTE and OPAL.  For example,
when running MPI applications under ORTE, if the ORTE and OPAL
libraries between OMPI and ORCM are not identical (which, because they
are released at different times, are likely to be different), we need
to ensure that the OMPI applications link against their ORTE and OPAL
libraries, but the ORCM executables link against their ORTE and OPAL
libraries.
2014-10-22 10:32:19 -07:00
Jeff Squyres
51027a6635 usnic: fix minor typo
Change harmless-but-weird comma to semicolon.  Found during code
review.
2014-10-15 05:32:36 -07:00
Ralph Castain
fd6a044b7f Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages.
Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.
2014-10-03 16:02:57 -06:00
Jeff Squyres
733316372b usnic: remove suggestion of enabling no-drop in the fabric
Reviewed by Reese Faucette

cmr=v1.8.3:reviewer=ompi-rm1.8

This commit was SVN r32628.
2014-08-28 23:56:56 +00:00
Jeff Squyres
b0dfb9f401 usnic: avoid a possible race condition
Per #4874, code review revealed a possible race condition in the
module struct and the connectivity agent.  Move the setup of the
connectivity agent listener until the module struct has been fully
set up.

This commit was SVN r32573.
2014-08-22 02:34:24 +00:00
Ralph Castain
aec5cd08bd Per the PMIx RFC:
WHAT:    Merge the PMIx branch into the devel repo, creating a new
               OPAL “pmix” framework to abstract PMI support for all RTEs.
               Replace the ORTE daemon-level collectives with a new PMIx
               server and update the ORTE grpcomm framework to support
               server-to-server collectives

WHY:      We’ve had problems dealing with variations in PMI implementations,
               and need to extend the existing PMI definitions to meet exascale
               requirements.

WHEN:   Mon, Aug 25

WHERE:  https://github.com/rhc54/ompi-svn-mirror.git

Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.

All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.

Accordingly, we have:

* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.

* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.

* Replaced the current global collective id with a signature based on the names of the participating procs. This allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint.

* removed the prior OMPI/OPAL modex code

* added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform.

* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand

This commit was SVN r32570.
2014-08-21 18:56:47 +00:00
Jeff Squyres
ac7c907f8d usnic: ensure to have a safe destruction of an opal_list_item_t
It turns out that we *can* get to the endpoint destructor with the
endpoint still on the "endpoints needing ACKs" list.  So if it's on
the list, remove it first, and then DESTRUCT the opal_list_item_t.

This prevents an assert() fail in debug builds.  We'd like to let this
soak over the weekend.

cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r32546.
2014-08-15 21:52:36 +00:00
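The ordering fixed in ac7c907f8d (unlink first, then destruct) can be sketched with a small intrusive list. Everything below is a hypothetical stand-in: it does not use opal_list_t or the real endpoint type, and it marks "not on a list" with NULL link pointers purely for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical intrusive list item; NULL links mean "not on a list". */
struct list_item {
    struct list_item *prev;
    struct list_item *next;
};

/* Circular list with a sentinel head. */
struct list {
    struct list_item head;
};

void list_init(struct list *l)
{
    l->head.prev = &l->head;
    l->head.next = &l->head;
}

void list_append(struct list *l, struct list_item *it)
{
    it->prev = l->head.prev;
    it->next = &l->head;
    l->head.prev->next = it;
    l->head.prev = it;
}

bool item_on_list(const struct list_item *it)
{
    return NULL != it->next;
}

void list_remove(struct list_item *it)
{
    it->prev->next = it->next;
    it->next->prev = it->prev;
    it->prev = NULL;
    it->next = NULL;
}

/* Hypothetical endpoint that may be queued on an
 * "endpoints needing ACKs" list. */
struct endpoint {
    struct list_item ack_li;
};

void endpoint_destruct(struct endpoint *ep)
{
    /* Unlink first, so the list never holds a pointer to a dead item. */
    if (item_on_list(&ep->ack_li)) {
        list_remove(&ep->ack_li);
    }
    /* ...release the endpoint's other resources here... */
}
```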
Jeff Squyres
1cdcb7290b usnic: no need to check before calling this function
This function is intentionally always safe to call -- no need for a
double redundant check.

This commit was SVN r32545.
2014-08-15 21:39:29 +00:00
Jeff Squyres
082ab15d19 usnic: increase the listen() backlog size
Rarely -- but it happens -- the connectivity client gets ECONNREFUSED
because the connectivity agent listen() backlog is too small.  Rather
than put in a loop on the client side, take the simple way out for
now: increase the backlog size to an arbitrarily-large number.

Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32543.
2014-08-15 19:12:18 +00:00
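For reference, the backlog mentioned in 082ab15d19 is the second argument to the POSIX listen() call. The sketch below is not the connectivity agent's code; `open_listener` is a hypothetical helper, and the loopback address and ephemeral port are chosen only to keep the example self-contained.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns a listening TCP socket bound to an ephemeral loopback port,
 * or -1 on failure.  'backlog' is passed straight to listen(). */
int open_listener(int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  /* let the kernel pick a free port */

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Note that the operating system may silently cap the requested backlog (on Linux, at the `net.core.somaxconn` limit), so a very large value is a request rather than a guarantee.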
Jeff Squyres
9373d6420e usnic: when a module is finalized, "unlisten" the connectivity checker
Instead of waiting to destroy the connectivity agent during component
shutdown, have the module shutdown send an "unlisten" command to the
cagent that will tell it to stop listening on a given interface.

This commit was SVN r32536.
2014-08-15 00:52:43 +00:00
Jeff Squyres
6b592d3016 usnic: convert some BTL_ERRORs to more descriptive show_help messages
1. After we receive N abnormally-short messages (meaning: corrupted),
print a show_help message about it.  N defaults to 25.  N can be set
to 0 to disable the message via btl_usnic_max_short_packets.
2. If we receive a completion error for something other than a
receive, display a show_help message.

Reviewed by Dave Goodell.

CMR'ing to v1.8.3, but it will require a custom patch because of the
OMPI->OPAL BTL move.

cmr=v1.8.3

This commit was SVN r32522.
2014-08-13 15:01:20 +00:00
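The threshold behavior described in 6b592d3016 can be sketched as a small counter. The struct and function names are hypothetical, and warning only once after the threshold is reached is an assumption of this sketch; the commit message does not say whether the message repeats.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-module bookkeeping for abnormally short packets. */
struct short_pkt_stats {
    uint32_t seen;    /* short packets received so far */
    uint32_t max;     /* warning threshold; 0 disables the warning */
    bool     warned;  /* set once the warning has been issued */
};

/* Records one short packet.  Returns true exactly once, on the packet
 * that reaches the threshold, so the caller can print its message. */
bool note_short_packet(struct short_pkt_stats *s)
{
    s->seen++;
    if (0 == s->max || s->warned || s->seen < s->max) {
        return false;
    }
    s->warned = true;
    return true;
}
```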
Jeff Squyres
65767aff68 usnic: remove errant OMPI header file
This commit was SVN r32469.
2014-08-08 20:34:50 +00:00
Jeff Squyres
323b9f346c usnic: update connectivity checker help message
Show an example of using the btl_usnic_connectivity_map option.  Also,
mention that another reason for the "total connectivity failure" may
be due to asymmetric / unexpected routing.

Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32465.
2014-08-08 17:18:29 +00:00
Jeff Squyres
6bf28a6940 usnic: update help messages
These messages were committed in the v1.8 branch in r32341, but were
never committed to the trunk (because we were waiting for the OPAL BTL
move).  This commit brings the trunk and v1.8 help messages in line
with each other.

This commit was SVN r32445.

The following SVN revision numbers were found above:
  r32341 --> open-mpi/ompi@5e752b4aba
2014-08-07 20:50:29 +00:00
Jeff Squyres
70f5a10128 usnic: fix typo from r32438
This commit was SVN r32440.

The following SVN revision numbers were found above:
  r32438 --> open-mpi/ompi@d2e31ac647
2014-08-06 19:29:46 +00:00
Jeff Squyres
d2e31ac647 usnic: Fix connectivity checker pointer mismatch
Ensure that the connectivity checker agent only uses pointers from the
client that is the same process as the agent.

Not necessary for the v1.8 branch -- this is a trunk/v1.9-only problem.

This commit was SVN r32438.
2014-08-05 23:07:01 +00:00
Jeff Squyres
34897cee9f usnic: unify teardown between trunk and v1.8 branches
Make the del_procs, module finalize, and endpoint destructors be the
same between trunk and v1.8, with one exception: the very beginning of
v1.8 module_finalize calls del_procs for each proc to simulate/pretend
the trunk/v1.9 PML behavior of calling del_procs before module_finalize.

This commit was SVN r32437.
2014-08-05 22:31:55 +00:00
Jeff Squyres
1a8d72119f usnic: Fix configure.ac typo
This commit was SVN r32436.
2014-08-05 22:31:07 +00:00
Dave Goodell
13b104bdef usnic: fix endpoint destruction on the trunk
Fixes an assertion failure in --enable-debug builds and SEGVs in normal
builds.

I'm not 100% sure I like this model, but it at least seems to be
consistent.  Some variation on this scheme will need to be adapted to
the trunk, where usnic_del_procs() is called by the PML instead of
internally in usnic_finalize().

A related bug (but with different mechanics) is #4832.

This commit was SVN r32424.
2014-08-04 21:30:21 +00:00
Dave Goodell
490c484f8c usnic: fix uninitialized param to accept(2)
This commit was SVN r32423.
2014-08-04 21:30:08 +00:00
Dave Goodell
61a9b49d5b usnic: fix usnic breakage in ORCM repo
This commit was SVN r32416.
2014-08-04 19:34:55 +00:00
Jeff Squyres
ff4717b727 usnic: cagent now checks that incoming pings are expected
Previously, the connectivity agent was pretty dumb: it took whatever
pings it got and ACKed them.  Then we added an agent check to ensure
that the ping actually came from the source interface that it said it
came from.  Now we add another check such that when a ping is received
on interface X that corresponds to usnic module Y, we ensure that the
source interface of the ping is on the all_endpoints list for module Y
(i.e., module Y expects to be able to talk to that peer interface).

This detects cases where peers have come to different conclusions
about which interfaces should be used to communicate (which is bad!).
This usually reflects a network misconfiguration.

Fixes CSCuq05389.

This commit was SVN r32383.
2014-07-31 22:30:20 +00:00
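The membership check added in ff4717b727 amounts to a lookup of the ping's source in the receiving module's list of expected peers. The sketch below assumes a flat array of IPv4 addresses; `struct module`, `endpoint_addrs`, and `ping_is_expected` are hypothetical names, and the real all_endpoints list is not an array of integers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical module: the IPv4 address of every peer interface
 * that this module expects to talk to. */
struct module {
    const uint32_t *endpoint_addrs;
    size_t          num_endpoints;
};

/* True if a ping arriving on this module's interface came from a
 * peer interface that the module expects to communicate with. */
bool ping_is_expected(const struct module *m, uint32_t src_addr)
{
    for (size_t i = 0; i < m->num_endpoints; ++i) {
        if (m->endpoint_addrs[i] == src_addr) {
            return true;
        }
    }
    return false;
}
```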
Ralph Castain
db89071dc2 Clean up the moved component's Makefile.am to use the opal instead of ompi directories
This commit was SVN r32370.
2014-07-31 04:41:04 +00:00
Jeff Squyres
959bdace3c usnic: check that connectivity pings came from where they said they came from
Ensure that incoming "ping" messages came from the IP address that
they think they came from.  If they don't, drop them (because it is
probably a routing error), which will likely eventually cause the
connectivity checker to time out, and therefore cause the job to abort.

This commit was SVN r32368.
2014-07-30 21:03:56 +00:00
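The source check in 959bdace3c compares the address a ping claims to come from with the address the socket layer actually reported. A minimal sketch, assuming a hypothetical `struct ping_msg` that carries the claimed IPv4 address in network byte order (the real message layout is not shown in this log):

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ping payload: the sender states which IPv4 address
 * it used, in network byte order. */
struct ping_msg {
    uint32_t claimed_src_addr;
};

/* 'from' is the peer address reported by the socket layer, for example
 * the address filled in by recvfrom().  A mismatch means the packet
 * took an unexpected route and should be dropped. */
bool ping_source_matches(const struct ping_msg *msg,
                         const struct sockaddr_in *from)
{
    return from->sin_addr.s_addr == msg->claimed_src_addr;
}
```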
Jeff Squyres
20349da03b usnic: minor cleanup
This commit was SVN r32367.
2014-07-30 20:56:49 +00:00