1
1

680 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
249e5e009f Fix knem support in both sm and vader 2014-11-19 11:33:02 -07:00
Nathan Hjelm
e03956e099 Update the scif and openib btls for the new btl interface
Other changes:
 - Remove the registration argument from prepare_src since it no
   longer is meant for RDMA buffers.

 - Additional cleanup and bugfixes.
2014-11-19 11:33:02 -07:00
Nathan Hjelm
ec33374339 btl: remove des_remote/des_remote_count from the mca_btl_base_descriptor_t
structure

This structure member was originally used to specify the remote segment
for an RDMA operation. Since the new btl interface no longer uses
desriptors for RDMA this member no longer has a purpose. In addition
to removing these members the local segment information has been
renamed to des_segments/des_segment_count.
2014-11-19 11:33:02 -07:00
Nathan Hjelm
2d381f800f Update the interface to provide a cleaner interface for RDMA operations.
The old BTL interface provided support for RDMA through the use of
the btl_prepare_src and btl_prepare_dst functions. These functions were
expected to prepare as much of the user buffer as possible for the RDMA
operation and return a descriptor. The descriptor contained segment
information on the prepared region. The btl user could then pass the
RDMA segment information to a remote peer. Once the peer received that
information it then packed it into a similar descriptor on the other
side that could then be passed into a single btl_put or btl_get
operation.

Changes:

 - Removed the btl_prepare_dst function. This reflects the fact that
   RDMA operations no longer depend on "prepared" descriptors.

 - Removed the btl_seg_size member. There is no need to btl's to
   subclass the mca_btl_base_segment_t class anymore.

...

Add more
2014-11-19 11:33:02 -07:00
Nathan Hjelm
cfbb9cba16 btl/vader: don't assume the address in the put/get segment is unmodified when
using knem

It is valid to modify the remote segment that will be used with the
btl put/get operations as long as the resulting address range falls in
the originally prepared segment. Vader should have been calculating the
offset of the remote address in the registered region. This commit
fixes this issue.
2014-11-12 10:12:52 -07:00
Gilles Gouaillardet
b088175705 btl/vader: fix a typo in mca_btl_vader_put_knem 2014-11-12 19:00:00 +09:00
Ralph Castain
780c93ee57 Per the PR and discussion on today's telecon, extend the process name definition as a two-field struct of uint32_t's down to the OPAL layer. This resolves issues created by prior commits that impacted both heterogeneous and SPARC support. This also simplifies the OMPI code base by removing the need for frequent memcpy's when transitioning between the OMPI/ORTE layers and OPAL.
We recognize that this means other users of OPAL will need to "wrap" the opal_process_name_t if they desire to abstract it in some fashion. This is regrettable, and we are looking at possible alternatives that might mitigate that requirement. Meantime, however, we have to put the needs of the OMPI community first, and are taking this step to restore hetero and SPARC support.
2014-11-11 17:00:42 -08:00
Gilles Gouaillardet
d2d7f39a4b btl/vader: use FRAG_ALLOC_USER when single_copy_mechanism is VADER_NONE 2014-11-10 17:02:45 +09:00
Howard Pritchard
5c08aa8552 enable ugni btl to work without disable-dlopen
There were mistakes in the Makefiles for the ugni btl and
mca/common/ugni that prevented the ugni btl from being
used unless one happened to set the --disable-dlopen option
on the config line.

This commit fixes this problem.
2014-11-09 15:19:47 -07:00
rolfv
022612c83b Missed a removal from previous commit 2014-11-07 11:08:41 -08:00
rolfv
cbb43d5ac3 Make sure initialization happens 2014-11-07 11:00:45 -08:00
George Bosilca
8da5dcc22e Don't release the provided opal_proc in the error path. 2014-11-06 08:42:23 -08:00
Gilles Gouaillardet
e269a52ac7 btl/openib: send openib modex with the PMIX_GLOBAL flag 2014-11-06 08:42:23 -08:00
Steve Wise
7316a88754 openib btl: add Soft iWARP device to the ini file
This enables IBM's software iWARP provider.  With this driver you can
run iWARP/RDMA over any ethernet NIC.  Useful for testing OMPI RDMA
logic without requiring an expensive RDMA adapter/infrastructure.

The Soft iWARP code is at: https://www.gitorious.org/softiwarp
2014-11-04 14:48:43 -08:00
Gilles Gouaillardet
76ee98c86a btl/scif: start the listening thread once only 2014-10-31 16:34:02 +09:00
Gilles Gouaillardet
b4e445afb5 btl/sm: fix a typo in the error message 2014-10-28 11:25:42 +09:00
rolfv
9134f48d4c Do not use sendi path with GPU buffer 2014-10-24 13:35:01 -07:00
Nathan Hjelm
d72fc7a05f btl/vader: more updates to the help messages 2014-10-23 08:48:54 -06:00
Gilles Gouaillardet
55a5c99ff0 btl/vader: fix typos in the help file 2014-10-23 19:28:09 +09:00
Nathan Hjelm
e1bc2de853 btl/vader: defensive programming: use an actual function for the dummy btl_get and btl_put 2014-10-22 14:57:55 -06:00
Nathan Hjelm
19fbe868b8 btl/sm: defensive programming: use an actual function for the dummy btl_get 2014-10-22 14:57:55 -06:00
Aurélien Bouteiller
f232e94c02 Merge branch 'master' of github.com:open-mpi/ompi 2014-10-22 16:56:06 -04:00
Aurélien Bouteiller
55e49470de Patch from Nathan outlined with a crash the mishandling of the case where CMA is requested but not available. 2014-10-22 16:55:18 -04:00
Nathan Hjelm
998e69a6fa btl/sm: add some protection for the use_knem = -1 case
Need to unset the dummy btl_get and remove the MCA_BTL_FLAGS_GET flag
if neither knem nor cma can be used.
2014-10-22 13:57:01 -06:00
Nathan Hjelm
d7c7bb3993 btl/sm: re-enable the use of CMA and knem
At some point we added a sanity check to the btl base to ensure that
the btl flags match the available functions (this prevents user's from
specifying get or put when no function exists). This check was
disabling get for the sm btl since at the time of the check there is
no btl_get function. The simplest fix is to set a dummy value to btl_get
that will be overwritten with the proper value on btl initialization.

Closes #239.
2014-10-22 13:30:27 -06:00
Jeff Squyres
ec4268b59c usnic: do not send zero-length modex message
If there are no usnic BTL modules, then just avoid sending any modex
message at all (other BTLs do this; it's safe to do).

The change is smaller than it looks: I added a "if 0 ==..." check at
the top to return immediately if there are no BTL modules.  Then I
removed some now-unnecessary conditionals and un-indented as
appropriate.

Fixes #248
2014-10-22 11:11:58 -07:00
Jeff Squyres
e415c8f9a8 vader: Remove stale comment 2014-10-22 10:32:33 -07:00
Jeff Squyres
c22e1ae33b configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros
These two macros set the prefix for the OPAL and ORTE libraries,
respectively.  Specifically, the OPAL library will be named
libPREFIXopen-pal.la and the ORTE library will be named
libPREFIXopen-rte.la.

These macros must be called, even if the prefix argument is empty.

The intent is that Open MPI will call these macros with an empty
prefix, but other projects (such as ORCM) will call these macros with
a non-empty prefix.  For example, ORCM libraries can be named
liborcm-open-pal.la and liborcm-open-rte.la.

This scheme is necessary to allow running Open MPI applications under
systems that use their own versions of ORTE and OPAL.  For example,
when running MPI applications under ORTE, if the ORTE and OPAL
libraries between OMPI and ORCM are not identical (which, because they
are released at different times, are likely to be different), we need
to ensure that the OMPI applications link against their ORTE and OPAL
libraries, but the ORCM executables link against their ORTE and OPAL
libraries.
2014-10-22 10:32:19 -07:00
Gilles Gouaillardet
75e8387a4e vader: vader_add_procs report the error if init_vader_endpoint fails 2014-10-22 19:11:54 +09:00
Nathan Hjelm
1a3734ae57 btl/vader: fix compilation on OS X 2014-10-21 09:27:36 -06:00
Gilles Gouaillardet
f56169cee6 btl/vader: silence warning
correctly check HAVE_SYS_PRCTL_H
2014-10-21 19:51:29 +09:00
Gilles Gouaillardet
d60f0cbd88 btl/vader: report an error when a segment cannot be attached 2014-10-21 10:42:22 +09:00
Nathan Hjelm
13643f5b6e btl/vader: improved single-copy support
This commit makes the folowing changes:

 - Add support for the knem single-copy mechanism. Initially vader will only
   support the synchronous copy mode. Asynchronous copy support may be added
   int the future.

 - Improve Linux cross memory attach (CMA) when using restrictive ptrace
   settings. This will allow Open MPI to use CMA without modifying the system
   settings to support ptrace attach (see /etc/sysctl.d/10-ptrace.conf).

 - Allow runtime selection of the single copy mechanism. The default behavior
   is to use the best available. The priority list of single-copy mehanisms is
   as follows: xpmem, cma, and knem.

 - Allow disabling support for kernel-assisted single copy.

 - Some tuning and bug fixes.
2014-10-20 11:44:52 -06:00
Aurélien Bouteiller
e3be1fb9a5 Quick pass over the sm-knem code, indent fixes 2014-10-17 10:38:35 -04:00
Jeff Squyres
43aff4d8b3 btl sm: error if knem support is requested and cannot be activated
Restore the functionality to error out (and show a helpful message) if
knem support is requested by is either not compiled in or cannot be
activated.

Thanks to Gus Correa for bringing the matter to our attention.
2014-10-16 20:01:26 -07:00
Jeff Squyres
b04a2634c6 btl sm: restore btl_sm_have_knem_support MCA param
Somehow, this MCA param was accidentally dropped after v1.6.5.  Thanks
to Gus Correa for bringing this matter to our attention.

Also moving some MCA params down from level 9 to levels 4/5.
2014-10-16 19:48:21 -07:00
George Bosilca
7541c03b4c Mark all instances where atomic operations are used but their return value is unnecessary 2014-10-15 21:47:32 -04:00
Jeff Squyres
51027a6635 usnic: fix minor typo
Change harmless-but-weird comma to semicolon.  Found during code
review.
2014-10-15 05:32:36 -07:00
Nathan Hjelm
a31cf3b740 btl/vader: missing include 2014-10-09 13:57:21 -06:00
Nathan Hjelm
9e0c07e4ce btl/ugni: improve the handling of eager get fragments when the btl runs out
of preregistered buffers

Before this change eager gets we retried on each progress loop. This commit
modifies the protocol to only retry eager gets when another eager get has
completed. This commit also cleans up some callback code that is no longer
needed.
2014-10-09 13:57:21 -06:00
Howard Pritchard
ebc368d26b remove GNI_RDMAMODE_FENCE bit in GNI_PostRdma
The GNI_RDMAMODE_FENCE bit was a left over from
async progress work that is not needed at this point
in the gni BTL.  Removing the bit also allows
for the removal of the GNI_CDM_MODE_BTE_SINGLE_CHANNEL
bit from the GNI_CdmCreate call.
2014-10-09 12:41:19 -06:00
Howard Pritchard
9947758d98 initial thread safety for ugni btl
This commit adds initial ugni thread safety support.
With this commit, sun thread tests (excepting MPI-2 RMA)
pass with various process counts and threads/process.
Also osu_latency_mt passes.
2014-10-08 10:13:22 -06:00
Ralph Castain
fd6a044b7f Cleanup some cruft resulting from the move of the btl's to opal. We had created the ability to delay modex operations, which included a need to delay retrieving hostname info for remote procs. This allowed us to not retrieve the modex info until first message unless required - the hostname is generally only required for debug and error messages.
Properly setup the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Setup the job_session_dir and other such info immediately when it becomes available during orte_init.
2014-10-03 16:02:57 -06:00
Howard Pritchard
1df933ea27 remove ompi/runtime/params.h include in ugni btl
This commit was SVN r32813.
2014-09-29 19:26:33 +00:00
Nathan Hjelm
e0eb1f2e73 btl/vader: make vader registration lookup/caching thread safe
This commit was SVN r32798.
2014-09-25 22:24:06 +00:00
George Bosilca
53e012ae97 Fix typo.
This commit was SVN r32795.
2014-09-25 17:18:27 +00:00
Nathan Hjelm
aba87f3776 btl/vader:silence warning
This commit was SVN r32788.
2014-09-24 22:10:23 +00:00
Nathan Hjelm
79881ca892 btl/vader: prevent double-destruction of endpoints and move endpoint teardown code into destructor
This commit was SVN r32779.
2014-09-23 21:51:15 +00:00
Nathan Hjelm
2d8fba0861 btl/vader: silence warning
This commit was SVN r32778.
2014-09-23 21:33:45 +00:00
Nathan Hjelm
8bd3160432 btl/vader: fix several typos in vader update
This commit was SVN r32775.
2014-09-23 20:25:36 +00:00