1
1
Граф коммитов

4659 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
042ed95e4e Remove an annoying warning. If the user excludes a non-existent interface, there is no reason to warn - the interface may simply not exist on that node.
cmr=v1.7.4:reviewer=jsquyres:subject=Remove an annoying warning

This commit was SVN r30042.
2013-12-21 01:51:11 +00:00
Adrian Reber
53a70fe87f Trying to get the C/R code to compile again. (send_*_nb)
This patch changes all send/send_buffer occurrences in the C/R code
to send_nb/send_buffer_nb.
The new code compiles but does not work.

Changes from V1:
* #ifdef out the code (so it is preserved for later re-design)
* marked the broken C/R code with ENABLE_FT_FIXED

Changes from V2:
* just replace the blocking calls with the non-blocking calls
* all #ifdef's introduced in V1 are gone
* send_* returns error code or ORTE_SUCCESS (not the number of bytes)

This commit was SVN r30036.
2013-12-20 21:58:28 +00:00
Adrian Reber
a3813d37c7 Trying to get the C/R code to compile again. (recv_*_nb)
This patch changes all recv/recv_buffer occurrences in the C/R code
to recv_nb/recv_buffer_nb.
The old code is still there but disabled using ifdefs (ENABLE_FT_FIXED).
The new code compiles but does not work.

Changes from V1:
* #ifdef out the code (so it is preserved for later re-design)
* marked the broken C/R code with ENABLE_FT_FIXED

Changes from V2:
* only #ifdef out the code where the behaviour is changed
  (used to be blocking; now non-blocking)

This commit was SVN r30035.
2013-12-20 21:05:40 +00:00
Rolf vandeVaart
695d854cd8 Fix return value.
This commit was SVN r30034.
2013-12-20 20:57:04 +00:00
Ralph Castain
31248c0985 Correctly add support for the "env" MPI_Info key during comm_spawn, update the "map-by", "rank-by", and "bind-to" Info key behaviors to match the new mapping/ranking/binding system, and update all docs and comments to match.
Fix comm_spawn on a single host - with the new default mapping scheme, we were incorrectly computing the number of procs to put on the node.

Refs trac:4003

This commit was SVN r30033.

The following Trac tickets were found above:
  Ticket 4003 --> https://svn.open-mpi.org/trac/ompi/ticket/4003
2013-12-20 20:42:39 +00:00
Rolf vandeVaart
4cd1958deb Fix so we do not get warnings when running on system without CUDA software installed and CUDA-aware compiled in.
This commit was SVN r30032.
2013-12-20 20:39:25 +00:00
Dave Goodell
bd901a68ed usnic: fix 'fls' warnings+errors
The old version caused compilation errors on Solaris.  Thanks to Paul
Hargrove for testing and reporting the bug:

  http://www.open-mpi.org/community/lists/devel/2013/12/13520.php

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30025.
2013-12-20 17:37:22 +00:00
Ralph Castain
6959ba5577 Add missing include file.
Thanks to Paul Hargrove for spotting it.

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29998.
2013-12-19 23:39:21 +00:00
Dave Goodell
0c6b292442 romio: pick "infinitely stale" fix from upstream
Some NFS scenarios can result in an infinite ESTALE return, which will
hang ROMIO.  This commit causes ROMIO to error out after a large number
of retries instead of spinning forever.

This is MPICH commit b250d338:

http://git.mpich.org/mpich.git/commit/b250d338e66667a8a1071a5f73a4151fd59f83b2

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r29993.
2013-12-19 22:55:26 +00:00
Ralph Castain
b745078535 Support user-provided envars for comm_spawn using info key "env"
Thanks to Tom Fogal for the request

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29990.
2013-12-19 20:59:50 +00:00
Jeff Squyres
3a14adef63 Remove the comments around these assignments; otherwise, we won't get
function pointers set to the _map functions, and we get segv's in MTT
testing (e.g., the C++ suite, which actually calls MPI_Cart_map and
MPI_Graph_map).

cmr=v1.7.4:reviewer=bosilca:subject=Fix topo _map function pointer assignments

This commit was SVN r29988.
2013-12-19 20:41:32 +00:00
Yossi Etigin
6ab4aba9e6 Fix missing include of show_help.h in mtl mxm.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29987.
2013-12-19 19:37:21 +00:00
Tom Naughton
3aefca32b0 + update rte db_fetch comments with change from r29931
This commit was SVN r29971.

The following SVN revision numbers were found above:
  r29931 --> open-mpi/ompi@0995a6f3b9
2013-12-19 01:16:58 +00:00
Jeff Squyres
bb59b07321 Remove CFLAGS setting that was really only intended for the v1.6
branch (it's not necessary on trunk/v1.7 because they require C99,
which allows variadic macros).

Also fix another compiler warning (using %p to print a (void*)).

Submitted by Jeff, reviewed by Dave.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=two usnic BTL fixes

This commit was SVN r29966.
2013-12-19 00:19:05 +00:00
Nathan Hjelm
b9765a380f Update NEWS with new MPI-3 features and a note about the new ROMIO
version.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r29965.
2013-12-19 00:16:07 +00:00
Jeff Squyres
515fd00411 CSCul95082: DMAR faults during mtt testing
usnic_channel_finalize() was deregistering recv buffers before
destroying the QP to which they were posted. The QP needs to be
destroyed first so that the NIC does not attemp tto write to
deregistered memory, causing the DMAR messages.

Submitted by Reese, reviewed by Jeff.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29963.
2013-12-19 00:01:35 +00:00
Yossi Etigin
ecfb122c97 Fix segfault in osc pt2pt completion handler, when the request is canceled during finalization.
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29938.
2013-12-17 17:30:14 +00:00
Ralph Castain
0995a6f3b9 Revert r29917 and replace it with a fix that resolves the thread deadlock while retaining the desired debug info. In an earlier commit, we had changed the modex accordingly:
* automatically retrieve the hostname (and all RTE info) for all procs during MPI_Init if nprocs < cutoff

* if nprocs > cutoff, retrieve the hostname (and all RTE info) for a proc upon the first call to modex_recv for that proc. This would provide the hostname for debugging purposes as we only report errors on messages, and so we must have called modex_recv to get the endpoint info

* BTLs are not to call modex_recv until they need the endpoint info for first message - i.e., not during add_procs so we don't call it for every process in the job, but only those with whom we communicate

My understanding is that only some BTLs have been modified to meet that third requirement, but those include the Cray ones where jobs are big enough that launch times were becoming an issue. Other BTLs would hopefully be modified as time went on and interest in using them at scale arose. Meantime, those BTLs would call modex_recv on every proc, and we would therefore be no worse than the prior behavior.

This commit revises the MPI-RTE interface to pass the ompi_proc_t instead of the ompi_process_name_t for the proc so that the hostname can be easily inserted. I have advised the ORNL folks of the change.

cmr=v1.7.4:reviewer=jsquyres:subject=Fix thread deadlock

This commit was SVN r29931.

The following SVN revision numbers were found above:
  r29917 --> open-mpi/ompi@1a972e2c9d
2013-12-17 03:26:00 +00:00
Adrian Reber
b42aad44a3 Trying to get the C/R code to compile again. This patch
includes various fixes all over the C/R code which are
hard to group like the other patches.

Changes from V1:
* explain why mca_base_component_distill_checkpoint_ready no longer works
* compare return result of opal functions with OPAL_* values

Changes from V2:
* use orte_rml_oob_ft_event() instead of referencing through the modules
* properly protect variable (thanks to --enable-picky)

This commit was SVN r29922.
2013-12-16 15:35:28 +00:00
George Bosilca
1a972e2c9d Don't be greedy, just do what we asked for.
This commit was SVN r29917.
2013-12-15 16:54:01 +00:00
George Bosilca
430a13719f Only if OMPI_BTL_SM_HAVE_CMA is set to 1.
This commit was SVN r29916.
2013-12-15 16:49:27 +00:00
Jeff Squyres
0ab48ad0d2 Fix some annoying flex warnings that have been there for years.
Many thanks to Tom Fogal for the initial patch.

cmr=v1.7.4:reviewer=rhc:subject=Fix annoying flex warnings

This commit was SVN r29904.
2013-12-14 00:36:12 +00:00
Rolf vandeVaart
b955dbd6d9 Fix various items discovered by review of ticket #3951.
This commit was SVN r29900.
2013-12-13 21:25:07 +00:00
Jeff Squyres
f4afa4fd1f Add missing include, exposed in "external libevent" work.
Refs trac:3694

This commit was SVN r29898.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-13 21:21:30 +00:00
Jeff Squyres
bcfe2156d5 Bring over m4 quoting fix from v1.7 branch (in r29894) that was
discovered when removing some components.

This commit was SVN r29895.

The following SVN revision numbers were found above:
  r29894 --> open-mpi/ompi@58ed00296c
2013-12-13 20:27:33 +00:00
Ralph Castain
f763be26c4 Closes trac:2433. Check for hetero architecture and disqualify sm connections if that is found as the sm btl currently doesn't support hetero operations.
cmr=v1.7.4:reviewer=brbarret:subject=Disqualify sm btl for hetero procs

This commit was SVN r29882.

The following Trac tickets were found above:
  Ticket 2433 --> https://svn.open-mpi.org/trac/ompi/ticket/2433
2013-12-13 15:23:33 +00:00
Mike Dubman
fb3f94a16e remove debug print
Refs trac:3969

This commit was SVN r29876.

The following Trac tickets were found above:
  Ticket 3969 --> https://svn.open-mpi.org/trac/ompi/ticket/3969
2013-12-13 06:08:44 +00:00
Mike Dubman
21be95c9b5 Initialize sm global variables in mca_btl_sm_component_open(), because they are destructed in mca_btl_sm_component_close(), and init() function might not be called or fail.
For exammple, mca_btl_sm.knem_fd remained 0, and mca_btl_sm_component_close() ended up doing closing fd 0 which belongs to someone else.

fixed by Yossi, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29875.
2013-12-13 06:01:24 +00:00
Jeff Squyres
bac67e0d81 Per discussion @Chicago OMPI dev meeting Dec 2013: remove all MX support.
This commit was SVN r29873.
2013-12-12 18:54:47 +00:00
Nathan Hjelm
3262080391 Cleanup udcm structures to avoid issues with nesting structures with
flexible members.

UDCM is ready to go for 1.7.4 with this patch.

cmr=v1.7.4:ticket=3940

This commit was SVN r29861.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-12 05:24:37 +00:00
Nathan Hjelm
e0e94a6029 Fix warning caused by typo in r29815
This commit was SVN r29860.

The following SVN revision numbers were found above:
  r29815 --> open-mpi/ompi@d556b60b21
2013-12-11 21:45:39 +00:00
Nathan Hjelm
6ab69c758b Fix warnings in udcm.
cmr=v1.7.4:reviewer=rhc:ticket=3940

This commit was SVN r29859.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-11 21:40:06 +00:00
Rolf vandeVaart
3ae88f8a24 Ensure no fork support with GDR. CUDA-aware code only.
This commit was SVN r29854.
2013-12-10 18:08:53 +00:00
Rolf vandeVaart
1cc55f305f Add extra check for GDR. Adjust some names and replace opal_output with opal_show_help.
This commit was SVN r29853.
2013-12-10 16:04:08 +00:00
Jeff Squyres
0f61bb651e Technically, the PORT_ACTIVE is not a "bad" event.
Note that this event should never happen within a single OMPI job,
because OMPI will ignore usnic ports that are down.  The PORT_ACTIVE
event should only occur if a port ''was'' down and is now ''up''.  But
what the heck -- if we ever do get this event, it is harmless -- just
ignore it.

This commit was SVN r29852.
2013-12-09 20:45:55 +00:00
Edgar Gabriel
c253c2eec6 fix the condition for the lazy open of shared filepointers.
This commit was SVN r29850.
2013-12-09 19:37:21 +00:00
Mike Dubman
9a65e0d8c6 cosmetic fixed fpr hcol autotools
Refs: #3694

This commit was SVN r29841.
2013-12-08 09:45:13 +00:00
Mike Dubman
2e124454b4 cosmitic fix to remove redundant -lfca
use CPP extra flags var which propagated to coll/fca and scoll/fca
Refs: #3694

This commit was SVN r29832.
2013-12-07 15:00:54 +00:00
Jeff Squyres
3bd9c603ff Clean up variables used in configure with OPAL_VAR_SCOPE.
This is helpful in the work for #3694: ensure that many places that
eventually end up in configure don't overly-pollute the global shell
variable space (because debugging accidental shell variable pollution
can be a real pain).

Refs trac:3694

This commit was SVN r29830.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-06 23:40:34 +00:00
Rolf vandeVaart
d556b60b21 Chnage some CUDA configure code and macro names per review request by jsquyres in ticket #3880.
Functionally, nothing changes.

This commit was SVN r29815.
2013-12-06 14:35:10 +00:00
Nathan Hjelm
231ebb09c9 Update romio configury to remove a warning message.
cmr=v1.7.4:ticket=3158

This commit was SVN r29811.

The following Trac tickets were found above:
  Ticket 3158 --> https://svn.open-mpi.org/trac/ompi/ticket/3158
2013-12-06 00:12:35 +00:00
Dave Goodell
da26226e3c usnic: add some extra debug-build sanity checks
On the off chance that the PML is twiddling fields that it really
shouldn't be...

Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29804.
2013-12-05 00:28:11 +00:00
Jeff Squyres
ba018b3603 Protect the container_of #define.
MOFED apparently has a /usr/include/infiniband/verbs.h that also
defines a (slightly different but fully compatible) container_of
macro.  So put proper #ifndef protection around our definition of
container_of.

Thanks to Rolf vandeVaart for pointing out the issue.

Reviewed by Dave Goodell.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29799.
2013-12-04 14:24:56 +00:00
Yossi Etigin
a913b00f89 mtl mxm: update configuration parsing api to mxm 2.1, drop
older version support (1.0 and 1.1), and cleanup the code.

reviewed by miked.

cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29797.
2013-12-04 09:11:55 +00:00
Jeff Squyres
c74c1e86d3 Per suggestion from Paul Kapinos, report in BTL verbosity if a device
is skipped because it is too far away.

(see thread starting here:
http://www.open-mpi.org/community/lists/devel/2013/06/12470.php)

This commit was SVN r29790.
2013-12-03 22:44:11 +00:00
Rolf vandeVaart
218c05a4d1 Make sure synchronous copies are complete before moving the data.
This commit was SVN r29789.
2013-12-03 21:20:14 +00:00
Rolf vandeVaart
ab77435d9b Fix the CUDA-aware case where we are not sending any GPU data.
This commit was SVN r29788.
2013-12-03 20:25:58 +00:00
Devendar Bureddy
4554770ee4 hcol fixes
cmr=v1.7.4:reviewer=jladd

This commit was SVN r29787.
2013-12-03 20:21:40 +00:00
Nathan Hjelm
fe327d9859 udcm: cleanup code and improve the ack handling
Originally udcm acks used the immediate data to indicate which message was
being acknowleged. This data was (mysteriously) junk when using QLogic HCAs so I
updated udcm to use the source info (slid, qp, etc) to determine which message was being
acked. This works as long as we don't have two messages simultaneously in flight
to a particular peer and then loose the first of the two messages. The chances of this
happening are tiny. To fix this case I updated the udcm message header to include
a pointer to the in flight message. This pointer is then sent back to the sending
process to ack receipt.

cmr=v1.7.4:ticket=trac:3940

This commit was SVN r29775.

The following Trac tickets were found above:
  Ticket 3940 --> https://svn.open-mpi.org/trac/ompi/ticket/3940
2013-12-02 20:18:46 +00:00
Jeff Squyres
3a7af4ab40 Fix another clang warning: sendreq is undefined if proc==NULL.
cmr=v1.7.4:reviewer=hjelmn:subject=fix ob1 undefined sendreq value

This commit was SVN r29774.
2013-12-02 19:44:42 +00:00
Nathan Hjelm
fb0b0442c4 openib/connect: re-enable xrc support in the openib btl
This commit updates the udcm cpc to support xrc. The steps followed by udcm
mimic those in the removed xoob cpc. This update has been tested with both XRC
and RC.

Mellanox, this is intended to go into 1.7.4. Please review carefully and let
me know if there are any issues.

cmr=v1.7.4:reviewer=miked

This commit was SVN r29767.
2013-11-27 22:28:04 +00:00
George Bosilca
cb24277737 Restrict the usage of MPI_Type_extent only to receiving processes
(aka the root). This commit is based on a patch provided by Pierre 
Jolivet.
Fix all the output to match the failing MPI call.

This commit was SVN r29761.
2013-11-27 12:09:31 +00:00
Rolf vandeVaart
aa98b0333b Call function from function table. Discovered during static build.
This commit was SVN r29755.
2013-11-25 22:46:07 +00:00
Ralph Castain
ac9820c46f Link against common cuda library
Thanks to Jorg Bornschein for pointing it out

cmr=v1.7.4:reviewer=rolfv

This commit was SVN r29750.
2013-11-24 17:06:51 +00:00
George Bosilca
68268377af Fix an error message for the igather and the usage of the extent on
non non-root processes for the iscatter. Thanks to Pierre
Jolivet for the bug report and the patch.

This commit was SVN r29736.
2013-11-23 00:59:22 +00:00
Edgar Gabriel
4f425872be fix the streams used in opal_output in the sharedfp components.
This commit was SVN r29726.
2013-11-21 16:11:49 +00:00
Devendar Bureddy
4a311ae9fd continue search sorted openib device list if no btls found with nearest HCA.
cmr=v1.7.4:reviewer=jladd

This commit was SVN r29725.
2013-11-20 22:23:12 +00:00
Nathan Hjelm
24a7e7aa34 Add support for the udreg registration cache and dynamics on XE/XK/XC.
To support the new mpool two changes were made to the mpool infrastructure:

 1) Added an mpool flag to indicate that an mpool does not need the memory
    hooks to use the leave pinned protocols. This flag is checked in the
    mpool lookup.

 2) Add a mpool context to the base registration. This new member is used
    by the udreg mpool to store the udreg context associated with the
    particular registration. The new member will not break the ABI
    compatibility as the new member is only currently used by the udreg
    mpool.

Dynamics support for Cray systems makes use of the global rank provided by
orte to give the ugni library a unique rank for each process. Dynamics
support is not available under direct-launch (srun.)

cmr=v1.7.4

This commit was SVN r29719.
2013-11-18 04:58:37 +00:00
Jeff Squyres
5206e877be Help decrease conflicts between SVN trunk and Cisco git branch of OMPI v1.6 branch
This commit was SVN r29715.
2013-11-15 21:35:56 +00:00
Jeff Squyres
e6ed7c9f4d Avoid trivial "don't mix declarations and code" compiler warning
This commit was SVN r29714.
2013-11-15 21:31:10 +00:00
Rolf vandeVaart
92e6aaa808 Adjust a default value. Adjust some levels of verbosity and one more debug message.
This commit was SVN r29712.
2013-11-14 21:47:27 +00:00
Ralph Castain
7480beb7f0 Per request from Nathan, add an offset value to the job struct so we can construct a "global rank" that spans multiple jobs during dynamic launch operations. Store a new ORTE_DB_GLOBAL_RANK value for each process in the database, and ensure that we share our own value during connect_accept so both sides can see it.
This isn't being used yet - just enabling Nathan to do what he needs.

***** NOTE: any use of the OMPI_DB_GLOBAL_RANK database key must be protected by #ifdef OMPI_DB_GLOBAL_RANK as not all RTE's will define this key. *****

This commit was SVN r29708.
2013-11-14 17:01:43 +00:00
Ralph Castain
22e30a680d Given that the oob and xoob cpc's are no longer operable and haven't been since the OOB update, remove them to avoid confusion
cmr:v1.7.4:reviewer=hjelmn:subject=Remove stale cpcs from openib

This commit was SVN r29703.
2013-11-14 04:16:53 +00:00
Nathan Hjelm
6b3cf0c1ba Merge branch 'romio_refresh'
This commit was SVN r29695.
2013-11-13 21:02:55 +00:00
Rolf vandeVaart
4964a5e98b Per this RFC from October 8, 2013 and as discuessed in telecon.
http://www.open-mpi.org/community/lists/devel/2013/10/13072.php

Add support for pinning GPU Direct RDMA in openib BTL for better small message latency of GPU buffers. 
Note that none of this is compiled in unless CUDA-aware support is requested.

This commit was SVN r29680.
2013-11-13 13:22:39 +00:00
Jeff Squyres
98ff91cfeb Refs trac:3091
Gah!  The "device" variable isn't used at all in this loop (my eye
glossed over the next line and thought that "device" was used in the
free() statement, but it's actually "devices" -- not "device").

This commit was SVN r29665.

The following Trac tickets were found above:
  Ticket 3091 --> https://svn.open-mpi.org/trac/ompi/ticket/3091
2013-11-12 23:01:04 +00:00
Jeff Squyres
7cb31111a6 Refs trac:3901
Feedback from Dave's review.

This commit was SVN r29664.

The following Trac tickets were found above:
  Ticket 3901 --> https://svn.open-mpi.org/trac/ompi/ticket/3901
2013-11-12 22:51:20 +00:00
Ralph Castain
762400d559 Silence warning
Refs trac:3898

This commit was SVN r29659.

The following Trac tickets were found above:
  Ticket 3898 --> https://svn.open-mpi.org/trac/ompi/ticket/3898
2013-11-11 22:53:09 +00:00
Jeff Squyres
5a940f5ee7 Arrgh -- remove debugging printf.
This commit was SVN r29657.
2013-11-11 22:44:28 +00:00
Jeff Squyres
e20217eccc Expand the "btl_usnic" MPI_T enumeration to have strings of the form:
<usnic device name>,<eth device>,<ip address>/<CIDR prefix>

For example:

   usnic_0,eth4,10.1.0.15/16

This is just handy for mapping the usnic_X device back to the IP
network to which it corresponds.

This commit was SVN r29656.
2013-11-11 22:25:30 +00:00
Nathan Hjelm
6a331275d8 Set transfers as active before starting them.
cmr=v1.7.4:ticket=trac:3898

This commit was SVN r29654.

The following Trac tickets were found above:
  Ticket 3898 --> https://svn.open-mpi.org/trac/ompi/ticket/3898
2013-11-11 21:50:54 +00:00
Nathan Hjelm
3d3c29ae96 btl/scif: do not return resource busy if we started a connection attempt.
Resolves a hang when using scif for shared memory transfers. This is a
simple change and doesn't require a review.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29653.
2013-11-11 19:36:34 +00:00
Nathan Hjelm
b5ce72cc15 Set the modex as active before starting it. This resolves a hang in
MPI_Init() on comm-spawned processes.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r29652.
2013-11-11 19:33:32 +00:00
Rolf vandeVaart
a6df7bc33a Fix issues reported in ticket #3877. Also added additional comments.
This commit was SVN r29641.
2013-11-07 20:44:47 +00:00
Rolf vandeVaart
2cf7c40ee5 Minor adjustments to error messages due to review of #3880.
This commit was SVN r29640.
2013-11-07 20:21:21 +00:00
Rolf vandeVaart
3290cde630 Various minor changes to bring smcuda up to date with sm.
This commit was SVN r29639.
2013-11-07 19:45:56 +00:00
Dave Goodell
82db913490 usnic: fix module_recv_buffers perf regression
Cisco v1.6 git commit 913ec6c and upstream trunk r29593 (segfault fix)
introduced a performance regression by inadvertently disabling the
`module_recv_buffers` functionality.  With those changes in place, the
`btl_usnic_recv.c` logic would end up mallocing a buffer that should
have otherwise come from a `module_recv_buffers` pool.  It also resulted
in a small, bounded memory leak (128 buffers at each power-of-two size
interval).

The new version just places the buffer after the free list item with a
flexible array member.  I bumped the pool to allocate all 128 elements
up front because the deferred allocation was modestly impacting IMB
Sendrecv performance at a few sizes.

Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29631.

The following SVN revision numbers were found above:
  r29593 --> open-mpi/ompi@1ed9b8ff43
2013-11-07 01:27:31 +00:00
Vishwanath Venkatesan
d37a5faa20 Need not do aggregator selection for one process case
So adding a check for this corner case!

This commit was SVN r29622.
2013-11-06 21:05:26 +00:00
Brian Barrett
cf8de1ef0f Minor indent cleanup in init_query()
Only use Portals on communicators with more than one rank
Fix computation of number of children when using the hypercube tree

This commit was SVN r29616.
2013-11-06 15:21:09 +00:00
Jeff Squyres
e28261898d Per discussion on the devel list, rename the btl_usnic_devices MPI_T
state pvar to be btl_usnic (i.e., the best suggestion so far).

See http://www.open-mpi.org/community/lists/devel/2013/11/13188.php
for more detail.

This commit was SVN r29614.
2013-11-06 06:19:03 +00:00
Rolf vandeVaart
e46c0bb952 Fix one more space for consistent defines.
This commit was SVN r29607.
2013-11-05 15:31:49 +00:00
Rolf vandeVaart
64b3a24fec Fix CUDA-aware compile issues.
This commit was SVN r29606.
2013-11-05 14:46:58 +00:00
Rolf vandeVaart
e57795f097 Revert r29594. That was just plain wrong. Sorry about workday configure change.
This commit was SVN r29605.

The following SVN revision numbers were found above:
  r29594 --> open-mpi/ompi@ed7ddcd9c7
2013-11-05 14:45:56 +00:00
Rolf vandeVaart
ed7ddcd9c7 Fix CUDA-aware compile error introduces with r29581.
This commit was SVN r29594.

The following SVN revision numbers were found above:
  r29581 --> open-mpi/ompi@ee7510b025
2013-11-05 00:08:33 +00:00
Dave Goodell
1ed9b8ff43 usnic: fix segfault at finalize time
Without this commit, if you run IMB pingpong between two nodes with only
one usnic selected (e.g., via `--mca btl_usnic_if_include usnic_0`) then
the run will seem fine but will segfault at MPI_Finalize time.

This behavior has happened since Cisco v1.6 git commit ec7ddf8, upstream
trunk r29484, and upstream v1.7 r29507.

Root cause was that the free list element was being used as the recv
buffer instead of the data buffer associated with the element.  So the
reassembly code would stomp all over the free list element, which would
cause the destructor to explode when the free list attempted to clean up
all of its elements.  This surprisingly did not cause any other problems
until now.

Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29593.

The following SVN revision numbers were found above:
  r29484 --> open-mpi/ompi@a6ed232a10
  r29507 --> open-mpi/ompi@790d269ce8
2013-11-04 22:52:14 +00:00
Dave Goodell
73a943492c usnic: pack via convertor on the fly
If we need to use a convertor, go back to stashing that convertor in the
frag and populating segments "on the fly" (in
ompi_btl_usnic_module_progress_sends).  Previously we would pack into a
chain of chunk segments at prepare_src time, unnecessarily consuming
additional memory.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29592.
2013-11-04 22:52:03 +00:00
Dave Goodell
71d0d73575 usnic: refactor callback invocation
This makes it a little easier to see what's happening with callbacks to
the PML.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29591.
2013-11-04 22:51:48 +00:00
Dave Goodell
4c791e21d2 usnic: add MSGDEBUG1_OUT/MSGDEBUG2_OUT macros
This includes suppressing picky-mode warnings about __VA_ARGS__, which
we know are supported by any compilers we care about.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29590.
2013-11-04 22:51:35 +00:00
Dave Goodell
825686a205 usnic: certain send frag members are immutable
Ensure that they never are touched by checking in their destructors.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29589.
2013-11-04 22:51:24 +00:00
Nathan Hjelm
c71125acfd Using MPI_* functions in iallreduce can cause comm-spawned processes to
crash. Update libnbc's iallreduce function to use ompi_* functions
instead.

cmr=v1.7.4:reviewer=brbarret

This commit was SVN r29582.
2013-11-01 16:45:54 +00:00
Rolf vandeVaart
ee7510b025 Remove redundant macro. This was from reviewed of earlier ticket.
Fixes trac:3878.  Reviewed by jsquyres.

This commit was SVN r29581.

The following Trac tickets were found above:
  Ticket 3878 --> https://svn.open-mpi.org/trac/ompi/ticket/3878
2013-11-01 12:19:40 +00:00
Rolf vandeVaart
99f9fdee01 Fix corner case involving threads and CUDA-aware support.
This commit was SVN r29579.
2013-10-31 20:53:46 +00:00
Nathan Hjelm
fd25b7af01 Fix common ugni Makefile.am for non-DSO builds.
This commit was SVN r29571.
2013-10-30 19:37:14 +00:00
Nathan Hjelm
a31e617d17 Remove outdated comments in coll_basic_reduce_scatter.c.
Refs trac:1559

This commit was SVN r29566.

The following Trac tickets were found above:
  Ticket 1559 --> https://svn.open-mpi.org/trac/ompi/ticket/1559
2013-10-30 16:20:20 +00:00
Mike Dubman
b0e64427a9 ompi/mca/btl/openib: Fix memory leak and accessing free'd memory issues
Let imagine that we have two btls in btl_openib_component_init() both points to the same openib_btl->device and as a result have the same openib_btl->device->endpoints array.

Finalization phase calls twice mca_btl_openib_finalize()->mca_btl_openib_finalize_resources().
mca_btl_openib_finalize_resources() frees endpoint related btl. But the second call of mca_btl_openib_finalize_resources() checks endpoint that is released by previus call.

fixed by Igor, reviewed by miked/vasily
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29563.
2013-10-30 11:47:49 +00:00
Nathan Hjelm
167d5613db Do not do arithmetic with void * in basic neighborhood alltoall[vw].
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29558.
2013-10-29 20:02:13 +00:00
Jeff Squyres
6569019b06 Move all usNIC stats to _stats.c|h and export them as MPI_T pvars.
This commit moves all the module stats into their own struct so that
the stats only need to appear as a single line in the module_t
definition, and then moves all the logic for reporting the stats into
btl_usnic_stats.c|h.

Further, the stats are now exported as MPI_T_BIND_NO_OBJECT entities
(i.e., not bound to any particular MPI handle), and are marked as
READONLY and CONTINUOUS.  They currently all default to verbose level
5 ("Application tuner / detailed", according to
https://svn.open-mpi.org/trac/ompi/wiki/MCAParamLevels).

Most of the statistics are counters, but a small number are high
watermark values.  Due to how counters are reported via MPI_T, none of
the counters are exported through MPI_T if the MCA param
btl_usnic_stats_relative=1 (i.e., the module resets the stats back to
zero at a given frequency).

When MPI_T_pvar_handle_alloc() is invoked on any of these pvars, it
will return a count that is equal to the number of active usnic BTL
modules.  The values returned for any given pvar (e.g.,
num_total_sends) are an array containing one value for each active
usnic BTL module.  The ordering of values in the array is both
consistent across all usnic pvars and stable throughout a single job:
array slot 0 corresponds to module X, array slot 1 corresponds to
module Y, etc.

Mapping which array slot corresponds to which underlying Linux usnic_X
device works as follows:

 * The btl_usnic_devices MPI_T state pvar is associated with a
   btl_usnic_device MPI_T enum, and be obtained via
   MPI_T_pvar_get_info().
 * If all usNIC pvars are of length N, the values [0,N) in the
   btl_usnic_device enum are associated with strings of the
   corresponding underlying Linux device.

For exampe, to look up which Linux device is reported in all usNIC
pvars' array slot 1, look up the int value 1 in the btl_usnic_devices
enum.  Its corresponding string value is underlying Linux device name
(e.g., "usnic_1").

cmr=v1.7.4:subject="usnic BTL MPI_T pvars"

This commit was SVN r29545.
2013-10-28 22:23:08 +00:00
Nathan Hjelm
404cceb9c4 Always check the return of [mc]alloc and fix a warning introduced by
r29479.

This fixes some issues reported awhile ago in the openib btl. There
are a couple more unchecked mallocs but they are a bit more difficult
to fix since they are in void functions (btl_openib_endpoint.c).

Refs trac:2401.

cmr=v1.7.4:reviewer=miked

This commit was SVN r29543.

The following SVN revision numbers were found above:
  r29479 --> open-mpi/ompi@d6ead2a3a5

The following Trac tickets were found above:
  Ticket 2401 --> https://svn.open-mpi.org/trac/ompi/ticket/2401
2013-10-28 20:04:49 +00:00
Nathan Hjelm
b202bb0d63 Fix the recursive halfing algorithms for reduce scatter in both basic
and tuned to correctly handle 0 recvcounts.

Tested with the reproducer from #1550.

Refs trac:1559

This commit was SVN r29542.

The following Trac tickets were found above:
  Ticket 1559 --> https://svn.open-mpi.org/trac/ompi/ticket/1559
2013-10-28 19:06:38 +00:00
Rolf vandeVaart
fa5d20a5ec Add optimization that can be used when CUDA 6.0 comes out. Use new pointer attribute.
This commit was SVN r29514.
2013-10-24 21:17:58 +00:00
Rolf vandeVaart
628a109a74 Make casting very clear.
This commit was SVN r29511.
2013-10-24 14:40:01 +00:00
Rolf vandeVaart
5687e4387d Fix compiler warning.
This commit was SVN r29510.
2013-10-24 13:11:12 +00:00
Nathan Hjelm
26f3a029d3 Fix scif configury.
cmr=v1.7.4:ticket=3862

This commit was SVN r29493.

The following Trac tickets were found above:
  Ticket 3862 --> https://svn.open-mpi.org/trac/ompi/ticket/3862
2013-10-23 17:04:20 +00:00
Nathan Hjelm
6186b5ed9d Remove extra file that made its way into r29490.
cmr=v1.7.4:ticket=3862

This commit was SVN r29491.

The following SVN revision numbers were found above:
  r29490 --> open-mpi/ompi@cde3b05ed3

The following Trac tickets were found above:
  Ticket 3862 --> https://svn.open-mpi.org/trac/ompi/ticket/3862
2013-10-23 16:17:51 +00:00
Nathan Hjelm
cde3b05ed3 Add support for the Intel scif interface.
Depends on #3847.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r29490.
2013-10-23 15:59:14 +00:00
Dave Goodell
e9dbb66e58 mpool/rdma: fix memory leak at module finalize
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29487.
2013-10-23 15:51:55 +00:00
Dave Goodell
647e5a6fd2 rcache/vma: fix module finalization memory leaks
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29486.
2013-10-23 15:51:44 +00:00
Dave Goodell
d969cfa513 usnic: correctly clean up verbs resources
Due to deallocation ordering (and an entirely missed deallocation), we
were leaking modest amounts of memory inside libusnic_verbs.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29485.
2013-10-23 15:51:33 +00:00
Dave Goodell
a6ed232a10 usnic: fix several memory leaks
- some free lists simply were not being OBJ_DESTRUCTed, so they never
  freed their internal memory

- channel->recv_segs.ctx was being assigned in a way that got clobbered
  by ompi_free_list_init_new, so the cleanup code that relied on it
  being set never ran

- numerous other ".ctx" assignments were similarly ineffectual and were
  not being consumed, so I deleted them

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29484.
2013-10-23 15:51:22 +00:00
Dave Goodell
c9b2343982 usnic: add ompi_btl_usnic_component_debug helper
This new routine can be called in exceptional situations, either
conditionally in BTL code or from a debugger, to help with debugging in
cases where MSGDEBUG1/2 or stats logging are impractical but more detail
is needed.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29483.
2013-10-23 15:51:11 +00:00
Dave Goodell
d0b7d125b2 usnic: refactor usnic_stats_callback
Pull the bulk of the functionality out into a new routine,
ompi_btl_usnic_print_stats, which can be used in other debugging
contexts.  This also lets us eliminate the module->final_stats state
tracking.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

This commit was SVN r29482.
2013-10-23 15:50:57 +00:00
Jeff Squyres
0fb8edd720 Trivial comment change
This commit was SVN r29480.
2013-10-23 10:15:18 +00:00
Mike Dubman
d6ead2a3a5 Add support for routable ROCE where different subnet_id is a valid to proceed with MPI routing.
(can happen in the same LAN)
developed by vasily, reviewed by miked
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29479.
2013-10-23 06:08:54 +00:00
Rolf vandeVaart
3c916d55c9 Fix two issues pointed out in review of ticket #3870.
This commit was SVN r29473.
2013-10-22 17:28:12 +00:00
Mike Dubman
d27cffedb9 expand tabs to 4 spaces
cd ompi/mca/coll/fca
for i in *.[ch]; do expand -t 4 $i > koko && mv koko $i; done
Refs: #3799

This commit was SVN r29472.
2013-10-22 17:05:55 +00:00
Jeff Squyres
6714890244 paffinity.h is gone and won't be coming back.
This commit was SVN r29467.
2013-10-22 15:59:00 +00:00
Nathan Hjelm
280a89448f Make btl/vader valgrind safe.
cmr=v1.7.4:reviewer=samuel

This commit was SVN r29464.
2013-10-22 15:33:32 +00:00
Jeff Squyres
09fae6e62b Prefix DSO filenames with "lib" so that Automake doesn't complain.
Follow the convention established by the ompi/mca/common/sm tree and
prefix both the "install" and "no install" versions of the build with
"lib" so that Automake doesn't complain.  Differentiate the two by
adding a "_noinst" suffix to the "no install" version.

This commit was SVN r29462.
2013-10-22 13:16:33 +00:00
Rolf vandeVaart
0cd1e8dfd9 Add runtime support to turn off CUDA IPC support.
This commit was SVN r29444.
2013-10-16 16:48:18 +00:00
Rolf vandeVaart
9f83405c78 Fix one more corner case initialization issue.
This commit was SVN r29443.
2013-10-16 16:39:19 +00:00
Ralph Castain
24c811805f ****************************************************************
This change contains a non-mandatory modification
       of the MPI-RTE interface. Anyone wishing to support
       coprocessors such as the Xeon Phi may wish to add
       the required definition and underlying support
****************************************************************

Add locality support for coprocessors such as the Intel Xeon Phi.

Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host.

So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following:

1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board

2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions

3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future.

4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algo isn't scalable at the moment - this will hopefully be improved over time.

5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTE's that choose not to provide this support do not have to do anything - the associated code will simply be ignored.

6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set.

cmr:v1.7.4:reviewer=hjelmn

This commit was SVN r29435.
2013-10-14 16:52:58 +00:00
Mike Dubman
5a7dff2d15 fix icc warning
fixed by Dinar, reviewed by miked
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29428.
2013-10-12 18:04:28 +00:00
Jeff Squyres
b5e2ae86ad Remove all of our "to-do" items from the README.txt.
This commit was SVN r29424.
2013-10-11 16:43:56 +00:00
Rolf vandeVaart
fbf143f3b4 Move another function that was missed in r29347.
This commit was SVN r29422.

The following SVN revision numbers were found above:
  r29347 --> open-mpi/ompi@ce61985503
2013-10-10 14:48:56 +00:00
Jeff Squyres
d9be19f011 Added shared library versions to those who were missing it.
The following common shared libraries did not have versioning:

 * ompi/common/ofacm
 * ompi/common/verbs
 * ompi/common/ugni

Additionally, we still had shared library versions in VERSION for the
following libraries, which no longer exist:

 * ompi/common/portals
 * opal/common/hwloc

This commit was SVN r29421.
2013-10-10 13:25:57 +00:00
Ralph Castain
9902748108 ***** THIS INCLUDES A SMALL CHANGE IN THE MPI-RTE INTERFACE *****
Fix two problems that surfaced when using direct launch under SLURM:

1. locally store our own data because some BTLs want to retrieve 
   it during add_procs rather than use what they have internally

2. cleanup MPI_Abort so it correctly passes the error status all
   the way down to the actual exit. When someone implemented the
   "abort_peers" API, they left out the error status. So we lost
   it at that point and *always* exited with a status of 1. This 
   forces a change to the API to include the status.

cmr:v1.7.3:reviewer=jsquyres:subject=Fix MPI_Abort and modex_recv for direct launch

This commit was SVN r29405.
2013-10-08 18:37:59 +00:00
Jeff Squyres
66dadbe1e7 Per RFC, remove the udapl BTL.
This commit was SVN r29400.
2013-10-08 15:18:59 +00:00
Rolf vandeVaart
3bd02fbaf5 Add one more verbose debug output that prints when we are out of memory.
This commit was SVN r29378.
2013-10-04 18:56:06 +00:00
Rolf vandeVaart
66725f6973 Enable some CUDA-aware support on tcp btl. Only when configured in.
This commit was SVN r29364.
2013-10-04 12:50:16 +00:00
Ralph Castain
f4f2287958 Singletons currently start out by spawning an HNP - this is required solely in the cases where the singleton subsequently calls MPI_Comm_spawn or publishes port info without support from an external orte-server. In all other cases, the HNP is of no value and can actually be a detriment by creating additional overhead on the node. This is particularly concerning for async operations where processes may begin as singletons and then dynamically wireup to perform pt2pt communications.
So we now allow singletons to start on their own, only spawning an HNP when initiating an operation that actually requires it.

cmr:v1.7.4:reviewer=jsquyres

This commit was SVN r29354.
2013-10-04 02:58:26 +00:00
Rolf vandeVaart
4dd1c86b36 Add a few support functions for future features.
This commit was SVN r29353.
2013-10-03 21:06:17 +00:00
Rolf vandeVaart
ce61985503 Move registration function inside initial initialization function.
This commit was SVN r29347.
2013-10-03 14:14:42 +00:00
Nathan Hjelm
6232ef3bfb At coll_select time we can not check whether the communicator has a
virtual topology. Remove code checking for a virtual topology until
this flag is set before coll_select.

This commit was SVN r29344.
2013-10-03 03:37:46 +00:00
Nathan Hjelm
7bedf62dd8 Add basic algorithms for the remaining non-blocking collectives.
The algorithms are intended for MPI-3.0 compliance and are not
optimized. We should aim to add better algorithms in the future through
cheetah.

MPI_Iallreduce and MPI_Igatherv on intercommunicators are required for
MPI_Comm_idup support.

cmr=v1.7.4:reviewer=brbarret:ticket=trac:2715

This commit was SVN r29333.

The following Trac tickets were found above:
  Ticket 2715 --> https://svn.open-mpi.org/trac/ompi/ticket/2715
2013-10-02 14:26:23 +00:00
Mike Dubman
19748e6957 fix race condition which can happen on finalize
1. Change in rte api implementation: now comm_world used to do p2p. 
This allows to not worry about other comms being destroyed.

2. added a notification mechanism with a help of which runtime can say libhcoll that RTE api can not be used any longer. 
pass a pointer to a flag, and its size to libhcoll.
The flag changes when the RTE is no longer available. 
Currently this flag is just ompi_mpi_finalized global bool value.

cmr=v1.7.3:reviewer=jladd

This commit was SVN r29331.
2013-10-02 13:38:47 +00:00
Nathan Hjelm
4f12406436 Don't check for neighborhood collective routines on non-virtual topology communicators
This commit was SVN r29319.
2013-10-01 19:59:18 +00:00
Nathan Hjelm
f3d18028e5 Fix typo in uGNI prepare source that could cause incorrect results with
non-contiguous datatypes.

cmr=v1.7.3

This commit was SVN r29294.
2013-09-30 16:00:58 +00:00
Mike Dubman
9bf7578ff2 fix memory corruption
cmr:v1.7.3:reviewer=ompi-rm1.7

This commit was SVN r29293.
2013-09-30 06:18:12 +00:00
Ralph Castain
d565a76814 Do some cleanup of the way we handle modex data. Identify data that needs to be shared with peers in my job vs data that needs to be shared with non-peers - no point in sharing extra data. When we share data with some process(es) from another job, we cannot know in advance what info they have or lack, so we have to share everything just in case. This limits the optimization we can do for things like comm_spawn.
Create a new required key in the OMPI layer for retrieving a "node id" from the database. ALL RTE'S MUST DEFINE THIS KEY. This allows us to compute locality in the MPI layer, which is necessary when we do things like intercomm_create.

cmr:v1.7.4:reviewer=rhc:subject=Cleanup handling of modex data

This commit was SVN r29274.
2013-09-27 00:37:49 +00:00
Ralph Castain
bc92c260ca Add missing library dependency
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29273.
2013-09-27 00:08:43 +00:00
Dave Goodell
2c7975eb86 common_verbs: fix bad opal_output args
Spotted by Reese Faucette <rfaucett@cisco.com>.

cmr=v1.7.3

This commit was SVN r29267.
2013-09-26 21:59:00 +00:00
Nathan Hjelm
0b8fc13299 MPI-3.0: update C bindings with const and consistent use of [] for
arrays.

The MPI 3.0 standard added const to all in buffers in the C bindings. This
commit adds the const keyword and in most cases casts const away. We will
eventually should go through and update the various interfaces (coll, pml,
io, etc) to take the const keyword. The group, comm, win, and datatype
interfaces have been updated with const.

cmr=v1.7.4:ticket=trac:3785:reviewer=jsquyres

This commit was SVN r29266.

The following Trac tickets were found above:
  Ticket 3785 --> https://svn.open-mpi.org/trac/ompi/ticket/3785
2013-09-26 21:56:20 +00:00
Nathan Hjelm
c5596548b2 MPI-3: Add support for neighborhood collectives
Blocking versions are simple linear algorithms implemented in coll/basic. Non-
blocking versions are from libnbc 1.1.1. All algorithms have been tested with
simple test cases.

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29265.
2013-09-26 21:55:08 +00:00
Dave Goodell
a42fa78da7 usnic: SEGV in OSU benchmarks
Prevent frag from being freed out from under us in the case
the PML callback routine calls usnic_free().  We accomplish this
by delaying decrement of sf_bytes_to_ack until after the callback is
performed, since sf_bytes_to_ack == 0 is condition of freeing the frag.

Fixes Cisco bug CSCuj45094.

Authored-by: Reese Faucette <rfaucett@cisco.com>

cmr=v1.7.3

This commit was SVN r29264.
2013-09-26 21:48:04 +00:00
Mike Dubman
7c6ff00da5 Add caching of FCA communicators
developed by Dinar, reviewed by miked/yossi.
cmr:v1.7.3:reviewer=jsquyres:subject=add caching of FCA communicators.

This commit was SVN r29256.
2013-09-26 17:48:07 +00:00
Rolf vandeVaart
d67e3077f5 Add a check for the CUDA 6.0 version of the cuda.h header file.
This commit was SVN r29250.
2013-09-26 12:46:06 +00:00
Joshua Ladd
82e092db1b Adding interface changes in hcoll component to support non-blocking collectives in libhcoll. This was added by Elena Elkina and reviewed by Josh Ladd.
cmr:v1.7.3:reviewer=jladd:subject=Add support for non-blocking collectives in hcoll

This commit was SVN r29244.
2013-09-25 16:14:59 +00:00
Ralph Castain
9aeba777fa Ensure we don't enter into an infinite loop looking for the PML modex key if it isn't present. The PMI implementation will load ALL modex keys when the first key is queried, so the hash db component can safely return "not found" if a subsequent key isn't present. The PML modex_recv needs to assume everything is okay if the modex recv fails to return a value.
cmr:v1.7.3:reviewer=jladd:subject=Prevent infinite loop when PML modex not found

This commit was SVN r29243.
2013-09-25 16:04:00 +00:00
Rolf vandeVaart
667c66941b Remove redundant (and possibly erroneous) alignment code in rcache. It is already handled by users of the rcache.
This was per RFC http://www.open-mpi.org/community/lists/devel/2013/09/12927.php and discussed in developers meeting.

This commit was SVN r29233.
2013-09-24 17:23:50 +00:00
Rolf vandeVaart
3b5e0736a3 Adjust verbosity levels upward.
This commit was SVN r29232.
2013-09-24 14:35:48 +00:00
Ralph Castain
34fbec1f49 Sadly, the connection priorities being defined at time of variable instantiation were being overridden just before registering the param. Thus, changes people made to the relative priority of the cpc methods were being lost. Fix it be removing the duplicate initializiation, letting the value defined at instantiation be the one actually used.
cmr:v1.7.4:reviewer=hjelmn

This commit was SVN r29212.
2013-09-19 19:45:00 +00:00
Rolf vandeVaart
804545278f Per discussion on devel list, delete unused registration cache.
http://www.open-mpi.org/community/lists/devel/2013/08/12803.php

This component was .ompi_ignored on December 17, 2006 by gleb.  Now, it is time for it go....

This commit was SVN r29209.
2013-09-18 21:22:34 +00:00
Rolf vandeVaart
c9a33fad83 Fix some tabs. Add optional messsage to dump. Some minor format change to dump function.
This commit was SVN r29208.
2013-09-18 21:08:15 +00:00
Mike Dubman
2a5c342587 Modifications that are necessary in order to meet latest libhcoll API.
cmr:v1.7.3:reviewer=jladd

This commit was SVN r29202.
2013-09-18 12:22:02 +00:00
Ralph Castain
865a7028f8 Per patch from George, with a few minor cleanups. Correctly address the complete exchange of required wireup information in Intercomm_create so all procs in the resulting communicator know how to talk to each other.
Refs trac:29166

This commit was SVN r29200.

The following Trac tickets were found above:
  Ticket 29166 --> https://svn.open-mpi.org/trac/ompi/ticket/29166
2013-09-18 02:01:30 +00:00
Ralph Castain
99611ac1d2 Revert r29166 in favor of a better solution from George
This commit was SVN r29199.

The following SVN revision numbers were found above:
  r29166 --> open-mpi/ompi@497c7e6abb
2013-09-18 01:41:26 +00:00
George Bosilca
55273f1c98 Cleanup spaces, nothing else.
This commit was SVN r29197.
2013-09-18 00:07:58 +00:00
George Bosilca
7b319a101d Fix the case where we build without Fortran support.
This commit was SVN r29194.
2013-09-17 20:45:46 +00:00
Nathan Hjelm
7929fb9dea Cleanup complex datatypes and update datatypes and operator code to use C99.
This commit changes the underlying opal complex datatypes to match the
C99 types: float _Complex, double _Complex, and long double _Complex. The
fortran and C++ types now are aliases to these basic types instead of
structure types. The operators in ompi/mca/op/base now work on only the
C99 types and the fortran types use these operators if the fortran type
matches a C complex type (this should almost always be the case.)

C99 is not is use in both the datatype and operator code and should make
the code both cleaner and much less fragile.

This commit was SVN r29193.
2013-09-17 17:49:42 +00:00
Rolf vandeVaart
440632b57f Add a function that will dump out the contents of the memory registration cache.
Useful for debugging any rcache issues.

This commit was SVN r29189.
2013-09-17 15:40:32 +00:00
Jeff Squyres
74d1278f48 btl_usnic_util.c:ompi_btl_usnic_util_abort() also passes in the strerror().
This commit was SVN r29188.
2013-09-17 12:35:51 +00:00
George Bosilca
5f686a90d0 Fix several issues regarding MPI_IN_PLACE and different flavors
of MPI_Alltoall.
 - add support for MPI_IN_PLACE in the self collective component.
 - fix the extent usage in the tuned collective component.
 - correctly use the peer counts instead of local - add support for MPI_IN_PLACE in the self collective component.
 - fix the extent usage in the tuned collective component.
 - correctly use the peer counts instead of local.
Thanks to Fujitsu for the patch.

This commit was SVN r29187.
2013-09-17 11:35:18 +00:00
Reese Faucette
8f235e6977 usnic: wrong SG entry used to compute length for small put()s
This commit was SVN r29186.
2013-09-17 08:18:02 +00:00
Reese Faucette
651d61f1a3 Clean up debugging logging a bit.
MSGDEBUG2 now means "print a one-liner for all PML calls into BTL, and
also when BTL calls PML with a recv completion (not send completions)"
MSGDEBUG1 means print more internal gory detail
MSGDEBUG is gone, replaced by MSGDEBUG1

In the process also found that PUT_DEST style fragments could
potentially be leaked in usnic_free() since send_fragment tests were
being applied to see if it was eligible to be freed.

This commit was SVN r29185.
2013-09-17 07:29:40 +00:00
Reese Faucette
f35d9b50e3 Cisco CSCuj22803: fixes for Bsend
changes required to support MPI_Bsend().  Introduces concept of
attaching a buffer to a large segment that the PML can scribble into and
we will send from.  The reason we don't use a pinned buffer and send
directly from that is that usnic_verbs does not (yes) support num_sge>1
for regular sends.  This means the data gets copied twice, but that is
unavoidable.

changed the logic in handle_large_send to be more sensible

Incorporated David's review comments

This commit was SVN r29184.
2013-09-17 07:27:39 +00:00
Reese Faucette
25b5c84d0f Cisco CSCuj13135: Data corruption in MPI_Bsend_ator_c
Do not assume that the "size" passed to alloc_send() will be the same as
the size of the message the resulting fragment will hold when
usnic_send() is called.  This means usnic_send()/usnic_put() can never
trust any pre-computed size values, and are only allowed to look at the
lengths and pointers of the elements in the desc SG list.

This commit was SVN r29183.
2013-09-17 07:25:05 +00:00
Reese Faucette
b9103c0f66 Cisco CSCuj12524: c_put_big segfault
- usnic_free() cannot free the fragment until ACK is received

This commit was SVN r29182.
2013-09-17 07:23:15 +00:00
Reese Faucette
89b5f0899b Cisco CSCuj12520: various problems running c_fence_put_1
- tag needs to be sent in *our* header, not the PML header
- usnic_alloc() should return smaller value if too much data requested
- be careful about callbacks vs removing items from lists
  (we need to remove from outr lists *before* the callback)
- improve send callback handling
- add some more MSGDEBUG2 logging and cleanup

This commit was SVN r29181.
2013-09-17 07:20:44 +00:00
Mike Dubman
432c10750a enable mxm2 from np>0
reviewed by yossi
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29178.
2013-09-16 12:36:28 +00:00
Mike Dubman
44bfa95553 enable mxm2 by default on np>=0
reviewed by yossi
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29177.
2013-09-16 12:32:29 +00:00
Ralph Castain
52caa75552 Gar - forgot to commit a few more cleanups
Refs trac:3696

This commit was SVN r29168.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-09-15 15:32:01 +00:00
Ralph Castain
b64c8dafd8 Cleanup some errors in pubsub - must set the active flag before posting the recv in case the message has already arrived
Refs trac:3696

This commit was SVN r29167.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-09-15 15:26:32 +00:00
Ralph Castain
497c7e6abb Fixes trac:2904
The intercomm "merge" function can create a linkage between procs that was not reflected anywhere in a modex, and so at least some of the procs in the resulting communicator don't know how to talk to some of the new communicator's peers.

For example, consider the case where:

1. parent job A comm_spawns a process (job B) - these processes exchange modex and can communicate

2. parent job A now comm_spawns another process (job C) - again, these can communicate, but the proc in C knows nothing of B

3. do an intercomm merge across the communicators created by the two comm_spawns. This puts B and C into the same communicator, but they know nothing about how to talk to each other as they were not involved in any exchange of contact info. Hence, collectives on that communicator now fail. 

This fix adds an API to the ompi/dpm framework that (a) exchanges the modex info across the procs in the merge to ensure all procs know how to communicate, and (b) calls add_procs to give the btl's a chance to select transports to any new procs.

cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29166.

The following Trac tickets were found above:
  Ticket 2904 --> https://svn.open-mpi.org/trac/ompi/ticket/2904
2013-09-15 15:00:40 +00:00
Rolf vandeVaart
096b8c022e Also add flag to debug output.
This commit was SVN r29163.
2013-09-13 19:47:05 +00:00
Rolf vandeVaart
c15b2a26b8 Fix some formatting. Move some CUDA-aware mca parameter initialization earlier.
This commit was SVN r29162.
2013-09-13 17:43:41 +00:00
Rolf vandeVaart
d247c26b84 In the case that HAVE_IBV_FORK_INIT is not defined, we will need this variable so we can give the user an error if they ask for it. Also fixes compile error when HAVE_IBV_FORK_INIT is not defined.
This commit was SVN r29160.
2013-09-13 14:38:49 +00:00
Rolf vandeVaart
ba9ec1b8bc For debug builds, add the ability to view memory registrations and deregistrations in the openib BTL.
This commit was SVN r29159.
2013-09-13 14:28:26 +00:00
Joshua Ladd
b3f88c4a1d Per the RFC schedule, this commit adds Mellanox OpenSHMEM to the trunk. It does not yet run on OSX or with CM PML for an MTL other than MXM. Mellanox is aware of these issues and is in the process of resolving them. This should be added to \ncmr=v1.7.4:subject=Move OSHMEM to 1.7.4:reviewer=rhc
This commit was SVN r29153.
2013-09-10 15:34:09 +00:00
Jeff Squyres
c9f05a2664 Delineate OMPI_FREE_LIST_*_MT separately.
The FREE_LIST_*_MT stuff was introduced on the SVN trunk in r28722
(2013-07-04), but so far, has not been merged into the v1.7 branch yet
(2013-09-06).  So put it in its own #ifdef, rather than defining it
based on OMPI_MAJOR_VERSION/OMPI_MINOR_VERSION.

This commit was SVN r29148.

The following SVN revision numbers were found above:
  r28722 --> open-mpi/ompi@c9e5ab9ed1
2013-09-06 19:22:56 +00:00
Jeff Squyres
e02cc0a7ec No need for this header file.
This commit was SVN r29147.
2013-09-06 19:22:28 +00:00
Jeff Squyres
c53b0890cf Ensure that btl_usnic_compat.h is in the tarball.
This commit was SVN r29140.
2013-09-06 15:53:56 +00:00
Dave Goodell
75fa28c303 usnic: v1.6<->trunk unification, trunk side
The Cisco-maintained v1.6 port of the usnic BTL has diverged from the
upstream trunk and v1.7 branches.  This commit adjusts the trunk to more
closely match the v1.6 branch to simplify future merging and
cherry-picking.

The usnic MCA parameters also need work on this side.

Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29138.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:21:34 +00:00
Dave Goodell
a669bd01e6 usnic: revamp convertor handling.
The fix for the HPL SEGV was incorrect because it assumed the
prepare_src() routine was always allowed to return "bytes processed"
less than the requested "bytes to send".  It turns out this is only true
if the convertor is what limits the size, we are not allowed to limit
the data sent for our own reasons, else we break login in the upper
layers.

This means we need to learn the number of bytes out of the size
requested the convertor will give us, no matter how big the size is.
Unfortunately, this is a destructive test, and (currently) the only way to
learn that number is to actually have the convertor copy the data out into
buffers.

This change implements this, copying the entire data out into a chain of
send segments which are attached to the large send fragment.  Now we can
always return the proper size value to the PML.

Fixes Cisco bug CSCuj08024

Authored-by: Reese Faucette <rfaucett@cisco.com>

Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29137.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:21:21 +00:00
Dave Goodell
0ef8336502 new bookkeeping code should return value indicating whether packet is good or not.
Authored-by: Reese Faucette <rfaucett@cisco.com>

Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29136.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:19:32 +00:00
Dave Goodell
122890c2fd usnic: "bookeeping" --> "bookkeeping"
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29135.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:19:20 +00:00
Dave Goodell
0df6ed4acc usnic: squash warnings from perf improvements
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29134.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:19:08 +00:00
Dave Goodell
6dc54d372d usnic: Basket of performance changes including:
- round segment buffer allocation to cache-line
    - split some routines into an inline fast section and a called
      slower section
    - introduce receive fastpath in component_progress that:
        o returns immediately if there is a packet available on priority
          queue and fastpath is enabled
        o disables fastpath for 1 time after use to provide fairness to
          other processing
        o defers receive buffer posting
        o defers bookeeping for receive until next call
          to usnic_component_progress

Authored-by: Reese Faucette <rfaucett@cisco.com>

Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29133.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:18:57 +00:00
Dave Goodell
9cab9777d9 usnic: properly destroy embedded small send frag
Without this, an `--enable-debug` build would hit an assertion in the
list code when run under valgrind with `--malloc-fill=0xff` or any other
case where malloc returned non-zeroed buffers.

Also allow the normal OBJ_ machinery to handle the constructor
invocation ordering for us instead of doing it by hand (which could have
led to future bugs).

Reviewed-by: jsquyres@cisco.com

cmr=v1.7.4

Depends on trunk functionality in r29095 and r29096.  Refs trac:3740,#3741.

This commit was SVN r29127.

The following SVN revision numbers were found above:
  r29095 --> open-mpi/ompi@d1b5940e97
  r29096 --> open-mpi/ompi@a552921171

The following Trac tickets were found above:
  Ticket 3740 --> https://svn.open-mpi.org/trac/ompi/ticket/3740
2013-09-04 20:59:12 +00:00
Jeff Squyres
f6619f8e9e Fix compile error in the heterogeneous case.
We've been forcing C99 compiler compliance for a while now, so use C99
syntax to keep the #if code tidy.

This commit was SVN r29101.
2013-08-31 12:56:08 +00:00
Brian Barrett
16a1166884 Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a
configure-time dynamic allocation of flags.  The net result for platforms
which only support BTL-based communication is a reduction of 8*nprocs bytes
per process.  Platforms which support both MTLs and BTLs will not see
a space reduction, but will now be able to safely run both the MTL and BTL
side-by-side, which will prove useful.

This commit was SVN r29100.
2013-08-30 16:54:55 +00:00
Rolf vandeVaart
18962d296b This has bothered me for a while. Change MCA_BTL_TAG_BTL to MCA_BTL_TAG_IB. They are the same
value so this does not change anything.  (MCA_BTL_TAG_IB = MCA_BTL_TAG_BTL + 0).  This just makes it more correct.

This commit was SVN r29099.
2013-08-30 14:53:59 +00:00
Dave Goodell
c5a7e8a079 usnic: stomp format specifier warnings
The usnic BTL now builds cleanly under `--enable-picky` when `MSGDEBUG1`
is set.

Reviewed-by: jsquyres

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29097.
2013-08-29 23:24:14 +00:00
Ralph Castain
5d1fa4fa0e Silence warnings:
osc_pt2pt_data_move.c: In function 'ompi_osc_pt2pt_sendreq_recv_accum_long_cb':
osc_pt2pt_data_move.c:643:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
osc_rdma_data_move.c: In function 'ompi_osc_rdma_control_send_cb':
osc_rdma_data_move.c:1312:37: warning: variable 'header' set but not used [-Wunused-but-set-variable]

This commit was SVN r29092.
2013-08-29 20:56:36 +00:00
George Bosilca
305fa88d4b Remove two warnings from the SM BTL. The return code can be safely ignored
as the internals of the SM BTL will repost the fragment until the send operation
succesfully complete.

This commit was SVN r29077.
2013-08-28 06:36:01 +00:00
Dave Goodell
dd82bd3c19 usnic: fix invalid rfstart initialization
endpoint_rfstart was being initialized from a value which was not yet
set.  Also ensure that rfstart is a valid index in the range
0..WINDOW_SIZE-1, since it is used as the index into endpoint_rcvd_segs,
which has WINDOW_SIZE elements.

Without this change there is significant risk of memory corruption or
segfaults, resulting in hangs or crashes, if malloc ever returns us a
value >=WINDOW_SIZE (4096).  Right now we seem to be getting lucky that
the malloc is returning zero-pages to us when we are allocating endpoint
structures (possibly because the freelist performs a single large
allocation for all endpoints).

Fixes Cisco bug CSCui88781.

Reviewed-by: rfaucett@cisco.com
Reviewed-by: jsquyres@cisco.com

cmr=v1.7.3:reviewer=jsquyres

This commit was SVN r29075.
2013-08-27 22:43:20 +00:00
Nathan Hjelm
f5495ace48 coll/ml: update the coll_ml_enable_fragmentation variable to support the
option to autodetect whether fragmentation should be enabled

cmr=v1.7.3:ticket=trac:3717

This commit was SVN r29065.

The following Trac tickets were found above:
  Ticket 3717 --> https://svn.open-mpi.org/trac/ompi/ticket/3717
2013-08-27 16:36:54 +00:00
Ralph Castain
6d24b34940 Extend the dpm framework API to support persistent accept/connect operations:
* paccept - establish a persistent listening port for async connect requests

* pconnect - async connect to remote process that has posted a paccept port. Provides a timeout mechanism, and allows the underlying implementation to retry until timeout 

* pclose - shuts down a prior paccept posting

Includes example programs paccept.c and pconnect.c in orte/test/mpi. New MPI extension interfaces coming...

This commit was SVN r29063.
2013-08-23 18:02:50 +00:00
Rolf vandeVaart
96457df9bc Fix compile errors created from changeset 29058.
This commit was SVN r29061.
2013-08-22 18:25:23 +00:00
Jeff Squyres
63ac60864b Refs trac:3730
Turns out that AC_CHECK_DECLS is one of the "new style" Autoconf
macros that #defines the output to be 0 or 1 (vs. #define'ing or
#undef'ing it).  So don't check for "#if defined(..."; just check for
"#if ...".

This commit was SVN r29059.

The following Trac tickets were found above:
  Ticket 3730 --> https://svn.open-mpi.org/trac/ompi/ticket/3730
2013-08-22 17:44:20 +00:00
Ralph Castain
a200e4f865 As per the RFC, bring in the ORTE async progress code and the rewrite of OOB:
*** THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE ***

Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro.

***************************************************************************************

I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week.

The code is in  https://bitbucket.org/rhc/ompi-oob2


WHAT:    Rewrite of ORTE OOB

WHY:       Support asynchronous progress and a host of other features

WHEN:    Wed, August 21

SYNOPSIS:
The current OOB has served us well, but a number of limitations have been identified over the years. Specifically:

* it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code)

* we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface.

* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients

* there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort

* only one transport (i.e., component) can be "active"


The revised OOB resolves these problems:

* async progress is used for all application processes, with the progress thread blocking in the event library

* each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on")

* multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC.

* a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions.

* opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object

* NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions

* obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel

* the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport

* routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active

* all blocking send/recv APIs have been removed. Everything operates asynchronously.


KNOWN LIMITATIONS:

* although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline

* the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker

* routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways

* obviously, not every error path has been tested nor necessarily covered

* determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when *all* transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost.

* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways

* the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC

This commit was SVN r29058.
2013-08-22 16:37:40 +00:00