18416 Commits

Author SHA1 Message Date
Jeff Squyres
49b5342130 After talking with Nathan, update some comments/documentation about
the new MCA var and pvar systems.

This commit was SVN r28913.
2013-07-22 20:34:42 +00:00
Nathan Hjelm
562cfd9630 Update README with information about uGNI and vader BTLs. Also remove references to the csum pml.
cmr=v1.7.3:reviewer=jsquyres

This commit was SVN r28911.
2013-07-22 19:16:59 +00:00
Nathan Hjelm
b17cd13c09 sharedfp: ensure sharedfp components register their parameters in mca_register_component_params not mca_component_open
This commit was SVN r28910.
2013-07-22 17:53:58 +00:00
Nathan Hjelm
61d331d5b5 MCA/base: fix some warnings and an error in the MCA variable system
This commit was SVN r28909.
2013-07-22 17:52:39 +00:00
Jeff Squyres
b437041aeb Update one more comment.
This commit was SVN r28908.
2013-07-22 17:29:00 +00:00
Jeff Squyres
4b6006402d Use the RTE framework instead of calling ORTE directly.
Brian (rightfully) hit me on the head with the
don't-use-ORTE-use-the-rte-framework clue bat; the usnic BTL now
nicely plays with the RTE framework.

This commit was SVN r28907.
2013-07-22 17:28:23 +00:00
Jeff Squyres
ca9da8a554 Fix minor typo in the comments/docs.
This commit was SVN r28905.
2013-07-22 17:24:17 +00:00
Rolf vandeVaart
67badf384c Only search SONAME of library. Expand comments.
This commit was SVN r28904.
2013-07-22 15:54:45 +00:00
Brian Barrett
0d8b57211a add missing include
This commit was SVN r28900.
2013-07-21 20:18:17 +00:00
Brian Barrett
e1d72409cd add missing header
This commit was SVN r28897.
2013-07-21 19:40:31 +00:00
Brian Barrett
704f1ecc18 fix non-orte builds of PSM
This commit was SVN r28893.
2013-07-21 19:12:32 +00:00
Brian Barrett
05ab9cbaa6 Need to ship pmi_internal.h
This commit was SVN r28891.
2013-07-21 19:00:50 +00:00
Brian Barrett
495384d8b7 Update documentation in rte.h to match recent changes
This commit was SVN r28887.
2013-07-20 22:14:12 +00:00
Brian Barrett
414ba3dad8 Update PMI RTE to match error handling changes that were part of r28852.
Note that the PMI RTE still doesn't listen for asynchronous errors, so
the error handler still won't ever actually do anything :).

This commit was SVN r28886.

The following SVN revision numbers were found above:
  r28852 --> open-mpi/ompi@e4e678e234
2013-07-20 22:09:02 +00:00
Brian Barrett
5bfd980968 update PMI RTE component to adapt to ORTE changes
This commit was SVN r28885.
2013-07-20 22:06:47 +00:00
Brian Barrett
d984d25da3 Remove orte header file from sharedfp components (OMPI layer should not
include ORTE layer with the RTE framework).  Thankfully, nothing used
orte_show_help, so easy fix.

This commit was SVN r28884.
2013-07-20 22:03:44 +00:00
Ralph Castain
d64e45cfa3 Add utility for comparing two code trees
This commit was SVN r28883.
2013-07-20 21:48:23 +00:00
Jeff Squyres
7a63ee24fb Remove Elan and Windows Verbs from the list of supported networks.
This commit was SVN r28881.
2013-07-19 22:15:25 +00:00
Jeff Squyres
bcf40e075b Add some notes about the Cisco usNIC BTL.
This commit was SVN r28880.
2013-07-19 22:14:49 +00:00
Jeff Squyres
194b285447 First commit of the Cisco usNIC BTL.
This BTL accesses the Cisco usNIC Linux device through the Linux verbs
API, using Unreliable Datagram (UD) queue pairs.  A few noteworthy points:

 * This BTL does most of its own fragmentation; it tells the PML that
   it has a very high max_send_size (much higher than the network
   MTU).
 * Since UD fragments are, by definition, unreliable, the usnic BTL
   handles all of its own reliability via a sliding window approach
   using the opal_hotel construct and many tricks stolen from the
   corpus of knowledge surrounding efficient TCP.
 * There is a fun PML latency-metric-based optimization for NUMA
   awareness of short messages.
 * Note that this is *not* a generic UD verbs BTL; it is specific to
   the Cisco usNIC device.

This commit was SVN r28879.
2013-07-19 22:13:58 +00:00
Jeff Squyres
3546163c48 Devices that do not support RC QPs are also intentionally skipped;
don't warn about skipping them.

This commit was SVN r28874.
2013-07-19 19:05:18 +00:00
Ralph Castain
5d12ab3873 Ensure we always set num_local_peers for both PMI2 and PMI1
This commit was SVN r28860.
2013-07-19 04:34:58 +00:00
Ralph Castain
b033a6b6d6 One last Cray-inspired fix...
Refs trac:3685

This commit was SVN r28857.

The following Trac tickets were found above:
  Ticket 3685 --> https://svn.open-mpi.org/trac/ompi/ticket/3685
2013-07-19 03:04:00 +00:00
Nathan Hjelm
1e8ba2b8cf fix condition in common/pmi init that caused pmi to fail if PMI2_Init succeeds
This commit was SVN r28856.
2013-07-19 02:43:42 +00:00
Ralph Castain
92cb93b21e Remove set-but-unused variable
Refs trac:3685

This commit was SVN r28855.

The following Trac tickets were found above:
  Ticket 3685 --> https://svn.open-mpi.org/trac/ompi/ticket/3685
2013-07-19 01:42:35 +00:00
Ralph Castain
bc2586cf3c Refs trac:3685. Check error code returned by PMI2_Info_GetJobAttr.
This commit was SVN r28854.

The following Trac tickets were found above:
  Ticket 3685 --> https://svn.open-mpi.org/trac/ompi/ticket/3685
2013-07-19 01:24:51 +00:00
Ralph Castain
a10546d5c1 Cleanup and rename of platform files
This commit was SVN r28853.
2013-07-19 01:18:41 +00:00
Ralph Castain
e4e678e234 Per the RFC and discussion on the devel list, update the RTE-MPI error handling interface. There are a few differences between the code and the original RFC that came out of the discussion; I've captured those in the following writeup:
George and I were talking about ORTE's error handling the other day with regard to the right way to deal with errors in the updated OOB. Specifically, it seemed a bad idea for a library such as ORTE to abort the job on its own prerogative. If we lose a connection or cannot send a message, then we really should just report it upward and let the application and/or upper layers decide what to do about it.

The current code base only allows a single error callback to exist, which seemed unduly limiting. So, based on the conversation, I've modified the errmgr interface to provide a mechanism for registering any number of error handlers (this replaces the current "set_fault_callback" API). When an error occurs, these handlers will be called in order until one responds that the error has been "resolved" - i.e., no further action is required - by returning OMPI_SUCCESS. The default MPI layer error handler is specified to go "last" and calls mpi_abort, so the current "abort" behavior is preserved unless other error handlers are registered.

In the register_callback function, I provide an "order" param so you can specify "this callback must come first" or "this callback must come last". Seemed to me that we will probably have different code areas registering callbacks, and one might require it go first (the default "abort" will always require it go last). So you can append and prepend, or go first. Note that only one registration can declare itself "first" or "last", and since the default "abort" callback automatically takes "last", that one isn't available. :-)

The errhandler callback function passes an opal_pointer_array of structs, each of which contains the name of the proc involved (which can be yourself for internal errors) and the error code. This is a change from the current fault callback which returned an opal_pointer_array of just process names. Rationale is that you might need to see the cause of the error to decide what action to take. I realize that isn't a requirement for remote procs, but remember that we will use the SAME interface to report RTE errors internal to the proc itself. In those cases, you really do need to see the error code. It is legal to pass a NULL for the pointer array (e.g., when reporting an internal failure without error code), so handlers must be prepared for that possibility. If people find that too burdensome, we can remove it.

Should we ever decide to create a separate callback path for internal errors vs remote process failures, or if we decide to do something different based on experience, then we can adjust this API.
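The ordered-callback scheme described above can be sketched in C as follows. This is an illustration under stated assumptions, not the errmgr code: every name here (register_errhandler, errhandler_order_t, error_report_t, and ERR_SUCCESS as a stand-in for OMPI_SUCCESS) is hypothetical, a plain array plus count replaces the opal_pointer_array, the proc name is reduced to an integer rank, and the default "abort" handler only sets a flag rather than calling mpi_abort.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes; ERR_SUCCESS stands in for OMPI_SUCCESS. */
#define ERR_SUCCESS         0
#define ERR_EXISTS         -1  /* "first"/"last" slot already taken */
#define ERR_FULL           -2  /* no room for another handler */
#define ERR_BAD_PARAM      -3
#define ERR_UNRESOLVED     -4  /* no handler resolved the error */
#define ERR_CODE_TRANSIENT 100 /* hypothetical "ignorable" error code */
#define MAX_MIDDLE 8

/* One entry per failure: which proc (simplified to a rank) and why. */
typedef struct {
    int proc_rank;
    int error_code;
} error_report_t;

/* reports may be NULL (e.g., an internal failure with no error code). */
typedef int (*errhandler_fn_t)(const error_report_t *reports, size_t n);

typedef enum { ORDER_FIRST, ORDER_LAST, ORDER_PREPEND, ORDER_APPEND } errhandler_order_t;

static errhandler_fn_t first_cb;           /* at most one "first" */
static errhandler_fn_t last_cb;            /* at most one "last" */
static errhandler_fn_t middle[MAX_MIDDLE]; /* prepended/appended handlers */
static size_t n_middle;

static int register_errhandler(errhandler_fn_t cb, errhandler_order_t order)
{
    if (cb == NULL) return ERR_BAD_PARAM;
    switch (order) {
    case ORDER_FIRST:
        if (first_cb != NULL) return ERR_EXISTS;
        first_cb = cb;
        return ERR_SUCCESS;
    case ORDER_LAST:
        if (last_cb != NULL) return ERR_EXISTS;
        last_cb = cb;
        return ERR_SUCCESS;
    case ORDER_PREPEND:
    case ORDER_APPEND:
        if (n_middle == MAX_MIDDLE) return ERR_FULL;
        if (order == ORDER_PREPEND) {
            /* shift existing handlers right to make room at the front */
            memmove(&middle[1], &middle[0], n_middle * sizeof middle[0]);
            middle[0] = cb;
        } else {
            middle[n_middle] = cb;
        }
        n_middle++;
        return ERR_SUCCESS;
    }
    return ERR_BAD_PARAM;
}

/* Call handlers in order (first, middle, last) until one resolves the error. */
static int report_error(const error_report_t *reports, size_t n)
{
    assert(reports != NULL || n == 0); /* a NULL array carries no entries */
    if (first_cb != NULL && first_cb(reports, n) == ERR_SUCCESS) return ERR_SUCCESS;
    for (size_t i = 0; i < n_middle; i++) {
        if (middle[i](reports, n) == ERR_SUCCESS) return ERR_SUCCESS;
    }
    if (last_cb != NULL && last_cb(reports, n) == ERR_SUCCESS) return ERR_SUCCESS;
    return ERR_UNRESOLVED;
}

/* Example handler: resolves the error only if every report is transient. */
static int ignore_transient_handler(const error_report_t *reports, size_t n)
{
    if (reports == NULL || n == 0) return ERR_UNRESOLVED;
    for (size_t i = 0; i < n; i++) {
        if (reports[i].error_code != ERR_CODE_TRANSIENT) return ERR_UNRESOLVED;
    }
    return ERR_SUCCESS;
}

/* Stand-in for the default MPI-layer handler, which really calls mpi_abort. */
static int abort_requested;
static int default_abort_handler(const error_report_t *reports, size_t n)
{
    (void)reports;
    (void)n;
    abort_requested = 1;
    return ERR_SUCCESS;
}
```

With only default_abort_handler registered as "last", every error still ends in an abort request, which mirrors the preserved default behavior. Registering ignore_transient_handler with ORDER_APPEND lets transient errors be absorbed before the abort handler is ever reached.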

This commit was SVN r28852.
2013-07-19 01:08:53 +00:00
Ralph Castain
6c50c8167c Fix pmi-1 compile when no pmi2 is present
This commit was SVN r28849.
2013-07-18 22:45:08 +00:00
Ralph Castain
351b1203a7 Set ignores to ignore mca_sharedfp_addproc_control, a generated file
This commit was SVN r28846.
2013-07-18 22:19:52 +00:00
Ralph Castain
8a8b4896be Need to protect libgen.h as some systems might not have it
This commit was SVN r28845.
2013-07-18 20:21:37 +00:00
Edgar Gabriel
185e365dad make the sm sharedfp component compile on Mac.
This commit was SVN r28844.
2013-07-18 20:17:14 +00:00
Ralph Castain
256034a3dc Sigh - fix a couple of spots I missed
Refs trac:3683

This commit was SVN r28843.

The following Trac tickets were found above:
  Ticket 3683 --> https://svn.open-mpi.org/trac/ompi/ticket/3683
2013-07-18 19:07:16 +00:00
Ralph Castain
4eb0dfa039 This has apparently been wrong for some time! Fix the common/pmi libraries so we build them as dynamic libraries, so they can be properly linked into the components that use them. Define required library version numbers and do some other cuteness to make it all work.
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r28842.
2013-07-18 18:42:42 +00:00
Ralph Castain
fc3b777ef5 Cleanup a variable that isn't used if pmi2 support is available
Refs trac:3683

This commit was SVN r28841.

The following Trac tickets were found above:
  Ticket 3683 --> https://svn.open-mpi.org/trac/ompi/ticket/3683
2013-07-18 17:19:13 +00:00
Edgar Gabriel
93cef82873 remove the ylib component from the fcoll framework. It is not used, and there are
no plans to use it. We can always recover it from svn if we ever change
our minds.

This commit was SVN r28840.
2013-07-18 16:18:06 +00:00
Ralph Castain
92c6b806b9 Based on a patch submitted by Piotr Lesnicki of Bull, cleanup the PMI2 support. This has not been tested yet on multiple environments (e.g., Cray), so it needs more evaluation prior to moving to the 1.7 branch.
cmr:v1.7.3:reviewer=rhc

This commit was SVN r28837.
2013-07-18 14:46:07 +00:00
Pavel Shamis
68969ba6e5 Removing bogus references in iboffload code.
cmr:v1.7:reviewer=hjelmn

This commit was SVN r28834.
2013-07-17 22:35:24 +00:00
Rolf vandeVaart
49663fb802 Move CUDA-aware configury to its own file and make other minor changes in response to review.
This commit was SVN r28832.
2013-07-17 22:12:29 +00:00
Edgar Gabriel
6e8522fec5 infuse life into the shared file pointer framework. For this:
 - extend the framework API
 - remove the dummy component, which is no longer required
 - add four components to perform the actual job.

This commit was SVN r28828.
2013-07-17 21:55:24 +00:00
Edgar Gabriel
ac694b7056 in preparation for the new shared file pointer components to be committed
soon:
 - add a new abstraction layer to be used internally for some operations
 - add a new mca parameter to control lazy initialization of shared file
   pointer structures

This commit was SVN r28826.
2013-07-17 21:30:50 +00:00
Dave Goodell
94977f9501 add authors-to-cvsimport.pl script
Helpful when creating a git-svn clone of the OMPI repository.

Reviewed by jsquyres@

cmr:v1.7:reviewer=jsquyres

This commit was SVN r28825.
2013-07-17 21:21:15 +00:00
Dave Goodell
e768b3c8c4 clean up AUTHORS file
Reviewed by jsquyres@

cmr:v1.7:reviewer=jsquyres

This commit was SVN r28824.
2013-07-17 21:21:02 +00:00
Nathan Hjelm
b88509af36 don't close components that failed to register. cmr:v1.7:reviewer=rhc
This commit was SVN r28823.
2013-07-17 19:49:05 +00:00
Vishwanath Venkatesan
ce8f8f0829 Changing the MPI Datatype from MPI_LONG to OMPI_OFFSET_DATATYPE for send/recv offsets
This commit was SVN r28822.
2013-07-17 19:16:53 +00:00
Nathan Hjelm
d4c6029cf3 sbgp/ibnet: set mca_sbgp_ibnet_component.mtu to IBV_MTU_1024 before registering it. cmr:v1.7:reviewer=pasha
This commit was SVN r28821.
2013-07-17 19:16:31 +00:00
Rolf vandeVaart
7a45be8bde Fix variable initialization.
This commit was SVN r28819.
2013-07-17 17:37:35 +00:00
Nathan Hjelm
5999906dec Remove duplicate mpi/tool from DIST_SUBDIRS
This commit was SVN r28818.
2013-07-17 04:35:47 +00:00
Nathan Hjelm
f0aeb36d80 Fix warnings in ob1 introduced by the pvar commit
This commit was SVN r28817.
2013-07-17 03:41:05 +00:00
Ralph Castain
956317ac1e Cleanup the MPIT errors - include the generated mpit lib in libompi, set ignores, remove the now unused mpit directory
This commit was SVN r28816.
2013-07-17 03:13:13 +00:00