1
1
Граф коммитов

87 Коммитов

Автор SHA1 Сообщение Дата
Howard Pritchard
bf89131f9e add owner files to opa/ompi/orte mca directories
This commit adds an owner file in each of the component directories
for each framework.  This allows for a simple script to parse
the contents of the files and generate, among other things, tables
to be used on the project's wiki page.  Currently there are two
"fields" in the file, an owner and a status.  A tool to parse
the files and generate tables for the wiki page will be added
in a subsequent commit.
2015-02-22 15:10:23 -07:00
Howard Pritchard
45e7c7fd60 Merge pull request #378 from hppritcha/topic/mtl_query_cast_fix
mtl/query: squash compiler warning
2015-02-06 12:23:33 -07:00
Todd Kordenbrock
762b05bcda mtl-portals4: fix mismatch between format and type-size 2015-02-04 15:35:03 -06:00
Todd Kordenbrock
87759a1b1e mtl-portals4: fix signedness mismatch warning 2015-02-04 15:35:03 -06:00
Todd Kordenbrock
5ddce1acbc mtl-portals4: add "unused" attribute to rc to prevent compiler warning 2015-02-04 15:35:03 -06:00
Howard Pritchard
69d2b818f7 mtl/query: squash compiler warning
Squash compiler warnings now showing up in the
query methods for the mtls.  Cast pointers to the different
mtl module specific types to the mca_base_module_t.

Also, fix up a missing extern in mtl_psm_types.h.
This was causing "multiple definition" errors when building
the mca_mtl_psm.so shared library.
2015-02-04 14:15:54 -07:00
Todd Kordenbrock
b8b07d2d62 mtl-portals4: Fix initialization of the Portals4 MTL component
Swap close and query methods in the initialization of the Portals4
MTL component.

Fixes #373
2015-02-04 13:29:13 -06:00
Howard Pritchard
ed537ddca0 copyright updates for commit eb977de5
I really should start using Jeff's script..
2015-01-31 13:50:32 -07:00
Howard Pritchard
eb977de5e9 mtl: add query method to mtl components
Switch to using the query/priority method for selecting
MTLs.  This switch was motivated by the fact that now
on some platforms, its possible for multiple MTLs to
be initializable, but only one MTL should be selected.

In addition, there is a complication with the PSM and
IFO (with PSM provider) MTLs owing to the fact that
they cannot both intialize the underlying PSM context,
i.e. only one call to psm_init is allowed per process.

The mxm component has not been compiled as the author
doesn't currently have access to a system with a recent
enough mxm installed to allow for a compile.

The portals4, ofi, and psm components have been checked
for compilation.  The ofi and psm components have been
checked for runtime correctness on a intel/qlogic system
with up to date PSM installed.
2015-01-29 09:02:52 -07:00
Todd Kordenbrock
6a3225d800 Fix invalid symbols left by the PMIx merge.
This commit was SVN r32597.
2014-08-25 16:30:26 +00:00
Ralph Castain
aec5cd08bd Per the PMIx RFC:
WHAT:    Merge the PMIx branch into the devel repo, creating a new
               OPAL “lmix” framework to abstract PMI support for all RTEs.
               Replace the ORTE daemon-level collectives with a new PMIx
               server and update the ORTE grpcomm framework to support
               server-to-server collectives

WHY:      We’ve had problems dealing with variations in PMI implementations,
               and need to extend the existing PMI definitions to meet exascale
               requirements.

WHEN:   Mon, Aug 25

WHERE:  https://github.com/rhc54/ompi-svn-mirror.git

Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.

All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.

Accordingly, we have:

* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.

* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.

* Replaced the current global collective id with a signature based on the names of the participating procs. The allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint

* removed the prior OMPI/OPAL modex code

* added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform.

* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand

This commit was SVN r32570.
2014-08-21 18:56:47 +00:00
Ryan Grant
caa10a5faf Portals fixes after latest move
This commit was SVN r32330.
2014-07-28 19:25:03 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose.  UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00
Ryan Grant
ca0a7b1a9a Correct typo in r31332, mtl_portals_enpoint.h -> mtl_portals_endpoint.h
This commit was SVN r31338.

The following SVN revision numbers were found above:
  r31332 --> open-mpi/ompi@b12ee27b3d
2014-04-08 14:41:51 +00:00
Ralph Castain
b12ee27b3d Add missing files - thanks to Mr. Anonymous for reporting them as missing from the 1.8 tarball
cmr=v1.8.1:reviewer=jsquyres:subject=add missing portals4 files

This commit was SVN r31332.
2014-04-08 02:55:14 +00:00
Brian Barrett
7d472ad5a5 Improve some comments
This commit was SVN r30144.
2014-01-07 23:35:04 +00:00
Brian Barrett
afde8370b3 Pull both calls to get into one function, and wrap with the appropriate
reference count if flow control is enabled.

This commit was SVN r30141.
2014-01-07 23:15:09 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Brian Barrett
dbcc53bc6f Fix a threading issue
Remove some unneeded UNLIKELYs

This commit was SVN r30138.
2014-01-07 19:41:39 +00:00
Brian Barrett
d4bb1cbbad * Start working on thread safety of Portals 4 MTL
* Only call flowctl_add_procs if there's a new proc in the add_procs call

This commit was SVN r30110.
2014-01-02 22:37:01 +00:00
Brian Barrett
16a1166884 Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a
configure-time dynamic allocation of flags.  The net result for platforms
which only support BTL-based communication is a reduction of 8*nprocs bytes
per process.  Platforms which support both MTLs and BTLs will not see
a space reduction, but will now be able to safely run both the MTL and BTL
side-by-side, which will prove useful.

This commit was SVN r29100.
2013-08-30 16:54:55 +00:00
Brian Barrett
2cc947513b * Fix some compile errors
* Need to subtract 1 off the size so that we stay in the bit length requirements

This commit was SVN r28997.
2013-08-05 18:49:48 +00:00
Brian Barrett
ecbbf888d3 * Update Portals 4 MTL's multi-md code to be a bit cleaner (no if statements
in the path) and not create MDs due to boundary crossing
* Add the same logic to the Coll component

This commit was SVN r28733.
2013-07-08 21:27:37 +00:00
Brian Barrett
d3b49535b5 Only allow communication from the same user, since we don't have job-level
protection.

This commit was SVN r28715.
2013-07-03 17:29:02 +00:00
Brian Barrett
c4577723ed fix misuse of param api
This commit was SVN r28705.
2013-07-02 21:41:42 +00:00
Brian Barrett
e4698f5cd4 Shell of the Portals 4 collectives componetn
This commit was SVN r28703.
2013-07-02 15:23:55 +00:00
Nathan Hjelm
9d4a26f47d Update OMPI frameworks to use the MCA framework system.
Notes:
  - This commit also eliminates the need for an available components list in use
    in several frameworks. None of the code in question was making use of the
    priority field of the priority component list item so these extra lists were
    removed.
  - Cleaned up selection code in several frameworks to sort lists using opal_list_sort.
  - Cleans up the ompi/orte-info functions. Expose the functions that construct the
    list of params so they can be used elsewhere.

patches for mtl/portals4 from brian

missed a few output variables in openib

This commit was SVN r28241.
2013-03-27 21:17:31 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file can not be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few component actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
Brian Barrett
1370d4569a workaround for case when MD can't span all of memory (sigh)
This commit was SVN r28132.
2013-02-27 17:02:45 +00:00
Brian Barrett
b8442ba505 Revamp the handling of wrapper compiler flags. The user flags, main configure
flags, and mca flags are kept seperate until the very end.  The main configure
wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD
macro.  MCA components should either let <framework>_<component>_{LIBS,LDFLAGS}
be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}.
The situations in which WRAPPER CPPFLAGS can be set by MCA components was
made very small to match the one use case where it makes sense.

This commit was SVN r27950.
2013-01-29 00:00:43 +00:00
Brian Barrett
f42783ae1a Move the RTE framework change into the trunk. With this change, all non-CR
runtime code goes through one of the rte, dpm, or pubsub frameworks.

This commit was SVN r27934.
2013-01-27 23:25:10 +00:00
Brian Barrett
fa4c2af9ed THe Portals 4 reference implementation will sometimes return a NI_FLOWCTL for both a
send and an ack.  I'm not sure whether this violates the spec, so work around until
we decide...

This commit was SVN r27244.
2012-09-05 19:36:19 +00:00
Josh Hursey
28681deffa Backout the ORCA commit. :(
There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk.

This commit was SVN r26676.
2012-06-27 01:28:28 +00:00
Josh Hursey
542330e3a7 Commit of ORCA: Open MPI Runtime Collaborative Abstraction
This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI.

The project is described on the wiki:
  https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition

And on this email thread:
  http://www.open-mpi.org/community/lists/devel/2012/06/11109.php

This commit was SVN r26670.
2012-06-26 21:42:16 +00:00
Brian Barrett
defaefd59e Clean up resources from flowcontrol on shutdown
This commit was SVN r26605.
2012-06-14 22:38:35 +00:00
Brian Barrett
946ec4cd97 * Update usage of PtlHandleIsEqual to match new semantic
* Properly set message to MPI_MESSAGE_NULL in the right places
* Fix double free of buffer for non-contiguous blocking sends
* Remove useless debugging output

This commit was SVN r26604.
2012-06-14 22:24:23 +00:00
Brian Barrett
31279eb641 Fix segfault with long expected messages when using the rndv protocol. We were
freeing the ME before the get to grab the long part of the message.

This commit was SVN r26589.
2012-06-11 16:37:01 +00:00
Brian Barrett
25693363e9 * Fix internal accounting error regarding number of available credits
* Use a single MD covering all of address space for put transfers, rather
 than a per-send MD.

This commit was SVN r26458.
2012-05-20 23:42:26 +00:00
Brian Barrett
2e52374847 * Split send and receive eq sizes
* Need to look at slot count before flowcontrol for sending to prevent
  race in restart
* Need to free pending request fragments when done with the request
* A number of branch prediction optimizations for error conditions

This commit was SVN r26430.
2012-05-10 21:43:48 +00:00
Brian Barrett
0ae2277796 Add a backoff mechanism for re-establishing communication
This commit was SVN r26366.
2012-05-01 15:53:00 +00:00
Brian Barrett
74ade8b181 need to order the pending list before we restart
This commit was SVN r26365.
2012-04-30 23:06:00 +00:00
Brian Barrett
5dec52af8d remove some now unneeded debugging
This commit was SVN r26364.
2012-04-30 22:50:52 +00:00
Brian Barrett
c654ee6afc * Use triggered operations for restart barrier as well
This commit was SVN r26363.
2012-04-30 22:48:10 +00:00
Brian Barrett
91a9973bde * Make flow control on by default
* Move alarm code back into a triggered operation

This commit was SVN r26362.
2012-04-30 22:25:40 +00:00
Brian Barrett
e6a0a1cf8a * Make sure to release all resources on failed send
* Avoid triggered ops until we get everything debugged
* Simplify flowctl interface a bit

This commit was SVN r26356.
2012-04-27 21:11:01 +00:00
Brian Barrett
8a70747da2 Fix some naming that doesn't make a ton of sense
This commit was SVN r26277.
2012-04-18 01:05:18 +00:00
Brian Barrett
f4d4e87176 add some flow control debugging output
This commit was SVN r26276.
2012-04-17 23:14:05 +00:00
Brian Barrett
fe0dfc8e26 First take at flow control protocol
This commit was SVN r26274.
2012-04-17 21:46:21 +00:00
Brian Barrett
dde6f094eb In preperation for flow control changes coming, always utilize ACKs for
message completion.

This commit was SVN r26272.
2012-04-16 17:25:27 +00:00
Brian Barrett
451af0e832 Ensure async progress for long unexpected messages by waiting for an
event on the ME.  The events we're likely to see are LINK (the ME was
added to the match list), PUT (weird to see first, but means that the ME
was linked to the match list and then matched), or PUT_OVERFLOW, meaning
the message was unexpected.

This commit was SVN r26199.
2012-03-26 22:54:35 +00:00