1
1
Граф коммитов

61 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
c9a257f1a0 btl/ugni: always buffer sendi fragments
This commit will improve the message rate when using the sendi function
by not waiting for the send to get to the remote process.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31526.
2014-04-24 18:50:29 +00:00
Nathan Hjelm
bd275e642e btl/ugni: fix typo
cmr=v1.7.5:ticket=trac:4151

This commit was SVN r30795.

The following Trac tickets were found above:
  Ticket 4151 --> https://svn.open-mpi.org/trac/ompi/ticket/4151
2014-02-21 17:46:22 +00:00
Nathan Hjelm
ff4c9c808a btl/ugni: fix leak in new sendi function.
cmr=v1.7.5:ticket=trac:4151

This commit was SVN r30365.

The following Trac tickets were found above:
  Ticket 4151 --> https://svn.open-mpi.org/trac/ompi/ticket/4151
2014-01-22 16:32:07 +00:00
Nathan Hjelm
c9c335544e btl/ugni: fix a typo in r30353
cmr=v1.7.5:ticket=trac:4151

This commit was SVN r30354.

The following SVN revision numbers were found above:
  r30353 --> open-mpi/ompi@aa3fea55b2

The following Trac tickets were found above:
  Ticket 4151 --> https://svn.open-mpi.org/trac/ompi/ticket/4151
2014-01-21 21:02:28 +00:00
Nathan Hjelm
aa3fea55b2 btl/ugni: re-add a sendi function to exploit the new optimization in
ob1.

Also update LANL platform files to use the latest version of ugni.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r30353.
2014-01-21 20:53:35 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Nathan Hjelm
24a7e7aa34 Add support for the udreg registration cache and dynamics on XE/XK/XC.
To support the new mpool two changes were made to the mpool infrastructure:

 1) Added an mpool flag to indicate that an mpool does not need the memory
    hooks to use the leave pinned protocols. This flag is checked in the
    mpool lookup.

 2) Add a mpool context to the base registration. This new member is used
    by the udreg mpool to store the udreg context associated with the
    particular registration. The new member will not break the ABI
    compatibility as the new member is only currently used by the udreg
    mpool.

Dynamics support for Cray systems makes use of the global rank provided by
orte to give the ugni library a unique rank for each process. Dynamics
support is not available under direct-launch (srun.)

cmr=v1.7.4

This commit was SVN r29719.
2013-11-18 04:58:37 +00:00
Nathan Hjelm
f3d18028e5 Fix typo in uGNI prepare source that could cause incorrect results with
non-contiguous datatypes.

cmr=v1.7.3

This commit was SVN r29294.
2013-09-30 16:00:58 +00:00
Ralph Castain
611d7f9f6b When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require.
This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times.

Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes:

* upon first request for data, have the OPAL db pmi component fetch and decode *all* the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally

* reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test

* reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued).

Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it

Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time.

This commit was SVN r29040.
2013-08-17 00:49:18 +00:00
Nathan Hjelm
dfca3d4804 fix typos in the ugni and vader btls
This commit was SVN r28772.
2013-07-12 17:55:33 +00:00
Aurelien Bouteiller
e1066143a4 rename ompi_free_list operations to _mt, as per discussions at last face to face meeting
This commit was SVN r28734.
2013-07-08 22:07:52 +00:00
George Bosilca
c9e5ab9ed1 Our macros for the OMPI-level free list had one extra argument, a possible return
value to signal that the operation of retrieving the element from the free list
failed. However in this case the returned pointer was set to NULL as well, so the
error code was redundant. Moreover, this was a continuous source of warnings when
the picky mode is on.

The attached parch remove the rc argument from the OMPI_FREE_LIST_GET and
OMPI_FREE_LIST_WAIT macros, and change to check if the item is NULL instead of
using the return code.

This commit was SVN r28722.
2013-07-04 08:34:37 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file can not be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few component actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
Brian Barrett
b8442ba505 Revamp the handling of wrapper compiler flags. The user flags, main configure
flags, and mca flags are kept seperate until the very end.  The main configure
wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD
macro.  MCA components should either let <framework>_<component>_{LIBS,LDFLAGS}
be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}.
The situations in which WRAPPER CPPFLAGS can be set by MCA components was
made very small to match the one use case where it makes sense.

This commit was SVN r27950.
2013-01-29 00:00:43 +00:00
Brian Barrett
f42783ae1a Move the RTE framework change into the trunk. With this change, all non-CR
runtime code goes through one of the rte, dpm, or pubsub frameworks.

This commit was SVN r27934.
2013-01-27 23:25:10 +00:00
Nathan Hjelm
84e34ee0d7 Fix a bug in the uGNI btl that could cause certain descriptor callbacks to be called twice.
There was a race condition in the eager get protocol where the RDMA complete message could be received before the local completion of the SMSG message that started the eager get protocol.

cmr:v1.7

This commit was SVN r27740.
2013-01-03 23:11:13 +00:00
Nathan Hjelm
4d1920ee87 Fix a bug on 32-bit systems introduced by r26626. This fix ensures that all supported btls (with exception of wv-- shiqing will need to help bring that one up to date with r26626) set the lval in prepare_src/dst when preparing a put or get segment. This fix also ensures a consistent use of lval in put and get for both local and remote segments.
This commit was SVN r26793.

The following SVN revision numbers were found above:
  r26626 --> open-mpi/ompi@249066e06d
2012-07-13 21:19:16 +00:00
Nathan Hjelm
a847df9ba5 ugni: fix eager get
This commit was SVN r26699.
2012-06-29 15:43:29 +00:00
Nathan Hjelm
249066e06d Timeout! Per RFC update the BTL interface to hide segment keys. All BTLs (with the exception of wv), all relevant PMLs, and osc/rdma have been updated for the new interface.
This commit was SVN r26626.
2012-06-21 17:09:12 +00:00
Nathan Hjelm
e3bc6c0f73 btl/ugni: use grdma mpool to take advantage of shared lru
This commit was SVN r26623.
2012-06-20 23:03:59 +00:00
Nathan Hjelm
3d86b5055e btl/ugni: don't call opal_convertor_pack if there is nothing to pack
This commit was SVN r26622.
2012-06-20 23:01:37 +00:00
Nathan Hjelm
71bffa5158 ugni: update to latest btl code. bug fixes and cleanup
This commit was SVN r26529.
2012-05-31 20:02:41 +00:00
Nathan Hjelm
91d99c6fef ugni: reserve memory domain descriptors (MDDs) for mailbox registration
This commit was SVN r26419.
2012-05-10 00:24:42 +00:00
Nathan Hjelm
903f9fac09 ugni: fixed buffered sends and code cleanup
This commit was SVN r26401.
2012-05-07 17:23:06 +00:00
Nathan Hjelm
49eda71ca0 ugni: fix invalid parameter with opal_pointer_array_init
This commit was SVN r26400.
2012-05-07 17:22:55 +00:00
Nathan Hjelm
584c457352 ugni: update smsg defaults and add parameter to control local completion queue size
This commit was SVN r26399.
2012-05-07 17:22:49 +00:00
Nathan Hjelm
bfcf67391a ugni: set fragment id from opal_pointer_array_add
This commit was SVN r26398.
2012-05-07 17:22:42 +00:00
Nathan Hjelm
b3dc726e9d ugni: don't create completion queues until add_procs
This commit was SVN r26397.
2012-05-07 17:22:35 +00:00
Nathan Hjelm
c36ab84116 ugni: missed a couple of lines in the last commit
This commit was SVN r26340.
2012-04-25 14:24:48 +00:00
Nathan Hjelm
a753fe91f7 fix merge
This commit was SVN r26332.
2012-04-24 21:16:51 +00:00
Nathan Hjelm
9a35f96bda ob1: add support for get fallback on put/send
This commit was SVN r26329.
2012-04-24 20:18:56 +00:00
Nathan Hjelm
0f60858a01 ugni: improve handling of smsg completions
This commit was SVN r26327.
2012-04-24 20:18:35 +00:00
Nathan Hjelm
363bd184e7 ugni: re-disable uGNI for local procs
This commit was SVN r26318.
2012-04-23 21:12:12 +00:00
Nathan Hjelm
ca3ceb840c ugni: add mca parameter to control the number of smsg retries
This commit was SVN r26317.
2012-04-23 21:12:05 +00:00
Nathan Hjelm
95b12f140a ugni: cleanup frag setup code
This commit was SVN r26316.
2012-04-23 21:11:57 +00:00
Nathan Hjelm
1340f9c65a ugni update:
- Move endpoint code back up to BTL
 - Use opal_pointer_array_t for bounce buffer to identify local smsg completions.
 - Update and reenable sendi
 - Create a new endpoint for FMA/BTE transactions (keep local smsg/fma transactions seperate)
 - Move reverse get code into btl_ugni_put.c
 - Move eager get code into btl_ugni_get.c
 - Handle remote SMSG overruns correctly
 - Added support for inplace sends
 - etc

This commit was SVN r26307.
2012-04-19 21:51:55 +00:00
Nathan Hjelm
2b9827f45c ugni: restrict number of memory registrations per process
This commit was SVN r26306.
2012-04-19 21:51:44 +00:00
Nathan Hjelm
f88babfb92 ugni: minor updates
This commit was SVN r26262.
2012-04-10 19:56:19 +00:00
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Nathan Hjelm
d62c0f1872 ugni: handle smsg failure in mca_btl_ugni_ep_connect_finish
This commit was SVN r26202.
2012-03-28 05:40:16 +00:00
Nathan Hjelm
fca42347e3 ugni: use hash table to keep track of smsg frag completion
This commit was SVN r26153.
2012-03-15 20:13:32 +00:00
Nathan Hjelm
deddf0b33e ugni: fix frag leak in sendi
This commit was SVN r26152.
2012-03-15 20:13:20 +00:00
Nathan Hjelm
99f05d56e3 ugni: updated parameters and code cleanup
This commit was SVN r26151.
2012-03-15 20:13:11 +00:00
Nathan Hjelm
a7209e309a ugni: opps, sendi was missing from Makefile.am
This commit was SVN r26067.
2012-02-28 16:10:35 +00:00
Nathan Hjelm
8217c46666 ompi_free_list: allocate payload if payload size > 0 in the fl_mpool = NULL case
This commit was SVN r26027.
2012-02-23 16:47:28 +00:00
Nathan Hjelm
9843cd0466 ugni: missed one more merge typo
This commit was SVN r26026.
2012-02-23 16:39:15 +00:00
Nathan Hjelm
c83fe003a0 ugni: fix branch merge typo
This commit was SVN r26024.
2012-02-23 16:16:21 +00:00
Nathan Hjelm
1ee6d5d21a ugni: fix typos in mca_btl_ugni_put
This commit was SVN r25937.
2012-02-15 23:26:03 +00:00
Nathan Hjelm
8c9dc990a9 ugni: fixed typo
This commit was SVN r25899.
2012-02-10 01:27:31 +00:00
Nathan Hjelm
994127cb6b ugni: move mca_btl_ugni_smsg_mbox_t back to btl_ugni_endpoint.h
This commit was SVN r25898.
2012-02-10 01:11:54 +00:00