1
1
Граф коммитов

334 Коммитов

Автор SHA1 Сообщение Дата
Mike Dubman
05ee929832 OMPI-MXM: handle multiple calls to add_procs() in MXM
- now add_procs can be called more than once (during MPI_INIT and Inter_Comm_Create)
- adjust MXM to this reality

fixed by Alina, reviewed by Yossi/Mike

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30907.
2014-03-03 13:50:37 +00:00
Mike Dubman
49ee63f4b8 MXM: do not enforce version check
- MXM uses libtool versioning scheme which is enough, no need additional in OMPI

reviewed by yossi

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30768.
2014-02-18 19:44:37 +00:00
Yossi Etigin
7564e2c13f Fix a recursion in mxm send flow which happens when mpi starts a new send from the context of send completion callback.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30265.
2014-01-12 17:47:03 +00:00
Alina Sklarevich
2869ff1782 mxm: fixes for compilation warnings.
removed set but not used variables and a variable that is unused.

reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30176.
2014-01-09 15:15:14 +00:00
Brian Barrett
7d472ad5a5 Improve some comments
This commit was SVN r30144.
2014-01-07 23:35:04 +00:00
Brian Barrett
afde8370b3 Pull both calls to get into one function, and wrap with the appropriate
reference count if flow control is enabled.

This commit was SVN r30141.
2014-01-07 23:15:09 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Brian Barrett
dbcc53bc6f Fix a threading issue
Remove some unneeded UNLIKELYs

This commit was SVN r30138.
2014-01-07 19:41:39 +00:00
Brian Barrett
d4bb1cbbad * Start working on thread safety of Portals 4 MTL
* Only call flowctl_add_procs if there's a new proc in the add_procs call

This commit was SVN r30110.
2014-01-02 22:37:01 +00:00
Yossi Etigin
6ab4aba9e6 Fix missing include of show_help.h in mtl mxm.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29987.
2013-12-19 19:37:21 +00:00
Jeff Squyres
bac67e0d81 Per discussion @Chicago OMPI dev meeting Dec 2013: remove all MX support.
This commit was SVN r29873.
2013-12-12 18:54:47 +00:00
Yossi Etigin
a913b00f89 mtl mxm: update configuration parsing api to mxm 2.1, drop
older version support (1.0 and 1.1), and cleanup the code.

reviewed by miked.

cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29797.
2013-12-04 09:11:55 +00:00
Mike Dubman
432c10750a enable mxm2 from np>0
reviewed by yossi
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29178.
2013-09-16 12:36:28 +00:00
Mike Dubman
44bfa95553 enable mxm2 by default on np>=0
reviewed by yossi
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29177.
2013-09-16 12:32:29 +00:00
Brian Barrett
16a1166884 Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a
configure-time dynamic allocation of flags.  The net result for platforms
which only support BTL-based communication is a reduction of 8*nprocs bytes
per process.  Platforms which support both MTLs and BTLs will not see
a space reduction, but will now be able to safely run both the MTL and BTL
side-by-side, which will prove useful.

This commit was SVN r29100.
2013-08-30 16:54:55 +00:00
Ralph Castain
45e695928f As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time:
* add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then we will include hostnames for any allocation smaller than the given limit.

* remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL"

* modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded

* removed a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base

* added an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it a different way. Currently, we support PBS-based systems and SLURM - for any other, rank0 will emit a warning and we assume max number of daemons so we will always retain hostnames

This commit was SVN r29052.
2013-08-20 18:59:36 +00:00
Ralph Castain
611d7f9f6b When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require.
This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times.

Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes:

* upon first request for data, have the OPAL db pmi component fetch and decode *all* the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally

* reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test

* reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued).

Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it

Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time.

This commit was SVN r29040.
2013-08-17 00:49:18 +00:00
Brian Barrett
2cc947513b * Fix some compile errors
* Need to subtract 1 off the size so that we stay in the bit length requirements

This commit was SVN r28997.
2013-08-05 18:49:48 +00:00
Brian Barrett
704f1ecc18 fix non-orte builds of PSM
This commit was SVN r28893.
2013-07-21 19:12:32 +00:00
Aurelien Bouteiller
e1066143a4 rename ompi_free_list operations to _mt, as per discussions at last face to face meeting
This commit was SVN r28734.
2013-07-08 22:07:52 +00:00
Brian Barrett
ecbbf888d3 * Update Portals 4 MTL's multi-md code to be a bit cleaner (no if statements
in the path) and not create MDs due to boundary crossing
* Add the same logic to the Coll component

This commit was SVN r28733.
2013-07-08 21:27:37 +00:00
George Bosilca
c9e5ab9ed1 Our macros for the OMPI-level free list had one extra argument, a possible return
value to signal that the operation of retrieving the element from the free list
failed. However in this case the returned pointer was set to NULL as well, so the
error code was redundant. Moreover, this was a continuous source of warnings when
the picky mode is on.

The attached parch remove the rc argument from the OMPI_FREE_LIST_GET and
OMPI_FREE_LIST_WAIT macros, and change to check if the item is NULL instead of
using the return code.

This commit was SVN r28722.
2013-07-04 08:34:37 +00:00
Brian Barrett
d3b49535b5 Only allow communication from the same user, since we don't have job-level
protection.

This commit was SVN r28715.
2013-07-03 17:29:02 +00:00
Brian Barrett
c4577723ed fix misuse of param api
This commit was SVN r28705.
2013-07-02 21:41:42 +00:00
Brian Barrett
e4698f5cd4 Shell of the Portals 4 collectives componetn
This commit was SVN r28703.
2013-07-02 15:23:55 +00:00
Mike Dubman
d1c82994be fix: detect threading model to take appropriate flow in mxm
This commit was SVN r28648.
2013-06-16 08:40:06 +00:00
Yossi Etigin
64d98e0438 Fix data corruption in MXM by registering to OPAL memory release hooks and removing any mappings created by mxm
This commit was SVN r28489.
2013-05-14 12:27:44 +00:00
Alex Margolin
aebd794bf6 Fixed macro definition order in MXM component headers
This commit was SVN r28378.
2013-04-24 16:51:43 +00:00
Alex Margolin
0ab7675019 Fix MXM connection establishment flow
This commit was SVN r28329.
2013-04-12 16:37:42 +00:00
Nathan Hjelm
9d4a26f47d Update OMPI frameworks to use the MCA framework system.
Notes:
  - This commit also eliminates the need for an available components list in use
    in several frameworks. None of the code in question was making use of the
    priority field of the priority component list item so these extra lists were
    removed.
  - Cleaned up selection code in several frameworks to sort lists using opal_list_sort.
  - Cleans up the ompi/orte-info functions. Expose the functions that construct the
    list of params so they can be used elsewhere.

patches for mtl/portals4 from brian

missed a few output variables in openib

This commit was SVN r28241.
2013-03-27 21:17:31 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file can not be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few component actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
Nathan Hjelm
3c5cd95087 mtl/psm: add missing header for opal_show_help (one more)
This commit was SVN r28147.
2013-03-05 00:18:51 +00:00
Nathan Hjelm
25d0d97d6b mtl/psm: add missing header for opal_show_help
This commit was SVN r28146.
2013-03-05 00:17:48 +00:00
Nathan Hjelm
213cb79fab mtl/psm: add missing header for opal_show_help
This commit was SVN r28145.
2013-03-05 00:15:11 +00:00
Brian Barrett
1370d4569a workaround for case when MD can't span all of memory (sigh)
This commit was SVN r28132.
2013-02-27 17:02:45 +00:00
Vasily Filipov
f897c8a1e0 MTL MXM: STREAM supporting for isend and irecv.
This commit was SVN r28122.
2013-02-27 13:21:30 +00:00
Vasily Filipov
52a9241859 MTL MXM: adapt to mxm 2.0 api changes - flags are only for send requests, and SYNC is part of the opcode.
This commit was SVN r28069.
2013-02-17 10:04:19 +00:00
Vasily Filipov
8270d8f52a MTL MXM: "#include "opal/util/show_help.h" adding.
This commit was SVN r28068.
2013-02-17 09:51:03 +00:00
Brian Barrett
312f37706e In talking about this with Jeff and Ralph, we don't actually need
ompi_show_help, because opal_show_help is replaced with an 
aggregating version when using ORTE, so there's no reason to
directly call orte_show_help.

This commit was SVN r28051.
2013-02-12 21:10:11 +00:00
Brian Barrett
d80218996f Rather than setting up the direct call stuff in ompi_mca (which requires
modifying ompi_mca for every interface that is direct called), do it in
the framework's .m4 file.

This commit was SVN r28031.
2013-02-04 23:26:42 +00:00
Vasily Filipov
21b170b43b MTL MXM: push commit r27987 back, now with right user.
r27987 - MTL MXM: ver. 2.0 interface changes.

This commit was SVN r28026.

The following SVN revision numbers were found above:
  r27987 --> open-mpi/ompi@2735658d81
2013-02-04 06:59:24 +00:00
Vasily Filipov
aa5e436479 Revert revesion -r27986, the reason is - it was submitted with wrong user name.
This commit was SVN r28025.

The following SVN revision numbers were found above:
  r27986 --> open-mpi/ompi@729caaf0cd
2013-02-04 06:54:24 +00:00
Pavel Shamis
2735658d81 MTL MXM: ver. 2.0 interface changes.
This commit was SVN r27987.
2013-01-31 08:38:08 +00:00
Brian Barrett
b8442ba505 Revamp the handling of wrapper compiler flags. The user flags, main configure
flags, and mca flags are kept seperate until the very end.  The main configure
wrapper flags should now be modified by using the OPAL_WRAPPER_FLAGS_ADD
macro.  MCA components should either let <framework>_<component>_{LIBS,LDFLAGS}
be copied over OR set <framework>_<component>_WRAPPER_EXTRA_{LIBS,LDFLAGS}.
The situations in which WRAPPER CPPFLAGS can be set by MCA components was
made very small to match the one use case where it makes sense.

This commit was SVN r27950.
2013-01-29 00:00:43 +00:00
Brian Barrett
f42783ae1a Move the RTE framework change into the trunk. With this change, all non-CR
runtime code goes through one of the rte, dpm, or pubsub frameworks.

This commit was SVN r27934.
2013-01-27 23:25:10 +00:00
Mike Dubman
a454341e2b add support for mxm 2.0
This commit was SVN r27661.
2012-12-09 22:58:37 +00:00
Brian Barrett
702451111b Remove Portals 3.3 support
This commit was SVN r27656.
2012-12-06 20:11:27 +00:00
Aleksey Senin
ae92f64842 Check that MXM runtime version match compiled.
Reviewed by Mike Dubman.

This commit was SVN r27575.
2012-11-07 14:44:33 +00:00
Nathan Hjelm
bdedd8b0d3 Per RFC modify the behavior of mca_base_components_close to NOT close the output. Modify frameworks to always close their output and set to -1.
Reasoning: The old behavior was a little confusing. mca_base_components_open does not open an output stream so it is a little unexpected that mca_base_components_close does. To add to this several frameworks (that don't use mca_base_components_close) failed to close their output in the framework close function and others closed their output a second time. This change is an improvement to the symantics of mca_base_components_open/close as they are now symetric in their functionality.

This commit was SVN r27570.
2012-11-06 19:09:26 +00:00
Eugene Loh
25ad84b925 Ensure that MPI_Status objects have proper alignment:
- fix the Fortran layer to use new macros to convert Fortran-to-C status
- change the C internals to pull out old OMPI_SET_STATUS* macros

Also, change name of "status" argument in topo_test_f.c to "topo_type".

This commit was SVN r27403.
2012-10-04 14:39:51 +00:00
Nathan Hjelm
9200473a59 mtl/psm: do not let psm set processor affinity
This commit was SVN r27389.
2012-10-02 15:56:51 +00:00
Brian Barrett
fa4c2af9ed THe Portals 4 reference implementation will sometimes return a NI_FLOWCTL for both a
send and an ack.  I'm not sure whether this violates the spec, so work around until
we decide...

This commit was SVN r27244.
2012-09-05 19:36:19 +00:00
Aleksey Senin
33ae1fe6c7 Fix untitialized return code in ompi_mtl_mxm_add_procs function.
This commit was SVN r27216.
2012-09-02 13:17:49 +00:00
Aleksey Senin
68e0894a58 MXM send/recv request changes.
Adapt OMPI to the latest MXM changes in send/recv request.
Use memory handle structure instead of memory key.

This commit was SVN r27155.
2012-08-28 05:57:36 +00:00
Samuel Gutierrez
7867330dcc Fix the PSM MTL in trunk by gathering node locality information differently.
This commit was SVN r27063.
2012-08-16 00:50:24 +00:00
Yael Dayan
79e6b9c91d Adapt OMPI to use newer version of MXM.
This commit was SVN r26974.
2012-08-08 15:29:38 +00:00
Yael Dayan
954bcdc0a5 adapt the way to find amount of local processes to OMPI trunk.
This commit was SVN r26973.
2012-08-08 15:26:28 +00:00
Vasily Filipov
fc712182db MTL MXM: make MXM use MXM_VERSION macro for MXM version checking.
This commit was SVN r26952.
2012-08-06 06:35:57 +00:00
Vasily Filipov
c386847d9a MTL MXM: Adding MXM version protect for Mprobe, Mrecv resources.
This commit was SVN r26922.
2012-07-31 07:57:25 +00:00
Vasily Filipov
4e66ff030b MTL MXM Mrecv: adding missed return message to a free list.
This commit was SVN r26870.
2012-07-26 11:22:22 +00:00
Vasily Filipov
ef9bd8e4cb MTL MXM: MPI_Mprobe, MPI_Mrecv implementation for MXM adding.
This commit was SVN r26866.
2012-07-25 13:26:40 +00:00
Mike Dubman
4784253f5c revert commit, breaks backwards compatability, will be revised
This commit was SVN r26852.
2012-07-24 11:48:18 +00:00
Vasily Filipov
99bd5977bd MTL MXM: small fix in the mxm_req_probe func interface.
This commit was SVN r26850.
2012-07-24 08:46:38 +00:00
Vasily Filipov
597a422272 MTL: make MXM work with read (in blocking send case) call-backs.
This commit was SVN r26807.
2012-07-19 13:28:06 +00:00
George Bosilca
4326537fe9 Remove compiler warning about uninitialized variable.
This commit was SVN r26760.
2012-07-08 00:07:52 +00:00
Yevgeny Kliteynik
0e28fa984b Remove dead code that was related to ticket #2971
This commit was SVN r26701.
2012-07-02 11:19:09 +00:00
Jeff Squyres
5d030278e1 Refs trac:3130: Per comment 8 on the ticket, this MX patch fixes the cases
where the MX BTL and MTL are stepping on each other regarding the
mpool.  Thanks to Yong Qin for assistance in tracking this down.

This commit was SVN r26698.

The following Trac tickets were found above:
  Ticket 3130 --> https://svn.open-mpi.org/trac/ompi/ticket/3130
2012-06-29 13:52:40 +00:00
Ralph Castain
0dfe29b1a6 Roll in the rest of the modex change. Eliminate all non-modex API access of RTE info from the MPI layer - in some cases, the info was already present (either in the ompi_proc_t or in the orte_process_info struct) and no call was necessary. This removes all calls to orte_ess from the MPI layer. Calls to orte_grpcomm remain required.
Update all the orte ess components to remove their associated APIs for retrieving proc data. Update the grpcomm API to reflect transfer of set/get modex info to the db framework.

Note that this doesn't recreate the old GPR. This is strictly a local db storage that may (at some point) obtain any missing data from the local daemon as part of an async methodology. The framework allows us to experiment with such methods without perturbing the default one.

This commit was SVN r26678.
2012-06-27 14:53:55 +00:00
Josh Hursey
28681deffa Backout the ORCA commit. :(
There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk.

This commit was SVN r26676.
2012-06-27 01:28:28 +00:00
Josh Hursey
542330e3a7 Commit of ORCA: Open MPI Runtime Collaborative Abstraction
This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI.

The project is described on the wiki:
  https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition

And on this email thread:
  http://www.open-mpi.org/community/lists/devel/2012/06/11109.php

This commit was SVN r26670.
2012-06-26 21:42:16 +00:00
Brian Barrett
defaefd59e Clean up resources from flowcontrol on shutdown
This commit was SVN r26605.
2012-06-14 22:38:35 +00:00
Brian Barrett
946ec4cd97 * Update usage of PtlHandleIsEqual to match new semantic
* Properly set message to MPI_MESSAGE_NULL in the right places
* Fix double free of buffer for non-contiguous blocking sends
* Remove useless debugging output

This commit was SVN r26604.
2012-06-14 22:24:23 +00:00
Brian Barrett
31279eb641 Fix segfault with long expected messages when using the rndv protocol. We were
freeing the ME before the get to grab the long part of the message.

This commit was SVN r26589.
2012-06-11 16:37:01 +00:00
Mike Dubman
10831e111a detect num of local procs
This commit was SVN r26555.
2012-06-05 09:13:16 +00:00
Yevgeny Kliteynik
1cbce83ece Fixed wording of MXM parameters as suggested By Jeff.
This commit was SVN r26545.
2012-06-03 21:48:42 +00:00
Yevgeny Kliteynik
f02bf707a4 Added MXM parameter "np" that controls the minimal number of processes that allow MXM to run
Default: 128

MXM advantages kick in with large number of processes.

This commit was SVN r26544.
2012-06-02 11:07:20 +00:00
Brian Barrett
2effbb1ba6 fix copy/paste typo
This commit was SVN r26492.
2012-05-24 16:06:20 +00:00
Ralph Castain
c0304eb23a Fix copy/paste typo
This commit was SVN r26491.
2012-05-24 15:47:20 +00:00
Brian Barrett
25693363e9 * Fix internal accounting error regarding number of available credits
* Use a single MD covering all of address space for put transfers, rather
 than a per-send MD.

This commit was SVN r26458.
2012-05-20 23:42:26 +00:00
Brian Barrett
2e52374847 * Split send and receive eq sizes
* Need to look at slot count before flowcontrol for sending to prevent
  race in restart
* Need to free pending request fragments when done with the request
* A number of branch prediction optimizations for error conditions

This commit was SVN r26430.
2012-05-10 21:43:48 +00:00
Brian Barrett
0ae2277796 Add a backoff mechanism for re-establishing communication
This commit was SVN r26366.
2012-05-01 15:53:00 +00:00
Brian Barrett
74ade8b181 need to order the pending list before we restart
This commit was SVN r26365.
2012-04-30 23:06:00 +00:00
Brian Barrett
5dec52af8d remove some now unneeded debugging
This commit was SVN r26364.
2012-04-30 22:50:52 +00:00
Brian Barrett
c654ee6afc * Use triggered operations for restart barrier as well
This commit was SVN r26363.
2012-04-30 22:48:10 +00:00
Brian Barrett
91a9973bde * Make flow control on by default
* Move alarm code back into a triggered operation

This commit was SVN r26362.
2012-04-30 22:25:40 +00:00
Brian Barrett
e6a0a1cf8a * Make sure to release all resources on failed send
* Avoid triggered ops until we get everything debugged
* Simplify flowctl interface a bit

This commit was SVN r26356.
2012-04-27 21:11:01 +00:00
Brian Barrett
8a70747da2 Fix some naming that doesn't make a ton of sense
This commit was SVN r26277.
2012-04-18 01:05:18 +00:00
Brian Barrett
f4d4e87176 add some flow control debugging output
This commit was SVN r26276.
2012-04-17 23:14:05 +00:00
Brian Barrett
fe0dfc8e26 First take at flow control protocol
This commit was SVN r26274.
2012-04-17 21:46:21 +00:00
Brian Barrett
dde6f094eb In preperation for flow control changes coming, always utilize ACKs for
message completion.

This commit was SVN r26272.
2012-04-16 17:25:27 +00:00
Mike Dubman
34acf769d4 mtl_mxm: support canceling messages
This commit was SVN r26256.
2012-04-09 16:02:05 +00:00
Brian Barrett
451af0e832 Ensure async progress for long unexpected messages by waiting for an
event on the ME.  The events we're likely to see are LINK (the ME was
added to the match list), PUT (weird to see first, but means that the ME
was linked to the match list and then matched), or PUT_OVERFLOW, meaning
the message was unexpected.

This commit was SVN r26199.
2012-03-26 22:54:35 +00:00
Brian Barrett
2a26d0f9a2 Forgot to add new file in the last commit.
Mark ME as invalid once we see a completion event, and look for events before
trying to unlink.

This commit was SVN r26198.
2012-03-26 22:39:05 +00:00
Brian Barrett
0e91084385 * Add type field to the request structure to deal with random user requests
(ie, cancel)
* Implement cancel for receives.  Sends are slightly more complicated...

This commit was SVN r26197.
2012-03-26 22:32:36 +00:00
Brian Barrett
61a090e0d1 Checking for NULL function pointers and direct-call semantics can't work
together, so implement all functions in the MTL interface for all
MTLs.  The only places NULL was still being set was for add_comm/del_comm,
and matched probe, both of which are straight forward to implement (or
return ERROR_NOT_IMPLEMENTED, since the PML can't emulate matched probe).

This commit was SVN r26194.
2012-03-26 19:27:03 +00:00
Brian Barrett
cdaf110c0f * Implement mtl_send in addition to mtl_sendi
This commit was SVN r26193.
2012-03-26 19:19:11 +00:00
Brian Barrett
27c8f71773 Start of the flow control implementation. #defined out for now.
This commit was SVN r26192.
2012-03-26 01:31:58 +00:00
Brian Barrett
cce936b94c * Implement matched probe for the CM PML. Required adding a peer field to
the ompi_message_t structure to properly initialize convertor (the peer
  is available in the request in OB1, and wasn't needed when I did the
  original implementation).
* Implement matched probe for the Portals4 MTL and add NULL function pointers
  for the other MTLs.
* Add add_comm and del_comm functions to portals4 MTL so that direct call
  almost works again.
* Add NEWS item that we've implemented matched probe

This commit was SVN r26180.
2012-03-22 22:55:59 +00:00
Brian Barrett
4d12616b64 Frank pointed out that PTL_OK is zero and PtlHandleIsEqual either returns
PTL_OK or PTL_FAIL and that I had these backwards.

This commit was SVN r26179.
2012-03-22 15:58:00 +00:00
Brian Barrett
1c6b5a1358 * Set all appropriate flags for portal table entries
* split eq into send and receive eqs so that we can control the number
  of outstanding events in send eq and ensure we never lose an ack
* Shouldn't ever truncate on short unexpected receive bocks, so don't set
  the truncate bit
* Track active vs. waiting for free short unexpected receive blocks so
  to ensure an active short unexpected receive block is posted coming out
  of flow control.  Also allow creation of "temporary" blocks which should
  be released once FREE event is received.
* Slight reorganization of some code in preparation for more flow control
  work.

This commit was SVN r26174.
2012-03-21 22:20:55 +00:00