Commit graph

256 commits

Author SHA1 Message Date
Jeff Squyres
e89ecac83c bml r2: fix exclusivity comparison
Fixes open-mpi/ompi#1106
2015-11-06 13:26:32 -08:00
Nathan Hjelm
08e267b811 add_procs: add threading protection for dynamic add_procs
This commit adds protection to the group, ob1, and bml endpoint lookup
code. For ob1 and the bml a lock has been added. For performance
reasons the lock is only held if a bml or ob1 endpoint does not
exist. ompi_group_dense_lookup now uses opal_atomic_cmpset to ensure
the proc is only retained by the thread that actually updates the
group.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-10-21 16:13:41 -06:00
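The locking scheme described in this commit (take the lock only when the endpoint does not exist yet, and let exactly one thread win the update) can be sketched roughly as follows. This is a minimal illustration, not the Open MPI code: `proc_t`, `endpoint_t`, `get_endpoint`, and `install_once` are hypothetical names, and C11 `atomic_compare_exchange_strong` stands in for `opal_atomic_cmpset`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the real Open MPI types. */
typedef struct endpoint { int peer; } endpoint_t;

typedef struct proc {
    int rank;
    _Atomic(endpoint_t *) endpoint;   /* NULL until the endpoint is first needed */
} proc_t;

static pthread_mutex_t endpoint_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return the cached endpoint; create it under the lock only if it is missing. */
static endpoint_t *get_endpoint(proc_t *proc)
{
    assert(proc != NULL);

    endpoint_t *ep = atomic_load(&proc->endpoint);
    if (ep != NULL) {
        return ep;                        /* fast path: no lock taken */
    }

    pthread_mutex_lock(&endpoint_lock);
    ep = atomic_load(&proc->endpoint);    /* re-check: another thread may have won */
    if (ep == NULL) {
        ep = malloc(sizeof(*ep));
        if (ep != NULL) {
            ep->peer = proc->rank;
            atomic_store(&proc->endpoint, ep);
        }
    }
    pthread_mutex_unlock(&endpoint_lock);
    return ep;
}

/* Compare-and-swap variant: returns 1 only for the caller that installed
 * its candidate, so only that caller should keep (retain) the object. */
static int install_once(_Atomic(endpoint_t *) *slot, endpoint_t *candidate)
{
    endpoint_t *expected = NULL;
    return atomic_compare_exchange_strong(slot, &expected, candidate);
}
```

Callers that find an existing endpoint never touch the mutex, which is the performance point the commit message makes; the re-check after locking is what keeps two racing threads from both creating an endpoint.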
Nathan Hjelm
db74fa9d0f bml/r2: fix memory leak
The add_procs change made some assumptions in the bml/r2 add_procs
wrong. This led to del_procs never being called. I removed the logic
that checks the ompi_proc_t reference count and removed an unnecessary
allocation. The allocation only makes sense if we pass more than a
single proc at a time to the btl del_procs.

This commit also ensures that the btl del_procs is called if the
endpoint is in the btl_rdma array but not the btl_send array.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-23 10:45:13 -06:00
Nathan Hjelm
898a0a038c bml/r2: fix coverity CID 1323765
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-09-11 09:39:10 -06:00
Nathan Hjelm
ad3a2ef6cc silence warnings introduced by add_procs merge
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 16:33:52 -06:00
Nathan Hjelm
6fa6513003 bml: Add support for dynamically calling add_procs
This commit contains the following changes:

 - bml: add a function to add a single process. This function is
   intended to remove the need to maintain an opal_bitmap_t, as it is
   irrelevant for a single proc. BTLs will need to be updated to
   either 1) ignore the return code from opal_bitmap_set_bit or 2) not
   call the function if the reachability bitmap is NULL.

 - bml: add an inline accessor function for getting the bml endpoint
   for a peer proc. This function will either 1) return the cached bml
   endpoint, or 2) create the endpoint and call add_proc with all
   available BTL modules.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Nathan Hjelm
2f447b2c4c bml/r2: use the bml framework output and set verbosity level to info
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-08-17 11:48:06 -06:00
Jeff Squyres
13425e759c bml r2: very minor cleanups
Delete stale comments, use C99 struct initialization.
2015-06-25 15:54:16 -07:00
Ralph Castain
869041f770 Purge whitespace from the repo
2015-06-23 20:59:57 -07:00
Gilles Gouaillardet
9d56b85b55 initialize common symbols from ompi
2015-05-08 10:11:58 +09:00
Nathan Hjelm
df75d0382f ompi: use C99 subobject naming for component initialization
This commit helps future-proof ompi components by initializing each
component member by name.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-18 10:29:58 -06:00
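As a rough illustration of what initializing each member by name buys, here is a small C99 designated-initializer sketch. The `component_t` layout and its field names are hypothetical and are not the real MCA component structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component descriptor; the field names are illustrative only. */
typedef struct component {
    const char *name;
    int major_version;
    int minor_version;
    int (*open)(void);
    int (*close)(void);
} component_t;

static int example_open(void)
{
    return 0;
}

/* Members are set by name, so adding or reordering fields in component_t
 * does not silently shift values. Members that are not named (here: close)
 * are zero-initialized. */
static const component_t example_component = {
    .name          = "example",
    .major_version = 2,
    .minor_version = 1,
    .open          = example_open,
};
```

With positional initialization, inserting a new field in the middle of the struct would require touching every initializer; with named members the existing initializers keep compiling and keep their meaning.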
Nathan Hjelm
b68d66bb9b MCA: Add the project/project version to the MCA base component
This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.

All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2015-03-27 10:59:04 -06:00
Adrian Reber
1c5a8df724 FT: fix compilation using --with-ft (2/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

The FT code used barrier mechanisms which have been removed
with aec5cd08bd. This patch replaces
all those different barriers with opal_pmix.fence(NULL, 0);
I am not sure this is completely correct but at least a starting
point for a review.
2015-03-11 14:23:33 +01:00
Adrian Reber
f45dd069bd FT: fix compilation using --with-ft (1/5)
Enabling the FT code breaks compilation (again). This series
tries to fix the compiler errors. This is again only fixing
the compiler errors without any warranty that the result
might actually support FT again.

This first patch moves orte_cr_continue_like_restart from ORTE
to opal_cr_continue_like_restart in OPAL. This only leaves three
calls from OPAL to ORTE in the FT code. As it is not yet 100%
clear how to handle these calls the code orte_sstore.set_attr()
has been #ifdef'd out for now.
2015-03-11 14:23:33 +01:00
Howard Pritchard
bf89131f9e add owner files to opal/ompi/orte mca directories
This commit adds an owner file in each of the component directories
for each framework.  This allows for a simple script to parse
the contents of the files and generate, among other things, tables
to be used on the project's wiki page.  Currently there are two
"fields" in the file, an owner and a status.  A tool to parse
the files and generate tables for the wiki page will be added
in a subsequent commit.
2015-02-22 15:10:23 -07:00
Gilles Gouaillardet
7dabc7b3ab bml/r2: fix a typo
reported by Coverity as CID 1270228
2015-02-17 14:28:17 +09:00
Nathan Hjelm
4bf7a207e9 bml/r2: add all rdma btls even if another btl has higher exclusivity
Background: In order to support atomics each btl needs to provide support
for communicating with self unless the btl module can guarantee global
atomicity. Before this commit bml/r2 discarded any BTL with lower
exclusivity than an existing send btl. This would cause the BML to
discard any btl other than self.

The new behavior is as follows:

 - If an existing send btl has higher exclusivity then the btl will not be
   added to the send btl list for the endpoint.

 - If a btl provides RDMA support then it is always added to the rdma btl
   list.

 - bml_btl weight for send btls is now calculated across all send btls.

 - bml_btl weight for rdma btls is now calculated across all rdma btls.

With this change self should still win as the only send btl for loopback
without disqualifying other btls (ugni, openib) for atomic operations.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-13 11:46:37 -07:00
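The selection rules listed in this commit can be sketched as below. This is a simplified model under stated assumptions, not the bml/r2 implementation: `btl_t`, `btl_list_t`, `select_btls`, and `compute_weights` are hypothetical, the input is assumed to be sorted by descending exclusivity, and the weight is assumed to be each btl's share of the total bandwidth in its list.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BTLS 8

/* Hypothetical, simplified BTL description (not mca_btl_base_module_t). */
typedef struct btl {
    const char *name;
    int exclusivity;
    double bandwidth;
    int has_rdma;
} btl_t;

typedef struct btl_list {
    const btl_t *btl[MAX_BTLS];
    double weight[MAX_BTLS];
    size_t count;
} btl_list_t;

/* Assumed weighting: each entry's share of the list's total bandwidth. */
static void compute_weights(btl_list_t *list)
{
    double total = 0.0;
    for (size_t i = 0; i < list->count; i++) {
        total += list->btl[i]->bandwidth;
    }
    for (size_t i = 0; i < list->count; i++) {
        list->weight[i] = (total > 0.0)
            ? list->btl[i]->bandwidth / total
            : 1.0 / (double)list->count;
    }
}

/* btls must be sorted by descending exclusivity (an assumption of this sketch). */
static void select_btls(const btl_t *btls, size_t n,
                        btl_list_t *send, btl_list_t *rdma)
{
    assert(n <= MAX_BTLS);
    send->count = 0;
    rdma->count = 0;

    for (size_t i = 0; i < n; i++) {
        const btl_t *b = &btls[i];

        /* Not added for send if an existing send btl has higher exclusivity. */
        if (send->count == 0 || send->btl[0]->exclusivity <= b->exclusivity) {
            send->btl[send->count++] = b;
        }
        /* RDMA-capable btls are always added to the rdma list. */
        if (b->has_rdma) {
            rdma->btl[rdma->count++] = b;
        }
    }

    /* Weights are computed separately across each list. */
    compute_weights(send);
    compute_weights(rdma);
}
```

Given a loopback peer with self at the highest exclusivity plus two RDMA-capable btls, the send list contains only self while both RDMA btls remain available, which matches the outcome the commit message describes.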
Nathan Hjelm
9285e2c356 bml: update for BTL 3.0 interface
This commit brings the bml framework up to date with BTL 3.0 interface.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-13 11:46:37 -07:00
Nathan Hjelm
1b564f62bd Revert "Merge pull request #275 from hjelmn/btlmod"
This reverts commit ccaecf0fd6, reversing
changes made to 6a19bf85dd.
2014-11-19 23:22:43 -07:00
Nathan Hjelm
8f1a44e60e bml/r2: add all rdma btls even if another btl has higher exclusivity
Background: In order to support atomics each btl needs to provide support
for communicating with self unless the btl module can guarantee global
atomicity. Before this commit bml/r2 discarded any BTL with lower
exclusivity than an existing send btl. This would cause the BML to
discard any btl other than self.

The new behavior is as follows:

 - If an existing send btl has higher exclusivity then the btl will not be
   added to the send btl list for the endpoint.

 - If a btl provides RDMA support then it is always added to the rdma btl
   list.

 - bml_btl weight for send btls is now calculated across all send btls.

 - bml_btl weight for rdma btls is now calculated across all rdma btls.

With this change self should still win as the only send btl for loopback
without disqualifying other btls (ugni, openib) for atomic operations.
2014-11-19 11:33:04 -07:00
Nathan Hjelm
49ff5a79d0 Update BML for the latest BTL update
2014-11-19 11:33:02 -07:00
Nathan Hjelm
c61e017177 pml: updates to reflect member changes in mca_btl_base_descriptor_t
and mca_btl_base_module_t structures
2014-11-19 11:33:02 -07:00
Nathan Hjelm
66bd698eaf Update BML for BTL interface changes
2014-11-19 11:33:02 -07:00
Ralph Castain
b1a7375192 Fix the "unreachable" message so it outputs the correct hostname for the remote proc. Cleanup some of the pmix stuff when running corner cases of errors
This commit was SVN r32584.
2014-08-22 19:20:45 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00
Nathan Hjelm
a14e0f10d4 Per RFC: Remove des_src and des_dst members from the
mca_btl_base_segment_t and replace them with des_local and des_remote

This change also updates the BTL version to 3.0.0. This commit does
not represent the final version of BTL 3.0.0. More changes are coming.

In making this change I updated all of the BTLs as well as BTL users
to use the new structure members. Please evaluate your component to
ensure the changes are correct.

RFC text:

This is the first of several BTL interface changes I am proposing for
the 1.9/2.0 release series.

What: Change naming of btl descriptor members. I propose we change
des_src and des_dst (and their associated counts) to be des_local and
des_remote. For receive callbacks the des_local member will be used to
communicate the segment information to the callback. The proposed change
will include updating all of the doxygen in btl.h as well as updating
all BTLs and BTL users to use the new naming scheme.

Why: My btl usage makes use of both put and get operations on the same
descriptor. With the current naming scheme I need to ensure that there
is consistency between the segments described in des_src and des_dst
depending on whether a put or get operation is executed. Additionally,
the current naming prevents BTLs that do not require prepare/RMA matched
operations (do not set MCA_BTL_FLAGS_RDMA_MATCHED) from executing
multiple simultaneous put AND get operations. At the moment the
descriptor can only be used with one or the other. The naming change
makes it easier for BTL users to setup/modify descriptors for RMA
operations as the local segment and remote segment are always in the
same member field. The only issue I foresee with this change is that it
will require a little more work to move BTL fixes to the 1.8 release
series.

This commit was SVN r32196.
2014-07-10 16:31:15 +00:00
Ralph Castain
f3cb124e50 Revert r32082 and r32070 - the developer's conference has decided to go a different direction on the threaded progress effort. This will involve some degree of prototyping to understand the tradeoffs prior to making a final design decision, and so we'll hold off on the final change until that is completed.
This commit was SVN r32089.

The following SVN revision numbers were found above:
  r32070 --> open-mpi/ompi@12d92d0c22
  r32082 --> open-mpi/ompi@aa6438ef7a
2014-06-25 20:43:28 +00:00
Ralph Castain
12d92d0c22 Per the OMPI developer conference, remove the last vestiges of OMPI_USE_PROGRESS_THREADS
This commit was SVN r32070.
2014-06-24 17:05:11 +00:00
George Bosilca
cc0239d52f Remove unused variables.
This commit was SVN r31846.
2014-05-21 01:34:26 +00:00
George Bosilca
137874ec4d In fact btl_eager and btl_dma will just vanish upon destruction of the
bml_endpoint. No need to clean them before.

This commit was SVN r31835.
2014-05-20 08:53:22 +00:00
George Bosilca
85e3caaa17 Handle the del_procs correctly. The btl_send is the complete list
of existing BTLs for an endpoint; all the others are just partial lists.
Thus, all the cleaning should first be done in the btl_send array,
and then in the other arrays (btl_eager and btl_rdma).

This commit was SVN r31834.
2014-05-20 08:46:57 +00:00
George Bosilca
1647664c43 Show the unreachable message for the first unreachable proc
and then stop.

This commit was SVN r31833.
2014-05-20 08:40:32 +00:00
Gilles Gouaillardet
ef4548a215 bml/r2 : fix mca_bml_r2_del_procs()
cmr=v1.8.2:reviewer=hjelmn:ticket=trac:4645

This commit was SVN r31830.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855

The following Trac tickets were found above:
  Ticket 4645 --> https://svn.open-mpi.org/trac/ompi/ticket/4645
2014-05-20 02:55:47 +00:00
Nathan Hjelm
27d3e1ca25 bml/r2: fix a problem identified by Gilles
This commit fixes two issues:

 - The intent of the code @ bml_r2.c:486 is to prevent calling the
   btl_del_procs more than once for a given proc. Gilles correctly
   identified there was a problem in this code but r31786 was not the
   correct fix.

 - Fix a segmentation fault in r2 finalize revealed by the fact we
   actually call del_procs now.

cmr=v1.8.2:reviewer=ggouaillardet:ticket=trac:4645

This commit was SVN r31829.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
  r31786 --> open-mpi/ompi@fc96b0a7b8

The following Trac tickets were found above:
  Ticket 4645 --> https://svn.open-mpi.org/trac/ompi/ticket/4645
2014-05-19 20:22:34 +00:00
Gilles Gouaillardet
fc96b0a7b8 Fix a typo in mca_bml_r2_del_procs()
Use bml_endpoint->btl_eager instead of bml_endpoint->btl_send.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31786.
2014-05-16 04:43:18 +00:00
Nathan Hjelm
faf008f527 Fix bugs that were causing leaks in finalize.
This commit fixes leaks of bml endpoints in finalize. A summary of the
bugs/fixes is below.

 1) ompi_mpi_finalize used ompi_proc_all to get the list of procs but
    never released the reference to them (ompi_proc_all called
    OBJ_RETAIN on all the procs returned). When calling del_procs at
    finalize it should suffice to call ompi_proc_world which does not
    increment the reference count.

 2) del_procs is called BEFORE ompi_comm_finalize. This leaves the
    references to the procs from calling the pml_add_comm
    function. The fix is to reorder the calls to do ompi_comm_finalize,
    del_procs, pml_finalize instead of del_procs, pml_finalize,
    ompi_comm_finalize.

 3) The check in del_procs in r2 checked for a reference count of
    1. This is incorrect. At this point there should be 2 references:
    1 from ompi_proc, and another from the add_procs. The fix is to
    change this check to look for a reference count of 2. This check
    makes me extremely uncomfortable as nothing will call del_procs if
    the reference count of a proc is not 2 when del_procs is
    called. Maybe there should be an assert since this is a developer
    error IMHO.

cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31782.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2014-05-15 18:28:03 +00:00
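The reference-count reasoning in points 1 and 3 of this commit can be illustrated with a toy model. The names below (`proc_t`, `proc_retain`, `proc_release`, `lookup_retained`, `lookup_borrowed`) are hypothetical; they only mimic the retain/release behaviour the message attributes to ompi_proc_all and ompi_proc_world.

```c
#include <assert.h>

/* Toy reference-counted object standing in for a proc. */
typedef struct proc {
    int refcount;
    int freed;      /* set when the last reference is dropped */
} proc_t;

static void proc_retain(proc_t *p)
{
    p->refcount++;
}

static void proc_release(proc_t *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        p->freed = 1;   /* a real implementation would destroy the object here */
    }
}

/* Hands out a new reference, as the message describes for ompi_proc_all:
 * the caller must release it or the object leaks. */
static proc_t *lookup_retained(proc_t *p)
{
    proc_retain(p);
    return p;
}

/* Borrows without touching the count, as described for ompi_proc_world. */
static proc_t *lookup_borrowed(proc_t *p)
{
    return p;
}
```

In this model a proc that is owned once and added once sits at a count of 2, which is the value the corrected check looks for; a lookup that retains without a matching release would leave it at 3 and the object would never be freed.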
Nathan Hjelm
e4db2c3ebb ompi: fix various small leaks
This commit fixes three leaks:

 - bml/r2: fix leak of del_procs in mca_bml_r2_del_procs

 - Release the modex data in btl/scif, btl/ugni, and btl/vader

 - ompi_mpi_finalize: close the allocator framework

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31778.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2014-05-15 15:59:51 +00:00
Nathan Hjelm
518f188ad4 bml/base: ensure all components are closed when the framework is
closed

We were leaving the selected component open. This commit should
eliminate a leak detected by valgrind.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31749.
2014-05-13 23:04:40 +00:00
Rolf vandeVaart
ce5274652f Add some additional verbose output per this RFC
http://www.open-mpi.org/community/lists/devel/2014/03/14282.php
Reviewed by Jeff Squyres

This commit was SVN r31072.
2014-03-14 20:17:47 +00:00
Jeff Squyres
da87b506bd Remove warnings identified by clang 3.4
* Remove unused static functions
* Remove unused static variables

cmr=v1.8:reviewer=hjelmn

This commit was SVN r31023.
2014-03-12 13:17:54 +00:00
Joshua Ladd
9ea9bec4ad Addressing Jeff's comments:
1. Changed rng_buff_t --> opal_rng_buff_t
2. All global variables obey the prefix rule
3. Old code has been removed 
4. Found a couple of unnecessary includes

Refs trac:4298

This commit was SVN r30807.

The following Trac tickets were found above:
  Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298
2014-02-24 23:18:35 +00:00
Joshua Ladd
e39d9f4080 Per the RFC schedule, add an additive lagged Fibonacci parallel random number generator to OPAL. In order to use, please add the following header to your code: opal/util/alfg.h. See ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c for an example how to seed with opal_srand and invoke the generator with opal_rand. This should be added to
cmr=v1.7.5:reviewer=rhc:subject=Add an OPAL RNG

This commit was SVN r30801.
2014-02-23 21:41:38 +00:00
Christoph Niethammer
4f23d8214c Fixed incorrect calculation of reallocated memory in mca_bml_r2_del_btl.
This commit was SVN r30529.
2014-02-03 08:43:59 +00:00
George Bosilca
d265981c55 Don't always retain the proc, do it only for new procs. This enforces a strict policy in the BML: it holds one and only one ref on each proc.
This commit was SVN r30429.
2014-01-26 17:26:04 +00:00
Christoph Niethammer
86776daf75 Fixed typo in opal output message.
This commit was SVN r30392.
2014-01-23 08:37:40 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Adrian Reber
b42aad44a3 Trying to get the C/R code to compile again. This patch
includes various fixes all over the C/R code which are
hard to group like the other patches.

Changes from V1:
* explain why mca_base_component_distill_checkpoint_ready no longer works
* compare return result of opal functions with OPAL_* values

Changes from V2:
* use orte_rml_oob_ft_event() instead of referencing through the modules
* properly protect variable (thanks to --enable-picky)

This commit was SVN r29922.
2013-12-16 15:35:28 +00:00
Brian Barrett
16a1166884 Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a
configure-time dynamic allocation of flags.  The net result for platforms
which only support BTL-based communication is a reduction of 8*nprocs bytes
per process.  Platforms which support both MTLs and BTLs will not see
a space reduction, but will now be able to safely run both the MTL and BTL
side-by-side, which will prove useful.

This commit was SVN r29100.
2013-08-30 16:54:55 +00:00
Ralph Castain
45e695928f As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time:
* add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then we will include hostnames for any allocation smaller than the given limit.

* remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL"

* modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded

* removed a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base

* added an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it a different way. Currently, we support PBS-based systems and SLURM - for any other, rank0 will emit a warning and we assume max number of daemons so we will always retain hostnames

This commit was SVN r29052.
2013-08-20 18:59:36 +00:00
Ralph Castain
611d7f9f6b When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require.
This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times.

Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes:

* upon first request for data, have the OPAL db pmi component fetch and decode *all* the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally

* reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test

* reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued).

Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it

Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time.

This commit was SVN r29040.
2013-08-17 00:49:18 +00:00
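The first change in this commit (fetch and decode all of a peer's data on the first request, then serve later requests locally) can be sketched as follows. This is a self-contained toy, not the OPAL db API: `db_fetch`, `fetch_all_remote`, the key names, and the values are all made up for illustration, and `remote_fetches` simply counts the simulated expensive "get" calls.

```c
#include <assert.h>
#include <string.h>

#define MAX_PEERS 4
#define MAX_KEYS  3

/* Hypothetical key/value pair and per-peer cache. */
typedef struct kv {
    const char *key;
    int value;
} kv_t;

typedef struct peer_cache {
    int loaded;
    kv_t entries[MAX_KEYS];
} peer_cache_t;

static peer_cache_t cache[MAX_PEERS];
static int remote_fetches;   /* counts simulated expensive remote lookups */

/* Stand-in for the costly remote lookup: returns every value for a peer at once. */
static void fetch_all_remote(int peer, kv_t out[MAX_KEYS])
{
    remote_fetches++;
    out[0] = (kv_t){ "arch", 64 };
    out[1] = (kv_t){ "node", peer / 2 };
    out[2] = (kv_t){ "port", 5000 + peer };
}

/* Returns 0 and stores the value on success, -1 if the key is unknown. */
static int db_fetch(int peer, const char *key, int *value)
{
    assert(peer >= 0 && peer < MAX_PEERS);

    peer_cache_t *c = &cache[peer];
    if (!c->loaded) {
        /* First request for this peer: decode everything once and keep it. */
        fetch_all_remote(peer, c->entries);
        c->loaded = 1;
    }
    for (int i = 0; i < MAX_KEYS; i++) {
        if (strcmp(c->entries[i].key, key) == 0) {
            *value = c->entries[i].value;
            return 0;
        }
    }
    return -1;
}
```

However many keys are requested for a peer, the remote side is contacted once per peer, which is the reduction in "get" calls the message is after.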