1
1

40 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
77060cad07 btl/vader: fix finalize sequence
free the component mpool in mca_btl_vader_component_close()
and after freeing soem objects that depend on it such as
mca_btl_vader_component.vader_frags_user

Thanks Christoph Niethammer for reporting this.

Refs. open-mpi/ompi#6524

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-03-27 11:57:40 +09:00
Nathan Hjelm
f62d26ddbc btl/vader: use basic mpool type to handle frag/fbox allocation
This commit updates btl/vader to use an mpool for handling all shared
memory allocations (frags, fboxes).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-01-14 15:54:16 -07:00
Brian Barrett
e9e4d2a4bc Handle asprintf errors with opal_asprintf wrapper
The Open MPI code base assumed that asprintf always behaved like
the FreeBSD variant, where ptr is set to NULL on error.  However,
the C standard (and Linux) only guarantee that the return code will
be -1 on error and leave ptr undefined.  Rather than fix all the
usage in the code, we use opal_asprintf() wrapper instead, which
guarantees the BSD-like behavior of ptr always being set to NULL.
In addition to being correct, this will fix many, many warnings
in the Open MPI code base.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-08 16:43:53 -07:00
Nathan Hjelm
000f9eed4d opal: add types for atomic variables
This commit updates the entire codebase to use specific opal types for
all atomic variables. This is a change from the prior atomic support
which required the use of the volatile keyword. This is the first step
towards implementing support for C11 atomics as that interface
requires the use of types declared with the _Atomic keyword.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-14 10:48:55 -06:00
George Bosilca
6d11a45f44
Remove few warnings identified by @rhc in #5514.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-08-03 16:21:06 -04:00
Nathan Hjelm
9d3a79925b btl/vader: fix bugs in rma emulation
This commit fixes two bugs in the RMA/atomic emulation code:

 1) Fix a fragment leak when using AMO emulation.

 2) Always initialize the single-copy emulation code. This is required
 to use the AMO support.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-12 15:50:50 -06:00
Nathan Hjelm
87d41da62b btl/vader: add support for atomics and emulated rdma
This commit adds support for atomic operations as well as rdma for
systems without rdma support. This support is implemented using an
internal send tag.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-02 13:57:11 -06:00
Gilles Gouaillardet
611d7c2d27 btl/vader: make the backing file job specific
Since open-mpi/ompi@47fd2313ab
the backing file is now in /dev/shm by default. As a consequence,
the backing file name has to include the jobid so more than one job
can run at a time.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-30 16:52:51 +09:00
Ralph Castain
6216225bda Ensure cleanup of registered files/dirs
Resolve a race condition between registering for a file to be removed upon termination and actual creation of that file by providing attributes that identify whether the path is a file or directory. This removes the need for PMIx to detect the difference.

Refs #4686

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-01-11 11:05:30 -08:00
Nathan Hjelm
47fd2313ab btl/vader: move backing files into /dev/shm on Linux
This commit moves the backing files to /dev/shm to avoid limitations
that may be set on /tmp. The files are registered with pmix to ensure
they are cleaned up after an erroneous exit.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 48101278160672317ade352365592f56ef3b8977)
2017-12-18 07:09:18 -08:00
Ralph Castain
07427c6d89 Update to PMIx v3.0 PR for cleanup registration
If available, have apps use registration capability to cleanup their session directories. Setup capability for vader to register its shared memory file location - let someone familiar with that code do so.

Final cleanup to track uid/gid, update the opal/pmix API to pass flags for ignore and leave top directory alone

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-12-18 06:53:11 -08:00
Gilles Gouaillardet
54c84196a6 btl/vader: plug a memory leak
as reported by Coverity with CID 1362691

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2016-12-22 16:04:36 +09:00
Nathan Hjelm
d4afb16f5a opal: rework mpool and rcache frameworks
This commit rewrites both the mpool and rcache frameworks. Summary of
changes:

 - Before this change a significant portion of the rcache
   functionality lived in mpool components. This meant that it was
   impossible to add a new memory pool to use with rdma networks
   (ugni, openib, etc) without duplicating the functionality of an
   existing mpool component. All the registration functionality has
   been removed from the mpool and placed in the rcache framework.

 - All registration cache mpools components (udreg, grdma, gpusm,
   rgpusm) have been changed to rcache components. rcaches are
   allocated and released in the same way mpool components were.

 - It is now valid to pass NULL as the resources argument when
   creating an rcache. At this time the gpusm and rgpusm components
   support this. All other rcache components require non-NULL
   resources.

 - A new mpool component has been added: hugepage. This component
   supports huge page allocations on linux.

 - Memory pools are now allocated using "hints". Each mpool component
   is queried with the hints and returns a priority. The current hints
   supported are NULL (uses posix_memalign/malloc), page_size=x (huge
   page mpool), and mpool=x.

 - The sm mpool has been moved to common/sm. This reflects that the sm
   mpool is specialized and not meant for any general
   allocations. This mpool may be moved back into the mpool framework
   if there is any objection.

 - The opal_free_list_init arguments have been updated. The unused0
   argument is not used to pass in the registration cache module. The
   mpool registration flags are now rcache registration flags.

 - All components have been updated to make use of the new framework
   interfaces.

As this commit makes significant changes to both the mpool and rcache
frameworks both versions have been bumped to 3.0.0.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-14 10:50:41 -06:00
Nathan Hjelm
2a0b3a5700 btl/vader: various threading fixes
This commit fixes several threading bugs:

 - Add an additional lock to the btl_base_endpoint_t structure to lock
   the list of pending frags. This allows the progress function to
   attempt to send pending frags without needing to drop/reaquire the
   lock. This should provide a small improvement in performance and
   fixes a potential race between adding an removing items from the
   pending list.

 - Ensure fast boxes are only set up once by updating the send count
   using atomics when needed and do not set the fast box buffer
   pointer until the fast box is set up.

Closes open-mpi/ompi#1408

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-03-02 10:50:59 -07:00
Nathan Hjelm
60591ae753 btl/vader: do not attempt to munmap opal/shmem pointer
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-12-15 08:48:04 -07:00
Nathan Hjelm
6f8f2325ed btl: btls are now required to set the send flag if supported
This commit updates each non-compliant btl to send the
MCA_BTL_FLAGS_SEND flag in the btl_flags field if send is
supported. This fixes a problem identified after the latest bml/r2
update which excplicitly checks for the send flag.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-09-10 08:55:54 -06:00
Ralph Castain
cf6137b530 Integrate PMIx 1.0 with OMPI.
Bring Slurm PMI-1 component online
Bring the s2 component online

Little cleanup - let the various PMIx modules set the process name during init, and then just raise it up to the ORTE level. Required as the different PMI environments all pass the jobid in different ways.

Bring the OMPI pubsub/pmi component online

Get comm_spawn working again

Ensure we always provide a cpuset, even if it is NULL

pmix/cray: adjust cray pmix component for pmix

Make changes so cray pmix can work within the integrated
ompi/pmix framework.

Bring singletons back online. Implement the comm_spawn operation using pmix - not tested yet

Cleanup comm_spawn - procs now starting, error in connect_accept

Complete integration
2015-08-29 16:04:10 -07:00
Nathan Hjelm
108f55a963 btl/vader: clean up progress of waiting endpoints
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-20 16:14:58 -06:00
Nathan Hjelm
69e70776aa btl/vader: fix double unlock
References #594

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-05-20 14:35:22 -06:00
Nathan Hjelm
f8158a5ec1 btl/vader: fix deadlock in mca_btl_vader_progress_endpoints
This commit fixes a typo in mca_btl_vader_progress_endpoints where
OPAL_THREAD_LOCK was used when OPAL_THREAD_UNLOCK was intended.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-14 09:34:45 -06:00
Nathan Hjelm
a7b0c00ab6 fix memory leaks and valgrind errors
This commit fixes several vagrind errors. Included:

 - installdirs did not correctly reinitialize all pointers to NULL
   at close. This causes valgrind errors on a subsequent call to
   opal_init_tool.

 - several opal strings were leaked by opal_deregister_params which
   was setting them to NULL instead of letting them be freed by the
   MCA variable system.

 - move opal_net_init to AFTER the variable system is initialized and
   opal's MCA variables have been registered. opal_net_init uses a
   variable registered by opal_register_params!

 - do not leak ompi_mpi_main_thread when it is allocated by
   MPI_T_init_thread.

 - do not overwrite ompi_mpi_main_thread if it is already set (by
   MPI_T_init_thread).

 - mca_base_var: read_files was overwritting mca_base_var_file_list
   even if it was non-NULL.

 - mca_base_var: set all file global variables to initial states on
   finalize.

 - btl/vader: decrement enumerator reference count to ensure that it
   is freed.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-04-11 09:28:35 -06:00
Nathan Hjelm
5f1254d710 Update code base to use the new opal_free_list_t
Use of the old ompi_free_list_t and ompi_free_list_item_t is
deprecated. These classes will be removed in a future commit.

This commit updates the entire code base to use opal_free_list_t and
opal_free_list_item_t.

Notes:

OMPI_FREE_LIST_*_MT -> opal_free_list_* (uses opal_using_threads ())

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-24 10:05:45 -07:00
Nathan Hjelm
aba0675fe7 btl/vader: update for BTL 3.0 interface
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-13 11:46:37 -07:00
Nathan Hjelm
f1dc29b145 btl/vader: fix modex size when xpmem is in use
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2015-02-12 14:06:24 -07:00
Nathan Hjelm
6733d89cf9 btl/vader: fix return code check when opening ptrace_scope file 2015-01-06 15:17:56 -07:00
Aurélien Bouteiller
ee3b090316 The fallback case when yama is not installed was not correct in CMA vader 2014-12-16 14:39:14 -05:00
Aurélien Bouteiller
0bf860ef02 indentation 2014-12-16 14:22:26 -05:00
Nathan Hjelm
38d66272c5 btl/vader: fix compile on SGI UV 2014-12-12 09:09:01 -07:00
Nathan Hjelm
1b564f62bd Revert "Merge pull request #275 from hjelmn/btlmod"
This reverts commit ccaecf0fd6c862877e6a1e2643f95fa956c87769, reversing
changes made to 6a19bf85dde5306f559f09952cf3919d97f52502.
2014-11-19 23:22:43 -07:00
Nathan Hjelm
249e5e009f Fix knem support in both sm and vader 2014-11-19 11:33:02 -07:00
Nathan Hjelm
e03956e099 Update the scif and openib btls for the new btl interface
Other changes:
 - Remove the registration argument from prepare_src since it no
   longer is meant for RDMA buffers.

 - Additional cleanup and bugfixes.
2014-11-19 11:33:02 -07:00
Nathan Hjelm
e1bc2de853 btl/vader: defensive programming: use an actual function for the dummy btl_get and btl_put 2014-10-22 14:57:55 -06:00
Nathan Hjelm
1a3734ae57 btl/vader: fix compilation on OS X 2014-10-21 09:27:36 -06:00
Gilles Gouaillardet
f56169cee6 btl/vader: silence warning
correctly check HAVE_SYS_PRCTL_H
2014-10-21 19:51:29 +09:00
Nathan Hjelm
13643f5b6e btl/vader: improved single-copy support
This commit makes the folowing changes:

 - Add support for the knem single-copy mechanism. Initially vader will only
   support the synchronous copy mode. Asynchronous copy support may be added
   int the future.

 - Improve Linux cross memory attach (CMA) when using restrictive ptrace
   settings. This will allow Open MPI to use CMA without modifying the system
   settings to support ptrace attach (see /etc/sysctl.d/10-ptrace.conf).

 - Allow runtime selection of the single copy mechanism. The default behavior
   is to use the best available. The priority list of single-copy mehanisms is
   as follows: xpmem, cma, and knem.

 - Allow disabling support for kernel-assisted single copy.

 - Some tuning and bug fixes.
2014-10-20 11:44:52 -06:00
Nathan Hjelm
12bfd13150 btl/vader: improve performance for both single and multiple threads
This is a large update that does the following:

 - Only allocate fast boxes for a peer if a send count threshold
   has been reached (default: 16). This will greatly reduce the memory
   usage with large numbers of local peers.

 - Improve performance by limiting the number of fast boxes that can
   be allocated per peer (default: 32). This will reduce the amount
   of time spent polling for fast box messages.

 - Provide new MCA variables to configure the size, maximum count,
   and send count thresholds for fast boxes allocations.

 - Updated buffer design to increase the range of message sizes that
   can be sent with a fast box.

 - Add thread protection around fast box allocation (locks). When
   spin locks are available this should be updated to use spin locks.

 - Various fixes and cleanup.

This commit was SVN r32774.
2014-09-23 18:11:22 +00:00
Ralph Castain
aec5cd08bd Per the PMIx RFC:
WHAT:    Merge the PMIx branch into the devel repo, creating a new
               OPAL “lmix” framework to abstract PMI support for all RTEs.
               Replace the ORTE daemon-level collectives with a new PMIx
               server and update the ORTE grpcomm framework to support
               server-to-server collectives

WHY:      We’ve had problems dealing with variations in PMI implementations,
               and need to extend the existing PMI definitions to meet exascale
               requirements.

WHEN:   Mon, Aug 25

WHERE:  https://github.com/rhc54/ompi-svn-mirror.git

Several community members have been working on a refactoring of the current PMI support within OMPI. Although the APIs are common, Slurm and Cray implement a different range of capabilities, and package them differently. For example, Cray provides an integrated PMI-1/2 library, while Slurm separates the two and requires the user to specify the one to be used at runtime. In addition, several bugs in the Slurm implementations have caused problems requiring extra coding.

All this has led to a slew of #if’s in the PMI code and bugs when the corner-case logic for one implementation accidentally traps the other. Extending this support to other implementations would have increased this complexity to an unacceptable level.

Accordingly, we have:

* created a new OPAL “pmix” framework to abstract the PMI support, with separate components for Cray, Slurm PMI-1, and Slurm PMI-2 implementations.

* Replaced the current ORTE grpcomm daemon-based collective operation with an integrated PMIx server, and updated the grpcomm APIs to provide more flexible, multi-algorithm support for collective operations. At this time, only the xcast and allgather operations are supported.

* Replaced the current global collective id with a signature based on the names of the participating procs. The allows an unlimited number of collectives to be executed by any group of processes, subject to the requirement that only one collective can be active at a time for a unique combination of procs. Note that a proc can be involved in any number of simultaneous collectives - it is the specific combination of procs that is subject to the constraint

* removed the prior OMPI/OPAL modex code

* added new macros for executing modex send/recv to simplify use of the new APIs. The send macros allow the caller to specify whether or not the BTL supports async modex operations - if so, then the non-blocking “fence” operation is used, if the active PMIx component supports it. Otherwise, the default is a full blocking modex exchange as we currently perform.

* retained the current flag that directs us to use a blocking fence operation, but only to retrieve data upon demand

This commit was SVN r32570.
2014-08-21 18:56:47 +00:00
George Bosilca
68c4ecbe06 Even less ompi usages down here.
This commit was SVN r32328.
2014-07-26 22:21:08 +00:00
George Bosilca
f217661ee0 Use opal_process_info whenever possible. Some other minor cleanups.
This commit was SVN r32325.
2014-07-26 21:48:23 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose.  UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00