openmpi

Author	SHA1	Message	Date
Rolf vandeVaart	2cf7c40ee5	Minor adjustments to error messages due to review of #3880. This commit was SVN r29640.	2013-11-07 20:21:21 +00:00
Rolf vandeVaart	3290cde630	Various minor changes to bring smcuda up to date with sm. This commit was SVN r29639.	2013-11-07 19:45:56 +00:00
Dave Goodell	82db913490	usnic: fix module_recv_buffers perf regression Cisco v1.6 git commit 913ec6c and upstream trunk r29593 (segfault fix) introduced a performance regression by inadvertently disabling the `module_recv_buffers` functionality. With those changes in place, the `btl_usnic_recv.c` logic would end up mallocing a buffer that should have otherwise come from a `module_recv_buffers` pool. It also resulted in a small, bounded memory leak (128 buffers at each power-of-two size interval). The new version just places the buffer after the free list item with a flexible array member. I bumped the pool to allocate all 128 elements up front because the deferred allocation was modestly impacting IMB Sendrecv performance at a few sizes. Reviewed-by: Reese Faucette <rfaucett@cisco.com> This commit was SVN r29631. The following SVN revision numbers were found above: r29593 --> open-mpi/ompi@1ed9b8ff43	2013-11-07 01:27:31 +00:00
Vishwanath Venkatesan	d37a5faa20	Need not do aggregator selection for one process case So adding a check for this corner case! This commit was SVN r29622.	2013-11-06 21:05:26 +00:00
Brian Barrett	cf8de1ef0f	Minor indent cleanup in init_query() Only use Portals on communicators with more than one rank Fix computation of number of children when using the hypercube tree This commit was SVN r29616.	2013-11-06 15:21:09 +00:00
Jeff Squyres	e28261898d	Per discussion on the devel list, rename the btl_usnic_devices MPI_T state pvar to be btl_usnic (i.e., the best suggestion so far). See http://www.open-mpi.org/community/lists/devel/2013/11/13188.php for more detail. This commit was SVN r29614.	2013-11-06 06:19:03 +00:00
Rolf vandeVaart	e46c0bb952	Fix one more space for consistent defines. This commit was SVN r29607.	2013-11-05 15:31:49 +00:00
Rolf vandeVaart	64b3a24fec	Fix CUDA-aware compile issues. This commit was SVN r29606.	2013-11-05 14:46:58 +00:00
Rolf vandeVaart	e57795f097	Revert r29594. That was just plain wrong. Sorry about workday configure change. This commit was SVN r29605. The following SVN revision numbers were found above: r29594 --> open-mpi/ompi@ed7ddcd9c7	2013-11-05 14:45:56 +00:00
Rolf vandeVaart	ed7ddcd9c7	Fix CUDA-aware compile error introduced with r29581. This commit was SVN r29594. The following SVN revision numbers were found above: r29581 --> open-mpi/ompi@ee7510b025	2013-11-05 00:08:33 +00:00
Dave Goodell	1ed9b8ff43	usnic: fix segfault at finalize time Without this commit, if you run IMB pingpong between two nodes with only one usnic selected (e.g., via `--mca btl_usnic_if_include usnic_0`) then the run will seem fine but will segfault at MPI_Finalize time. This behavior has happened since Cisco v1.6 git commit ec7ddf8, upstream trunk r29484, and upstream v1.7 r29507. Root cause was that the free list element was being used as the recv buffer instead of the data buffer associated with the element. So the reassembly code would stomp all over the free list element, which would cause the destructor to explode when the free list attempted to clean up all of its elements. This surprisingly did not cause any other problems until now. Reviewed-by: Reese Faucette <rfaucett@cisco.com> This commit was SVN r29593. The following SVN revision numbers were found above: r29484 --> open-mpi/ompi@a6ed232a10 r29507 --> open-mpi/ompi@790d269ce8	2013-11-04 22:52:14 +00:00
Dave Goodell	73a943492c	usnic: pack via convertor on the fly If we need to use a convertor, go back to stashing that convertor in the frag and populating segments "on the fly" (in ompi_btl_usnic_module_progress_sends). Previously we would pack into a chain of chunk segments at prepare_src time, unnecessarily consuming additional memory. Reviewed-by: Jeff Squyres <jsquyres@cisco.com> Reviewed-by: Reese Faucette <rfaucett@cisco.com> This commit was SVN r29592.	2013-11-04 22:52:03 +00:00
Dave Goodell	71d0d73575	usnic: refactor callback invocation This makes it a little easier to see what's happening with callbacks to the PML. Reviewed-by: Jeff Squyres <jsquyres@cisco.com> Reviewed-by: Reese Faucette <rfaucett@cisco.com> This commit was SVN r29591.	2013-11-04 22:51:48 +00:00
Dave Goodell	4c791e21d2	usnic: add MSGDEBUG1_OUT/MSGDEBUG2_OUT macros This includes suppressing picky-mode warnings about __VA_ARGS__, which we know are supported by any compilers we care about. Reviewed-by: Jeff Squyres <jsquyres@cisco.com> Reviewed-by: Reese Faucette <rfaucett@cisco.com> This commit was SVN r29590.	2013-11-04 22:51:35 +00:00
Dave Goodell	825686a205	usnic: certain send frag members are immutable Ensure that they never are touched by checking in their destructors. Reviewed-by: Jeff Squyres <jsquyres@cisco.com> Reviewed-by: Reese Faucette <rfaucett@cisco.com> This commit was SVN r29589.	2013-11-04 22:51:24 +00:00
Nathan Hjelm	c71125acfd	Using MPI_* functions in iallreduce can cause comm-spawned processes to crash. Update libnbc's iallreduce function to use ompi_* functions instead. cmr=v1.7.4:reviewer=brbarret This commit was SVN r29582.	2013-11-01 16:45:54 +00:00
Rolf vandeVaart	ee7510b025	Remove redundant macro. This was from review of an earlier ticket. Fixes trac:3878. Reviewed by jsquyres. This commit was SVN r29581. The following Trac tickets were found above: Ticket 3878 --> https://svn.open-mpi.org/trac/ompi/ticket/3878	2013-11-01 12:19:40 +00:00
Rolf vandeVaart	99f9fdee01	Fix corner case involving threads and CUDA-aware support. This commit was SVN r29579.	2013-10-31 20:53:46 +00:00
Nathan Hjelm	fd25b7af01	Fix common ugni Makefile.am for non-DSO builds. This commit was SVN r29571.	2013-10-30 19:37:14 +00:00
Nathan Hjelm	a31e617d17	Remove outdated comments in coll_basic_reduce_scatter.c. Refs trac:1559 This commit was SVN r29566. The following Trac tickets were found above: Ticket 1559 --> https://svn.open-mpi.org/trac/ompi/ticket/1559	2013-10-30 16:20:20 +00:00
Mike Dubman	b0e64427a9	ompi/mca/btl/openib: Fix memory leak and accessing free'd memory issues Let's imagine that we have two btls in btl_openib_component_init() that both point to the same openib_btl->device and, as a result, have the same openib_btl->device->endpoints array. The finalization phase calls mca_btl_openib_finalize()->mca_btl_openib_finalize_resources() twice. mca_btl_openib_finalize_resources() frees the endpoints related to the btl, but the second call of mca_btl_openib_finalize_resources() checks an endpoint that was released by the previous call. fixed by Igor, reviewed by miked/vasily cmr=v1.7.4:reviewer=ompi-gk1.7 This commit was SVN r29563.	2013-10-30 11:47:49 +00:00
Nathan Hjelm	167d5613db	Do not do arithmetic with void * in basic neighborhood alltoall[vw]. cmr=v1.7.4:reviewer=jsquyres This commit was SVN r29558.	2013-10-29 20:02:13 +00:00
Jeff Squyres	6569019b06	Move all usNIC stats to _stats.c|h and export them as MPI_T pvars. This commit moves all the module stats into their own struct so that the stats only need to appear as a single line in the module_t definition, and then moves all the logic for reporting the stats into btl_usnic_stats.c|h. Further, the stats are now exported as MPI_T_BIND_NO_OBJECT entities (i.e., not bound to any particular MPI handle), and are marked as READONLY and CONTINUOUS. They currently all default to verbose level 5 ("Application tuner / detailed", according to https://svn.open-mpi.org/trac/ompi/wiki/MCAParamLevels). Most of the statistics are counters, but a small number are high watermark values. Due to how counters are reported via MPI_T, none of the counters are exported through MPI_T if the MCA param btl_usnic_stats_relative=1 (i.e., the module resets the stats back to zero at a given frequency). When MPI_T_pvar_handle_alloc() is invoked on any of these pvars, it will return a count that is equal to the number of active usnic BTL modules. The values returned for any given pvar (e.g., num_total_sends) are an array containing one value for each active usnic BTL module. The ordering of values in the array is both consistent across all usnic pvars and stable throughout a single job: array slot 0 corresponds to module X, array slot 1 corresponds to module Y, etc. Mapping which array slot corresponds to which underlying Linux usnic_X device works as follows: * The btl_usnic_devices MPI_T state pvar is associated with a btl_usnic_device MPI_T enum, which can be obtained via MPI_T_pvar_get_info(). * If all usNIC pvars are of length N, the values [0,N) in the btl_usnic_device enum are associated with strings of the corresponding underlying Linux device. For example, to look up which Linux device is reported in all usNIC pvars' array slot 1, look up the int value 1 in the btl_usnic_devices enum. Its corresponding string value is the underlying Linux device name (e.g., "usnic_1").
cmr=v1.7.4:subject="usnic BTL MPI_T pvars" This commit was SVN r29545.	2013-10-28 22:23:08 +00:00
Nathan Hjelm	404cceb9c4	Always check the return of [mc]alloc and fix a warning introduced by r29479. This fixes some issues reported a while ago in the openib btl. There are a couple more unchecked mallocs but they are a bit more difficult to fix since they are in void functions (btl_openib_endpoint.c). Refs trac:2401. cmr=v1.7.4:reviewer=miked This commit was SVN r29543. The following SVN revision numbers were found above: r29479 --> open-mpi/ompi@d6ead2a3a5 The following Trac tickets were found above: Ticket 2401 --> https://svn.open-mpi.org/trac/ompi/ticket/2401	2013-10-28 20:04:49 +00:00
Nathan Hjelm	b202bb0d63	Fix the recursive halving algorithms for reduce scatter in both basic and tuned to correctly handle 0 recvcounts. Tested with the reproducer from #1550. Refs trac:1559 This commit was SVN r29542. The following Trac tickets were found above: Ticket 1559 --> https://svn.open-mpi.org/trac/ompi/ticket/1559	2013-10-28 19:06:38 +00:00
Rolf vandeVaart	fa5d20a5ec	Add optimization that can be used when CUDA 6.0 comes out. Use new pointer attribute. This commit was SVN r29514.	2013-10-24 21:17:58 +00:00
Rolf vandeVaart	628a109a74	Make casting very clear. This commit was SVN r29511.	2013-10-24 14:40:01 +00:00
Rolf vandeVaart	5687e4387d	Fix compiler warning. This commit was SVN r29510.	2013-10-24 13:11:12 +00:00
Nathan Hjelm	26f3a029d3	Fix scif configury. cmr=v1.7.4:ticket=3862 This commit was SVN r29493. The following Trac tickets were found above: Ticket 3862 --> https://svn.open-mpi.org/trac/ompi/ticket/3862	2013-10-23 17:04:20 +00:00
Nathan Hjelm	6186b5ed9d	Remove extra file that made its way into r29490. cmr=v1.7.4:ticket=3862 This commit was SVN r29491. The following SVN revision numbers were found above: r29490 --> open-mpi/ompi@cde3b05ed3 The following Trac tickets were found above: Ticket 3862 --> https://svn.open-mpi.org/trac/ompi/ticket/3862	2013-10-23 16:17:51 +00:00
Nathan Hjelm	cde3b05ed3	Add support for the Intel scif interface. Depends on #3847. cmr=v1.7.4:reviewer=rhc This commit was SVN r29490.	2013-10-23 15:59:14 +00:00
Dave Goodell	e9dbb66e58	mpool/rdma: fix memory leak at module finalize Reviewed-by: Jeff Squyres <jsquyres@cisco.com> This commit was SVN r29487.	2013-10-23 15:51:55 +00:00
Dave Goodell	647e5a6fd2	rcache/vma: fix module finalization memory leaks Reviewed-by: Jeff Squyres <jsquyres@cisco.com> This commit was SVN r29486.	2013-10-23 15:51:44 +00:00
Dave Goodell	d969cfa513	usnic: correctly clean up verbs resources Due to deallocation ordering (and an entirely missed deallocation), we were leaking modest amounts of memory inside libusnic_verbs. Reviewed-by: Jeff Squyres <jsquyres@cisco.com> This commit was SVN r29485.	2013-10-23 15:51:33 +00:00
Dave Goodell	a6ed232a10	usnic: fix several memory leaks - some free lists simply were not being OBJ_DESTRUCTed, so they never freed their internal memory - channel->recv_segs.ctx was being assigned in a way that got clobbered by ompi_free_list_init_new, so the cleanup code that relied on it being set never ran - numerous other ".ctx" assignments were similarly ineffectual and were not being consumed, so I deleted them Reviewed-by: Jeff Squyres <jsquyres@cisco.com> This commit was SVN r29484.	2013-10-23 15:51:22 +00:00
Dave Goodell	c9b2343982	usnic: add ompi_btl_usnic_component_debug helper This new routine can be called in exceptional situations, either conditionally in BTL code or from a debugger, to help with debugging in cases where MSGDEBUG1/2 or stats logging are impractical but more detail is needed. Reviewed-by: Jeff Squyres <jsquyres@cisco.com> This commit was SVN r29483.	2013-10-23 15:51:11 +00:00
Dave Goodell	d0b7d125b2	usnic: refactor usnic_stats_callback Pull the bulk of the functionality out into a new routine, ompi_btl_usnic_print_stats, which can be used in other debugging contexts. This also lets us eliminate the module->final_stats state tracking. Reviewed-by: Jeff Squyres <jsquyres@cisco.com> This commit was SVN r29482.	2013-10-23 15:50:57 +00:00
Jeff Squyres	0fb8edd720	Trivial comment change This commit was SVN r29480.	2013-10-23 10:15:18 +00:00
Mike Dubman	d6ead2a3a5	Add support for routable ROCE where a different subnet_id is valid to proceed with MPI routing. (can happen in the same LAN) developed by vasily, reviewed by miked cmr=v1.7.4:reviewer=ompi-gk1.7 This commit was SVN r29479.	2013-10-23 06:08:54 +00:00
Rolf vandeVaart	3c916d55c9	Fix two issues pointed out in review of ticket #3870. This commit was SVN r29473.	2013-10-22 17:28:12 +00:00
Mike Dubman	d27cffedb9	expand tabs to 4 spaces cd ompi/mca/coll/fca for i in *.[ch]; do expand -t 4 $i > koko && mv koko $i; done Refs: #3799 This commit was SVN r29472.	2013-10-22 17:05:55 +00:00
Jeff Squyres	6714890244	paffinity.h is gone and won't be coming back. This commit was SVN r29467.	2013-10-22 15:59:00 +00:00
Nathan Hjelm	280a89448f	Make btl/vader valgrind safe. cmr=v1.7.4:reviewer=samuel This commit was SVN r29464.	2013-10-22 15:33:32 +00:00
Jeff Squyres	09fae6e62b	Prefix DSO filenames with "lib" so that Automake doesn't complain. Follow the convention established by the ompi/mca/common/sm tree and prefix both the "install" and "no install" versions of the build with "lib" so that Automake doesn't complain. Differentiate the two by adding a "_noinst" suffix to the "no install" version. This commit was SVN r29462.	2013-10-22 13:16:33 +00:00
Rolf vandeVaart	0cd1e8dfd9	Add runtime support to turn off CUDA IPC support. This commit was SVN r29444.	2013-10-16 16:48:18 +00:00
Rolf vandeVaart	9f83405c78	Fix one more corner case initialization issue. This commit was SVN r29443.	2013-10-16 16:39:19 +00:00
Ralph Castain	24c811805f	************************************************************** This change contains a non-mandatory modification of the MPI-RTE interface. Anyone wishing to support coprocessors such as the Xeon Phi may wish to add the required definition and underlying support ************************************************************** Add locality support for coprocessors such as the Intel Xeon Phi. Detecting that we are on a coprocessor inside of a host node isn't straightforward. There are no good "hooks" provided for programmatically detecting that "we are on a coprocessor running its own OS", and the ORTE daemon just thinks it is on another node. However, in order to properly use the Phi's public interface for MPI transport, it is necessary that the daemon detect that it is colocated with procs on the host. So we have to split the locality to separately record "on the same host" vs "on the same board". We already have the board-level locality flag, but not quite enough flexibility to handle this use-case. Thus, do the following: 1. add OPAL_PROC_ON_HOST flag to indicate we share a host, but not necessarily the same board 2. modify OPAL_PROC_ON_NODE to indicate we share both a host AND the same board. Note that we have to modify the OPAL_PROC_ON_LOCAL_NODE macro to explicitly check both conditions 3. add support in opal/mca/hwloc/base/hwloc_base_util.c for the host to check for coprocessors, and for daemons to check to see if they are on a coprocessor. The former is done via hwloc, but support for the latter is not yet provided by hwloc. So the code for detecting we are on a coprocessor currently is Xeon Phi specific - hopefully, we will find more generic methods in the future. 4. modify the orted and the hnp startup so they check for coprocessors and to see if they are on a coprocessor, and have the orteds pass that info back in their callback message. 
Automatically detect that coprocessors have been found and identify which coprocessors are on which hosts. Note that this algo isn't scalable at the moment - this will hopefully be improved over time. 5. modify the ompi proc locality detection function to look for coprocessor host info IF the OMPI_RTE_HOST_ID database key has been defined. RTE's that choose not to provide this support do not have to do anything - the associated code will simply be ignored. 6. include some cleanup of the hwloc open/close code so it conforms to how we did things in other frameworks (e.g., having a single "frame" file instead of open/close). Also, fix the locality flags - e.g., being on the same node means you must also be on the same cluster/cu, so ensure those flags are also set. cmr:v1.7.4:reviewer=hjelmn This commit was SVN r29435.	2013-10-14 16:52:58 +00:00
Mike Dubman	5a7dff2d15	fix icc warning fixed by Dinar, reviewed by miked cmr=v1.7.4:reviewer=ompi-gk1.7 This commit was SVN r29428.	2013-10-12 18:04:28 +00:00
Jeff Squyres	b5e2ae86ad	Remove all of our "to-do" items from the README.txt. This commit was SVN r29424.	2013-10-11 16:43:56 +00:00
Rolf vandeVaart	fbf143f3b4	Move another function that was missed in r29347. This commit was SVN r29422. The following SVN revision numbers were found above: r29347 --> open-mpi/ompi@ce61985503	2013-10-10 14:48:56 +00:00
Jeff Squyres	d9be19f011	Added shared library versions to those who were missing it. The following common shared libraries did not have versioning: * ompi/common/ofacm * ompi/common/verbs * ompi/common/ugni Additionally, we still had shared library versions in VERSION for the following libraries, which no longer exist: * ompi/common/portals * opal/common/hwloc This commit was SVN r29421.	2013-10-10 13:25:57 +00:00
Ralph Castain	9902748108	*** THIS INCLUDES A SMALL CHANGE IN THE MPI-RTE INTERFACE *** Fix two problems that surfaced when using direct launch under SLURM: 1. locally store our own data because some BTLs want to retrieve it during add_procs rather than use what they have internally 2. cleanup MPI_Abort so it correctly passes the error status all the way down to the actual exit. When someone implemented the "abort_peers" API, they left out the error status. So we lost it at that point and always exited with a status of 1. This forces a change to the API to include the status. cmr:v1.7.3:reviewer=jsquyres:subject=Fix MPI_Abort and modex_recv for direct launch This commit was SVN r29405.	2013-10-08 18:37:59 +00:00
Jeff Squyres	66dadbe1e7	Per RFC, remove the udapl BTL. This commit was SVN r29400.	2013-10-08 15:18:59 +00:00
Rolf vandeVaart	3bd02fbaf5	Add one more verbose debug output that prints when we are out of memory. This commit was SVN r29378.	2013-10-04 18:56:06 +00:00
Rolf vandeVaart	66725f6973	Enable some CUDA-aware support on tcp btl. Only when configured in. This commit was SVN r29364.	2013-10-04 12:50:16 +00:00
Ralph Castain	f4f2287958	Singletons currently start out by spawning an HNP - this is required solely in the cases where the singleton subsequently calls MPI_Comm_spawn or publishes port info without support from an external orte-server. In all other cases, the HNP is of no value and can actually be a detriment by creating additional overhead on the node. This is particularly concerning for async operations where processes may begin as singletons and then dynamically wireup to perform pt2pt communications. So we now allow singletons to start on their own, only spawning an HNP when initiating an operation that actually requires it. cmr:v1.7.4:reviewer=jsquyres This commit was SVN r29354.	2013-10-04 02:58:26 +00:00
Rolf vandeVaart	4dd1c86b36	Add a few support functions for future features. This commit was SVN r29353.	2013-10-03 21:06:17 +00:00
Rolf vandeVaart	ce61985503	Move registration function inside initial initialization function. This commit was SVN r29347.	2013-10-03 14:14:42 +00:00
Nathan Hjelm	6232ef3bfb	At coll_select time we can not check whether the communicator has a virtual topology. Remove code checking for a virtual topology until this flag is set before coll_select. This commit was SVN r29344.	2013-10-03 03:37:46 +00:00
Nathan Hjelm	7bedf62dd8	Add basic algorithms for the remaining non-blocking collectives. The algorithms are intended for MPI-3.0 compliance and are not optimized. We should aim to add better algorithms in the future through cheetah. MPI_Iallreduce and MPI_Igatherv on intercommunicators are required for MPI_Comm_idup support. cmr=v1.7.4:reviewer=brbarret:ticket=trac:2715 This commit was SVN r29333. The following Trac tickets were found above: Ticket 2715 --> https://svn.open-mpi.org/trac/ompi/ticket/2715	2013-10-02 14:26:23 +00:00
Mike Dubman	19748e6957	fix race condition which can happen on finalize 1. Change in rte api implementation: comm_world is now used to do p2p. This means we do not have to worry about other comms being destroyed. 2. Added a notification mechanism through which the runtime can tell libhcoll that the RTE api can no longer be used: pass a pointer to a flag, and its size, to libhcoll. The flag changes when the RTE is no longer available. Currently this flag is just the ompi_mpi_finalized global bool value. cmr=v1.7.3:reviewer=jladd This commit was SVN r29331.	2013-10-02 13:38:47 +00:00
Nathan Hjelm	4f12406436	Don't check for neighborhood collective routines on non-virtual topology communicators This commit was SVN r29319.	2013-10-01 19:59:18 +00:00
Nathan Hjelm	f3d18028e5	Fix typo in uGNI prepare source that could cause incorrect results with non-contiguous datatypes. cmr=v1.7.3 This commit was SVN r29294.	2013-09-30 16:00:58 +00:00
Mike Dubman	9bf7578ff2	fix memory corruption cmr:v1.7.3:reviewer=ompi-rm1.7 This commit was SVN r29293.	2013-09-30 06:18:12 +00:00
Ralph Castain	d565a76814	Do some cleanup of the way we handle modex data. Identify data that needs to be shared with peers in my job vs data that needs to be shared with non-peers - no point in sharing extra data. When we share data with some process(es) from another job, we cannot know in advance what info they have or lack, so we have to share everything just in case. This limits the optimization we can do for things like comm_spawn. Create a new required key in the OMPI layer for retrieving a "node id" from the database. ALL RTE'S MUST DEFINE THIS KEY. This allows us to compute locality in the MPI layer, which is necessary when we do things like intercomm_create. cmr:v1.7.4:reviewer=rhc:subject=Cleanup handling of modex data This commit was SVN r29274.	2013-09-27 00:37:49 +00:00
Ralph Castain	bc92c260ca	Add missing library dependency cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29273.	2013-09-27 00:08:43 +00:00
Dave Goodell	2c7975eb86	common_verbs: fix bad opal_output args Spotted by Reese Faucette <rfaucett@cisco.com>. cmr=v1.7.3 This commit was SVN r29267.	2013-09-26 21:59:00 +00:00
Nathan Hjelm	0b8fc13299	MPI-3.0: update C bindings with const and consistent use of [] for arrays. The MPI 3.0 standard added const to all in buffers in the C bindings. This commit adds the const keyword and in most cases casts const away. We should eventually go through and update the various interfaces (coll, pml, io, etc) to take the const keyword. The group, comm, win, and datatype interfaces have been updated with const. cmr=v1.7.4:ticket=trac:3785:reviewer=jsquyres This commit was SVN r29266. The following Trac tickets were found above: Ticket 3785 --> https://svn.open-mpi.org/trac/ompi/ticket/3785	2013-09-26 21:56:20 +00:00
Nathan Hjelm	c5596548b2	MPI-3: Add support for neighborhood collectives Blocking versions are simple linear algorithms implemented in coll/basic. Non- blocking versions are from libnbc 1.1.1. All algorithms have been tested with simple test cases. cmr=v1.7.4:reviewer=jsquyres This commit was SVN r29265.	2013-09-26 21:55:08 +00:00
Dave Goodell	a42fa78da7	usnic: SEGV in OSU benchmarks Prevent frag from being freed out from under us in the case the PML callback routine calls usnic_free(). We accomplish this by delaying decrement of sf_bytes_to_ack until after the callback is performed, since sf_bytes_to_ack == 0 is the condition for freeing the frag. Fixes Cisco bug CSCuj45094. Authored-by: Reese Faucette <rfaucett@cisco.com> cmr=v1.7.3 This commit was SVN r29264.	2013-09-26 21:48:04 +00:00
Mike Dubman	7c6ff00da5	Add caching of FCA communicators developed by Dinar, reviewed by miked/yossi. cmr:v1.7.3:reviewer=jsquyres:subject=add caching of FCA communicators. This commit was SVN r29256.	2013-09-26 17:48:07 +00:00
Rolf vandeVaart	d67e3077f5	Add a check for the CUDA 6.0 version of the cuda.h header file. This commit was SVN r29250.	2013-09-26 12:46:06 +00:00
Joshua Ladd	82e092db1b	Adding interface changes in hcoll component to support non-blocking collectives in libhcoll. This was added by Elena Elkina and reviewed by Josh Ladd. cmr:v1.7.3:reviewer=jladd:subject=Add support for non-blocking collectives in hcoll This commit was SVN r29244.	2013-09-25 16:14:59 +00:00
Ralph Castain	9aeba777fa	Ensure we don't enter into an infinite loop looking for the PML modex key if it isn't present. The PMI implementation will load ALL modex keys when the first key is queried, so the hash db component can safely return "not found" if a subsequent key isn't present. The PML modex_recv needs to assume everything is okay if the modex recv fails to return a value. cmr:v1.7.3:reviewer=jladd:subject=Prevent infinite loop when PML modex not found This commit was SVN r29243.	2013-09-25 16:04:00 +00:00
Rolf vandeVaart	667c66941b	Remove redundant (and possibly erroneous) alignment code in rcache. It is already handled by users of the rcache. This was per RFC http://www.open-mpi.org/community/lists/devel/2013/09/12927.php and discussed in developers meeting. This commit was SVN r29233.	2013-09-24 17:23:50 +00:00
Rolf vandeVaart	3b5e0736a3	Adjust verbosity levels upward. This commit was SVN r29232.	2013-09-24 14:35:48 +00:00
Ralph Castain	34fbec1f49	Sadly, the connection priorities being defined at time of variable instantiation were being overridden just before registering the param. Thus, changes people made to the relative priority of the cpc methods were being lost. Fix it by removing the duplicate initialization, letting the value defined at instantiation be the one actually used. cmr:v1.7.4:reviewer=hjelmn This commit was SVN r29212.	2013-09-19 19:45:00 +00:00
Rolf vandeVaart	804545278f	Per discussion on devel list, delete unused registration cache. http://www.open-mpi.org/community/lists/devel/2013/08/12803.php This component was .ompi_ignored on December 17, 2006 by gleb. Now, it is time for it to go.... This commit was SVN r29209.	2013-09-18 21:22:34 +00:00
Rolf vandeVaart	c9a33fad83	Fix some tabs. Add optional message to dump. Some minor format change to dump function. This commit was SVN r29208.	2013-09-18 21:08:15 +00:00
Mike Dubman	2a5c342587	Modifications that are necessary in order to meet latest libhcoll API. cmr:v1.7.3:reviewer=jladd This commit was SVN r29202.	2013-09-18 12:22:02 +00:00
Ralph Castain	865a7028f8	Per patch from George, with a few minor cleanups. Correctly address the complete exchange of required wireup information in Intercomm_create so all procs in the resulting communicator know how to talk to each other. Refs trac:29166 This commit was SVN r29200. The following Trac tickets were found above: Ticket 29166 --> https://svn.open-mpi.org/trac/ompi/ticket/29166	2013-09-18 02:01:30 +00:00
Ralph Castain	99611ac1d2	Revert r29166 in favor of a better solution from George This commit was SVN r29199. The following SVN revision numbers were found above: r29166 --> open-mpi/ompi@497c7e6abb	2013-09-18 01:41:26 +00:00
George Bosilca	55273f1c98	Cleanup spaces, nothing else. This commit was SVN r29197.	2013-09-18 00:07:58 +00:00
George Bosilca	7b319a101d	Fix the case where we build without Fortran support. This commit was SVN r29194.	2013-09-17 20:45:46 +00:00
Nathan Hjelm	7929fb9dea	Cleanup complex datatypes and update datatypes and operator code to use C99. This commit changes the underlying opal complex datatypes to match the C99 types: float _Complex, double _Complex, and long double _Complex. The fortran and C++ types now are aliases to these basic types instead of structure types. The operators in ompi/mca/op/base now work on only the C99 types and the fortran types use these operators if the fortran type matches a C complex type (this should almost always be the case.) C99 is now in use in both the datatype and operator code and should make the code both cleaner and much less fragile. This commit was SVN r29193.	2013-09-17 17:49:42 +00:00
Rolf vandeVaart	440632b57f	Add a function that will dump out the contents of the memory registration cache. Useful for debugging any rcache issues. This commit was SVN r29189.	2013-09-17 15:40:32 +00:00
Jeff Squyres	74d1278f48	btl_usnic_util.c:ompi_btl_usnic_util_abort() also passes in the strerror(). This commit was SVN r29188.	2013-09-17 12:35:51 +00:00
George Bosilca	5f686a90d0	Fix several issues regarding MPI_IN_PLACE and different flavors of MPI_Alltoall. - add support for MPI_IN_PLACE in the self collective component. - fix the extent usage in the tuned collective component. - correctly use the peer counts instead of local. Thanks to Fujitsu for the patch. This commit was SVN r29187.	2013-09-17 11:35:18 +00:00
Reese Faucette	8f235e6977	usnic: wrong SG entry used to compute length for small put()s This commit was SVN r29186.	2013-09-17 08:18:02 +00:00
Reese Faucette	651d61f1a3	Clean up debugging logging a bit. MSGDEBUG2 now means "print a one-liner for all PML calls into BTL, and also when BTL calls PML with a recv completion (not send completions)" MSGDEBUG1 means print more internal gory detail MSGDEBUG is gone, replaced by MSGDEBUG1 In the process also found that PUT_DEST style fragments could potentially be leaked in usnic_free() since send_fragment tests were being applied to see if it was eligible to be freed. This commit was SVN r29185.	2013-09-17 07:29:40 +00:00
Reese Faucette	f35d9b50e3	Cisco CSCuj22803: fixes for Bsend changes required to support MPI_Bsend(). Introduces concept of attaching a buffer to a large segment that the PML can scribble into and we will send from. The reason we don't use a pinned buffer and send directly from that is that usnic_verbs does not (yet) support num_sge>1 for regular sends. This means the data gets copied twice, but that is unavoidable. changed the logic in handle_large_send to be more sensible Incorporated David's review comments This commit was SVN r29184.	2013-09-17 07:27:39 +00:00
Reese Faucette	25b5c84d0f	Cisco CSCuj13135: Data corruption in MPI_Bsend_ator_c Do not assume that the "size" passed to alloc_send() will be the same as the size of the message the resulting fragment will hold when usnic_send() is called. This means usnic_send()/usnic_put() can never trust any pre-computed size values, and are only allowed to look at the lengths and pointers of the elements in the desc SG list. This commit was SVN r29183.	2013-09-17 07:25:05 +00:00
Reese Faucette	b9103c0f66	Cisco CSCuj12524: c_put_big segfault - usnic_free() cannot free the fragment until ACK is received This commit was SVN r29182.	2013-09-17 07:23:15 +00:00
Reese Faucette	89b5f0899b	Cisco CSCuj12520: various problems running c_fence_put_1 - tag needs to be sent in our header, not the PML header - usnic_alloc() should return smaller value if too much data requested - be careful about callbacks vs removing items from lists (we need to remove from our lists before the callback) - improve send callback handling - add some more MSGDEBUG2 logging and cleanup This commit was SVN r29181.	2013-09-17 07:20:44 +00:00
Mike Dubman	432c10750a	enable mxm2 from np>0 reviewed by yossi cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29178.	2013-09-16 12:36:28 +00:00
Mike Dubman	44bfa95553	enable mxm2 by default on np>=0 reviewed by yossi cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29177.	2013-09-16 12:32:29 +00:00
Ralph Castain	52caa75552	Gar - forgot to commit a few more cleanups Refs trac:3696 This commit was SVN r29168. The following Trac tickets were found above: Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696	2013-09-15 15:32:01 +00:00
Ralph Castain	b64c8dafd8	Cleanup some errors in pubsub - must set the active flag before posting the recv in case the message has already arrived Refs trac:3696 This commit was SVN r29167. The following Trac tickets were found above: Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696	2013-09-15 15:26:32 +00:00
Ralph Castain	497c7e6abb	Fixes trac:2904 The intercomm "merge" function can create a linkage between procs that was not reflected anywhere in a modex, and so at least some of the procs in the resulting communicator don't know how to talk to some of the new communicator's peers. For example, consider the case where: 1. parent job A comm_spawns a process (job B) - these processes exchange modex and can communicate 2. parent job A now comm_spawns another process (job C) - again, these can communicate, but the proc in C knows nothing of B 3. do an intercomm merge across the communicators created by the two comm_spawns. This puts B and C into the same communicator, but they know nothing about how to talk to each other as they were not involved in any exchange of contact info. Hence, collectives on that communicator now fail. This fix adds an API to the ompi/dpm framework that (a) exchanges the modex info across the procs in the merge to ensure all procs know how to communicate, and (b) calls add_procs to give the btl's a chance to select transports to any new procs. cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29166. The following Trac tickets were found above: Ticket 2904 --> https://svn.open-mpi.org/trac/ompi/ticket/2904	2013-09-15 15:00:40 +00:00
Rolf vandeVaart	096b8c022e	Also add flag to debug output. This commit was SVN r29163.	2013-09-13 19:47:05 +00:00
Rolf vandeVaart	c15b2a26b8	Fix some formatting. Move some CUDA-aware mca parameter initialization earlier. This commit was SVN r29162.	2013-09-13 17:43:41 +00:00
Rolf vandeVaart	d247c26b84	In the case that HAVE_IBV_FORK_INIT is not defined, we will need this variable so we can give the user an error if they ask for it. Also fixes compile error when HAVE_IBV_FORK_INIT is not defined. This commit was SVN r29160.	2013-09-13 14:38:49 +00:00
Rolf vandeVaart	ba9ec1b8bc	For debug builds, add the ability to view memory registrations and deregistrations in the openib BTL. This commit was SVN r29159.	2013-09-13 14:28:26 +00:00
Joshua Ladd	b3f88c4a1d	Per the RFC schedule, this commit adds Mellanox OpenSHMEM to the trunk. It does not yet run on OSX or with CM PML for an MTL other than MXM. Mellanox is aware of these issues and is in the process of resolving them. This should be added to cmr=v1.7.4:subject=Move OSHMEM to 1.7.4:reviewer=rhc This commit was SVN r29153.	2013-09-10 15:34:09 +00:00
Jeff Squyres	c9f05a2664	Delineate OMPI_FREE_LIST__MT separately. The FREE_LIST__MT stuff was introduced on the SVN trunk in r28722 (2013-07-04), but so far, has not been merged into the v1.7 branch yet (2013-09-06). So put it in its own #ifdef, rather than defining it based on OMPI_MAJOR_VERSION/OMPI_MINOR_VERSION. This commit was SVN r29148. The following SVN revision numbers were found above: r28722 --> open-mpi/ompi@c9e5ab9ed1	2013-09-06 19:22:56 +00:00
Jeff Squyres	e02cc0a7ec	No need for this header file. This commit was SVN r29147.	2013-09-06 19:22:28 +00:00
Jeff Squyres	c53b0890cf	Ensure that btl_usnic_compat.h is in the tarball. This commit was SVN r29140.	2013-09-06 15:53:56 +00:00
Dave Goodell	75fa28c303	usnic: v1.6<->trunk unification, trunk side The Cisco-maintained v1.6 port of the usnic BTL has diverged from the upstream trunk and v1.7 branches. This commit adjusts the trunk to more closely match the v1.6 branch to simplify future merging and cherry-picking. The usnic MCA parameters also need work on this side. Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760) This commit was SVN r29138. The following Trac tickets were found above: Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760	2013-09-06 03:21:34 +00:00
Dave Goodell	a669bd01e6	usnic: revamp convertor handling. The fix for the HPL SEGV was incorrect because it assumed the prepare_src() routine was always allowed to return "bytes processed" less than the requested "bytes to send". It turns out this is only true if the convertor is what limits the size; we are not allowed to limit the data sent for our own reasons, else we break logic in the upper layers. This means we need to learn the number of bytes out of the size requested the convertor will give us, no matter how big the size is. Unfortunately, this is a destructive test, and (currently) the only way to learn that number is to actually have the convertor copy the data out into buffers. This change implements this, copying the entire data out into a chain of send segments which are attached to the large send fragment. Now we can always return the proper size value to the PML. Fixes Cisco bug CSCuj08024 Authored-by: Reese Faucette <rfaucett@cisco.com> Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760) This commit was SVN r29137. The following Trac tickets were found above: Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760	2013-09-06 03:21:21 +00:00
Dave Goodell	0ef8336502	new bookkeeping code should return value indicating whether packet is good or not. Authored-by: Reese Faucette <rfaucett@cisco.com> Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760) This commit was SVN r29136. The following Trac tickets were found above: Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760	2013-09-06 03:19:32 +00:00
Dave Goodell	122890c2fd	usnic: "bookeeping" --> "bookkeeping" Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760) This commit was SVN r29135. The following Trac tickets were found above: Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760	2013-09-06 03:19:20 +00:00
Dave Goodell	0df6ed4acc	usnic: squash warnings from perf improvements Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760) This commit was SVN r29134. The following Trac tickets were found above: Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760	2013-09-06 03:19:08 +00:00
Dave Goodell	6dc54d372d	usnic: Basket of performance changes including: - round segment buffer allocation to cache-line - split some routines into an inline fast section and a called slower section - introduce receive fastpath in component_progress that: o returns immediately if there is a packet available on priority queue and fastpath is enabled o disables fastpath for 1 time after use to provide fairness to other processing o defers receive buffer posting o defers bookkeeping for receive until next call to usnic_component_progress Authored-by: Reese Faucette <rfaucett@cisco.com> Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760) This commit was SVN r29133. The following Trac tickets were found above: Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760	2013-09-06 03:18:57 +00:00
Dave Goodell	9cab9777d9	usnic: properly destroy embedded small send frag Without this, an `--enable-debug` build would hit an assertion in the list code when run under valgrind with `--malloc-fill=0xff` or any other case where malloc returned non-zeroed buffers. Also allow the normal OBJ_ machinery to handle the constructor invocation ordering for us instead of doing it by hand (which could have led to future bugs). Reviewed-by: jsquyres@cisco.com cmr=v1.7.4 Depends on trunk functionality in r29095 and r29096. Refs trac:3740,#3741. This commit was SVN r29127. The following SVN revision numbers were found above: r29095 --> open-mpi/ompi@d1b5940e97 r29096 --> open-mpi/ompi@a552921171 The following Trac tickets were found above: Ticket 3740 --> https://svn.open-mpi.org/trac/ompi/ticket/3740	2013-09-04 20:59:12 +00:00
Jeff Squyres	f6619f8e9e	Fix compile error in the heterogeneous case. We've been forcing C99 compiler compliance for a while now, so use C99 syntax to keep the #if code tidy. This commit was SVN r29101.	2013-08-31 12:56:08 +00:00
Brian Barrett	16a1166884	Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a configure-time dynamic allocation of flags. The net result for platforms which only support BTL-based communication is a reduction of 8*nprocs bytes per process. Platforms which support both MTLs and BTLs will not see a space reduction, but will now be able to safely run both the MTL and BTL side-by-side, which will prove useful. This commit was SVN r29100.	2013-08-30 16:54:55 +00:00
Rolf vandeVaart	18962d296b	This has bothered me for a while. Change MCA_BTL_TAG_BTL to MCA_BTL_TAG_IB. They are the same value so this does not change anything. (MCA_BTL_TAG_IB = MCA_BTL_TAG_BTL + 0). This just makes it more correct. This commit was SVN r29099.	2013-08-30 14:53:59 +00:00
Dave Goodell	c5a7e8a079	usnic: stomp format specifier warnings The usnic BTL now builds cleanly under `--enable-picky` when `MSGDEBUG1` is set. Reviewed-by: jsquyres cmr=v1.7.4:reviewer=jsquyres This commit was SVN r29097.	2013-08-29 23:24:14 +00:00
Ralph Castain	5d1fa4fa0e	Silence warnings: osc_pt2pt_data_move.c: In function 'ompi_osc_pt2pt_sendreq_recv_accum_long_cb': osc_pt2pt_data_move.c:643:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable] osc_rdma_data_move.c: In function 'ompi_osc_rdma_control_send_cb': osc_rdma_data_move.c:1312:37: warning: variable 'header' set but not used [-Wunused-but-set-variable] This commit was SVN r29092.	2013-08-29 20:56:36 +00:00
George Bosilca	305fa88d4b	Remove two warnings from the SM BTL. The return code can be safely ignored as the internals of the SM BTL will repost the fragment until the send operation successfully completes. This commit was SVN r29077.	2013-08-28 06:36:01 +00:00
Dave Goodell	dd82bd3c19	usnic: fix invalid rfstart initialization endpoint_rfstart was being initialized from a value which was not yet set. Also ensure that rfstart is a valid index in the range 0..WINDOW_SIZE-1, since it is used as the index into endpoint_rcvd_segs, which has WINDOW_SIZE elements. Without this change there is significant risk of memory corruption or segfaults, resulting in hangs or crashes, if malloc ever returns us a value >=WINDOW_SIZE (4096). Right now we seem to be getting lucky that the malloc is returning zero-pages to us when we are allocating endpoint structures (possibly because the freelist performs a single large allocation for all endpoints). Fixes Cisco bug CSCui88781. Reviewed-by: rfaucett@cisco.com Reviewed-by: jsquyres@cisco.com cmr=v1.7.3:reviewer=jsquyres This commit was SVN r29075.	2013-08-27 22:43:20 +00:00
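The index constraint described in the commit above can be illustrated with a minimal sketch. Only `WINDOW_SIZE` (4096) comes from the commit message; the helper `window_index()` and the use of a modulo reduction are assumptions made for illustration, not the actual usnic BTL code.

```c
#include <stdint.h>

/* WINDOW_SIZE is the value quoted in the commit message; everything
 * else here is a hypothetical illustration, not the usnic BTL source. */
#define WINDOW_SIZE 4096

/* Fold an arbitrary sequence number into the range 0..WINDOW_SIZE-1 so
 * that it is always a valid index into an array of WINDOW_SIZE elements. */
static inline uint32_t window_index(uint64_t seq)
{
    return (uint32_t)(seq % WINDOW_SIZE);
}
```

Whatever value the starting point is derived from, reducing it this way means a stale or uninitialized input cannot produce an out-of-bounds index.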
Nathan Hjelm	f5495ace48	coll/ml: update the coll_ml_enable_fragmentation variable to support the option to autodetect whether fragmentation should be enabled cmr=v1.7.3:ticket=trac:3717 This commit was SVN r29065. The following Trac tickets were found above: Ticket 3717 --> https://svn.open-mpi.org/trac/ompi/ticket/3717	2013-08-27 16:36:54 +00:00
Ralph Castain	6d24b34940	Extend the dpm framework API to support persistent accept/connect operations: * paccept - establish a persistent listening port for async connect requests * pconnect - async connect to remote process that has posted a paccept port. Provides a timeout mechanism, and allows the underlying implementation to retry until timeout * pclose - shuts down a prior paccept posting Includes example programs paccept.c and pconnect.c in orte/test/mpi. New MPI extension interfaces coming... This commit was SVN r29063.	2013-08-23 18:02:50 +00:00
Rolf vandeVaart	96457df9bc	Fix compile errors created from changeset 29058. This commit was SVN r29061.	2013-08-22 18:25:23 +00:00
Jeff Squyres	63ac60864b	Refs trac:3730 Turns out that AC_CHECK_DECLS is one of the "new style" Autoconf macros that #defines the output to be 0 or 1 (vs. #define'ing or #undef'ing it). So don't check for "#if defined(..."; just check for "#if ...". This commit was SVN r29059. The following Trac tickets were found above: Ticket 3730 --> https://svn.open-mpi.org/trac/ompi/ticket/3730	2013-08-22 17:44:20 +00:00
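A small sketch of the pitfall described in the commit above. The symbol name `HAVE_DECL_FOO` is hypothetical; the sketch simulates the `config.h` line that AC_CHECK_DECLS emits when a declaration is not found, and shows why `#if defined(...)` gives the wrong answer.

```c
/* Hypothetical symbol name. AC_CHECK_DECLS always defines the macro:
 *   #define HAVE_DECL_FOO 1   (declaration found)
 *   #define HAVE_DECL_FOO 0   (declaration not found)
 * Here we simulate the "not found" case. */
#define HAVE_DECL_FOO 0

/* Wrong test: the macro is defined either way, so this reports 1
 * even though the declaration is missing. */
static int foo_detected_by_defined(void)
{
#if defined(HAVE_DECL_FOO)
    return 1;
#else
    return 0;
#endif
}

/* Right test: check the macro's value, not its existence. */
static int foo_detected_by_value(void)
{
#if HAVE_DECL_FOO
    return 1;
#else
    return 0;
#endif
}
```

With the macro set to 0, the first function still reports the declaration as present while the second correctly reports it as absent.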
Ralph Castain	a200e4f865	As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: * THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE * Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro. *************************************************************************************** I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week. The code is in https://bitbucket.org/rhc/ompi-oob2 WHAT: Rewrite of ORTE OOB WHY: Support asynchronous progress and a host of other features WHEN: Wed, August 21 SYNOPSIS: The current OOB has served us well, but a number of limitations have been identified over the years. Specifically: * it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code) * we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface. 
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients * there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort * only one transport (i.e., component) can be "active" The revised OOB resolves these problems: * async progress is used for all application processes, with the progress thread blocking in the event library * each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on") * multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC. * a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions. * opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object * NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. 
If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions * obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel * the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport * routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active * all blocking send/recv APIs have been removed. Everything operates asynchronously. KNOWN LIMITATIONS: * although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline * the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker * routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways * obviously, not every error path has been tested nor necessarily covered * determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when all transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost. 
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways * the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC This commit was SVN r29058.	2013-08-22 16:37:40 +00:00
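The shared-subnet reachability rule listed under the known limitations above can be sketched as follows. The function `same_subnet()` is a hypothetical helper for IPv4 addresses in host byte order, written under that assumption; it is not the ORTE OOB implementation.

```c
#include <stdint.h>

/* Hypothetical helper, not the ORTE OOB source: two IPv4 addresses
 * (host byte order) are treated as mutually reachable when they fall
 * in the same subnet under the given netmask. */
static int same_subnet(uint32_t addr_a, uint32_t addr_b, uint32_t netmask)
{
    return (addr_a & netmask) == (addr_b & netmask);
}
```

For example, 192.168.1.10 (0xC0A8010A) and 192.168.1.200 (0xC0A801C8) match under a /24 mask (0xFFFFFF00), while 192.168.2.10 does not. A peer behind a gateway fails this test even though it may be routable, which is the limitation the commit message points out.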
Ralph Castain	16c5b30a1f	Since the calls to "PMI get" scale by number of procs (not nodes), it makes more sense to have the MCA param be the cutoff based on number of procs. Also, it occurred to me that this shouldn't impact the nidmap process as that is built and circulated when we launch via mpirun, not during direct launch. So shift the cutoff param to the MPI layer, and have it solely determine whether or not we call modex_recv on the hostname. If comm_world is of size greater than the cutoff, then we don't automatically retrieve the hostname when we build the ompi_proc_t for a process - instead, we fill the hostname entry on first call to modex_recv for that process. The param is now "ompi_hostname_cutoff=N", where N=number of procs for cutoff. Refs trac:3729 This commit was SVN r29056. The following Trac tickets were found above: Ticket 3729 --> https://svn.open-mpi.org/trac/ompi/ticket/3729	2013-08-22 03:40:26 +00:00
Rolf vandeVaart	504fa2cda9	Fix support in smcuda btl so it does not blow up when there is no CUDA IPC support between two GPUs. Also make it so CUDA IPC support is added dynamically. Fixes ticket 3531. This commit was SVN r29055.	2013-08-21 21:00:09 +00:00
Rolf vandeVaart	96fdb060ea	Fix compile errors and warnings from changeset 29052. This commit was SVN r29054.	2013-08-21 19:01:54 +00:00
Steve Wise	67fe3f23ed	Use the HAVE_DECL_IBV_LINK_LAYER_ETHERNET macro. Commit r27211 added ifdef checks for #define HAVE_IBV_LINK_LAYER_ETHERNET, which is incorrect. The correct #define is HAVE_DECL_IBV_LINK_LAYER_ETHERNET. This broke OMPI over iWARP. This fixes trac:3726 and should be added to cmr:v1.7.3:reviewer=jsquyres This commit was SVN r29053. The following SVN revision numbers were found above: r27211 --> open-mpi/ompi@b27862e5c7 The following Trac tickets were found above: Ticket 3726 --> https://svn.open-mpi.org/trac/ompi/ticket/3726	2013-08-20 20:00:46 +00:00
Ralph Castain	45e695928f	As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time: * add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then we will include hostnames for any allocation smaller than the given limit. * remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL" * modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded * removed a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base * added an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it a different way. Currently, we support PBS-based systems and SLURM - for any other, rank0 will emit a warning and we assume max number of daemons so we will always retain hostnames This commit was SVN r29052.	2013-08-20 18:59:36 +00:00
Jeff Squyres	b30ad28276	Remove some unused variables and an unused goto label. This commit was SVN r29044.	2013-08-19 16:18:35 +00:00
Ralph Castain	611d7f9f6b	When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require. This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times. Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes: * upon first request for data, have the OPAL db pmi component fetch and decode all the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally * reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test * reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. 
Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued). Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time. This commit was SVN r29040.	2013-08-17 00:49:18 +00:00
Ralph Castain	f8a72feb25	Silence uninitialized var warning This commit was SVN r29036.	2013-08-16 21:39:28 +00:00
Ralph Castain	c74c54e18d	Cleanup uninitialized warnings This commit was SVN r29033.	2013-08-16 21:23:09 +00:00
Ralph Castain	33beab5918	Avoid segfault due to uninitialized variable This commit was SVN r29030.	2013-08-16 21:10:38 +00:00
Ralph Castain	bebe852057	Add new info key for publish that allows user to designate that the port is to be unique - i.e., to return an error if that service has already been published. Default is to overwrite This commit was SVN r29028.	2013-08-14 04:21:17 +00:00
Ralph Castain	318467c04f	If we only have global scope, then don't fall back to looking at local scope if the lookup target wasn't found else we will hang This commit was SVN r29025.	2013-08-13 04:45:33 +00:00
Nathan Hjelm	6c75699068	coll/ml: fix typo in assert that could cause an abort in debug builds. cmr=v1.7.3:reviewer=manjugv This commit was SVN r29024.	2013-08-12 14:31:44 +00:00
Jeff Squyres	c09ec204ad	Change usNIC BTL to always use small fragments when there is a non-contiguous converter. We can't "convert on the fly" because the # of bytes requested may not divide evenly into the convertor data type. This commit was SVN r29014.	2013-08-11 17:04:13 +00:00
Nathan Hjelm	47320713bb	coll/ml: do not register variables in open and fix a bug in the coll/ml parser cmr=v1.7.3:reviewer=pasha This commit was SVN r29010.	2013-08-09 17:55:30 +00:00
Rolf vandeVaart	cd72024a3c	Refactor some of the initialization code. This commit was SVN r29009.	2013-08-09 14:54:17 +00:00
Edgar Gabriel	f7391eca23	Lazy open does not work for the addproc sharedfp component since it starts by spawning a process using MPI_Comm_spawn. For this, the first operation has to be collective which we cannot guarantee outside of the MPI_File_open operation. This commit was SVN r29008.	2013-08-06 20:48:20 +00:00
Edgar Gabriel	e348f5567f	add unignore for me. This commit was SVN r29007.	2013-08-06 20:47:08 +00:00
George Bosilca	837b3363fe	Silence a few warnings. This commit was SVN r29004.	2013-08-06 09:38:30 +00:00
Brian Barrett	2cc947513b	* Fix some compile errors * Need to subtract 1 off the size so that we stay in the bit length requirements This commit was SVN r28997.	2013-08-05 18:49:48 +00:00
Jeff Squyres	87910daf51	Fix a collection of bugs found by QA and Coverity, and make some minor improvements: * Fix minor memory leaks during component_init * Ensure that an initialization loop does not underflow an unsigned int * Improve mlock limit checking * Fix set of BTL modules created during component_init when failing to get QP resources or otherwise excluding some (but not all) usnic verbs devices * Fix/improve error messages to be consistent with other Cisco documentation * Randomize the initial sliding window sequence number so that we silently drop incoming frames from previous jobs that still have existing processes in the middle of dying (and are still transmitting) * Ensure we don't break out of add_procs too soon and create an asymmetrical view of what interfaces are available This commit was SVN r28975.	2013-08-01 16:56:15 +00:00
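The commit above does not show the loop in question, so the following is only a generic sketch of the unsigned-underflow hazard it mentions. The function `count_down()` is hypothetical and simply demonstrates a countdown idiom that is safe for unsigned counters.

```c
#include <stddef.h>

/* Hypothetical example: visit indices n-1 down to 0 and return how many
 * iterations ran. Writing `for (i = n - 1; i >= 0; --i)` with an unsigned
 * i never terminates, because i wraps around instead of going negative,
 * and `n - 1` itself wraps when n is 0. */
static size_t count_down(size_t n)
{
    size_t visited = 0;
    for (size_t i = n; i-- > 0; ) {
        ++visited;   /* inside the body, i takes n-1, n-2, ..., 0 */
    }
    return visited;
}
```

The test `i-- > 0` compares before decrementing, so the loop exits cleanly when the counter reaches zero, including the case where `n` is zero.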
Nathan Hjelm	8429485a39	mpool/grdma: use the rcache even if not using mpi_leave_pinned or mpi_leave_pinned_pipeline This change should improve performance in the non-pinned case where the same memory region is involved in multiple simultaneous transfers. cmr=v1.7.3:reviewer=brbarret This commit was SVN r28973.	2013-07-31 23:50:41 +00:00
Nathan Hjelm	1382d4fb53	Fix typos in _Complex ops This commit was SVN r28959.	2013-07-26 17:02:45 +00:00
Nathan Hjelm	e4f105ffb3	revert change that shouldn't have been part of r28952 This commit was SVN r28953. The following SVN revision numbers were found above: r28952 --> open-mpi/ompi@cb90a4a7fc	2013-07-25 20:23:55 +00:00
Nathan Hjelm	cb90a4a7fc	Add simple algorithms to support MPI_IN_PLACE for MPI_Alltoall, MPI_Alltoallv, and MPI_Alltoallw. Working on faster algorithms for tuned that will come at a later time. cmr=v1.7.3:ticket=trac:2965 This commit was SVN r28952. The following Trac tickets were found above: Ticket 2965 --> https://svn.open-mpi.org/trac/ompi/ticket/2965	2013-07-25 19:19:41 +00:00
Nathan Hjelm	99adeb7f6e	Fix support for complex datatypes when fortran is not available but _Complex is This commit was SVN r28951.	2013-07-25 19:08:21 +00:00
Edgar Gabriel	012f99c3b6	ompi_ignore this component until we find a solution for the dependence on libmpi for the external process that is being spawned by this component. This commit was SVN r28945.	2013-07-24 20:02:47 +00:00
Jeff Squyres	f7337b8f77	Correct faulty max payload and MTU computations (and update some debugging that helped us find those). This commit was SVN r28942.	2013-07-24 16:06:28 +00:00
Ralph Castain	db214a2321	Refs trac:3697 - use the opal_pmi_error function instead of ompi_error as the returned error codes are from PMI This commit was SVN r28941. The following Trac tickets were found above: Ticket 3697 --> https://svn.open-mpi.org/trac/ompi/ticket/3697	2013-07-24 04:05:41 +00:00
Jeff Squyres	5323051047	Use sysfs to check MPI has enough VFs, QPs, and CQs Use the new sysfs files to check that there are enough VFs, QPs, and CQs for all the MPI processes on this server. Move the checking code into its own subroutine to make it smaller and easier to read/grok. This commit was SVN r28937.	2013-07-24 00:38:32 +00:00
Ralph Castain	59a71765cf	Hmmm...these error outputs will never occur, which is probably not what the author intended. So do the output and THEN jump to the error exit. This commit was SVN r28918.	2013-07-22 22:58:03 +00:00
Edgar Gabriel	8ffc1aac89	update the _component.c files in ompio to use the explicit assignment of the mca_register_component_params element of the structure. This commit was SVN r28914.	2013-07-22 21:11:05 +00:00
Nathan Hjelm	b17cd13c09	sharedfp: ensure sharedfp components register their parameters in mca_register_component_params not mca_component_open This commit was SVN r28910.	2013-07-22 17:53:58 +00:00
Jeff Squyres	b437041aeb	Update one more comment. This commit was SVN r28908.	2013-07-22 17:29:00 +00:00
Jeff Squyres	4b6006402d	Use the RTE framework instead of calling ORTE directly. Brian (rightfully) hit me on the head with the don't-use-ORTE-use-the-rte-framework clue bat; the usnic BTL now nicely plays with the RTE framework. This commit was SVN r28907.	2013-07-22 17:28:23 +00:00
Jeff Squyres	ca9da8a554	Fix minor typo in the comments/docs. This commit was SVN r28905.	2013-07-22 17:24:17 +00:00
Rolf vandeVaart	67badf384c	Only search SONAME of library. Expand comments. This commit was SVN r28904.	2013-07-22 15:54:45 +00:00
Brian Barrett	e1d72409cd	add missing header This commit was SVN r28897.	2013-07-21 19:40:31 +00:00
Brian Barrett	704f1ecc18	fix non-orte builds of PSM This commit was SVN r28893.	2013-07-21 19:12:32 +00:00
Brian Barrett	05ab9cbaa6	Need to ship pmi_internal.h This commit was SVN r28891.	2013-07-21 19:00:50 +00:00
Brian Barrett	495384d8b7	Update documentation in rte.h to match recent changes This commit was SVN r28887.	2013-07-20 22:14:12 +00:00
Brian Barrett	414ba3dad8	Update PMI RTE to match error handling changes that were part of r28852. Note that the PMI RTE still doesn't listen for asynchronous errors, so the error handler still won't ever actually do anything :). This commit was SVN r28886. The following SVN revision numbers were found above: r28852 --> open-mpi/ompi@e4e678e234	2013-07-20 22:09:02 +00:00
Brian Barrett	5bfd980968	update PMI RTE component to adapt to ORTE changes This commit was SVN r28885.	2013-07-20 22:06:47 +00:00
Brian Barrett	d984d25da3	Remove orte header file from sharedfp components (OMPI layer should not include ORTE layer with the RTE framework). Thankfully, nothing used orte_show_help, so easy fix. This commit was SVN r28884.	2013-07-20 22:03:44 +00:00
Jeff Squyres	194b285447	First commit of the Cisco usNIC BTL. This BTL accesses the Cisco usNIC Linux device via the Linux verbs API via Unreliable Datagram queue pairs. A few noteworthy points: * This BTL does most of its own fragmentation; it tells the PML that it has a very high max_send_size (much higher than the network MTU). * Since UD fragments are, by definition, unreliable, the usnic BTL handles all of its own reliability via a sliding window approach using the opal_hotel construct and many tricks stolen from the corpus of knowledge surrounding efficient TCP. * There is a fun PML latency-metric based optimization for NUMA awareness of short messages. * Note that this is ''not'' a generic UD verbs BTL; it is specific to the Cisco usNIC device. This commit was SVN r28879.	2013-07-19 22:13:58 +00:00
Jeff Squyres	3546163c48	Devices that do not support RC QP's are also intentionally skipped; don't warn about skipping them. This commit was SVN r28874.	2013-07-19 19:05:18 +00:00
Ralph Castain	e4e678e234	Per the RFC and discussion on the devel list, update the RTE-MPI error handling interface. There are a few differences in the code from the original RFC that came out of the discussion - I've captured those in the following writeup. George and I were talking about ORTE's error handling the other day in regards to the right way to deal with errors in the updated OOB. Specifically, it seemed a bad idea for a library such as ORTE to be aborting the job on its own prerogative. If we lose a connection or cannot send a message, then we really should just report it upwards and let the application and/or upper layers decide what to do about it. The current code base only allows a single error callback to exist, which seemed unduly limiting. So, based on the conversation, I've modified the errmgr interface to provide a mechanism for registering any number of error handlers (this replaces the current "set_fault_callback" API). When an error occurs, these handlers will be called in order until one responds that the error has been "resolved" - i.e., no further action is required - by returning OMPI_SUCCESS. The default MPI layer error handler is specified to go "last" and calls mpi_abort, so the current "abort" behavior is preserved unless other error handlers are registered. In the register_callback function, I provide an "order" param so you can specify "this callback must come first" or "this callback must come last". Seemed to me that we will probably have different code areas registering callbacks, and one might require it go first (the default "abort" will always require it go last). So you can append and prepend, or go first. Note that only one registration can declare itself "first" or "last", and since the default "abort" callback automatically takes "last", that one isn't available. :-) The errhandler callback function passes an opal_pointer_array of structs, each of which contains the name of the proc involved (which can be yourself for internal errors) and the error code. This is a change from the current fault callback which returned an opal_pointer_array of just process names. Rationale is that you might need to see the cause of the error to decide what action to take. I realize that isn't a requirement for remote procs, but remember that we will use the SAME interface to report RTE errors internal to the proc itself. In those cases, you really do need to see the error code. It is legal to pass a NULL for the pointer array (e.g., when reporting an internal failure without error code), so handlers must be prepared for that possibility. If people find that too burdensome, we can remove it. Should we ever decide to create a separate callback path for internal errors vs remote process failures, or if we decide to do something different based on experience, then we can adjust this API. This commit was SVN r28852.	2013-07-19 01:08:53 +00:00
Ralph Castain	8a8b4896be	Need to protect libgen.h as some systems might not have it This commit was SVN r28845.	2013-07-18 20:21:37 +00:00
Edgar Gabriel	185e365dad	make the sm sharedfp component compile on Mac. This commit was SVN r28844.	2013-07-18 20:17:14 +00:00
Edgar Gabriel	93cef82873	remove the ylib component from the fcoll framework. It is not used, there are no plans to use it. We can always recover it from svn if we would ever change our minds. This commit was SVN r28840.	2013-07-18 16:18:06 +00:00
Pavel Shamis	68969ba6e5	Removing bogus references in iboffload code. cmr:v1.7:reviewer=hjelmn This commit was SVN r28834.	2013-07-17 22:35:24 +00:00
Rolf vandeVaart	49663fb802	Move CUDA-aware configury to its own file and other minor changes due to review. This commit was SVN r28832.	2013-07-17 22:12:29 +00:00
Edgar Gabriel	6e8522fec5	infuse life into the shared file pointer framework. For this: - extend the framework API - remove the dummy component, not required anymore - add four components to perform the actual job. This commit was SVN r28828.	2013-07-17 21:55:24 +00:00
Edgar Gabriel	ac694b7056	in preparation for the new shared file pointer components to be committed soon: - add a new abstraction layer to be used internally for some operations - add a new mca parameter to control lazy initialization of shared file pointer structures This commit was SVN r28826.	2013-07-17 21:30:50 +00:00
Vishwanath Venkatesan	ce8f8f0829	Changing the MPI Datatype from MPI_LONG to OMPI_OFFSET_DATATYPE for send/recv offsets This commit was SVN r28822.	2013-07-17 19:16:53 +00:00
Nathan Hjelm	d4c6029cf3	sbgp/ibnet: set mca_sbgp_ibnet_component.mtu to IBV_MTU_1024 before registering it. cmr:v1.7:reviewer=pasha This commit was SVN r28821.	2013-07-17 19:16:31 +00:00
Rolf vandeVaart	7a45be8bde	Fix variable initialization. This commit was SVN r28819.	2013-07-17 17:37:35 +00:00
Nathan Hjelm	f0aeb36d80	Fix warnings in ob1 introduced by the pvar commit This commit was SVN r28817.	2013-07-17 03:41:05 +00:00
Rolf vandeVaart	f95c95cf79	Additional cleanup of how libraries and paths are searched. This commit was SVN r28815.	2013-07-16 18:40:55 +00:00
Nathan Hjelm	e6e9f2c6fd	Add profiling function definitions for MPI_T and add a missing type into mpi.h This commit was SVN r28803.	2013-07-16 16:03:33 +00:00
Nathan Hjelm	35673ea400	Add example performance variables to ob1: unexpected message queue length, posted receive length This commit was SVN r28801.	2013-07-16 16:02:25 +00:00
Rolf vandeVaart	54b1fbdb4a	Better error message code. Remove commented out code. This commit was SVN r28793.	2013-07-15 22:27:34 +00:00
Rolf vandeVaart	4d2c2bcefe	Better error message. Remove a tab. This commit was SVN r28791.	2013-07-15 19:39:54 +00:00
Mike Dubman	5bd2e15cbb	support for ConnectX3-Pro card. cmr:v1.7:reviewer=jsquyres cmr:v1.6:reviewer=jsquyres This commit was SVN r28787.	2013-07-14 06:44:19 +00:00
Nathan Hjelm	dfca3d4804	fix typos in the ugni and vader btls This commit was SVN r28772.	2013-07-12 17:55:33 +00:00
Nathan Hjelm	1119cd3e8a	Merge branch 'vader_fix' This commit was SVN r28764.	2013-07-11 23:30:20 +00:00
Brian Barrett	2f19fc52de	use the same multi-md workaround the rest of the Portals code is using. This commit was SVN r28761.	2013-07-11 21:00:11 +00:00
Nathan Hjelm	b5281778b0	btl/vader: improve small message performance This commit improved the small message latency and bandwidth when using the vader btl. These improvements should make performance competitive with other MPI implementations. This commit was SVN r28760.	2013-07-11 20:54:12 +00:00
Brian Barrett	bea54eeeb1	First take at a BTL for Portals 4 This commit was SVN r28759.	2013-07-11 20:47:08 +00:00
Jeff Squyres	baa3182794	Per RFC (http://www.open-mpi.org/community/lists/devel/2013/07/12534.php), remove a bunch of dead code. This commit was SVN r28756.	2013-07-11 17:34:28 +00:00
Rolf vandeVaart	858ef65142	Fix loop limit. This commit was SVN r28755.	2013-07-11 17:15:43 +00:00
Rolf vandeVaart	5051cd53fd	Use new API. This commit was SVN r28754.	2013-07-11 17:06:14 +00:00
Joshua Ladd	16beaa3878	This fixes the nasty configure.m4 hack that was added long ago and not removed. My fault for not catching earlier. I've also removed the '.ompi_ignore' in coll/hcoll. Throwing this to Nathan for review. Upon successful review, this should be added to cmr:v1.7:reviewer=hjelmn This commit was SVN r28753.	2013-07-11 09:55:46 +00:00
Jeff Squyres	28dac8010b	The hcoll component configure.m4 commits multiple sins, and breaks many builds. I am temporarily .ompi_ignore'ing this component until it can be fixed by its owner. * It calls AC_MSG_ERROR, which configure.m4 scripts are ''never'' supposed to do. If you don't want to build, then call $2. * All static and --disable-dlopen builds are broken; they fall afoul of whatever test configure.m4 is doing and therefore error out of configure entirely (vs. simply disabling the hcoll component). * There appear to be multiple shell scripting errors in the configure.m4. Here's the output of "./configure --disable-dlopen": {{{ --- MCA component coll:hcoll (m4 configuration macro) checking for MCA component coll:hcoll compile mode... static checking --with-hcoll value... simple ok (unspecified) ./configure: line 421: test: basic: integer expression expected configure: error: Can not use coll/hcoll and coll/ml (static build) simultaneously. You have two options: 1. Use static build & disable ml with: --enable-mpi-no-build=coll-ml 2. Use dso build for ML & disable ml at runtime: -mca coll self ./configure: line 310: return: basic: numeric argument required ./configure: line 320: exit: basic: numeric argument required }}} Finally, all of these configure.m4 errors aside, I don't understand why there is a ''compile-time'' exclusion between the hcoll and ml components. Why isn't this a ''run-time'' decision? Having what seems to be an unnecessary compile-time exclusion goes against the general Open MPI philosophy. Note: Open MPI 1.7 is also broken in all the same ways. I suggest that the RM's .ompi_ignore hcoll over there, too. Mellanox: please fix. This commit was SVN r28748.	2013-07-10 16:03:15 +00:00

4585 commits