1
1
Граф коммитов

6628 Коммитов

Автор SHA1 Сообщение Дата
Mike Dubman
432c10750a enable mxm2 from np>0
reviewed by yossi
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29178.
2013-09-16 12:36:28 +00:00
Mike Dubman
44bfa95553 enable mxm2 by default on np>=0
reviewed by yossi
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29177.
2013-09-16 12:32:29 +00:00
Ralph Castain
52caa75552 Gar - forgot to commit a few more cleanups
Refs trac:3696

This commit was SVN r29168.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-09-15 15:32:01 +00:00
Ralph Castain
b64c8dafd8 Cleanup some errors in pubsub - must set the active flag before posting the recv in case the message has already arrived
Refs trac:3696

This commit was SVN r29167.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-09-15 15:26:32 +00:00
Ralph Castain
497c7e6abb Fixes trac:2904
The intercomm "merge" function can create a linkage between procs that was not reflected anywhere in a modex, and so at least some of the procs in the resulting communicator don't know how to talk to some of the new communicator's peers.

For example, consider the case where:

1. parent job A comm_spawns a process (job B) - these processes exchange modex and can communicate

2. parent job A now comm_spawns another process (job C) - again, these can communicate, but the proc in C knows nothing of B

3. do an intercomm merge across the communicators created by the two comm_spawns. This puts B and C into the same communicator, but they know nothing about how to talk to each other as they were not involved in any exchange of contact info. Hence, collectives on that communicator now fail. 

This fix adds an API to the ompi/dpm framework that (a) exchanges the modex info across the procs in the merge to ensure all procs know how to communicate, and (b) calls add_procs to give the btl's a chance to select transports to any new procs.

cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29166.

The following Trac tickets were found above:
  Ticket 2904 --> https://svn.open-mpi.org/trac/ompi/ticket/2904
2013-09-15 15:00:40 +00:00
Rolf vandeVaart
096b8c022e Also add flag to debug output.
This commit was SVN r29163.
2013-09-13 19:47:05 +00:00
Rolf vandeVaart
c15b2a26b8 Fix some formatting. Move some CUDA-aware mca parameter initialization earlier.
This commit was SVN r29162.
2013-09-13 17:43:41 +00:00
Rolf vandeVaart
d247c26b84 In the case that HAVE_IBV_FORK_INIT is not defined, we will need this variable so we can give the user an error if they ask for it. Also fixes compile error when HAVE_IBV_FORK_INIT is not defined.
This commit was SVN r29160.
2013-09-13 14:38:49 +00:00
Rolf vandeVaart
ba9ec1b8bc For debug builds, add the ability to view memory registrations and deregistrations in the openib BTL.
This commit was SVN r29159.
2013-09-13 14:28:26 +00:00
Joshua Ladd
b3f88c4a1d Per the RFC schedule, this commit adds Mellanox OpenSHMEM to the trunk. It does not yet run on OSX or with CM PML for an MTL other than MXM. Mellanox is aware of these issues and is in the process of resolving them. This should be added to \ncmr=v1.7.4:subject=Move OSHMEM to 1.7.4:reviewer=rhc
This commit was SVN r29153.
2013-09-10 15:34:09 +00:00
Jeff Squyres
c9f05a2664 Delineate OMPI_FREE_LIST_*_MT separately.
The FREE_LIST_*_MT stuff was introduced on the SVN trunk in r28722
(2013-07-04), but so far, has not been merged into the v1.7 branch yet
(2013-09-06).  So put it in its own #ifdef, rather than defining it
based on OMPI_MAJOR_VERSION/OMPI_MINOR_VERSION.

This commit was SVN r29148.

The following SVN revision numbers were found above:
  r28722 --> open-mpi/ompi@c9e5ab9ed1
2013-09-06 19:22:56 +00:00
Jeff Squyres
e02cc0a7ec No need for this header file.
This commit was SVN r29147.
2013-09-06 19:22:28 +00:00
Jeff Squyres
c53b0890cf Ensure that btl_usnic_compat.h is in the tarball.
This commit was SVN r29140.
2013-09-06 15:53:56 +00:00
Dave Goodell
75fa28c303 usnic: v1.6<->trunk unification, trunk side
The Cisco-maintained v1.6 port of the usnic BTL has diverged from the
upstream trunk and v1.7 branches.  This commit adjusts the trunk to more
closely match the v1.6 branch to simplify future merging and
cherry-picking.

The usnic MCA parameters also need work on this side.

Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29138.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:21:34 +00:00
Dave Goodell
a669bd01e6 usnic: revamp convertor handling.
The fix for the HPL SEGV was incorrect because it assumed the
prepare_src() routine was always allowed to return "bytes processed"
less than the requested "bytes to send".  It turns out this is only true
if the convertor is what limits the size, we are not allowed to limit
the data sent for our own reasons, else we break login in the upper
layers.

This means we need to learn the number of bytes out of the size
requested the convertor will give us, no matter how big the size is.
Unfortunately, this is a destructive test, and (currently) the only way to
learn that number is to actually have the convertor copy the data out into
buffers.

This change implements this, copying the entire data out into a chain of
send segments which are attached to the large send fragment.  Now we can
always return the proper size value to the PML.

Fixes Cisco bug CSCuj08024

Authored-by: Reese Faucette <rfaucett@cisco.com>

Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29137.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:21:21 +00:00
Dave Goodell
0ef8336502 new bookkeeping code should return value indicating whether packet is good or not.
Authored-by: Reese Faucette <rfaucett@cisco.com>

Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29136.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:19:32 +00:00
Dave Goodell
122890c2fd usnic: "bookeeping" --> "bookkeeping"
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29135.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:19:20 +00:00
Dave Goodell
0df6ed4acc usnic: squash warnings from perf improvements
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29134.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:19:08 +00:00
Dave Goodell
6dc54d372d usnic: Basket of performance changes including:
- round segment buffer allocation to cache-line
    - split some routines into an inline fast section and a called
      slower section
    - introduce receive fastpath in component_progress that:
        o returns immediately if there is a packet available on priority
          queue and fastpath is enabled
        o disables fastpath for 1 time after use to provide fairness to
          other processing
        o defers receive buffer posting
        o defers bookeeping for receive until next call
          to usnic_component_progress

Authored-by: Reese Faucette <rfaucett@cisco.com>

Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29133.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:18:57 +00:00
Dave Goodell
9cab9777d9 usnic: properly destroy embedded small send frag
Without this, an `--enable-debug` build would hit an assertion in the
list code when run under valgrind with `--malloc-fill=0xff` or any other
case where malloc returned non-zeroed buffers.

Also allow the normal OBJ_ machinery to handle the constructor
invocation ordering for us instead of doing it by hand (which could have
led to future bugs).

Reviewed-by: jsquyres@cisco.com

cmr=v1.7.4

Depends on trunk functionality in r29095 and r29096.  Refs trac:3740,#3741.

This commit was SVN r29127.

The following SVN revision numbers were found above:
  r29095 --> open-mpi/ompi@d1b5940e97
  r29096 --> open-mpi/ompi@a552921171

The following Trac tickets were found above:
  Ticket 3740 --> https://svn.open-mpi.org/trac/ompi/ticket/3740
2013-09-04 20:59:12 +00:00
George Bosilca
2aed876be6 Fix the bug where MPI_is_thread_main returns true for all
threads when not in MPI_THREAD_MULTIPLE mode.
Thanks to Lisandro Dalcin for pointing it out.

This commit was SVN r29113.
2013-09-04 11:10:51 +00:00
Jeff Squyres
f6619f8e9e Fix compile error in the heterogeneous case.
We've been forcing C99 compiler compliance for a while now, so use C99
syntax to keep the #if code tidy.

This commit was SVN r29101.
2013-08-31 12:56:08 +00:00
Brian Barrett
16a1166884 Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a
configure-time dynamic allocation of flags.  The net result for platforms
which only support BTL-based communication is a reduction of 8*nprocs bytes
per process.  Platforms which support both MTLs and BTLs will not see
a space reduction, but will now be able to safely run both the MTL and BTL
side-by-side, which will prove useful.

This commit was SVN r29100.
2013-08-30 16:54:55 +00:00
Rolf vandeVaart
18962d296b This has bothered me for a while. Change MCA_BTL_TAG_BTL to MCA_BTL_TAG_IB. They are the same
value so this does not change anything.  (MCA_BTL_TAG_IB = MCA_BTL_TAG_BTL + 0).  This just makes it more correct.

This commit was SVN r29099.
2013-08-30 14:53:59 +00:00
Dave Goodell
c5a7e8a079 usnic: stomp format specifier warnings
The usnic BTL now builds cleanly under `--enable-picky` when `MSGDEBUG1`
is set.

Reviewed-by: jsquyres

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29097.
2013-08-29 23:24:14 +00:00
Dave Goodell
a552921171 call element destructors in ompi_free_list_destruct
The free list code called the constructor for each object it
slab allocated in ompi_free_list_grow.  This permits free list-managed
elements to safely allocate/deallocate resources in their
constructors/destructors without leaking.

It's probably best to let this soak on the trunk a little while before
moving it over to v1.7.

Reviewed-by: bosilca

cmr=v1.7.4:reviewer=bosilca

This commit was SVN r29096.
2013-08-29 22:56:31 +00:00
Dave Goodell
d1b5940e97 don't call OMPI_REQUEST_INIT in req constructor
Calling OMPI_REQUEST_INIT puts the request into an _INACTIVE state
instead of an _INVALID state, which we don't want if it's been
simply been constructed, e.g., in a free_list.  Without this change a
future change to call destructors at free list destruction time will
result in request dtor state assertion failures.

Reviewed-by: bosilca

cmr=v1.7.4:reviewer=bosilca

This commit was SVN r29095.
2013-08-29 22:56:22 +00:00
Ralph Castain
5d1fa4fa0e Silence warnings:
osc_pt2pt_data_move.c: In function 'ompi_osc_pt2pt_sendreq_recv_accum_long_cb':
osc_pt2pt_data_move.c:643:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
osc_rdma_data_move.c: In function 'ompi_osc_rdma_control_send_cb':
osc_rdma_data_move.c:1312:37: warning: variable 'header' set but not used [-Wunused-but-set-variable]

This commit was SVN r29092.
2013-08-29 20:56:36 +00:00
Ralph Castain
537e7380b1 As per the discussion on the devel telecon, do not compute ompi_comm_world_thread_level_mult if thread multiple is disabled. We aren't using the value anyway, but we will leave the current code in-place until we understand if it is needed or not.
This commit was SVN r29080.
2013-08-28 17:44:04 +00:00
George Bosilca
305fa88d4b Remove two warnings from the SM BTL. The return code can be safely ignored
as the internals of the SM BTL will repost the fragment until the send operation
succesfully complete.

This commit was SVN r29077.
2013-08-28 06:36:01 +00:00
George Bosilca
badd011ac3 Minor cleanup.
This commit was SVN r29076.
2013-08-28 05:48:58 +00:00
Dave Goodell
dd82bd3c19 usnic: fix invalid rfstart initialization
endpoint_rfstart was being initialized from a value which was not yet
set.  Also ensure that rfstart is a valid index in the range
0..WINDOW_SIZE-1, since it is used as the index into endpoint_rcvd_segs,
which has WINDOW_SIZE elements.

Without this change there is significant risk of memory corruption or
segfaults, resulting in hangs or crashes, if malloc ever returns us a
value >=WINDOW_SIZE (4096).  Right now we seem to be getting lucky that
the malloc is returning zero-pages to us when we are allocating endpoint
structures (possibly because the freelist performs a single large
allocation for all endpoints).

Fixes Cisco bug CSCui88781.

Reviewed-by: rfaucett@cisco.com
Reviewed-by: jsquyres@cisco.com

cmr=v1.7.3:reviewer=jsquyres

This commit was SVN r29075.
2013-08-27 22:43:20 +00:00
Nathan Hjelm
c699ee7812 Update the ompi_info man page with information about variable levels
and improve the behavior of ompi_info.

This commit changes the default behavior of ompi_info --all when a
level is not specified. Instead of assuming level 1 in this case we
now assume level 9. This change is due to feedback from the community
after the introduction of the --level option.

I also added a new option: --selected-only. This option will limit the
displayed variables to components that can be selected (ie. if there
is a selection parameter set-- btl self,sm)

cmr=v1.7.3:reviewer=jsquyres

This commit was SVN r29070.
2013-08-27 19:11:37 +00:00
Nathan Hjelm
2da64eb719 Fix compilation of the MPI tools information interface when profiling
is enabled and fix a bug in the handling of watermark performance
variables.

cmr=v1.7.3:ticket=trac:3725:reviewer=jsquyres

This commit was SVN r29068.

The following Trac tickets were found above:
  Ticket 3725 --> https://svn.open-mpi.org/trac/ompi/ticket/3725
2013-08-27 18:19:18 +00:00
George Bosilca
cf09fe7c99 It wasn't even compiling when heterogeneous support was on.
This commit was SVN r29067.
2013-08-27 16:53:33 +00:00
Nathan Hjelm
f5495ace48 coll/ml: update the coll_ml_enable_fragmentation variable to support the
option to autodetect whether fragmentation should be enabled

cmr=v1.7.3:ticket=trac:3717

This commit was SVN r29065.

The following Trac tickets were found above:
  Ticket 3717 --> https://svn.open-mpi.org/trac/ompi/ticket/3717
2013-08-27 16:36:54 +00:00
Ralph Castain
6d24b34940 Extend the dpm framework API to support persistent accept/connect operations:
* paccept - establish a persistent listening port for async connect requests

* pconnect - async connect to remote process that has posted a paccept port. Provides a timeout mechanism, and allows the underlying implementation to retry until timeout 

* pclose - shuts down a prior paccept posting

Includes example programs paccept.c and pconnect.c in orte/test/mpi. New MPI extension interfaces coming...

This commit was SVN r29063.
2013-08-23 18:02:50 +00:00
Rolf vandeVaart
96457df9bc Fix compile errors created from changeset 29058.
This commit was SVN r29061.
2013-08-22 18:25:23 +00:00
Jeff Squyres
63ac60864b Refs trac:3730
Turns out that AC_CHECK_DECLS is one of the "new style" Autoconf
macros that #defines the output to be 0 or 1 (vs. #define'ing or
#undef'ing it).  So don't check for "#if defined(..."; just check for
"#if ...".

This commit was SVN r29059.

The following Trac tickets were found above:
  Ticket 3730 --> https://svn.open-mpi.org/trac/ompi/ticket/3730
2013-08-22 17:44:20 +00:00
Ralph Castain
a200e4f865 As per the RFC, bring in the ORTE async progress code and the rewrite of OOB:
*** THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE ***

Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro.

***************************************************************************************

I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week.

The code is in  https://bitbucket.org/rhc/ompi-oob2


WHAT:    Rewrite of ORTE OOB

WHY:       Support asynchronous progress and a host of other features

WHEN:    Wed, August 21

SYNOPSIS:
The current OOB has served us well, but a number of limitations have been identified over the years. Specifically:

* it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code)

* we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface.

* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients

* there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort

* only one transport (i.e., component) can be "active"


The revised OOB resolves these problems:

* async progress is used for all application processes, with the progress thread blocking in the event library

* each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on")

* multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC.

* a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions.

* opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object

* NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions

* obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel

* the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport

* routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active

* all blocking send/recv APIs have been removed. Everything operates asynchronously.


KNOWN LIMITATIONS:

* although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline

* the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker

* routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways

* obviously, not every error path has been tested nor necessarily covered

* determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when *all* transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost.

* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways

* the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC

This commit was SVN r29058.
2013-08-22 16:37:40 +00:00
Ralph Castain
16c5b30a1f Since the calls to "PMI get" scale by number of procs (not nodes), it makes more sense to have the MCA param be the cutoff based on number of procs. Also, it occurred to me that this shouldn't impact the nidmap process as that is built and circulated when we launch via mpirun, not during direct launch.
So shift the cutoff param to the MPI layer, and have it solely determine whether or not we call modex_recv on the hostname. If comm_world is of size greater than the cutoff, then we don't automatically retrieve the hostname when we build the ompi_proc_t for a process - instead, we fill the hostname entry on first call to modex_recv for that process.

The param is now "ompi_hostname_cutoff=N", where N=number of procs for cutoff.

Refs trac:3729

This commit was SVN r29056.

The following Trac tickets were found above:
  Ticket 3729 --> https://svn.open-mpi.org/trac/ompi/ticket/3729
2013-08-22 03:40:26 +00:00
Rolf vandeVaart
504fa2cda9 Fix support in smcuda btl so it does not blow up when there is no CUDA IPC support between two GPUs. Also make it so CUDA IPC support is added dynamically.
Fixes ticket 3531.    

This commit was SVN r29055.
2013-08-21 21:00:09 +00:00
Rolf vandeVaart
96fdb060ea Fix compile errors and warnings from changeset 29052.
This commit was SVN r29054.
2013-08-21 19:01:54 +00:00
Steve Wise
67fe3f23ed Use the HAVE_DECL_IBV_LINK_LAYER_ETHERNET macro.
Commit r27211 added ifdef checks for #define
HAVE_IBV_LINK_LAYER_ETHERNET, which is incorrect.  The correct #define
is HAVE_DECL_IBV_LINK_LAYER_ETHERNET.  This broke OMPI over iWARP.

This fixes trac:3726 and should be added to cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29053.

The following SVN revision numbers were found above:
  r27211 --> open-mpi/ompi@b27862e5c7

The following Trac tickets were found above:
  Ticket 3726 --> https://svn.open-mpi.org/trac/ompi/ticket/3726
2013-08-20 20:00:46 +00:00
Ralph Castain
45e695928f As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time:
* add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then we will include hostnames for any allocation smaller than the given limit.

* remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL"

* modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded

* removed a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base

* added an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it a different way. Currently, we support PBS-based systems and SLURM - for any other, rank0 will emit a warning and we assume max number of daemons so we will always retain hostnames

This commit was SVN r29052.
2013-08-20 18:59:36 +00:00
Jeff Squyres
b30ad28276 Remove some unused variables and an unused goto label.
This commit was SVN r29044.
2013-08-19 16:18:35 +00:00
Ralph Castain
e0cfcf376f Okay, fix it so it works both --disable-mpi-profile and --enable-mpi-profile. I'm not sure why mpit's library has to be treated differently, but it seems that it needs some special care to work in both scenarios
Refs trac:3725

This commit was SVN r29043.

The following Trac tickets were found above:
  Ticket 3725 --> https://svn.open-mpi.org/trac/ompi/ticket/3725
2013-08-19 14:48:23 +00:00
Ralph Castain
b730c9540e Fix --disable-mpi-profile option so it can build
cmr:v1.7.3:reviewer=hjelmn

This commit was SVN r29041.
2013-08-18 18:22:34 +00:00
Ralph Castain
611d7f9f6b When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require.
This creates a really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RM's like Slurm and Alps, but we make things worse by calling "get" so many times.

Nathan (with a tad advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes:

* upon first request for data, have the OPAL db pmi component fetch and decode *all* the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally

* reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test

* reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued).

Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it

Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time.

This commit was SVN r29040.
2013-08-17 00:49:18 +00:00
Ralph Castain
90cfd139cf Cleanup error - need an "and" instead of an "or"
This commit was SVN r29037.
2013-08-16 21:41:59 +00:00
Ralph Castain
f8a72feb25 Silence unitialized var warning
This commit was SVN r29036.
2013-08-16 21:39:28 +00:00
Ralph Castain
c5f395d36a Silence unitialized var warnings
This commit was SVN r29035.
2013-08-16 21:37:35 +00:00
Ralph Castain
c74c54e18d Cleanup uninitialized warnings
This commit was SVN r29033.
2013-08-16 21:23:09 +00:00
Ralph Castain
33beab5918 Avoid segfault due to uninitialized variable
This commit was SVN r29030.
2013-08-16 21:10:38 +00:00
Ralph Castain
7d2e3028d6 Add unique info_key to documentation
This commit was SVN r29029.
2013-08-14 04:24:17 +00:00
Ralph Castain
bebe852057 Add new info key for publish that allows user to designate that the port is to be unique - i.e., to return an error if that service has already been published. Default is to overwrite
This commit was SVN r29028.
2013-08-14 04:21:17 +00:00
Ralph Castain
318467c04f If we only have global scope, then don't fall back to looking at local scope if the lookup target wasn't found else we will hang
This commit was SVN r29025.
2013-08-13 04:45:33 +00:00
Nathan Hjelm
6c75699068 coll/ml: fix typo in assert that could cause an abort in debug builds.
cmr=v1.7.3:reviewer=manjugv

This commit was SVN r29024.
2013-08-12 14:31:44 +00:00
Jeff Squyres
c09ec204ad Change usNIC BTL to always use small fragments when there is a
non-contiguous converter.  We can't "convert on the fly" because the #
of bytes requested may not divide evenly into the convertor data type.

This commit was SVN r29014.
2013-08-11 17:04:13 +00:00
Nathan Hjelm
47320713bb coll/ml: do not register variables in open and fix a bug in the coll/ml parser
cmr=v1.7.3:reviewer=pasha

This commit was SVN r29010.
2013-08-09 17:55:30 +00:00
Rolf vandeVaart
cd72024a3c Refactor some of the initialization code.
This commit was SVN r29009.
2013-08-09 14:54:17 +00:00
Edgar Gabriel
f7391eca23 Lazy open does not work for the addproc sharedfp component since it starts by
spawning a process using MPI_Comm_spawn. For this, the first operation has to
be collective which we can not guarantuee outside of the MPI_File_open
operation.

This commit was SVN r29008.
2013-08-06 20:48:20 +00:00
Edgar Gabriel
e348f5567f add unignore for me.
This commit was SVN r29007.
2013-08-06 20:47:08 +00:00
Jeff Squyres
ed130dcef0 Add missing Fortran mpi module TKR implementation for MPI_Get_address
This commit was SVN r29005.
2013-08-06 15:08:00 +00:00
George Bosilca
837b3363fe Silence few warnings.
This commit was SVN r29004.
2013-08-06 09:38:30 +00:00
George Bosilca
710d3836d5 Use a recv convertor for the pack external case.
This commit was SVN r29003.
2013-08-06 09:09:42 +00:00
George Bosilca
4adaaa0b2b Fix the profiling prototypes and the copyright.
This commit was SVN r29000.
2013-08-05 21:07:32 +00:00
George Bosilca
a938f8fcc5 Add all missing prototypes for the _x functions.
This commit was SVN r28999.
2013-08-05 20:49:31 +00:00
George Bosilca
47b1128993 It must be an MPI_Count.
This commit was SVN r28998.
2013-08-05 20:49:00 +00:00
Brian Barrett
2cc947513b * Fix some compile errors
* Need to subtract 1 off the size so that we stay in the bit length requirements

This commit was SVN r28997.
2013-08-05 18:49:48 +00:00
Jeff Squyres
87910daf51 Fix a collection of bugs found by QA and Coverity, and make some minor
improvements:

* Fix minor memory leaks during component_init
* Ensure that an initialization loop does not underflow an unsigned int
* Improve mlock limit checking
* Fix set of BTL modules created during component_init when failing to
  get QP resources or otherwise excluding some (but not all) usnic
  verbs devices
* Fix/improve error messages to be consistent with other Cisco
  documentation
* Randomize the initial sliding window sequence number so that we
  silently drop incoming frames from previous jobs that still have
  existant processes in the middle of dying (and are still
  transmitting) 
* Ensure we don't break out of add_procs too soon and create an
  asymetrical view of what interfaces are available

This commit was SVN r28975.
2013-08-01 16:56:15 +00:00
Nathan Hjelm
8429485a39 mpool/grdma: use the rcache even if not using mpi_leave_pinned or
mpi_leave_pinned_pipeline

This change should improve performance is the non-pinned case where
the same memory region is involved in multiple simultaneous transfers.

cmr=v1.7.3:reviewer=brbarret

This commit was SVN r28973.
2013-07-31 23:50:41 +00:00
Matthias Jurenz
5c43ae156c Fixed # 3704
This commit was SVN r28967.
2013-07-31 07:38:24 +00:00
Jeff Squyres
c7ff45d046 This define is not needed in mpi.h (it's private to the
implementation, and is already included in opal_config.h)

This commit was SVN r28964.
2013-07-30 21:33:19 +00:00
Nathan Hjelm
befcd8b63e Disable mpi_param_check by default if parameter checking is compiled out.
cmr=v1.7.3:reviewer=rhc

This commit was SVN r28960.
2013-07-26 17:44:10 +00:00
Nathan Hjelm
1382d4fb53 Fix typos in _Complex ops
This commit was SVN r28959.
2013-07-26 17:02:45 +00:00
Nathan Hjelm
9c519c5ce8 Bump MPI version to 2.2 now that in place alltoall and reductions on MPI_C_*COMPLEX and MPI_CXX_*COMPLEX are supported
This commit was SVN r28958.
2013-07-26 15:51:38 +00:00
Jeff Squyres
c59770651f Propagate the use of MPI_COUNT_KIND everywhere.
This commit was SVN r28956.
2013-07-25 22:41:48 +00:00
Jeff Squyres
a001f01f05 Remove the other bogus KIND from here; it looks like old kruft
This commit was SVN r28955.
2013-07-25 22:41:12 +00:00
Nathan Hjelm
7097ba3992 MPI_COUNT_KIND is defined in mpif-config.h
This commit was SVN r28954.
2013-07-25 21:42:27 +00:00
Nathan Hjelm
e4f105ffb3 revert change that shouldn't have been part of r28952
This commit was SVN r28953.

The following SVN revision numbers were found above:
  r28952 --> open-mpi/ompi@cb90a4a7fc
2013-07-25 20:23:55 +00:00
Nathan Hjelm
cb90a4a7fc Add simple algorithms to support MPI_IN_PLACE for MPI_Alltoall, MPI_Alltoallv, and MPI_Alltoallw.
Working on faster algorithms for tuned that will come at a later time.

cmr=v1.7.3:ticket=trac:2965

This commit was SVN r28952.

The following Trac tickets were found above:
  Ticket 2965 --> https://svn.open-mpi.org/trac/ompi/ticket/2965
2013-07-25 19:19:41 +00:00
Nathan Hjelm
99adeb7f6e Fix support for complex datatypes when fortran is not available but _Complex is
This commit was SVN r28951.
2013-07-25 19:08:21 +00:00
Edgar Gabriel
012f99c3b6 ompi_ignore this component until we find a solution for the dependence on
libmpi for the external process that is being spawned by this component.

This commit was SVN r28945.
2013-07-24 20:02:47 +00:00
Nathan Hjelm
22868b9f68 MPI_T: add man pages for MPI_T_* functions and fix typos in tool file names
This commit was SVN r28943.
2013-07-24 18:19:40 +00:00
Jeff Squyres
f7337b8f77 Correct faulty max payload and MTU computations (and update some
debugging that helped us find those).

This commit was SVN r28942.
2013-07-24 16:06:28 +00:00
Ralph Castain
db214a2321 Refs trac:3697 - use the opal_pmi_error function instead of ompi_error as the returned error codes are from PMI
This commit was SVN r28941.

The following Trac tickets were found above:
  Ticket 3697 --> https://svn.open-mpi.org/trac/ompi/ticket/3697
2013-07-24 04:05:41 +00:00
Jeff Squyres
5323051047 Use sysfs to check MPI has enough VFs, QPs, and CQs
Use the new sysfs files to check that there are enough VFs, QPs, and
CQs for all the MPI processes on this server.
    
Move the checking code into its own subroutine to make it smaller and
easier to read/grok.

This commit was SVN r28937.
2013-07-24 00:38:32 +00:00
Nathan Hjelm
90f5bd9424 Add missing f08 binding declarations for MPI_Count functions
This commit was SVN r28936.
2013-07-23 19:00:06 +00:00
Rolf vandeVaart
a3995f73d3 Update trunk to match OMPI 1.7.3 due to code reviews.
This commit was SVN r28934.
2013-07-23 17:58:21 +00:00
Nathan Hjelm
c6e586a81d MPI-3: fortran support for large counts using derived datatypes
Jeff:
 - Make sure not to go over 72 characters.  Love Fortran!
 - Ensure to include 'mpif-config.h' in Type_size_x.

This commit was SVN r28933.
2013-07-23 15:36:03 +00:00
Nathan Hjelm
c4c69b4ddf MPI-3: add support for large counts using derived datatypes
Add support for MPI_Count type and MPI_COUNT datatype and add the required
MPI-3 functions MPI_Get_elements_x, MPI_Status_set_elements_x,
MPI_Type_get_extent_x, MPI_Type_get_true_extent_x, and MPI_Type_size_x.
This commit adds only the C bindings. Fortran bindins will be added in
another commit. For now the MPI_Count type is define to have the same size
as MPI_Offset. The type is required to be at least as large as MPI_Offset
and MPI_Aint. The type was initially intended to be a ssize_t (if it was
the same size as a long long) but there were issues compiling romio with
that definition (despite the inclusion of stddef.h).

I updated the datatype engine to use size_t instead of uint32_t to support
large datatypes. This will require some review to make sure that 1) the
changes are beneficial, 2) nothing was broken by the change (I doubt
anything was), and 3) there are no performance regressions due to this
change.

Increase the maximum number of predifined datatypes to support MPI_Count

Put common get_elements code to ompi/datatype/ompi_datatype_get_elements.c

Update MPI_Get_count to reflect changes in MPI-3 (return MPI_UNDEFINED when the count is too large for an int)

This commit was SVN r28932.
2013-07-23 15:35:14 +00:00
Matthias Jurenz
c4a7dded5f Changes to VT:
- configure: Removed double slashes in path names which make trouble when building RPMs on Fedora (see #3688)

This commit was SVN r28924.
2013-07-23 08:12:18 +00:00
Nathan Hjelm
1349b825c2 MPI-2.2: Add C++ datatypes to mpi.h and fix support for MPI_C_*COMPLEX
This commit was SVN r28919.
2013-07-22 23:45:45 +00:00
Ralph Castain
59a71765cf Hmmm...these error outputs will never occur, which is probably not what the author intended. So do the output and THEN jump to the error exit.
This commit was SVN r28918.
2013-07-22 22:58:03 +00:00
Edgar Gabriel
8ffc1aac89 update the _component.c files in ompio to use the explicit assignment of the
mca_register_component_params element of the structure.

This commit was SVN r28914.
2013-07-22 21:11:05 +00:00
Nathan Hjelm
b17cd13c09 sharedfp: ensure sharedfp components register their parameters in mca_register_component_params not mca_component_open
This commit was SVN r28910.
2013-07-22 17:53:58 +00:00
Jeff Squyres
b437041aeb Update one more comment.
This commit was SVN r28908.
2013-07-22 17:29:00 +00:00
Jeff Squyres
4b6006402d Use the RTE framework instead of calling ORTE directly.
Brian (rightfully) hit me on the head with the
don't-use-ORTE-use-the-rte-framework clue bat; the usnic BTL now
nicely plays with the RTE framework.

This commit was SVN r28907.
2013-07-22 17:28:23 +00:00
Jeff Squyres
ca9da8a554 Fix minor typo in the comments/docs.
This commit was SVN r28905.
2013-07-22 17:24:17 +00:00