1
1
Граф коммитов

18582 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
497c7e6abb Fixes trac:2904
The intercomm "merge" function can create a linkage between procs that was not reflected anywhere in a modex, and so at least some of the procs in the resulting communicator don't know how to talk to some of the new communicator's peers.

For example, consider the case where:

1. parent job A comm_spawns a process (job B) - these processes exchange modex and can communicate

2. parent job A now comm_spawns another process (job C) - again, these can communicate, but the proc in C knows nothing of B

3. do an intercomm merge across the communicators created by the two comm_spawns. This puts B and C into the same communicator, but they know nothing about how to talk to each other as they were not involved in any exchange of contact info. Hence, collectives on that communicator now fail. 

This fix adds an API to the ompi/dpm framework that (a) exchanges the modex info across the procs in the merge to ensure all procs know how to communicate, and (b) calls add_procs to give the btl's a chance to select transports to any new procs.

cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29166.

The following Trac tickets were found above:
  Ticket 2904 --> https://svn.open-mpi.org/trac/ompi/ticket/2904
2013-09-15 15:00:40 +00:00
Jeff Squyres
df7654e8cf 1. Per my previous email
(http://www.open-mpi.org/community/lists/devel/2013/09/12889.php), I
renamed all "f77" and "f90" directory/file names to "fortran"
(including removing shmemf77 / shmemf90 wrapper compilers and
replacing them with "shmemfort").

2. Fixed several Fortran coding errors.

3. Removed lots of old/stale comments that were clearly the result of
copying from the OMPI layer and then not cleaning up afterwards (i.e.,
the comments were wholly inaccurate in the oshmem layer).

4. Removed both redundant and harmful code from oshmem_config.h.in.

5. Temporarily slave building the oshmem Fortran bindings to
--enable-mpi-fortran.  This doesn't seem like a good long-term
solution, but at least you can now build all Fortran bindings (MPI +
oshmem) or not. *** SEE MY NOTE IN config/oshmem_configure_options.m4
FOR WORK THAT STILL NEEDS TO BE DONE!

This commit was SVN r29165.
2013-09-15 09:32:07 +00:00
Joshua Ladd
027e7deb7f Adding more fixes to stomp casting/addressing issues on 32-bit systems.
This commit was SVN r29164.
2013-09-13 20:37:30 +00:00
Rolf vandeVaart
096b8c022e Also add flag to debug output.
This commit was SVN r29163.
2013-09-13 19:47:05 +00:00
Rolf vandeVaart
c15b2a26b8 Fix some formatting. Move some CUDA-aware mca parameter initialization earlier.
This commit was SVN r29162.
2013-09-13 17:43:41 +00:00
Ralph Castain
5da795247d Attempt to resolve the nightly build problem
Refs trac:3763

This commit was SVN r29161.

The following Trac tickets were found above:
  Ticket 3763 --> https://svn.open-mpi.org/trac/ompi/ticket/3763
2013-09-13 14:50:07 +00:00
Rolf vandeVaart
d247c26b84 In the case that HAVE_IBV_FORK_INIT is not defined, we will need this variable so we can give the user an error if they ask for it. Also fixes compile error when HAVE_IBV_FORK_INIT is not defined.
This commit was SVN r29160.
2013-09-13 14:38:49 +00:00
Rolf vandeVaart
ba9ec1b8bc For debug builds, add the ability to view memory registrations and deregistrations in the openib BTL.
This commit was SVN r29159.
2013-09-13 14:28:26 +00:00
Joshua Ladd
936c42a872 This commit 1. Fixes the void pointer casting to 64-bit integer issue in shmem_lock.c, line 170. 2. Applies the patch provided by George to add support for Intel (12.1.020110811) compiler in OSHMEM. 3. Fixes the configure warning generated by AC_TRY_RUN - disable mxm atomic locks if cross compiling.
This commit was SVN r29158.
2013-09-12 20:54:55 +00:00
Ralph Castain
eb132f923b Check for bozo error of negative np for an app as this will cause ORTE to spin forever.
cmr:v1.7.3:reviewer=jsquyres:subject=Check for negative np
cmr:v1.6.6:reviewer=jsquyres:subject=Check for negative np

This commit was SVN r29157.
2013-09-11 19:21:22 +00:00
Ralph Castain
3cf3b88bd1 Fix typo in function name
This commit was SVN r29156.
2013-09-11 03:40:13 +00:00
Ralph Castain
6abe030e99 Silence all the svn crud by adding svn ignores
This commit was SVN r29155.
2013-09-10 16:40:28 +00:00
Ralph Castain
845e92bc5d Remove the old version of hwloc. Update the new one to reflect the official release dates.
Refs trac:3744

This commit was SVN r29154.

The following Trac tickets were found above:
  Ticket 3744 --> https://svn.open-mpi.org/trac/ompi/ticket/3744
2013-09-10 16:30:13 +00:00
Joshua Ladd
b3f88c4a1d Per the RFC schedule, this commit adds Mellanox OpenSHMEM to the trunk. It does not yet run on OSX or with CM PML for an MTL other than MXM. Mellanox is aware of these issues and is in the process of resolving them. This should be added to \ncmr=v1.7.4:subject=Move OSHMEM to 1.7.4:reviewer=rhc
This commit was SVN r29153.
2013-09-10 15:34:09 +00:00
Ralph Castain
46ed907003 Correctly handle list of cores specified in the rankfile - i.e., a rankfile entry such as:
rank 0=foo slot=0:0-1;1:0,1

cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29152.
2013-09-08 02:04:29 +00:00
Jeff Squyres
c9f05a2664 Delineate OMPI_FREE_LIST_*_MT separately.
The FREE_LIST_*_MT stuff was introduced on the SVN trunk in r28722
(2013-07-04), but so far, has not been merged into the v1.7 branch yet
(2013-09-06).  So put it in its own #ifdef, rather than defining it
based on OMPI_MAJOR_VERSION/OMPI_MINOR_VERSION.

This commit was SVN r29148.

The following SVN revision numbers were found above:
  r28722 --> open-mpi/ompi@c9e5ab9ed1
2013-09-06 19:22:56 +00:00
Jeff Squyres
e02cc0a7ec No need for this header file.
This commit was SVN r29147.
2013-09-06 19:22:28 +00:00
Ralph Castain
0da3968ade Update this script to output diff files
This commit was SVN r29146.
2013-09-06 19:08:12 +00:00
Alex Margolin
50a3c01a0f fixed build without thread support
This commit was SVN r29145.
2013-09-06 19:03:19 +00:00
Jeff Squyres
c53b0890cf Ensure that btl_usnic_compat.h is in the tarball.
This commit was SVN r29140.
2013-09-06 15:53:56 +00:00
Ralph Castain
2a116ecdfc Fix a race condition created when two processes attempt to send to each other at the same time. This causes both processes to start connection procedures, resulting in a c
onflict that can cause messages to be lost. Add detection of this condition, and have both processes cancel their connect operations. The process with the higher rank will
 reconnect, while the lower rank process will simply wait for the connection to be created.

Refs trac:3696

This commit was SVN r29139.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-09-06 05:15:25 +00:00
Dave Goodell
75fa28c303 usnic: v1.6<->trunk unification, trunk side
The Cisco-maintained v1.6 port of the usnic BTL has diverged from the
upstream trunk and v1.7 branches.  This commit adjusts the trunk to more
closely match the v1.6 branch to simplify future merging and
cherry-picking.

The usnic MCA parameters also need work on this side.

Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29138.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:21:34 +00:00
Dave Goodell
a669bd01e6 usnic: revamp convertor handling.
The fix for the HPL SEGV was incorrect because it assumed the
prepare_src() routine was always allowed to return "bytes processed"
less than the requested "bytes to send".  It turns out this is only true
if the convertor is what limits the size, we are not allowed to limit
the data sent for our own reasons, else we break login in the upper
layers.

This means we need to learn the number of bytes out of the size
requested the convertor will give us, no matter how big the size is.
Unfortunately, this is a destructive test, and (currently) the only way to
learn that number is to actually have the convertor copy the data out into
buffers.

This change implements this, copying the entire data out into a chain of
send segments which are attached to the large send fragment.  Now we can
always return the proper size value to the PML.

Fixes Cisco bug CSCuj08024

Authored-by: Reese Faucette <rfaucett@cisco.com>

Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29137.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:21:21 +00:00
Dave Goodell
0ef8336502 new bookkeeping code should return value indicating whether packet is good or not.
Authored-by: Reese Faucette <rfaucett@cisco.com>

Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29136.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:19:32 +00:00
Dave Goodell
122890c2fd usnic: "bookeeping" --> "bookkeeping"
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29135.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:19:20 +00:00
Dave Goodell
0df6ed4acc usnic: squash warnings from perf improvements
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29134.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:19:08 +00:00
Dave Goodell
6dc54d372d usnic: Basket of performance changes including:
- round segment buffer allocation to cache-line
    - split some routines into an inline fast section and a called
      slower section
    - introduce receive fastpath in component_progress that:
        o returns immediately if there is a packet available on priority
          queue and fastpath is enabled
        o disables fastpath for 1 time after use to provide fairness to
          other processing
        o defers receive buffer posting
        o defers bookeeping for receive until next call
          to usnic_component_progress

Authored-by: Reese Faucette <rfaucett@cisco.com>

Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29133.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:18:57 +00:00
Dave Goodell
e44337742f add Reese Faucette to the AUTHORS file
Reese actually authored several usnic BTL changes prior to this commit,
but they were committed on his behalf by Jeff or me.

cmr=v1.7.3:reviewer=jsquyres

This commit was SVN r29132.
2013-09-06 03:10:03 +00:00
Ralph Castain
e8697de521 Deal with PGI compilers on the Mac by initializing a global variable.
cmr:v1.6.6:reviewer=jsquyres
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29129.
2013-09-05 21:40:50 +00:00
Ralph Castain
13ae51a91b Protect against possible race conditions and threads by ensuring that rml send always occurs inside an event.
cmr:v1.7.4:reviewer=jsquyres:subject=Protect against race conditions in rml send

This commit was SVN r29128.
2013-09-05 01:16:32 +00:00
Dave Goodell
9cab9777d9 usnic: properly destroy embedded small send frag
Without this, an `--enable-debug` build would hit an assertion in the
list code when run under valgrind with `--malloc-fill=0xff` or any other
case where malloc returned non-zeroed buffers.

Also allow the normal OBJ_ machinery to handle the constructor
invocation ordering for us instead of doing it by hand (which could have
led to future bugs).

Reviewed-by: jsquyres@cisco.com

cmr=v1.7.4

Depends on trunk functionality in r29095 and r29096.  Refs trac:3740,#3741.

This commit was SVN r29127.

The following SVN revision numbers were found above:
  r29095 --> open-mpi/ompi@d1b5940e97
  r29096 --> open-mpi/ompi@a552921171

The following Trac tickets were found above:
  Ticket 3740 --> https://svn.open-mpi.org/trac/ompi/ticket/3740
2013-09-04 20:59:12 +00:00
Ralph Castain
0d7fb932f1 Remove build product file
Refs trac:3744

This commit was SVN r29120.

The following Trac tickets were found above:
  Ticket 3744 --> https://svn.open-mpi.org/trac/ompi/ticket/3744
2013-09-04 16:38:22 +00:00
Ralph Castain
d32dfc96be Use the rankfile to obtain list of nodes for VM launch if/when rankfile is given.
cmr:v1.7.3:reviewer=jsquyres:subject=Obtain VM nodes from rankfile

This commit was SVN r29119.
2013-09-04 16:37:30 +00:00
Jeff Squyres
4bd5023593 Sync with 1.6.6 NEWS bullets.
This commit was SVN r29116.
2013-09-04 14:47:34 +00:00
George Bosilca
2aed876be6 Fix the bug where MPI_is_thread_main returns true for all
threads when not in MPI_THREAD_MULTIPLE mode.
Thanks to Lisandro Dalcin for pointing it out.

This commit was SVN r29113.
2013-09-04 11:10:51 +00:00
Ralph Castain
d9f0505952 Fix the lama verbose outputs so they don't segfault if someone asks for verbose output, but isn't using lama
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29108.
2013-09-03 17:55:35 +00:00
Ralph Castain
6011a4d29c As per the telecon, update hwloc to v1.7.2 so we can add MIC support. Ignore hwloc1.5.2 component for now until this tests out - will remove it then.
cmr:v1.7.4:reviewer=jsquyres

This commit was SVN r29107.
2013-09-03 16:23:42 +00:00
Ralph Castain
2bfa99e945 If a rankfile is given and the number of procs not specified in the mpirun cmd line, then set the number of procs to the number of ranks in the rankfile
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29104.
2013-09-02 15:04:40 +00:00
Jeff Squyres
f6619f8e9e Fix compile error in the heterogeneous case.
We've been forcing C99 compiler compliance for a while now, so use C99
syntax to keep the #if code tidy.

This commit was SVN r29101.
2013-08-31 12:56:08 +00:00
Brian Barrett
16a1166884 Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a
configure-time dynamic allocation of flags.  The net result for platforms
which only support BTL-based communication is a reduction of 8*nprocs bytes
per process.  Platforms which support both MTLs and BTLs will not see
a space reduction, but will now be able to safely run both the MTL and BTL
side-by-side, which will prove useful.

This commit was SVN r29100.
2013-08-30 16:54:55 +00:00
Rolf vandeVaart
18962d296b This has bothered me for a while. Change MCA_BTL_TAG_BTL to MCA_BTL_TAG_IB. They are the same
value so this does not change anything.  (MCA_BTL_TAG_IB = MCA_BTL_TAG_BTL + 0).  This just makes it more correct.

This commit was SVN r29099.
2013-08-30 14:53:59 +00:00
Dave Goodell
c5a7e8a079 usnic: stomp format specifier warnings
The usnic BTL now builds cleanly under `--enable-picky` when `MSGDEBUG1`
is set.

Reviewed-by: jsquyres

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29097.
2013-08-29 23:24:14 +00:00
Dave Goodell
a552921171 call element destructors in ompi_free_list_destruct
The free list code called the constructor for each object it
slab allocated in ompi_free_list_grow.  This permits free list-managed
elements to safely allocate/deallocate resources in their
constructors/destructors without leaking.

It's probably best to let this soak on the trunk a little while before
moving it over to v1.7.

Reviewed-by: bosilca

cmr=v1.7.4:reviewer=bosilca

This commit was SVN r29096.
2013-08-29 22:56:31 +00:00
Dave Goodell
d1b5940e97 don't call OMPI_REQUEST_INIT in req constructor
Calling OMPI_REQUEST_INIT puts the request into an _INACTIVE state
instead of an _INVALID state, which we don't want if it's been
simply been constructed, e.g., in a free_list.  Without this change a
future change to call destructors at free list destruction time will
result in request dtor state assertion failures.

Reviewed-by: bosilca

cmr=v1.7.4:reviewer=bosilca

This commit was SVN r29095.
2013-08-29 22:56:22 +00:00
Ralph Castain
43d1cd92ac Ensure we activate the "daemons launched" state when only the HNP is left or else we will hang.
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29094.
2013-08-29 22:50:51 +00:00
Dave Goodell
d17f104e7a oob: squash some valgrind warnings
These warnings were harmless, but they appeared even for simple programs
like single-process runs of `ring_c`.

This commit was SVN r29093.
2013-08-29 21:08:44 +00:00
Ralph Castain
5d1fa4fa0e Silence warnings:
osc_pt2pt_data_move.c: In function 'ompi_osc_pt2pt_sendreq_recv_accum_long_cb':
osc_pt2pt_data_move.c:643:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
osc_rdma_data_move.c: In function 'ompi_osc_rdma_control_send_cb':
osc_rdma_data_move.c:1312:37: warning: variable 'header' set but not used [-Wunused-but-set-variable]

This commit was SVN r29092.
2013-08-29 20:56:36 +00:00
Ralph Castain
12d4f45b5e Silence warning:
oob_tcp_connection.c: In function 'mca_oob_tcp_peer_accept':
oob_tcp_connection.c:725:9: warning: variable 'cmpval' set but not used [-Wunused-but-set-variable]

Refs trac:3696

This commit was SVN r29091.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-08-29 20:56:05 +00:00
Ralph Castain
7a7cfdd519 A little cleanup - the base function to sort numa lists must return something or you get a warning about non-void function returning without value, so cleanup the return values. Ensure the mindist module actually checks for a return of "error" so it won't segfault, and have it emit a polite message when that happens.
cmr:v1.7.3:reviewer=jladd

This commit was SVN r29089.
2013-08-29 20:01:06 +00:00
Ralph Castain
3516348aad We don't need to report errors in pmi_setup as it is possible that PMI is available, but that we weren't launched under it (e.g., we launched via mpirun).
cmr:v1.7.3:reviewer=hjelmn:subject="Silence unnecessary PMI error msgs"

This commit was SVN r29086.
2013-08-29 16:35:20 +00:00