Commit graph

18568 commits

Author SHA1 Message Date
Ralph Castain
46ed907003 Correctly handle list of cores specified in the rankfile - i.e., a rankfile entry such as:
rank 0=foo slot=0:0-1;1:0,1

cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29152.
2013-09-08 02:04:29 +00:00
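The rankfile entry quoted in commit 46ed907003 can be read as a `;`-separated list of `socket:cores` items, where the cores are comma-separated values or ranges. The Python sketch below illustrates that reading only. It is not Open MPI's parser, and the function name `parse_slot_spec` and the socket/core interpretation of each item are assumptions made for this example.

```python
def parse_slot_spec(spec):
    """Parse a slot string such as '0:0-1;1:0,1' into {socket: [cores]}.

    Assumption: each ';'-separated item has the form 'socket:cores', and
    'cores' is a comma-separated list of single cores or 'lo-hi' ranges.
    """
    result = {}
    for item in spec.split(";"):
        socket_str, cores_str = item.split(":")
        cores = []
        for part in cores_str.split(","):
            if "-" in part:
                lo, hi = part.split("-")
                cores.extend(range(int(lo), int(hi) + 1))  # inclusive range
            else:
                cores.append(int(part))
        result.setdefault(int(socket_str), []).extend(cores)
    return result


print(parse_slot_spec("0:0-1;1:0,1"))
```

Under these assumptions, the range form `0-1` and the list form `0,1` both expand to cores 0 and 1.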
Jeff Squyres
c9f05a2664 Delineate OMPI_FREE_LIST_*_MT separately.
The FREE_LIST_*_MT stuff was introduced on the SVN trunk in r28722
(2013-07-04) but, as of 2013-09-06, has not yet been merged into the
v1.7 branch.  So put it in its own #ifdef, rather than defining it
based on OMPI_MAJOR_VERSION/OMPI_MINOR_VERSION.

This commit was SVN r29148.

The following SVN revision numbers were found above:
  r28722 --> open-mpi/ompi@c9e5ab9ed1
2013-09-06 19:22:56 +00:00
Jeff Squyres
e02cc0a7ec No need for this header file.
This commit was SVN r29147.
2013-09-06 19:22:28 +00:00
Ralph Castain
0da3968ade Update this script to output diff files
This commit was SVN r29146.
2013-09-06 19:08:12 +00:00
Alex Margolin
50a3c01a0f fixed build without thread support
This commit was SVN r29145.
2013-09-06 19:03:19 +00:00
Jeff Squyres
c53b0890cf Ensure that btl_usnic_compat.h is in the tarball.
This commit was SVN r29140.
2013-09-06 15:53:56 +00:00
Ralph Castain
2a116ecdfc Fix a race condition created when two processes attempt to send to each other at the same time. This causes both processes to start connection procedures, resulting in a conflict that can cause messages to be lost. Add detection of this condition, and have both processes cancel their connect operations. The process with the higher rank will reconnect, while the lower rank process will simply wait for the connection to be created.

Refs trac:3696

This commit was SVN r29139.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-09-06 05:15:25 +00:00
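The tie-break described in commit 2a116ecdfc can be sketched as a pure decision function: once both sides detect the simultaneous connect and cancel, the higher rank reconnects and the lower rank waits. This is an illustrative Python sketch, not the OOB code itself. The function name `resolve_simultaneous_connect` and the string return values are hypothetical.

```python
def resolve_simultaneous_connect(my_rank, peer_rank):
    """Decide what this process does after both sides cancel their connects.

    Hypothetical helper: the higher rank reconnects, the lower rank waits,
    so exactly one connection attempt is made.
    """
    if my_rank == peer_rank:
        raise ValueError("a process does not race with itself")
    return "reconnect" if my_rank > peer_rank else "wait"


# Both endpoints evaluate the same rule and reach complementary decisions.
print(resolve_simultaneous_connect(3, 1), resolve_simultaneous_connect(1, 3))
```

Because both processes apply the same deterministic rule to the same pair of ranks, they cannot both reconnect or both wait.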
Dave Goodell
75fa28c303 usnic: v1.6<->trunk unification, trunk side
The Cisco-maintained v1.6 port of the usnic BTL has diverged from the
upstream trunk and v1.7 branches.  This commit adjusts the trunk to more
closely match the v1.6 branch to simplify future merging and
cherry-picking.

The usnic MCA parameters also need work on this side.

Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29138.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:21:34 +00:00
Dave Goodell
a669bd01e6 usnic: revamp convertor handling.
The fix for the HPL SEGV was incorrect because it assumed the
prepare_src() routine was always allowed to return "bytes processed"
less than the requested "bytes to send".  It turns out this is only true
if the convertor is what limits the size; we are not allowed to limit
the data sent for our own reasons, else we break logic in the upper
layers.

This means we need to learn the number of bytes out of the size
requested the convertor will give us, no matter how big the size is.
Unfortunately, this is a destructive test, and (currently) the only way to
learn that number is to actually have the convertor copy the data out into
buffers.

This change implements this, copying the entire data out into a chain of
send segments which are attached to the large send fragment.  Now we can
always return the proper size value to the PML.

Fixes Cisco bug CSCuj08024

Authored-by: Reese Faucette <rfaucett@cisco.com>

Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29137.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:21:21 +00:00
Dave Goodell
0ef8336502 new bookkeeping code should return value indicating whether packet is good or not.
Authored-by: Reese Faucette <rfaucett@cisco.com>

Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29136.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:19:32 +00:00
Dave Goodell
122890c2fd usnic: "bookeeping" --> "bookkeeping"
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29135.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:19:20 +00:00
Dave Goodell
0df6ed4acc usnic: squash warnings from perf improvements
Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29134.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:19:08 +00:00
Dave Goodell
6dc54d372d usnic: Basket of performance changes including:
- round segment buffer allocation to cache-line
- split some routines into an inline fast section and a called
  slower section
- introduce receive fastpath in component_progress that:
    o returns immediately if there is a packet available on priority
      queue and fastpath is enabled
    o disables fastpath for 1 time after use to provide fairness to
      other processing
    o defers receive buffer posting
    o defers bookkeeping for receive until next call
      to usnic_component_progress

Authored-by: Reese Faucette <rfaucett@cisco.com>

Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29133.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:18:57 +00:00
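The receive fastpath listed in commit 6dc54d372d returns early when a priority packet is ready, then disables itself for one call so other processing gets a turn. The Python sketch below models only that alternation. The class `ProgressEngine`, its fields, and the `log` list are hypothetical names, and the real deferred bookkeeping and buffer posting are reduced to a comment.

```python
from collections import deque


class ProgressEngine:
    """Toy model of a progress function with a self-disabling fastpath."""

    def __init__(self, packets):
        self.priority_queue = deque(packets)
        self.fastpath_ok = True
        self.log = []  # records which path each call took

    def progress(self):
        if self.fastpath_ok and self.priority_queue:
            # Fastpath: hand back the packet immediately and disable the
            # fastpath for the next call to give other work a chance.
            self.fastpath_ok = False
            self.log.append("fast")
            return self.priority_queue.popleft()
        # Slow path: deferred bookkeeping, buffer posting and the other
        # queues would be handled here. Re-enable the fastpath afterwards.
        self.fastpath_ok = True
        self.log.append("slow")
        return self.priority_queue.popleft() if self.priority_queue else None


engine = ProgressEngine(["p0", "p1", "p2"])
for _ in range(3):
    engine.progress()
print(engine.log)
```

With three packets queued, the calls alternate between the two paths, which is the fairness property the commit message describes.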
Dave Goodell
e44337742f add Reese Faucette to the AUTHORS file
Reese actually authored several usnic BTL changes prior to this commit,
but they were committed on his behalf by Jeff or me.

cmr=v1.7.3:reviewer=jsquyres

This commit was SVN r29132.
2013-09-06 03:10:03 +00:00
Ralph Castain
e8697de521 Deal with PGI compilers on the Mac by initializing a global variable.
cmr:v1.6.6:reviewer=jsquyres
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29129.
2013-09-05 21:40:50 +00:00
Ralph Castain
13ae51a91b Protect against possible race conditions and threads by ensuring that rml send always occurs inside an event.
cmr:v1.7.4:reviewer=jsquyres:subject=Protect against race conditions in rml send

This commit was SVN r29128.
2013-09-05 01:16:32 +00:00
Dave Goodell
9cab9777d9 usnic: properly destroy embedded small send frag
Without this, an `--enable-debug` build would hit an assertion in the
list code when run under valgrind with `--malloc-fill=0xff` or any other
case where malloc returned non-zeroed buffers.

Also allow the normal OBJ_ machinery to handle the constructor
invocation ordering for us instead of doing it by hand (which could have
led to future bugs).

Reviewed-by: jsquyres@cisco.com

cmr=v1.7.4

Depends on trunk functionality in r29095 and r29096.  Refs trac:3740,#3741.

This commit was SVN r29127.

The following SVN revision numbers were found above:
  r29095 --> open-mpi/ompi@d1b5940e97
  r29096 --> open-mpi/ompi@a552921171

The following Trac tickets were found above:
  Ticket 3740 --> https://svn.open-mpi.org/trac/ompi/ticket/3740
2013-09-04 20:59:12 +00:00
Ralph Castain
0d7fb932f1 Remove build product file
Refs trac:3744

This commit was SVN r29120.

The following Trac tickets were found above:
  Ticket 3744 --> https://svn.open-mpi.org/trac/ompi/ticket/3744
2013-09-04 16:38:22 +00:00
Ralph Castain
d32dfc96be Use the rankfile to obtain list of nodes for VM launch if/when rankfile is given.
cmr:v1.7.3:reviewer=jsquyres:subject=Obtain VM nodes from rankfile

This commit was SVN r29119.
2013-09-04 16:37:30 +00:00
Jeff Squyres
4bd5023593 Sync with 1.6.6 NEWS bullets.
This commit was SVN r29116.
2013-09-04 14:47:34 +00:00
George Bosilca
2aed876be6 Fix the bug where MPI_Is_thread_main returns true for all
threads when not in MPI_THREAD_MULTIPLE mode.
Thanks to Lisandro Dalcin for pointing it out.

This commit was SVN r29113.
2013-09-04 11:10:51 +00:00
Ralph Castain
d9f0505952 Fix the lama verbose outputs so they don't segfault if someone asks for verbose output, but isn't using lama
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29108.
2013-09-03 17:55:35 +00:00
Ralph Castain
6011a4d29c As per the telecon, update hwloc to v1.7.2 so we can add MIC support. Ignore hwloc1.5.2 component for now until this tests out - will remove it then.
cmr:v1.7.4:reviewer=jsquyres

This commit was SVN r29107.
2013-09-03 16:23:42 +00:00
Ralph Castain
2bfa99e945 If a rankfile is given and the number of procs is not specified in the mpirun cmd line, then set the number of procs to the number of ranks in the rankfile
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29104.
2013-09-02 15:04:40 +00:00
Jeff Squyres
f6619f8e9e Fix compile error in the heterogeneous case.
We've been forcing C99 compiler compliance for a while now, so use C99
syntax to keep the #if code tidy.

This commit was SVN r29101.
2013-08-31 12:56:08 +00:00
Brian Barrett
16a1166884 Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a
configure-time dynamic allocation of flags.  The net result for platforms
which only support BTL-based communication is a reduction of 8*nprocs bytes
per process.  Platforms which support both MTLs and BTLs will not see
a space reduction, but will now be able to safely run both the MTL and BTL
side-by-side, which will prove useful.

This commit was SVN r29100.
2013-08-30 16:54:55 +00:00
Rolf vandeVaart
18962d296b This has bothered me for a while. Change MCA_BTL_TAG_BTL to MCA_BTL_TAG_IB. They are the same
value so this does not change anything.  (MCA_BTL_TAG_IB = MCA_BTL_TAG_BTL + 0).  This just makes it more correct.

This commit was SVN r29099.
2013-08-30 14:53:59 +00:00
Dave Goodell
c5a7e8a079 usnic: stomp format specifier warnings
The usnic BTL now builds cleanly under `--enable-picky` when `MSGDEBUG1`
is set.

Reviewed-by: jsquyres

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29097.
2013-08-29 23:24:14 +00:00
Dave Goodell
a552921171 call element destructors in ompi_free_list_destruct
The free list code called the constructor for each object it
slab allocated in ompi_free_list_grow.  Calling the matching destructors
in ompi_free_list_destruct permits free list-managed elements to safely
allocate/deallocate resources in their constructors/destructors without
leaking.

It's probably best to let this soak on the trunk a little while before
moving it over to v1.7.

Reviewed-by: bosilca

cmr=v1.7.4:reviewer=bosilca

This commit was SVN r29096.
2013-08-29 22:56:31 +00:00
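Commit a552921171 pairs the constructor calls made while the free list grows with destructor calls when the list is destroyed. The Python sketch below shows why that pairing prevents leaks. It is a toy model, not the `ompi_free_list` implementation, and the classes `Element` and `FreeList` and the `live_resources` counter are hypothetical.

```python
class Element:
    """Hypothetical free-list element that owns one resource."""

    live_resources = 0  # number of resources currently held by all elements

    def construct(self):
        Element.live_resources += 1  # e.g. allocate a buffer

    def destruct(self):
        Element.live_resources -= 1  # release what construct() acquired


class FreeList:
    """Toy free list that constructs on grow and destructs on teardown."""

    def __init__(self):
        self.elements = []

    def grow(self, count):
        for _ in range(count):
            elem = Element()
            elem.construct()
            self.elements.append(elem)

    def destruct(self):
        # Without this loop, every resource acquired in grow() would leak.
        while self.elements:
            self.elements.pop().destruct()


free_list = FreeList()
free_list.grow(4)
free_list.destruct()
print(Element.live_resources)
```

If `destruct` skipped the per-element calls, the counter would stay at the number of elements ever constructed.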
Dave Goodell
d1b5940e97 don't call OMPI_REQUEST_INIT in req constructor
Calling OMPI_REQUEST_INIT puts the request into an _INACTIVE state
instead of an _INVALID state, which we don't want if it has
simply been constructed, e.g., in a free_list.  Without this change a
future change to call destructors at free list destruction time will
result in request dtor state assertion failures.

Reviewed-by: bosilca

cmr=v1.7.4:reviewer=bosilca

This commit was SVN r29095.
2013-08-29 22:56:22 +00:00
Ralph Castain
43d1cd92ac Ensure we activate the "daemons launched" state when only the HNP is left or else we will hang.
cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29094.
2013-08-29 22:50:51 +00:00
Dave Goodell
d17f104e7a oob: squash some valgrind warnings
These warnings were harmless, but they appeared even for simple programs
like single-process runs of `ring_c`.

This commit was SVN r29093.
2013-08-29 21:08:44 +00:00
Ralph Castain
5d1fa4fa0e Silence warnings:
osc_pt2pt_data_move.c: In function 'ompi_osc_pt2pt_sendreq_recv_accum_long_cb':
osc_pt2pt_data_move.c:643:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
osc_rdma_data_move.c: In function 'ompi_osc_rdma_control_send_cb':
osc_rdma_data_move.c:1312:37: warning: variable 'header' set but not used [-Wunused-but-set-variable]

This commit was SVN r29092.
2013-08-29 20:56:36 +00:00
Ralph Castain
12d4f45b5e Silence warning:
oob_tcp_connection.c: In function 'mca_oob_tcp_peer_accept':
oob_tcp_connection.c:725:9: warning: variable 'cmpval' set but not used [-Wunused-but-set-variable]

Refs trac:3696

This commit was SVN r29091.

The following Trac tickets were found above:
  Ticket 3696 --> https://svn.open-mpi.org/trac/ompi/ticket/3696
2013-08-29 20:56:05 +00:00
Ralph Castain
7a7cfdd519 A little cleanup - the base function to sort numa lists must return something or you get a warning about non-void function returning without value, so clean up the return values. Ensure the mindist module actually checks for a return of "error" so it won't segfault, and have it emit a polite message when that happens.
cmr:v1.7.3:reviewer=jladd

This commit was SVN r29089.
2013-08-29 20:01:06 +00:00
Ralph Castain
3516348aad We don't need to report errors in pmi_setup as it is possible that PMI is available, but that we weren't launched under it (e.g., we launched via mpirun).
cmr:v1.7.3:reviewer=hjelmn:subject="Silence unnecessary PMI error msgs"

This commit was SVN r29086.
2013-08-29 16:35:20 +00:00
Ralph Castain
c71e760e6c The modex code was unfortunately written solely for PMI1 when updated to minimize calls to PMI_get - add the required PMI2 code
This commit was SVN r29084.
2013-08-28 23:52:32 +00:00
Ralph Castain
537e7380b1 As per the discussion on the devel telecon, do not compute ompi_comm_world_thread_level_mult if thread multiple is disabled. We aren't using the value anyway, but we will leave the current code in-place until we understand if it is needed or not.
This commit was SVN r29080.
2013-08-28 17:44:04 +00:00
Joshua Ladd
1802aabf1a Add support for autodetecting a MLNX HCA in the rmaps min distance feature. In this way, .ini files distributed with software stacks need not specify a particular HCA but instead may select the key word auto which will automatically select the discovered device. To use this feature, simply pass the keyword auto instead of a specific device name, --mca rmaps_base_dist_hca auto. If more than one card is installed, the mapper will inform the user of this and, at this point, the user will then need to specify which card via the normal route, e.g. --mca rmaps_base_dist_hca <dev_name>. This should be added to
cmr=v1.7.4:reviewer=rhc:subject=Autodetect logic for min dist mapping
This commit was SVN r29079.
2013-08-28 16:23:33 +00:00
Nathan Hjelm
77a41e1ca9 ompi_info: mark the variables from disabled components as disabled in
the output of ompi_info.

A variable is disabled if its component will never be selected due to
a component selection parameter (e.g., -mca btl self). The old behavior
of ompi_info was to not print these parameters at all. Now we print the
parameters. After some discussion with George it was decided that there
needed to be some way to see what parameters will not be used. This was
the compromise.

This commit also fixes a bug and a typo in the pvar system. The enum_count
value in mca_base_pvar_dump was being used without being set. The full_name
in mca_base_pvar_t was not being used.

cmr=v1.7.3:ticket=trac:3734

This commit was SVN r29078.

The following Trac tickets were found above:
  Ticket 3734 --> https://svn.open-mpi.org/trac/ompi/ticket/3734
2013-08-28 16:03:23 +00:00
George Bosilca
305fa88d4b Remove two warnings from the SM BTL. The return code can be safely ignored
as the internals of the SM BTL will repost the fragment until the send operation
successfully completes.

This commit was SVN r29077.
2013-08-28 06:36:01 +00:00
George Bosilca
badd011ac3 Minor cleanup.
This commit was SVN r29076.
2013-08-28 05:48:58 +00:00
Dave Goodell
dd82bd3c19 usnic: fix invalid rfstart initialization
endpoint_rfstart was being initialized from a value which was not yet
set.  Also ensure that rfstart is a valid index in the range
0..WINDOW_SIZE-1, since it is used as the index into endpoint_rcvd_segs,
which has WINDOW_SIZE elements.

Without this change there is significant risk of memory corruption or
segfaults, resulting in hangs or crashes, if malloc ever returns us a
value >=WINDOW_SIZE (4096).  Right now we seem to be getting lucky that
the malloc is returning zero-pages to us when we are allocating endpoint
structures (possibly because the freelist performs a single large
allocation for all endpoints).

Fixes Cisco bug CSCui88781.

Reviewed-by: rfaucett@cisco.com
Reviewed-by: jsquyres@cisco.com

cmr=v1.7.3:reviewer=jsquyres

This commit was SVN r29075.
2013-08-27 22:43:20 +00:00
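Commit dd82bd3c19 requires rfstart to be a valid index in the range 0..WINDOW_SIZE-1 because it indexes an array of WINDOW_SIZE elements. One common way to guarantee that is to reduce the value modulo the window size. The Python sketch below assumes that approach, which the commit message does not confirm, and the helper name `window_index` and the list `rcvd_segs` are hypothetical.

```python
WINDOW_SIZE = 4096  # value quoted in the commit message


def window_index(value):
    """Map any non-negative value onto 0..WINDOW_SIZE-1 (assumed approach)."""
    return value % WINDOW_SIZE


# An array with WINDOW_SIZE slots, indexed only through window_index().
rcvd_segs = [False] * WINDOW_SIZE
rcvd_segs[window_index(70000)] = True  # 70000 alone would be out of range
```

Indexing with the raw value 70000 would fall outside the array, which is the class of memory corruption the commit guards against in C.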
Ralph Castain
7125143253 Replace missing opal_db open/select that was apparently lost on a prior merge. Thanks to Nathan for pointing it out
This commit was SVN r29072.
2013-08-27 19:42:31 +00:00
Nathan Hjelm
3744c5e0be Also check for /dev/mic/scif when deciding whether to enable the Linux
memory hooks.

The MIC has a /dev/scif device and the host has /dev/mic/scif. I do not
know if this device exists when no MIC is connected.

cmr=v1.7.4:ticket=trac:3733:reviewer=jsquyres

This commit was SVN r29071.

The following Trac tickets were found above:
  Ticket 3733 --> https://svn.open-mpi.org/trac/ompi/ticket/3733
2013-08-27 19:40:02 +00:00
Nathan Hjelm
c699ee7812 Update the ompi_info man page with information about variable levels
and improve the behavior of ompi_info.

This commit changes the default behavior of ompi_info --all when a
level is not specified. Instead of assuming level 1 in this case we
now assume level 9. This change is due to feedback from the community
after the introduction of the --level option.

I also added a new option: --selected-only. This option will limit the
displayed variables to components that can be selected (i.e., when a
selection parameter is set, such as btl self,sm).

cmr=v1.7.3:reviewer=jsquyres

This commit was SVN r29070.
2013-08-27 19:11:37 +00:00
Nathan Hjelm
6e1656279e Enable the use of the Linux memory hooks on Intel MIC.
cmr=v1.7.3:reviewer=jsquyres

This commit was SVN r29069.
2013-08-27 18:25:18 +00:00
Nathan Hjelm
2da64eb719 Fix compilation of the MPI tools information interface when profiling
is enabled and fix a bug in the handling of watermark performance
variables.

cmr=v1.7.3:ticket=trac:3725:reviewer=jsquyres

This commit was SVN r29068.

The following Trac tickets were found above:
  Ticket 3725 --> https://svn.open-mpi.org/trac/ompi/ticket/3725
2013-08-27 18:19:18 +00:00
George Bosilca
cf09fe7c99 It wasn't even compiling when heterogeneous support was on.
This commit was SVN r29067.
2013-08-27 16:53:33 +00:00
George Bosilca
65a362909d Can't see how it works ...
Thanks Thomas and Arm for the patch.

This commit was SVN r29066.
2013-08-27 16:52:24 +00:00