Commit graph

19733 commits

Author SHA1 Message Date
Adrian Reber
4ca07ae125 re-introduce distill_checkpoint_ready
In the OPAL_ENABLE_FT_CR code path there used to be a variable
'mca_base_component_distill_checkpoint_ready' which got removed.
The FT code was not compiling and while trying to get it to compile
again the old variable was #ifdef'd out. This re-introduces the
variable with a new name 'opal_base_distill_checkpoint_ready'
and enables the code previously #ifdef'd out.

This removes the last hack introduced to get the FT code to compile
again.

This commit was SVN r30928.
2014-03-04 16:14:46 +00:00
Adrian Reber
e5bef82ee1 OPAL_ENABLE_FT_CR: remove compiler warnings
When compiling --with-ft there are a few compiler warnings about
unused variables. This patch fixes those compiler warnings.

This commit was SVN r30927.
2014-03-04 15:28:07 +00:00
Rolf vandeVaart
c2ae29d860 Adjust priorities in smcuda BTL so it is used when CUDA-aware compiled in.
cmr=v1.7.5:reviewer=hjelmn

This commit was SVN r30925.
2014-03-04 14:44:44 +00:00
Mike Dubman
e630b0f47a update ignore list
disable coll-ml

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30924.
2014-03-04 14:23:12 +00:00
Vasily Filipov
5acb96d11b BTL/OPENIB: always define BTL_OPENIB_RDMACM_IB_ADDR to 0 or 1.
This commit was SVN r30923.
2014-03-04 08:54:17 +00:00
Ralph Castain
da4cb39683 If we can't find a route to communicate, emit an error message rather than just exiting with a non-zero status
cmr=v1.7.5:reviewer=jsquyres:subject=print error if cannot communicate

This commit was SVN r30922.
2014-03-04 04:57:53 +00:00
Jeff Squyres
6710c2ef3f usnic: remove unnecessary header union
Realistically, the usnic BTL doesn't need to know anything about the
underlying transport except for its header length (so that it knows
where the payload begins in a received buffer).  So remove the use of
the specific transport prefix union and just rely on the usnic verbs
extension to tell us what the header length is if we're using the
usNIC/UDP transport, or sizeof(struct ibv_grh) if we're using usNIC/L2
transport.

This commit was SVN r30914.
2014-03-03 21:33:12 +00:00
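
As a rough illustration of the idea in commit 6710c2ef3f, a receive path only needs a header length to find where the payload begins. The sketch below is not the usnic BTL code: `payload_start` and `HDR_LEN_L2` are hypothetical names, and the value 40 is taken from the "40/struct GRH" remark in commit d61765cb2a rather than from the verbs headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 40 bytes stands in for sizeof(struct ibv_grh), per the
 * "40/struct GRH" remark in the log; real code would use the verbs type
 * or ask the usnic verbs extension for the length. */
#define HDR_LEN_L2 40u

/* Hypothetical helper: return a pointer to the payload in a received
 * buffer, or NULL if the buffer is shorter than the transport header. */
static const uint8_t *payload_start(const uint8_t *buf, size_t buf_len,
                                    size_t hdr_len)
{
    if (buf == NULL || buf_len < hdr_len) {
        return NULL;
    }
    return buf + hdr_len;
}
```

Because the header length is passed in as a plain number, the caller does not need a union describing each transport's prefix.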
Jeff Squyres
05af83d5d8 common_verbs: Remove usnic magic probe test
Check the IBV_TRANSPORT_* values.  In the case of IBV_TRANSPORT_IWARP,
there's an ambiguity and we need to also check to see whether the
usnic verbs extension probe exists.

This commit was SVN r30913.
2014-03-03 21:32:44 +00:00
Jeff Squyres
d61765cb2a usnic: use the new usnic verbs extensions
If they exist, call the usnic verbs extensions to both enable UDP
support and get the UD receiver header length that should be used
(rather than assume 40/struct GRH).

This commit was SVN r30912.
2014-03-03 21:31:42 +00:00
Nathan Hjelm
9e92c5be53 osc/sm: check for pthread_condattr_setpshared and pthread_mutexattr_setpshared. fall
back on barrier if either function doesn't exist.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30911.
2014-03-03 17:09:09 +00:00
Nathan Hjelm
dc3d4ffbf3 osc/sm: do not use gcc specific calls
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r30910.
2014-03-03 16:47:29 +00:00
Ralph Castain
0ac97761cc Now that we are binding by default, the issue of #slots and what to do when oversubscribed has become a bit more complicated. This isn't a problem in managed environments as we are always provided an accurate assignment for the #slots, or when -host is used to define the allocation since we automatically assume one slot for every time a node is named.
The problem arises when a hostfile is used, and the user provides host names without specifying the slots= parameter. In these cases, we assign slots=1, but automatically allow oversubscription since that number isn't confirmed. We then provide a separate parameter by which the user can direct that we assign the number of slots based on the sensed hardware - e.g., by telling us to set the #slots equal to the #cores on each node. However, this has been set to "off" by default.

In order to make this a little less complex for the user, set the default such that we automatically set #slots equal to #cores (or #hwt's if use_hwthreads_as_cpus has been set) only for those cases where the user provides names in a hostfile but does not provide slot information.

Also clean up a couple of issues in the mapping/binding system:

* ensure we only override the binding directive if we are oversubscribed *and* overload is not allowed

* ensure that the MPI procs don't attempt to bind themselves if they are launched by an orted as any binding directive (no matter what it was) would have been serviced by the orted on launch

* minor cleanup to the warning message when oversubscribed and binding was requested

cmr=v1.7.5:reviewer=rhc:subject=update mapping/binding system

This commit was SVN r30909.
2014-03-03 16:46:37 +00:00
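
The new default described in commit 0ac97761cc can be sketched as a small decision function. This is only an illustration under stated assumptions: `struct host_entry` and `default_slots` are hypothetical names, not ORTE data structures, and the detected core and hardware-thread counts are assumed to be available already.

```c
#include <stdbool.h>

/* Hypothetical description of one hostfile entry. */
struct host_entry {
    bool slots_given;   /* true if the hostfile line carried slots=N */
    int  slots;         /* N from slots=N; ignored when not given */
    int  num_cores;     /* cores detected on the node */
    int  num_hwthreads; /* hardware threads detected on the node */
};

/* An explicit slots= value always wins. Otherwise fall back to the
 * detected core count, or to the hardware-thread count when the
 * use_hwthreads_as_cpus option has been set. */
static int default_slots(const struct host_entry *h,
                         bool use_hwthreads_as_cpus)
{
    if (h->slots_given) {
        return h->slots;
    }
    return use_hwthreads_as_cpus ? h->num_hwthreads : h->num_cores;
}
```

For a node with 8 cores and 16 hardware threads listed without slots=, this sketch yields 8 slots by default and 16 when hardware threads are treated as CPUs.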
Jeff Squyres
baf21ab446 Fix DESTDIR installs for javadocs
Thanks to Orion Poplawski for noticing the issue
(http://www.open-mpi.org/community/lists/devel/2014/03/14263.php).

This commit was SVN r30908.
2014-03-03 16:26:18 +00:00
Mike Dubman
05ee929832 OMPI-MXM: handle multiple calls to add_procs() in MXM
- now add_procs can be called more than once (during MPI_INIT and Inter_Comm_Create)
- adjust MXM to this reality

fixed by Alina, reviewed by Yossi/Mike

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30907.
2014-03-03 13:50:37 +00:00
Mike Dubman
33d363e65d OSHMEM: fix fortran binding
based on true story: http://www.open-mpi.org/community/lists/devel/2014/03/14262.php

fixed by Roman, reviewed by Igor/Mike

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30906.
2014-03-03 09:58:11 +00:00
Vasily Filipov
f36d50d494 OPENIB BTL/CONNECT: warning fixes caused by r30875.
This commit was SVN r30905.

The following SVN revision numbers were found above:
  r30875 --> open-mpi/ompi@f2014b96e7
2014-03-03 06:41:46 +00:00
Ralph Castain
85770ff782 Remove stale file reference
Refs trac:4322

This commit was SVN r30904.

The following Trac tickets were found above:
  Ticket 4322 --> https://svn.open-mpi.org/trac/ompi/ticket/4322
2014-03-02 03:48:56 +00:00
Oscar Vega-Gisbert
3a31869a58 Authorship of Java bindings
This commit was SVN r30901.
2014-03-01 20:18:05 +00:00
Oscar Vega-Gisbert
4def6c5ae2 Java: avoid setting Status attributes in the C side.
Make File implementation consistent with the rest of the API.

This commit was SVN r30900.
2014-03-01 19:54:36 +00:00
Nathan Hjelm
ab2a42d4e6 Update NEWS for MPI-3 RMA
cmr=v1.7.5:reviewer=rhc

This commit was SVN r30899.
2014-03-01 17:07:31 +00:00
Mike Dubman
1f69e2d588 OSHMEM: fix warn in macro
fixed by Elena, reviewed by Igor/Mike

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30897.
2014-03-01 07:54:48 +00:00
Ralph Castain
88b0e0cc6d Allow the user to turn off the oversubscribed-binding warning if overload-allowed has been provided
Refs trac:4317

This commit was SVN r30892.

The following Trac tickets were found above:
  Ticket 4317 --> https://svn.open-mpi.org/trac/ompi/ticket/4317
2014-02-28 17:55:53 +00:00
Jeff Squyres
1bda304a35 Mark the ompi_rte_abort() function as "no return"
This allows compilers to know that the code path(s) where
ompi_rte_abort() is invoked won't return (and therefore won't warn in
certain cases).

cmr=v1.8:reviewer=rhc

This commit was SVN r30891.
2014-02-28 17:45:36 +00:00
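
The effect of the "no return" marking in commit 1bda304a35 can be shown with a minimal sketch. It assumes GCC/Clang attribute syntax; `SKETCH_NORETURN`, `sketch_abort`, and `checked_div` are hypothetical names, and the macro Open MPI actually uses is not shown in this log.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumption: GCC/Clang attribute syntax; C11 offers _Noreturn instead. */
#if defined(__GNUC__)
#define SKETCH_NORETURN __attribute__((__noreturn__))
#else
#define SKETCH_NORETURN
#endif

/* Hypothetical stand-in for an abort routine such as ompi_rte_abort(). */
static void sketch_abort(const char *msg) SKETCH_NORETURN;

static void sketch_abort(const char *msg)
{
    fprintf(stderr, "abort: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* No return statement follows sketch_abort(): the compiler knows that
 * control never comes back, so it does not warn about a missing return
 * value on that path. */
static int checked_div(int num, int den)
{
    if (den != 0) {
        return num / den;
    }
    sketch_abort("division by zero");
}
```

Without the attribute, a compiler with warnings enabled would typically complain that control reaches the end of the non-void function `checked_div`.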
Jeff Squyres
3f845edfdd * Prefix the preprocessor macro used to protect the file
* Include opal_stdint.h so that we have uint32_t

cmr=v1.7.5:ticket=trac:4298

This commit was SVN r30890.

The following Trac tickets were found above:
  Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298
2014-02-28 16:56:38 +00:00
Jeff Squyres
f8dbba78a7 Send the BTL-passed message to ompi_rte_abort.
cmr=v1.8:reviewer=rolfv

This commit was SVN r30889.
2014-02-28 16:20:54 +00:00
Ralph Castain
4a645f0342 Add detection of oversubscription with binding requested - if binding requested to core or hwt, warn and do not bind or else we will hurt performance. Also, if no binding directive was given, turn off the default binding
Refs trac:4317

This commit was SVN r30888.

The following Trac tickets were found above:
  Ticket 4317 --> https://svn.open-mpi.org/trac/ompi/ticket/4317
2014-02-28 16:08:52 +00:00
Ralph Castain
8500247c7b Fix the by-obj mapper in the case where slots are not specified, and so we are in a perpetual oversubscribed state
cmr=v1.7.5:reviewer=rhc

This commit was SVN r30887.
2014-02-28 05:21:46 +00:00
Ralph Castain
a4c3d0a5a0 Add some more debug to the by-obj mapper
This commit was SVN r30884.
2014-02-28 02:52:53 +00:00
Tom Naughton
8793560bde + fix abstraction violation (ORTE_process_info => OMPI_process_info)
This commit was SVN r30883.
2014-02-27 23:59:46 +00:00
Jeff Squyres
e819b5a34a Remove the vendor_ids parsing.
We don't use this functionality any more; we use the transport_type
and device name to identify usnic devices.  It's slightly easier
because we can transport_type+name from ibv_device_open() and don't
have to do an additional ibv_query_device() to get its attributes.

Reviewed by Dave Goodell.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30882.
2014-02-27 21:47:01 +00:00
Jeff Squyres
3cbdf33b88 This is what r30852 should have been: Consolidate into a single, outer loop of ibv_create_ah() calls
Follow on to SVN trunk r30850: consolidate the ibv_create_ah() calls
into a single loop, MPI_WAITALL-style.  That is, call the (effectively
non-blocking) ibv_create_ah() for each endpoint.  If we get
NULL+EAGAIN, it means that the UDP ARP is still ongoing down in the
kernel, so just try again later.  We put these all into a single loop
because it allows us to parallelize the ARP progress in the kernel.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30879.

The following SVN revision numbers were found above:
  r30850 --> open-mpi/ompi@3641500442
  r30852 --> open-mpi/ompi@4e282a3295

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-27 17:19:50 +00:00
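
The MPI_WAITALL-style loop described in commit 3cbdf33b88 can be sketched as follows. To stay self-contained, the sketch does not call the verbs library: `mock_create_ah` is a hypothetical stand-in for a non-blocking create call that reports "NULL + EAGAIN" while address resolution is pending, and `create_all` is a hypothetical name for the consolidated loop.

```c
#include <errno.h>
#include <stddef.h>

#define NUM_ENDPOINTS 4

/* Mock of a non-blocking create call (NOT the verbs API): endpoint i
 * returns NULL with errno set to EAGAIN until it has been polled i
 * times, then succeeds. */
static int polls[NUM_ENDPOINTS];
static int handles[NUM_ENDPOINTS];

static void *mock_create_ah(int i)
{
    if (polls[i]++ < i) {
        errno = EAGAIN;
        return NULL;
    }
    return &handles[i];
}

/* Sweep over all endpoints repeatedly, so that every pending
 * resolution is started early and they can progress concurrently.
 * Returns the number of sweeps needed, or -1 on a hard error. */
static int create_all(void *out[NUM_ENDPOINTS])
{
    int remaining = NUM_ENDPOINTS;
    int sweeps = 0;

    for (int i = 0; i < NUM_ENDPOINTS; ++i) {
        out[i] = NULL;
    }
    while (remaining > 0) {
        ++sweeps;
        for (int i = 0; i < NUM_ENDPOINTS; ++i) {
            if (out[i] != NULL) {
                continue;               /* already created */
            }
            out[i] = mock_create_ah(i);
            if (out[i] != NULL) {
                --remaining;
            } else if (errno != EAGAIN) {
                return -1;              /* real failure, give up */
            }
        }
    }
    return sweeps;
}
```

The design point is that no endpoint blocks the others: an EAGAIN result just defers that endpoint to the next sweep. A real implementation would presumably also bound the number of retries or yield between sweeps.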
Jeff Squyres
45810f0efb Revert r30852: the wrong version of this patch got committed to SVN.
This commit was SVN r30878.

The following SVN revision numbers were found above:
  r30852 --> open-mpi/ompi@4e282a3295
2014-02-27 15:02:15 +00:00
Mike Dubman
d584869dda OSHMEM: memheap mkey exchange fix
fix situations where cluster nodes can have different btls

Fixed by Roman, reviewed by Igor, Mike
cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30877.
2014-02-27 14:02:30 +00:00
Vasily Filipov
d702307521 OPENIB BTL/CONNECT: replace wrong rdma_freeaddrinfo call in rdmacm_component_query func.
This commit was SVN r30876.
2014-02-27 11:52:10 +00:00
Vasily Filipov
f2014b96e7 OPENIB BTL/CONNECT: Add support for AF_IB addressing in rdmacm.
This commit was SVN r30875.
2014-02-27 11:29:47 +00:00
Mike Dubman
e466fee747 OSHMEM: memheap framework fix warn, remove verbs deps
fixed by Igor, reviewed by Miked

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30874.
2014-02-27 07:22:57 +00:00
Mike Dubman
27b07a5d42 update ignores for new framework
cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30873.
2014-02-27 06:46:06 +00:00
Ralph Castain
ce26b096b4 Prevent failover to direct_modex if key isn't found unless direct_modex was enabled
Refs trac:4258

This commit was SVN r30865.

The following Trac tickets were found above:
  Ticket 4258 --> https://svn.open-mpi.org/trac/ompi/ticket/4258
2014-02-27 02:04:56 +00:00
Ralph Castain
ff37524bdd Update ignores
cmr=v1.7.5:reviewer=ompi-gk1.7

This commit was SVN r30864.
2014-02-27 02:04:06 +00:00
Ralph Castain
d109c523b9 Per patch from Tetsuya Mishima, complete the overhaul of the round-robin mappers
Refs trac:4296

This commit was SVN r30861.

The following Trac tickets were found above:
  Ticket 4296 --> https://svn.open-mpi.org/trac/ompi/ticket/4296
2014-02-27 00:43:53 +00:00
Jeff Squyres
7440f21b75 Add usnic connectivity-checking agent service.
Basically: since usnic is a connectionless transport, we do not get
OS-provided services "for free" that connection-oriented transports
get, namely: "hey, I wasn't able to make a connection to peer X", and
"hey, your connection to peer X has died."
    
This connectivity-checker runs in a separate progress thread in the
usnic BTL in local rank 0 on each server.  Upon first send in any
process, the connectivity-checker agent will send some UDP pings to the
peer to ensure that we can reach it.  If we can't, we'll abort the job
with a nice show_help message.
    
A lengthy comment in btl_usnic_connectivity.h explains the
scheme and how it works.

Reviewed by Dave Goodell.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30860.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 22:21:25 +00:00
Oscar Vega-Gisbert
f2043776f6 mpi.Comm: some methods must be private.
They were protected because the old Prequest implementation used them.

This commit was SVN r30859.
2014-02-26 21:17:52 +00:00
Joshua Ladd
d1baf3f00c Stop linking in verbs stuff in oshmem/mca/memheap/base now that we have the sshmem framework.
Refs trac:4261

This commit was SVN r30858.

The following Trac tickets were found above:
  Ticket 4261 --> https://svn.open-mpi.org/trac/ompi/ticket/4261
2014-02-26 20:28:47 +00:00
Ralph Castain
61a21e4f31 Based on Tetsuya's patch, with some changes, correct the case of map-by node where multiple cpus/rank are requested and result in a non-integer match with num slots. Also correct tests for binding policy given to use the proper macro.
Refs trac:4296

This commit was SVN r30857.

The following Trac tickets were found above:
  Ticket 4296 --> https://svn.open-mpi.org/trac/ompi/ticket/4296
2014-02-26 18:12:23 +00:00
Mike Dubman
4572bd58e5 OSHMEM: fix bug in scoll/mpi, add fortran support
dtypes support to oshmem scoll mpi. added comment to
oshmem scoll mpi component regarding casting size_t to int.

Fixed by Elena, Reviewed by Igor/Mike

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30856.
2014-02-26 17:06:44 +00:00
Mike Dubman
323e4418b9 OSHMEM: extract memheap allocate methods into separate framework
- similar to opal/shmem
- next step is some refactoring and merge into opal/shmem
 Developed by Igor, reviewed by AlexM, MikeD

This commit fixes trac:4261.

This commit was SVN r30855.

The following Trac tickets were found above:
  Ticket 4261 --> https://svn.open-mpi.org/trac/ompi/ticket/4261
2014-02-26 16:32:23 +00:00
Nathan Hjelm
dfe4a504e4 udcm: fix race between ack arrival and message send and potential hang in udcm
finalize.

Closes trac:4290

cmr=v1.7.5:reviewer=miked

This commit was SVN r30854.

The following Trac tickets were found above:
  Ticket 4290 --> https://svn.open-mpi.org/trac/ompi/ticket/4290
2014-02-26 15:33:27 +00:00
Nathan Hjelm
30b61a3333 Fix a number of issues in the new one sided code.
 - Fix several typos in osc/rdma.

 - Fix a locking issue in osc/sm that was caused by an incorrect
   assumption about the semantics of opal_atomic_add_32.

 - Always unlock the accumulation lock in osc/sm.

 - The base of a process's shared memory window should be NULL if
   the size is zero. Fixed.

cmr=v1.7.5:ticket=trac:4304

This commit was SVN r30853.

The following Trac tickets were found above:
  Ticket 4304 --> https://svn.open-mpi.org/trac/ompi/ticket/4304
2014-02-26 15:33:18 +00:00
Jeff Squyres
4e282a3295 Consolidate into a single, outer loop of ibv_create_ah() calls
Follow on to SVN trunk r30850: consolidate the ibv_create_ah() calls
into a single loop, MPI_WAITALL-style.  That is, call the (effectively
non-blocking) ibv_create_ah() for each endpoint.  If we get
NULL+EAGAIN, it means that the UDP ARP is still ongoing down in the
kernel, so just try again later.  We put these all into a single loop
because it allows us to parallelize the ARP progress in the kernel.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30852.

The following SVN revision numbers were found above:
  r30850 --> open-mpi/ompi@3641500442

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 11:02:12 +00:00
Jeff Squyres
52c48b34f0 Set svn:ignore in tests directory
cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30851.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 10:58:48 +00:00