Commit graph

25462 commits

Author SHA1 Message Date
Jeff Squyres
e9ce11c6a7 help-orterun.txt: minor word smything
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-31 16:33:46 -07:00
Jeff Squyres
347497cc7e mpirun.1in: add descriptions of new options
Add descriptions for the new --report-state-on-timeout and
--get-stack-traces options.

Also add --timeout, and cross-reference MPIEXEC_TIMEOUT with it.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-31 16:33:46 -07:00
Jeff Squyres
36f653164f .mailmap: Updates
Remove all @open-mpi-git-mirror entries; those are no longer necessary
since the official migration to Git/Github.

Add aliases for @users.noreply.github.com addresses.

Add fixes for what look like accidental name misspellings /
common-name-isms.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-31 19:18:24 -04:00
Jeff Squyres
1d83d594c8 AUTHORS: reformat and include all git log email addresses
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-31 19:18:24 -04:00
Nathan Hjelm
bf10d79914 btl/ugni: remove erroneous unlock
The endpoint lock was being released twice in mca_btl_ugni_get_ep.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-31 16:52:53 -06:00
Nathan Hjelm
cc96097873 btl/ugni: fix bug when attempting unaligned get on aries
This commit fixes a programming error when using an aries nic. The
documentation of ugni shows that only the local alignment restriction
for get was lifted on aries. There is still a remote address alignment
restriction.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-31 16:52:09 -06:00
Jeff Squyres
17202e5177 Merge pull request #1733 from jsquyres/pr/hwloc1113-fix
hwloc1113: add missing file to Makefile.am
2016-05-31 13:59:08 -04:00
Jeff Squyres
5cfee95ea4 hwloc1113: add missing file to Makefile.am
Lack of this file causes a failure when you run autogen.pl on a
distribution tarball.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-31 09:57:50 -07:00
rhc54
93ff4ce36d Merge pull request #1731 from rhc54/topic/timeout
Provide ETIMEDOUT as the mpirun exit code if the timeout limit was hit
2016-05-31 08:41:21 -07:00
Ralph Castain
0cd0ccb7fd Provide ETIMEDOUT as the mpirun exit code if the timeout limit was hit
2016-05-31 07:45:31 -07:00
Gilles Gouaillardet
1bbc5fadee ompi/win: silence an other warning
2016-05-31 13:18:39 +09:00
Gilles Gouaillardet
c41321b9e5 ompi/win: silence warning
2016-05-31 13:03:20 +09:00
rhc54
0965cb3d41 Merge pull request #1730 from rhc54/topic/pmixext
Patch from Gilles - modify detection of PMIx version for external libraries
2016-05-30 18:50:12 -07:00
Ralph Castain
7b115a9e0b Patch from Gilles - modify detection of PMIx version for external libraries
2016-05-30 14:30:10 -07:00
Nathan Hjelm
60519c2b4e cma: add support for MIPS and ARM
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-05-30 12:13:20 -06:00
George Bosilca
d2abff583e Fix race condition during BTL TCP tear-down.
bot:label:bug
bot:assign:@hjelmn
2016-05-30 10:47:14 -05:00
rhc54
876257469e Merge pull request #1728 from rhc54/topic/sim
Enable simulation of large-scale clusters
2016-05-29 21:29:16 -07:00
Ralph Castain
3913595e10 Enable simulation of large-scale clusters by allowing multiple daemons/node. Specifying the ras_base_multiplier parameter to be greater than 1 will cause ORTE to replicate each allocated node by that factor. A daemon will be spawned for each replica, thus letting ORTE function as if it were on a much larger cluster.
Note that this cannot be used for MPI performance testing. It is really only useful for ORTE scaling tests. It also only works with the rsh/ssh launcher.
2016-05-29 18:56:18 -07:00
rhc54
a93c01d4f4 Merge pull request #1724 from rhc54/topic/timeout
Add a timeout cmd line option and an option to report state info upon timeout to assist with debugging Jenkins tests
2016-05-28 08:36:41 -07:00
Ralph Castain
ebe159acef Add a timeout cmd line option and an option to report state info upon timeout to assist with debugging Jenkins tests
If requested, obtain stacktraces for each application process and report it to stderr upon timeout

stack traces: minor improvements

- Also include the hostname and PID of each process for which
  we're sending the stack traces (vs. just including the ORTE process
  name)
- Send a specific error message if we couldn't find "gstack" in the
  $PATH (e.g., on OS X)
- Send a specific error message if gstack fails to run
- Print a message that obtaining the stack traces may take a few
  seconds so that users don't wonder what's happening

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

help-orterun.txt: minor tweaks

Trivial update: show "--timeout" (instead of "-timeout") in the help
message, just to encourage the use of double-dash options.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

trivial: stacktrace -> stack trace

Trivial wordsmithing.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-05-28 08:36:25 -07:00
Jeff Squyres
59f4a765b3 Merge pull request #1656 from hpcraink/pr/make_manpage
In case, we do not build Fortran, Fortran 2008 or CXX, the regexp in …
2016-05-28 11:02:12 -04:00
Jeff Squyres
e126d2cd18 Merge pull request #1584 from bgoglin/master
Update hwloc to v1.11.3
2016-05-28 11:01:54 -04:00
Nathan Hjelm
d8fd3a411a Merge pull request #1725 from hjelmn/request_fixes
ompi/request: fix bugs in MPI_Wait_some and MPI_Wait_any
2016-05-27 13:47:49 -06:00
Nathan Hjelm
0591139f49 ompi/request: fix bugs in MPI_Wait_some and MPI_Wait_any
This commit fixes two bugs in MPI_Wait_any:

 - If all requests are inactive then the sync wait would hang forever
   because no requests are attached to the sync.

 - The request pointer was pointing to the request before the completed
   request which caused the wrong request to be freed or marked inactive.

MPI_Wait_some had a similar issue if all the requests were pending.

These issues were identified by MTT.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-27 12:36:10 -06:00
Nathan Hjelm
3974987ba3 Merge pull request #1723 from hjelmn/warning_fixes
win: fix warnings
2016-05-27 12:26:04 -06:00
Nathan Hjelm
0adfb328e1 win: fix warnings
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-05-27 10:14:02 -06:00
rhc54
e5ee7adbe0 Merge pull request #1722 from rhc54/topic/pmixext
Enable PMIx external support for both 1.1.4 and 2.0 versions
2016-05-27 08:59:09 -07:00
Ralph Castain
55923eacd3 Stealing some pieces of Josh Hursey's PR #1583 and modifying a bit, allow the opal/pmix external component to handle both PMIx 1.1.4 and PMIx 2.0 versions. Automatically detect the version of the target external library and adjust the only two APIs that changed (PMIx_Init and PMIx_Finalize)
Rename temp vars in .m4 to avoid conflict with Travis
2016-05-27 08:06:31 -07:00
Nathan Hjelm
28dfa36a3f btl/ugni: fix bug when attempting unaligned get on aries
This commit fixes a programming error when using an aries nic. The
documentation of ugni shows that only the local alignment restriction
for get was lifted on aries. There is still a remote address alignment
restriction.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-27 08:22:13 -06:00
Nathan Hjelm
c19426ac1b btl/ugni: add support for additional atomic operations
This commit adds support for Cray Aries atomic operations. This
includes 32-bit and floating point support.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-27 08:22:13 -06:00
Nathan Hjelm
23fe19a956 btl: add support for more atomics
This commit adds support for more atomic operations and types. The
operations added are logical and, logical or, logical xor, swap, min,
and max. New types are 32-bit int by using the
MCA_BTL_ATOMIC_FLAG_32BIT flag, 64-bit float by using the
MCA_BTL_ATOMIC_FLAG_FLOAT flag, and 32-bit float by using both
flags. Floating point numbers are supported by packing the number in
as an int64_t or int32_t. We will update the btl interface in the
future to make this less confusing.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-27 08:22:13 -06:00
Nathan Hjelm
d25b846c01 Merge pull request #1704 from hpcraink/pr/configure_framework
Fix configure for FreePGI on OSX
2016-05-26 17:01:08 -06:00
Nathan Hjelm
8c9292d5d1 Merge pull request #1721 from hjelmn/xrc_fix
btl/openib: fix XRC WQE calculation
2016-05-26 17:00:31 -06:00
Nathan Hjelm
56bdcd0888 btl/openib: fix XRC WQE calculation
Before dynamic add_procs support was committed to master we called
add_procs with every proc in the job. The XRC code in the openib btl
was taking advantage of this and setting the number of work queue
entries (WQE) based on all the procs on a remote node. Since that is
no longer the case we can not simply increment the sd_wqe field on the
queue pair. To fix the issue a new field has been added to the xrc
queue pair structure to keep track of how many wqes there are total on
the queue pair. If a new endpoint is added that increases the number
of wqes and the xrc queue pair is already connected the code will
attempt to modify the number of wqes on the queue pair. A failure is
ignored because all that will happen is the number of active send work
requests on an XRC queue pair will be more limited.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-26 15:58:31 -06:00
Aurelien Bouteiller
49bd28d0ac Merge pull request #1714 from hjelmn/scif_exclusivity
btl/scif: reduce default exclusivity
2016-05-26 17:53:11 -04:00
Pavel Shamis (Pasha)
60fd25f3fb VADER: Adjusting VADER_MAX_ADDRESS for non x86 platforms.
The original VADER_MAX_ADDRESS was tuned for x86_64 platforms only.
For non x86_64 platforms we can use XPMEM_MAXADDR_SIZE.

Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>
2016-05-26 16:38:04 -05:00
Nathan Hjelm
f19c647f21 Merge pull request #1718 from hjelmn/config_fix
config: fix typo in mxm configury
2016-05-26 13:19:23 -06:00
Joshua Ladd
1a5fd6bf83 Merge pull request #1719 from ICLDisco/ucx_request_fix
Removal of ompi_request_lock from pml/ucx.
2016-05-26 15:09:57 -04:00
Thananon Patinyasakdikul
60d0fbf683 Removal of ompi_request_lock from pml/ucx.
2016-05-26 12:36:58 -04:00
Nathan Hjelm
8c2086995d config: fix typo in mxm configury
A 1 was missing when setting $1_LDFLAGS leading to erroneous items in
the wrapper cflags.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-26 10:28:07 -06:00
Nathan Hjelm
87ea9be863 Merge pull request #1715 from hjelmn/ugni_overhead
btl/ugni: reduce overhead of progress function
2016-05-26 10:17:00 -06:00
Gilles Gouaillardet
46710ba151 travis: fix a typo and create bogus directories to avoid compiler warnings
2016-05-26 15:28:10 +09:00
George Bosilca
90f294096e Remove more references to the request mutex.
Regarding BFO it should be mentioned that this component is currently
unmaintained, and that despite my efforts I could not make it compile
(it would not compile before this patch either).
2016-05-25 23:27:06 -04:00
Nathan Hjelm
5d322170a0 Merge pull request #1716 from hjelmn/request_fixes
Request fixes
2016-05-25 18:14:03 -06:00
Nathan Hjelm
9d439664f0 pml/yalla: update for request changes
This commit brings the pml/yalla component up to date with the request
rework changes.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-25 15:42:53 -06:00
Nathan Hjelm
8445c885ce pml/cm: update for request changes
This fixes a hang caused by the request refactor work. The cm pml was
not updated and was hanging in most cases.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-25 15:35:32 -06:00
Nathan Hjelm
dbfab94ede atomic/mxm: rename symbol that is a duplicate of one in atomic/ucx
This fixes an error when building with --enable-static.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-25 15:34:40 -06:00
Nathan Hjelm
99627319f0 btl/ugni: reduce overhead of progress function
This commit reduces the overhead of calling the ugni progress
function. It does the following:

 - Check for new connections once every eight calls.

 - Do not call remote smsg progress unless we are connected to at
   least one remote peer.

 - Do not call rdma progress unless at least one rdma fragment is
   outstanding.

 - Check endpoint wait list size before obtaining a lock.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-25 14:27:34 -06:00
Nathan Hjelm
5caf12cd9b btl/scif: reduce default exclusivity
This commit reduces the default exclusivity so that btl/scif is not
used for send/recv over other shared memory transports.

Fixes open-mpi/ompi#1712

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-05-25 14:25:07 -06:00
Nathan Hjelm
8e1d59aea8 Merge pull request #1708 from hjelmn/c__fix
request: fix compilation error
2016-05-25 10:48:02 -06:00