1
1
Граф коммитов

19102 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
802c89680a Protect hwloc/configure/m4's use of some temporary shell variables
Fix problem reported by Paul Hargrove:
http://www.open-mpi.org/community/lists/devel/2013/12/13519.php

cmr=v1.7.4:reviewer=brbarret

This commit was SVN r30013.
2013-12-20 14:48:40 +00:00
Ralph Castain
71b52fe861 Ensure that comm_spawn'd procs get user-specified forwarded envars
Thanks to Tim Miller for reporting the regression from the 1.6 series

cmr=v1.7.4:reviewer=jsquyres:subject=Ensure that comm_spawn'd procs get user-specified forwarded envars

This commit was SVN r30012.
2013-12-20 14:47:35 +00:00
Ralph Castain
7cf0fc5578 One more round of sys_limit fixes...sigh
Refs trac:4010

This commit was SVN r30011.

The following Trac tickets were found above:
  Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010
2013-12-20 14:44:51 +00:00
Ralph Castain
e49c16b975 Grrr....use #if instead of #ifdef
Refs trac:4010

This commit was SVN r30010.

The following Trac tickets were found above:
  Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010
2013-12-20 14:24:26 +00:00
Ralph Castain
6e6351959d Check for all the RLIMIT_foo constants that we use, and update the limit checks to use the new #define values. Fix a bug where failure of some might lead to incorrect bracketing.
Refs trac:4010

This commit was SVN r30009.

The following Trac tickets were found above:
  Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010
2013-12-20 14:09:43 +00:00
Jeff Squyres
4739850931 As reported by Paul Hargrove
(http://www.open-mpi.org/community/lists/devel/2013/12/13521.php),
OpenBSD-5 #define's MIN and MAX, so we need to #undef them.

cmr=v1.7.4:reviewer=rhc:subject=undef MIN and MAX for OpenBSD-5

This commit was SVN r30007.
2013-12-20 11:40:59 +00:00
Jeff Squyres
090ce4187a Fix compiler errors on Solaris, NetBSD, and OpenBSD:
* Per
   http://www.open-mpi.org/community/lists/devel/2013/12/13504.php, 
   protect usage of struct ifreq->ifr_hwaddr
 * Per
   http://www.open-mpi.org/community/lists/devel/2013/12/13503.php,
   avoid #define conflict with the token "if_mtu"
 * Also fix some whitespace and string naming issues in opal/util/if.c

Tested by Paul Hargrove.

Refs trac:4010

This commit was SVN r30006.

The following Trac tickets were found above:
  Ticket 4010 --> https://svn.open-mpi.org/trac/ompi/ticket/4010
2013-12-20 11:17:30 +00:00
Mike Dubman
d78a9cdd77 add rpath on mca_mtl_mxm.so to point to /path/to/mxm/lib/libmxm.so which was detected at configure time
This *should* fix following situation:

1 mxm.rpm puts /etc/ld.so.conf.d/mxm.conf file during rpm install with libpath to /opt/mellanox/mxm/lib
2 some1 can extract mxm.rpm into $HOME/mxm and compile OMPI with new mxm location
3 during runtime, OMPI from prev step will pick MXM from step (1) instead of from step (2)

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r30005.
2013-12-20 11:15:41 +00:00
Mike Dubman
6dbce7f9f8 fix compile warning
Refs trac:3763

This commit was SVN r30004.

The following Trac tickets were found above:
  Ticket 3763 --> https://svn.open-mpi.org/trac/ompi/ticket/3763
2013-12-20 11:03:09 +00:00
Ralph Castain
f15b0c9863 Add protections around the various system limits to protect code on unusual systems
Thanks to Paul Hargrove for reporting it on OpenBSD-5

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30003.
2013-12-20 03:18:07 +00:00
Ralph Castain
6959ba5577 Add missing include file.
Thanks to Paul Hargrove for spotting it.

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29998.
2013-12-19 23:39:21 +00:00
Nathan Hjelm
653babc737 Fix a couple issues with the mca_base_var system:
- Use ->boolval for booleans when creating a string.
 - Solaris has some issue with the ?: used in one of find functions. Use an if instead.
 - Change all instances of index -> vari to avoid issues with redefining index.

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29997.
2013-12-19 23:28:17 +00:00
Nathan Hjelm
ee9cd13b90 Remove opal_recursion_depth_counter and opal_progress_thread_count. These counters add two
atomics in the critical path and are not currently used. We can bring them back if there
turns out to be a good use for them.

cmr=v1.7.4:reviewer=brbarret

This commit was SVN r29994.
2013-12-19 23:15:27 +00:00
Dave Goodell
0c6b292442 romio: pick "infinitely stale" fix from upstream
Some NFS scenarios can result in an infinite ESTALE return, which will
hang ROMIO.  This commit causes ROMIO to error out after a large number
of retries instead of spinning forever.

This is MPICH commit b250d338:

http://git.mpich.org/mpich.git/commit/b250d338e66667a8a1071a5f73a4151fd59f83b2

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r29993.
2013-12-19 22:55:26 +00:00
Jeff Squyres
e4097c5cc9 George pointed out that r29991 wasn't quite right. This patch is
added on top of r29991 and:

 * Consolidates the _debug variables in opal_datatype_internal.h and
   opal_convertor.h
 * Puts the DO_DEBUG macros back in the .c files, because they are
   slightly different from each other

Refs trac:4004

This commit was SVN r29992.

The following SVN revision numbers were found above:
  r29991 --> open-mpi/ompi@a88e143127

The following Trac tickets were found above:
  Ticket 4004 --> https://svn.open-mpi.org/trac/ompi/ticket/4004
2013-12-19 22:54:27 +00:00
Jeff Squyres
a88e143127 Fixes trac:3989: opal_pack_debug was instantiated as a bool in one file
and extern'ed s an int in another.  This caused a SIBGUS on
Solaris/SPARC.

This commit properly moves the extern to a .h file so that it's the
same in all files.  It also moves the DO_DEBUG to the header file,
because it was defined to the same thing in multiple .c files.

cmr=v1.7.4:reviewer=bosilca:subject=fix SPARC SIGBUS in opal convertor code

This commit was SVN r29991.

The following Trac tickets were found above:
  Ticket 3989 --> https://svn.open-mpi.org/trac/ompi/ticket/3989
2013-12-19 21:38:51 +00:00
Ralph Castain
b745078535 Support user-provided envars for comm_spawn using info key "env"
Thanks to Tom Fogal for the request

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29990.
2013-12-19 20:59:50 +00:00
Ralph Castain
d47d2569f3 We stripped the process info packing routine to minimize message size when sending the launch message, but tools still require all the info. So modify the tool-hnp handshake to explicitly add the missing info
Refs trac:3992

This commit was SVN r29989.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 20:42:20 +00:00
Jeff Squyres
3a14adef63 Remove the comments around these assignments; otherwise, we won't get
function pointers set to the _map functions, and we get segv's in MTT
testing (e.g., the C++ suite, which actually calls MPI_Cart_map and
MPI_Graph_map).

cmr=v1.7.4:reviewer=bosilca:subject=Fix topo _map function pointer assignments

This commit was SVN r29988.
2013-12-19 20:41:32 +00:00
Yossi Etigin
6ab4aba9e6 Fix missing include of show_help.h in mtl mxm.
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29987.
2013-12-19 19:37:21 +00:00
Ralph Castain
79af9825ac Update of patch from Takahiro Kawashima
Refs trac:3986

This commit was SVN r29984.

The following Trac tickets were found above:
  Ticket 3986 --> https://svn.open-mpi.org/trac/ompi/ticket/3986
2013-12-19 17:22:37 +00:00
Jeff Squyres
42e3e5cd4b Fixes trac:3990: ensure we don't SIGBUS on SPARC by forcing a memory copy
and preventing access to potentially unaligned data.

Reviewed by Dave Goodell.  Tested by Siegmarr Gross.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=fix SPARC SIGBUS in opal net code

This commit was SVN r29983.

The following Trac tickets were found above:
  Ticket 3990 --> https://svn.open-mpi.org/trac/ompi/ticket/3990
2013-12-19 16:51:34 +00:00
Ralph Castain
55cd65b149 Don't warn about binding (process and/or memory) if the node cannot do it or if we would overload, but it wasn't specifically requested by the user (i.e., it is the result of the default policy). Instead, just don't bind and quietly move along.
Reset topology usage for each node as we bind as multiple nodes may be linked to the same topology object. This will need to be revisited for scale as it does take some non-zero time to reset the usage each iteration. However, storing individual topology objects for every node consumes memory, so it's a tradeoff.

cmr=v1.7.4:reviewer=jsquyres:subject=Eliminate excessive binding/memory warnings

This commit was SVN r29978.
2013-12-19 16:31:45 +00:00
Ralph Castain
2a6376fcf5 Update platform files
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29977.
2013-12-19 15:38:28 +00:00
Mike Dubman
d70f93b2dc fix corrupted verbose output in oshmem
set yoda prio lower than ikrit
fix anon unions in ikrit
Refs trac:3763

This commit was SVN r29976.

The following Trac tickets were found above:
  Ticket 3763 --> https://svn.open-mpi.org/trac/ompi/ticket/3763
2013-12-19 11:59:32 +00:00
Ralph Castain
9b32dacb6c Ensure we don't abort if a tool cannot send a message - the orte/util/comm library used by tools to query mpirun knows how to handle this situation.
Refs trac:3992

This commit was SVN r29975.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 07:10:36 +00:00
Ralph Castain
6239e64f36 Further cleanup of orte-ps so it doesn't abort when hitting a stale HNP - only report that event once and just keep working.
Refs trac:3992

This commit was SVN r29974.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 03:28:05 +00:00
Ralph Castain
bf5e314f76 Tools require their own errmgr and state components so they can handle any errors that occur in, for example, communication .
Refs trac:3992

This commit was SVN r29972.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 01:49:33 +00:00
Tom Naughton
3aefca32b0 + update rte db_fetch comments with change from r29931
This commit was SVN r29971.

The following SVN revision numbers were found above:
  r29931 --> open-mpi/ompi@0995a6f3b9
2013-12-19 01:16:58 +00:00
Ralph Castain
3aaca16faa Silence warnings that are no longer valid
Refs trac:3992

This commit was SVN r29970.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 00:40:36 +00:00
Ralph Castain
c5956e7b8c Convert debug output to opal_output_verbose
Thanks to Tetsuya Mishima for reporting it

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29969.
2013-12-19 00:36:15 +00:00
Jeff Squyres
bb59b07321 Remove CFLAGS setting that was really only intended for the v1.6
branch (it's not necessary on trunk/v1.7 because they require C99,
which allows variadic macros).

Also fix another compiler warning (using %p to print a (void*)).

Submitted by Jeff, reviewed by Dave.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=two usnic BTL fixes

This commit was SVN r29966.
2013-12-19 00:19:05 +00:00
Nathan Hjelm
b9765a380f Update NEWS with new MPI-3 features and a note about the new ROMIO
version.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r29965.
2013-12-19 00:16:07 +00:00
Jeff Squyres
515fd00411 CSCul95082: DMAR faults during mtt testing
usnic_channel_finalize() was deregistering recv buffers before
destroying the QP to which they were posted. The QP needs to be
destroyed first so that the NIC does not attemp tto write to
deregistered memory, causing the DMAR messages.

Submitted by Reese, reviewed by Jeff.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29963.
2013-12-19 00:01:35 +00:00
Jeff Squyres
d8c0c919e1 Clarify the comment: if ummunotify is present *and* if we were
compiled with ummunotify support (which is the check that r29720 just
recently added).

This commit was SVN r29961.

The following SVN revision numbers were found above:
  r29720 --> open-mpi/ompi@ae8c826527
2013-12-18 23:39:22 +00:00
Ralph Castain
da6551bd3d Update opal_backtrace_print call in oshmem
This commit was SVN r29960.
2013-12-18 23:15:40 +00:00
Ralph Castain
39957df08e Fixes trac:3963. Fix the tool ess procedure so it opens and selects the OOB framework, and have the OOB TCP module update the route to new connections (the routed modules know what to do).
Thanks to Dave Love and Ashley Pittman for pointing out the problem.

cmr=v1.7.4:reviewer=jsquyres:subject=Fix tool communications with mpirun

This commit was SVN r29959.

The following Trac tickets were found above:
  Ticket 3963 --> https://svn.open-mpi.org/trac/ompi/ticket/3963
2013-12-18 23:13:46 +00:00
Rolf vandeVaart
a9b7693da3 Update some new CUDA-aware features.
This commit was SVN r29958.
2013-12-18 21:09:25 +00:00
Ralph Castain
77553f72be Per this email thread:
http://www.open-mpi.org/community/lists/devel/2013/12/13412.php

fix the backtrace function to avoid async issues. Thanks to Takahiro Kawashima for the patch

This commit was SVN r29955.
2013-12-18 17:57:37 +00:00
Ralph Castain
c3d2b3e9b8 Update the default ranking policy
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29954.
2013-12-18 17:13:13 +00:00
Jeff Squyres
2665c91b2a Fixes trac:3958: use the right type name (mca_topo_base_module_t) in the
debugger code (not mca_topo_base_module_2_1_0_t).

I checked: we do a similar thing for coll in the communicator struct
(i.e., leave the version number off the module struct).  I confess to
not remembering ''why'' we leave the version number off, but it seems
to be consistent this way...

cmr=v1.7.4:reviewer=bosilca:subject=fix debugger type symbol lookup for mca_topo_base_module_t

This commit was SVN r29953.

The following Trac tickets were found above:
  Ticket 3958 --> https://svn.open-mpi.org/trac/ompi/ticket/3958
2013-12-18 15:17:15 +00:00
Jeff Squyres
4d6967efc4 Sync 1.6.6 bullets with v1.6 branch
This commit was SVN r29952.
2013-12-18 13:31:46 +00:00
Mike Dubman
da5c55342f fix bash comparison to work as expected
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29949.
2013-12-18 10:25:27 +00:00
Ralph Castain
ab4636c47b Per email on devel list, change the default rank-by to slot unless map-by <obj> is specified, in which case use rank-by <obj>
Refs trac:3977

This commit was SVN r29945.

The following Trac tickets were found above:
  Ticket 3977 --> https://svn.open-mpi.org/trac/ompi/ticket/3977
2013-12-18 00:48:50 +00:00
Yossi Etigin
ecfb122c97 Fix segfault in osc pt2pt completion handler, when the request is canceled during finalization.
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29938.
2013-12-17 17:30:14 +00:00
Ralph Castain
1b0bed5539 Add news item
This commit was SVN r29935.
2013-12-17 14:55:33 +00:00
Ralph Castain
53cd00fe16 By setting a default mapping/ranking/binding policy that wasn't "none", we introduced a problem for users of the Mac and any other machine where sockets aren't defined and/or binding is not supported. Fix that by checking to see if the user specified the failing policy - if not, then fall back to the old map/rank by slot and no binding.
Refs trac:3977

This commit was SVN r29933.

The following Trac tickets were found above:
  Ticket 3977 --> https://svn.open-mpi.org/trac/ompi/ticket/3977
2013-12-17 14:50:10 +00:00
Ralph Castain
0995a6f3b9 Revert r29917 and replace it with a fix that resolves the thread deadlock while retaining the desired debug info. In an earlier commit, we had changed the modex accordingly:
* automatically retrieve the hostname (and all RTE info) for all procs during MPI_Init if nprocs < cutoff

* if nprocs > cutoff, retrieve the hostname (and all RTE info) for a proc upon the first call to modex_recv for that proc. This would provide the hostname for debugging purposes as we only report errors on messages, and so we must have called modex_recv to get the endpoint info

* BTLs are not to call modex_recv until they need the endpoint info for first message - i.e., not during add_procs so we don't call it for every process in the job, but only those with whom we communicate

My understanding is that only some BTLs have been modified to meet that third requirement, but those include the Cray ones where jobs are big enough that launch times were becoming an issue. Other BTLs would hopefully be modified as time went on and interest in using them at scale arose. Meantime, those BTLs would call modex_recv on every proc, and we would therefore be no worse than the prior behavior.

This commit revises the MPI-RTE interface to pass the ompi_proc_t instead of the ompi_process_name_t for the proc so that the hostname can be easily inserted. I have advised the ORNL folks of the change.

cmr=v1.7.4:reviewer=jsquyres:subject=Fix thread deadlock

This commit was SVN r29931.

The following SVN revision numbers were found above:
  r29917 --> open-mpi/ompi@1a972e2c9d
2013-12-17 03:26:00 +00:00
Ralph Castain
f13a37637f Update platform files to always enable mpi-thread-multiple
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29930.
2013-12-17 03:11:26 +00:00
Ralph Castain
353663e51b Update NEWS
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29928.
2013-12-16 20:12:23 +00:00