Commit Graph

19180 Commits

Author SHA1 Message Date
Ralph Castain
55cd65b149 Don't warn about binding (process and/or memory) if the node cannot do it or if we would overload, but it wasn't specifically requested by the user (i.e., it is the result of the default policy). Instead, just don't bind and quietly move along.
Reset topology usage for each node as we bind as multiple nodes may be linked to the same topology object. This will need to be revisited for scale as it does take some non-zero time to reset the usage each iteration. However, storing individual topology objects for every node consumes memory, so it's a tradeoff.

cmr=v1.7.4:reviewer=jsquyres:subject=Eliminate excessive binding/memory warnings

This commit was SVN r29978.
2013-12-19 16:31:45 +00:00
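The warning rule described in 55cd65b149 can be sketched as below. This is a minimal illustration in Python, not the ORTE code: the names `BindDecision` and `decide_binding` are hypothetical, and the branch where the user explicitly requested binding is assumed to keep reporting the problem, which the commit message does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BindDecision:
    bind: bool                # whether the process gets bound
    warning: Optional[str]    # message to show the user, if any


def decide_binding(node_can_bind: bool, would_overload: bool,
                   user_requested: bool) -> BindDecision:
    """Hypothetical helper mirroring the rule in the commit message."""
    if node_can_bind and not would_overload:
        return BindDecision(True, None)
    if user_requested:
        # Assumption: an explicit user request still produces a report.
        reason = "node cannot bind" if not node_can_bind else "binding would overload"
        return BindDecision(False, f"binding not applied: {reason}")
    # Binding came from the default policy: skip it and stay quiet.
    return BindDecision(False, None)
```

With a default policy and a node that cannot bind, `decide_binding(False, False, False)` yields no binding and no warning.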
Ralph Castain
2a6376fcf5 Update platform files
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29977.
2013-12-19 15:38:28 +00:00
Mike Dubman
d70f93b2dc fix corrupted verbose output in oshmem
set yoda prio lower than ikrit
fix anon unions in ikrit
Refs trac:3763

This commit was SVN r29976.

The following Trac tickets were found above:
  Ticket 3763 --> https://svn.open-mpi.org/trac/ompi/ticket/3763
2013-12-19 11:59:32 +00:00
Ralph Castain
9b32dacb6c Ensure we don't abort if a tool cannot send a message - the orte/util/comm library used by tools to query mpirun knows how to handle this situation.
Refs trac:3992

This commit was SVN r29975.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 07:10:36 +00:00
Ralph Castain
6239e64f36 Further cleanup of orte-ps so it doesn't abort when hitting a stale HNP - only report that event once and just keep working.
Refs trac:3992

This commit was SVN r29974.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 03:28:05 +00:00
Ralph Castain
bf5e314f76 Tools require their own errmgr and state components so they can handle any errors that occur in, for example, communication.
Refs trac:3992

This commit was SVN r29972.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 01:49:33 +00:00
Tom Naughton
3aefca32b0 + update rte db_fetch comments with change from r29931
This commit was SVN r29971.

The following SVN revision numbers were found above:
  r29931 --> open-mpi/ompi@0995a6f3b9
2013-12-19 01:16:58 +00:00
Ralph Castain
3aaca16faa Silence warnings that are no longer valid
Refs trac:3992

This commit was SVN r29970.

The following Trac tickets were found above:
  Ticket 3992 --> https://svn.open-mpi.org/trac/ompi/ticket/3992
2013-12-19 00:40:36 +00:00
Ralph Castain
c5956e7b8c Convert debug output to opal_output_verbose
Thanks to Tetsuya Mishima for reporting it

cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r29969.
2013-12-19 00:36:15 +00:00
Jeff Squyres
bb59b07321 Remove CFLAGS setting that was really only intended for the v1.6
branch (it's not necessary on trunk/v1.7 because they require C99,
which allows variadic macros).

Also fix another compiler warning (using %p to print a (void*)).

Submitted by Jeff, reviewed by Dave.

cmr=v1.7.4:reviewer=ompi-rm1.7:subject=two usnic BTL fixes

This commit was SVN r29966.
2013-12-19 00:19:05 +00:00
Nathan Hjelm
b9765a380f Update NEWS with new MPI-3 features and a note about the new ROMIO
version.

cmr=v1.7.4:reviewer=rhc

This commit was SVN r29965.
2013-12-19 00:16:07 +00:00
Jeff Squyres
515fd00411 CSCul95082: DMAR faults during mtt testing
usnic_channel_finalize() was deregistering recv buffers before
destroying the QP to which they were posted. The QP needs to be
destroyed first so that the NIC does not attempt to write to
deregistered memory, causing the DMAR messages.

Submitted by Reese, reviewed by Jeff.

cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29963.
2013-12-19 00:01:35 +00:00
Jeff Squyres
d8c0c919e1 Clarify the comment: if ummunotify is present *and* if we were
compiled with ummunotify support (which is the check that r29720 just
recently added).

This commit was SVN r29961.

The following SVN revision numbers were found above:
  r29720 --> open-mpi/ompi@ae8c826527
2013-12-18 23:39:22 +00:00
Ralph Castain
da6551bd3d Update opal_backtrace_print call in oshmem
This commit was SVN r29960.
2013-12-18 23:15:40 +00:00
Ralph Castain
39957df08e Fixes trac:3963. Fix the tool ess procedure so it opens and selects the OOB framework, and have the OOB TCP module update the route to new connections (the routed modules know what to do).
Thanks to Dave Love and Ashley Pittman for pointing out the problem.

cmr=v1.7.4:reviewer=jsquyres:subject=Fix tool communications with mpirun

This commit was SVN r29959.

The following Trac tickets were found above:
  Ticket 3963 --> https://svn.open-mpi.org/trac/ompi/ticket/3963
2013-12-18 23:13:46 +00:00
Rolf vandeVaart
a9b7693da3 Update some new CUDA-aware features.
This commit was SVN r29958.
2013-12-18 21:09:25 +00:00
Ralph Castain
77553f72be Per this email thread:
http://www.open-mpi.org/community/lists/devel/2013/12/13412.php

fix the backtrace function to avoid async issues. Thanks to Takahiro Kawashima for the patch

This commit was SVN r29955.
2013-12-18 17:57:37 +00:00
Ralph Castain
c3d2b3e9b8 Update the default ranking policy
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29954.
2013-12-18 17:13:13 +00:00
Jeff Squyres
2665c91b2a Fixes trac:3958: use the right type name (mca_topo_base_module_t) in the
debugger code (not mca_topo_base_module_2_1_0_t).

I checked: we do a similar thing for coll in the communicator struct
(i.e., leave the version number off the module struct).  I confess to
not remembering ''why'' we leave the version number off, but it seems
to be consistent this way...

cmr=v1.7.4:reviewer=bosilca:subject=fix debugger type symbol lookup for mca_topo_base_module_t

This commit was SVN r29953.

The following Trac tickets were found above:
  Ticket 3958 --> https://svn.open-mpi.org/trac/ompi/ticket/3958
2013-12-18 15:17:15 +00:00
Jeff Squyres
4d6967efc4 Sync 1.6.6 bullets with v1.6 branch
This commit was SVN r29952.
2013-12-18 13:31:46 +00:00
Mike Dubman
da5c55342f fix bash comparison to work as expected
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29949.
2013-12-18 10:25:27 +00:00
Ralph Castain
ab4636c47b Per email on devel list, change the default rank-by to slot unless map-by <obj> is specified, in which case use rank-by <obj>
Refs trac:3977

This commit was SVN r29945.

The following Trac tickets were found above:
  Ticket 3977 --> https://svn.open-mpi.org/trac/ompi/ticket/3977
2013-12-18 00:48:50 +00:00
Yossi Etigin
ecfb122c97 Fix segfault in osc pt2pt completion handler, when the request is canceled during finalization.
cmr=v1.7.4:reviewer=ompi-rm1.7

This commit was SVN r29938.
2013-12-17 17:30:14 +00:00
Ralph Castain
1b0bed5539 Add news item
This commit was SVN r29935.
2013-12-17 14:55:33 +00:00
Ralph Castain
53cd00fe16 By setting a default mapping/ranking/binding policy that wasn't "none", we introduced a problem for users of the Mac and any other machine where sockets aren't defined and/or binding is not supported. Fix that by checking to see if the user specified the failing policy - if not, then fall back to the old map/rank by slot and no binding.
Refs trac:3977

This commit was SVN r29933.

The following Trac tickets were found above:
  Ticket 3977 --> https://svn.open-mpi.org/trac/ompi/ticket/3977
2013-12-17 14:50:10 +00:00
Ralph Castain
0995a6f3b9 Revert r29917 and replace it with a fix that resolves the thread deadlock while retaining the desired debug info. In an earlier commit, we had changed the modex accordingly:
* automatically retrieve the hostname (and all RTE info) for all procs during MPI_Init if nprocs < cutoff

* if nprocs > cutoff, retrieve the hostname (and all RTE info) for a proc upon the first call to modex_recv for that proc. This would provide the hostname for debugging purposes as we only report errors on messages, and so we must have called modex_recv to get the endpoint info

* BTLs are not to call modex_recv until they need the endpoint info for first message - i.e., not during add_procs so we don't call it for every process in the job, but only those with whom we communicate

My understanding is that only some BTLs have been modified to meet that third requirement, but those include the Cray ones where jobs are big enough that launch times were becoming an issue. Other BTLs would hopefully be modified as time went on and interest in using them at scale arose. Meantime, those BTLs would call modex_recv on every proc, and we would therefore be no worse than the prior behavior.

This commit revises the MPI-RTE interface to pass the ompi_proc_t instead of the ompi_process_name_t for the proc so that the hostname can be easily inserted. I have advised the ORNL folks of the change.

cmr=v1.7.4:reviewer=jsquyres:subject=Fix thread deadlock

This commit was SVN r29931.

The following SVN revision numbers were found above:
  r29917 --> open-mpi/ompi@1a972e2c9d
2013-12-17 03:26:00 +00:00
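The eager-versus-lazy retrieval described in 0995a6f3b9 can be sketched as follows. This is an illustrative Python model, not the Open MPI modex: `ModexCache` and its `fetch` callback are hypothetical names, and the case `nprocs == cutoff`, which the message leaves unspecified, is assumed here to be handled lazily.

```python
class ModexCache:
    """Hypothetical model of per-proc RTE info (e.g., hostname) retrieval."""

    def __init__(self, nprocs, cutoff, fetch):
        self._fetch = fetch   # callable: rank -> RTE info for that proc
        self._info = {}
        if nprocs < cutoff:
            # Small job: retrieve everything up front (the MPI_Init case).
            for rank in range(nprocs):
                self._info[rank] = fetch(rank)

    def modex_recv(self, rank):
        # Large job: retrieve on the first call for this proc, then cache.
        if rank not in self._info:
            self._info[rank] = self._fetch(rank)
        return self._info[rank]

    def known(self):
        """Number of procs whose info has been retrieved so far."""
        return len(self._info)
```

In this model a large job only pays for the peers it actually communicates with, which is the reason the message gives for asking BTLs to defer `modex_recv` until the first message.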
Ralph Castain
f13a37637f Update platform files to always enable mpi-thread-multiple
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29930.
2013-12-17 03:11:26 +00:00
Ralph Castain
353663e51b Update NEWS
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29928.
2013-12-16 20:12:23 +00:00
George Bosilca
af6ccbc453 Update the emails from UTK folks.
This commit was SVN r29925.
2013-12-16 19:07:16 +00:00
Adrian Reber
fbef1d7a1f Added myself to the .mailmap and AUTHORS files.
This commit was SVN r29923.
2013-12-16 16:21:37 +00:00
Adrian Reber
b42aad44a3 Trying to get the C/R code to compile again. This patch
includes various fixes all over the C/R code which are
hard to group like the other patches.

Changes from V1:
* explain why mca_base_component_distill_checkpoint_ready no longer works
* compare return result of opal functions with OPAL_* values

Changes from V2:
* use orte_rml_oob_ft_event() instead of referencing through the modules
* properly protect variable (thanks to --enable-picky)

This commit was SVN r29922.
2013-12-16 15:35:28 +00:00
Mike Dubman
b95a9d865a rework SHMEM verbose macros to enable if --enable-debug specified
Refs trac:3763

This commit was SVN r29921.

The following Trac tickets were found above:
  Ticket 3763 --> https://svn.open-mpi.org/trac/ompi/ticket/3763
2013-12-16 09:13:27 +00:00
George Bosilca
3d72ccf1f4 Don't reset the convertor to the default size and buffer as it
should be already set to the right value. This fixes a problem
identified by Guillaume Gouaillardet, where using a single
persistent receive leads to leaking the convertor stack memory.

Refs trac:3956
cmr=v1.7.4:reviewer=jsquyres:subject=Correctly handle the convertor internal stack for persistent receives.

This commit was SVN r29920.

The following Trac tickets were found above:
  Ticket 3956 --> https://svn.open-mpi.org/trac/ompi/ticket/3956
2013-12-15 18:16:38 +00:00
Ralph Castain
8b6d117541 Per the OMPI devel conference that changed our default behaviors:
* default to bind-to core
* map-by slot if np=2
* map-by socket (balance across sockets on each node) if np > 2
* map-by <obj> will imply rank-by <obj> by default (leave default binding as above)

Fix a bug in the map-by <obj> mapper where we incorrectly compute the #procs to assign if the #slots > #procs

cmr=v1.7.4:reviewer=jsquyres:subject=Update default binding and mapping values

This commit was SVN r29919.
2013-12-15 17:25:54 +00:00
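The defaults listed in 8b6d117541 can be summarized as a small decision function. This is a sketch in Python under stated assumptions, not the ORTE mapper: `default_policies` is a hypothetical name, `np=1` is assumed to behave like `np=2`, and the ranking policy is left unset when no map-by object is given because this message does not state it.

```python
def default_policies(np, map_by=None):
    """Hypothetical summary of the default map/rank/bind choices."""
    if map_by is not None:
        # map-by <obj> implies rank-by <obj>; default binding is unchanged.
        return {"map_by": map_by, "rank_by": map_by, "bind_to": "core"}
    # Assumption: np=1 is treated like np=2 (the message only names np=2).
    mapping = "slot" if np <= 2 else "socket"
    # rank_by is None because this commit message does not specify it.
    return {"map_by": mapping, "rank_by": None, "bind_to": "core"}
```

For example, `default_policies(16)` selects map-by socket with bind-to core, while `default_policies(16, map_by="numa")` ranks by the same object it maps by.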
George Bosilca
efb32da1e0 There is no need for this include.
This commit was SVN r29918.
2013-12-15 17:04:45 +00:00
George Bosilca
1a972e2c9d Don't be greedy, just do what we asked for.
This commit was SVN r29917.
2013-12-15 16:54:01 +00:00
George Bosilca
430a13719f Only if OMPI_BTL_SM_HAVE_CMA is set to 1.
This commit was SVN r29916.
2013-12-15 16:49:27 +00:00
George Bosilca
6189d5968b Make the builtin atomics follow the same convention as every other atomic
support we have ([op]_and_fetch instead of fetch_and_[op]).

This commit was SVN r29915.
2013-12-15 16:48:27 +00:00
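The two naming conventions mentioned in 6189d5968b differ in which value is returned: `[op]_and_fetch` returns the updated value, while `fetch_and_[op]` returns the value from before the operation. The sketch below models that difference in Python with a lock; `AtomicInt` is a hypothetical class for illustration and is not how the Open MPI atomics are implemented.

```python
import threading


class AtomicInt:
    """Hypothetical lock-based counter illustrating the two conventions."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def add_and_fetch(self, delta):
        # [op]_and_fetch: apply the operation, return the NEW value.
        with self._lock:
            self._value += delta
            return self._value

    def fetch_and_add(self, delta):
        # fetch_and_[op]: return the OLD value, then apply the operation.
        with self._lock:
            old = self._value
            self._value += delta
            return old
```

Mixing the two conventions in one code base invites off-by-one errors, which is presumably why the commit aligns the builtin atomics with the rest.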
Ralph Castain
659cb9652d Seems to work either way, but add semi-colon for correctness
This commit was SVN r29913.
2013-12-15 14:55:45 +00:00
Mike Dubman
879ea64e6b add mlnx packages autodetect logic
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29911.
2013-12-15 12:33:41 +00:00
Ralph Castain
ba94c937bb Update platform files
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29910.
2013-12-14 17:49:20 +00:00
Mike Dubman
ac4573b6db code formatting according to OMPI code style
Refs trac:3763

This commit was SVN r29908.

The following Trac tickets were found above:
  Ticket 3763 --> https://svn.open-mpi.org/trac/ompi/ticket/3763
2013-12-14 14:39:56 +00:00
Jeff Squyres
770bf77149 Fix some minor memory leaks in error code paths.
Many thanks to Tom Fogal for the patch.

cmr=v1.7.4:reviewer=rhc:subject=Fix minor memory leaks in error code paths

This commit was SVN r29905.
2013-12-14 00:41:21 +00:00
Jeff Squyres
0ab48ad0d2 Fix some annoying flex warnings that have been there for years.
Many thanks to Tom Fogal for the initial patch.

cmr=v1.7.4:reviewer=rhc:subject=Fix annoying flex warnings

This commit was SVN r29904.
2013-12-14 00:36:12 +00:00
Jeff Squyres
a7e65df6bc Update the --enable-wrapper-rpath help string to be correct.
Refs trac:3694

This commit was SVN r29903.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-13 22:20:10 +00:00
Jeff Squyres
a25630c5e7 Fix rpath m4 typo that seeped in at the last minute.
Refs trac:3694

This commit was SVN r29901.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-13 21:40:03 +00:00
Rolf vandeVaart
b955dbd6d9 Fix various items discovered by review of ticket #3951.
This commit was SVN r29900.
2013-12-13 21:25:07 +00:00
Jeff Squyres
1bc8f41edb This commit combines 3 somewhat-unrelated things, which unfortunately
got linked together (work on one caused work in the other):

 * Clean up a bunch of VAR_SCOPE issues in configure.  This includes:
   * Using VAR_SCOPE_PUSH and VAR_SCOPE_POP in more places
   * Cleaning up the use of some shell variables (e.g., name them better)
 * Add support for external libevent via
   --with-libevent=<dir-to-libevent-install-tree>, as specifically
   asked for by downstream packagers.
 * Revamp how wrapper compiler RPATH (and RUNPATH) support is done.
   The external libevent work exposed weaknesses in how the original
   RPATH/RUNPATH work was done, so we had to re-do it to be a bit more
   robust.

This work has not yet been tested on Solaris.

Refs trac:3694

This commit was SVN r29899.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-13 21:24:45 +00:00
Jeff Squyres
f4afa4fd1f Add missing include, exposed in "external libevent" work.
Refs trac:3694

This commit was SVN r29898.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-13 21:21:30 +00:00
Jeff Squyres
2e7653e4c2 Add missing argv.h includes.
Noticed these as part of #3694: external libevents don't cause argv.h
to automatically get included.

Refs trac:3694

This commit was SVN r29897.

The following Trac tickets were found above:
  Ticket 3694 --> https://svn.open-mpi.org/trac/ompi/ticket/3694
2013-12-13 21:17:36 +00:00