1
1
Граф коммитов

15764 Коммитов

Автор SHA1 Сообщение Дата
Rolf vandeVaart
2634f6401a Add some basic support for sending and receiving CUDA device memory. Feature is disabled by default and has no effect on default code paths.
This commit was SVN r24659.
2011-04-28 23:05:55 +00:00
Ralph Castain
0ff0d20e72 Grr...get the prefix right - need to strip the bin out of absolute path to mpirun.
This commit was SVN r24658.
2011-04-28 22:20:55 +00:00
Ralph Castain
6af2677fb8 Check for both absolute-path-to-mpirun and -prefix being specified. If the two differ, print out a warning and ignore -prefix. If they are the same, or only one was given, then proceed as directed.
This commit was SVN r24657.
2011-04-28 22:12:41 +00:00
Jeff Squyres
c8c8044b92 Minor updates to NEWS
This commit was SVN r24654.
2011-04-28 21:40:44 +00:00
Brad Benton
0876e3549d Added a couple more items for 1.4.4.
This commit was SVN r24651.
2011-04-28 18:47:34 +00:00
Brad Benton
359de0bde6 Add 1.4.4 items.
This commit was SVN r24650.
2011-04-28 15:48:51 +00:00
Jeff Squyres
0882d636a6 Oops -- need string.h, too (for strcasecmp).
This commit was SVN r24649.
2011-04-28 15:42:35 +00:00
Jeff Squyres
7362a0730a Change the default to "none". David Singleton raises a good point
that enabling "local_only" by default could cause excessive
by-NUMA-node paging and/or OOMs (rather than allowing memory
allocations to spill over to other NUMA nodes).

This brought home the very real-world example of people buying servers
with more processors/cores than they need, just to get more memory.
We wouldn't want Badness to occur in such scenarios by default.
Instead, let people turn on "only allow memory allocations on my local
NUMA node" if their application would benefit from it.

This commit was SVN r24648.
2011-04-28 15:16:39 +00:00
Ralph Castain
b586f2952e Arggg...revert r24645. I knew those fields were there for a reason...sigh.
This commit was SVN r24647.

The following SVN revision numbers were found above:
  r24645 --> open-mpi/ompi@e4732110da
2011-04-28 15:07:00 +00:00
Ralph Castain
859aaab93d In the case of direct-launched processes running under slurm, psm requires that the pre_condition_transports MCA param be set. This is normally computed by mpirun and inserted into each proc's environ, but that doesn't work here.
So separate out the printing of that key, and let the individual procs generate it in a way that ensures they all get the same result.

This commit was SVN r24646.
2011-04-28 13:54:33 +00:00
Ralph Castain
e4732110da Remove a couple more stale fields
This commit was SVN r24645.
2011-04-28 00:26:38 +00:00
Ralph Castain
39369f8807 Remove stale fields from global objects - have been moved to the layer that actually uses them
This commit was SVN r24644.
2011-04-28 00:20:49 +00:00
Ralph Castain
8858d9a40e Add a marker for other layers to use in defining data types
This commit was SVN r24643.
2011-04-28 00:19:35 +00:00
Jeff Squyres
7b48042ffd Commit patch from upstream hwloc: r3482. Fixes some compiler
warnings. 

This commit was SVN r24641.

The following SVN revision numbers were found above:
  r3482 --> open-mpi/ompi@2435be8d49
2011-04-27 17:08:15 +00:00
Jeff Squyres
d134ff9b4d Refs trac:2698
After a long period of development with many starts and stops, we
finally got this where we wanted it.

This commit introduces 2 new MCA params (note that the
"maffinity_libnuma_policy" MCA param introduced by r24290 was removed
when libnuma support was removed).  Remember that maffinity policies
are only in effect when paffinity is enaabled -- i.e., when processes
are bound to processors!

 * '''maffinity_base_alloc_policy:''' Policy that determines how
   general memory allocations are bound after MPI_INIT.  A value of
   "none" means that no memory policy is applied.  A value of
   "local_only" means that all memory allocations will be restricted
   to the local NUMA node where each process is placed.  Note that
   operating system paging policies are unaffected by this setting.
   For example, if "local_only" is used and local NUMA node memory is
   exhausted, a new memory allocation may cause paging.
 * '''maffinity_base_bind_failure_action:''' What Open MPI will do if
   it explicitly tries to bind memory to a specific NUMA location, and
   fails.  Note that this is a different case than the general
   allocation policy described by maffinity_base_alloc_policy.  A
   value of "warn" means that Open MPI will warn the first time this
   happens, but allow the job to continue (possibly with degraded
   performance).  A value of "error" means that Open MPI will abort
   the job if this happens.

This needs at least a little soak time on the trunk before going to
v1.5.

This commit was SVN r24639.

The following SVN revision numbers were found above:
  r24290 --> open-mpi/ompi@afa654746c

The following Trac tickets were found above:
  Ticket 2698 --> https://svn.open-mpi.org/trac/ompi/ticket/2698
2011-04-26 13:31:07 +00:00
Matthias Jurenz
a1e304b2d6 Removed redundant debug message
This commit was SVN r24638.
2011-04-26 08:02:46 +00:00
Jeff Squyres
926af377fe Refs trac:2778.
Upgrade to hwloc 1.2 (from hwloc 1.1.2).  This should fix the problems
Nathan's seeing in #2778.

Let's let this soak on the trunk for a little while and see how LANL's
MTT's work out.  If that works, then we can CMR this to v1.5.

This commit was SVN r24635.

The following Trac tickets were found above:
  Ticket 2778 --> https://svn.open-mpi.org/trac/ompi/ticket/2778
2011-04-25 19:31:49 +00:00
Jeff Squyres
b8af3b7c4a New comment explains it all -- previous code was failing to find the
Nth core, so it fell over to try to find the Nth PU.

-----

hwloc isn't able to find cores on all platforms.  Example: PPC64
running RHEL 5.4 (linux kernel 2.6.18) only reports NUMA nodes and
PU's.  Fine.

However, note that hwloc_get_obj_by_type() will return NULL in 2
(effectively) different cases:

- no objects of the requested type were found
- the Nth object of the requested type was not found

So first we have to see if we can find *any* cores by looking for the
0th core.  If we find it, then try to find the Nth core.  Otherwise,
try to find the Nth PU.

This commit was SVN r24632.
2011-04-25 16:55:27 +00:00
Jeff Squyres
16d8e9216b Ran across this comment about i18n support, so I figured I'd update
it.  :-)

This commit was SVN r24631.
2011-04-22 12:14:20 +00:00
Jeff Squyres
ddc44cfbce Fix the types of the sendcounts and displs parameters to MPI_Scatterv.
Thanks to Stanislav Sazykin for identifying the issue.

This commit was SVN r24630.
2011-04-22 10:11:45 +00:00
Ralph Castain
9988b97b97 Extend/update how we handle process stats. Add the ability to collect node-level stats separate from the process stats. Update the process stat memory fields to report in MBytes instead of KBytes as I can't find any process that runs in KBytes nowadays.
Rename the memusage sensor plugin to "resusage" as it will soon be updated to include full process stat monitoring.

Extend the heartbeat sensor to report node and process stats in the heartbeat.

Store the process and node stats in their respective orte_xxx_t object.

This commit was SVN r24629.
2011-04-21 22:55:45 +00:00
Brian Barrett
3d4b7ecbaf updates for API changes
This commit was SVN r24628.
2011-04-20 16:48:27 +00:00
Brian Barrett
e1676fd61e Make the no-orte case compile again
This commit was SVN r24627.
2011-04-20 16:48:07 +00:00
Jeff Squyres
25a8944e09 Fixes trac:2776. Let the openib BTL auto-detect its bandwidth.
cmr:v1.5.4

This commit was SVN r24621.

The following Trac tickets were found above:
  Ticket 2776 --> https://svn.open-mpi.org/trac/ompi/ticket/2776
2011-04-19 16:31:36 +00:00
Ralph Castain
5f64b830f9 Ensure we only kill threads once
This commit was SVN r24620.
2011-04-18 14:47:09 +00:00
Ralph Castain
8014e7432c Send recovery defined flag in app_contexts, include recovery flags in debug prints
This commit was SVN r24619.
2011-04-18 14:46:42 +00:00
Ralph Castain
89501e6e24 Don't try to politely end threads when abnormally terminating as we can hang if the thread is in a stuck callback.
This commit was SVN r24618.
2011-04-18 12:21:47 +00:00
George Bosilca
971711474f Based on the patch submitted by Pascal Deveze, here is the memory leak fix
for the type indexed creation.

CMR v1.4 and v1.5.

This commit was SVN r24617.
2011-04-14 21:50:06 +00:00
Ralph Castain
3a28556472 Expand our handling of non-zero exit status. If a process exits with non-zero status, pass that info along to the user in case it means something to them, even if the process also exited without calling MPI_Finalize. If the process calls MPI_Abort, that trumps the exit status question.
Provide a new MCA param that allows the user to direct that we abort the job once a process exits with non-zero status. No recovery is allowed in such cases to avoid trying to restart a process that has already exited MPI.

This commit was SVN r24614.
2011-04-14 15:04:21 +00:00
Ralph Castain
e4c36a3611 Add optimized platform files
This commit was SVN r24613.
2011-04-13 19:14:33 +00:00
Jeff Squyres
2fe94b929a Manually add hwloc v1.1 branch r3418 commit (went in after v1.1.2
released): 

backport hwloc r 3416 from trunk: Add cache info entry _after_ checking
that we need one, thanks Andriy Gapon for the fix

This commit was SVN r24612.

The following SVN revision numbers were found above:
  r3418 --> open-mpi/ompi@9972663a12
2011-04-12 14:41:46 +00:00
Jeff Squyres
9dc3a1aa54 Upgrade to hwloc 1.1.2; most likely the last release of the hwloc
1.1.x series

This commit was SVN r24611.
2011-04-12 14:35:26 +00:00
Jeff Squyres
38d3cdd4a6 Update hwloc to 1.1.1. Next stop: 1.1.2.
This commit was SVN r24610.
2011-04-12 14:16:37 +00:00
Jeff Squyres
48f418ee7b Fixes trac:2768: exclude opal/libltdl from "make distclean" when
--disable-dlopen is used.  Thanks to David Gunter for reporting the
issue. 

This commit was SVN r24603.

The following Trac tickets were found above:
  Ticket 2768 --> https://svn.open-mpi.org/trac/ompi/ticket/2768
2011-04-08 14:59:49 +00:00
Matthias Jurenz
fe8cc366c8 Don't try to rename the compiler output of an OPARI modified source file if it's specified by '-o'
This commit was SVN r24601.
2011-04-08 11:53:46 +00:00
Samuel Gutierrez
38077f8692 added check for sys/shm.h
This commit was SVN r24598.
2011-04-07 19:52:19 +00:00
Shiqing Fan
4b3b713bfc Update the windows installdir component.
Don't use the old env component for windows, so remove the .windows file.

This commit was SVN r24597.
2011-04-05 12:15:41 +00:00
Edgar Gabriel
725a0d2100 fix a formatting issue
This commit was SVN r24596.
2011-03-31 20:05:45 +00:00
Edgar Gabriel
ad9f793ce4 avoid calling omp_dpm.mark_dyncomm if the size of the local communicator
is zero. The routine assumes that at least one process is available in the
group, which lead to a segfault when creating communicators with GROUP_EMPTY.

Fixes trac:2752

This commit was SVN r24595.

The following Trac tickets were found above:
  Ticket 2752 --> https://svn.open-mpi.org/trac/ompi/ticket/2752
2011-03-31 19:57:06 +00:00
Terry Dontje
266e663091 Add opal_tree class. This will be used in the future by sysinfo to store hw maps to be used by rmaps for the new affinity code.
This commit was SVN r24594.
2011-03-30 08:05:28 +00:00
Ralph Castain
30fb002524 Take the first small step towards rationalizing rsh support. Create a new "rshbase" component that contains a simple rsh module - no tree spawn, uses all the base functions for launch support. Extend the base rsh support functions to include those functions in common across all rsh modules.
Only a minor change made to the current rsh module to avoid a naming conflict. Otherwise, left it alone to avoid creating conflicts with other external work. The current rsh module remains the default for rsh/ssh support, and continues to contain the support for SGE and Loadleveler.

This commit was SVN r24593.
2011-03-30 01:15:07 +00:00
Nysal Jan
866ae8b43a Close the file descriptor
This commit was SVN r24580.
2011-03-29 08:42:49 +00:00
Nysal Jan
c8c6b0edab Improve LoadLeveler integration with Open MPI. Add support for LL native rsh agent - llspawn
This commit was SVN r24579.
2011-03-29 07:46:59 +00:00
Ralph Castain
f40edd6b4f Add the stupid test word
This commit was SVN r24578.
2011-03-26 03:38:59 +00:00
Nathan Hjelm
8634b6394f fixed plm/tm component
This commit was SVN r24577.
2011-03-25 22:20:15 +00:00
Ralph Castain
5bfb01c6c8 Only build the linux component of sysinfo if linux is the operating system.
Thanks to Paul Hargrove for the suggestion.

This commit was SVN r24576.
2011-03-25 20:55:57 +00:00
Matthias Jurenz
53346a9c1a - fixed handling NULL value of pathname given to certain I/O calls (e.g. fopen, open, unlink)
- incremented version number

This commit was SVN r24575.
2011-03-25 11:15:49 +00:00
Jeff Squyres
58a13f87e6 Oops -- forgot to add opal_config_top.h to Makefile.am (so that it'll
be included in the tarball).

This commit was SVN r24572.
2011-03-25 01:21:11 +00:00
Jeff Squyres
5ae1b15b6e Ensure that other packages defining PACKAGE_ macros don't hurt us, and protect others from our PACKAGE_ macros.
This commit was SVN r24571.
2011-03-24 22:39:56 +00:00
Ralph Castain
d7e029cb40 Convert heartbeat to multicast basis
This commit was SVN r24570.
2011-03-24 19:05:39 +00:00