Commit graph

15851 commits

Author SHA1 Message Date
Jeff Squyres
7b48042ffd Commit patch from upstream hwloc: r3482. Fixes some compiler
warnings. 

This commit was SVN r24641.

The following SVN revision numbers were found above:
  r3482 --> open-mpi/ompi@2435be8d49
2011-04-27 17:08:15 +00:00
Jeff Squyres
d134ff9b4d Refs trac:2698
After a long period of development with many starts and stops, we
finally got this where we wanted it.

This commit introduces 2 new MCA params (note that the
"maffinity_libnuma_policy" MCA param introduced by r24290 was removed
when libnuma support was removed).  Remember that maffinity policies
are only in effect when paffinity is enabled -- i.e., when processes
are bound to processors!

 * '''maffinity_base_alloc_policy:''' Policy that determines how
   general memory allocations are bound after MPI_INIT.  A value of
   "none" means that no memory policy is applied.  A value of
   "local_only" means that all memory allocations will be restricted
   to the local NUMA node where each process is placed.  Note that
   operating system paging policies are unaffected by this setting.
   For example, if "local_only" is used and local NUMA node memory is
   exhausted, a new memory allocation may cause paging.
 * '''maffinity_base_bind_failure_action:''' What Open MPI will do if
   it explicitly tries to bind memory to a specific NUMA location, and
   fails.  Note that this is a different case than the general
   allocation policy described by maffinity_base_alloc_policy.  A
   value of "warn" means that Open MPI will warn the first time this
   happens, but allow the job to continue (possibly with degraded
   performance).  A value of "error" means that Open MPI will abort
   the job if this happens.

This needs at least a little soak time on the trunk before going to
v1.5.

This commit was SVN r24639.

The following SVN revision numbers were found above:
  r24290 --> open-mpi/ompi@afa654746c

The following Trac tickets were found above:
  Ticket 2698 --> https://svn.open-mpi.org/trac/ompi/ticket/2698
2011-04-26 13:31:07 +00:00
Matthias Jurenz
a1e304b2d6 Removed redundant debug message
This commit was SVN r24638.
2011-04-26 08:02:46 +00:00
Jeff Squyres
926af377fe Refs trac:2778.
Upgrade to hwloc 1.2 (from hwloc 1.1.2).  This should fix the problems
Nathan's seeing in #2778.

Let's let this soak on the trunk for a little while and see how LANL's
MTT runs work out.  If that works, then we can CMR this to v1.5.

This commit was SVN r24635.

The following Trac tickets were found above:
  Ticket 2778 --> https://svn.open-mpi.org/trac/ompi/ticket/2778
2011-04-25 19:31:49 +00:00
Jeff Squyres
b8af3b7c4a New comment explains it all -- previous code was failing to find the
Nth core, so it fell over to try to find the Nth PU.

-----

hwloc isn't able to find cores on all platforms.  Example: PPC64
running RHEL 5.4 (linux kernel 2.6.18) only reports NUMA nodes and
PUs.  Fine.

However, note that hwloc_get_obj_by_type() will return NULL in 2
(effectively) different cases:

- no objects of the requested type were found
- the Nth object of the requested type was not found

So first we have to see if we can find *any* cores by looking for the
0th core.  If we find it, then try to find the Nth core.  Otherwise,
try to find the Nth PU.
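The probe-then-fallback logic above can be sketched in C as follows. This is a minimal, self-contained model, not the Open MPI code: `mock_get_obj_by_type()` and the `mock_topology_t`/`mock_obj_t` types are hypothetical stand-ins for `hwloc_get_obj_by_type()` and the hwloc topology, reproducing only the "NULL in both cases" behavior described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object types standing in for HWLOC_OBJ_CORE / HWLOC_OBJ_PU. */
typedef enum { OBJ_CORE, OBJ_PU } obj_type_t;

/* Hypothetical topology: num_cores is 0 where cores are not reported. */
typedef struct {
    int num_cores;
    int num_pus;
} mock_topology_t;

typedef struct {
    obj_type_t type;
    int index;
} mock_obj_t;

/* Mimics hwloc_get_obj_by_type(): returns NULL both when no objects of
 * the requested type exist and when the Nth object does not exist. */
static mock_obj_t *mock_get_obj_by_type(const mock_topology_t *topo,
                                        obj_type_t type, int n,
                                        mock_obj_t *storage)
{
    int count = (type == OBJ_CORE) ? topo->num_cores : topo->num_pus;
    if (n < 0 || n >= count) {
        return NULL;
    }
    storage->type = type;
    storage->index = n;
    return storage;
}

/* Probe for core 0 first to tell "no cores at all" apart from
 * "no Nth core"; fall back to the Nth PU only in the former case. */
static mock_obj_t *find_nth_core_or_pu(const mock_topology_t *topo, int n,
                                       mock_obj_t *storage)
{
    assert(topo != NULL && storage != NULL);
    if (mock_get_obj_by_type(topo, OBJ_CORE, 0, storage) != NULL) {
        /* Cores exist, so a missing Nth core is a genuine miss. */
        return mock_get_obj_by_type(topo, OBJ_CORE, n, storage);
    }
    /* No cores reported (e.g., only NUMA nodes and PUs): use PUs. */
    return mock_get_obj_by_type(topo, OBJ_PU, n, storage);
}
```

With a topology that reports 0 cores and 8 PUs (as on the PPC64 box above), asking for object 3 yields PU 3; with 4 cores, asking for object 5 yields NULL instead of silently falling back to a PU.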

This commit was SVN r24632.
2011-04-25 16:55:27 +00:00
Jeff Squyres
16d8e9216b Ran across this comment about i18n support, so I figured I'd update
it.  :-)

This commit was SVN r24631.
2011-04-22 12:14:20 +00:00
Jeff Squyres
ddc44cfbce Fix the types of the sendcounts and displs parameters to MPI_Scatterv.
Thanks to Stanislav Sazykin for identifying the issue.

This commit was SVN r24630.
2011-04-22 10:11:45 +00:00
Ralph Castain
9988b97b97 Extend/update how we handle process stats. Add the ability to collect node-level stats separate from the process stats. Update the process stat memory fields to report in MBytes instead of KBytes as I can't find any process that runs in KBytes nowadays.
Rename the memusage sensor plugin to "resusage" as it will soon be updated to include full process stat monitoring.

Extend the heartbeat sensor to report node and process stats in the heartbeat.

Store the process and node stats in their respective orte_xxx_t object.

This commit was SVN r24629.
2011-04-21 22:55:45 +00:00
Brian Barrett
3d4b7ecbaf updates for API changes
This commit was SVN r24628.
2011-04-20 16:48:27 +00:00
Brian Barrett
e1676fd61e Make the no-orte case compile again
This commit was SVN r24627.
2011-04-20 16:48:07 +00:00
Jeff Squyres
25a8944e09 Fixes trac:2776. Let the openib BTL auto-detect its bandwidth.
cmr:v1.5.4

This commit was SVN r24621.

The following Trac tickets were found above:
  Ticket 2776 --> https://svn.open-mpi.org/trac/ompi/ticket/2776
2011-04-19 16:31:36 +00:00
Ralph Castain
5f64b830f9 Ensure we only kill threads once
This commit was SVN r24620.
2011-04-18 14:47:09 +00:00
Ralph Castain
8014e7432c Send recovery defined flag in app_contexts, include recovery flags in debug prints
This commit was SVN r24619.
2011-04-18 14:46:42 +00:00
Ralph Castain
89501e6e24 Don't try to politely end threads when abnormally terminating as we can hang if the thread is in a stuck callback.
This commit was SVN r24618.
2011-04-18 12:21:47 +00:00
George Bosilca
971711474f Based on the patch submitted by Pascal Deveze, here is the memory leak fix
for the type indexed creation.

CMR v1.4 and v1.5.

This commit was SVN r24617.
2011-04-14 21:50:06 +00:00
Ralph Castain
3a28556472 Expand our handling of non-zero exit status. If a process exits with non-zero status, pass that info along to the user in case it means something to them, even if the process also exited without calling MPI_Finalize. If the process calls MPI_Abort, that trumps the exit status question.
Provide a new MCA param that allows the user to direct that we abort the job once a process exits with non-zero status. No recovery is allowed in such cases to avoid trying to restart a process that has already exited MPI.
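A minimal sketch of the precedence described above, assuming hypothetical names throughout: `proc_exit_t`, `classify_exit()`, and the `abort_on_nonzero` flag (which stands in for the new MCA param, whose name is not given here) are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of how one process terminated. */
typedef struct {
    bool called_abort;  /* process called MPI_Abort */
    int exit_status;    /* status passed to exit() */
} proc_exit_t;

typedef enum {
    ACT_NONE,            /* nothing to report */
    ACT_REPORT_ABORT,    /* MPI_Abort was called: it takes precedence */
    ACT_REPORT_NONZERO,  /* non-zero status passed along to the user */
    ACT_ABORT_JOB        /* user asked to abort the job; no restart attempted */
} exit_action_t;

/* abort_on_nonzero models the new (unnamed here) MCA param. */
static exit_action_t classify_exit(const proc_exit_t *p, bool abort_on_nonzero)
{
    assert(p != NULL);
    if (p->called_abort) {
        return ACT_REPORT_ABORT;
    }
    if (p->exit_status != 0) {
        /* Reported regardless of whether MPI_Finalize was called. */
        return abort_on_nonzero ? ACT_ABORT_JOB : ACT_REPORT_NONZERO;
    }
    return ACT_NONE;
}
```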

This commit was SVN r24614.
2011-04-14 15:04:21 +00:00
Ralph Castain
e4c36a3611 Add optimized platform files
This commit was SVN r24613.
2011-04-13 19:14:33 +00:00
Jeff Squyres
2fe94b929a Manually add hwloc v1.1 branch r3418 commit (went in after v1.1.2
released): 

backport hwloc r3416 from trunk: Add cache info entry _after_ checking
that we need one, thanks Andriy Gapon for the fix

This commit was SVN r24612.

The following SVN revision numbers were found above:
  r3418 --> open-mpi/ompi@9972663a12
2011-04-12 14:41:46 +00:00
Jeff Squyres
9dc3a1aa54 Upgrade to hwloc 1.1.2; most likely the last release of the hwloc
1.1.x series

This commit was SVN r24611.
2011-04-12 14:35:26 +00:00
Jeff Squyres
38d3cdd4a6 Update hwloc to 1.1.1. Next stop: 1.1.2.
This commit was SVN r24610.
2011-04-12 14:16:37 +00:00
Jeff Squyres
48f418ee7b Fixes trac:2768: exclude opal/libltdl from "make distclean" when
--disable-dlopen is used.  Thanks to David Gunter for reporting the
issue. 

This commit was SVN r24603.

The following Trac tickets were found above:
  Ticket 2768 --> https://svn.open-mpi.org/trac/ompi/ticket/2768
2011-04-08 14:59:49 +00:00
Matthias Jurenz
fe8cc366c8 Don't try to rename the compiler output of an OPARI modified source file if it's specified by '-o'
This commit was SVN r24601.
2011-04-08 11:53:46 +00:00
Samuel Gutierrez
38077f8692 added check for sys/shm.h
This commit was SVN r24598.
2011-04-07 19:52:19 +00:00
Shiqing Fan
4b3b713bfc Update the windows installdir component.
Don't use the old env component for windows, so remove the .windows file.

This commit was SVN r24597.
2011-04-05 12:15:41 +00:00
Edgar Gabriel
725a0d2100 fix a formatting issue
This commit was SVN r24596.
2011-03-31 20:05:45 +00:00
Edgar Gabriel
ad9f793ce4 avoid calling omp_dpm.mark_dyncomm if the size of the local communicator
is zero. The routine assumes that at least one process is available in the
group, which led to a segfault when creating communicators with GROUP_EMPTY.

Fixes trac:2752

This commit was SVN r24595.

The following Trac tickets were found above:
  Ticket 2752 --> https://svn.open-mpi.org/trac/ompi/ticket/2752
2011-03-31 19:57:06 +00:00
Terry Dontje
266e663091 Add opal_tree class. This will be used in the future by sysinfo to store hw maps to be used by rmaps for the new affinity code.
This commit was SVN r24594.
2011-03-30 08:05:28 +00:00
Ralph Castain
30fb002524 Take the first small step towards rationalizing rsh support. Create a new "rshbase" component that contains a simple rsh module - no tree spawn, uses all the base functions for launch support. Extend the base rsh support functions to include those functions in common across all rsh modules.
Only a minor change made to the current rsh module to avoid a naming conflict. Otherwise, left it alone to avoid creating conflicts with other external work. The current rsh module remains the default for rsh/ssh support, and continues to contain the support for SGE and LoadLeveler.

This commit was SVN r24593.
2011-03-30 01:15:07 +00:00
Nysal Jan
866ae8b43a Close the file descriptor
This commit was SVN r24580.
2011-03-29 08:42:49 +00:00
Nysal Jan
c8c6b0edab Improve LoadLeveler integration with Open MPI. Add support for LL native rsh agent - llspawn
This commit was SVN r24579.
2011-03-29 07:46:59 +00:00
Ralph Castain
f40edd6b4f Add the stupid test word
This commit was SVN r24578.
2011-03-26 03:38:59 +00:00
Nathan Hjelm
8634b6394f fixed plm/tm component
This commit was SVN r24577.
2011-03-25 22:20:15 +00:00
Ralph Castain
5bfb01c6c8 Only build the linux component of sysinfo if linux is the operating system.
Thanks to Paul Hargrove for the suggestion.

This commit was SVN r24576.
2011-03-25 20:55:57 +00:00
Matthias Jurenz
53346a9c1a - fixed handling NULL value of pathname given to certain I/O calls (e.g. fopen, open, unlink)
- incremented version number

This commit was SVN r24575.
2011-03-25 11:15:49 +00:00
Jeff Squyres
58a13f87e6 Oops -- forgot to add opal_config_top.h to Makefile.am (so that it'll
be included in the tarball).

This commit was SVN r24572.
2011-03-25 01:21:11 +00:00
Jeff Squyres
5ae1b15b6e Ensure that other packages defining PACKAGE_ macros don't hurt us, and protect others from our PACKAGE_ macros.
This commit was SVN r24571.
2011-03-24 22:39:56 +00:00
Ralph Castain
d7e029cb40 Convert heartbeat to multicast basis
This commit was SVN r24570.
2011-03-24 19:05:39 +00:00
Jeff Squyres
cf6c5e8d48 Fix a bug noted by Gus Correa on the user's list: mpi_paffinity_alone
appeared multiple times in ompi_info output (so did others, but this
is the one that was noticed).  Ensure that we don't repeat
opal_paffinity_base_register_params() multiple times.

This commit was SVN r24569.
2011-03-24 00:58:25 +00:00
Ralph Castain
90698a2c02 Ensure that blocking recvs wait until the data is actually recvd
This commit was SVN r24558.
2011-03-22 18:45:54 +00:00
Ralph Castain
888472f671 Do not release recv as the calling function needs that data and will release it later
This commit was SVN r24557.
2011-03-22 18:44:56 +00:00
Ralph Castain
a3b0a9fcb7 Update platform file
This commit was SVN r24556.
2011-03-22 18:28:12 +00:00
Ralph Castain
30981de200 Minor cleanups courtesy of Nysal - thanks!
This commit was SVN r24552.
2011-03-22 13:48:58 +00:00
Josh Hursey
045035963a Fix return code from MPI_Probe and MPI_Iprobe.
Instead of returning MPI_SUCCESS every time they are called regardless of the status of the call, they should return a value representative of the action. So similar to MPI_Wait/MPI_Test they will return MPI_SUCCESS if the action was successful, or the value that matches status.MPI_ERROR for the operation if it is unsuccessful.

This was discussed on the [http://www.open-mpi.org/community/lists/devel/2011/03/9109.php ompi-devel list]

This commit was SVN r24551.
2011-03-22 13:29:29 +00:00
Ralph Castain
c1396b278c Resolve the rsh confusion by splitting the initial search for a launch agent from the actual setup of the launch agent values in the plm base globals. Have each aspiring rsh-clone call lookup to see if their desired launch agent is available - if not, then reject that plm component.
If so, then setup the actual launch agent values only when the module init function is called.

This resolves the current conflict between the rsh and rshd components. Hopefully, it may avoid future problems in this area -provided- any new uses of rsh-like launchers abide by the lookup-and-then-setup rule.
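The lookup-and-then-setup rule can be sketched as below. All names here (`plm_globals_t`, `launch_agent_lookup()`, `component_query()`, `module_init()`) are hypothetical; the point is only that the query step checks availability without touching shared state, and the globals are written solely in the selected module's init.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the plm base globals. */
typedef struct {
    const char *agent;  /* NULL until the selected module's init runs */
} plm_globals_t;

static plm_globals_t plm_globals = { NULL };

/* Lookup only checks availability; it mutates no shared state. */
static bool launch_agent_lookup(const char *agent,
                                const char *const *available)
{
    for (size_t i = 0; available[i] != NULL; i++) {
        if (strcmp(agent, available[i]) == 0) {
            return true;
        }
    }
    return false;
}

/* Query: a component whose launch agent is unavailable rejects itself. */
static bool component_query(const char *agent, const char *const *available)
{
    return launch_agent_lookup(agent, available);
}

/* Setup of the launch agent values happens only in module init. */
static void module_init(const char *agent)
{
    assert(agent != NULL);
    plm_globals.agent = agent;
}
```

Because a rejected component never writes `plm_globals`, several rsh-like components can be queried in turn without clobbering each other's launch agent settings.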

This commit was SVN r24550.
2011-03-22 02:23:09 +00:00
Ralph Castain
d17b50e1ff Add the appropriate hooks to tell Totalview to display the user's main program upon startup. Apparently, this hook got lost somewhere after the 1.2 series :-(
Thanks to David Turner and the TV folks for passing this along.

This commit was SVN r24549.
2011-03-21 17:40:58 +00:00
Ralph Castain
795ca2cff2 Complete implementation of the multicast-based grpcomm module
This commit was SVN r24548.
2011-03-20 01:18:06 +00:00
Ralph Castain
fa40f5d7c3 Fix bad formatting
This commit was SVN r24547.
2011-03-20 01:17:29 +00:00
Ralph Castain
281116ddc5 A max_restarts value of -1 is now valid and indicates infinite restarts, so correct the validity check
This commit was SVN r24546.
2011-03-20 01:17:00 +00:00
Eugene Loh
2770a12beb Continue clean up of thread options started in r22841, 22842, and 22849.
No need for any CMRs to 1.5... that was already done in CMR 2728.

This commit was SVN r24545.

The following SVN revision numbers were found above:
  r22841 --> open-mpi/ompi@b400b84162
2011-03-18 21:36:35 +00:00
Matthias Jurenz
c34eed80c6 Fixed typo in configure options
This commit was SVN r24544.
2011-03-18 14:42:49 +00:00