Commit graph

8431 commits

Author SHA1 Message Date
Ralph Castain
d0eb7d7216 Complete the attribute management functions.
Modify the mapper to better bookmark its stopping place each time, and to pick up the next time from there. This needs to be validated on a multi-node system.

Fix a major memory corruption problem in the registry put/get functions, which were performing multiple frees. Not sure how valgrind missed this one, though it only occurred in specific circumstances (such as comm_spawn).

This commit was SVN r12179.
2006-10-18 20:02:16 +00:00
Galen Shipman
2036bf5c3c make smart and dumb compilers happy
This commit was SVN r12178.
2006-10-18 19:33:39 +00:00
George Bosilca
c9da782804 Keep only one function to get the size of a datatype.
This commit was SVN r12170.
2006-10-18 17:33:01 +00:00
George Bosilca
3db5c0487d typos.
This commit was SVN r12168.
2006-10-18 17:12:25 +00:00
George Bosilca
a1c9a374eb Remove all the warnings from the data-type engine testing.
This commit was SVN r12167.
2006-10-18 17:00:43 +00:00
George Bosilca
21ade43b96 Remove an unreachable statement.
This commit was SVN r12166.
2006-10-18 16:43:55 +00:00
Rainer Keller
47b24a0603 - Now that the branch is done, linearize access regarding
  request handling. Buys a little bit on IMB; no functional change
  otherwise.

This commit was SVN r12165.
2006-10-18 16:11:50 +00:00
Ralph Castain
f4a458532b This doesn't totally resolve the comm_spawn problem, but it helps a little. I'll continue working on it and hope to resolve it completely shortly. The issue primarily centers on where to start mapping the child job's processes, and how to deal with oversubscription that might result. At the moment, I am trying to resolve the first issue first (hey, that even sounds right!).
This change does a couple of things:

1. Since the USE_PARENT_ALLOC attribute is a directive regarding allocation of resources to a job, it more properly should be an attribute of the RAS. Change the name to reflect that and move the attribute define to the ras_types.h file.

2. Add the attributes list to the RMAPS map_job interface. This provides us with the desired flexibility to dynamically specify directives for mapping. The system will - in the absence of any attribute-based directive - default to the values provided in the MCA parameters (either from environment or command-line interface).

This commit was SVN r12164.
2006-10-18 14:01:44 +00:00
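The fallback rule described in commit f4a458532b above (an attribute-based directive wins; otherwise the MCA parameter value applies) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not ORTE code: `resolve_directive`, both dictionaries, and the key names `mapping_policy` and `oversubscribe` are hypothetical.

```python
def resolve_directive(name, attributes, mca_params):
    """Return the value to use for one mapping directive.

    An attribute passed along with the job takes precedence. If the
    attribute list does not mention the directive, fall back to the
    MCA parameter of the same name.
    """
    if name in attributes:
        return attributes[name]
    return mca_params[name]


# Hypothetical defaults, as if read from the environment or command line.
mca_defaults = {"mapping_policy": "byslot", "oversubscribe": False}

# A job that overrides only one directive through its attribute list.
job_attributes = {"mapping_policy": "bynode"}

print(resolve_directive("mapping_policy", job_attributes, mca_defaults))
print(resolve_directive("oversubscribe", job_attributes, mca_defaults))
```

Keeping the lookup in one function means every caller applies the same precedence, which is the flexibility the commit message is after.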
Gleb Natapov
252a9cea34 Fix bug in vma rcache.
This commit was SVN r12163.
2006-10-18 10:55:01 +00:00
Rainer Keller
528329c30f - Have two separate versions of OBJ_RETAIN and OBJ_RELEASE to
get readable preprocessed output.

This commit was SVN r12162.
2006-10-18 08:24:38 +00:00
George Bosilca
be27ee6fa0 Correct the bcast problem where we always did a bcast with a segsize of 0.
Activate the reduce decision function.
Other small updates (mostly TAB to spaces).

This commit was SVN r12161.
2006-10-18 02:00:46 +00:00
George Bosilca
50649dd6a9 What we write is a long long, so we should be using the long long format.
This commit was SVN r12157.
2006-10-18 00:02:20 +00:00
George Bosilca
1d69375dd7 Somehow, somewhere we have to initialize rc ...
This commit was SVN r12156.
2006-10-17 22:32:21 +00:00
George Bosilca
b95a4fbaaa Bad boys, bad boys whatcha want ...
This commit was SVN r12155.
2006-10-17 22:06:51 +00:00
George Bosilca
640178c4b3 Grepping through the source files I found these calls to the data-type engine
with the wrong type of arguments.

This commit was SVN r12148.
2006-10-17 21:05:04 +00:00
George Bosilca
6f5ec2390b pedantic...
This commit was SVN r12147.
2006-10-17 20:25:40 +00:00
George Bosilca
8852c00c36 Looks like a big commit, but in fact it addresses only one issue: the way we're working with
the size and displacement of data-types. After this patch all data can contain size_t bytes
and the displacements are defined as ptrdiff_t. All of the files I was able to compile
have been modified to match this requirement.

This commit was SVN r12146.
2006-10-17 20:20:58 +00:00
Ralph Castain
0c0fe022ff This is a first cut at fixing the problem of comm_spawn children being mapped onto the same nodes as their parents. I am not convinced the behavior implemented here is the long-term right one, but hopefully it will help alleviate the situation for now.
In this implementation, we begin mapping on the first node that has at least one slot available as measured by the slots_inuse versus the soft limit. If none of the nodes meet that criterion, we just start at the beginning of the node list since we are oversubscribed anyway.

Note that we ignore this logic if the user specifies a mapping - then it's just "user beware".

The real root cause of the problem is that we don't adjust sched_yield as we add processes onto a node. Hence, the node becomes oversubscribed and performance goes into the toilet. What we REALLY need to do to solve the problem is:

(a) modify the PLS components so they reuse the existing daemons, 

(b) create a way to tell a running process to adjust its sched_yield, and

(c) modify the ODLS components to update the sched_yield on a process per the new method

Until we do that, we will continue to have this problem - all this fix (and any subsequent one that focuses solely on the mapper) does is hopefully make it happen less often.

This commit was SVN r12145.
2006-10-17 19:35:00 +00:00
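The start-node rule described in commit 0c0fe022ff above can be sketched in a few lines. This is a minimal Python illustration, not the RMAPS implementation: `pick_start_index` and the `(slots_inuse, soft_limit)` tuple layout are assumptions made for the example.

```python
def pick_start_index(nodes):
    """Return the index of the node where mapping should begin.

    Each node is a (slots_inuse, soft_limit) pair. Mapping starts at the
    first node that still has at least one free slot under its soft
    limit. If no node qualifies, the job is oversubscribed anyway, so
    mapping simply starts at the beginning of the node list.
    """
    for index, (slots_inuse, soft_limit) in enumerate(nodes):
        if slots_inuse < soft_limit:
            return index
    return 0


# The parent job filled the first node, so the child starts on the second.
print(pick_start_index([(2, 2), (1, 2), (0, 2)]))

# Every node is full: fall back to the start of the list.
print(pick_start_index([(2, 2), (4, 4)]))
```

As the commit message notes, this only makes oversubscription less frequent; it does not address the sched_yield adjustment that is the root cause.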
Andrew Friedley
16769e64fe Remove old UD BTL code.
The UD BTL isn't gone - the latest version is in my afriedle-ud branch.  This version on the trunk was very old, ompi_ignore'd, lacked performance, and probably contained bugs.  The maintained version on my branch is working solid, and will eventually come back, but not for v1.2.

This commit was SVN r12144.
2006-10-17 18:59:21 +00:00
Dan Lacher
338000edf2 Update to include mpi man pages in package
Submitted by: Dan Lacher
Reviewed by: Rolf Vandevaart

This commit was SVN r12143.
2006-10-17 18:58:24 +00:00
Tim Prins
720eb88cad Make no-op function match new interface.
This commit was SVN r12142.
2006-10-17 17:34:06 +00:00
Tim Prins
8b0170148e Add some missing headers.
This commit was SVN r12141.
2006-10-17 17:28:02 +00:00
Tim Prins
5d31332f97 Goodbye poe, long live LoadLeveler...
This commit was SVN r12140.
2006-10-17 17:07:48 +00:00
Rainer Keller
668902c780 - trivial spelling
This commit was SVN r12139.
2006-10-17 16:34:52 +00:00
Ralph Castain
13227e36ab This commit looks a lot bigger than it is, so relax :-)
Fix the problem observed by multiple people that comm_spawned children were (once again) being mapped onto the same nodes as their parents. This was caused by going through the RAS a second time, thus overwriting the mapper's bookkeeping that told RMAPS where it had left off.

To solve this - and to continue moving forward on the ORTE development - we introduce the concept of attributes to control the behavior of the RM frameworks. I defined the attributes and a list of attributes as new ORTE data types to make it easier for people to pass them around (since they are now fundamental to the system, and therefore we will be packing and unpacking them frequently). Thus, all the functions to manipulate attributes can be implemented and debugged in one place.

I used those capabilities in two places:

1. Added an attribute list to the rmgr.spawn interface.

2. Added an attribute list to the ras.allocate interface. At the moment, the only attribute I modified the various RAS components to recognize is the USE_PARENT_ALLOCATION one (as defined in rmgr_types.h).

So the RAS components now know how to reuse an allocation. I have debugged this under rsh, but it now needs to be tested on a wider set of platforms.

This commit was SVN r12138.
2006-10-17 16:06:17 +00:00
Rainer Keller
3f88937081 - Error logging is really not yet enabled.
- Correct the error log for orte_errmgr_base_select
- Spelling fixes

This commit was SVN r12135.
2006-10-17 09:11:20 +00:00
George Bosilca
ed83927025 Don't reset the convertor when a persistent request completes. Instead, reset it
the next time the request is used. This will keep the execution path on the default
case (not persistent) shorter.

This commit was SVN r12134.
2006-10-17 05:01:47 +00:00
George Bosilca
ef66afe45c Another inner loop optimization. Only check for num_fails when prev_bytes is
equal to num_bytes.

This commit was SVN r12133.
2006-10-17 04:38:38 +00:00
George Bosilca
6a4cf1fef5 A request is inactive only if it is persistent. Therefore, we don't need
2 sequential ifs; we can reach the same result with 2 nested ifs.

This commit was SVN r12132.
2006-10-17 04:27:00 +00:00
George Bosilca
e116a37482 My last commit was wrong. Here is the correct version.
This commit was SVN r12131.
2006-10-17 02:45:03 +00:00
George Bosilca
01f5b4007b Check the count. It has to be a positive number.
This commit was SVN r12130.
2006-10-17 02:40:17 +00:00
Ralph Castain
16e52c5784 Fix a non-compliance issue regarding hostfiles as reported by Sun. The man page states that an entry that specifies slots_max but does not specify "slots" will have the soft limit default to the hard limit. The hostfile implementation, however, defaulted the soft limit to 1.
This fix changes that behavior to conform to the man page.

This commit was SVN r12129.
2006-10-17 00:43:12 +00:00
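The defaulting rule that commit 16e52c5784 above restores can be sketched as follows. This is a minimal Python illustration, not the hostfile parser: `resolve_slot_limits` is a hypothetical helper, and `None` stands in for "field not present in the hostfile entry". Only the case named in the commit message is modeled; other combinations are passed through unchanged.

```python
def resolve_slot_limits(slots=None, slots_max=None):
    """Return (soft_limit, hard_limit) for one hostfile entry.

    Per the man page behavior described in the commit message: when an
    entry gives slots_max but no "slots", the soft limit defaults to the
    hard limit (the old code defaulted it to 1).
    """
    if slots is None and slots_max is not None:
        return slots_max, slots_max
    # Cases not covered by the commit message are left as given.
    return slots, slots_max


# Entry with only slots_max=4: the soft limit follows the hard limit.
print(resolve_slot_limits(slots_max=4))

# Entry with both fields: values are used as written.
print(resolve_slot_limits(slots=2, slots_max=4))
```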
Jeff Squyres
0ebe687ed8 Refs trac:502, #503. This commit is a result of the review of r12122.
* Update comments in some MPI_FILE_* functions to reflect that the
  MPI specs have different page numbers in the ps and pdf (woof!).
* Update comments to say "Retain" where we meant retain (not "return").
* Add a check in MPI_ERRHANDLER_FREE to raise an MPI exception if the
  user attempts to free an intrinsic errhandler *and* the refcount is
  1 (meaning that it would actually free the intrinsic). This
  protects erroneous programs from segv'ing.
* Remove lengthy comment from comm_get_errhandler.c which is no
  longer valid (because of the MPI-2 errata that says that users *do*
  have to call MPI_ERRHANDLER_FREE).

This commit was SVN r12128.

The following SVN revision numbers were found above:
  r12122 --> open-mpi/ompi@407b3cb788

The following Trac tickets were found above:
  Ticket 502 --> https://svn.open-mpi.org/trac/ompi/ticket/502
2006-10-17 00:18:35 +00:00
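The guard added to MPI_ERRHANDLER_FREE in commit 0ebe687ed8 above can be sketched as a reference-count check. This is a minimal Python illustration of the idea, not Open MPI code: the `ErrHandler` class, `errhandler_free`, and the use of `ValueError` in place of an MPI exception are all assumptions made for the example.

```python
class ErrHandler:
    """Toy error handler with a reference count."""

    def __init__(self, intrinsic, refcount=1):
        self.intrinsic = intrinsic
        self.refcount = refcount


def errhandler_free(handler):
    """Drop one reference, refusing to destroy an intrinsic handler.

    A user who obtained an intrinsic handler through a "get" call holds
    an extra reference and may free it. Freeing when the count is
    already 1 would destroy the intrinsic itself, so that is rejected.
    """
    if handler.intrinsic and handler.refcount == 1:
        raise ValueError("attempt to free an intrinsic error handler")
    handler.refcount -= 1
    return handler.refcount


# A "get" bumped the count to 2, so one free is legal.
handler = ErrHandler(intrinsic=True, refcount=2)
print(errhandler_free(handler))

# A second free would destroy the intrinsic and is refused.
try:
    errhandler_free(handler)
except ValueError as exc:
    print("rejected:", exc)
```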
Pavel Shamis
d64cb58007 Tavor HCA vendor_id was changed to the correct one: 23108
Reviewed by: Jeff

This commit was SVN r12127.
2006-10-16 14:46:39 +00:00
Brian Barrett
fac7912e44 There's actually no reason to mention the C++ bindings seek change in the README,
as there's nothing specifically that the user needs to know about.

This commit was SVN r12126.
2006-10-16 14:28:51 +00:00
Brian Barrett
fdc8f69b84 * need to include iostream as well as stdio.h when doing the tricks with
MPI::SEEK_* because iostreams (well, ios_base, but I don't think that
    should be included directly) can use SEEK_* as values in an enum, which
    means that 'const int' is bad for them.
  * Remove now useless comments in the cxx example programs
  * include iostream after mpi.h so that our examples work with other MPI
    implementations that don't try to be as friendly with the constants.

Refs trac:387

This commit was SVN r12125.

The following Trac tickets were found above:
  Ticket 387 --> https://svn.open-mpi.org/trac/ompi/ticket/387
2006-10-16 14:20:31 +00:00
Brian Barrett
9a8e3d2318 Don't document a bug that Jeff found as life as it stands (Refs trac:387)
This commit was SVN r12124.

The following Trac tickets were found above:
  Ticket 387 --> https://svn.open-mpi.org/trac/ompi/ticket/387
2006-10-16 13:42:40 +00:00
Jeff Squyres
18e34484fa Refs trac:387
* Document --disable-mpi-cxx-seek
 * Document that you need to include "mpi.h" after system-level
   headers that create the SEEK_* constants
 * Make the C++ examples follow this behavior (include "mpi.h" after
   <iostream>)

This commit was SVN r12123.

The following Trac tickets were found above:
  Ticket 387 --> https://svn.open-mpi.org/trac/ompi/ticket/387
2006-10-16 13:22:22 +00:00
Jeff Squyres
407b3cb788 Fix some problems with errhandlers:
* Fix MPI-2 page number in comments for a specific reference in the
   spec
 * Allow getting/setting the errhandler on MPI_FILE_NULL
 * Allow freeing of intrinsic errhandlers, per MPI-2 errata (if you GET
   an errhandler on a communicator, you must be able to FREE it, even
   if it's an intrinsic).

Thanks to Lisandro Dalcin for reporting these problems.

This commit was SVN r12122.
2006-10-16 12:58:40 +00:00
Brian Barrett
e7a7a64e4c Implement MPI::SEEK_{SET, END, CUR} for the C++ bindings, working around
some issues with the C #defines SEEK_{SET, END, CUR}.  The workaround
involves some hackery that should work in almost every common use case
for the C stdio constants (and all the legal issues of the MPI constants).
The one issue is that the C stdio constants are now const ints instead
of #defines, which means that #ifdef checks will fail for the constants.

Behavior can be disabled at either configure time or build time.

Refs trac:387

This commit was SVN r12121.

The following Trac tickets were found above:
  Ticket 387 --> https://svn.open-mpi.org/trac/ompi/ticket/387
2006-10-15 23:50:24 +00:00
Brian Barrett
9adde4f7b8 Allow multilib capability based on compiler flags. See:
https://svn.open-mpi.org/trac/ompi/wiki/compilerwrapper3264
for more information.

Refs trac:374

This commit was SVN r12120.

The following Trac tickets were found above:
  Ticket 374 --> https://svn.open-mpi.org/trac/ompi/ticket/374
2006-10-15 21:21:08 +00:00
Brian Barrett
14f338b7df Fix for lock/unlock epoch issues. Previously, we did not handle the case
where a window was in both the passive and active side of a lock sequence.

Refs trac:488

This commit was SVN r12112.

The following Trac tickets were found above:
  Ticket 488 --> https://svn.open-mpi.org/trac/ompi/ticket/488
2006-10-12 22:52:13 +00:00
Ralph Castain
3f55d6897a Remove the memory debugging options. Fix what appears to be a typo in a help file.
This commit was SVN r12107.
2006-10-12 00:44:48 +00:00
Brian Barrett
fce5130333 Delay opening the listen socket until module init, so that we can have the
seed value have something set to true.  Allow selection of the listen
type to thread if (and only if) the process is the HNP...

This commit was SVN r12105.
2006-10-11 21:29:29 +00:00
Brian Barrett
29c91cf2f3 * Fix issue in odls_bproc where we were using vpid instead of the number of
processes launched locally for the stdio file names.  This was causing
    the expected files to not exist and bproc_vexecmove_io to fail.
  * Clean up a bunch of debugging output in the bproc pls

This commit was SVN r12102.
2006-10-11 20:34:12 +00:00
Ralph Castain
f91a95b3fe Fix the bug that caused mpirun to hang when a remote executable wasn't found using the rsh launcher. Will now test on a remote node.
This commit was SVN r12095.
2006-10-11 18:43:13 +00:00
Brian Barrett
7dc9995955 Use write() instead of fprintf() for output to stdout / stderr. Fixes an issue
I was running into where if a string in the argument list contains a printf
escape sequence, we would segfault.  In particular, I was using opal_output
to print the environment and had something like:

  LESSOPEN=|/usr/bin/lesspipe.sh %s

in my environment.  So I called opal_output(0, "%s", environ[i]) and
got a segfault because the fprintf tried to expand the %s in the
environment variable.

This commit was SVN r12094.
2006-10-11 18:40:21 +00:00
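The hazard described in commit 7dc9995955 above is a second format pass over text that is already final. The sketch below reproduces the idea in Python as an analogy only: `emit_formatted` and `emit_raw` are hypothetical names, and Python raises a `TypeError` where the C fprintf call in the report segfaulted.

```python
import sys

# The environment entry from the commit message, already fully expanded.
env_entry = "LESSOPEN=|/usr/bin/lesspipe.sh %s"


def emit_formatted(text, args=()):
    """Unsafe: treats finished text as a format string a second time."""
    return text % args


def emit_raw(text, stream=sys.stdout):
    """Safe: writes the text as-is, with no format expansion."""
    stream.write(text + "\n")
    return text


# The stray %s has no matching argument, so the second pass fails.
try:
    emit_formatted(env_entry)
except TypeError as exc:
    print("format expansion failed:", exc)

# Writing the text directly leaves the %s untouched.
emit_raw(env_entry)
```

Writing the finished string directly, as the commit does with write(), avoids interpreting user-controlled text as a format at all.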
Ralph Castain
2da8245be0 Correctly propagate no-daemonize
This commit was SVN r12093.
2006-10-11 17:53:17 +00:00
Ralph Castain
e5877cc459 Add the proper valgrind params
This commit was SVN r12092.
2006-10-11 17:48:41 +00:00
George Bosilca
c75cfd3efc Remove all warnings from the TotalView interface.
This commit was SVN r12091.
2006-10-11 17:29:29 +00:00