1
1

934 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
890e3c7981 Reset the trunk so that the odls now sets the paffinity and sched_yield params again. The sched_yield is still overridden by any user-specified setting.
This change utilizes the new num_processors function. I also left the mods made to ompi_mpi_init and the bug fix for the default value of mpi_yield_when_idle. Note that the mods to mpi_init will not really take effect as the mca param will now *always* be set (either by user or odls). We will need those mods later, so no point in removing them now.

This commit was SVN r13519.
2007-02-06 19:51:05 +00:00
Jeff Squyres
c91fcd7fbd Fix a bunch of minor typos submitted by Bernhard Fischer.
This commit was SVN r13505.
2007-02-06 12:00:30 +00:00
Ralph Castain
a8202742ba Fix a missing function pointer - reference ticket #854
This commit was SVN r13476.
2007-02-02 23:10:14 +00:00
Ralph Castain
3daf8b341b Fix the sched_yield problem for generic environments. We now determine and set sched_yield during mpi_init based on the following logical sequence:
1. if the user has specified sched_yield, we simply do what we are told

2. if they didn't specify anything, try to get the number of processors on this node. Note that we already now get the number of local procs in our job that are sharing this node - that now comes in through the proc callback and is stored in the ompi_proc_t structures.

3. if we can get the number of processors, compare that to the number of local procs from my job that are sharing my node. If the number of local procs exceeds the number of processors, then set sched_yield to true. If not, then be a hog and set sched_yield to false

4. if we can't get the number of processors, default to conservative behavior and set sched_yield to true.

Note that I have not yet dealt with the need to dynamically adjust this setting as more processes are added via comm_spawn. So far, we are *only* looking within our own job. Given that we have now moved this logic to mpi_init (and away from the orteds), it isn't yet clear to me how a process will be informed about the number of procs in *other* jobs that are also sharing this node.

Something to continue to ponder.

This commit was SVN r13430.
2007-02-01 19:31:44 +00:00
Ralph Castain
c754523a14 Add cancel_operations to the pls module definition for tm
This commit was SVN r13416.
2007-02-01 16:52:28 +00:00
Ralph Castain
51fb746da3 Stop overriding the yield_when_idle mca param if the user has set it
This commit was SVN r13414.
2007-02-01 15:01:12 +00:00
George Bosilca
9f73335bdb Silence the compiler.
This commit was SVN r13381.
2007-01-31 04:24:56 +00:00
Jeff Squyres
8d872b195a Refs trac:726
Tested this functionality quite a bit more and made some fixes:

 * Print far fewer help messages
 * Fix one additional deadlock upon error
 * Change some ORTE_LOG messages to silent (because they're not
   errors)
 * Some code got re-indented, sorry...

Discussed and reviewed with Ralph.

This commit was SVN r13375.

The following Trac tickets were found above:
  Ticket 726 --> https://svn.open-mpi.org/trac/ompi/ticket/726
2007-01-30 23:03:13 +00:00
Rainer Keller
061ba05439 - Fixes uncovered with the format attribute to
opal_output and opal_output_verbose

This commit was SVN r13371.
2007-01-30 20:56:31 +00:00
Rainer Keller
3669e8921e - Fix further compiler warnings regarding initialization
and shadowing variables.

This commit was SVN r13358.
2007-01-30 06:34:38 +00:00
George Bosilca
dea69e3c7c Remove one of the %s.
This commit was SVN r13357.
2007-01-30 03:56:48 +00:00
Rich Graham
f6c99d0207 set orte_odls_base.components_available to false if no odls components are
available.  Startup now works if no odls components are availble.

This commit was SVN r13339.
2007-01-27 15:37:13 +00:00
Jeff Squyres
3c5c8c3c4c Refinement of Rainer's r13227 and r13228 (worked with Rainer, Ralph,
and George on these refinements):

 * Rename the static OBJ initializer macro to be
   OPAL_OBJ_STATIC_INIT(class)
 * Ensure that all static OBJ initializations get a refcount of 1
   (doesn't ''really'' matter, since they're static, it should never
   get to the point where the OBJ is DESTRUCTed, but more correct
   nonetheless)
 * Add a "magic number" to the OBJ when compiling with debug support.
   The magic number does some rudimentary support to ensure that
   you're operating on a valid OBJ (and fails an assertion if you're
   not).  Check to ensure that the memory contains the magic number
   when performing actions of OBJ's.  Also remove the magic number
   when DESTRUCTing OBJs, so that if, for example, an OBJ is
   DESTRUCTed more than once, we'll fail the magic number assert.

This commit was SVN r13338.

The following SVN revision numbers were found above:
  r13227 --> open-mpi/ompi@96030de97b
  r13228 --> open-mpi/ompi@c2e9075d29
2007-01-27 13:44:03 +00:00
George Bosilca
668a2bd7ac Remove some debug output.
This commit was SVN r13323.
2007-01-26 08:09:22 +00:00
George Bosilca
bd7eebda83 Deal with the argv problem from r13321 for the Windows PLS.
This commit was SVN r13322.

The following SVN revision numbers were found above:
  r13321 --> open-mpi/ompi@b439e87f96
2007-01-26 07:21:07 +00:00
George Bosilca
b439e87f96 We have this one starting from r12059. We save a pointer to the argv[*] and
then we modify the argv, forcing the reallocation of the array. With luck
the saved pointer still have a meaning ... without execve return with error
14 (EFAULT).

This commit was SVN r13321.

The following SVN revision numbers were found above:
  r12059 --> open-mpi/ompi@ae79894bad
2007-01-26 07:06:52 +00:00
Rich Graham
3488b394be fix typo in name of the cancel operation.
This commit was SVN r13312.
2007-01-25 19:07:27 +00:00
Jeff Squyres
580a7a108c Fix a compiler warning.
This commit was SVN r13310.
2007-01-25 17:22:01 +00:00
Tim Prins
e199bf9b64 Refs trac:801
Fix compiler warning

This commit was SVN r13308.

The following Trac tickets were found above:
  Ticket 801 --> https://svn.open-mpi.org/trac/ompi/ticket/801
2007-01-25 16:12:05 +00:00
Ralph Castain
ab5ea61100 Bring over the rest of the ctrl-c fixes. This commit includes:
1. add a "cancel_operation" API to the pls components that allows orterun to demand that an orted operation (e.g., terminate_job) be immediately cancelled and abandoned.

2. changes the pls orted commands from blocking to non-blocking. This allows us to interrupt those operations should an orted be non-responsive. The change also adds an orte_abort_timeout that limits how long orterun will automatically wait for the orteds to respond - if the terminate command, for example, doesn't see orted response within that time, then we printout an appropriate error message and just give up.

3. modifies orterun to allow multiple ctrl-c's to simply abort the program even if the orteds have not responded

4. does some cleanup on the orte-level mca params so that their implementation looks a lot more like that of ompi - makes it easier to maintain. This change also includes the definition of an orte_abort_timeout struct and associated MCA param (can't have too many!) so you can set the time after which orterun gives up on waiting for orteds to respond

This needs more testing before migrating to 1.2.

This commit was SVN r13304.
2007-01-25 14:17:44 +00:00
Ralph Castain
53967bd698 Fix a memory corruption problem deep inside the registry when subscriptions/triggers are processed. The create_value function will malloc space for the pointers to keyval objects, but doesn't actually allocate space for the objects themselves. When constructing the gpr_notify_data object, we forgot to OBJ_NEW the keyval objects. Since the create_value function didn't explicitly NULL those memory locations, it just so happened that there was a non-NULL address in them....which we dutifully dumped a keyval into.
This fix includes two parts: (a) we now initialize the keyval pointer locations to NULL after the malloc, and (b) we now OBJ_NEW the keyvals prior to storing info in them.

BTW, in case anyone reads this and wonders why we don't just OBJ_NEW the keyvals in create_value, the reason is simply that some places in the code use static keyvals and simply assign those addresses into the value object's array. So not everyone wants to OBJ_NEW keyvals - by not forcing it here in create_value, we give the user the flexibility to do whatever they want.

This commit was SVN r13300.
2007-01-25 12:54:02 +00:00
George Bosilca
dcce444ed4 When the user give a prefix that really means something. I expect to
start looking for the daemons using the prefix.

This commit was SVN r13297.
2007-01-25 07:35:25 +00:00
George Bosilca
3b988fcdfd Small update the the process PLS.
This commit was SVN r13293.
2007-01-25 00:17:54 +00:00
Tim Prins
4fd81b3407 Fixes trac:801
- Make it so the SLURM ras can handle different nodelist configurations
- Some code cleanup and better/more informative error messages and error handling

This commit was SVN r13271.

The following Trac tickets were found above:
  Ticket 801 --> https://svn.open-mpi.org/trac/ompi/ticket/801
2007-01-24 14:45:42 +00:00
George Bosilca
9b16827049 Add ORTE_DECLSPEC, and few conversions.
This commit was SVN r13268.
2007-01-24 00:52:08 +00:00
George Bosilca
1e38810c2d Correctly close the sockets on a generic way.
This commit was SVN r13254.
2007-01-23 03:17:23 +00:00
Ralph Castain
46b7df5683 Fix bproc nodename to correctly assign process-to-nodename mapping
This commit was SVN r13244.
2007-01-22 19:39:00 +00:00
Jeff Squyres
6584df9262 For --prefix-like behavior, we used to modifiy environ directly and
then exec the "srun..." from there.  But somewhere along the line, we
switched to having a copy of environ and modifying that.  It looks
like we forgot to update the stuff for --prefix behavior.  So this
commit fixes the setenv's for PATH and LD_LIBRARY_PATH to modify the
environ copy (not environ itself) so that the values properly get
passed down to the srun environment via execve().

This restores --prefix behavior in the SLURM pls.

This commit was SVN r13239.
2007-01-22 15:50:35 +00:00
George Bosilca
1b92589179 Update the PLS process.
This commit was SVN r13236.
2007-01-22 05:48:25 +00:00
Rainer Keller
c2e9075d29 - Define a OPAL_CLASS_EMPTY to be used for initialization.
Similar within the dt_module for the predefined datatypes.

This commit was SVN r13228.
2007-01-21 15:52:06 +00:00
Ralph Castain
b63d4ddfbf Clean up a compiler warning under bproc
This commit was SVN r13198.
2007-01-18 20:07:06 +00:00
Ralph Castain
1487e22ec8 Store the mapping mode so that it can be recovered later
This commit was SVN r13197.
2007-01-18 20:00:15 +00:00
Ralph Castain
455e4ada9a Bring the modified/updated pernode and npernode behaviors over from the openrte repository. This change enables npernode to pay attention to the total #procs to be launched, and cleans up the bynode vs. byslot mapping directives when in pernode and npernode modes.
This commit was SVN r13191.
2007-01-18 17:15:19 +00:00
Brian Barrett
ffe35ef6b8 Update the Cray XT3 run-time support files to compile with latest RTE changes
This commit was SVN r13172.
2007-01-17 22:47:27 +00:00
Ralph Castain
e093b5a256 Check for NULL before release, just to be safe
This commit was SVN r13162.
2007-01-17 21:29:34 +00:00
Jeff Squyres
6f7adfe231 Fix for the oob base open and close functions being invoked twice by
ompi_info -- once directly and once via the rml oob component.

This commit was SVN r13152.
2007-01-17 15:18:13 +00:00
Jeff Squyres
3983141342 Remove this extra #if -- it wasn't necessary and was causing compiler warnings.
This commit was SVN r13146.
2007-01-17 13:53:02 +00:00
Ralph Castain
f08210b3e1 Fix a double-free error
This commit was SVN r13126.
2007-01-16 15:49:16 +00:00
George Bosilca
0f68bb0bc0 Add a debug flag for the ODLS process.
This commit was SVN r13075.
2007-01-11 00:17:34 +00:00
George Bosilca
c8222b57eb The Windows PLS now is able to spawn process locally.
This commit was SVN r13074.
2007-01-11 00:16:58 +00:00
Rolf vandeVaart
9fd5e55b50 Need to include strings.h because that is where the rindex()
function prototype lives.  Without this, we get compile 
warnings.  In addition, for 64-bit Solaris, we get a 
segmentation fault from orterun without this include.

This commit was SVN r13065.
2007-01-10 18:44:08 +00:00
Brian Barrett
03112254e7 Increase connection timeout to 600 seconds, which should always be higher than
the connect() timeout, so that we'll use that rather than our own timeout by
defualt.  There timeout was set low for Big Red, but causes problems for very
large clusters, as there's no way to wire them up in 10 seconds most of the
time.

This commit was SVN r13062.
2007-01-10 04:53:21 +00:00
George Bosilca
c7da2b0a9a Add the process PLS. It's only intended for Windows users.
This commit was SVN r13051.
2007-01-09 00:19:52 +00:00
George Bosilca
77452ea8ea Add a missing include and update the definition of orte rds proxy component.
This commit was SVN r13042.
2007-01-08 22:00:01 +00:00
Brian Barrett
e130f18cc2 Fix some compiler warnings that have slipped in lately...
This commit was SVN r13037.
2007-01-08 17:20:09 +00:00
Brian Barrett
a34e67d743 Remove unneeded PARAM_INIT_FILE variable in configure.params files used by
components that use configure.m4 for configuration or are always built. 
The macro has not been needed since moving to configure types other than
configure.stub

Fixes trac:590

This commit was SVN r13031.

The following Trac tickets were found above:
  Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590
2007-01-08 03:44:22 +00:00
Jeff Squyres
a91c017f81 The constant name changed from ORTE_RML_NAME_ANY to ORTE_NAME_WILDCARD
-- upcate the comments/documentation to match.

This commit was SVN r13001.
2007-01-05 13:38:22 +00:00
Ralph Castain
3ce9b2f6cc Remove some debugging output that mistakenly was left behind.
This commit was SVN r12984.
2007-01-04 17:24:11 +00:00
Ralph Castain
6101050ea6 Remove an abstraction barrier I thought was gone long-ago. The OOB subscription really shouldn't be defined as an OMPI subscription.
I know it's just a technicality, but it is time to address such things rather than just letting them continue to propagate. :-)

This commit was SVN r12954.
2007-01-02 16:16:50 +00:00
Ralph Castain
90f5e3fad8 Fix a buglet in the singleton startup procedure. For purposes of minimizing the xcast message, we "strip" the descriptive info on all subscription messages. This means, though, that we have to store the process name and other info so it can be retrieved in the body of the subscription data (as opposed to in the description). This wasn't being done for singletons because they don't call the RMAPS to "map" themselves.
This has now been corrected. The singleton startup will dutifully call the mapper framework so that the proper data storage locations get initialized. Unfortunately, we then had to instruct the RMAPS not to allocate a vpid range for this job - otherwise, it would make a mistake and think there were two processes in it. Hence, a change was required to RMAPS to tell it "map this job, but don't allocate a vpid range for it".

This change will need to migrate across to 1.2 after it "soaks" the appropriate time.

This commit was SVN r12952.
2007-01-02 16:14:44 +00:00