1
1

989 Коммитов

Автор SHA1 Сообщение Дата
Rainer Keller
125ba1acfa - Reduce the amount of warnings with -Wshadow -- mainly due to
usage of index and abs in inline-fcts in header files.

This commit was SVN r13217.
2007-01-19 19:48:06 +00:00
Ralph Castain
b63d4ddfbf Clean up a compiler warning under bproc
This commit was SVN r13198.
2007-01-18 20:07:06 +00:00
Ralph Castain
1487e22ec8 Store the mapping mode so that it can be recovered later
This commit was SVN r13197.
2007-01-18 20:00:15 +00:00
Ralph Castain
2c46e10692 Convert this test back to the old form of xcast API
This commit was SVN r13194.
2007-01-18 19:32:43 +00:00
Ralph Castain
455e4ada9a Bring the modified/updated pernode and npernode behaviors over from the openrte repository. This change enables npernode to pay attention to the total #procs to be launched, and cleans up the bynode vs. byslot mapping directives when in pernode and npernode modes.
This commit was SVN r13191.
2007-01-18 17:15:19 +00:00
Brian Barrett
2755d5ccef Only do Windows things if we're on Windows. Need another case for when we
don't have windows and we don't have waitpid() (ie, the Cray)

This commit was SVN r13173.
2007-01-17 23:16:52 +00:00
Brian Barrett
ffe35ef6b8 Update the Cray XT3 run-time support files to compile with latest RTE changes
This commit was SVN r13172.
2007-01-17 22:47:27 +00:00
Ralph Castain
e093b5a256 Check for NULL before release, just to be safe
This commit was SVN r13162.
2007-01-17 21:29:34 +00:00
Ralph Castain
da82359446 Update some of the orte tests to sync with openrte repository
This commit was SVN r13155.
2007-01-17 16:15:37 +00:00
Jeff Squyres
6f7adfe231 Fix for the oob base open and close functions being invoked twice by
ompi_info -- once directly and once via the rml oob component.

This commit was SVN r13152.
2007-01-17 15:18:13 +00:00
Ralph Castain
cc905290e4 Fix the pernode and npernode options - the mca parameters weren't being set to correspond to the command line options
This commit was SVN r13151.
2007-01-17 14:56:22 +00:00
Jeff Squyres
3983141342 Remove this extra #if -- it wasn't necessary and was causing compiler warnings.
This commit was SVN r13146.
2007-01-17 13:53:02 +00:00
Ralph Castain
5d698dc55b Turn "off" an unimplemented command line option - we do not currently support execution without mpirun waiting for job completion.
This commit was SVN r13127.
2007-01-16 16:10:31 +00:00
Ralph Castain
f08210b3e1 Fix a double-free error
This commit was SVN r13126.
2007-01-16 15:49:16 +00:00
Jeff Squyres
e5205657cf A much better fix for #739. No configure test -- just do a simple
memcpy() instead of assigning the struct's by value.

Fixes trac:739.

This commit was SVN r13081.

The following Trac tickets were found above:
  Ticket 739 --> https://svn.open-mpi.org/trac/ompi/ticket/739
2007-01-11 14:30:32 +00:00
Jeff Squyres
add3909096 Back out 13076 and 13077 in favor of a much simpler approach.
Sorry for the configure change -- hopefully it's early enough in the
morning that it won't affect people... (new approach won't have a
configure change).

Refs trac:739.

This commit was SVN r13080.

The following Trac tickets were found above:
  Ticket 739 --> https://svn.open-mpi.org/trac/ompi/ticket/739
2007-01-11 14:07:15 +00:00
George Bosilca
24a91fad1d OPAL_BOOL_STRUCT_COPY or OMPI_BOOL_STRUCT_COPY that's the question!
Let's minimize the disturbances and say that the configure system is right.
From now on it's OPAL_BOOL_STRUCT_COPY. This one is related to r13076 and
has to follow when r13076 goes in the 1.2.

This commit was SVN r13077.

The following SVN revision numbers were found above:
  r13076 --> open-mpi/ompi@f0932a0701
2007-01-11 05:44:48 +00:00
Jeff Squyres
f0932a0701 A workaround for a bug in the PGI 6.2 compiler series. This bug has
been fixed in the 7.0 PGI series, but is unlikely to be fixed in the
6.2 series:

 * Add a configure test looking for the bad behavior (the PGI compiler
   chokes on C code where structs containing bool's are copied by
   value)
 * Set OMPI_BOOL_STRUCT_COPY to 1 if it's ok, 0 if it's not (i.e., PGI
   6.2 series will have this value set to 0)
 * In two places in the code base -- orte-clean and btl_openib_ini.h,
   we have a struct that contains a bool that is copied by value.  In
   these two places, check OMPI_BOOL_STRUCT_COPY and if it's 1, use
   the "int" type instead of "bool".

Fixes trac:739

This commit was SVN r13076.

The following Trac tickets were found above:
  Ticket 739 --> https://svn.open-mpi.org/trac/ompi/ticket/739
2007-01-11 02:21:26 +00:00
George Bosilca
0f68bb0bc0 Add a debug flag for the ODLS process.
This commit was SVN r13075.
2007-01-11 00:17:34 +00:00
George Bosilca
c8222b57eb The Windows PLS now is able to spawn process locally.
This commit was SVN r13074.
2007-01-11 00:16:58 +00:00
Rolf vandeVaart
9fd5e55b50 Need to include strings.h because that is where the rindex()
function prototype lives.  Without this, we get compile 
warnings.  In addition, for 64-bit Solaris, we get a 
segmentation fault from orterun without this include.

This commit was SVN r13065.
2007-01-10 18:44:08 +00:00
Brian Barrett
03112254e7 Increase connection timeout to 600 seconds, which should always be higher than
the connect() timeout, so that we'll use that rather than our own timeout by
defualt.  There timeout was set low for Big Red, but causes problems for very
large clusters, as there's no way to wire them up in 10 seconds most of the
time.

This commit was SVN r13062.
2007-01-10 04:53:21 +00:00
George Bosilca
c7da2b0a9a Add the process PLS. It's only intended for Windows users.
This commit was SVN r13051.
2007-01-09 00:19:52 +00:00
Ralph Castain
950149ec50 Add another test/example program that uses the OpenRTE to dynamically spawn an application
This commit was SVN r13050.
2007-01-09 00:13:57 +00:00
George Bosilca
77452ea8ea Add a missing include and update the definition of orte rds proxy component.
This commit was SVN r13042.
2007-01-08 22:00:01 +00:00
George Bosilca
409d1b8a8d Make the universe creation functions Windows friendly again.
This commit was SVN r13041.
2007-01-08 21:58:57 +00:00
Jeff Squyres
8a289cf1cb Part 1 of the fix for ticket #726. This commit adds logic to orteun
to effect the following:

 * The first time the user hits ctrl-c, we go into the process of
   killing the ORTE job (this is not new).
 * While waiting for the job to actually terminate, if the user hits
   ctrl-c a second time, we print a warning saying "Hey, I'm still
   trying to kill the job.  If you *really* want me to die
   immediately, hit ctrl-c again within 1 second."
 * If the user hits ctrl-c a within 1 second, orterun quits with a
   warning about how the job may not have actually been killed.

Note that none of this logic won't really work until the second part
of the fix for #726 is also committed (i.e., make pls.terminate_job()
non-blocking).  So I'm now throwing the ticket over to Ralph for the
second part of the fix...

Refs trac:726

This commit was SVN r13040.

The following Trac tickets were found above:
  Ticket 726 --> https://svn.open-mpi.org/trac/ompi/ticket/726
2007-01-08 20:25:26 +00:00
Brian Barrett
e130f18cc2 Fix some compiler warnings that have slipped in lately...
This commit was SVN r13037.
2007-01-08 17:20:09 +00:00
Brian Barrett
a34e67d743 Remove unneeded PARAM_INIT_FILE variable in configure.params files used by
components that use configure.m4 for configuration or are always built. 
The macro has not been needed since moving to configure types other than
configure.stub

Fixes trac:590

This commit was SVN r13031.

The following Trac tickets were found above:
  Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590
2007-01-08 03:44:22 +00:00
Jeff Squyres
a91c017f81 The constant name changed from ORTE_RML_NAME_ANY to ORTE_NAME_WILDCARD
-- upcate the comments/documentation to match.

This commit was SVN r13001.
2007-01-05 13:38:22 +00:00
Rolf vandeVaart
fdf44cc4ab Add the ability to not only report broken files and directories,
but remove them also.  This current set of changes will affect
nothing as no one is making use of this ability.  However, orte-clean
will be changed soon to utilize this new feature.

This commit was SVN r12996.
2007-01-04 21:48:34 +00:00
Ralph Castain
3ce9b2f6cc Remove some debugging output that mistakenly was left behind.
This commit was SVN r12984.
2007-01-04 17:24:11 +00:00
Ralph Castain
6101050ea6 Remove an abstraction barrier I thought was gone long-ago. The OOB subscription really shouldn't be defined as an OMPI subscription.
I know it's just a technicality, but it is time to address such things rather than just letting them continue to propagate. :-)

This commit was SVN r12954.
2007-01-02 16:16:50 +00:00
Ralph Castain
90f5e3fad8 Fix a buglet in the singleton startup procedure. For purposes of minimizing the xcast message, we "strip" the descriptive info on all subscription messages. This means, though, that we have to store the process name and other info so it can be retrieved in the body of the subscription data (as opposed to in the description). This wasn't being done for singletons because they don't call the RMAPS to "map" themselves.
This has now been corrected. The singleton startup will dutifully call the mapper framework so that the proper data storage locations get initialized. Unfortunately, we then had to instruct the RMAPS not to allocate a vpid range for this job - otherwise, it would make a mistake and think there were two processes in it. Hence, a change was required to RMAPS to tell it "map this job, but don't allocate a vpid range for it".

This change will need to migrate across to 1.2 after it "soaks" the appropriate time.

This commit was SVN r12952.
2007-01-02 16:14:44 +00:00
Rich Graham
6cb2377015 Change the allocation of the shared memory backing file. The file
is allocated on a per comm_world instance, with the lowest rank
in comm_world on the given host creating and initializing the file,
and then notifying the remaining files via the OOB.

Reviewed: Ralph Castain, Brian Barrett
Addressing ticket #674.

This commit was SVN r12949.
2007-01-01 02:39:02 +00:00
Brian Barrett
b5057d923e fix compile error that crept in with orte changes. This just makes it
compile -- I can't check for correctness on this platform.  Someone
probably wants to do that.

This commit was SVN r12948.
2006-12-31 20:16:20 +00:00
Li-Ta Lo
6df4e80727 new XCPU PLS and SDS to work with libxcpu
This commit was SVN r12905.
2006-12-21 00:05:36 +00:00
Brian Barrett
414a87b14c On OS X, the lowest 8 bits of the exit status are for signals, so pushing
rc (which is -1 or 4 if we hit this case) resulted in an odd error that a
signal killed the proc (instead of a startup error, as is reality).
Instead, use the W_EXITCODE macro (if available) to build up an exit
code that has an error code for exit status, but does not make it look
like the process died from a signal

This commit was SVN r12890.
2006-12-18 02:30:05 +00:00
Brian Barrett
bc6cec346f Print out the description of the signal from mpirun when a proc was aborted
by a signal if we have strsignal()

This commit was SVN r12888.
2006-12-17 20:01:11 +00:00
Ralph Castain
a0ef517550 Fix some errors in the bproc components that prevented compiling. Thought I had already done this, but either those changes were lost when I did the merge, or my old man's memory is fading....
Whaz-at??? :-)

This commit was SVN r12874.
2006-12-15 19:40:04 +00:00
Ralph Castain
1e1d0e8a89 Set the app_num attribute into the process environment so we pick it up on the other end
This commit was SVN r12868.
2006-12-15 16:43:52 +00:00
Ralph Castain
677d1260aa cleanup nicely if we don't launch
This commit was SVN r12867.
2006-12-15 14:03:53 +00:00
Ralph Castain
cbb660504c Retain the ability to run valgrind on the bproc launcher - do not call bproc_version if "nolaunch" is specified.
This commit was SVN r12866.
2006-12-15 14:01:21 +00:00
Ralph Castain
64ec238b7b Repair support for Bproc 4 on 64-bit systems. Update the SMR framework to actually support the begin_monitoring API. Implement the get/set_node_state APIs.
This commit was SVN r12864.
2006-12-15 02:34:14 +00:00
Brian Barrett
38c2e43ac2 Print out error string rather than errno for TCP-related errors, making it easier for both the user and us to debug issues with BTL and OOB issues...
This commit was SVN r12852.
2006-12-14 18:20:43 +00:00
Ralph Castain
7b8f445e13 Modify the "--display-map-at-launch" option to just "--display-map". Now that we have a "--do-not-launch" option, the "-at-launch" part of the display-map option was confusing. "--display-map" displays the resulting process map before we launch anyway, so this is clearer.
This commit was SVN r12840.
2006-12-13 13:49:15 +00:00
Ralph Castain
82946cb220 Add a new option to orterun: "--do-not-launch" directs the system to do the allocation, map, job setup, etc., but don't actually launch the job. This lets us test all the setup portions of the code.
Also, take the first step in updating how we handle mca params in ORTE - bring it closer to how it is done in the other two layers. Much more work to be done here.

This commit was SVN r12838.
2006-12-13 04:51:38 +00:00
Ralph Castain
3b064a624e For convenience, revise the orte_job_map_t object so it includes the vpid start/range values, the number of nodes, and the number of processes on each node. These values are all used in various places in the code base - we currently re-compute them multiple times. Since these values do not change and are already being computed by the RMAPS framework, we might as well just save them for re-use.
This commit was SVN r12829.
2006-12-12 16:07:23 +00:00
Ralph Castain
28ce8e5e5e Extend the mpirun options to support "--npernode N". This option tells the system to spawn N procs/node across all nodes in the allocation. If N is greater than the number of allocated slots, then the usual oversubscription logic will apply (i.e., the system will error out if oversubscription is not allowed, otherwise it will run with the sched_yield set to non-aggressive behavior).
In "--npernode" operation, the "-np" command line parameter is ignored.

This commit was SVN r12826.
2006-12-12 00:54:05 +00:00
Ralph Castain
8314e8dbb9 Modify the pernode option so it can accept a request for the number of processes to be launched. We now check three use-cases for pernode:
1. no -np provided - put one proc/node across all allocated nodes

2. -np N provided, N > #nodes - we print a pretty error message and exit

3. -np N provided, N <= #nodes - put one proc/node across N nodes

I also added a new orte constant (ORTE_ERR_SILENT) that allows us to pass up the chain that an error was encountered, but NOT print ORTE_ERROR_LOG messages. This is intended to be used for cases where the error we encounter is NOT an orte error, but rather is one associated with incorrect user input (e.g., the preceding case 2). In such cases, there is no point in printing an ORTE_ERROR_LOG chain of messages as it isn't an orte error.

This commit was SVN r12821.
2006-12-11 18:07:07 +00:00