Josh Hursey
b73237c92a
Identify the process sending the update in the verbose message (helps debugging of process control).
This commit was SVN r22804.
2010-03-10 00:23:24 +00:00
Shiqing Fan
49502af2ba
fix the type cast.
This commit was SVN r22800.
2010-03-09 10:02:50 +00:00
Ralph Castain
4355134991
Let the vm launcher specify the mapping policy
This commit was SVN r22797.
2010-03-08 19:13:21 +00:00
Ralph Castain
bfa39d7f7e
Update the seq mapper to support lists from -host. Reorg the dash_host code to provide an ordered list as required by the seq mapper
This commit was SVN r22795.
2010-03-08 09:54:49 +00:00
Ralph Castain
9e7f621a98
Port Brad's paffinity change to the 1.4 branch over to the trunk so we don't lose it going forward.
This commit was SVN r22794.
2010-03-07 18:44:22 +00:00
Ralph Castain
2a0f7e95ee
Don't double account for the killed local proc - only adjust num_local_procs when the proc actually dies.
This commit was SVN r22787.
2010-03-05 13:53:18 +00:00
Ralph Castain
b2e24693c4
Check the return status when we forward stdin and remove the recipient when they are no longer alive
This commit was SVN r22786.
2010-03-05 13:41:28 +00:00
Ralph Castain
577eef1491
Pretty-print the recvd command for debug purposes
This commit was SVN r22785.
2010-03-05 13:38:20 +00:00
Ralph Castain
cdae19cf7b
Add a convenience macro to make a job family
This commit was SVN r22784.
2010-03-05 13:35:09 +00:00
Ralph Castain
f2c65dc70f
Ensure that the errmgr does not take action if the process was terminated by a "kill_procs" command as this can lead to circular logic.
Cleanup the kill_procs command by removing a no-longer-used param. We update the process state when the proc actually exits.
This commit was SVN r22783.
2010-03-05 13:22:12 +00:00
Ralph Castain
ef6c432e22
Fix a nasty bug where we would hang if an application trapped signals such as SIGTERM - a permissible thing to do. In such cases, we removed the process from the waitpid system and then sent it a SIGTERM. If the application trapped that and attempted to cleanly terminate, it would send us a sync message - and the daemon would then add it back to its local child list, causing both the daemon and the process to hang.
In this revision, we let the process terminate/exit however it can, and then pick it up via the usual waitpid.
This commit was SVN r22781.
2010-03-05 04:14:56 +00:00
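The behavior described for r22781 above can be illustrated with a minimal Python sketch: the parent sends SIGTERM, lets the child handle it however it likes, and learns the outcome only from waitpid. This is an illustration built on POSIX calls, not ORTE's C implementation; the function names `spawn_trapping_child` and `terminate_and_reap` and the exit code 7 are hypothetical.

```python
import os
import signal


def spawn_trapping_child():
    """Fork a child that traps SIGTERM and exits cleanly with code 7."""
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: install a SIGTERM handler, as an application is allowed to do.
        os.close(ready_r)
        signal.signal(signal.SIGTERM, lambda signum, frame: os._exit(7))
        os.write(ready_w, b"x")  # tell the parent the handler is installed
        while True:
            signal.pause()
    # Parent: wait until the child has installed its handler.
    os.close(ready_w)
    os.read(ready_r, 1)
    os.close(ready_r)
    return pid


def terminate_and_reap(pid):
    """Send SIGTERM, then pick up the exit status via the usual waitpid."""
    os.kill(pid, signal.SIGTERM)
    # The child stays tracked until it has actually exited.
    _, status = os.waitpid(pid, 0)
    return status
```

The point of the sketch is ordering: the child is not forgotten before the signal is sent, so whatever it does while shutting down cannot re-register a process the parent has already dropped.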
Shiqing Fan
db747e4390
Remove the old timing parameter and use orte_timing instead. Thanks to Rainer.
This commit was SVN r22775.
2010-03-04 15:00:03 +00:00
Ralph Castain
c88fe1ea54
Create a new mca parameter to control creation of session directories. Defaults to true so that the current behavior of always creating them is preserved. If set to false (0), then don't create session directories. Helps in those environments where session directories are a problem.
Tell the sm btl that it cannot run if no session directories were created.
This commit was SVN r22756.
2010-03-02 15:18:33 +00:00
Ralph Castain
cd1efbb41e
Try and do a better job of cleanup in abnormal termination. Ensure the daemons whack session directories prior to disabling signal traps. Ensure that the HNP and daemons all cleanup when they are doing an internal abort.
This commit was SVN r22755.
2010-03-02 14:51:23 +00:00
Ralph Castain
b692645772
Remote daemons should -always- whack any lingering session directories when exiting
This commit was SVN r22749.
2010-03-02 05:28:53 +00:00
Ralph Castain
69fe5ca69b
Correctly compute bynode mapping, even in the presence of a $#$%#@^$ rankfile
This commit was SVN r22748.
2010-03-02 05:21:42 +00:00
Ralph Castain
bef06d52bc
Silence compiler warning
This commit was SVN r22747.
2010-03-01 21:04:26 +00:00
Ralph Castain
5514d9c673
Fix the stupid rankfile mapper again, hopefully not breaking everything else to accommodate it. Looks like the round-robin mappers still work, at least...
This commit was SVN r22746.
2010-03-01 20:40:47 +00:00
Ralph Castain
96590b9fad
Filter multicast messages to avoid cross-job confusion
This commit was SVN r22729.
2010-02-28 18:22:56 +00:00
Ralph Castain
359dc5cad3
Complete the app_idx change by cleaning up warnings in mappers
This commit was SVN r22728.
2010-02-27 18:14:27 +00:00
Ralph Castain
2541aa98ab
Change the app_idx type to uint32_t to support users who use large numbers of app_contexts. Set it up as a new typedef so we can change it later without as much effort.
This commit was SVN r22727.
2010-02-27 17:37:34 +00:00
Ralph Castain
6c0d7940c7
Add a new MCA param (and corresponding mpirun cmd line option) to output the debugger proctable info after launch. The output is just the job map with the process pid included, so you get a node-by-node list of the process ranks on that node and their pids.
Works for initial launch and comm_spawn. xml and non-xml output is available
This commit was SVN r22725.
2010-02-27 08:32:25 +00:00
Shiqing Fan
4a3f42d159
Correctly initialize the CCP command line buffer.
This commit was SVN r22721.
2010-02-26 15:53:00 +00:00
Ralph Castain
c6448587fe
It is okay to not select an rmcast module
This commit was SVN r22719.
2010-02-26 02:39:04 +00:00
Ralph Castain
b89a21f0fa
Grrr....cleanup the new module
This commit was SVN r22711.
2010-02-25 06:08:04 +00:00
Ralph Castain
8954700845
No, we don't have a .windows file...
This commit was SVN r22710.
2010-02-25 02:18:54 +00:00
Ralph Castain
18c7aaff08
Update the grpcomm framework to be more thread-friendly.
Modify the orte configure options to specify --enable-multicast such that it directs components to build or not instead of littering the code base with #if's. Remove those #if's where they used to occur.
Add a new grpcomm "mcast" module to support multicast operations. Still some work required to properly perform daemon collectives for comm_spawn operations. New module only builds when --enable-multicast is provided, and when specifically selected.
This commit was SVN r22709.
2010-02-25 01:11:29 +00:00
Jeff Squyres
af6f1f4b00
Add pkg-config(1) config files to Open MPI. Additionally, fix a minor
bug: libmpi_f90 had libmpi.la in its LIBADD instead of libmpi_f77.la.
Fixes trac:2244.
This commit was SVN r22704.
The following Trac tickets were found above:
Ticket 2244 --> https://svn.open-mpi.org/trac/ompi/ticket/2244
2010-02-24 18:46:06 +00:00
Shiqing Fan
44fe33452c
Use this option only on Windows, so protect it with #ifdef __WINDOWS__.
This commit was SVN r22701.
2010-02-24 08:50:03 +00:00
Jeff Squyres
d9b6b5af0c
This commit converts us to the "one big libmpi" scheme that has been
discussed extensively. See
https://svn.open-mpi.org/trac/ompi/ticket/2092 and the RFC thread
http://www.open-mpi.org/community/lists/devel/2010/02/7447.php .
Specifically:
 * Create LT convenience libraries for OPAL and ORTE if the layer
   above them is being created (use the already-defined
   AM_CONDITIONALs to know if the project above us is being built).
 * ORTE slurps in the LT convenience library for OPAL; OMPI slurps in
   the LT convenience library for ORTE.
 * Wrapper compilers now only -l one library (e.g., ortecc only does
   -lopen-rte, and mpicc only does -lmpi).
This commit was SVN r22691.
2010-02-23 22:20:01 +00:00
Shiqing Fan
7a5a5ce024
Add a global option for specifying the head node name for Windows CCP through the command line.
This commit was SVN r22688.
2010-02-23 19:42:51 +00:00
Jeff Squyres
f65eebf53d
More changes for NetBSD. Thanks to Aleksej Saushev for this patch.
This commit was SVN r22680.
2010-02-22 15:05:09 +00:00
Ralph Castain
65a8ab4267
Cleanup the kill_procs command. Send a SIGTERM initially to allow C/R operations, and to be polite. Correctly update proc state if there is a problem so we don't hang.
The change to just using SIGKILL was originally done due to problems whereby waitpid thought a proc had died, but it hadn't. We'll continue debugging that problem separately, but SIGTERM is required for C/R to work properly.
This commit was SVN r22674.
2010-02-21 19:35:32 +00:00
Ralph Castain
2be03b4fb6
Cleanup a few bugs in the rmcast subsystem
This commit was SVN r22650.
2010-02-18 01:54:45 +00:00
Shiqing Fan
08ffdbe987
Changes for portable platform headers. Commit it on behalf of Ralph.
This commit was SVN r22619.
2010-02-15 22:14:59 +00:00
Ralph Castain
9a5fdbb622
Continue development of reliable multicast
This commit was SVN r22616.
2010-02-14 19:20:56 +00:00
Ralph Castain
58a1151566
Ensure the man page gets into the tarball
This commit was SVN r22613.
2010-02-13 02:39:10 +00:00
Ralph Castain
dc6de3e9b5
Add an "orte-info" tool to report ORTE and OPAL configuration and parameter info ala "ompi_info". Temporary solution until refactoring of the "info" system can be undertaken.
This commit was SVN r22608.
2010-02-11 23:02:14 +00:00
Josh Hursey
ec4498c258
update copyright
This commit was SVN r22605.
2010-02-11 14:03:36 +00:00
Josh Hursey
4513440df9
The ordering of the appfile matters. The ranks must be in increasing rank
order, so that they get assigned the same ORTE name on restart.
cmr:v1.4.2
cmr:v1.5
This commit was SVN r22586.
2010-02-09 00:31:24 +00:00
Eugene Loh
924bc10ceb
Add description of the sequential mapper (-mca rmaps seq) to the
mpirun man page.
This commit was SVN r22567.
2010-02-06 06:22:29 +00:00
Josh Hursey
a3583b8f57
Fix --bynode option to remember for subsequent jobs where it left off last time.
Add a ''map_bynode'' info key to determine if the job to be started by comm_spawn* should be mapped by node or by slot. Default is to map according to the default policy set when the parent job was started.
cmr:v1.5.1
This commit was SVN r22564.
2010-02-05 15:37:49 +00:00
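For readers unfamiliar with the two policies that the ''map_bynode'' key in r22564 chooses between, here is a small Python sketch. It assumes the usual meaning of the terms: by-slot fills every slot on a node before moving to the next node, while by-node deals ranks round-robin across nodes. The function `map_ranks` and its arguments are hypothetical, and oversubscription is not modeled.

```python
def map_ranks(slots_per_node, nprocs, bynode):
    """Return a list whose i-th entry is the node index assigned to rank i."""
    if nprocs > sum(slots_per_node):
        raise ValueError("not enough slots (oversubscription is not modeled)")

    placement = []
    if not bynode:
        # By slot: use up all slots on a node before moving on.
        for node, slots in enumerate(slots_per_node):
            for _ in range(slots):
                if len(placement) == nprocs:
                    return placement
                placement.append(node)
        return placement

    # By node: one rank per node per pass, skipping nodes that are full.
    used = [0] * len(slots_per_node)
    node = 0
    while len(placement) < nprocs:
        if used[node] < slots_per_node[node]:
            used[node] += 1
            placement.append(node)
        node = (node + 1) % len(slots_per_node)
    return placement
```

With two nodes of two slots each and four ranks, by-slot places ranks 0 and 1 on the first node, while by-node alternates between the two nodes.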
Iain Bason
28f03a2d86
Suspend/resume enhancements:
Have orte call setpgrp after forking (but before exec) when
orte_forward_job_control is set. Then have it send signals to the
child's process group. This allows suspending jobs that fork.
If a SIGTSTP arrives before the processes have been launched, then
record it and suspend them right after launching.
This commit was SVN r22557.
2010-02-04 15:47:20 +00:00
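The process-group approach in r22557 above can be sketched in Python with the equivalent POSIX calls. This is only an illustration of the technique (new process group after fork and before exec, then signals sent to the group), not the ORTE code; `launch_in_own_group` and `signal_job` are hypothetical names, and the recording of an early SIGTSTP is not shown.

```python
import os
import signal


def launch_in_own_group(argv):
    """Fork, put the child in its own process group, then exec argv."""
    pid = os.fork()
    if pid == 0:
        try:
            os.setpgid(0, 0)           # same effect as setpgrp() in the child
            os.execvp(argv[0], argv)   # does not return on success
        finally:
            os._exit(127)              # reached only if setpgid or exec failed
    # The parent sets the group too, so it is safe to signal the group
    # even if the child has not run yet.
    try:
        os.setpgid(pid, pid)
    except OSError:
        pass  # the child already did it and has exec'd, or has exited
    return pid


def signal_job(pgid, sig):
    """Deliver sig to every process in the group, including forked helpers."""
    os.killpg(pgid, sig)
```

Because the signal goes to the group rather than to a single pid, any processes the application forks are suspended and resumed along with it, provided they stay in that group.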
Shiqing Fan
bbcf1f71c4
Remove an incorrect callback that was based on the old source base without WMI. It does no harm on 32-bit Windows, but it sometimes seems to cause exceptions on 64-bit Windows. The callback just waits on the given pid, which is actually a remote pid, so it cannot work as expected.
cmr:v1.4.2
cmr:v1.5
This commit was SVN r22549.
2010-02-04 10:47:28 +00:00
Ralph Castain
c2aba2a6d7
Pack the resource key correctly
This commit was SVN r22541.
2010-02-03 18:21:27 +00:00
Ralph Castain
16b7bc7a82
Sigh...get the order right to match unpack
This commit was SVN r22539.
2010-02-03 15:50:43 +00:00
Ralph Castain
e88627a7ca
Ensure we don't go through rml open/select more than once.
Open the rml to get the uri when bootstrapping daemons
This commit was SVN r22538.
2010-02-03 15:38:32 +00:00
Ralph Castain
cb1007b5a9
Pass back the number of daemons in the system
This commit was SVN r22537.
2010-02-03 14:31:16 +00:00
Shiqing Fan
bdc13dacb1
A type cast.
This commit was SVN r22520.
2010-01-31 20:22:22 +00:00
Jeff Squyres
007a6c7b99
Per #2201, move the user arguments up to be the first set of argv
after the compiler argv tokens.
Not closing #2201 yet; there's still discussion on that ticket about
whether we want to do more or not.
Refs trac:2201
cmr:v1.4.2
cmr:v1.5
This commit was SVN r22513.
The following Trac tickets were found above:
Ticket 2201 --> https://svn.open-mpi.org/trac/ompi/ticket/2201
2010-01-29 22:51:35 +00:00