1
1
Граф коммитов

1145 Коммитов

Автор SHA1 Сообщение Дата
Josh Hursey
0bf61a1b84 Move in some accumulated small features and minor bug fixes for C/R support.
{{{
svn merge -r 16447:16475 https://svn.open-mpi.org/svn/ompi/tmp/jjh-fgs .
}}}

This commit was SVN r16478.
2007-10-17 13:47:36 +00:00
Ralph Castain
ec5fe78876 When in the unity message routing mode, we have to update the RML contact info in the parent procs so that they know how to talk to the children. Ideally, this would be done in the MPI layer since that layer knows which procs are actively involved in the comm_spawn. However, it isn't being done there, which causes comm_spawn to fail, so do it explicitly in the RTE.
Note that this means ALL procs in the parent job are updated, even though they may not be participating in the comm_spawn. This doesn't really hurt anything - just unnecessary.

Comm_spawn still has a problem when a child process shares a node with a parent, so this doesn't fix everything. It only fixes the bug of ensuring all procs know how to talk to each other.

This commit was SVN r16460.
2007-10-16 16:09:41 +00:00
Ralph Castain
713b6e13a5 Improve diagnostic output messages when errors are hit
This commit was SVN r16457.
2007-10-16 14:51:52 +00:00
Josh Hursey
ea0652d20f If we are going to pretend to do filem, then we should always pretend.
No one should be using this feature except for me. :)

This commit was SVN r16454.
2007-10-15 20:04:35 +00:00
Ralph Castain
b6196e8a39 When we can detect that a daemon has failed, then we would like to terminate the system without having it lock up. The "hang" is currently caused by the system attempting to send messages to the daemons (specifically, ordering them to kill their local procs and then terminate). Unfortunately, without some idea of which daemon has died, the system hangs while attempting to send a message to someone who is no longer alive.
This commit introduces the necessary logic to avoid that conflict. If a PLS component can identify that a daemon has failed, then we will set a flag indicating that fact. The xcast system will subsequently check that flag and, if it is set, will send all messages direct to the recipient. In the case of "kill local procs" and "terminate", the messages will go directly to each orted, thus bypassing any orted that has failed.

In addition, the xcast system will -not- wait for the messages to complete, but will return immediately (i.e., operate in non-blocking mode). Orterun will wait (via an event timer) for a period of time based on the number of daemons in the system to allow the messages to attempt to be delivered - at the end of that time, orterun will simply exit, alerting the user to the problem and -strongly- recommending they run orte-clean.

I could only test this on slurm for the case where all daemons unexpectedly died - srun apparently only executes its waitpid callback when all launched functions terminate. I have asked that Jeff integrate this capability into the OOB as he is working on it so that we execute it whenever a socket to an orted is unexpectedly closed. Meantime, the functionality will rarely get called, but at least the logic is available for anyone whose environment can support it.

This commit was SVN r16451.
2007-10-15 18:00:30 +00:00
Jeff Squyres
423f23eb6a Fixes trac:1160. There is still some other problem in the OOB, but we
wanted to commit this to get wider testing.

This commit was SVN r16445.

The following Trac tickets were found above:
  Ticket 1160 --> https://svn.open-mpi.org/trac/ompi/ticket/1160
2007-10-15 15:41:36 +00:00
Josh Hursey
f16a42947a Change some default MCA parameters:
- Global snapshot directory = $HOME
 - FileM 'rsh' = 'ssh'
 - FileM 'rcp' = 'scp'

This commit was SVN r16444.
2007-10-15 15:21:17 +00:00
Josh Hursey
520c27ac94 If the HNP is acting as the orted for local launch then the gpr_replica
variable is not defined. Make sure to set it to something reasonable 
so that file preloading still works (instead of seg faulting :)

Thanks to Hiep Bui Hoang for reporting this bug.

This commit was SVN r16433.
2007-10-11 19:47:04 +00:00
Josh Hursey
e483c36cea Remove a big of debug in filem/rsh that should have never been committed.
A guesture towards overlapping file removal with metadata update.

This commit was SVN r16432.
2007-10-11 19:37:33 +00:00
Ralph Castain
3dbd4d9be7 Squeeeeeeze the launch message. This is the message sent to the daemons that provides all the data required for launching their local procs. In reorganizing the ODLS framework, I discovered that we were sending a significant amount of unnecessary and repeated data. This commit resolves this by:
1. taking advantage of the fact that we no longer create the launch  message via a GPR trigger. In earlier times, we had the GPR create the launch message based on a subscription. In that mode of operation, we could not guarantee the order in which the data was stored in the message - hence, we had no choice but to parse the message in a loop that checked each value against a list of possible "keys" until the corresponding value was found.

Now, however, we construct the message "by hand", so we know precisely what data is in each location in the message. Thus, we no longer need to send the character string "keys" for each data value any more. This represents a rather large savings in the message size - to give you an example, we typically would use a 30-char "key" for a 2-byte data value. As you can see, the overhead can become very large.

2. sending node-specific data only once. Again, because we used to construct the message via subscriptions that were done on a per-proc basis, the data for each node (e.g., the daemon's name, whether or not the node was oversubscribed) would be included in the data for each proc. Thus, the node-specific data was repeated for every proc.

Now that we construct the message "by hand", there is no reason to do this any more. Instead, we can insert the data for a specific node only once, and then provide the per-proc data for that node. We therefore not only save all that extra data in the message, but we also only need to parse the per-node data once.

The savings become significant at scale. Here is a comparison between the revised trunk and the trunk prior to this commit (all data was taken on odin, using openib, 64 nodes, unity message routing, tested with application consisting of mpi_init/mpi_barrier/mpi_finalize, all execution times given in seconds, all launch message sizes in bytes):

Per-node scaling, taken at 1ppn:

#nodes           original trunk                         revised trunk
             time               size                time               size
      1      0.10                819                0.09                564
      2      0.14               1070                0.14                677
      3      0.15               1321                0.14                790
      4      0.15               1572                0.15                903
      8      0.17               2576                0.20               1355
     16      0.25               4584                0.21               2259
     32      0.28               8600                0.27               4067
     64      0.50              16632                0.39               7683

Per-proc scaling, taken at 64 nodes

   ppn             original trunk                         revised trunk
              time               size                time               size
      1       0.50              16669                0.40               7720
      2       0.55              32733                0.54              11048
      3       0.87              48797                0.81              14376
      4       1.0               64861                0.85              17704


Condensing those numbers, it appears we gained:

per-node message size: 251 bytes/node -> 113 bytes/node

per-proc message size: 251 bytes/proc  -> 52 bytes/proc

per-job message size:  568 bytes/job -> 399 bytes/job 
(job-specific data such as jobid, override oversubscribe flag, total #procs in job, total slots allocated)

The fact that the two pre-commit trunk numbers are the same confirms the fact that each proc was containing the node data as well. It isn't quite the 10x message reduction I had hoped to get, but it is significant and gives much better scaling.

Note that the timing info was, as usual, pretty chaotic - the numbers cited here were typical across several runs taken after the initial one to avoid NFS file positioning influences.

Also note that this commit removes the orte_process_info.vpid_start field and the handful of places that passed that useless value. By definition, all jobs start at vpid=0, so all we were doing is passing "0" around. In fact, many places simply hardwired it to "0" anyway rather than deal with it.

This commit was SVN r16428.
2007-10-11 15:57:26 +00:00
Rolf vandeVaart
25c95c9ee9 Fix build on solaris. Need to include sys/wait.h.
This commit was SVN r16426.
2007-10-11 15:04:30 +00:00
Jeff Squyres
e2df42eea3 Move the <sys/wait.h> below "orte_config.h"
This commit was SVN r16424.
2007-10-11 11:31:09 +00:00
George Bosilca
7cc9f588a8 Decorate the base functions with ORTE_DECLSPEC.
This commit was SVN r16423.
2007-10-11 00:02:49 +00:00
Ralph Castain
53af94fd87 Modify the configure system so that gridengine support is only built in specific conditions:
1. --with-sge, always builds
2. --without-sge, never builds
3. if neither is specified, build if and only if either SGE_ROOT is set or "qrsh" is found in the path

This commit was SVN r16422.
2007-10-10 21:39:16 +00:00
Josh Hursey
6e5341c659 Forgot to move a header in the code movement.
This commit was SVN r16420.
2007-10-10 15:39:40 +00:00
Ralph Castain
82a8e2d10d Reorganize the odls framework to place common functionality in the base, thus making maintenance easier. We still need this to be a framework as some environments (e.g., bproc) require significantly different functionality. However, there is quite a bit of commonality across the components, so this ensures that fixes in one get propagated across the others.
This patch also fixes a minor bug discovered along the way: we had "lost" the passing of the oversubscribed condition flag from the mapper to the orteds. Thus, we were not setting sched_yield correctly when in oversubscribed conditions (except when a hostfile was specified - different logic there because we treat the number of slots allocated on the node as "uncertain")

I did not modify the process component in this patch - I will send a proposed patch to the maintainers of that component so they can review it first.

This commit was SVN r16418.
2007-10-10 15:02:10 +00:00
Josh Hursey
7f833a9cb2 silence a warning that is triggered on restart
This commit was SVN r16417.
2007-10-10 14:25:49 +00:00
Ethan Mallove
d0b61db65c Add in a missing #include for Solaris builds.
This commit was SVN r16416.
2007-10-10 12:49:15 +00:00
Josh Hursey
aa8391f888 Local and global coordinators should be the only ones involved in the
movement of checkpoint files. This reduces the overhead on the applicaiton.

This commit was SVN r16412.
2007-10-09 19:52:47 +00:00
Galen Shipman
fda1306807 revert my stupidity..
This commit was SVN r16410.
2007-10-09 19:01:20 +00:00
Josh Hursey
8fe2ef5647 a missing include
This commit was SVN r16402.
2007-10-09 14:32:36 +00:00
Josh Hursey
7437f37e96 This commit contains the following:
* Fix some missing includes in a few places.
 * Add the cr_request() functionality to the BLCR CRS component.
   We are now dependent upon the 0.6.* series of BLCR.
 * Made the CR notification mechanism a registered function.
   This way we can have an OPAL-only version and it can be replaced at
   runtime with the ORTE version.
 * Add a 'opal_cr_allow_opal_only' parameter that will enable OPAL-only
   CR functionality when the user wants it. Default: Disabled.
 * Fix the placement of a checkpoint request check in MPI_Init
 * Pull the OPAL notification mechanism into the SnapC framework.
   * We no longer fork/exec the 'opal-checkpoint' command for local
   checkpointing, the Local coordinator in the orted does this directly.
   * The Local and Application coordinator talk together bypassing the OPAL
   notifiation mechanism.
   * Optimized the Local <-> App Coordinator communication.
   * Improved the structure used to track vpid_snapshots in the local coord.
 * Fix a race condition in which an application under heavy communication load
   may produce an inconsistent global checkpoint.

This commit was SVN r16389.
2007-10-08 20:53:02 +00:00
Galen Shipman
1c1b9d5480 make cray happy
This commit was SVN r16377.
2007-10-08 14:31:59 +00:00
Ralph Castain
54b2cf747e These changes were mostly captured in a prior RFC (except for #2 below) and are aimed specifically at improving startup performance and setting up the remaining modifications described in that RFC.
The commit has been tested for C/R and Cray operations, and on Odin (SLURM, rsh) and RoadRunner (TM). I tried to update all environments, but obviously could not test them. I know that Windows needs some work, and have highlighted what is know to be needed in the odls process component.

This represents a lot of work by Brian, Tim P, Josh, and myself, with much advice from Jeff and others. For posterity, I have appended a copy of the email describing the work that was done:

As we have repeatedly noted, the modex operation in MPI_Init is the single greatest consumer of time during startup. To-date, we have executed that operation as an ORTE stage gate that held the process until a startup message containing all required modex (and OOB contact info - see #3 below) info could be sent to it. Each process would send its data to the HNP's registry, which assembled and sent the message when all processes had reported in.

In addition, ORTE had taken responsibility for monitoring process status as it progressed through a series of "stage gates". The process reported its status at each gate, and ORTE would then send a "release" message once all procs had reported in.

The incoming changes revamp these procedures in three ways:

1. eliminating the ORTE stage gate system and cleanly delineating responsibility between the OMPI and ORTE layers for MPI init/finalize. The modex stage gate (STG1) has been replaced by a collective operation in the modex itself that performs an allgather on the required modex info. The allgather is implemented using the orte_grpcomm framework since the BTL's are not active at that point. At the moment, the grpcomm framework only has a "basic" component analogous to OMPI's "basic" coll framework - I would recommend that the MPI team create additional, more advanced components to improve performance of this step.

The other stage gates have been replaced by orte_grpcomm barrier functions. We tried to use MPI barriers instead (since the BTL's are active at that point), but - as we discussed on the telecon - these are not currently true barriers so the job would hang when we fell through while messages were still in process. Note that the grpcomm barrier doesn't actually resolve that problem, but Brian has pointed out that we are unlikely to ever see it violated. Again, you might want to spend a little time on an advanced barrier algorithm as the one in "basic" is very simplistic.

Summarizing this change: ORTE no longer tracks process state nor has direct responsibility for synchronizing jobs. This is now done via collective operations within the MPI layer, albeit using ORTE collective communication services. I -strongly- urge the MPI team to implement advanced collective algorithms to improve the performance of this critical procedure.


2. reducing the volume of data exchanged during modex. Data in the modex consisted of the process name, the name of the node where that process is located (expressed as a string), plus a string representation of all contact info. The nodename was required in order for the modex to determine if the process was local or not - in addition, some people like to have it to print pretty error messages when a connection failed.

The size of this data has been reduced in three ways:

(a) reducing the size of the process name itself. The process name consisted of two 32-bit fields for the jobid and vpid. This is far larger than any current system, or system likely to exist in the near future, can support. Accordingly, the default size of these fields has been reduced to 16-bits, which means you can have 32k procs in each of 32k jobs. Since the daemons must have a vpid, and we require one daemon/node, this also restricts the default configuration to 32k nodes.

To support any future "mega-clusters", a configuration option --enable-jumbo-apps has been added. This option increases the jobid and vpid field sizes to 32-bits. Someday, if necessary, someone can add yet another option to increase them to 64-bits, I suppose.

(b) replacing the string nodename with an integer nodeid. Since we have one daemon/node, the nodeid corresponds to the local daemon's vpid. This replaces an often lengthy string with only 2 (or at most 4) bytes, a substantial reduction.

(c) when the mca param requesting that nodenames be sent to support pretty error messages, a second mca param is now used to request FQDN - otherwise, the domain name is stripped (by default) from the message to save space. If someone wants to combine those into a single param somehow (perhaps with an argument?), they are welcome to do so - I didn't want to alter what people are already using.

While these may seem like small savings, they actually amount to a significant impact when aggregated across the entire modex operation. Since every proc must receive the modex data regardless of the collective used to send it, just reducing the size of the process name removes nearly 400MBytes of communication from a 32k proc job (admittedly, much of this comm may occur in parallel). So it does add up pretty quickly.


3. routing RML messages to reduce connections. The default messaging system remains point-to-point - i.e., each proc opens a socket to every proc it communicates with and sends its messages directly. A new option uses the orteds as routers - i.e., each proc only opens a single socket to its local orted. All messages are sent from the proc to the orted, which forwards the message to the orted on the node where the intended recipient proc is located - that orted then forwards the message to its local proc (the recipient). This greatly reduces the connection storm we have encountered during startup.

It also has the benefit of removing the sharing of every proc's OOB contact with every other proc. The orted routing tables are populated during launch since every orted gets a map of where every proc is being placed. Each proc, therefore, only needs to know the contact info for its local daemon, which is passed in via the environment when the proc is fork/exec'd by the daemon. This alone removes ~50 bytes/process of communication that was in the current STG1 startup message - so for our 32k proc job, this saves us roughly 32k*50 = 1.6MBytes sent to 32k procs = 51GBytes of messaging.

Note that you can use the new routing method by specifying -mca routed tree - if you so desire. This mode will become the default at some point in the future.


There are a few minor additional changes in the commit that I'll just note in passing:

* propagation of command line mca params to the orteds - fixes ticket #1073. See note there for details.

* requiring of "finalize" prior to "exit" for MPI procs - fixes ticket #1144. See note there for details.

* cleanup of some stale header files

This commit was SVN r16364.
2007-10-05 19:48:23 +00:00
Brian Barrett
3a0067249c The previous hack to deal with Libtool not speaking Objective C stopped
working with Automake 1.10.  This is a new hack, which should be much
more flexible.  The ras doesn't contain any Objective C, so remove the
hack entirely from that Makefile.am.

This commit was SVN r16269.
2007-09-30 03:40:25 +00:00
Rolf vandeVaart
a87267ef92 Fix a build error on Solaris. MAXHOSTNAMELEN is defined in netdb.h.
This commit was SVN r16268.
2007-09-28 20:15:28 +00:00
Josh Hursey
665a1e280b Copyright updates that should have gone into r16252.
(Someday I'll learn to do this before committing)

This commit was SVN r16260.

The following SVN revision numbers were found above:
  r16252 --> open-mpi/ompi@e10f476c87
2007-09-27 14:37:04 +00:00
Josh Hursey
e10f476c87 Bring over the jjh-filem branch which contains a non-blocking FileM interface
and implementation. This has shown drastic performance benefit when
transferring Many files at roughly the same time.

I tested this for many different filem operations and everything was working
fine. Let me know if you have any problems with this functionality.

Some Notes:
 - opal-checkpoint now has a 'quiet' flag to keep it from being too verbose.

 - FileM RSH component is fully non-blocking.

 - FileM RSH component has incomming connection throttling since by default
   ssh only allows 10 concurrent scp connections to any single host. This
   default can be adjusted via an MCA parameter.
    {{{-mca filem_rsh_max_incomming 10}}}

 - There is an MCA parameter for max outgoing connections, but it is currently
   not implemented. If someone needs it then it should not be hard to implement.
    {{{-mca filem_rsh_max_outgoing 10}}}

 - Changed the FileM request structure so that it is a bit more explicit and
   flexible.

 - Moved the 'preload-binary' and 'preload-files' functionality into odls/base
   allowing for code reuse in the 'process' and 'default' ODLS components.

 - Fixed a bug in the process name resolution which broke the 'preload-*'
   functionality due to GPR table structure changes.

 - The FileM RSH component might be able to see even more speedup from using a
   thread pool to operate on the work_pool structures, but that is for future
   work.

 - Added a 'opal-show-help' file to ODLS Base

This commit was SVN r16252.
2007-09-27 13:13:29 +00:00
Josh Hursey
b5fc722c35 Add a flag to 'pretend' to do filem in snapc. This is useful when doing
performance characterization, and should not be used by anyone doing anything
else since it will not produce a globally consistent checkpoint in this mode.

This commit was SVN r16192.
2007-09-24 16:19:45 +00:00
Jeff Squyres
f9b9beba77 Allow the LSF components to be shipped in the nightly tarball and open
it up to others.

This commit was SVN r16143.
2007-09-17 22:42:33 +00:00
Shiqing Fan
d4a7fb1378 - A small fix of format.
This commit was SVN r16138.
2007-09-17 12:10:04 +00:00
George Bosilca
d32a54d74e There is no values[1] ... How did the compilers goes away with this !!!
This commit was SVN r16132.
2007-09-14 21:33:25 +00:00
George Bosilca
6897926dce Not used anymore.
This commit was SVN r16129.
2007-09-14 21:20:19 +00:00
Ralph Castain
45986ad2aa Add support to signal application procs for LSF
This commit was SVN r16120.
2007-09-13 18:09:14 +00:00
Ralph Castain
9fa254c017 Provide a better error message when a daemon unexpectedly dies under SLURM so we differentiate between fail to start and aborting while the app is running.
This commit was SVN r16115.
2007-09-12 20:53:50 +00:00
Josh Hursey
b4c68c0925 Turn back on the absolute path protection for the moment.
It is masking a bug that I'm tracking down in the SNAPC FULL - FILEM interations

Also make sure to cleanout the filem structure before asking for another
checkpoint file when not storing the files in place.

This commit was SVN r16109.
2007-09-12 18:19:39 +00:00
George Bosilca
e5d316dba6 Coverty: fix issues with using a string once it get freed. The problem, is that the
mca_base_register_string don't set the result to NULL is an error occurs.

This commit was SVN r16108.
2007-09-12 18:16:53 +00:00
Ralph Castain
f80ea093a2 Ensure that the orteds do not directly respond to USR1/2 signals. Those signals are trapped by mpirun and propagated from there - at most, the orteds are involved in the propagation process, but should never do anything on their own.
This commit was SVN r16098.
2007-09-12 14:32:31 +00:00
Shiqing Fan
548a4fe943 - Use IOVBASE_TYPE instead of char to avoid warnings on some systems.
This commit was SVN r16092.
2007-09-11 16:24:23 +00:00
Shiqing Fan
c1065d8262 - Some more type casts.
This commit was SVN r16087.
2007-09-11 11:28:43 +00:00
Brian Barrett
cfe737d1f9 Fix some mistaken error checks -- errors will be less than zero, not
greater than zero

This commit was SVN r16008.
2007-08-29 18:52:51 +00:00
Jeff Squyres
5628084fec Fix Coverity CID 463: remove unused variable / dead code.
This commit was SVN r15999.
2007-08-29 01:30:15 +00:00
Brian Barrett
dcf678dbab Fix heterogeneous issue with non-blocking RML receive, where the sender
field could be in the wrong endianness

This commit was SVN r15989.
2007-08-28 20:54:52 +00:00
Josh Hursey
729c63cf9d Fix invalid MCA 'base' names so they appear in ompi_info.
A subset of this patch needs to be applied to v1.2

Refs trac:928

This commit was SVN r15918.

The following Trac tickets were found above:
  Ticket 928 --> https://svn.open-mpi.org/trac/ompi/ticket/928
2007-08-18 03:05:45 +00:00
Brian Barrett
8294f6de03 The portals_utcp component doesn't actually need the POrtals libraries
and only pokes at environment variables.  So don't link in the libraries,
as that causes a whole other set of problems.

This commit was SVN r15899.
2007-08-17 03:48:39 +00:00
Andrew Friedley
2eedcd2539 Fixes trac:1047
Tie stdin to /dev/null to prevent stdin from being closed and thus making stdin not work in slurm allocations.

This commit was SVN r15892.

The following Trac tickets were found above:
  Ticket 1047 --> https://svn.open-mpi.org/trac/ompi/ticket/1047
2007-08-16 20:49:27 +00:00
Tim Prins
5a795128af Change it so that different components in orte use unique rml tags
This commit was SVN r15881.
2007-08-16 14:02:35 +00:00
Brian Barrett
330003361b * Free memory from asprintf
* need to compare ERANGE to errno

This commit was SVN r15860.
2007-08-14 21:12:00 +00:00
Brian Barrett
881dd0654e * Provide a hook so that a PLS can tell the orted it's starting that it
needs to override the default umask.  By default, this is not used
    since most environments do what the user would expect without any
    help.
  * Have TM use the newly added umask hook, so that processes inherit
    the user's umask from mpirun rather than the pbs_mom's umask, which
    the user has no control over.

This commit was SVN r15858.
2007-08-14 18:44:52 +00:00
Shiqing Fan
eea712f9ab - Export those components in correct way.
This commit was SVN r15804.
2007-08-08 16:20:17 +00:00
Brian Barrett
59524a9009 Fix issue where we set state to SHUTDOWN rather than CONNECTING when we
had to switch socket types.

This commit was SVN r15784.
2007-08-06 22:55:41 +00:00
Ralph Castain
eb3a97f428 Don't overwrite the local rank key
This commit was SVN r15776.
2007-08-06 16:56:23 +00:00
Shiqing Fan
d10570786c - A small fix, add missed flag parameters.
This commit was SVN r15774.
2007-08-06 16:15:38 +00:00
Josh Hursey
755658694e Bring in changes to support Cray's Compute Node Linux (CNL) and
Application Level Placement Scheduler (ALPS).

This commit was tested under two Cray machines at ORNL: Jaguar (Catamount)
and Rizzo (CNL Test cage). Both machines performed as they should across
the commit.

It is likely that mor changes will follow this the work and environment
stabilizes.

Most of the infrastructure works the same for Catamount and CNL
except for a few bits. Below are the highlights:

Default IFACE Change:
 On Catamount we can use PTL_IFACE_DEFAULT, but on the CNL system we have access
 to will fail on this interface, and should be set to:
    IFACE_FROM_BRIDGE_AND_NALID(PTL_BRIDGE_UK,PTL_IFACE_SS).
 So if we detect that we are running with YOD then use the former interface
 and if we detect that we are running with ALPS then use the latter.
 We will want to pursue a more elegant solution if this interface continues to 
 change across machines.

PtlGetId and cnos_register_ptlid:
 The header suggests that these should never be called when launching with YOD.
 But in the ALPS environment the cnos_barrier() will hang forever if these 
 functions are not called after PtlNIInit(). Since these functions only need to
 be called once, and the orte rmgr/cnos component is loaded before the ompi 
 common/portals componet then just call these functions once in the rmgr/cnos
 component.

cnos_barrier_init():
 This is a noop for YOD, but critical for ALPS. So be sure to call it before
 calling the first barrier in the rmgr/cnos component.

cnos_barrier vs cnos_pm_barrier:
 It is suggested the cnos_pm_barrier only be used during finalization 
 as it will indicate to the launcher (yod or aprun) that the app is about
 to complete. It was suggested that we use the regular cnos_barrier() instead.
 I want to look into this a bit more to make sure there are not adverse
 side effects. A note has been placed in the code to indicate this reasoning.

This commit was SVN r15756.
2007-08-03 19:46:38 +00:00
Jeff Squyres
106beff744 Ahem. Apparently we should be checking for ORTE_EQUAL upon return
from orte_ns.compare_fields(), not 0 (yes, they're the same [today],
but it is much better to check for symbolic names...).

This commit was SVN r15731.
2007-08-01 18:59:37 +00:00
Jeff Squyres
8d4b6c7b0d The HNP changing into an orted brought a bug in the iof svc component
to light: we weren't ack'ing properly for streams that originated (or
originated via proxy) and terminated within the HNP.  This commit
fixes that.

It also fixes a few style issues, and added some more opal_outputs for
debugging.  Also, fixed a bug where the fact that we forwarded (and
therefore might need to update the ack) was not correctly reported if
there were multiple forwards (which there are not as the system is
currently using IOF, but there could be).

Refs trac:1098 -- want to get another pair of eyes to look at this before
I close the ticket.

This commit was SVN r15730.

The following Trac tickets were found above:
  Ticket 1098 --> https://svn.open-mpi.org/trac/ompi/ticket/1098
2007-08-01 18:38:03 +00:00
Ralph Castain
066ff38d42 Ensure we read all the reported URI contact info when we fork an HNP for singleton support
This commit was SVN r15714.
2007-07-31 18:55:08 +00:00
George Bosilca
2e2bf472ff Mark the orte_abort function as noreturn and change the return value from
int to void. This function call exit at the end, so there is no way to
return from there. Apply the same thing to the errmsg_abort function and
update all components.

This commit was SVN r15704.
2007-07-31 16:09:52 +00:00
Sven Stork
855434de59 - fixes several coverty issues
- add missing initialisation for variables
  - use strncpy instead of strcpy

This commit was SVN r15683.
2007-07-30 14:44:37 +00:00
Rainer Keller
2c5d07217d - Coverity: use snprintf, instead of sprintf....
This commit was SVN r15669.
2007-07-29 11:23:23 +00:00
Jeff Squyres
3858cf48c0 Stop using the deprecated ORTE_NAME_ARGS() and switch to
ORTE_NAME_PRINT().

This commit was SVN r15665.
2007-07-27 13:33:20 +00:00
Josh Hursey
acbc8ecca3 - On Cray XT systems stop the grpcomm basic component from building.
grpcomm cnos component
  - Remove the .ompi_ignore
  - add a configure.m4 that should keep it from building on any system
    other than Cray XT* (copied from rml/cnos)
  - Fix some mis-named symbols resulting from cut/paste errors.

This patch brings the Cray build back into 'working' order.

This commit was SVN r15651.
2007-07-26 20:42:06 +00:00
Jeff Squyres
188d529beb * We *do* need the LSF task ID as part of our vpid
* Accidentally had the PLS LSF using the env SDS; switch it back to
   the LSF SDS

This commit was SVN r15650.
2007-07-26 20:22:36 +00:00
Josh Hursey
e5a03e7734 - Remove Makefile.in from version control
- Add back support for cnos (copy functionality lost by moving the interface
  from the RML).
- Fix some cut/paste errors.

This commit was SVN r15646.
2007-07-26 18:52:17 +00:00
Jeff Squyres
75192de1fc LSF support is now working. W00t! May be subject to a further tweak
or two.

 * checking lsb_init() is not sufficient to know whether you're in an
   LSF job or not; you also need to check for environment variable
   markers 
 * remove lots of debugging output
 * no need for the sds lsf to call lsb_init()
 * remove some slurm-like dead code and a copy-n-paste error in the
   sds lsf

This commit was SVN r15644.
2007-07-26 18:49:29 +00:00
Jeff Squyres
8e9c71282d Add a bunch more [conditional] debugging output.
This commit was SVN r15643.
2007-07-26 18:46:46 +00:00
Rich Graham
60df8be1a7 initial code - does not even compile, but Josh is picking up on this.
This commit was SVN r15641.
2007-07-26 17:55:51 +00:00
Brian Barrett
801fffabff Don't assume things about the contact info string in the general case. There
is no need for the IP address in most cases (filem being one dubious
exception), so just publish and hand around the supposedly opaque contact
info strings

This commit was SVN r15638.
2007-07-26 16:51:41 +00:00
Brian Barrett
e537cc0871 * Add documentation for RML base code
* Move function declaration out of base.h as it isn't needed
    outside the base code

This commit was SVN r15616.
2007-07-25 16:19:29 +00:00
Brian Barrett
f06b61cff9 Don't use the OOB TCP key for contact information, remove the need to
include a not so public header file.  FIxes a compile error on the Cray.

This commit was SVN r15613.
2007-07-25 15:12:07 +00:00
George Bosilca
00796cfdab Make sure the oob_tcp_windows_progress_callback is registered
in all cases. This is now done in the oob tcp open function.
As a result, the unregistering have to be done in the close
function.

This commit was SVN r15603.
2007-07-25 05:55:14 +00:00
George Bosilca
5d8a70e434 Update the Windows ODLS.
This commit was SVN r15600.
2007-07-25 03:57:25 +00:00
George Bosilca
c961cb5749 The Windows support is now back in bussiness.
This commit was SVN r15599.
2007-07-25 03:55:34 +00:00
Brian Barrett
4e23c7c5a2 Fixes for case where IPv6 support is disabled. Fixes trac:1102.
This commit was SVN r15584.

The following Trac tickets were found above:
  Ticket 1102 --> https://svn.open-mpi.org/trac/ompi/ticket/1102
2007-07-24 17:01:39 +00:00
Josh Hursey
1b177cd029 This component is checkpointable.
This commit was SVN r15567.
2007-07-23 20:20:28 +00:00
Josh Hursey
a24e530f8e Some C/R fixes (more to come)
r15390 - Changed the paradigm in which the runtime worked by enabling the mpirun
process to become an orted and spawn processes. This broke the C/R for this 
special case as it required that the orted start the process, and that 
the hierarchy remains.
The fix was to allow the global coordinator to be a local coordinator as well
for this case.

r15528 - Changed the selection logic for the RML. This caused the application to
segv if the 'ftrm' wrapper component was selected as it tried to modify a NULL
pointer.
The fix was to move the 'module swap' code into the init() function, and swap
when passed a NULL pointer. It sounds bad, but actually cleans up the code a bit
more.

Still have to fix the 'routed' framework.

This commit was SVN r15566.

The following SVN revision numbers were found above:
  r15390 --> open-mpi/ompi@bd65f8ba88
  r15528 --> open-mpi/ompi@39a6057fc6
2007-07-23 20:13:37 +00:00
Ralph Castain
f219cc1e6e A few changes to the lsf components - mostly cleanup, no major logic changes
This commit was SVN r15563.
2007-07-23 18:38:36 +00:00
Ralph Castain
2017d52df0 Cleanup a few compiler warnings
This commit was SVN r15560.
2007-07-23 18:30:40 +00:00
Sven Stork
4c031de1ab - fix typo to export component structure in the case of visibility enabled
- use BEGIN/END_C_DECLS

This commit was SVN r15559.
2007-07-23 17:33:13 +00:00
Ralph Castain
db267899be Setup and use a tsd ring buffer to avoid overwriting process name outputs when print is called multiple times in same output statement.
This commit was SVN r15558.
2007-07-23 17:27:14 +00:00
Sven Stork
baf5e4b596 - add orte_config.h as first file to be included
- export required symbol

This commit was SVN r15556.
2007-07-23 15:50:55 +00:00
Ralph Castain
ef141d1fbc Ensure daemons know contact info for all other daemons. Update binomial xcast to work in revised design. Add debug output to orted so the daemon lets us know it launched (if --debug-daemons set) early on in case it fails during orte_init
This commit was SVN r15555.
2007-07-23 15:00:39 +00:00
Ralph Castain
6c800d452d Bring orte tests up to date with revised rml system.
Make first cut at fixing non-direct xcast modes

This commit was SVN r15553.
2007-07-23 13:05:34 +00:00
Jeff Squyres
78d214fec8 Oops -- didn't mean to commit the test program...
This commit was SVN r15538.
2007-07-20 20:15:51 +00:00
Jeff Squyres
2baa866026 Compiles to the new API, but doesn't quite work yet...
This commit was SVN r15537.
2007-07-20 19:49:27 +00:00
George Bosilca
1751b289ed Avoid a compiler warning about uninitialized variables.
This commit was SVN r15534.
2007-07-20 04:07:19 +00:00
George Bosilca
172a4fa543 Includ a missing header file.
This commit was SVN r15532.
2007-07-20 03:24:52 +00:00
Brian Barrett
9a476c7fdd The print function doesn't change the process name, so take a const...
This commit was SVN r15531.
2007-07-20 02:54:43 +00:00
Brian Barrett
5b9fa7e998 reapply r15517 and r15520, which were removed in r15527 so that I could get
the RML/OOB merge in slightly easier

This commit was SVN r15530.

The following SVN revision numbers were found above:
  r15517 --> open-mpi/ompi@41977fcc95
  r15520 --> open-mpi/ompi@9cbc9df1b8
  r15527 --> open-mpi/ompi@2d17dd9516
2007-07-20 02:34:29 +00:00
Brian Barrett
39a6057fc6 A number of improvements / changes to the RML/OOB layers:
* General TCP cleanup for OPAL / ORTE
  * Simplifying the OOB by moving much of the logic into the RML
  * Allowing the OOB RML component to do routing of messages
  * Adding a component framework for handling routing tables
  * Moving the xcast functionality from the OOB base to its own framework

Includes merge from tmp/bwb-oob-rml-merge revisions:

    r15506, r15507, r15508, r15510, r15511, r15512, r15513

This commit was SVN r15528.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r15506
  r15507
  r15508
  r15510
  r15511
  r15512
  r15513
2007-07-20 01:34:02 +00:00
Brian Barrett
2d17dd9516 temporarily back our r15517 and 15520 so that I can get the RML / OOB changes
to cleanly apply

This commit was SVN r15527.

The following SVN revision numbers were found above:
  r15517 --> open-mpi/ompi@41977fcc95
2007-07-20 01:10:34 +00:00
George Bosilca
9cbc9df1b8 Export the orte_ns_base_print_name_args function.
This commit was SVN r15520.
2007-07-19 21:45:27 +00:00
Ralph Castain
41977fcc95 Remove the cellid field from the orte_process_name_t structure. This only affects a handful of files in itself, but...
Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point.

Note that I could not possibly find outputs that directly print the orte_process_name_t fields, but only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings.

This commit was SVN r15517.
2007-07-19 20:56:46 +00:00
Tim Prins
e41f86dfe6 add a small amount of debugging output
This commit was SVN r15483.
2007-07-18 15:20:55 +00:00
Sven Stork
c43335d671 - release the lock before we forward a message. In the threaded
build it's possible that we have to process an ack before this function
  returns. If we don't release the lock here we cause a deadlock later
  in ack processing function.

This commit was SVN r15441.
2007-07-16 14:43:24 +00:00
Ralph Castain
5121dfe7e7 With the changes to the failed-to-start logic, we need to revise the odls so it doesn't overwrite the exit status on procs that are not found. Otherwise, we lose the appropriate error message to the user.
This commit was SVN r15440.
2007-07-16 13:50:26 +00:00
Ralph Castain
cd9213a9f0 Son of a gun - how did that fix not get into r15390?
Fix the blasted iof null component so it only is selected if/when directed.

This commit was SVN r15437.

The following SVN revision numbers were found above:
  r15390 --> open-mpi/ompi@bd65f8ba88
2007-07-16 12:38:11 +00:00
Ralph Castain
d109e9a6f4 Roll in the Voltaire core/socket/etc process mapping implementation. Only change I made was to cleanup some of the diagnostic output in the odls_default component so it uses the -mca odls_base_verbose parameter.
You will not see any impact from this change unless you use the syntax described in ticket #1023. I've tried as many of the RAS components as possible and saw no problem - there may be issues with other RAS components that would not compile on any of my systems. Anything that appears should be trivial to fix.

This commit was SVN r15427.
2007-07-14 15:14:07 +00:00
Tim Prins
d2f0806420 Fix a problem introduced by r15390 which was causing strange failures in numerous areas.
We no longer store whether we are a singleton in a MCA parameter, we now use a global constant. So all references to the MCA parameter must be removed.

This commit was SVN r15408.

The following SVN revision numbers were found above:
  r15390 --> open-mpi/ompi@bd65f8ba88
2007-07-13 17:52:16 +00:00
Ralph Castain
2bded34a1d Fix a problem observed by Brian where processes launched local to mpirun lost their environment except for MCA params.
The problem stemmed from no longer launching a local orted on the same node as mpirun. The orted would save and reuse the base environment. Mpirun didn't do that, and the odls was using the orted's globally saved environment (which wasn't being set).

This fix establishes a globally accessible base launch environment that both the orted and mpirun can utilize. Since we now use that, we don't need to pass it to the odls_launch_proc function, so remove that param from the API (and modify all components to handle the change).

This commit was SVN r15405.
2007-07-13 15:47:57 +00:00