1
1
Граф коммитов

209 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
e653da1d11 Where or where did that patch go??? Ah - there it went! ;-)
Fix singleton operations - allow multiple xcasts to be queued.

This commit was SVN r15097.
2007-06-15 13:45:29 +00:00
George Bosilca
a4d99ddef6 More synchronizations for the Windows version. The problem came from
the multiple threads accessing the OOB/registry asynchronously via the
callbacks. The quickest solution (but definitively not the cleanest) is
to serialize these callbacks in such a way that at any given time
only one thread can execute a callbacks.

This commit was SVN r15086.
2007-06-14 22:35:38 +00:00
George Bosilca
fb9ff5cc75 Don't remove the tcp events from the list, they will remove themselves
in the destructor.

This commit was SVN r15085.
2007-06-14 22:33:09 +00:00
George Bosilca
95a607b945 A more Windows friendly version. As the socket event will be generated
through the win dll using multiple threads, we have to insure that
the oob callbacks happens only in a synchronous way or really bad
things happens with the current design (blocking messages from a receive
callback).

This commit was SVN r15069.
2007-06-14 04:38:06 +00:00
Ralph Castain
5adef03179 Clean up a diagnostic so it only outputs when requested
This commit was SVN r15048.
2007-06-13 15:53:10 +00:00
George Bosilca
715f6012cf The DSS pack function can use the const attribute for the src field
as it is never modified by the pack functions directly. Enforce it
all over the code base.

This commit was SVN r15026.
2007-06-12 22:47:14 +00:00
Brian Barrett
84d1512fba Add the potential for doing some basic error checking on mutexes during
single threaded builds.  In its default configuration, all this does
is ensure that there's at least a good chance of threads building
based on non-threaded development (since the variable names will be
checked).  There is also code to make sure that a "mutex" is never
"double locked" when using the conditional macro mutex operations.
This is off by default because there are a number of places in both
ORTE and OMPI where this alarm spews mega bytes of errors on a
simple test.  So we have some work to do on our path towards
thread support.

Also removed the macro versions of the non-conditional thread locks,
as the only places they were used, the author of the code intended
to use the conditional thread locks.  So now you have upper-case
macros for conditional thread locks and lowercase functions for
non-conditional locks.  Simple, right? :).

This commit was SVN r15011.
2007-06-12 16:25:26 +00:00
Tim Prins
1467558157 Cleanup a couple warnings.
Update svn:ignore

This commit was SVN r15009.
2007-06-12 14:11:06 +00:00
Ralph Castain
85df3bd92f Bring in the generalized xcast communication system along with the correspondingly revised orted launch. I will send a message out to developers explaining the basic changes. In brief:
1. generalize orte_rml.xcast to become a general broadcast-like messaging system. Messages can now be sent to any tag on the daemons or processes. Note that any message sent via xcast will be delivered to ALL processes in the specified job - you don't get to pick and choose. At a later date, we will introduce an augmented capability that will use the daemons as relays, but will allow you to send to a specified array of process names.

2. extended orte_rml.xcast so it supports more scalable message routing methodologies. At the moment, we support three: (a) direct, which sends the message directly to all recipients; (b) linear, which sends the message to the local daemon on each node, which then relays it to its own local procs; and (b) binomial, which sends the message via a binomial algo across all the daemons, each of which then relays to its own local procs. The crossover points between the algos are adjustable via MCA param, or you can simply demand that a specific algo be used.

3. orteds no longer exhibit two types of behavior: bootproxy or VM. Orteds now always behave like they are part of a virtual machine - they simply launch a job if mpirun tells them to do so. This is another step towards creating an "orteboot" functionality, but also provided a clean system for supporting message relaying.

Note one major impact of this commit: multiple daemons on a node cannot be supported any longer! Only a single daemon/node is now allowed.

This commit is known to break support for the following environments: POE, Xgrid, Xcpu, Windows. It has been tested on rsh, SLURM, and Bproc. Modifications for TM support have been made but could not be verified due to machine problems at LANL. Modifications for SGE have been made but could not be verified. The developers for the non-verified environments will be separately notified along with suggestions on how to fix the problems.

This commit was SVN r15007.
2007-06-12 13:28:54 +00:00
Brian Barrett
27ad954265 Fix a couple of problems with the way we were using orte_process_name_t
structures in the system.  Instead of using memcmp, use the ns function.
This won't cause a problem as long as all three elements of the name are
ints, but if they have different sizes, alignment and padding rules
can cause memcmp() to compare padding space, which rarely holds a sane
value.

This commit was SVN r14998.
2007-06-11 19:12:11 +00:00
Jeff Squyres
9fb2e807a9 Remove some unnecessary code, probably dating back to before we had
generalized component include/exclude infrastructure.  This commit
removes the oob_base_include and oob_base_exclude MCA params because
they have long-since been handled by the "oob" MCA parameter in the
MCA base.

This commit was SVN r14979.
2007-06-10 14:16:05 +00:00
Ralph Castain
983fd3432a Fix singleton comm_spawn. Ensure that singleton's start the RML receive function so they can receive RML updates during xconnect procedures once any comm_spawn'd children start. Since singleton's only use the RMGR/URM component, update that component to also hold us until xconnect is completed (if it is invoked) before returning to the caller.
This commit was SVN r14914.
2007-06-06 17:39:23 +00:00
Brian Barrett
11ae30333d strmode is a standard BSD function defined in string.h
This commit was SVN r14828.
2007-05-31 22:47:40 +00:00
Rolf vandeVaart
ec963d02c7 Fix debug timing output so that proper algorithm is printed for
each type of xcast (direct, linear, binomial).

This commit was SVN r14822.
2007-05-31 18:27:54 +00:00
Brian Barrett
e4b369c93e Properly handle case where user instructs the oob to not use all non-localhost
interfaces

This commit was SVN r14815.
2007-05-31 02:29:44 +00:00
George Bosilca
f8f71b9ba0 Correct a threaded problem and make sure we only free what was allocated.
This commit was SVN r14803.
2007-05-30 18:50:29 +00:00
Jeff Squyres
379a4ec5e2 While we're editing MCA params in the oob tcp component, ditch the use
of the deprecated MCA param API for registering MCA parameters and
update to the current API.

This commit was SVN r14747.
2007-05-24 13:01:55 +00:00
Jeff Squyres
839c1db95c Fix something that has been bugging me for a while:
Rename the oob_tcp_include and oob_tcp_exclude MCA parameters to be
oob_tcp_if_include and oob_tcp_if_exclude (to match the convention
with btl_tcp_if_[in|ex]clude).  Keep "hidden" synonyms oob_tcp_include
and oob_tcp_exclude in case anyone is actually using them (and some
users undoubtedly are), but do not have them show up in ompi_info
--param output.  Instead, the new "oob_tcp_if_*" names will show up in
ompi_info output.

This commit was SVN r14746.
2007-05-24 12:52:26 +00:00
Ralph Castain
02f6e6ab3e Slight touchup to make it pretty
This commit was SVN r14734.
2007-05-23 16:39:18 +00:00
Ralph Castain
3fc227286f Be sure to NULL terminate the list of keys...
This commit was SVN r14733.
2007-05-23 16:35:03 +00:00
Ralph Castain
e6ff7757ab Modify the new DSS xfer and copy functions so they only xfer/copy the unpacked portion of a buffer's payload. This allows for more rapid transfer of data during message relay without requiring any knowledge of what is in the buffer.
Begin work on restoring binomial message distribution method.

This commit was SVN r14728.
2007-05-23 14:06:32 +00:00
Ralph Castain
5b0abf520b Don't update our own contact info
This commit was SVN r14718.
2007-05-22 13:28:23 +00:00
Ralph Castain
4fff584a68 Commit the orted-failed-to-start code. This correctly causes the system to detect the failure of an orted to start and allows the system to terminate all procs/orteds that *did* start.
The primary change that underlies all this is in the OOB. Specifically, the problem in the code until now has been that the OOB attempts to resolve an address when we call the "send" to an unknown recipient. The OOB would then wait forever if that recipient never actually started (and hence, never reported back its OOB contact info). In the case of an orted that failed to start, we would correctly detect that the orted hadn't started, but then we would attempt to order all orteds (including the one that failed to start) to die. This would cause the OOB to "hang" the system.

Unfortunately, revising how the OOB resolves addresses introduced a number of additional problems. Specifically, and most troublesome, was the fact that comm_spawn involved the immediate transmission of the rendezvous point from parent-to-child after the child was spawned. The current code used the OOB address resolution as a "barrier" - basically, the parent would attempt to send the info to the child, and then "hold" there until the child's contact info had arrived (meaning the child had started) and the send could be completed.

Note that this also caused comm_spawn to "hang" the entire system if the child never started... The app-failed-to-start helped improve that behavior - this code provides additional relief.

With this change, the OOB will return an ADDRESSEE_UNKNOWN error if you attempt to send to a recipient whose contact info isn't already in the OOB's hash tables. To resolve comm_spawn issues, we also now force the cross-sharing of connection info between parent and child jobs during spawn.

Finally, to aid in setting triggers to the right values, we introduce the "arith" API for the GPR. This function allows you to atomically change the value in a registry location (either divide, multiply, add, or subtract) by the provided operand. It is equivalent to first fetching the value using a "get", then modifying it, and then putting the result back into the registry via a "put".

This commit was SVN r14711.
2007-05-21 18:31:28 +00:00
Brian Barrett
33a5758521 Some IPv6 improvements:
* Move ipv6comat.h code into opal_config_bottom.h and change into some
    more intelligent testing of structures
  * Change opal's if interface to use sockaddr instead of sockaddr_storage,
    as the RFCs suggest we do
  * Move the networking code in opal that isn't directly related to if
    detection into net.h
  * Add quicky function to get the port out of either a sockaddr_in
    or sockaddr_in6, saving a bunch of code in the oob.
  * Update TCP oob and btl with new interface

This commit was SVN r14679.
2007-05-17 01:17:59 +00:00
Sven Stork
22af6d38e6 - UNexport symbols that shouldn't be needed outside the libraries
- replace #if/#endif with BEGIN/END_C_DECLS
- reformating

This commit was SVN r14669.
2007-05-16 15:46:52 +00:00
Jeff Squyres
c5782642d9 Fix some param names so that they show up when you "ompi_info --param
oob all".

This commit was SVN r14646.
2007-05-11 20:58:11 +00:00
Sven Stork
3707207cca - we don't need to export this symbol
This commit was SVN r14593.
2007-05-07 13:05:52 +00:00
Josh Hursey
596062d34b Seems that the recent changes in the sds and oob exposed some invalid
assumptions in the FT restart code for the ORTE layer.

This fixes those problems by having the RML completely shutdown and 
restart the OOB framework (instead of just the module as before).
This makes it much easier to manage, and maintainable as the OOB
changes in the future.

The SDS now does communication as part of its startup procedure, so
we need to make sure we restart the RML before the SDS so that it can
communicate properly.

OOB base [close|open] used a static bool to determine if they have
been called previously or not. I needed to expose this boolean so 
that I can close() then open() the oob base in the restart procedure.
The functionality has not changed, we just now have the ability to 
open/close the framework as many times as we need to as long as we
always call them in that order. (So calling open twice in a row is not allowed
as before, it is only allowed if you open(), close(), then open() again).

Things seem to be working now.

This commit was SVN r14515.
2007-04-25 19:51:52 +00:00
Brian Barrett
4b8bb70afb A couple cleanups for the IPv6 support:
- make opal_sockaddr2str() take a sockaddr_storage instead of a sockaddr_in6
    so that it works for IPv4 and IPv6 addresses, and remove a whole bunch
    of #ifs in the OOOB code.
  - Fix a compiler warning in the TCP BTL due to run-time determined
    array size by making it a dynamicly allocated array.
  - Fix the unpacking code of IPv4 addresses when using IPv6 support, so
    that the address is in the correct location (instead of in an IPv6
    structure, use an IPv4 structure).  Refs trac:1005.

This commit was SVN r14514.

The following Trac tickets were found above:
  Ticket 1005 --> https://svn.open-mpi.org/trac/ompi/ticket/1005
2007-04-25 19:08:07 +00:00
Adrian Knoth
d1ce39de4f Move mca_btl_tcp_addr_isipv4public to opal_addr_isipv4public
This commit was SVN r14512.
2007-04-25 18:06:06 +00:00
Adrian Knoth
35fce38f43 Don't know why this line was here.
This commit was SVN r14509.
2007-04-25 12:31:13 +00:00
Ralph Castain
8517a5a3a6 cleanup a few compiler warnings
This commit was SVN r14507.
2007-04-25 11:51:18 +00:00
Jeff Squyres
c4c68e666a Merge in the ipv6 work from /tmp/ipv6-merge.
This commit was SVN r14503.
2007-04-25 01:55:40 +00:00
Ralph Castain
18b2dca51c Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly.
There is a binomial algorithm in the code (i.e., the HNP would send to a subset of the orteds, which then relay it on according to the typical log-2 algo), but that has a bug in it so the code won't let you select it even if you tried (and the mca param doesn't show, so you'd *really* have to try).

This also involved a slight change to the oob.xcast API, so propagated that as required.

Note: this has *only* been tested on rsh, SLURM, and Bproc environments (now that it has been transferred to the OMPI trunk, I'll need to re-test it [only done rsh so far]). It should work fine on any environment that uses the ORTE daemons - anywhere else, you are on your own... :-)

Also, correct a mistake where the orte_debug_flag was declared an int, but the mca param was set as a bool. Move the storage for that flag to the orte/runtime/params.c and orte/runtime/params.h files appropriately.

This commit was SVN r14475.
2007-04-23 18:41:04 +00:00
Jeff Squyres
51f286d737 Just like r14289 on the ORTE trunk:
Per discussions with Brian and Ralph, make a slight correction in
where components are installed. Use $pkglibdir, not $libdir/openmpi,
so that when compiled in the orte trunk, components are installed to
the right directory (because the component search patch is checking
$pkglibdir).

This commit was SVN r14345.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r14289
2007-04-12 11:19:42 +00:00
George Bosilca
1c037df7e7 Only print information if the condition is met.
This commit was SVN r14340.
2007-04-12 07:28:18 +00:00
George Bosilca
cad93a7693 Add more output. Fix some typos, and some small cleanups.
This commit was SVN r14327.
2007-04-12 05:01:29 +00:00
Brian Barrett
13a4bba13f Yet another dumb thing that shouldn't have been in r14261.
This commit was SVN r14263.

The following SVN revision numbers were found above:
  r14261 --> open-mpi/ompi@8a55c84d0b
2007-04-07 23:23:23 +00:00
Brian Barrett
32f0090f81 fix dumb variable scope mistake
This commit was SVN r14262.
2007-04-07 23:00:57 +00:00
Brian Barrett
8a55c84d0b Fix a number of OOB issues:
* Remove the connect() timeout code, as it had some nasty race conditions
    when connections were established as the trigger was firing.  A better
    solution has been found for the cluster where this was needed, so just
    removing it was easiest.
  * When a fatal error (too many connection failures) occurs, set an error
    on messages in the queue even if there isn't an active message.  The
    first message to any peer will be queued without being active (and
    so will all subsequent messages until the connection is established),
    and the orteds will hang until that first message completes.  So if
    an orted can never contact it's peer, it will never exit and just sit
    waiting for that message to complete.
  * Cover an interesting RST condition in the connect code.  A connection
    can complete the three-way handshake, the connector can even send
    some data, but the server side will drop the connection because it
    can't move it from the half-connected to fully-connected state because
    of space shortage in the listen backlog queue.  This causes a RST to
    be received first time that recv() is called, which will be when waiting
    for the remote side of the OOB ack.  In this case, transition the
    connection back into a CLOSED state and try to connect again.
  * Add levels of debugging, rather than all or nothing, each building on
    the previous level.  0 (default) is hard errors.  1 is connection 
    error debugging info.  2 is all connection info.  3 is more state
    info.  4 includes all message info.
  * Add some hopefully useful comments

This commit was SVN r14261.
2007-04-07 22:33:30 +00:00
Galen Shipman
48d1fa830d A race condition exists on the free list of pending connections because
OPAL_FREE_LIST_WAIT/RETURN will not use locks in a non-threaded build
conditionaly use locks if non-threaded around the OPAL_FREE_LIST_WAIT/RETURN 
seems to fix the issue 
Tested at 4K processes and seems to work.. 

This commit was SVN r14135.
2007-03-23 15:19:03 +00:00
Brian Barrett
d454395b51 Need to fall back on the event listen mode if the MCA parameter said use the
listen thread, but we're not the HNP.  This is better than not starting up
any listen mode, which is what we were doing before :/

This commit was SVN r14133.
2007-03-23 13:29:18 +00:00
Galen Shipman
e654604a25 remove invalid comment
This commit was SVN r14118.
2007-03-22 03:51:36 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Brian Barrett
f6a5d58885 Rather than set the connect event timeout number to something big and hoping
its bigger than the timeout for the connect() call, just don't register
the handler by default and fall back to connect() timing out.  Should give
much happier performance on big clusters.

This commit was SVN r13639.
2007-02-13 18:36:50 +00:00
Brian Barrett
262cbbc5c9 Back out r13593, which contained a change that shouldn't be committed.
This commit was SVN r13594.

The following SVN revision numbers were found above:
  r13593 --> open-mpi/ompi@81472363ea
2007-02-09 20:13:02 +00:00
Brian Barrett
81472363ea Allow the OOB to connect between all MPI applications during MPI_INIT
without also establishing MPI connectivity.

This commit was SVN r13593.
2007-02-09 20:11:40 +00:00
Jeff Squyres
c91fcd7fbd Fix a bunch of minor typos submitted by Bernhard Fischer.
This commit was SVN r13505.
2007-02-06 12:00:30 +00:00
George Bosilca
9f73335bdb Silence the compiler.
This commit was SVN r13381.
2007-01-31 04:24:56 +00:00
Rainer Keller
061ba05439 - Fixes uncovered with the format attribute to
opal_output and opal_output_verbose

This commit was SVN r13371.
2007-01-30 20:56:31 +00:00
George Bosilca
1e38810c2d Correctly close the sockets on a generic way.
This commit was SVN r13254.
2007-01-23 03:17:23 +00:00
Jeff Squyres
6f7adfe231 Fix for the oob base open and close functions being invoked twice by
ompi_info -- once directly and once via the rml oob component.

This commit was SVN r13152.
2007-01-17 15:18:13 +00:00
Brian Barrett
03112254e7 Increase connection timeout to 600 seconds, which should always be higher than
the connect() timeout, so that we'll use that rather than our own timeout by
defualt.  There timeout was set low for Big Red, but causes problems for very
large clusters, as there's no way to wire them up in 10 seconds most of the
time.

This commit was SVN r13062.
2007-01-10 04:53:21 +00:00
Brian Barrett
a34e67d743 Remove unneeded PARAM_INIT_FILE variable in configure.params files used by
components that use configure.m4 for configuration or are always built. 
The macro has not been needed since moving to configure types other than
configure.stub

Fixes trac:590

This commit was SVN r13031.

The following Trac tickets were found above:
  Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590
2007-01-08 03:44:22 +00:00
Ralph Castain
6101050ea6 Remove an abstraction barrier I thought was gone long-ago. The OOB subscription really shouldn't be defined as an OMPI subscription.
I know it's just a technicality, but it is time to address such things rather than just letting them continue to propagate. :-)

This commit was SVN r12954.
2007-01-02 16:16:50 +00:00
Brian Barrett
38c2e43ac2 Print out error string rather than errno for TCP-related errors, making it easier for both the user and us to debug issues with BTL and OOB issues...
This commit was SVN r12852.
2006-12-14 18:20:43 +00:00
Ralph Castain
0a5d41857a Complete next round of message size reduction: "strip" the descriptive info from the returned values. I have now added a flag to the gpr address mode (ORTE_GPR_STRIPPED) that instructs the gpr to not include segment names or tokens in the returned gpr_value_t objects.
I found only two places that were looking at the tokens:

1. the odls - we used the tokens to separately process the globals container data from everything else. In this case, I left the subscription that returned the globals data alone, but "stripped" the subscription that returned the launch data for the procs. These subscriptions have nothing to do with the xcast message.

2. the pml_base_modex - the callback function was getting process names from the returned tokens. Actually, this function was doing a very bad thing - it was assuming that the first token returned was *always* the process name. This is currently true, but is one of those assumptions that someone could have easily changed - and suddenly found the system inexplicably failing. I modified the function to (a) get the name sent back to us, (b) "stripped" the value structures of tokens and segment strings, and (c) correctly obtained process names from the returned values. I also reindented the heck out of the code so it was legible (at least, to my old eyes).

This commit was SVN r12813.
2006-12-09 23:10:25 +00:00
Ralph Castain
a1153fdc8f Eliminate virtually all of the attribute_predefined data from the STG1 message. We now compute the total number of slots allocated to us and save that in the registry - the attributed_predefined then retrieves it via the STG1 message. The app_num is passed via the process_info structure, which gets the value from the ODLS in the environment.
Obviously, people like bproc will have to get the app_num via another avenue...but that's a problem for another day. Several options are easily available.

This commit was SVN r12788.
2006-12-07 03:11:20 +00:00
Brian Barrett
6f8b366acb Rename liborte to libopen-rte and libopal to libopen-pal per telecon today
and bug #632.

Refs trac:632

This commit was SVN r12762.

The following Trac tickets were found above:
  Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632
2006-12-05 18:27:24 +00:00
George Bosilca
a0ed53d70b Make the compilers happy.
This commit was SVN r12729.
2006-12-03 00:19:11 +00:00
George Bosilca
3fd278c522 Make the tree compile in debug mode.
This commit was SVN r12724.
2006-12-01 23:03:09 +00:00
Ralph Castain
897744cdeb Two major changes to the runtime:
1. implement and enable the non-described buffer operations. I will send out a more detailed explanation separately. However, this mode of operation (which is now the default) significantly reduces message size during startup. If you want the described buffers, set the mca param "-mca dss_describe_buffer 1".

2. revise the xcast system to support both linear and binomial tree broadcast methods. Since we are seeing scenarios where the binomiall tree can cause problems, I have made the linear method the default. To run with the binomial tree, set the mca param "-mca oob_xcast_mode binomial".

3. add some detailed timing reports to the xcast operation. These are enabled via "-mca oob_xcast_timing 1".

4. add some more unit tests for the dss and gpr (focused on support for the non-described buffer)

This commit was SVN r12722.
2006-12-01 22:30:39 +00:00
Ralph Castain
bc4e97a435 First stage in the move to a faster startup. Change the ORTE stage gate xcast into a binary tree broadcast (away from a linear broadcast). Also, removed the timing report in the gpr_proxy component that printed out the number of bytes in the compound command message as the answer was "not much" - reduces the clutter in the data.
This commit was SVN r12679.
2006-11-28 00:06:25 +00:00
Brian Barrett
0895f5e08d Rename OMPI_PROCESS_NAME_{HTON, NTOH} macros to ORTE_PROCESS_NAME_{HTON, NTOH}
because they are in ORTE, not OMPI.  Also, remove the ORTE_PROCESS_NAME macros
in iof base as they are duplicates of the ones that were in ns_types, which 
meant that bad things happened if you changed what an orte_process_name_t
looked like.

This commit was SVN r12646.
2006-11-22 03:03:21 +00:00
Ralph Castain
6d6cebb4a7 Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.

I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).

This commit was SVN r12597.
2006-11-14 19:34:59 +00:00
Galen Shipman
68d9922f44 enable/disable connection sleep in oob_tcp.c via mca param.. on by default..
This commit was SVN r12444.
2006-11-06 18:00:46 +00:00
Ralph Castain
c77f6c605e Update timing reports:
1. Remove timing of xcast from mpi_init

2. Add timing report from oob_xcast on how long it took to send the message

This commit was SVN r12428.
2006-11-03 18:55:05 +00:00
Ralph Castain
60e27c77e7 Add some additional timing reporting:
1. Added reporting points around the xcasts in MPI_Init. Note that these times will include time spent waiting for a trigger to fire, which is why the times between stage gates did NOT include these times initially. The inter-stage-gate times still do NOT include the xcast time - the xcast time is reported separately.

2. Added the process vpid on the MPI_Init timing reports for clarity.

3. Added a report from the xcast function on the HNP that outputs the number of bytes in the message being sent to the processes.

This commit was SVN r12422.
2006-11-03 16:04:40 +00:00
Brian Barrett
d6ff14ed61 Hand-pack the connection information for each peer rather than just
packing a sockaddr_in, as there are some endianness and padding issues
with sending a sockaddr_in.  Note that the sin_port and sin_addr are
already in network byte order, which is why we pack them as a byte
string.

Refs trac:493

This commit was SVN r12301.

The following Trac tickets were found above:
  Ticket 493 --> https://svn.open-mpi.org/trac/ompi/ticket/493
2006-10-25 15:09:30 +00:00
Brian Barrett
fce5130333 Delay opening the listen socket until module init, so that we can have the
seed value have something set to true.  Allow selection of the listen
type to thread if (and only if) the process is the HNP...

This commit was SVN r12105.
2006-10-11 21:29:29 +00:00
George Bosilca
3a34f9340e If the enum is defined inside the struct it will has a scope. We don't
really need that.

This commit was SVN r12001.
2006-10-05 05:27:04 +00:00
Brian Barrett
8f7ab1c584 num_procs can be zero if something went partly wrong before. This will
cause a math exception on some platforms, so don't let that happen.

This commit was SVN r11929.
2006-10-02 01:27:22 +00:00
Brian Barrett
d00a0de716 * It appears that in their infinite wisdom, Apple removed the
__DARWIN_ALIGN_POWER define from the last release of the OS X compiler
    toolchain.  The bug in net/if.h, however, is still there.  So look
    for the hints that we're on a 64 bit Apple PowerPC instead.
  * If we don't find a buffer size that works by 10MB, we're never
    going to.  So add some code to limit the buffer size we'll try
    so that we don't fall into an infinite loop
  * Detect errors in opal_ifcount in the oob init code

Refs trac:420

This commit was SVN r11825.

The following Trac tickets were found above:
  Ticket 420 --> https://svn.open-mpi.org/trac/ompi/ticket/420
2006-09-26 16:37:04 +00:00
Andrew Friedley
798c19d395 Blah.. we should always return after try_connect() here, not just when we have an error.
Another fix for ticket #362.

This commit was SVN r11756.
2006-09-22 15:51:11 +00:00
Andrew Friedley
8895bf7369 Fix the fix (r11718) for bug #362.
We were still waiting the entire duration of the timeout before we figured out that a connect() was successful.  Re-introduce adding the peer_send_event so that we detect immediately when a connect() completes.

Also make sure to delete the timeout event in complete_connect().

Fixed a struct timeval initialization warning reported by Jeff.

Remove an erroneous opal_output().

This commit was SVN r11724.

The following SVN revision numbers were found above:
  r11718 --> open-mpi/ompi@1b6231a9b5
2006-09-20 14:29:37 +00:00
Andrew Friedley
1b6231a9b5 Fix for running jobs that span multiple 's' partitions on IU BigRed.
Each 's' partition has its own TCP network.  It's fine to use this network for jobs that fit inside the partition, but the TCP OOB errors when trying to connect across two partitions, because there are two disjoint networks.  Each node also has another TCP network connecting ALL nodes together.

So the solution is to actually try all the available TCP interfaces on a node, instead of erroring when the first one fails.

Also, the default TCP connect() timeout is way too long (5 minutes) - use our own timeout mechanism, with the timeout value expressed as an MCA parameter.

This commit was SVN r11718.
2006-09-19 19:33:49 +00:00
Ralph Castain
37dfdb76eb Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done.
This commit was SVN r11661.
2006-09-14 21:29:51 +00:00
George Bosilca
f52c10d18e And ORTE is ready for prime-time. All Windows tricks are in:
- use the OPAL functions for PATH and environment variables
- make all headers C++ friendly
- no unamed structures
- no implicit cast.

Plus a full implementation for the orte_wait functions.

This commit was SVN r11347.
2006-08-23 03:32:36 +00:00
George Bosilca
6afa4c6c64 Windows friendly version. We have to split the OMPI_DECLSPEC in at least 3
different macros, one for each project. Therefore, now we have OPAL_DECLSPEC,
ORTE_DECLSPEC and OMPI_DECLSPEC. Please use them based on the sub-project.

This commit was SVN r11270.
2006-08-20 15:54:04 +00:00
Ralph Castain
8c7f0ed9ae Change the SOH to the new State Monitoring and Reporting (SMR) framework. New API's will be appearing in the new framework shortly - this just gets the name change into the system.
Other changes:

1. Remove the old xcpu components as they are not functional.

2. Fix a "bug" in orterun whereby we called dump_aborted_procs even when we normally terminated. There is still some kind of bug in this procedure, however, as we appear to be calling the orterun job_state_callback function every time a process terminates (instead of only once when they have all terminated). I'll continue digging into that one.

This will require an autogen/configure, I'm afraid.

This commit was SVN r11228.
2006-08-16 16:35:09 +00:00
Ralph Castain
5dfd54c778 With the branch to 1.2 made....
Clean up the remainder of the size_t references in the runtime itself. Convert to orte_std_cntr_t wherever it makes sense (only avoid those places where the actual memory size is referenced).

Remove the obsolete oob barrier function (we actually obsoleted it a long time ago - just never bothered to clean it up).

I have done my best to go through all the components and catch everything, even if I couldn't test compile them since I wasn't on that type of system. Still, I cannot guarantee that problems won't show up when you test this on specific systems. Usually, these will just show as "warning: comparison between signed and unsigned" notes which are easily fixed (just change a size_t to orte_std_cntr_t).

In some places, people didn't use size_t, but instead used some other variant (e.g., I found several places with uint32_t). I tried to catch all of them, but...

Once we get all the instances caught and fixed, this should once and for all resolve many of the heterogeneity problems.

This commit was SVN r11204.
2006-08-15 19:54:10 +00:00
Ralph Castain
d2912f03e0 Cleanup a historical naming convention problem. Move the socket_errno definitions to the OPAL layer and change the name accordingly. This cleans up some interrelationship issues as well as removing a name confusion.
This commit was SVN r11186.
2006-08-14 20:14:44 +00:00
Ralph Castain
bd937b219d Tell xcast not to send to processes that have "aborted".
One of those fixes that has been sitting on another branch for awhile...sigh.

This commit was SVN r11142.
2006-08-09 18:23:43 +00:00
Brian Barrett
c744f650ba * really didn't mean for this patch (the threaded accept() code) to come in with
r10841, so revert it (and it's fixes) out.  Will bring back once cleaned up from
  the code used in the tbird experiment

This commit was SVN r10991.

The following SVN revision numbers were found above:
  r10841 --> open-mpi/ompi@dfa1221c3b
2006-07-25 22:32:01 +00:00
Jeff Squyres
bdab8d744c Send a pointer to the data, not the data itself. Otherwise, we could
get a segv in some cases.

This commit was SVN r10984.
2006-07-25 21:42:44 +00:00
Gleb Natapov
f15fc4ef2f include signal.h for SIGPIPE definition
This commit was SVN r10863.
2006-07-18 09:07:53 +00:00
Brian Barrett
2185c059e8 * use opal_free_list_item_t as the type of items stored in an opal_free_list_t,
rather than assuing it's an opal_list_item_t.

This commit was SVN r10860.
2006-07-17 21:51:50 +00:00
Jeff Squyres
82161d20ca Catch a SIGPIPE and allow it to be harmless. Register a no-op SIGPIPE
handler before the write() and de-register it afterwards.  Determine
if the write() succeeded or failed by the return of write().

This commit was SVN r10858.
2006-07-17 21:15:56 +00:00
George Bosilca
33a7634009 Silence the compiler.
This commit was SVN r10851.
2006-07-17 17:13:28 +00:00
Brian Barrett
dfa1221c3b * AC_CONFIG_LINKS has a minor problem in that it always uses ln -s, rather
than $(LN_S).  This causes problems with with Windows and probably
  elsewhere (re: #200).  So use a slightly different trick to get the
  right header selected for the MEMCPY and TIMER components.

* Using the same trick used to solve the AC_CONFIG_LINKS problem, 
  stop using a separate header file for direct calling in the
  PML and MTL.  This lets me remove some icky code in ompi_mca.m4
  that was more fragile than I really liked.

This commit was SVN r10841.
2006-07-16 04:23:52 +00:00
Ralph Castain
cef1ce19d6 Restore the "sleep" delay during startup.
Since Jeff and I are going to a branch for T-bird, we have restored the trunk to its prior state to avoid any possibility of disturbing it.

This commit was SVN r10774.
2006-07-12 22:18:53 +00:00
Ralph Castain
9102b5af3b Remove the "sleep" delay in the oob connection procedure. This shouldn't cause any problems, especially for launches of less than 1000 processes.
Please report any abnormal behavior during launch, though, as we would like to understand what (if any) impact is seen. I couldn't see any on small jobs (the modulo functions render this number down pretty low).

This commit was SVN r10763.
2006-07-12 20:31:30 +00:00
Brian Barrett
4b70bb92db * Per ticket #112, localhost checks should check against 127.0.0.1/8, rather
than just 127.0.0.1.

This commit was SVN r10750.
2006-07-11 20:54:49 +00:00
Tim Woodall
0a56067509 Correction to resolve a problem related to partial reads. We were making a
copy of the receive buffer based on the iovec struct that may have been updated 
during partial reads to reflect the current offset. Need to make the copy using 
the base address of the buffer.

Thanks to Sven Stork for finding this.

This should be backported to 1.0.X and 1.1.X branches.

This commit was SVN r9749.
2006-04-27 14:27:02 +00:00
Tim Woodall
3e57a4ec48 remove debug code - not required
This commit was SVN r9715.
2006-04-25 19:05:57 +00:00
Brian Barrett
f37a77dd08 * Fix potential deadlock when mpi threads are enabled and progress threads are
not.  See lengthy comment in the body of commit.

This commit was SVN r9573.
2006-04-07 18:13:35 +00:00
George Bosilca
ca75ff2569 In the case we have support for threads, then the opal library have it's own
thread, which will do progress independently of MPI. So in this case we 
have to call opal_event_loop instead of opal_progress.

This commit was SVN r9551.
2006-04-06 14:31:38 +00:00
Brian Barrett
7408de0bfb When progress threads are enabled, opal_progress() doesn't call the
event library (since the event library has its own thread).  So when
we are using progress threads, we really want to call opal_event_loop()
and not opal_progres(). 

This commit was SVN r9549.
2006-04-06 12:58:09 +00:00
George Bosilca
50b5a02f8b Let the oob to call opal_progress instead of opal_progress_event. Now, the MPI
communications will be advanced in MPI_Finalize.

This commit was SVN r9442.
2006-03-28 22:09:40 +00:00
Brian Barrett
3e2c51dea8 * fix some silly commenting done by a previous developer that are good for
a laugh but probably not good for usability ;)

This commit was SVN r9253.
2006-03-11 03:09:24 +00:00