1
1
Граф коммитов

122 Коммитов

Автор SHA1 Сообщение Дата
Brian Barrett
13a4bba13f Yet another dumb thing that shouldn't have been in r14261.
This commit was SVN r14263.

The following SVN revision numbers were found above:
  r14261 --> open-mpi/ompi@8a55c84d0b
2007-04-07 23:23:23 +00:00
Brian Barrett
32f0090f81 fix dumb variable scope mistake
This commit was SVN r14262.
2007-04-07 23:00:57 +00:00
Brian Barrett
8a55c84d0b Fix a number of OOB issues:
* Remove the connect() timeout code, as it had some nasty race conditions
    when connections were established as the trigger was firing.  A better
    solution has been found for the cluster where this was needed, so just
    removing it was easiest.
  * When a fatal error (too many connection failures) occurs, set an error
    on messages in the queue even if there isn't an active message.  The
    first message to any peer will be queued without being active (and
    so will all subsequent messages until the connection is established),
    and the orteds will hang until that first message completes.  So if
    an orted can never contact it's peer, it will never exit and just sit
    waiting for that message to complete.
  * Cover an interesting RST condition in the connect code.  A connection
    can complete the three-way handshake, the connector can even send
    some data, but the server side will drop the connection because it
    can't move it from the half-connected to fully-connected state because
    of space shortage in the listen backlog queue.  This causes a RST to
    be received first time that recv() is called, which will be when waiting
    for the remote side of the OOB ack.  In this case, transition the
    connection back into a CLOSED state and try to connect again.
  * Add levels of debugging, rather than all or nothing, each building on
    the previous level.  0 (default) is hard errors.  1 is connection 
    error debugging info.  2 is all connection info.  3 is more state
    info.  4 includes all message info.
  * Add some hopefully useful comments

This commit was SVN r14261.
2007-04-07 22:33:30 +00:00
Galen Shipman
48d1fa830d A race condition exists on the free list of pending connections because
OPAL_FREE_LIST_WAIT/RETURN will not use locks in a non-threaded build
conditionaly use locks if non-threaded around the OPAL_FREE_LIST_WAIT/RETURN 
seems to fix the issue 
Tested at 4K processes and seems to work.. 

This commit was SVN r14135.
2007-03-23 15:19:03 +00:00
Brian Barrett
d454395b51 Need to fall back on the event listen mode if the MCA parameter said use the
listen thread, but we're not the HNP.  This is better than not starting up
any listen mode, which is what we were doing before :/

This commit was SVN r14133.
2007-03-23 13:29:18 +00:00
Galen Shipman
e654604a25 remove invalid comment
This commit was SVN r14118.
2007-03-22 03:51:36 +00:00
Josh Hursey
dadca7da88 Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD).
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.

This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.

This commit closes trac:158

More details to follow.

This commit was SVN r14051.

The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
  r13912

The following Trac tickets were found above:
  Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
2007-03-16 23:11:45 +00:00
Brian Barrett
f6a5d58885 Rather than set the connect event timeout number to something big and hoping
its bigger than the timeout for the connect() call, just don't register
the handler by default and fall back to connect() timing out.  Should give
much happier performance on big clusters.

This commit was SVN r13639.
2007-02-13 18:36:50 +00:00
Brian Barrett
262cbbc5c9 Back out r13593, which contained a change that shouldn't be committed.
This commit was SVN r13594.

The following SVN revision numbers were found above:
  r13593 --> open-mpi/ompi@81472363ea
2007-02-09 20:13:02 +00:00
Brian Barrett
81472363ea Allow the OOB to connect between all MPI applications during MPI_INIT
without also establishing MPI connectivity.

This commit was SVN r13593.
2007-02-09 20:11:40 +00:00
Jeff Squyres
c91fcd7fbd Fix a bunch of minor typos submitted by Bernhard Fischer.
This commit was SVN r13505.
2007-02-06 12:00:30 +00:00
George Bosilca
9f73335bdb Silence the compiler.
This commit was SVN r13381.
2007-01-31 04:24:56 +00:00
Rainer Keller
061ba05439 - Fixes uncovered with the format attribute to
opal_output and opal_output_verbose

This commit was SVN r13371.
2007-01-30 20:56:31 +00:00
George Bosilca
1e38810c2d Correctly close the sockets on a generic way.
This commit was SVN r13254.
2007-01-23 03:17:23 +00:00
Jeff Squyres
6f7adfe231 Fix for the oob base open and close functions being invoked twice by
ompi_info -- once directly and once via the rml oob component.

This commit was SVN r13152.
2007-01-17 15:18:13 +00:00
Brian Barrett
03112254e7 Increase connection timeout to 600 seconds, which should always be higher than
the connect() timeout, so that we'll use that rather than our own timeout by
defualt.  There timeout was set low for Big Red, but causes problems for very
large clusters, as there's no way to wire them up in 10 seconds most of the
time.

This commit was SVN r13062.
2007-01-10 04:53:21 +00:00
Brian Barrett
a34e67d743 Remove unneeded PARAM_INIT_FILE variable in configure.params files used by
components that use configure.m4 for configuration or are always built. 
The macro has not been needed since moving to configure types other than
configure.stub

Fixes trac:590

This commit was SVN r13031.

The following Trac tickets were found above:
  Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590
2007-01-08 03:44:22 +00:00
Ralph Castain
6101050ea6 Remove an abstraction barrier I thought was gone long-ago. The OOB subscription really shouldn't be defined as an OMPI subscription.
I know it's just a technicality, but it is time to address such things rather than just letting them continue to propagate. :-)

This commit was SVN r12954.
2007-01-02 16:16:50 +00:00
Brian Barrett
38c2e43ac2 Print out error string rather than errno for TCP-related errors, making it easier for both the user and us to debug issues with BTL and OOB issues...
This commit was SVN r12852.
2006-12-14 18:20:43 +00:00
Ralph Castain
0a5d41857a Complete next round of message size reduction: "strip" the descriptive info from the returned values. I have now added a flag to the gpr address mode (ORTE_GPR_STRIPPED) that instructs the gpr to not include segment names or tokens in the returned gpr_value_t objects.
I found only two places that were looking at the tokens:

1. the odls - we used the tokens to separately process the globals container data from everything else. In this case, I left the subscription that returned the globals data alone, but "stripped" the subscription that returned the launch data for the procs. These subscriptions have nothing to do with the xcast message.

2. the pml_base_modex - the callback function was getting process names from the returned tokens. Actually, this function was doing a very bad thing - it was assuming that the first token returned was *always* the process name. This is currently true, but is one of those assumptions that someone could have easily changed - and suddenly found the system inexplicably failing. I modified the function to (a) get the name sent back to us, (b) "stripped" the value structures of tokens and segment strings, and (c) correctly obtained process names from the returned values. I also reindented the heck out of the code so it was legible (at least, to my old eyes).

This commit was SVN r12813.
2006-12-09 23:10:25 +00:00
Ralph Castain
a1153fdc8f Eliminate virtually all of the attribute_predefined data from the STG1 message. We now compute the total number of slots allocated to us and save that in the registry - the attributed_predefined then retrieves it via the STG1 message. The app_num is passed via the process_info structure, which gets the value from the ODLS in the environment.
Obviously, people like bproc will have to get the app_num via another avenue...but that's a problem for another day. Several options are easily available.

This commit was SVN r12788.
2006-12-07 03:11:20 +00:00
Brian Barrett
6f8b366acb Rename liborte to libopen-rte and libopal to libopen-pal per telecon today
and bug #632.

Refs trac:632

This commit was SVN r12762.

The following Trac tickets were found above:
  Ticket 632 --> https://svn.open-mpi.org/trac/ompi/ticket/632
2006-12-05 18:27:24 +00:00
George Bosilca
a0ed53d70b Make the compilers happy.
This commit was SVN r12729.
2006-12-03 00:19:11 +00:00
George Bosilca
3fd278c522 Make the tree compile in debug mode.
This commit was SVN r12724.
2006-12-01 23:03:09 +00:00
Ralph Castain
897744cdeb Two major changes to the runtime:
1. implement and enable the non-described buffer operations. I will send out a more detailed explanation separately. However, this mode of operation (which is now the default) significantly reduces message size during startup. If you want the described buffers, set the mca param "-mca dss_describe_buffer 1".

2. revise the xcast system to support both linear and binomial tree broadcast methods. Since we are seeing scenarios where the binomiall tree can cause problems, I have made the linear method the default. To run with the binomial tree, set the mca param "-mca oob_xcast_mode binomial".

3. add some detailed timing reports to the xcast operation. These are enabled via "-mca oob_xcast_timing 1".

4. add some more unit tests for the dss and gpr (focused on support for the non-described buffer)

This commit was SVN r12722.
2006-12-01 22:30:39 +00:00
Ralph Castain
bc4e97a435 First stage in the move to a faster startup. Change the ORTE stage gate xcast into a binary tree broadcast (away from a linear broadcast). Also, removed the timing report in the gpr_proxy component that printed out the number of bytes in the compound command message as the answer was "not much" - reduces the clutter in the data.
This commit was SVN r12679.
2006-11-28 00:06:25 +00:00
Brian Barrett
0895f5e08d Rename OMPI_PROCESS_NAME_{HTON, NTOH} macros to ORTE_PROCESS_NAME_{HTON, NTOH}
because they are in ORTE, not OMPI.  Also, remove the ORTE_PROCESS_NAME macros
in iof base as they are duplicates of the ones that were in ns_types, which 
meant that bad things happened if you changed what an orte_process_name_t
looked like.

This commit was SVN r12646.
2006-11-22 03:03:21 +00:00
Ralph Castain
6d6cebb4a7 Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.

I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).

This commit was SVN r12597.
2006-11-14 19:34:59 +00:00
Galen Shipman
68d9922f44 enable/disable connection sleep in oob_tcp.c via mca param.. on by default..
This commit was SVN r12444.
2006-11-06 18:00:46 +00:00
Ralph Castain
c77f6c605e Update timing reports:
1. Remove timing of xcast from mpi_init

2. Add timing report from oob_xcast on how long it took to send the message

This commit was SVN r12428.
2006-11-03 18:55:05 +00:00
Ralph Castain
60e27c77e7 Add some additional timing reporting:
1. Added reporting points around the xcasts in MPI_Init. Note that these times will include time spent waiting for a trigger to fire, which is why the times between stage gates did NOT include these times initially. The inter-stage-gate times still do NOT include the xcast time - the xcast time is reported separately.

2. Added the process vpid on the MPI_Init timing reports for clarity.

3. Added a report from the xcast function on the HNP that outputs the number of bytes in the message being sent to the processes.

This commit was SVN r12422.
2006-11-03 16:04:40 +00:00
Brian Barrett
d6ff14ed61 Hand-pack the connection information for each peer rather than just
packing a sockaddr_in, as there are some endianness and padding issues
with sending a sockaddr_in.  Note that the sin_port and sin_addr are
already in network byte order, which is why we pack them as a byte
string.

Refs trac:493

This commit was SVN r12301.

The following Trac tickets were found above:
  Ticket 493 --> https://svn.open-mpi.org/trac/ompi/ticket/493
2006-10-25 15:09:30 +00:00
Brian Barrett
fce5130333 Delay opening the listen socket until module init, so that we can have the
seed value have something set to true.  Allow selection of the listen
type to thread if (and only if) the process is the HNP...

This commit was SVN r12105.
2006-10-11 21:29:29 +00:00
George Bosilca
3a34f9340e If the enum is defined inside the struct it will has a scope. We don't
really need that.

This commit was SVN r12001.
2006-10-05 05:27:04 +00:00
Brian Barrett
8f7ab1c584 num_procs can be zero if something went partly wrong before. This will
cause a math exception on some platforms, so don't let that happen.

This commit was SVN r11929.
2006-10-02 01:27:22 +00:00
Brian Barrett
d00a0de716 * It appears that in their infinite wisdom, Apple removed the
__DARWIN_ALIGN_POWER define from the last release of the OS X compiler
    toolchain.  The bug in net/if.h, however, is still there.  So look
    for the hints that we're on a 64 bit Apple PowerPC instead.
  * If we don't find a buffer size that works by 10MB, we're never
    going to.  So add some code to limit the buffer size we'll try
    so that we don't fall into an infinite loop
  * Detect errors in opal_ifcount in the oob init code

Refs trac:420

This commit was SVN r11825.

The following Trac tickets were found above:
  Ticket 420 --> https://svn.open-mpi.org/trac/ompi/ticket/420
2006-09-26 16:37:04 +00:00
Andrew Friedley
798c19d395 Blah.. we should always return after try_connect() here, not just when we have an error.
Another fix for ticket #362.

This commit was SVN r11756.
2006-09-22 15:51:11 +00:00
Andrew Friedley
8895bf7369 Fix the fix (r11718) for bug #362.
We were still waiting the entire duration of the timeout before we figured out that a connect() was successful.  Re-introduce adding the peer_send_event so that we detect immediately when a connect() completes.

Also make sure to delete the timeout event in complete_connect().

Fixed a struct timeval initialization warning reported by Jeff.

Remove an erroneous opal_output().

This commit was SVN r11724.

The following SVN revision numbers were found above:
  r11718 --> open-mpi/ompi@1b6231a9b5
2006-09-20 14:29:37 +00:00
Andrew Friedley
1b6231a9b5 Fix for running jobs that span multiple 's' partitions on IU BigRed.
Each 's' partition has its own TCP network.  It's fine to use this network for jobs that fit inside the partition, but the TCP OOB errors when trying to connect across two partitions, because there are two disjoint networks.  Each node also has another TCP network connecting ALL nodes together.

So the solution is to actually try all the available TCP interfaces on a node, instead of erroring when the first one fails.

Also, the default TCP connect() timeout is way too long (5 minutes) - use our own timeout mechanism, with the timeout value expressed as an MCA parameter.

This commit was SVN r11718.
2006-09-19 19:33:49 +00:00
Ralph Castain
37dfdb76eb Here is the major MAD-cure commit. I have written plenty about it, so I refer you here to those messages for a description of everything that was done.
This commit was SVN r11661.
2006-09-14 21:29:51 +00:00
George Bosilca
f52c10d18e And ORTE is ready for prime-time. All Windows tricks are in:
- use the OPAL functions for PATH and environment variables
- make all headers C++ friendly
- no unamed structures
- no implicit cast.

Plus a full implementation for the orte_wait functions.

This commit was SVN r11347.
2006-08-23 03:32:36 +00:00
George Bosilca
6afa4c6c64 Windows friendly version. We have to split the OMPI_DECLSPEC in at least 3
different macros, one for each project. Therefore, now we have OPAL_DECLSPEC,
ORTE_DECLSPEC and OMPI_DECLSPEC. Please use them based on the sub-project.

This commit was SVN r11270.
2006-08-20 15:54:04 +00:00
Ralph Castain
8c7f0ed9ae Change the SOH to the new State Monitoring and Reporting (SMR) framework. New API's will be appearing in the new framework shortly - this just gets the name change into the system.
Other changes:

1. Remove the old xcpu components as they are not functional.

2. Fix a "bug" in orterun whereby we called dump_aborted_procs even when we normally terminated. There is still some kind of bug in this procedure, however, as we appear to be calling the orterun job_state_callback function every time a process terminates (instead of only once when they have all terminated). I'll continue digging into that one.

This will require an autogen/configure, I'm afraid.

This commit was SVN r11228.
2006-08-16 16:35:09 +00:00
Ralph Castain
5dfd54c778 With the branch to 1.2 made....
Clean up the remainder of the size_t references in the runtime itself. Convert to orte_std_cntr_t wherever it makes sense (only avoid those places where the actual memory size is referenced).

Remove the obsolete oob barrier function (we actually obsoleted it a long time ago - just never bothered to clean it up).

I have done my best to go through all the components and catch everything, even if I couldn't test compile them since I wasn't on that type of system. Still, I cannot guarantee that problems won't show up when you test this on specific systems. Usually, these will just show as "warning: comparison between signed and unsigned" notes which are easily fixed (just change a size_t to orte_std_cntr_t).

In some places, people didn't use size_t, but instead used some other variant (e.g., I found several places with uint32_t). I tried to catch all of them, but...

Once we get all the instances caught and fixed, this should once and for all resolve many of the heterogeneity problems.

This commit was SVN r11204.
2006-08-15 19:54:10 +00:00
Ralph Castain
d2912f03e0 Cleanup a historical naming convention problem. Move the socket_errno definitions to the OPAL layer and change the name accordingly. This cleans up some interrelationship issues as well as removing a name confusion.
This commit was SVN r11186.
2006-08-14 20:14:44 +00:00
Ralph Castain
bd937b219d Tell xcast not to send to processes that have "aborted".
One of those fixes that has been sitting on another branch for awhile...sigh.

This commit was SVN r11142.
2006-08-09 18:23:43 +00:00
Brian Barrett
c744f650ba * really didn't mean for this patch (the threaded accept() code) to come in with
r10841, so revert it (and it's fixes) out.  Will bring back once cleaned up from
  the code used in the tbird experiment

This commit was SVN r10991.

The following SVN revision numbers were found above:
  r10841 --> open-mpi/ompi@dfa1221c3b
2006-07-25 22:32:01 +00:00
Jeff Squyres
bdab8d744c Send a pointer to the data, not the data itself. Otherwise, we could
get a segv in some cases.

This commit was SVN r10984.
2006-07-25 21:42:44 +00:00
Gleb Natapov
f15fc4ef2f include signal.h for SIGPIPE definition
This commit was SVN r10863.
2006-07-18 09:07:53 +00:00
Brian Barrett
2185c059e8 * use opal_free_list_item_t as the type of items stored in an opal_free_list_t,
rather than assuing it's an opal_list_item_t.

This commit was SVN r10860.
2006-07-17 21:51:50 +00:00