1
1

23 Коммитов

Автор SHA1 Сообщение Дата
Brian Barrett
38c2e43ac2 Print out error string rather than errno for TCP-related errors, making it easier for both the user and us to debug issues with BTL and OOB issues...
This commit was SVN r12852.
2006-12-14 18:20:43 +00:00
Ralph Castain
6d6cebb4a7 Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things).
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.

I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).

This commit was SVN r12597.
2006-11-14 19:34:59 +00:00
Ralph Castain
d2912f03e0 Cleanup a historical naming convention problem. Move the socket_errno definitions to the OPAL layer and change the name accordingly. This cleans up some interrelationship issues as well as removing a name confusion.
This commit was SVN r11186.
2006-08-14 20:14:44 +00:00
Brian Barrett
2185c059e8 * use opal_free_list_item_t as the type of items stored in an opal_free_list_t,
rather than assuing it's an opal_list_item_t.

This commit was SVN r10860.
2006-07-17 21:51:50 +00:00
Tim Woodall
0a56067509 Correction to resolve a problem related to partial reads. We were making a
copy of the receive buffer based on the iovec struct that may have been updated 
during partial reads to reflect the current offset. Need to make the copy using 
the base address of the buffer.

Thanks to Sven Stork for finding this.

This should be backported to 1.0.X and 1.1.X branches.

This commit was SVN r9749.
2006-04-27 14:27:02 +00:00
Brian Barrett
f37a77dd08 * Fix potential deadlock when mpi threads are enabled and progress threads are
not.  See lengthy comment in the body of commit.

This commit was SVN r9573.
2006-04-07 18:13:35 +00:00
George Bosilca
ca75ff2569 In the case we have support for threads, then the opal library have it's own
thread, which will do progress independently of MPI. So in this case we 
have to call opal_event_loop instead of opal_progress.

This commit was SVN r9551.
2006-04-06 14:31:38 +00:00
Brian Barrett
7408de0bfb When progress threads are enabled, opal_progress() doesn't call the
event library (since the event library has its own thread).  So when
we are using progress threads, we really want to call opal_event_loop()
and not opal_progres(). 

This commit was SVN r9549.
2006-04-06 12:58:09 +00:00
George Bosilca
50b5a02f8b Let the oob to call opal_progress instead of opal_progress_event. Now, the MPI
communications will be advanced in MPI_Finalize.

This commit was SVN r9442.
2006-03-28 22:09:40 +00:00
Brian Barrett
3e2c51dea8 * fix some silly commenting done by a previous developer that are good for
a laugh but probably not good for usability ;)

This commit was SVN r9253.
2006-03-11 03:09:24 +00:00
Brian Barrett
566a050c23 Next step in the project split, mainly source code re-arranging
- move files out of toplevel include/ and etc/, moving it into the
    sub-projects
  - rather than including config headers with <project>/include, 
    have them as <project>
  - require all headers to be included with a project prefix, with
    the exception of the config headers ({opal,orte,ompi}_config.h
    mpi.h, and mpif.h)

This commit was SVN r8985.
2006-02-12 01:33:29 +00:00
Jeff Squyres
42ec26e640 Update the copyright notices for IU and UTK.
This commit was SVN r7999.
2005-11-05 19:57:48 +00:00
Tim Woodall
a891db81e9 set socket options to improve oob performance
This commit was SVN r7934.
2005-10-31 16:21:11 +00:00
Tim Woodall
b60bea9ada dont allow callbacks to processed recursively - appear to be blowing away the stack
This commit was SVN r7862.
2005-10-25 13:48:08 +00:00
Tim Woodall
88c7fd9f8d add support for a "persistent" non-blocking receive
doesn't require a re-registration on every receive

This commit was SVN r7822.
2005-10-20 22:06:11 +00:00
Tim Woodall
3280f6e655 add facility to receive callback on disconnection from peer
This commit was SVN r7650.
2005-10-06 19:39:20 +00:00
Jeff Squyres
74744dd9df Fix a holdover mistake from the directory re-org:
- orte/class/ompi_proc_table.[ch] -> orte/class/orte_proc_table.[ch]
- opal_hash_table_[get|set|remove]_proc -> 
  orte_hash_table_[get|set|remove]_proc

This commit was SVN r6549.
2005-07-19 12:25:19 +00:00
Brian Barrett
a13166b500 * rename ompi_output to opal_output
This commit was SVN r6329.
2005-07-03 23:31:27 +00:00
Brian Barrett
23b687b0f4 * rename ompi_event to opal_event
This commit was SVN r6328.
2005-07-03 23:09:55 +00:00
Brian Barrett
39dbeeedfb * rename locking code from ompi to opal
This commit was SVN r6327.
2005-07-03 22:45:48 +00:00
Brian Barrett
9f0c969bb4 * rename ompi_hash_table opal_hash_table
This commit was SVN r6324.
2005-07-03 16:52:32 +00:00
Brian Barrett
761402f95f * rename ompi_list to opal_list
This commit was SVN r6322.
2005-07-03 16:22:16 +00:00
Jeff Squyres
1b18979f79 Initial population of orte tree
This commit was SVN r6266.
2005-07-02 13:42:54 +00:00