openmpi/ompi/mca/pml/ob1
Josh Hursey 2c736873bb Fix a checkpoint/restart bug that causes a restarted application to occasionally throw a SIGSEGV or SIGPIPE due to invalid socket descriptors.
The problem was caused by a bad ordering between the restart of the ORTE level TCP connections (in the OOB, the out-of-band communication layer) and the Open MPI level TCP connections (BTLs). Before this commit, ORTE would shut down and restart the OOB completely before the OMPI level restarted its TCP connections. As a result, a socket descriptor used by the OMPI level at checkpoint time could be assigned to the ORTE level on restart. The OMPI level had no knowledge that the socket descriptor it was previously using had been recycled, so it closed it on restart. This broke the ORTE level, because its newly created socket descriptor was closed without its knowledge.

The fix is to have the OMPI level shut down its TCP connections, allow the ORTE level to restart, and then allow the OMPI level to restart its connections. This seems obvious, and I'm surprised that this bug has not cropped up sooner. I'm confident that this specific problem has been fixed with this commit.
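
The descriptor recycling described above can be reproduced outside Open MPI. The following is a minimal sketch, not code from this commit: `open_conn`, `restart_old_order`, and `restart_new_order` are hypothetical names, a plain `AF_UNIX` socket stands in for a TCP connection, and the sketch assumes a single-threaded process on a system that hands out the lowest free descriptor number (as Linux does).

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if fd currently names an open descriptor, 0 otherwise. */
static int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}

/* Hypothetical stand-in for a layer opening one connection. */
static int open_conn(void)
{
    return socket(AF_UNIX, SOCK_STREAM, 0);
}

/* Old ordering: the lower layer restarts before the upper layer has
 * released its stale descriptor number. Returns 1 if the lower layer's
 * socket is still open afterwards, 0 if it was closed, -1 on error. */
static int restart_old_order(void)
{
    int upper_fd = open_conn();   /* upper layer's socket at checkpoint */
    if (upper_fd < 0) return -1;
    int stale_fd = upper_fd;      /* upper layer remembers only the number */
    close(upper_fd);              /* simulate restart: the number is free */

    int lower_fd = open_conn();   /* lower layer restarts first */
    if (lower_fd < 0) return -1;  /* and is likely handed the same number */
    close(stale_fd);              /* upper layer cleans up "its" socket */

    int alive = fd_is_open(lower_fd);
    if (alive) close(lower_fd);
    return alive;
}

/* New ordering: upper closes, lower restarts, upper reopens. */
static int restart_new_order(void)
{
    int upper_fd = open_conn();
    if (upper_fd < 0) return -1;
    close(upper_fd);                  /* 1. upper layer shuts down */

    int lower_fd = open_conn();       /* 2. lower layer restarts */
    if (lower_fd < 0) return -1;

    int new_upper_fd = open_conn();   /* 3. upper layer reopens */
    if (new_upper_fd < 0) { close(lower_fd); return -1; }

    int alive = fd_is_open(lower_fd);
    close(new_upper_fd);
    close(lower_fd);
    return alive;
}
```

Under those assumptions, `restart_old_order()` reports that the lower layer's socket was closed behind its back, while `restart_new_order()` leaves it intact because the upper layer never touches a number it no longer owns.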

Thanks to Eric Roman and Tamer El Sayed for their help in identifying this problem, and patience while I was fixing it.

 * Add a new state {{{OPAL_CRS_RESTART_PRE}}}. This state identifies when we are on the down slope of the INC (finalize-like), which is useful when you want to close, but not reopen, a component set for fear of interfering with a lower level.
 * Use this new state in OMPI level coordination. Here we want to make sure to play well with both the OMPI/BTL/TCP and ORTE/OOB/TCP components.
 * Update ft_event functions in PML and BML to handle the new restart state.
 * Add an additional flag to the error output in OOB/TCP so we can see what the socket descriptor was on failure as this can be helpful in debugging.
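
The role of the new pre-restart state in the list above can be sketched as a small state handler. This is an illustration under stated assumptions, not the actual PML or BML `ft_event` code: the `demo_*` names, the `DEMO_CRS_*` enum, and the tick counter are all hypothetical, and only the close-then-restart-then-reopen ordering is taken from the commit message.

```c
#include <assert.h>

/* Hypothetical mirror of the checkpoint/restart states; only the idea of
 * a separate "pre-restart" state comes from the commit message. */
typedef enum {
    DEMO_CRS_CHECKPOINT,
    DEMO_CRS_CONTINUE,
    DEMO_CRS_RESTART_PRE,  /* down slope: close, do not reopen */
    DEMO_CRS_RESTART       /* up slope: (re)open connections */
} demo_crs_state_t;

typedef struct {
    int connections_open;  /* 1 while the layer holds live sockets */
    int closed_at;         /* tick at which the layer last closed */
    int opened_at;         /* tick at which the layer last opened */
} demo_layer_t;

static int demo_tick = 0;  /* global counter used to record ordering */

/* Sketch of a per-layer fault tolerance event handler. */
static int demo_ft_event(demo_layer_t *layer, demo_crs_state_t state)
{
    switch (state) {
    case DEMO_CRS_CHECKPOINT:
    case DEMO_CRS_CONTINUE:
        break;                              /* nothing to do in this sketch */
    case DEMO_CRS_RESTART_PRE:
        layer->connections_open = 0;        /* close only */
        layer->closed_at = ++demo_tick;
        break;
    case DEMO_CRS_RESTART:
        layer->connections_open = 1;        /* open fresh connections */
        layer->opened_at = ++demo_tick;
        break;
    default:
        return -1;
    }
    return 0;
}

/* Coordination in the order the commit describes: the upper layer closes,
 * the lower layer restarts, then the upper layer reopens. */
static int demo_coordinate_restart(demo_layer_t *upper, demo_layer_t *lower)
{
    if (demo_ft_event(upper, DEMO_CRS_RESTART_PRE) != 0) return -1;
    if (demo_ft_event(lower, DEMO_CRS_RESTART) != 0) return -1;
    return demo_ft_event(upper, DEMO_CRS_RESTART);
}
```

Splitting the restart into two states lets the coordinator place the lower layer's restart between the upper layer's close and reopen, which a single restart state cannot express.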

This commit was SVN r18276.
2008-04-24 17:54:22 +00:00
..
configure.params Remove unneeded PARAM_INIT_FILE variable in configure.params files used by 2007-01-08 03:44:22 +00:00
Makefile.am Per long threads on the mailing list and much confusion discussion 2007-12-15 13:32:02 +00:00
pml_ob1_comm.c - Initialize in the order of mca_pml_ob1_comm_proc_t... 2007-08-23 05:56:22 +00:00
pml_ob1_comm.h Don't keep the data attached to a fragment segmented when we have 2007-04-18 15:52:11 +00:00
pml_ob1_component.c eager_limit is no longer needed in OB1 PML. Remove it. 2007-10-15 09:26:42 +00:00
pml_ob1_component.h Cleanup the OMPI_DECLSPEC/OMPI_MODULE_DECLSPEC in the PMLs. 2008-01-09 20:32:39 +00:00
pml_ob1_endpoint.c Next step in the project split, mainly source code re-arranging 2006-02-12 01:33:29 +00:00
pml_ob1_endpoint.h Minor cleanups. On the OB1 PML the endpoint is not used => remove it from the build. 2006-07-13 00:07:13 +00:00
pml_ob1_hdr.h Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex. 2008-04-17 20:43:56 +00:00
pml_ob1_iprobe.c Move duplicated code all over the code to a single function ompi_request_wait_completion(). 2007-10-18 12:33:21 +00:00
pml_ob1_irecv.c Move duplicated code all over the code to a single function ompi_request_wait_completion(). 2007-10-18 12:33:21 +00:00
pml_ob1_isend.c Move duplicated code all over the code to a single function ompi_request_wait_completion(). 2007-10-18 12:33:21 +00:00
pml_ob1_progress.c Next step in the project split, mainly source code re-arranging 2006-02-12 01:33:29 +00:00
pml_ob1_rdma.c Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. 2008-02-28 01:57:57 +00:00
pml_ob1_rdma.h Schedule SEND traffic of pipeline protocol between BTLs in accordance with 2007-07-01 11:34:23 +00:00
pml_ob1_rdmafrag.c make ompi_free_list_item_t a class.. 2006-06-12 16:44:00 +00:00
pml_ob1_rdmafrag.h Fix deadlock in OB1 protocol by sending memory by copying if registration 2007-06-03 08:31:58 +00:00
pml_ob1_recvfrag.c Oops ... 2008-04-24 15:54:52 +00:00
pml_ob1_recvfrag.h Rewrite OB1 matching logic. Get rid of macros, make the code shorter. 2007-12-19 09:16:20 +00:00
pml_ob1_recvreq.c Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex. 2008-04-17 20:43:56 +00:00
pml_ob1_recvreq.h Decide if sends should be throttled at the receiver and pass this to the sender 2008-03-27 08:56:43 +00:00
pml_ob1_sendreq.c Shift the architecture calculation from the ompi/datatype engine to the opal/util area. This allows us to compute the architecture earlier in the launch and communicate it outside of the modex. 2008-04-17 20:43:56 +00:00
pml_ob1_sendreq.h Add the ownership flags to the PML/BTL interface. The layer 2008-02-18 17:39:30 +00:00
pml_ob1_start.c Use the new memchecker function call which is based on convertor. 2008-04-07 07:52:04 +00:00
pml_ob1.c Fix a checkpoint/restart bug that causes a restarted application to occasionally throw a SIGSEGV or SIGPIPE due to invalid socket descriptors. 2008-04-24 17:54:22 +00:00
pml_ob1.h Remove descriptor caching from BML. With descriptor caching some optimizations 2007-12-09 13:58:17 +00:00
post_configure.sh * include the correct file if we are doing the component bypass thing with ob1 2006-02-22 16:16:38 +00:00