
Turns out the alps plm component wasn't changing the state of the job upon terminating the orted's in the case of an abnormal termination. This caused mpirun to hang with a zommbie'd aprun process if an orted on a node in the job was killed via signal.