There was a race condition in 35438ae9b5: if multiple threads invoked
ompi_mpi_init() simultaneously (which could happen from both MPI and
OSHMEM), the code did not catch this condition -- Bad Things would
happen.
Now use an atomic cmp/set to ensure that only one thread is able to
advance ompi_mpi_init from NOT_INITIALIZED to INIT_STARTED.
Additionally, change the prototype of ompi_mpi_init() so that
oshmem_init() can safely invoke ompi_mpi_init() multiple times (as
long as MPI_FINALIZE has not started) without displaying an error. If
multiple threads invoke oshmem_init() simultaneously, one of them will
actually do the initialization, and the rest will loop waiting for it
to complete.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>