
20234 Commits

Author SHA1 Message Date
Ralph Castain
ab4f8585b0 When we abort during MPI_Init, we currently emit a totally incorrect error message stating that we were unable to aggregate error messages and cannot guarantee all other processes were killed. This simply isn't true IF the rte has been initialized.
So track that the rte has reached that point, and only emit the new message if it is accurate.

Note that we still generate a TON of output for a minor error:

Ralphs-iMac:examples rhc$ mpirun -n 3 -mca btl sm ./hello_c
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[50239,1],2]) is on host: Ralphs-iMac
  Process 2 ([[50239,1],2]) is on host: Ralphs-iMac
  BTLs attempted: sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
--------------------------------------------------------------------------
MPI_INIT has failed because at least one MPI process is unreachable
from another.  This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used.  Your MPI job will now abort.

You may wish to try to narrow down the problem;

 * Check the output of ompi_info to see which BTL/MTL plugins are
   available.
 * Run your application with MPI_THREAD_SINGLE.
 * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
   if using MTL-based communications) to see exactly which
   communication plugins were considered and/or discarded.
--------------------------------------------------------------------------
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[50239,1],2]
  Exit code:    1
--------------------------------------------------------------------------
[Ralphs-iMac.local:23227] 2 more processes have sent help message help-mca-bml-r2.txt / unreachable proc
[Ralphs-iMac.local:23227] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[Ralphs-iMac.local:23227] 2 more processes have sent help message help-mpi-runtime / mpi_init:startup:pml-add-procs-fail
Ralphs-iMac:examples rhc$ 

Hopefully, we can agree on a way to reduce this verbiage!

This commit was SVN r31686.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2014-05-08 15:48:16 +00:00
Ralph Castain
aaae4841e9 Flush the show_help system on our way out - this also restores the opal_show_help function pointer to the OPAL layer for any subsequent processing.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31685.
2014-05-08 14:37:47 +00:00
Ralph Castain
5602156a1c Use the correct abstraction layer name for the data dirs
This commit was SVN r31684.
2014-05-08 14:32:24 +00:00
Jeff Squyres
81afb4e18a hwloc: commit minor bug fix from hwloc git
Bring down 3aa0ed6 from the hwloc v1.7 branch: Stevens says we should
GETFD before we SETFD, so we do
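
The commit message does not show the changed code. As a minimal sketch of the general read-modify-write pattern it refers to (fetch the descriptor flags with F_GETFD, then write them back with F_SETFD), here is an illustration using Python's `fcntl` module rather than the C used in hwloc. The helper name `set_cloexec` and the pipe used for the demo are assumptions made for this example, not part of the hwloc change.

```python
import fcntl
import os


def set_cloexec(fd: int) -> int:
    """Set FD_CLOEXEC on fd while preserving any other descriptor flags."""
    # Read the current descriptor flags first (F_GETFD) ...
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    # ... then write them back with FD_CLOEXEC OR-ed in (F_SETFD).
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
    # Return the flags as the kernel now reports them.
    return fcntl.fcntl(fd, fcntl.F_GETFD)


# Demo on a throwaway pipe (hypothetical usage, Unix only).
read_end, write_end = os.pipe()
os.set_inheritable(read_end, True)  # clears FD_CLOEXEC so the helper has work to do
new_flags = set_cloexec(read_end)
os.close(read_end)
os.close(write_end)
```

Calling F_SETFD with a bare FD_CLOEXEC would overwrite whatever flags were already set, which is why the flags are read first.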

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31683.
2014-05-08 14:29:10 +00:00
Jeff Squyres
cb292b91cd This file looks like it was accidentally committed.
This commit was SVN r31682.
2014-05-08 13:59:34 +00:00
MPI Team
a7505bcaad Update git/hg ignore files
This commit was SVN r31681.
2014-05-08 05:00:38 +00:00
Ralph Castain
76f5991ab2 Couple of minor fixes
This commit was SVN r31680.
2014-05-08 02:26:45 +00:00
Ralph Castain
11faab1091 The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees.
This commit was SVN r31679.
2014-05-08 02:01:35 +00:00
Ralph Castain
a8e2d6c3a6 The bulk of the remaining renaming changes, in one final glorious "blob". Thanks to Jeff for some help chasing down a few spots. Per chat with Jeff, we decided to clean up a few things that were historical in nature:
top_ompi_srcdir  ->  OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR

We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.

The only thing left is for ompilibdir to be treated similarly to what we did for srcdir/builddir. Coming soon.

This commit was SVN r31678.
2014-05-07 21:48:53 +00:00
Ralph Castain
05590b6a8c Correct the datastore containing the coprocessor info
This commit was SVN r31677.
2014-05-07 19:29:12 +00:00
Ralph Castain
2dbeb671d0 Fix typo impacting assembly support that came in during renaming
This commit was SVN r31676.
2014-05-07 16:22:11 +00:00
Ralph Castain
70ebf2efea One more level of subsubsubsubsubtitle...
This commit was SVN r31675.
2014-05-07 15:51:20 +00:00
Ralph Castain
74983c9002 Continue the renaming, fix ompi_show_subsubtitle
This commit was SVN r31674.
2014-05-07 15:45:47 +00:00
Ralph Castain
27faf2684a Update architecture names in OSHMEM branch
This commit was SVN r31673.
2014-05-07 14:40:49 +00:00
Mike Dubman
cd1f64b941 OSHMEM: Adding missing include for OSHMEM changes necessary to support Java bindings
fixed by Roman, reviewed by Mike

This commit was SVN r31672.
2014-05-07 11:53:18 +00:00
Ralph Castain
c6d2ff368d Per RFC, continue with renaming
This commit was SVN r31671.
2014-05-07 04:51:45 +00:00
Ralph Castain
c5d64a22df Fix romio configure to look for the updated OMPI support file name
This commit was SVN r31670.
2014-05-07 03:19:45 +00:00
Ralph Castain
a2bf976029 Per RFC, another round of changes
This commit was SVN r31669.
2014-05-07 03:16:59 +00:00
Ralph Castain
f4c31cae9b Per RFC, another round in the renaming game - nearly complete
This commit was SVN r31668.
2014-05-07 03:01:47 +00:00
Ralph Castain
a54dbb17d2 Per RFC, continue renaming project
This commit was SVN r31667.
2014-05-07 01:00:06 +00:00
Ralph Castain
4501285c26 Per RFC, continue the naming conversion
This commit was SVN r31665.
2014-05-06 23:34:33 +00:00
Ralph Castain
839c0eb55c Per RFC, continue the renaming effort
This commit was SVN r31664.
2014-05-06 21:16:29 +00:00
Ralph Castain
fdfb331e13 Per RFC, continue the renaming process
This commit was SVN r31663.
2014-05-06 20:53:55 +00:00
Ralph Castain
883fce4cba Per RFC, continue the build system renaming
This commit was SVN r31662.
2014-05-06 20:30:37 +00:00
Ralph Castain
8a0d6b4aa6 Per RFC, continue the joyous fun of the renaming exercise
This commit was SVN r31661.
2014-05-06 20:13:37 +00:00
Ralph Castain
bdf9aace69 Per RFC, continue with build system renaming
This commit was SVN r31658.
2014-05-06 19:37:10 +00:00
Ralph Castain
4f4d9dcd28 Per RFC, continue with build system renaming
This commit was SVN r31657.
2014-05-06 19:22:27 +00:00
Ralph Castain
deb0c6bb9a Per RFC, continue cleanup with minor changes to one file
This commit was SVN r31656.
2014-05-06 18:37:52 +00:00
Ralph Castain
3fd7cee70c Per RFC, continue cleanup with minor change to one file
This commit was SVN r31655.
2014-05-06 18:30:55 +00:00
Ralph Castain
9b88ec7cde Per RFC, continue cleaning up the build system
OMPI_C_WEAK_SYMBOLS  ->  OPAL_C_WEAK_SYMBOLS
  ompi_cv_c_weak_symbols  ->  opal_cv_c_weak_symbols

This commit was SVN r31654.
2014-05-06 18:03:08 +00:00
Ralph Castain
ee6ee7a10f Don't replace stuff in the autom4te.cache directory
This commit was SVN r31653.
2014-05-06 18:01:51 +00:00
Jeff Squyres
9d19dec80a [ompi|opal]_setup_cxx.m4: ensure to use the C++ compiler (!)
We didn't AC_LANG_PUSH(C++) before checking to see if the compiler
supports -finline-functions, meaning that configure used the C
compiler for these checks, not the C++ compiler.

Due to #2999, we have to fix both opal_setup_cxx.m4 and
ompi_setup_cxx.m4.

cmr=v1.8.2:reviewer=rolfv

This commit was SVN r31651.
2014-05-06 17:42:51 +00:00
Ralph Castain
ab83b9425a Complete the cleanup of this file
This commit was SVN r31650.
2014-05-06 16:57:34 +00:00
Ralph Castain
aaf2969e9b Fix comment and add copyright
This commit was SVN r31649.
2014-05-06 16:51:30 +00:00
Ralph Castain
002cd34013 Per RFC, continue the build system renaming
OMPI_C_GET_ALIGNMENT  -> OPAL_C_GET_ALIGNMENT
   ompi_cv_c_align_  ->  opal_cv_c_align_

This commit was SVN r31648.
2014-05-06 16:50:27 +00:00
Ralph Castain
2b7a3ae601 Per RFC, continue pecking away at the build system renaming
OMPI_CONFIG_SUBDIR  -> OPAL_CONFIG_SUBDIR
   OMPI_CONFIG_SUBDIR_ARGS  ->  OPAL_CONFIG_SUBDIR_ARGS

This commit was SVN r31647.
2014-05-06 16:27:38 +00:00
Ralph Castain
390d29733f Per RFC, continue renaming of build tools:
ompi_c_vendor  ->  opal_c_vendor
ompi_cv_c_compiler_vendor  ->  opal_cv_c_compiler_vendor

This commit was SVN r31646.
2014-05-06 15:01:34 +00:00
Ralph Castain
f60aadd989 Sigh - forgot to change name in Makefile.am
This commit was SVN r31645.
2014-05-06 14:35:14 +00:00
MPI Team
ac5f224d68 Update git/hg ignore files
This commit was SVN r31643.
2014-05-06 05:00:35 +00:00
Ralph Castain
9d320c55dd Missed one file to be renamed
This commit was SVN r31642.
2014-05-06 03:47:18 +00:00
Ralph Castain
fdd35f301a Per RFC, next major step in cleaning up the build system naming patterns: rename files containing things used by the OPAL layer to be opal_foo.m4 instead of ompi_foo.m4. The ALPS plm is currently checking UGNI, so shift the check_ugni.m4 to orte for now.
This commit was SVN r31641.
2014-05-06 03:20:16 +00:00
Devendar Bureddy
dfaac7d29d Do not call into hcoll progress after MPI_Finalize
Reviewed by Mike
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31639.
2014-05-05 22:46:39 +00:00
Ralph Castain
29609577d5 Per RFC:
ompi_show_title  -> opal_show_title
    ompi_show_subtitle -> opal_show_subtitle

This commit was SVN r31638.
2014-05-05 22:35:23 +00:00
Ralph Castain
a1ae20fddb Per RFC: OMPI_CFLAGS_BEFORE_PICKY -> OPAL_CFLAGS_BEFORE_PICKY

M    opal/mca/event/libevent2021/configure.m4
M    opal/mca/hwloc/hwloc172/configure.m4
M    configure.ac
M    config/opal_setup_libltdl.m4
M    config/opal_check_visibility.m4
M    config/opal_setup_cc.m4

This commit was SVN r31637.
2014-05-05 22:22:33 +00:00
Ralph Castain
425d4b9e81 Per RFC: OMPI_ENSURE_CONTAINS_OPTFLAGS -> OPAL_ENSURE_CONTAINS_OPTFLAGS
This commit was SVN r31636.
2014-05-05 22:02:39 +00:00
George Bosilca
8a8218349a Datatypes with a count of zero do not generate traffic in the type map, nor
do they have any impact on the type signature.
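
To illustrate the rule stated above, here is a small sketch, not the Open MPI datatype engine. It assumes a hypothetical description of a derived type as a list of `(count, basic_type, displacement)` blocks and shows that a zero-count block adds nothing to either the type map or the type signature. The names `type_map`, `type_signature`, and `SIZES` are invented for this example.

```python
# Hypothetical sizes in bytes for a few basic types (assumption for the demo).
SIZES = {"int": 4, "double": 8, "char": 1}


def type_map(blocks):
    """Expand (count, basic_type, displacement) blocks into (type, offset) entries."""
    entries = []
    for count, basic, disp in blocks:
        # range(0) is empty, so a zero-count block contributes no entries.
        for i in range(count):
            entries.append((basic, disp + i * SIZES[basic]))
    return entries


def type_signature(blocks):
    """The signature is the sequence of basic types in the type map."""
    return [basic for basic, _ in type_map(blocks)]


with_zero = [(2, "int", 0), (0, "double", 8), (1, "char", 16)]
without_zero = [(2, "int", 0), (1, "char", 16)]

map_with_zero = type_map(with_zero)
sig_with_zero = type_signature(with_zero)
sig_without_zero = type_signature(without_zero)
```

Under these assumptions, the two block lists describe the same type: the zero-count `double` block leaves no trace in the map or the signature.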

Fixes trac:4597.
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31635.

The following Trac tickets were found above:
  Ticket 4597 --> https://svn.open-mpi.org/trac/ompi/ticket/4597
2014-05-05 21:49:18 +00:00
Ralph Castain
4def94900a Per RFC: OMPI_INSTALL_BINARIES -> OPAL_INSTALL_BINARIES
This commit was SVN r31634.
2014-05-05 21:43:05 +00:00
Ralph Castain
87d809eefe Add a new "run-time controls" framework for setting controls on processes. Initially, just move the process binding code there under a new "hwloc" component. Additional components to support cgroups, power settings, etc. to follow
This commit was SVN r31633.
2014-05-05 19:22:06 +00:00
Alex Mikheev
253f2d51ef OSHMEM: use request pool for on-demand mkey exchange
Use pool of 16 requests instead of single one
cmr=v1.8.2:reviewer=ompi-rm1.8
reviewed by miked
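
The commit message names only the idea (a pool of 16 requests in place of a single one), so the following is a generic sketch of a fixed-size request pool, not the OSHMEM code. The class `RequestPool` and its `acquire`/`release` methods are hypothetical names chosen for illustration. Only the pool size of 16 comes from the message above.

```python
from collections import deque

POOL_SIZE = 16  # pool size named in the commit message


class RequestPool:
    """Fixed-size pool of request slots, identified here by integer ids."""

    def __init__(self, size: int = POOL_SIZE):
        self._free = deque(range(size))
        self._busy = set()

    def acquire(self):
        """Return a free slot id, or None when every slot is in flight."""
        if not self._free:
            return None
        slot = self._free.popleft()
        self._busy.add(slot)
        return slot

    def release(self, slot: int) -> None:
        """Hand a completed request's slot back to the pool."""
        self._busy.remove(slot)
        self._free.append(slot)


pool = RequestPool()
in_flight = [pool.acquire() for _ in range(POOL_SIZE)]
overflow = pool.acquire()   # pool exhausted: caller must wait or retry
pool.release(in_flight[0])  # one exchange completes
reused = pool.acquire()     # its slot becomes available again
```

With a single request, a second exchange has to wait for the first to finish. With a pool, up to 16 exchanges can be outstanding before a caller has to wait.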

This commit was SVN r31628.
2014-05-04 14:28:56 +00:00
Alex Mikheev
c29d426153 OSHMEM: fixes mxm rc transport mkey exchange
cmr=v1.8.2:reviewer=ompi-rm1.8
reviewed by miked

This commit was SVN r31627.
2014-05-04 14:26:54 +00:00