1
1

7506 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
0b8bb2339b btl/scif: update the size when preparing send fragments
Thanks to Gilles Gouaillardet for catching this.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31715.
2014-05-12 22:05:31 +00:00
Jeff Squyres
e37c7af0fb usnic: update cclient/cagent to use unix domain sockets (not RML)
In preparation for moving the BTLs down to OPAL, discontinue the use
of the RML for connectivity client/agent communication.  Instead, use
local unix domain sockets in the job session directory (all
communication is between processes on the same server, so unix domain
sockets are fine).

This commit was SVN r31710.
2014-05-09 20:35:36 +00:00
Jeff Squyres
184e4fc0ca usnic: ensure that procs agree on use_udp value
Add the component use_udp value into the modex.  If my component's
use_udp value doesn't agree with the use_udp value from a peer's modex
data, print a helpful message and disqualify the usnic BTL (the usnic
BTL will not be used).  This prevents accidental customer
misconfigurations.

Reviewed by Dave Goodell

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31689.
2014-05-08 16:43:50 +00:00
Jeff Squyres
e9c3df652e usnic: reduce sizeof(ompi_btl_usnic_addr_t) to 56 bytes
Trivial struct re-ordering to eliminate holes in the middle of the
struct (although there's still a hole at the end) and reduce the
overall size of the struct from 64 to 56 bytes.  Also change mtu from
int to uint16_t; there was no need for it to be that large.

Reviewed by Dave Goodell

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31688.
2014-05-08 16:38:59 +00:00
Jeff Squyres
a61e4d6425 usnic: fix connectivity checker timeout
Fix mismatch between the MCA param (which expresses the timeout in
*mili*seconds) and the struct timeval timeout (which expresses the
timeout in *micro*seconds).

Reviewed by Dave Goodell

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31687.
2014-05-08 16:36:07 +00:00
Ralph Castain
ab4f8585b0 When we abort during MPI_Init, we currently emit a totally incorrect error message stating that we were unable to aggregate error messages and cannot guarantee all other processes were killed. This simply isn't true IF the rte has been initialized.
So track that the rte has reached that point, and only emit the new message if it is accurate.

Note that we still generate a TON of output for a minor error:

Ralphs-iMac:examples rhc$ mpirun -n 3 -mca btl sm ./hello_c
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[50239,1],2]) is on host: Ralphs-iMac
  Process 2 ([[50239,1],2]) is on host: Ralphs-iMac
  BTLs attempted: sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
--------------------------------------------------------------------------
MPI_INIT has failed because at least one MPI process is unreachable
from another.  This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used.  Your MPI job will now abort.

You may wish to try to narrow down the problem;

 * Check the output of ompi_info to see which BTL/MTL plugins are
   available.
 * Run your application with MPI_THREAD_SINGLE.
 * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
   if using MTL-based communications) to see exactly which
   communication plugins were considered and/or discarded.
--------------------------------------------------------------------------
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[50239,1],2]
  Exit code:    1
--------------------------------------------------------------------------
[Ralphs-iMac.local:23227] 2 more processes have sent help message help-mca-bml-r2.txt / unreachable proc
[Ralphs-iMac.local:23227] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[Ralphs-iMac.local:23227] 2 more processes have sent help message help-mpi-runtime / mpi_init:startup:pml-add-procs-fail
Ralphs-iMac:examples rhc$ 

Hopefully, we can agree on a way to reduce this verbage!

This commit was SVN r31686.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2014-05-08 15:48:16 +00:00
Ralph Castain
76f5991ab2 Couple of minor fixes
This commit was SVN r31680.
2014-05-08 02:26:45 +00:00
Ralph Castain
11faab1091 The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees.
This commit was SVN r31679.
2014-05-08 02:01:35 +00:00
Ralph Castain
a8e2d6c3a6 The bulk of the remaining renaming changes, in one final glorious "blob". Thanks to Jeff for some help chasing down a few spots. Per chat with Jeff, we decided to cleanup a few things that were historical in nature:
top_ompi_srcdir  ->  OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR

We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.

Only thing left is ompilibdir being treated similar to what we dif for srcdir/builddir. Coming soon.

This commit was SVN r31678.
2014-05-07 21:48:53 +00:00
Ralph Castain
c5d64a22df Fix romio configure to look for update OMPI support file name
This commit was SVN r31670.
2014-05-07 03:19:45 +00:00
Ralph Castain
fdfb331e13 Per RFC, continue the renaming process
This commit was SVN r31663.
2014-05-06 20:53:55 +00:00
Ralph Castain
2b7a3ae601 Per RFC, continue pecking away at the build system renaming
OMPI_CONFIG_SUBDIR  -> OPAL_CONFIG_SUBDIR
   OMPI_CONFIG_SUBDIR_ARGS  ->  OPAL_CONFIG_SUBDIR_ARGS

This commit was SVN r31647.
2014-05-06 16:27:38 +00:00
Devendar Bureddy
dfaac7d29d Do not call into hcoll progress after MPI_Finalize
Reviewed by Mike
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31639.
2014-05-05 22:46:39 +00:00
Ralph Castain
29609577d5 Per RFC:
ompi_show_title  -> opal_show_title
    ompi_show_subtitle -> opal_show_subtitle

This commit was SVN r31638.
2014-05-05 22:35:23 +00:00
Ralph Castain
4def94900a Per RFC: OMPI_INSTALL_BINARIES -> OPAL_INSTALL_BINARIES
This commit was SVN r31634.
2014-05-05 21:43:05 +00:00
Jeff Squyres
10e8ab493e btl_usnic_mca.c: Increase default connectivity checker frequency
In abusive MPI communication patterns, sending a UDP ping only once a
second may not be sufficient -- all the UDP pings may be dropped.  So
increase the frequency of the pings to every quarter second, and allow
more total pings to be sent.

Total timeout time is still the same (10 seconds) -- we'll just now
try 40 times (i.e., once every quarter second) as opposed to 10 times
(i.e., once a second).  Testing has shown that this frequency allows
the connectivity checker to always succeed even in the many-to-one
abusive communication patterns.

cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r31602.
2014-05-02 11:06:18 +00:00
Jeff Squyres
bf82ee2a14 btl_usnic_connectivity.h: fix PACK_BYTES macro
We're passing a char foo[x] into PACK_BYTES, so we don't need to take
its address in the macro.  This is parallel to the UNPACK_BYTES macro
(where we pass a char bar[x] into it, and don't take its address in
the macro).

The value we're packing is only used to output in a show_help message,
which is why this wasn't noticed before (i.e., it's not used in
network or addressing that would have caused a failure).

cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r31594.
2014-05-01 22:23:22 +00:00
Yossi Etigin
6aa5680059 Revert r30966.
cmr=v1.8.1:reviewer=ompi-gk1.8

This commit was SVN r31593.

The following SVN revision numbers were found above:
  r30966 --> open-mpi/ompi@280e96c99a
2014-05-01 22:17:09 +00:00
Ralph Castain
0c74d1fd6f Silence warning
This commit was SVN r31592.
2014-05-01 21:11:39 +00:00
Jeff Squyres
49383f0aaa Oops: remove errant "v4" string.
This commit was SVN r31591.
2014-05-01 20:21:43 +00:00
Jeff Squyres
56ecb92b10 Per discussion with George and Ralph, change this BTL_ERROR message to
an opal_show_help() so that its output is deduplicated.

This commit was SVN r31590.
2014-05-01 20:15:33 +00:00
Jeff Squyres
0fac9781b3 Assume we always have fortran PROCEDURE support
Per #4590, we now ''require'' the PROCEDURE keyword support in Fortran
for the mpi_f08 module.  So if the Fortran compiler doesn't support
it, then we won't build the mpi_f08 module.

Fixes trac:4590

This commit was SVN r31588.

The following Trac tickets were found above:
  Ticket 4590 --> https://svn.open-mpi.org/trac/ompi/ticket/4590
2014-05-01 18:18:38 +00:00
Ralph Castain
e20dae536c Last step under current RFC: OMPI_CHECK_WITHDIR -> OPAL_CHECK_WITHDIR
This commit was SVN r31585.
2014-05-01 15:38:07 +00:00
Ralph Castain
e11eb15518 Next step of RFC: OMPI_CHECK_FUNC_LIB -> OPAL_CHECK_FUNC_LIB
This commit was SVN r31583.
2014-05-01 14:57:43 +00:00
Ralph Castain
3b64c603b4 First stage of RFC to rename OMPI_foo build system support: change OMPI_CHECK_PACKAGE -> OPAL_CHECK_PACKAGE
This commit was SVN r31582.
2014-05-01 14:24:56 +00:00
Jeff Squyres
c4d85ec6ca btl_usnic_cclient.c: update to use the new opal dstore
Use the new opal dstore API (vs. the old RTE DB API).

(dstore is not going to the v1.8 series, so there's no need to CMR
this to v1.8)

This commit was SVN r31580.
2014-04-30 22:32:47 +00:00
Nathan Hjelm
e963869fdf bcol/basesmuma: close mmapped file descriptor
Not closing this file descriptor will cause us to leak file
descriptors. It is safe to close the file after it has been mmapped.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31579.
2014-04-30 22:28:08 +00:00
Jeff Squyres
d40112a012 rte_base_frame.c: add sanity check to ensure proper sizes
There's a requirement in several places (e.g., opal dstore) that
sizeof(ompi_process_name_t) -- which comes from the compile-time
selected ompi/mca/rte component -- is equal to sizeof(uint64_t).  If
it's not, Bad Things will happen.

So put an assert here to catch that case.

This commit was SVN r31577.
2014-04-30 22:12:54 +00:00
Nathan Hjelm
a28012b29d Fix MPI_T issues identified by friendly users.
Several fixes:

 - I was allowing an MPI_T_cvar_handle to be created for an invalid
   variable. Fixed this by checking if the variable is valid in
   mca_base_var_get.

 - Use a better error code when the caller tries to create an unbound
   pvar handle for a bound variable.

 - Return the verbosity level in MPI_T_cvar_get_info.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31576.
2014-04-30 22:10:30 +00:00
Nathan Hjelm
d80f14eb0f sbgp/ptp: fix obvious typo
cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31575.
2014-04-30 22:10:22 +00:00
Nathan Hjelm
3e5388eaa6 mtl/psm: do not limit PSM to 8191 context ids
The old default context id maximum was committed to the trunk in
2006. After some discussion with Intel it appears this is restricting
the mtl to an arbirarly small number of communicators. Increasing the
default to allow up to 2^16 - 1 context ids.

Refs trac:4574

cmr=v1.8.2

This commit was SVN r31574.

The following Trac tickets were found above:
  Ticket 4574 --> https://svn.open-mpi.org/trac/ompi/ticket/4574
2014-04-30 22:10:15 +00:00
Ralph Castain
087b84b0ef Add some further debug to the dstore framework. When doing comm_spawn, we have to exchange any provided cpu bitmaps to ensure both sides compute the same locality, else various mpi frameworks can go bonkers.
This commit was SVN r31572.
2014-04-30 19:29:00 +00:00
Ralph Castain
e72af03e60 Fix typo covered by enable-heterogeneous
This commit was SVN r31567.
2014-04-30 15:41:58 +00:00
Ralph Castain
c4c9bc1573 As per the RFC:
http://www.open-mpi.org/community/lists/devel/2014/04/14496.php

Revamp the opal database framework, including renaming it to "dstore" to reflect that it isn't a "database". Move the "db" framework to ORTE for now, soon to move to ORCM

This commit was SVN r31557.
2014-04-29 21:49:23 +00:00
Rolf vandeVaart
fc0a75da91 Fix help message errors as reported by check-help-strings.pl script.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31555.
2014-04-29 20:29:18 +00:00
Jeff Squyres
964249e8a6 comm_cid.c: Ensure that "flag" is initially set to false.
If the loops never get executed because CIDs are exhausted, then the
value of flag will be undefined.

Refs trac:4572

This commit was SVN r31546.

The following Trac tickets were found above:
  Ticket 4572 --> https://svn.open-mpi.org/trac/ompi/ticket/4572
2014-04-29 17:39:14 +00:00
Nathan Hjelm
2f5b1ca4cf osc/rdma: do not leak the receive request
This commit fixes a bug that can cause request and communicator leaks
when cleaning up an OSC window. The should prevent a hang seen with
IMB-EXT.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31539.
2014-04-28 19:55:18 +00:00
Nathan Hjelm
e410401523 comm: detect if we run out of communicator ids (cids)
Due to a leak in the osc/rdma component we were running out of cids on
a one-sided tests. This resulted in a hang instead of an error. This
commit causes the nextcid algorithm to return an error if we run out
of cids.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31538.
2014-04-28 19:55:09 +00:00
Nathan Hjelm
626b521e9c pml/ob1: fix heterogeneous support when using the send_inline optimization
We will track #4568 from the 1.8 CMR.

Closes trac:4568

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31535.

The following Trac tickets were found above:
  Ticket 4568 --> https://svn.open-mpi.org/trac/ompi/ticket/4568
2014-04-28 17:36:26 +00:00
Jeff Squyres
64c1228b55 Roll back r31519 and r31521: George convinced us that these approaches
weren't right.

This commit was SVN r31528.

The following SVN revision numbers were found above:
  r31519 --> open-mpi/ompi@b449c750b7
  r31521 --> open-mpi/ompi@e243805ed8
2014-04-24 20:27:03 +00:00
Nathan Hjelm
c9a257f1a0 btl/ugni: always buffer sendi fragments
This commit will improve the message rate when using the sendi function
by not waiting for the send to get to the remote process.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31526.
2014-04-24 18:50:29 +00:00
Jeff Squyres
871e20cd4b MPI_Alltoallv.3in: fix typo
Fix minor typo reported by Xuankang Lin.

cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r31525.
2014-04-24 18:14:42 +00:00
George Bosilca
17b3c7e906 Fix the issue reported by Gilles Gouaillardet regarding the
MPI_PROC_NULL persistent requests.

This commit was SVN r31524.
2014-04-24 18:07:09 +00:00
Nathan Hjelm
52f519dacb Allow MPI_MODE_NOPRECEDE | MPI_MODE_NOSUCCEED for MPI_Win_fence
This combination does not make sense but is not explicitly forbidden by
the standard so remove the argument check for this combination.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31523.
2014-04-24 17:36:10 +00:00
Nathan Hjelm
0849d61e38 btl/vader: improve performance under heavy load and eliminate a racy
feature

This commit should fix a hang seen when running some of the one-sided
tests. The downside of this fix is it reduces the maximum size of the
messages that use the fast boxes. I will fix this in a later commit.

To improve performance under a heavy load I introduced sequencing to
ensure messages are given to the pml in order. I have seen little-no
impact on the message rate or latency with this change and there is a
clear improvement to the heavy message rate case.

Lets let this sit in the trunk for a couple of days to ensure that
everything is working correctly.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31522.
2014-04-24 17:36:03 +00:00
Jeff Squyres
e243805ed8 coll tuned alltoallv: correctly handle 0-sized messages with MPI_IN_PLACE
Patch from Gilles Gouaillardet on #4517 to fix handling 0-sized
messages in coll tuned with MPI_ALLTOALLV and MPI_IN_PLACE.

Reviewed by Jeff Squyres.

Fixes trac:4517

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31521.

The following Trac tickets were found above:
  Ticket 4517 --> https://svn.open-mpi.org/trac/ompi/ticket/4517
2014-04-24 16:55:53 +00:00
Jeff Squyres
b449c750b7 coll basic: correctly handle alltoall[vw] 0-sized messages
Patch from Gilles Gouaillardet on #4506 to correctly handle 0-sized
messages in coll/basic MPI_Alltoallv and MPI_Alltoallw.

Reviewed by Jeff Squyres.

Fixes trac:4506.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31519.

The following Trac tickets were found above:
  Ticket 4506 --> https://svn.open-mpi.org/trac/ompi/ticket/4506
2014-04-24 16:25:43 +00:00
Jeff Squyres
ca80c7a9bd cart_sub.c: allow remain_dims==NULL if there is no topology on comm
Patch submitted by Gilles Gouaillardet on #4518.  Reviewed by Jeff.

Fixes trac:4518

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31517.

The following Trac tickets were found above:
  Ticket 4518 --> https://svn.open-mpi.org/trac/ompi/ticket/4518
2014-04-24 16:10:44 +00:00
Jeff Squyres
e9b694f1d8 coll_base_comm_unselect.c: fix memory leaks
Ensure to also OBJ_RELEASE the neightbor and ineighbor modules.

Fixes trac:4444 (this patch is from that ticket).

This commit was SVN r31516.

The following Trac tickets were found above:
  Ticket 4444 --> https://svn.open-mpi.org/trac/ompi/ticket/4444
2014-04-24 15:53:06 +00:00
George Bosilca
024221f469 Initialize some fields (prevent valgrind complaints).
This commit was SVN r31503.
2014-04-23 13:38:30 +00:00