Commit graph

20320 commits

Author SHA1 Message Date
Jeff Squyres
2abf647f59 configure.ac: s/autogen.sh/autogen.pl/ in some comments
This commit was SVN r31755.
2014-05-14 15:31:39 +00:00
Ralph Castain
3a1c2fff3e Correct a misplaced bracket - daemons shouldn't be doing app-related operations
This may need a patch for 1.8.2, but we can try to directly apply it

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r31754.
2014-05-14 15:23:30 +00:00
George Bosilca
f27123a20d Fix the add_proc issue identified by Jeff: the TCP BTL now discards a
peer proc without TCP support instead of completely dropping TCP support for the entire job.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31753.
2014-05-14 13:47:57 +00:00
Mike Dubman
95e637f5ba OSHMEM: fix error message when aborting on OOM
fixed by Roman, reviewed by Miked

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31752.
2014-05-14 13:45:16 +00:00
Mike Dubman
644aa6f737 OSHMEM: add missing profiling API for shmem_finalize
fixed by Roman, reviewed by Miked
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31751.
2014-05-14 13:13:30 +00:00
Ralph Castain
8295fae4a9 Remove stale file reference
This commit was SVN r31750.
2014-05-14 01:50:09 +00:00
Nathan Hjelm
518f188ad4 bml/base: ensure all components are closed when the framework is
closed

We were leaving the selected component open. This commit should
eliminate a leak detected by valgrind.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31749.
2014-05-13 23:04:40 +00:00
Nathan Hjelm
c15c3dee4c common/ugni: fix bugs in r31557
This commit was SVN r31748.

The following SVN revision numbers were found above:
  r31557 --> open-mpi/ompi@c4c9bc1573
2014-05-13 22:37:01 +00:00
Nathan Hjelm
2a57e71a47 plm/alps: fix typo introduced in r31589
This commit was SVN r31747.

The following SVN revision numbers were found above:
  r31589 --> open-mpi/ompi@445b552d3a
2014-05-13 22:36:54 +00:00
Nathan Hjelm
dd8de4d6eb btl/ugni: fix memory leaks and silence some warnings
The smsg_mboxes free list was not getting destructed. The construct
has been moved to module initialization and a matching destruct is now
in the module destruct.

This commit was SVN r31746.
2014-05-13 21:22:33 +00:00
Nathan Hjelm
9c45e4152d sbgp/base: fix memory leaks
This commit fixes memory leaks discovered in the sbgp setup code. We
were leaking an opal_argv as well as some list items. I took the
opportunity to clean up the code a little which includes making use of
the opal_argv_free function.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31745.
2014-05-13 21:22:25 +00:00
Nathan Hjelm
ddd501c0d9 bcol/base: clean up code and fix memory leak
The items in the available bcol list were getting leaked. This commit
fixes this leak. I also cleaned up the code a bit. This includes
making use of the opal_argv_free function.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31744.
2014-05-13 21:22:18 +00:00
Nathan Hjelm
c32d84154a coll/ml: fix leaks and close all the frameworks opened
It is essential to call mca_base_framework_close for every framework
that is opened. coll/ml was not doing this so neither bcol nor sbgp
were getting cleaned up. This commit fixes this omission.

Also fixed a leak caused by calling OBJ_DESTRUCT for something created
with OBJ_NEW. With these changes coll/ml appears to be valgrind clean.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31743.
2014-05-13 21:22:12 +00:00
Rolf vandeVaart
71998ad67b Change one argument to use const. Matches better how function is called.
This commit was SVN r31741.
2014-05-13 18:55:41 +00:00
Nathan Hjelm
fc4f932cc2 Fix bug in r31716
Simple bug. The dist_graph pointer must be a constructed object. The
change from malloc to OBJ_NEW was missing from r31716. Tested with MTT
and everything looks ok now.

This commit was SVN r31739.

The following SVN revision numbers were found above:
  r31716 --> open-mpi/ompi@e3df77548d
2014-05-13 17:39:43 +00:00
Nathan Hjelm
a78519a2b2 btl/scif: remove call to pthread_cancel
There is no reason to cancel the listening thread. It should die
automatically when the file descriptor is closed. It is sufficient
to just wait for the thread to exit with pthread_join.

cmr=v1.8.2:ticket=trac:4616:reviewer=jsquyres

This commit was SVN r31738.

The following Trac tickets were found above:
  Ticket 4616 --> https://svn.open-mpi.org/trac/ompi/ticket/4616
2014-05-13 17:29:53 +00:00
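
The shutdown pattern described in a78519a2b2 (let the listening thread finish on its own, then join it) can be sketched roughly as below. This is not the btl/scif code: the `listener` and `run_and_stop_listener` names are hypothetical, and the sketch assumes a pipe whose write end is closed so that the blocked `read()` sees end-of-file, which stands in for whatever descriptor the real listener waits on.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical listener: blocks in read() until data arrives or the pipe
 * reaches end-of-file, then returns on its own. */
static void *listener(void *arg)
{
    int fd = *(int *)arg;
    char buf[64];

    while (read(fd, buf, sizeof(buf)) > 0) {
        /* a real listener would handle the message here */
    }
    return NULL;
}

/* Start a listener on a pipe, then stop it by closing the write end and
 * joining. pthread_cancel is never called. Returns 0 on success. */
static int run_and_stop_listener(void)
{
    int fds[2];
    pthread_t tid;
    int rc;

    if (pipe(fds) != 0) {
        return -1;
    }
    if (pthread_create(&tid, NULL, listener, &fds[0]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                  /* read() in the listener now returns 0 */
    rc = pthread_join(tid, NULL);   /* wait for the thread to exit by itself */
    close(fds[0]);
    return rc == 0 ? 0 : -1;
}
```

Joining rather than cancelling means the thread always leaves through its normal return path, so any cleanup it does before returning still runs.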
Nathan Hjelm
c13c21d476 basesmuma: clean up the setup code and ensure mapped files are unmapped
We were leaking file descriptors when coll/ml was in use. It turns out
this was because basesmuma was failing to unmap files it had previously
mapped. This commit cleans up the setup code to ensure that we only
attempt to map the control files once per module and then ensures the
files are unmapped when the module is released.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31737.
2014-05-13 17:00:31 +00:00
Jeff Squyres
93060d1cf2 Sync NEWS with v1.6 branch (v1.6.6 bullets).
This commit was SVN r31735.
2014-05-13 15:58:34 +00:00
Jeff Squyres
a24688db85 Per RFC
(http://www.open-mpi.org/community/lists/devel/2014/05/14749.php),
remove autogen.sh.

Long live autogen.pl!

This commit was SVN r31730.
2014-05-13 15:19:34 +00:00
Nathan Hjelm
9e3a0d7b7a basesmuma: modify the minimum size for the large fan-in fan-out allreduce
algorithm

Per suggestion from Manju make sure there isn't a gap in the size ranges
for the available algorithms.

cmr=v1.8.2:ticket=trac:4437:reviewer=ompi-rm1.8

This commit was SVN r31728.

The following Trac tickets were found above:
  Ticket 4437 --> https://svn.open-mpi.org/trac/ompi/ticket/4437
2014-05-13 14:56:21 +00:00
Ralph Castain
f55c587a74 Per patch from Tetsuya Mishima, ensure the rank_file mapper accurately tracks number of nodes in the map
Refs trac:4594

This commit was SVN r31725.

The following Trac tickets were found above:
  Ticket 4594 --> https://svn.open-mpi.org/trac/ompi/ticket/4594
2014-05-13 14:36:25 +00:00
Jeff Squyres
7afca329c2 Fix one more spacing issue.
This commit was SVN r31720.
2014-05-13 13:48:55 +00:00
Jeff Squyres
1df98c1fc7 Make the Username long enough to accommodate all usernames. Also put
the RIST organization in alphabetical order.

This commit was SVN r31718.
2014-05-13 12:58:01 +00:00
Gilles Gouaillardet
209378efec btl/scif: prevent SIGSEGV from occurring when the module is unloaded
Fixes trac:4615

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r31717.

The following Trac tickets were found above:
  Ticket 4615 --> https://svn.open-mpi.org/trac/ompi/ticket/4615
2014-05-13 10:04:38 +00:00
Gilles Gouaillardet
e3df77548d Fix memory leak when releasing a communicator created by
MPI_Cart_Create/MPI_Graph_create/MPI_Dist_Graph

Fixes trac:4581

This commit was SVN r31716.

The following Trac tickets were found above:
  Ticket 4581 --> https://svn.open-mpi.org/trac/ompi/ticket/4581
2014-05-13 04:49:23 +00:00
Nathan Hjelm
0b8bb2339b btl/scif: update the size when preparing send fragments
Thanks to Gilles Gouaillardet for catching this.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31715.
2014-05-12 22:05:31 +00:00
Gilles Gouaillardet
4fc6801f4e Add Gilles Gouaillardet / RIST to the AUTHORS file
This commit was SVN r31714.
2014-05-12 10:46:34 +00:00
MPI Team
43c40b6b8a Update git/hg ignore files
This commit was SVN r31713.
2014-05-10 05:00:26 +00:00
Jeff Squyres
e37c7af0fb usnic: update cclient/cagent to use unix domain sockets (not RML)
In preparation for moving the BTLs down to OPAL, discontinue the use
of the RML for connectivity client/agent communication.  Instead, use
local unix domain sockets in the job session directory (all
communication is between processes on the same server, so unix domain
sockets are fine).

This commit was SVN r31710.
2014-05-09 20:35:36 +00:00
Jeff Squyres
3de7bb61cb Linux specfile: update %description blocks
Update the Open MPI description and fix lots of grammatical errors in
the OpenSHMEM description.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31709.
2014-05-09 14:21:36 +00:00
Ralph Castain
f4650e83c3 Missed one ignore location
This commit was SVN r31708.
2014-05-09 14:08:36 +00:00
Ralph Castain
fc40c6d770 Remove orcm from ignore properties
This commit was SVN r31707.
2014-05-09 14:03:48 +00:00
Ralph Castain
5388347511 Per Jeff's suggestion, remove function that has duplicate functionality and just use one to check if session_dir directory should be removed.
Refs trac:4584

This commit was SVN r31691.

The following Trac tickets were found above:
  Ticket 4584 --> https://svn.open-mpi.org/trac/ompi/ticket/4584
2014-05-08 17:22:43 +00:00
Jeff Squyres
184e4fc0ca usnic: ensure that procs agree on use_udp value
Add the component use_udp value into the modex.  If my component's
use_udp value doesn't agree with the use_udp value from a peer's modex
data, print a helpful message and disqualify the usnic BTL (the usnic
BTL will not be used).  This prevents accidental customer
misconfigurations.

Reviewed by Dave Goodell

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31689.
2014-05-08 16:43:50 +00:00
Jeff Squyres
e9c3df652e usnic: reduce sizeof(ompi_btl_usnic_addr_t) to 56 bytes
Trivial struct re-ordering to eliminate holes in the middle of the
struct (although there's still a hole at the end) and reduce the
overall size of the struct from 64 to 56 bytes.  Also change mtu from
int to uint16_t; there was no need for it to be that large.

Reviewed by Dave Goodell

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31688.
2014-05-08 16:38:59 +00:00
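
To illustrate the kind of saving e9c3df652e describes, here is a minimal sketch of how member order affects struct size. The two structs and their fields are hypothetical and are not the layout of `ompi_btl_usnic_addr_t`; the exact sizes depend on the platform ABI, so the code only checks that the reordered layout is no larger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with a hole: the compiler pads after 'mtu' so that
 * 'cookie' is suitably aligned, and pads again after 'port'. */
struct addr_padded {
    uint16_t mtu;
    uint64_t cookie;
    uint16_t port;
};

/* Same members, widest first: no hole in the middle, and any padding
 * that remains sits at the end of the struct. */
struct addr_packed {
    uint64_t cookie;
    uint16_t mtu;
    uint16_t port;
};

/* Returns how many bytes the reordering saves on the current ABI. */
static size_t bytes_saved(void)
{
    assert(sizeof(struct addr_packed) <= sizeof(struct addr_padded));
    return sizeof(struct addr_padded) - sizeof(struct addr_packed);
}
```

On ABIs where `uint64_t` is 8-byte aligned, the first layout is typically larger than the second; printing `bytes_saved()` shows the difference for a given compiler.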
Jeff Squyres
a61e4d6425 usnic: fix connectivity checker timeout
Fix mismatch between the MCA param (which expresses the timeout in
*milli*seconds) and the struct timeval timeout (which expresses the
timeout in *micro*seconds).

Reviewed by Dave Goodell

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31687.
2014-05-08 16:36:07 +00:00
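
The unit mismatch fixed in a61e4d6425 comes down to converting a millisecond value into the seconds and microseconds fields of `struct timeval`. A minimal sketch of that conversion follows; `ms_to_timeval` is a hypothetical helper name and is not taken from the usnic source.

```c
#include <assert.h>
#include <sys/time.h>

/* Convert a timeout in milliseconds into a struct timeval, whose tv_usec
 * field is expressed in microseconds. */
static struct timeval ms_to_timeval(long ms)
{
    struct timeval tv;

    assert(ms >= 0);
    tv.tv_sec  = ms / 1000;            /* whole seconds */
    tv.tv_usec = (ms % 1000) * 1000;   /* leftover milliseconds as microseconds */
    return tv;
}
```

Assigning the millisecond value directly to `tv_usec` would make the timeout 1000 times shorter than intended, which is the class of bug the commit message describes.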
Ralph Castain
ab4f8585b0 When we abort during MPI_Init, we currently emit a totally incorrect error message stating that we were unable to aggregate error messages and cannot guarantee all other processes were killed. This simply isn't true IF the rte has been initialized.
So track that the rte has reached that point, and only emit the new message if it is accurate.

Note that we still generate a TON of output for a minor error:

Ralphs-iMac:examples rhc$ mpirun -n 3 -mca btl sm ./hello_c
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[50239,1],2]) is on host: Ralphs-iMac
  Process 2 ([[50239,1],2]) is on host: Ralphs-iMac
  BTLs attempted: sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
--------------------------------------------------------------------------
MPI_INIT has failed because at least one MPI process is unreachable
from another.  This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used.  Your MPI job will now abort.

You may wish to try to narrow down the problem;

 * Check the output of ompi_info to see which BTL/MTL plugins are
   available.
 * Run your application with MPI_THREAD_SINGLE.
 * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
   if using MTL-based communications) to see exactly which
   communication plugins were considered and/or discarded.
--------------------------------------------------------------------------
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[50239,1],2]
  Exit code:    1
--------------------------------------------------------------------------
[Ralphs-iMac.local:23227] 2 more processes have sent help message help-mca-bml-r2.txt / unreachable proc
[Ralphs-iMac.local:23227] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[Ralphs-iMac.local:23227] 2 more processes have sent help message help-mpi-runtime / mpi_init:startup:pml-add-procs-fail
Ralphs-iMac:examples rhc$ 

Hopefully, we can agree on a way to reduce this verbiage!

This commit was SVN r31686.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2014-05-08 15:48:16 +00:00
Ralph Castain
aaae4841e9 Flush the show_help system on our way out - this also restores the opal_show_help function pointer to the OPAL layer for any subsequent processing.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31685.
2014-05-08 14:37:47 +00:00
Ralph Castain
5602156a1c Use the correct abstraction layer name for the data dirs
This commit was SVN r31684.
2014-05-08 14:32:24 +00:00
Jeff Squyres
81afb4e18a hwloc: commit minor bug fix from hwloc git
Bring down 3aa0ed6 from the hwloc v1.7 branch: Stevens says we should
GETFD before we SETFD, so we do

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31683.
2014-05-08 14:29:10 +00:00
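
The "GETFD before SETFD" remark in 81afb4e18a presumably refers to the `fcntl` commands `F_GETFD` and `F_SETFD`. Under that assumption, the pattern can be sketched as below; `set_cloexec` is a hypothetical helper name and this is not the hwloc code itself.

```c
#include <fcntl.h>
#include <unistd.h>

/* Set FD_CLOEXEC on fd without clobbering any other descriptor flags.
 * Returns 0 on success and -1 on error. */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);    /* read the current flags first */

    if (flags == -1) {
        return -1;
    }
    /* write back the old flags plus FD_CLOEXEC */
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```

Reading the flags first matters because `F_SETFD` replaces the whole flag set; writing `FD_CLOEXEC` alone would drop any other descriptor flag the platform defines.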
Jeff Squyres
cb292b91cd This file looks like it was accidentally committed.
This commit was SVN r31682.
2014-05-08 13:59:34 +00:00
MPI Team
a7505bcaad Update git/hg ignore files
This commit was SVN r31681.
2014-05-08 05:00:38 +00:00
Ralph Castain
76f5991ab2 Couple of minor fixes
This commit was SVN r31680.
2014-05-08 02:26:45 +00:00
Ralph Castain
11faab1091 The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees.
This commit was SVN r31679.
2014-05-08 02:01:35 +00:00
Ralph Castain
a8e2d6c3a6 The bulk of the remaining renaming changes, in one final glorious "blob". Thanks to Jeff for some help chasing down a few spots. Per chat with Jeff, we decided to clean up a few things that were historical in nature:
top_ompi_srcdir  ->  OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR

We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.

Only thing left is ompilibdir being treated similarly to what we did for srcdir/builddir. Coming soon.

This commit was SVN r31678.
2014-05-07 21:48:53 +00:00
Ralph Castain
05590b6a8c Correct the datastore containing the coprocessor info
This commit was SVN r31677.
2014-05-07 19:29:12 +00:00
Ralph Castain
2dbeb671d0 Fix typo impacting assembly support that came in during renaming
This commit was SVN r31676.
2014-05-07 16:22:11 +00:00
Ralph Castain
70ebf2efea One more level of subsubsubsubsubtitle...
This commit was SVN r31675.
2014-05-07 15:51:20 +00:00
Ralph Castain
74983c9002 Continue the renaming, fix ompi_show_subsubtitle
This commit was SVN r31674.
2014-05-07 15:45:47 +00:00
Ralph Castain
27faf2684a Update architecture names in OSHMEM branch
This commit was SVN r31673.
2014-05-07 14:40:49 +00:00