1
1
Граф коммитов

20231 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
0ef6baffd3 Fix bug in r31764
Need to remove the items of the list to avoid an assert in debug builds.

cmr=v1.8.2:ticket=trac:4628

This commit was SVN r31769.

The following SVN revision numbers were found above:
  r31764 --> open-mpi/ompi@13fd6ae774

The following Trac tickets were found above:
  Ticket 4628 --> https://svn.open-mpi.org/trac/ompi/ticket/4628
2014-05-14 23:45:50 +00:00
Nathan Hjelm
4113cfa03a pml/ob1: add missing OBJ_DESTRUCT
An OBJ_DESTRUCT was missing for mca_pml_ob1.send_ranges causing a
memory leak. Identified by valgrind.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31768.
2014-05-14 21:15:45 +00:00
Nathan Hjelm
32a85c6d7d allocator/bucket: free all memory associated with a bucket allocator
when it is finalized.

Fixes a leak identified by valgrind.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31767.
2014-05-14 21:15:39 +00:00
Nathan Hjelm
279c0a3ca7 comm: fix communicator subsystem leaks
This commit fixes two leaks:

 - We never destructed the attributes on MPI_COMM_WORLD. All other
   communicators that have attributes are released through
   ompi_comm_free which does the attribute destruction. For
   MPI_COMM_WORLD this is now done before the destructor is called.

 - Add missing OBJ_RELEASE for ompi_comm_f_to_c_table.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31766.
2014-05-14 21:15:33 +00:00
Nathan Hjelm
75fb6dbba9 opal/event: release the opal event context when closing the event base
This fixes a series of leaks identified by valgrind. Since
opal_event_base_open was creating the event base it follows that
opal_event_base_close should release the event base.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31765.
2014-05-14 21:15:27 +00:00
Nathan Hjelm
13fd6ae774 opal_free_list: destruct free lists items when destructing the free
list

This commit updates the behavior of opal_free_list_t to match the
behavior of ompi_free_list_t. opal_free_list_t constructed items
placed on the free list but never destructed them.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31764.
2014-05-14 21:15:19 +00:00
Ralph Castain
ad0e8f841d Just pick a module to handle the incoming connection if no direct interface is identified. Siegmar hit it because his IP/netmask is disjoint, but a router was able to make the connection.
Refs trac:4627

This commit was SVN r31763.

The following Trac tickets were found above:
  Ticket 4627 --> https://svn.open-mpi.org/trac/ompi/ticket/4627
2014-05-14 19:23:02 +00:00
Ralph Castain
a202139656 Looks like someone missed a '|' in this comparison, so correct it here
This commit was SVN r31760.
2014-05-14 16:53:22 +00:00
Ralph Castain
e605e73379 Close the incoming socket if we aren't going to accept it
cmr=v1.8.2:reviewer=rhc

This commit was SVN r31759.
2014-05-14 16:51:59 +00:00
Nathan Hjelm
82934b173c btl/scif: make the exiting member volatile
cmr=v1.8.2:ticket=trac:4626

This commit was SVN r31757.

The following Trac tickets were found above:
  Ticket 4626 --> https://svn.open-mpi.org/trac/ompi/ticket/4626
2014-05-14 16:22:24 +00:00
Nathan Hjelm
97605a4002 btl/scif: fix hang at shutdown
scif_close is not causing scif_poll in the listening thread to return
as expected. To ensure the thread exits attempt to make a local
connection to wake up the thread before calling pthread_join.

cmr=v1.8.2:reviewer=ggouaillardet

This commit was SVN r31756.
2014-05-14 16:14:00 +00:00
Jeff Squyres
2abf647f59 configure.ac: s/autogen.sh/autogen.pl/ in some comments
This commit was SVN r31755.
2014-05-14 15:31:39 +00:00
Ralph Castain
3a1c2fff3e Correct a misplaced bracket - daemons shouldn't be doing app-related operations
This may need a patch for 1.8.2, but we can try to directly apply it

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r31754.
2014-05-14 15:23:30 +00:00
George Bosilca
f27123a20d Fix the add_proc issue identified by Jeff: the TCP BTL now discard a
peer proc without TCP support instead of completely dropping TCP support for the entire job.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31753.
2014-05-14 13:47:57 +00:00
Mike Dubman
95e637f5ba OSHMEM: fix error message when aborting on OOM
fixed by Roman, reviewed by Miked

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31752.
2014-05-14 13:45:16 +00:00
Mike Dubman
644aa6f737 OSHMEM: add missing profiling API for shmem_finalize
fixed by Roman, reviewed by Miked
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31751.
2014-05-14 13:13:30 +00:00
Ralph Castain
8295fae4a9 Remove stale file reference
This commit was SVN r31750.
2014-05-14 01:50:09 +00:00
Nathan Hjelm
518f188ad4 bml/base: ensure all components are closed when the framework is
closed

We were leaving the selected component open. This commit should
eliminate a leak detected by valgrind.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31749.
2014-05-13 23:04:40 +00:00
Nathan Hjelm
c15c3dee4c common/ugni: fix bugs in r31557
This commit was SVN r31748.

The following SVN revision numbers were found above:
  r31557 --> open-mpi/ompi@c4c9bc1573
2014-05-13 22:37:01 +00:00
Nathan Hjelm
2a57e71a47 plm/alps: fix typo introduced in r31589
This commit was SVN r31747.

The following SVN revision numbers were found above:
  r31589 --> open-mpi/ompi@445b552d3a
2014-05-13 22:36:54 +00:00
Nathan Hjelm
dd8de4d6eb btl/ugni: fix memory leaks and silence some warnings
The smsg_mboxes free list was not getting destructed. The construct
has been moved to module initialization and a matching destruct is now
in the module destruct.

This commit was SVN r31746.
2014-05-13 21:22:33 +00:00
Nathan Hjelm
9c45e4152d sbgp/base: fix memory leaks
This commit fixes memory leaks discovered in the sbgp setup code. We
were leaking an opal_argv as well as some list items. I took the
opportunity to clean up the code a little which includes making use of
the opal_argv_free function.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31745.
2014-05-13 21:22:25 +00:00
Nathan Hjelm
ddd501c0d9 bcol/base: cleanup code and fix memory leak
The items in the available bcol list were getting leaked. This commit
fixes this leak. I also cleaned up the code a bit. This includes
making use of the opal_argv_free function.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31744.
2014-05-13 21:22:18 +00:00
Nathan Hjelm
c32d84154a coll/ml: fix leaks and close all the framework opened
It is essential to call mca_base_framework_close for every framework
that is opened. coll/ml was not doing this so neither bcol nor sbgp
were getting cleaned up. This commit fixes this omission.

Also fixed a leak caused by calling OBJ_DESTRUCT for something created
with OBJ_NEW. With these changes coll/ml appears to be valgrind clean.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31743.
2014-05-13 21:22:12 +00:00
Rolf vandeVaart
71998ad67b Change one argument to use const. Matches better how function is called.
This commit was SVN r31741.
2014-05-13 18:55:41 +00:00
Nathan Hjelm
fc4f932cc2 Fix bug in r31716
Simple bug. The dist_graph pointer must be a constructed object. The
change from malloc to OBJ_NEW was missing from r31716. Tested with MTT
and everything looks ok now.

This commit was SVN r31739.

The following SVN revision numbers were found above:
  r31716 --> open-mpi/ompi@e3df77548d
2014-05-13 17:39:43 +00:00
Nathan Hjelm
a78519a2b2 btl/scif: remove call to pthread_cancel
There is no reason to cancel the listening thread. It should die
automatically when the file descriptor is closed. It is sufficient
to just wait for the thread to exit with pthread join.

cmr=v1.8.2:ticket=trac:4616:reviewer=jsquyres

This commit was SVN r31738.

The following Trac tickets were found above:
  Ticket 4616 --> https://svn.open-mpi.org/trac/ompi/ticket/4616
2014-05-13 17:29:53 +00:00
Nathan Hjelm
c13c21d476 basesmuma: clean up the setup code and ensure mapped files are unmapped
We were leaking file descriptors when coll/ml was in use. It turn out
this was because basesmuma was failing to unmap files it had previously
mapped. This commit cleans up the setup code to ensure that we only
attempt to map the control files once per module and then ensures the
files are unmapped when the module is released.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31737.
2014-05-13 17:00:31 +00:00
Jeff Squyres
93060d1cf2 Sync NEWS with v1.6 branch (v1.6.6 bullets).
This commit was SVN r31735.
2014-05-13 15:58:34 +00:00
Jeff Squyres
a24688db85 Per RFC
(http://www.open-mpi.org/community/lists/devel/2014/05/14749.php),
remove autogen.sh.

Long live autogen.pl!

This commit was SVN r31730.
2014-05-13 15:19:34 +00:00
Nathan Hjelm
9e3a0d7b7a basesmuma: modify the minimum size for the large fan-in fan-out allreduce
algorithm

Per suggestion from Manju make sure there isn't a gap in the size ranges
for the available algorithms.

cmr=v1.8.2:ticket=trac:4437:reviewer=ompi-rm1.8

This commit was SVN r31728.

The following Trac tickets were found above:
  Ticket 4437 --> https://svn.open-mpi.org/trac/ompi/ticket/4437
2014-05-13 14:56:21 +00:00
Ralph Castain
f55c587a74 Per patch from Tetsuya Mishima, ensure the rank_file mapper accurately tracks number of nodes in the map
Refs trac:4594

This commit was SVN r31725.

The following Trac tickets were found above:
  Ticket 4594 --> https://svn.open-mpi.org/trac/ompi/ticket/4594
2014-05-13 14:36:25 +00:00
Jeff Squyres
7afca329c2 Fix one more spacing issue.
This commit was SVN r31720.
2014-05-13 13:48:55 +00:00
Jeff Squyres
1df98c1fc7 Make the Username long enough to accomodate all usernames. Also put
the RIST organization in alphabetical order.

This commit was SVN r31718.
2014-05-13 12:58:01 +00:00
Gilles Gouaillardet
209378efec btl/scif: prevent SIGSEGV from occuring when the module is unloaded
Fixes trac:4615

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r31717.

The following Trac tickets were found above:
  Ticket 4615 --> https://svn.open-mpi.org/trac/ompi/ticket/4615
2014-05-13 10:04:38 +00:00
Gilles Gouaillardet
e3df77548d Fix memory leak when releasing a communicator created by
MPI_Cart_Create/MPI_Graph_create/MPI_Dist_Graph

Fixes trac:4581

This commit was SVN r31716.

The following Trac tickets were found above:
  Ticket 4581 --> https://svn.open-mpi.org/trac/ompi/ticket/4581
2014-05-13 04:49:23 +00:00
Nathan Hjelm
0b8bb2339b btl/scif: update the size when preparing send fragments
Thanks to Gilles Gouaillardet for catching this.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31715.
2014-05-12 22:05:31 +00:00
Gilles Gouaillardet
4fc6801f4e Add Gilles Gouaillardet / RIST to the AUTHORS file
This commit was SVN r31714.
2014-05-12 10:46:34 +00:00
MPI Team
43c40b6b8a Update git/hg ignore files
This commit was SVN r31713.
2014-05-10 05:00:26 +00:00
Jeff Squyres
e37c7af0fb usnic: update cclient/cagent to use unix domain sockets (not RML)
In preparation for moving the BTLs down to OPAL, discontinue the use
of the RML for connectivity client/agent communication.  Instead, use
local unix domain sockets in the job session directory (all
communication is between processes on the same server, so unix domain
sockets are fine).

This commit was SVN r31710.
2014-05-09 20:35:36 +00:00
Jeff Squyres
3de7bb61cb Linux specfile: update %description blocks
Update the Open MPI description and fix lots of grammatical errors in
the OpenSHMEM description.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31709.
2014-05-09 14:21:36 +00:00
Ralph Castain
f4650e83c3 Missed one ignore location
This commit was SVN r31708.
2014-05-09 14:08:36 +00:00
Ralph Castain
fc40c6d770 Remove orcm from ignore properties
This commit was SVN r31707.
2014-05-09 14:03:48 +00:00
Ralph Castain
5388347511 Per Jeff's suggestion, remove function that has duplicate functionality and just use one to check if session_dir directory should be removed.
Refs trac:4584

This commit was SVN r31691.

The following Trac tickets were found above:
  Ticket 4584 --> https://svn.open-mpi.org/trac/ompi/ticket/4584
2014-05-08 17:22:43 +00:00
Jeff Squyres
184e4fc0ca usnic: ensure that procs agree on use_udp value
Add the component use_udp value into the modex.  If my component's
use_udp value doesn't agree with the use_udp value from a peer's modex
data, print a helpful message and disqualify the usnic BTL (the usnic
BTL will not be used).  This prevents accidental customer
misconfigurations.

Reviewed by Dave Goodell

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31689.
2014-05-08 16:43:50 +00:00
Jeff Squyres
e9c3df652e usnic: reduce sizeof(ompi_btl_usnic_addr_t) to 56 bytes
Trivial struct re-ordering to eliminate holes in the middle of the
struct (although there's still a hole at the end) and reduce the
overall size of the struct from 64 to 56 bytes.  Also change mtu from
int to uint16_t; there was no need for it to be that large.

Reviewed by Dave Goodell

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31688.
2014-05-08 16:38:59 +00:00
Jeff Squyres
a61e4d6425 usnic: fix connectivity checker timeout
Fix mismatch between the MCA param (which expresses the timeout in
*mili*seconds) and the struct timeval timeout (which expresses the
timeout in *micro*seconds).

Reviewed by Dave Goodell

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31687.
2014-05-08 16:36:07 +00:00
Ralph Castain
ab4f8585b0 When we abort during MPI_Init, we currently emit a totally incorrect error message stating that we were unable to aggregate error messages and cannot guarantee all other processes were killed. This simply isn't true IF the rte has been initialized.
So track that the rte has reached that point, and only emit the new message if it is accurate.

Note that we still generate a TON of output for a minor error:

Ralphs-iMac:examples rhc$ mpirun -n 3 -mca btl sm ./hello_c
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[50239,1],2]) is on host: Ralphs-iMac
  Process 2 ([[50239,1],2]) is on host: Ralphs-iMac
  BTLs attempted: sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
--------------------------------------------------------------------------
MPI_INIT has failed because at least one MPI process is unreachable
from another.  This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used.  Your MPI job will now abort.

You may wish to try to narrow down the problem;

 * Check the output of ompi_info to see which BTL/MTL plugins are
   available.
 * Run your application with MPI_THREAD_SINGLE.
 * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
   if using MTL-based communications) to see exactly which
   communication plugins were considered and/or discarded.
--------------------------------------------------------------------------
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[50239,1],2]
  Exit code:    1
--------------------------------------------------------------------------
[Ralphs-iMac.local:23227] 2 more processes have sent help message help-mca-bml-r2.txt / unreachable proc
[Ralphs-iMac.local:23227] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[Ralphs-iMac.local:23227] 2 more processes have sent help message help-mpi-runtime / mpi_init:startup:pml-add-procs-fail
Ralphs-iMac:examples rhc$ 

Hopefully, we can agree on a way to reduce this verbage!

This commit was SVN r31686.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2014-05-08 15:48:16 +00:00
Ralph Castain
aaae4841e9 Flush the show_help system on our way out - this also restores the opal_show_help function pointer to the OPAL layer for any subsequent processing.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31685.
2014-05-08 14:37:47 +00:00
Ralph Castain
5602156a1c Use the correct abstraction layer name for the data dirs
This commit was SVN r31684.
2014-05-08 14:32:24 +00:00