Commit graph

7510 commits

Author SHA1 Message Date
Jeff Squyres
7395b65531 usnic: remove some debugging verbose output
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r32027.
2014-06-18 14:14:34 +00:00
Nathan Hjelm
fd21b244ce osc/rdma: better name for lookup function
cmr=v1.8.2:ticket=trac:4732:reviewer=ompi-rm1.8

This commit was SVN r32021.

The following Trac tickets were found above:
  Ticket 4732 --> https://svn.open-mpi.org/trac/ompi/ticket/4732
2014-06-17 19:49:17 +00:00
Nathan Hjelm
390f8f52b4 osc/rdma: clean up group process matching a bit
cmr=v1.8.2:ticket=trac:4732:reviewer=dgoodell

This commit was SVN r32018.

The following Trac tickets were found above:
  Ticket 4732 --> https://svn.open-mpi.org/trac/ompi/ticket/4732
2014-06-17 17:48:30 +00:00
Nathan Hjelm
7f20868179 osc/rdma: ensure matching of post/start calls
The post and start window calls are supposed to be matching. The code
did not check to see that an incoming post matched with the start call.
This commit fixes the bug by placing the post on a pending list that
will be checked by the next call to start.

cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r32017.
2014-06-17 15:23:06 +00:00
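The pending-list idea described in commit 7f20868179 above can be sketched roughly as follows. This is a minimal illustration and not the osc/rdma code: `pending_posts_t`, `record_post`, and `start_consume_posts` are hypothetical names, and a fixed-size array stands in for whatever list type the component actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 64  /* arbitrary capacity for this sketch */

/* Hypothetical stand-in for the pending list: ranks whose post
 * arrived before a matching start call was made. */
typedef struct {
    int ranks[MAX_PENDING];
    size_t count;
} pending_posts_t;

/* An incoming post either matches the active start group or is
 * parked on the pending list. Returns true if it matched now. */
static bool record_post(pending_posts_t *pending, int rank,
                        const int *start_group, size_t group_size)
{
    for (size_t i = 0; i < group_size; i++) {
        if (start_group[i] == rank) {
            return true;  /* matches the start call in progress */
        }
    }
    assert(pending->count < MAX_PENDING);
    pending->ranks[pending->count++] = rank;
    return false;
}

/* The next start call drains pending posts that belong to its group
 * and returns how many it matched; unmatched entries stay pending. */
static size_t start_consume_posts(pending_posts_t *pending,
                                  const int *start_group, size_t group_size)
{
    size_t matched = 0, kept = 0;
    for (size_t i = 0; i < pending->count; i++) {
        bool in_group = false;
        for (size_t j = 0; j < group_size; j++) {
            if (start_group[j] == pending->ranks[i]) {
                in_group = true;
                break;
            }
        }
        if (in_group) {
            matched++;
        } else {
            pending->ranks[kept++] = pending->ranks[i];
        }
    }
    pending->count = kept;
    return matched;
}
```

The point of the design is that a post is never silently accepted: it is either matched against the current start group or held until a later start call claims it.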
Nathan Hjelm
927098d567 osc/rdma: fix hang when accumulating with MPI_REPLACE
The replace callback did not increment the incoming frag counter. This
leads to a hang during synchronization. This commit adds the increment
and also puts the request on the garbage collection list to fix a leak.

This fixes a hang found when running the mpich test suite.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32016.
2014-06-17 14:53:29 +00:00
Nathan Hjelm
7f6de57653 osc/rdma: fix accumulate fragment size calculation
The wrong type was used when calculating the amount of space needed
for an accumulate fragment. Fixed the calculation and took the
opportunity to eliminate the get_acc header as it is identical to the
acc header.

This fixes trac:4719 and #4718

Tracking these fixes for 1.8.2 in this CMR.

Throwing this to Brad for review as he is the one who ran into the issue.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32015.

The following Trac tickets were found above:
  Ticket 4719 --> https://svn.open-mpi.org/trac/ompi/ticket/4719
2014-06-17 14:53:24 +00:00
Gilles Gouaillardet
e9ed9def02 Fix MPI_Alltoallv in coll/tuned
This changeset:
- always call the low-level implementation for:
  * MPI_Alltoallv
  * MPI_Neighbor_alltoallv
  * MPI_Alltoallw
  * MPI_Neighbor_alltoallw
- fix mca_coll_tuned_alltoallv_intra_basic_inplace
  so zero size types are correctly handled

cmr=v1.8.2:reviewer=bosilca:ticket=4715

This commit was SVN r32013.

The following Trac tickets were found above:
  Ticket 4715 --> https://svn.open-mpi.org/trac/ompi/ticket/4715
2014-06-17 06:11:34 +00:00
Nathan Hjelm
2f96f16416 osc/rdma: ensure eager sends are active before checking for sync errors
in self optimization

This addresses an issue found with the MPICH pscw_ordering test. Eager sends
were not yet active, which is ok for the standard path but not ok for the
self optimization. Fixed by waiting for all post messages before checking
the sync state.

Fixes trac:4724

Tracking the 1.8.2 issue in this CMR.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32012.

The following Trac tickets were found above:
  Ticket 4724 --> https://svn.open-mpi.org/trac/ompi/ticket/4724
2014-06-17 04:53:47 +00:00
Nathan Hjelm
37ae430424 rma: fix locking/unlocking of MPI_PROC_NULL
It is valid to lock/unlock MPI_PROC_NULL. It probably isn't worth tracking
whether MPI_PROC_NULL is locked for MPI_PROC_NULL RMA operations so this
is probably the permanent solution.

Closes trac:4720

Tracking the 1.8.2 issue with this CMR.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32011.

The following Trac tickets were found above:
  Ticket 4720 --> https://svn.open-mpi.org/trac/ompi/ticket/4720
2014-06-17 04:41:49 +00:00
Nathan Hjelm
41f0059f1e osc/sm: use an unsigned long when calculating the total segment size
Brad correctly pointed out that the total window size should not be an
int. Changed it to an unsigned long.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32010.
2014-06-17 04:33:43 +00:00
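The reason for the type change in commit 41f0059f1e above can be illustrated with a short sketch. It assumes the total is the sum of per-rank sizes; `total_segment_size` is a hypothetical helper, not the osc/sm function.

```c
#include <assert.h>
#include <stddef.h>

/* Sum per-rank window sizes into an unsigned long so that a total
 * larger than INT_MAX is not truncated. Hypothetical helper. */
static unsigned long total_segment_size(const size_t *sizes, int nranks)
{
    assert(nranks >= 0);

    unsigned long total = 0;
    for (int i = 0; i < nranks; i++) {
        total += (unsigned long) sizes[i];
    }
    return total;
}
```

Three ranks exposing 1 GiB each already exceed what a 32-bit signed int can hold, which is the kind of case the wider type is meant to cover.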
Nathan Hjelm
6ec9c6c422 osc/sm: return ompi_request_empty for all request ops
Only one field is valid for RMA requests: MPI_ERROR. This field is set
to the correct value in ompi_request_empty so there is no reason to
allocate and keep track of osc/sm requests because they are always
complete on return. Since we are no longer using the osc/sm request
structure or free list they are now removed.

Closes trac:4723

Tracking this issue with the CMR. Brad, can you verify the issue is indeed fixed?

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32009.

The following Trac tickets were found above:
  Ticket 4723 --> https://svn.open-mpi.org/trac/ompi/ticket/4723
2014-06-17 04:27:02 +00:00
George Bosilca
84193fff6d More comprehensible error messages.
This commit was SVN r32007.
2014-06-16 20:23:16 +00:00
George Bosilca
542e4996a7 Cleanup the utilities functions in tuned.
This commit was SVN r31987.
2014-06-13 16:04:45 +00:00
Gilles Gouaillardet
50256c62c5 Fix MPI_Alltoallv in coll/tuned.
Correctly handle the corner case in MPI_Alltoallv when
some tasks have no data to transfer and some other tasks
do have data to transfer.

This test case is covered in ibm/collective/alltoallv_somezeros
from the ompi-tests repo.

cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31985.
2014-06-13 06:03:23 +00:00
Mike Dubman
b51a42aeca MXM: fix mxm cleanup, should be called for any compat API
fixed by miked, reviewed by yossi

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31984.
2014-06-12 15:46:38 +00:00
George Bosilca
fd0e1b7261 If we detect an error on a request that has been already released
at the MPI level, we should call abort on MPI_COMM_WORLD.

Fixes ticket #1943.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31982.
2014-06-10 16:24:13 +00:00
Alina Sklarevich
7b8ad47e93 MXM: fix env variable name to hint for thread usage in mxm
reviewed by MikeD
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31968.
2014-06-09 06:40:32 +00:00
George Bosilca
f5ebd2faeb Fix the Fortran issue identified by Akan Sang Loon. The dist graph
is really special as the weights can be one of the following three
values (NULL, EMPTY or some legal value). As such, we need a complex
if to correctly convert the Fortran value to the corresponding C
value. Thus, always defining the c_ array is the simplest and most
straightforward approach.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31955.
2014-06-05 17:10:48 +00:00
Ralph Castain
248a4b100f Per Artem, we don't know our VPID at the time of getting the initial timing mark, so just get it if timing is requested
This commit was SVN r31951.
2014-06-04 16:28:41 +00:00
Jeff Squyres
6cf49a3c57 Revert r31934: that MPI_Type_f2c.3 is already in the file list.
This commit was SVN r31950.

The following SVN revision numbers were found above:
  r31934 --> open-mpi/ompi@3ed4aaea99
2014-06-04 13:58:42 +00:00
Oscar Vega-Gisbert
c7b229f03e Java - slice methods: set buffer limit to its capacity before changing its position
This commit was SVN r31943.
2014-06-03 21:32:27 +00:00
Jeff Squyres
3ed4aaea99 man pages: Add MPI_Type_f2c.3 file into the tarball
cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31934.
2014-06-03 18:28:50 +00:00
Jeff Squyres
87f9f6815f use-mpi-tkr: fix ierr->ierror param names
Issue noted by Walter Spector on the user's mailing list.

Throwing to Craig Rasmussen for review.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31933.
2014-06-03 15:26:02 +00:00
Jeff Squyres
82d4e33510 usnic: let connectivity client timeout if agent never appears
This would be a really, really weird case if it ever happens (i.e.,
you have usnics but the agent process failed somewhere in MPI_INIT
such that the agent never appears), but having an infinite loop
doesn't seem like a good idea.

(does not need to go to v1.8 because v1.8 still uses RML for
communication for the connectivity checker)

This commit was SVN r31932.
2014-06-02 18:14:35 +00:00
Jeff Squyres
af04d60098 usnic: do not enable the connectivity agent if there are no usnics
Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31931.
2014-06-02 18:09:55 +00:00
Gilles Gouaillardet
1a17a2a960 Fixes MEMCHECKER vs MPI_IN_PLACE in *alltoall*
cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31924.
2014-06-02 04:43:48 +00:00
Gilles Gouaillardet
c930e44bad Fetch info from both opal_dstore_nonpeer and opal_dstore_peer
This conservative fix tries to fetch info from both
opal_dstore_nonpeer and opal_dstore_peer.
This is required if task A spawns tasks B and C.
B was previously unable to find info from C, which left locality
info unset and caused a hang in coll/ml init.

no CMR is required since v1.8 uses a unique dstore

This commit was SVN r31923.
2014-06-02 02:34:30 +00:00
Gilles Gouaillardet
d1bcd103ac btl/openib : correctly handle eager rdma buffers in mca_btl_openib_del_procs
if eager rdma is used, endpoint reference_count is greater than one.
this commit is a temporary fix that calls OBJ_RELEASE on the endpoint as many times as needed.
though this is likely correct, it can be suboptimal and hence needs to be reviewed

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r31922.
2014-06-02 02:23:52 +00:00
Ralph Castain
9305756276 Per RFC:
http://www.open-mpi.org/community/lists/devel/2014/05/14838.php

Remove stale component

This commit was SVN r31917.
2014-06-01 17:03:03 +00:00
Ralph Castain
8736a1c138 Per RFC:
http://www.open-mpi.org/community/lists/devel/2014/05/14822.php

Revamp the ORTE global data structures to reduce memory footprint and add new features. Add ability to control/set cpu frequency, though this can only be done if the sys admin has setup the system to support it (or you run as root).

This commit was SVN r31916.
2014-06-01 16:14:10 +00:00
Ralph Castain
cf2c7381d0 Replace the PML barrier with an RTE barrier for now until we can come up with a better solution for connectionless BTLs.
Refs trac:4643

This commit was SVN r31915.

The following Trac tickets were found above:
  Ticket 4643 --> https://svn.open-mpi.org/trac/ompi/ticket/4643
2014-06-01 16:08:56 +00:00
Oscar Vega-Gisbert
cc219218a7 Java: update javadoc's install locations
This commit was SVN r31914.
2014-06-01 15:40:14 +00:00
Alina Sklarevich
f8a664f5ec MXM: generate the jobid only for MXM versions under v2.0.
reviewed by miked
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31910.
2014-06-01 13:29:24 +00:00
Ralph Castain
1107f9099e Per the RFC issued here:
http://www.open-mpi.org/community/lists/devel/2014/05/14827.php

Refactor PMI support

This commit was SVN r31907.
2014-06-01 04:28:17 +00:00
Jeff Squyres
a1485569b9 If we don't find an OPAL dstore key (via modex), it's not an error --
we just didn't find it.  So don't ORTE_ERROR_LOG it.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31906.
2014-05-31 12:02:28 +00:00
Rolf vandeVaart
489664b4a9 Remove debug code.
This commit was SVN r31901.
2014-05-28 16:06:16 +00:00
Rolf vandeVaart
570e313c9b Add collective module to handle CUDA aware buffers and reductions. Per RFC sent last week.
Reviewed by bosilca.

This commit was SVN r31894.
2014-05-27 21:24:43 +00:00
Nathan Hjelm
2614dfc4bf bcol/basesmuma: fix remaining memory leaks in basesmuma
We were still leaking 1) file descriptors for data files, and 2) some
control files. I fixed both of these leaks and everything is looking
good. This should fix the bug where we are running out of file
descriptors when running the loop_spawn test. I also took the
opportunity to refactor the code a bit to make the mapping/unmapping
simpler. This should help avoid these sorts of issues in the future.

Depends on #4678

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31893.
2014-05-27 18:40:41 +00:00
Ralph Castain
f840013b41 Correct typo spotted by Gilles
This commit was SVN r31892.
2014-05-27 17:14:01 +00:00
Mike Dubman
7b05b5c4c2 HCOLL: use proper parameter in progress unregister
fixed by Nadezhda, reviewed by Elena/MikeD

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31891.
2014-05-26 09:03:30 +00:00
Gilles Gouaillardet
d04db1e213 Fix mmap flags in bcol_basesmuma_smcm_reg_mmap
if in_ptr is NULL, the MAP_FIXED flag cannot be passed to mmap

this caused a hang in topology/cart and topology/sub from ibm
test suite on trunk.

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r31890.
2014-05-26 07:18:31 +00:00
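The rule stated in commit d04db1e213 above (no MAP_FIXED when there is no target address) can be sketched as below. `sketch_mmap_flags` is a hypothetical helper, and MAP_SHARED as the base flag is an assumption made for illustration; the commit message does not say which other flags the real function passes.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: MAP_FIXED is only meaningful when the caller
 * supplies a target address, so add it only for a non-NULL in_ptr. */
static int sketch_mmap_flags(const void *in_ptr)
{
    int flags = MAP_SHARED;   /* assumed base flag for this sketch */
    if (in_ptr != NULL) {
        flags |= MAP_FIXED;
    }
    return flags;
}
```

A caller would then pass `in_ptr` and `sketch_mmap_flags(in_ptr)` together to `mmap`, so the address hint and the flag can never disagree.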
Gilles Gouaillardet
75616320b5 remove all message output from ompi/communicator/comm_helpers.c
Thanks George for pointing out.

cmr=v1.8.2:reviewer=bosilca:ticket=4676

This commit was SVN r31889.

The following Trac tickets were found above:
  Ticket 4676 --> https://svn.open-mpi.org/trac/ompi/ticket/4676
2014-05-26 01:54:24 +00:00
Gilles Gouaillardet
40f3b849eb Fix argument checks for [i]neighbor_alltoall{v|w}
This fixes a bug introduced in :
 - r31815 (trunk) 
 - r31853 (v1.8 branch)

cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31888.

The following SVN revision numbers were found above:
  r31815 --> open-mpi/ompi@8bafe06c57
  r31853 --> open-mpi/ompi@bff944d766
2014-05-23 08:19:17 +00:00
Gilles Gouaillardet
de15b623b5 do not increment/decrement the opal_progress_event_state.
Per Ralph :
"I noticed that we are incrementing and decrementing the opal_progress_event state.
However, this no longer has any impact whatsoever on the RML as that is running in
the independent ORTE event thread. So all this actually does is impact the MPI layer
by adding an unnecessary overhead."

Thanks Ralph for pointing this out :-)

cmr=v1.8.2:reviewer=rhc:ticket=4671

This commit was SVN r31887.

The following Trac tickets were found above:
  Ticket 4671 --> https://svn.open-mpi.org/trac/ompi/ticket/4671
2014-05-23 05:02:31 +00:00
Gilles Gouaillardet
4843909a6b correctly allocate cart in mca_topo_base_cart_sub(...)
since r31716 mca_topo_base_comm_cart_2_2_0_t is an object
and must be allocated/freed with OBJ_NEW/OBJ_RELEASE.

this fixes topology/cart_sub_zero from the ibm test suite.

v1.8 does not use objects, so no cmr for this branch

This commit was SVN r31883.

The following SVN revision numbers were found above:
  r31716 --> open-mpi/ompi@e3df77548d
2014-05-23 03:07:46 +00:00
George Bosilca
66e91f3797 Remove a warning on p.
And correctly handle the reference count from OBJ_NEW in the error
path (thanks Gilles for noticing).

This commit was SVN r31877.
2014-05-22 05:57:21 +00:00
George Bosilca
6ed1ac032e Release the buffer in all error cases and add small code cleanups.
This commit was SVN r31876.
2014-05-22 05:17:35 +00:00
Gilles Gouaillardet
5e9347bfb4 Fix deallocation error in mca_common_sm_rml_info_bcast()
buffer is sent (num_local_procs-1) times, so it should be
OBJ_RETAIN'ed (num_local_procs-2) times

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31874.
2014-05-22 04:53:42 +00:00
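The retain arithmetic in commit 5e9347bfb4 above can be checked with a toy reference counter. This sketch assumes that the creator holds one reference and that each completed send releases exactly one; the `sketch_*` names are hypothetical and only mimic the OBJ_RETAIN / OBJ_RELEASE pattern.

```c
#include <assert.h>

/* Toy reference-counted buffer; a stand-in for an OPAL object. */
typedef struct {
    int refcount;
    int freed;   /* set instead of calling free() so the sketch can be inspected */
} sketch_buffer_t;

static void sketch_retain(sketch_buffer_t *b) { b->refcount++; }

static void sketch_release(sketch_buffer_t *b)
{
    assert(b->refcount > 0);
    if (--b->refcount == 0) {
        b->freed = 1;
    }
}

/* Send the same buffer to (num_local_procs - 1) peers. The creator
 * already holds one reference, so only (num_local_procs - 2) extra
 * retains are needed for every send completion to release once.
 * Returns the reference count left after all completions. */
static int sketch_bcast_refcount(int num_local_procs)
{
    sketch_buffer_t buf = { .refcount = 1, .freed = 0 };
    int sends = num_local_procs - 1;

    for (int i = 0; i < sends - 1; i++) {
        sketch_retain(&buf);
    }
    for (int i = 0; i < sends; i++) {
        sketch_release(&buf);   /* one release per completed send */
    }
    return buf.refcount;
}
```

Retaining once per send instead would leave one reference outstanding after the last completion, which is the kind of deallocation error the commit title refers to.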
Jeff Squyres
af5399ab5b Gah; we need to have a unique barrier ID for the ompi_rte_barrier()
operation.  Ralph will fix shortly.

For the time being, put back the original code...

Refs trac:4669

This commit was SVN r31872.

The following Trac tickets were found above:
  Ticket 4669 --> https://svn.open-mpi.org/trac/ompi/ticket/4669
2014-05-22 00:31:39 +00:00
Jeff Squyres
c860f289ab usnic: re-order teardown sequence now that del_procs() is actually called
Now that the infrastructure is calling BTL del_procs() before the BTL
finalize(), the usnic BTL had to re-order some of its teardown
sequence to avoid assert() failing.

This is part of a larger conversation involving #4669.  Since
MPI_FINALIZE and MPI_COMM_DISCONNECT currently use an
oob/grpcomm-based barrier, the usnic BTL can ''absolutely know'' that
these endpoints and procs will no longer be used.  If the ORTE DPM
goes back to a PML-based barrier, the usnic BTL will need to grow more
complex teardown semantics (a la TCP socket FIN/ACK/FIN_WAIT states).

Refs trac:4669

This commit was SVN r31871.

The following Trac tickets were found above:
  Ticket 4669 --> https://svn.open-mpi.org/trac/ompi/ticket/4669
2014-05-21 23:59:17 +00:00
Jeff Squyres
5aa05f2f1d Potentially temporarily change the barrier in the ORTE DPM component
to be based on grpcomm (i.e., an out-of-band based barrier) rather
than the simplistic PML-based barrier that it currently uses.

This is pending a larger discussion with Nathan and George, but it
will allow the usnic BTL to stop assert()-failing in light of the
recent del_procs() change.

This commit was SVN r31870.
2014-05-21 23:09:01 +00:00
Mike Dubman
fad1063980 MXM: fix warning
reviewed by Yossi

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31855.
2014-05-21 07:50:05 +00:00
George Bosilca
cc0239d52f Remove unused variables.
This commit was SVN r31846.
2014-05-21 01:34:26 +00:00
Jeff Squyres
b0a6e42f45 pml ob1: use the pre-computed size from the free lists
Based on a suggestion from George on #31806, use the pre-computed
sizes rather than duplicating the computation math (which may change
someday in the future).

cmr=v1.8.2:ticket=trac:4647

This commit was SVN r31841.

The following Trac tickets were found above:
  Ticket 4647 --> https://svn.open-mpi.org/trac/ompi/ticket/4647
2014-05-20 20:32:25 +00:00
George Bosilca
db9660264e Update the error message to pinpoint the right location.
Thanks Tim.

This commit was SVN r31839.
2014-05-20 20:08:42 +00:00
George Bosilca
685f051557 Move the allocator initialization from open to init. This cleans up
a memory leak. Similar changes should be applied to all the
other PMLs that are copies of OB1. This patch is related to
#4653.

This commit was SVN r31838.
2014-05-20 19:34:18 +00:00
Gilles Gouaillardet
baf532087a Fix mca_coll_basic_alltoallw_inter()
Avoid sending/receiving zero size messages in order to be compliant
with the top-level modification

cmr=v1.8.2:ticket=4651:reviewer=bosilca

This commit was SVN r31836.

The following Trac tickets were found above:
  Ticket 4651 --> https://svn.open-mpi.org/trac/ompi/ticket/4651
2014-05-20 09:22:57 +00:00
George Bosilca
137874ec4d In fact btl_eager and btl_rdma will just vanish upon destruction of the
bml_endpoint. No need to clean them before.

This commit was SVN r31835.
2014-05-20 08:53:22 +00:00
George Bosilca
85e3caaa17 Handle the del_procs correctly. The btl_send is the complete list
of existing BTLs for an endpoint, all the others are just partial lists.
Thus, all the cleaning should first be done in the btl_send array,
and then in the other arrays (btl_eager and btl_rdma).

This commit was SVN r31834.
2014-05-20 08:46:57 +00:00
George Bosilca
1647664c43 Show the unreachable message for the first unreachable proc
and then stop.

This commit was SVN r31833.
2014-05-20 08:40:32 +00:00
Gilles Gouaillardet
c496131eef Fixes *alltoall* collectives at top
- fix bugs
- silence warnings

cmr=v1.8.2:ticket=4651:reviewer=bosilca

This commit was SVN r31831.

The following Trac tickets were found above:
  Ticket 4651 --> https://svn.open-mpi.org/trac/ompi/ticket/4651
2014-05-20 05:32:57 +00:00
Gilles Gouaillardet
ef4548a215 bml/r2 : fix mca_bml_r2_del_procs()
cmr=v1.8.2:reviewer=hjelmn:ticket=trac:4645

This commit was SVN r31830.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855

The following Trac tickets were found above:
  Ticket 4645 --> https://svn.open-mpi.org/trac/ompi/ticket/4645
2014-05-20 02:55:47 +00:00
Nathan Hjelm
27d3e1ca25 bml/r2: fix a problem identified by Gilles
This commit fixes two issues:

 - The intent of the code @ bml_r2.c:486 is to prevent calling the
   btl_del_procs more than once for a given proc. Gilles correctly
   identified there was a problem in this code but r31786 was not the
   correct fix.

 - Fix a segmentation fault in r2 finalize revealed by the fact we
   actually call del_procs now.

cmr=v1.8.2:reviewer=ggouaillardet:ticket=trac:4645

This commit was SVN r31829.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
  r31786 --> open-mpi/ompi@fc96b0a7b8

The following Trac tickets were found above:
  Ticket 4645 --> https://svn.open-mpi.org/trac/ompi/ticket/4645
2014-05-19 20:22:34 +00:00
Nathan Hjelm
a1d5ce0893 pml/ob1: as per past RFC bring the inline send optimization to
MPI_Isend.

I filed an RFC for this optimization some time back. It is a
relatively simple optimization. If the data associated with an
MPI_Isend can be put on the wire without allocating an MPI_Request
then do so. In this case we can legally return ompi_request_empty
which will correctly indicate that the request is complete and that it
was not cancelled (these are the only requirements on send requests).

cmr=v1.8.3:reviewer=bosilca

This commit was SVN r31828.
2014-05-19 19:34:59 +00:00
Nathan Hjelm
22e59b056a bcol/basesmuma: fix leak in basesmuma code
Basesmuma was vallocing space for control data then mmapping over that
data. Nothing in the code suggests any need for mmapping a specific
address so I did the following to remove the leak:

 - Removed the valloc of the buffer space

 - ftruncate the mmaped file to ensure there is sufficient memory to
   allocate space for the control data.

Ideally this code should be using opal/shmem but that is a larger
change. Keeping it simple for now.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31822.
2014-05-19 15:21:58 +00:00
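The two steps listed in commit 22e59b056a above (no up-front allocation, ftruncate before mapping) can be sketched with plain POSIX calls. The helper names and the temporary-file path are hypothetical; the real basesmuma code manages its own control files.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Size a backing file with ftruncate() and let mmap() choose the
 * address, instead of allocating a buffer first and mapping over it.
 * Returns the mapping, or NULL on failure. Hypothetical helper. */
static void *sketch_map_control_file(int fd, size_t ctl_length)
{
    if (ftruncate(fd, (off_t) ctl_length) != 0) {
        return NULL;
    }
    void *addr = mmap(NULL, ctl_length, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    return (addr == MAP_FAILED) ? NULL : addr;
}

/* Exercise the helper on a temporary file; returns 0 on success. */
static int sketch_map_roundtrip(size_t ctl_length)
{
    if (ctl_length == 0) {
        return -1;
    }

    char path[] = "/tmp/sketch_ctl_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path);                      /* file lives until fd is closed */

    unsigned char *ctl = sketch_map_control_file(fd, ctl_length);
    int rc = -1;
    if (ctl != NULL) {
        ctl[ctl_length - 1] = 1;       /* backed by the truncated file */
        rc = (ctl[ctl_length - 1] == 1) ? 0 : -1;
        munmap(ctl, ctl_length);
    }
    close(fd);
    return rc;
}
```

Because nothing is allocated before the mapping, there is no buffer left behind for `munmap` to miss.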
Nathan Hjelm
dedf6b377e btl/vader: don't leak registration cache items
Make sure to cleanup the registration cache when removing an
endpoint. This leak only affects systems with XPMEM installed.

Since this is in code specific to XPMEM not sure who could review so
sending it directly to the RM.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31821.
2014-05-19 15:16:32 +00:00
Gilles Gouaillardet
2b89aac15b Fix a typo in MCA_PML_OB1_RECV_REQUEST_UNPACK
cmr=v1.8.2:reviewer=rhc

This commit was SVN r31817.
2014-05-19 11:00:13 +00:00
Gilles Gouaillardet
c82a6f5063 Fix a memory leak in mca/pml/bfo
Allocate the allocator in init rather than open

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31816.
2014-05-19 10:40:18 +00:00
Gilles Gouaillardet
8bafe06c57 Fixes *alltoall* collectives at top level
This commit :
 - Correctly retrieve the communicator size when
   checking memory and parameters
 - Ensure (sendtype,sendcount) and (recvtype,recvcount)
   match and return with MPI_ERR_TRUNCATE otherwise
 - Return with MPI_SUCCESS without invoking the low level
   if no data is going to be transferred
 - Fixes trac:4506

cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31815.

The following Trac tickets were found above:
  Ticket 4506 --> https://svn.open-mpi.org/trac/ompi/ticket/4506
2014-05-19 07:46:07 +00:00
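The top-level checks listed in commit 8bafe06c57 above can be sketched without MPI as follows. The sketch assumes that "matching" means equal total byte counts (type size times count), which the commit message does not spell out; the `SKETCH_*` codes and `sketch_alltoall_check` are hypothetical stand-ins for the MPI error classes and the real argument-checking code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes; the real code uses MPI error classes. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_TRUNCATE = 1, SKETCH_CALL_LOW_LEVEL = 2 };

/* Top-level argument check for an alltoall-style call, assuming the
 * match is decided by comparing total bytes (type size * count). */
static int sketch_alltoall_check(size_t sendtype_size, int sendcount,
                                 size_t recvtype_size, int recvcount)
{
    assert(sendcount >= 0 && recvcount >= 0);

    size_t send_bytes = sendtype_size * (size_t) sendcount;
    size_t recv_bytes = recvtype_size * (size_t) recvcount;

    if (send_bytes != recv_bytes) {
        return SKETCH_ERR_TRUNCATE;      /* mismatched send/recv signature */
    }
    if (send_bytes == 0) {
        return SKETCH_SUCCESS;           /* nothing to move: skip low level */
    }
    return SKETCH_CALL_LOW_LEVEL;
}
```

Returning early for the zero-byte case is also why the lower-level implementations must avoid sending or receiving zero-size messages themselves: every rank has to take the same shortcut.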
Oscar Vega-Gisbert
83bdebbf81 Java bindings for OSHMEM.
This commit was SVN r31810.
2014-05-18 21:48:09 +00:00
Mike Dubman
cadc1485ff HCOLL: register memory release hook to avoid races
fixed by Devender, reviewed by Miked

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31809.
2014-05-17 19:49:43 +00:00
Ralph Castain
61fba04a0c Remove undeclared variable
This commit was SVN r31807.
2014-05-17 04:09:05 +00:00
Jeff Squyres
025e4a852b pml_ob1: ensure to have enough space for send/recvreq on stack
r30343 introduced the optimization of putting the OB1 sendreq and
recvreq on the stack for blocking sends and receives.  However, the
requests did not contain enough storage for the data that is normally
immediately ''after'' the request (e.g., BTL data).

This commit changes these requests to be pointers and to use alloca()
to get enough total space for the OB1 request and all the associated
data.

The change is smaller than it looks; most of it is just changing from
"foo.bar" to "foo->bar" notation (etc.).

Submitted by Jeff, reviewed by Nathan.  But we want George to look at
this (and get a little soak time on the trunk) before moving to v1.8.

cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31806.

The following SVN revision numbers were found above:
  r30343 --> open-mpi/ompi@2b57f4227e
2014-05-17 01:05:59 +00:00
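The change described in commit 025e4a852b above, moving from a plain stack struct to an alloca() block with room for trailing data, can be sketched as below. `sketch_request_t` and `sketch_blocking_send` are hypothetical and much simpler than the OB1 request types; `<alloca.h>` is assumed to be available, as it is on glibc-based systems.

```c
#include <alloca.h>
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request header; the real OB1 request types differ. */
typedef struct {
    int tag;
    size_t trailing_len;   /* bytes of per-transport data after the header */
} sketch_request_t;

/* Blocking operation that keeps its request on the stack. The request
 * is a pointer into an alloca() block sized for header + trailing data,
 * rather than a plain local struct with no room after it.
 * Returns the number of bytes reserved. */
static size_t sketch_blocking_send(int tag, size_t trailing_len)
{
    size_t total = sizeof(sketch_request_t) + trailing_len;
    sketch_request_t *req = alloca(total);

    memset(req, 0, total);
    req->tag = tag;                       /* "foo->bar" instead of "foo.bar" */
    req->trailing_len = trailing_len;

    /* Trailing storage starts immediately after the header. */
    unsigned char *trailing = (unsigned char *) (req + 1);
    if (trailing_len > 0) {
        trailing[trailing_len - 1] = 0xAB;   /* safe: inside the block */
    }
    assert(req->trailing_len == trailing_len);
    return total;
}
```

The memory disappears when the function returns, so this only suits blocking calls where the request never outlives the stack frame.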
George Bosilca
750c6c7861 Update the UTK copyright on the topology related files.
This commit was SVN r31805.
2014-05-16 22:23:52 +00:00
Gilles Gouaillardet
96ae38823d Fix a memory leak in ompi_osc_base_finalize()
cmr=v1.8.2:reviewer=rhc

This commit was SVN r31788.
2014-05-16 05:25:24 +00:00
Gilles Gouaillardet
fc96b0a7b8 Fix a typo in mca_bml_r2_del_procs()
Use bml_endpoint->btl_eager instead of bml_endpoint->btl_send.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31786.
2014-05-16 04:43:18 +00:00
Nathan Hjelm
e97e4cf924 Add missing include.
cmr=v1.8.2:ticket=trac:4639

This commit was SVN r31784.

The following Trac tickets were found above:
  Ticket 4639 --> https://svn.open-mpi.org/trac/ompi/ticket/4639
2014-05-15 19:52:06 +00:00
Nathan Hjelm
572507f451 btl/vader: fix del_procs bug exposed by the fact we now actually call
del_procs

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31783.
2014-05-15 19:26:49 +00:00
Nathan Hjelm
faf008f527 Fix bugs that were causing leaks in finalize.
This commit fixes leaks of bml endpoints in finalize. A summary of the
bugs/fixes is below.

 1) ompi_mpi_finalize used ompi_proc_all to get the list of procs but
    never released the reference to them (ompi_proc_all called
    OBJ_RETAIN on all the procs returned). When calling del_procs at
    finalize it should suffice to call ompi_proc_world which does not
    increment the reference count.

 2) del_procs is called BEFORE ompi_comm_finalize. This leaves the
    references to the procs from calling the pml_add_comm
    function. The fix is to reorder the calls to do ompi_comm_finalize,
    del_procs, pml_finalize instead of del_procs, pml_finalize,
    ompi_comm_finalize.

 3) The check in del_procs in r2 checked for a reference count of
    1. This is incorrect. At this point there should be 2 references:
    1 from ompi_proc, and another from the add_procs. The fix is to
    change this check to look for a reference count of 2. This check
    makes me extremely uncomfortable as nothing will call del_procs if
    the reference count of a proc is not 2 when del_procs is
    called. Maybe there should be an assert since this is a developer
    error IMHO.

cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31782.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2014-05-15 18:28:03 +00:00
Nathan Hjelm
55f0dcb81a Add netpatterns_cleanup_narray_knomial_tree function to cleanup after
netpatterns_setup_narray_knomial_tree.

Fix a bug in ptpcoll that caused memory allocated by
netpatterns_setup_narray_knomial_tree to leak.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31781.
2014-05-15 17:36:26 +00:00
Nathan Hjelm
d3dc2c9b0b btl/ugni: reorder teardown to avoid using freed memory
valgrind correctly indicated that the mpool is needed when tearing down
the free lists. Reordered the teardown code to correct this.

My code so no review required.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31780.
2014-05-15 16:58:28 +00:00
Nathan Hjelm
73bfecd650 More leak fixes.
Two leaks are fixed in this commit:

 - Do not leak btl component list items.

 - Do not leak the nodename when decoding the pidmap.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31779.
2014-05-15 16:38:13 +00:00
Nathan Hjelm
e4db2c3ebb ompi: fix various small leaks
This commit fixes three leaks:

 - bml/r2: fix leak of del_procs in mca_bml_r2_del_procs

 - Release the modex data in btl/scif, btl/ugni, and btl/vader

 - ompi_mpi_finalize: close the allocator framework

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31778.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2014-05-15 15:59:51 +00:00
Gilles Gouaillardet
e836abfc51 btl/scif: fix hang at shutdown
a brand new dummy connection has to be used to properly
trigger the main thread termination and avoid timeout in
the main thread

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r31772.
2014-05-15 09:55:50 +00:00
Gilles Gouaillardet
3e19041425 Fix memory leak in mca_common_sm_rml_info_bcast(...)
The local buffer object is only used by the root of the group,
so move down the buffer declaration and initialization.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31771.
2014-05-15 08:41:14 +00:00
Nathan Hjelm
4113cfa03a pml/ob1: add missing OBJ_DESTRUCT
An OBJ_DESTRUCT was missing for mca_pml_ob1.send_ranges causing a
memory leak. Identified by valgrind.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31768.
2014-05-14 21:15:45 +00:00
Nathan Hjelm
32a85c6d7d allocator/bucket: free all memory associated with a bucket allocator
when it is finalized.

Fixes a leak identified by valgrind.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31767.
2014-05-14 21:15:39 +00:00
Nathan Hjelm
279c0a3ca7 comm: fix communicator subsystem leaks
This commit fixes two leaks:

 - We never destructed the attributes on MPI_COMM_WORLD. All other
   communicators that have attributes are released through
   ompi_comm_free which does the attribute destruction. For
   MPI_COMM_WORLD this is now done before the destructor is called.

 - Add missing OBJ_RELEASE for ompi_comm_f_to_c_table.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31766.
2014-05-14 21:15:33 +00:00
Ralph Castain
a202139656 Looks like someone missed a '|' in this comparison, so correct it here
This commit was SVN r31760.
2014-05-14 16:53:22 +00:00
Nathan Hjelm
82934b173c btl/scif: make the exiting member volatile
cmr=v1.8.2:ticket=trac:4626

This commit was SVN r31757.

The following Trac tickets were found above:
  Ticket 4626 --> https://svn.open-mpi.org/trac/ompi/ticket/4626
2014-05-14 16:22:24 +00:00
Nathan Hjelm
97605a4002 btl/scif: fix hang at shutdown
scif_close is not causing scif_poll in the listening thread to return
as expected. To ensure the thread exits attempt to make a local
connection to wake up the thread before calling pthread_join.

cmr=v1.8.2:reviewer=ggouaillardet

This commit was SVN r31756.
2014-05-14 16:14:00 +00:00
George Bosilca
f27123a20d Fix the add_proc issue identified by Jeff: the TCP BTL now discards a
peer proc without TCP support instead of completely dropping TCP support for the entire job.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31753.
2014-05-14 13:47:57 +00:00
Nathan Hjelm
518f188ad4 bml/base: ensure all components are closed when the framework is
closed

We were leaving the selected component open. This commit should
eliminate a leak detected by valgrind.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31749.
2014-05-13 23:04:40 +00:00
Nathan Hjelm
c15c3dee4c common/ugni: fix bugs in r31557
This commit was SVN r31748.

The following SVN revision numbers were found above:
  r31557 --> open-mpi/ompi@c4c9bc1573
2014-05-13 22:37:01 +00:00
Nathan Hjelm
dd8de4d6eb btl/ugni: fix memory leaks and silence some warnings
The smsg_mboxes free list was not getting destructed. The construct
has been moved to module initialization and a matching destruct is now
in the module destruct.

This commit was SVN r31746.
2014-05-13 21:22:33 +00:00
Nathan Hjelm
9c45e4152d sbgp/base: fix memory leaks
This commit fixes memory leaks discovered in the sbgp setup code. We
were leaking an opal_argv as well as some list items. I took the
opportunity to clean up the code a little which includes making use of
the opal_argv_free function.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31745.
2014-05-13 21:22:25 +00:00
Nathan Hjelm
ddd501c0d9 bcol/base: cleanup code and fix memory leak
The items in the available bcol list were getting leaked. This commit
fixes this leak. I also cleaned up the code a bit. This includes
making use of the opal_argv_free function.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31744.
2014-05-13 21:22:18 +00:00
Nathan Hjelm
c32d84154a coll/ml: fix leaks and close all the frameworks opened
It is essential to call mca_base_framework_close for every framework
that is opened. coll/ml was not doing this so neither bcol nor sbgp
were getting cleaned up. This commit fixes this omission.

Also fixed a leak caused by calling OBJ_DESTRUCT for something created
with OBJ_NEW. With these changes coll/ml appears to be valgrind clean.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31743.
2014-05-13 21:22:12 +00:00
Nathan Hjelm
fc4f932cc2 Fix bug in r31716
Simple bug. The dist_graph pointer must be a constructed object. The
change from malloc to OBJ_NEW was missing from r31716. Tested with MTT
and everything looks ok now.

This commit was SVN r31739.

The following SVN revision numbers were found above:
  r31716 --> open-mpi/ompi@e3df77548d
2014-05-13 17:39:43 +00:00
Nathan Hjelm
a78519a2b2 btl/scif: remove call to pthread_cancel
There is no reason to cancel the listening thread. It should die
automatically when the file descriptor is closed. It is sufficient
to just wait for the thread to exit with pthread_join.

cmr=v1.8.2:ticket=trac:4616:reviewer=jsquyres

This commit was SVN r31738.

The following Trac tickets were found above:
  Ticket 4616 --> https://svn.open-mpi.org/trac/ompi/ticket/4616
2014-05-13 17:29:53 +00:00
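The join-without-cancel approach described in a78519a2b2 can be sketched as follows. This is only an illustration under stated assumptions: a plain pipe stands in for the SCIF listening descriptor, and the names `example_listener` and `example_run_and_join` are hypothetical, not taken from btl/scif. The listener leaves its loop once read() reports end-of-file, so the owner only needs pthread_join.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical listener: loops until read() reports EOF or an error. */
static void *example_listener(void *arg)
{
    assert(arg != NULL);
    int fd = *(int *)arg;
    char buf[64];

    /* read() returns 0 once every write end of the pipe is closed. */
    while (read(fd, buf, sizeof buf) > 0) {
        /* a real listener would handle the incoming data here */
    }
    return NULL;
}

/* Start the listener, close the descriptor feeding it, then join.
 * No pthread_cancel is needed: the thread exits by itself.
 * Returns 0 on success, -1 on failure. */
static int example_run_and_join(void)
{
    int fds[2];
    pthread_t tid;

    if (pipe(fds) != 0) {
        return -1;
    }
    if (pthread_create(&tid, NULL, example_listener, &fds[0]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    close(fds[1]);                      /* listener now sees EOF */
    int rc = pthread_join(tid, NULL);   /* wait for it to finish */
    close(fds[0]);
    return rc == 0 ? 0 : -1;
}
```

Whether closing a descriptor wakes a thread blocked on it depends on the descriptor type, so the pipe here shows the shutdown pattern rather than the SCIF behaviour itself.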
Nathan Hjelm
c13c21d476 basesmuma: clean up the setup code and ensure mapped files are unmapped
We were leaking file descriptors when coll/ml was in use. It turns out
this was because basesmuma was failing to unmap files it had previously
mapped. This commit cleans up the setup code to ensure that we only
attempt to map the control files once per module and then ensures the
files are unmapped when the module is released.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31737.
2014-05-13 17:00:31 +00:00
Nathan Hjelm
9e3a0d7b7a basesmuma: modify the minimum size for the large fan-in fan-out allreduce
algorithm

Per suggestion from Manju make sure there isn't a gap in the size ranges
for the available algorithms.

cmr=v1.8.2:ticket=trac:4437:reviewer=ompi-rm1.8

This commit was SVN r31728.

The following Trac tickets were found above:
  Ticket 4437 --> https://svn.open-mpi.org/trac/ompi/ticket/4437
2014-05-13 14:56:21 +00:00
Gilles Gouaillardet
209378efec btl/scif: prevent SIGSEGV from occurring when the module is unloaded
Fixes trac:4615

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r31717.

The following Trac tickets were found above:
  Ticket 4615 --> https://svn.open-mpi.org/trac/ompi/ticket/4615
2014-05-13 10:04:38 +00:00
Gilles Gouaillardet
e3df77548d Fix memory leak when releasing a communicator created by
MPI_Cart_create/MPI_Graph_create/MPI_Dist_Graph

Fixes trac:4581

This commit was SVN r31716.

The following Trac tickets were found above:
  Ticket 4581 --> https://svn.open-mpi.org/trac/ompi/ticket/4581
2014-05-13 04:49:23 +00:00
Nathan Hjelm
0b8bb2339b btl/scif: update the size when preparing send fragments
Thanks to Gilles Gouaillardet for catching this.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31715.
2014-05-12 22:05:31 +00:00
Jeff Squyres
e37c7af0fb usnic: update cclient/cagent to use unix domain sockets (not RML)
In preparation for moving the BTLs down to OPAL, discontinue the use
of the RML for connectivity client/agent communication.  Instead, use
local unix domain sockets in the job session directory (all
communication is between processes on the same server, so unix domain
sockets are fine).

This commit was SVN r31710.
2014-05-09 20:35:36 +00:00
Jeff Squyres
184e4fc0ca usnic: ensure that procs agree on use_udp value
Add the component use_udp value into the modex.  If my component's
use_udp value doesn't agree with the use_udp value from a peer's modex
data, print a helpful message and disqualify the usnic BTL (the usnic
BTL will not be used).  This prevents accidental customer
misconfigurations.

Reviewed by Dave Goodell

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31689.
2014-05-08 16:43:50 +00:00
Jeff Squyres
e9c3df652e usnic: reduce sizeof(ompi_btl_usnic_addr_t) to 56 bytes
Trivial struct re-ordering to eliminate holes in the middle of the
struct (although there's still a hole at the end) and reduce the
overall size of the struct from 64 to 56 bytes.  Also change mtu from
int to uint16_t; there was no need for it to be that large.

Reviewed by Dave Goodell

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31688.
2014-05-08 16:38:59 +00:00
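The effect of the struct re-ordering in e9c3df652e can be illustrated with a small sketch. The two structs below are hypothetical and are not the real ompi_btl_usnic_addr_t; they only show how placing the most strictly aligned members first removes padding holes from the middle of a struct, leaving at most some padding at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: small members scattered between large ones. */
struct example_addr_padded {
    uint16_t mtu;    /* typically followed by 6 bytes of padding */
    uint64_t gid;
    uint32_t ipv4;   /* typically followed by 4 bytes of padding */
    uint64_t mac;
};

/* Same members, ordered from largest to smallest alignment. */
struct example_addr_packed {
    uint64_t gid;
    uint64_t mac;
    uint32_t ipv4;
    uint16_t mtu;    /* only tail padding remains after this member */
};

/* Re-ordering must never make the struct larger. */
static_assert(sizeof(struct example_addr_packed) <=
              sizeof(struct example_addr_padded),
              "re-ordered struct should not grow");

/* Number of bytes recovered by the re-ordering on this platform. */
static size_t example_bytes_saved(void)
{
    return sizeof(struct example_addr_padded) -
           sizeof(struct example_addr_packed);
}
```

On a typical LP64 ABI the first layout occupies 32 bytes and the second 24, but the exact figures depend on the compiler and target, so the code only asserts the ordering of the two sizes.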
Jeff Squyres
a61e4d6425 usnic: fix connectivity checker timeout
Fix mismatch between the MCA param (which expresses the timeout in
*milli*seconds) and the struct timeval timeout (which expresses the
timeout in *micro*seconds).

Reviewed by Dave Goodell

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31687.
2014-05-08 16:36:07 +00:00
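The unit mismatch fixed in a61e4d6425 is easy to reproduce whenever a millisecond value is copied straight into a struct timeval. A minimal sketch of the conversion, assuming a hypothetical helper name `example_ms_to_timeval` (the actual usnic code is not shown in this log):

```c
#include <assert.h>
#include <sys/time.h>

/* Convert a timeout expressed in milliseconds into a struct timeval,
 * whose tv_usec field is expressed in microseconds. */
static struct timeval example_ms_to_timeval(long timeout_ms)
{
    struct timeval tv;

    assert(timeout_ms >= 0);
    tv.tv_sec  = timeout_ms / 1000;           /* whole seconds */
    tv.tv_usec = (timeout_ms % 1000) * 1000;  /* remainder, ms -> us */
    return tv;
}
```

Assigning the millisecond value directly to tv_usec would make the timeout a thousand times shorter than intended.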
Ralph Castain
ab4f8585b0 When we abort during MPI_Init, we currently emit a totally incorrect error message stating that we were unable to aggregate error messages and cannot guarantee all other processes were killed. This simply isn't true IF the rte has been initialized.
So track that the rte has reached that point, and only emit the new message if it is accurate.

Note that we still generate a TON of output for a minor error:

Ralphs-iMac:examples rhc$ mpirun -n 3 -mca btl sm ./hello_c
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[50239,1],2]) is on host: Ralphs-iMac
  Process 2 ([[50239,1],2]) is on host: Ralphs-iMac
  BTLs attempted: sm

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
--------------------------------------------------------------------------
MPI_INIT has failed because at least one MPI process is unreachable
from another.  This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used.  Your MPI job will now abort.

You may wish to try to narrow down the problem;

 * Check the output of ompi_info to see which BTL/MTL plugins are
   available.
 * Run your application with MPI_THREAD_SINGLE.
 * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
   if using MTL-based communications) to see exactly which
   communication plugins were considered and/or discarded.
--------------------------------------------------------------------------
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[50239,1],2]
  Exit code:    1
--------------------------------------------------------------------------
[Ralphs-iMac.local:23227] 2 more processes have sent help message help-mca-bml-r2.txt / unreachable proc
[Ralphs-iMac.local:23227] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[Ralphs-iMac.local:23227] 2 more processes have sent help message help-mpi-runtime / mpi_init:startup:pml-add-procs-fail
Ralphs-iMac:examples rhc$ 

Hopefully, we can agree on a way to reduce this verbiage!

This commit was SVN r31686.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2014-05-08 15:48:16 +00:00
Ralph Castain
76f5991ab2 Couple of minor fixes
This commit was SVN r31680.
2014-05-08 02:26:45 +00:00
Ralph Castain
11faab1091 The final step of the RFC: convert the <foo>libdir and friends to fit their respective code areas, and equate them all at the top. Note that we can't entirely separate things as the opal_install_dirs framework can't handle separated locations for the various trees.
This commit was SVN r31679.
2014-05-08 02:01:35 +00:00
Ralph Castain
a8e2d6c3a6 The bulk of the remaining renaming changes, in one final glorious "blob". Thanks to Jeff for some help chasing down a few spots. Per chat with Jeff, we decided to cleanup a few things that were historical in nature:
top_ompi_srcdir  ->  OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR

We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.

The only thing left is ompilibdir being treated similarly to what we did for srcdir/builddir. Coming soon.

This commit was SVN r31678.
2014-05-07 21:48:53 +00:00
Ralph Castain
c5d64a22df Fix romio configure to look for the updated OMPI support file name
This commit was SVN r31670.
2014-05-07 03:19:45 +00:00
Ralph Castain
fdfb331e13 Per RFC, continue the renaming process
This commit was SVN r31663.
2014-05-06 20:53:55 +00:00
Ralph Castain
2b7a3ae601 Per RFC, continue pecking away at the build system renaming
OMPI_CONFIG_SUBDIR  -> OPAL_CONFIG_SUBDIR
   OMPI_CONFIG_SUBDIR_ARGS  ->  OPAL_CONFIG_SUBDIR_ARGS

This commit was SVN r31647.
2014-05-06 16:27:38 +00:00
Devendar Bureddy
dfaac7d29d Do not call into hcoll progress after MPI_Finalize
Reviewed by Mike
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31639.
2014-05-05 22:46:39 +00:00
Ralph Castain
29609577d5 Per RFC:
ompi_show_title  -> opal_show_title
    ompi_show_subtitle -> opal_show_subtitle

This commit was SVN r31638.
2014-05-05 22:35:23 +00:00
Ralph Castain
4def94900a Per RFC: OMPI_INSTALL_BINARIES -> OPAL_INSTALL_BINARIES
This commit was SVN r31634.
2014-05-05 21:43:05 +00:00
Jeff Squyres
10e8ab493e btl_usnic_mca.c: Increase default connectivity checker frequency
In abusive MPI communication patterns, sending a UDP ping only once a
second may not be sufficient -- all the UDP pings may be dropped.  So
increase the frequency of the pings to every quarter second, and allow
more total pings to be sent.

Total timeout time is still the same (10 seconds) -- we'll just now
try 40 times (i.e., once every quarter second) as opposed to 10 times
(i.e., once a second).  Testing has shown that this frequency allows
the connectivity checker to always succeed even in the many-to-one
abusive communication patterns.

cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r31602.
2014-05-02 11:06:18 +00:00
Jeff Squyres
bf82ee2a14 btl_usnic_connectivity.h: fix PACK_BYTES macro
We're passing a char foo[x] into PACK_BYTES, so we don't need to take
its address in the macro.  This is parallel to the UNPACK_BYTES macro
(where we pass a char bar[x] into it, and don't take its address in
the macro).

The value we're packing is only used to output in a show_help message,
which is why this wasn't noticed before (i.e., it's not used in
network or addressing that would have caused a failure).

cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r31594.
2014-05-01 22:23:22 +00:00
Yossi Etigin
6aa5680059 Revert r30966.
cmr=v1.8.1:reviewer=ompi-gk1.8

This commit was SVN r31593.

The following SVN revision numbers were found above:
  r30966 --> open-mpi/ompi@280e96c99a
2014-05-01 22:17:09 +00:00
Ralph Castain
0c74d1fd6f Silence warning
This commit was SVN r31592.
2014-05-01 21:11:39 +00:00
Jeff Squyres
49383f0aaa Oops: remove errant "v4" string.
This commit was SVN r31591.
2014-05-01 20:21:43 +00:00
Jeff Squyres
56ecb92b10 Per discussion with George and Ralph, change this BTL_ERROR message to
an opal_show_help() so that its output is deduplicated.

This commit was SVN r31590.
2014-05-01 20:15:33 +00:00
Jeff Squyres
0fac9781b3 Assume we always have fortran PROCEDURE support
Per #4590, we now ''require'' the PROCEDURE keyword support in Fortran
for the mpi_f08 module.  So if the Fortran compiler doesn't support
it, then we won't build the mpi_f08 module.

Fixes trac:4590

This commit was SVN r31588.

The following Trac tickets were found above:
  Ticket 4590 --> https://svn.open-mpi.org/trac/ompi/ticket/4590
2014-05-01 18:18:38 +00:00
Ralph Castain
e20dae536c Last step under current RFC: OMPI_CHECK_WITHDIR -> OPAL_CHECK_WITHDIR
This commit was SVN r31585.
2014-05-01 15:38:07 +00:00
Ralph Castain
e11eb15518 Next step of RFC: OMPI_CHECK_FUNC_LIB -> OPAL_CHECK_FUNC_LIB
This commit was SVN r31583.
2014-05-01 14:57:43 +00:00
Ralph Castain
3b64c603b4 First stage of RFC to rename OMPI_foo build system support: change OMPI_CHECK_PACKAGE -> OPAL_CHECK_PACKAGE
This commit was SVN r31582.
2014-05-01 14:24:56 +00:00
Jeff Squyres
c4d85ec6ca btl_usnic_cclient.c: update to use the new opal dstore
Use the new opal dstore API (vs. the old RTE DB API).

(dstore is not going to the v1.8 series, so there's no need to CMR
this to v1.8)

This commit was SVN r31580.
2014-04-30 22:32:47 +00:00
Nathan Hjelm
e963869fdf bcol/basesmuma: close mmapped file descriptor
Not closing this file descriptor will cause us to leak file
descriptors. It is safe to close the file after it has been mmapped.

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31579.
2014-04-30 22:28:08 +00:00
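The point made in e963869fdf, that a file descriptor can be closed as soon as the file has been mapped, follows from POSIX mmap semantics: the mapping holds its own reference to the file. A minimal sketch, assuming hypothetical helper names and a temporary file under /tmp (neither comes from bcol/basesmuma):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first len bytes of path read-only and close the descriptor
 * right away. Returns MAP_FAILED on error. */
static void *example_map_and_close(const char *path, size_t len)
{
    assert(path != NULL && len > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return MAP_FAILED;
    }
    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);   /* safe: the mapping stays valid without the descriptor */
    return addr;
}

/* Self-check: write a small file, map it, and read it back through the
 * mapping after the descriptor is gone. Returns 0 on success. */
static int example_map_demo(void)
{
    char path[] = "/tmp/example_mmap_XXXXXX";
    const char msg[] = "control";

    int fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    ssize_t written = write(fd, msg, sizeof msg);
    close(fd);
    if (written != (ssize_t)sizeof msg) {
        unlink(path);
        return -1;
    }

    void *addr = example_map_and_close(path, sizeof msg);
    unlink(path);
    if (addr == MAP_FAILED) {
        return -1;
    }
    int same = (memcmp(addr, msg, sizeof msg) == 0);
    munmap(addr, sizeof msg);   /* the mapping still has to be released */
    return same ? 0 : -1;
}
```

Closing the descriptor does not release the mapping; a matching munmap is still required, which is the other half of the leak addressed in c13c21d476.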
Jeff Squyres
d40112a012 rte_base_frame.c: add sanity check to ensure proper sizes
There's a requirement in several places (e.g., opal dstore) that
sizeof(ompi_process_name_t) -- which comes from the compile-time
selected ompi/mca/rte component -- is equal to sizeof(uint64_t).  If
it's not, Bad Things will happen.

So put an assert here to catch that case.

This commit was SVN r31577.
2014-04-30 22:12:54 +00:00
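The size requirement checked in d40112a012 matters because code such as the dstore treats a process name as a 64-bit key. A hedged sketch of that kind of check, using a hypothetical `example_process_name_t` made of two 32-bit fields (the real ompi_process_name_t comes from the selected rte component and may differ):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical process name: two 32-bit identifiers. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} example_process_name_t;

/* Reinterpret a name as a 64-bit key; only valid if the sizes match. */
static uint64_t example_name_to_key(const example_process_name_t *name)
{
    uint64_t key;

    assert(sizeof(*name) == sizeof(key));   /* the sanity check */
    memcpy(&key, name, sizeof(key));
    return key;
}

/* Inverse of example_name_to_key. */
static example_process_name_t example_key_to_name(uint64_t key)
{
    example_process_name_t name;

    assert(sizeof(name) == sizeof(key));
    memcpy(&name, &key, sizeof(name));
    return name;
}
```

If the two sizes ever differed, the memcpy calls would either truncate the name or read past it, which is the kind of "Bad Thing" the assert is meant to catch early.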
Nathan Hjelm
a28012b29d Fix MPI_T issues identified by friendly users.
Several fixes:

 - I was allowing an MPI_T_cvar_handle to be created for an invalid
   variable. Fixed this by checking if the variable is valid in
   mca_base_var_get.

 - Use a better error code when the caller tries to create an unbound
   pvar handle for a bound variable.

 - Return the verbosity level in MPI_T_cvar_get_info.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31576.
2014-04-30 22:10:30 +00:00
Nathan Hjelm
d80f14eb0f sbgp/ptp: fix obvious typo
cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31575.
2014-04-30 22:10:22 +00:00
Nathan Hjelm
3e5388eaa6 mtl/psm: do not limit PSM to 8191 context ids
The old default context id maximum was committed to the trunk in
2006. After some discussion with Intel it appears this is restricting
the mtl to an arbitrarily small number of communicators. Increasing the
default to allow up to 2^16 - 1 context ids.

Refs trac:4574

cmr=v1.8.2

This commit was SVN r31574.

The following Trac tickets were found above:
  Ticket 4574 --> https://svn.open-mpi.org/trac/ompi/ticket/4574
2014-04-30 22:10:15 +00:00
Ralph Castain
087b84b0ef Add some further debug to the dstore framework. When doing comm_spawn, we have to exchange any provided cpu bitmaps to ensure both sides compute the same locality, else various mpi frameworks can go bonkers.
This commit was SVN r31572.
2014-04-30 19:29:00 +00:00
Ralph Castain
e72af03e60 Fix typo covered by enable-heterogeneous
This commit was SVN r31567.
2014-04-30 15:41:58 +00:00
Ralph Castain
c4c9bc1573 As per the RFC:
http://www.open-mpi.org/community/lists/devel/2014/04/14496.php

Revamp the opal database framework, including renaming it to "dstore" to reflect that it isn't a "database". Move the "db" framework to ORTE for now, soon to move to ORCM

This commit was SVN r31557.
2014-04-29 21:49:23 +00:00
Rolf vandeVaart
fc0a75da91 Fix help message errors as reported by check-help-strings.pl script.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31555.
2014-04-29 20:29:18 +00:00
Jeff Squyres
964249e8a6 comm_cid.c: Ensure that "flag" is initially set to false.
If the loops never get executed because CIDs are exhausted, then the
value of flag will be undefined.

Refs trac:4572

This commit was SVN r31546.

The following Trac tickets were found above:
  Ticket 4572 --> https://svn.open-mpi.org/trac/ompi/ticket/4572
2014-04-29 17:39:14 +00:00
Nathan Hjelm
2f5b1ca4cf osc/rdma: do not leak the receive request
This commit fixes a bug that can cause request and communicator leaks
when cleaning up an OSC window. This should prevent a hang seen with
IMB-EXT.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31539.
2014-04-28 19:55:18 +00:00
Nathan Hjelm
e410401523 comm: detect if we run out of communicator ids (cids)
Due to a leak in the osc/rdma component we were running out of cids on
the one-sided tests. This resulted in a hang instead of an error. This
commit causes the nextcid algorithm to return an error if we run out
of cids.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31538.
2014-04-28 19:55:09 +00:00
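The two comm_cid changes above (initialising "flag" in 964249e8a6 and returning an error on exhaustion in e410401523) can be sketched together. This is not the nextcid algorithm itself, which negotiates ids across processes; it is a single-process illustration with a hypothetical fixed-size table and hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define EXAMPLE_MAX_CIDS 8
#define EXAMPLE_ERR_OUT_OF_RESOURCE (-1)

/* Hypothetical table of communicator ids currently in use. */
static bool example_cid_in_use[EXAMPLE_MAX_CIDS];

/* Return the lowest free cid at or above start, or an error code when
 * none is left, so the caller can fail instead of hanging. */
static int example_next_cid(int start)
{
    bool found = false;   /* defined even if the loop body never runs */
    int cid = EXAMPLE_ERR_OUT_OF_RESOURCE;

    assert(start >= 0);
    for (int i = start; i < EXAMPLE_MAX_CIDS; ++i) {
        if (!example_cid_in_use[i]) {
            example_cid_in_use[i] = true;
            cid = i;
            found = true;
            break;
        }
    }
    return found ? cid : EXAMPLE_ERR_OUT_OF_RESOURCE;
}
```

When start is already past the end of the table the loop body never executes, which is the case where an uninitialised flag would otherwise be read.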
Nathan Hjelm
626b521e9c pml/ob1: fix heterogeneous support when using the send_inline optimization
We will track #4568 from the 1.8 CMR.

Closes trac:4568

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31535.

The following Trac tickets were found above:
  Ticket 4568 --> https://svn.open-mpi.org/trac/ompi/ticket/4568
2014-04-28 17:36:26 +00:00
Jeff Squyres
64c1228b55 Roll back r31519 and r31521: George convinced us that these approaches
weren't right.

This commit was SVN r31528.

The following SVN revision numbers were found above:
  r31519 --> open-mpi/ompi@b449c750b7
  r31521 --> open-mpi/ompi@e243805ed8
2014-04-24 20:27:03 +00:00
Nathan Hjelm
c9a257f1a0 btl/ugni: always buffer sendi fragments
This commit will improve the message rate when using the sendi function
by not waiting for the send to get to the remote process.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31526.
2014-04-24 18:50:29 +00:00
Jeff Squyres
871e20cd4b MPI_Alltoallv.3in: fix typo
Fix minor typo reported by Xuankang Lin.

cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r31525.
2014-04-24 18:14:42 +00:00
George Bosilca
17b3c7e906 Fix the issue reported by Gilles Gouaillardet regarding the
MPI_PROC_NULL persistent requests.

This commit was SVN r31524.
2014-04-24 18:07:09 +00:00
Nathan Hjelm
52f519dacb Allow MPI_MODE_NOPRECEDE | MPI_MODE_NOSUCCEED for MPI_Win_fence
This combination does not make sense but is not explicitly forbidden by
the standard so remove the argument check for this combination.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31523.
2014-04-24 17:36:10 +00:00
Nathan Hjelm
0849d61e38 btl/vader: improve performance under heavy load and eliminate a racy
feature

This commit should fix a hang seen when running some of the one-sided
tests. The downside of this fix is it reduces the maximum size of the
messages that use the fast boxes. I will fix this in a later commit.

To improve performance under a heavy load I introduced sequencing to
ensure messages are given to the pml in order. I have seen little to no
impact on the message rate or latency with this change and there is a
clear improvement to the heavy message rate case.

Let's let this sit in the trunk for a couple of days to ensure that
everything is working correctly.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31522.
2014-04-24 17:36:03 +00:00
Jeff Squyres
e243805ed8 coll tuned alltoallv: correctly handle 0-sized messages with MPI_IN_PLACE
Patch from Gilles Gouaillardet on #4517 to fix handling 0-sized
messages in coll tuned with MPI_ALLTOALLV and MPI_IN_PLACE.

Reviewed by Jeff Squyres.

Fixes trac:4517

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31521.

The following Trac tickets were found above:
  Ticket 4517 --> https://svn.open-mpi.org/trac/ompi/ticket/4517
2014-04-24 16:55:53 +00:00