
7709 Commits

Author SHA1 Message Date
Nathan Hjelm
fd21b244ce osc/rdma: better name for lookup function
cmr=v1.8.2:ticket=trac:4732:reviewer=ompi-rm1.8

This commit was SVN r32021.

The following Trac tickets were found above:
  Ticket 4732 --> https://svn.open-mpi.org/trac/ompi/ticket/4732
2014-06-17 19:49:17 +00:00
Nathan Hjelm
390f8f52b4 osc/rdma: clean up group process matching a bit
cmr=v1.8.2:ticket=trac:4732:reviewer=dgoodell

This commit was SVN r32018.

The following Trac tickets were found above:
  Ticket 4732 --> https://svn.open-mpi.org/trac/ompi/ticket/4732
2014-06-17 17:48:30 +00:00
Nathan Hjelm
7f20868179 osc/rdma: ensure matching of post/start calls
The post and start window calls are supposed to be matching. The code
did not check to see that an incoming post matched with the start call.
This commit fixes the bug by placing the post on a pending list that
will be checked by the next call to start.
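The pending-list approach described above can be sketched in Python. This is a hypothetical model, not the osc/rdma code. PscwWindow, on_post_received and try_start are illustrative names. Posts are counted per source rank, so a post that belongs to a later epoch is parked rather than consumed early.

```python
from collections import Counter


class PscwWindow:
    """Hypothetical model of one origin's post/start matching state."""

    def __init__(self):
        # Post notifications that have arrived but not yet been matched.
        self.pending_posts = Counter()

    def on_post_received(self, source_rank):
        # Never assume the post belongs to the current epoch; park it.
        self.pending_posts[source_rank] += 1

    def try_start(self, group):
        """Consume one pending post from every rank in `group`.

        Returns True when all required posts are present. Otherwise nothing
        is consumed and the caller is expected to retry later.
        """
        if all(self.pending_posts[rank] > 0 for rank in group):
            for rank in group:
                self.pending_posts[rank] -= 1
            return True
        return False
```

Keeping unmatched posts in the pending list means an early post from an unrelated epoch can no longer satisfy the wrong start call.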

cmr=v1.8.2:reviewer=dgoodell

This commit was SVN r32017.
2014-06-17 15:23:06 +00:00
Nathan Hjelm
927098d567 osc/rdma: fix hang when accumulating with MPI_REPLACE
The replace callback did not increment the incoming frag counter. This
leads to a hang during synchronization. This commit adds the increment
and also puts the request on the garbage collection list to fix a leak.
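A tiny counting model shows why a missing increment turns into a hang. In this hypothetical Python sketch, the origin announces how many fragments it sent. The target treats synchronization as complete only when its incoming counter matches that number. The names are illustrative and do not come from osc/rdma.

```python
def sync_would_complete(fragment_ops, replace_increments=True):
    """Return True if the target's incoming-fragment counter reaches the
    count announced by the origin.

    A real target would keep waiting instead of returning False, which is
    what the hang looks like in practice.
    """
    expected = len(fragment_ops)  # count announced by the origin at sync time
    received = 0
    for op in fragment_ops:
        if op == "MPI_REPLACE" and not replace_increments:
            continue  # the bug: the replace callback skipped the increment
        received += 1
    return received == expected
```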

This fixes a hang found when running the mpich test suite.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32016.
2014-06-17 14:53:29 +00:00
Nathan Hjelm
7f6de57653 osc/rdma: fix accumulate fragment size calculation
The wrong type was used when calculating the amount of space needed
for an accumulate fragment. Fixed the calculation and took the
opportunity to eliminate the get_acc header as it is identical to the
acc header.

This fixes trac:4719 and #4718

Tracking these fixes for 1.8.2 in this CMR.

Throwing this to Brad for review as he is the one who ran into the issue.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32015.

The following Trac tickets were found above:
  Ticket 4719 --> https://svn.open-mpi.org/trac/ompi/ticket/4719
2014-06-17 14:53:24 +00:00
Gilles Gouaillardet
e9ed9def02 Fix MPI_Alltoallv in coll/tuned
This changeset:
- always calls the low-level implementation for:
  * MPI_Alltoallv
  * MPI_Neighbor_alltoallv
  * MPI_Alltoallw
  * MPI_Neighbor_alltoallw
- fix mca_coll_tuned_alltoallv_intra_basic_inplace
  so zero size types are correctly handled

cmr=v1.8.2:reviewer=bosilca:ticket=4715

This commit was SVN r32013.

The following Trac tickets were found above:
  Ticket 4715 --> https://svn.open-mpi.org/trac/ompi/ticket/4715
2014-06-17 06:11:34 +00:00
Nathan Hjelm
2f96f16416 osc/rdma: ensure eager sends are active before checking for sync errors
in self optimization

This addresses an issue found with the MPICH pscw_ordering test. Eager sends
were not yet active, which is OK for the standard path but not for the
self optimization. Fixed by waiting for all post messages before checking
the sync state.

Fixes trac:4724

Tracking the 1.8.2 issue in this CMR.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32012.

The following Trac tickets were found above:
  Ticket 4724 --> https://svn.open-mpi.org/trac/ompi/ticket/4724
2014-06-17 04:53:47 +00:00
Nathan Hjelm
37ae430424 rma: fix locking/unlocking of MPI_PROC_NULL
It is valid to lock/unlock MPI_PROC_NULL. It probably isn't worth tracking
whether MPI_PROC_NULL is locked for MPI_PROC_NULL RMA operations, so this
is probably the permanent solution.

Closes trac:4720

Tracking the 1.8.2 issue with this CMR.

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32011.

The following Trac tickets were found above:
  Ticket 4720 --> https://svn.open-mpi.org/trac/ompi/ticket/4720
2014-06-17 04:41:49 +00:00
Nathan Hjelm
41f0059f1e osc/sm: use an unsigned long when calculating the total segment size
Brad correctly pointed out that the total window size should not be an
int. Changed it to an unsigned long.
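The overflow is easy to reproduce outside Open MPI. The sketch below uses Python's ctypes to sum four hypothetical 1 GiB per-process segments. ctypes performs no overflow checking and silently truncates to the type's width. That mimics the wraparound usually observed in practice, although signed overflow in C is formally undefined behavior.

ctypes.c_uint64 stands in for unsigned long on LP64 platforms such as 64-bit Linux. Where unsigned long is only 32 bits wide, the same sum would still wrap.

```python
import ctypes

GiB = 1 << 30


def total_segment_size(sizes, ctype):
    """Accumulate per-process segment sizes in a fixed-width C integer type."""
    total = ctype(0)
    for size in sizes:
        # ctypes truncates silently, so a 32-bit int accumulator wraps around.
        total = ctype(total.value + size)
    return total.value


print(total_segment_size([GiB] * 4, ctypes.c_int))     # wraps to 0
print(total_segment_size([GiB] * 4, ctypes.c_uint64))  # 4294967296
```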

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32010.
2014-06-17 04:33:43 +00:00
Nathan Hjelm
6ec9c6c422 osc/sm: return ompi_request_empty for all request ops
Only one field is valid for RMA requests: MPI_ERROR. This field is set
to the correct value in ompi_request_empty so there is no reason to
allocate and keep track of osc/sm requests because they are always
complete on return. Since we are no longer using the osc/sm request
structure or free list they are now removed.
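The design reduces to returning one shared, already-complete request object. Below is a hypothetical Python model, not the osc/sm code. RmaRequest, REQUEST_EMPTY and sm_rput are illustrative names, and a bytearray plays the role of the shared-memory window. MPI_SUCCESS is 0, as the MPI standard requires.

```python
from dataclasses import dataclass

MPI_SUCCESS = 0  # MPI guarantees MPI_SUCCESS == 0


@dataclass(frozen=True)
class RmaRequest:
    complete: bool
    mpi_error: int  # the only field that is meaningful for RMA requests


# One immutable, already-complete request shared by every caller.
REQUEST_EMPTY = RmaRequest(complete=True, mpi_error=MPI_SUCCESS)


def sm_rput(window, offset, data):
    """Request-based put on shared memory: the store finishes before returning."""
    window[offset:offset + len(data)] = data
    return REQUEST_EMPTY  # nothing to allocate, track, or free later
```

Because the request is immutable and always complete, it can be shared freely without a free list.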

Closes trac:4723

Tracking this issue with the CMR. Brad, can you verify the issue is indeed fixed?

cmr=v1.8.2:reviewer=bbenton

This commit was SVN r32009.

The following Trac tickets were found above:
  Ticket 4723 --> https://svn.open-mpi.org/trac/ompi/ticket/4723
2014-06-17 04:27:02 +00:00
George Bosilca
84193fff6d More comprehensible error messages.
This commit was SVN r32007.
2014-06-16 20:23:16 +00:00
George Bosilca
542e4996a7 Cleanup the utilities functions in tuned.
This commit was SVN r31987.
2014-06-13 16:04:45 +00:00
Gilles Gouaillardet
50256c62c5 Fix MPI_Alltoallv in coll/tuned.
Correctly handle the corner case in MPI_Alltoallv when
some tasks have no data to transfer and some other tasks
do have data to transfer.

This test case is covered in ibm/collective/alltoallv_somezeros
from the ompi-tests repo.

cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31985.
2014-06-13 06:03:23 +00:00
Mike Dubman
b51a42aeca MXM: fix mxm cleanup, should be called for any compat API
fixed by miked, reviewed by yossi

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31984.
2014-06-12 15:46:38 +00:00
George Bosilca
fd0e1b7261 If we detect an error on a request that has been already released
at the MPI level, we should call abort on MPI_COMM_WORLD.

Fixes ticket #1943.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31982.
2014-06-10 16:24:13 +00:00
Alina Sklarevich
7b8ad47e93 MXM: fix env variable name to hint for thread usage in mxm
reviewed by MikeD
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31968.
2014-06-09 06:40:32 +00:00
George Bosilca
f5ebd2faeb Fix the Fortran issue identified by Akan Sang Loon. The dist graph
is really special as the weights can be one of the following three
values (NULL, EMPTY or some legal value). As such, we need a complex
if statement to correctly convert the Fortran value to the corresponding C
value. Thus, always defining the c_ array is the simplest and most
straightforward approach.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31955.
2014-06-05 17:10:48 +00:00
Ralph Castain
248a4b100f Per Artem, we don't know our VPID at the time of getting the initial timing mark, so just get it if timing is requested
This commit was SVN r31951.
2014-06-04 16:28:41 +00:00
Jeff Squyres
6cf49a3c57 Revert r31934: that MPI_Type_f2c.3 is already in the file list.
This commit was SVN r31950.

The following SVN revision numbers were found above:
  r31934 --> open-mpi/ompi@3ed4aaea99
2014-06-04 13:58:42 +00:00
Oscar Vega-Gisbert
c7b229f03e Java - slice methods: set buffer limit to its capacity before changing its position
This commit was SVN r31943.
2014-06-03 21:32:27 +00:00
Jeff Squyres
3ed4aaea99 man pages: Add MPI_Type_f2c.3 file into the tarball
cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31934.
2014-06-03 18:28:50 +00:00
Jeff Squyres
87f9f6815f use-mpi-tkr: fix ierr->ierror param names
Issue noted by Walter Spector on the user's mailing list.

Throwing to Craig Rasmussen for review.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31933.
2014-06-03 15:26:02 +00:00
Jeff Squyres
82d4e33510 usnic: let connectivity client timeout if agent never appears
This would be a really, really weird case if it ever happens (i.e.,
you have usnics but the agent process failed somewhere in MPI_INIT
such that the agent never appears), but having an infinite loop
doesn't seem like a good idea.
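Bounding the wait is the usual fix for this kind of loop. The Python sketch below polls until a monotonic deadline passes and then gives up instead of spinning forever. The agent_ready() probe and the caller-chosen timeout are hypothetical and are not taken from the usnic BTL.

```python
import time


def wait_for_agent(agent_ready, timeout_s, poll_interval_s=0.01):
    """Poll agent_ready() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while not agent_ready():
        if time.monotonic() >= deadline:
            return False  # agent never appeared; let the caller report an error
        time.sleep(poll_interval_s)
    return True
```

A monotonic clock is used so that wall-clock adjustments cannot stretch or shorten the timeout.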

(does not need to go to v1.8 because v1.8 still uses RML for
communication for the connectivity checker)

This commit was SVN r31932.
2014-06-02 18:14:35 +00:00
Jeff Squyres
af04d60098 usnic: do not enable the connectivity agent if there are no usnics
Reviewed by Dave Goodell.

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31931.
2014-06-02 18:09:55 +00:00
Gilles Gouaillardet
1a17a2a960 Fixes MEMCHECKER vs MPI_IN_PLACE in *alltoall*
cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31924.
2014-06-02 04:43:48 +00:00
Gilles Gouaillardet
c930e44bad Fetch info from both opal_dstore_nonpeer and opal_dstore_peer
This conservative fix tries to fetch info from both
opal_dstore_nonpeer and opal_dstore_peer.
This is required if task A spawns tasks B and C.
B was previously unable to find info from C; this caused locality
info not to be set and a hang in coll/ml init.
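The two-store lookup amounts to a simple fallback. This hypothetical Python sketch models each dstore as a dict keyed by (process, key). It does not use the real opal_dstore API. The lookup order, non-peer first and then peer, follows the order in the commit title and is an assumption.

```python
def fetch_modex_value(key, proc, nonpeer_store, peer_store):
    """Look up `key` for `proc` in the non-peer store, then the peer store."""
    for store in (nonpeer_store, peer_store):
        value = store.get((proc, key))
        if value is not None:
            return value
    return None  # absent from both stores: "not found", not an error


# Example: B looks up locality info for C, which only the non-peer store holds.
nonpeer = {("C", "locality"): "node0"}
peer = {("B", "locality"): "node1"}
print(fetch_modex_value("locality", "C", nonpeer, peer))  # node0
```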

no CMR is required since v1.8 uses a unique dstore

This commit was SVN r31923.
2014-06-02 02:34:30 +00:00
Gilles Gouaillardet
d1bcd103ac btl/openib : correctly handle eager rdma buffers in mca_btl_openib_del_procs
If eager RDMA is used, the endpoint reference_count is greater than one.
This commit is a temporary fix that OBJ_RELEASEs the endpoint as many times as needed.
Though this is likely correct, it can be suboptimal and hence needs to be reviewed.

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r31922.
2014-06-02 02:23:52 +00:00
Ralph Castain
9305756276 Per RFC:
http://www.open-mpi.org/community/lists/devel/2014/05/14838.php

Remove stale component

This commit was SVN r31917.
2014-06-01 17:03:03 +00:00
Ralph Castain
8736a1c138 Per RFC:
http://www.open-mpi.org/community/lists/devel/2014/05/14822.php

Revamp the ORTE global data structures to reduce memory footprint and add new features. Add ability to control/set cpu frequency, though this can only be done if the sys admin has setup the system to support it (or you run as root).

This commit was SVN r31916.
2014-06-01 16:14:10 +00:00
Ralph Castain
cf2c7381d0 Replace the PML barrier with an RTE barrier for now until we can come up with a better solution for connectionless BTLs.
Refs trac:4643

This commit was SVN r31915.

The following Trac tickets were found above:
  Ticket 4643 --> https://svn.open-mpi.org/trac/ompi/ticket/4643
2014-06-01 16:08:56 +00:00
Oscar Vega-Gisbert
cc219218a7 Java: update javadoc's install locations
This commit was SVN r31914.
2014-06-01 15:40:14 +00:00
Alina Sklarevich
f8a664f5ec MXM: generate the jobid only for MXM versions under v2.0.
reviewed by miked
cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31910.
2014-06-01 13:29:24 +00:00
Ralph Castain
1107f9099e Per the RFC issued here:
http://www.open-mpi.org/community/lists/devel/2014/05/14827.php

Refactor PMI support

This commit was SVN r31907.
2014-06-01 04:28:17 +00:00
Jeff Squyres
a1485569b9 If we don't find an OPAL dstore key (via modex), it's not an error --
we just didn't find it.  So don't ORTE_ERROR_LOG it.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31906.
2014-05-31 12:02:28 +00:00
Rolf vandeVaart
489664b4a9 Remove debug code.
This commit was SVN r31901.
2014-05-28 16:06:16 +00:00
Rolf vandeVaart
570e313c9b Add collective module to handle CUDA aware buffers and reductions. Per RFC sent last week.
Reviewed by bosilca.

This commit was SVN r31894.
2014-05-27 21:24:43 +00:00
Nathan Hjelm
2614dfc4bf bcol/basesmuma: fix remaining memory leaks in basesmuma
We were still leaking 1) file descriptors for data files, and 2) some
control files. I fixed both of these leaks and everything is looking
good. This should fix the bug where we are running out of file
descriptors when running the loop_spawn test. I also took the
opportunity to refactor the code a bit to make the mapping/unmapping
simpler. This should help avoid these sorts of issues in the future.

Depends on #4678

cmr=v1.8.2:reviewer=manjugv

This commit was SVN r31893.
2014-05-27 18:40:41 +00:00
Ralph Castain
f840013b41 Correct typo spotted by Gilles
This commit was SVN r31892.
2014-05-27 17:14:01 +00:00
Mike Dubman
7b05b5c4c2 HCOLL: use proper parameter in progress unregister
fixed by Nadezhda, reviewed by Elena/MikeD

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r31891.
2014-05-26 09:03:30 +00:00
Gilles Gouaillardet
d04db1e213 Fix mmap flags in bcol_basesmuma_smcm_reg_mmap
if in_ptr is NULL, the MAP_FIXED flag cannot be passed to mmap

this caused a hang in topology/cart and topology/sub from ibm
test suite on trunk.

cmr=v1.8.2:reviewer=hjelmn

This commit was SVN r31890.
2014-05-26 07:18:31 +00:00
Gilles Gouaillardet
75616320b5 remove all message output from ompi/communicator/comm_helpers.c
Thanks George for pointing out.

cmr=v1.8.2:reviewer=bosilca:ticket=4676

This commit was SVN r31889.

The following Trac tickets were found above:
  Ticket 4676 --> https://svn.open-mpi.org/trac/ompi/ticket/4676
2014-05-26 01:54:24 +00:00
Gilles Gouaillardet
40f3b849eb Fix argument checks for [i]neighbor_alltoall{v|w}
This fixes a bug introduced in:
 - r31815 (trunk)
 - r31853 (v1.8 branch)

cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31888.

The following SVN revision numbers were found above:
  r31815 --> open-mpi/ompi@8bafe06c57
  r31853 --> open-mpi/ompi@bff944d766
2014-05-23 08:19:17 +00:00
Gilles Gouaillardet
de15b623b5 do not increment/decrement the opal_progress_event_state.
Per Ralph:
"I noticed that we are incrementing and decrementing the opal_progress_event state.
However, this no longer has any impact whatsoever on the RML as that is running in
the independent ORTE event thread. So all this actually does is impact the MPI layer
by adding an unnecessary overhead."

Thanks Ralph for pointing this out :-)

cmr=v1.8.2:reviewer=rhc:ticket=4671

This commit was SVN r31887.

The following Trac tickets were found above:
  Ticket 4671 --> https://svn.open-mpi.org/trac/ompi/ticket/4671
2014-05-23 05:02:31 +00:00
Gilles Gouaillardet
4843909a6b correctly allocate cart in mca_topo_base_cart_sub(...)
since r31716 mca_topo_base_comm_cart_2_2_0_t is an object
and must be allocated/freed with OBJ_NEW/OBJ_RELEASE.

this fixes topology/cart_sub_zero from the ibm test suite.

v1.8 does not use objects, so no cmr for this branch

This commit was SVN r31883.

The following SVN revision numbers were found above:
  r31716 --> open-mpi/ompi@e3df77548d
2014-05-23 03:07:46 +00:00
George Bosilca
66e91f3797 Remove a warning on p.
And correctly handle the reference count from OBJ_NEW in the error
path (thanks Gilles for noticing).

This commit was SVN r31877.
2014-05-22 05:57:21 +00:00
George Bosilca
6ed1ac032e Release the buffer in all error cases and add small code cleanups.
This commit was SVN r31876.
2014-05-22 05:17:35 +00:00
Gilles Gouaillardet
5e9347bfb4 Fix deallocation error in mca_common_sm_rml_info_bcast()
The buffer is sent (num_local_procs-1) times, so it should be
OBJ_RETAIN'ed (num_local_procs-2) times.
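The retain count follows from OBJ_NEW already handing out one reference. The Python sketch below makes the arithmetic concrete. RefCountedBuffer is a hypothetical stand-in for an Open MPI object, not the real OBJ_* macros. The sketch assumes each completed send drops exactly one reference and that there are at least two local processes.

```python
class RefCountedBuffer:
    """Hypothetical stand-in for an OBJ_NEW'ed buffer (refcount starts at 1)."""

    def __init__(self):
        self.refcount = 1
        self.freed = False

    def retain(self):
        self.refcount += 1

    def release(self):
        assert not self.freed, "released an already-freed buffer"
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True


def bcast_to_local_peers(buf, num_local_procs):
    """Send `buf` to every other local process (assumes num_local_procs >= 2)."""
    sends = num_local_procs - 1   # every local peer except ourselves
    for _ in range(sends - 1):    # OBJ_NEW already supplied one reference
        buf.retain()
    for _ in range(sends):        # each send completion drops one reference
        buf.release()
```

With four local processes there are three sends and two retains. The buffer is freed exactly when the last send completes.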

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31874.
2014-05-22 04:53:42 +00:00
Jeff Squyres
af5399ab5b Gah; we need to have a unique barrier ID for the ompi_rte_barrier()
operation.  Ralph will fix shortly.

For the time being, put back the original code...

Refs trac:4669

This commit was SVN r31872.

The following Trac tickets were found above:
  Ticket 4669 --> https://svn.open-mpi.org/trac/ompi/ticket/4669
2014-05-22 00:31:39 +00:00
Jeff Squyres
c860f289ab usnic: re-order teardown sequence now that del_procs() is actually called
Now that the infrastructure is calling BTL del_procs() before the BTL
finalize(), the usnic BTL had to re-order some of its teardown
sequence to avoid assert() failing.

This is part of a larger conversation involving #4669.  Since
MPI_FINALIZE and MPI_COMM_DISCONNECT currently use an
oob/grpcomm-based barrier, the usnic BTL can ''absolutely know'' that
these endpoints and procs will no longer be used.  If the ORTE DPM
goes back to a PML-based barrier, the usnic BTL will need to grow more
complex teardown semantics (a la TCP socket FIN/ACK/FIN_WAIT states).

Refs trac:4669

This commit was SVN r31871.

The following Trac tickets were found above:
  Ticket 4669 --> https://svn.open-mpi.org/trac/ompi/ticket/4669
2014-05-21 23:59:17 +00:00
Jeff Squyres
5aa05f2f1d Potentially temporarily change the barrier in the ORTE DPM component
to be based on grpcomm (i.e., an out-of-band based barrier) rather
than the simplistic PML-based barrier that it currently uses.

This is pending a larger discussion with Nathan and George, but it
will allow the usnic BTL to stop assert()-failing in light of the
recent del_procs() change.

This commit was SVN r31870.
2014-05-21 23:09:01 +00:00