Commit graph

20003 commits

Author SHA1 Message Date
Ralph Castain
06f0a9584b Sync NEWS and README with 1.7.5
This commit was SVN r31124.
2014-03-18 18:42:38 +00:00
Ralph Castain
d82bd5f3cf Sync NEWS and README with 1.7.5
This commit was SVN r31122.
2014-03-18 18:38:38 +00:00
Ralph Castain
543271b9de Set the locality prior to calling add_procs so bozos like Jeff get it at the right time
Refs trac:4411

This commit was SVN r31119.

The following Trac tickets were found above:
  Ticket 4411 --> https://svn.open-mpi.org/trac/ompi/ticket/4411
2014-03-18 17:57:27 +00:00
Ralph Castain
3323c47ab4 Ensure all procs set locality for all remote procs in the multi-way intercomm_create problem
Refs trac:4411

This commit was SVN r31118.

The following Trac tickets were found above:
  Ticket 4411 --> https://svn.open-mpi.org/trac/ompi/ticket/4411
2014-03-18 16:55:15 +00:00
Jeff Squyres
7933de4928 Fix segv when ibv_create_ah fails.
* Ensure that all endpoints[x] values are initialized to NULL
* If ibv_create_ah fails, remove each endpoint from the
  module->all_endpoints list so that the endpoint can be destructed
  properly.

Submitted by Jeff Squyres, reviewed by Dave Goodell.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r31111.
2014-03-18 15:52:55 +00:00
Adrian Reber
62dea2be84 opal-restart: fix variable passing from opal-restart to CRS
opal-restart.c disables crs module selection by setting
crs_base_do_not_select to true using opal_setenv() before
opal_init(). After opal_init() it enables module selection
by changing the variable back to false using opal_setenv().
This does not work anymore as the variables are only read
from the environment during variable registration.
This changes the second opal_setenv() to mca_base_var_set_value()

This commit was SVN r31108.
2014-03-18 15:28:42 +00:00
Mike Dubman
84a6330b27 OSHMEM: SHMEM_API_ macro fix
The verbose level was initialized too early
group shmem mca params together

fixed by Roman, reviewed by Igor/Mike

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r31107.
2014-03-18 15:07:04 +00:00
Ralph Castain
554da83865 Set the locality for remote procs even after a comm_spawn. Ensure we store our own local cpuset upon launch so it will be shared during comm_join.
This provides full locality - i.e., not just node-level, but all the way down to whatever common binding level exists between the procs.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31106.
2014-03-18 14:51:07 +00:00
Jeff Squyres
5efd961149 Remove unnecessary \n's in ML_VERBOSE and ML_ERROR.
Also fixed spelling: IS_NOT_RECHABLE -> IS_NOT_REACHABLE.

Also mark a few places where opal_show_help() should have been used;
Manju will take care of these.

This commit was SVN r31104.
2014-03-18 12:24:32 +00:00
Ralph Castain
0aa23cdc35 Cleanup copy/paste errors to ensure we progress the launch
cmr=v1.7.5:reviewer=rhc

This commit was SVN r31102.
2014-03-18 01:24:49 +00:00
Ralph Castain
9a8d2d9989 Add a missing "break" statement that otherwise causes the fetch of any string object to return an error
cmr=v1.7.5:reviewer=hjelmn

This commit was SVN r31101.
2014-03-18 01:14:34 +00:00
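The missing-`break` bug described in 9a8d2d9989 is a classic `switch` fall-through. The sketch below is not the Open MPI code; the enum, return codes, and `fetch_label` function are hypothetical names chosen only to show how a string case without `break` ends up returning an error.

```c
#include <stddef.h>

/* Hypothetical value types and return codes, for illustration only. */
enum value_type { VALUE_INT, VALUE_STRING };
#define FETCH_OK     0
#define FETCH_ERROR -1

/* Stores a printable label for the type and returns FETCH_OK,
 * or returns FETCH_ERROR for an unknown type. */
static int fetch_label(enum value_type type, const char **label)
{
    int rc = FETCH_OK;

    switch (type) {
    case VALUE_INT:
        *label = "int";
        break;
    case VALUE_STRING:
        *label = "string";
        /* Without this break, control falls into default, which
         * clears the label and sets rc to FETCH_ERROR. */
        break;
    default:
        *label = NULL;
        rc = FETCH_ERROR;
        break;
    }
    return rc;
}
```

Compiling with a fall-through warning enabled (for example `-Wimplicit-fallthrough` in GCC or Clang) is one way to catch this class of error early.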
Nathan Hjelm
3f469d08e7 coll/ml: increase the number of allowed processes in a local reduce and
add checks to see if the bcol module can support allreduce.

cmr=v1.7.5:reviewer=manjugv

This commit was SVN r31096.
2014-03-17 23:10:19 +00:00
Pavel Shamis
fba1edbf14 Removing ml include from bcol_ptpcoll.h.
It is not really required.

This commit was SVN r31095.
2014-03-17 22:58:40 +00:00
Ralph Castain
545ac7dc58 Remove the job_control_forwarding logic as we want *any* signal to go to all members of the process group
Refs trac:4404

This commit was SVN r31094.

The following Trac tickets were found above:
  Ticket 4404 --> https://svn.open-mpi.org/trac/ompi/ticket/4404
2014-03-17 22:45:33 +00:00
Ralph Castain
5a868028a8 Revert r31091 - the functionality didn't disappear, but moved into the MPI layer :-(
This commit was SVN r31093.

The following SVN revision numbers were found above:
  r31091 --> open-mpi/ompi@edf680855e
2014-03-17 22:30:03 +00:00
Ralph Castain
99c9ecaed0 Ensure that we send the specified signal to the entire process group of each member of the pid provided to us. This ensures that any children spawned by our children also see the signal
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31092.
2014-03-17 22:12:15 +00:00
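Commit 99c9ecaed0 describes delivering a signal to the whole process group of a given pid so that grandchildren also receive it. A minimal sketch of that idea using the POSIX calls `getpgid()` and `killpg()` follows; the helper name `signal_process_group` is an assumption for illustration and is not taken from the ORTE source.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700  /* expose getpgid() and killpg() on strict compilers */
#endif
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send `sig` to every process in the process
 * group that `pid` belongs to. Returns 0 on success, -1 on error. */
static int signal_process_group(pid_t pid, int sig)
{
    pid_t pgid = getpgid(pid);  /* look up the group of the target pid */
    if (pgid < 0) {
        return -1;
    }
    /* killpg() delivers the signal to all members of the group,
     * including children spawned by the target's children. */
    return killpg(pgid, sig);
}
```

Passing signal 0 performs only the permission and existence checks, which makes it a safe way to try the helper: `signal_process_group(getpid(), 0)`.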
Ralph Castain
edf680855e Restore locality computation to the nidmap code - don't know how/when it was removed, but that was not good
cmr=v1.7.5:reviewer=hjelmn

This commit was SVN r31091.
2014-03-17 21:59:25 +00:00
Ralph Castain
45196d222b Minor cleanup of the node state definitions - using the enum allows the debuggers to pretty-print the value
This commit was SVN r31090.
2014-03-17 21:27:58 +00:00
Ralph Castain
796dfe5ada Do a little cleanup - only resusage needs the node/proc info, so remove it from the sensor base
This commit was SVN r31089.
2014-03-17 21:26:46 +00:00
Ralph Castain
7bb8dbade6 Extend the regular expression parsing support
This commit was SVN r31088.
2014-03-17 21:25:05 +00:00
Ralph Castain
0257d32eeb There is no OOB component object - it is a simple struct with an opal_list_item_t element at the beginning
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31087.
2014-03-17 21:23:59 +00:00
Ralph Castain
38e02890aa ORTE doesn't care about cxx flags
cmr=v1.8:reviewer=jsquyres

This commit was SVN r31086.
2014-03-17 21:21:54 +00:00
Nathan Hjelm
7ec19358df MCA/base: document that it is valid for the string_value parameter to
an enumerator's mca_base_var_enum_sfv_fn_t to be NULL.

cmr=v1.7.5:ticket=trac:4398:reviewer=ompi-gk1.7

This commit was SVN r31085.

The following Trac tickets were found above:
  Ticket 4398 --> https://svn.open-mpi.org/trac/ompi/ticket/4398
2014-03-17 18:52:54 +00:00
Ralph Castain
f259d50ed7 Fully fix the PMI2 warning - turned out to be larger than originally thought due to the way the function was being handled across multiple files. Properly resolve the problem by not compiling the file if PMI2 is not desired, and then appropriately setting the visibility of the function within the module
Refs trac:4400

This commit was SVN r31084.

The following Trac tickets were found above:
  Ticket 4400 --> https://svn.open-mpi.org/trac/ompi/ticket/4400
2014-03-17 17:36:37 +00:00
Ralph Castain
e152449be4 Silence warning
cmr=v1.7.5:reviewer=ompi-gk1.7

This commit was SVN r31083.
2014-03-17 17:05:24 +00:00
Nathan Hjelm
b9dfe84b05 Fix segmentation fault in handling of boolean variables in mca_base_var_set_value.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31082.
2014-03-17 14:58:30 +00:00
Nathan Hjelm
f92579dce5 coll/ml: fix a case not correctly handled by r31071
In r31071 I modified the logic to not increment the hierarchy level if
no processes were selected by that sbgp. That fixed a problem seen on
systems where we don't support process binding. The problem is there
is a case where we actually did select processes yet the number of
selected processes is 0. We need to increment the hierarchy in this case
as well.

This should fix the segmentation fault found by recent MTT runs. Once
this is committed to 1.7.5 remove the .ompi_ignore's from coll/ml and
bcol/ptpcoll. Tested with ompi-tests/ibm.

cmr=v1.7.5:reviewer=rhc

This commit was SVN r31081.

The following SVN revision numbers were found above:
  r31071 --> open-mpi/ompi@1911d97044
2014-03-15 22:37:28 +00:00
Ralph Castain
b248b27637 Remove a check that prevented mpirun from exiting when it should in the single-node case
Refs trac:4393

This commit was SVN r31080.

The following Trac tickets were found above:
  Ticket 4393 --> https://svn.open-mpi.org/trac/ompi/ticket/4393
2014-03-15 15:25:44 +00:00
Jeff Squyres
34d92315ae Remove extraneous "while(0)".
Oops.

cmr=v1.7.5:ticket=trac:4395

This commit was SVN r31075.

The following Trac tickets were found above:
  Ticket 4395 --> https://svn.open-mpi.org/trac/ompi/ticket/4395
2014-03-14 20:41:54 +00:00
Jeff Squyres
06a58affca Fix minor hwloc memory leak in sbgp/basesmsocket
cmr=v1.8:reviewer=hjelmn

This commit was SVN r31074.
2014-03-14 20:40:12 +00:00
Jeff Squyres
036db91f3d For the love of all that is holy, do not put 1MB arrays on the stack.
This was causing JVMs to run out of stack space, and all manner of
badness ensued.

Instead, use the heap -- that's what it's there for.

cmr=v1.7.5:reviewer=rhc:subject=make coll/ml use the heap for large debug array

This commit was SVN r31073.
2014-03-14 20:39:39 +00:00
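The stack-versus-heap point in 036db91f3d can be illustrated with a short sketch. This is not the coll/ml code; `DEBUG_BUF_SIZE` and `format_debug_line` are hypothetical names, and the 1 MB size is taken from the commit message.

```c
#include <stdio.h>
#include <stdlib.h>

#define DEBUG_BUF_SIZE (1024 * 1024)  /* 1 MB, the size named in the commit message */

/* Hypothetical helper: formats a debug line through a large scratch
 * buffer and copies the result into `out`. Returns 0 on success,
 * -1 if the scratch buffer cannot be allocated. */
static int format_debug_line(const char *msg, char *out, size_t out_len)
{
    /* `char buf[DEBUG_BUF_SIZE];` here would take 1 MB of stack per call,
     * which can overflow small thread stacks such as those used by JVMs. */
    char *buf = malloc(DEBUG_BUF_SIZE);
    if (NULL == buf) {
        return -1;
    }

    snprintf(buf, DEBUG_BUF_SIZE, "[debug] %s", msg);
    snprintf(out, out_len, "%s", buf);  /* truncates safely if out is small */

    free(buf);
    return 0;
}
```

The heap allocation costs a `malloc`/`free` pair per call, which is usually acceptable on a debug path.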
Rolf vandeVaart
ce5274652f Add some additional verbose output per this RFC
http://www.open-mpi.org/community/lists/devel/2014/03/14282.php
Reviewed by Jeff Squyres

This commit was SVN r31072.
2014-03-14 20:17:47 +00:00
Nathan Hjelm
1911d97044 coll/ml: fix assertion failure that occurs when level 0 of the hierarchy
fails to select any processes on any nodes.

Also modified basesmsocket to only print debugging info to the framework
output.

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31071.
2014-03-14 19:39:00 +00:00
Ralph Castain
fbc5e3b773 Deal with the corner case where we encounter an error when attempting to launch a daemon. In this case, we will order abnormal termination before the daemons call back to us, and thus any attempt to send them a "die" message will fail. Ensure that mpirun at least exits cleanly in this scenario, thereby allowing the remote daemons that did get launched to commit suicide when comm fails.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31068.
2014-03-14 15:32:30 +00:00
Jeff Squyres
8e8154645b Rearrange ordering of redirection
This prevents "/usr/bin/which: no oshmem_info ..." messages from
appearing.

cmr=v1.7.5:reviewer=rhc

This commit was SVN r31067.
2014-03-14 15:23:18 +00:00
Mike Dubman
cf9f5f9c4c enable oshmem in mlnx platform file
cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r31065.
2014-03-14 08:18:55 +00:00
Jeff Squyres
616e62bb9e Fix --enable|disable-oshmem configure switch
* Fixed the AC_ARG_ENABLE for the oshmem option.
* Fixed the AM_CONDITIONALs values for the projects.
* Re-added (and slightly simplified) the use of PROJECT_OSHMEM in
  various Makefile.am's to disable building OSHMEM stuff

Submitted by Jeff, reviewed and approved by Ralph.

cmr=v1.7.5:reviewer=ompi-gk1.7

This commit was SVN r31062.
2014-03-13 21:23:04 +00:00
Jeff Squyres
16cab57ec5 Fix some set-but-not-used compiler warnings.
cmr=v1.8:reviewer=miked

This commit was SVN r31061.
2014-03-13 21:20:36 +00:00
Ralph Castain
2abed09d7c Continue to resolve priority issues. Cleanup the case of forced termination in mpirun during launch processing by ensuring we can respond to socket closures, and ensuring that the remote daemons correctly close their sockets when terminating.
Jeff: please test a variety of conditions to ensure we get this right

cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31058.
2014-03-13 04:02:24 +00:00
Jeff Squyres
24020ef1e3 Refs trac:4372: 3rd and hopefully final addendum to Fortran API fixes for the RMA functions
These parameters should not be marked as INTENT(OUT) (they aren't in
the MPI-3 standard).

This commit was SVN r31056.

The following Trac tickets were found above:
  Ticket 4372 --> https://svn.open-mpi.org/trac/ompi/ticket/4372
2014-03-12 22:55:57 +00:00
Ralph Castain
cd72aa9b66 Per Dave's comment, bzero has portability issues and little advantage over a simple memset. So let's use the safer solution.
cmr=v1.7.5:reviewer=dgoodell:subject=replace bzero with memset

This commit was SVN r31055.
2014-03-12 22:55:47 +00:00
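The `bzero` to `memset` swap in cd72aa9b66 is mechanical. `bzero()` originated in BSD, was marked legacy in POSIX.1-2001, and was removed in POSIX.1-2008, while `memset()` is part of ISO C. A minimal sketch, using a hypothetical `peer_info` struct that is not from the Open MPI source:

```c
#include <string.h>

/* Hypothetical struct, for illustration only. */
struct peer_info {
    int  rank;
    char host[64];
};

/* Zero every byte of the struct with the portable ISO C call. */
static void peer_info_reset(struct peer_info *p)
{
    /* Previously: bzero(p, sizeof(*p)); */
    memset(p, 0, sizeof(*p));
}
```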
Ralph Castain
ac421c931d The random number generator changes were incomplete (typos) in some places, and were missing the required declspecs for visibility.
cmr=v1.7.5:reviewer=jsquyres

This commit was SVN r31053.
2014-03-12 22:37:27 +00:00
Nathan Hjelm
e70809e169 osc/rdma: fix the spelling of incoming
cmr=v1.7.5:ticket=trac:4379

This commit was SVN r31050.

The following Trac tickets were found above:
  Ticket 4379 --> https://svn.open-mpi.org/trac/ompi/ticket/4379
2014-03-12 21:43:23 +00:00
Jeff Squyres
ccff41383c Refs trac:4372: Another addendum to Fortran API fixes for the RMA functions
* Several parameters should not be marked as INTENT(OUT) (they aren't in
  the MPI-3 standard).
* Added missing PMPI F08 OMPI interfaces

This commit was SVN r31049.

The following Trac tickets were found above:
  Ticket 4372 --> https://svn.open-mpi.org/trac/ompi/ticket/4372
2014-03-12 20:22:15 +00:00
Jeff Squyres
8a5a832085 Refs trac:4372: Addendum to Fortran API fixes for the RMA functions
These parameters should not be marked as INTENT(OUT) (they aren't in
the MPI-3 standard).

This commit was SVN r31048.

The following Trac tickets were found above:
  Ticket 4372 --> https://svn.open-mpi.org/trac/ompi/ticket/4372
2014-03-12 19:59:04 +00:00
Nathan Hjelm
d0009938a6 osc/rdma: tighten semantics a bit more
It is not valid to call flush outside a passive target epoch, nor is
it valid to call lock/lock_all when no_locks is set. In the former
case we were just semantically incorrect; in the latter we would crash
and burn.

cmr=v1.7.5:ticket=trac:4382

This commit was SVN r31046.

The following Trac tickets were found above:
  Ticket 4382 --> https://svn.open-mpi.org/trac/ompi/ticket/4382
2014-03-12 18:53:47 +00:00
Nathan Hjelm
1fc9a55d08 osc/rdma: do not use MPI_SOURCE to determine the peer in a send operation.
This fixes a bug in r31029 which removes the use of the pml base request
(also not a good way since cm doesn't use the base request). We now allocate
a data structure (ugh) to determine the needed information. Tested with
mtt/onesided.

cmr=v1.7.5:ticket=trac:4379

This commit was SVN r31044.

The following SVN revision numbers were found above:
  r31029 --> open-mpi/ompi@29e00f9161

The following Trac tickets were found above:
  Ticket 4379 --> https://svn.open-mpi.org/trac/ompi/ticket/4379
2014-03-12 17:14:11 +00:00
Nathan Hjelm
6648a46963 rma: fix semantic errors in osc/rdma and MPI_Win_fence
- Return an error if the caller specified both MPI_MODE_NOPRECEDE and
   MPI_MODE_NOSUCCEED to MPI_Win_fence.

 - Return an error if the caller attempts to enter an active target
   epoch while already in a passive target epoch.

 - End an active target epoch if MPI_Win_fence is called with
   MPI_MODE_NOSUCCEED.

cmr=v1.7.5:ticket=trac:4382

This commit was SVN r31043.

The following Trac tickets were found above:
  Ticket 4382 --> https://svn.open-mpi.org/trac/ompi/ticket/4382
2014-03-12 17:14:03 +00:00
Ralph Castain
f56f37d364 Shifting to an event-driven RTE raises some interesting issues during shutdown. We want the last messages to get through, but also need to correctly shut down the virtual machine. This requires a delicate balancing act across event priorities, and the need to check for termination conditions in places where related events get processed.
Change the priority of comm_failure and job_termination events to ensure we process final messages prior to terminating. Check for termination conditions when processing proc termination events as we may order proc termination when the daemon gets an exit command, but we can't see the proc actually terminate until we get out of that message event.

Jeff: probably easiest to review this by testing. I tested it under both Slurm and rsh on v1.7.5 as well as trunk

cmr=v1.7.5:reviewer=jsquyres:subject=resolve event priorities during VM shutdown

This commit was SVN r31042.
2014-03-12 16:49:58 +00:00
Nathan Hjelm
51916c5b41 osc/rdma: now that the access epoch is not open after MPI_Win_create* we
need to enable the access epoch in MPI_Win_fence.

I missed this change when I fixed the semantics of MPI_Win_create. With
this commit our one-sided MTT runs are now running clean.

cmr=v1.7.5:reviewer=dgoodell

This commit was SVN r31041.
2014-03-12 16:11:15 +00:00